Phelio Gnomi

My words from my mind

Monthly Archives: May 2013

Speeding up your query with Temp Tables

If you have a query that:

  1. has a lot of subqueries
  2. joins many different tables or subqueries
  3. returns a huge amount of data
  4. takes forever to run (more than a minute)
  5. runs on a busy database server

you can make it run up to a hundred times faster by storing the results of some groups of subqueries in temp tables first, then running the final joins against those temp tables.

This way, you will have:

  1. smaller sets of queries to run,
  2. which reduces memory usage,
  3. reduces the chance of blocking other running queries,
  4. and runs a whole lot faster.

Let’s make a super simple example.

--Old and Slow
--let's assume that huge_table_a contains a billion records
SELECT tbl_a.item,
       tbl_a.profit_share AS south_share,
       tbl_b.profit_share AS north_share
FROM   (SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a) unnested
        WHERE  region = 'south'
        GROUP  BY item) tbl_a
       JOIN (SELECT item,
                    Sum(profit) / Max(profit) AS profit_share
             FROM   (SELECT item,
                            profit,
                            UNNEST(regions) AS region
                     FROM   huge_table_a) unnested
             WHERE  region = 'north'
             GROUP  BY item) tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;

Using Temp Tables

-- Will run a lot faster, especially in a busy DB
CREATE TEMP TABLE tmp_table1 AS (
    SELECT item,
           Sum(profit) / Max(profit) AS profit_share
    FROM   (SELECT item,
                   profit,
                   UNNEST(regions) AS region
            FROM   huge_table_a) unnested
    WHERE  region = 'south'
    GROUP  BY item
);

CREATE TEMP TABLE tmp_table2 AS (
    SELECT item,
           Sum(profit) / Max(profit) AS profit_share
    FROM   (SELECT item,
                   profit,
                   UNNEST(regions) AS region
            FROM   huge_table_a) unnested
    WHERE  region = 'north'
    GROUP  BY item
);

SELECT tbl_a.item,
       tbl_a.profit_share AS south_share,
       tbl_b.profit_share AS north_share
FROM   tmp_table1 tbl_a
       JOIN tmp_table2 tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;

 


Better way to execute sql files in PostgreSQL

My previous post is kind of dumb. It’s simple, but dumb. First of all, the \o command will only write the output of a particular query to the file. It won’t write any errors or other information that pops up during the execution. The last time I ran it that way, it failed and I didn’t know what happened or where it stopped.

So, a better way to do it is to run psql with the -f option.

psql -h HOSTNAME -U USERNAME -d DBNAME -a -f FILENAME > LOGFILENAME 2>&1

A little explanation here:

  • -a tells psql to echo all input commands to standard output. It basically prints out every query inside your SQL file, which is handy for figuring out what went wrong and where it stopped.
  • -f FILENAME tells psql to run the given file.
  • > LOGFILENAME is the basic UNIX redirection that sends standard output to a file. In other words, whatever output you would normally see in psql gets written to a file instead.
  • 2>&1 is very important because it redirects standard error to standard output. Since we’ve already redirected standard output to a file, any errors will be logged to that file as well.
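To see the redirection part on its own, here is a small sketch using a stand-in command instead of psql (the command and the file name run.log are made up for illustration); any program that writes to both standard output and standard error behaves the same way under this redirection:

```shell
# Stand-in for a long psql run: one line to stdout, one error to stderr.
sh -c 'echo "SELECT 1; -- query output"; echo "ERROR: relation does not exist" >&2' > run.log 2>&1

# Nothing reached the terminal; both lines ended up in the log.
cat run.log
```

Note that the order matters: > run.log first points standard output at the file, and 2>&1 then points standard error at wherever standard output currently goes. Writing 2>&1 before > run.log would leave the errors on the terminal.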

For more information, check out the psql manual and also Bash redirection.

PostgreSQL Input and Output

NOTE: this is one way of doing it. Find the better solution here.

Let’s imagine a scenario where you have 10 different SQL queries to run and they will take a few hours each. You don’t want to run them during peak hours because they will slow down your entire server. You don’t want to stay back late either. So you decide to put all the queries in a text file, run them with your PostgreSQL client at the end of the day, then go for a bottle of beer at the nearest club and call it a day.

Here is how you do it in PostgreSQL. First, fire up your PostgreSQL client such as

psql -h your-host-name -U your-username -d your-dbname

and then you run the \i command. The “i”, obviously, stands for “input”.

\i /path/to/your/sql/file

VOILA, the SQL is running. You pack up your stuff and call your friends to meet you at that bar you talked about earlier. But wait a minute. What if something goes wrong? What if my console suddenly closes itself? I won’t know whether all the queries ran successfully or not. Don’t worry, you say. We can stream the output into a log file, and when we come back tomorrow morning, we will know where to check. And so, you press CTRL+C to cancel your previous job, wait a while until it finishes cancelling, and then type out some other commands:

\o /path/to/your/log/file
\i /path/to/your/sql/file

And voila!! You can really go to that pub where your friends are waiting now. Have a good evening, folks!!