Phelio Gnomi

My words from my mind

Tag Archives: optimization

Slow fgets in PHP. or does it?

We have a piece of codes that looks something like this

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');

while($buffer = fgets($old_file)) {
  ... \\ $buffer is being edited here
  $buffer = $new_buffer;
  fwrite($new_file, $buffer);
}

...

It all works fine and very speedy all the while until one day we decided to run this piece of codes from a network drive.

Oh ya, did I mention that it’s running on a Windows Server?

Good.

When the codes above is running on the Network Drive, it slows down to almost 1 minutes per file, which normally only takes 1 seconds or less on a local drive. What went wrong?

Probably just like any debuggers will do, I went and print the execution time needed for each blocks of codes. Given that a while loop is a while loop, I didn’t put a timer inside the while loop, but only before and after the while loop.

The conclusion was that this particular while loop is taking most of the processing time. And as what most modern programmers will do, we ask Google about it.

“PHP fgets slow on network” I searched. Lot’s of results are complaining that fgets is slow. Hmm, now I know. But what’s the alternative? many. But hmm TL:DR. Too long, let’s find another shorter more to the point article (probably not this one I’m writing).

Searching on I read somewhere suggesting to comment out the fwrite() and try again. I did that, and it works faster. And of course it doesn’t solve my problem, but it seems that I’m going the right way. And voila, move the fwrite outside of the while loop, and it goes back to the better speed.

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');
$buffer_holder = "";
while($buffer = fgets($old_file)) {
  ... \\ $buffer is being edited here
  $buffer_holder .= $new_buffer; 
}
fwrite($new_file, $buffer_holder);
...

So in short, don’t write into file line by line, write them at once. This way, you don’t have to go back and fort from Malaysia to US, but one trip is enough.

a real man only takes one trip

Speeding up your query with Temp Tables

If you have a query that:

  1. have a lot of subqueries
  2. join many different tables or subqueries
  3. going to return a huge amount of data
  4. take forever to run (more than 1 minutes)
  5. you have a busy database server

You can run your query in a hundred times faster speed by storing some groups of subqueries results into a temp table first before running the final joins using these temp tables.

This way, you will have:

  1. a smaller sets of queries to run
  2. which will reduce the memory usage
  3. and reduce the chance of blocking other running queries
  4. and it runs a hell lot faster.

Let’s make a super simple example.

--Old and Slow
--let's assume that huge_table_a and huge_table_b both contain 1 billion records each
SELECT tbl_a.item,
       tbl_a.profit_share,
       tbl_b.profit_share,
FROM   (SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a
                WHERE  region = "south")
        GROUP  BY item) tbl_a
       JOIN (SELECT item,
                    Sum(profit) / Max(profit) AS profit_share
             FROM   (SELECT item,
                            profit,
                            UNNEST(regions) AS region
                     FROM   huge_table_a
                     WHERE  region = "north")
             GROUP  BY item) tbl_b
         ON tbl_a.item = tbl_b.item
         ORDER BY tbl_a.profit_share DESC;

Using Temp Tables

-- Will run a lot faster especially in a busy DB
CREATE TEMP TABLE tmp_table1 AS (
       SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a
                WHERE  region = "south")
        GROUP  BY item
);

CREATE TEMP TABLE tmp_table2 AS (
       SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a
                WHERE  region = "south")
        GROUP  BY item
);

SELECT tbl_a.item,
       tbl_a.profit_share,
       tbl_b.profit_share,
FROM   tmp_table1 tbl_a
       JOIN tmp_table2 tbl_b
         ON tbl_a.item = tbl_b.item
         ORDER BY tbl_a.profit_share DESC;