Phelio Gnomi

My words from my mind

Tag Archives: PHP

Slow fgets in PHP. Or is it?

We have a piece of code that looks something like this:

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');

while (($buffer = fgets($old_file)) !== false) {
  ... // $buffer is being edited here
  $buffer = $new_buffer;
  fwrite($new_file, $buffer);
}

...

It all worked fine and fast until one day we decided to run this piece of code from a network drive.

Oh ya, did I mention that it’s running on a Windows Server?

Good.

When the code above runs from the network drive, it slows down to almost 1 minute per file, where it normally takes 1 second or less on a local drive. What went wrong?

Like anyone debugging would probably do, I printed the execution time of each block of code. Given that a while loop is a while loop, I didn’t put a timer inside the loop, only before and after it.
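A minimal sketch of that kind of timing, assuming PHP's built-in microtime(); the time_block() helper and the usleep() workload are made up for illustration and stand in for the real blocks of the script.

```php
<?php
// Hypothetical helper: run a block of code and return how long it took, in seconds.
function time_block(callable $block): float
{
    $start = microtime(true);   // current Unix time as a float
    $block();
    return microtime(true) - $start;
}

// Stand-in workload; in the real script this would be the while loop.
$elapsed = time_block(function () {
    usleep(20000); // pretend to work for about 20 ms
});

printf("Loop took %.3f seconds\n", $elapsed);
```

Wrapping each suspicious block this way shows which one eats the time, without touching the inside of the loop.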

The conclusion was that this particular while loop was taking most of the processing time. And as most modern programmers would do, we asked Google about it.

“PHP fgets slow on network”, I searched. Lots of results complain that fgets is slow. Hmm, now I know. But what’s the alternative? Many. But hmm, TL;DR. Too long, let’s find another shorter, more to-the-point article (probably not this one I’m writing).

Searching on, I read a suggestion somewhere to comment out the fwrite() and try again. I did that, and it ran faster. Of course that doesn’t solve my problem, but it seemed I was going the right way. And voilà, move the fwrite outside of the while loop, and the speed is back to normal.

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');
$buffer_holder = "";
while (($buffer = fgets($old_file)) !== false) {
  ... // $buffer is being edited here
  $buffer_holder .= $new_buffer; 
}
fwrite($new_file, $buffer_holder);
...

So in short, don’t write to the file line by line; write everything at once. This way, you don’t have to go back and forth from Malaysia to the US; one trip is enough.
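Here is a self-contained sketch of the same idea that can be run as is. The edit_line() function is a hypothetical stand-in for whatever editing the real loop does, and the temporary files only exist to make the demo runnable.

```php
<?php
// Hypothetical stand-in for whatever editing the real loop does to each line.
function edit_line(string $line): string
{
    return strtoupper($line);
}

// Read $source line by line, edit each line in memory, then write once.
function rewrite_file(string $source, string $target): int
{
    $in = fopen($source, 'r');
    $out = fopen($target, 'w');
    $holder = '';

    // Compare against false so a line containing only "0" does not end the loop.
    while (($line = fgets($in)) !== false) {
        $holder .= edit_line($line);
    }

    $written = (int) fwrite($out, $holder); // one write instead of one per line
    fclose($in);
    fclose($out);
    return $written;
}

// Demo with temporary files (paths are generated, not from the original script).
$source = tempnam(sys_get_temp_dir(), 'old');
$target = tempnam(sys_get_temp_dir(), 'new');
file_put_contents($source, "first line\n0\nlast line\n");

$bytes = rewrite_file($source, $target);
$result = file_get_contents($target);
echo $result;
```

The trade-off is that the whole edited file sits in memory until the single write, so for very large files it may be better to flush the holder every few thousand lines instead.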

a real man only takes one trip

The theory of Object Oriented Programming vs its application.

Speaking of Object Oriented Programming (OOP), I remember how absurd it felt to learn it for the first time. In the university, they first teach you a bunch of concepts, ideas and terminology. After that, you are on your own.

Being a total newbie in programming at the time, and a very horrible English speaker in an all-English-speaking programming class, I was so proud to be able to use for loops in my C++ class. The same goes for how proud I was to understand, theoretically, all the terms that the lecturer explained. I was so proud to get the hierarchy of the dairy product family right, and to see the similarity between how your father leaves you some fortune when he dies and how a child class inherits from a mother class. Theoretically, I was an expert.

When it was time to put the theory into practice, everybody was left wondering why you don’t create a Milk super class and have Cheese, Butter, and Yoghurt extend the Mother Milk, just like the hierarchy diagram explained. You don’t create any classes for them; instead, you create a class called <<ProgramName>> and store the dairy products in the database.
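For the record, the textbook hierarchy would look roughly like this in PHP. This is only a sketch of the classroom diagram; the class names follow the dairy example and the describe() method is invented for illustration.

```php
<?php
// Hypothetical classes for illustration; names follow the dairy example above.
class Milk
{
    protected string $name = 'milk';

    public function describe(): string
    {
        return "I am {$this->name}, made from milk.";
    }
}

// Cheese inherits describe() from Milk and only changes its name.
class Cheese extends Milk
{
    protected string $name = 'cheese';
}

class Yoghurt extends Milk
{
    protected string $name = 'yoghurt';

    // A child can also override what it inherits.
    public function describe(): string
    {
        return parent::describe() . ' I am also sour.';
    }
}

$products = [new Milk(), new Cheese(), new Yoghurt()];
foreach ($products as $product) {
    echo $product->describe(), "\n";
}
```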

The OOP era ended for me pretty quickly because I was introduced to PHP, and suddenly I was happy with my life without thinking much about making the code look more like what it does in the real world. Carrying the OOP theory deep at the bottom of my backpack, I fast-forwarded 5 years, surviving just fine. Throughout my working history, I’ve never been in a company that really embraced OOP.

However, things are going to change. For the first time ever in the real world, I was in a job interview where the people actually cared about how much I know about OOP. Digging back into my memory, I managed to answer some of the questions correctly and missed some. I had to be honest with them that I have no experience with OOP at all. But they offered me a job, and now I’m happy that I can finally create the Milk class that I didn’t create 5 years ago.

For the moment, OOP is still a very vague concept for me. But I’m looking forward to applying those theories in real practice this time.

I wish I were done with Yii

No offence, Yii, but I’m going to move on. First, the lack of examples in your user manual (where the manual itself is mediocre to start with) is tolerable. But I started to wonder if people really use you. For a tool like you, you should be proud if people talk a lot about you, best of all if they talk about the problems they are having with you. At least then there will be lots of devoted worshippers willing to find the solutions for every problem that people talk about.

I just can’t do it any more. I’m stuck here with a problem that nobody (that I can reach) knows how to solve. It all started with one very promising book, “Agile Web Application Development with Yii 1.1 and PHP5”, which is really good to start with. But it only took 3 chapters for you to leave me bleeding in a corn field near my uncle Jimmy’s house. I’m still bleeding by the way, just so you know. I just can’t figure out how to make Selenium work without giving me that “Failed opening ‘SiteTest: Firefox.php'” error. I can’t proceed without Selenium; the whole book is about the test-driven approach. And if the testing tools don’t work, how do I proceed?

There are still plenty of fish in the sea. Though you are the chosen one and I will have to come back to you one day, I think I’ll just forget about you until then.

The Ruby, C++, PHP and C++

I’m taking the Stanford online course called Design and Analysis of Algorithms 1. So far so good, and then Programming Exercise 4 hit me hard.

First of all, we are allowed to use any programming language. They only want the final result, regardless of how you compute it. So, in week 4, we are supposed to read through a file that represents a graph with thousands of nodes, and we are asked to find the five largest strongly connected components, or SCCs.

So, I started with Ruby, which is my language of choice for this course. The algorithm that the professor teaches uses recursion. So, after finishing the program and running it through a few test cases, which seemed to work properly, I ran it on the actual file (the graph with thousands of nodes). It crashed in the first few seconds saying “stack level too deep”. Apparently the recursion is too deep for my Ruby to handle. I tried to find out how to increase the allowed stack level, but either I don’t understand what they are talking about or the solution is for Mac or Linux. I’m working on Windows 7.

So, I thought maybe I should do it in C++. One reason is that it’s fast, and it’s difficult (so I can learn to be a better programmer, not just a guy who gets pampered by modern programming languages). Then, if you know C++ and also know Ruby or PHP, you can see that C++ is so much different. So much more difficult. PHP pretty much takes care of many things behind the scenes for you, things like dynamic arrays or arrays with arbitrary index names (hash in Ruby, associative array in PHP), or passing an array around like tossing a pancake from the pan to the plate. They are not that simple in C++. A built-in array is just a pointer with a size. Passing arrays is complicated, not to mention multidimensional arrays. And the language itself has no built-in hash or associative array; for those you have to turn to standard library containers such as std::map.

I have to admit that I’m too pampered by modern (high-level) programming languages, and it’s not C++’s fault, but writing my own function to search an array or to enable named indexes is just too much for me. So, I went back to PHP and tried out its maximum allowed stack level. On the first try, it returned an error at the 100th level of recursion. But don’t worry, it turned out to be just a limit that Xdebug sets. I removed Xdebug and it can recurse as deep as your computer’s memory allows (which is a lot, and good news).

I immediately converted the Ruby code into PHP code and voilà, it worked on the test cases. Working in PHP is like returning home. It feels natural to me, as if I’m talking in my mother tongue. However, when I ran it on the real graph, I hit a lot of memory limit errors. I increased the limit a few times, from 128M to 256M, to 1024M, to 2048M. It consumed more than 1GB of RAM to process 5 million connections between 800 thousand nodes. And it ran forever. And ever. I never got the result back. I suspected an infinite loop, but couldn’t really find proof of it. Therefore, I have to conclude that it’s PHP’s problem. It’s simply too slow.
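One way around the recursion depth problem, which is not what I tried, is to replace the recursion with an explicit stack. Below is a minimal sketch, assuming the two-pass (Kosaraju-style) approach to strongly connected components; the function names and the tiny edge list are invented for illustration, and nothing here says how it would perform on the real graph.

```php
<?php
// Iterative depth-first search from $start over $graph (node => list of neighbours).
// Appends nodes to $finished in finishing order and returns how many it visited.
function dfs(array $graph, $start, array &$visited, array &$finished): int
{
    $count = 0;
    $visited[$start] = true;
    $stack = [[$start, 0]];             // pairs of [node, index of next neighbour]
    while ($stack) {
        $top = count($stack) - 1;
        [$node, $i] = $stack[$top];
        $neighbours = $graph[$node] ?? [];
        if ($i < count($neighbours)) {
            $stack[$top][1] = $i + 1;   // remember where to resume for this node
            $next = $neighbours[$i];
            if (!isset($visited[$next])) {
                $visited[$next] = true;
                $stack[] = [$next, 0];
            }
        } else {
            array_pop($stack);          // all neighbours done: node is finished
            $finished[] = $node;
            $count++;
        }
    }
    return $count;
}

// Sizes of the strongly connected components, largest first.
function scc_sizes(array $edges): array
{
    $graph = [];
    $reversed = [];
    $nodes = [];
    foreach ($edges as [$from, $to]) {
        $graph[$from][] = $to;
        $reversed[$to][] = $from;
        $nodes[$from] = true;
        $nodes[$to] = true;
    }

    // Pass 1: finishing order on the reversed graph.
    $visited = [];
    $finished = [];
    foreach (array_keys($nodes) as $node) {
        if (!isset($visited[$node])) {
            dfs($reversed, $node, $visited, $finished);
        }
    }

    // Pass 2: explore the original graph in decreasing finishing order.
    // Each search started from an unvisited node covers exactly one component.
    $visited = [];
    $sizes = [];
    $ignored = [];
    foreach (array_reverse($finished) as $node) {
        if (!isset($visited[$node])) {
            $sizes[] = dfs($graph, $node, $visited, $ignored);
        }
    }
    rsort($sizes);
    return $sizes;
}

// Tiny made-up graph: components {1,2,3}, {4,5} and {6}.
$edges = [[1, 2], [2, 3], [3, 1], [3, 4], [4, 5], [5, 4], [6, 6]];
print_r(scc_sizes($edges));
```

Because the pending work lives in an ordinary array instead of the call stack, the depth is limited by memory and not by any nesting limit.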

So, my only hope is C++ now, which I’m not good at, and I have to build many functions from scratch. Well, probably I can find some library online. But, I don’t know. It feels like I’m forced to speak Spanish when I only know a few phrases like “la niña come pan” and “el niño bebemos leche”. And I’m literally not sure whether it is bebemos or beben. Oh, I’m screwed.

to Ruby from PHP

I’ve been a PHP developer for years (well, around 3 years) and after reading so many bad comments on the web about PHP, I started to worry. I still love PHP though, but it’s not a crime to have another one to love. And by spreading love, we will learn more.

Picking up new languages doesn’t mean I will ditch PHP. I don’t think that will happen unless PHP dies out, and that will probably never happen, as you can still see C around these days.

On my list are Java, Ruby, Python, and probably .NET, especially C#. This is just a list; I don’t think I will be able to master all of them. But at least I will explore them all.

Let’s start with Java. I learned Java when I was doing my degree, so it’s not a big deal for me to just pick it up again. Trying to do the same thing I did perfectly well in PHP, I failed in Java. They are just different. Although I can still remember most of Java’s syntax, I have problems translating my PHP code into Java. The problem is I can’t do associative arrays as easily as in PHP. It’s not hard, but I’m not familiar with it. So I gave up. I will probably come back to it next time.

Next is Ruby. Browsing here and there, most of the talk is about Rails. But I want to start from Ruby itself, so I do.

Ruby is designed to be easily understood by humans. The syntax reads just like spoken English. So cool that I can’t understand it with my PHP mindset. It was probably easier to learn C++ back then, when I didn’t even know how to speak English well, let alone any of the holy languages called programming languages. Or am I too old to learn?

unless 10.is_a? Numeric
   puts 10
end

The code above is simple, isn’t it? It is so easy to understand that I need to hit my head on the table several times to absorb it. Maybe I’m a fool, but this is the truth. When I read code, I think in PHP style, trying to get the TRUE or FALSE of the condition and then make a decision. But when I first saw these lines, it took me a while to apply my thinking style to this syntax. And this is just the beginning. I continued to read more about the syntax and I found this:

def join( sep = $,, format = "%s" )
  collect do |item|
    sprintf( format, item )
  end.join( sep )
end

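To my eyes, a double comma in a function declaration is a syntax error. End of story. But it’s not in Ruby: $, is itself a global variable (the default separator used when joining), so only the second comma separates the parameters… Game Over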
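For a PHP mindset, the two Ruby snippets translate roughly as follows. This is only a sketch: join_formatted() is a made-up name, and using an empty string as the default separator in place of Ruby’s $, is an assumption.

```php
<?php
// Ruby's "unless X" runs the body only when X is false, like PHP's "if (!X)".
$value = 10;
if (!(is_int($value) || is_float($value))) {
    echo $value, "\n";   // not reached: 10 is a number
}

// Rough PHP counterpart of the Ruby join method above.
// Ruby's $, is a global variable holding the default separator;
// the empty string used here in its place is an assumption.
function join_formatted(array $items, string $sep = '', string $format = '%s'): string
{
    // Format every item first, then glue the results together.
    $formatted = array_map(function ($item) use ($format) {
        return sprintf($format, $item);
    }, $items);
    return implode($sep, $formatted);
}

echo join_formatted([1, 2, 3], ', ', '<%s>'), "\n";
```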

Probably I should have given up, but it’s Ruby’s uniqueness and its differences from PHP that make me want to know more (not to mention that it’s also beautiful). It sounds like a love story now.

To learn Ruby, simply visit its website. You will find a lot of getting-started tutorials, some boring and some fun. There is one that is ridiculous because it’s somewhere between telling a story and teaching Ruby. But it’s fun to read.

reCaptcha

I just recently read about reCaptcha and found it really interesting. Maybe it’s just me, and you probably already know about this. reCaptcha is a CAPTCHA (which stands for Completely Automated Public Turing test to tell Computers and Humans Apart) system owned by Google. You will find it almost everywhere on websites nowadays, where it prevents spam, bot attacks, etc. Remember when some websites required you to type in the words from some ugly-looking image in order to proceed or submit the form? That is CAPTCHA.

Why is it interesting? Because apart from preventing bots (human-created programs) from entering our websites, we (the internet users) are made to be a voluntary human OCR machine. Yes, we are working for Google for free!!

Well, that’s not my point. I am willingly contributing to this project because the reCaptcha itself is free for me to use. So, it’s fair enough.

How It Works

Google apparently scans a lot of old magazines, newspapers, textbooks, etc. to digitize them. Those ancient papers are distorted and ugly, of course, so a normal OCR system will not be able to convert them into digital text accurately. Therefore, they end up with a collection of documents containing images of words that the computer doesn’t understand.

reCaptcha presents 2 words to us. One of these words is taken from the documents above (which Google can’t read yet). This will be the “fake” word. Another one is a computer generated word (probably from those documents as well but is already converted to digital text) and will be the “real” word.

Humans are able to perceive a lot more accurately than machines. So, when we see these images, we have a better chance of identifying what the words are. When we enter the 2 words and submit the form, reCaptcha checks whether the “real” word was answered correctly. If it was, the answer for the “fake” word is added to the database. In other words, we only need to answer the “real” word correctly in order to pass the test. As we don’t know which one is real and which is fake, and we have already offered to volunteer in this project, we will normally answer both words.

reCaptcha will normally reuse the same “fake” words in order to collect more answers. For sure, some of us might answer correctly and some might not. So, Google will have a set of different answers for every single word. The answer given most often will then be used as the transcription of that word. Two birds with one stone.
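The flow described above can be sketched in a few lines of PHP. Every name here is hypothetical and the sample answers are made up; this only illustrates the idea of checking the “real” word and voting on the “fake” one, and is not reCaptcha’s actual code.

```php
<?php
// All names here are hypothetical; this is not reCaptcha's actual code.

// Accept the submission only if the known ("real") word was typed correctly.
function passes_captcha(string $knownWord, string $typedKnown): bool
{
    return strcasecmp(trim($typedKnown), $knownWord) === 0;
}

// Count one vote for a guess at the unknown ("fake") word.
function record_guess(array &$votes, string $guess): void
{
    $guess = strtolower(trim($guess));
    $votes[$guess] = ($votes[$guess] ?? 0) + 1;
}

// The guess with the most votes becomes the accepted reading.
function most_voted(array $votes): ?string
{
    if (!$votes) {
        return null;
    }
    arsort($votes);                       // highest vote count first
    return (string) array_key_first($votes);
}

$votes = [];
$submissions = [
    ['morning', 'upon'],   // [typed known word, typed unknown word]
    ['morning', 'upon'],
    ['mourning', 'open'],  // fails the known word, so the guess is ignored
    ['Morning', 'upan'],
];
foreach ($submissions as [$typedKnown, $typedUnknown]) {
    if (passes_captcha('morning', $typedKnown)) {
        record_guess($votes, $typedUnknown);
    }
}
echo most_voted($votes), "\n";
```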

Using It in PHP

First of all, you need to register your website and get 2 keys. They are strings of random letters that you need to put in your PHP code. Then you need to download the library from the reCaptcha website and include it in your PHP files. Call the recaptcha_get_html() function to display the CAPTCHA input box and recaptcha_check_answer() to check whether the answer is correct. Here is the complete tutorial.
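A rough sketch of how the checking side might be wired up. Only the two function names come from the library as mentioned above; the argument order, the form field names, the is_valid property and the file name in the comment are assumptions on my part, so check them against the tutorial. The checker is passed in as a parameter so the sketch can run without the library, and the fake checker exists only for that purpose.

```php
<?php
// Sketch only. In a real page you would first load the downloaded library,
// e.g. require_once 'recaptchalib.php'; (file name is an assumption), and
// print the widget inside your form with recaptcha_get_html($publicKey).

// Check a submitted form. $checker defaults to the library's
// recaptcha_check_answer(); it is a parameter so this sketch runs without it.
// Argument order and field names below are assumptions, not confirmed API.
function captcha_is_valid(array $post, string $remoteIp, string $privateKey,
                          ?callable $checker = null): bool
{
    $checker = $checker ?? 'recaptcha_check_answer';
    $response = $checker(
        $privateKey,
        $remoteIp,
        $post['recaptcha_challenge_field'] ?? '',
        $post['recaptcha_response_field'] ?? ''
    );
    return (bool) $response->is_valid;
}

// Fake checker standing in for the library, so the example can run offline.
$fakeChecker = function ($key, $ip, $challenge, $answer) {
    return (object) ['is_valid' => $answer === 'two words', 'error' => null];
};

$post = [
    'recaptcha_challenge_field' => 'abc',
    'recaptcha_response_field'  => 'two words',
];
var_dump(captcha_is_valid($post, '127.0.0.1', 'PRIVATE_KEY', $fakeChecker));
```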

Security Issue

I am so proud that I can contribute to this project. But hackers are everywhere, and many technologies have been abandoned just because they were hacked once. According to Google, they have already applied some security measures for it. Read more here. So, no worries about that.

But I’m sure there must be some flaw in that system. So, I googled again. Something very interesting shows up here. The system has not been hacked so far (though there are some rumors, which I think are just rumors). But there is something called the “P**** Flood Attack”.

The attack is surprisingly easy to launch. In fact, anybody can do it just by following some simple guidelines provided here. The key to performing this attack is to identify which is the “real” word. After that, you can answer the “fake” word with whatever you want, including but not limited to the “P****” word. So why is this flooding? If millions of people do this, the answer set discussed above will be flooded with the P word. And don’t be surprised if, in the near future, you are reading some books or magazines online with random P words appearing in the text.

Wait there…

What benefit do you get? I thought we had agreed to volunteer for this project? Don’t worry, because the bad news is that the reCaptcha team already knows about this and has implemented numerous protections to prevent the flooding. I don’t know how the protections work, anyway. But I think I’m secure enough using reCaptcha on my websites. Happy CAPTCHA-ing