Phelio Gnomi

My words from my mind


Reduce SQL Injection Risk in Python and psycopg2

It may be surprising that a slight difference in a single line of code can have a great impact on preventing SQL injection. Try to spot the difference between the two lines below.

# code 1 (with Python's string format() method)
db.execute("select * from some_table where filter = '{}'".format("going to Sam's party"))
# code 2 (with psycopg2 query parameters; the trailing comma makes it a one-element tuple)
db.execute("select * from some_table where filter = %s", ("going to Sam's party",))

They probably look obviously different, yet at first glance it seems they shouldn’t give very different results. Sadly, they do.

The first line uses Python’s standard string formatting: given a string containing curly brackets as placeholders, like “This is a {}”, the format() method fills those placeholders with other strings.

# Example
sample1 = "this is a {}, and also a {}".format("pen", "weapon")
print(sample1)
 
#results
# this is a pen, and also a weapon 

This looks fine so far. But try the same with the string we passed into db.execute() above. If we print the resulting string, we get:

select * from some_table where filter = 'going to Sam's party'

Notice the extra single quote in the filter? This will cause an error and, of course, opens up a whole world of opportunity for SQL injection attacks. With the single quote unescaped, the rest of the string can carry other commands, which can do serious damage and cause a serious maintenance headache.

-- example: imagine the replacement string is "bleh'; drop table some_table; insert into user values ('some new malicious users'); --"
-- your query will become
select * from some_table where filter = 'bleh'; drop table some_table; insert into user values ('some new malicious users'); --'
-- note that double dash (--) is used for commenting. So the extra single quote will be ignored.

So why is code 2 a better way of substituting values? Because psycopg2 escapes special characters in the parameters for you, so every string passed this way stays a string literal instead of becoming malicious code.

db.execute("select * from some_table where filter = %s", ("bleh'; drop table some_table; insert into user values ('some new malicious users'); --",))
The code above sends SQL similar to the one below (the exact escaping depends on the psycopg2 version and the server’s settings):

 

select * from some_table where filter = E'bleh\'; drop table some_table; insert into user values (\'some new malicious users\'); --'
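Here is a runnable sketch of the same contrast. psycopg2 needs a live PostgreSQL server, so this swaps in Python’s built-in sqlite3 module, which uses ? placeholders instead of %s; the table and its single row are made up for the demo. Instead of drop table (sqlite3’s execute() refuses to run more than one statement at a time), the malicious input is the classic ' OR '1'='1 trick, which turns the filter into a condition that is always true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (filter TEXT)")
conn.execute("INSERT INTO some_table VALUES (?)", ("going to Sam's party",))


def unsafe_lookup(text: str) -> list:
    # Code 1 style: the value is pasted into the SQL text, so a quote inside it
    # ends the string literal early and the rest is parsed as SQL.
    sql = "SELECT * FROM some_table WHERE filter = '{}'".format(text)
    return conn.execute(sql).fetchall()


def safe_lookup(text: str) -> list:
    # Code 2 style: the value travels separately as a parameter and is only
    # ever treated as data, never as SQL.
    return conn.execute("SELECT * FROM some_table WHERE filter = ?", (text,)).fetchall()


# A harmless apostrophe: unsafe_lookup() would raise sqlite3.OperationalError here,
# while safe_lookup() simply finds the row.
print(safe_lookup("going to Sam's party"))

# An injection attempt: the unsafe version returns every row, the safe one returns none.
attack = "nothing' OR '1'='1"
print(unsafe_lookup(attack))
print(safe_lookup(attack))
```

With psycopg2 itself, cursor.mogrify(query, params) returns the exact query string that would be sent to the server, which is a handy way to check how the escaping looks on your own setup.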

Slow fgets in PHP? Or is it?

We have a piece of code that looks something like this:

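...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');

// fgets() returns false at end of file; compare strictly so a last line such as "0" doesn't end the loop early
while (($buffer = fgets($old_file)) !== false) {
  // ... $buffer is being edited here into $new_buffer
  $buffer = $new_buffer;
  fwrite($new_file, $buffer); // one write per line
}

...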

It all worked fine and very fast until one day we decided to run this piece of code from a network drive.

Oh ya, did I mention that it’s running on a Windows Server?

Good.

When the code above runs on the network drive, it slows down to almost 1 minute per file, something that normally takes 1 second or less on a local drive. What went wrong?

Just like anyone debugging would, I printed the execution time for each block of code. Since a while loop is a while loop, I didn’t put a timer inside the loop, only before and after it.

The conclusion was that this particular while loop was taking most of the processing time. And, as most modern programmers would, we asked Google about it.

“PHP fgets slow on network”, I searched. Lots of results complain that fgets is slow. Hmm, now I know. But what’s the alternative? There are many. But hmm, TL;DR. Too long, let’s find a shorter, more to-the-point article (probably not this one I’m writing).

Searching on, I read a suggestion somewhere to comment out the fwrite() and try again. I did, and it ran faster. Of course that doesn’t solve my problem, but it showed I was on the right track. And voilà: move the fwrite() outside of the while loop, and the speed goes back to normal.

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');
$buffer_holder = "";
while (($buffer = fgets($old_file)) !== false) {
  // ... $buffer is being edited here into $new_buffer
  $buffer_holder .= $new_buffer; // collect everything in memory first
}
// a single write instead of one write per line
fwrite($new_file, $buffer_holder);
...

So, in short: don’t write to the file line by line; write everything at once. That way you don’t have to travel back and forth from Malaysia to the US for every line; one trip is enough.

a real man only takes one trip
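For anyone doing the same job in Python, here is a minimal sketch of the “one trip” idea. The edit_line() function is a hypothetical stand-in for whatever editing happens to each line. Python’s file objects already buffer writes, so the gap may be smaller than in the PHP case; this just makes the single write explicit.

```python
from typing import List


def edit_line(line: str) -> str:
    # Hypothetical placeholder for the per-line editing step.
    return line.replace("\t", "    ")


def rewrite_in_one_trip(src_path: str, dst_path: str) -> None:
    # Read and edit every line, keeping the results in memory...
    with open(src_path, "r", encoding="utf-8") as src:
        edited: List[str] = [edit_line(line) for line in src]
    # ...then send everything to the (possibly remote) file in a single write.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("".join(edited))
```

The trade-off is memory: the whole output is held in memory before it is written. For very large files, opening the destination with a bigger buffer (for example open(dst_path, "w", buffering=1024 * 1024)) is a middle ground between one write per line and one write per file.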

From Unicorn to Unicode

What is worse than knowing that Unicorn exists in some other dimensions but you will never be able to see it?

My answer would be the 0xA0 character (a non-breaking space in Latin-1 and Windows-1252) from some encoding world I didn’t even know existed. Being an Earthling, the only encoding world I’ve visited and known is Unicode, more specifically the UTF-8 realm.

Interestingly, many Unicode-based systems reject 0xA0 (or any other character they can’t convert) and break outright. Take Python, for example, and PostgreSQL later on.

Python

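In Python 2, there is a built-in function called unicode() that converts a string from another encoding to Unicode. (In Python 3, bytes.decode() does the same job and takes the same errors argument.)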

unicode(object[, encoding[, errors]])

However, the “errors” handling defaults to “strict”. That means it raises an exception (a UnicodeDecodeError) whenever it hits a character it can’t translate, which breaks your system if nothing catches it.

There are two other options for handling conversion errors:

  • “replace” replaces each untranslatable character with the official Unicode replacement character (U+FFFD).
  • “ignore” simply drops the untranslatable character, as if it were replaced with an empty string.
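Here is how the three options behave, sketched in Python 3, where bytes.decode() takes the same errors argument that unicode() takes in Python 2. The sample bytes are made up: b"100\xa0km" is “100 km” with a Latin-1 non-breaking space, which is not valid UTF-8 on its own.

```python
def decode_utf8(data: bytes, errors: str = "strict") -> str:
    # errors can be "strict" (raise), "replace" (insert U+FFFD) or "ignore" (drop the byte).
    return data.decode("utf-8", errors)


raw = b"100\xa0km"  # a lone 0xA0 byte is an invalid start byte in UTF-8

try:
    decode_utf8(raw)  # "strict" is the default
except UnicodeDecodeError as exc:
    print("strict mode raised:", exc.reason)

replaced = decode_utf8(raw, "replace")  # value: "100\ufffdkm"
ignored = decode_utf8(raw, "ignore")    # value: "100km"
```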

PostgreSQL

When you insert non-Unicode strings into a UTF-8 (Unicode-based) database, PostgreSQL first tries to convert them from the client encoding. Just like in Python, if the string contains an untranslatable character, it throws an error.

This can be a hell of a problem, because it can effectively break your system if it is one of those systems that process input and save it into a database.

So the usual solution is to catch these unicorns before they escape into the database.
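One way to catch them before the data reaches PostgreSQL, sketched in Python 3: try UTF-8 first and only fall back to another encoding if that fails. The Windows-1252 fallback is an assumption about where the bad bytes come from, so pick whatever your input sources actually use. Bytes the fallback can’t map become U+FFFD instead of breaking the insert.

```python
def to_clean_text(data: bytes, fallback: str = "cp1252") -> str:
    try:
        # Well-formed UTF-8 passes through untouched.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else came from a legacy Windows/Latin-1 source.
        # errors="replace" turns bytes the fallback can't map into U+FFFD.
        return data.decode(fallback, errors="replace")


def normalise_spaces(text: str) -> str:
    # Optional: turn the non-breaking space (U+00A0) into a plain space.
    return text.replace("\u00a0", " ")
```

The resulting str can then be passed to psycopg2 as a query parameter, and psycopg2 encodes it using the connection’s client encoding.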

The adventure of the Old Mac line breaker in the Python world

There are many names for a new line: End Of Line indicator, line break, or line breaker. You have probably heard the terms Line Feed (LF, \n) and Carriage Return (CR, \r). Technically they are characters, just like the capital letter “A” and the small letter “a”, but instead of being printed, they tell the system that a line has ended. Different computer systems use these two characters in different ways, but let’s narrow it down to the two most common conventions: the Unix version “LF” and the Windows version “CR+LF”. But wait a minute, there is also the Old Mac version, which uses only the CR character to mark the end of a line.

Interestingly, in Python 2’s universe (and probably some other even more racist universes), the Old Mac convention is not treated as a line break by default. If you read a file whose lines end only with “CR” using the standard open() function, it comes out as a single line of text.

As slightly less racist developers, we need to build applications that support as many kinds of input as possible. Here are 2 tricks to make sure your file is read properly the next time.

When reading a file

# Python 2: use the 'rU' (universal newlines) mode so Old Mac line endings are understood
f = open('filename', 'rU')
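In Python 3 the 'U' flag is no longer needed (it was deprecated and then removed in Python 3.11), because text mode uses universal newlines by default. A small sketch that reads a CR-only file:

```python
from typing import List


def read_lines_any_newline(path: str) -> List[str]:
    # newline=None (the default, spelled out here for clarity) turns on universal
    # newlines: "\r", "\n" and "\r\n" all end a line and come back as "\n".
    with open(path, "r", newline=None, encoding="utf-8") as f:
        return f.readlines()
```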

If you happen to be working with file uploads in Django, this might be useful:

# http://stackoverflow.com/questions/1875956/how-can-i-access-an-uploaded-file-in-universal-newline-mode

# First, read the uploaded file and convert it to unicode with the unicode() function (Python 2)
# Second, wrap it in io.StringIO with universal-newline mode turned on by setting newline=None
import io
stream = io.StringIO(unicode(request.FILES['foo'].read()), newline=None)
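The snippet above is Python 2, since unicode() no longer exists in Python 3. A Python 3 sketch of the same idea follows; raw stands in for request.FILES['foo'].read(), which returns bytes, and UTF-8 is an assumption about the uploaded file’s encoding.

```python
import io
from typing import List


def uploaded_lines(raw: bytes, encoding: str = "utf-8") -> List[str]:
    # bytes.decode() takes over the job of Python 2's unicode().
    text = raw.decode(encoding)
    # newline=None puts StringIO in universal-newline mode, so CR-only (Old Mac)
    # and CR+LF (Windows) endings are both read back as "\n".
    return io.StringIO(text, newline=None).readlines()
```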

Speeding up your query with Temp Tables

If you have a query that:

  1. has a lot of subqueries
  2. joins many different tables or subqueries
  3. is going to return a huge amount of data
  4. takes forever to run (more than 1 minute)
  5. runs on a busy database server

You can often make it run much faster, in some cases up to a hundred times faster, by first storing the results of some groups of subqueries in temp tables and then running the final joins against those temp tables.

This way, you will have:

  1. smaller queries to run
  2. which reduces memory usage
  3. and reduces the chance of blocking other running queries
  4. and it runs a hell of a lot faster.

Let’s make a super simple example.

--Old and Slow
--let's assume that huge_table_a contains about 1 billion records
SELECT tbl_a.item,
       tbl_a.profit_share,
       tbl_b.profit_share
FROM   (SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a) AS a_regions
        WHERE  region = 'south'
        GROUP  BY item) tbl_a
       JOIN (SELECT item,
                    Sum(profit) / Max(profit) AS profit_share
             FROM   (SELECT item,
                            profit,
                            UNNEST(regions) AS region
                     FROM   huge_table_a) AS b_regions
             WHERE  region = 'north'
             GROUP  BY item) tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;

Using Temp Tables

-- Will run a lot faster, especially on a busy DB
CREATE TEMP TABLE tmp_table1 AS (
       SELECT item,
              Sum(profit) / Max(profit) AS profit_share
       FROM   (SELECT item,
                      profit,
                      UNNEST(regions) AS region
               FROM   huge_table_a) AS a_regions
       WHERE  region = 'south'
       GROUP  BY item
);
-- autovacuum cannot analyze temp tables, so collect planner statistics manually
ANALYZE tmp_table1;

CREATE TEMP TABLE tmp_table2 AS (
       SELECT item,
              Sum(profit) / Max(profit) AS profit_share
       FROM   (SELECT item,
                      profit,
                      UNNEST(regions) AS region
               FROM   huge_table_a) AS b_regions
       WHERE  region = 'north'
       GROUP  BY item
);
ANALYZE tmp_table2;

SELECT tbl_a.item,
       tbl_a.profit_share,
       tbl_b.profit_share
FROM   tmp_table1 tbl_a
       JOIN tmp_table2 tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;
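To try the pattern end to end without a PostgreSQL server, here is a sketch using Python’s built-in sqlite3 module as a stand-in. The sales table, its columns and the sample rows are hypothetical, and each row holds a single region instead of an array (SQLite has no UNNEST). It shows the structure of the technique, not the speed-up, which depends on your data and server.

```python
import sqlite3
from typing import List, Tuple


def profit_shares(conn: sqlite3.Connection) -> List[Tuple[str, float, float]]:
    cur = conn.cursor()
    # Step 1: materialise each expensive subquery once into its own temp table.
    cur.execute("""
        CREATE TEMP TABLE tmp_south AS
        SELECT item, SUM(profit) / MAX(profit) AS profit_share
        FROM sales WHERE region = 'south' GROUP BY item
    """)
    cur.execute("""
        CREATE TEMP TABLE tmp_north AS
        SELECT item, SUM(profit) / MAX(profit) AS profit_share
        FROM sales WHERE region = 'north' GROUP BY item
    """)
    # Step 2: the final join only touches the small, pre-aggregated tables.
    cur.execute("""
        SELECT s.item, s.profit_share, n.profit_share
        FROM tmp_south s JOIN tmp_north n ON s.item = n.item
        ORDER BY s.profit_share DESC
    """)
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, profit REAL, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("pen", 10, "south"), ("pen", 30, "south"), ("pen", 20, "north"),
    ("ink", 5, "south"), ("ink", 5, "north"), ("ink", 15, "north"),
])
rows = profit_shares(conn)
```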

 

Better way to execute sql files in PostgreSQL

My previous post is kind of dumb. It’s simple, but dumb. First of all, the \o command only writes the output of the queries to a file. It won’t write any errors or other messages that pop up during execution. The last time I ran it, it failed and I didn’t know what happened or where it stopped.

So, a better way to do it is to run psql with the -f option.

psql -h HOSTNAME -U USERNAME -d DBNAME -a -f FILENAME > LOGFILENAME 2>&1

A little explanation here:

  • -a tells psql to echo all input commands to standard output. It basically prints out every query in your SQL file, which is handy for knowing what went wrong and where it stopped.
  • -f FILENAME tells psql to run the given file.
  • > LOGFILENAME is the basic UNIX redirection of standard output to a file. In other words, whatever output you would normally see in psql is written into the file instead.
  • 2>&1 is very important because it redirects standard error (file descriptor 2) to wherever standard output (file descriptor 1) currently points. Since we’ve already redirected standard output into the file, any error is logged into that file too.
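If you launch these overnight jobs from a Python script instead of a shell, the same redirection can be sketched with the standard subprocess module. The sketch assumes psql is on your PATH; the function names are made up for illustration.

```python
import subprocess
from typing import List


def build_psql_command(host: str, user: str, dbname: str, sql_file: str) -> List[str]:
    # Same flags as the shell one-liner: -a echoes every statement, -f runs the file.
    return ["psql", "-h", host, "-U", user, "-d", dbname, "-a", "-f", sql_file]


def run_sql_file(host: str, user: str, dbname: str, sql_file: str, log_file: str) -> int:
    with open(log_file, "w") as log:
        # stdout=log plays the role of "> LOGFILENAME";
        # stderr=subprocess.STDOUT plays the role of "2>&1".
        result = subprocess.run(
            build_psql_command(host, user, dbname, sql_file),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

Usage would look like run_sql_file("HOSTNAME", "USERNAME", "DBNAME", "FILENAME", "LOGFILENAME"). Note that psql still exits with 0 when a statement in the file fails unless ON_ERROR_STOP is set, so the log file remains the place to check.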

For more information, check out the psql manual and the Bash documentation on redirection.

PostgreSQL Input and Output

NOTE: this is one way of doing it. A better solution is described in the post “Better way to execute sql files in PostgreSQL”.

Let’s imagine a scenario where you have 10 different SQL queries to run, and each will take a few hours. You don’t want to run them during peak hours because they will slow down your entire server. You don’t want to stay back late either. So you decide to put all the queries in a text file, run them with your PostgreSQL client at the end of the day, go for a bottle of beer at the nearest club, and call it a day.

Here is how you do it in PostgreSQL. First, fire up your PostgreSQL client, for example:

psql -h your-host-name -U your-username -d your-dbname

Then run the \i command. Well, “i” obviously stands for “input”.

\i /path/to/your/sql/file

VOILÀ, the SQL is running. Then you pack your stuff and call up your friends to meet you at that bar you talked about earlier. But wait a minute. What if something goes wrong? What if my console suddenly closes itself? I won’t know whether all the queries ran successfully. Don’t worry, you say. We can stream the output into a log file, and when we come back tomorrow morning, we will know where to check. So you press CTRL+C to cancel the previous job, wait a while until it finishes cancelling, and then start punching in some other commands:

\o /path/to/your/log/file
\i /path/to/your/sql/file

And voilà! Now you can really go to that pub where your friends are waiting. Have a good evening, folks!!

The theory of Object Oriented Programming vs its application.

Speaking of Object-Oriented Programming (OOP), I remember how absurd it was to learn it for the first time. At university, they first teach you a bunch of concepts, ideas and terminology. After that, you are on your own.

Being a total newbie to programming at the time, and a very poor English speaker in an all-English programming class, I was so proud to be able to use for loops in my C++ class. The same goes for how proud I was to understand, in theory, all the terms the lecturer explained. I was so proud to get the hierarchy of the dairy product family right, and the similarity between a father leaving his fortune to his children when he dies and a child class inheriting from a mother class. Theoretically, I was an expert.

When it was time to put the theory into practice, everybody was left wondering why you wouldn’t create a Milk superclass and have Cheese, Butter, and Yoghurt extend the Mother Milk, just like the hierarchy diagram explained. Instead of creating any classes for them, you created a single class called <<ProgramName>> and stored the dairy products in the database.
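For the record, the kind of hierarchy the diagram described looks roughly like this in Python. It is a toy sketch: the source() and describe() methods are made up for illustration.

```python
class Milk:
    """The 'Mother Milk' superclass from the hierarchy diagram."""

    def __init__(self, name: str = "milk") -> None:
        self.name = name

    def source(self) -> str:
        # Hypothetical detail that every dairy product inherits from Milk.
        return "cow"

    def describe(self) -> str:
        return f"{self.name} made from {self.source()}'s milk"


class Cheese(Milk):
    def __init__(self) -> None:
        super().__init__("cheese")


class Butter(Milk):
    def __init__(self) -> None:
        super().__init__("butter")


class Yoghurt(Milk):
    def __init__(self) -> None:
        super().__init__("yoghurt")

    def describe(self) -> str:
        # A child class can extend what it inherits instead of starting from scratch.
        return super().describe() + ", fermented"


for product in (Cheese(), Butter(), Yoghurt()):
    print(product.describe())
```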

The OOP era ended pretty quickly for me because I was introduced to PHP, and suddenly I was happy with my life without thinking much about making the code mirror the real world. Carrying the OOP theory deep in my backpack, I fast-forward 5 years, surviving just fine. Throughout my working history, I’ve never been in a company that really embraced OOP.

However, things are going to change. For the first time ever, I was in a job interview where the interviewers actually cared about how much I knew about OOP. Digging back into my memory, I managed to answer some of the questions correctly and missed some. I had to be honest with them that I had no real experience with OOP at all. But they offered me the job, and now I’m happy that I can finally create the Milk class that I didn’t create 5 years ago.

For the moment, OOP is still a very vague concept for me. But I’m looking forward to applying those theories in real practice this time.

The Ruby, C++, PHP and C++

I’m taking the Stanford online course called Design and Analysis of Algorithms 1. So far so good, but then Programming Exercise 4 hit me hard.

First of all, we are allowed to use any programming language. They only want the final result, regardless of how you get it. In week 4, we are supposed to read a file that represents a graph with thousands of nodes and find the five largest strongly connected components (SCCs).

So I started with Ruby, my language of choice for this course. The algorithm the professor teaches uses recursion. After finishing the program and running it through a few test cases, where it seemed to work properly, I ran it on the actual file (the graph with thousands of nodes). It crashed within the first few seconds with “stack level too deep”. Apparently the recursion was too deep for Ruby’s default stack. I tried to find out how to increase the allowed stack depth, but either I didn’t understand what people were talking about or the solution was for Mac or Linux. I’m working on Windows 7.

So I thought maybe I should do it in C++. One reason is that it’s fast, and another is that it’s difficult (so I can learn to be a better programmer, not just a guy pampered by modern programming languages). If you know C++ and also Ruby or PHP, you can see how different C++ is, and how much more difficult. PHP takes care of many things behind the scenes for you: dynamic arrays, arrays with arbitrary keys (hashes in Ruby, associative arrays in PHP), or passing arrays around like tossing a pancake from the pan to the plate. None of this is that simple in C++. A built-in array is basically a pointer plus a size you have to track yourself. Passing arrays around is complicated, not to mention multidimensional arrays. And hashes or associative arrays aren’t part of the core language; you have to reach for standard library containers such as std::map or std::unordered_map.

I have to admit that I’m too pampered by modern (high-level) programming languages, and it’s not C++’s fault; writing my own function to search an array or to get dynamic keys is just too much for me. So I went back to PHP and tried out its maximum recursion depth. On the first try, it returned an error at the 100th level of recursion. But don’t worry, that turned out to be just a limit set by Xdebug. I removed Xdebug, and it could recurse as deep as my computer’s memory allowed (which is a lot, and good news).

I immediately converted the Ruby code into PHP, and voilà, it worked on the test cases. Working in PHP is like returning home; it feels as natural as speaking my mother tongue. However, when I ran it on the real graph, I hit a lot of memory limit errors. I increased the limit a few times, from 128M to 256M, to 1024M, to 2048M. It consumed more than 1GB of RAM to process 5 million edges between 800 thousand nodes. And it ran forever. And ever. I still haven’t gotten a result back. I suspect an infinite loop, but I can’t find proof of it. So, for now, I have to conclude that it’s a PHP problem: it’s simply too slow.

So my only hope now is C++, which I’m not good at, and I have to build many functions from scratch. Well, maybe I can find some library online. But I don’t know. It feels like being forced to speak Spanish when I only know a few phrases like “la niña come pan” and “el niño bebemos leche”. And I’m literally not sure whether it’s bebemos or beben. Oh, I’m screwed.
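For what it’s worth, the “stack level too deep” problem can be sidestepped in any language by replacing recursion with an explicit stack. Below is a sketch (not the course’s reference code) of Kosaraju’s two-pass SCC algorithm written that way in Python, whose default recursion limit of 1000 would hit the same wall as Ruby. It takes a list of (tail, head) edges; parsing the assignment file is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kosaraju_scc_sizes(edges: List[Tuple[int, int]], top: int = 5) -> List[int]:
    graph: Dict[int, List[int]] = defaultdict(list)
    reverse: Dict[int, List[int]] = defaultdict(list)
    nodes: Set[int] = set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: iterative DFS on the reversed graph, recording finishing order.
    visited: Set[int] = set()
    order: List[int] = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(reverse[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(reverse[nxt])))
                    break
            else:
                # Every neighbour has been explored, so this node is finished.
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the original graph in decreasing finishing time;
    # each search tree found this way is exactly one SCC.
    assigned: Set[int] = set()
    sizes: List[int] = []
    for leader in reversed(order):
        if leader in assigned:
            continue
        assigned.add(leader)
        size = 0
        todo = [leader]
        while todo:
            node = todo.pop()
            size += 1
            for nxt in graph[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    todo.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)[:top]
```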

PHP Unit Testing : Installing PHPUnit in Windows 7

After playing with Ruby on Rails (RoR) for a while, I’m now back to PHP. Since PHP is the language I use at my job, I had to come back anyway. I’m gonna miss RoR nevertheless. But who cares, I don’t need her right now. And she’s gotten a little too complicated to understand lately. Maybe I will give her another chance if I get a RoR job in the future.

But one thing RoR keeps telling me (even after we broke up) is that I need to test my code. RoR has a built-in testing framework that is just awesome and beautifully made. That brings me to the need for automated testing in PHP too. So now, let’s welcome PHPUnit *crowd applauding*.

To begin with, I first need to install PHPUnit on my computer. Since most PHP people out there use Linux, the installation tutorials are mostly for Linux. Since I’m using Windows 7, I had to do some serious googling, which brought me to Joshua Doodnauth’s WebLife. The post is about installing PHPUnit on WAMP server, which is exactly what I needed since I’m using WAMP server too. However, the post is somewhat outdated, so it’s only about 70% accurate. It’s still helpful, because the only outdated part is the PEAR installation. Yes, you need to install PEAR first. So if you are having trouble finding the go-pear.bat file, read on.

I did another round of googling and found out that the latest WAMP server doesn’t include go-pear.bat in the PHP bin folder. So you have to get it yourself from the PEAR website (pear.php.net). Once you have the file, follow the instructions below, which I got from the WAMP forum.

NOTE: As Samuel pointed out in the comments, if you already have the PHP bin folder in your PATH environment variable, you can drop “C:\wamp\bin\php\php<your php version>\” from step 4, so it becomes “php go-pear.phar” (without the double quotes).

1> download the latest go-pear.phar from pear.php.net
2> save it in the directory “C:\wamp\bin\php\php<your php version>”
3> in Run, type ‘cmd’.
4> in the command prompt, type: “C:\wamp\bin\php\php<your php version>\php.exe C:\wamp\bin\php\php<your php version>\go-pear.phar”
5> type in “system” at the first option
6> change $prefix by pressing 1; a file explorer will pop up. Browse to “C:\wamp\bin\php\php<your php version>” and press OK
7> now type 11 and change it to “$prefix\pear.ini”
8> now press Enter and the installation will begin.
9> enter Y and then press Enter when it asks to alter php.ini
10> after installation, run the file “C:\wamp\bin\php\php<your php version>\PEAR_ENV.reg”
11> add include_path = “.;c:\wamp\bin\php\php<your php version>\pear” to C:\wamp\bin\apache\Apache<your apache version>\bin\php.ini, preferably just below the line ; Windows: “\path1;\path2”

After that, you can continue with Joshua’s instructions from the part where PEAR is installed. Have fun and good luck.