Phelio Gnomi

My words from my mind

Category Archives: PostgreSQL

Reduce SQL Injection Risk in Python and psycopg2

It may be surprising that a slight difference in a line of code can have a great impact on preventing SQL injection. Try to spot the difference between the two lines below.

# code 1 (with python string format() function)
db.execute("select * from some_table where filter = '{}'".format("going to Sam's party"))
# code 2 (with psycopg2 parameter passing; note the trailing comma that makes the second argument a tuple)
db.execute("select * from some_table where filter = %s", ("going to Sam's party",))

They probably look obviously different, but at a glance it seems like they shouldn’t give very different results. Sadly, they do.

The first line uses Python’s standard string formatting feature: given a string containing curly brackets as placeholders, like “This is a {}”, the format() method fills those placeholders with other strings.

# Example
sample1 = "this is a {}, and also a {}".format("pen", "weapon")
print(sample1)
 
#results
# this is a pen, and also a weapon 

This looks fine for now. But try the same thing with the string that we pass into db.execute() above. If we print that string, it gives the result below.

select * from some_table where filter = 'going to Sam's party'

Notice the extra single quote in the filter? This will cause an error and, of course, open a whole world of opportunity for SQL injection attacks. With the single quote unescaped, the rest of the string can be injected with other commands that will cause serious maintenance headaches.

-- example: imagine the value being substituted is "bleh'; drop table some_table; insert into user values ('some new malicious users'); --"
-- your query will become
select * from some_table where filter = 'bleh'; drop table some_table; insert into user values ('some new malicious users'); --'
-- note that double dash (--) is used for commenting. So the extra single quote will be ignored.

So, why is code 2 a better way of doing string replacement? Because it has a built-in special-character escaping mechanism, so any string passed through this method remains a plain value instead of becoming malicious code.

db.execute("select * from some_table where filter = %s", ("bleh'; drop table some_table; insert into user values ('some new malicious users'); --",))
The code above will produce the SQL below:

select * from some_table where filter = E'bleh\'; drop table some_table; insert into user values (\'some new malicious users\'); --'
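
To see the whole thing end to end, here is a minimal runnable sketch. The connection string, table, and column names are placeholders for illustration; adjust them for your environment.

import psycopg2

# placeholder connection details
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

# psycopg2 substitutes %s safely, quoting and escaping the value for us.
# Note the trailing comma: the second argument must be a sequence,
# so ("...") is just a string while ("...",) is a one-element tuple.
cur.execute("select * from some_table where filter = %s",
            ("bleh'; drop table some_table; --",))
print(cur.fetchall())

cur.close()
conn.close()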

From Unicorn to Unicode

What is worse than knowing that unicorns exist in some other dimension, but you will never be able to see one?

My answer would be the \xA0 character, from some encoding world that I didn’t even know existed. Being an Earthling, the only encoding world I’ve been to and known is Unicode. More specifically, the UTF-8 realm.

Interestingly, many Unicode-based systems reject \xA0 (or any non-convertible character) and crash completely. Take Python, for example, and also PostgreSQL later on.

Python

In Python (version 2), there is a function called unicode() that converts a string from another encoding to Unicode.

unicode(object[, encoding[, errors]])

However, the “errors” handling defaults to “strict”. That means it will complain that “something is wrong” whenever there is an error; in other words, it will break your system whenever there is an untranslatable character in the object you are trying to convert.

There are two other options for handling conversion errors (illustrated in the snippet after this list).

  • “replace” replaces the untranslatable character with the official Unicode replacement character (U+FFFD)
  • “ignore” simply drops the untranslatable character, i.e. replaces it with an empty string
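
Here is a quick illustration of all three behaviours (Python 2 code, since unicode() no longer exists in Python 3, where bytes.decode() takes the same errors argument):

# '\xa0' is a lone continuation byte, so it is invalid as UTF-8
raw = 'hello\xa0world'

# unicode(raw, 'utf-8')                 # default 'strict': raises UnicodeDecodeError
print unicode(raw, 'utf-8', 'replace')  # u'hello\ufffdworld' (replacement character)
print unicode(raw, 'utf-8', 'ignore')   # u'helloworld' (the bad byte is dropped)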

PostgreSQL

When inserting non-Unicode strings into a UTF-8 (Unicode-based) database, PostgreSQL will try to convert them first. The same thing happens here: if the string contains an untranslatable character, it throws an error.

This can be a hell of a problem, because it effectively breaks your system if it is one of those systems that take input and save it into a database.

So the solution is usually to try to catch these unicorns before they escape into the database.
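
As a sketch of what that could look like (the function name and the replace policy are my own choices for illustration, not the only way to do it):

def to_unicode_safe(raw, encoding='utf-8'):
    # Python 2: decode with 'replace' so untranslatable bytes become
    # U+FFFD instead of blowing up the insert later on.
    if isinstance(raw, unicode):
        return raw
    return raw.decode(encoding, 'replace')

dirty = 'going\xa0home'         # contains the dreaded \xA0
clean = to_unicode_safe(dirty)  # u'going\ufffdhome', safe to insert
print repr(clean)

The cleaned string can then be passed to the database as a query parameter, as in the psycopg2 example from the first post.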

Speeding up your query with Temp Tables

If you have a query that:

  1. has a lot of subqueries
  2. joins many different tables or subqueries
  3. returns a huge amount of data
  4. takes forever to run (more than a minute)
  5. runs on a busy database server

you can often make it run dramatically faster (sometimes a hundred times faster) by storing the results of some groups of subqueries in temp tables first, and then running the final join over those temp tables.

This way, you will have:

  1. a smaller set of queries to run,
  2. which reduces memory usage,
  3. reduces the chance of blocking other running queries,
  4. and runs a hell of a lot faster.

Let’s make a super simple example.

--Old and Slow
--let's assume that huge_table_a contains 1 billion records
SELECT tbl_a.item,
       tbl_a.profit_share AS south_profit_share,
       tbl_b.profit_share AS north_profit_share
FROM   (SELECT item,
               Sum(profit) / Max(profit) AS profit_share
        FROM   (SELECT item,
                       profit,
                       UNNEST(regions) AS region
                FROM   huge_table_a) unpacked
        WHERE  region = 'south'
        GROUP  BY item) tbl_a
       JOIN (SELECT item,
                    Sum(profit) / Max(profit) AS profit_share
             FROM   (SELECT item,
                            profit,
                            UNNEST(regions) AS region
                     FROM   huge_table_a) unpacked
             WHERE  region = 'north'
             GROUP  BY item) tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;

Using Temp Tables

-- Will run a lot faster, especially on a busy DB
CREATE TEMP TABLE tmp_table1 AS (
       SELECT item,
              Sum(profit) / Max(profit) AS profit_share
       FROM   (SELECT item,
                      profit,
                      UNNEST(regions) AS region
               FROM   huge_table_a) unpacked
       WHERE  region = 'south'
       GROUP  BY item
);

CREATE TEMP TABLE tmp_table2 AS (
       SELECT item,
              Sum(profit) / Max(profit) AS profit_share
       FROM   (SELECT item,
                      profit,
                      UNNEST(regions) AS region
               FROM   huge_table_a) unpacked
       WHERE  region = 'north'
       GROUP  BY item
);

SELECT tbl_a.item,
       tbl_a.profit_share AS south_profit_share,
       tbl_b.profit_share AS north_profit_share
FROM   tmp_table1 tbl_a
       JOIN tmp_table2 tbl_b
         ON tbl_a.item = tbl_b.item
ORDER  BY tbl_a.profit_share DESC;
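
One extra tweak worth trying (this is an assumption to benchmark on your own data, not a guaranteed win): a freshly created temp table has no statistics and no indexes, so giving the planner both can speed up the final join even further.

-- optional: index the join key and refresh statistics
CREATE INDEX ON tmp_table1 (item);
CREATE INDEX ON tmp_table2 (item);
ANALYZE tmp_table1;
ANALYZE tmp_table2;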


Better way to execute sql files in PostgreSQL

My previous post is kind of dumb. It’s simple, but dumb. First of all, the \o command will only write the output of a particular query to a file. It won’t write any errors or any other information that pops up during the execution. The last time I ran it that way, it failed and I didn’t know what happened or where it stopped.

So, a better way to do it is by running psql with -f option.

psql -h HOSTNAME -U USERNAME -d DBNAME -a -f FILENAME > LOGFILENAME 2>&1

A little explanation here:

  • -a tells psql to print all the input commands to standard output. It basically prints out every query inside your SQL file. This is handy so that you know what went wrong and where it stopped.
  • -f FILENAME tells psql to run the given file.
  • > LOGFILENAME is basic UNIX redirection of standard output to a file. In other words, whatever output you would normally see in psql gets written to a file instead.
  • 2>&1 is very important because it redirects standard error into standard output as well. Since we’ve redirected standard output into a file, any errors will be logged into that file too.

For more information, check out the psql manual and also Bash redirection.
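
One more flag worth knowing about: by default psql keeps going after a failed statement. If you would rather have the whole run abort at the first error, add -v ON_ERROR_STOP=1 (same placeholder names as above):

psql -h HOSTNAME -U USERNAME -d DBNAME -a -v ON_ERROR_STOP=1 -f FILENAME > LOGFILENAME 2>&1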

PostgreSQL Input and Output

NOTE: this is one way of doing it. Find the better solution here.

Let’s imagine a scenario where you have 10 different SQL queries to run and each will take a few hours. You don’t want to run them during peak hours because they will slow down your entire server. You don’t want to stay back late either. So you decide to put all the queries in a text file, run them with your PostgreSQL client at the end of the day, and then go for a bottle of beer at the nearest club and call it a day.

Here is how you do it in PostgreSQL. First, fire up your PostgreSQL client such as

psql -h your-host-name -U your-username -d your-dbname

and then, you run the \i command. Well, “i” obviously stands for “input”.

\i /path/to/your/sql/file

VOILA, the SQL is running. And then you pack your stuff and call up your friends to meet you at that bar you talked about earlier. But wait a minute. What if something goes wrong? What if my console suddenly closes itself? I won’t know whether all the queries ran successfully or not. Don’t worry, you say. We can stream the output into a log file, and when we come back tomorrow morning, we will know where to check. And so, you press CTRL+C to cancel your previous job, wait a while until it finishes cancelling, and then start stomping out some other commands:

\o /path/to/your/log/file
\i /path/to/your/sql/file

And voila!! You can really go to that pub where your friends are waiting now. Have a good evening, folks!!