Phelio Gnomi

My words from my mind

Debugging PHP the hard way

While maintaining a massive object-oriented web application written in PHP, it’s amazing how often I reach for the same few functions in different places. I’m listing them here before I forget.

error_log($message)

The most used one, and currently the only way I can print anything in the application I’m working on without screwing everything up. It writes the message to the application error log, which I then tail -f and keep running on one monitor. Way to go, multiple monitors.
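
To watch the log, something like this keeps it streaming in one terminal (the exact path depends on your error_log ini setting; this one is made up):

tail -f /var/log/httpd/error_log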

get_class($object)

How did I live without this? Even with the power of an IDE (I’m using NetBeans), it can be frustrating to find out what the class of an object actually is. Especially in a “well designed” object-oriented application full of decorators and factories, it’s easy to get lost. get_class() produces the exact class of the object. Can’t live without this function.
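
A minimal sketch of how I use it ($mysteryObject here is a made-up stand-in for whatever object you’re inspecting):

error_log(get_class($mysteryObject));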

var_export($object, $return=false)

var_dump() used to be my ultimate savior, but since I can’t print to the page without breaking anything, I have to send everything to the log file instead. var_export() makes that possible: with the $return flag set to true, it returns the dump as a string instead of printing it. It often trips over recursive structures, but it’s still pretty useful.
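
The same pattern works for dumping a whole object into the log (again, $someObject is just a stand-in):

error_log(var_export($someObject, true));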

Exception::getTraceAsString()

You’ve got to have a way to trace where something started. It’s pretty straightforward when combined with var_export() and error_log(), as mentioned in a brilliant StackOverflow answer.

$e = new Exception;

error_log(var_export($e->getTraceAsString(), true));

What are your favorite debugging snippets in PHP?

Sharing objects between Modules in AngularJs

It turns out that sharing objects between Angular modules is as simple as injecting one module into another. Say myApp has some controllers, providers, and other objects, and myApp2 wants to use some of those controllers. When creating myApp2, inject myApp into it, and myApp2 now has access to all the objects in myApp, to the extent that the HTML can use a controller from myApp without any extra code.

// Common library of some sort
var app = angular.module("myApp", []);

app.value('someVal', 'This is a val');

app.controller("controller1", ['$scope','someVal', function($scope, someVal){
$scope.someVal = someVal;
}]);

// My actual module
var app2 = angular.module("myApp2", ["myApp"]);

app2.controller("searchController", ['$scope', function($scope){
// ... this controller's own code
}]);
<body ng-app='myApp2'>
<div ng-controller='controller1'>
{{someVal}}
</div>
</body>

Note that the HTML is using ‘controller1’, which comes from ‘myApp’. I’ve built an app in AngularJs and I’m going to create another one in a similar format, which means most of the code can be shared. I’m glad I won’t have to do much refactoring.

Methods of getting direct feedback from Servers

With web technology advancing so rapidly and information growing bigger and flowing faster than ever, many web applications nowadays can’t live without constantly checking the server for new data.

The most basic form of a website deals with HTTP requests from the page to the server. The user sends a request in the form of a URL, and the server responds with the content that was requested. End of story.

Say you are viewing a page that tells you how many times a jumping sheep has jumped, and it jumps on average once every 1–10 minutes. You might refresh the page every few minutes to check whether it has jumped. But with today’s technology, there should be a way to get notified whenever the sheep jumps.

The main challenge of getting direct feedback is that a web application usually supports a huge number of users, and they could be located anywhere in the world. It is simpler for the server to hand out information only when it’s requested, instead of being busy trying to push updates to every connected client (if they are even connected). Still, let’s explore the possible implementations.

Auto refresh

If you were building a website in the 1990s, this was probably a very viable option. It’s simple to implement and logically sound. The requirement is simple: you need to know when the sheep on your server jumps, but the server has no way of reaching you and everyone else. So we set up a simple piece of JavaScript to auto-refresh the page every few seconds.
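
A minimal sketch of the idea, reloading the page every 30 seconds:

setTimeout(function() {
    // reloading restarts the script, so this repeats indefinitely
    window.location.reload();
}, 30000);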

The drawback is that a full page refresh is rarely desirable in today’s fast-moving, content-packed web applications. Refreshing the page reloads every other resource as well, when all we are interested in is a single integer telling us whether the sheep has jumped. So, let’s move on.

Ajax Long Polling

Ajax opened up many great possibilities and has become, to this day, one of the core techniques for building responsive web applications. With Ajax, we can send a request without refreshing the page. This also enables a much more efficient kind of auto refresh that doesn’t request the full page: instead, we send a smaller HTTP request in the background and update the page using JavaScript.

On top of that, we can keep the connection alive until we get a response (or a timeout), which is the “long” part of long polling. Even so, the browser still has to keep sending new requests every now and then to stay updated. One drawback is that the server might be idle for hours without any update worth sending, so the resources spent on those requests simply go to waste.
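
A minimal long-polling sketch (the /sheep/updates endpoint and the jumps element are made up; the server is assumed to hold the request open until the sheep jumps or the request times out):

function poll() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/sheep/updates', true);
    xhr.onload = function() {
        if (xhr.status === 200) {
            document.getElementById('jumps').textContent = xhr.responseText;
        }
        poll(); // immediately open the next long poll
    };
    xhr.onerror = function() {
        setTimeout(poll, 5000); // on failure, back off briefly before retrying
    };
    xhr.send();
}
poll();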

Web Sockets / Comet / Probably other terms

Lastly, we have WebSockets. With web servers and browsers needing more and more frequent interaction, there is finally a way for the server side to be event driven instead of request driven only.

WebSockets allow a more interactive connection, where the server is allowed to send messages based on events that happen on the server itself. This way, the web browser (client) doesn’t have to constantly send requests for updates; it becomes event driven as well, reacting only when needed.

This also reduces the overhead of the constant stream of new requests between the client and the server.
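
A minimal client-side sketch (the ws://example.com/sheep endpoint is made up):

var socket = new WebSocket('ws://example.com/sheep');
socket.onmessage = function(event) {
    // the server pushes a message whenever the sheep jumps
    console.log('the sheep jumped! total jumps: ' + event.data);
};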

However, WebSockets may cause compatibility issues depending on the server technology used and also on the browser. But we’ve seen increasing support for this technology, allowing developers to create more and more responsive applications.

One popular technology that works well with WebSockets is Node.js. But there are also ways to get this capability in Django, for example through Redis or other supporting libraries.

Reduce SQL Injection Risk in Python and psycopg2

It may be surprising that a slight difference in a line of code can have a great impact on preventing SQL injection. Try to spot the difference between the two lines below.

# code 1 (with Python's string format() function)
db.execute("select * from some_table where filter = '{}'".format("going to Sam's party"))
# code 2 (with psycopg2 query parameters)
db.execute("select * from some_table where filter = %s", ("going to Sam's party",))

They obviously look different, but at first glance it seems like they shouldn’t give very different results. Sadly, they do.

The first snippet uses Python’s standard string formatting feature: given a string containing curly brackets as placeholders, like “This is a {}”, the format() method fills those placeholders with other strings.

# Example
sample1 = "this is a {}, and also a {}".format("pen", "weapon")
print(sample1)
 
# result:
# this is a pen, and also a weapon

This looks fine so far. But try the same thing with the string we pass into db.execute() above. If we print the formatted string, we get the result below:

select * from some_table where filter = 'going to Sam's party'

Notice the extra single quote inside the filter? This will cause an error and, of course, open up a whole world of opportunity for SQL injection attacks. With the single quote unescaped, the rest of the string can inject other commands, causing serious headaches.

-- example: imagine the substituted string is "bleh'; drop table some_table; insert into user values ('some new malicious users'); --"
-- your query becomes:
select * from some_table where filter = 'bleh'; drop table some_table; insert into user values ('some new malicious users'); --'
-- note that the double dash (--) starts a comment, so the trailing single quote is ignored.

So why is code 2 the better way to do string replacement? Because psycopg2 has a built-in escaping mechanism for special characters, so any string passed through a query parameter remains a plain string value instead of becoming executable SQL. Note that psycopg2 expects the parameters as a sequence, hence the trailing comma that turns ("going to Sam's party",) into a tuple.

db.execute("select * from some_table where filter = %s", ("bleh'; drop table some_table; insert into user values ('some new malicious users'); --",))

The code above will produce the SQL below:

select * from some_table where filter = E'bleh\'; drop table some_table; insert into user values (\'some new malicious users\'); --'

Slow fgets in PHP. Or is it?

We have a piece of code that looks something like this:

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');

while($buffer = fgets($old_file)) {
  // ... $buffer is being edited here, producing $new_buffer
  $buffer = $new_buffer;
  fwrite($new_file, $buffer);
}

...

It all worked fine and speedily until one day we decided to run this piece of code on files sitting on a network drive.

Oh ya, did I mention that it’s running on a Windows Server?

Good.

When the code above runs against the network drive, it slows down to almost 1 minute per file, where a file normally takes 1 second or less on a local drive. What went wrong?

Like any debugger would, I went and printed the execution time needed for each block of code. Given that a while loop is a while loop, I didn’t put a timer inside the while loop, only before and after it.

The conclusion was that this particular while loop was taking most of the processing time. And as most modern programmers do, we asked Google about it.

“PHP fgets slow on network”, I searched. Lots of results complaining that fgets is slow. Hmm, now I know. But what’s the alternative? Many. But hmm, TL;DR. Too long; let’s find a shorter, more to-the-point article (probably not this one I’m writing).

Searching on, I read a suggestion somewhere to comment out the fwrite() and try again. I did that, and it ran faster. Of course that doesn’t solve my problem, but it showed I was on the right track: the culprit was the write, not the read. And voilà, move the fwrite() outside of the while loop, and the speed is back to normal.

...
$old_file = fopen($filename, 'r');
$new_file = fopen($new_filename, 'w');
$buffer_holder = "";
while($buffer = fgets($old_file)) {
  // ... $buffer is being edited here, producing $new_buffer
  $buffer_holder .= $new_buffer; 
}
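// write everything in one go: a single trip over the network instead of one per line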
fwrite($new_file, $buffer_holder);
...

So, in short: don’t write to the file line by line over a network; write it all at once. This way, you don’t have to go back and forth between Malaysia and the US, one trip is enough.

a real man only takes one trip

From Unicorn to Unicode

What is worse than knowing that unicorns exist in some other dimension but you will never be able to see one?

My answer would be the \xA0 character from some encoding world that I didn’t even know existed. Being an Earthling, the only encoding world I’ve lived in and known is Unicode. More specifically, the UTF-8 realm.

Interestingly, many Unicode-based systems reject \xA0 (or any unconvertible character) and crash outright. Take Python, for example, and also PostgreSQL later on.

Python

In Python 2, there is a function called unicode() that converts a string from another encoding to Unicode.

unicode(object[, encoding[, errors]])

However, the “errors” handling defaults to “strict”, which means it will complain that something is wrong whenever there is an error. Basically, it will break your system whenever there is an untranslatable character in the object you are trying to convert (by raising a UnicodeDecodeError).

There are two other options in handling conversion errors.

  • “replace” replaces the untranslatable character with the official Unicode replacement character (U+FFFD).
  • “ignore” simply drops the untranslatable character, replacing it with an empty string.
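
A minimal sketch (Python 2; a lone \xA0 byte is not valid UTF-8, so it triggers all three behaviours):

raw = 'price:\xa0100'
# unicode(raw, 'utf-8') would raise UnicodeDecodeError, since "strict" is the default
print(repr(unicode(raw, 'utf-8', 'replace')))  # u'price:\ufffd100'
print(repr(unicode(raw, 'utf-8', 'ignore')))   # u'price:100'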

PostgreSQL

When inserting non-Unicode strings into a UTF-8 (Unicode-based) database, PostgreSQL will try to translate them first. The same thing happens here: if the string contains an untranslatable character, it will throw an error.

This can be a hell of a problem, because it effectively breaks your system if it’s one of those systems that process input and save it into a database.

So the solution is usually to catch these unicorns before they escape into the database.
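
A minimal sketch of that idea (Python 2 with psycopg2; db and user_input are made-up names):

# sanitize the raw bytes before they ever reach the database
clean = unicode(user_input, 'utf-8', 'replace')
db.execute("insert into some_table (filter) values (%s)", (clean,))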

The adventure of the Old Mac line breaker in the Python world

There are many representations of a new line, end-of-line indicator, or line break. You have probably heard of the terms Line Feed (LF) and Carriage Return (CR). They are technically characters, just like the capital letter “A” and the small letter “a”, but instead of printing a letter, they tell the system that it’s the end of a line. Different computer systems use these two characters in different ways, but let’s narrow it down to the two most common conventions: the Unix version, “LF”, and the Windows version, “CR+LF”. But wait a minute: there is also the Old Mac version, which uses only the CR character to represent the end of a line.

Interestingly, in the Python universe (and probably some other, even more racist universes), the Old Mac convention is by default not a line break. If you read a file whose lines end only with “CR” using the standard open() function in Python, it will come out as one single line of text.

As a slightly less racist developer, we need to build applications that support as many conventions as possible. Here are two tricks to help ensure the file you are reading is parsed properly.

When reading a file

# Use the 'rU' (universal newline) mode so Python understands Old Mac line endings
file = open('filename', 'rU')

If you happen to be working with File upload in Django, this might be useful

# http://stackoverflow.com/questions/1875956/how-can-i-access-an-uploaded-file-in-universal-newline-mode

# First, read the uploaded file and convert it to unicode using the unicode() function
# Second, wrap it with io.StringIO, with universal-newline mode turned on by setting newline=None
import io
stream = io.StringIO(unicode(request.FILES['foo'].read()), newline=None)

Slicker way to export crosstab in Tableau

Tableau has an awesome feature that allows users to export processed data as a crosstab, essentially a comma-separated-value format.

We use Tableau to help us visualize data as graphs and charts. Our raw data is often messy and huge and doesn’t really make sense in that granular state, so the charts and graphs are much nicer to look at. And even though some users are happy just looking at charts, others find it important to get the post-calculation raw data, i.e. the numeric equivalent of a given chart.

[screenshot]

But when it comes to the grander scale of a few thousand rows of records, the “Export Crosstab to Excel” feature just doesn’t work well. It takes a very long time, and that’s if it works at all.

So, instead of exporting to Excel, use the Copy -> Crosstab function. It is surprisingly much faster than exporting to Excel.

[screenshot]

As a comparison, exporting 50,000 rows to Excel takes forever (and sometimes never finishes), but copying the same 50,000 rows to the clipboard takes only about 5 seconds.

The Magical Tableau Filtering and how to get the Tableau result limiting right

NOTE: this tutorial is based on the coffee shop sample data that comes with Tableau Desktop.

Once upon a time, we all assumed there was a simple way to limit the results of a Tableau report. It’s a bit of unwritten magic, but the legend holds true: it is quite simple indeed.

Goal

Say you have a coffee shop chain that is doing so well it has branches in all 4 corners of the imaginary planet of Zox. Having nothing else to do, you decide to look at the ups and downs of your profit throughout the year. So you make a chart that looks something like this:

[screenshot]

Now you can see how much in sales your coffee shops are making through the year in a very nice chart. But it’s too much information on one chart, so you decide to show only the top 3 products by sales.

To do that, you pull the Product field from the Dimensions box into the Filters box in the top left corner. A dialog window pops up; navigate to the “Top” tab and set it to show only the top 3 products based on their total sales.

[screenshot]

[screenshot]

Problem

All looks good so far. After looking at the chart for about 10 seconds, you get bored again, so you decide to add more fun to it. “How about the top 3 products within each product type: Coffee, Espresso, Herbal Tea and Tea?” you think. You pull the Product Type field into the Filters box and make the selection box appear on the right side of the chart.

[screenshot]

(Right click on the Product Type filter and click “Show Quick Filter”)

And ta-da… everything looks wrong now. It shows only 1 product instead of the top 3 products you commanded it to show.

[screenshot]

Solution

So you play around with the filter on the right and figure it might be the culprit. Looking back at the screenshots you’ve taken, you notice something that might be useful: the “Add to Context” option in the right-click menu.

[screenshot]

You do that to the Product Type filter and voilà, it works!

[screenshot]

Conclusion

Tableau filters by default are not applied before a “Top N” filter. Adding a filter to context specifically tells Tableau to apply that filter first, so the “Top N” is computed within that context. Without the “Add to Context” option, the “Top N” picks the top N across all product types, as if the other filter weren’t there at all.

This problem cost me a million seconds to fix, but I hope this short article will save you some of yours.

Alteryx encoding

Have you ever gotten fed up because there is no such thing as an “Encoding” tool in the vast collection of Alteryx tools, and your exported file is filled with unreadable characters? Well, you’ve come to the right place.

Alteryx uses the term “code pages” to describe what is elsewhere called an encoding or character set (UTF-8, for example). What the heck is a code page anyway? Can’t you just say “encoding” or “character set”, the terms more commonly used nowadays?

Anyway, here is the solution. First, search the Alteryx help index for “code page” and you will get a list of encoding standards with their respective numeric codes.

[screenshot]

Now comes the biggest spoiler: the encoding “tool” is hidden inside the Formula tool as a pair of functions named ConvertToCodePage and ConvertFromCodePage.

[screenshot]
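
For example, a Formula tool expression to convert an incoming Latin-1 field to Unicode might look like the sketch below ([Field1] is a made-up field name, and 28591 is the code page number the help index lists for ISO 8859-1):

ConvertFromCodePage([Field1], 28591)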

