Phelio Gnomi

My words from my mind

Monthly Archives: May 2014

From Unicorn to Unicode

What is worse than knowing that Unicorn exists in some other dimensions but you will never be able to see it?

My answer will be the xA0 character from some encoding world that I don’t even know to exist. Being an Earthling, the only encoding world I’ve been and known is the Unicode. More specifically the UTF-8 realm.

Interestingly, many Unicode based systems reject the xA0 (or any nonconvertible characters) and totally crashes the system. Take Python for example, and also PostgreSQL later on.

Python

In Python, there is a function call unicode() that convert a string from other encoding to Unicode.

unicode(object[encoding[errors]])

However, the “errors” handling is defaulted to “strict”. It means that it will complain that “Something is wrong” whenever there is an error. Basically it means that it will break the system when there is an untranslatable character in the object that you are trying to convert.

There are two other options in handling conversion errors.

  • “replace” to replace the untranslatable character to the official Unicode replacement character
  • “ignore” basically replace the untranslatable character with an empty string.

PostgreSQL

When inserting non Unicode strings into an UTF-8 (Unicode based) databases, PostgreSQL will try to translate them first. Same thing will happen if the said string contain an untranslatable character, it will throw you an error.

This can be a hell of a problem because it technically break your system if your system is a one of those systems that process input and save them into a database.

So the solution is usually to try to catch these unicorns before they escaped into the database.

The adventure of the Old Mac line breaker in the Python world

There are many representations of a new line, End Of Line indicator, or a line breaker. You probably heard of the terms Line Feed (LF) and Carriage Return (CR). They are technically characters like the letter “A” and small letter “a”. But instead of printing the letter, they tell the system that it’s the end of a line. However, different computer system uses these 2 common characters in different ways but let’s narrow it down into the 2 most common ones, namely the Unix version “LF” and the Windows version “CR+LF”. But wait a minute, there is this Old Mac version as well that uses only CR character to represent the end of line.

Interestingly in the Python’s universe (and probably some other even more racist universes), the Old Mac convention is by default not a line breaker. If you read a file full of lines that only ends with “CR” using the standard file open() function in Python, they will come out as a single line text.

As a slightly less racist developer, we need to build applications that can support as many types of stuff as possible. Here are 2 tricks to help you ensure the file you are reading is read properly the next time you use it.

When reading a file

# Use the 'rU' mode so it understand the Old Mac properly
file = open('filename', 'rU')

If you happen to be working with File upload in Django, this might be useful

# http://stackoverflow.com/questions/1875956/how-can-i-access-an-uploaded-file-in-universal-newline-mode

# First, read the uploaded file and convert it to unicode using unicode() function
# Second, stream the file using io.StringIO function with the Universal-newline mode turn on by setting newline=None
import io
stream = io.StringIO(unicode(request.FILES['foo'].read()), newline=None)

Slicker way to export crosstab in Tableau

Tableau has an awesome feature to allow users to export the processed data into crosstab, or I would say Comma Separated Value format.

We use Tableau to help us visualize data into graphs and charts. And our raw data is often messy and huge and doesn’t really make sense in that granular state. So the charts and graphs are really much better to look at. And even though some users are happy enough by looking at charts, some others find it important that we can get the post-calculated raw date, i.e. the numeric equivalent of the given chart.

Image

But when it comes to a grander scale of a few thousands rows of records, The Export Crosstab to Excel just doesn’t work well. It took a very long time to work, and that if it works at all.

So, instead of exporting it to Excel, use Copy -> Crosstab function. This function is surprisingly much faster than exporting to Excel.

Image

As a comparison, exporting 50,000 rows to Excel will take forever, and never but copying the same 50,000 rows to clipboard only takes 5 seconds.

The Magical Tableau Filtering and how to get the Tableau result limiting right

NOTE: this tutorial is based on the coffee shop sample data came with tableau desktop.

The other days long time ago, we all thought that there is a simple way to limit the results of a tableau report. Even though it’s a bit of an unwritten magic, but the legend stays true that it’s quite simple indeed.

Goal

Say that you have a coffee shop chain that is doing so well it has branches in all 4 corners of the imaginary planet of Zox. Now you have no other thing to do, you decided to look at the ups and downs of your profit throughout the year. So you decided to make a chart that looks something like this

Image

 

Now you can see how much sales your coffee shop is making along the year in a very nice chart. But it’s too much information on the chart, so you decided to just show the top 3 products based on its sales.

To do that, you pulled the product field from the Dimensions field box into the Filters box on the top left corner. A dialog window will pop up and navigate to the “Top” tab. You set it to only show the top 3 products based on the total sales of that product.

Image

Image

Problem

All looks good so far. After looking at the chart for like 10 seconds, you got bored again. So you decided to add more fun to the chart. “How about getting the top 3 products of each product type you have, Coffee, Espresso, Herbal Tea and Tea” you thought, then you proceed to pull the Product Type field into the Filters box and make the selection box appear on the right side of the chart.

Image

(Right click on the Product Type filter and click “Show Quick Filter”)

and tada… everything looks wrong now. It only shows 1 product, instead of Top 3 products as you’ve commanded it to show.

Image

 

Solution

So you play around with the filter on the right and figured that it might be the culprit. You looked back at the pictures that you’ve screenshotted and noticed something that might be useful, the “Add to Context” option on the right click menu.

Image

You do that to the Product Type filter and voila!! it works!!

Image

Conclusion

Tableau filter by default doesn’t apply to the “Top x” filter. By adding a filter to context, it’s specifically telling tableau that the filter is to be taken contextually. Without the “add to context” option, the “Top x” will get the top x across the product type, as if the filter is not there at all.

This problem had costed me a million seconds to fix, but I hope that this short article will save some of your time.

Alteryx encoding

Have you ever get so fed up because there is no such thing as “Encoding” tool in the vast collection of Alteryx tools and you are getting a non “CSV” export file filled with unreadable characters in it? Well, You’ve came to the right place.

Alteryx uses the term “code pages” to describe encoding, or UTF-8 or Character Sets setting. What the hack is code page anyway? can’t you just say “Encoding” or “Character Set” so it’s more commonly used nowadays?

Anyway, here is the solution. First, search the Alteryx help index for “code page” and it will list you a list of encoding standards with its respective code.

Image

Now, comes the biggest spoiler, the encoding tool is hidden inside the Formula tool as a “function” and named ConvertToCodePage and ConvertFromCodePage.

Image