Friday, May 25, 2012

Hash read performance (Perl, Ruby, Python, node.js)


After reading the chapter "How Hashes Scale From One To One Million Elements" from the book "Ruby Under A Microscope" (this post is no substitute for reading that chapter), I decided to play around with the subject, run some tests with other languages, and do a small analysis of the performance of each. The chosen languages were Perl, Ruby, Python and JavaScript (node.js).

The tests measure the time, in milliseconds, taken to retrieve an element 10,000 times from a hash with N elements, and are based on the tests performed in the book.

The tests were run on an Amazon EC2 server instance (free tier) with the following characteristics:
  • Ubuntu Server 12.04 LTS
  • Instance type micro
  • Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
  • 604,364 KB of memory
  • 64-bit platform

Test description

A program was implemented for each of the languages under test, and the programs are as similar to each other as possible. Each one creates hashes whose sizes are powers of 2, with the exponent ranging from 1 to 20.

For each hash created, the value for one key (key=target_index) is retrieved 10,000 times, and the time taken is measured in milliseconds.
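The measurement loop described above can be sketched in Python roughly as follows. This is a minimal illustration, not the actual script from the repository: the function names, the use of integer keys, the choice of the middle key as target_index, and the use of time.perf_counter (available from Python 3.3) are all assumptions.

```python
import time

LOOKUPS = 10_000  # number of reads per hash, as described in the post


def build_hash(size):
    """Create a dict with `size` integer keys (the key layout is an assumption)."""
    return {i: i for i in range(size)}


def time_lookups(table, target_index, lookups=LOOKUPS):
    """Return the milliseconds taken to read the same key `lookups` times."""
    start = time.perf_counter()
    for _ in range(lookups):
        table[target_index]
    return (time.perf_counter() - start) * 1000.0


def run_experiment(max_exponent=20):
    """Measure lookup time for hash sizes 2**1 up to 2**max_exponent."""
    results = []
    for exponent in range(1, max_exponent + 1):
        size = 2 ** exponent
        table = build_hash(size)
        target_index = size // 2  # hypothetical choice of the key to retrieve
        results.append((size, time_lookups(table, target_index)))
    return results


if __name__ == "__main__":
    for size, elapsed_ms in run_experiment():
        print(f"{size}\t{elapsed_ms:.3f}")
```

Reading the same key on every iteration keeps the loop focused on lookup cost, so differences between hash sizes are not hidden by the cost of generating keys.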

The measured values are written to an output file, to be consumed later by a program that generates the graphs.

Comparing Ruby Versions

The graph shows clear improvements from Ruby 1.8 to Ruby 1.9, so 1.9 is the way to go.

Time taken to retrieve 10,000 values (ms)

Comparing Perl Versions

Perl has been around for some years, and its performance has been stable at least since version 5.8, as the results show. Still, I had hoped for more commitment from the community to optimizing these values.

Time taken to retrieve 10,000 values (ms)

Comparing Python Versions

The Python versions show little variation, so performance is stable across versions. However, Python 3.2.3 shows a slight degradation in performance.
I would also say that its performance is quite interesting.

Time taken to retrieve 10,000 values (ms)

Comparing Languages

The graph below shows the test results for all the languages and their respective versions.

Time taken to retrieve 10,000 values (ms)

Python was the winner among the interpreted languages. node.js also performs very well; however, the library used in the node.js test is not accurate enough to give us reliable values.
You can easily run the tests in your own environment using my scripts; the source code used in these tests can be found on GitHub. I used rvm, perlbrew and pythonbrew to switch between versions, and you can do the same if you want, or simply use your installed versions.

Run the tests:

$ ./experiment1.rb
$ ./experiment1.pl
$ ./experiment1.py

Each script creates an output file named "values.#{lang}-#{version}" (e.g. values.perl-5.8.8).
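As an illustration of that naming scheme, here is a small Python sketch that builds the file name and writes the measurements. The function names are hypothetical, and the file layout (one "size<TAB>milliseconds" line per measurement) is an assumption, since the post does not describe what the files contain.

```python
import platform


def output_filename(lang="python", version=None):
    """Build a name of the form values.<lang>-<version>."""
    # Default to the version of the running interpreter.
    version = version or platform.python_version()
    return f"values.{lang}-{version}"


def write_results(results, path):
    """Write one 'size<TAB>milliseconds' line per measurement (assumed layout)."""
    with open(path, "w") as handle:
        for size, elapsed_ms in results:
            handle.write(f"{size}\t{elapsed_ms:.3f}\n")
```

For example, output_filename("perl", "5.8.8") gives "values.perl-5.8.8", matching the example above.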

To generate the graph, just run the Ruby script (requires the googlecharts gem):

$ ./chart.rb

This script outputs the link to the Google Charts image.

In the near future I might do the same for hash writes, which will be more interesting.