November 3, 2013

Create a Random Image in Java Pixel by Pixel

Hey, check out how to create a random image in Java, pixel by pixel:

        int width = 500;
        int height = 500;
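
The excerpt stops after the two dimensions. A minimal sketch of how the rest might look, assuming `java.awt.image.BufferedImage` for the pixels and `ImageIO` for saving; the class name `RandomImage`, the helper `randomImage`, and the output file `random.png` are made up for this illustration and are not from the original post:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Random;
import javax.imageio.ImageIO;

class RandomImage {

    // Builds a width x height image in which every pixel gets an independent random color.
    static BufferedImage randomImage(int width, int height, Random rng) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = rng.nextInt(0x1000000); // 24 random bits: 8 each for R, G, B
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        int width = 500;
        int height = 500;
        BufferedImage img = randomImage(width, height, new Random());
        // "random.png" is an arbitrary output name chosen for this sketch.
        ImageIO.write(img, "png", new File("random.png"));
    }
}
```

Passing the `Random` in as a parameter makes the picture reproducible when you seed it.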

October 23, 2013

Vim and Weirdos!

I always thought vim was for nerds and weirdos. I have been using vim for three years now, but I always looked at it as a chore, something I wouldn't use if I didn't have to! Who would use vim as long as there are sexy IDEs out there?

That was until, little by little, I got familiar with some of its features. Take "speaking vim": di" will Delete Inside quotation marks, and there are other verb/noun combinations like it. Today I tried vim plugins for the first time: I started with AutoComplPop, this

October 14, 2013

Twitter Analytics Infrastructure: Parquet, Storm - Analytics @ WebScale

I was watching a talk by Dmitriy Ryaboy, the engineering manager of Twitter's analytics infrastructure team, given at the Analytics @ WebScale conference. The talk covered Twitter's analytics infrastructure (Parquet, Storm, ...) and how the team makes sure it provides the right, optimized tools and infrastructure for the other teams there to use.

So let's get dirty:
Logs are available in HDFS and stored in Apache Thrift

October 8, 2013

Workflow Management Azkaban & Oozie

I went to a meetup about workflow management tools, namely Azkaban and Oozie.
So imagine you have a long chain of jobs written in different programming languages, libraries, etc., and you handle them by hand each time: running them manually, checking their exit statuses, or writing a shell script full of ifs and elses to proceed to the rest of the jobs. That is UGLY!

So Azkaban will take care of this for you in a nice GUI, tell you which jobs succeeded or failed, and email you when they fail or succeed. Now I like that!

Apparently it is not able to take care of creating directories and the like as part of a workflow job, so for that kind of thing you still need your own bash script.

September 18, 2013

Node.js and Benchmarking it on Mac

I'm looking into node.js, which lets you write a web server in JavaScript. One of its cool features is that all of its I/O calls are non-blocking: the server just keeps running. There are no mutexes, thread locks, deadlocks, etc. to worry about.

You can test it with the ab command, which is Apache's benchmarking tool for web servers. ab already ships with the Mac

September 10, 2013

Hadoop Meetup

Today I went to a Hadoop Meetup event where people shared their fair share of wrestling with Hadoop. The discussions were mostly around search, scalability, fault tolerance, workflow and responsiveness. There were a ton of challenges that, from the outside, you might think are no big deal, but when you get to do them they take a good man's effort to solve. The main host was the music streaming company Grooveshark, which has over 60 employees in two offices, one here in Gainesville, FL and another in New York. At the next meeting we will be talking about Hadoop workflow tools: Oozie by Yahoo, Azkaban by LinkedIn, etc.

August 24, 2013

Push the Limits of Big Data

You know you are working on big data when you reach the capacity of the giant text indexer Lucene. Apparently Lucene has a maximum number of documents that a single index can hold (document IDs are Java ints, so the ceiling is a little over two billion). When you are processing over two million compressed files, each of which is composed of up to thousands of HTML files, things tend to go wrong.

Check it out below

So the problem is: we are stuck

August 18, 2013

How to Know What Everybody Does in a City in a Machine Learning Way?

Suppose you’ve just moved to a new city. You’re a hipster and an anime fan, so you want to know where the other hipsters and anime geeks tend to hang out. Of course, as a hipster, you know you can’t just ask, so what do you do?
Here’s the scenario: you scope out a bunch of different

August 14, 2013

How to add Twitter Widget to Google Sites?

On your Twitter account, create a widget displaying your tweets. Store the widget id.
You need a GitHub account. Click Fork on

August 7, 2013

Generate JSON in Scala & SBT Using Jerkson

In the build.sbt
resolvers += "" at ""
libraryDependencies += "com.codahale" % "jerkson_2.9.1" % "0.5.0"

August 5, 2013

How to get all aliases/nicknames/redirects of a Wikipedia entity?

You have a Wikipedia entity like Boris Berezovsky (businessman):

What you want is all the nicknames or aliases of this specific entity. Here I will describe how to get this information from DBpedia, Freebase and, if you want to stay hardcore, from Wikipedia itself.

July 8, 2013

Twitter Scalable Architecture

There is a talk, Timelines at Scale, by Raffi Krikorian (@raffi).

The takeaway:

Outliers, those with huge follower lists, are becoming a common case. Sending a tweet from a user with a lot of followers, that is with a large fanout, can be slow. Twitter tries to

June 22, 2013

Why It Is Not Good to Apply for a Mileage Credit Card?

I did thorough research on whether or not to apply for a mileage credit card. Here is the final analysis:

June 17, 2013

Million Query Track - Knowledgebase Acceleration

Million Query Track

The Million Query Track serves two purposes.  First, it is an exploration of ad-hoc retrieval on a large collection of documents.  Second, it investigates questions of system evaluation, particularly whether it is better to evaluate using many shallow judgments or fewer thorough judgments and whether small sets of judgments are reusable. Participants in this track will run up to 40,000 queries against a large collection of web documents at least once. These queries will be classified by assessors as "precision-oriented" or "recall-oriented". Participants can, if so motivated, try to determine what the query class is and choose ranking algorithms specialized for each class.

The task is a standard ad hoc retrieval task, with the added feature that queries will be classified by hardness and by whether the user's intent is to find as much information about the topic as possible ("recall-oriented"), or to find one or a few highly-relevant documents about one particular aspect ("precision-oriented").
Here are the query types:

June 8, 2013

Wrestling with Java on Running Processes

The takeaway from a couple of hours of wrestling with Java: Java is stupid about taking care of running external processes and loses track of them. I hit this after executing around two million system processes during big data processing.

(pool-5-thread-5) java.lang.OutOfMemoryError: unable to create new native thread   

Another error it generated: not enough resources to run the next processes.

The fix:

June 4, 2013

Myth: a For Loop Is O(n), but This One Is O(n^2)

What's the time complexity of
for(int i=0; i < linkedlist.size(); i++){

Contrary to the notion that a simple for loop should be O(n), the above example is O(n^2). The reason being
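
The excerpt is cut off before the loop body and the explanation. A minimal sketch of the usual cause, assuming `linkedlist` is a `java.util.LinkedList` and the loop body calls `linkedlist.get(i)`; the class and method names here are hypothetical:

```java
import java.util.LinkedList;
import java.util.List;

class LinkedListLoop {

    // O(n^2) on a LinkedList: each get(i) walks node by node from the nearer end to index i.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // O(n): the iterator behind for-each steps from one node to the next.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> linkedlist = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            linkedlist.add(i);
        }
        // Same result either way; only the amount of work differs.
        System.out.println(sumByIndex(linkedlist) == sumByIterator(linkedlist));
    }
}
```

On an `ArrayList` both versions are O(n), because `get(i)` is a constant-time array lookup there.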

May 31, 2013

How to (Steps): Getting Started in the Cloud with Amazon EC2


1) Create an account on, you will need to enter a credit card, but you won't be charged; that's for security purposes.
2) Hover your mouse over My Account / Console and click AWS Management Console. This will take you to
3) Click on EC2. This will take you to
4) On the left pane of the screen, click on Key Pairs. Click Create Key Pair. Enter a name for your key and click Create. Your key pair will be created and your private key will start downloading. Save it in a safe place.
5) Go back to the EC2 Console and click on Launch Instance. This will take you to
6) Quick Launch Wizard > Choose a Launch Configuration, scroll down

May 20, 2013

Make Java Memory Efficient

1. When taking substrings from a BIG string, make a new string, like here:
String mySubstring = new String(orig.split(";")[i]); 
OR new String(offset + beginIndex, endIndex - beginIndex, value);
Taken from here and here.

If you don't mind sacrificing a little accuracy for more memory efficiency, use a Bloom filter instead of a hash map.
Taken from here and here.
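
To make the trade-off concrete, here is a toy Bloom filter, written as a sketch rather than taken from the linked sources. The class `SimpleBloomFilter`, its sizes, and the hashing scheme (double hashing on top of `String.hashCode()`) are all assumptions for illustration; a real project would more likely use a library implementation:

```java
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per item

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the k-th bit position from two base hashes (double hashing).
    private int position(String item, int k) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so it is never zero
        return Math.floorMod(h1 + k * h2, size);
    }

    void add(String item) {
        for (int k = 0; k < hashCount; k++) {
            bits.set(position(item, k));
        }
    }

    // false means definitely absent; true only means "probably present".
    boolean mightContain(String item) {
        for (int k = 0; k < hashCount; k++) {
            if (!bits.get(position(item, k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter seen = new SimpleBloomFilter(1 << 16, 3);
        seen.add("hadoop");
        System.out.println(seen.mightContain("hadoop"));
    }
}
```

The filter stores only bits, never the strings themselves, which is where the memory saving comes from. The price is occasional false positives, and you can only test membership, not look up a value the way a map does.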

Use efficient designs: stuff like the Flyweight pattern! (Literally "fly weight".)
Use as many abstractions as you can; don't worry about it, it will pay off.

I didn't know there is a method called intern for String! Note that

May 19, 2013

How to master your tmux like a BOSS! Generate tmux panes

Here is a killer point! Whatever you do with your tmux call  

$ tmux source-file .tmux.conf   

So I was monitoring several processes on a server from my laptop, and whenever I connected over ssh and lost my connection, my tmux would freeze and I had to rebuild the whole structure:

First row:

May 18, 2013

Why Java Calculates Date Time Wrong

I needed to iterate through a date-time range and observed some strange behavior from Java. I wanted to iterate hour by hour from 2011-10-05-00 until 2013-02-13-23. The code I expected to work was
DateFormat format = new SimpleDateFormat("yyyy-MM-dd HH");
Calendar c = new GregorianCalendar(2011, 10, 5, 0, 0);  
Calendar cEnd = new GregorianCalendar(2013, 2, 14, 00, 0);
while (c.getTime().before(cEnd.getTime())) {
 System.out.print(format.format(c.getTime()) + "[");
 System.out.println(c.getTimeInMillis() + "](" + cEnd.getTimeInMillis() + ")");
 c.add(Calendar.HOUR, 1);
}

For some strange reason that I haven't figured out yet this will iterate from 2011-10-05-00 until 2013-03-13 23 (one month later than I'd expected!).

Parse the string first!
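
One thing that explains the shift: the month argument of `GregorianCalendar` is zero-based, so `new GregorianCalendar(2011, 10, 5, 0, 0)` is 5 November 2011 (`Calendar.OCTOBER` is 9) and month 2 is March. Parsing the date strings sidesteps that. A minimal sketch of the parse-first approach; the class `HourRange`, the helper `hours`, and the choice of UTC are assumptions made for this example:

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.TimeZone;

class HourRange {

    // Lists every hour from start (inclusive) to end (exclusive), both given as "yyyy-MM-dd HH".
    static List<String> hours(String start, String end) throws ParseException {
        TimeZone utc = TimeZone.getTimeZone("UTC"); // assumption: UTC, so no daylight-saving jumps
        DateFormat format = new SimpleDateFormat("yyyy-MM-dd HH");
        format.setTimeZone(utc);

        Calendar c = new GregorianCalendar(utc);
        c.setTime(format.parse(start));
        Calendar cEnd = new GregorianCalendar(utc);
        cEnd.setTime(format.parse(end));

        List<String> result = new ArrayList<>();
        while (c.before(cEnd)) {
            result.add(format.format(c.getTime()));
            c.add(Calendar.HOUR_OF_DAY, 1);
        }
        return result;
    }

    public static void main(String[] args) throws ParseException {
        List<String> all = hours("2011-10-05 00", "2013-02-14 00");
        System.out.println(all.get(0) + " .. " + all.get(all.size() - 1));
    }
}
```

If you do want to stay with the constructor, the named constants (`Calendar.OCTOBER`, `Calendar.FEBRUARY`) avoid the off-by-one.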

May 10, 2013

Vim colorscheme Mustang

I was having some issues setting up a custom color scheme for vim.

Download a colorscheme, e.g. mustang, from here and move it to ~/.vim/colors
In your ~/.vimrc add the line
       colorscheme mustang
I was getting

  Error detected while processing /home/morteza/.vimrc:
  line    2:
  E185: Cannot find color scheme mustang
  Press ENTER or type command to continue
To fix this rename

What if your connection drops while you are in the middle of SSH?

If your connection drops in the middle of an ssh session, the TTL of your session will terminate your job running via that session after a while. To avoid

May 1, 2013

Java CPU/Memory Heap Usage Monitoring A.K.A. Java Profiling

Java Profiling: One of the important tools when working with Java is a profiler, which tells you about the memory consumption and CPU usage of each method. VisualVM is a free and easy-to-use one. Java profiling will tell you how the heap is being used, what the garbage collector is doing, what the threads are up to, how much CPU time each method takes, which objects take up the most memory, etc.
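
Some of the same numbers can also be read from inside the program through the standard `java.lang.management` beans. This is only a rough sketch and not a substitute for a profiler; the class name `HeapSnapshot` and its helpers are made up for this example:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

class HeapSnapshot {

    // Bytes of heap currently in use, as reported by the JVM.
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Number of live threads in this JVM.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used (bytes): " + usedHeapBytes());
        System.out.println("live threads: " + liveThreads());
    }
}
```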

To set it up

April 18, 2013

How to master git rebase like a boss!

Once your work on the forked project is done, it is time to get everything back to upstream.

$ git fetch upstream; git checkout master; git rebase upstream/master

If everything is OK, this will go through hands-down. If there are merge conflicts you will be prompted; then

$ git mergetool          # resolve the conflicts
$ git rebase --continue  # carry on (or git rebase --skip to drop the conflicting commit)

Also, every once in a while execute the following command to get rid of the .orig files that appear after a merge

    $ find . -name "*.orig" -print0 | xargs -0 rm

note: You can use kdiff3 as your merge tool.

 More on Git

March 29, 2013

Java and Closures

I found that Java is in the process of adding closures (functions as first-class citizens) to its syntax.

Here is a good example
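
As a rough illustration (this is not the example the post links to), here is what a closure looks like with the lambda syntax that later shipped in Java 8. The class `ClosureDemo` and the method `makeAdder` are hypothetical names:

```java
import java.util.function.IntUnaryOperator;

class ClosureDemo {

    // Returns a function that still remembers `amount` after makeAdder has returned.
    static IntUnaryOperator makeAdder(int amount) {
        return x -> x + amount; // `amount` is captured; it must be effectively final
    }

    public static void main(String[] args) {
        IntUnaryOperator addFive = makeAdder(5);
        System.out.println(addFive.applyAsInt(10));
    }
}
```

Unlike closures in some other languages, a Java lambda can only capture local variables that are never reassigned.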

March 25, 2013

How to check Scala/Java in real Bytecode?

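Imagine you have the following code in AntiTailRecursive.scala

object bc {

  // Not tail recursive: the multiplication happens after the recursive call returns.
  def factorial(n: BigInt): BigInt = {
    if (n == 0) 1
    else {
      val t1 = n - 1
      val t2 = factorial(t1)
      n * t2
    }
  }

  def main(args: Array[String]): Unit = {
    println(factorial(5)) // hypothetical call; the original body is not shown
  }
}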


$ scalac AntiTailRecursive.scala  # compile the code
$ scala bc                        # verify that the code works
$ javap -l -p -c -v bc            # view the Java bytecode of the compiled class

March 24, 2013

Alternative Way to Master Ubuntu

Recently I found some new tools that help you master Linux, especially Ubuntu.

Try Krusader, a file manager, instead of Nautilus. Nautilus is cool but doesn't give you the feeling of a good file manager.
Krusader benefits:
tools to compare files,
open terminal here,

March 6, 2013

Peer-to-Peer Command-Line Chat in Go Lang

I just implemented a peer-to-peer (p2p) command-line chat in Go as an example to start with. I'm starting to like Go!

You can view the source code here on GitHub:

Some notes:

1. Spent quite some time on finding and setting up debuggers

January 20, 2013

Annoying \r\n Bug

That awkward moment when you realize notepad++ isn't doing something right, you check online, and it turns out to be a bug, with a new update addressing that specific issue as its first and most important change!

notepad++ did not recognize \r\n in regular expression mode and that was annoying!

Updating notepad++ from version npp.5.8.5 to version npp.6.2.3 fixes the bug:
Notepad++ v6.2.3 new features and fixed bugs: