August 24, 2013

Push the Limits of Big Data

You know you are working on big data when you hit the capacity of Lucene, the giant text indexer. Apparently Lucene has a hard maximum on the number of documents an index can hold: document IDs are 32-bit integers, so the ceiling is roughly 2.1 billion. When you are processing over two million compressed files, each of which contains up to thousands of HTML files, things tend to go wrong.

Check out the links below:
http://stackoverflow.com/questions/10247309/solr-exception-docid-must-be-0-and-maxdoc-20
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.1.0/org/apache/lucene/index/BaseCompositeReader.java

So the problem is: we are stuck.
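
To get a feel for how quickly that ceiling is reached, here is a rough back-of-the-envelope sketch in Scala. It assumes the per-index limit is Int.MaxValue (the exact ceiling varies slightly between Lucene versions), and the figure of 2,000 HTML files per archive is a made-up average used only for illustration. LuceneCapacity and its methods are hypothetical names, not part of Lucene.

```scala
object LuceneCapacity {
  // Assumption: doc IDs are Java ints, so one index holds at most Int.MaxValue docs.
  val MaxDocsPerIndex: Long = Int.MaxValue.toLong

  // Long arithmetic on purpose: this product would overflow an Int.
  def totalDocs(archives: Long, avgFilesPerArchive: Long): Long =
    archives * avgFilesPerArchive

  def exceedsLimit(total: Long): Boolean = total > MaxDocsPerIndex

  // Minimum number of separate indexes the corpus would need (ceiling division).
  def indexesNeeded(total: Long): Long =
    (total + MaxDocsPerIndex - 1) / MaxDocsPerIndex

  def main(args: Array[String]): Unit = {
    val total = totalDocs(2000000L, 2000L) // hypothetical average per archive
    println(s"documents: $total, over the limit: ${exceedsLimit(total)}")
    println(s"separate indexes needed: ${indexesNeeded(total)}")
  }
}
```

With those assumed numbers the corpus comes to four billion documents, which cannot fit in a single index. This only estimates the size of the problem; it does not by itself get us unstuck.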

August 18, 2013

How to Know What Everybody Does in a City in a Machine Learning Way?

Suppose you’ve just moved to a new city. You’re a hipster and an anime fan, so you want to know where the other hipsters and anime geeks tend to hang out. Of course, as a hipster, you know you can’t just ask, so what do you do?
Here’s the scenario: you scope out a bunch of different

August 14, 2013

How to Add a Twitter Widget to Google Sites?

On your Twitter account, create a widget displaying your tweets. Store the widget ID.
You need a GitHub account. Click Fork on https://gist.github.com/mshahriarinia/6234659
Modify

August 7, 2013

Generate JSON in Scala & SBT Using Jerkson

In build.sbt, add:
resolvers += "repo.codahale.com" at "http://repo.codahale.com"
libraryDependencies += "com.codahale" % "jerkson_2.9.1" % "0.5.0"

August 5, 2013

How to get all aliases/nicknames/redirects of a Wikipedia entity?

You have a Wikipedia entity like Boris Berezovsky (businessman): http://en.wikipedia.org/wiki/Boris_Berezovsky_%28businessman%29

What you want is all the nicknames or aliases of this specific entity. Here I will describe how to get this information from DBpedia, Freebase and, if you want to stay hardcore, from Wikipedia itself.
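
For the Wikipedia route, one possible approach (an assumption here, not necessarily the method the full post uses) is the MediaWiki query API with prop=redirects, which lists the pages that redirect to a given title. The Scala sketch below only builds the request URL; WikipediaRedirects and redirectsUrl are hypothetical names, and fetching and parsing the JSON response are left out.

```scala
import java.net.URLEncoder

object WikipediaRedirects {
  // Assumption: the English Wikipedia endpoint of the MediaWiki API.
  val Endpoint = "https://en.wikipedia.org/w/api.php"

  // Builds a query URL asking for the redirects that point at `title`.
  def redirectsUrl(title: String): String = {
    val params = Seq(
      "action"  -> "query",
      "prop"    -> "redirects",
      "titles"  -> title,
      "rdlimit" -> "max", // ask for as many redirects per request as allowed
      "format"  -> "json"
    )
    val query = params
      .map { case (k, v) => k + "=" + URLEncoder.encode(v, "UTF-8") }
      .mkString("&")
    Endpoint + "?" + query
  }

  def main(args: Array[String]): Unit =
    println(redirectsUrl("Boris Berezovsky (businessman)"))
}
```

For an entity with many redirects the API may return the list in several batches, so a real client would also have to follow the continuation values in the response.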