You know you are working on big data when you hit the capacity of Lucene, the giant text indexer. Lucene addresses documents with a Java int, so a single index cannot hold more than roughly 2.1 billion documents. When you are processing over two million compressed files, each of which contains up to thousands of HTML files, things tend to go wrong.
Check out the links below:
http://stackoverflow.com/questions/10247309/solr-exception-docid-must-be-0-and-maxdoc-20
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/4.1.0/org/apache/lucene/index/BaseCompositeReader.java
So the problem is: we are stuck.
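To get a feel for the numbers, here is a minimal back-of-the-envelope sketch in Scala. It assumes the usual workaround of splitting the collection across several separate indexes (shards), which is not something this post confirms was done. The object name, the function name and the average of 2,000 pages per archive are all hypothetical, chosen only for illustration.

```scala
object ShardEstimate {
  // Lucene document ids are Java ints, so one index holds at most Int.MaxValue documents.
  val MaxDocsPerIndex: Long = Int.MaxValue.toLong

  // Ceiling division: the smallest number of indexes that can hold totalDocs documents.
  def shardsNeeded(totalDocs: Long, maxPerShard: Long = MaxDocsPerIndex): Long = {
    require(totalDocs >= 0 && maxPerShard > 0, "counts must be non-negative and the limit positive")
    (totalDocs + maxPerShard - 1) / maxPerShard
  }

  def main(args: Array[String]): Unit = {
    val archives = 2000000L      // "over two million compressed files"
    val pagesPerArchive = 2000L  // hypothetical average, not a measured value
    val total = archives * pagesPerArchive
    println(s"$total documents need at least ${shardsNeeded(total)} separate indexes")
  }
}
```

Using Long for the totals matters here: the product of the two counts already overflows an Int, which is the same kind of limit the index itself runs into.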
August 24, 2013
August 18, 2013
How to Know What Everybody Does in a City in a Machine Learning Way?
Suppose you’ve just moved to a new city. You’re a hipster and an
anime fan, so you want to know where the other hipsters and anime geeks
tend to hang out. Of course, as a hipster, you know you can’t just ask, so what do you do?
Here’s the scenario: you scope out a bunch of different
August 14, 2013
How to add a Twitter Widget to Google Sites?
In your Twitter account, create a widget that displays your tweets. Note the widget id.
You need a GitHub account. Click Fork on https://gist.github.com/mshahriarinia/6234659
Modify
August 7, 2013
Generate JSON in Scala & SBT Using Jerkson
In build.sbt, add the following (on sbt versions before 0.13.7, keep a blank line between settings):
resolvers += "repo.codahale.com" at "http://repo.codahale.com"

libraryDependencies += "com.codahale" % "jerkson_2.9.1" % "0.5.0"
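With that dependency on the classpath, generating JSON could look roughly like the sketch below. It uses the `generate` and `parse` functions from `com.codahale.jerkson.Json`. The object name and the sample data are made up for illustration, and the `jerkson_2.9.1` artifact assumes the project is built with Scala 2.9.1.

```scala
import com.codahale.jerkson.Json._

object JerksonDemo {
  def main(args: Array[String]): Unit = {
    // Serialize a Scala collection to a JSON string.
    val numbersJson: String = generate(List(1, 2, 3))
    println(numbersJson)

    // Maps become JSON objects; nested collections become arrays.
    val postJson: String = generate(Map("title" -> "Generate JSON in Scala", "tags" -> List("scala", "sbt")))
    println(postJson)

    // Parse JSON back into a typed Scala value.
    val numbers: List[Int] = parse[List[Int]](numbersJson)
    println(numbers.sum)
  }
}
```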
August 5, 2013
How to get all aliases/nicknames/redirects of a Wikipedia entity?
You have a Wikipedia entity like Boris Berezovsky (businessman): http://en.wikipedia.org/wiki/Boris_Berezovsky_%28businessman%29
What you want is all the nicknames or aliases of this specific entity. Here I will describe how to get this information from DBpedia, from Freebase and, if you want to stay hardcore, from Wikipedia itself.
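For the Wikipedia route, one possible approach (not necessarily the one the full post describes) is to ask the MediaWiki API for the pages that redirect to the entity. The sketch below only builds the request URL, using the API's `list=backlinks` query with `blfilterredir=redirects`. The object and function names are hypothetical, and fetching and parsing the response is left out.

```scala
import java.net.URLEncoder

object RedirectQuery {
  // Builds a MediaWiki API URL that lists the pages redirecting to the given title.
  def redirectsUrl(title: String): String = {
    // URLEncoder turns spaces into '+' and parentheses into %28 / %29.
    val encoded = URLEncoder.encode(title, "UTF-8")
    "https://en.wikipedia.org/w/api.php?action=query&list=backlinks" +
      s"&bltitle=$encoded&blfilterredir=redirects&bllimit=max&format=json"
  }

  def main(args: Array[String]): Unit = {
    println(redirectsUrl("Boris Berezovsky (businessman)"))
  }
}
```

The titles in the response are candidate aliases; some redirects are misspellings or alternative transliterations rather than nicknames, so they may need filtering.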