August 5, 2013

How to get all aliases/nicknames/redirects of a Wikipedia entity?

You have a Wikipedia entity like Boris Berezovsky (businessman): http://en.wikipedia.org/wiki/Boris_Berezovsky_%28businessman%29

What you want is all the nicknames or aliases of this specific entity. Here I will describe how to get this information from DBpedia, from Freebase and, if you want to stay hardcore, from Wikipedia itself.

To query DBpedia we can use Jena (http://jena.apache.org/). I use SBT, the Scala build tool (the equivalent of Maven for Java), to import the dependency: http://mvnrepository.com/artifact/org.apache.jena

In the build.sbt file add the following line:
       libraryDependencies += "org.apache.jena" % "jena-core" % "2.7.3"


import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.vocabulary.OWL;

public class z {
    public static void main(String[] args) {
        // RDF document for the entity, taken from its DBpedia page
        String rdfFile = "http://live.dbpedia.org/data/Boris_Berezovsky_%28businessman%29.rdf";
        Model model = ModelFactory.createDefaultModel();
        model.read(rdfFile);

        System.out.println("Following objects have same owl:sameAs property: ");
        // Match every statement whose predicate is owl:sameAs, with any subject and any object
        StmtIterator statements = model.listStatements((Resource) null, OWL.sameAs, (RDFNode) null);

        while (statements.hasNext()) {
            Statement statement = statements.nextStatement();
            Resource subject = statement.getSubject();
            if (subject.isAnon()) {
                System.out.println(subject.getId());
            } else {
                System.out.println(subject.getURI());
            }

            System.out.println("Same as");
            // Keep the object as an RDFNode: casting it to Resource would fail on a literal
            RDFNode object = statement.getObject();
            if (object.isAnon()) {
                System.out.println(object);
            } else if (object.isLiteral()) {
                System.out.println(object.toString());
            } else if (object.isResource()) {
                System.out.println(object.asResource().getURI());
            } else {
                System.out.println("none: " + object);
            }
        }
    }
}


This requires the RDF link of the entity you are looking for (here: http://live.dbpedia.org/data/Boris_Berezovsky_%28businessman%29.rdf), which you can get from the entity's DBpedia page. The output would be:
Following objects have same owl:sameAs property:
http://dbpedia.org/resource/Boris_Berezovsky_(businessman)
Same as
http://rdf.freebase.com/ns/m.01ztcq
http://dbpedia.org/resource/Boris_Berezovsky_(businessman)
Same as
http://dbpedia.org/resource/Boris_Berezovsky_(businessman)
If you go to the Freebase page linked in the output (http://rdf.freebase.com/ns/m.01ztcq), you will notice an entry named "Also known as" (/common/topic/alias), which enumerates all aliases of this person.
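
The RDF link passed to the program can probably be derived from the Wikipedia URL instead of being copied by hand. The sketch below assumes, based only on the two example URLs in this post, that the DBpedia link is http://live.dbpedia.org/data/ followed by the same page title as the Wikipedia URL and a .rdf suffix. The class DbpediaRdfLink and the method rdfLinkFor are hypothetical names made up for illustration.

```java
class DbpediaRdfLink {
    // Assumed prefix, copied from the example RDF link in this post.
    static final String DATA_PREFIX = "http://live.dbpedia.org/data/";
    static final String WIKI_MARKER = "/wiki/";

    // Hypothetical helper: maps a Wikipedia article URL to the matching DBpedia RDF link.
    static String rdfLinkFor(String wikipediaUrl) {
        int idx = wikipediaUrl.indexOf(WIKI_MARKER);
        if (idx < 0) {
            throw new IllegalArgumentException("Not a Wikipedia article URL: " + wikipediaUrl);
        }
        // Everything after /wiki/ is the (already percent-encoded) page title.
        String title = wikipediaUrl.substring(idx + WIKI_MARKER.length());
        return DATA_PREFIX + title + ".rdf";
    }

    public static void main(String[] args) {
        System.out.println(rdfLinkFor(
                "http://en.wikipedia.org/wiki/Boris_Berezovsky_%28businessman%29"));
    }
}
```

The result can be used as the rdfFile value in the Jena program. If the pattern does not hold for some entity, fall back to copying the link from its DBpedia page.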

-------------

OK, if you want aliases or nicknames from Wikipedia itself, you can use the Wikipedia API to list the redirects that point at the page: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bltitle=Boris_Berezovsky_%28businessman%29&blfilterredir=redirects&bllimit=max&format=json
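
To use that query from code, here is a rough sketch in plain Java that builds the same URL for any page title and pulls the redirect titles out of the JSON answer. The class RedirectAliases and its methods are hypothetical names, the sample response in main is made up, and I am assuming each backlink in the response carries a "title" field. The regex is a shortcut that leaves JSON escapes such as \u00e9 undecoded, so a real JSON parser would be the safer choice. Downloading the response is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RedirectAliases {
    // Matches "title":"..." pairs; escape sequences inside the value are kept as-is.
    private static final Pattern TITLE =
            Pattern.compile("\"title\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Builds the backlinks query from this post for an arbitrary page title.
    static String buildQueryUrl(String title) {
        try {
            return "http://en.wikipedia.org/w/api.php?action=query&list=backlinks"
                    + "&bltitle=" + URLEncoder.encode(title, "UTF-8")
                    + "&blfilterredir=redirects&bllimit=max&format=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Collects every "title" value found in the JSON text, in order of appearance.
    static List<String> extractTitles(String json) {
        List<String> titles = new ArrayList<String>();
        Matcher m = TITLE.matcher(json);
        while (m.find()) {
            titles.add(m.group(1));
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("Boris_Berezovsky_(businessman)"));
        // Made-up response, only to show the assumed shape of the answer.
        String sample = "{\"query\":{\"backlinks\":["
                + "{\"pageid\":1,\"ns\":0,\"title\":\"Example redirect A\",\"redirect\":\"\"},"
                + "{\"pageid\":2,\"ns\":0,\"title\":\"Example redirect B\",\"redirect\":\"\"}]}}";
        for (String title : extractTitles(sample)) {
            System.out.println(title);
        }
    }
}
```

Each extracted title is the name of a redirect page, which is what we treat as an alias here.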

Happy aliasing :)
