July 7, 2018

What is Apache Kafka? What is the use?

I have been hearing a lot about Apache Kafka for a while. I have read a lot of material and still had no idea what it is good for. Here I write down the gist of my findings, in the hope that it helps someone along the way:

June 6, 2018

Tensorflow Serving Inception Model Using Docker

The docs for TensorFlow Serving are pretty convoluted and hard to follow. Here are the easy steps to serve the Inception model in Docker:

Install Docker (e.g. on Ubuntu):

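sudo apt-get update
# software-properties-common provides add-apt-repository
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# assumed: the "stable" channel and the docker-ce package, per Docker's usual Ubuntu instructions
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce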

Download the Dockerfile, build the container image, open a shell in the container, then clone the TensorFlow repos and build them (the build can take about an hour):

wget https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/tools/docker/Dockerfile.devel
docker build --pull -t $USER/tensorflow-serving-devel -f Dockerfile.devel .
docker run -it $USER/tensorflow-serving-devel
cd ~  # go to home
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving && git clone --recursive https://github.com/tensorflow/tensorflow.git  
cd tensorflow  && ./configure
cd ..
bazel test tensorflow_serving/...

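Once the build completes, we can test it by running the model server without arguments (still inside the serving directory):

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server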

If the install was successful, the output should look like this:

Usage: model_server [--port=8500] [--enable_batching] [--model_name=my_name] --model_base_path=/path/to/export

Download and run the Inception model:

curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
tar xzf inception-v3-2016-03-01.tar.gz

 ./serving/bazel-bin/tensorflow_serving/example/inception_saved_model --checkpoint_dir=inception-v3 --output_dir=/tmp/inception-export

ls /tmp/inception-export

ls inception-v3

./serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9002 --model_name=inception --model_base_path=/tmp/inception-export &> inception_log &

wget https://upload.wikimedia.org/wikipedia/en/a/ac/Xiang_Xiang_panda.jpg

./serving/bazel-bin/tensorflow_serving/example/inception_client --server=localhost:9002 --image=./Xiang_Xiang_panda.jpg

By now you should see the result of running the Inception model on the panda image printed on your console. If you get a timeout, increase the timeout from 10 to 30 seconds in inception_client.py.


May 18, 2017

Ranking Pointwise or Pairwise

You can do ranking in two ways.

1- Pointwise
x1     engagement prob
x2     engagement prob

Then train a model that gives a score to each item and sort the items by score.

2- Pairwise
x1 precedes x2           1
x1 precedes x3           1
x4 precedes x2           0

Build up to n choose 2 training pairs like the ones above, then score the test-set pairs in the same way.

Or you can use a dedicated learning-to-rank method: https://en.wikipedia.org/wiki/Learning_to_rank
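To make the two setups concrete, here is a minimal Python sketch. The item names match the ones above, but the relevance numbers are made up, and the `precedes` function passed to `rank_from_pairs` is only a stand-in for a trained pairwise model.

```python
from itertools import combinations

# Hypothetical relevance values (e.g. observed engagement); not real data.
relevance = {"x1": 0.9, "x2": 0.5, "x3": 0.2, "x4": 0.3}


def pointwise_ranking(scores):
    """Pointwise: score each item on its own, then sort by score (highest first)."""
    return sorted(scores, key=scores.get, reverse=True)


def pairwise_examples(scores):
    """Pairwise: one (a, b, label) row per unordered pair; label is 1 if a precedes b."""
    return [(a, b, int(scores[a] > scores[b])) for a, b in combinations(scores, 2)]


def rank_from_pairs(items, precedes):
    """Turn pairwise decisions into a ranking by counting how many comparisons each item wins."""
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        if precedes(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(items, key=wins.get, reverse=True)


ranking = pointwise_ranking(relevance)
pairs = pairwise_examples(relevance)
# The lambda stands in for a trained pairwise classifier (an assumption of this sketch).
pair_ranking = rank_from_pairs(list(relevance), lambda a, b: relevance[a] > relevance[b])

print(ranking)
print(pairs)
print(pair_ranking)
```

With 4 items the pairwise setup produces 6 training rows (4 choose 2), which is why the pairwise training set grows much faster than the pointwise one.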

March 6, 2017

Speed up Deep Learning threading - Hogwild Training

One of the major issues in training deep learning models is that there is a ton of data, and training in a single-threaded process takes a long time. This is where Hogwild training comes in.
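The core idea of Hogwild is that several threads run SGD on shared parameters without any locking. Here is a toy Python sketch of that update pattern on a tiny linear model. The data, learning rate, and thread count are all made up for illustration, and because of CPython's global interpreter lock this shows the lock-free pattern rather than a real speedup.

```python
import random
import threading

# Synthetic data for y = 3*x + 1 (made up for this sketch).
random.seed(0)
data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(2000))]

# Shared parameters [slope, intercept]; every thread writes to them without a lock.
params = [0.0, 0.0]


def worker(shard, epochs=20, lr=0.05):
    """Plain SGD on one shard of the data, updating the shared params in place."""
    for _ in range(epochs):
        for x, y in shard:
            err = params[0] * x + params[1] - y  # may read values another thread is changing
            params[0] -= lr * err * x            # unsynchronized update
            params[1] -= lr * err                # unsynchronized update


n_threads = 4
shards = [data[i::n_threads] for i in range(n_threads)]
threads = [threading.Thread(target=worker, args=(shard,)) for shard in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(params)
```

Some updates can overwrite each other, but the parameters should still end up close to the true slope and intercept, which is the bet Hogwild makes.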

February 20, 2017

Regret Nothing

Regret nothing. Not the cruel novels you read to the end just to find out who killed the cook.
Not the insipid movies that made you cry in the dark.
Not the lover you left quivering in a hotel parking lot,
or the one who left you in your red dress and shoes,
the ones that crimped your toes,
don’t regret those.

October 3, 2016

Classifier Probability Calibration

One of the desired features of a classifier is that it generates probabilities for us, so we know with what probability a sample belongs to class A or class B (in the case of a binary classification problem). However, it has been shown that the probabilities classifiers produce are not always well calibrated: among the samples given a probability of 0.8, the fraction that actually belongs to the class can be quite different from 80%.
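One simple way to check this is to bin the predictions and compare the mean predicted probability in each bin with the fraction of positives actually observed there. The Python sketch below does that and summarizes the gap as an expected calibration error. The function names, bin count, and toy predictions are my own choices for illustration.

```python
def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width bins and return, per non-empty bin,
    (mean predicted probability, observed positive rate, sample count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, frac_pos, len(members)))
    return rows


def expected_calibration_error(probs, labels, n_bins=5):
    """Sample-weighted average gap between predicted and observed rates."""
    rows = reliability_table(probs, labels, n_bins)
    total = sum(n for _, _, n in rows)
    return sum(n / total * abs(mean_pred - frac_pos) for mean_pred, frac_pos, n in rows)


# Made-up predictions from an overconfident classifier.
probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 0, 0, 0, 0, 0, 1]

for mean_pred, frac_pos, n in reliability_table(probs, labels):
    print(f"predicted {mean_pred:.2f}  observed {frac_pos:.2f}  n={n}")
ece = expected_calibration_error(probs, labels)
print(ece)
```

In this toy data the samples predicted at 0.9 are positive only half of the time, so the classifier is overconfident and the error comes out well above zero. A perfectly calibrated classifier would score 0.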

Being on the Right Path

Yesterday I was hiking along the lines at Ocean Beach in San Francisco. On my way back from the Legion of Honor to the beach, around sunset, I saw an intercontinental ship sailing right towards the setting sun. In an exact, direct line. What beautiful scenery. I thought, this ship is on its way to salvation. Right towards the sun. Towards the light.

September 8, 2016

Summary of the Book Two Centuries of Silence - Zarrinkoub

One of the important points of this book is that, unlike the national fervor we have today, where our neck veins bulge for the glory of the land of Iran, people's mindset in that era was different. People were not happy with the Sasanian state. Religions such as Manichaeism, which drew inspiration from the East (Buddhism) and the West (Christianity), or Mazdakism, which drew on Zoroastrianism, or even Buddhism and Christianity themselves, all severely shook a government that gave great weight to fire temples and shrines so that the Zoroastrian priests would endorse the king's God-given rule. There was a great deal of corruption in the religion and among the Zoroastrian priests because it had become politicized. The Sasanian government was extremely unstable and could not keep unified, centralized power in its grip, and chaos had engulfed society. It is these seemingly small issues that allowed the Arabs, who had nothing, to conquer the whole country. By offering a religion in which all are equal and brothers before God, except those who practice piety, they made a large share of Iranians eager for that ideology. Those who did not accept it either had to pay the jizya and be humiliated, or were put to the sword. Gradually Iranians realized that the caliph's government did not deliver on those claims of Islam, and that through exploiting people, conquest, and bullying it only expanded its own position in the name of jihad against the infidels.

On the other hand,

July 24, 2016

Applying Big Data Technology To Remote Sensing For Species Identification

Understanding the processes governing ecological systems from local to global scales is crucial to determining how they will respond to and influence environmental, economic, and geopolitical issues such as climate change, invasive species, fire hazards, and land use change. To collect the data necessary to model ecological processes across scales, the National Ecological Observatory Network (NEON) was built starting in 2012 to conduct intensive monitoring and measurements across the United States. Hundreds of ecological and environmental data products, ranging from small local samples to large-scale remote sensing using aircraft, will be monitored across more than 81 different observatory sites. The volume, velocity, and variety of data generated by this effort are far greater than anything currently being collected or analyzed by ecologists. Therefore, maximizing the knowledge gained from these data will require bridging the gap between different disciplines, including ecology, computer science, statistics, and data science. To help develop interdisciplinary approaches to working with and understanding these data, we propose an applied, multidisciplinary, multi-modal, big data challenge for the NIST Data Science Evaluation (DSE) series to be used as a stepping stone, with an initial focus on using a combination of airborne remote sensing data and field measurements of forests to characterize the structure of the plant community at large scales.

NEON sites across the United States

May 20, 2016

PhD Thesis Preparation

While preparing my thesis in the UF LaTeX format I ran into some issues. Here I post the things that I went through:

March 25, 2016

Choke hold for Spark Speedup

In our experiments with Spark on a 64-core machine with 512 GB of RAM, Spark chokes beyond about 8 cores (~6x speedup). Our hypothesis is that the central garbage collector becomes a choke hold that prevents further parallelism. This is unavoidable unless you allocate large chunks of memory per thread and use tricks and local memory management to avoid a central bottleneck.

January 20, 2016

Markov Logic Networks in Action

Here we go through packages for working with Markov Logic Networks (MLNs), which are a type of Probabilistic Graphical Model (PGM):