May 18, 2017

Ranking: Pointwise or Pairwise

You can frame a ranking problem in two ways.

1- Pointwise
Training:
x1     engagement prob
x2     engagement prob
...


Then train a model that predicts a score for each item, and sort the items by score.
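As a minimal sketch of the pointwise idea, assuming a toy linear scorer in place of a real trained engagement model (the `score` function, its weights, and the feature vectors are all made up for illustration):

```python
# Pointwise ranking sketch: score every item independently, then sort.
# `score` is a hypothetical stand-in for a trained engagement-probability model.

def score(features):
    # Toy linear model; these weights are illustrative, not learned.
    weights = [0.7, 0.3]
    return sum(w * f for w, f in zip(weights, features))


def rank_pointwise(items):
    # items: dict mapping item id -> feature vector.
    # Returns item ids ordered from highest to lowest score.
    return sorted(items, key=lambda item_id: score(items[item_id]), reverse=True)


items = {"x1": [0.9, 0.1], "x2": [0.2, 0.8], "x3": [0.5, 0.5]}
ranking = rank_pointwise(items)
print(ranking)
```

Each item is scored without looking at the others, which is what makes the approach pointwise.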


2- Pairwise
x1 precedes x2           1
x1 precedes x3           1
x4 precedes x2           0

Build up to n choose 2 training pairs like the ones above, train a binary classifier on them, and score test-set pairs in the same way.
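The pair construction above can be sketched roughly as follows. The relevance labels are invented, and turning pairwise predictions into a final order by counting wins is one simple option I am assuming here, not something the post prescribes:

```python
from itertools import combinations


def make_pairs(relevance):
    # relevance: dict item id -> label, where a higher label should rank first.
    # Emits (a, b, 1) if a should precede b, and (a, b, 0) otherwise.
    pairs = []
    for a, b in combinations(sorted(relevance), 2):
        if relevance[a] == relevance[b]:
            continue  # ties carry no ordering information
        pairs.append((a, b, 1 if relevance[a] > relevance[b] else 0))
    return pairs


def rank_by_wins(item_ids, precedes):
    # precedes(a, b) -> True if a is predicted to precede b.
    # Assumption: aggregate pairwise predictions by counting wins per item.
    wins = {item_id: 0 for item_id in item_ids}
    for a, b in combinations(item_ids, 2):
        if precedes(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(item_ids, key=lambda item_id: wins[item_id], reverse=True)


relevance = {"x1": 3, "x2": 1, "x3": 2, "x4": 0}
pairs = make_pairs(relevance)

# Stand-in for a trained pairwise classifier: it just reads the true labels.
ranking = rank_by_wins(list(relevance), lambda a, b: relevance[a] > relevance[b])
print(pairs)
print(ranking)
```

In practice `precedes` would be the trained classifier applied to the features of both items.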


Or you can use a dedicated learning-to-rank method: https://en.wikipedia.org/wiki/Learning_to_rank

March 6, 2017

Speed up Deep Learning threading - Hogwild Training

One of the major issues in training deep learning models is that there is a ton of data, and training in a single thread takes a long time. This is where Hogwild training comes in.
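Hogwild-style training runs stochastic gradient descent from several threads that all update the same shared parameters without any locking. Here is a toy sketch on a two-parameter linear regression rather than a deep network; the data, learning rate, and thread count are assumptions chosen for illustration. In CPython the GIL means these threads will not give a real speedup, so this only shows the lock-free update pattern:

```python
import random
import threading


def hogwild_fit(data, n_threads=4, epochs=200, lr=0.1):
    # Shared [slope, intercept], deliberately not protected by any lock.
    weights = [0.0, 0.0]

    def worker(shard, seed):
        rng = random.Random(seed)
        shard = list(shard)
        for _ in range(epochs):
            rng.shuffle(shard)
            for x, y in shard:
                # Read possibly stale weights, then write back without locking.
                err = weights[0] * x + weights[1] - y
                weights[0] -= lr * err * x
                weights[1] -= lr * err

    # Each thread gets its own slice of the data.
    threads = [
        threading.Thread(target=worker, args=(data[k::n_threads], k))
        for k in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights


# Noise-free toy data from y = 2x + 1.
data = [(i / 100, 2.0 * (i / 100) + 1.0) for i in range(100)]
slope, intercept = hogwild_fit(data)
print(slope, intercept)
```

Occasional lost or stale updates are tolerated on purpose; that trade of exactness for the absence of synchronization is the core idea.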

February 20, 2017

Regret Nothing

Regret nothing. Not the cruel novels you read to the end just to find out who killed the cook.
Not the insipid movies that made you cry in the dark.
Not the lover you left quivering in a hotel parking lot,
or the one who left you in your red dress and shoes,
the ones that crimped your toes,
don’t regret those.

October 3, 2016

Classifier Probability Calibration

One desirable feature of a classifier is that it outputs probabilities, so we know how likely a sample is to belong to class A or class B (in a binary classification problem). However, it has been shown that the probabilities classifiers produce are not always well calibrated.
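One common way to check whether predicted probabilities can be trusted is to bin the predictions and compare the average predicted probability in each bin with the fraction of positives actually observed there. The sketch below computes that gap as an expected calibration error; the function name, bin count, and toy inputs are my own assumptions for illustration:

```python
def expected_calibration_error(probs, labels, n_bins=10):
    # probs: predicted probability of the positive class, each in [0, 1].
    # labels: true labels, 0 or 1.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's gap by the share of samples that fall in it.
        ece += len(bucket) / total * abs(avg_conf - frac_pos)
    return ece


# Says 0.9 and is right 9 times out of 10: well calibrated.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
# Says 0.9 but is right only half the time: overconfident.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5)
print(well_calibrated, overconfident)
```

A value near zero means the scores behave like probabilities; a large value suggests the classifier needs a calibration step before its outputs are read that way.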

Being on the Right Path

Yesterday I was hiking along the shore at Ocean Beach in San Francisco. On my way back from the Legion of Honor to the beach, around sunset, I saw an intercontinental ship sailing right towards the setting sun, in an exact straight line. What beautiful scenery. I thought, this ship is on its way to salvation. Right towards the sun. Towards the light.

September 8, 2016

Summary of the Book Two Centuries of Silence - Zarrinkoub

One of the important points of this book is that, contrary to the national zeal we have today, where our blood boils for the glory of the land of Iran, people in that era thought differently. The people were not fond of the Sasanian state. Religions such as Manichaeism, which drew inspiration from the East (Buddhism) and the West (Christianity), or Mazdakism, which drew on Zoroastrianism, or even Buddhism and Christianity themselves, all severely shook a government that paid heavily to fire temples and shrines so that the Zoroastrian priests would endorse the king's God-given rule. There was a great deal of corruption in the religion and among the Zoroastrian priests because the religion had become politicized. The Sasanian government was deeply unstable and could not keep unified, centralized power in its grip, and chaos had engulfed society. It is these seemingly small matters that allowed the Arabs, who had nothing, to conquer the entire country. By offering a religion in which all are equal and brothers before God, with distinction only for those who practice piety, they made a large share of Iranians eager for that ideology. Those who did not accept it either had to pay the jizya and be humiliated, or were put to the sword. Little by little, Iranians realized that the caliph's government did not uphold those claims of Islam, and that through exploiting the people, conquest, and coercion it only expanded its own position in the name of jihad against the infidels.

On the other hand,



July 24, 2016

Applying Big Data Technology To Remote Sensing For Species Identification

Understanding the processes governing ecological systems from local to global scales is crucial to determining how they will respond to and influence environmental, economic, and geopolitical issues such as climate change, invasive species, fire hazards, and land use change. To collect the data necessary to model ecological processes across scales, the National Ecological Observatory Network (NEON) was built starting in 2012 to conduct intensive monitoring and measurements across the United States. Hundreds of ecological and environmental data products, ranging from small local samples to large-scale remote sensing using aircraft, will be monitored across over 81 different observatory sites. The volume, velocity, and variety of data generated by this effort are far greater than anything currently being collected or analyzed by ecologists. Therefore, maximizing the knowledge gained from this data will require bridging the gap between different disciplines including ecology, computer science, statistics, and data science. To help develop interdisciplinary approaches to working with and understanding these data, we propose an applied, multidisciplinary, multi-modal, big data challenge to the NIST Data Science Evaluation (DSE) series to be used as a stepping stone, with an initial focus on using a combination of airborne remote sensing data and field measurements of forests to characterize the structure of the plant community at large scales.


NEON sites across the United States

May 20, 2016

PhD Thesis Preparation

While preparing my thesis in the UF LaTeX format I ran into some issues. Here I post the things that I went through:

March 25, 2016

Choke hold for Spark Speedup

In our experiments with Spark on a 64-core machine with 512 GB of RAM, Spark chokes beyond about 8 cores (~6x speedup). Our hypothesis is that the central garbage collector becomes a choke point that prevents parallelism. This is unavoidable unless you allocate large chunks of memory per thread and use tricks and local memory management to avoid a central bottleneck.

January 20, 2016

Markov Logic Networks in Action

Here we go through packages for working with Markov Logic Networks (MLNs), which are a type of Probabilistic Graphical Model (PGM):


Alchemy:


November 20, 2015

Mac Eclipse Debugging So Slow! Life saver tip!

If debugging in Eclipse on a Mac has been very slow for you, here is the solution: close the Variables window, and for whatever you want to inspect, hover over the variable or add it to the watch list.

Source: http://stackoverflow.com/questions/6893553/why-is-debugging-in-eclipse-pydev-so-slow-for-my-python-program

November 18, 2015

How to Receive Files in your Google Drive from Anyone

A school teacher wants to have a public drop box (not Dropbox) where students can upload homework assignments. A recruiter wants to have an online form where job applicants can upload their resumes. A designer may need a public drop box where clients can upload photographs easily.