October 3, 2016

Classifier Probability Calibration

One desirable feature of a classifier is that it outputs probabilities, so that we know how likely a sample is to belong to class A or class B (in a binary classification problem). However, it has been shown that the probabilities classifiers produce are not always well calibrated: a predicted probability of 0.8 does not necessarily mean that about 80% of such samples belong to the class.
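As a rough illustration of what "correct" probabilities would mean, the sketch below bins predictions by their predicted probability and compares each bin's mean prediction with the fraction of positives actually observed. The function names, the equal-width binning, and the toy data are assumptions made for this example, not something taken from the post.

```python
def reliability_bins(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (mean_predicted, observed_fraction, count) tuple
    per non-empty bin. Labels are expected to be 0 or 1.
    """
    # Per bin: [sum of predicted probabilities, number of positives, count]
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[idx][0] += p
        acc[idx][1] += y
        acc[idx][2] += 1
    return [(s / c, pos / c, c) for s, pos, c in acc if c > 0]


def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted average gap between predicted and observed frequency."""
    total = len(probs)
    return sum(
        (count / total) * abs(mean_pred - observed)
        for mean_pred, observed, count in reliability_bins(probs, labels, n_bins)
    )


if __name__ == "__main__":
    # Hypothetical toy data: the classifier says 0.9 for every sample,
    # but only half of the samples are actually positive.
    probs = [0.9] * 10
    labels = [1] * 5 + [0] * 5
    print(reliability_bins(probs, labels))
    print(expected_calibration_error(probs, labels))
```

A well-calibrated classifier would give a gap near zero in every bin; a large gap is the kind of miscalibration that calibration methods try to correct.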

Being on the Right Path

Yesterday I was hiking along Ocean Beach in San Francisco. On my way back from the Legion of Honor to the beach, around sunset, I saw an intercontinental ship sailing straight toward the setting sun, in an exact direct line. What beautiful scenery. I thought: this ship is on its way to salvation. Right toward the sun. Toward the light.

September 8, 2016

Summary of the Book Two Centuries of Silence - Zarrinkoub

One of the important points of this book is that, in contrast to the national fervor we have today, when our neck veins bulge for the glory of the land of Iran, people in that era thought differently. People were not fond of the Sasanian state. Religions such as Manichaeism, which drew inspiration from the East (Buddhism) and the West (Christianity), or Mazdakism, which drew on Zoroastrianism, or even Buddhism and Christianity themselves, all severely shook a government that spent heavily on fire temples and shrines so that the mobads (priests) would endorse the king's God-given rule. There was a great deal of corruption in the religion and among the Zoroastrian mobads because the religion had become politicized. The Sasanian government was badly unstable and could not hold unified, centralized power in its grip, and chaos had engulfed society. It was these seemingly small issues that allowed Arabs who had nothing to conquer the whole country. By offering a religion in which all are equal and brothers before God, distinguished only by piety, they made a large share of Iranians eager for that ideology; those who did not accept it either had to pay the jizya and be humiliated, or were put to the sword. Gradually, Iranians realized that the caliph's government did not uphold those claims of Islam and that, through exploitation of the people, conquest, and bullying, it merely expanded its own position in the name of jihad against the infidels.

On the other hand,

July 24, 2016

Applying Big Data Technology To Remote Sensing For Species Identification

Understanding the processes governing ecological systems from local to global scales is crucial to determining how they will respond to and influence environmental, economic, and geopolitical issues such as climate change, invasive species, fire hazards, and land use change. To collect the data necessary to model ecological processes across scales, construction of the National Ecological Observatory Network (NEON) began in 2012, with the goal of conducting intensive monitoring and measurements across the United States. Hundreds of ecological and environmental data products, ranging from small local samples to large-scale remote sensing from aircraft, will be collected across over 81 different observatory sites. The volume, velocity, and variety of data generated by this effort are far greater than anything currently being collected or analyzed by ecologists. Maximizing the knowledge gained from these data will therefore require bridging the gap between disciplines including ecology, computer science, statistics, and data science. To help develop interdisciplinary approaches to working with and understanding these data, we propose an applied, multidisciplinary, multi-modal big data challenge for the NIST Data Science Evaluation (DSE) series to be used as a stepping stone, with an initial focus on using a combination of airborne remote sensing data and field measurements of forests to characterize the structure of the plant community at large scales.

NEON sites across the United States

May 20, 2016

PhD Thesis Preparation

While preparing my thesis in the UF LaTeX format I ran into some issues; here I post the things that I went through:

March 25, 2016

Choke hold for Spark Speedup

In our Spark experiments on a 64-core machine with 512 GB of RAM, Spark chokes beyond about 8 cores (~6x speedup). Our hypothesis is that the central garbage collector becomes a choke point that prevents parallelism. This is unavoidable unless you allocate large chunks of memory per thread and use tricks and local memory management to avoid a central bottleneck.
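One way to put the numbers above in perspective is Amdahl's law, which is swapped in here as an assumption and is not the analysis behind the experiments. The sketch below backs out the serial fraction implied by a 6x speedup on 8 cores and asks what a fixed serial fraction would predict on all 64 cores. The function names are hypothetical.

```python
def serial_fraction(speedup, cores):
    """Serial fraction s implied by Amdahl's law, S = 1 / (s + (1 - s) / n)."""
    return (cores / speedup - 1.0) / (cores - 1.0)


def amdahl_speedup(serial, cores):
    """Speedup predicted by Amdahl's law for a fixed serial fraction."""
    return 1.0 / (serial + (1.0 - serial) / cores)


if __name__ == "__main__":
    # Figures from the post: about 6x speedup at about 8 cores.
    s = serial_fraction(6.0, 8)
    print(f"implied serial fraction: {s:.4f}")
    # Assumption: the serial fraction stays constant as cores are added.
    for n in (8, 16, 32, 64):
        print(n, round(amdahl_speedup(s, n), 2))
```

Under this simple model the speedup would keep growing past 8 cores, so a curve that flattens out there would point to a cost that grows with the number of cores rather than a fixed serial portion, which fits the shared garbage collector hypothesis without proving it.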

January 20, 2016

Markov Logic Networks in Action

Here we go through packages for working with Markov Logic Networks (MLNs), which are a type of Probabilistic Graphical Model (PGM):