May 5, 2015

On Deciding between MongoDB and MySQL

MongoDB offers sharding by key for horizontal scaling, which helps whether the application does fanout on write or fanout on read: link

On the other hand, the vast majority of comments online about implementing a social network recommend going with MySQL as the relational DB and using MongoDB only where you need to be schema-less, as in messaging or chat. Also, Facebook uses WebScaleSQL, which is built on MySQL, and Twitter's FlockDB is based on MySQL. So you can't go wrong.

Facebook does fanout on read and Twitter goes for fanout on write. Facebook computes each user's timeline at read time from the people they are friends with; Twitter wants no delay at read time. link

Twitter uses a massive Redis cluster to store user timelines in memory and pushes each new tweet into the timeline of every follower. Facebook, on the other hand, seems to use TAO as the abstraction and caching layer on top of MySQL shards to make on-demand composition of user timelines more performant.

Twitter is the more real-time of the two, whereas Facebook arguably has the more complex social network with softer real-time constraints, which may explain these design decisions.
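To make the two fanout strategies concrete, here is a minimal in-memory sketch. The class and method names are my own, plain dictionaries stand in for Redis and MySQL, and this is not how either company actually implements it; it only shows where the work happens in each model.

```python
from collections import defaultdict


class FanoutOnWrite:
    """Push model: a new post is copied into every follower's timeline."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.timelines = defaultdict(list)  # user -> precomputed timeline

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        # The expensive step happens here, once per follower.
        for user in self.followers[author]:
            self.timelines[user].append((author, text))

    def timeline(self, user):
        # Reading is a cheap lookup of the precomputed list.
        return list(self.timelines[user])


class FanoutOnRead:
    """Pull model: the timeline is assembled when the user asks for it."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(sequence, text)]
        self.sequence = 0                  # global counter to order posts

    def follow(self, follower, author):
        self.following[follower].add(author)

    def post(self, author, text):
        # Writing is cheap: store the post once, under its author.
        self.sequence += 1
        self.posts[author].append((self.sequence, text))

    def timeline(self, user):
        # The expensive step happens here: gather and merge at read time.
        merged = [
            (seq, author, text)
            for author in self.following[user]
            for seq, text in self.posts[author]
        ]
        merged.sort()
        return [(author, text) for _, author, text in merged]
```

Both classes return the same timeline for the same sequence of follows and posts; the difference is whether the cost is paid in `post` or in `timeline`.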

Facebook uses sharding to place each group of users on a different MySQL instance. Data is replicated across several MySQL instances on different machines for reliability and performance. This is done in master-slave fashion: the master is the node that receives writes, the slaves replay all the writes the master does, and every master or slave node can be used for read operations.
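A rough sketch of that read/write routing, with in-memory dictionaries standing in for MySQL instances. The `Node` and `ReplicatedStore` names are hypothetical, and the replay is done immediately here, whereas real replication is typically asynchronous and slaves can lag behind the master.

```python
import itertools


class Node:
    """Stand-in for one MySQL instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        # Reads rotate over every node, master included.
        self._readers = itertools.cycle([master] + slaves)

    def write(self, key, value):
        # All writes go to the master first.
        self.master.data[key] = value
        # Slaves replay the same write (immediately, in this sketch).
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        node = next(self._readers)
        return node.name, node.data.get(key)


store = ReplicatedStore(Node("master"), [Node("slave1"), Node("slave2")])
store.write("user:1", "alice")
for _ in range(3):
    print(store.read("user:1"))
```

The three reads are served by the master and the two slaves in turn, which is where the extra read throughput comes from.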

Here is a link replicating Facebook's architecture in detail. Facebook shards by user_id.
This is the same for Twitter, and this is the original Twitter implementation (single table, vertical scaling, master-slave replication, Memcached for read throughput). Now Twitter partitions by time and keeps reading through partitions until enough data is accumulated. Twitter shards by tweet_id.
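The "keep reading partitions until enough data is accumulated" idea can be sketched like this. I am assuming the partitions are ordered newest first and that each partition holds its tweets newest first; the function name and the data layout are made up for illustration.

```python
def recent_tweets(partitions, wanted):
    """Collect up to `wanted` tweets, reading the newest partitions first."""
    result = []
    for partition in partitions:
        # Take only as many tweets as are still missing.
        result.extend(partition[: wanted - len(result)])
        if len(result) >= wanted:
            break  # enough data, older partitions are never touched
    return result


# Three time partitions, newest first.
partitions = [["t9", "t8"], ["t7", "t6", "t5"], ["t4"]]
print(recent_tweets(partitions, 4))  # ['t9', 't8', 't7', 't6']
```

Most timeline requests only need recent tweets, so most of them are answered from the newest partition or two.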

If a shard gets too big, go for online shard migration.
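One way to picture sharding by user_id with room for migration is a lookup directory that records which shard each user lives on. This is only a sketch under my own assumptions: the `ShardDirectory` class is hypothetical, new users are placed by `user_id % num_shards`, and dictionaries stand in for MySQL instances. A real online migration also has to handle writes that arrive while the copy is in progress.

```python
class ShardDirectory:
    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # shard -> {user_id: row}
        self.location = {}                             # user_id -> shard index

    def put(self, user_id, row):
        # New users get a default shard; known users keep their recorded one.
        index = self.location.setdefault(user_id, user_id % len(self.shards))
        self.shards[index][user_id] = row

    def get(self, user_id):
        return self.shards[self.location[user_id]][user_id]

    def migrate(self, user_id, target):
        source = self.location[user_id]
        if source == target:
            return
        # 1. Copy the row to the target shard.
        self.shards[target][user_id] = self.shards[source][user_id]
        # 2. Flip the directory entry so reads and writes go to the target.
        self.location[user_id] = target
        # 3. Remove the old copy.
        del self.shards[source][user_id]


directory = ShardDirectory(2)
for user_id in (0, 2, 4):
    directory.put(user_id, {"name": f"user{user_id}"})
directory.migrate(4, 1)  # shard 0 was getting crowded
print(directory.get(4))
```

Because callers always go through the directory, moving a user does not change how the rest of the application looks them up.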






More references
[1] The Architecture Twitter Uses to Deal with 150M Active Users
[2] Timelines at Scale
[3] TAO: Facebook’s Distributed Data Store for the Social Graph [paper]
[4] TAO: Facebook’s Distributed Data Store for the Social Graph [talk]


So we are off to MySQL. Language: Node.js
Hosting:
If you host your own computer at home, the electricity cost for a 200 W machine at $0.04 per kWh is 200 * 24 * 365 / 1000 * 0.04 = $70.08 per year (dividing by 1000 converts watts to kilowatts). This assumes you don't pay extra for internet.

Virtual machine hosting is about $20/month, or $240 for a year. Hosting one's own machine might make sense on cost, but for the beginning let's go with VM hosting and leave home hosting for a later time.
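The hosting arithmetic as a small script, so the numbers are easy to change. The 200 W draw and $0.04 per kWh are the figures used above; the function and variable names are just for illustration.

```python
def yearly_electricity_cost(watts, price_per_kwh):
    """Cost of running a machine around the clock for one year."""
    kwh_per_year = watts * 24 * 365 / 1000  # /1000 converts Wh to kWh
    return kwh_per_year * price_per_kwh


home_cost = yearly_electricity_cost(200, 0.04)
vm_cost = 20 * 12  # $20/month VM plan

print(f"home: ${home_cost:.2f}/year, VM: ${vm_cost}/year")
# home: $70.08/year, VM: $240/year
```

The home figure covers electricity only; hardware and the internet connection are left out, as in the estimate above.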

I looked at these hostings:
- https://www.linode.com/pricing
- https://www.digitalocean.com/pricing
- https://www.openshift.com/products/pricing
- https://www.joyent.com/public-cloud/pricing
- http://aws.amazon.com/ec2/pricing

From $20/month onwards, Linode beats the others on services at the same price. So we start with a $10/month plan and expand in the future if we get traction.
