Apache Kafka
- Fast, scalable, and fault-tolerant. Much faster and more resource-efficient than DBs, Cassandra, HDFS, ... for ingesting streams of messages.
- Publish-subscribe messaging system between producers and consumers based on topics.
- Used in place of traditional message brokers, such as those built on JMS and AMQP, because of its higher throughput, reliability and replication.
- A platform for high-end new generation distributed applications.
- It can work in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data. It can also be integrated with Apache Flume. Kafka can carry geospatial data from a fleet of long-haul trucks or sensor data from heating and cooling equipment in office buildings.
Whatever the industry or use case, Kafka brokers massive message streams for low-latency analysis in Enterprise Apache Hadoop.
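The topic-based publish-subscribe model described above can be sketched in a few lines. This is a toy in-memory stand-in, not the Kafka client API: `TinyBroker`, `publish`, `poll`, and the topic and consumer names are all hypothetical, chosen only to show that producers append to a topic and each consumer reads it at its own pace.

```python
from collections import defaultdict


class TinyBroker:
    """Toy in-memory broker for illustration only; not the Kafka API."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> append-only list of messages
        self.offsets = defaultdict(int)  # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # Producers only know the topic name, never the consumers.
        self.logs[topic].append(message)

    def poll(self, consumer, topic):
        # Return the messages this consumer has not seen yet, then advance its offset.
        start = self.offsets[(consumer, topic)]
        batch = self.logs[topic][start:]
        self.offsets[(consumer, topic)] = len(self.logs[topic])
        return batch


broker = TinyBroker()
broker.publish("page-views", "/home")
broker.publish("page-views", "/cart")
broker.publish("metrics", "cpu=0.42")

# Two independent consumers each receive the full "page-views" stream.
analytics_batch = broker.poll("analytics", "page-views")
audit_batch = broker.poll("audit", "page-views")
```

Because each consumer tracks its own offset, adding a new consumer does not require any change on the producer side.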
Common use cases include:
- Stream Processing
- Website Activity Tracking
- Metrics Collection and Monitoring
- Log Aggregation
- LinkedIn was facing the problem of low-latency ingestion of huge amounts of data from its website into a lambda architecture that could process real-time events. Since none of the available solutions addressed this, Kafka was developed in 2010 to solve it.
Putting Kafka between producers and consumers decouples them. What you sacrifice is:
- Ordering: each topic (kind of message) is written to multiple partitions, and order is guaranteed only within a partition. If you want strict ordering across the whole topic, you can use ONE partition for it, just like a DB; just remember to update the configs accordingly. Also look for where your business logic is parallelizable: actions on one customer's cart need ordering, but carts of different customers can go to different partitions. So you can use multiple partitions, keyed by customer, and keep the parallelism.
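- Persistence: Kafka stores messages on disk but discards them after a default retention period of 7 days. If you need to keep them longer, you can write them to a file, a DB, etc., or raise the retention setting. The New York Times famously uses infinite retention and keeps its published content, going back to the 1800s, in Kafka.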
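The ordering trade-off above can be illustrated with a small simulation. This is a sketch under stated assumptions, not Kafka code: `partition_for`, `route`, and `actions_for` are hypothetical helpers, the partition count is made up, and `zlib.crc32` is only a stand-in hash (Kafka's default partitioner hashes keys with murmur2). The point it shows is that events with the same key always land in the same partition, so per-customer order survives even with several partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    # Append each (customer_id, action) event to the partition chosen by its key.
    partitions = defaultdict(list)
    for customer_id, action in events:
        index = partition_for(customer_id, num_partitions)
        partitions[index].append((customer_id, action))
    return partitions


def actions_for(customer_id, partitions):
    # Read back one customer's actions from the single partition that holds them.
    index = partition_for(customer_id)
    return [action for cid, action in partitions[index] if cid == customer_id]


events = [
    ("alice", "add:book"),
    ("bob", "add:pen"),
    ("alice", "remove:book"),
    ("bob", "checkout"),
    ("alice", "add:lamp"),
]

partitions = route(events)
alice_actions = actions_for("alice", partitions)
bob_actions = actions_for("bob", partitions)

# With a single partition, the whole topic keeps its original order.
single_partition = route(events, num_partitions=1)[0]
```

Each cart is replayed in the order its events were produced, while different customers may be spread over different partitions and consumed in parallel.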
https://www.youtube.com/watch?v=hyJZP-rgooc
https://www.youtube.com/watch?v=1vLMuWsfMcA
Note: Each partition has several replicas. One of them is the leader and the rest are followers. The leader is the entry point: producers write to it, and the followers copy from it.
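A toy model can make the leader/follower note concrete. Everything here is hypothetical (`Replica`, `Partition`, `fail_leader`, the broker ids), and it simplifies heavily: in real Kafka the followers fetch from the leader and leader election is handled by the cluster, whereas this sketch copies synchronously and simply promotes the next replica.

```python
class Replica:
    """One copy of a partition's log, hosted on a (hypothetical) broker."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


class Partition:
    """Toy partition with one leader and several followers; not Kafka code."""

    def __init__(self, broker_ids):
        self.replicas = [Replica(b) for b in broker_ids]
        self.leader = self.replicas[0]  # assumption: first replica starts as leader

    @property
    def followers(self):
        return [r for r in self.replicas if r is not self.leader]

    def append(self, message):
        # All writes enter through the leader.
        self.leader.log.append(message)
        # Simplification: followers are brought up to date immediately.
        for follower in self.followers:
            follower.log = list(self.leader.log)

    def fail_leader(self):
        # Simulate losing the leader's broker and promoting a follower.
        self.replicas.remove(self.leader)
        self.leader = self.replicas[0]


partition = Partition([101, 102, 103])
partition.append("m1")
partition.append("m2")

partition.fail_leader()        # broker 101 goes away
partition.append("m3")         # writes continue through the new leader
```

Because the followers already hold the leader's messages, the promoted replica can keep serving the partition without losing what was written before the failure.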