Kafka is an event streaming platform, which provides the following capabilities:
- Publish and subscribe to streams of events
- Store streams of events durably, for as long as needed
- Process streams of events, as they occur or retrospectively
An event has three main properties: a key, a value, and a timestamp.
Each event is part of a stream, which durably stores the event sequence for as long as retention allows (potentially forever). Events are commonly encoded using Avro by Kafka clients; a minimal producer sketch follows below.
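For illustration, a minimal producer sketch showing those three properties on the wire. The broker address, the "orders" topic, and the plain-string serializers are assumptions; a real deployment would typically pair Avro values with a schema-registry serializer.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // An event carries a key, a value, and a timestamp.
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "orders",                        // hypothetical topic
                    null,                            // partition: let Kafka derive it from the key
                    System.currentTimeMillis(),      // timestamp
                    "order-42",                      // key
                    "{\"item\":\"book\",\"qty\":1}"  // value
            );
            producer.send(event);
        }
    }
}
```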
Streams and KTables
| Streams | KTables |
| --- | --- |
| Provide immutable data | Provide mutable data |
| Support only inserts of new data | Support inserts, updates, and deletes |
- Both are concepts of Kafka’s processing layer, which works on the “raw” data in topics.
- Both can be partitioned, and so can the processing on them.
- For streams, processing the raw events of a topic partition is straightforward.
- For tables, to compute aggregates, each partition may be associated with a lightweight embedded database; by default this is RocksDB. It is stored within the application instance consuming the topic (see the sketch after this list).
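A minimal Kafka Streams DSL sketch of that duality, assuming string-typed events and the hypothetical topics "orders" and "order-counts": the input is read as an immutable KStream, and a per-key count turns it into a KTable whose state lands in a local RocksDB store.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StreamVsTable {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // KStream: the topic as an immutable, append-only sequence of events.
        KStream<String, String> orders =
                builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

        // KTable: aggregating per key yields a mutable view where each new event
        // updates the current value. The running counts live in a local state
        // store (RocksDB by default) on the application instance.
        KTable<String, Long> orderCounts = orders
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .count();

        // Publish the table's changes to a downstream topic.
        orderCounts.toStream().to("order-counts",
                Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }
}
```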
Kafka Topic
- A sequence of encoded events.
- Defines a number of settings, like compression and data retention.
- Topics can be partitioned across different brokers (an example of creating one follows below).
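A sketch of creating such a topic with the AdminClient; the topic name, partition count, and the compression/retention values are illustrative assumptions.

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread across brokers, replication factor 1, plus
            // per-topic settings for compression and retention (example values).
            NewTopic topic = new NewTopic("orders", 3, (short) 1)
                    .configs(Map.of(
                            "compression.type", "lz4",
                            "retention.ms", "604800000")); // 7 days

            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```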
Kafka Broker
- A machine that actually stores and serves the data.
- Has no idea about the content of the messages, since they are encoded.
Kafka Consumer group
- A group of consumers reading a topic in parallel.
- Load is rebalanced between the consumers within the group when some of them fail or more are added.
- Operates via the consumer group protocol (a sketch follows below).
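A minimal consumer sketch; starting several copies of this program with the same group.id forms a consumer group. The broker address, group name, and topic are assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors");        // hypothetical group name
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Each instance with the same group.id is assigned a disjoint
                // subset of the topic's partitions; the group protocol rebalances
                // the assignment when instances join or fail.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d key=%s value=%s%n",
                            r.partition(), r.key(), r.value());
                }
            }
        }
    }
}
```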
Kafka Stream Task
- The unit of processing parallelism.
- One stream task is created for each partition of the input topic.
- Stream tasks are distributed across application instances (see the config sketch below).
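A sketch of running the StreamVsTable topology from earlier; the application.id and thread count are assumptions. With, say, three input partitions, Kafka Streams creates three tasks and spreads them over all threads of all instances started with the same application.id.

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

public class RunStreams {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-counter");     // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        // Threads per instance: tasks (one per input partition) are balanced
        // over every thread of every instance running this application.id.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

        KafkaStreams streams = new KafkaStreams(StreamVsTable.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```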
GlobalKTable
- Provides every application instance with a full, global copy of the data.
- Cannot be partitioned.
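A minimal sketch of defining such a table in the Kafka Streams DSL, assuming a hypothetical "products" topic with string keys and values:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;

public class GlobalLookup {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Every application instance reads ALL partitions of "products", so each
        // instance holds a complete local copy; the table is never split up.
        GlobalKTable<String, String> products =
                builder.globalTable("products", Consumed.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```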
Fault Tolerance
- Streams are fault tolerant, since the data is already stored in Kafka.
- It is a bit special for tables, since they store their data on the application instance. They handle this by sending every change to a dedicated Kafka changelog topic. So, much like the write-ahead (REDO) log in Postgres, the table can be rebuilt from that log in case of any failure.
- When scaling up or down, Kafka needs to transfer the tables' data to the new application instances using the changelog topic (a config sketch that softens this follows below).
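One knob that softens this transfer, sketched as an addition to the RunStreams properties above; the value 1 is an example.

```java
// Added to the RunStreams properties: keep one warm standby copy of each
// task's state store on another instance, so a failover does not have to
// replay the entire changelog topic from the beginning.
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
```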