Amazon Kinesis Data Streams - Kinesis vs Kafka

Keywords: AWS, Amazon, Kinesis, Data Stream, Best Practice

AWS Kinesis is the counterpart of Apache Kafka. The two have the following in common:

  1. Both follow the publish/subscribe model. Kafka has partitions, Kinesis has shards.

  2. Both are distributed systems.
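
To make the partition/shard parallel concrete, here is a minimal Python sketch of how a Kinesis partition key selects a shard: the key is MD5-hashed to a 128-bit integer, and the record goes to the shard whose hash key range contains that integer. The even split of the hash space and the helper names `even_shard_ranges` and `shard_for_key` are assumptions made for illustration; a real stream reports its own ranges, and they need not be equal in size.

```python
import hashlib

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def even_shard_ranges(shard_count):
    """Split the hash space into contiguous, equal ranges (hypothetical layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key is outside every shard range")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard", shard_for_key(key, ranges))
```

Records that share a partition key always land in the same shard, which is why ordering is only guaranteed within a shard. Kafka's default partitioner works along the same lines, hashing the record key to pick a partition.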

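How they differ:

  1. Kafka performs better, but configuring it yourself is more error-prone. Kinesis is slightly slower, but carries no operational burden.

  2. Kafka can retain data indefinitely. Kinesis cannot: it keeps data for at most 365 days. For anything longer, you have to persist the data to S3 yourself.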
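
The retention rule can be sketched as a small planning helper. This is only an illustration: the function name `plan_retention` is hypothetical, and the bounds assume the documented Kinesis Data Streams range of 24 hours (the default) to 8760 hours (365 days).

```python
MIN_RETENTION_HOURS = 24        # default retention of a new stream (assumed floor)
MAX_RETENTION_HOURS = 365 * 24  # 8760 hours, the upper limit


def plan_retention(required_days):
    """Return (stream_retention_hours, needs_s3_archive) for a retention goal.

    The stream retention is clamped to the allowed range. Anything beyond
    365 days has to be archived elsewhere, for example in S3.
    """
    required_hours = required_days * 24
    stream_hours = min(max(required_hours, MIN_RETENTION_HOURS), MAX_RETENTION_HOURS)
    needs_s3_archive = required_hours > MAX_RETENTION_HOURS
    return stream_hours, needs_s3_archive


if __name__ == "__main__":
    for days in (1, 7, 365, 730):
        print(days, "days ->", plan_retention(days))
```

With boto3, the resulting hour value would typically be passed as `RetentionPeriodHours` to the Kinesis client's `increase_stream_retention_period` call. The archival to S3 remains a separate job that you build yourself.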

Comparison

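+--------------------------+----------------------------------------+----------------------------------------+
| Aspect                   | Apache Kafka                           | Amazon Kinesis                         |
+==========================+========================================+========================================+
| Developed / hosted by    | LinkedIn (now an Apache project)       | Amazon                                 |
+--------------------------+----------------------------------------+----------------------------------------+
| Software                 | Open source                            | Proprietary                            |
+--------------------------+----------------------------------------+----------------------------------------+
| SDK support              | Official client is Java; community     | AWS SDK for Android, Java, Go, .NET,   |
|                          | clients cover other languages          | and other languages                    |
+--------------------------+----------------------------------------+----------------------------------------+
| Configuration & features | More control over configuration and    | Only the retention period and shard    |
|                          | better performance                     | count can be configured                |
+--------------------------+----------------------------------------+----------------------------------------+
| Data stored in           | Kafka partition                        | Kinesis shard                          |
+--------------------------+----------------------------------------+----------------------------------------+
| Reliability              | Replication factor is configurable     | Writes synchronously to 3 different    |
|                          |                                        | machines / data centers                |
+--------------------------+----------------------------------------+----------------------------------------+
| Performance              | Kafka wins                             | Slower: each message is written        |
|                          |                                        | synchronously to 3 machines            |
+--------------------------+----------------------------------------+----------------------------------------+
| Configuration store      | Apache ZooKeeper                       | Amazon DynamoDB                        |
+--------------------------+----------------------------------------+----------------------------------------+
| Setup                    | Weeks                                  | A couple of hours                      |
+--------------------------+----------------------------------------+----------------------------------------+
| Data retention           | Configurable, can be unlimited         | 24 hours by default, up to 365 days    |
+--------------------------+----------------------------------------+----------------------------------------+
| Log compaction           | Supported                              | Not supported; records expire when     |
|                          |                                        | the retention period ends              |
+--------------------------+----------------------------------------+----------------------------------------+
| Processing events        | More than thousands of events/sec      | Up to 1,000 records/sec written per    |
|                          |                                        | shard; add shards to scale             |
+--------------------------+----------------------------------------+----------------------------------------+
| Checkpointing            | Offsets stored in a special topic      | Stored in DynamoDB                     |
+--------------------------+----------------------------------------+----------------------------------------+
| Ordering                 | Partition level                        | Shard level                            |
+--------------------------+----------------------------------------+----------------------------------------+
| Human costs              | Needs people to install and manage     | Pay for what you use                   |
|                          | clusters and to handle high            |                                        |
|                          | availability, durability, and recovery |                                        |
+--------------------------+----------------------------------------+----------------------------------------+
| Producer throughput      | Kafka wins                             | A bit slower than Kafka                |
+--------------------------+----------------------------------------+----------------------------------------+
| Incident risk and        | Higher with Kafka                      | Amazon takes care of it                |
| maintenance              |                                        |                                        |
+--------------------------+----------------------------------------+----------------------------------------+
| Ordered sequence of      | Kafka topic                            | Kinesis stream                         |
| immutable data records   |                                        |                                        |
+--------------------------+----------------------------------------+----------------------------------------+
| Unique number of each    | Offset                                 | Sequence number                        |
| record                   |                                        |                                        |
+--------------------------+----------------------------------------+----------------------------------------+
| Stream processing        | Kafka Streams                          | Kinesis Analytics                      |
+--------------------------+----------------------------------------+----------------------------------------+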
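
Because Kinesis throughput is bounded per shard, capacity planning mostly comes down to choosing a shard count. The sketch below estimates that count. It assumes the provisioned-mode limits of 1,000 records/sec or 1 MB/sec written per shard and 2 MB/sec read per shard shared by all consumers (no enhanced fan-out). The function name `shards_needed` and its parameters are hypothetical, so check the current AWS quotas before relying on the numbers.

```python
import math

WRITE_RECORDS_PER_SHARD = 1000  # records/sec per shard (assumed limit)
WRITE_MB_PER_SHARD = 1.0        # MB/sec written per shard (assumed limit)
READ_MB_PER_SHARD = 2.0         # MB/sec read per shard, shared by consumers


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count from the tightest of the three limits."""
    write_mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_record_rate = records_per_sec / WRITE_RECORDS_PER_SHARD
    by_write_bandwidth = write_mb_per_sec / WRITE_MB_PER_SHARD
    # Every consumer reads the full stream, so read load grows with consumers.
    by_read_bandwidth = write_mb_per_sec * consumers / READ_MB_PER_SHARD
    tightest = max(by_record_rate, by_write_bandwidth, by_read_bandwidth)
    return max(1, math.ceil(tightest))


if __name__ == "__main__":
    print(shards_needed(5000, 0.1))               # many small records
    print(shards_needed(1000, 4))                 # fewer, larger records
    print(shards_needed(1000, 4, consumers=3))    # several readers
```

A Kafka cluster has no equivalent fixed per-partition quota, so the same estimate there depends on broker hardware and configuration instead.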

References: