Partitioning & Reconciliation
Design Kafka topics that preserve per-entity ordering, tame skew, and make late arrivals safe with watermarks and versioning.
The Golden Rule of Kafka Ordering
Kafka guarantees the order of messages within a single partition. It makes no ordering guarantees across different partitions. Therefore, to maintain the correct sequence of changes for any given entity (like a specific customer or a single product), all changes for that entity must be sent to the same partition. This is the fundamental purpose of a partition key.
Partition sizing (rule-of-thumb)
Start with a target throughput per partition that your consumers can sustain (1–5k events/s).
- Baseline: partitions ≈ ceil(total_events_per_s ÷ target_per_partition)
- Include skew headroom: multiply by 1.5–2× if the distribution is Zipf-like or you have hot tenants.
- Changing partitions later: adding partitions does not move existing records, but it changes the key-to-partition mapping for new ones, so a key's events can end up split across two partitions. Plan the key strategy up front or use topic splits.
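The rule of thumb above can be sketched as a small helper. This is a minimal illustration, not a sizing tool: the function name `estimate_partitions` and the example traffic figures are assumptions made for the sketch.

```python
import math


def estimate_partitions(total_events_per_s: float,
                        target_per_partition: float,
                        skew_headroom: float = 1.5) -> int:
    """Baseline partition count, scaled up by a skew headroom factor."""
    if total_events_per_s < 0 or target_per_partition <= 0 or skew_headroom < 1:
        raise ValueError("rates must be positive and headroom must be >= 1")
    # Baseline: ceil(total_events_per_s / target_per_partition)
    baseline = math.ceil(total_events_per_s / target_per_partition)
    # Headroom: 1.5-2x when the key distribution is skewed
    return max(1, math.ceil(baseline * skew_headroom))


# Hypothetical workload: 40k events/s, consumers sustain 2k events/s each.
print(estimate_partitions(40_000, 2_000, skew_headroom=1.0))  # 20
print(estimate_partitions(40_000, 2_000, skew_headroom=1.5))  # 30
```

Rounding up twice is deliberate: under-provisioning is harder to fix later than a few spare partitions.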
How to Choose a Good Partition Key
The quality of your partition key directly impacts both the correctness and performance of your system.
Good Keys (Preserve Order & Distribute Load)
- customer_id, order_id, product_id: The primary key of the business entity. This is the most common and effective choice for preserving per-entity order.
- tenant_id + customer_id: In a multi-tenant system, a composite key can help distribute load from a single large tenant (the "noisy neighbor" problem) across multiple partitions.
Bad Keys (Lead to Incorrect Order or Skew)
- event_id (a UUID): This distributes load perfectly but completely destroys per-entity ordering, as subsequent updates for the same customer will land on different partitions and can be processed out of order.
- A low-cardinality field like country_code: All events for a large country would be sent to a single partition, creating a massive hot spot (partition skew) and slowing down processing for that entire partition.
Mitigations for hot keys (keep per-entity order)
- Tenant-scoped keys: tenant_id || ':' || entity_pk preserves order per entity and spreads large tenants.
- Key salting (shards per entity-family): For families like "all events of a giant tenant," append a bounded salt: tenant_id || ':' || (hash(entity_pk) % N). Order is preserved within the salted sub-stream; readers that need global per-tenant order should avoid salting.
- Topic split by domain: Put ultra-hot tenants or tables on their own topic to isolate spikes.
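The two key shapes above can be sketched in a few lines. This is an illustration under stated assumptions: `stable_hash` uses MD5 as a deterministic stand-in for the murmur2 hash that Kafka's Java client applies to key bytes, so the partition numbers will not match a real cluster, and all function names here are hypothetical.

```python
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def tenant_scoped_key(tenant_id: str, entity_pk: str) -> str:
    """One key per entity: per-entity order, tenant spread over many keys."""
    return f"{tenant_id}:{entity_pk}"


def salted_key(tenant_id: str, entity_pk: str, n_salts: int) -> str:
    """Bounded salt: a tenant's entities collapse into at most n_salts keys."""
    return f"{tenant_id}:{stable_hash(entity_pk) % n_salts}"


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in for the producer's key -> partition mapping."""
    return stable_hash(key) % num_partitions


# The same entity always yields the same salted key, so its events stay ordered.
k1 = salted_key("acme", "order-42", n_salts=8)
k2 = salted_key("acme", "order-42", n_salts=8)
print(k1 == k2)  # True

# A giant tenant is spread over a bounded number of sub-streams.
distinct = {salted_key("acme", f"order-{i}", n_salts=8) for i in range(1000)}
print(len(distinct))  # at most 8
```

Because the salt is derived from the entity key rather than chosen at random, per-entity order survives; only ordering across different entities of the same tenant is given up.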
Partitioning Simulator: The Impact of Key Choice
The simulator reports four metrics for the chosen key:
- Max partition rate: events/s on the hottest partition.
- P95 partition rate: the load tail, used for capacity planning.
- Estimated lag: seconds to drain if max > capacity.
- Ordering guarantee: per-key only; global order is not guaranteed.
Tip: If one bar dominates, you have partition skew. Consider a better key (include a tenant ID or sharding suffix), or move hot tenants to their own topic/cluster.
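The effect of key choice on the hottest partition can be approximated offline. The sketch below is a rough stand-in for the simulator, not a reproduction of it: the traffic mix, the 48-partition topic, and the MD5-based `stable_hash` (used instead of Kafka's murmur2) are all assumptions, and event counts per partition are read as events/s for one simulated second.

```python
import hashlib
import math
import random
from collections import Counter

NUM_PARTITIONS = 48  # hypothetical topic size


def stable_hash(value: str) -> int:
    """Deterministic stand-in for the producer's key hash."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def partition_rates(keys, num_partitions=NUM_PARTITIONS):
    """Events landing on each partition for the given sequence of keys."""
    counts = Counter(stable_hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def summarize(rates):
    """Max and nearest-rank P95 of the per-partition rates."""
    ordered = sorted(rates)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {"max": ordered[-1], "p95": ordered[p95_index]}


# Synthetic traffic: one dominant country, 5000 evenly active customers.
rng = random.Random(7)
countries = ["US", "DE", "IN", "BR", "JP"]
weights = [0.6, 0.2, 0.1, 0.05, 0.05]
events = [
    {
        "event_id": f"evt-{i}",
        "customer_id": f"cust-{rng.randrange(5000)}",
        "country_code": rng.choices(countries, weights=weights)[0],
    }
    for i in range(20_000)
]

stats = {
    field: summarize(partition_rates([e[field] for e in events]))
    for field in ("customer_id", "country_code", "event_id")
}
for field, s in stats.items():
    print(f"{field:>12}: max={s['max']} p95={s['p95']}")
```

With this mix, keying by `country_code` should pile most traffic onto a handful of partitions, while `customer_id` and `event_id` spread it far more evenly. Only `customer_id` does so while keeping per-entity order.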
Making Sinks Bulletproof: Version Guards & Watermarks
Even with perfect partitioning, network retries or broker failures can cause events to arrive out of order. To protect against late-arriving events and guarantee correctness, your target table needs two key columns:
- id: The primary key of the entity.
- event_timestamp (or version number): A strictly increasing value from the source (a transaction ID or a precise timestamp) that indicates when the change occurred.
The consumer then uses a MERGE (or INSERT...ON CONFLICT) statement with a crucial guard condition (an AND condition on WHEN MATCHED, or a WHERE clause on ON CONFLICT DO UPDATE): only update the target row if the incoming event's timestamp is newer than the timestamp already stored.
Watermark: if your source timestamps can arrive late, advance a per-table watermark (“safe up to T-Δ”) before finalizing aggregates. Late events ≤ Δ update rows; older ones are quarantined or appended to a corrections table.
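One way to read the watermark rule is as a routing decision made per event. The sketch below assumes the watermark trails the largest event time seen so far by a fixed Δ; the class name `WatermarkTracker`, the 60-second Δ, and the sample timestamps are illustrative, not taken from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class WatermarkTracker:
    allowed_lateness_s: float             # Δ: how late an event may be
    max_event_ts: float = float("-inf")   # T: newest event time seen so far
    quarantined: list = field(default_factory=list)

    @property
    def watermark(self) -> float:
        """Everything older than T - Δ is treated as final."""
        return self.max_event_ts - self.allowed_lateness_s

    def route(self, event_ts: float) -> str:
        """Return 'apply' for on-time or tolerably late events, else 'quarantine'."""
        self.max_event_ts = max(self.max_event_ts, event_ts)
        if event_ts >= self.watermark:
            return "apply"
        self.quarantined.append(event_ts)
        return "quarantine"


tracker = WatermarkTracker(allowed_lateness_s=60)
decisions = [tracker.route(ts) for ts in (1000, 1100, 1050, 1030)]
print(decisions)  # ['apply', 'apply', 'apply', 'quarantine']
```

Events routed to "apply" still pass through the version guard in the sink; the watermark only decides whether an event is too old to touch rows that have already been finalized.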
Production-Ready Idempotent Sink Pattern (Postgres)
-- Assumes target table has a primary key `id` and a column `event_timestamp`.
-- Requires PostgreSQL 15+ (earlier versions have no MERGE).
-- Each MERGE is atomic, but two concurrent MERGEs inserting the same new id can
-- still collide on the primary key; retry on a unique violation, or use
-- INSERT ... ON CONFLICT DO UPDATE ... WHERE when writers can race.
MERGE INTO target_table T
USING (
    SELECT
        :entity_id AS id,
        :payload AS payload,
        :event_timestamp AS ts
) AS S
ON (T.id = S.id)
WHEN MATCHED AND T.event_timestamp < S.ts THEN
    -- Only update if the incoming event is newer
    UPDATE SET
        payload = S.payload,
        event_timestamp = S.ts
WHEN NOT MATCHED THEN
    -- If the record doesn't exist, create it
    INSERT (id, payload, event_timestamp)
    VALUES (S.id, S.payload, S.ts);
Tie-break equal timestamps (sequence) — BigQuery example
MERGE `dw.target` T
USING (SELECT @id AS id, @ts AS ts, @seq AS seq, @payload AS p) S
ON T.id = S.id
WHEN MATCHED AND (T.ts < S.ts OR (T.ts = S.ts AND T.seq < S.seq)) THEN
    UPDATE SET payload = S.p, ts = S.ts, seq = S.seq
WHEN NOT MATCHED THEN
    INSERT (id, ts, seq, payload) VALUES (S.id, S.ts, S.seq, S.p);
Version guards + watermarks make consumers idempotent and resilient to duplicates and late arrivals.
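The idempotence claim can be checked with an in-memory model of the guarded upsert. This is a sketch, with a plain dict standing in for the target table and a hypothetical `apply_event` helper mirroring the (ts, seq) comparison from the MERGE statements; it replays the same events in every possible order, including a duplicate.

```python
import itertools


def apply_event(table: dict, event: dict) -> bool:
    """Upsert guarded by (ts, seq); returns True only if the row changed."""
    current = table.get(event["id"])
    incoming_version = (event["ts"], event["seq"])
    if current is not None and (current["ts"], current["seq"]) >= incoming_version:
        return False  # stale or duplicate event: leave the row untouched
    table[event["id"]] = dict(event)
    return True


events = [
    {"id": "c1", "ts": 10, "seq": 0, "payload": "created"},
    {"id": "c1", "ts": 10, "seq": 1, "payload": "renamed"},
    {"id": "c1", "ts": 12, "seq": 0, "payload": "upgraded"},
]

# Replay every arrival order, with one event delivered twice.
final_states = set()
for order in itertools.permutations(events + [events[1]]):
    table = {}
    for e in order:
        apply_event(table, e)
    final_states.add(table["c1"]["payload"])

print(final_states)  # {'upgraded'}
```

Every ordering converges on the highest version, which is what makes rewinding and re-processing a consumer safe.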
Audit loops (detect & repair)
- Windowed row-counts: compare source vs sink counts over event-time windows (N-minute buckets).
- Windowed checksums: compute hash aggregates (SHA1 of concatenated key fields) per window and diff.
- Automatic rewind: on mismatch, rewind the consumer for the affected keys/windows and re-process (idempotent sinks make this safe).
- Quarantine queue: route irreconcilable records to a DLQ with enough context to replay or manually fix.
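The count and checksum comparison above might look like the following sketch. It assumes rows are available as dicts with `id` and `ts` (epoch seconds) on both sides; the 300-second window, the choice of `id|ts` as the hashed fields, and the function names are arbitrary choices made for illustration.

```python
import hashlib
from collections import defaultdict


def window_digest(rows, window_s: int):
    """Map window start -> (row count, SHA1 over the sorted key fields)."""
    buckets = defaultdict(list)
    for row in rows:
        window_start = row["ts"] - row["ts"] % window_s
        buckets[window_start].append(f"{row['id']}|{row['ts']}")
    return {
        start: (len(keys),
                hashlib.sha1("\n".join(sorted(keys)).encode("utf-8")).hexdigest())
        for start, keys in buckets.items()
    }


def mismatched_windows(source_rows, sink_rows, window_s: int = 300):
    """Window starts whose count or checksum differs between source and sink."""
    src = window_digest(source_rows, window_s)
    dst = window_digest(sink_rows, window_s)
    return sorted(w for w in src.keys() | dst.keys() if src.get(w) != dst.get(w))


source = [{"id": "a", "ts": 10}, {"id": "b", "ts": 20}, {"id": "c", "ts": 310}]
sink = [{"id": "a", "ts": 10}, {"id": "c", "ts": 310}]  # "b" never arrived
print(mismatched_windows(source, sink))  # [0]
```

The returned window starts are the candidates for an automatic rewind. Sorting the keys before hashing keeps the checksum independent of row order.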
Partitioning Knowledge Check
Test your understanding of partition key selection and ordering guarantees.
Why is choosing the right partition key critical in CDC?
The partition key determines which partition an event is written to. Since Kafka only guarantees ordering within a partition, using the entity's primary key (e.g., order_id, user_id) as the partition key ensures all changes for that entity are ordered correctly, preventing race conditions in downstream consumers.
What is partition skew, and why is it a problem?
Partition skew occurs when data is unevenly distributed across partitions, often due to poor key selection (e.g., using a tenant ID when one tenant dominates traffic). This creates hotspots—overloading some partitions/brokers while others sit idle—leading to reduced throughput, increased latency, and consumer lag.
What is a 'late-arriving' event in CDC?
Late-arriving events occur when network delays, retries, multi-partition reads, or replays cause an older event to arrive after a newer one. Without proper handling (timestamps, watermarks, or versioning), consumers might apply changes out of order, corrupting state (e.g., overwriting new data with old).
How can you mitigate partition skew caused by a single hot key?
If a single entity generates massive traffic (e.g., a celebrity user_id), you can use a composite key that adds entropy (user_id + session_id) or implement key salting (appending a hash suffix). This spreads load across partitions while maintaining ordering within the composite key or requiring consumers to reassemble order.
What is an audit loop in the context of CDC partitioning?
An audit loop (or reconciliation job) compares the source database with the downstream sink to detect discrepancies—missing events, late arrivals, or out-of-order updates. This provides a safety net, catching issues that might slip through the streaming pipeline due to partitioning challenges or failures.