Advanced

Partitioning & Reconciliation

Design Kafka topics that preserve per-entity ordering, tame skew, and make late arrivals safe with watermarks and versioning.

Respect Ordering: Pick Partition Keys

The Golden Rule of Kafka Ordering

Kafka guarantees the order of messages within a single partition. It makes no ordering guarantees across different partitions. Therefore, to maintain the correct sequence of changes for any given entity (like a specific customer or a single product), all changes for that entity must be sent to the same partition. This is the fundamental purpose of a partition key.
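To make the rule concrete, here is a minimal sketch of key-based routing. It assumes a stable hash of the key modulo the partition count; CRC32 is used purely as a stand-in (Kafka's Java client hashes the key bytes with murmur2 by default), and the event names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash of the key bytes, modulo the partition count."""
    # Assumption: CRC32 stands in for the client's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical change events as (partition key, change) pairs, in source order.
events = [
    ("customer-42", "created"),
    ("customer-7", "created"),
    ("customer-42", "address_updated"),
    ("customer-42", "deleted"),
]

# Append each event to the log of the partition its key hashes to.
partition_logs = {}
for key, change in events:
    partition_logs.setdefault(partition_for(key, 6), []).append((key, change))

# Every change for customer-42 sits in one partition log, in source order.
customer_42_log = partition_logs[partition_for("customer-42", 6)]
customer_42_changes = [c for k, c in customer_42_log if k == "customer-42"]
```

Because the partition is a pure function of the key, the three customer-42 changes share one log and keep their relative order, regardless of where customer-7 lands.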

Partition sizing (rule-of-thumb)

Start with a target throughput per partition that your consumers can sustain (for example, 1–5k events/s).

Baseline: partitions ≈ ceil(total_events_per_s ÷ target_per_partition)
Include skew headroom: multiply by 1.5–2× if distribution is Zipf-y or you have hot tenants.
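Changing partitions later: adding partitions does not move existing records, but it changes the key-to-partition mapping (hash modulo partition count), so new events for a key can land on a different partition than its earlier events. Plan the key strategy and partition count up front, or use topic splits.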
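The sizing rule and the cost of changing the partition count later can be sketched as follows. The function names are illustrative, the input numbers (5,000 events/s total, 2,000 events/s per partition) are example values, and CRC32 again stands in for the real partitioner hash.

```python
import math
import zlib


def estimate_partitions(total_events_per_s: float,
                        target_per_partition: float,
                        skew_headroom: float = 1.0) -> int:
    """Baseline = ceil(total / target), then scaled by the skew headroom."""
    baseline = math.ceil(total_events_per_s / target_per_partition)
    return math.ceil(baseline * skew_headroom)


def partition_for(key: str, num_partitions: int) -> int:
    # Assumption: CRC32 stands in for the client's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


baseline = estimate_partitions(5000, 2000)            # ceil(2.5) = 3
with_headroom = estimate_partitions(5000, 2000, 2.0)  # 3 * 2 = 6

# Growing a topic from 6 to 8 partitions remaps many keys, so new events
# for a moved key no longer share a partition with its older events.
keys = [f"customer-{i}" for i in range(1000)]
moved = sum(partition_for(k, 6) != partition_for(k, 8) for k in keys)
```

With modulo-based placement, most keys change partition when the count changes, which is why the partition count is worth over-provisioning at creation time.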

How to Choose a Good Partition Key

The quality of your partition key directly impacts both the correctness and performance of your system.

Good Keys (Preserve Order & Distribute Load)

customer_id, order_id, product_id: The primary key of the business entity. This is the most common and effective choice for preserving per-entity order.
tenant_id + customer_id: In a multi-tenant system, a composite key spreads a single large tenant's load (the "noisy neighbor" problem) across multiple partitions, which keying on tenant_id alone would not, while still keeping each customer's events together.

Bad Keys (Lead to Incorrect Order or Skew)

event_id (a UUID): This distributes load perfectly but completely destroys per-entity ordering, as subsequent updates for the same customer will land on different partitions and can be processed out of order.
A low-cardinality field like country_code: All events for a large country would be sent to a single partition, creating a massive hot spot (partition skew) and slowing down processing for that entire partition.

Mitigations for hot keys (keep per-entity order)

Tenant-scoped keys: tenant_id || ':' || entity_pk preserves order per entity and spreads large tenants.
Key salting (shards per entity-family): For families like “all events of a giant tenant,” append a bounded salt: tenant_id || ':' || (hash(entity_pk) % N). Because the salt is derived from entity_pk, every event for a given entity lands in the same salted sub-stream, so per-entity order is preserved. Order across sub-streams is not, so readers that need global per-tenant order should avoid salting.
Topic split by domain: Put ultra-hot tenants or tables on their own topic to isolate spikes.
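The first two mitigations amount to small key-building functions. This is a sketch only: the helper names are hypothetical, and CRC32 is an assumed stand-in for whatever stable hash you use to derive the salt.

```python
import zlib


def tenant_scoped_key(tenant_id: str, entity_pk: str) -> str:
    """One key per entity: per-entity order, tenant load spread by entity."""
    return f"{tenant_id}:{entity_pk}"


def salted_key(tenant_id: str, entity_pk: str, shards: int) -> str:
    """One key per (tenant, salt): at most `shards` sub-streams per tenant."""
    # The salt must come from a stable hash of the entity key, so the same
    # entity always maps to the same sub-stream.
    salt = zlib.crc32(entity_pk.encode("utf-8")) % shards
    return f"{tenant_id}:{salt}"


# All 100 hypothetical orders of one tenant collapse into at most 4 keys.
sub_streams = {salted_key("big-tenant", f"order-{i}", 4) for i in range(100)}
```

The tenant-scoped key produces as many distinct keys as there are entities, while the salted key caps the number of sub-streams at N, which can matter for consumers that keep per-key state.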

Partitioning Simulator: The Impact of Key Choice

Controls

Partitions: 6
Distinct keys (entities): 1000
Events / sec (total): 5000
Key distribution: selectable (skew stresses single partitions)
If “Hot”, % of traffic to 1 key: 40%
Consumer capacity per partition (events/s): 2000
Shards per hot key (salting): 1

Outputs

Max partition rate: events/s on the hottest partition.
P95 partition rate: the load tail, used for capacity planning.
Estimated lag: seconds to drain if max > capacity.
Ordering guarantee: per-key. Global order is not guaranteed.

Partition load (events/s): bar chart with one bar per partition.

Tip: If one bar dominates, you have partition skew. Consider a better key (include a tenant ID or sharding suffix), or move hot tenants to their own topic/cluster.
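The simulator's core arithmetic can be approximated offline. The sketch below is not the simulator's actual code: it assumes one hot key carrying a fixed share of traffic, all other keys sharing the rest evenly, and CRC32 as a stand-in partitioner. It reports backlog growth per second rather than the simulator's lag estimate, whose exact formula is not given here.

```python
import zlib


def partition_rates(partitions=6, distinct_keys=1000, total_rate=5000.0,
                    hot_share=0.40, hot_shards=1):
    """Return events/s per partition for one hot key plus uniform cold keys."""
    rates = [0.0] * partitions

    def add(key, rate):
        # Assumption: CRC32 stands in for the client's real hash function.
        rates[zlib.crc32(key.encode("utf-8")) % partitions] += rate

    # The hot key's traffic is split evenly over its salted sub-keys.
    for shard in range(hot_shards):
        add(f"key-0#{shard}", total_rate * hot_share / hot_shards)

    # Every other key gets an equal slice of the remaining traffic.
    cold_rate = total_rate * (1 - hot_share) / (distinct_keys - 1)
    for i in range(1, distinct_keys):
        add(f"key-{i}", cold_rate)
    return rates


capacity = 2000.0                                # consumer events/s per partition
rates = partition_rates()
max_rate = max(rates)                            # hottest partition
backlog_growth = max(0.0, max_rate - capacity)   # events/s piling up, if any
```

With the defaults, the hot key alone delivers 2,000 events/s to one partition, so that partition exceeds consumer capacity as soon as any other key shares it. Raising hot_shards splits the hot key across sub-keys, at the cost of ordering across that key's shards.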

Making Sinks Bulletproof: Version Guards & Watermarks

Even with perfect partitioning, network retries or broker failures can cause events to arrive out of order. To protect against late-arriving events and guarantee correctness, your target table needs two key columns:

id: The primary key of the entity.
event_timestamp (or version number): A strictly increasing value from the source (a transaction ID or a precise timestamp) that indicates when the change occurred.

The consumer then uses a MERGE (or INSERT...ON CONFLICT) statement with a guard condition (WHEN MATCHED AND ... in MERGE, or a WHERE clause on ON CONFLICT DO UPDATE): only update the target row if the incoming event's timestamp is newer than the timestamp already stored.

Watermark: if events from your source can arrive late, advance a per-table watermark (“safe up to T-Δ”) before finalizing aggregates. Events that arrive late by at most Δ still update rows; older ones are quarantined or appended to a corrections table.
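One way to picture the watermark rule is a small router that tracks the newest event time seen and compares each event against T-Δ. This is a simplified sketch: taking T as the maximum event time seen so far is an assumption, timestamps are plain integer seconds, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Watermark:
    allowed_lateness_s: int   # the delta from the text, in seconds
    max_event_ts: int = 0     # newest event time seen so far (T)

    @property
    def safe_up_to(self) -> int:
        """Aggregates for event times before this point may be finalized."""
        return self.max_event_ts - self.allowed_lateness_s

    def route(self, event_ts: int) -> str:
        """Advance the watermark, then decide where the event goes."""
        self.max_event_ts = max(self.max_event_ts, event_ts)
        return "apply" if event_ts >= self.safe_up_to else "quarantine"


wm = Watermark(allowed_lateness_s=60)
# Event times 1050 and 1030 arrive after 1100 has already been seen.
decisions = [wm.route(ts) for ts in (1000, 1100, 1050, 1030)]
```

Here the event at 1050 is 50 seconds late and still applies, while the event at 1030 is 70 seconds late and is routed to quarantine.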

Production-Ready Idempotent Sink Pattern (Postgres)

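-- Assumes target table has a primary key `id` and a column `event_timestamp`.
-- Requires PostgreSQL 15+ (MERGE). The statement itself is atomic, but two
-- concurrent MERGEs inserting the same new id can raise a unique violation;
-- retry on that error, or use INSERT ... ON CONFLICT DO UPDATE ... WHERE.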

MERGE INTO target_table T
USING (
  SELECT
    :entity_id AS id,
    :payload AS payload,
    :event_timestamp AS ts
) AS S
ON (T.id = S.id)
WHEN MATCHED AND T.event_timestamp < S.ts THEN
  -- Only update if the incoming event is newer
  UPDATE SET
    payload = S.payload,
    event_timestamp = S.ts
WHEN NOT MATCHED THEN
  -- If the record doesn't exist, create it
  INSERT (id, payload, event_timestamp)
  VALUES (S.id, S.payload, S.ts);
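The INSERT...ON CONFLICT form of the same guard can be tried locally. The sketch below uses SQLite through Python's standard library as a stand-in for Postgres (an assumption made so the example runs anywhere; it needs SQLite 3.24 or newer), with integer timestamps and made-up payloads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_table ("
    "id TEXT PRIMARY KEY, payload TEXT, event_timestamp INTEGER)"
)

# The WHERE clause is the version guard: on conflict, the row is only
# overwritten when the incoming event is strictly newer.
UPSERT = """
INSERT INTO target_table (id, payload, event_timestamp)
VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
    payload = excluded.payload,
    event_timestamp = excluded.event_timestamp
WHERE excluded.event_timestamp > target_table.event_timestamp
"""


def apply_event(entity_id: str, payload: str, ts: int) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (entity_id, payload, ts))


# Out-of-order delivery (v2 after v3) plus a duplicate of v3.
for event in [("c1", "v1", 100), ("c1", "v3", 300),
              ("c1", "v2", 200), ("c1", "v3", 300)]:
    apply_event(*event)

row = conn.execute(
    "SELECT payload, event_timestamp FROM target_table WHERE id = 'c1'"
).fetchone()
```

After replaying the four events, the row holds the newest version ("v3" at 300): the late v2 and the duplicate v3 are both no-ops, which is what makes replays safe.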

Tie-break equal timestamps (sequence) — BigQuery example

MERGE `dw.target` T
USING (SELECT @id AS id, @ts AS ts, @seq AS seq, @payload AS p) S
ON T.id = S.id
WHEN MATCHED AND (T.ts < S.ts OR (T.ts = S.ts AND T.seq < S.seq)) THEN
  UPDATE SET payload = S.p, ts = S.ts, seq = S.seq
WHEN NOT MATCHED THEN
  INSERT (id, ts, seq, payload) VALUES (S.id, S.ts, S.seq, S.p);
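The tie-break predicate is a lexicographic comparison on (ts, seq), which the following in-memory sketch makes explicit. The dictionary standing in for the target table and the function name are illustrative only.

```python
state = {}  # id -> (ts, seq, payload); stands in for the target table


def apply(entity_id: str, ts: int, seq: int, payload: str) -> bool:
    """Apply the event if (ts, seq) is strictly newer; return True if applied."""
    stored = state.get(entity_id)
    # Tuple comparison: ts decides first, seq breaks ties on equal ts.
    if stored is None or (ts, seq) > stored[:2]:
        state[entity_id] = (ts, seq, payload)
        return True
    return False


results = [
    apply("o1", 100, 1, "a"),  # first event: inserted
    apply("o1", 100, 2, "b"),  # same ts, higher seq: applied
    apply("o1", 100, 1, "a"),  # duplicate of an older version: skipped
    apply("o1", 99, 9, "z"),   # older ts wins nothing, despite higher seq
]
```

Note that the sequence number only matters when timestamps are equal; an event with an older timestamp is rejected even if its sequence number is higher.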

Version guards + watermarks make consumers idempotent and resilient to duplicates and late arrivals.

Audit loops (detect & repair)

Windowed row-counts: compare source vs sink counts over event-time windows (N-minute buckets).
Windowed checksums: compute hash aggregates (SHA1 of concatenated key fields) per window and diff.
Automatic rewind: on mismatch, rewind the consumer for the affected keys/windows and re-process (idempotent sinks make this safe).
Quarantine queue: route irreconcilable records to a DLQ with enough context to replay or manually fix.
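The detection half of the loop (counts plus checksums per window) might look like the sketch below. It assumes rows are (key, event time in seconds) pairs and uses a 60-second window as an example size; the function names are hypothetical, and the rewind and quarantine steps are left out.

```python
import hashlib
from collections import defaultdict


def window_digests(rows, window_s=60):
    """Map window start -> (row count, SHA-1 over the sorted key fields)."""
    buckets = defaultdict(list)
    for key, event_ts in rows:
        window_start = event_ts // window_s * window_s
        buckets[window_start].append(f"{key}|{event_ts}")
    return {
        start: (len(items),
                hashlib.sha1("\n".join(sorted(items)).encode("utf-8")).hexdigest())
        for start, items in buckets.items()
    }


def mismatched_windows(source_rows, sink_rows, window_s=60):
    """Window starts whose count or checksum differs between source and sink."""
    src = window_digests(source_rows, window_s)
    snk = window_digests(sink_rows, window_s)
    return sorted(w for w in src.keys() | snk.keys() if src.get(w) != snk.get(w))


source = [("a", 5), ("b", 30), ("c", 65), ("d", 130)]
sink_missing_row = [("a", 5), ("b", 30), ("d", 130)]
to_rewind = mismatched_windows(source, sink_missing_row)
```

In this example only the window starting at 60 seconds is flagged, so a rewind can be limited to that window. Sorting before hashing makes the checksum independent of arrival order, and the checksum catches cases where counts match but contents differ.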

Partitioning Knowledge Check

Why is choosing the right partition key critical in CDC?

What is partition skew, and why is it a problem?

What is a 'late-arriving' event in CDC?

How can you mitigate partition skew caused by a single hot key?

What is an audit loop in the context of CDC partitioning?
