Beginner

Design the Event Envelope

Learn how CDC tools package change events, why before/after images matter, and how delivery guarantees shape your downstream processing.

    What lives inside the envelope

    Change events are more than a JSON blob. Every field exists to keep data, ordering, and metadata aligned across services. Ground your consumers in the same vocabulary so investigations stay fast.

    {
      "op": "u",
      "ts_ms": 1712857612456,
      "source": {
        "db": "app",
        "table": "orders",
        "lsn": 368592330
      },
      "transaction": {
        "id": "4.25.901",
        "total_order": 442,
        "event_count": 2
      },
      "before": { "order_id": 42, "status": "processing" },
      "after": { "order_id": 42, "status": "shipped" },
      "key": { "order_id": 42 }
    }

    Essential envelope fields

    Field | Purpose | Questions it answers
    op / action | Signals whether the change was an insert, update, or delete. | Should I upsert or tombstone this record?
    ts_ms / source time | The source commit timestamp; pair it with the log position in source when timestamps tie. | Was this change applied before or after another event?
    Source position and transaction identifiers | Log sequence numbers, SCNs, or offsets for recovery, plus the transaction id that groups related events. | Where do I resume when reprocessing?
    Primary key | The stable identifier for idempotent upserts. | Which entity does this change belong to?
    Before/after payloads | Snapshots of column values on either side of the change. | How do I compute diffs or rollbacks?
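
The fields above are enough to decide what a consumer should do with an event. The following is a minimal sketch, assuming the envelope shape from the sample and Debezium-style op codes; `plan_write` and the action dictionary it returns are hypothetical names chosen for illustration, not part of any connector API.

```python
import json
from typing import Any

# Sample envelope copied from the example earlier in this section.
RAW_EVENT = """
{
  "op": "u",
  "ts_ms": 1712857612456,
  "source": {"db": "app", "table": "orders", "lsn": 368592330},
  "transaction": {"id": "4.25.901", "total_order": 442, "event_count": 2},
  "before": {"order_id": 42, "status": "processing"},
  "after": {"order_id": 42, "status": "shipped"},
  "key": {"order_id": 42}
}
"""


def plan_write(event: dict[str, Any]) -> dict[str, Any]:
    """Translate an envelope into a sink action (hypothetical shape)."""
    op = event["op"]
    # A sorted tuple makes the key hashable and stable for composite keys.
    key = tuple(sorted(event["key"].items()))
    position = event["source"]["lsn"]
    if op in ("c", "u", "r"):  # insert, update, snapshot read
        return {"action": "upsert", "key": key, "row": event["after"], "position": position}
    if op == "d":
        return {"action": "delete", "key": key, "row": None, "position": position}
    raise ValueError(f"unsupported op: {op!r}")


plan = plan_write(json.loads(RAW_EVENT))
print(plan["action"], plan["key"], plan["position"])
```

Carrying the log position alongside the action keeps the resume point next to the write it belongs to.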

    Why before/after images matter

    Many downstream jobs need both the new values and the previous state. Audit pipelines, SCD Type 2 tables, and cache invalidation logic all rely on the before image to see exactly what changed.

    • Deduplication. A hash of the primary key and before image lets you recognize a replayed event without querying the target system.
    • Reversible operations. Rollbacks and compensation workflows replay the before payload to undo a change.
    • Selective projections. Consumers can skip heavy fields in the after image while still verifying that critical columns changed.

    Treat optional images as a contract. If you turn them off for throughput, publish the policy and add guards to your materialization layer.
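
To make the diff and rollback uses concrete, here is a small sketch that works only from the two images. It assumes flat column dictionaries like the sample envelope and Debezium-style op codes; `changed_columns` and `invert` are illustrative helpers, not functions from any CDC tool.

```python
from typing import Any, Optional


def changed_columns(before: Optional[dict], after: Optional[dict]) -> dict[str, tuple]:
    """Return {column: (old, new)} for every column whose value differs."""
    before = before or {}
    after = after or {}
    diff = {}
    for column in sorted(before.keys() | after.keys()):
        # .get() treats a missing column like null, which is fine for a sketch.
        old, new = before.get(column), after.get(column)
        if old != new:
            diff[column] = (old, new)
    return diff


def invert(event: dict[str, Any]) -> dict[str, Any]:
    """Build a compensating event by swapping the before and after images."""
    op = event["op"]
    if op in ("u", "d") and event.get("before") is None:
        # Without the before image there is nothing to restore.
        raise ValueError("before image missing; cannot build a compensating event")
    inverse_op = {"c": "d", "d": "c", "u": "u"}[op]
    return {**event, "op": inverse_op, "before": event.get("after"), "after": event.get("before")}


update = {
    "op": "u",
    "before": {"order_id": 42, "status": "processing"},
    "after": {"order_id": 42, "status": "shipped"},
    "key": {"order_id": 42},
}
diff = changed_columns(update["before"], update["after"])
undo = invert(update)
print(diff)
print(undo["after"])
```

The `ValueError` is the kind of guard the paragraph above asks for: if images are turned off, the rollback path fails loudly instead of writing nulls.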

    Delivery guarantees set your expectations

    CDC stacks normally promise at-least-once (ALO) delivery; exactly-once semantics (EOS) is a design you layer on top with idempotency and transactional sinks.

    At-least-once

    • Replays or connector restarts may emit duplicate events.
    • Design consumers to be idempotent: upsert by key, avoid increment-only writes.
    • Lag metrics and DLQ hygiene keep ALO predictable.
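
The difference between an upsert and an increment shows up as soon as an event is delivered twice. This sketch uses plain dictionaries as stand-ins for a target table and a counter store; both helper functions are hypothetical.

```python
def apply_upsert(table: dict, event: dict) -> None:
    """Idempotent: applying the same event again leaves the table unchanged."""
    key = tuple(sorted(event["key"].items()))
    if event["op"] == "d":
        table.pop(key, None)
    else:
        table[key] = event["after"]


def apply_increment(counters: dict, event: dict) -> None:
    """Not idempotent: every redelivery inflates the count."""
    status = event["after"]["status"]
    counters[status] = counters.get(status, 0) + 1


shipped = {
    "op": "u",
    "key": {"order_id": 42},
    "after": {"order_id": 42, "status": "shipped"},
}

table: dict = {}
counters: dict = {}
for _ in range(2):  # simulate a duplicate delivery after a connector restart
    apply_upsert(table, shipped)
    apply_increment(counters, shipped)

print(table)     # one row, regardless of how many times the event arrived
print(counters)  # the counter has double-counted the same change
```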

    Exactly-once patterns

    • Persist offsets in the same transaction as your target writes.
    • Use sink-side dedupe keys (primary-key + log position).
    • Recover by rewinding to the last committed offset checkpoint.
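
One way to picture these patterns is a sink that stores the row and the consumer offset in a single database transaction. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table names, the `applied_lsn` column, and `apply_event` are assumptions made for illustration, and it handles upserts only.

```python
import sqlite3


def open_sink() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            status TEXT,
            applied_lsn INTEGER NOT NULL
        );
        CREATE TABLE offsets (
            partition_id INTEGER PRIMARY KEY,
            next_offset INTEGER NOT NULL
        );
        """
    )
    return conn


def apply_event(conn: sqlite3.Connection, partition_id: int, offset: int, event: dict) -> bool:
    """Write the row and the offset atomically. Returns False for a replay."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT next_offset FROM offsets WHERE partition_id = ?", (partition_id,)
        ).fetchone()
        if row is not None and offset < row[0]:
            return False  # already applied before the last checkpoint
        after = event["after"]
        # Dedupe key: primary key plus log position. Older positions never win.
        conn.execute(
            "INSERT INTO orders (order_id, status, applied_lsn) VALUES (?, ?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET "
            "status = excluded.status, applied_lsn = excluded.applied_lsn "
            "WHERE excluded.applied_lsn > orders.applied_lsn",
            (after["order_id"], after["status"], event["source"]["lsn"]),
        )
        conn.execute(
            "INSERT INTO offsets (partition_id, next_offset) VALUES (?, ?) "
            "ON CONFLICT(partition_id) DO UPDATE SET next_offset = excluded.next_offset",
            (partition_id, offset + 1),
        )
    return True


conn = open_sink()
event = {"op": "u", "source": {"lsn": 368592330}, "after": {"order_id": 42, "status": "shipped"}}
first = apply_event(conn, partition_id=0, offset=10, event=event)
replay = apply_event(conn, partition_id=0, offset=10, event=event)
resume_from = conn.execute(
    "SELECT next_offset FROM offsets WHERE partition_id = 0"
).fetchone()[0]
print(first, replay, resume_from)
```

After a crash, the consumer would read `next_offset` and rewind to it, so the target and the checkpoint can never disagree.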

    Ordering and compaction

    Per-key ordering is preserved within a partition, but ordering across partitions is not guaranteed. Design your envelope so downstream jobs can stitch history back together.

    1. Include a monotonically increasing per-key version or change number.
    2. Emit deletes as explicit tombstones (the key with a null value); log compaction depends on them to drop deleted keys.
    3. Make idempotent materialization the default so out-of-order updates resolve deterministically.
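
The three rules above can be sketched as a last-writer-wins materializer keyed on a per-key version. This assumes each event carries an integer version that only grows for a given key; `Slot` and `materialize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    version: int
    row: Optional[dict]  # None marks a tombstone


def materialize(state: dict, key: int, version: int, row: Optional[dict]) -> bool:
    """Apply a change only if it is newer than what the state already holds."""
    current = state.get(key)
    if current is not None and version <= current.version:
        return False  # stale or duplicate delivery
    # Tombstones are kept so a late update cannot resurrect a deleted key.
    state[key] = Slot(version, row)
    return True


# Deliveries arrive out of order and include one replay.
deliveries = [
    (42, 3, {"order_id": 42, "status": "shipped"}),
    (42, 2, {"order_id": 42, "status": "processing"}),
    (42, 4, None),  # delete
    (42, 3, {"order_id": 42, "status": "shipped"}),
]

state: dict = {}
applied = [materialize(state, key, version, row) for key, version, row in deliveries]
print(applied)
print(state[42])
```

Because the comparison depends only on the version, any delivery order of the same events converges on the same final state.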

    Schema contracts and validation

    • Version the payload schema and publish compatibility rules. Schema Registry, EventSchema, or protobuf descriptors make evolution visible.
    • Declare nullability explicitly. Optional before-images should use before: null, not a missing field, so consumers can detect configuration drift.
    • Emit consistent casing and data types. Normalize booleans and numbers so typed sinks (Snowflake, BigQuery) avoid coercion surprises.

    Build a contract test that fetches a production envelope, validates it against the schema, and asserts critical fields are non-null. Run it in CI before promoting connector changes.
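
A contract test of this kind can start very small. The sketch below checks required fields, basic types, and the `before: null` rule from the list above; the field set mirrors the sample envelope and `validate_envelope` is a hypothetical helper, not a schema-registry API.

```python
from typing import Any

REQUIRED_FIELDS: dict[str, type] = {"op": str, "ts_ms": int, "source": dict, "key": dict}
IMAGE_FIELDS = ("before", "after")


def validate_envelope(event: dict[str, Any]) -> list[str]:
    """Return a list of contract violations; an empty list means the event passes."""
    errors: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing required field: {name}")
        elif not isinstance(event[name], expected):
            errors.append(
                f"{name} should be {expected.__name__}, got {type(event[name]).__name__}"
            )
    for name in IMAGE_FIELDS:
        if name not in event:
            # A missing field hides configuration drift; null makes it explicit.
            errors.append(f"{name} is absent; emit {name}: null when the image is disabled")
        elif event[name] is not None and not isinstance(event[name], dict):
            errors.append(f"{name} should be an object or null")
    if event.get("op") in ("c", "u") and event.get("after") is None:
        errors.append("insert/update event has no after image")
    return errors


sample = {
    "op": "u",
    "ts_ms": 1712857612456,
    "source": {"db": "app", "table": "orders", "lsn": 368592330},
    "before": {"order_id": 42, "status": "processing"},
    "after": {"order_id": 42, "status": "shipped"},
    "key": {"order_id": 42},
}
drifted = {name: value for name, value in sample.items() if name != "before"}

print(validate_envelope(sample))
print(validate_envelope(drifted))
```

In CI, the same function could run against a captured production event and fail the build on any non-empty result.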

    Connector field crosswalk

    Map envelope terminology across platforms

    Concept | Debezium | Fivetran | Custom/Kafka Streams
    Operation | op (`c`, `u`, `d`, `r`) | op (`INSERT`, `UPDATE`, `DELETE`) | Explicit `action` field or topic naming
    Source position | source.lsn / source.ts_ms | source_lsn / source_commit | High-water mark stored beside offsets
    Transaction envelope | transaction.id, total_order | txn_id (when available) | Custom headers (`x-tx-id`)
    Before image | before | before | Previous state cached in state store
    Metadata | source.db, schema, table | source_table, source_schema | Headers + payload wrapper object

    Align on vocabulary in your runbooks. Analysts should not have to learn a different event shape per connector.
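
A shared vocabulary can be enforced with a small translation layer at the edge of each consumer. This sketch normalizes only the operation field, using the values listed in the crosswalk table as-is; they are not verified here against each product, and the canonical names and `normalize_op` are assumptions for illustration.

```python
# Operation codes as listed in the crosswalk table, mapped to one shared vocabulary.
OP_CROSSWALK = {
    "debezium": {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"},
    "fivetran": {"INSERT": "insert", "UPDATE": "update", "DELETE": "delete"},
}


def normalize_op(connector: str, raw_op: str) -> str:
    """Map a connector-specific operation code to the shared name."""
    try:
        return OP_CROSSWALK[connector][raw_op]
    except KeyError:
        # Fail loudly so an unknown code never slips through as a silent no-op.
        raise ValueError(f"no mapping for {connector!r} op {raw_op!r}") from None


print(normalize_op("debezium", "u"))
print(normalize_op("fivetran", "UPDATE"))
```

Keeping the mapping in one table means a runbook only ever needs to mention the shared names.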

    Implementation playbooks

    Use these checklists when reviewing connector configs or consumer code.

    Connector configuration review
    • Emit both before and after images for update operations.
    • Include metadata fields: transaction id, source host, schema.
    • Normalize column names and data types in the envelope schema.
    Consumer hardening
    • Guard against duplicate keys and out-of-order events.
    • Track the last processed offset per partition.
    • Alert on missing tombstones or schema drift.
    Schema migration rollout
    • Backfill new columns in the source before enabling them in CDC.
    • Verify downstream code tolerates nullable fields during rollout.
    • Advance schema version numbers atomically with deploys.

    Validate envelopes continuously

    1. Stream envelopes into a contract test that checks required fields and value ranges.
    2. Sample production events hourly and diff them against materialized tables.
    3. Alert if optional sections (before image, transaction id) disappear unexpectedly.

    These guardrails keep consumers from silently degrading when connector upgrades or config toggles change the envelope shape.
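
The third check can be approximated by comparing how often optional sections are missing in a sample against a known baseline. This is a rough sketch: the section names come from the sample envelope, while the baseline, the tolerance, and both function names are hypothetical.

```python
OPTIONAL_SECTIONS = ("before", "transaction")


def missing_rates(events: list[dict], sections: tuple = OPTIONAL_SECTIONS) -> dict[str, float]:
    """Fraction of sampled events where each optional section is absent or null."""
    if not events:
        return {section: 0.0 for section in sections}
    return {
        section: sum(1 for event in events if event.get(section) is None) / len(events)
        for section in sections
    }


def drift_alerts(events: list[dict], baseline: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """List the sections whose missing rate rose past the baseline plus tolerance."""
    rates = missing_rates(events)
    return [
        f"{section}: missing in {rate:.0%} of sampled events"
        for section, rate in rates.items()
        if rate > baseline.get(section, 0.0) + tolerance
    ]


sampled = [
    {"op": "u", "before": {"order_id": 1}, "transaction": {"id": "4.25.901"}},
    {"op": "u", "before": {"order_id": 2}, "transaction": {"id": "4.25.902"}},
    {"op": "u", "before": None, "transaction": {"id": "4.25.903"}},
    {"op": "u", "transaction": {"id": "4.25.904"}},
]
rates = missing_rates(sampled)
alerts = drift_alerts(sampled, baseline={"before": 0.0, "transaction": 0.0})
print(rates)
print(alerts)
```

A baseline matters because some nulls are legitimate: inserts, for example, never carry a before image.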

    Event envelope readiness scorecard

    Share this checklist before launch so producers, consumers, and platform teams align on envelope expectations and operating guardrails.

    Signals you are ready to ship

    Ready when… | If not, do this
    Versioned payload schema is published with required fields called out and nullability documented. | Draft a schema README, add it to source control, and require schema diff reviews for connector merges.
    Before/after images, primary keys, and tombstones are explicitly enabled in configuration. | Review connector settings with platform engineering and add automated config tests in CI.
    Downstream services have idempotent handlers and are storing envelope metadata they depend on. | Host a contract walkthrough, annotate payload fields, and run a replay drill with each critical consumer.
    Envelope validation job runs on a schedule with alerting for missing fields or unexpected types. | Add schema validation output to observability dashboards and alert on consecutive failures.

    Monitor drift and data contract guardrails

    Real-time monitors

    • Lag freshness checks for each topic partition and consumer group.
    • Contract-based anomaly detection on payload nullability and enum values.
    • Connector task health alerts tied to restart runbooks.

    Weekly governance loops

    • Review schema diff reports for new fields, renamed columns, or deleted metadata.
    • Spot check downstream warehouses for late-arrival reconciliation accuracy.
    • Capture breaking change proposals in an RFC log with owner sign-off.

    Tie monitors back to owners: every alert should have a responder rotation, a playbook link, and a communication channel to pause downstream consumers if contract drift is detected.

    Event Envelope Knowledge Check

    Test your understanding of CDC event structure and delivery guarantees.

    Q1

    What is the purpose of the 'before' and 'after' fields in a CDC event envelope?

    Q2

    What does 'at-least-once' (ALO) delivery guarantee mean in CDC?

    Q3

    What is a tombstone event in CDC?

    Q4

    Why is per-key ordering important in CDC streams?

    Q5

    What information does the 'source' metadata in a CDC event typically include?
