Acceptance Tests — Kafka + Debezium Lab
Run copy/paste scripts on your machine to prove the lab stack is healthy: containers up, connector RUNNING, topics streaming events, and restarts staying idempotent.
What You’ll Run
- scripts/test_stack.sh: Zookeeper, Broker, Connect, PG are running & Connect REST is reachable
- scripts/test_connector.sh: Connector exists and is RUNNING
- scripts/test_events.sh: Topic has data; consumed messages parse as JSON and include op ∈ {c,u,d}
- scripts/test_chaos_smoke.sh (optional): Latest offsets increase after a controlled connector restart
Requirements: docker, docker compose, curl, jq. Windows users: run under WSL or Git Bash.
Run Them (Mac/Linux + WSL)
chmod +x scripts/*.sh
bash scripts/test_stack.sh
bash scripts/test_connector.sh
bash scripts/test_events.sh
# optional:
bash scripts/test_chaos_smoke.sh
Expect green checks (✅). Any ❌ includes the failing step so you know what to fix.
Windows (PowerShell via WSL)
- Open “Ubuntu (WSL)”
- cd into the lab folder you mounted (/mnt/c/Users/you/lab)
- Run the same bash commands as above
Native PowerShell can work if you have Git Bash; otherwise use WSL.
Manual Data Checks & Ops Tests
Quick sanity checks that aren’t scripted but catch the common footguns before you ship.
Duplicate primary keys (generic SQL)
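-- replace target_table and pk with your table and primary-key column
-- (alias is total_rows because ROWS is a reserved word in some databases)
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT pk) AS distinct_keys
FROM target_table; -- expect total_rows = distinct_keys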
Latest-wins check (history vs target)
-- adapt names: business key + timestamp/version
WITH last AS (
SELECT key_col, MAX(op_ts) AS last_ts
FROM history_table
GROUP BY key_col
)
SELECT COUNT(*) AS stale
FROM target_table t
JOIN last l ON t.key_col = l.key_col
WHERE t.op_ts < l.last_ts; -- expect 0
Dead-letter queue sanity (Kafka)
# adjust topic; e.g. "dlq.inventory"
kafka-console-consumer --bootstrap-server localhost:9092 \
--topic dlq.inventory --from-beginning --timeout-ms 4000 \
--max-messages 10 | jq -C .
Connector status (Kafka Connect REST)
curl -s http://localhost:8083/connectors | jq
curl -s http://localhost:8083/connectors/<name>/status | jq
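To turn the status call into a pass/fail check, a minimal sketch along these lines may help. It assumes the usual Connect status document shape (a connector.state field plus a tasks array with per-task state); the function name connector_is_running is hypothetical and is not taken from the lab scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a Kafka Connect status document on stdin.
# Exits 0 only if the connector is RUNNING and at least one task is RUNNING.
connector_is_running() {
  jq -e '.connector.state == "RUNNING"
         and any(.tasks[]?; .state == "RUNNING")' > /dev/null
}

# Usage sketch (replace <name> with your connector name):
#   curl -s "http://localhost:8083/connectors/<name>/status" \
#     | connector_is_running && echo "✅ connector RUNNING"
```

An error body such as a 404 response has no connector.state, so the check fails rather than passing by accident.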
Consumer lag (sink group)
kafka-consumer-groups --bootstrap-server localhost:9092 \
--describe --group <your-sink-group>
Topic offsets (sum across partitions)
# inside the kafka container (from the host, use the EXTERNAL listener at localhost:9092 instead)
docker exec -it kafka kafka-run-class kafka.tools.GetOffsetShell \
--broker-list kafka:29092 --topic <topic> --time -1
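The command above prints one topic:partition:offset line per partition; it does not add them up. A small sketch for the summing step, assuming that output format (sum_offsets is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: sums the offset field of "topic:partition:offset"
# lines read from stdin and prints the total (0 for empty input).
sum_offsets() {
  awk -F: 'NF >= 3 { total += $3 } END { printf "%.0f\n", total }'
}

# Usage sketch (drop -t from docker exec so the output pipes cleanly):
#   docker exec kafka kafka-run-class kafka.tools.GetOffsetShell \
#     --broker-list kafka:29092 --topic <topic> --time -1 | sum_offsets
```

Capturing the total before and after a connector restart gives a single number to compare.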
Tip: run these after a controlled connector restart to confirm idempotency — no dupes, no stale rows, offsets advancing.
Customizing (if your names differ)
Override env vars inline:
CONNECT_URL=http://connect:8083 \
CONNECTOR_NAME=my-connector \
TOPIC=my.db.my_table \
bash scripts/test_events.sh
Defaults are CONNECT_URL=http://localhost:8083, CONNECTOR_NAME=inventory-connector, TOPIC=server1.public.app_customer.
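Inline overrides like this typically work because a script falls back to a default only when the variable is unset or empty. The lab scripts themselves are not shown here, so the following is only a sketch of that pattern using the documented defaults; status_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Keep the caller's value if set and non-empty; otherwise use the documented default.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"
CONNECTOR_NAME="${CONNECTOR_NAME:-inventory-connector}"
TOPIC="${TOPIC:-server1.public.app_customer}"

# Hypothetical helper: build the connector status URL from the resolved settings.
status_url() {
  printf '%s/connectors/%s/status\n' "$CONNECT_URL" "$CONNECTOR_NAME"
}
```

With nothing overridden, status_url prints http://localhost:8083/connectors/inventory-connector/status.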
What “Pass” Means
- Stack: All containers are up and Connect’s REST API responds
- Connector: The named source connector is RUNNING with at least one RUNNING task
- Events: The Debezium topic has data and you can parse JSON records with op=c/u/d
- Chaos Smoke: After a connector restart, the topic’s latest offset increases (someone is producing / the source didn’t wedge)
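The Events criterion can be sketched as a jq filter over consumed records. This assumes one JSON record per line and that the Debezium envelope is either top-level or wrapped in a payload field (the latter when the JSON converter includes schemas); all_ops_valid is a hypothetical name, not the lab's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one JSON record per line on stdin.
# Exits 0 only if there is at least one record and every record has op in {c,u,d}.
# Note: tombstones (null values) and snapshot reads (op=r) fail this check as written.
all_ops_valid() {
  jq -e -s 'def op: (.payload // .).op;
            length > 0 and all(.[]; op == "c" or op == "u" or op == "d")' > /dev/null
}

# Usage sketch, with TOPIC set as in "Customizing":
#   kafka-console-consumer --bootstrap-server localhost:9092 \
#     --topic "$TOPIC" --from-beginning --timeout-ms 4000 \
#     --max-messages 10 | all_ops_valid && echo "✅ events OK"
```

Lines that are not valid JSON make jq exit non-zero, so a parse failure also counts as a failed check.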
Troubleshooting Quickies
- No messages consumed: Generate a change in app.customer (insert/update/delete), then re-run test_events.sh
- Connector not found: Register it (see lab step “register the connector”), or set CONNECTOR_NAME
- Timeouts: Ensure Docker is running; docker compose ps should show 4 services up
- Permission denied: Run chmod +x scripts/*.sh once