Multi-Tenancy: Cost vs Isolation
Model isolation levels, topic strategies, and egress math for shared CDC platforms. See how per-tenant choices affect blast radius, spend, and operations.
Primer: Why multi-tenancy matters for CDC
Sharing infrastructure saves money but couples tenants together. More isolation (per-tenant topics or clusters) reduces blast radius and gives clearer SLAs, but increases cost and operational load.
- Isolation ladder: shared topics → per-tenant topics → per-tenant clusters.
- Egress (external) ≈ tenants × change_rate × (payload + envelope + overhead). Broker egress adds the replication factor: intra_broker ≈ external × (RF − 1).
- Storage at retention ≈ egress_bytes_per_s × RF × (86400 × days) ÷ compression.
- Operational heat: total partitions = topics × partitions_per_topic (impacts rebalances, ISR, and quotas).
- RBAC and retention are easier to manage with per-tenant topics; cluster isolation suits regulators and VIPs.
- Guardrails: per-tenant produce/consume quotas, DLQ policy, retention/compaction per topic, and ACLs scoped by tenant namespace.
Rule-of-thumb cutovers
- ↑ tenants & low compliance ⇒ shared topics until ops pain shows.
- ↑ compliance / RBAC / per-tenant SLAs ⇒ per-tenant topics.
- Regulatory isolation / VIPs / heavy tenants ⇒ per-tenant clusters.
Use the controls to explore topic counts, consumer groups, egress, and agent footprint.
1. Adjust Your Scenario
2. See the Impact
Total Topics
Shared: fixed; per-tenant: tenants × topics_per_tenant.
Consumer Groups
≈ tenants × GROUPS_PER_TENANT.
Egress (MB/s)
Decimal MB (bytes ÷ 1e6).
Broker Egress (MB/s)
≈ external × (RF − 1).
Egress (GB/month)
60×60×24×30 seconds.
Total Partitions
topics × partitions_per_topic.
Storage @ Retention (TB)
egress (bytes/s) × RF × 86400 × days ÷ compression.
Connector Footprint
Per-cluster ≈ 1:1; others scale with tenants.
This index is a first-order illustration: egress (MB/s) + connector footprint (+ topic management). Tune constants to your platform.
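The unit conventions listed above (decimal MB, a 30-day month, broker egress as external × (RF − 1)) can be sketched in Python. The function names and the 2 MB/s sample rate are illustrative assumptions, not values from the calculator.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 30-day month, as stated above


def to_mb_per_s(bytes_per_s: float) -> float:
    """Decimal megabytes per second (bytes / 1e6)."""
    return bytes_per_s / 1e6


def to_gb_per_month(bytes_per_s: float) -> float:
    """Decimal gigabytes transferred over a 30-day month."""
    return bytes_per_s * SECONDS_PER_MONTH / 1e9


def broker_egress(bytes_per_s: float, rf: int) -> float:
    """Replication traffic between brokers: external x (RF - 1)."""
    return bytes_per_s * (rf - 1)


external = 2_000_000.0  # hypothetical 2 MB/s of external egress
print(to_mb_per_s(external))                        # 2.0
print(to_gb_per_month(external))                    # 5184.0
print(to_mb_per_s(broker_egress(external, rf=3)))   # 4.0
```

Even a modest 2 MB/s stream works out to roughly 5.2 TB of external egress per month, which is why the monthly figure is usually the one that matters for billing.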
How we estimate
- egress_bytes_per_s = tenants × change_rate × (payload + envelope + overhead)
- broker_egress_bytes_per_s ≈ egress_bytes_per_s × (RF − 1)
- consumer_groups ≈ tenants × GROUPS_PER_TENANT
- Topics:
  - Shared: shared_topics
  - Per-tenant topics: tenants × topics_per_tenant
  - Per-tenant clusters: total topics = tenants × topics_per_tenant
- total_partitions = topics × partitions_per_topic
- storage_bytes_at_retention ≈ (egress_bytes_per_s × RF × 86400 × days) ÷ compression
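As a rough illustration, the formulas above can be combined into one small estimator. This is a minimal sketch: the `Scenario` class, the `estimate` function, the mode strings, and every sample number are assumptions chosen for the example, not the calculator's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    tenants: int
    change_rate: float          # changes per second, per tenant
    payload: int                # bytes per change
    envelope: int               # bytes of envelope/headers per change
    overhead: int               # bytes of per-message overhead
    rf: int                     # replication factor
    days: float                 # retention in days
    compression: float          # compression ratio (1.0 = none)
    topics_per_tenant: int
    partitions_per_topic: int
    shared_topics: int          # fixed topic count in the shared model
    groups_per_tenant: int = 2  # GROUPS_PER_TENANT default from the text


def estimate(s: Scenario, mode: str) -> dict:
    """mode is 'shared', 'per_tenant_topics', or 'per_tenant_clusters'."""
    egress = s.tenants * s.change_rate * (s.payload + s.envelope + s.overhead)
    if mode == "shared":
        topics = s.shared_topics
    elif mode in ("per_tenant_topics", "per_tenant_clusters"):
        topics = s.tenants * s.topics_per_tenant
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {
        "egress_bytes_per_s": egress,
        "broker_egress_bytes_per_s": egress * (s.rf - 1),
        "consumer_groups": s.tenants * s.groups_per_tenant,
        "topics": topics,
        "total_partitions": topics * s.partitions_per_topic,
        "storage_bytes_at_retention": egress * s.rf * 86400 * s.days / s.compression,
    }


# Hypothetical scenario: 100 tenants, 50 changes/s each, 1350 bytes per message.
example = Scenario(tenants=100, change_rate=50, payload=1000, envelope=300,
                   overhead=50, rf=3, days=7, compression=2.0,
                   topics_per_tenant=5, partitions_per_topic=6, shared_topics=10)
result = estimate(example, "per_tenant_topics")
print(result["egress_bytes_per_s"] / 1e6)            # 6.75 (MB/s)
print(result["total_partitions"])                    # 3000
print(result["storage_bytes_at_retention"] / 1e12)   # 6.1236 (TB)
```

Switching `mode` to `"shared"` leaves egress and storage unchanged and only shrinks the topic and partition counts, which matches the point that isolation mostly changes operational load rather than data volume.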
Assumptions & levers
- Consumer groups factor GROUPS_PER_TENANT (default 2).
- Connector footprint: shared/per-topic ≈ tenants × 0.2; per-cluster ≈ 1:1.
- Envelope defaults for JSON/Avro headers (200–400B typical) + small per-message overhead.
- If topics are compacted, long-term storage is dominated by the latest value per key; adjust the retention math accordingly.
- Per-tenant quotas should scale with change_rate and protect shared clusters from spikes.
- RBAC/retention overhead grows with topic count, not egress.
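Two of these levers lend themselves to a quick sketch: the connector footprint constants and a produce quota that scales with change_rate. The function names and the `headroom` burst multiplier are hypothetical; only the 0.2 and 1:1 factors come from the text. The resulting byte rate is the kind of value one might feed into a Kafka `producer_byte_rate` quota.

```python
import math

CONNECTORS_PER_TENANT_SHARED = 0.2  # from the text: shared/per-topic ~ tenants x 0.2


def connector_footprint(tenants: int, mode: str) -> float:
    """Approximate connector count; per-cluster isolation is ~1 per tenant."""
    if mode == "per_tenant_clusters":
        return float(tenants)
    return tenants * CONNECTORS_PER_TENANT_SHARED


def produce_quota_bytes_per_s(change_rate: float, message_bytes: int,
                              headroom: float = 2.0) -> int:
    """Per-tenant produce quota that scales with change_rate.

    headroom is a hypothetical burst multiplier, not a value from the text.
    """
    return math.ceil(change_rate * message_bytes * headroom)


print(connector_footprint(100, "shared"))
print(connector_footprint(100, "per_tenant_clusters"))
print(produce_quota_bytes_per_s(change_rate=50, message_bytes=1350))
```

A headroom above 1.0 lets a tenant absorb short bursts without throttling, while still capping how far a runaway workload can push a shared cluster.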
Decision helper (rule-of-thumb thresholds)
- Shared topics until: topics ≤ ~50, low compliance, dup-tolerant consumers.
- Per-tenant topics when: per-tenant RBAC/retention, > ~50 topics total, noisy neighbors.
- Per-tenant clusters when: regulated isolation, VIPs, or tenant egress ≥ ~10% of fleet.
Heuristics; validate with SRE/Compliance.
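The thresholds above can be read as a small decision function, checked from the strongest isolation level down. This is only a sketch of the heuristics as written: the function name and parameters are assumptions, and in practice a cluster recommendation triggered by one heavy tenant may apply to that tenant alone rather than the whole fleet.

```python
def recommend_isolation(total_topics: int,
                        regulated: bool,
                        vip: bool,
                        max_tenant_egress_share: float,
                        needs_per_tenant_rbac: bool,
                        noisy_neighbors: bool) -> str:
    """Map the rule-of-thumb thresholds to an isolation level."""
    # Strongest isolation first: regulation, VIPs, or a tenant at >= ~10% of fleet egress.
    if regulated or vip or max_tenant_egress_share >= 0.10:
        return "per-tenant clusters"
    # Per-tenant RBAC/retention, noisy neighbors, or more than ~50 topics.
    if needs_per_tenant_rbac or noisy_neighbors or total_topics > 50:
        return "per-tenant topics"
    # Otherwise stay on shared topics.
    return "shared topics"


print(recommend_isolation(20, False, False, 0.02, False, False))   # shared topics
print(recommend_isolation(120, False, False, 0.02, True, False))   # per-tenant topics
print(recommend_isolation(120, True, False, 0.02, True, False))    # per-tenant clusters
```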
FAQ — multi-tenant CDC
Do per-tenant topics improve security?
Yes—RBAC is simpler and audits map to tenants, but cost/metadata overhead rises. If regulators require isolation, consider cluster boundaries.
Will per-tenant clusters always cost more?
Typically yes (1:1 footprint), but noisy tenants stop taxing others and maintenance windows become tenant-scoped.
Should I use compaction or time retention for per-tenant topics?
Compaction keeps the latest value per key (great for upsert sinks); time retention keeps history (needed for replays and backfills). Many teams combine the two, for example with Kafka's `cleanup.policy=compact,delete` and a short retention window.
How do I fence noisy tenants?
Apply per-tenant quotas (produce/consume), isolate DLQs per tenant, and consider per-tenant topics or clusters when a tenant’s egress approaches ~10% of fleet.
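To spot tenants approaching the ~10% mark, one option is to compute each tenant's share of fleet egress from whatever per-tenant byte-rate metrics are available. The function name, the input shape, and the sample figures below are illustrative assumptions.

```python
from typing import Dict, List


def noisy_tenants(egress_by_tenant: Dict[str, float],
                  threshold: float = 0.10) -> List[str]:
    """Return tenants whose share of total egress is at or above threshold."""
    total = sum(egress_by_tenant.values())
    if total <= 0:
        return []
    return sorted(tenant for tenant, rate in egress_by_tenant.items()
                  if rate / total >= threshold)


# Hypothetical per-tenant egress in bytes per second.
rates = {"tenant-a": 500.0, "tenant-b": 40.0, "tenant-c": 60.0, "tenant-d": 400.0}
print(noisy_tenants(rates))  # ['tenant-a', 'tenant-d']
```

Tenants on this list are candidates for tighter quotas first, and for dedicated topics or clusters if the share stays high.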
Multi-Tenancy Knowledge Check
Test your understanding of tenant isolation strategies and operational trade-offs.
What is the key trade-off when choosing between shared and dedicated topics per tenant in a CDC platform?
Shared topics (with tenant IDs in the key/payload) reduce the number of topics to manage, simplifying operations. However, a misbehaving tenant can affect others. Dedicated topics per tenant provide strong isolation—failures are contained—but multiply operational overhead (monitoring, configuration, partition management).
What does 'blast radius' mean in the context of multi-tenant CDC systems?
Blast radius refers to how many tenants are impacted by a single failure or resource contention. In shared infrastructure (shared topics, shared connectors), one tenant's spike or error can degrade service for all. Isolation strategies (dedicated topics, quotas, separate clusters) reduce blast radius.
Why might you use topic prefixes or namespaces for multi-tenant CDC?
Topic prefixes/namespaces (e.g., tenant-123.orders, tenant-456.orders) help organize topics by tenant, making it easier to apply ACLs, monitor per-tenant metrics, identify ownership, and automate operations. This organizational structure supports scalable multi-tenancy without mixing tenant data.
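A naming scheme like `tenant-123.orders` is easy to enforce with a couple of helpers. The sketch below assumes a `.` separator and tenant IDs that never contain one; the helper names are hypothetical. Keeping the trailing separator in the prefix matters when it is used for prefix-scoped ACLs (such as Kafka's prefixed resource patterns), so that `tenant-12.` cannot match `tenant-123.orders`.

```python
SEPARATOR = "."  # assumed separator, matching the tenant-123.orders examples


def tenant_prefix(tenant_id: str) -> str:
    """Prefix to scope ACLs and metrics; includes the trailing separator."""
    if not tenant_id or SEPARATOR in tenant_id:
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    return tenant_id + SEPARATOR


def tenant_topic(tenant_id: str, stream: str) -> str:
    """Build a tenant-scoped topic name such as tenant-123.orders."""
    if not stream:
        raise ValueError("stream must not be empty")
    return tenant_prefix(tenant_id) + stream


def tenant_of(topic: str) -> str:
    """Recover the owning tenant from a topic name."""
    tenant_id, sep, rest = topic.partition(SEPARATOR)
    if not tenant_id or not sep or not rest:
        raise ValueError(f"topic is not tenant-scoped: {topic!r}")
    return tenant_id


print(tenant_topic("tenant-123", "orders"))  # tenant-123.orders
print(tenant_of("tenant-456.orders"))        # tenant-456
```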
What is a common challenge when estimating egress costs in a multi-tenant CDC platform?
In multi-tenant CDC, tenants have different workloads: some produce high-volume changes, others low. They may stream to different clouds or regions, affecting egress rates. Accurately estimating per-tenant egress costs is crucial for billing, capacity planning, and avoiding surprises, especially when crossing cloud boundaries.
What role do Kafka quotas play in multi-tenant CDC platforms?
Kafka supports producer and consumer quotas (throughput and request rate limits) per client ID or user. In multi-tenant CDC, quotas prevent a single tenant from consuming all broker resources (network, disk I/O, CPU), ensuring fair resource allocation and protecting the platform from abuse or runaway workloads.