SYSTEM DESIGN #02 · INTERVIEW GUIDE
Ad Impressions & Clicks Ingestion Pipeline
Ad platforms bill on every impression and click, so each lost or double-counted event is a billing error. At 500,000 events per second across web pixels, mobile SDKs, and server-to-server APIs, the ingestion layer determines budget accuracy. This guide covers every layer of that pipeline: edge collection on Cloudflare Workers that responds in under 50ms, Kafka with 512 partitions absorbing traffic bursts, Flink jobs that enrich events with campaign metadata in real time, and a fraud filter that removes bot traffic before it reaches the OLAP store. By the end you should be able to explain how to build a durable ingestion system that scales horizontally through peak events such as Black Friday.
Kafka · Apache Flink · S3 Iceberg · Cloudflare Workers · LightGBM
💡
The Gist — What Problem Are We Solving?
Recording every single ad view and click at massive scale
Every time an ad is shown or clicked anywhere in the world, this pipeline captures that event within milliseconds, checks that it is real (not a bot), deduplicates it, joins clicks to their impressions, and stores it durably. At the 500K events/sec peak rate, that is about 43 billion events per day.
💬 Think of it as a high-speed conveyor belt that catches every ad event, inspects it, removes fakes, and files each one in the right place.
Functional Requirements
These are the capabilities the system must deliver: what users and operators can actually do with it.
📥 Event Collection
Ingest impressions and clicks from the web SDK, mobile SDK, and server-to-server API
🔁 Deduplication
Deduplicate events within a 24-hour window; processing is idempotent
🔗 Click-Impression Join
Join each click back to its originating impression within a 30-minute window
🛡️ Fraud Detection
Filter bot traffic: IP blocklist, user-agent check, click-rate anomaly detection
💾 Storage
S3/Iceberg for raw events; ClickHouse for aggregated reporting
⚡
Non-Functional Requirements
These define how well the system must perform — the quality attributes that separate a toy from a production system.
⚡ Peak Throughput
500K events/sec; handles 10× bursts without data loss
⏱️ End-to-End Latency
<60s from event fire to queryable in ClickHouse
🛡️ Durability
99.999%; Kafka with RF=3; exactly-once Flink checkpoints to S3
📅 Late Event Grace
Accept events up to 24h late from offline mobile SDKs
📐 Schema
Avro + Schema Registry; backward-compatible evolution only
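As a quick sanity check on the throughput targets above, the arithmetic can be sketched in a few lines of Python. The rates come from the requirements; the variable names are illustrative only.

```python
# Back-of-envelope volume figures derived from the stated targets.
PEAK_EVENTS_PER_SEC = 500_000   # stated peak throughput
BURST_MULTIPLIER = 10           # stated burst tolerance
SECONDS_PER_DAY = 24 * 60 * 60

# Upper bound: daily volume if the peak rate were sustained all day.
events_per_day_at_peak = PEAK_EVENTS_PER_SEC * SECONDS_PER_DAY
# Rate the pipeline must absorb during a burst without dropping events.
burst_events_per_sec = PEAK_EVENTS_PER_SEC * BURST_MULTIPLIER

print(f"{events_per_day_at_peak:,} events/day if peak were sustained")
print(f"{burst_events_per_sec:,} events/sec during a 10x burst")
```

Real traffic is rarely flat at peak, so the daily figure is a ceiling rather than a forecast.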
📊
Key Metrics — The Numbers That Define This System
The headline numbers to know cold — and be ready to explain how each one is achieved.
🏗️
System Architecture Diagram
Full data flow from source to serving. Each layer scales independently.
Ingestion: SDKs / S2S → Edge Collector (Cloudflare Workers) → Kafka → Flink (dedup + join + fraud)
↓
Processing and storage: S3 Iceberg (raw) + ClickHouse
🗺️
End-to-End User Journey
Trace a single request end-to-end — the story interviewers want you to tell fluently.
1. Ad renders in browser: the JS SDK fires a beacon POST; the edge collector responds in <50ms and applies Bloom-filter dedup
2. Edge validates and forwards: rate-limits per IP, then forwards to a Kafka partition keyed by user_id_hash
3. Flink processes event: enriches with campaign metadata, joins clicks to impressions within the 30-minute window (driven by watermarks), and applies a fraud score
4. Fraud filter: events with fraud_score > 0.8 are routed to a quarantine topic, those above 0.95 are dropped, and legitimate events go to the main topic
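The per-IP rate limit in the journey above could be implemented in several ways; the source does not say which. The sketch below assumes a token bucket, with made-up capacity and refill values, and keeps state in a local dict purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    tokens: float    # tokens currently available
    last_ts: float   # time of the last refill, in seconds


class IpRateLimiter:
    """Token-bucket limiter keyed by client IP (illustrative parameters)."""

    def __init__(self, capacity=100.0, refill_per_sec=50.0):
        self.capacity = capacity              # maximum burst size, in events
        self.refill_per_sec = refill_per_sec  # sustained allowed rate
        self.buckets = {}

    def allow(self, ip, now):
        bucket = self.buckets.get(ip)
        if bucket is None:
            # A new IP starts with a full bucket.
            bucket = TokenBucket(tokens=self.capacity, last_ts=now)
            self.buckets[ip] = bucket
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.last_ts)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.last_ts = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A token bucket tolerates short bursts (a page with many ad slots) while capping the sustained rate, which is usually what a click-flood check needs.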
🔭
High-Level Design — Component Breakdown
Core components — each with a single, well-defined responsibility. The key architectural insight: each layer scales independently, and failure in one component is isolated from the rest.
1 — SDKs/S2S
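Collects raw events from browser, iOS, and Android surfaces, plus server-to-server (S2S) calls from advertiser backends. The SDKs batch events every 500ms or 50 events, whichever comes first, and in browsers send them via sendBeacon() for unload-safe delivery. Each event carries a client-generated ULID as its event ID for deduplication.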
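The "500ms or 50 events" batching rule can be sketched as follows. This is a language-neutral illustration written in Python rather than the SDK's own code; the class and method names are hypothetical, and the caller supplies timestamps so the logic stays deterministic.

```python
class EventBatcher:
    """Flushes when 50 events are buffered or 500 ms have passed since the first one."""

    def __init__(self, max_events=50, max_age_ms=500):
        self.max_events = max_events
        self.max_age_ms = max_age_ms
        self.buffer = []
        self.first_ts_ms = None  # arrival time of the oldest buffered event

    def add(self, event, now_ms):
        """Buffer one event; return a batch to send if a flush condition is met."""
        if not self.buffer:
            self.first_ts_ms = now_ms
        self.buffer.append(event)
        return self._maybe_flush(now_ms)

    def tick(self, now_ms):
        """Called from a timer so a quiet buffer still flushes once it ages out."""
        return self._maybe_flush(now_ms)

    def _maybe_flush(self, now_ms):
        if not self.buffer:
            return None
        full = len(self.buffer) >= self.max_events
        aged = now_ms - self.first_ts_ms >= self.max_age_ms
        if full or aged:
            batch, self.buffer, self.first_ts_ms = self.buffer, [], None
            return batch
        return None
```

The age is measured from the first buffered event, so no event waits longer than the configured delay before being sent.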
2 — Edge Layer
Cloudflare Worker running at 250+ PoPs worldwide. Validates schema, stamps server_received_at, applies an IP-based IVT pre-filter, and proxies to the Kafka REST proxy using <5ms of CPU time.
3 — Kafka
Distributed event bus with RF=3 for durability. Partitioned by user_id_hash for per-user ordering. LZ4 compression reduces storage cost by ~60%. Exactly-once semantics come from idempotent, transactional producers combined with consumers that read with isolation.level=read_committed.
4 — Flink
Stateful stream processor with 30-second checkpointing to S3. KeyedProcessFunction provides per-key state (seen event IDs for dedup, pending impressions for the click join, EWMA click-rate baselines for fraud scoring). Exactly-once processing via a two-phase commit sink.
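To make the per-key state concrete, here is a minimal sketch of the 30-minute click-to-impression join in plain Python. A dict stands in for Flink keyed state, and the field names (impression_id, ts_ms, ad_id) are assumptions, not the pipeline's actual schema.

```python
JOIN_WINDOW_MS = 30 * 60 * 1000  # 30-minute join window from the requirements


class ClickImpressionJoiner:
    def __init__(self):
        # impression_id -> impression event; stands in for Flink keyed state
        self.impressions = {}

    def on_impression(self, imp):
        self.impressions[imp["impression_id"]] = imp

    def on_click(self, click):
        """Return the joined record, or None if no impression falls in the window."""
        imp = self.impressions.get(click["impression_id"])
        if imp is None:
            return None
        delta = click["ts_ms"] - imp["ts_ms"]
        if 0 <= delta <= JOIN_WINDOW_MS:
            return {**click, "impression_ts_ms": imp["ts_ms"], "ad_id": imp["ad_id"]}
        return None

    def expire(self, watermark_ms):
        """Drop impressions older than the window, as a Flink timer would."""
        stale = [key for key, imp in self.impressions.items()
                 if watermark_ms - imp["ts_ms"] > JOIN_WINDOW_MS]
        for key in stale:
            del self.impressions[key]
        return len(stale)
```

Expiring state on the watermark is what keeps the join bounded: without it, every impression would be held forever waiting for a click that may never arrive.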
5 — S3 Iceberg
Object storage for raw events (Iceberg tables backed by Parquet files), ML model artefacts, and Flink checkpoints. Lifecycle rules tier data to Glacier after 90 days.
🔬
Low-Level Design — Deep Dives
Deep dives worth explaining in detail in any senior engineering interview. For each: know the data structure, the algorithm, the why, and the trade-off you made.
1 — Edge Collector (Cloudflare Worker)
<50ms · sendBeacon
Cloudflare Worker runs at 250+ PoPs worldwide. It validates the event schema (JSON Schema draft-07), stamps arrival metadata (cf-ipcountry for geo-enrichment, cf-ray for request tracing), generates an event_id (ULID) if absent, and proxies the batch to the Kafka REST proxy over authenticated HTTPS. The Worker uses <5ms of CPU time; total latency from client to Kafka is <50ms globally.
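export default {
  async fetch(req, env) {
    // Assumes a JSON array body; env.KAFKA_REST_URL is a hypothetical binding and auth headers are omitted.
    const events = await req.json();
    // Confluent REST Proxy v2 produce format: { records: [{ value }] }
    const res = await fetch(`${env.KAFKA_REST_URL}/topics/raw_events`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/vnd.kafka.json.v2+json' },
      body: JSON.stringify({ records: events.map((value) => ({ value })) }),
    });
    // Return 204 only once the proxy has accepted the batch; otherwise signal a retryable failure.
    return new Response(null, { status: res.ok ? 204 : 503 });
  }
}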
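The validation and stamping steps described for the edge collector can be sketched separately from the Worker runtime. The following Python is illustrative only: the required fields and allowed event types are assumptions, and uuid4 stands in for a ULID because the standard library has no ULID generator.

```python
import time
import uuid

REQUIRED_FIELDS = {"event_type", "ad_id", "user_id"}  # assumed minimal schema
ALLOWED_TYPES = {"impression", "click"}


def prepare_event(raw, headers, now_ms=None):
    """Validate one event and stamp arrival metadata; return None if invalid."""
    if not isinstance(raw, dict) or not REQUIRED_FIELDS.issubset(raw):
        return None
    if raw["event_type"] not in ALLOWED_TYPES:
        return None
    event = dict(raw)  # copy so the caller's object is not mutated
    # Keep a client-supplied ID; generate one only when it is absent.
    event.setdefault("event_id", uuid.uuid4().hex)
    event["server_received_at"] = now_ms if now_ms is not None else int(time.time() * 1000)
    event["country"] = headers.get("cf-ipcountry", "XX")  # "XX" used here for unknown
    event["ray_id"] = headers.get("cf-ray")
    return event
```

Preserving the client-generated ID matters for deduplication: if the edge minted a fresh ID on every retry, downstream stages could never recognise the duplicate.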
2 — Flink Enrichment Job
Campaign Metadata Join
A Flink DataStream job reads from the Kafka raw_events topic. Deduplication: a keyed, stateful function checks each event_id against Redis using SET with NX and a 24h TTL (the atomic form of SETNX plus EXPIRE) and drops events whose ID already exists. An Async I/O operator then joins each event against campaign metadata in Redis (sub-1ms lookup). Enriched events are written to the Kafka enriched_events topic. The job runs with exactly-once semantics via Flink checkpointing to S3.
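// DedupeFunction, AsyncCampaignEnricher and enrichedSerializer are application classes, not Flink built-ins.
DataStream<Event> deduped = stream
    .keyBy(e -> e.event_id)
    .process(new DedupeFunction());
// Async I/O is wired in through AsyncDataStream rather than map(); the 50 ms timeout is illustrative.
AsyncDataStream
    .unorderedWait(deduped, new AsyncCampaignEnricher(redis), 50, TimeUnit.MILLISECONDS)
    .sinkTo(KafkaSink.<Event>builder()
        .setBootstrapServers(brokers)
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
            .setTopic("enriched_events")
            .setValueSerializationSchema(enrichedSerializer)
            .build())
        .build());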
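The set-if-absent check behind the deduplication step can be illustrated without a Redis server. The sketch below mimics the semantics of Redis `SET key value NX EX ttl` with an in-memory dict; SeenStore and dedupe are hypothetical helpers, not a Redis client API.

```python
class SeenStore:
    """In-memory stand-in for Redis SET ... NX EX with a 24h TTL."""

    def __init__(self, ttl_sec=24 * 60 * 60):
        self.ttl_sec = ttl_sec
        self.expires_at = {}  # event_id -> expiry time in seconds

    def set_if_absent(self, event_id, now_sec):
        """Return True if the ID was newly recorded, False if it is a live duplicate."""
        expiry = self.expires_at.get(event_id)
        if expiry is not None and expiry > now_sec:
            return False
        self.expires_at[event_id] = now_sec + self.ttl_sec
        return True


def dedupe(events, store, now_sec):
    """Keep only the first occurrence of each event_id within the TTL window."""
    return [e for e in events if store.set_if_absent(e["event_id"], now_sec)]
```

The check and the write happen in one step, which is the property that makes the real Redis command safe under concurrent workers: two tasks racing on the same ID cannot both see it as new.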
3 — Fraud Detection Filter
ML + Rule Engine
Two-stage IVT filter on the enrichment pipeline: (1) A rule engine checks the IP against the IAB blocklist (Redis Bloom filter, 5M entries, about 6MB at a 1% false-positive rate) and matches the user agent against known bot patterns. (2) A Flink ML scorer evaluates 12 features (including click-to-impression ratio, geographic velocity, and session length) using a pre-loaded LightGBM model (PMML format). Events with an IVT score above 0.8 and up to 0.95 are routed to a quarantine topic for review; events above 0.95 are hard-dropped.
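from pyflink.datastream import MapFunction

class IVTScorer(MapFunction):
    def open(self, runtime_context):
        # load_pmml_model is a placeholder for whichever PMML runtime is used;
        # loading in open() happens once per task rather than once per event.
        self.model = load_pmml_model('ivt_model.pmml')

    def map(self, event):
        # Score the 12 features and attach the result to the event.
        event.ivt_score = self.model.predict(event.features)
        return event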
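Once an event carries an IVT score, the routing rule from the description above is a pair of threshold checks. A minimal sketch, with the destination labels chosen here for illustration:

```python
QUARANTINE_THRESHOLD = 0.8   # above this: hold for review
DROP_THRESHOLD = 0.95        # above this: discard outright


def route(ivt_score):
    """Map an IVT score to a destination label: 'drop', 'quarantine', or 'main'."""
    # Check the stricter threshold first so very high scores are not quarantined.
    if ivt_score > DROP_THRESHOLD:
        return "drop"
    if ivt_score > QUARANTINE_THRESHOLD:
        return "quarantine"
    return "main"
```

Both comparisons are strict, so a score of exactly 0.8 stays on the main path and exactly 0.95 is quarantined rather than dropped.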
4 — S3 Iceberg Raw Store
Parquet · Time-partitioned
All events are written to S3 in Apache Iceberg format, partitioned by event_date / hour. Iceberg provides ACID commits (concurrent writers are coordinated through optimistic concurrency and retried on conflict), schema evolution, and time-travel queries (replay any hour for debugging). A compaction job runs every 6 hours, merging the small data files that Flink commits at each checkpoint into 512MB Parquet files optimised for Athena/Spark scans. Retention: 90 days hot in S3-Standard, then 365 days in S3-Glacier.
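// Assumes stream is a DataStream<RowData>; the table location is illustrative.
TableLoader tableLoader = TableLoader.fromHadoopTable("s3a://example-bucket/warehouse/ads/raw_events");
FlinkSink.forRowData(stream)
    .tableLoader(tableLoader)
    .upsert(false)   // append-only: raw events are never updated in place
    .append();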
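The event_date / hour partitioning can be illustrated with a small helper that maps an event timestamp to its partition. Iceberg normally derives partition values itself through partition transforms, so this is only a sketch of the mapping; the function name and path layout are assumptions.

```python
from datetime import datetime, timezone


def partition_for(event_ts_ms):
    """Return the event_date/hour partition for a millisecond epoch timestamp (UTC)."""
    dt = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("event_date=%Y-%m-%d/hour=%H")


print(partition_for(1_700_000_000_000))
```

Partitioning on event time rather than arrival time means a late event from an offline mobile SDK lands in the hour it actually happened, which is why replaying "any hour" stays meaningful.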
⚖️
Trade-offs & Decision Log
Every senior interview comes down to these decisions. Know the exact trade-off, the reasoning, and the specific numbers that justify each choice.
⚖️ Fire-and-Forget vs Synchronous Ingestion
✓
Fire-and-Forget (async) ✅ Chosen
- 204 returned in <50ms — never blocks the user's page load
- Back-pressure handled in Kafka, not in the browser
- SDK can retry independently if edge collector is slow
- Slightly higher risk of loss if device goes offline immediately after fire
→
Synchronous (wait for ack)
- Guarantees delivery before page continues
- Adds 200-500ms to every page interaction
- Unacceptable latency for impression events
- Not scalable at 500K events/sec
💡Decision: Fire-and-forget with sendBeacon(); S2S API used for high-value conversion events that need delivery guarantees
⚖️ Kafka vs Kinesis for Event Bus
✓
Kafka (self-managed) ✅ Chosen
- 512 partitions allow up to 512 parallel consumers per group, with room to add brokers as traffic grows
- LZ4 compression reduces storage cost by ~60%
- RF=3 across AZs supports the 99.999% durability target
- Operational overhead of managing brokers
→
AWS Kinesis (managed)
- Zero operational overhead — fully managed
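- Default shard quota of 200 to 500 per account depending on region; going higher needs a quota increase
- Per-shard pricing gets expensive at this scale: each shard accepts 1,000 records/sec, so 500K events/sec needs about 500 shards
- Retention defaults to 24 hours and tops out at 365 days at extra cost (Kafka retention is freely configurable)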
💡Decision: Kafka for scale and cost at 500K/sec; Kinesis viable for <50K events/sec teams without Kafka expertise
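The shard arithmetic behind this decision can be checked quickly. The sketch assumes one event per Kinesis record and uses the per-shard write limit of 1,000 records per second; batching several events into one record would lower the count.

```python
import math

KINESIS_RECORDS_PER_SHARD_PER_SEC = 1_000  # per-shard write limit (records/sec)


def shards_needed(events_per_sec, headroom=1.0):
    """Shards required to ingest a given rate, with an optional headroom multiplier."""
    return math.ceil(events_per_sec * headroom / KINESIS_RECORDS_PER_SHARD_PER_SEC)


print(shards_needed(500_000))        # at the stated peak
print(shards_needed(50_000))         # at the smaller-team threshold
print(shards_needed(500_000, 2.0))   # peak with 2x headroom
```

At 50K events/sec the shard count stays well inside default quotas, which is why Kinesis remains a reasonable choice at that scale.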
🎯Interview Questions — Answered
The exact questions interviewers ask — with production-grade answers
Q1
How do you prevent duplicate events from being counted twice?
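Two-layer deduplication: (1) A Bloom filter in front of the exact check, sized for 1 billion entries at a <1% false-positive rate, which needs ~1.2GB of memory and answers in <1ms. That is more than a single Worker isolate can hold (128MB memory limit), so in practice the filter would sit in a shared store such as Redis. A Bloom filter never produces false negatives, so an event_id it has not seen is definitely new and can skip the exact check. (2) Redis SET with NX and a 24h TTL on event_id gives exact deduplication for the event_ids the Bloom filter flags as possibly seen: true duplicates plus the ~1% of new events that are false positives. Flagged events must not be dropped on the Bloom result alone, or about 1% of legitimate events would be lost. Together, duplicates reaching Kafka are <0.001% of total volume. Event IDs are generated client-side as ULIDs (128-bit, time-sortable identifiers).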
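The ~1.2GB figure follows from the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. A short check in Python, with function names chosen for illustration:

```python
import math


def bloom_size_bits(n, p):
    """Bits needed for n entries at false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def bloom_hash_count(n, m_bits):
    """Optimal number of hash functions for n entries in m_bits bits."""
    return max(1, round(m_bits / n * math.log(2)))


n, p = 1_000_000_000, 0.01
bits = bloom_size_bits(n, p)
gigabytes = bits / 8 / 1e9

print(f"{gigabytes:.2f} GB, {bloom_hash_count(n, bits)} hash functions")
```

The size scales linearly with the number of entries, so the same formula gives only a few megabytes for a filter holding a few million IDs.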
Q2
How do you handle ad-blocker traffic that never reaches the JS pixel?
Server-side event collection (S2S API) is the primary fallback. When a user with an ad blocker converts, the advertiser’s server sends a conversion event directly to the edge collector via their backend — bypassing browser restrictions entirely. This is the Meta CAPI / Google Enhanced Conversions model. Typically 20-40% of conversions on high-blocker-rate sites (tech blogs, dev tools) are captured only via S2S. The S2S events are tagged with a ‘server’ signal_type field and weighted equally to browser events in attribution.
Q3
Why Kafka with 512 partitions rather than fewer?
Partition count caps consumer parallelism: a consumer group can run at most one consumer per partition. At 500K events/sec, 512 partitions means each carries about 1K events/sec (500,000 / 512 ≈ 977), a rate a single partition handles comfortably, so there is headroom for bursts before any partition saturates. Partitioning by user_id_hash % 512 ensures all events for the same user land in the same partition, maintaining per-user ordering, which is critical for attribution window reconstruction. Partition count can be increased later but never decreased, and increasing it changes the key-to-partition mapping, which breaks per-user ordering across the change. Provisioning headroom upfront avoids that operation under load.
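A minimal sketch of the user-to-partition mapping and the per-partition load. Kafka's default partitioner hashes keys with murmur2; SHA-256 is used here only as a stable stand-in from the standard library, and the function name is hypothetical.

```python
import hashlib

NUM_PARTITIONS = 512
PEAK_EVENTS_PER_SEC = 500_000


def partition_for_user(user_id):
    """Map a user ID to a partition; the same user always maps to the same partition."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# Average load per partition at peak, assuming users are spread evenly.
per_partition_rate = PEAK_EVENTS_PER_SEC / NUM_PARTITIONS

print(partition_for_user("user-42"), round(per_partition_rate))
```

Because the partition depends on the modulus, changing NUM_PARTITIONS sends most users to a different partition, which is the ordering break that makes resizing under load painful.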