A pattern for collapsing webhook bursts while preserving immediate response for human-initiated events.
In distributed systems, there is a persistent tension between throughput (batch things for efficiency) and latency (respond immediately for humans). Most batching patterns sacrifice the latter for the former. This article describes a pattern that doesn’t.
We call it Latency-Neutral Batching: a design where burst traffic collapses into fewer operations, but the first event in any burst experiences zero artificial delay.
I. The Problem: Two Traffic Patterns, One System
Webhook-driven systems receive traffic in two distinct modes:
| Mode | Example | Expectation |
|---|---|---|
| Single events | User saves an invoice at POS | Immediate processing |
| Bursts | CSV import of 50 invoices | Efficient batching |
The constraint: a human at a terminal cannot wait 500ms for a debounce timer. The optimization: 50 webhooks should not spawn 50 jobs.
Standard batching patterns force a choice:
- Debounce: Wait N ms after the last event → delays everything
- Fixed window: Collect for N ms, then process → delays the first event
- Queue depth: Batch if queue > 1 → fails when consumers are fast
None of these preserve the invariant: first event, zero delay.
II. The Core Principle: Time vs. Execution State
The insight that unlocks the solution:
Execution state tells you what the system is doing now. Time tells you what to expect next.
Queue depth, worker availability, and job counts are execution signals. They reflect current state, not upstream intent. When consumers process faster than events arrive, execution signals collapse to zero between events—even mid-burst.
Time does not collapse. A 1-second window remains 1 second regardless of processing speed. Therefore:
Burst detection must be time-based, not execution-based.
III. The Architecture: Temporal Lease + Single-Flight Runner
A. The Temporal Lease
On each webhook, set a short-lived Redis key:
SET sync:dirty:{tenant} "1" PX 1200
This key answers one question: Has this tenant been active in the last 1.2 seconds?
Properties:
- Refreshed on each incoming event
- Independent of job execution
- Decays naturally when traffic stops
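From application code, the lease is one call per event. A minimal sketch, assuming ioredis; the client and the `refreshLease` helper are illustrative names, not part of the pattern itself:

```ts
import Redis from "ioredis";

const redis = new Redis();
const LEASE_TTL_MS = 1200; // matches PX 1200 above

// Set or refresh the temporal lease for a tenant. Every incoming
// event extends the activity window by the full TTL.
async function refreshLease(tenantId: string): Promise<void> {
  await redis.set(`sync:dirty:${tenantId}`, "1", "PX", LEASE_TTL_MS);
}
```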
B. The Single-Flight Runner
One job per tenant, enforced via idempotent job ID:
// Deterministic jobId: while a runner job exists for this tenant,
// further enqueues are no-ops.
await queue.add('tenant-sync-runner', payload, {
  jobId: `sync-runner_${tenantId}`
});
Properties:
- At most one runner active per tenant
- Subsequent enqueues are no-ops
- Runner owns exclusive execution for the tenant
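Combining A and B, the entire ingress path is two non-blocking calls. A sketch assuming BullMQ (whose `jobId` option provides the idempotency) plus the `refreshLease` helper above; `syncQueue` and `onWebhook` are illustrative names:

```ts
import { Queue } from "bullmq";

const syncQueue = new Queue("tenant-sync");

// Webhook handler: refresh the lease, then try to start the runner.
// If a runner job already exists for this tenant, add() is a no-op.
async function onWebhook(tenantId: string): Promise<void> {
  await refreshLease(tenantId);
  await syncQueue.add(
    "tenant-sync-runner",
    { tenantId },
    { jobId: `sync-runner_${tenantId}` }
  );
}
```

Note that the job data carries only the tenant: the runner sweeps from its cursor, so per-event payloads never need to reach it. One BullMQ-specific caveat: a completed job kept in the completed set still occupies its jobId and would block the next burst from enqueueing a fresh runner, so `removeOnComplete` should be set to keep the single-flight ID reusable.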
C. The Execution Loop
The runner operates in an iterative loop (sketched in code below):
1. Fetch all work since cursor
2. Process the batch
3. Check if temporal lease still exists
→ Yes: Sleep 200ms, return to step 1
→ No: Traffic stopped, finalize and exit
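A sketch of the loop in the same assumed stack; `loadCursor`, `saveCursor`, `fetchChangesSince`, and `processBatch` are hypothetical stand-ins for the system's own persistence and batch logic, and `redis` is the client from above:

```ts
// Hypothetical persistence hooks; replace with real implementations.
declare function loadCursor(tenantId: string): Promise<string>;
declare function saveCursor(tenantId: string, cursor: string): Promise<void>;
declare function fetchChangesSince(
  tenantId: string,
  cursor: string
): Promise<{ changes: unknown[]; nextCursor: string }>;
declare function processBatch(
  tenantId: string,
  changes: unknown[]
): Promise<void>;

// Runner body, executed by the single-flight job for one tenant.
async function runTenantSync(tenantId: string): Promise<void> {
  let cursor = await loadCursor(tenantId);

  while (true) {
    // Steps 1-2: fetch everything since the cursor and process it.
    const { changes, nextCursor } = await fetchChangesSince(tenantId, cursor);
    await processBatch(tenantId, changes);
    cursor = nextCursor;
    await saveCursor(tenantId, cursor);

    // Step 3: lease gone means traffic stopped; finalize and exit.
    if (!(await redis.exists(`sync:dirty:${tenantId}`))) break;

    // Lease still live: linger 200ms, then sweep again.
    await new Promise((resolve) => setTimeout(resolve, 200));
  }
}
```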
IV. Why This Preserves Zero Latency
First event in a burst:
- Sets dirty signal
- Enqueues runner (no existing job)
- Runner starts immediately
- Processing begins with no artificial delay
Subsequent events:
- Refresh dirty signal (extend TTL)
- Runner enqueue is no-op (job already exists)
- Runner’s loop absorbs new work in next iteration
The first event never waits. Batching emerges on the tail, not the head.
V. Contrast With Common Patterns
| Pattern | Mechanism | First Event Delay |
|---|---|---|
| Debounce | Wait N ms after last event | +N ms |
| Fixed window | Collect for N ms | +0 to +N ms |
| Queue depth | Check if queue > 1 | +0 ms (but fails under fast drain) |
| Latency-Neutral | Start immediately, linger | +0 ms |
VI. The Generalized Pattern
This approach applies wherever you have:
- Bursty ingress: Webhooks, polling, event streams
- Mixed latency requirements: Some human-initiated, some bulk
- Tenant isolation: Work must not cross boundaries
- Acceptable eventual consistency: Final state matters, not per-event confirmation
The core abstractions:
| Abstraction | Responsibility |
|---|---|
| Temporal Lease | Time-bounded burst detection |
| Single-Flight Runner | At most one executor per tenant |
| Iterative Sweep | Loop until temporal lease expires |
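Stripped of Redis and queue specifics, the abstractions reduce to three small interfaces. A hypothetical TypeScript sketch:

```ts
// Time-bounded burst detection: "has this scope been active lately?"
interface TemporalLease {
  refresh(scope: string): Promise<void>;    // extend the activity window
  isAlive(scope: string): Promise<boolean>; // false once traffic stops
}

// At most one executor per scope; duplicate starts are no-ops.
interface SingleFlightRunner {
  ensureRunning(scope: string): Promise<void>;
}

// Sweeps work from a cursor until the lease expires.
interface IterativeSweep {
  run(scope: string, lease: TemporalLease): Promise<void>;
}
```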
VII. Invariants
These properties must hold:
- At most one runner per tenant — enforced by idempotent job ID
- First event is never delayed — runner starts immediately
- No event is dropped — cursor-based sweep catches all changes
- Convergence is deterministic — runner exits when lease expires
- Ingestion never blocks — the webhook handler only refreshes the lease and enqueues, then returns
VIII. When To Use This Pattern
Use latency-neutral batching when:
- User-facing latency matters (POS, invoicing, payments)
- Upstream sends unbounded bursts (imports, reconnects, replays)
- Downstream has rate limits or per-call overhead
- Simple debouncing would degrade UX
Do not use when:
- All events can tolerate uniform delay
- Strict ordering per-event is required
- The system is purely background/async
By the TaxBridge Engineering Team