How-to: Handle ingestion failures
When sending usage events to Zenskar via API, ingestion can fail silently if your system doesn't handle errors correctly. This document explains how failures happen, how to detect them, and how to build a reliable pipeline that prevents data loss.
Core concepts
How ingestion can fail
Ingestion failures fall into two categories with very different implications.
Rejected by Zenskar (non-retriable)
Zenskar received the event but rejected it because the payload was invalid. The API responds with an HTTP 4xx status code and a descriptive error message. If your system doesn't inspect that response, the event is silently dropped.
Never reached Zenskar (retriable)
The event never arrived due to a network outage on your side, an intermediate routing failure, or a Zenskar service disruption. Because the event never arrived, Zenskar has no record of it and cannot alert you. If you don't retry or persist the event locally, it is lost permanently.
What is a dead letter queue (DLQ)?
A dead letter queue (DLQ) is a holding area for events that failed to ingest, regardless of the reason. Instead of discarding a failed event, your system routes it to the DLQ so it can be retried, inspected, or manually replayed later. A DLQ is the primary mechanism for guaranteeing no usage data is lost.
```mermaid
sequenceDiagram
    participant S as Your system
    participant Z as Zenskar API
    participant D as DLQ
    S->>Z: POST /ingest (event payload)
    alt 200 OK
        Z-->>S: 200 OK
        Note over S: Event ingested successfully
    else 4xx validation error
        Z-->>S: 4xx + error message
        S->>D: Write event + error reason (non-retriable)
        Note over D: Awaits manual inspection
        D-->>S: Corrected payload
        S->>Z: POST /ingest (corrected payload)
        Z-->>S: 200 OK
    else 5xx server error
        Z-->>S: 5xx
        loop Retry with exponential backoff
            S->>Z: POST /ingest (same payload)
            Z-->>S: 5xx
        end
        S->>D: Write event + error reason (retriable, retries exhausted)
    else No response (network error)
        S-xZ: POST /ingest (no response)
        loop Retry with exponential backoff
            S->>Z: POST /ingest (same payload)
            S-xZ: No response
        end
        S->>D: Write event + error reason (retriable, retries exhausted)
    end
```
Retriable vs. non-retriable failures
| Failure type | Cause | Should you retry? |
|---|---|---|
| 4xx validation error | Malformed or invalid payload | No: fix the payload first, then re-send |
| Network / connectivity error | No response received | Yes: retry with backoff |
| 5xx server error | Zenskar-side issue | Yes: retry with backoff |
Important: Retrying a `4xx` error without fixing the payload will always fail again. Route these events to the DLQ for inspection and correction before re-sending.
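The table above maps directly onto a small classification function. The sketch below is in Python and uses hypothetical category names. It treats any 2xx as success (this guide only documents `200`) and files every 4xx as non-retriable, including the `404` that Zenskar returns for unparseable JSON.

```python
from typing import Optional

# Hypothetical failure categories used by the sketches in this guide.
SUCCESS = "success"
NON_RETRIABLE = "non_retriable"  # 4xx: fix the payload, then re-send
RETRIABLE = "retriable"          # 5xx or no response: retry with backoff


def classify(status: Optional[int]) -> str:
    """Map an HTTP status code (None when no response arrived) to a failure category."""
    if status is None:
        # Network error or timeout: the event may never have reached Zenskar.
        return RETRIABLE
    if 200 <= status < 300:
        # Assumption: any 2xx means accepted; the guide documents 200.
        return SUCCESS
    if 400 <= status < 500:
        # Includes 404, which Zenskar uses for unparseable JSON bodies.
        return NON_RETRIABLE
    if 500 <= status < 600:
        return RETRIABLE
    # Unexpected codes (1xx/3xx): do not retry blindly; let a human inspect them.
    return NON_RETRIABLE
```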
Quickstart guide
This walkthrough shows you how to send a usage event with basic error handling that routes failures to a DLQ. It assumes you are calling the Zenskar ingestion API directly over HTTP.
Step 1: Send the event
Send a POST request with your event payload. A valid payload looks like this:
```json
[
  {
    "data": {
      "campaign_id": "sample_campaign_id_8",
      "impressions": 74
    },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```
Step 2: Inspect the response
Always read the HTTP status code and response body. Do not assume success just because you received a response.
- `200`: Event accepted. No further action needed.
- `4xx`: Event rejected. Read the error message, fix the payload, then re-send it. Do not retry the original payload.
- `5xx` or no response: Delivery failed. Retry with exponential backoff (see Step 4).
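Steps 1 and 2 can be sketched with only the Python standard library. The endpoint URL and the `Authorization: Bearer` header are assumptions, not Zenskar's documented API; substitute the values from your Zenskar API reference. The hypothetical `send_events` function never raises on HTTP errors: it returns the status code (or `None` when no response arrived) together with the body, so the caller can decide whether to retry or route the event to the DLQ.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple


def send_events(url: str, api_key: str, events: list,
                timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """POST a JSON array of events; return (status_code, body).

    status_code is None when no HTTP response was received at all.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(events).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth header: check Zenskar's API reference for the real one.
            "Authorization": f"Bearer {api_key}",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx: urlopen raises, but the status code and body are still readable.
        return err.code, err.read().decode("utf-8", errors="replace")
    except OSError as err:
        # URLError, timeouts, refused or reset connections: no response arrived.
        return None, str(err)
```

The caller then branches on the returned status exactly as in the list above.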
Step 3: Route failures to a DLQ
If the event was rejected due to a validation error (4xx), write it to your DLQ immediately. If it could not be delivered (network error or 5xx), retry it first (see Step 4) and write it to the DLQ once retries are exhausted. Each DLQ record should include the original payload, the error reason, a timestamp, and the failure type (retriable or non-retriable) so that you can process it correctly later.
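As an illustration, a file-based DLQ (the simplest option, suited to low volume) can store one JSON object per line. The record fields, the `DeadLetter` class, and the `write_dead_letter` helper below are hypothetical names chosen for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class DeadLetter:
    """One failed event (or batch) plus the context needed to process it later."""
    payload: Any          # the original events exactly as sent
    error_reason: str     # response body or network error text
    failure_type: str     # "retriable" or "non_retriable"
    status: Optional[int] # HTTP status, or None if no response arrived
    failed_at: str        # UTC time of the failure, ISO 8601


def write_dead_letter(path: str, payload: Any, error_reason: str,
                      failure_type: str, status: Optional[int]) -> DeadLetter:
    """Append one record to a JSON Lines file (one JSON object per line)."""
    record = DeadLetter(
        payload=payload,
        error_reason=error_reason,
        failure_type=failure_type,
        status=status,
        failed_at=datetime.now(timezone.utc).isoformat(),
    )
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record
```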
Step 4: Retry retriable failures with backoff
For network errors and 5xx responses, retry using exponential backoff with jitter to avoid thundering-herd problems. A reasonable starting point:
- Initial delay: 1 second
- Multiplier: 2×
- Maximum delay: 60 seconds
- Maximum attempts: 5
After exhausting retries, move the event to the DLQ rather than discarding it.
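The retry policy above can be sketched as follows. This assumes "maximum attempts" counts the first request, and uses "full jitter" (a random sleep between zero and the capped exponential delay), which is one common jitter strategy rather than anything Zenskar prescribes. `send` is any function returning `(status, body)` with `status` set to `None` when no response arrived; `sleep` is injectable so the loop can be tested without waiting.

```python
import random
import time
from typing import Callable, List, Optional, Tuple

Sender = Callable[[list], Tuple[Optional[int], str]]


def backoff_delays(initial: float = 1.0, multiplier: float = 2.0,
                   max_delay: float = 60.0, max_attempts: int = 5) -> List[float]:
    """Capped exponential delay before each retry (there is no delay before attempt 1)."""
    return [min(max_delay, initial * multiplier ** i) for i in range(max_attempts - 1)]


def send_with_retries(send: Sender, events: list,
                      sleep: Callable[[float], None] = time.sleep,
                      max_attempts: int = 5) -> Tuple[str, Optional[int], str]:
    """Return ("delivered" | "rejected" | "exhausted", last_status, last_body)."""
    delays = backoff_delays(max_attempts=max_attempts)
    status, body = None, ""
    for attempt in range(max_attempts):
        status, body = send(events)
        if status is not None and 200 <= status < 300:
            return "delivered", status, body
        if status is not None and 400 <= status < 500:
            # Non-retriable: the same payload will fail again, so stop here.
            return "rejected", status, body
        if attempt < max_attempts - 1:
            # 5xx or no response: full jitter, a random wait up to the capped delay.
            sleep(random.uniform(0, delays[attempt]))
    # Retries exhausted: the caller should move the events to the DLQ, not drop them.
    return "exhausted", status, body
```

With the defaults, the delay ceilings are 1, 2, 4, and 8 seconds, so the 60-second cap only matters if you raise the attempt limit. On `"rejected"` or `"exhausted"`, write the events to the DLQ.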
Step 5: Drain the DLQ
Periodically process the events in your DLQ. For non-retriable (4xx) failures, inspect the error message, correct the payload, and re-send it. For retriable failures whose retries were exhausted, re-attempt delivery.
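A minimal drain job, assuming a file-based DLQ that stores one JSON object per line with hypothetical `payload`, `failure_type`, `status`, and `error_reason` fields. Retriable records are re-sent through a caller-supplied `send` function; non-retriable records stay in the queue until someone corrects them. The sketch rewrites the whole file at the end, so it assumes nothing else appends to the DLQ while it runs; a database table or message queue avoids that limitation.

```python
import json
from typing import Callable, List, Optional, Tuple

Sender = Callable[[list], Tuple[Optional[int], str]]


def drain_dlq(path: str, send: Sender) -> List[dict]:
    """Re-send retriable records; return (and keep) the records that still need work."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    remaining = []
    for record in records:
        if record["failure_type"] != "retriable":
            remaining.append(record)  # awaits manual correction
            continue
        status, body = send(record["payload"])
        if status is not None and 200 <= status < 300:
            continue  # delivered: drop it from the DLQ
        record["status"], record["error_reason"] = status, body
        if status is not None and 400 <= status < 500:
            # Now rejected as invalid: needs a payload fix, not another retry.
            record["failure_type"] = "non_retriable"
        remaining.append(record)

    # Rewrite the file with only the records that still need attention.
    with open(path, "w", encoding="utf-8") as f:
        for record in remaining:
            f.write(json.dumps(record) + "\n")
    return remaining
```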
How-to guides
Choose a DLQ implementation
The right approach depends on your event volume and existing infrastructure.
| Method | Description | Best for |
|---|---|---|
| File-based logging | Write failed events to a local file | Low volume, simple setups, local development |
| Database table | Store failed events in a dedicated table for review and manual replay | Moderate volume, teams that want SQL-queryable failure logs |
| Message queue (e.g. Kafka, RabbitMQ) | Publish failed events to a dedicated DLQ topic or queue | High volume, existing queue infrastructure |
| Cloud-managed DLQ (e.g. AWS SQS DLQ) | Use a managed queue with built-in retry and failure handling | Cloud-native stacks, teams that prefer managed infrastructure |
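For the database-table option, a schema along these lines keeps failures queryable with plain SQL. SQLite is used here only so the sketch runs without a server; the table name, columns, and helper functions are hypothetical.

```python
import json
import sqlite3
from typing import Any, List, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS ingestion_dlq (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    payload       TEXT    NOT NULL,  -- original events as JSON
    error_reason  TEXT    NOT NULL,
    failure_type  TEXT    NOT NULL CHECK (failure_type IN ('retriable', 'non_retriable')),
    http_status   INTEGER,           -- NULL when no response arrived
    failed_at     TEXT    NOT NULL DEFAULT (datetime('now')),
    replayed_at   TEXT               -- set once the event is re-sent successfully
)
"""


def open_dlq(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_failure(conn: sqlite3.Connection, payload: Any, error_reason: str,
                   failure_type: str, http_status: Optional[int]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ingestion_dlq (payload, error_reason, failure_type, http_status) "
            "VALUES (?, ?, ?, ?)",
            (json.dumps(payload), error_reason, failure_type, http_status),
        )


def pending(conn: sqlite3.Connection, failure_type: str) -> List[Tuple[int, str, str]]:
    """Failures of one type that have not been replayed yet, oldest first."""
    return conn.execute(
        "SELECT id, payload, error_reason FROM ingestion_dlq "
        "WHERE failure_type = ? AND replayed_at IS NULL ORDER BY id",
        (failure_type,),
    ).fetchall()
```

Marking rows with `replayed_at` instead of deleting them keeps an audit trail of what failed and when it was recovered.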
Make events idempotent
Before retrying, ensure your events carry a stable unique identifier (e.g. a UUID tied to the originating action). Submit this as part of the payload so that if a retry delivers a duplicate, Zenskar can deduplicate it on ingestion. This prevents double-counting usage when a network failure causes an event to be delivered more than once.
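One way to get a stable identifier is to derive it deterministically from the originating action with a name-based UUID (version 5), so every retry of the same action carries the same ID. Generating a random UUID once and storing it with the event works equally well. The key name `event_id`, its placement inside `data`, the namespace UUID, and the `build_event` helper are all assumptions: use whatever field your metric schema defines, because undeclared keys in `data` are rejected.

```python
import uuid

# Hypothetical namespace for your usage events: generate once, then keep it fixed.
EVENT_NAMESPACE = uuid.UUID("6f1c2a5e-8b3d-4c1e-9a57-2d4f0e7b9c13")


def event_id_for(source_system: str, action_id: str) -> str:
    """Same originating action -> same UUID, no matter how many times it is sent."""
    return str(uuid.uuid5(EVENT_NAMESPACE, f"{source_system}:{action_id}"))


def build_event(customer_id: str, timestamp: str, data: dict, action_id: str,
                source_system: str = "billing-worker") -> dict:
    event = {"data": dict(data), "timestamp": timestamp, "customer_id": customer_id}
    # ASSUMPTION: "event_id" inside "data" is illustrative; it must exist in your metric schema.
    event["data"]["event_id"] = event_id_for(source_system, action_id)
    return event
```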
Validate payloads before sending
Run basic schema validation on your side before calling the API. Check that:
- All required keys are present (`data`, `timestamp`, `customer_id`).
- All values match the expected types (see the Reference section below).
- No unexpected keys are included in the `data` object.
- The timestamp is in the correct format (`YYYY-MM-DD HH:MM:SS`).
- The total payload size is under 1 MB.
Catching these errors locally avoids unnecessary API calls and keeps your DLQ free of easily preventable failures.
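The checks above can be sketched as a local validator. `METRIC_SCHEMA` is a hypothetical schema for the sample payload (`campaign_id` as a string, `impressions` as an integer); replace it with your own metric's keys and types. Treating 1 MB as 1,000,000 bytes is a conservative assumption.

```python
import json
from datetime import datetime

# Conservative assumption: "under 1 MB" treated as under 1,000,000 bytes.
MAX_BYTES = 1_000_000

# Hypothetical metric schema for the sample payload: key -> accepted Python types.
METRIC_SCHEMA = {"campaign_id": (str,), "impressions": (int,)}

REQUIRED_KEYS = ("data", "timestamp", "customer_id")


def validate_event(event: dict, schema: dict = METRIC_SCHEMA) -> list:
    """Return a list of problems; an empty list means the event passed these local checks."""
    problems = [f"Missing key: {key}" for key in REQUIRED_KEYS if key not in event]

    if "customer_id" in event and not isinstance(event["customer_id"], str):
        problems.append("Invalid type for key: customer_id")

    if "timestamp" in event:
        try:
            datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
        except (TypeError, ValueError):
            problems.append("Invalid format for key: timestamp (expected YYYY-MM-DD HH:MM:SS)")

    data = event.get("data")
    if "data" in event and not isinstance(data, dict):
        problems.append("Invalid type for key: data")
    elif isinstance(data, dict):
        for key, types in schema.items():
            value = data.get(key)
            if key not in data:
                problems.append(f"Missing key: {key}")
            # bool is a subclass of int in Python, so reject True/False for integer fields.
            elif not isinstance(value, types) or (isinstance(value, bool) and bool not in types):
                problems.append(f"Invalid type for key: {key}")
        for key in sorted(data.keys() - schema.keys()):
            problems.append(f"Unexpected key in payload: {key}")
    return problems


def validate_batch(events: list) -> list:
    """Validate every event plus the size of the serialized JSON array."""
    problems = [f"event {i}: {p}" for i, e in enumerate(events) for p in validate_event(e)]
    size = len(json.dumps(events).encode("utf-8"))
    if size >= MAX_BYTES:
        problems.append(f"payload is {size} bytes; it must be under 1 MB")
    return problems
```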
Reference
Valid payload structure
```json
[
  {
    "data": {
      "campaign_id": "sample_campaign_id_8",
      "impressions": 74
    },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```
The request body must be a JSON array. Each element represents one usage event.
| Field | Type | Required | Notes |
|---|---|---|---|
| `customer_id` | String | Yes | Must match a customer in Zenskar |
| `timestamp` | DateTime | Yes | Format: `YYYY-MM-DD HH:MM:SS` |
| `data` | Object | Yes | Keys and value types must match your metric schema exactly |
HTTP error codes
Note: The `404` status code below is returned by Zenskar specifically for unparseable JSON bodies. This is non-standard: most APIs use `400 Bad Request` for this case. If your HTTP client or logging tooling maps `404` to "resource not found," add explicit handling to avoid misclassifying this error.
| Status | Meaning | Retriable? | Example error message |
|---|---|---|---|
| `404` | Request body is not valid JSON (unparseable) | No: fix the JSON | `invalid character '}' looking for beginning of object key string` |
| `413` | Payload exceeds 1 MB | No: split into smaller batches | `Payload too large` |
| `422` | Payload is valid JSON but fails schema validation | No: fix the payload | `Invalid type for key: impressions. Expected Int64, got string` |
| `5xx` | Zenskar server error | Yes: retry with backoff | N/A |
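When the API returns `413`, the fix is to split the batch. The sketch below is a size-aware splitter; it assumes the limit applies to the serialized JSON array and, conservatively, treats 1 MB as 1,000,000 bytes. It measures sizes with Python's default `json.dumps` separators, so serialize each batch the same way when you send it. The function name is hypothetical.

```python
import json
from typing import Iterator, List

# Conservative assumption: "1 MB" treated as 1,000,000 bytes.
MAX_BYTES = 1_000_000


def split_into_batches(events: List[dict], max_bytes: int = MAX_BYTES) -> Iterator[List[dict]]:
    """Yield batches whose json.dumps(batch) stays strictly under max_bytes."""
    batch: List[dict] = []
    size = 2  # the enclosing "[" and "]"
    for event in events:
        event_size = len(json.dumps(event).encode("utf-8"))
        if event_size + 2 >= max_bytes:
            raise ValueError("a single event exceeds the size limit and cannot be split")
        # Every element after the first adds the ", " separator (2 bytes).
        extra = event_size + (2 if batch else 0)
        if batch and size + extra >= max_bytes:
            yield batch
            batch, size = [], 2
            extra = event_size
        batch.append(event)
        size += extra
    if batch:
        yield batch
```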
Validation error messages
When the API returns 422, the response body contains a message describing the exact problem.
Missing or unexpected keys
```json
{ "error": "Missing key: impressions" }
{ "error": "Unexpected key in payload: extra_field" }
```
Type mismatches
```json
{ "error": "Invalid type for key: campaign_id. Expected String, got float64" }
{ "error": "Invalid type for key: impressions. Expected Int64, got string" }
{ "error": "Invalid type for key: value. Expected Float64, got string" }
{ "error": "Invalid type for key: is_active. Expected Bool, got string" }
```
Date and time format errors
```json
{ "error": "Invalid type for key: start_date. Expected Date32, got string" }
{ "error": "Invalid type for key: timestamp. Expected Date32/DateTime64, got string" }
```
UUID format errors
```json
{ "error": "Invalid type for key: user_id. Expected UUID, got string" }
```
Nested object errors
```json
{ "error": "Invalid type for key: data. Expected Object, got string" }
{ "error": "Invalid type for key: nested_field. Expected Int64, got string" }
```
Worked examples
Example 1: Type mismatch (`impressions` sent as a string instead of an integer)
Request:
```json
{
  "data": { "campaign_id": "sample_campaign_id_8", "impressions": "74" },
  "timestamp": "2025-06-28 23:44:47",
  "customer_id": "c03"
}
```
Response:
```json
{ "error": "Invalid type for key: impressions. Expected Int64, got string" }
```
Fix: Send `74` (integer), not `"74"` (string).
Example 2: Missing required key
Request:
```json
{
  "data": { "campaign_id": "sample_campaign_id_8" },
  "timestamp": "2025-06-28 23:44:47",
  "customer_id": "c03"
}
```
Response:
```json
{ "error": "Missing key: impressions" }
```
Fix: Include all required fields defined in your metric schema.
Example 3: Unexpected key in payload
Request:
```json
{
  "data": { "campaign_id": "sample_campaign_id_8", "impressions": 74, "extra_field": "not_allowed" },
  "timestamp": "2025-06-28 23:44:47",
  "customer_id": "c03"
}
```
Response:
```json
{ "error": "Unexpected key in payload: extra_field" }
```
Fix: Remove any fields not defined in your metric schema.
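The messages in these examples follow a few fixed patterns, so a DLQ drain job can parse them into structured fields for triage (for example, grouping failures by key). The regular expressions below are derived only from the messages shown in this guide; other message formats may exist, and unrecognized messages return `None` so they fall through to manual review. `ParsedValidationError` and `parse_validation_error` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ParsedValidationError(NamedTuple):
    kind: str                      # "missing_key", "unexpected_key", or "invalid_type"
    key: str
    expected: Optional[str] = None
    got: Optional[str] = None


# Patterns derived from the example messages in this guide; other shapes may exist.
_PATTERNS = [
    ("missing_key", re.compile(r"^Missing key: (?P<key>\S+)$")),
    ("unexpected_key", re.compile(r"^Unexpected key in payload: (?P<key>\S+)$")),
    ("invalid_type", re.compile(
        r"^Invalid type for key: (?P<key>[^.]+)\. Expected (?P<expected>\S+), got (?P<got>\S+)$")),
]


def parse_validation_error(message: str) -> Optional[ParsedValidationError]:
    """Turn a 422 error message into structured fields, or None if no pattern matches."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(message)
        if match:
            return ParsedValidationError(kind=kind, **match.groupdict())
    return None
```

For a `422` response body, call `parse_validation_error(json.loads(body)["error"])`.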