Every uploaded export has a status field in the exports D1 table that drives all processing decisions.
State Machine
                              ┌─────────────────┐
                              │ pending_source  │ ← no source_id assigned
                              └────────┬────────┘
                                       │ source assigned (UI or /admin/exports/assign-source)
                                       ▼
    upload → ─────────────────────► queued
                                       │ queue consumer picks up message
                                       ▼
                                   processing
                                    /      \
                   quota exhausted /        \ all days processed
                                  ▼          ▼
                        (re-enqueued    weekly_processing
                         with delay)            │ weekly summaries generated
                                                ▼
                                            completed

    At any stage: ──────────────────────► failed
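
The diagram above can be read as a transition table. The following is a minimal sketch of that table, not the worker's actual data structure: the names `ExportStatus`, `TRANSITIONS`, and `canTransition` are hypothetical. It treats `completed` as terminal and leaves out the quota-exhausted branch, because the diagram does not say whether the status changes when the message is re-enqueued with a delay.

```typescript
// Status values taken from the state machine diagram.
type ExportStatus =
  | "pending_source"
  | "queued"
  | "processing"
  | "weekly_processing"
  | "completed"
  | "failed";

// Hypothetical transition table derived from the diagram.
// Assumption: "At any stage -> failed" covers every non-terminal status.
const TRANSITIONS: Record<ExportStatus, readonly ExportStatus[]> = {
  pending_source: ["queued", "failed"],
  queued: ["processing", "failed"],
  processing: ["weekly_processing", "failed"],
  weekly_processing: ["completed", "failed"],
  completed: [], // treated as terminal in this sketch
  failed: ["queued"], // auto-retry or manual replay
};

// Returns true when the diagram allows moving from `from` to `to`.
function canTransition(from: ExportStatus, to: ExportStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A guard like this could sit in front of any status update so that an unexpected jump (for example `pending_source` straight to `processing`) is rejected instead of silently written. Manual overrides through D1 bypass it by design.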
Status Definitions
| Status | Meaning |
|---|---|
| pending_source | Uploaded but no source_id assigned; not processed. |
| queued | In queue or waiting to be picked up. |
| processing | Queue consumer is actively parsing and generating daily digests. |
| weekly_processing | Daily digests complete; generating weekly summaries. |
| completed | All digests and summaries generated successfully. |
| failed | Processing failed; error_message contains details and retry count. |
Transition Conditions
queued → processing
Queue consumer reads the export from R2, validates source_id, and updates status.
processing → weekly_processing
All days in the export have been processed (days_processed = days_total). The pipeline then queues weekly summary generation.
weekly_processing → completed
finalizeCompletedWeeklyExports() (cron) detects that weeks_processed >= weeks_total or that weekly work is already complete and sets status = 'completed'.
* → failed
Any unhandled error during processing sets status to failed and records the error in error_message.
failed → queued (auto-retry)
retryRecoverableFailedExports() (cron) checks failed exports every 5 minutes. If the error is recoverable (quota, transient network, etc.) and AUTO_RETRY_MAX_ATTEMPTS has not been reached, the export is reset to queued and re-enqueued.
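
The auto-retry decision described above can be sketched as a pure function. This is an illustration under assumptions, not the body of retryRecoverableFailedExports(): the `FailedExport` shape, the error patterns, and the limit of 3 are hypothetical, since this document does not give the real AUTO_RETRY_MAX_ATTEMPTS value or the exact recoverability rules.

```typescript
// Hypothetical view of a failed export row; field names are assumptions.
interface FailedExport {
  status: string;
  errorMessage: string;
  retryCount: number;
}

// Assumed value; the real AUTO_RETRY_MAX_ATTEMPTS is not stated in this document.
const AUTO_RETRY_MAX_ATTEMPTS = 3;

// Assumed patterns standing in for "quota, transient network, etc.".
const RECOVERABLE_PATTERNS: readonly RegExp[] = [/quota/i, /timeout/i, /network/i];

// Returns true when a failed export should be reset to queued and re-enqueued.
function shouldAutoRetry(
  exp: FailedExport,
  maxAttempts: number = AUTO_RETRY_MAX_ATTEMPTS,
): boolean {
  if (exp.status !== "failed") return false; // only failed exports are candidates
  if (exp.retryCount >= maxAttempts) return false; // retry budget exhausted
  return RECOVERABLE_PATTERNS.some((pattern) => pattern.test(exp.errorMessage));
}
```

Keeping the decision separate from the D1 update and the queue send makes it easy to unit-test the recoverability rules on their own.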
Cron Retry Logic
The cron (*/5 * * * *) uses this query to find stuck exports:
SELECT * FROM exports
WHERE status IN ('processing', 'weekly_processing', 'queued')
AND (days_processed < days_total OR weeks_processed < weeks_total OR status = 'queued')
AND minutes_since_update >= 10

Important: This query only matches exports in processing, weekly_processing, or queued (failed exports are handled separately by retryRecoverableFailedExports()). Exports with status = 'completed' are never automatically retried, even if they have zero daily digests (e.g., because quota was exhausted after writing hashes but before generating digests). To re-trigger a completed export with missing digests, reset its status to processing via D1 or use the /replay/export endpoint.
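
The conditions of the stuck-export query can be mirrored as an in-memory predicate, which is handy for tests. This is a sketch under assumptions: `ExportRow` and `isStuck` are hypothetical names, and `updated_at_ms` is an assumed timestamp column used to derive minutes_since_update.

```typescript
// Hypothetical row shape; only the columns used by the query are included.
interface ExportRow {
  status: string;
  days_processed: number;
  days_total: number;
  weeks_processed: number;
  weeks_total: number;
  updated_at_ms: number; // assumed: last update time in epoch milliseconds
}

const STUCK_STATUSES: ReadonlySet<string> = new Set([
  "processing",
  "weekly_processing",
  "queued",
]);

// Mirrors the three WHERE clauses of the query: status, remaining work, staleness.
function isStuck(row: ExportRow, nowMs: number, minAgeMinutes: number = 10): boolean {
  const minutesSinceUpdate = (nowMs - row.updated_at_ms) / 60000;
  const hasRemainingWork =
    row.days_processed < row.days_total ||
    row.weeks_processed < row.weeks_total ||
    row.status === "queued";
  return (
    STUCK_STATUSES.has(row.status) &&
    hasRemainingWork &&
    minutesSinceUpdate >= minAgeMinutes
  );
}
```

Note that a `completed` row never passes the status check, whatever its counters say, which is the behavior the warning above describes.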
Manual Recovery
Re-queue a stuck export
POST /replay/export
{ "export_id": "...", "community_id": "..." }
Reads the raw export from R2 and re-enqueues it. Fails if the raw R2 object has already been deleted.
Re-queue all exports for a community
POST /replay/all
{ "community_id": "...", "clear_first": false }
Skips exports whose raw R2 object is missing and reports how many were skipped.
Re-queue stuck exports manually
POST /replay/stuck
{ "community_id": "...", "max_age_minutes": 10 }
If max_age_minutes is omitted, the API layer defaults to a 30-minute staleness threshold.
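
The three replay requests share one shape, so a small builder can produce them. The sketch below only constructs the request; `ReplayRequest`, `HttpCall`, `buildReplayCall`, and the base URL are hypothetical, and any authentication the worker requires is not covered by this document and is omitted.

```typescript
// One variant per replay endpoint; fields match the JSON bodies shown above.
type ReplayRequest =
  | { kind: "export"; export_id: string; community_id: string }
  | { kind: "all"; community_id: string; clear_first?: boolean }
  | { kind: "stuck"; community_id: string; max_age_minutes?: number };

// Plain description of an HTTP call, independent of any HTTP client.
interface HttpCall {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST call for /replay/export, /replay/all, or /replay/stuck.
function buildReplayCall(baseUrl: string, req: ReplayRequest): HttpCall {
  const { kind, ...payload } = req; // `kind` selects the path and is not sent
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/replay/${kind}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

With a runtime that provides `fetch`, the result can be passed along as `fetch(call.url, call)`. Leaving out `max_age_minutes` lets the server-side default apply.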
Raw Export Retention
Raw exports stored in R2 are subject to automatic cleanup:
| Scenario | Default retention |
|---|---|
| Completed processing, no pending media | 72 hours after completion |
| Failed processing | 168 hours after failure |
| Pending media still attached | Held until media is complete or quota clears |
Cleanup runs in batches of 75 per cron tick (raised from 25 in April 2026). Once the raw object is deleted, replay from R2 is no longer possible.
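
The retention table and batch limit can be expressed as a selection function. This is a minimal sketch, assuming each raw export carries its completion or failure time and a pending-media flag: `RawExportState`, `isEligibleForCleanup`, and `selectCleanupBatch` are hypothetical names. It also assumes that pending media holds failed exports as well as completed ones, and that an export becomes eligible exactly at the retention boundary.

```typescript
// Hypothetical summary of a raw export's cleanup-relevant state.
interface RawExportState {
  status: "completed" | "failed";
  finishedAtMs: number; // completion or failure time, epoch milliseconds
  hasPendingMedia: boolean;
}

// Defaults from the retention table above.
const COMPLETED_RETENTION_HOURS = 72;
const FAILED_RETENTION_HOURS = 168;
const CLEANUP_BATCH_SIZE = 75;
const MS_PER_HOUR = 60 * 60 * 1000;

// True when the raw R2 object has outlived its retention window.
function isEligibleForCleanup(e: RawExportState, nowMs: number): boolean {
  if (e.hasPendingMedia) return false; // held until media completes or quota clears
  const retentionHours =
    e.status === "completed" ? COMPLETED_RETENTION_HOURS : FAILED_RETENTION_HOURS;
  return nowMs - e.finishedAtMs >= retentionHours * MS_PER_HOUR;
}

// Picks at most one batch of eligible exports for a single cron tick.
function selectCleanupBatch(all: readonly RawExportState[], nowMs: number): RawExportState[] {
  return all.filter((e) => isEligibleForCleanup(e, nowMs)).slice(0, CLEANUP_BATCH_SIZE);
}
```

Because deletion makes replay impossible, a check like `isEligibleForCleanup` is also a quick way to tell whether /replay/export is still likely to find the raw object.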
Source Assignment
Exports uploaded without a source_id are immediately set to pending_source and are not processed until a source is assigned. If the community has exactly one source configured in chat_sources, the worker auto-assigns it at upload time. Otherwise, use the admin UI or /admin/exports/assign-source.
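
The upload-time decision can be sketched as follows. This is an illustration only: `UploadDecision` and `decideSource` are hypothetical names, the list of source IDs stands in for a lookup against chat_sources, and the sketch assumes an export with a known source goes straight to queued, as the state machine suggests.

```typescript
// Outcome of the upload-time source check.
interface UploadDecision {
  status: "queued" | "pending_source";
  sourceId: string | null;
}

// Decides the initial status from the uploaded source_id (if any) and the
// community's configured sources.
function decideSource(
  uploadedSourceId: string | null,
  communitySourceIds: readonly string[],
): UploadDecision {
  if (uploadedSourceId) {
    return { status: "queued", sourceId: uploadedSourceId };
  }
  const [onlySource] = communitySourceIds;
  if (communitySourceIds.length === 1 && onlySource !== undefined) {
    // Exactly one source configured: auto-assign it.
    return { status: "queued", sourceId: onlySource };
  }
  // Zero or several sources: wait for a manual assignment.
  return { status: "pending_source", sourceId: null };
}
```

Auto-assignment is deliberately limited to the single-source case, since with two or more sources any automatic choice would be a guess.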
Progress Tracking
| Column | Meaning |
|---|---|
| days_total | Number of unique dates in the export file |
| days_processed | Days for which daily digests have been written |
| weeks_total | Number of weeks containing daily data |
| weeks_processed | Weeks for which weekly summaries have been written |
days_processed = days_total does not guarantee that all daily digests exist; it only means the processing loop completed. The pipeline queries daily_digests directly to determine which days still need work. See ingest for the duplicate-only path.
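
Determining remaining work from daily_digests rather than from the counters amounts to a set difference. The sketch below assumes dates are ISO `YYYY-MM-DD` strings and that the digest dates have already been fetched; `daysNeedingDigests` is a hypothetical helper, not a function from the pipeline.

```typescript
// Returns the export's unique dates that have no daily digest yet, in
// chronological order (ISO dates sort correctly as plain strings).
function daysNeedingDigests(
  exportDates: readonly string[],
  existingDigestDates: ReadonlySet<string>,
): string[] {
  const uniqueDates = Array.from(new Set(exportDates));
  return uniqueDates.filter((date) => !existingDigestDates.has(date)).sort();
}
```

An empty result means every day has a digest. A non-empty result on an export whose counters already match is exactly the case the paragraph above warns about.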