Media Analysis Decisions

These are the high-level decisions behind the WhatsApp media-analysis implementation.

ZIP Exports Are The Media Input

Beacon supports media-bearing imports through WhatsApp ZIP exports only. Plain .txt and .chat uploads remain supported for text-only ingestion, but they do not create media records because the attachment bytes are not present.

Rationale:

  • WhatsApp ZIP exports provide the transcript and attachment files in one canonical package.
  • Supporting loose media uploads would require a separate identity/matching workflow and would carry a higher risk of incorrect sender/date linkage.
  • Keeping media import scoped to ZIPs reduces implementation risk and preserves backward compatibility.
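
The input split above can be sketched as a small routing function. This is a minimal illustration that assumes uploads are classified by file extension; the type name UploadKind and the function classifyUpload are hypothetical and are not taken from the Beacon codebase.

```typescript
// Hypothetical sketch: decide which ingestion path an upload takes.
// Assumption: classification is by file extension only.
type UploadKind = "zip_with_media" | "text_only" | "unsupported";

function classifyUpload(filename: string): UploadKind {
  const lower = filename.toLowerCase();
  // ZIP exports carry both the transcript and the attachment bytes.
  if (lower.endsWith(".zip")) return "zip_with_media";
  // Plain transcripts are still ingested, but create no media records.
  if (lower.endsWith(".txt") || lower.endsWith(".chat")) return "text_only";
  return "unsupported";
}
```

Under this sketch, only the "zip_with_media" path would ever create media records; a loose image upload falls through to "unsupported".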

Media Is Additive To Text Summaries

Existing text parsing, dedupe, daily digests, and weekly summaries remain the primary Beacon pipeline. Media analysis enriches summaries through optional aggregate fields and does not replace text analysis.

Rationale:

  • Text exports are already the reliable source of community signal.
  • Media analysis can be incomplete because of quotas, unsupported files, missing attachments, or model failures.
  • Optional media fields allow public consumers to ignore media without breaking existing integrations.
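
As an illustration of why optional fields keep existing integrations working, the sketch below models a digest whose media block may be absent. The interface and field names (DailyDigest, MediaAggregate, totalCount, and so on) are assumptions made for this example, not the actual Beacon response shape.

```typescript
// Hypothetical shapes: the media block is optional, so a consumer written
// before media analysis existed keeps working unchanged.
interface MediaAggregate {
  totalCount: number;
  tags: string[];
  themes: string[];
  confidence: number;
}

interface DailyDigest {
  date: string;
  summary: string;
  media?: MediaAggregate; // absent for text-only exports or incomplete analysis
}

function renderDigest(digest: DailyDigest): string {
  const base = `${digest.date}: ${digest.summary}`;
  // Text remains the primary signal; media only enriches the line.
  if (!digest.media) return base;
  const tags = digest.media.tags.join(", ");
  return `${base} (${digest.media.totalCount} media items; tags: ${tags})`;
}
```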

Privacy Defaults Favor Aggregates

Media tables store sender_hash, not raw sender names. Public APIs expose aggregate media counts, tags, themes, and confidence only. Raw attachment bytes remain private in R2.

Rationale:

  • Beacon’s existing privacy posture avoids exposing raw resident identity.
  • Media can contain more sensitive content than text metadata.
  • A public media browser would need a separate retention, consent, and access-control review.
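
A minimal sketch of the aggregate-only projection follows. It assumes a private row shape with senderHash and r2Key fields and a public shape with counts, tags, and a mean confidence; all of these names are hypothetical. The point is only that private fields never appear in the public object.

```typescript
// Hypothetical private row: never returned by a public endpoint.
interface MediaRow {
  senderHash: string;
  r2Key: string;
  kind: "image" | "audio" | "video";
  tags: string[];
  confidence: number;
}

// Hypothetical public shape: aggregates only.
interface PublicMediaSummary {
  counts: Record<string, number>;
  tags: string[];
  meanConfidence: number | null;
}

function toPublicSummary(rows: MediaRow[]): PublicMediaSummary {
  const counts: Record<string, number> = {};
  const tagSet = new Set<string>();
  let confidenceSum = 0;
  for (const row of rows) {
    counts[row.kind] = (counts[row.kind] || 0) + 1;
    for (const tag of row.tags) tagSet.add(tag);
    confidenceSum += row.confidence;
  }
  return {
    counts,
    tags: Array.from(tagSet).sort(),
    // null rather than NaN when there is nothing to average
    meanConfidence: rows.length > 0 ? confidenceSum / rows.length : null,
  };
}
```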

Analysis Runs Asynchronously

ZIP upload, transcript ingestion, and attachment analysis are separated through the existing queue. Upload completion means the browser sent the file to the ingest Worker; it does not mean media analysis is complete.

Rationale:

  • ZIP exports can be large and model calls can be slow.
  • Queue processing keeps admin uploads responsive and prevents one failed attachment from failing the whole export.
  • Replay endpoints can re-run media analysis without re-uploading the export.
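
The failure-isolation point can be sketched as a loop that records a per-attachment outcome instead of letting one exception abort the batch. This is an assumption-laden illustration: the function and status names are hypothetical, and the analyzer is passed as a synchronous callback only to keep the sketch short (real model calls would be awaited).

```typescript
// Hypothetical per-attachment outcome recorded by a queue consumer.
interface AttachmentResult {
  name: string;
  status: "completed" | "failed";
  error?: string;
}

function analyzeAttachments(
  names: string[],
  analyze: (name: string) => void
): AttachmentResult[] {
  const results: AttachmentResult[] = [];
  for (const name of names) {
    try {
      analyze(name);
      results.push({ name, status: "completed" });
    } catch (err) {
      // One bad attachment is recorded and skipped; the export continues.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name, status: "failed", error: message });
    }
  }
  return results;
}
```

Because failures are stored per attachment, a replay can target only the failed rows rather than the whole export.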

Cheap-First Model Strategy

Images run broad Stage 1 analysis with lower-cost Workers AI models. Stage 2 vision-caption calls are selective and quota-capped. Audio transcription is allowed for supported audio files. Video v1 stores metadata only.

Rationale:

  • The $5 Workers plan and Workers AI pricing make broad, low-cost image/audio processing viable, but running vision-LLM calls on every image is unnecessary and costs more.
  • Stage 1 output is usually enough for summary-level tags and counts.
  • Stage 2 should be used only when the Stage 1 output is low-confidence or when a caption would be useful for summary enrichment.
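
One way to express the selective Stage 2 rule is a small gate that checks the remaining quota first and then the two triggers named above. The 0.6 threshold, the Stage1Result shape, and the wantsCaption flag are assumptions for illustration, not values taken from Beacon.

```typescript
// Hypothetical Stage 1 output: highest classifier/detector confidence.
interface Stage1Result {
  topConfidence: number;
}

function shouldRunStage2(
  stage1: Stage1Result,
  remainingStage2Calls: number,
  wantsCaption: boolean,
  minConfidence: number = 0.6 // assumed threshold, tune per deployment
): boolean {
  // The quota cap always wins: no budget, no Stage 2 call.
  if (remainingStage2Calls <= 0) return false;
  // Otherwise escalate only for low-confidence output or summary enrichment.
  return stage1.topConfidence < minConfidence || wantsCaption;
}
```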

Video V1 Is Metadata-Only

Beacon v1 does not perform frame-level video understanding or Worker-side audio extraction from videos. Video attachments are indexed, counted, and stored with metadata.

Rationale:

  • Workers do not provide a simple native video frame/audio extraction workflow.
  • Sending raw video directly to audio transcription is not reliable.
  • Scene understanding and keyframe extraction should be a separate, explicit phase if needed later.

Quotas Prefer Partial Success And Deferred Resume

Media quotas cap images per batch/export window, Stage 2 calls, audio minutes, and video minutes, and they restrict supported MIME types and maximum file size. When the current extraction/analysis budget is reached, supported in-size media is marked as deferred and resumed from the original ZIP by scheduled queue jobs. Terminal skipped statuses are reserved for files that are missing, unsupported, oversized, or fail analysis in a non-resumable way.

Deferred media resume is intentionally per-kind and scheduler-gated. The scheduler checks image, audio, and video budgets separately, requeues stale pending_analysis rows, and avoids no-op jobs when the remaining backlog is blocked by caps. Media bypass is the explicit operator control for draining a large deferred backlog faster while still enforcing hard safety limits such as MIME allowlists and max file size.

Recoverable failures are self-healed in place. The scheduler retries failed text exports and failed media rows when the relevant quota is available, bounded by an attempt counter stored in error_message. Optional Stage 2 image captioning must not make the whole image artifact fail; Stage 1 classifier/object-detection output is sufficient for a completed media row when the caption model is unavailable.
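
A bounded retry counter kept inside error_message could look like the sketch below. The "[attempt N]" prefix format and the limit of 3 are purely hypothetical; the source only states that an attempt counter is stored in error_message, not how it is encoded.

```typescript
// Assumed limit; the real bound is not specified in this document.
const MAX_ATTEMPTS = 3;

// Assumed encoding: error_message starts with "[attempt N]".
function parseAttempts(errorMessage: string | null): number {
  if (!errorMessage) return 0;
  const match = /^\[attempt (\d+)\]/.exec(errorMessage);
  return match ? Number(match[1]) : 0;
}

// Build the next error_message, incrementing the stored counter.
function recordFailure(previous: string | null, reason: string): string {
  const next = parseAttempts(previous) + 1;
  return `[attempt ${next}] ${reason}`;
}

// The scheduler would retry only while the counter is under the bound.
function canRetry(errorMessage: string | null): boolean {
  return parseAttempts(errorMessage) < MAX_ATTEMPTS;
}
```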

Rationale:

  • Beacon should continue producing text summaries even when media coverage is incomplete.
  • Admins need an audit trail that separates resumable deferred work from terminal skipped media.
  • Separate media usage accounting lets media budgets be tuned without affecting text-summary quotas.
  • Large ZIP exports should not require re-uploading just because the first Worker batch intentionally analyzed only a bounded subset.
  • Text quota bypass and media quota bypass are separate controls. This prevents an emergency digest replay from automatically burning through a large media backlog, and lets admins intentionally drain media without disabling text-summary safeguards.
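
The distinction between resumable deferred work and terminal skips can be sketched as a triage function. The status strings, field names, and the bypassBudget flag are assumptions for this example. Note that in this sketch the bypass only lifts the budget check, while the MIME allowlist and size limit still apply.

```typescript
// Hypothetical status names; "deferred" is resumable, "skipped_*" is terminal.
type Triage =
  | "analyze"
  | "deferred"
  | "skipped_missing"
  | "skipped_unsupported"
  | "skipped_oversized";

interface Candidate {
  mime: string;
  sizeBytes: number;
  presentInZip: boolean;
}

interface Limits {
  allowedMime: string[];
  maxBytes: number;
  remainingBudget: number;
  bypassBudget: boolean; // operator-controlled media bypass
}

function triage(c: Candidate, limits: Limits): Triage {
  // Hard safety limits come first and are never bypassed.
  if (!c.presentInZip) return "skipped_missing";
  if (limits.allowedMime.indexOf(c.mime) === -1) return "skipped_unsupported";
  if (c.sizeBytes > limits.maxBytes) return "skipped_oversized";
  // Budget exhaustion defers supported, in-size media for a later resume.
  if (limits.remainingBudget <= 0 && !limits.bypassBudget) return "deferred";
  return "analyze";
}
```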

Large ZIPs Use Worker-Mediated R2 Multipart Upload

Small uploads still use the existing POST /upload Worker request path. Large WhatsApp ZIPs use a Worker-mediated R2 multipart flow: initialize an R2 multipart upload, send sequential browser chunks to the Worker, complete the R2 object, then create the exports row and enqueue the same ingest pipeline.

Rationale:

  • A single 1 GB+ ZIP cannot reliably pass through the Worker as one request body.
  • Chunking keeps each request around 16 MB while preserving one canonical ZIP object in R2.
  • Ingest reads large ZIPs by R2 byte range rather than buffering the entire archive. It reads the central directory first, then only the transcript and attachment entries that are within the media file-size quota.
  • The Worker remains the authorization boundary, so Cloudflare Access and existing admin checks still protect upload initiation, part upload, completion, and abort.
  • Multipart completion is the only point that creates the export record and queues ingest, which prevents partial archives from being processed.
  • Browser-tab-close resume is deferred. If uploads regularly fail mid-transfer, the next change should add persisted multipart session recovery or direct browser-to-R2 signed upload URLs.
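
The chunking step can be illustrated with a helper that plans sequential part ranges for a file of a given size. This is a sketch under assumptions: the 16 MiB part size mirrors the "around 16 MB" figure above, part numbers start at 1 as in S3-style multipart uploads, and the names planParts and PartRange are hypothetical. In a browser, each range could be cut with file.slice(start, end) before being sent to the Worker.

```typescript
// Assumed part size, matching the "around 16 MB" request size.
const PART_SIZE = 16 * 1024 * 1024;

interface PartRange {
  partNumber: number; // 1-based
  start: number;      // inclusive byte offset
  end: number;        // exclusive byte offset
}

function planParts(totalBytes: number, partSize: number = PART_SIZE): PartRange[] {
  if (totalBytes <= 0 || partSize <= 0) {
    throw new RangeError("totalBytes and partSize must be positive");
  }
  const parts: PartRange[] = [];
  let partNumber = 1;
  for (let start = 0; start < totalBytes; start += partSize) {
    // Every part is full-sized except possibly the last one.
    const end = Math.min(start + partSize, totalBytes);
    parts.push({ partNumber, start, end });
    partNumber += 1;
  }
  return parts;
}
```

Only after the last planned part has been uploaded would the client call completion, which matches the rule that completion is the single point that creates the export record.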

Schema Changes Are Additive

The media migration adds export_media, media_analysis, media_ai_quota_usage, indexes, and media quota settings. It does not alter existing digest, message, or export tables in a destructive way.

Rationale:

  • Existing text-only behavior should keep working.
  • The migration can be safely re-run because it uses idempotent DDL/default inserts.
  • Rollback can be operational: disable ZIP/media uploads without dropping data.

Admin Controls Come Before Public Media Browsing

Beacon exposes admin media listing, detail/replay behavior, status, quota inspection, and aggregate public summary fields. It does not expose a public attachment browser.

Rationale:

  • Admins need observability and replay controls for queued media jobs.
  • Public consumers need summary-level signal, not raw attachment access.
  • Raw media access has separate privacy and moderation implications.