Scope
Beacon can ingest WhatsApp ZIP exports that contain _chat.txt plus attachments. Text-only .txt uploads remain supported and continue through the original parser/dedupe/digest path.
Media analysis is intentionally additive:
- Existing text messages remain the primary source for daily and weekly summaries.
- Media results are stored as metadata and added to summary analysis_json.media.
- Public APIs expose only aggregate media fields, not raw files or raw sender names.
ZIP Ingestion Flow
flowchart LR
    Admin[Admin Upload] -->|POST /upload .zip| Ingest[Pulse Ingest Worker]
    Ingest -->|store raw zip| R2[R2 beacon-pulse-exports]
    Ingest -->|extract _chat.txt| Parser[WhatsApp Parser]
    Ingest -->|store attachments| R2
    Parser -->|messages| Dedupe[Dedupe + Daily Digest]
    Ingest -->|export_media rows| D1[D1 beacon-pulse-db]
    Ingest -->|media_analysis jobs| Queue[beacon-pulse-uploads]
    Queue --> Analyzer[Media Analyzer]
    Analyzer -->|media_analysis rows| D1
    Analyzer -->|summary enrichment| Digests[Daily + Weekly Summaries]
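Extracting _chat.txt without buffering the whole ZIP starts at the end of the archive. Below is a minimal sketch of that first range-based step. It assumes a classic (non-ZIP64) archive: fetch only the file tail (in a Worker this could be an R2 ranged read), find the End of Central Directory (EOCD) record, and read where the central directory starts. The function names are hypothetical and not Beacon's code. ZIP64 archives (over 4 GiB or more than 65,535 entries) need the ZIP64 EOCD locator, which this sketch does not handle.

```typescript
// Sketch only: locate the ZIP End of Central Directory (EOCD) record in the
// tail bytes of an archive. Assumes a classic (non-ZIP64) ZIP.
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06", read little-endian
const EOCD_MIN_SIZE = 22; // fixed-size part of the EOCD record
const MAX_COMMENT_LENGTH = 0xffff; // archive comment is at most 65,535 bytes

interface TailRange {
  offset: number;
  length: number;
}

interface EocdInfo {
  totalEntries: number;
  centralDirSize: number;
  centralDirOffset: number;
}

// Smallest tail of the file that is guaranteed to contain the EOCD record.
function eocdTailRange(fileSize: number): TailRange {
  const length = Math.min(fileSize, EOCD_MIN_SIZE + MAX_COMMENT_LENGTH);
  return { offset: fileSize - length, length };
}

// Scan backwards for the signature. Accept a match only if its comment
// length makes the record end exactly at the end of the tail.
function parseEocd(tail: Uint8Array): EocdInfo {
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  for (let i = tail.byteLength - EOCD_MIN_SIZE; i >= 0; i--) {
    if (view.getUint32(i, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(i + 20, true);
    if (i + EOCD_MIN_SIZE + commentLength !== tail.byteLength) continue;
    return {
      totalEntries: view.getUint16(i + 10, true),
      centralDirSize: view.getUint32(i + 12, true),
      centralDirOffset: view.getUint32(i + 16, true),
    };
  }
  throw new Error("EOCD not found: not a ZIP, trailing data, or ZIP64");
}
```

The returned offset and size give the next ranged read: the central directory, which lists each entry (including _chat.txt) with its local header offset and compressed size.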
Upload UX
The admin upload form accepts .txt, .chat, and .zip files. ZIP uploads show byte-level progress in the browser and a ZIP-specific note telling the admin to keep the tab open until the upload reaches 100%.
Important behavior:
- The upload progress bar tracks transfer to Beacon storage only.
- Small files use the existing POST /upload request path.
- Large ZIP files use R2 multipart upload through the ingest Worker in chunks of roughly 16 MB.
- Large ZIP parsing is R2 range-based. The Worker does not read the full ZIP into memory; it reads only the central directory, the transcript, and the attachments in the current quota-eligible batch.
- Transcript processing, attachment indexing, and media analysis continue asynchronously after upload completes.
- Media that is over the current analysis batch budget is deferred and resumed by scheduled queue jobs instead of requiring a new upload.
- Multipart uploads cannot yet be resumed after the browser tab closes; restart the upload if the tab closes before completion.
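Here is a sketch of the chunking and byte-level progress described above. It assumes a part size of exactly 16 MiB (the doc says "roughly 16 MB") and 1-based part numbers as in S3-style multipart uploads. planParts and uploadProgress are hypothetical names, not Beacon's actual client code.

```typescript
// Sketch: split a large ZIP into fixed-size multipart chunks.
// Assumption: exactly 16 MiB per part; every part except the last is full-size.
const CHUNK_BYTES = 16 * 1024 * 1024;

interface PartPlan {
  partNumber: number; // 1-based, S3-style
  start: number; // inclusive byte offset
  end: number; // exclusive byte offset
}

function planParts(totalBytes: number, chunkBytes: number = CHUNK_BYTES): PartPlan[] {
  if (totalBytes <= 0 || chunkBytes <= 0) {
    throw new Error("totalBytes and chunkBytes must be positive");
  }
  const parts: PartPlan[] = [];
  for (let start = 0, partNumber = 1; start < totalBytes; start += chunkBytes, partNumber++) {
    parts.push({ partNumber, start, end: Math.min(start + chunkBytes, totalBytes) });
  }
  return parts;
}

// Byte-level progress: bytes of fully uploaded parts plus bytes already sent
// for the part in flight (for example from an XHR upload progress event).
function uploadProgress(parts: PartPlan[], completedParts: number, inFlightBytes: number): number {
  const total = parts.length === 0 ? 0 : parts[parts.length - 1].end;
  let done = inFlightBytes;
  for (let i = 0; i < completedParts; i++) done += parts[i].end - parts[i].start;
  return total === 0 ? 0 : Math.min(1, done / total);
}
```

In the browser, each planned part maps directly to file.slice(part.start, part.end). Because progress is computed from bytes sent, it reaches 100% when the transfer finishes, not when processing finishes.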
Stored Data
export_media
One row per attachment reference in the WhatsApp transcript.
Important fields:
- media_id: generated ID for the attachment record.
- export_id, community_id, source_id: export and scope linkage.
- message_timestamp: timestamp from the transcript line when available.
- sender_hash: hashed sender only; raw names are not stored.
- object_key: extracted attachment location in R2.
- filename, mime_type, media_kind, byte_size: file metadata.
- status: pending_analysis, deferred_quota, deferred_processing, analyzed, missing_attachment, unsupported, skipped_size, skipped_quota, or failed.
media_analysis
Normalized model output per media item.
Important fields:
- classifier_tags_json: image classifier tags or derived transcript tags.
- objects_json: object detector output for images.
- caption: optional short caption or video metadata note.
- transcript: voice-note transcription.
- confidence: coarse confidence score for aggregate use.
- model_summary_json: private model provenance and raw-ish structured output for admin/debug use. Video frame results are stored under model_summary_json.video.
media_ai_quota_usage
Per-day quota tracking for media analysis. One row per usage_date.
Important fields:
- image_stage1_calls: Stage 1 classifier/object-detection calls on images.
- image_stage2_calls: Stage 2 vision caption calls on images.
- audio_minutes: total audio transcription minutes consumed.
- video_minutes: video metadata processing minutes.
- video_frame_extractions: number of video frames extracted for analysis.
- video_frame_stage1_calls: Stage 1 calls on extracted video frames.
- video_frame_stage2_calls: Stage 2 caption calls on video frames.
- neurons: Cloudflare Workers AI neuron units consumed.
Model Behavior
Images
- Stage 1 uses Cloudflare Workers AI image classification and object detection.
- Stage 2 vision captioning is selective and capped by media config.
- Oversized or unsupported images are skipped without failing export processing.
- Images outside the current batch budget stay indexed as deferred work and can be resumed later from the original ZIP.
Voice Notes
- Supported audio files are sent to Workers AI transcription.
- Transcript-derived terms can enrich daily/weekly summaries as voice_note_themes.
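This document does not specify how transcript-derived terms become voice_note_themes. One plausible approach is document frequency across a period's voice-note transcripts, with a stopword list. The sketch below is hypothetical: the stopword list, the minimum term length, topN, and the function name are all assumptions.

```typescript
// Hypothetical sketch: derive coarse voice-note themes by counting terms
// across transcripts. Only aggregate terms are returned; no speaker or
// per-message information leaves this function.
const STOPWORDS = new Set([
  "the", "and", "for", "that", "this", "with", "you", "are", "was", "have",
  "just", "but", "not", "what", "all", "can", "will", "our", "out", "get",
]);

function voiceNoteThemes(transcripts: string[], topN = 5, minLength = 3): string[] {
  const counts = new Map<string, number>();
  for (const text of transcripts) {
    // Count each term at most once per voice note so one long note
    // cannot dominate the themes.
    const terms = new Set(
      text
        .toLowerCase()
        .split(/[^\p{L}\p{N}]+/u)
        .filter((t) => t.length >= minLength && !STOPWORDS.has(t)),
    );
    for (const term of terms) counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  // Most frequent first; ties broken alphabetically for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN)
    .map(([term]) => term);
}
```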
Videos
- v1.5 stores video metadata and attempts sampled frame analysis when Cloudflare Media Transformations is available.
- The Worker samples a small number of private R2 video frames, runs the same cheap image classifier/object detector used for images, and optionally captions one representative frame when cheap labels are sparse.
- Video frame outputs are aggregate-only and stored in model_summary_json.video, with these limitations: sampled frames only, no full motion analysis, and no identity inference.
- If frame extraction fails, video analysis falls back to metadata-only without failing the export.
- v1.5 does not extract audio tracks from video files in the Worker; voice notes continue to use the audio transcription path.
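The sketch below shows frame selection for one video, using the defaults listed under Cost Controls (media_video_frame_times_json, media_max_frames_per_video, and the per-export media_max_video_frame_analyses_per_export budget). It parses the sample times, drops any at or beyond the video's duration, and caps the count. Two parts are assumptions and not documented behavior: the function names, and the fallback to a single "0s" frame for videos shorter than the first sample time.

```typescript
// Sketch: choose which sampled frame times to extract for one video.
// Defaults mirror the documented config values; the short-video fallback
// to "0s" is an assumption, not documented behavior.
const DEFAULT_FRAME_TIMES = ["1s", "5s", "10s", "20s", "30s", "45s", "60s", "90s"];

function parseSeconds(time: string): number {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(time);
  if (!match) throw new Error(`unsupported time format: ${time}`);
  return Number(match[1]);
}

function selectFrameTimes(
  durationSeconds: number,
  remainingExportFrames: number, // left under media_max_video_frame_analyses_per_export
  maxFramesPerVideo = 8, // media_max_frames_per_video
  frameTimes: string[] = DEFAULT_FRAME_TIMES, // media_video_frame_times_json
): string[] {
  const cap = Math.max(0, Math.min(maxFramesPerVideo, remainingExportFrames));
  if (cap === 0 || durationSeconds <= 0) return [];
  const inRange = frameTimes.filter((t) => parseSeconds(t) < durationSeconds);
  const chosen = inRange.length > 0 ? inRange : ["0s"];
  return chosen.slice(0, cap);
}
```

For example, a 25-second clip yields 1s, 5s, 10s, and 20s. An export whose frame budget is already spent yields no frames, so that video stays metadata-only.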
Public API Additions
Summary responses can include optional aggregate fields:
- media_counts
- media_tags
- voice_note_themes
- media_confidence
These fields are derived from analysis_json.media and are safe to omit when an export has no media or analysis is pending.
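The sketch below shows how a summary response might attach these optional fields. It assumes analysis_json.media has roughly the shape of the hypothetical MediaAnalysisSummary interface; the real schema may differ. Empty fields are omitted rather than sent as empty values, which keeps the payload valid when there is no media or analysis is still pending.

```typescript
// Hypothetical shape of analysis_json.media; the real schema may differ.
interface MediaAnalysisSummary {
  counts?: Record<string, number>; // e.g. { image: 3, audio: 1 }
  tags?: string[];
  voiceNoteThemes?: string[];
  confidence?: number;
}

// Public, aggregate-only fields. Every one is optional.
interface PublicMediaFields {
  media_counts?: Record<string, number>;
  media_tags?: string[];
  voice_note_themes?: string[];
  media_confidence?: number;
}

function publicMediaFields(media: MediaAnalysisSummary | undefined): PublicMediaFields {
  const out: PublicMediaFields = {};
  if (!media) return out; // no media, or analysis still pending
  // Drop zero counts so an empty object never reaches the client.
  const counts = Object.fromEntries(
    Object.entries(media.counts ?? {}).filter(([, n]) => n > 0),
  );
  if (Object.keys(counts).length > 0) out.media_counts = counts;
  if (media.tags && media.tags.length > 0) out.media_tags = media.tags;
  if (media.voiceNoteThemes && media.voiceNoteThemes.length > 0) {
    out.voice_note_themes = media.voiceNoteThemes;
  }
  if (typeof media.confidence === "number") out.media_confidence = media.confidence;
  return out;
}
```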
The public Pulse UI shows media enrichment indicators in two places:
- Daily calendar: a small blue dot at the bottom-left of any calendar day cell where media_counts is non-empty. Hovering shows a tooltip with the total item count, and the dot scales up on hover.
- Weekly history: a 📷 Media pill badge in the header row of each weekly summary entry that has analyzed media, visible without expanding the card.
Within expanded views, a compact “Media signals included” provenance card shows media counts, the confidence score, and voice-note themes where available. The indicator is aggregate-only; no raw files or sender names are exposed.
Media signals are stored as structured context and should not be appended to summary prose as inventory sentences. Weekly regeneration can pass aggregate media context into the AI analysis prompt so the narrative may weave media-derived evidence into the story naturally when it adds meaning.
Admin API Additions
The ingest Worker exposes admin-only media helpers:
- GET /files: list D1-backed export rows with text-processing progress and aggregate media-analysis counts.
- POST /upload/multipart/init: initialize a large ZIP upload and return the R2 object key, multipart upload ID, and chunk size.
- POST /upload/multipart/part: upload one ZIP chunk to the active R2 multipart session.
- POST /upload/multipart/complete: complete the R2 multipart object, create the export row, and enqueue ingest.
- POST /upload/multipart/abort: abort an incomplete R2 multipart upload.
- GET /media/list: list media records with optional export_id, community_id, source_id, and day_date filters.
- POST /media/replay: re-enqueue media analysis by media_id, or for all media in an export_id.
- GET /quota/status: includes media_status, text bypass state, media bypass state, effective media config, and media_models (the configured media model IDs and stage behavior shown in the admin System tab).
- GET /quota/bypass: returns separate text_bypass_enabled and media_bypass_enabled flags.
- POST /quota/bypass: accepts scope: "text" or scope: "media" to toggle bypasses independently.
- POST /quota/config: accepts media config keys in addition to existing text quota keys.
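The multipart endpoints above are meant to be called in order: init, one part request per chunk, then complete, with abort on failure. Below is a minimal client-side sketch of that sequence. This document does not specify the request and response shapes. The field names (objectKey, uploadId, chunkSize, partNumber, etag) and the injected send transport are assumptions.

```typescript
// Hypothetical transport: in a browser this would wrap fetch() with admin auth.
type Send = (path: string, body: unknown) => Promise<any>;

interface InitResponse {
  objectKey: string; // assumed field names for the init response
  uploadId: string;
  chunkSize: number;
}

async function uploadZipMultipart(bytes: Uint8Array, filename: string, send: Send): Promise<unknown> {
  const init: InitResponse = await send("/upload/multipart/init", { filename, size: bytes.byteLength });
  const { objectKey, uploadId, chunkSize } = init;
  const parts: { partNumber: number; etag: string }[] = [];
  try {
    if (!(chunkSize > 0)) throw new Error("invalid chunk size from init");
    for (let offset = 0, partNumber = 1; offset < bytes.byteLength; offset += chunkSize, partNumber++) {
      const chunk = bytes.subarray(offset, Math.min(offset + chunkSize, bytes.byteLength));
      const res = await send("/upload/multipart/part", { objectKey, uploadId, partNumber, chunk });
      parts.push({ partNumber, etag: res.etag });
    }
    // Per this doc, complete creates the export row and enqueues ingest.
    return await send("/upload/multipart/complete", { objectKey, uploadId, parts });
  } catch (err) {
    // Best effort: release the incomplete R2 multipart upload, then rethrow.
    await send("/upload/multipart/abort", { objectKey, uploadId }).catch(() => undefined);
    throw err;
  }
}
```

Injecting send keeps the sequencing testable without a network. A real browser client would also report per-part progress.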
Deferred Media Resume
Large exports can contain more attachments than Beacon should extract or analyze in one Worker batch. When that happens:
- The transcript/media reference row is still written to export_media.
- The row is marked deferred_quota when the attachment is supported and below the size limit but outside the current extraction budget.
- The scheduled Worker enqueues media_deferred_batch jobs only when there is available per-kind budget and matching deferred media.
- Deferred selection is per media kind. Images, audio, and videos are selected independently so a large image/video backlog cannot crowd out audio rows that still have available transcription budget.
- A deferred batch re-opens the original ZIP by R2 range, extracts selected filenames, stores those attachments under the existing media object keys, flips rows back to pending_analysis, and queues normal media_analysis jobs.
- The scheduled Worker also requeues stale pending_analysis rows older than about 15 minutes, capped per run, so queue drops do not leave media permanently stuck.
- Items stuck in deferred_processing for more than 5 minutes are automatically reset to deferred_quota by the same scheduled pass, so ZIP extraction timeouts do not permanently strand work.
- skipped_size, unsupported, and missing_attachment are terminal unless the config or source data changes.
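One scheduled pass can be sketched as a pure planning function over row snapshots. It resets deferred_processing rows older than 5 minutes and requeues pending_analysis rows older than about 15 minutes up to a per-run cap. It selects deferred_quota rows per media kind only where that kind still has budget, and it reports a blocked_cap kind when deferred work exists but nothing can be selected. Several pieces are assumptions: the row shape, the updatedAtMs column, the cap of 50 requeues per run, and the per-kind item budget. In Beacon these steps read and write D1 and the queue.

```typescript
type MediaKind = "image" | "audio" | "video";

interface MediaRow {
  mediaId: string;
  kind: MediaKind;
  status: string;
  updatedAtMs: number; // last status change (assumed column)
}

interface ScheduledPlan {
  resetToDeferred: string[]; // deferred_processing -> deferred_quota
  requeue: string[]; // stale pending_analysis -> new media_analysis job
  deferredBatch: Record<MediaKind, string[]>; // media_deferred_batch selection
  blocked: MediaKind[]; // deferred work waiting but no budget (blocked_cap)
}

const MINUTE_MS = 60_000;

function planScheduledPass(
  rows: MediaRow[],
  nowMs: number,
  budget: Record<MediaKind, number>, // remaining items per kind (assumed unit)
  maxRequeuePerRun = 50, // hypothetical per-run cap
): ScheduledPlan {
  const plan: ScheduledPlan = {
    resetToDeferred: [],
    requeue: [],
    deferredBatch: { image: [], audio: [], video: [] },
    blocked: [],
  };
  const waiting = new Set<MediaKind>();
  for (const row of rows) {
    const ageMs = nowMs - row.updatedAtMs;
    if (row.status === "deferred_processing" && ageMs > 5 * MINUTE_MS) {
      plan.resetToDeferred.push(row.mediaId);
    } else if (
      row.status === "pending_analysis" &&
      ageMs > 15 * MINUTE_MS &&
      plan.requeue.length < maxRequeuePerRun
    ) {
      plan.requeue.push(row.mediaId);
    } else if (row.status === "deferred_quota") {
      // Per-kind selection: an image backlog never consumes audio budget.
      waiting.add(row.kind);
      const batch = plan.deferredBatch[row.kind];
      if (batch.length < budget[row.kind]) batch.push(row.mediaId);
    }
  }
  // blocked_cap: deferred rows exist but nothing was selected for that kind,
  // so no no-op batch should be queued for it.
  plan.blocked = [...waiting].filter((kind) => plan.deferredBatch[kind].length === 0);
  return plan;
}
```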
The /files media progress count treats deferred_quota, deferred_processing, legacy skipped_quota, and pending_analysis as pending work. Terminal skipped counts only include size, unsupported, and missing-attachment outcomes.
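The counting rule above can be written as a small classifier over the export_media status values. The bucket names and functions are hypothetical, but the status-to-bucket mapping follows this section.

```typescript
type MediaStatus =
  | "pending_analysis" | "deferred_quota" | "deferred_processing" | "analyzed"
  | "missing_attachment" | "unsupported" | "skipped_size" | "skipped_quota" | "failed";

type ProgressBucket = "pending" | "analyzed" | "skipped" | "failed";

function progressBucket(status: MediaStatus): ProgressBucket {
  switch (status) {
    case "pending_analysis":
    case "deferred_quota":
    case "deferred_processing":
    case "skipped_quota": // legacy value, still counted as pending work
      return "pending";
    case "skipped_size":
    case "unsupported":
    case "missing_attachment":
      return "skipped"; // terminal unless config or source data changes
    case "analyzed":
      return "analyzed";
    case "failed":
      return "failed";
  }
}

function progressCounts(statuses: MediaStatus[]): Record<ProgressBucket, number> {
  const counts: Record<ProgressBucket, number> = { pending: 0, analyzed: 0, skipped: 0, failed: 0 };
  for (const s of statuses) counts[progressBucket(s)]++;
  return counts;
}
```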
When normal caps are exhausted, the scheduler logs a structured blocked_cap reason instead of queuing no-op deferred batches. Admin export details expose media counts for pending_analysis, deferred/quota-waiting rows, oversized skips, missing attachments, unsupported files, and failures.
Beacon also runs a bounded self-healing pass for recoverable failures. Failed media rows and failed text exports are retried in place when the relevant quota/budget is available, with retry attempts tracked in error_message so the system does not loop forever. This is intended for transient Workers AI, D1, quota/reset, model-schema, and network failures. It does not retry terminal source-data outcomes such as missing attachments, unsupported media, or oversized files.
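The sketch below tracks retry attempts inside error_message as a prefix that is parsed and rewritten on each retry. The document says only that attempts are tracked in error_message and that terminal outcomes are never retried. Everything else is hypothetical: the "[retry n/max]" format, the limit of 3, and the helper names.

```typescript
// Hypothetical encoding: "[retry 2/3] original error text".
const RETRY_PREFIX = /^\[retry (\d+)\/(\d+)\] /;
const MAX_RETRIES = 3; // assumed limit

function retryAttempts(errorMessage: string | null): number {
  const match = errorMessage ? RETRY_PREFIX.exec(errorMessage) : null;
  return match ? Number(match[1]) : 0;
}

function shouldSelfHeal(status: string, errorMessage: string | null, budgetAvailable: boolean): boolean {
  // Terminal outcomes (missing_attachment, unsupported, skipped_size) have
  // their own statuses, so only "failed" rows are ever candidates.
  return status === "failed" && budgetAvailable && retryAttempts(errorMessage) < MAX_RETRIES;
}

// Rewrite the prefix with the next attempt number, keeping the original text.
function markRetry(errorMessage: string | null): string {
  const attempt = retryAttempts(errorMessage) + 1;
  const base = (errorMessage ?? "").replace(RETRY_PREFIX, "");
  return `[retry ${attempt}/${MAX_RETRIES}] ${base}`;
}
```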
Stage 2 image and video-frame captioning uses @cf/meta/llama-3.2-11b-vision-instruct and requires the Meta Llama 3.2 community license to be accepted in the Cloudflare account. If the model is unavailable or gated, Stage 2 is skipped and the item completes with Stage 1 classifier/object-detection output only — the export does not fail.
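The Stage 2 policy can be sketched as follows: Stage 2 is selective, capped per export by media_max_stage2_per_export, and skipped without failing when the vision model is unavailable or gated. Several parts are assumptions: the runStage2 callback, the threshold that treats fewer than 3 tags as "sparse" Stage 1 labels, and the result shape. The model call itself is not shown.

```typescript
interface Stage1Result {
  tags: string[];
  objects: string[];
}

interface ImageAnalysis extends Stage1Result {
  caption?: string;
  stage2Skipped?: string; // reason, for admin/debug output
}

const SPARSE_TAG_THRESHOLD = 3; // assumed definition of "sparse" labels

async function withOptionalStage2(
  stage1: Stage1Result,
  stage2UsedForExport: number,
  maxStage2PerExport: number, // media_max_stage2_per_export (default 8)
  runStage2: () => Promise<string>, // hypothetical wrapper around the vision model
): Promise<ImageAnalysis> {
  // Rich Stage 1 labels: no caption needed.
  if (stage1.tags.length >= SPARSE_TAG_THRESHOLD) return { ...stage1 };
  if (stage2UsedForExport >= maxStage2PerExport) return { ...stage1, stage2Skipped: "cap" };
  try {
    return { ...stage1, caption: await runStage2() };
  } catch (err) {
    // Model unavailable or license-gated: keep Stage 1 output, do not fail.
    return { ...stage1, stage2Skipped: err instanceof Error ? err.message : "stage2_error" };
  }
}
```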
Media quota bypass is separate from text quota bypass. When enabled, deferred media batches ignore Beacon’s media count/minute caps and use a larger bounded batch so backlogs drain faster, but MIME allowlists, max file-size checks, Cloudflare hard limits, and actual Workers AI availability still apply.
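The bypass rule can be sketched as a function that computes effective limits for a deferred batch. Beacon's count and minute caps are lifted, but the batch stays bounded and the file-size and MIME checks stay in force. The batch sizes 10 and 50 and this config subset are hypothetical. Cloudflare hard limits and Workers AI availability are enforced elsewhere and are not modeled.

```typescript
interface MediaConfig {
  maxImagesPerExport: number; // media_max_images_per_export
  maxAudioMinutesPerExport: number; // media_max_audio_minutes_per_export
  maxFileBytes: number; // media_max_file_bytes
  supportedMimeTypes: string[]; // media_supported_mime_types_json
}

interface EffectiveLimits {
  maxImages: number;
  maxAudioMinutes: number;
  batchSize: number;
  maxFileBytes: number;
  supportedMimeTypes: string[];
}

const NORMAL_BATCH = 10; // hypothetical
const BYPASS_BATCH = 50; // hypothetical "larger bounded batch"

function effectiveLimits(config: MediaConfig, mediaBypass: boolean): EffectiveLimits {
  return {
    // Bypass ignores Beacon's own count/minute caps...
    maxImages: mediaBypass ? Number.POSITIVE_INFINITY : config.maxImagesPerExport,
    maxAudioMinutes: mediaBypass ? Number.POSITIVE_INFINITY : config.maxAudioMinutesPerExport,
    // ...but the batch stays bounded and the safety checks never change.
    batchSize: mediaBypass ? BYPASS_BATCH : NORMAL_BATCH,
    maxFileBytes: config.maxFileBytes,
    supportedMimeTypes: config.supportedMimeTypes,
  };
}
```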
The admin uploaded-files table intentionally separates text completion from media completion. An export can show Text Complete alongside Media Running once daily and weekly summaries are live, while media enrichment continues asynchronously in the queue.
Cost Controls
Default media controls:
- media_max_images_per_export: 64
- media_max_stage2_per_export: 8
- media_max_audio_minutes_per_export: 60
- media_max_video_minutes_per_export: 30
- media_max_file_bytes: 10485760 bytes (10 MiB)
- media_supported_mime_types_json: JSON allowlist of image, audio, and video MIME types
Video frame analysis controls (v1.5):
- media_video_frame_analysis_enabled: true
- media_max_frames_per_video: 8
- media_max_video_frame_analyses_per_export: 60
- media_max_video_frame_stage2_per_export: 10
- media_video_frame_height: 480 (px; frames are resized before analysis)
- media_video_frame_times_json: ["1s","5s","10s","20s","30s","45s","60s","90s"] (sample times within the video)
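Here are the documented defaults written as a typed constant. This is an illustrative sketch. The key names follow this section, but holding them in one TypeScript object is an assumption about how the Worker represents config. The MIME allowlist is left out because this document does not list its contents.

```typescript
// Illustrative only: documented media defaults as a typed constant.
interface MediaDefaults {
  media_max_images_per_export: number;
  media_max_stage2_per_export: number;
  media_max_audio_minutes_per_export: number;
  media_max_video_minutes_per_export: number;
  media_max_file_bytes: number;
  media_video_frame_analysis_enabled: boolean;
  media_max_frames_per_video: number;
  media_max_video_frame_analyses_per_export: number;
  media_max_video_frame_stage2_per_export: number;
  media_video_frame_height: number; // pixels
  media_video_frame_times_json: string; // stored as a JSON string
}

const MEDIA_DEFAULTS: MediaDefaults = {
  media_max_images_per_export: 64,
  media_max_stage2_per_export: 8,
  media_max_audio_minutes_per_export: 60,
  media_max_video_minutes_per_export: 30,
  media_max_file_bytes: 10 * 1024 * 1024, // 10485760 bytes
  media_video_frame_analysis_enabled: true,
  media_max_frames_per_video: 8,
  media_max_video_frame_analyses_per_export: 60,
  media_max_video_frame_stage2_per_export: 10,
  media_video_frame_height: 480,
  media_video_frame_times_json: '["1s","5s","10s","20s","30s","45s","60s","90s"]',
};

// Parse and validate the JSON-valued key before use.
function frameTimes(d: MediaDefaults): string[] {
  const parsed: unknown = JSON.parse(d.media_video_frame_times_json);
  if (!Array.isArray(parsed) || !parsed.every((t) => typeof t === "string")) {
    throw new Error("media_video_frame_times_json must be a JSON array of strings");
  }
  return parsed;
}
```

With these defaults, the per-export frame budget covers at most 7 videos at the full 8 frames each (7 × 8 = 56), leaving 4 frames for an eighth video. This assumes every video is long enough to use all eight sample times.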
These defaults favor not breaking export processing over complete media coverage. Deferred media remains indexed with status and error context for admin review and can be resumed from the original ZIP. Terminal skipped media remains indexed when it is unsupported, missing, or over the file-size limit.
Privacy Notes
- Raw attachment bytes remain private in R2.
- Raw sender names are not stored in media tables.
- Public payloads expose only aggregate tags/counts/themes.
- Documentation and retention policy should be revisited before making a public attachment browser.