Beacon Platform Media Analysis

Scope

Beacon can ingest WhatsApp ZIP exports that contain _chat.txt plus attachments. Text-only .txt uploads remain supported and continue through the original parser/dedupe/digest path.

Media analysis is intentionally additive:

Existing text messages remain the primary source for daily and weekly summaries.
Media results are stored as metadata and added to summary analysis_json.media.
Public APIs expose only aggregate media fields, not raw files or raw sender names.

ZIP Ingestion Flow

flowchart LR
  Admin[Admin Upload] -->|POST /upload .zip| Ingest[Pulse Ingest Worker]
  Ingest -->|store raw zip| R2[R2 beacon-pulse-exports]
  Ingest -->|extract _chat.txt| Parser[WhatsApp Parser]
  Ingest -->|store attachments| R2
  Parser -->|messages| Dedupe[Dedupe + Daily Digest]
  Ingest -->|export_media rows| D1[D1 beacon-pulse-db]
  Ingest -->|media_analysis jobs| Queue[beacon-pulse-uploads]
  Queue --> Analyzer[Media Analyzer]
  Analyzer -->|media_analysis rows| D1
  Analyzer -->|summary enrichment| Digests[Daily + Weekly Summaries]

Upload UX

The admin upload form accepts .txt, .chat, and .zip files. ZIP uploads show byte-level progress in the browser and a ZIP-specific note telling the admin to keep the tab open until upload reaches 100%.

Important behavior:

The upload progress bar tracks transfer to Beacon storage only.
Small files use the existing POST /upload request path.
Large ZIP files use R2 multipart upload through the ingest Worker in roughly 16 MB chunks.
Large ZIP parsing is R2 range-based. The Worker does not read the full ZIP into memory; it reads the central directory, transcript, and the current quota-eligible attachment batch.
Transcript processing, attachment indexing, and media analysis continue asynchronously after upload completes.
Media that is over the current analysis batch budget is deferred and resumed by scheduled queue jobs instead of requiring a new upload.
Multipart uploads are not resumable across browser tab close yet; restart the upload if the tab closes before completion.
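
The chunking and byte-level progress described above can be sketched as follows. This is a minimal illustration, not Beacon's actual client code: `planParts`, `progressPercent`, and the exact `CHUNK_BYTES` value are assumptions (the text only says "roughly 16 MB").

```typescript
// Assumed chunk size; the document only states "roughly 16 MB".
const CHUNK_BYTES = 16 * 1024 * 1024;

interface PartRange {
  partNumber: number; // 1-based part number, as multipart uploads expect
  start: number;      // inclusive byte offset
  end: number;        // exclusive byte offset
}

// Hypothetical helper: split a file of totalBytes into sequential part ranges.
function planParts(totalBytes: number, chunkBytes: number = CHUNK_BYTES): PartRange[] {
  if (!(totalBytes > 0) || !(chunkBytes > 0)) {
    throw new RangeError("totalBytes and chunkBytes must be positive");
  }
  const parts: PartRange[] = [];
  for (let start = 0, n = 1; start < totalBytes; start += chunkBytes, n++) {
    parts.push({ partNumber: n, start, end: Math.min(start + chunkBytes, totalBytes) });
  }
  return parts;
}

// Hypothetical helper: whole-number upload progress for the progress bar.
function progressPercent(uploadedBytes: number, totalBytes: number): number {
  return Math.min(100, Math.floor((uploadedBytes / totalBytes) * 100));
}
```

Each range would be sliced from the file and sent to POST /upload/multipart/part in order. Only the final part may be shorter than the chunk size.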

Stored Data

`export_media`

One row per attachment reference in the WhatsApp transcript.

Important fields:

media_id: generated ID for the attachment record.
export_id, community_id, source_id: export and scope linkage.
message_timestamp: timestamp from the transcript line when available.
sender_hash: hashed sender only; raw names are not stored.
object_key: extracted attachment location in R2.
filename, mime_type, media_kind, byte_size: file metadata.
status: pending_analysis, deferred_quota, deferred_processing, analyzed, missing_attachment, unsupported, skipped_size, skipped_quota, or failed.
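
As a rough sketch, an export_media row could be typed like this. The field names and status values come from the list above; the TypeScript types and nullability are assumptions, and `isMediaStatus` is a hypothetical helper.

```typescript
// Status values as listed in the document.
const MEDIA_STATUSES = [
  "pending_analysis",
  "deferred_quota",
  "deferred_processing",
  "analyzed",
  "missing_attachment",
  "unsupported",
  "skipped_size",
  "skipped_quota",
  "failed",
] as const;

type MediaStatus = (typeof MEDIA_STATUSES)[number];

// Column types and nullability are assumed, not taken from the real schema.
interface ExportMediaRow {
  media_id: string;
  export_id: string;
  community_id: string;
  source_id: string;
  message_timestamp: string | null; // only present "when available"
  sender_hash: string;              // hashed sender; raw names are never stored
  object_key: string | null;
  filename: string;
  mime_type: string | null;
  media_kind: string;
  byte_size: number | null;
  status: MediaStatus;
}

// Hypothetical guard for validating a status string read from storage.
function isMediaStatus(value: string): value is MediaStatus {
  return MEDIA_STATUSES.some((s) => s === value);
}
```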

`media_analysis`

Normalized model output per media item.

Important fields:

classifier_tags_json: image classifier tags or derived transcript tags.
objects_json: object detector output for images.
caption: optional short caption or video metadata note.
transcript: voice-note transcription.
confidence: coarse confidence score for aggregate use.
model_summary_json: private model provenance and raw-ish structured output for admin/debug use. Video frame results are stored under model_summary_json.video.

`media_ai_quota_usage`

Per-day quota tracking for media analysis. One row per usage_date.

Important fields:

image_stage1_calls: Stage 1 classifier/object-detection calls on images.
image_stage2_calls: Stage 2 vision caption calls on images.
audio_minutes: total audio transcription minutes consumed.
video_minutes: video metadata processing minutes.
video_frame_extractions: number of video frames extracted for analysis.
video_frame_stage1_calls: Stage 1 calls on extracted video frames.
video_frame_stage2_calls: Stage 2 caption calls on video frames.
neurons: Cloudflare Workers AI neuron units consumed.

Model Behavior

Images

Stage 1 uses Cloudflare Workers AI image classification and object detection.
Stage 2 vision captioning is selective and capped by media config.
Oversized or unsupported images are skipped without failing export processing.
Images outside the current batch budget stay indexed as deferred work and can be resumed later from the original ZIP.

Voice Notes

Supported audio files are sent to Workers AI transcription.
Transcript-derived terms can enrich daily/weekly summaries as voice_note_themes.

Videos

v1.5 stores video metadata and attempts sampled frame analysis when Cloudflare Media Transformations is available.
The Worker samples a small number of private R2 video frames, runs the same cheap image classifier/object detector used for images, and optionally captions one representative frame when cheap labels are sparse.
Video frame outputs are aggregate-only and stored in model_summary_json.video with limitations: sampled frames only, not full motion analysis, and no identity inference.
If frame extraction fails, video analysis falls back to metadata-only without failing the export.
v1.5 does not extract audio tracks from video files in the Worker; voice notes continue to use the audio transcription path.
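
Frame sampling can be pictured with the sketch below. It assumes sample times use the `"<number>s"` form shown in the default media_video_frame_times_json, and that times at or beyond the video's duration are dropped; both `parseSeconds` and `pickFrameTimes` are hypothetical names, and the real Worker may select frames differently.

```typescript
// Parse a label such as "10s" into seconds. Returns null for anything else.
function parseSeconds(label: string): number | null {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(label.trim());
  return match ? Number(match[1]) : null;
}

// Pick sample times that fall inside the video, capped at maxFrames.
// Dropping out-of-range times is an assumption, not documented behavior.
function pickFrameTimes(
  configured: string[],
  durationSeconds: number,
  maxFrames: number
): number[] {
  const times: number[] = [];
  for (const label of configured) {
    if (times.length >= maxFrames) break;
    const seconds = parseSeconds(label);
    if (seconds === null || seconds >= durationSeconds) continue;
    times.push(seconds);
  }
  return times;
}
```

For a 25-second clip and the default time list, this would sample frames at 1, 5, 10, and 20 seconds.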

Public API Additions

Summary responses can include optional aggregate fields:

media_counts
media_tags
voice_note_themes
media_confidence

These fields are derived from analysis_json.media and are safe to omit when an export has no media or analysis is pending.
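
A hedged sketch of that derivation is shown below. The four public field names are from the list above; the internal shape of analysis_json.media (`MediaAggregate`) and the helper names are assumptions made for illustration.

```typescript
// Assumed internal shape of analysis_json.media; the real keys may differ.
interface MediaAggregate {
  counts?: Record<string, number>;
  tags?: string[];
  voice_note_themes?: string[];
  confidence?: number;
}

// Public fields as named in the document; all are optional.
interface PublicMediaFields {
  media_counts?: Record<string, number>;
  media_tags?: string[];
  voice_note_themes?: string[];
  media_confidence?: number;
}

// Copy only populated aggregates, so empty or pending media yields no fields.
function toPublicMediaFields(media: MediaAggregate | null | undefined): PublicMediaFields {
  const out: PublicMediaFields = {};
  if (!media) return out;
  if (media.counts && Object.keys(media.counts).length > 0) out.media_counts = media.counts;
  if (media.tags && media.tags.length > 0) out.media_tags = media.tags;
  if (media.voice_note_themes && media.voice_note_themes.length > 0) {
    out.voice_note_themes = media.voice_note_themes;
  }
  if (typeof media.confidence === "number") out.media_confidence = media.confidence;
  return out;
}

// Total item count, e.g. for a tooltip on the daily calendar dot.
function totalMediaItems(counts: Record<string, number> | undefined): number {
  if (!counts) return 0;
  return Object.keys(counts).reduce((sum, kind) => sum + (counts[kind] ?? 0), 0);
}
```

Consumers should treat every field as optional and render nothing when it is absent.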

The public Pulse UI shows media enrichment indicators in two places:

Daily calendar: a small blue dot at the bottom-left of any calendar day cell where media_counts is non-empty. The dot scales up on hover, and hovering shows a tooltip with the total item count.
Weekly history: a 📷 Media pill badge in the header row of each weekly summary entry that has analyzed media, visible without expanding the card.

Within expanded views, a compact “Media signals included” provenance card shows media counts, confidence score, and voice-note themes where available. The indicator is aggregate-only; no raw files or sender names are exposed.

Media signals are stored as structured context and should not be appended to summary prose as inventory sentences. Weekly regeneration can pass aggregate media context into the AI analysis prompt so the narrative may weave media-derived evidence into the story naturally when it adds meaning.

Admin API Additions

The ingest worker exposes admin-only media helpers:

GET /files: list D1-backed export rows with text-processing progress and aggregate media-analysis counts.
POST /upload/multipart/init: initialize a large ZIP upload and return the R2 object key, multipart upload ID, and chunk size.
POST /upload/multipart/part: upload one ZIP chunk to the active R2 multipart session.
POST /upload/multipart/complete: complete the R2 multipart object, create the export row, and enqueue ingest.
POST /upload/multipart/abort: abort an incomplete R2 multipart upload.
GET /media/list: list media records with optional export_id, community_id, source_id, and day_date filters.
POST /media/replay: re-enqueue media analysis by media_id or all media for an export_id.
GET /quota/status: includes media_status, text bypass state, media bypass state, effective media config, and media_models (the configured media model IDs and stage behavior shown in the admin System tab).
GET /quota/bypass: returns separate text_bypass_enabled and media_bypass_enabled flags.
POST /quota/bypass: accepts scope: "text" or scope: "media" to toggle bypasses independently.
POST /quota/config: accepts media config keys in addition to existing text quota keys.
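
As an illustration of calling these helpers, the sketch below builds a GET /media/list URL from the documented filters and a POST /quota/bypass body. The base URL, the use of query-string parameters for filters, and the assumption that the bypass body carries only `scope` are not confirmed by this document; authentication is omitted entirely.

```typescript
// Filter names are from the document; passing them as query params is assumed.
interface MediaListFilters {
  export_id?: string;
  community_id?: string;
  source_id?: string;
  day_date?: string;
}

// Hypothetical helper: build the /media/list URL with only the filters provided.
function buildMediaListUrl(baseUrl: string, filters: MediaListFilters = {}): string {
  const keys: (keyof MediaListFilters)[] = ["export_id", "community_id", "source_id", "day_date"];
  const query = keys
    .filter((key) => filters[key] !== undefined && filters[key] !== "")
    .map((key) => `${key}=${encodeURIComponent(filters[key] as string)}`)
    .join("&");
  const root = baseUrl.replace(/\/+$/, ""); // drop any trailing slash
  return query ? `${root}/media/list?${query}` : `${root}/media/list`;
}

// Hypothetical helper: body for POST /quota/bypass. Only `scope` is documented;
// any further fields the endpoint may require are unknown here.
function buildBypassBody(scope: "text" | "media"): string {
  return JSON.stringify({ scope });
}
```

The resulting URL and body would then be passed to `fetch` along with whatever admin credentials the deployment requires.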

Deferred Media Resume

Large exports can contain more attachments than Beacon should extract or analyze in one Worker batch. When that happens:

The transcript/media reference row is still written to export_media.
The row is marked deferred_quota when the attachment is supported and below the size limit but outside the current extraction budget.
The scheduled Worker enqueues media_deferred_batch jobs only when there is available per-kind budget and matching deferred media.
Deferred selection is per media kind. Images, audio, and videos are selected independently so a large image/video backlog cannot crowd out audio rows that still have available transcription budget.
A deferred batch re-opens the original ZIP by R2 range, extracts selected filenames, stores those attachments under the existing media object keys, flips rows back to pending_analysis, and queues normal media_analysis jobs.
The scheduled Worker also requeues stale pending_analysis rows older than about 15 minutes, capped per run, so queue drops do not leave media permanently stuck.
Items stuck in deferred_processing for more than 5 minutes are automatically reset to deferred_quota by the same scheduled pass, so ZIP extraction timeouts do not permanently strand work.
skipped_size, unsupported, and missing_attachment are terminal unless config or source data changes.
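
The per-kind selection rule can be sketched as below. This is a simplified model under stated assumptions: budgets are expressed as a plain item count per kind (the real caps mix counts and minutes), and `selectDeferredBatch` is a hypothetical name.

```typescript
type MediaKind = "image" | "audio" | "video";

interface DeferredRow {
  media_id: string;
  media_kind: MediaKind;
}

// Pick deferred rows independently per kind, so a backlog in one kind
// cannot consume budget that belongs to another.
function selectDeferredBatch(
  rows: DeferredRow[],
  budget: Record<MediaKind, number>
): DeferredRow[] {
  const remaining: Record<MediaKind, number> = { ...budget };
  const picked: DeferredRow[] = [];
  for (const row of rows) {
    if (remaining[row.media_kind] > 0) {
      remaining[row.media_kind] -= 1;
      picked.push(row);
    }
  }
  return picked;
}
```

With an image budget of 2 and an audio budget of 1, three deferred images followed by one voice note yield two images plus the voice note. An empty result corresponds to the case where the scheduler should skip enqueuing a batch.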

The /files media progress count treats deferred_quota, deferred_processing, legacy skipped_quota, and pending_analysis as pending work. Terminal skipped counts only include size, unsupported, and missing-attachment outcomes.
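
That grouping can be expressed as a small counting function. The status groupings follow the paragraph above; `summarizeProgress` and the `MediaProgress` shape are hypothetical and may not match the actual /files response.

```typescript
// Statuses counted as pending work, including the legacy skipped_quota value.
const PENDING_STATUSES = ["pending_analysis", "deferred_quota", "deferred_processing", "skipped_quota"];

// Terminal skips: size, unsupported, and missing-attachment outcomes only.
const TERMINAL_SKIPPED_STATUSES = ["skipped_size", "unsupported", "missing_attachment"];

interface MediaProgress {
  pending: number;
  analyzed: number;
  skipped: number;
  failed: number;
}

// Hypothetical helper: aggregate per-row statuses into progress counts.
function summarizeProgress(statuses: string[]): MediaProgress {
  const progress: MediaProgress = { pending: 0, analyzed: 0, skipped: 0, failed: 0 };
  for (const status of statuses) {
    if (PENDING_STATUSES.indexOf(status) !== -1) progress.pending += 1;
    else if (TERMINAL_SKIPPED_STATUSES.indexOf(status) !== -1) progress.skipped += 1;
    else if (status === "analyzed") progress.analyzed += 1;
    else if (status === "failed") progress.failed += 1;
  }
  return progress;
}
```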

When normal caps are exhausted, the scheduler logs a structured blocked_cap reason instead of queuing no-op deferred batches. Admin export details expose media counts for pending_analysis, deferred/quota-waiting rows, oversized skips, missing attachments, unsupported files, and failures.

Beacon also runs a bounded self-healing pass for recoverable failures. Failed media rows and failed text exports are retried in place when the relevant quota/budget is available, with retry attempts tracked in error_message so the system does not loop forever. This is intended for transient Workers AI, D1, quota/reset, model-schema, and network failures. It does not retry terminal source-data outcomes such as missing attachments, unsupported media, or oversized files.
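
One way to bound retries through error_message is sketched below. The `[retry N]` prefix format, the `MAX_RETRIES` value, and the function names are all assumptions for illustration; the document does not specify how attempts are encoded.

```typescript
const MAX_RETRIES = 3; // assumed cap; the actual limit is not documented

// Read the attempt count from a hypothetical "[retry N] ..." prefix.
function parseRetryCount(errorMessage: string | null): number {
  const match = /^\[retry (\d+)\]/.exec(errorMessage ?? "");
  return match ? Number(match[1]) : 0;
}

interface RetryDecision {
  retry: boolean;
  errorMessage: string | null;
}

// Decide whether a row should be retried in place. Only "failed" rows qualify;
// terminal outcomes (missing, unsupported, oversized) are never retried.
function decideRetry(
  status: string,
  errorMessage: string | null,
  budgetAvailable: boolean
): RetryDecision {
  if (status !== "failed" || !budgetAvailable) return { retry: false, errorMessage };
  const attempts = parseRetryCount(errorMessage);
  if (attempts >= MAX_RETRIES) return { retry: false, errorMessage };
  const base = (errorMessage ?? "").replace(/^\[retry \d+\]\s*/, "");
  return { retry: true, errorMessage: `[retry ${attempts + 1}] ${base}`.trim() };
}
```

Keeping the counter in the existing error_message column avoids a schema change, at the cost of parsing a string on each pass.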

Stage 2 image and video-frame captioning uses @cf/meta/llama-3.2-11b-vision-instruct and requires the Meta Llama 3.2 community license to be accepted in the Cloudflare account. If the model is unavailable or gated, Stage 2 is skipped and the item completes with Stage 1 classifier/object-detection output only — the export does not fail.

Media quota bypass is separate from text quota bypass. When enabled, deferred media batches ignore Beacon’s media count/minute caps and use a larger bounded batch so backlogs drain faster, but MIME allowlists, max file-size checks, Cloudflare hard limits, and actual Workers AI availability still apply.

The admin uploaded-files table intentionally separates text completion from media completion. An export can show “Text Complete, Media Running” once daily and weekly summaries are live while media enrichment continues asynchronously in the queue.

Cost Controls

Default media controls:

media_max_images_per_export: 64
media_max_stage2_per_export: 8
media_max_audio_minutes_per_export: 60
media_max_video_minutes_per_export: 30
media_max_file_bytes: 10485760 bytes (10 MiB)
media_supported_mime_types_json: JSON allowlist of image, audio, and video MIME types

Video frame analysis controls (v1.5):

media_video_frame_analysis_enabled: true
media_max_frames_per_video: 8
media_max_video_frame_analyses_per_export: 60
media_max_video_frame_stage2_per_export: 10
media_video_frame_height: 480 (px, frames are resized before analysis)
media_video_frame_times_json: ["1s","5s","10s","20s","30s","45s","60s","90s"] (sample times within the video)

These defaults favor not breaking export processing over complete media coverage. Deferred media remains indexed with status and error context for admin review and can be resumed from the original ZIP. Terminal skipped media remains indexed when it is unsupported, missing, or over the file-size limit.
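
For reference, the defaults above can be collected into a single config object with a simple pre-check. The key names and values are from the lists above. `classifyAttachment`, its check order (MIME type before size), and the caller-supplied allowlist are assumptions, since the default contents of media_supported_mime_types_json are not listed here.

```typescript
// Default media controls as documented.
const DEFAULT_MEDIA_CONFIG = {
  media_max_images_per_export: 64,
  media_max_stage2_per_export: 8,
  media_max_audio_minutes_per_export: 60,
  media_max_video_minutes_per_export: 30,
  media_max_file_bytes: 10 * 1024 * 1024, // 10485760
  media_video_frame_analysis_enabled: true,
  media_max_frames_per_video: 8,
  media_max_video_frame_analyses_per_export: 60,
  media_max_video_frame_stage2_per_export: 10,
  media_video_frame_height: 480,
  media_video_frame_times_json: ["1s", "5s", "10s", "20s", "30s", "45s", "60s", "90s"],
};

type PreCheck = "eligible" | "unsupported" | "skipped_size";

// Hypothetical pre-check: terminal skips are decided before any quota logic.
function classifyAttachment(
  byteSize: number,
  mimeType: string,
  allowedMimeTypes: string[],
  maxFileBytes: number = DEFAULT_MEDIA_CONFIG.media_max_file_bytes
): PreCheck {
  if (allowedMimeTypes.indexOf(mimeType) === -1) return "unsupported";
  if (byteSize > maxFileBytes) return "skipped_size";
  return "eligible";
}
```

An "eligible" attachment may still be deferred later if the per-export caps are exhausted.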

Privacy Notes

Raw attachment bytes remain private in R2.
Raw sender names are not stored in media tables.
Public payloads expose only aggregate tags/counts/themes.
Documentation and the retention policy should be revisited before any public attachment browser is built.
