D1 Migration

The media-analysis feature requires the additive D1 migration:

cd /Users/dghauri/GitHub/beacon-platform/apps/pulse-ingest
npx wrangler d1 execute beacon-pulse-db --remote --file ../../packages/db/migrations/add_media_analysis_tables.sql

The migration creates:

  • export_media
  • media_analysis
  • media_ai_quota_usage
  • media quota defaults in quota_settings
  • media lookup indexes

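The migration uses CREATE TABLE IF NOT EXISTS, CREATE INDEX IF NOT EXISTS, and INSERT OR IGNORE, so re-running it is safe: existing tables and indexes are skipped, and quota defaults whose keys already exist are not overwritten.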

Verification Queries

cd /Users/dghauri/GitHub/beacon-platform/apps/pulse-ingest
npx wrangler d1 execute beacon-pulse-db --remote --command "SELECT name FROM sqlite_master WHERE type='table' AND name IN ('export_media','media_analysis','media_ai_quota_usage') ORDER BY name;"
npx wrangler d1 execute beacon-pulse-db --remote --command "SELECT setting_key, setting_value FROM quota_settings WHERE setting_key LIKE 'media_%' ORDER BY setting_key;"

Worker Verification

Before deploying, run:

cd /Users/dghauri/GitHub/beacon-platform
pnpm --filter pulse-ingest exec vitest run test/unit/media-zip.test.ts
cd apps/pulse-ingest && npx wrangler deploy --dry-run
cd ../pulse-public && npm run build && npx wrangler deploy --dry-run

Large ZIP Operations

ZIP uploads show browser-side upload progress: transferred bytes, plus a percentage when the browser can compute the total request size. Reaching 100% only confirms that the browser has finished sending the file to Beacon storage; transcript parsing and media analysis run later through the queue.
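
The progress label described above could be computed roughly as in this sketch. It assumes the admin page reads the loaded, total, and lengthComputable fields of the browser's upload ProgressEvent; formatBytes, progressLabel, and multipartProgress are hypothetical helper names, not Beacon's actual admin code.

```typescript
// Minimal shape of a browser upload ProgressEvent.
interface TransferProgress {
  loaded: number;
  total: number;
  lengthComputable: boolean;
}

// Formats a byte count with 1024-based units, e.g. 16777216 -> "16.0 MB".
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  return `${value.toFixed(unit === 0 ? 0 : 1)} ${units[unit]}`;
}

// Label under the progress bar; the percentage appears only when the
// browser knows the total request size.
function progressLabel(p: TransferProgress): string {
  if (!p.lengthComputable || p.total <= 0) {
    return `${formatBytes(p.loaded)} sent`;
  }
  // Flooring means the label reads 100% only once every byte is sent.
  const percent = Math.min(100, Math.floor((p.loaded * 100) / p.total));
  return `${formatBytes(p.loaded)} of ${formatBytes(p.total)} sent (${percent}%)`;
}

// Overall transfer progress for a multipart upload: bytes in parts already
// acknowledged plus bytes of the part currently in flight.
function multipartProgress(
  fileSize: number,
  completedBytes: number,
  currentPartLoaded: number,
): TransferProgress {
  return {
    loaded: Math.min(fileSize, completedBytes + currentPartLoaded),
    total: fileSize,
    lengthComputable: true,
  };
}
```

For example, progressLabel({ loaded: 8 * 1024 * 1024, total: 16 * 1024 * 1024, lengthComputable: true }) returns "8.0 MB of 16.0 MB sent (50%)". Either way, the value describes transfer only.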

Upload paths:

  • Direct upload: .txt, .chat, and ZIP files up to roughly 75 MB use the existing POST /upload path through pulse-ingest.
  • Large ZIP upload: ZIP files above that threshold use Worker-mediated R2 multipart upload through /upload/multipart/init, /upload/multipart/part, /upload/multipart/complete, and /upload/multipart/abort.
  • Large ZIP chunks are approximately 16 MB each. This keeps every Worker request well below conservative request-size and memory ceilings, while R2 still stores the parts as one complete ZIP object.
  • Processing starts only after multipart completion creates the exports row and enqueues the normal ingest job.
  • ZIP processing reads the R2 object by byte range. It fetches the ZIP central directory, transcript, and quota-eligible attachment bytes instead of loading the full archive into Worker memory.
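
The first two ranged reads of a ZIP could look like the following sketch. It follows the standard ZIP layout: the End of Central Directory (EOCD) record occupies the final 22 bytes of the archive plus an optional comment of up to 65,535 bytes, and it records where the central directory starts and how long it is. The function names are hypothetical, and ZIP64 archives are deliberately rejected rather than parsed.

```typescript
const EOCD_MIN_SIZE = 22; // fixed-size part of the EOCD record
const MAX_COMMENT_SIZE = 0xffff; // maximum ZIP archive comment length
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06", stored little-endian

interface ByteRange {
  offset: number; // first byte to read
  length: number; // number of bytes to read
}

// Read 1: the tail that must contain the EOCD record.
function eocdSearchRange(fileSize: number): ByteRange {
  const length = Math.min(fileSize, EOCD_MIN_SIZE + MAX_COMMENT_SIZE);
  return { offset: fileSize - length, length };
}

// Read 2: locate the EOCD inside the fetched tail and return the byte range
// of the central directory, which lists every entry's name, sizes, and
// local header offset.
function centralDirectoryRange(tail: Uint8Array): ByteRange & { entries: number } {
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  // Scan backwards; a genuine EOCD must end exactly at the end of the file
  // once its comment is included, which filters out stray signature bytes.
  for (let i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (view.getUint32(i, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(i + 20, true);
    if (i + EOCD_MIN_SIZE + commentLength !== tail.length) continue;
    const entries = view.getUint16(i + 10, true);
    const length = view.getUint32(i + 12, true);
    const offset = view.getUint32(i + 16, true);
    if (entries === 0xffff || length === 0xffffffff || offset === 0xffffffff) {
      throw new Error("ZIP64 archive: read the ZIP64 EOCD record instead");
    }
    return { offset, length, entries };
  }
  throw new Error("End of central directory record not found");
}
```

In a Worker, each ByteRange would map onto an R2 ranged read such as bucket.get(key, { range: { offset, length } }), with the file size taken from bucket.head(key) (the binding name is illustrative). After the central directory, the same pattern fetches only the transcript and quota-eligible entries, so peak memory tracks the largest entry read rather than the archive size.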

Operational guidance:

  • Keep the admin tab open until the upload reaches 100%.
  • Upload progress measures transfer only, not transcript parsing or media analysis. Do not treat a full progress bar as media-analysis completion.
  • Multipart uploads cannot yet be resumed after the browser tab closes. If the tab closes before completion, restart the upload.
  • Use the uploaded-files and media status views to track processing after upload.
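
The browser-side init/part/complete/abort sequence from the upload paths above can be sketched as a simple loop, which also shows why an upload cannot resume: the uploadId and the list of finished parts live only in page memory. This is a minimal sketch, not the admin UI's code. The MultipartApi interface and its signatures are hypothetical stand-ins for the /upload/multipart/* endpoints, the byte values for "roughly 75 MB" and "approximately 16 MB" are assumed to be 75 MiB and 16 MiB, and rejecting non-ZIP files above the limit is an assumed policy.

```typescript
// Assumed exact values for the documented thresholds.
const DIRECT_UPLOAD_LIMIT = 75 * 1024 * 1024;
const PART_SIZE = 16 * 1024 * 1024;

interface UploadedPart {
  partNumber: number;
  etag: string;
}

// Hypothetical client wrapping /upload/multipart/init, /part, /complete, /abort.
interface MultipartApi {
  init(fileName: string, size: number): Promise<{ uploadId: string }>;
  part(uploadId: string, partNumber: number, bytes: Blob): Promise<UploadedPart>;
  complete(uploadId: string, parts: UploadedPart[]): Promise<void>;
  abort(uploadId: string): Promise<void>;
}

// Direct POST /upload up to the limit; multipart only for large ZIPs.
function chooseUploadPath(fileName: string, size: number): "direct" | "multipart" {
  if (size <= DIRECT_UPLOAD_LIMIT) return "direct";
  if (fileName.toLowerCase().endsWith(".zip")) return "multipart";
  throw new Error(`${fileName} exceeds the direct upload limit and is not a ZIP`);
}

async function uploadMultipart(
  api: MultipartApi,
  file: Blob,
  fileName: string,
  onProgress: (bytesSent: number) => void,
  partSize = PART_SIZE,
): Promise<void> {
  const { uploadId } = await api.init(fileName, file.size);
  // Held only in page memory: closing the tab loses both values.
  const parts: UploadedPart[] = [];
  try {
    for (let offset = 0, partNumber = 1; offset < file.size; offset += partSize, partNumber++) {
      const end = Math.min(offset + partSize, file.size);
      parts.push(await api.part(uploadId, partNumber, file.slice(offset, end)));
      onProgress(end);
    }
    // Server side, completion creates the exports row and enqueues ingest.
    await api.complete(uploadId, parts);
  } catch (err) {
    // Best effort: abort the incomplete multipart upload, then rethrow.
    await api.abort(uploadId).catch(() => undefined);
    throw err;
  }
}
```

Uploading parts sequentially keeps at most one part in flight, which matches the goal of keeping each Worker request small.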

Uploaded Files Progress

The admin uploaded-files view is D1-backed. It lists rows from exports and joins aggregate media counts from export_media; it does not walk the R2 export prefix. This matters for ZIP exports because extracted attachments can make R2 prefix listings large and noisy.

Displayed progress includes:

  • Text processing status from exports.
  • Daily digest progress from days_total and days_processed.
  • Weekly summary progress from weeks_total, weeks_processed, and weekly_status.
  • Media-analysis progress from export_media status counts.

The view polls while text or media work is still pending. Manual refresh remains available.
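
The progress columns and polling rule above can be sketched as two small helpers. Only days_total, days_processed, weeks_total, and weeks_processed come from this document; the MediaCounts shape, the status values, and the choice to treat failed and skipped media as settled are assumptions for illustration.

```typescript
// Hypothetical per-status counts aggregated from export_media for one export.
interface MediaCounts {
  pending: number;
  processing: number;
  done: number;
  failed: number;
  skipped: number;
}

// Hypothetical subset of one uploaded-files row; status values are assumed.
interface UploadedFileRow {
  status: "queued" | "processing" | "complete" | "failed";
  days_total: number;
  days_processed: number;
  weeks_total: number;
  weeks_processed: number;
  media: MediaCounts;
}

// Failed and skipped items count as settled: no further work will change them.
function mediaProgress(m: MediaCounts): { settled: number; total: number; percent: number } {
  const total = m.pending + m.processing + m.done + m.failed + m.skipped;
  const settled = m.done + m.failed + m.skipped;
  // A text-only export has no media rows, so there is nothing to wait for.
  const percent = total === 0 ? 100 : Math.floor((settled * 100) / total);
  return { settled, total, percent };
}

// Keep polling while text or media work can still change the displayed row.
function shouldPoll(row: UploadedFileRow): boolean {
  const textPending =
    row.status !== "failed" &&
    (row.status !== "complete" ||
      row.days_processed < row.days_total ||
      row.weeks_processed < row.weeks_total);
  const mediaPending = row.media.pending + row.media.processing > 0;
  return textPending || mediaPending;
}
```

Excluding failed text exports from the pending check keeps the view from polling forever on a row whose digest counters will never catch up.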

Rollback Notes

The migration is additive and does not modify existing text-ingest tables. If media analysis must be disabled without schema rollback:

  • Continue accepting only .txt exports through admin operations.
  • Do not upload WhatsApp ZIP exports.
  • Leave export_media rows in place; they do not affect text dedupe or digest generation unless media jobs are queued.