Technical architecture covering components, pipelines, and data boundaries for Beacon Pulse.
Overview
- Stack: Cloudflare Workers, R2, D1, Queues, Workers AI.
- Purpose: privacy-preserving weekly summaries from community conversations.
- Beacon Pulse is the production subsystem; Beacon Search is planned.
Sources: beacon-platform/README.md, beacon-platform/apps/pulse-public/src/index.ts
Components
- apps/pulse-ingest: ingestion worker for uploads, queue consumption, parsing, dedupe, AI summarization.
- apps/pulse-public: public JSON + HTML endpoints, embeds, and admin helpers.
- R2 bucket: beacon-pulse-exports (raw exports).
- Queue: beacon-pulse-uploads (ingest trigger).
- D1 database: beacon-pulse-db (exports, hashes, digests, summaries, metadata).
- Workers AI: daily and weekly model pipelines.
Sources: beacon-platform/README.md, beacon-platform/docs/architecture.md, beacon-platform/apps/pulse-public/src/index.ts
Data Storage and Processing
- R2 (beacon-pulse-exports): stores raw WhatsApp exports and emits ObjectCreated/ObjectDeleted events.
- D1 (beacon-pulse-db): exports, message_hashes, daily_digests, weekly_summaries_public, communities, chat_sources, AI quota tables.
- Queue (beacon-pulse-uploads): async ingestion processing.
- Workers AI: daily + weekly summarization with guardrails.
Sources: beacon-platform/docs/architecture.md, beacon-platform/docs/operations.md, beacon-platform/apps/pulse-public/src/index.ts
AI Model Configuration
- Daily analysis: @cf/meta/llama-3.1-8b-instruct-awq.
- Daily narrative: @cf/mistralai/mistral-small-3.1-24b-instruct.
- Weekly analysis: @cf/meta/llama-3.1-70b-instruct.
- Weekly narrative: @cf/mistralai/mistral-small-3.1-24b-instruct.
- Defaults can be overridden via Wrangler env vars.
Sources: beacon-platform/README.md, beacon-platform/apps/pulse-public/src/index.ts
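The override behavior can be sketched as a small lookup that prefers an env var and falls back to the defaults listed above. This is a minimal sketch, not the project's actual code: the env var names (`AI_MODEL_DAILY_ANALYSIS` and so on) and the `resolveModel` helper are hypothetical, and only the default model IDs come from this document.

```typescript
// Default model IDs, copied from the configuration list above.
const DEFAULT_MODELS = {
  dailyAnalysis: "@cf/meta/llama-3.1-8b-instruct-awq",
  dailyNarrative: "@cf/mistralai/mistral-small-3.1-24b-instruct",
  weeklyAnalysis: "@cf/meta/llama-3.1-70b-instruct",
  weeklyNarrative: "@cf/mistralai/mistral-small-3.1-24b-instruct",
} as const;

type Stage = keyof typeof DEFAULT_MODELS;

// Hypothetical env var names; the real names in the Wrangler config may differ.
const ENV_KEYS: Record<Stage, string> = {
  dailyAnalysis: "AI_MODEL_DAILY_ANALYSIS",
  dailyNarrative: "AI_MODEL_DAILY_NARRATIVE",
  weeklyAnalysis: "AI_MODEL_WEEKLY_ANALYSIS",
  weeklyNarrative: "AI_MODEL_WEEKLY_NARRATIVE",
};

// Plain string map standing in for the Worker's env bindings.
type EnvVars = Record<string, string | undefined>;

// Returns the env override when it is set and non-blank, else the default.
function resolveModel(stage: Stage, env: EnvVars): string {
  const override = (env[ENV_KEYS[stage]] ?? "").trim();
  return override.length > 0 ? override : DEFAULT_MODELS[stage];
}
```

Treating a blank value as "not set" means an empty variable in the Wrangler config does not accidentally select an empty model ID.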
System overview
flowchart LR
  Uploader -->|POST /presign| IngestWorker
  IngestWorker -->|objectKey| Uploader
  Uploader -->|PUT export| R2Bucket
  R2Bucket -->|ObjectCreated| UploadQueue
  UploadQueue --> IngestWorker
  IngestWorker -->|hashes/digests| D1
  IngestWorker -->|AI daily/weekly| WorkersAI
  PublicWorker -->|GET /pulse.json| D1
  PublicUser -->|GET /pulse| PublicWorker
Sources: beacon-platform/docs/architecture.md
Ingestion pipeline (WhatsApp)
sequenceDiagram
  participant Uploader
  participant Ingest as PulseIngest
  participant R2 as R2Bucket
  participant Queue as UploadQueue
  participant Parser as WhatsAppParser
  participant Dedupe as DedupeEngine
  participant AI as WorkersAI
  participant DB as D1
  Uploader->>Ingest: POST /presign
  Ingest-->>Uploader: objectKey
  Uploader->>R2: PUT export file
  R2-->>Queue: ObjectCreated event
  Queue->>Ingest: queue message
  Ingest->>R2: read file
  Ingest->>Parser: parse export
  Parser-->>Ingest: messages
  Ingest->>Dedupe: content+sender hashes per day
  Dedupe->>DB: read/write message_hashes
  Dedupe-->>Ingest: new messages only
  Ingest->>AI: generate daily digest
  AI-->>Ingest: daily sentiment/themes/summary
  Ingest->>DB: upsert daily_digests
  Ingest->>AI: generate weekly summary
  AI-->>Ingest: weekly summary
  Ingest->>DB: upsert weekly_summaries_public
Sources: beacon-platform/docs/architecture.md
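The dedupe step in the sequence above can be illustrated with a short sketch. It assumes the key is the tuple (community_id, source_id, day_date, content_hash) and that sender hashes are used only for unique-sender counts. Everything else here is hypothetical: the `ParsedMessage` shape, the `filterNewMessages` helper, the in-memory `Set` standing in for the `message_hashes` table in D1, and the FNV-1a hash, which is a dependency-free stand-in for whatever cryptographic hash the real worker uses.

```typescript
// Hypothetical shape of a parsed export message.
interface ParsedMessage {
  dayDate: string; // e.g. "2024-05-01"
  sender: string;
  content: string;
}

// FNV-1a (32-bit) as a stand-in hash. It is NOT cryptographic; a real
// pipeline would more likely use SHA-256 or similar.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Builds the dedupe key from community, source, day, and content hash.
function dedupeKey(communityId: string, sourceId: string, msg: ParsedMessage): string {
  return [communityId, sourceId, msg.dayDate, fnv1a(msg.content)].join("|");
}

// Keeps only messages whose key is not yet in `seenKeys` (a stand-in for
// message_hashes) and counts distinct sender hashes among the kept messages.
function filterNewMessages(
  communityId: string,
  sourceId: string,
  messages: ParsedMessage[],
  seenKeys: Set<string>,
): { fresh: ParsedMessage[]; uniqueSenders: number } {
  const fresh: ParsedMessage[] = [];
  const senderHashes = new Set<string>();
  for (const msg of messages) {
    const key = dedupeKey(communityId, sourceId, msg);
    if (seenKeys.has(key)) continue; // already ingested
    seenKeys.add(key);
    fresh.push(msg);
    senderHashes.add(fnv1a(msg.sender));
  }
  return { fresh, uniqueSenders: senderHashes.size };
}
```

Because the key does not include the sender, identical text posted twice on the same day in the same source counts once, regardless of who sent it. Re-uploading an overlapping export therefore yields no new messages for the days already ingested.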
Public read path
flowchart LR
  Browser -->|GET /pulse| PulsePage
  PulsePage -->|fetch| PulseJson
  PulseJson -->|query| WeeklySummaries
  Browser -->|GET /pulse/history.json| HistoryApi
  HistoryApi --> WeeklySummaries
  Browser -->|GET /pulse/daily.json| DailyApi
  DailyApi --> DailyDigests
  Browser -->|GET /pulse/trends.json| TrendsApi
  TrendsApi --> WeeklySummaries
  Browser -->|GET /pulse/embed| EmbedPage
Sources: beacon-platform/docs/architecture.md
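The read path above can be summarized as a route table. The sketch below is illustrative only: the `PublicRoute` type and `matchRoute` helper are hypothetical, and mapping the diagram's WeeklySummaries and DailyDigests nodes to the `weekly_summaries_public` and `daily_digests` tables is an assumption based on the D1 table list in this document.

```typescript
// A route either renders an HTML page or serves JSON backed by one D1 table.
type PublicRoute =
  | { kind: "html"; page: "pulse" | "embed" }
  | { kind: "json"; table: "weekly_summaries_public" | "daily_digests" };

// Paths taken from the read-path diagram; table mapping is an assumption.
const ROUTES = new Map<string, PublicRoute>([
  ["/pulse", { kind: "html", page: "pulse" }],
  ["/pulse/embed", { kind: "html", page: "embed" }],
  ["/pulse.json", { kind: "json", table: "weekly_summaries_public" }],
  ["/pulse/history.json", { kind: "json", table: "weekly_summaries_public" }],
  ["/pulse/trends.json", { kind: "json", table: "weekly_summaries_public" }],
  ["/pulse/daily.json", { kind: "json", table: "daily_digests" }],
]);

// Looks up a pathname, ignoring trailing slashes; returns null when unknown.
function matchRoute(pathname: string): PublicRoute | null {
  const normalized = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  return ROUTES.get(normalized) ?? null;
}
```

A Worker `fetch` handler could call `matchRoute` with the request's pathname and answer 404 for `null`. Keeping the table explicit also makes it easy to see which endpoints read which table.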
Public vs private data boundary
flowchart LR
  subgraph PrivateZone
    RawExports
    MessageHashes
    DailyDigestsTable
  end
  subgraph PublicZone
    WeeklyPublic
    PulseApi
    PulseHtml
  end
  RawExports -->|parse + hash| MessageHashes
  MessageHashes --> DailyDigestsTable
  DailyDigestsTable --> WeeklyPublic
  WeeklyPublic --> PulseApi
  PulseApi --> PulseHtml
Sources: beacon-platform/docs/architecture.md, beacon-platform/docs/privacy.md
Key behaviors
- Parsing supports multiple WhatsApp export formats and multi-line messages.
- Deduplication is keyed by (community_id, source_id, day_date, content_hash) with sender hashing for unique counts.
- Daily digests are capped at 1,000 messages per AI call, redacted for PII, and validated against a JSON schema and privacy checks.
- Weekly summaries are generated from daily digests only and require >=3 days or >=20 messages.
- Sentiment trends are computed via confidence-weighted linear regression over weekly summaries.
Sources: beacon-platform/docs/architecture.md, beacon-platform/docs/privacy.md, beacon-platform/docs/operations.md
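Two of the behaviors above, the weekly eligibility threshold and the confidence-weighted trend, can be sketched as follows. This is one plausible reading rather than the project's implementation. It assumes "days" means days that have a daily digest, that each weekly summary carries a numeric sentiment and a confidence in [0, 1], and that weeks are evenly spaced so the array index can serve as the x value. The names `isWeeklyEligible`, `WeeklyPoint`, and `weightedSlope` are hypothetical.

```typescript
// Weekly summaries require at least 3 days or at least 20 messages.
function isWeeklyEligible(daysWithDigests: number, totalMessages: number): boolean {
  return daysWithDigests >= 3 || totalMessages >= 20;
}

// Hypothetical per-week data point, in chronological order.
interface WeeklyPoint {
  sentiment: number;  // e.g. -1 (negative) to 1 (positive)
  confidence: number; // weight; points with weight <= 0 are ignored
}

// Weighted least-squares slope of sentiment over week index:
//   slope = sum(w * (x - xMean) * (y - yMean)) / sum(w * (x - xMean)^2)
// where xMean and yMean are confidence-weighted means.
// Returns null when fewer than two usable points exist.
function weightedSlope(points: WeeklyPoint[]): number | null {
  const usable = points
    .map((p, i) => ({ x: i, y: p.sentiment, w: p.confidence }))
    .filter((p) => p.w > 0);
  if (usable.length < 2) return null;

  const wSum = usable.reduce((acc, p) => acc + p.w, 0);
  const xMean = usable.reduce((acc, p) => acc + p.w * p.x, 0) / wSum;
  const yMean = usable.reduce((acc, p) => acc + p.w * p.y, 0) / wSum;

  let numerator = 0;
  let denominator = 0;
  for (const p of usable) {
    numerator += p.w * (p.x - xMean) * (p.y - yMean);
    denominator += p.w * (p.x - xMean) ** 2;
  }
  return denominator === 0 ? null : numerator / denominator;
}
```

A positive slope suggests sentiment is improving week over week and a negative slope suggests it is declining. Low-confidence weeks pull the fitted line less than high-confidence ones, which is the point of weighting by confidence.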