Scope
This page is derived from the GDPR and privacy audit in beacon-platform/AUDIT_REPORT.md.
Executive summary
- The system processes raw WhatsApp exports in R2 and derived summaries in D1; core services are Cloudflare Workers, D1, R2, Queues, and Workers AI.
- Admin endpoints are guarded in code via ADMIN_TOKEN/ADMIN_SECRET or an Access JWT; production deployments must confirm that a valid token or Access policy is actually configured.
- /pulse/daily.json is gated by PUBLIC_DAILY_DIGESTS; under the current config the endpoint stays public unless the flag is disabled.
- PII minimization improved (aliases + log length only), but inferred-signal logs can still echo content.
- Retention periods are documented inconsistently and are not enforced in code.
Sources: beacon-platform/AUDIT_REPORT.md
System map (privacy-relevant)
Services and entrypoints
- Ingest worker routes: /presign, /upload, /files, /clear, /exports/, /regenerate/, /replay/, /quota/, /pipeline/daily-config
- Public worker routes: /pulse.json, /pulse/history.json, /pulse/daily.json, /pulse/trends.json, /pulse, /pulse/embed, /docs/, and /admin/ helpers
Sources: beacon-platform/AUDIT_REPORT.md
Data stores and bindings
- R2 bucket beacon-pulse-exports for raw exports
- D1 database beacon-pulse-db for exports, hashes, digests, summaries, sources, communities, quota logs
- Queue beacon-pulse-uploads for ingest processing
- Workers AI binding AI for daily/weekly summaries
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/operations.md
External processors
- Cloudflare (Workers, D1, R2, Queues, Workers AI, Access, Observability)
- Google Fonts (UI assets)
Sources: beacon-platform/AUDIT_REPORT.md
Data inventory (record of processing)
| Data category | Source | Where stored | Purpose | Retention | Access controls | Transfers |
|---|---|---|---|---|---|---|
| Raw chat exports (message text, names, phone numbers, timestamps) | Admin upload | R2 beacon-pulse-exports | Generate daily/weekly summaries | Docs conflict (30 days vs manual cleanup) | Access at edge; admin auth in code | Cloudflare R2 |
| Export metadata (object key, status, counts) | Upload and queue processing | D1 exports | Processing and monitoring | Not specified | Ingest worker | Cloudflare D1 |
| Message hashes (content and sender) | Parsed messages | D1 message_hashes | Deduplication and unique sender counts | Indefinite (docs) | Ingest worker | Cloudflare D1 |
| Daily digests (summary, themes, counts) | Workers AI output | D1 daily_digests | Internal analysis and weekly summaries | Not specified | Admin-only unless PUBLIC_DAILY_DIGESTS | Cloudflare D1 |
| Weekly summaries (public) | Workers AI output | D1 weekly_summaries_public | Public analytics | Indefinite (docs) | Public endpoints | Cloudflare D1 |
| Concept graphs and health snapshots | Derived analytics | D1 weekly_concept_graph_public, weekly_health_snapshots_public | Public visualization | Not specified | Public endpoints | Cloudflare D1 |
| Community and source metadata | Admin endpoints | D1 communities, chat_sources | Admin organization | Not specified | Admin-only routes | Cloudflare D1 |
| AI usage logs and quota | AI pipeline | D1 ai_quota_usage, ai_usage_log | Cost and quota controls | Not specified | Ingest worker | Cloudflare D1 |
| Logs (user agent, origin, inferred signals) | Requests and parsing | Cloudflare observability + app logs | Debugging | Cloudflare defaults (unknown) | Cloudflare Access controls | Cloudflare |
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/database-structure.md
Key flows
Admin upload and ingest
- Admin UI calls /upload on ingest.
- Ingest stores file in R2 and creates export record in D1.
- Queue event triggers parsing, hashing, and AI summarization.
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/architecture.md
Public read paths
- /pulse.json, /pulse/history.json, /pulse/trends.json serve weekly_summaries_public.
- /pulse/daily.json serves daily_digests.
Sources: beacon-platform/AUDIT_REPORT.md
Deletion paths
- /clear deletes daily digests and weekly summaries, and optionally message hashes, by date range.
- /communities/{id} deletes all D1 records and R2 files for a community.
Sources: beacon-platform/AUDIT_REPORT.md
GDPR requirements gap analysis
Transparency
Finding: The privacy doc lacks a controller contact, a lawful basis, a DSAR process, and a subprocessors list.
Risk: Medium.
Recommendation: Add a public privacy notice with controller contact, lawful basis, rights, retention, subprocessors, and transfer info.
Consent
Finding: No consent capture or storage in schema; no enforcement in code.
Risk: Medium.
Recommendation: Decide on a lawful basis and implement consent tracking if that basis requires it.
Data minimization
Finding: Logs avoid raw content but inferred-signal logs can still echo content; UA/origin is logged.
Risk: Medium.
Recommendation: Remove inferred-signal content from logs; consider UA/IP minimization.
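One way to act on this recommendation is to reduce each log entry to metadata before it is written. The sketch below is illustrative only: the `InferredSignal` shape, the `kind` values, and the browser buckets are assumptions, not taken from the audited code.

```typescript
// Hypothetical shape of an inferred signal produced during parsing.
interface InferredSignal {
  kind: string;        // assumed label, e.g. "phone-number"
  matchedText: string; // fragment of message content that triggered the signal
}

interface LoggableSignal {
  kind: string;
  matchedLength: number;
}

// Keep the signal kind and the size of the match, never the match itself.
function toLoggableSignal(signal: InferredSignal): LoggableSignal {
  return { kind: signal.kind, matchedLength: signal.matchedText.length };
}

// Collapse a full User-Agent string into a coarse browser family.
// Order matters: Edge UAs contain "Chrome/", and Chrome UAs contain "Safari/".
function coarseUserAgent(ua: string | null | undefined): string {
  if (!ua) return "unknown";
  if (/Edg\//.test(ua)) return "edge";
  if (/Firefox\//.test(ua)) return "firefox";
  if (/Chrome\//.test(ua)) return "chrome";
  if (/Safari\//.test(ua)) return "safari";
  return "other";
}
```

Logging `toLoggableSignal(signal)` instead of the signal object keeps the debugging value (what fired, how large the match was) without echoing message text.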
Purpose limitation
Finding: /pulse/daily.json is public when PUBLIC_DAILY_DIGESTS is enabled, despite being described as internal.
Risk: Medium.
Recommendation: Set PUBLIC_DAILY_DIGESTS to false in production, or have the endpoint return counts only.
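The two options in this recommendation can be sketched as a small projection step in front of the response. This is a sketch under assumptions: the `DailyDigest` field names, the `countsOnly` switch, and the fail-closed reading of the flag (only the literal string "true" makes the digest public) are hypothetical and may differ from how the worker currently interprets PUBLIC_DAILY_DIGESTS.

```typescript
// Hypothetical shape of a daily_digests row; field names are assumptions.
interface DailyDigest {
  date: string;
  summary: string;
  themes: string[];
  messageCount: number;
}

type DigestMode = "full" | "counts-only" | "disabled";

// Fail closed: a missing or mistyped flag never results in a full public digest.
function resolveDigestMode(flag: string | undefined, countsOnly: boolean): DigestMode {
  if (flag === "true") return "full";
  return countsOnly ? "counts-only" : "disabled";
}

// Return the fields that may be served publicly, or null when nothing may be served.
function projectDigest(digest: DailyDigest, mode: DigestMode): Partial<DailyDigest> | null {
  if (mode === "full") return digest;
  if (mode === "counts-only") {
    return { date: digest.date, messageCount: digest.messageCount };
  }
  return null;
}
```

A handler would map a `null` result to a 403 or 404 response; which status fits best is a product decision rather than something the audit specifies.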
Retention and deletion
Finding: Retention is not enforced in code; docs conflict on R2 retention.
Risk: Medium.
Recommendation: Enforce retention via R2 lifecycle + D1 cleanup jobs and document defaults.
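For the D1 side of this recommendation, a cleanup job mostly comes down to computing a cutoff per table and issuing parameterized deletes. The sketch below assumes each table has a `created_at` column holding ISO-8601 UTC text, and the retention periods shown are placeholders; neither is confirmed by the audit. R2 objects are left to bucket lifecycle rules.

```typescript
interface RetentionRule {
  table: string;
  dateColumn: string; // assumed to hold ISO-8601 UTC timestamps as text
  days: number;
}

interface PurgeStatement {
  sql: string;
  params: string[];
}

// Identifiers are interpolated into SQL, so restrict them to a safe pattern.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

function cutoffIso(now: Date, days: number): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - days * msPerDay).toISOString();
}

function buildPurgeStatements(rules: RetentionRule[], now: Date): PurgeStatement[] {
  return rules.map((rule) => {
    if (!IDENTIFIER.test(rule.table) || !IDENTIFIER.test(rule.dateColumn)) {
      throw new Error("invalid identifier in retention rule");
    }
    if (!(rule.days > 0)) {
      throw new Error("retention days must be positive");
    }
    return {
      sql: `DELETE FROM ${rule.table} WHERE ${rule.dateColumn} < ?`,
      params: [cutoffIso(now, rule.days)],
    };
  });
}

// Placeholder periods for illustration only; real values belong in the retention policy.
const exampleRules: RetentionRule[] = [
  { table: "exports", dateColumn: "created_at", days: 30 },
  { table: "daily_digests", dateColumn: "created_at", days: 90 },
];
```

In a Worker, each statement could be executed from a scheduled handler via D1's `prepare(sql).bind(...params).run()`. Keeping the rules in one table-driven constant also gives the docs a single source to quote for retention defaults.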
DSAR (access/export/delete)
Finding: No user-level export/delete endpoints; only date-range cleanup and community deletion exist.
Risk: High.
Recommendation: Add admin-only DSAR endpoints to export/delete by sender hash or identifier with audit logging.
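A DSAR endpoint along these lines could separate planning from execution: validate the request, build one parameterized statement, and emit an audit record alongside it. The sketch below is hypothetical throughout. It assumes `message_hashes` has a `sender_hash` column and that sender hashes are 64-character lowercase hex digests, neither of which the audit states.

```typescript
type DsarAction = "export" | "delete";

interface DsarPlan {
  statement: { sql: string; params: string[] };
  audit: {
    action: DsarAction;
    senderHash: string;
    requestedBy: string;
    requestedAt: string;
  };
}

// Assumption: sender hashes are 64-char lowercase hex strings.
const SENDER_HASH = /^[0-9a-f]{64}$/;

function planDsar(
  action: DsarAction,
  senderHash: string,
  requestedBy: string,
  now: Date,
): DsarPlan {
  if (!SENDER_HASH.test(senderHash)) {
    throw new Error("senderHash must be a 64-character lowercase hex digest");
  }
  if (requestedBy.trim() === "") {
    throw new Error("requestedBy is required for the audit trail");
  }
  // Table name comes from the data inventory; the column name is hypothetical.
  const sql =
    action === "export"
      ? "SELECT * FROM message_hashes WHERE sender_hash = ?"
      : "DELETE FROM message_hashes WHERE sender_hash = ?";
  return {
    statement: { sql, params: [senderHash] },
    audit: { action, senderHash, requestedBy, requestedAt: now.toISOString() },
  };
}
```

This only covers hashed records in D1. Raw exports in R2 contain the same person's message text, names, and phone numbers, so a complete erasure path would also need to address those objects, which this sketch does not attempt.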
Security (auth, rate limits)
Finding: Admin endpoints now enforce auth via token or Access JWT.
Risk: Medium.
Recommendation: Ensure the token is set and rotated; add rate limiting and WAF rules.
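The "ensure the token is set" half of this recommendation can be made structural by having the check fail closed when no token is configured. The sketch below is not the audited implementation: it treats ADMIN_TOKEN and ADMIN_SECRET as alternative names for the same shared secret (an assumption), and it does not cover Access JWT validation.

```typescript
interface AdminEnv {
  ADMIN_TOKEN?: string;
  ADMIN_SECRET?: string;
}

// Compare two strings without returning early on the first mismatch,
// so timing does not reveal how many leading characters were correct.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// presented is the value of the x-admin-token request header, if any.
function isAdminAuthorized(presented: string | null | undefined, env: AdminEnv): boolean {
  const expected = env.ADMIN_TOKEN || env.ADMIN_SECRET;
  if (!expected) return false; // no token configured: deny rather than allow
  if (!presented) return false;
  return constantTimeEqual(presented, expected);
}
```

In a Worker, the first argument would typically come from `request.headers.get("x-admin-token")`. Denying all requests when the variable is unset turns a misconfigured deployment into a visible failure instead of an open admin surface.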
Children/special categories
Finding: No explicit handling in code/docs.
Risk: Medium.
Recommendation: Add policy statements and input constraints if needed.
International transfers
Finding: Cloudflare infrastructure and Google Fonts imply cross-border transfers.
Risk: Medium.
Recommendation: Document transfers and SCC/DPA status; consider self-hosting fonts.
Processor contracts
Finding: No subprocessors list or DPA references in repo.
Risk: Medium.
Recommendation: Maintain subprocessors list and DPA references (propose in IHNYC-Remote).
Breach readiness
Finding: No incident response checklist in repo.
Risk: Medium.
Recommendation: Add incident response checklist (propose in IHNYC-Remote).
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/REMEDIATION_PLAN.md
Open questions
- Is ADMIN_TOKEN/ADMIN_SECRET configured in production and are admin clients sending x-admin-token, or is Access JWT enforced end-to-end?
- Is R2 lifecycle retention configured for the exports bucket?
- Should /pulse/daily.json be public in production?
- Who is the controller and DSAR contact?
- Are minors or special-category data expected in exports?
- What are Cloudflare log retention and access policies?
Sources: beacon-platform/AUDIT_REPORT.md