Comprehensive GDPR and privacy audit derived from the platform security review.
Snapshot note: this page records the February 3, 2026 audit baseline. Some items have since changed in code, including admin auth enforcement and raw-export cleanup. For current behavior, start with security-model, privacy, and admin-dashboard.
GDPR alignment status (non-certification)
Overall status: Yellow — getting there
Legend:
- Red — gaps to close
- Yellow — getting there
- Green — compliant
As of February 3, 2026, the platform is partially aligned with GDPR requirements based on the audit in beacon-platform/AUDIT_REPORT.md. GDPR does not define formal compliance levels; this is an internal, best-effort assessment.
Current posture (high-level):
- Strengths: admin authentication enforcement, documented data inventory, deletion endpoints at community/date-range scope, and raw-export cleanup in code.
- Gaps to close: DSAR workflows, privacy notice completeness, lawful-basis/consent decisions, public daily-digest exposure, and subprocessors/transfer documentation.
Scope
This page is derived from the GDPR and privacy audit in beacon-platform/AUDIT_REPORT.md.
Executive summary
- The system processes raw WhatsApp exports in R2 and derived summaries in D1; core services are Cloudflare Workers, D1, R2, Queues, and Workers AI.
- Admin endpoints are guarded in code via ADMIN_TOKEN/ADMIN_SECRET or Access JWT; production still needs to confirm that a valid token or Access policy is actually configured.
- /pulse/daily.json is gated by PUBLIC_DAILY_DIGESTS; the current config leaves the flag enabled, so the endpoint stays public unless the flag is turned off.
- PII minimization has improved (aliases and length-only logging), but inferred-signal logs can still echo message content.
- Retention is described inconsistently across docs and UI copy. Code now includes scheduled raw-export cleanup, but the documented and public wording still needs to be aligned with it.
Sources: beacon-platform/AUDIT_REPORT.md
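The token half of the admin guard described above can be sketched as follows. This is a minimal illustration, not the code in the repository: the `AdminEnv` shape and both function names are hypothetical, only ADMIN_TOKEN, ADMIN_SECRET, and the x-admin-token header come from the audit, and Access JWT verification is deliberately left out.

```typescript
// Hypothetical env shape; only the two variable names come from the audit.
interface AdminEnv {
  ADMIN_TOKEN?: string;
  ADMIN_SECRET?: string;
}

// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` maps that to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// `presented` is the value of the x-admin-token request header, or null if absent.
function isAdminTokenValid(presented: string | null, env: AdminEnv): boolean {
  const expected = env.ADMIN_TOKEN || env.ADMIN_SECRET;
  // Fail closed: an unset or empty secret must never authorize a request.
  if (!expected || !presented) return false;
  return constantTimeEqual(presented, expected);
}
```

The fail-closed branch is the part that matters for the production check: if neither variable is set, every request is rejected instead of silently passing.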
System map (privacy-relevant)
Services and entrypoints
- Ingest worker routes: /presign, /upload, /files, /clear, /exports/, /regenerate/, /replay/, /quota/, /pipeline/daily-config
- Public worker routes: /pulse.json, /pulse/history.json, /pulse/daily.json, /pulse/trends.json, /pulse, /pulse/embed, /docs/, and /admin/ helpers
Sources: beacon-platform/AUDIT_REPORT.md
Data stores and bindings
- R2 bucket beacon-pulse-exports for raw exports
- D1 database beacon-pulse-db for exports, hashes, digests, summaries, sources, communities, quota logs
- Queue beacon-pulse-uploads for ingest processing
- Workers AI binding AI for daily/weekly summaries
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/operations.md
External processors
- Cloudflare (Workers, D1, R2, Queues, Workers AI, Access, Observability)
- Google Fonts (UI assets)
Sources: beacon-platform/AUDIT_REPORT.md
Data inventory (record of processing)
| Data category | Source | Where stored | Purpose | Retention | Access controls | Transfers |
|---|---|---|---|---|---|---|
| Raw chat exports (message text, names, phone numbers, timestamps) | Admin upload | R2 beacon-pulse-exports | Generate daily/weekly summaries | Code cleanup defaults now exist, but docs and UI copy still conflict | Access at edge; admin auth in code | Cloudflare R2 |
| Export metadata (object key, status, counts) | Upload and queue processing | D1 exports | Processing and monitoring | Not specified | Ingest worker | Cloudflare D1 |
| Message hashes (content and sender) | Parsed messages | D1 message_hashes | Deduplication and unique sender counts | Indefinite (docs) | Ingest worker | Cloudflare D1 |
| Daily digests (summary, themes, counts) | Workers AI output | D1 daily_digests | Internal analysis and weekly summaries | Not specified | Public in current deployment because PUBLIC_DAILY_DIGESTS is enabled | Cloudflare D1 |
| Weekly summaries (public) | Workers AI output | D1 weekly_summaries_public | Public analytics | Indefinite (docs) | Public endpoints | Cloudflare D1 |
| Concept graphs and health snapshots | Derived analytics | D1 weekly_concept_graph_public, weekly_health_snapshots_public | Public visualization | Not specified | Public endpoints | Cloudflare D1 |
| Community and source metadata | Admin endpoints | D1 communities, chat_sources | Admin organization | Not specified | Admin-managed, with public list endpoints for communities and sources | Cloudflare D1 |
| AI usage logs and quota | AI pipeline | D1 ai_quota_usage, ai_usage_log | Cost and quota controls | Not specified | Ingest worker | Cloudflare D1 |
| Logs (user agent, origin, inferred signals) | Requests and parsing | Cloudflare observability + app logs | Debugging | Cloudflare defaults (unknown) | Cloudflare Access controls | Cloudflare |
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/database-structure.md
Key flows
Admin upload and ingest
- Admin UI calls /upload on ingest.
- Ingest stores file in R2 and creates export record in D1.
- Queue event triggers parsing, hashing, and AI summarization.
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/architecture.md
Public read paths
- /pulse.json, /pulse/history.json, /pulse/trends.json serve weekly_summaries_public.
- /pulse/daily.json serves daily_digests.
Sources: beacon-platform/AUDIT_REPORT.md
Deletion paths
- /clear deletes daily digests, weekly summaries, and (optionally) message hashes for a date range.
- /communities/{id} deletes all D1 records and R2 files for a community.
Sources: beacon-platform/AUDIT_REPORT.md
GDPR requirements gap analysis
Transparency
Finding: The privacy doc lacks a controller contact, a lawful basis, a DSAR process, and a subprocessors list.
Risk: Medium.
Recommendation: Add a public privacy notice with controller contact, lawful basis, rights, retention, subprocessors, and transfer info.
Consent
Finding: No consent capture or storage in schema; no enforcement in code.
Risk: Medium.
Recommendation: Decide lawful basis and implement consent tracking if required.
Data minimization
Finding: Logs avoid raw content, but inferred-signal logs can still echo content; user agent and origin are also logged.
Risk: Medium.
Recommendation: Remove inferred-signal content from logs; consider UA/IP minimization.
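One way to keep inferred-signal logs useful without echoing content is to log the signal label and the length of the match only. The sketch below assumes a hypothetical `InferredSignal` shape, since the audit does not describe the real one; the field names and the example label are illustrative.

```typescript
// Hypothetical shape of an inferred signal produced during parsing.
interface InferredSignal {
  kind: string;        // a fixed label such as "event-mention" (illustrative)
  matchedText: string; // the message fragment that triggered the signal
}

interface LogSafeSignal {
  kind: string;
  matchedLength: number;
}

// Keep the label and the size of the match; drop the text itself.
function toLogSafeSignal(signal: InferredSignal): LogSafeSignal {
  return { kind: signal.kind, matchedLength: signal.matchedText.length };
}

const entry = toLogSafeSignal({
  kind: "event-mention",
  matchedText: "dinner at Ana's on Friday",
});
console.log(JSON.stringify(entry));
```

This only helps if `kind` is drawn from a fixed set of labels; a free-text label would reintroduce the same leak.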
Purpose limitation
Finding: /pulse/daily.json is public when PUBLIC_DAILY_DIGESTS is enabled, despite being described as internal.
Risk: Medium.
Recommendation: Set PUBLIC_DAILY_DIGESTS to false in production, or return counts only.
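The counts-only option could look roughly like this. The digest field names are hypothetical (the data inventory only says digests hold a summary, themes, and counts), and treating an unset flag as disabled is an assumption of this sketch that differs from the current config, which keeps the endpoint public unless disabled.

```typescript
// Hypothetical digest shape; field names are illustrative.
interface DailyDigest {
  date: string;
  summary: string;
  themes: string[];
  messageCount: number;
}

interface DailyDigestCounts {
  date: string;
  messageCount: number;
}

// Assumption: only an explicit "true" exposes full digests.
function fullDigestsEnabled(flag: string | undefined): boolean {
  return (flag ?? "").trim().toLowerCase() === "true";
}

// Return the full digest when the flag allows it, otherwise counts only.
function toPublicDailyDigest(
  digest: DailyDigest,
  flag: string | undefined,
): DailyDigest | DailyDigestCounts {
  if (fullDigestsEnabled(flag)) return digest;
  return { date: digest.date, messageCount: digest.messageCount };
}
```

Building the reduced object field by field, instead of deleting keys from the full digest, means a field added to the digest later is not exposed by default.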
Retention and deletion
Finding: Raw-export cleanup now exists in code, but retention wording still conflicts across docs and UI copy, and derived-data retention is still mostly manual.
Risk: Medium.
Recommendation: Document current defaults clearly, align public/UI copy with code, and decide whether additional automatic expiry is required for derived data.
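To make "current defaults" concrete in the docs, it helps to state the cutoff rule explicitly. The sketch below shows one possible rule for picking raw exports past their retention window; the record shape, the function name, and the retention period are all hypothetical, and the audit does not state the actual default used by the scheduled cleanup.

```typescript
// Hypothetical export record; the audit lists an object key among export metadata.
interface ExportRecord {
  objectKey: string;
  uploadedAt: string; // ISO 8601 timestamp (assumed)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the object keys of exports uploaded before the retention cutoff.
function selectExpiredKeys(
  records: ExportRecord[],
  retentionDays: number,
  now: Date,
): string[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return records
    // An unparseable timestamp yields NaN, which never compares as expired,
    // so malformed rows are kept for manual review instead of being deleted.
    .filter((r) => Date.parse(r.uploadedAt) < cutoff)
    .map((r) => r.objectKey);
}
```

The same rule, written once and quoted in the docs and UI copy, would remove the ambiguity the finding describes.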
DSAR (access/export/delete)
Finding: No user-level export/delete endpoints; only date-range cleanup and community deletion exist.
Risk: High.
Recommendation: Add admin-only DSAR endpoints to export/delete by sender hash or identifier with audit logging.
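The core of such an endpoint is a filter by sender hash plus an audit record for each request. The sketch below works on in-memory rows to show that logic only; the row and audit shapes and both function names are hypothetical, and a real version would run against the D1 message_hashes table behind the admin guard.

```typescript
// Hypothetical row shape for hashed messages.
interface MessageHashRow {
  senderHash: string;
  contentHash: string;
  day: string;
}

interface DsarAuditEntry {
  action: "export" | "delete";
  senderHash: string;
  rowCount: number;
  at: string;
}

// Return every row for one sender and record that the export happened.
function exportBySender(
  rows: MessageHashRow[],
  senderHash: string,
  audit: DsarAuditEntry[],
  now: Date,
): MessageHashRow[] {
  const matches = rows.filter((r) => r.senderHash === senderHash);
  audit.push({ action: "export", senderHash, rowCount: matches.length, at: now.toISOString() });
  return matches;
}

// Return the rows that remain after removing one sender, and record the deletion.
function deleteBySender(
  rows: MessageHashRow[],
  senderHash: string,
  audit: DsarAuditEntry[],
  now: Date,
): MessageHashRow[] {
  const remaining = rows.filter((r) => r.senderHash !== senderHash);
  audit.push({
    action: "delete",
    senderHash,
    rowCount: rows.length - remaining.length,
    at: now.toISOString(),
  });
  return remaining;
}
```

The audit entry stores the hash and a row count, not the exported rows, so the audit trail itself does not become a second copy of the data.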
Security (auth, rate limits)
Finding: Admin endpoints now enforce auth via token or Access JWT; whether the token is configured in production remains an open question.
Risk: Medium.
Recommendation: Ensure the token is set and rotated; add rate limiting/WAF rules.
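For the rate-limiting part, a fixed-window counter is one of the simpler schemes. The sketch below is illustrative only: the function name and parameters are hypothetical, and because in-memory state in a Worker is not shared across isolates, it shows the counting logic and not a deployable limiter. WAF or platform-level rules would be the more likely production choice.

```typescript
interface WindowState {
  start: number; // window start, in milliseconds
  count: number; // requests seen in this window
}

// Allow at most `limit` requests per key in each window of `windowMs` milliseconds.
function createFixedWindowLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, WindowState>();
  return function allow(key: string, nowMs: number): boolean {
    if (limit < 1) return false;
    const state = windows.get(key);
    // Start a fresh window on first use or once the old window has elapsed.
    if (state === undefined || nowMs - state.start >= windowMs) {
      windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (state.count >= limit) return false;
    state.count += 1;
    return true;
  };
}
```

Passing the clock in as `nowMs` keeps the function deterministic, which makes the window boundaries easy to test.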
Children/special categories
Finding: No explicit handling in code/docs.
Risk: Medium.
Recommendation: Add policy statements and input constraints if needed.
International transfers
Finding: Cloudflare infrastructure and Google Fonts imply cross-border transfers.
Risk: Medium.
Recommendation: Document transfers and SCC/DPA status; consider self-hosting fonts.
Processor contracts
Finding: No subprocessors list or DPA references in repo.
Risk: Medium.
Recommendation: Maintain subprocessors list and DPA references (propose in IHNYC-Remote).
Breach readiness
Finding: No incident response checklist in repo.
Risk: Medium.
Recommendation: Add incident response checklist (propose in IHNYC-Remote).
Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/REMEDIATION_PLAN.md
Open questions
- Is ADMIN_TOKEN/ADMIN_SECRET configured in production and are admin clients sending x-admin-token, or is Access JWT enforced end-to-end?
- Is R2 lifecycle retention configured for the exports bucket?
- Should /pulse/daily.json be public in production?
- Who is the controller and DSAR contact?
- Are minors or special-category data expected in exports?
- What are Cloudflare log retention and access policies?
Sources: beacon-platform/AUDIT_REPORT.md