Scope

This page is derived from the GDPR and privacy audit in beacon-platform/AUDIT_REPORT.md.


Executive summary

  • The system processes raw WhatsApp exports in R2 and derived summaries in D1; core services are Cloudflare Workers, D1, R2, Queues, and Workers AI.
  • Admin endpoints are guarded in code via ADMIN_TOKEN/ADMIN_SECRET or an Access JWT; it still needs to be confirmed that production has a valid token or Access policy configured.
  • /pulse/daily.json is gated by PUBLIC_DAILY_DIGESTS; current config keeps it public unless disabled.
  • PII minimization has improved (aliases are used and logs record lengths only), but inferred-signal logs can still echo message content.
  • Retention is inconsistent in docs and not enforced in code.

Sources: beacon-platform/AUDIT_REPORT.md


System map (privacy-relevant)

Services and entrypoints

  • Ingest worker routes: /presign, /upload, /files, /clear, /exports/, /regenerate/, /replay/, /quota/, /pipeline/daily-config
  • Public worker routes: /pulse.json, /pulse/history.json, /pulse/daily.json, /pulse/trends.json, /pulse, /pulse/embed, /docs/, and /admin/ helpers

Sources: beacon-platform/AUDIT_REPORT.md

Data stores and bindings

  • R2 bucket beacon-pulse-exports for raw exports
  • D1 database beacon-pulse-db for exports, hashes, digests, summaries, sources, communities, quota logs
  • Queue beacon-pulse-uploads for ingest processing
  • Workers AI binding AI for daily/weekly summaries

Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/operations.md

External processors

  • Cloudflare (Workers, D1, R2, Queues, Workers AI, Access, Observability)
  • Google Fonts (UI assets)

Sources: beacon-platform/AUDIT_REPORT.md


Data inventory (record of processing)

| Data category | Source | Where stored | Purpose | Retention | Access controls | Transfers |
|---|---|---|---|---|---|---|
| Raw chat exports (message text, names, phone numbers, timestamps) | Admin upload | R2 beacon-pulse-exports | Generate daily/weekly summaries | Docs conflict (30 days vs manual cleanup) | Access at edge; admin auth in code | Cloudflare R2 |
| Export metadata (object key, status, counts) | Upload and queue processing | D1 exports | Processing and monitoring | Not specified | Ingest worker | Cloudflare D1 |
| Message hashes (content and sender) | Parsed messages | D1 message_hashes | Deduplication and unique sender counts | Indefinite (docs) | Ingest worker | Cloudflare D1 |
| Daily digests (summary, themes, counts) | Workers AI output | D1 daily_digests | Internal analysis and weekly summaries | Not specified | Admin-only unless PUBLIC_DAILY_DIGESTS | Cloudflare D1 |
| Weekly summaries (public) | Workers AI output | D1 weekly_summaries_public | Public analytics | Indefinite (docs) | Public endpoints | Cloudflare D1 |
| Concept graphs and health snapshots | Derived analytics | D1 weekly_concept_graph_public, weekly_health_snapshots_public | Public visualization | Not specified | Public endpoints | Cloudflare D1 |
| Community and source metadata | Admin endpoints | D1 communities, chat_sources | Admin organization | Not specified | Admin-only routes | Cloudflare D1 |
| AI usage logs and quota | AI pipeline | D1 ai_quota_usage, ai_usage_log | Cost and quota controls | Not specified | Ingest worker | Cloudflare D1 |
| Logs (user agent, origin, inferred signals) | Requests and parsing | Cloudflare observability + app logs | Debugging | Cloudflare defaults (unknown) | Cloudflare Access controls | Cloudflare |

Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/database-structure.md


Key flows

Admin upload and ingest

  • Admin UI calls /upload on ingest.
  • Ingest stores file in R2 and creates export record in D1.
  • Queue event triggers parsing, hashing, and AI summarization.

Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/docs/architecture.md

Public read paths

  • /pulse.json, /pulse/history.json, /pulse/trends.json serve weekly_summaries_public.
  • /pulse/daily.json serves daily_digests.

Sources: beacon-platform/AUDIT_REPORT.md

Deletion paths

  • /clear deletes daily digests, weekly summaries, and (optionally) message hashes for a date range.
  • /communities/{id} deletes all D1 records and R2 files for a community.

Sources: beacon-platform/AUDIT_REPORT.md


GDPR requirements gap analysis

Transparency

Finding: The privacy doc lacks a controller contact, a lawful basis, a DSAR process, and a subprocessors list.
Risk: Medium.
Recommendation: Add a public privacy notice with controller contact, lawful basis, rights, retention, subprocessors, and transfer info.

Finding: No consent capture or storage in schema; no enforcement in code.
Risk: Medium.
Recommendation: Decide lawful basis and implement consent tracking if required.

Data minimization

Finding: Logs avoid raw content but inferred-signal logs can still echo content; UA/origin is logged.
Risk: Medium.
Recommendation: Remove inferred-signal content from logs; consider UA/IP minimization.
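
A minimal sketch of the log-minimization recommendation above, assuming inferred signals are logged as small objects. The InferredSignal shape, its field names, and both helper functions are hypothetical and not taken from the audited code; the idea is only that a log record carries the signal kind and a length, never the matched text, and that the user agent is coarsened before logging.

```typescript
// Hypothetical shape of an inferred-signal event; field names are assumptions.
interface InferredSignal {
  kind: string;        // e.g. "question" or "event-mention"
  matchedText: string; // the message fragment that triggered the signal
}

// Log-safe counterpart: carries a length, never the text itself.
interface SafeSignalLog {
  kind: string;
  matchedLength: number;
}

// Convert a signal into a record that cannot echo message content.
function toSafeLog(signal: InferredSignal): SafeSignalLog {
  return { kind: signal.kind, matchedLength: signal.matchedText.length };
}

// Coarsen a user agent to its leading product token (e.g. "Mozilla/5.0"),
// capped at 32 characters, so logs keep less fingerprinting detail.
function coarsenUserAgent(ua: string | null): string {
  if (!ua) return "unknown";
  const first = ua.trim().split(/\s+/)[0] ?? "";
  return first.length > 0 ? first.slice(0, 32) : "unknown";
}
```

With this shape, a call such as console.log(JSON.stringify(toSafeLog(signal))) would emit only the kind and a length.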

Purpose limitation

Finding: /pulse/daily.json is public when PUBLIC_DAILY_DIGESTS is enabled, despite being described as internal.
Risk: Medium.
Recommendation: Set PUBLIC_DAILY_DIGESTS false in production or return counts-only.
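
One way to read the recommendation above is a fail-closed gate: full digests are served only when the flag is explicitly "true", and everything else falls back to counts-only. The sketch below assumes PUBLIC_DAILY_DIGESTS arrives as a string environment value; the DailyDigest field names and all three functions are hypothetical.

```typescript
// Hypothetical digest shape; the audit only says digests hold a summary,
// themes, and counts.
interface DailyDigest {
  date: string;
  summary: string;
  themes: string[];
  messageCount: number;
}

interface DailyCounts {
  date: string;
  messageCount: number;
}

// Fail closed: anything other than an explicit "true" means not public.
function isDailyPublic(flag: string | undefined): boolean {
  return flag !== undefined && flag.trim().toLowerCase() === "true";
}

// Strip free-text fields so only the date and count remain.
function toCountsOnly(digest: DailyDigest): DailyCounts {
  return { date: digest.date, messageCount: digest.messageCount };
}

// Choose the payload for /pulse/daily.json based on the flag.
function dailyPayload(
  digests: DailyDigest[],
  flag: string | undefined
): DailyDigest[] | DailyCounts[] {
  return isDailyPublic(flag) ? digests : digests.map(toCountsOnly);
}
```

Treating an unset flag as "not public" inverts the current default described in the executive summary, so it would be a behavior change rather than a pure refactor.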

Retention and deletion

Finding: Retention is not enforced in code; docs conflict on R2 retention.
Risk: Medium.
Recommendation: Enforce retention via R2 lifecycle + D1 cleanup jobs and document defaults.
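
The D1 cleanup half of this recommendation reduces to a cutoff calculation plus a selection of expired records. The sketch below shows that logic on plain objects. The StoredObject shape and function names are hypothetical, and the 30-day figure used in the usage note is only one of the two conflicting values in the docs, not a confirmed policy.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record shape: an object key plus its upload time.
interface StoredObject {
  key: string;
  uploadedAt: Date;
}

// Everything uploaded strictly before the cutoff is past retention.
function retentionCutoff(now: Date, retentionDays: number): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// Return the keys that a cleanup job would delete.
function expiredKeys(
  objects: StoredObject[],
  now: Date,
  retentionDays: number
): string[] {
  const cutoff = retentionCutoff(now, retentionDays).getTime();
  return objects
    .filter((o) => o.uploadedAt.getTime() < cutoff)
    .map((o) => o.key);
}
```

For example, expiredKeys(objects, new Date(), 30) would list the keys older than 30 days. A scheduled job could feed that list to the delete calls, while the R2 side may be better served by a bucket lifecycle rule than by application code.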

DSAR (access/export/delete)

Finding: No user-level export/delete endpoints; only date-range cleanup and community deletion exist.
Risk: High.
Recommendation: Add admin-only DSAR endpoints to export/delete by sender hash or identifier with audit logging.
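
A rough sketch of what sender-hash DSAR operations with audit logging could look like, using in-memory arrays in place of D1. The HashRow and AuditEntry shapes and both functions are hypothetical; how sender hashes are derived is not described in the audit, so the sketch takes the hash as an input rather than computing it.

```typescript
// Hypothetical row shape loosely modelled on the message_hashes table.
interface HashRow {
  senderHash: string;
  contentHash: string;
  date: string;
}

// Hypothetical audit record: who was affected, how many rows, and when.
interface AuditEntry {
  action: "export" | "delete";
  senderHash: string;
  rowCount: number;
  at: string;
}

// Return all rows for one sender and record the access in the audit trail.
function exportBySender(
  rows: HashRow[],
  senderHash: string,
  audit: AuditEntry[],
  now: Date
): HashRow[] {
  const matches = rows.filter((r) => r.senderHash === senderHash);
  audit.push({ action: "export", senderHash, rowCount: matches.length, at: now.toISOString() });
  return matches;
}

// Return the rows that remain after erasure and record how many were removed.
function deleteBySender(
  rows: HashRow[],
  senderHash: string,
  audit: AuditEntry[],
  now: Date
): HashRow[] {
  const remaining = rows.filter((r) => r.senderHash !== senderHash);
  audit.push({
    action: "delete",
    senderHash,
    rowCount: rows.length - remaining.length,
    at: now.toISOString(),
  });
  return remaining;
}
```

This covers hash rows only. Raw exports in R2 also contain the data subject's messages, so a complete erasure path would have to address those files as well.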

Security (auth, rate limits)

Finding: Admin endpoints now enforce auth via token or Access JWT.
Risk: Medium.
Recommendation: Ensure token is set and rotated; add rate limiting/WAF rules.
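
To illustrate the "ensure token is set" part, the sketch below shows a fail-closed token check: a request is rejected when no token is configured, when the header is missing, or when the values differ. The function names are hypothetical. The header value is assumed to come from x-admin-token (the header named under Open questions), and the comparison avoids early exit so timing does not reveal how many characters matched.

```typescript
// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  const len = Math.max(a.length, b.length);
  let diff = a.length ^ b.length;
  for (let i = 0; i < len; i++) {
    // charCodeAt yields NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Fail closed: an unset or empty configured token never authorizes anyone.
function isAdminAuthorized(
  headerToken: string | null,
  configuredToken: string | undefined
): boolean {
  if (!configuredToken || !headerToken) return false;
  return constantTimeEqual(headerToken, configuredToken);
}
```

In a handler this might be called as isAdminAuthorized(request.headers.get("x-admin-token"), env.ADMIN_TOKEN), where env.ADMIN_TOKEN is an assumed binding name. Rate limiting and WAF rules sit outside this check and would be configured separately.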

Children/special categories

Finding: No explicit handling of children's or special-category data in code or docs.
Risk: Medium.
Recommendation: Add policy statements and input constraints if needed.

International transfers

Finding: Cloudflare infrastructure and Google Fonts imply cross-border transfers.
Risk: Medium.
Recommendation: Document transfers and SCC/DPA status; consider self-hosting fonts.

Processor contracts

Finding: No subprocessors list or DPA references in repo.
Risk: Medium.
Recommendation: Maintain subprocessors list and DPA references (propose in IHNYC-Remote).

Breach readiness

Finding: No incident response checklist in repo.
Risk: Medium.
Recommendation: Add incident response checklist (propose in IHNYC-Remote).

Sources: beacon-platform/AUDIT_REPORT.md, beacon-platform/REMEDIATION_PLAN.md


Open questions

  • Is ADMIN_TOKEN/ADMIN_SECRET configured in production and are admin clients sending x-admin-token, or is Access JWT enforced end-to-end?
  • Is R2 lifecycle retention configured for the exports bucket?
  • Should /pulse/daily.json be public in production?
  • Who is the controller and DSAR contact?
  • Are minors or special-category data expected in exports?
  • What are Cloudflare log retention and access policies?

Sources: beacon-platform/AUDIT_REPORT.md