Privacy protection is built into every layer: storage, processing, and output.
Overview
Public data is always aggregate-only.
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
What we never show publicly
- Names or phone numbers.
- Direct message quotes.
- Specific timestamps (e.g. “3:45 PM”).
- Group names or identifiers.
- Any information that could identify individuals.
Sources: beacon-platform/apps/pulse-public/src/index.ts
What we do show publicly
- Weekly aggregate summaries.
- Sentiment scores and themes.
- Message and participant counts (totals only).
- Date ranges (day-level or week-level).
Sources: beacon-platform/apps/pulse-public/src/index.ts
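As an illustration of the two lists above, the following sketch projects an internal weekly record onto a public shape. Every type and field name here (`InternalWeekly`, `PublicWeekly`, `groupId`, `participantIds`, and so on) is hypothetical and chosen for the example; the actual schema in `pulse-public` may differ.

```typescript
// Hypothetical internal record; field names are assumptions for illustration.
interface InternalWeekly {
  groupId: string;          // private: never published
  weekStartMs: number;      // epoch milliseconds
  weekEndMs: number;        // epoch milliseconds
  summary: string;
  score: number;
  themes: string[];
  messageCount: number;
  participantIds: string[]; // private: only the count is published
}

// Hypothetical public shape: aggregates and day-level dates only.
export interface PublicWeekly {
  weekStart: string; // YYYY-MM-DD
  weekEnd: string;   // YYYY-MM-DD
  summary: string;
  score: number;
  themes: string[];
  messageCount: number;
  participantCount: number;
}

// Truncate a timestamp to a UTC calendar day so no clock time leaks.
const toDay = (ms: number): string => new Date(ms).toISOString().slice(0, 10);

export function toPublicWeekly(w: InternalWeekly): PublicWeekly {
  return {
    weekStart: toDay(w.weekStartMs),
    weekEnd: toDay(w.weekEndMs),
    summary: w.summary,
    score: w.score,
    themes: [...w.themes],
    messageCount: w.messageCount,
    // Publish the number of distinct participants, not their identifiers.
    participantCount: new Set(w.participantIds).size,
  };
}
```

Building the public object field by field, rather than deleting private fields from a copy, means a newly added internal field stays private unless someone explicitly publishes it.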
Privacy layers
1) Data separation
Private zone (never public):
- Raw chat exports in R2.
- Message hashes for deduplication.
- Daily digests.
Public zone:
- Weekly summaries only.
- Aggregated metrics.
- Sanitized themes.
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
Public vs private data boundary
flowchart LR
  subgraph PrivateZone
    RawExports[R2 exports]
    MessageHashes
    DailyDigests
  end
  subgraph PublicZone
    WeeklyPublic
    PulseApi[/pulse.json + /pulse/history.json/]
  end
  RawExports --> MessageHashes --> DailyDigests --> WeeklyPublic --> PulseApi
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
2) Input protection
Before AI processing:
- Phone numbers → redacted.
- Email addresses → redacted.
- URLs → redacted.
- Mentions → optional filtering.
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
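The redaction step listed above could be sketched as follows. This is a minimal illustration, not the code in `pulse-public`: the regular expressions, the placeholder tokens (`[PHONE]`, `[EMAIL]`, `[URL]`, `[MENTION]`), and the `filterMentions` option name are all assumptions.

```typescript
// Hypothetical redaction pass; patterns and placeholder tokens are
// illustrative assumptions, not the rules used by the platform.
const URL_RE = /\bhttps?:\/\/[^\s<>"']+/gi;
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Loose phone match: optional "+", then 8 to 15 digits with common separators.
// It is deliberately aggressive and will also catch other long digit runs.
const PHONE_RE = /\+?\d(?:[\s().-]{0,2}\d){7,14}/g;
const MENTION_RE = /@[A-Za-z0-9_]+/g;

interface RedactOptions {
  filterMentions?: boolean; // mentions are only filtered when requested
}

export function redact(text: string, opts: RedactOptions = {}): string {
  // URLs go first so digits or "@" inside a link are not matched separately.
  let out = text
    .replace(URL_RE, "[URL]")
    .replace(EMAIL_RE, "[EMAIL]")
    .replace(PHONE_RE, "[PHONE]");
  if (opts.filterMentions) {
    out = out.replace(MENTION_RE, "[MENTION]");
  }
  return out;
}
```

Erring toward over-redaction is a deliberate choice in this sketch: a falsely redacted date costs little in an aggregate summary, while a missed phone number defeats the purpose of the step.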
3) AI guardrails
Prompt instructions enforce:
- No direct quotes.
- No names or identifiers.
- No timestamps.
- Aggregate analysis only.
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
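One way to encode these guardrails is a prompt builder with a stricter variant for retries. The wording of the rules and the `buildSystemPrompt` name are assumptions made for this sketch; the real prompt text lives in the platform source and is not reproduced here.

```typescript
// Hypothetical prompt builder; the rule wording is an assumption.
const GUARDRAILS: readonly string[] = [
  "Do not include direct quotes from messages.",
  "Do not include names, phone numbers, group names, or other identifiers.",
  "Do not include timestamps.",
  "Describe aggregate patterns only, never individual participants.",
];

export function buildSystemPrompt(strict = false): string {
  const lines = [
    "You summarize a batch of redacted chat messages.",
    // Number the rules so a stricter retry can refer back to them.
    ...GUARDRAILS.map((rule, i) => `${i + 1}. ${rule}`),
    "Respond with JSON only.",
  ];
  if (strict) {
    lines.push("A previous answer broke these rules. Follow every rule exactly.");
  }
  return lines.join("\n");
}
```

Prompt instructions are advisory only, which is why the output is still validated afterwards.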
4) Output validation
Every AI output is checked for:
- Valid JSON schema.
- No PII patterns (regex).
- Score within range [-1, 1].
- Theme sanitization (max 5 themes).
If validation fails:
- Retry once with stricter instructions.
- If the retry also fails, fall back to a neutral summary.
Sources: beacon-platform/docs/privacy.md, beacon-platform/apps/pulse-public/src/index.ts
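The checks above might look like the following validator. The `Summary` field names, the specific PII patterns, and the choice to enforce the five-theme cap by truncation (rather than rejection) are assumptions for illustration.

```typescript
// Hypothetical output shape; field names are assumptions.
export interface Summary {
  summary: string;
  score: number;
  themes: string[];
}

// Illustrative PII patterns (no "g" flag, so test() keeps no state).
const PII_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/, // email address
  /\+?\d(?:[\s().-]{0,2}\d){7,14}/, // phone-like digit run
  /https?:\/\//i, // URL
  /\b\d{1,2}:\d{2}\b/, // clock time such as 3:45
];

// Returns a sanitized summary, or null if any check fails.
export function validateOutput(raw: string): Summary | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { summary, score, themes } = data as Record<string, unknown>;

  // Schema checks.
  if (typeof summary !== "string") return null;
  if (typeof score !== "number" || !Number.isFinite(score)) return null;
  if (score < -1 || score > 1) return null;
  if (!Array.isArray(themes) || !themes.every((t) => typeof t === "string")) return null;
  const themeList = themes as string[];

  // Privacy check over every piece of free text.
  const texts = [summary, ...themeList];
  if (texts.some((t) => PII_PATTERNS.some((re) => re.test(t)))) return null;

  // Theme sanitization: trim, drop empties, keep at most five.
  const cleanThemes = themeList
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .slice(0, 5);
  return { summary, score, themes: cleanThemes };
}
```

Returning `null` for every failure keeps the caller simple: it only needs to decide between retrying and falling back.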
AI guardrails flow
flowchart TD
  Raw[Raw messages] --> Redact[Redact phones/emails/URLs]
  Redact --> LLM[AI analysis + narrative]
  LLM --> Validate{Schema + privacy checks}
  Validate -- pass --> Store[Store summary]
  Validate -- fail --> Retry[Retry once with stricter prompt]
  Retry --> Validate
  Validate -- fail --> Fallback[Neutral fallback summary]
Sources: beacon-platform/apps/pulse-public/src/index.ts
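The control flow in this diagram can be sketched as a small wrapper around the model call. The function name `summarizeWithGuardrails` and the `Generate` and `Validate` types are hypothetical; the model call and validator are passed in so the sketch stays independent of any particular AI client.

```typescript
// Hypothetical types: generate(strict) calls the model, validate returns
// the parsed result or null when any schema or privacy check fails.
type Generate = (strict: boolean) => Promise<string>;
type Validate<T> = (raw: string) => T | null;

export async function summarizeWithGuardrails<T>(
  generate: Generate,
  validate: Validate<T>,
  fallback: T,
): Promise<T> {
  // First attempt uses the normal prompt, the single retry uses the strict one.
  for (const strict of [false, true]) {
    const result = validate(await generate(strict));
    if (result !== null) return result;
  }
  // Both attempts failed validation: store the neutral fallback instead.
  return fallback;
}
```

Because unvalidated model output is never returned, a failure in both attempts degrades to a neutral summary rather than risking a leak.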
Data lifecycle
Storage
- R2: raw exports (encrypted at rest).
- D1: hashes, digests, summaries.
- Public access: only weekly_summaries_public.
Cleanup
- /clear endpoint removes summaries, digests, and optionally hashes.
- R2 deletion events trigger cleanup.
- Retention enforced via R2 lifecycle rules or manual cleanup.
Sources: beacon-platform/docs/privacy.md, beacon-platform/infra/CLEANUP_GUIDE.md, beacon-platform/apps/pulse-public/src/index.ts
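As a rough sketch of what a cleanup pass might issue against D1, the helper below builds the delete statements, with hashes included only on request. Only `weekly_summaries_public` is named in this page; `daily_digests`, `message_hashes`, the `includeHashes` option, and the function name are assumptions.

```typescript
// Table names other than weekly_summaries_public are hypothetical.
const SUMMARY_TABLES = ["weekly_summaries_public"] as const;
const DIGEST_TABLES = ["daily_digests"] as const;
const HASH_TABLES = ["message_hashes"] as const;

export function buildClearStatements(
  opts: { includeHashes?: boolean } = {},
): string[] {
  const tables: string[] = [...SUMMARY_TABLES, ...DIGEST_TABLES];
  // Hashes are kept by default so re-imported exports still deduplicate.
  if (opts.includeHashes) tables.push(...HASH_TABLES);
  // Names come from the fixed lists above, never from request input,
  // so interpolating them into SQL is safe in this sketch.
  return tables.map((t) => `DELETE FROM ${t};`);
}
```

Executing the statements is left to the caller, which keeps the sketch testable without a database binding.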
Limitations and best practices
Limitations:
- Perfect anonymization is not guaranteed.
- Admin endpoints currently lack authentication.
- AI may occasionally generate unexpected output (mitigated by validation).
Best practices:
- Protect admin endpoints with Cloudflare Access or an auth gateway.
- Monitor AI outputs regularly.
- Keep /pulse/daily.json internal unless explicitly intended to be public.
Sources: beacon-platform/apps/pulse-public/src/index.ts, beacon-platform/AUDIT_REPORT.md
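A first line of defence for the unauthenticated endpoints could be a path guard like the one below. It assumes the Worker sits behind Cloudflare Access, which attaches a `Cf-Access-Jwt-Assertion` header to requests it has authenticated. The list of protected paths and the function name are assumptions, and the sketch only checks that the header is present; a real deployment would also need to verify the token's signature.

```typescript
// Hypothetical guard. Treating these paths as protected is an assumption.
const PROTECTED_PREFIXES: readonly string[] = ["/clear", "/pulse/daily.json"];
const ACCESS_HEADER = "cf-access-jwt-assertion";

export function isAllowed(
  path: string,
  headers: Record<string, string | undefined>,
): boolean {
  const isProtected = PROTECTED_PREFIXES.some(
    (p) => path === p || path.startsWith(p + "/"),
  );
  if (!isProtected) return true; // public endpoints need no token

  // Header names are case-insensitive, so compare in lower case.
  const token = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === ACCESS_HEADER,
  )?.[1];
  // Presence check only: signature verification is out of scope here.
  return typeof token === "string" && token.length > 0;
}
```

Keeping the decision in a pure function makes it easy to unit test before wiring it into the request handler.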