Database schema and table documentation for the D1 data store.
Tables overview
Core:
- communities
- chat_sources
- exports
- message_hashes
- daily_digests
- weekly_summaries_public
- weekly_health_snapshots_public
- weekly_concept_graph_public
Media analysis:
- export_media
- media_analysis
- media_ai_quota_usage
Quota and config:
- ai_quota_usage
- ai_usage_log
- quota_settings
Core tables
communities
community_id(PK),name,description,created_at,updated_at.
chat_sources
(community_id, source_id)composite PK.name,description, timestamps.
exports
- Tracks R2 objects and processing state.
- Includes
message_count,new_message_count,status, processing progress.
message_hashes
- Deduplication key:
(community_id, source_id, day_date, content_hash). sender_hashsupports unique sender counts without storing PII.
daily_digests
- Per-day AI output:
sentiment_label,sentiment_score,top_themes_json,summary,message_count,unique_sender_count. - Community Strain Index (CSI) fields:
strain_score(real),strain_level(text:low/moderate/elevated),strain_signals_json(JSON array),strain_confidence(real). - Used by weekly rollups and also exposed today through the public
/pulse/daily.jsonendpoint. - Primary key:
(community_id, source_id, day_date)— fixed via migrationfix_daily_digests_weekly_summaries_pk.sqlon 2026-04-25. Prior to that migration,source_idwas only in a secondary unique index andINSERT OR REPLACEwould silently clobber rows for other sources. All writes still use explicitON CONFLICT(community_id, source_id, day_date)upserts for clarity.
weekly_summaries_public
- Public-safe weekly aggregates:
sentiment_label,sentiment_score,confidence,top_themes_json,summary,total_messages,unique_senders,week_start,week_end. - Community Strain Index (CSI) fields:
strain_score,strain_level,strain_signals_json,strain_confidence. - Optional media aggregate fields when ZIP exports with attachments have been analyzed:
analysis_json(containsmediasub-object). - Primary key:
(community_id, source_id, week_start)— fixed by the same migration asdaily_digests.
weekly_health_snapshots_public
- Serialized health snapshot data for community health pillars (vNext).
weekly_concept_graph_public
- Concept graph output used for the
/pulse/conceptsvisualization.
Media analysis tables
export_media
One row per attachment reference found in a WhatsApp ZIP transcript.
| Field | Description |
|---|---|
| media_id | Generated ID for this attachment record |
| export_id, community_id, source_id | Export and scope linkage |
| message_timestamp | Timestamp from the transcript line (when available) |
| sender_hash | Hashed sender; raw names are not stored |
| object_key | Extracted attachment location in R2 |
| filename, mime_type, media_kind, byte_size | File metadata |
| status | pending_analysis, deferred_quota, deferred_processing, analyzed, missing_attachment, unsupported, skipped_size, skipped_quota, or failed |
| analysis_version | Model version used for analysis |
| analyzed_at | Timestamp of completed analysis |
| error_message | Error detail for failed rows; includes retry count for recoverable failures |
media_analysis
Normalized model output per media item.
| Field | Description |
|---|---|
| media_id | FK to export_media |
| classifier_tags_json | Image classifier tags or derived transcript tags |
| objects_json | Object detector output for images |
| caption | Optional short caption or video metadata note |
| transcript | Voice-note transcription |
| safety_flags_json | Safety classifier output |
| confidence | Coarse confidence score for aggregate use |
| model_summary_json | Private model provenance and raw structured output for admin/debug use; video frame results stored under .video |
media_ai_quota_usage
Per-day quota tracking for media analysis. One row per usage_date.
| Field | Description |
|---|---|
| usage_date | PK — ISO date string (YYYY-MM-DD) |
| image_stage1_calls | Stage 1 classifier/object-detection calls on images |
| image_stage2_calls | Stage 2 vision caption calls on images |
| audio_minutes | Total audio transcription minutes consumed |
| video_minutes | Video metadata processing minutes |
| video_frame_extractions | Number of video frames extracted for analysis |
| video_frame_stage1_calls | Stage 1 classifier calls on extracted video frames |
| video_frame_stage2_calls | Stage 2 caption calls on video frames |
| neurons | Cloudflare Workers AI neuron units consumed |
| last_updated | Datetime of last quota increment for this date |
Quota and config tables
ai_quota_usage and ai_usage_log
- Daily counters and per-call usage logs for text AI costs and limits.
quota_settings
- Stores daily digest processing mode, AI quota settings, and media analysis config.
Indexes
All performance-critical lookup paths are covered by explicit indexes:
| Index | Table | Columns | Purpose |
|---|---|---|---|
idx_daily_by_community_source_date | daily_digests | (community_id, source_id, day_date) | Dedup checks, digest lookups |
idx_weekly_by_community_source_week | weekly_summaries_public | (community_id, source_id, week_start) | Weekly summary queries |
idx_hashes_by_community_source_date | message_hashes | (community_id, source_id, day_date) | Hash insert/update batches |
idx_exports_by_community_source | exports | (community_id, source_id) | Regeneration lookups |
idx_exports_status_community | exports | (status, community_id) | Cron stuck-export detection (added 2026-04-25) |
idx_export_media_export | export_media | (export_id) | Media row lookups by export |
idx_export_media_export_community | export_media | (export_id, community_id) | Public API media-count aggregation (added 2026-04-25) |
The two indexes marked 2026-04-25 were added by migration add_performance_indexes.sql. See migrations.
Notes and constraints
- Weeks are Sunday–Saturday (UTC), aligned to SQLite weekday 0.
- Public endpoints read from
weekly_summaries_publicand related public tables only. - Raw-export cleanup is now implemented in code. Current defaults target 72 hours after successful processing and 168 hours after failure when delete-after-processing is enabled.
daily_digestsandweekly_summaries_publichad a PK/index mismatch wheresource_idwas not in the primary key; this was fixed by migrationfix_daily_digests_weekly_summaries_pk.sqlon 2026-04-25.weekly_health_snapshots_publicandweekly_concept_graph_publicwere created with the correct composite PK includingsource_idand were never affected.