Database schema and table documentation for the D1 data store.


Tables overview

Core:

  1. communities
  2. chat_sources
  3. exports
  4. message_hashes
  5. daily_digests
  6. weekly_summaries_public
  7. weekly_health_snapshots_public
  8. weekly_concept_graph_public

Media analysis:

  1. export_media
  2. media_analysis
  3. media_ai_quota_usage

Quota and config:

  1. ai_quota_usage
  2. ai_usage_log
  3. quota_settings

Core tables

communities

  • community_id (PK), name, description, created_at, updated_at.

chat_sources

  • (community_id, source_id) composite PK.
  • name, description, timestamps.

exports

  • Tracks R2 objects and processing state.
  • Includes message_count, new_message_count, status, processing progress.

message_hashes

  • Deduplication key: (community_id, source_id, day_date, content_hash).
  • sender_hash supports unique sender counts without storing PII.

daily_digests

  • Per-day AI output: sentiment_label, sentiment_score, top_themes_json, summary, message_count, unique_sender_count.
  • Community Strain Index (CSI) fields: strain_score (real), strain_level (text: low / moderate / elevated), strain_signals_json (JSON array), strain_confidence (real).
  • Used by weekly rollups and also exposed today through the public /pulse/daily.json endpoint.
  • Primary key: (community_id, source_id, day_date) — fixed via migration fix_daily_digests_weekly_summaries_pk.sql on 2026-04-25. Prior to that migration, source_id was only in a secondary unique index and INSERT OR REPLACE would silently clobber rows for other sources. All writes still use explicit ON CONFLICT(community_id, source_id, day_date) upserts for clarity.

weekly_summaries_public

  • Public-safe weekly aggregates: sentiment_label, sentiment_score, confidence, top_themes_json, summary, total_messages, unique_senders, week_start, week_end.
  • Community Strain Index (CSI) fields: strain_score, strain_level, strain_signals_json, strain_confidence.
  • Optional media aggregate fields when ZIP exports with attachments have been analyzed: analysis_json (contains media sub-object).
  • Primary key: (community_id, source_id, week_start) — fixed by the same migration as daily_digests.

weekly_health_snapshots_public

  • Serialized health snapshot data for community health pillars (vNext).

weekly_concept_graph_public

  • Concept graph output used for the /pulse/concepts visualization.

Media analysis tables

export_media

One row per attachment reference found in a WhatsApp ZIP transcript.

FieldDescription
media_idGenerated ID for this attachment record
export_id, community_id, source_idExport and scope linkage
message_timestampTimestamp from the transcript line (when available)
sender_hashHashed sender; raw names are not stored
object_keyExtracted attachment location in R2
filename, mime_type, media_kind, byte_sizeFile metadata
statuspending_analysis, deferred_quota, deferred_processing, analyzed, missing_attachment, unsupported, skipped_size, skipped_quota, or failed
analysis_versionModel version used for analysis
analyzed_atTimestamp of completed analysis
error_messageError detail for failed rows; includes retry count for recoverable failures

media_analysis

Normalized model output per media item.

FieldDescription
media_idFK to export_media
classifier_tags_jsonImage classifier tags or derived transcript tags
objects_jsonObject detector output for images
captionOptional short caption or video metadata note
transcriptVoice-note transcription
safety_flags_jsonSafety classifier output
confidenceCoarse confidence score for aggregate use
model_summary_jsonPrivate model provenance and raw structured output for admin/debug use; video frame results stored under .video

media_ai_quota_usage

Per-day quota tracking for media analysis. One row per usage_date.

FieldDescription
usage_datePK — ISO date string (YYYY-MM-DD)
image_stage1_callsStage 1 classifier/object-detection calls on images
image_stage2_callsStage 2 vision caption calls on images
audio_minutesTotal audio transcription minutes consumed
video_minutesVideo metadata processing minutes
video_frame_extractionsNumber of video frames extracted for analysis
video_frame_stage1_callsStage 1 classifier calls on extracted video frames
video_frame_stage2_callsStage 2 caption calls on video frames
neuronsCloudflare Workers AI neuron units consumed
last_updatedDatetime of last quota increment for this date

Quota and config tables

ai_quota_usage and ai_usage_log

  • Daily counters and per-call usage logs for text AI costs and limits.

quota_settings

  • Stores daily digest processing mode, AI quota settings, and media analysis config.

Indexes

All performance-critical lookup paths are covered by explicit indexes:

IndexTableColumnsPurpose
idx_daily_by_community_source_datedaily_digests(community_id, source_id, day_date)Dedup checks, digest lookups
idx_weekly_by_community_source_weekweekly_summaries_public(community_id, source_id, week_start)Weekly summary queries
idx_hashes_by_community_source_datemessage_hashes(community_id, source_id, day_date)Hash insert/update batches
idx_exports_by_community_sourceexports(community_id, source_id)Regeneration lookups
idx_exports_status_communityexports(status, community_id)Cron stuck-export detection (added 2026-04-25)
idx_export_media_exportexport_media(export_id)Media row lookups by export
idx_export_media_export_communityexport_media(export_id, community_id)Public API media-count aggregation (added 2026-04-25)

The two indexes marked 2026-04-25 were added by migration add_performance_indexes.sql. See migrations.


Notes and constraints

  • Weeks are Sunday–Saturday (UTC), aligned to SQLite weekday 0.
  • Public endpoints read from weekly_summaries_public and related public tables only.
  • Raw-export cleanup is now implemented in code. Current defaults target 72 hours after successful processing and 168 hours after failure when delete-after-processing is enabled.
  • daily_digests and weekly_summaries_public had a PK/index mismatch where source_id was not in the primary key; this was fixed by migration fix_daily_digests_weekly_summaries_pk.sql on 2026-04-25. weekly_health_snapshots_public and weekly_concept_graph_public were created with the correct composite PK including source_id and were never affected.