Streaming Claude in the console: the AI assistant, grounded in your workspace
The AI assistant shipped with three design constraints we weren't willing to negotiate. It had to be streaming (responses appear token by token, not in a 15-second block), grounded (it can see only your org's data, enforced by RLS), and auditable (every conversation persists to Postgres with model, token count, and cost).
We picked Claude because the Sonnet 4.6 → Opus 4.7 range maps cleanly to 'fast and cheap' vs. 'deep reasoning' — and the Anthropic SDK streaming transport is the cleanest we've used.
Streaming via SSE
The route at /api/v1/ai/chat holds open an SSE connection and forwards Anthropic's streaming deltas straight to the client. Time to first token on a short prompt is under 500ms. On a multi-thousand-token response, the first token still appears within that window, and the remaining tokens arrive as they're generated.
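A minimal sketch of the forwarding step, assuming the upstream deltas arrive as an async iterable of text chunks (in the real route they would come from the Anthropic SDK's stream). The helper names sseFrame and sseStream and the event names delta, done, and error are illustrative assumptions, not the production code.

```typescript
// Hypothetical helper: serialize one SSE frame.
// JSON-encoding the payload escapes newlines, so the blank line that
// terminates an SSE event can only come from the framing itself.
function sseFrame(eventName: string, payload: unknown): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Wrap any async source of text deltas in a streaming SSE body.
// Each delta is forwarded as soon as it arrives; nothing is buffered
// until the full response is complete.
function sseStream(deltas: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const text of deltas) {
          controller.enqueue(encoder.encode(sseFrame("delta", { text })));
        }
        controller.enqueue(encoder.encode(sseFrame("done", {})));
      } catch (err) {
        // Surface upstream failures to the client as a final event.
        controller.enqueue(encoder.encode(sseFrame("error", { message: String(err) })));
      } finally {
        controller.close();
      }
    },
  });
}
```

A route handler would return this stream with a Content-Type of text/event-stream. The sketch leaves out backpressure and client-disconnect handling, which a production route would need.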
Persistence: ai_conversations and ai_messages
Every conversation lives in two RLS-scoped tables, ai_conversations and ai_messages. Switching between conversations is instant. Conversation forking is on the roadmap. The ai_conversations table:
create table ai_conversations (
  id uuid primary key default gen_random_uuid(),
  org_id uuid not null references orgs(id),
  user_id uuid not null references auth.users(id),
  title text,
  model text not null default 'claude-sonnet-4-6',
  created_at timestamptz not null default now()
);
-- RLS: select/insert/update/delete where is_org_member(org_id)
Grounded in your org, not leaked across tenants
The assistant can call tools that read production data. Every one of those tools runs under the requesting user's session, so RLS applies to each query. Even if the model asked for another org's data, Postgres would return zero rows. This is not 'tuning our prompt to be careful.' It's the database saying no.
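To make the guarantee concrete, here is a toy in-memory model of it. This is not how Postgres implements RLS and not the production tool code: the vendors table, the Session type, and selectVendors are hypothetical stand-ins, and isOrgMember only mimics the is_org_member(org_id) check named in the policy comment.

```typescript
// Toy model only: real enforcement lives in Postgres RLS policies, not app code.
type Row = { org_id: string; name: string };
type Session = { userId: string; orgIds: ReadonlySet<string> };

// Stand-in for the is_org_member(org_id) policy check.
function isOrgMember(session: Session, orgId: string): boolean {
  return session.orgIds.has(orgId);
}

// Stand-in for a tool query run under the user's session.
// The first filter is what the tool call asked for; the second is the policy,
// applied by the data layer regardless of what was asked.
function selectVendors(table: readonly Row[], session: Session, requestedOrgId: string): Row[] {
  return table
    .filter((row) => row.org_id === requestedOrgId)
    .filter((row) => isOrgMember(session, row.org_id));
}

const vendors: Row[] = [
  { org_id: "org_a", name: "Stage Left Audio" },
  { org_id: "org_b", name: "Northside Rigging" },
];
const alice: Session = { userId: "u1", orgIds: new Set(["org_a"]) };

const ownRows = selectVendors(vendors, alice, "org_a");   // one row
const otherRows = selectVendors(vendors, alice, "org_b"); // empty: the policy filters it out
```

The point of the sketch is where the check sits: the model controls the requested org ID, but never the membership set attached to the session.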
Drafting templates
The common workflows don't need a chat — they need a form. We shipped drafting templates for advancing emails, vendor RFPs, incident summary reports, and production schedules. Pick a project, click the template, and Claude drafts from the actual data: show dates, crew, vendors, schedule.
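One way a template could turn project data into a prompt, sketched under assumptions: the TemplateId values, the ProjectContext shape, and the instruction strings are all hypothetical, chosen to mirror the four templates and the data fields listed above.

```typescript
// Hypothetical template identifiers, one per template named in the post.
type TemplateId = "advancing_email" | "vendor_rfp" | "incident_summary" | "production_schedule";

// Hypothetical shape for the project data a template draws on.
type ProjectContext = {
  projectName: string;
  showDates: string[];
  crew: string[];
  vendors: string[];
  schedule: string[];
};

const INSTRUCTIONS: Record<TemplateId, string> = {
  advancing_email: "Draft an advancing email for this project.",
  vendor_rfp: "Draft a vendor RFP for this project.",
  incident_summary: "Draft an incident summary report for this project.",
  production_schedule: "Draft a production schedule for this project.",
};

// Render a list field, making missing data explicit so the model
// is not invited to fill the gap with a guess.
function field(label: string, values: string[]): string {
  return `${label}: ${values.length > 0 ? values.join(", ") : "(none on file)"}`;
}

function buildDraftPrompt(template: TemplateId, project: ProjectContext): string {
  return [
    INSTRUCTIONS[template],
    "Use only the project data below. If a detail is missing, say so instead of guessing.",
    `Project: ${project.projectName}`,
    field("Show dates", project.showDates),
    field("Crew", project.crew),
    field("Vendors", project.vendors),
    field("Schedule", project.schedule),
  ].join("\n");
}
```

Marking empty fields explicitly is a deliberate choice in this sketch: it gives the model something to report rather than something to invent.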
Rate limits + costs
/api/v1/ai/* is behind the ai rate bucket in middleware — no runaway costs, no abuse. Every message writes to audit_log with the model, token count, and estimated cost. Professional includes 200K tokens/month; Enterprise is custom.
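A rough sketch of both halves, assuming the rate bucket behaves like a per-key token bucket and that cost is estimated from per-million-token rates. The class, the function names, and every number passed in are illustrative assumptions; the real bucket parameters and model prices are not stated here.

```typescript
type Bucket = { tokens: number; updatedAt: number };

// Hypothetical in-memory token bucket, keyed per org or per user.
class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true when the request may proceed. nowMs is injectable for testing.
  tryConsume(key: string, nowMs: number = Date.now()): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: nowMs };
    const elapsedSeconds = Math.max(0, nowMs - bucket.updatedAt) / 1000;
    // Refill for the elapsed time, never above capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = Math.max(bucket.updatedAt, nowMs);
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

type Usage = { inputTokens: number; outputTokens: number };
// Rates are supplied by the caller; no real prices are assumed here.
type Rate = { inputUsdPerMillion: number; outputUsdPerMillion: number };

// Estimated cost of one message, the kind of figure an audit row could carry.
function estimateCostUsd(usage: Usage, rate: Rate): number {
  const total =
    usage.inputTokens * rate.inputUsdPerMillion +
    usage.outputTokens * rate.outputUsdPerMillion;
  return total / 1_000_000;
}
```

A multi-instance deployment would need shared state for the buckets (for example a database or cache) rather than a per-process Map.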
What it's not
It's not a replacement for your producer. It's a leverage tool for the producer you already have. Use it to draft, summarize, surface, reconcile — not to decide. The AI will cheerfully hallucinate a vendor contact if you let it. Always check.
Model switching
Toggle Sonnet 4.6 (fast, cheap, great at draft + classify) vs. Opus 4.7 (deep reasoning, proposal drafting, contract review) per conversation. Opus is Enterprise-only today; it'll open up to Professional in a future release as price comes down.
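The plan gate can be sketched as a small resolver. Assumptions: the Plan and ModelChoice types and the resolveModel function are hypothetical, the Opus model ID is a guess patterned on the Sonnet ID used as the table default, and falling back to Sonnet (rather than rejecting the request) is one possible policy, not necessarily the shipped one.

```typescript
type Plan = "professional" | "enterprise";
type ModelChoice = "sonnet" | "opus";

const MODEL_IDS: Record<ModelChoice, string> = {
  sonnet: "claude-sonnet-4-6", // matches the ai_conversations default
  opus: "claude-opus-4-7",     // assumed ID, not confirmed by the post
};

type Resolution = { model: string; downgraded: boolean };

// Resolve the per-conversation toggle against the org's plan.
// Opus is Enterprise-only, so other plans fall back to Sonnet and the
// caller is told about it, so the UI can explain the downgrade.
function resolveModel(plan: Plan, choice: ModelChoice): Resolution {
  if (choice === "opus" && plan !== "enterprise") {
    return { model: MODEL_IDS.sonnet, downgraded: true };
  }
  return { model: MODEL_IDS[choice], downgraded: false };
}
```

Doing this check on the server, at the point where the conversation's model is written, keeps the gate from depending on the client hiding the toggle.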