SaaS Platform · AI · RAG copilot · B2B SaaS · AI Copilot · Europe · 7 months to first paid cohort

Lumen Copilot.

We took Lumen from a Notion-doc idea to a paid B2B AI copilot — multi-tenant SaaS, evaluation harness, agentic workflow engine and a brand that gets the sales meeting.

Services delivered

AI Product

0→1 SaaS · Agentic AI · MCP · Evaluation Harness · Multi-Tenant

Months from idea to paid cohort

Golden-eval pass rate

Median agent response (seconds)

Per-tenant isolation tests passing

01 / Background

Lumen was a thesis: mid-market revenue teams spend more time updating tools than working in them, and a well-grounded copilot could absorb that work. The founders had the domain and the network — they needed a senior team to build the product, the AI engineering behind it, and the brand that would get them onto enterprise shortlists.

02 / Challenges

From thesis to product

No engineering team yet — every architecture decision was still open, and the runway demanded a paid cohort within two quarters.

Agent reliability

Early prototypes were demo-shiny and production-fragile — long agent runs drifted off-task and hallucinated record updates.

Evaluation discipline

No way to know if a prompt or model change improved or regressed real workflows — releases were a vibe-check.

Multi-tenant data isolation

Enterprise buyers needed hard guarantees about prompt isolation, log retention and per-tenant policy controls.

Brand & narrative

A crowded copilot market — the website had to make the differentiated thesis instantly legible.

03 / Solutions

Product engineering from scratch

Stood up a multi-tenant TypeScript/React/Postgres stack with strict row-level security, per-tenant secrets and a clean audit trail.
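
One common way to get strict row-level security in Postgres is to pin a tenant id to each transaction and let a policy filter on it. The sketch below illustrates that pattern under stated assumptions: the `records` table, its `tenant_id` column, the `app.tenant_id` setting name, and the `Queryable` interface are all hypothetical and are not taken from the Lumen codebase.

```typescript
// Assumed schema: a `records` table with a `tenant_id uuid` column (hypothetical).
// The policy compares each row's tenant to a transaction-local setting.
const TENANT_POLICY_SQL = `
ALTER TABLE records ENABLE ROW LEVEL SECURITY;
ALTER TABLE records FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON records
  USING (tenant_id = current_setting('app.tenant_id')::uuid);
`;

// Minimal client shape for this sketch. It must wrap a single connection,
// not a pool, so every statement runs in the same transaction.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
}

// Runs `work` inside a transaction whose RLS context is pinned to one tenant.
async function withTenant<T>(
  db: Queryable,
  tenantId: string,
  work: (db: Queryable) => Promise<T>,
): Promise<T> {
  await db.query("BEGIN");
  try {
    // The third argument `true` scopes the setting to this transaction,
    // so it cannot leak to the next request on a reused connection.
    await db.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(db);
    await db.query("COMMIT");
    return result;
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}
```

With this shape, application queries never carry a `WHERE tenant_id = ...` clause of their own; a query issued outside `withTenant` has no tenant setting and fails or returns nothing, depending on how the setting is read.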

Agent runtime

Built a deterministic agent loop on top of MCP-style tool use, with explicit plan/critique/act steps, tool-call timeouts and structured fallbacks.
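
To make the plan/critique/act shape concrete, here is a minimal sketch of such a loop with a step budget, a per-tool timeout and a structured fallback result. Every name in it (`Step`, `Tool`, `runAgent`, `withTimeout`) is hypothetical, the plan is passed in rather than generated, and the critic is a plain function; in a real runtime both would likely be model calls.

```typescript
// All names here are hypothetical and for illustration only.
type Step = { tool: string; args: Record<string, unknown> };
type Tool = (args: Record<string, unknown>) => Promise<unknown>;
type AgentResult =
  | { status: "done"; outputs: unknown[] }
  | { status: "fallback"; reason: string; outputs: unknown[] };

// Rejects if the wrapped call does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function runAgent(
  plan: Step[],                             // produced upstream by a planner
  critique: (step: Step) => string | null,  // returns an objection, or null to approve
  tools: Record<string, Tool>,
  opts: { maxSteps: number; toolTimeoutMs: number },
): Promise<AgentResult> {
  const outputs: unknown[] = [];
  if (plan.length > opts.maxSteps) {
    return { status: "fallback", reason: "plan exceeds step budget", outputs };
  }
  for (const step of plan) {
    // Critique: veto a step before it can have any side effect.
    const objection = critique(step);
    if (objection !== null) {
      return { status: "fallback", reason: `rejected: ${objection}`, outputs };
    }
    const tool = tools[step.tool];
    if (tool === undefined) {
      return { status: "fallback", reason: `unknown tool: ${step.tool}`, outputs };
    }
    // Act: run the tool under a hard timeout; any failure becomes a fallback.
    try {
      outputs.push(await withTimeout(tool(step.args), opts.toolTimeoutMs));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { status: "fallback", reason: `${step.tool} failed: ${message}`, outputs };
    }
  }
  return { status: "done", outputs };
}
```

The design choice worth noting is that the loop never throws: every exit is either `done` or a `fallback` carrying a reason and the outputs gathered so far, which gives the caller something structured to show the user or log.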

Evaluation harness

Golden-dataset eval suite, regression gates on every PR, and shadow-traffic comparison between candidate prompts and models, so releases are scored rather than vibe-checked.
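
A regression gate of this kind can be sketched as a pure function over the golden set. The version below is an illustration under assumptions: `GoldenCase`, `Candidate`, `passRate` and `regressionGate` are hypothetical names, the prompt or model under test is stubbed as a plain function, and scoring is exact string match, whereas a production harness would more likely use graded or rubric-based scoring.

```typescript
// Hypothetical shapes for a golden-dataset regression gate.
type GoldenCase = { id: string; input: string; expected: string };
type Candidate = (input: string) => string; // the prompt/model under test, stubbed

// Fraction of golden cases whose output matches the expected answer exactly.
function passRate(cases: GoldenCase[], run: Candidate): number {
  if (cases.length === 0) throw new Error("golden set is empty");
  const passed = cases.filter((c) => run(c.input) === c.expected).length;
  return passed / cases.length;
}

// The gate fails if the candidate falls below an absolute floor, or if it
// regresses against the current baseline by more than `tolerance`.
function regressionGate(
  cases: GoldenCase[],
  baseline: Candidate,
  candidate: Candidate,
  opts: { floor: number; tolerance: number },
): { ok: boolean; baselineRate: number; candidateRate: number } {
  const baselineRate = passRate(cases, baseline);
  const candidateRate = passRate(cases, candidate);
  const ok =
    candidateRate >= opts.floor && candidateRate >= baselineRate - opts.tolerance;
  return { ok, baselineRate, candidateRate };
}
```

In CI, a check like this would run on every pull request and block the merge when `ok` is false; the floor and tolerance values are tuning choices for the team, not something this sketch prescribes.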

Guardrails & policy

Per-tenant policy layer for tool exposure, PII redaction in prompts and logs, and structured-output validation before any write-back into the CRM.
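
Two of these guardrails, redaction and write-back validation, are easy to illustrate. The sketch below is a simplified example, not Lumen's implementation: the two regular expressions cover only e-mail addresses and phone-like digit runs, and the `CrmUpdate` shape with its three allowed fields is invented for the example.

```typescript
// Illustrative patterns only; production PII detection needs a far broader rule set.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Replaces matches with placeholders before text reaches a prompt or a log line.
function redactPii(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Hypothetical write-back payload for a CRM record update.
type CrmField = "stage" | "owner" | "amount";
type CrmUpdate = { recordId: string; field: CrmField; value: string | number };

function isCrmField(f: unknown): f is CrmField {
  return f === "stage" || f === "owner" || f === "amount";
}

// Parses raw model output and returns a typed update, or null if anything is
// malformed. The caller writes to the CRM only when the result is non-null.
function parseCrmUpdate(raw: string): CrmUpdate | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { recordId, field, value } = data as Record<string, unknown>;
  if (typeof recordId !== "string" || recordId.length === 0) return null;
  if (!isCrmField(field)) return null; // field outside the allow-list
  if (typeof value !== "string" && typeof value !== "number") return null;
  if (field === "amount" && !(typeof value === "number" && Number.isFinite(value))) {
    return null; // amounts must be finite numbers
  }
  return { recordId, field, value };
}
```

Validating against an allow-list of fields, rather than rejecting known-bad ones, means a hallucinated field name is dropped by default instead of being written into a customer record.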

Observability

Per-conversation traces, token-cost attribution per workflow and per customer, and latency SLOs surfaced inside the product.
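
Token-cost attribution comes down to multiplying token counts by a price table and grouping the result. The following is a minimal sketch of that arithmetic; the `Usage` record shape, the `Price` table and the `tenant/workflow` key format are assumptions made for illustration, and in practice the usage records would probably be derived from trace data.

```typescript
// One record per model call. Field names are hypothetical.
type Usage = {
  tenantId: string;
  workflow: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Prices in USD per million tokens, supplied by the caller.
type Price = { inputPerMTok: number; outputPerMTok: number };

// Sums cost per "tenant/workflow" key.
function attributeCost(
  usage: Usage[],
  prices: Record<string, Price>,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const u of usage) {
    const price = prices[u.model];
    if (price === undefined) {
      // Fail loudly rather than silently attributing zero cost.
      throw new Error(`no price configured for model ${u.model}`);
    }
    const cost =
      (u.inputTokens * price.inputPerMTok + u.outputTokens * price.outputPerMTok) /
      1_000_000;
    const key = `${u.tenantId}/${u.workflow}`;
    totals[key] = (totals[key] ?? 0) + cost;
  }
  return totals;
}
```

Keying on both tenant and workflow lets the same totals roll up either way: summed by tenant for customer-level margin, or by workflow to see which agent flows are expensive.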

Brand & site

Identity, narrative website, demo experience and pricing page — built to convert technical buyers in the second meeting.

04 / Stack & apps

TypeScript / React / Node · PostgreSQL + Row-Level Security · Model Context Protocol (MCP) · LLM Eval Harness · Per-Tenant Policy Layer · Observability (OpenTelemetry) · Brand & Marketing Site

05 / Outcomes

Paid cohort live

Lumen went from idea to its first paying enterprise cohort inside two quarters.

Reliable agents

Evaluation discipline pushed pass rate above 90% on the golden workflow set — and kept it there release over release.

Trust-grade isolation

Multi-tenant isolation passed enterprise security reviews on the first pass.

Sales-ready brand

The narrative site has become the sales team's most-used demo asset.
