Rebecka Raj
Modguard.ai · Case Study 02 · LLM Trust & Safety
Designing trust into LLMs for healthcare and defense.
Around 100 user interviews, three core patterns shipped, and a product acquired with the patterns embedded — across two regulated verticals where verifying LLM output had to fit inside an expert workflow.
Client
Modguard.ai
Role
Design Advisor
Timeline
9 months
Verticals
Healthcare + Defense
Stage
Early → Acquired
Year
2023-2024
Where Modguard played
Two verticals, one product, a shared trust problem.
Vertical 01
Healthcare
Primary user: clinicians reviewing AI-generated patient summaries
High cost of error, time-bound between visits
Compliance: HIPAA, medical liability
The product
Modguard · LLM trust layer
Stage
Early-stage AI/ML startup
Mandate
LLM solutions for regulated industries
My role
Design Advisor, ~9 months
Outcome
Acquired with patterns intact
Vertical 02
Defense
Primary user: intel analysts validating LLM-synthesized briefs
Deadline-bound, fabricated-source risk
Compliance: classification, source provenance
Two domains, a shared trust problem: helping experts act on imperfect output without slowing them down.
The brief I reformulated
The brief I was given, and the brief I reformulated.
· The original brief
Make the LLM feel trustworthy.
"Reduce hallucinations. Add citations. Make the model sound more confident."
Treats trust as an output-quality property
Aims to make the model the source of confidence
Underspecified: 2023-era LLMs hallucinate no matter how the prompt is tuned
Reframed
· The reframed thesis
Make verification cheap enough to absorb imperfect output.
Trust is a workflow property, not an output property.
1
Avoid making the model sound confident — that combines opacity with risk.
2
Make the evidence easier to inspect, not the prose easier to accept.
3
Treat the user as an expert who needs speed, not a novice who needs reassurance.
The reformulation opened the design space. With trust framed as a workflow problem, the question shifted from "how do we improve the model?" to "how do we make verification fast enough that imperfect output becomes manageable?"
Two users
Two users. Both experts. Both time-bound.
User profile 01
The clinician
Reviews LLM-generated patient summaries between visits
What they do with the LLM
Reads the model's summary of a patient chart in the 90 seconds between back-to-back appointments. Decides whether anything in the original record was glossed over.
What scares them
Missing a flag in the chart that the model smoothed over, or spending so long verifying that they fall behind schedule.
User profile 02
The intel analyst
Validates LLM-synthesized briefs under deadline
What they do with the LLM
Receives a synthesized intel brief and has to confirm every claim is sourced before forwarding it. Often facing a hard deadline and dozens of cited passages.
What scares them
Forwarding a well-written paragraph that turns out to cite a source that doesn't exist.
The shared structure: domain experts under time pressure who pay a high cost when wrong. Different chrome, same workflow shape. The patterns we shipped had to work for both.
Three insights from research
~40 clinicians and ~60 analysts later, three findings shaped every pattern we shipped.
01
From clinician interviews
Confidence scores were rejected immediately. "60% confident" is meaningless when the question is whether to act.
What we built
Show evidence, not confidence
Replace probability badges with inline source markers. Strong source = clean text. Weak source = subtle inline glyph. No retrieval = explicit gap.
02
From analyst shadow sessions
Analysts already mentally tag each claim to a source. We weren't introducing a new behavior — we were giving it a UI.
What we built
Inline citations over reference list
Per-sentence provenance. Every claim links back to its source passage, not a footnote at the bottom of the brief.
03
From early prototype testing
Aggressive flagging caused alarm fatigue within minutes. Three banners on a page and clinicians stopped reading the warnings.
What we built
Graded, silent-until-needed
Flag only the most consequential claims. Default state is invisible. Hover or click to inspect. The rest reads as normal text.
Design principle
Make uncertainty legible.
"
The model knows where its evidence is thin. The user usually doesn't.
→ The corollary
Verification must be cheap.
The cost to check has to approach zero, because experts will not trade speed for ceremony. If verification takes longer than skipping it, it won't happen.
The contested tradeoff
Speed vs. warning density: flag every uncertain claim, or only the most consequential? Flag too few → over-trust. Flag too many → alarm fatigue.
Where we landed: flag only the most consequential, with the recognition that this call would need to be revisited if a real adverse event surfaced. A decision made deliberately, with the tradeoffs documented.
Three patterns shipped
Hallucination flagging, citation visibility, guardrail indicators. Each addresses a different failure mode of LLM output, designed to fit inside an expert workflow.
Pattern 01 · Hallucination flagging
Modguard verified · 3 of 5 grounded
Chart summary · M. Chen · F · 58 · MRN 4218
The patient was admitted on March 14 with elevated troponin and chest pain consistent with prior records. [1] A previous note suggests reduced ejection fraction in the 35–40% range, though the source is from an outside facility and was not re-confirmed. [! weak] The patient's response to beta-blocker therapy has been favorable across recent visits. [2] Current medications include lisinopril 10mg, atorvastatin 40mg, and metoprolol 50mg, per the active medication list.
Hover the dashed claim above. The marker is silent until inspected. Three states: clean text (strong source), inline marker (weak), explicit gap (no retrieval). No confidence badges.
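The three display states above can be sketched as a single mapping from a claim's retrieval result to a render state. This is an illustrative sketch, not Modguard's shipped code; the names (`Retrieval`, `displayState`) and the 0.8 threshold are assumptions.

```typescript
// Illustrative types for a claim's retrieval result. Not the product schema.
type Retrieval = { found: boolean; sourceScore?: number };

// The three silent-until-inspected states described above:
// strong source -> clean text, weak source -> inline marker, no retrieval -> explicit gap.
type EvidenceState = "clean" | "weak" | "gap";

// Map retrieval strength to a display state. No probability badge is ever
// shown; the score only decides which of the three states to render.
function displayState(r: Retrieval, strongThreshold = 0.8): EvidenceState {
  if (!r.found) return "gap"; // no retrieval: render an explicit gap, never smooth prose
  return (r.sourceScore ?? 0) >= strongThreshold ? "clean" : "weak";
}
```

The point of the sketch is that "show evidence, not confidence" collapses a continuous model score into three legible states, so the expert never has to interpret a number.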
Pattern 02 · Citation visibility
4 of 4 grounded
Synthesized brief · case 2241 · Pattern of life
The subject was last documented at the southern checkpoint at 14:32 local time.[1] Border crossings in the preceding 90 days indicate a pattern of weekday transits, primarily Tuesday and Thursday.[2] Vehicle registration matches a 2019 sedan registered to a regional logistics firm.[3] Communications metadata for the same window suggests sustained contact with two known associates.[4]
Click any citation [n] in the brief to verify the source passage.
Before
Open new tab, find source, scroll, return
~60s
After
Click sentence. Source rises beside it.
~3s
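The before/after gap comes from a data-shape decision: each sentence carries its own source passage, so verification is a lookup rather than an open-tab-and-search round trip. A minimal sketch, assuming a hypothetical `Claim` shape (the field names are illustrative, not the shipped schema):

```typescript
// Per-sentence provenance: every claim stores the exact source span it came from.
interface Claim {
  text: string;     // the sentence as rendered in the brief
  citation: number; // the [n] marker shown inline
  sourceId: string; // document the passage was retrieved from
  passage: string;  // the exact source span, shown beside the sentence on click
}

// Clicking [n] resolves directly to the stored passage; no new tab, no scrolling.
function passageFor(claims: Claim[], citation: number): string | undefined {
  return claims.find((c) => c.citation === citation)?.passage;
}
```

Storing the passage at write time, rather than re-fetching it at read time, is what makes the ~3-second interaction possible.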
Pattern 03 · Guardrail indicators
~6% deferral false-positive rate
In competence · chart review · persistent
Summarize this patient's last three visits.
Three visits over the past 14 months. Recurring presentation of intermittent tachycardia, evaluated by cardiology in May. Lab trends within range. Current medication list reconciled at last visit.
Near hard zone · dosage · deferral
What dose of metoprolol should I start her on?
Not in my competence
Dosage isn't something I should be the source of truth on. Here's where it lives:
Handoff, not refusal. The status strip stays visible across the whole session, not just at deferral moments. Three rule-based layers detect the boundary: an input classifier, a retrieval-source filter, and prompt-level constraints. For this function, auditable beats accurate: every rule can be read and certified by a compliance reviewer, which a learned classifier can't offer.
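The layered structure can be sketched as a short pipeline. Every rule below is invented for illustration (the real classifiers were product-specific), but the shape shows why the approach is auditable: each layer is a rule a reviewer can read.

```typescript
// Illustrative guardrail pipeline. All rule contents are hypothetical.
type Verdict = { allowed: boolean; reason?: string };

// Layer 1: input classifier, a readable pattern for out-of-competence questions.
const DOSAGE_PATTERN = /\b(dose|dosage|mg)\b/i;

// Layer 2: retrieval-source filter, only approved record types may ground an answer.
const APPROVED_SOURCES = new Set(["chart", "lab", "medication-list"]);

function guardrail(input: string, retrievedFrom: string[]): Verdict {
  if (DOSAGE_PATTERN.test(input)) {
    return { allowed: false, reason: "dosage question: defer to the formulary" };
  }
  if (retrievedFrom.some((s) => !APPROVED_SOURCES.has(s))) {
    return { allowed: false, reason: "unapproved retrieval source" };
  }
  // Layer 3 (prompt-level constraints) lives in the system prompt, not in this function.
  return { allowed: true };
}
```

A deferral here is a handoff with a stated reason, which is what the status strip surfaces to the user.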
What the work actually moved
Acquisition
Acquired
Modguard was acquired with the trust patterns intact in the product surface.
Verification time
~10x faster
Per-citation verification time, from minutes-per-claim to seconds-per-claim in the analyst workflow.
Trust metric
CTR
Citation click-through rate adopted as a leading trust metric, replacing post-hoc trust surveys.
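As a behavioral metric, citation CTR is just clicks over renders, aggregated across sessions. A minimal sketch under that assumption (the `CitationEvent` shape is illustrative):

```typescript
// Hypothetical per-session counts: citations rendered vs. citations inspected.
interface CitationEvent { rendered: number; clicked: number }

// Fraction of rendered citations the expert actually opened, a behavioral
// signal that replaces asking "do you trust this?" after the fact.
function citationCTR(sessions: CitationEvent[]): number {
  const rendered = sessions.reduce((n, s) => n + s.rendered, 0);
  const clicked = sessions.reduce((n, s) => n + s.clicked, 0);
  return rendered === 0 ? 0 : clicked / rendered;
}
```

Because it is computed from what users do rather than what they report, it moves with every release instead of waiting for a survey cycle.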
// the lesson I'm taking with me
In regulated industries, deferring well is more valuable than answering well. Asymmetric costs require asymmetric design.
// what generalizes
Trust is a workflow property.
Graded flagging, hover-to-verify, persistent competence boundaries — the patterns travel from clinical chart review to intel briefing to any expert-facing LLM surface.
Legal review · Financial reporting · Compliance audit
Designed by Rebecka Raj · Modguard.ai · 2023-2024
The patterns shipped before the acquisition and survived it. They remain the spine of the trust surface in the acquiring company's regulated-industry product line.