Fidelity Investments · Case Study 01
Designing Vault at the scale of 80,000.
A Vault interface that turns a security backbone into a workflow ~80,000 associates can use safely. It is designed against one failure mode: people routing around the system when it is too hard to use.
Client: Fidelity Investments
Role: Senior Product Designer
Surface: HashiCorp Vault UI
Scale: ~80,000 associates
Domain: Identity & Secrets
Year: 2023-2024
The deployment, in one picture
Identity flows from Fidelity's identity provider into Vault via LDAP-synced groups. Vault holds the policies, leases, and secrets — both static (DB credentials, cloud IAM, third-party APIs) and dynamic.
The CLI and API serve the platform team. Roughly 79,500 associates downstream of every policy use the UI — that's the surface I designed.
// vault topology @ fidelity
Identity Provider: LDAP · Group sync → Vault policies
Source of truth: Vault · Policies · Leases · Static + dynamic secrets · OIDC SSO for Vault-backed apps
Consumers: Apps · DB credentials, cloud IAM, 3rd-party APIs · ~80,000 associates downstream
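The identity-to-policy step in the topology above can be sketched in a few lines. This is a minimal illustration, not the deployed configuration: the group names, policy names, and the `effective_policies` helper are all hypothetical, and it assumes a user's effective policy set is simply the union of the policies attached to their synced groups.

```python
from typing import Dict, List, Set

# Hypothetical mapping from LDAP-synced groups to Vault policy names.
GROUP_POLICIES: Dict[str, List[str]] = {
    "wealth-dev": ["wealth-db-read"],
    "platform-eng": ["wealth-db-read", "cloud-iam-issue"],
}


def effective_policies(
    user_groups: List[str], group_policies: Dict[str, List[str]]
) -> Set[str]:
    """Return the union of policies attached to every group the user is in.

    Groups with no mapping contribute nothing (assumption: no default policy).
    """
    policies: Set[str] = set()
    for group in user_groups:
        policies.update(group_policies.get(group, []))
    return policies
```

The point of the sketch is the shape of the dependency: an associate never touches a policy directly, yet every policy change reaches them through group membership.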
01 · 80,000 · associates downstream of every policy · scope · firmwide
02 · LDAP · synced groups feed Vault policies · identity → vault
03 · static + dynamic · DB creds, cloud IAM, 3rd-party APIs · secrets · mixed
04 · 79,500 · use the UI, the surface I designed · surfaces · ui layer
The brief I argued for
The brief I was given, and the brief I argued for.
As briefed
"Build a UI for Vault. Self-service. Reduce platform team load."
A productivity ask. Framed as ticket-deflection, the work competes with every other UX investment for budget, and is measured in clicks rather than risk.
Funded as: platform productivity
Measured as: ticket volume, click-throughs
Reframed
When associates can't safely use Vault, they route around it.
Pasted secrets. Shadow credential stores. Audit gaps. The failure mode is a security system that becomes functionally insecure at scale because nobody can use it as intended.
Funded as: security risk control
Measured as: credential-handling hygiene
The reframe changed three things: who funded the work, what we measured success against, and which features got prioritized.
Three users, one Vault
Three different mental models of what Vault is. Designing for any one of them is straightforward; designing for all three on a shared surface is the actual problem.
Persona 01
SecOps Admin
Authors policy in HCL. Owns the rules.
What they do
- Write and review path-and-capability policies
- Approve high-blast-radius access grants
- Audit who has what, and for how long
What scares them
Drift between authored and effective access — over-broad policies whose blast radius I can't see.
Mental model
Vault is a policy-authoring tool
Persona 02
Platform Engineer
Consumes policy. Wires apps to Vault.
What they do
- Integrate services with the secrets engine
- Manage lease lifecycle in production
- Debug auth failures in deploy pipelines
What scares them
Lease expiry mid-deploy. Brittle auth flows that fail at 2 a.m. when nobody's watching.
Mental model
Vault is an API
Persona 03
Associate
e.g. wealth-management dev. Just needs access.
What they do
- Request access to ship a feature
- Use credentials inside their working session
- Renew or release access when finished
What scares them
Not knowing why a request was denied — and not knowing how long my access actually lasts.
Mental model
Vault is the reason my session timed out
The shared-surface problem: the same policy object SecOps authors is what the platform engineer consumes — and what the associate is implicitly governed by. The work was making one primitive legible to three audiences at three different altitudes.
Research as a decision engine
Three decisions the research actually drove — what came in, what we chose, what changed downstream.
// methods
~30 contextual interviews
6 shadow sessions
4 service-blueprint workshops
9 mo post-launch telemetry
Design principle
Make risk visible.
"The system holds the signal. The user often doesn't see it."
→ The corollary
Verification should be low-cost.
The cost to confirm an action before taking it should be near zero. In security workflows, skipping verification is itself a failure mode worth designing against.
The tension we held
Every signal surfaced to a user is also a signal SecOps could act on. Compliance asked for an individualized access-activity dashboard. We declined.
Where we landed: visibility to the user about their own state. Aggregated and anonymized for org-level oversight. No individualized monitoring.
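One way to picture the "aggregated and anonymized" half of that decision is an oversight view that never carries a user identifier past the aggregation step. The sketch below is illustrative only: the event shape, the `domain` field, and the `aggregate_by_domain` helper are assumptions, not the telemetry schema that shipped.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_by_domain(events: Iterable[dict]) -> Dict[str, int]:
    """Count access events per business domain.

    Only the (hypothetical) "domain" field is read; user identifiers in
    the input are never copied into the result.
    """
    return dict(Counter(event["domain"] for event in events))
```

Because the output holds counts per domain and nothing else, org-level oversight can see volume and trends without the view doubling as an individualized activity log.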
Two design moves
The RBAC-aware permission editor (for SecOps) and the lease as a living thing (for the associate). Both are built on the same principle: keep the policy primitive legible, and surface what the system already knows.
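A blast-radius preview for the permission editor can be sketched as a pure function over data the system already holds. Everything named here is hypothetical (`PolicyRule`, `rule_matches`, `blast_radius`, the sample paths and group sizes), and path matching is deliberately simplified to exact match or trailing-`*` prefix match rather than Vault's full policy path semantics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PolicyRule:
    path: str                       # e.g. "db/creds/wealth/*"
    capabilities: Tuple[str, ...]   # e.g. ("read",)


def rule_matches(rule_path: str, secret_path: str) -> bool:
    """Simplified matching: trailing '*' is a prefix match, else exact."""
    if rule_path.endswith("*"):
        return secret_path.startswith(rule_path[:-1])
    return secret_path == rule_path


def blast_radius(
    rules: List[PolicyRule],
    attached_groups: List[str],
    group_sizes: Dict[str, int],
    known_paths: List[str],
) -> dict:
    """Summarize what a policy would touch before it is saved.

    "people" sums group sizes, so it is an upper bound when a person
    belongs to more than one attached group.
    """
    paths = sorted(
        p for p in known_paths if any(rule_matches(r.path, p) for r in rules)
    )
    people = sum(group_sizes.get(g, 0) for g in attached_groups)
    return {"paths": paths, "people": people}
```

Showing this summary next to the policy text while it is being edited is what keeps verification cheap: the author sees the reach of a wildcard before committing it.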
Engineering collaboration
Where UX changed the API.
response.json · before
{
"id": "db/creds/wealth/a3f1...",
"expire_time": "2024-03-14T17:18:22Z",
"renewable": true,
"ttl": 28800
}
// renewable: yes/no - until when?
// computable client-side, but logic
// drifted across UI / CLI / alerting.
response.json · after
{
"id": "db/creds/wealth/a3f1...",
"expire_time": "2024-03-14T17:18:22Z",
"renewable": true,
+ "renewable_until": "2024-03-14T21:18:22Z",
+ "max_ttl": 43200,
"ttl": 28800
}
// renewable_until is now first-class.
// One source of truth, three consumers.
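With `renewable_until` in the response, a consumer only has to read timestamps instead of re-deriving the renewal limit. The sketch below shows what a countdown consumer might compute from the "after" payload. The `parse_utc` and `lease_state` helpers are hypothetical, and the timestamps are assumed to be UTC with a trailing `Z`, as in the sample.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp like 2024-03-14T17:18:22Z as timezone-aware UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lease_state(lease: dict, now: datetime) -> dict:
    """Derive countdown values from a lease lookup response.

    seconds_left:      time until the current expiry (never negative)
    renewal_headroom:  how far past the current expiry a renewal could
                       extend the lease, taken from renewable_until
    """
    expires = parse_utc(lease["expire_time"])
    seconds_left = max(0, int((expires - now).total_seconds()))
    headroom = 0
    if lease.get("renewable") and "renewable_until" in lease:
        limit = parse_utc(lease["renewable_until"])
        headroom = max(0, int((limit - expires).total_seconds()))
    return {
        "seconds_left": seconds_left,
        "renewal_headroom": headroom,
        "expired": now >= expires,
    }


lease = {
    "id": "db/creds/wealth/a3f1...",
    "expire_time": "2024-03-14T17:18:22Z",
    "renewable": True,
    "renewable_until": "2024-03-14T21:18:22Z",
    "max_ttl": 43200,
    "ttl": 28800,
}
```

In the sample payload, `renewable_until` falls 4 hours (14,400 seconds) after `expire_time`, so the UI, the CLI, and the alerting pipeline would all report the same headroom without each carrying its own copy of the arithmetic.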
PR #2841 · merged · leases: expose renewable_until in lookup response · +47 / -2 · 3 commits
Vault UI (UI) · access portal · Renewal countdown component
credstool (CLI) · ops CLI · Internal credential admin CLI
leasewatch (Ops) · alerting · Expiry & renewal pipeline
"When a design constraint is really a data-model gap, the right fix is often to change the data rather than work around it in the UI."
What the work actually moved
Handoff time
30% ↓
Reduction in design-to-dev handoff time. Component library shipped with shared primitives.
Request time
3 days→4h
Median access request time. From Jira ticket to live lease, across all five business domains.
Self-service rate
0%→71%
Associate self-service rate. Requests resolved without platform team intervention.
// the lesson I'm taking with me
Security UX is fundamentally an information-design problem — making state, lifecycle, and risk visible at the right time, in the right amount, to the right person.
// platform thesis
The patterns generalize.
Blast radius preview, lease as countdown, legibility-over-abstraction — none of it is Vault-specific. It's the approach for any product where users take actions whose consequences they can't fully see.
Terraform plan · Boundary session · Consul service intent
Designed by Rebecka Raj · Fidelity Investments · 2023-2024
I've been doing this work on a surface that sits on top of Vault. The next step is to do it inside Vault, and across the products that sit alongside it.