Rebecka Raj
Fidelity Investments · Case Study 01
Designing Vault at the scale of 80,000.
A Vault interface that turns a security backbone into a workflow that ~80,000 associates can use safely. It is designed against one failure mode: people routing around a system that is too hard to use.
Client: Fidelity Investments
Role: Senior Product Designer
Surface: HashiCorp Vault UI
Scale: ~80,000 associates
Domain: Identity & Secrets
Year: 2023-2024
The deployment, in one picture
Identity flows from Fidelity's identity provider into Vault via LDAP-synced groups. Vault holds the policies, leases, and secrets — both static (DB credentials, cloud IAM, third-party APIs) and dynamic.
The CLI and API serve the platform team. Roughly 79,500 associates downstream of every policy use the UI — that's the surface I designed.
// vault topology @ fidelity
Identity Provider (LDAP): group sync → Vault policies
Source of truth (Vault): policies, leases, static + dynamic secrets; OIDC SSO for Vault-backed apps
Consumers (Apps): DB credentials, cloud IAM, 3rd-party APIs; ~80,000 associates downstream
01 · 80,000 associates downstream of every policy (scope · firmwide)
02 · LDAP-synced groups feed Vault policies (identity → vault)
03 · Static + dynamic secrets: DB creds, cloud IAM, 3rd-party APIs (secrets · mixed)
04 · 79,500 use the UI, the surface I designed (surfaces · ui layer)
The brief I argued for
The brief I was given, and the brief I argued for.
As briefed
"Build a UI for Vault. Self-service. Reduce platform team load."
A productivity ask. Framed as ticket-deflection, the work competes with every other UX investment for budget, and is measured in clicks rather than risk.
Funded as: platform productivity
Measured as: ticket volume, click-throughs
Reframed
When associates can't safely use Vault, they route around it.
Pasted secrets. Shadow credential stores. Audit gaps. The failure mode is a security system that becomes functionally insecure at scale because nobody can use it as intended.
Funded as: security risk control
Measured as: credential-handling hygiene
The reframe changed three things: who funded the work, what we measured success against, and which features got prioritized.
Three users, one Vault
Three different mental models of what Vault is. Designing for any one of them is straightforward; designing for all three on a shared surface is the actual problem.
Persona 01
SecOps Admin
Authors policy in HCL. Owns the rules.
What they do
  • Write and review path-and-capability policies
  • Approve high-blast-radius access grants
  • Audit who has what, and for how long
What scares them
Drift between authored and effective access — over-broad policies whose blast radius I can't see.
Mental model
Vault is a policy-authoring tool
Persona 02
Platform Engineer
Consumes policy. Wires apps to Vault.
What they do
  • Integrate services with the secrets engine
  • Manage lease lifecycle in production
  • Debug auth failures in deploy pipelines
What scares them
Lease expiry mid-deploy. Brittle auth flows that fail at 2 a.m. when nobody's watching.
Mental model
Vault is an API
Persona 03
Associate
e.g. wealth-management dev. Just needs access.
What they do
  • Request access to ship a feature
  • Use credentials inside their working session
  • Renew or release access when finished
What scares them
Not knowing why a request was denied — and not knowing how long my access actually lasts.
Mental model
Vault is the reason my session timed out
The shared-surface problem: the same policy object SecOps authors is what the platform engineer consumes — and what the associate is implicitly governed by. The work was making one primitive legible to three audiences at three different altitudes.
Research as a decision engine
Three decisions the research actually drove — what came in, what we chose, what changed downstream.
A · From interviews
"Associates" wasn't one persona. Infrequent requesters and embedded developers had different needs.
B · Decision
Segment the associate persona
Build two entry points instead of forcing one surface to serve both literacy levels.
C · What changed
Two surfaces shipped, not one
Simplified request flow for occasional users; inline path for embedded engineers.
A · From shadow sessions
A long-running batch job died silently. Lease expired mid-run. Failure went unnoticed for two days.
B · Decision
Make invisible failure the north star
Design for the case where state changes and nobody is watching.
C · What changed
Lease became a visible, escalating object
Surfaced renewal 15 minutes before expiry — because at expiry the deploy is already failing.
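The "visible, escalating object" idea can be sketched as a small state function. This is a minimal illustration, not the shipped component: the state names (`active`, `renew_now`, `expired`) and the function name are hypothetical, and only the 15-minute window comes from the case study.

```python
from datetime import datetime, timedelta, timezone

# From the case study: renewal is surfaced 15 minutes before expiry.
WARN_WINDOW = timedelta(minutes=15)


def lease_state(expire_time: datetime, now: datetime,
                warn_window: timedelta = WARN_WINDOW) -> str:
    """Classify a lease so the UI can escalate before it fails.

    State names are hypothetical labels for this sketch.
    """
    remaining = expire_time - now
    if remaining <= timedelta(0):
        return "expired"      # too late: the workload is already failing
    if remaining <= warn_window:
        return "renew_now"    # escalate while renewal can still help
    return "active"


expiry = datetime(2024, 3, 14, 17, 18, 22, tzinfo=timezone.utc)
print(lease_state(expiry, expiry - timedelta(minutes=10)))
```

The point of the middle state is timing: a warning that fires at expiry only reports a failure, while one that fires inside the window still leaves room to act.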
A · From telemetry
Self-service rate, broken out by segment. Nine months in, one feature wasn't moving the number.
B · Decision
Remove the feature
No defensive replacement. Carrying dead surface area is a maintenance and confusion cost.
C · What changed
Surface area reduced, focus regained
Removed in the next release. Engineering reclaimed budget for the renewal flow.
// methods
~30 contextual interviews
6 shadow sessions
4 service-blueprint workshops
9 mo post-launch telemetry
Design principle
Make risk visible.
"The system holds the signal. The user often doesn't see it."
→ The corollary
Verification should be low-cost.
The cost to confirm an action before taking it should be near zero. In security workflows, skipping verification is itself a failure mode worth designing against.
The tension we held
Every signal surfaced to a user is also a signal SecOps could act on. Compliance asked for an individualized access-activity dashboard. We declined.
Where we landed: visibility to the user about their own state. Aggregated and anonymized for org-level oversight. No individualized monitoring.
Two design moves
The RBAC-aware permission editor (for SecOps) and the lease as a living thing (for the associate). Both built on the same principle: keep the policy primitive legible, surface what the system already knows.
wealth-mgmt-readonly.hcl · v3 · unsaved
vault.internal.fidelity.com
Capability matrix · click row to inspect
Path · Read · List · Crt · Upd · Del · Sudo
secret/wm/clients/*
secret/wm/portfolios/*
secret/wm/transactions/+
database/creds/wm-ro
secret/wm/admin/*  !
// HCL preview · live
path "secret/wm/portfolios/*" {
  capabilities = ["read", "list"]
}
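How a capability matrix might map to the live HCL preview can be sketched as follows. This is an assumption-laden illustration rather than the editor's actual code: `render_policy` and the dict-shaped matrix are hypothetical, and the column labels Crt/Upd/Del are assumed to abbreviate Vault's `create`, `update`, and `delete` capabilities.

```python
from typing import Iterable

# Capability names commonly used in Vault policies.
VALID_CAPABILITIES = (
    "create", "read", "update", "patch", "delete", "list", "sudo", "deny",
)


def render_policy(rows: dict[str, Iterable[str]]) -> str:
    """Render {path: capabilities} as policy HCL, one stanza per path."""
    stanzas = []
    for path, capabilities in rows.items():
        caps = list(capabilities)
        unknown = [c for c in caps if c not in VALID_CAPABILITIES]
        if unknown:
            # Fail loudly rather than emit a policy Vault would reject.
            raise ValueError(f"unknown capabilities for {path}: {unknown}")
        quoted = ", ".join(f'"{c}"' for c in caps)
        stanzas.append(f'path "{path}" {{\n  capabilities = [{quoted}]\n}}')
    return "\n\n".join(stanzas)


print(render_policy({"secret/wm/portfolios/*": ["read", "list"]}))
```

Rendering from the matrix on every edit keeps the HCL as the single policy primitive: the grid is a view onto the text, not a second format to keep in sync.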
Blast radius preview
247
Grants 247 paths across 4 secret engines
Computed on save by walking the path tree. The CLI does not expose this aggregation.
! over-broad capability flagged
secret/wm/admin/* matches 1 path with admin-level access.
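A blast-radius count of this kind can be approximated in a few lines. The sketch below is a simplification under stated assumptions: it matches patterns against a flat list of known paths instead of walking a real path tree, it treats the first path segment as the secret engine (real mounts can be deeper), and every path in the example is made up. The matching rules follow Vault policy syntax: a trailing `*` is a prefix glob and `+` stands for exactly one segment.

```python
def pattern_matches(pattern: str, path: str) -> bool:
    """Match a concrete path against a Vault-style policy path pattern."""
    prefix_glob = pattern.endswith("*")
    if prefix_glob:
        pattern = pattern[:-1]
    pat_segs = pattern.split("/")
    path_segs = path.split("/")
    if prefix_glob:
        if len(path_segs) < len(pat_segs):
            return False
    elif len(path_segs) != len(pat_segs):
        return False
    last = len(pat_segs) - 1
    for i, seg in enumerate(pat_segs):
        if seg == "+":
            continue  # '+' matches any single segment
        if prefix_glob and i == last:
            # Final segment of a glob pattern only needs to be a prefix.
            if not path_segs[i].startswith(seg):
                return False
        elif seg != path_segs[i]:
            return False
    return True


def blast_radius(patterns: list[str], known_paths: list[str]) -> tuple[int, int]:
    """Return (granted path count, distinct engine count)."""
    granted = {
        p for p in known_paths
        if any(pattern_matches(pat, p) for pat in patterns)
    }
    # Assumption: the first segment identifies the secret engine mount.
    engines = {p.split("/", 1)[0] for p in granted}
    return len(granted), len(engines)


# Hypothetical inventory, for illustration only.
KNOWN_PATHS = [
    "secret/wm/clients/acme",
    "secret/wm/clients/acme/notes",
    "secret/wm/transactions/2024",
    "secret/wm/transactions/2024/q1",
    "database/creds/wm-ro",
    "secret/hr/payroll",
]
POLICY = ["secret/wm/clients/*", "secret/wm/transactions/+", "database/creds/wm-ro"]
print(blast_radius(POLICY, KNOWN_PATHS))
```

The aggregate is what makes an over-broad grant visible: a single glob reads as one line of HCL but can expand to many concrete paths.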
Audit log · live tail
14:22  EDIT  capability matrix
14:18  VIEW  blast radius preview
14:11  REQ   wealth/db/positions
13:54  WARN  over-broad capability flagged
Lease lifecycle · persona: associate · click step
1. Request
2. Approval
3. Active
4. Renewal
// approver view, full context
Samantha Chen · Wealth Mgmt · requested 2m ago · awaiting
"Investigating P2 incident on positions feed — need read-only on staging-mirrored prod data."
Recent grants on this path
s.chen → wealth/db/positions · 4h · expired
s.chen → wealth/db/* · 8h · approved
s.chen → wealth/db/clients · 1h · expired
Same requester, same path family. Pattern: targeted reads, none extended past TTL.
Engineering collaboration
Where UX changed the API.
response.json · before
{
  "id": "db/creds/wealth/a3f1...",
  "expire_time": "2024-03-14T17:18:22Z",
  "renewable": true,
  "ttl": 28800
}
// renewable: yes/no - until when?
// computable client-side, but logic
// drifted across UI / CLI / alerting.
response.json · after
{
  "id": "db/creds/wealth/a3f1...",
  "expire_time": "2024-03-14T17:18:22Z",
  "renewable": true,
+ "renewable_until": "2024-03-14T21:18:22Z",
+ "max_ttl": 43200,
  "ttl": 28800
}
// renewable_until is now first-class.
// One source of truth, three consumers.
PR #2841 · merged · leases: expose renewable_until in lookup response · +47 / -2 · 3 commits
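To make the before/after difference concrete, here is a hedged sketch of both sides of the change, using the "after" payload shown above. `derive_renewable_until` illustrates the kind of client-side arithmetic each consumer previously had to repeat; it assumes `ttl` is the TTL granted at issue, which the case study does not state. `renewal_headroom` reads the new field directly. Both function names are hypothetical.

```python
import json
from datetime import datetime, timedelta

LEASE = json.loads("""
{
  "id": "db/creds/wealth/a3f1...",
  "expire_time": "2024-03-14T17:18:22Z",
  "renewable": true,
  "renewable_until": "2024-03-14T21:18:22Z",
  "max_ttl": 43200,
  "ttl": 28800
}
""")


def parse_utc(stamp: str) -> datetime:
    # fromisoformat() rejects a trailing "Z" before Python 3.11; normalize it.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def derive_renewable_until(lease: dict) -> datetime:
    """Before: each client reconstructs the limit itself.

    Assumes ttl is the TTL granted at issue, not the time remaining.
    """
    issued = parse_utc(lease["expire_time"]) - timedelta(seconds=lease["ttl"])
    return issued + timedelta(seconds=lease["max_ttl"])


def renewal_headroom(lease: dict) -> timedelta:
    """After: read the server-provided limit; no per-client arithmetic."""
    if not lease["renewable"]:
        return timedelta(0)
    return parse_utc(lease["renewable_until"]) - parse_utc(lease["expire_time"])


print(renewal_headroom(LEASE))
```

In this sample the two approaches agree, which is exactly the property that is hard to guarantee once three codebases each carry their own copy of the derivation.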
Vault UI (UI) · access portal · Renewal countdown component
credstool (CLI) · ops CLI · Internal credential admin CLI
leasewatch (Ops) · alerting · Expiry & renewal pipeline
"When a design constraint is really a data-model gap, the right fix is often to change the data rather than work around it in the UI."
What the work actually moved
Handoff time
30%
Reduction in design-to-dev handoff. Component library shipped with shared primitives.
Request time
3 days → 4h
Median access request time. From Jira ticket to live lease, across all five business domains.
Self-service rate
0% → 71%
Associate self-service rate. Requests resolved without platform team intervention.
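As a rough illustration of how a self-service rate broken out by segment could be computed, assuming each request record carries a segment label and a flag for platform-team intervention (both field names and the sample records are hypothetical, not the actual telemetry schema):

```python
from collections import defaultdict


def self_service_rate(requests: list[dict]) -> dict[str, float]:
    """Share of requests per segment resolved without platform intervention."""
    totals: dict[str, int] = defaultdict(int)
    self_served: dict[str, int] = defaultdict(int)
    for request in requests:
        segment = request["segment"]
        totals[segment] += 1
        if not request["platform_intervened"]:
            self_served[segment] += 1
    return {segment: self_served[segment] / totals[segment] for segment in totals}


# Made-up records, for illustration only.
SAMPLE = [
    {"segment": "occasional", "platform_intervened": True},
    {"segment": "occasional", "platform_intervened": False},
    {"segment": "embedded", "platform_intervened": False},
    {"segment": "embedded", "platform_intervened": False},
]
print(self_service_rate(SAMPLE))
```

Splitting by segment matters because a single blended rate can hide a feature that helps one group and does nothing for the other.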
// the lesson I'm taking with me
Security UX is fundamentally an information-design problem — making state, lifecycle, and risk visible at the right time, in the right amount, to the right person.
// platform thesis
The patterns generalize.
Blast radius preview, lease as countdown, legibility-over-abstraction — none of it is Vault-specific. It's the approach for any product where users take actions whose consequences they can't fully see.
Terraform plan · Boundary session · Consul service intent
Designed by Rebecka Raj · Fidelity Investments · 2023-2024
I've been doing this work on a surface that sits on top of Vault. The next step is to do it inside Vault, and across the products that sit alongside it.