Multi-principal zero-trust identity: humans, devices, workloads, and agents

Classic identity models handle one principal type at a time. Real systems have four interacting simultaneously — and most auth stacks are not built for that.

DESIGN, NOT SHIPPED (in full). Phase 1, covering human and device identity, is live at v0.5.0, which followed the unification of the webauthn/device foundation with the admin convenience layer. The workload and agent phases described here are designed and partially implemented, but they are not in any tagged release. They are included because the architecture only makes sense when you can see where it is going. The v0.5 work added ergonomic admin tooling (admin inviteuser for generating one-time tokens, and enroll --invite) on top of Phase 1 without altering the core multi-principal model.

Most identity systems were built for one principal type at a time. Kerberos was built for users on a network. Kubernetes ServiceAccounts were built for pods calling the API server. OIDC was built for users logging into web apps. Each is internally coherent and genuinely good at the problem it was designed to solve. The problem is that they were not designed to work together in the same authorization decision.

A real system has all four principal types interacting simultaneously: a developer on their laptop, a CI runner, a deployed service, an AI agent — all touching the same set of secrets. The auth stack is almost never built for that, so it ends up as a patchwork: PATs for humans, ServiceAccount tokens for pods, static API keys for agents, and no consistent way to answer “who actually touched this, and under what authority?”

Four principal types

Human. A developer or operator. Authenticates via a cert issued to their user identity, plus WebAuthn step-up when elevated operations are required. The human’s logical user_id is portable across devices. Their session carries an auth_strength claim: cert-only if they presented a valid device cert, cert+human if they also completed a WebAuthn ceremony in this session.

Device. A specific laptop or workstation. Has its own cert keyed to a private key that never leaves the machine. The device cert proves “this is the machine enrolled as bert-desktop,” independent of which human is present. Revoke the device cert and every session on that machine stops.

Workload. A pod, a CI runner, a Lambda function. This principal type is designed, not shipped. The intended path: Kubernetes issues a ServiceAccount token; AgentKMS validates it against the cluster’s OIDC endpoint and issues a short-lived workload cert. The workload identifies itself by what it is (its ServiceAccount, namespace, and cluster), not by a secret handed to it at deploy time.

Agent. A CI bot, a cron job, an AI agent. Also designed, not shipped. Agents operate independently of any human session and carry a capability token whose scope is set at issuance time by policy — bounded to exactly what the agent needs for its task, nothing more.
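To make "scope is set at issuance time" concrete, here is a minimal sketch of a capability token whose scope cannot change after it is issued. Everything in it is hypothetical: the CapabilityToken class, the issue_token helper, and the (action, secret_path) shape of a grant are illustrative assumptions, not the AgentKMS token format, which is not shipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityToken:
    """Hypothetical capability token; not the AgentKMS wire format."""
    agent: str        # SPIFFE URI of the agent instance
    scope: frozenset  # (action, secret_path) pairs fixed at issuance

    def permits(self, action: str, secret_path: str) -> bool:
        # A request is allowed only if this exact grant was issued.
        return (action, secret_path) in self.scope


def issue_token(agent: str, policy_grants) -> CapabilityToken:
    # Copy the grants into an immutable set, so later changes to the
    # caller's collection cannot widen an already-issued token.
    return CapabilityToken(agent=agent, scope=frozenset(policy_grants))
```

With a token issued for only ("read", "ci/deploy-key"), permits("read", "ci/deploy-key") is true and any other action or path is refused. The frozen dataclass and the frozenset copy are one way to keep the scope from growing after issuance.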

The SPIFFE URI as a common schema

The mechanism that makes all four types interoperable is a consistent identifier format. Every cert AgentKMS issues carries a SPIFFE URI in its Subject Alternative Name:

# Human on a device (live)
spiffe://catalyst9.local/tenant/<t>/user/<u>/device/<d>

# Workload (vision)
spiffe://catalyst9.local/tenant/<t>/workload/<serviceaccount>/ns/<ns>/cluster/<c>

# Agent (vision)
spiffe://catalyst9.local/tenant/<t>/agent/<name>/instance/<id>

The path encodes which kind of principal this is, along with the specific identifying fields for that kind. This gives a single audit-trail schema across all four types. Every secret access event records a SPIFFE URI as the actor. “Which pod read this secret at 14:32” uses the same query structure as “which developer read this secret.” The principal type is in the URI path, not in a separate field that could be missing or inconsistently populated.
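As a rough illustration of reading the principal type straight from the path, the sketch below parses the three URI layouts shown above. The layouts come from this post; the Principal class, the parse_spiffe function, and the LAYOUTS table are hypothetical names introduced for the example, and the strict validation is an assumption rather than AgentKMS behavior.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Key order expected after /tenant/<t>/ for each principal kind,
# taken from the URI layouts shown in this post.
LAYOUTS = {
    "user": ("user", "device"),
    "workload": ("workload", "ns", "cluster"),
    "agent": ("agent", "instance"),
}


@dataclass(frozen=True)
class Principal:
    trust_domain: str
    tenant: str
    kind: str     # "user", "workload", or "agent"
    fields: dict  # identifying fields for that kind


def parse_spiffe(uri: str) -> Principal:
    parts = urlsplit(uri)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE URI: {uri!r}")
    segments = parts.path.strip("/").split("/")
    # The path must be key/value pairs starting with tenant/<t>.
    if (len(segments) < 4 or len(segments) % 2 != 0
            or not all(segments) or segments[0] != "tenant"):
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    pairs = list(zip(segments[0::2], segments[1::2]))
    tenant = pairs[0][1]
    kind = pairs[1][0]
    keys = tuple(key for key, _ in pairs[1:])
    if LAYOUTS.get(kind) != keys:
        raise ValueError(f"unknown principal layout: {keys!r}")
    return Principal(parts.netloc, tenant, kind, dict(pairs[1:]))
```

An audit query can then filter on the kind attribute of the parsed actor, whether the actor was a developer on a laptop or a pod, without joining against a separate principal-type column.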

The auth_strength claim

Not all operations should be available to all auth states. Session tokens carry an auth_strength claim:

cert-only — the client presented a valid device cert. Sufficient for read paths and most day-to-day operations.
cert+human — the client also completed a WebAuthn step-up in this session. Required for operations with elevated blast radius: minting a bootstrap token, deleting all versions of a secret, adding a new WebAuthn credential.

The claim is set at session mint time and is not upgradeable within the same session. A session that opened as cert-only stays cert-only. If you want to perform a step-up operation, you open a new session. That is intentional — it means there is no “elevation” path where a compromised cert-only session can escalate itself.
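The gating rule can be sketched as follows. The two claim values and the three elevated operations come from the text above; the operation name strings, the Session class, and the mint_session and is_allowed helpers are hypothetical stand-ins, not the AgentKMS API.

```python
from dataclasses import dataclass

CERT_ONLY = "cert-only"
CERT_HUMAN = "cert+human"
_RANK = {CERT_ONLY: 0, CERT_HUMAN: 1}

# Hypothetical operation names mapped to the minimum strength required.
REQUIRED = {
    "secret.read": CERT_ONLY,
    "bootstrap_token.mint": CERT_HUMAN,
    "secret.delete_all_versions": CERT_HUMAN,
    "webauthn_credential.add": CERT_HUMAN,
}


@dataclass(frozen=True)
class Session:
    actor: str          # SPIFFE URI of the authenticated principal
    auth_strength: str  # fixed when the session is minted


def mint_session(actor: str, webauthn_verified: bool) -> Session:
    # The claim is decided once, here, and never changed afterwards.
    strength = CERT_HUMAN if webauthn_verified else CERT_ONLY
    return Session(actor=actor, auth_strength=strength)


def is_allowed(session: Session, operation: str) -> bool:
    # Unknown operations raise KeyError instead of guessing a policy.
    required = REQUIRED[operation]
    return _RANK[session.auth_strength] >= _RANK[required]
```

Because Session is frozen, there is no code path that raises auth_strength on an existing object. The only way to reach a cert+human operation is to call mint_session again after a WebAuthn ceremony, which mirrors the no-elevation rule described above.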

Why the four-principal model now

The four-principal model does not describe a future state. It describes what systems already look like in practice: developers, machines, services, and automated pipelines all touching the same credentials, with no coherent way to reason about any of it. Most credential systems were built for one principal type because that was the problem at hand. The assumption that “a user” or “a service” was a complete description of an actor made sense in smaller systems. It does not hold when an AI agent is operating a CI pipeline that touches production secrets under a human operator’s policy.

The design choice here is to encode principal type into the identity itself — not into an access control list that references the identity. The SPIFFE URI is not metadata attached to a cert. It is the cert’s identity. The principal type is present in every audit log line, every policy evaluation, and every token claim, without any system having to look it up from a side table.

Phase 1 is shipped. The next post covers what shipping even that first slice actually looked like.