RBAC to Fine-Grained Authorization

Replaced a flat JWT-based role model with Fine-Grained Authorization (FGA) to support workspace-scoped permissions, coordinating adoption across the cloud portal and product teams.

  • OpenFGA
  • TypeScript
  • Node.js
  • Auth0
  • React
  • REST APIs
  • JWT

When JWT Roles Stop Scaling

The flat role-in-token pattern is nearly universal for products in their early years. A user authenticates and receives a JWT with a role claim ("admin", "viewer", "editor"), and every service in the system reads that claim to decide what the user can do. It's simple, stateless, and requires no external calls at authorization time. For a product with a single permission context, it works well.

The trouble arrives in a predictable form: customer growth. Specifically, the kind of growth that comes from landing larger accounts. Enterprise customers don't use products the way individuals or small teams do. They have organizational structures—departments, business units, subsidiaries—and they expect the software they buy to reflect those structures. That usually surfaces as a request that sounds simple but isn't: "Can you give our ops team admin access to the production workspace but only viewer access to staging?" Or: "We want our security team to be able to audit every workspace without being able to change anything."

A flat role claim has no way to represent those requests. A role is global. It applies to everything the user does. You can work around this with increasingly specific role names ("prod-admin", "staging-viewer"), but that approach collapses quickly: every new workspace adds one role per permission level, so the role list grows as workspaces times permission levels, and the resulting matrix is one that no one can reason about. More fundamentally, there's a hard architectural limit. JWT claims are issued at login time and evaluated at the edge: API gateways, middleware, frontend route guards. You can't issue a token that says "admin in workspace A, viewer in workspace B" without either baking a workspace list into the login flow or issuing a separate token per workspace. Neither option scales to an account with dozens of workspaces.

The Workspace Problem in Concrete Terms

The specific failure mode we hit was a permission check that needed to know two things simultaneously: what the user was trying to do, and which workspace they were doing it in. Every API endpoint touching workspace resources had to answer: "Is this user allowed to write to this workspace?" A single role claim couldn't answer that question without knowing the workspace, and the workspace wasn't encoded in the token.

The initial workaround was to include a workspace context as a query parameter or request header and cross-reference it against the user's role. But this created an implicit trust problem: the workspace ID was user-controlled, which meant every authorization check was also a validation check—is this user actually a member of the workspace they're claiming to operate in? That logic lived in individual route handlers, duplicated across dozens of endpoints, with inconsistent coverage. Some routes checked it correctly. Others had subtle gaps.
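A minimal sketch of how that kind of gap can arise. Everything here is hypothetical (the `ApiRequest` shape, the `x-workspace-id` header name, the `memberships` directory, and both handlers); it illustrates the pattern, not the original route code.

```typescript
// Hypothetical request shape: the role comes from the JWT, the workspace ID
// comes from a user-controlled header.
interface ApiRequest {
  user: { id: string; role: "admin" | "editor" | "viewer" };
  headers: Record<string, string | undefined>;
}

// Assumed membership directory: user ID -> workspaces the user belongs to.
const memberships = new Map<string, string[]>([
  ["alice", ["production"]],
  ["bob", ["staging"]],
]);

function isMember(userId: string, workspaceId: string): boolean {
  const workspaces = memberships.get(userId) || [];
  return workspaces.indexOf(workspaceId) !== -1;
}

// Handler that validates the claimed workspace before trusting the role.
function updateSettings(req: ApiRequest): number {
  const workspaceId = req.headers["x-workspace-id"];
  if (workspaceId === undefined || !isMember(req.user.id, workspaceId)) return 403;
  if (req.user.role !== "admin") return 403;
  return 200;
}

// Handler with a subtle gap: it checks the global role but never validates
// that the caller belongs to the workspace named in the header.
function rotateApiKey(req: ApiRequest): number {
  if (req.user.role !== "admin") return 403;
  return 200;
}
```

With a request from bob (a global "admin" who only belongs to staging) that names production in the header, the first handler returns 403 and the second returns 200. The two routes disagree about the same user and the same workspace.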

At small scale, gaps in authorization coverage are a security concern. At the scale of enterprise accounts with dozens of workspaces and users across multiple departments, they become a liability that no compliance-conscious customer will accept. The JWT role model wasn't just inconvenient to extend—it was creating active correctness risks at every enforcement point in the system.

What Fine-Grained Authorization Actually Means

Fine-Grained Authorization (FGA) is a pattern—not a specific product—that models permissions as relationships between users and objects. The canonical formulation is the Google Zanzibar paper, which describes how Google manages permissions across Docs, Drive, YouTube, and dozens of other products at internet scale. The core idea is that instead of asking "what role does this user have?" you ask "does this user have this relationship to this object?" Those questions look similar but express fundamentally different things.

"What role does this user have?" is a question about the user in isolation. "Does this user have write access to workspace:production?" is a question about a specific relationship between a specific user and a specific resource. The second formulation is workspace-aware by construction. You don't need to bolt on workspace context as an afterthought—it's part of the question itself.

OpenFGA, the open-source implementation of the Zanzibar pattern that originated at Auth0, was the right tool for this. Its data model is built around relationship tuples: a tuple is a fact that says "user X has relationship Y to object Z." Tuples are written when permissions are granted and deleted when they're revoked. The authorization service evaluates a check by asking whether a tuple chain exists that connects the user to the object through the required relationship, following the type definitions in the permission schema.

For this system, the base tuples looked like: (user:alice, admin, workspace:production), (user:bob, viewer, workspace:production). An authorization check on the API would translate to: "does user:alice have write access to workspace:production?" That check is resolved by traversing the schema definition and confirming that a tuple granting write access, here alice's admin tuple, exists. The answer comes back from a fast external call. No token inspection, no workspace cross-reference, no duplicated logic in route handlers.
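To make the tuple idea concrete, here is a small in-memory sketch of a check over the two base tuples. It is not OpenFGA's implementation or API: the `checkTuple` function, the `can_read` and `can_write` permission names, and the `grantedBy` table are assumptions standing in for a real schema and a real authorization service.

```typescript
// A relationship tuple: "user X has relation Y to object Z".
interface Tuple {
  user: string;
  relation: string;
  object: string;
}

// Assumed schema fragment: which stored relations grant each checked permission.
const grantedBy = new Map<string, string[]>([
  ["can_read", ["viewer", "editor", "admin"]],
  ["can_write", ["editor", "admin"]],
]);

// The base tuples from the text.
const tuples: Tuple[] = [
  { user: "user:alice", relation: "admin", object: "workspace:production" },
  { user: "user:bob", relation: "viewer", object: "workspace:production" },
];

// Returns true when a stored tuple connects the user to the object through
// a relation that the schema says grants the requested permission.
function checkTuple(store: Tuple[], user: string, permission: string, object: string): boolean {
  const accepted = grantedBy.get(permission);
  if (accepted === undefined) return false; // unknown permission: deny
  return store.some(
    (t) => t.user === user && t.object === object && accepted.indexOf(t.relation) !== -1
  );
}
```

Here `checkTuple(tuples, "user:alice", "can_write", "workspace:production")` is true because alice's admin tuple grants write, while the same call for `user:bob` is false because a viewer tuple grants only read. The workspace is part of every question, so there is nothing to cross-reference afterwards.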

Modeling the Permission Schema

The schema design is where most of the real work in an FGA migration happens. OpenFGA uses a type system to define what kinds of objects exist and what relationships users can have with them. Writing a schema that accurately captures what your application actually enforces—and that stays legible as the system grows—is not a trivial exercise.

The starting point was enumerating every authorization check in the codebase. Some of it was explicit—API endpoints with role checks, frontend route guards with role conditions. Some of it was implicit: middleware that checked for a minimum role level, utility functions that conditionally surfaced UI elements, background jobs that ran with service account permissions. The audit surfaced around forty distinct permission checks across both codebases, which collapsed into about twelve distinct actions in the FGA model.

One of the more consequential schema decisions was how to handle organization-level membership versus workspace-level membership. Users could be members of an organization—which gave them baseline read access across all its workspaces—while simultaneously holding workspace-specific roles that granted elevated permissions in specific contexts. Modeling this in OpenFGA meant defining inheritance relationships: an admin of an organization is implicitly an admin of every workspace it contains, unless a more restrictive workspace-level grant overrides that. Getting the inheritance direction right required several rounds of review against the existing permission logic. An incorrect tuple hierarchy would over-grant or under-grant silently, which is exactly the kind of bug that doesn't surface in testing and only appears in a security audit.
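The inheritance direction can be sketched in the same in-memory style. This is an illustration under assumptions, not the production schema: the rule table, the relation names, and the convention of linking a workspace to its organization with an "organization" tuple are all hypothetical. It covers only the additive part (organization admin implies workspace admin, organization member implies workspace viewer); the more restrictive workspace-level override mentioned above is not modeled.

```typescript
interface RelTuple {
  user: string;
  relation: string;
  object: string;
}

// Assumed rule shape for one workspace relation.
interface RelationRule {
  impliedBy: string[];  // stronger workspace relations that include this one
  fromParent: string[]; // organization relations that grant this one
}

const workspaceRules = new Map<string, RelationRule>([
  ["admin", { impliedBy: [], fromParent: ["admin"] }],
  ["editor", { impliedBy: ["admin"], fromParent: [] }],
  ["viewer", { impliedBy: ["editor"], fromParent: ["member"] }],
]);

function hasTuple(store: RelTuple[], user: string, relation: string, object: string): boolean {
  return store.some((t) => t.user === user && t.relation === relation && t.object === object);
}

// Resolves a workspace relation through a direct tuple, the parent
// organization, or a stronger workspace relation, in that order.
function checkWorkspace(store: RelTuple[], user: string, relation: string, workspace: string): boolean {
  const rule = workspaceRules.get(relation);
  if (rule === undefined) return false; // unknown relation: deny
  if (hasTuple(store, user, relation, workspace)) return true;

  // Parent organizations are linked by (organization:X, organization, workspace:Y).
  const parentOrgs = store
    .filter((t) => t.relation === "organization" && t.object === workspace)
    .map((t) => t.user);
  const inherited = parentOrgs.some((org) =>
    rule.fromParent.some((orgRelation) => hasTuple(store, user, orgRelation, org))
  );
  if (inherited) return true;

  // The rule table is acyclic (viewer <- editor <- admin), so this terminates.
  return rule.impliedBy.some((stronger) => checkWorkspace(store, user, stronger, workspace));
}
```

Reversing the direction by mistake (for example, letting a workspace admin imply organization admin) would still compile and run, which is why this part of the model needs review against the existing permission logic rather than just a passing test suite.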

Cross-Team Adoption Without a Flag Day

The authorization model didn't live in one codebase. The cloud portal (the management UI and its API layer) and the product (the runtime that customers' workspaces actually ran on) both enforced permissions independently, each with their own middleware and their own interpretation of the JWT role claim. Migrating one without the other would leave a gap that was exploitable.

The coordination approach was to define shared middleware before either team wrote a single migration change. Both teams consumed a shared authorization package that exposed a uniform check function: call it with the user context, the action you're checking, and the object you're checking against. Under the hood, the package evaluated the FGA check and returned a result. Neither team needed to understand OpenFGA's API directly. They were adopting a new authorization function call; the underlying system was abstracted.

This framing—"you're replacing a role-string check with an authorization call"—was what made adoption move quickly. Engineers didn't need to learn Zanzibar semantics or OpenFGA schema design. They needed to swap one function call for another. The shared package handled the FGA client, caching, and error surface. Each team could migrate route by route, at their own pace, without coordinating release timing.
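The shape of such a shared check function might look like the sketch below. The names (`createAuthorizer`, `authorize`, `UserContext`, the `Action` values), the TTL cache, and the fail-closed behavior on backend errors are assumptions for illustration; the original package's interface is not shown in this write-up. The FGA call is injected as a plain function so the sketch runs without a client library.

```typescript
// Hypothetical public surface of the shared authorization package.
interface UserContext {
  userId: string;
}
type Action = "read" | "write" | "manage_members";

// The backend check is injected, so callers never touch the FGA client directly.
type FgaCheck = (user: string, relation: string, object: string) => Promise<boolean>;

interface CacheEntry {
  allowed: boolean;
  expiresAt: number;
}

function createAuthorizer(fgaCheck: FgaCheck, ttlMs: number, now: () => number = Date.now) {
  const cache = new Map<string, CacheEntry>();

  // The one call that product teams swap in for a role-string comparison.
  return async function authorize(ctx: UserContext, action: Action, object: string): Promise<boolean> {
    const key = ctx.userId + "|" + action + "|" + object;
    const hit = cache.get(key);
    if (hit !== undefined && hit.expiresAt > now()) return hit.allowed;

    let allowed: boolean;
    try {
      allowed = await fgaCheck("user:" + ctx.userId, "can_" + action, object);
    } catch (err) {
      // Assumption: fail closed. A backend error is a denial and is not cached.
      return false;
    }
    cache.set(key, { allowed: allowed, expiresAt: now() + ttlMs });
    return allowed;
  };
}
```

A route handler would then call something like `await authorize({ userId }, "write", "workspace:production")` in place of `role === "admin"`. Caching authorization results trades freshness for latency, so the TTL bounds how long a revoked permission can remain effective.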

The Migration Shim and Parity Testing

Running two authorization systems in parallel during a migration creates an obvious correctness problem: if the old system and the new system disagree about whether a user can do something, which one wins? Getting that wrong in either direction—over-granting because FGA says yes when the JWT check would have said no, or under-granting because FGA says no when the JWT check would have said yes—is a production incident.

The compatibility shim ran both checks and compared the results. During the early migration window, the JWT role check remained authoritative: if there was a disagreement, the JWT result was used and the discrepancy was logged. This meant the observable behavior of the system didn't change at all during the migration. FGA was running in shadow mode, and mismatches showed up as log entries rather than access denials. A discrepancy was a signal to review the FGA tuple data or schema definition, not a user-visible error.
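The shadow-mode behavior described above can be sketched as a single function that runs both checks, logs disagreements, and returns whichever result is currently authoritative. The names (`shimCheck`, `Discrepancy`, `ShimMode`) and the handling of an FGA outage are assumptions for illustration.

```typescript
interface CheckInput {
  user: string;
  action: string;
  object: string;
}

interface Discrepancy extends CheckInput {
  legacy: boolean;
  fga: boolean;
}

type ShimMode = "jwt-authoritative" | "fga-authoritative";

async function shimCheck(
  input: CheckInput,
  legacyCheck: () => boolean,          // existing JWT role check
  fgaCheck: () => Promise<boolean>,    // new FGA check
  logDiscrepancy: (d: Discrepancy) => void,
  mode: ShimMode = "jwt-authoritative"
): Promise<boolean> {
  const legacy = legacyCheck();

  let fga: boolean | undefined;
  try {
    fga = await fgaCheck();
  } catch (err) {
    fga = undefined;
  }

  if (fga === undefined) {
    // Assumption: in shadow mode an FGA outage must not change behavior;
    // once FGA is authoritative, an outage fails closed.
    return mode === "jwt-authoritative" ? legacy : false;
  }

  if (fga !== legacy) {
    // A mismatch is a signal to review tuples or schema, not a user-facing error.
    logDiscrepancy({
      user: input.user,
      action: input.action,
      object: input.object,
      legacy: legacy,
      fga: fga,
    });
  }
  return mode === "jwt-authoritative" ? legacy : fga;
}
```

Flipping to FGA-authoritative is then a one-argument change, and the discrepancy log keeps working after the flip as a regression signal until the legacy check is removed.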

The migration tooling that translated existing JWT role assignments into FGA tuples ran against production data before the shim was deployed. This produced a baseline set of tuples representing the current permission state of every user in the system. Parity testing then ran a representative sample of real authorization checks—drawn from production logs—against both systems simultaneously and compared the results. The threshold for flipping the shim to FGA-authoritative was zero discrepancies. It took three passes to get there: the first pass revealed the organization-versus-workspace inheritance issue described above, the second surfaced a small class of legacy role strings the migration tooling hadn't mapped, and the third was clean.
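A sketch of the translation step, including the failure class the second pass surfaced: role strings with no mapping. The input shape (`LegacyAssignment`, with an explicit list of workspaces per user) and the mapping table are assumptions; the actual tooling and data layout are not shown in this write-up. The point of the sketch is that unmapped roles are collected and reported instead of being dropped silently.

```typescript
// Assumed export of the legacy state: one global role per user, plus the
// workspaces that user belongs to.
interface LegacyAssignment {
  userId: string;
  role: string;
  workspaceIds: string[];
}

interface MigratedTuple {
  user: string;
  relation: string;
  object: string;
}

// Assumed mapping from legacy role strings to FGA relations.
const roleToRelation = new Map<string, string>([
  ["admin", "admin"],
  ["editor", "editor"],
  ["viewer", "viewer"],
]);

function translateAssignments(assignments: LegacyAssignment[]): {
  tuples: MigratedTuple[];
  unmapped: LegacyAssignment[];
} {
  const tuples: MigratedTuple[] = [];
  const unmapped: LegacyAssignment[] = [];

  assignments.forEach((assignment) => {
    const relation = roleToRelation.get(assignment.role);
    if (relation === undefined) {
      // Surface legacy role strings for review rather than guessing a mapping.
      unmapped.push(assignment);
      return;
    }
    assignment.workspaceIds.forEach((workspaceId) => {
      tuples.push({
        user: "user:" + assignment.userId,
        relation: relation,
        object: "workspace:" + workspaceId,
      });
    });
  });

  return { tuples: tuples, unmapped: unmapped };
}
```

Treating a non-empty `unmapped` list as a blocking result gives the zero-discrepancy threshold a chance of being met on the next pass, because every user without tuples is accounted for before parity checks run.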

Auditing What You Couldn't Before

One of the understated benefits of moving to a relationship tuple model is what it does for security review and compliance. Under the JWT role model, answering "who has admin access to workspace:production right now?" required either maintaining a separate directory of role assignments or querying every user's JWT claims—which meant pulling live tokens or querying the auth provider. Neither was fast, and the data might be stale.

Under FGA, that question is just a query. It runs in milliseconds against the live tuple store and returns an exact answer. This changed the shape of a security review from a slow, manual engineering task to an automated query that any authorized team member could run. Workspace permission audits that would previously have taken a full day of engineering time became a self-service exercise. Compliance reporting for SOC 2 access control requirements became a script against the FGA store rather than a meeting with the engineering team.
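As an illustration of why the question becomes a query, here is the same lookup over an in-memory list of tuples. The `usersWithRelation` helper and the `StoredTuple` shape are hypothetical, and the sketch returns direct grants only; access inherited through the schema (for example, from an organization) would have to be resolved by the authorization service itself.

```typescript
interface StoredTuple {
  user: string;
  relation: string;
  object: string;
}

// Lists users holding a relation on an object through a direct tuple.
function usersWithRelation(store: StoredTuple[], relation: string, object: string): string[] {
  const users = store
    .filter((t) => t.relation === relation && t.object === object)
    .map((t) => t.user);
  // De-duplicate and sort so the report output is stable between runs.
  return users.filter((u, i) => users.indexOf(u) === i).sort();
}
```

Calling `usersWithRelation(store, "admin", "workspace:production")` answers "who has admin access to workspace:production right now?" from current state, with no tokens or provider lookups involved.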

The audit trail also improved materially. OpenFGA logs every tuple write and delete with a timestamp and the identity of the actor who made the change. Permission changes are now auditable events—not inferred from application log side-effects—which satisfies the access control documentation requirements that enterprise customers and compliance frameworks demand. That's a genuine improvement in security posture that the JWT model structurally couldn't provide.

What I'd Do Differently

Write the schema before touching any application code. It's tempting to start the migration and refine the schema as you go, but the schema is the contract that every migration change is implementing. A schema revision mid-migration means retroactively updating tuples that were already written and re-verifying parity on checks that were already migrated. Locking the schema first—even knowing it will need minor revisions—forces the authorization model to be explicit before implementation begins, which surfaces design disagreements while they're still cheap to fix.

Run parity testing in production before declaring readiness. Staging environments don't reproduce the full range of real authorization patterns that users generate. The gaps that showed up on the first two parity passes were both driven by edge cases in production data that didn't exist in staging: old accounts with non-standard role strings, inherited organization memberships that weren't reflected in staging fixtures. The staging parity pass was clean; the first two production passes each surfaced an issue. Real data is the only reliable parity oracle.

Invest in tuple management tooling early. The FGA store needs the same operational care as any other database: the ability to inspect current state, trace how a particular tuple got written, and roll back a bad migration. OpenFGA's API is queryable but not designed for operational browsing. A small internal read-only UI over the FGA check and read APIs paid for itself in the first month of running the system. Without it, debugging a "why doesn't this user have access?" question meant reading raw API responses.

Outcomes

  • Enabled large tenants to manage independent permission sets across dozens of workspaces.
  • Replaced opaque JWT role strings with an auditable, queryable permission graph—making policy review straightforward for security and compliance.
  • Cross-team adoption succeeded without a forced cutover: each team migrated incrementally using the shared tooling and shim layer.