Google Zanzibar: The Authorization System You Need to Know for System Design Interviews

Picture this: You’re in a system design interview, and the interviewer asks, “How would you design an authorization system for a platform like Discord that needs to check if millions of users can access billions of channels, messages, and servers?” Your palms start sweating. You begin sketching a basic RBAC system with users, roles, and permissions tables. The interviewer nods but then drops the bomb: “This needs to handle 10 million authorization checks per second with global consistency. How would you scale it?”

I’ve been in that exact situation, and let me tell you, knowing about Google Zanzibar completely changed how I approach authorization problems in interviews. It’s not just another design pattern – it’s a paradigm shift in how we think about permissions at scale.

Why Traditional Authorization Falls Apart

Before we dive into Zanzibar, let’s understand why your typical authorization approach crumbles at scale. Most of us learned to build authorization systems something like this:

// The classic approach we all started with
class User {
    Long id;
    String name;
    List<Role> roles;
}

class Role {
    Long id;
    String name;
    List<Permission> permissions;
}

class Permission {
    Long id;
    String resource;
    String action; // READ, WRITE, DELETE
}

// Checking permissions
boolean canAccess(User user, String resource, String action) {
    for (Role role : user.getRoles()) {
        for (Permission perm : role.getPermissions()) {
            if (perm.resource.equals(resource) && 
                perm.action.equals(action)) {
                return true;
            }
        }
    }
    return false;
}

This works beautifully for your startup with 1,000 users. But what happens when you’re Discord with 150 million active users, each belonging to multiple servers, each server having dozens of channels, and each channel having its own permission overrides? Suddenly, you’re looking at billions of permission combinations.

The problems compound quickly. Your database starts choking on joins across massive tables. Caching becomes a nightmare because permissions change frequently. You need to invalidate caches globally, but that causes thundering herds. You try denormalizing, but now you’re storing the same permission data in hundreds of places, and keeping it consistent feels like juggling flaming torches while riding a unicycle.

Enter Zanzibar: Google’s Elegant Solution

In 2019, Google published a paper that made authorization nerds (yes, we exist) collectively gasp. They revealed Zanzibar – their internal authorization system that handles access control for Calendar, Cloud, Drive, Maps, Photos, and YouTube. The numbers are staggering: more than 10 million client queries per second, with 95th-percentile latency under 10 milliseconds.

The genius of Zanzibar lies in its simplicity. Instead of storing computed permissions, it stores relationships. Think of it as a massive graph where nodes are objects (users, documents, groups) and edges are relationships (owner, editor, member).

Here’s the mental model that clicked for me: imagine authorization as asking “can this user reach this resource through any path in the relationship graph?” It’s graph traversal, not table lookups.

The Building Blocks: Relation Tuples

At the heart of Zanzibar are relation tuples. Each tuple represents one relationship and has three parts:

class RelationTuple {
    String object;      // What we're setting permissions on
    String relation;    // The relationship type
    String user;        // Who has this relationship
    
    // Examples:
    // ("doc:readme", "owner", "user:alice")
    // ("doc:readme", "editor", "group:eng#member")
    // ("group:eng", "member", "user:bob")
}

Notice that elegant syntax in the second example? That group:eng#member means “members of the engineering group.” This is where Zanzibar gets its power – relationships can point to other relationships, creating a graph.

The beauty is in what’s NOT stored. No complex permission matrices. No per-user copies of inherited permissions. Just simple relationship facts that compose together, with any inheritance rules kept separately in a small per-namespace configuration.
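The Zanzibar paper writes tuples in a compact text form, object#relation@user, where the user part is either a plain user ID or a userset such as group:eng#member. Here is a minimal sketch of parsing that notation. The names TupleText, Parsed, and Subject are my own for illustration and are not part of any Zanzibar API; records need Java 16 or later.

```java
import java.util.Optional;

// Hypothetical helper (not a Zanzibar API): parses "object#relation@subject".
final class TupleText {
    // A subject is either a plain user ("user:alice") or a userset ("group:eng#member").
    record Subject(String object, Optional<String> relation) {
        boolean isUserset() { return relation.isPresent(); }
    }

    record Parsed(String object, String relation, Subject subject) {}

    static Parsed parse(String text) {
        // The first '@' separates "object#relation" from the subject.
        int at = text.indexOf('@');
        if (at < 0) throw new IllegalArgumentException("missing '@': " + text);
        String left = text.substring(0, at);
        String right = text.substring(at + 1);

        int hash = left.indexOf('#');
        if (hash < 0) throw new IllegalArgumentException("missing '#': " + text);
        String object = left.substring(0, hash);
        String relation = left.substring(hash + 1);

        return new Parsed(object, relation, parseSubject(right));
    }

    static Subject parseSubject(String text) {
        // A '#' in the subject marks a userset; otherwise it is a single user.
        int hash = text.indexOf('#');
        if (hash < 0) return new Subject(text, Optional.empty());
        return new Subject(text.substring(0, hash), Optional.of(text.substring(hash + 1)));
    }
}
```

Parsing "doc:readme#editor@group:eng#member" yields the object doc:readme, the relation editor, and a userset subject pointing at the member relation of group:eng.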

How Permission Checks Work

When checking if Bob can edit the readme document, Zanzibar traverses the graph:

// The check: can user:bob edit doc:readme?
// Zanzibar looks for any path from bob to readme with "editor" relationship

// Path 1: Direct check
// Is there a tuple (doc:readme, editor, user:bob)? No.

// Path 2: Group membership check  
// Is there a tuple (doc:readme, editor, group:X#member) 
// AND (group:X, member, user:bob)?
// Found: (doc:readme, editor, group:eng#member)
// Found: (group:eng, member, user:bob)
// Result: Yes! Bob can edit.

Conceptually, this is a graph traversal that can explore multiple paths in parallel. According to the paper, the real Zanzibar evaluates branches concurrently, caches intermediate check results across its servers, and maintains a dedicated index (Leopard) for deeply nested groups.

Implementing a Zanzibar-Inspired System

Let me show you how to build a simplified version that captures the core concepts. This is exactly the level of detail that impresses interviewers:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class ZanzibarAuthz {
    // Core data structures. Records (Java 16+) supply the value-based
    // equals/hashCode that Set lookups and cache keys depend on.
    private record Tuple(String object, String relation, String subject) {}  // subject: user or userset

    private record CheckRequest(String object, String relation, String user) {}

    // In production, this would be a distributed database
    private final Set<Tuple> tuples = ConcurrentHashMap.newKeySet();
    private Cache<CheckRequest, Boolean> checkCache = 
        CacheBuilder.newBuilder()
            .maximumSize(1_000_000)
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .build();
    
    // Add a relationship
    public void addTuple(String object, String relation, String subject) {
        tuples.add(new Tuple(object, relation, subject));
        invalidateCache(object);
    }
    
    // Remove a relationship
    public void removeTuple(String object, String relation, String subject) {
        tuples.remove(new Tuple(object, relation, subject));
        invalidateCache(object);
    }
    
    // The core check algorithm
    public boolean check(String object, String relation, String user) {
        CheckRequest request = new CheckRequest(object, relation, user);
        
        // Check cache first
        Boolean cached = checkCache.getIfPresent(request);
        if (cached != null) return cached;
        
        // Depth-limited search to prevent infinite loops
        boolean result = checkWithDepth(object, relation, user, 0, 10);
        checkCache.put(request, result);
        return result;
    }
    
    private boolean checkWithDepth(String object, String relation, 
                                   String user, int depth, int maxDepth) {
        if (depth > maxDepth) return false;
        
        // Direct relationship check
        if (hasTuple(object, relation, user)) {
            return true;
        }
        
        // Check usersets (groups)
        for (Tuple tuple : getTuplesByObjectAndRelation(object, relation)) {
            if (tuple.subject.contains("#")) {
                // This is a userset like "group:eng#member"
                String[] parts = tuple.subject.split("#");
                String usersetObject = parts[0];
                String usersetRelation = parts[1];
                
                // Recursively check if user has relationship with userset
                if (checkWithDepth(usersetObject, usersetRelation, 
                                  user, depth + 1, maxDepth)) {
                    return true;
                }
            }
        }
        
        return false;
    }
    
    // Helper methods
    private boolean hasTuple(String object, String relation, String subject) {
        return tuples.contains(new Tuple(object, relation, subject));
    }
    
    private List<Tuple> getTuplesByObjectAndRelation(String object, 
                                                     String relation) {
        return tuples.stream()
            .filter(t -> t.object.equals(object) && 
                        t.relation.equals(relation))
            .collect(Collectors.toList());
    }
    
    private void invalidateCache(String object) {
        // In production, this would be more sophisticated
        checkCache.invalidateAll();
    }
}

Now let’s see it in action with a real-world scenario:

public class DiscordExample {
    public static void main(String[] args) {
        ZanzibarAuthz authz = new ZanzibarAuthz();
        
        // Set up Discord-like permissions
        // Server -> Channel -> Message hierarchy
        
        // Alice owns the server
        authz.addTuple("server:general", "owner", "user:alice");
        
        // Owners are also admins and members. Real Zanzibar would state this once in
        // namespace config; this sketch emulates it with userset tuples
        authz.addTuple("server:general", "admin", "server:general#owner");
        authz.addTuple("server:general", "member", "server:general#owner");
        
        // Bob is a regular member
        authz.addTuple("server:general", "member", "user:bob");
        
        // Create a moderators group
        authz.addTuple("group:mods", "member", "user:charlie");
        authz.addTuple("server:general", "moderator", "group:mods#member");
        
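        // Channel permissions are granted to server usersets, so server members inherit them.
        // The "parent" tuple only records the hierarchy; the simplified check() never follows it
        authz.addTuple("channel:general-chat", "parent", "server:general");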
        authz.addTuple("channel:general-chat", "read", "server:general#member");
        authz.addTuple("channel:general-chat", "write", "server:general#member");
        authz.addTuple("channel:general-chat", "delete", "server:general#moderator");
        
        // Check permissions
        System.out.println("Can Alice read general-chat? " + 
            authz.check("channel:general-chat", "read", "user:alice")); // true
            
        System.out.println("Can Bob delete messages? " + 
            authz.check("channel:general-chat", "delete", "user:bob")); // false
            
        System.out.println("Can Charlie delete messages? " + 
            authz.check("channel:general-chat", "delete", "user:charlie")); // true
    }
}
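The depth limit above stops runaway recursion, but it also rejects legitimate paths deeper than the limit. One alternative is to remember which usersets have already been explored. The sketch below is my own variation, not something taken from the paper; CycleSafeChecker and its storage map are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: cycle-safe variant of the check, tracking visited usersets.
final class CycleSafeChecker {
    // Key: "object#relation"; value: subjects (users or usersets) holding that relation.
    private final Map<String, List<String>> subjectsByObjectRelation = new HashMap<>();

    void addTuple(String object, String relation, String subject) {
        subjectsByObjectRelation
            .computeIfAbsent(object + "#" + relation, k -> new ArrayList<>())
            .add(subject);
    }

    boolean check(String object, String relation, String user) {
        return check(object + "#" + relation, user, new HashSet<>());
    }

    private boolean check(String userset, String user, Set<String> visited) {
        // A userset that was already explored cannot contribute a new path.
        if (!visited.add(userset)) return false;

        for (String subject : subjectsByObjectRelation.getOrDefault(userset, List.of())) {
            if (subject.equals(user)) return true;          // direct tuple
            if (subject.contains("#") && check(subject, user, visited)) {
                return true;                                // reached via a nested userset
            }
        }
        return false;
    }
}
```

Two groups that list each other as members no longer loop forever: the second visit to the same userset returns immediately.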

Scaling to Production

The implementation above is perfect for interviews, but let me share how real systems scale this pattern to billions of checks. This is the part that really impresses senior interviewers.

First, the storage layer. Zanzibar uses Spanner, Google’s globally distributed database, but you can build this with any database that supports:

  • Strong consistency for writes
  • Horizontal scaling for reads
  • Secondary indexes for efficient lookups

The key insight is sharding by object ID. All tuples for doc:readme live on the same shard, making checks for a single object efficient.

// Sharding strategy
int getShardForObject(String objectId) {
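    // floorMod stays non-negative even when hashCode() is Integer.MIN_VALUE,
    // where Math.abs would return a negative number
    return Math.floorMod(objectId.hashCode(), NUM_SHARDS);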
}

// Each shard can be a separate database or table
class ShardedTupleStore {
    private final List<TupleStore> shards;
    
    public void addTuple(Tuple tuple) {
        int shard = getShardForObject(tuple.object);
        shards.get(shard).add(tuple);
    }
    
    public List<Tuple> getTuplesForObject(String object) {
        int shard = getShardForObject(object);
        return shards.get(shard).getByObject(object);
    }
}
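The modulo scheme above remaps most objects whenever NUM_SHARDS changes. A consistent-hash ring is one common way to limit that movement. This is a minimal sketch under my own assumptions: ShardRing, the virtual-node count, and the use of MD5 as the hash are illustrative choices, not details from the Zanzibar paper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a consistent-hash ring mapping object IDs to shard names.
final class ShardRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ShardRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    void addShard(String shard) {
        // Several points per shard smooth out the key distribution.
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(shard + "#" + i), shard);
    }

    void removeShard(String shard) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(shard + "#" + i), shard);
    }

    String shardFor(String objectId) {
        if (ring.isEmpty()) throw new IllegalStateException("no shards");
        // First ring point at or after the key's hash, wrapping around to the start.
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(objectId));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

When a shard is removed, only the objects that lived on it move; everything else keeps its placement.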

Second, caching is critical. Zanzibar itself caches check results across its servers; a design you can build from common parts is a multi-level cache:

class MultiLevelCache {
    // L1: In-process cache (microseconds)
    private final Cache<CheckRequest, Boolean> processCache; // keyed by the full check request
    
    // L2: Distributed cache like Redis (milliseconds)
    private final RedisCache distributedCache;
    
    // L3: Denormalized "hot" paths in database
    private final HotPathStore hotPaths;
    
    public Boolean check(CheckRequest request) {
        // Try each level
        Boolean result = processCache.getIfPresent(request);
        if (result != null) return result;
        
        result = distributedCache.get(request);
        if (result != null) {
            processCache.put(request, result);
            return result;
        }
        
        // Check hot paths (pre-computed common checks)
        if (hotPaths.contains(request)) {
            result = true;
            updateCaches(request, result);
            return result;
        }
        
        return null; // Cache miss
    }
}
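To make the first two tiers concrete, here is a small read-through sketch. It assumes string cache keys, uses a LinkedHashMap in access order as the in-process LRU, and uses a plain ConcurrentHashMap as a stand-in for Redis. TwoLevelCheckCache and its loader parameter are names I made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch only: read-through lookup across a small LRU (L1) and a shared map standing in for L2.
final class TwoLevelCheckCache {
    private final Map<String, Boolean> l1;
    private final Map<String, Boolean> l2 = new ConcurrentHashMap<>(); // stand-in for Redis

    TwoLevelCheckCache(int l1Capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.l1 = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > l1Capacity;
            }
        };
    }

    synchronized boolean check(String key, Function<String, Boolean> loader) {
        Boolean hit = l1.get(key);
        if (hit != null) return hit;
        hit = l2.get(key);
        if (hit == null) {
            hit = loader.apply(key);   // full graph traversal on a miss
            l2.put(key, hit);
        }
        l1.put(key, hit);              // promote into the in-process tier
        return hit;
    }

    synchronized boolean inL1(String key) { return l1.containsKey(key); }
}
```

The loader runs only when both tiers miss, so a key evicted from L1 is still served from L2 without another traversal.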

Third, optimize the graph traversal. Real systems use sophisticated algorithms:

class OptimizedChecker {
    // Parallel BFS for multiple paths
    public boolean checkParallel(CheckRequest request) {
        ExecutorService executor = ForkJoinPool.commonPool();
        Queue<Future<Boolean>> futures = new LinkedList<>();
        
        // Check direct relationship
        futures.add(executor.submit(() -> 
            checkDirect(request)));
        
        // Check each userset in parallel
        for (String userset : getUsersets(request.object, request.relation)) {
            futures.add(executor.submit(() -> 
                checkUserset(userset, request.user)));
        }
        
        // Return true if any path succeeds
        while (!futures.isEmpty()) {
            try {
                if (futures.poll().get()) return true;
            } catch (Exception e) {
                // Fail closed: a failed branch grants nothing (log the error in production)
            }
        }
        
        return false;
    }
    
    // Leopard index optimization (pre-expanded groups)
    class LeopardIndex {
        // Pre-compute and store expanded group memberships
        Map<String, Set<String>> expandedGroups = new HashMap<>();
        
        void expandGroup(String group) {
            Set<String> members = new HashSet<>();
            // Recursively find all members
            expandGroupRecursive(group, members, new HashSet<>());
            expandedGroups.put(group, members);
        }
        
        boolean isMember(String group, String user) {
            Set<String> members = expandedGroups.get(group);
            return members != null && members.contains(user);
        }
    }
}
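The expandGroupRecursive call above is left undefined, so here is one way it could look. This is a simplified sketch of the flattening idea only, not the real Leopard index; GroupExpander and the "group:" prefix convention are assumptions I made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: flattens nested groups into the set of users they contain.
final class GroupExpander {
    // group -> direct members; a member may be a user or another group.
    private final Map<String, Set<String>> directMembers = new HashMap<>();

    void addMember(String group, String member) {
        directMembers.computeIfAbsent(group, g -> new HashSet<>()).add(member);
    }

    // Returns every user reachable from the group through nested membership.
    Set<String> expand(String group) {
        Set<String> users = new HashSet<>();
        expandRecursive(group, users, new HashSet<>());
        return users;
    }

    private void expandRecursive(String group, Set<String> users, Set<String> visited) {
        if (!visited.add(group)) return;             // guards against membership cycles
        for (String member : directMembers.getOrDefault(group, Set.of())) {
            if (member.startsWith("group:")) {       // assumed naming convention
                expandRecursive(member, users, visited);
            } else {
                users.add(member);
            }
        }
    }
}
```

With the expansion stored, a membership check becomes a single set lookup instead of a traversal, at the cost of recomputing affected groups when membership changes.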

How to Use This in System Design Interviews

Here’s my proven strategy for bringing Zanzibar into your system design interviews. I’ve used this approach successfully at multiple FAANG interviews.

When to Bring It Up

Zanzibar is your secret weapon when the interviewer mentions:

  • “How do you handle permissions/authorization?”
  • “The system needs fine-grained access control”
  • “Different users have different access levels”
  • “We need to support teams/groups/organizations”

Don’t force it into every problem. It’s overkill for simple use cases. But for systems like Slack, Notion, Google Docs, or any collaborative platform, it’s perfect.

The Interview Flow

Here’s how I structure my answer:

1. Start with Requirements (2-3 minutes) “Before designing the authorization system, let me clarify the requirements. We need to support:

  • Hierarchical permissions (organization → team → resource)
  • Dynamic group membership
  • Fine-grained permissions (read, write, delete, share)
  • Scale: millions of users, billions of resources
  • Performance: sub-20ms authorization checks

Is this correct?”

2. Acknowledge the Naive Approach (2 minutes) “The traditional approach would be an RBAC system with users, roles, and permissions tables. However, at this scale, we’d face challenges:

  • Explosive growth in permission records
  • Cache invalidation complexity
  • Difficult to maintain consistency

Let me propose a more scalable approach inspired by Google Zanzibar…”

3. Introduce Core Concepts (5 minutes) Draw the relationship tuple structure. Explain with a concrete example:

Tuples:
(doc:design, owner, user:alice)
(doc:design, editor, team:eng#member)
(team:eng, member, user:bob)

Check: Can Bob edit the design doc?
Path: bob → member of eng → eng can edit doc → YES

4. System Architecture (10 minutes) Draw the full architecture:

┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│   Client    │────▶│ Load Balancer│────▶│ Auth Service │
└─────────────┘     └──────────────┘     └──────┬───────┘
                                                │
                    ┌───────────────────────────┴─────┐
                    │                                 │
              ┌─────▼──────┐                    ┌─────▼──────┐
              │Cache Layer │                    │Tuple Store │
              │(Redis)     │                    │(Sharded DB)│
              └────────────┘                    └────────────┘

5. Deep Dive on Scaling (10 minutes) This is where you shine:

“For the tuple store, I’d shard by object ID using consistent hashing. This ensures all permissions for a resource are colocated.

For caching, I’d implement a three-tier strategy:

  • L1: Application-level LRU cache (100k entries)
  • L2: Redis with 5-minute TTL
  • L3: Pre-computed hot paths for common checks

For the check algorithm, I’d use parallel BFS with depth limiting to prevent infinite loops. We can also borrow the idea behind Zanzibar’s Leopard index: pre-expanding nested group memberships into a flattened index and updating it incrementally as memberships change.”

6. Trade-offs Discussion (5 minutes) Show maturity by discussing trade-offs:

“Zanzibar trades read-time work for flexibility. We’re storing relationships, not computed permissions, which means:

  • Pros: Highly flexible, easy to add new permission types, natural hierarchies
  • Cons: Every check is a graph traversal, so latency leans on caching and indexing; complex queries; eventual consistency challenges

For strong consistency needs, we might need to implement a changelog system with monotonic timestamps…”

Advanced Topics to Impress

If you have time or the interviewer digs deeper, bring up these advanced concepts:

1. Temporal Permissions

class TemporalTuple extends Tuple {
    Instant validFrom;
    Instant validUntil;
    
    // "Bob can edit the doc from 9 AM to 5 PM"
}
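A validity window only helps if the check consults it. A minimal sketch, assuming a half-open interval where validFrom is included and validUntil is excluded; TimedTuple and isActiveAt are illustrative names.

```java
import java.time.Instant;

// Sketch only: a tuple that is honored within [validFrom, validUntil).
record TimedTuple(String object, String relation, String subject,
                  Instant validFrom, Instant validUntil) {

    boolean isActiveAt(Instant now) {
        // Inclusive start, exclusive end.
        return !now.isBefore(validFrom) && now.isBefore(validUntil);
    }
}
```

One design consequence worth mentioning in an interview: a cached "allowed" result must not outlive validUntil, so cache TTLs need to be capped by the earliest expiry on the path.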

2. Conditional Relationships

class ConditionalTuple extends Tuple {
    String condition;  // CEL expression
    
    // "Bob can read if doc.status == 'published'"
}

3. Namespace Configuration

// Define permission inheritance rules
namespace document {
    relation owner: user
    relation editor: user | group#member
    relation viewer: user | group#member | public
    
    permission edit = owner + editor
    permission view = owner + editor + viewer
}
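The "permission = relation + relation" lines above read as a union: a permission holds if any listed relation holds. Here is a rough sketch of evaluating that rule in Java. It is deliberately limited to direct tuples (no usersets), and PermissionRewrites with its hard-coded grants map is my own illustration, not Zanzibar's configuration format.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: "permission = relation + relation" evaluated as a union of relations.
final class PermissionRewrites {
    // permission -> relations that grant it, mirroring the config above.
    private final Map<String, List<String>> grants = Map.of(
        "edit", List.of("owner", "editor"),
        "view", List.of("owner", "editor", "viewer"));

    private final Set<String> tuples = new HashSet<>(); // stored as "object#relation@user"

    void addTuple(String object, String relation, String user) {
        tuples.add(object + "#" + relation + "@" + user);
    }

    boolean hasPermission(String object, String permission, String user) {
        // Unknown permissions grant nothing.
        for (String relation : grants.getOrDefault(permission, List.of())) {
            if (tuples.contains(object + "#" + relation + "@" + user)) return true;
        }
        return false;
    }
}
```

A viewer gets view but not edit, while an owner gets both, without any extra tuples being written per permission.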

Common Interview Questions and Answers

Q: “How do you handle permission changes?” A: “Zanzibar uses a changelog approach. Each tuple has a timestamp, and we maintain a global changelog. Clients can ask ‘what changed since timestamp X?’ This enables efficient cache invalidation and supports features like audit logs.”
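If the interviewer wants to see the changelog idea on the whiteboard, something like the following is enough. It is a single-process sketch in which a simple counter stands in for the globally ordered commit timestamps a real deployment would need; TupleChangelog and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: an append-only changelog with a monotonically increasing revision.
final class TupleChangelog {
    enum Op { ADD, REMOVE }

    record Change(long revision, Op op, String tuple) {}

    private final List<Change> log = new ArrayList<>();
    private long lastRevision = 0;

    synchronized long append(Op op, String tuple) {
        long revision = ++lastRevision;
        log.add(new Change(revision, op, tuple));
        return revision;
    }

    // Changes with a revision strictly greater than the caller's last-seen revision.
    synchronized List<Change> changesSince(long revision) {
        List<Change> out = new ArrayList<>();
        for (Change c : log) {
            if (c.revision() > revision) out.add(c);
        }
        return out;
    }
}
```

A cache node can remember the last revision it applied, poll changesSince with it, and invalidate only the entries touched by the returned changes.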

Q: “What about negative permissions (deny rules)?” A: “Great question! Zanzibar has no deny tuples, which keeps the stored data simple; the closest it gets is exclusion in a namespace’s userset rewrite rules. If explicit denies were needed, I’d implement them as a separate tuple type and check denies before allows. However, in practice, removing relationships is usually sufficient.”
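The deny-before-allow ordering can be shown in a few lines. This sketch is an extension of my own, not a Zanzibar feature, and it handles direct tuples only; DenyFirstChecker is a made-up name.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: explicit deny tuples consulted before allow tuples.
final class DenyFirstChecker {
    private final Set<String> allows = new HashSet<>();
    private final Set<String> denies = new HashSet<>();

    void allow(String object, String relation, String user) {
        allows.add(key(object, relation, user));
    }

    void deny(String object, String relation, String user) {
        denies.add(key(object, relation, user));
    }

    boolean check(String object, String relation, String user) {
        String k = key(object, relation, user);
        if (denies.contains(k)) return false;   // a deny always wins
        return allows.contains(k);
    }

    private static String key(String object, String relation, String user) {
        return object + "#" + relation + "@" + user;
    }
}
```

The cost is that every check now reads two sets of tuples, and cached "allowed" results must be invalidated when a deny is added.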

Q: “How do you ensure consistency?” A: “We use a two-phase approach:

  1. Writes go through a strongly consistent path (think Spanner or CockroachDB)
  2. Reads can use eventual consistency with bounded staleness

For critical operations, we can force a consistent read by checking the latest changelog timestamp. Zanzibar does something similar with an opaque consistency token it calls a zookie.”

Q: “How would you debug permission issues?” A: “I’d build an explain endpoint that returns the evaluation path, for example:”

Why can Bob edit doc:readme?
- Checked (doc:readme, editor, user:bob) - NO
- Checked (doc:readme, editor, group:eng#member) - YES
  - Checked (group:eng, member, user:bob) - YES
Access granted via group:eng membership
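A trace like the one above falls out naturally if the checker records each step as it goes. The following is a minimal sketch; ExplainingChecker and the exact wording of its trace lines are my own choices, and it reuses the depth cap of 10 from the simplified checker earlier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a check that records each step it takes, for an explain endpoint.
final class ExplainingChecker {
    // "object#relation" -> subjects (users or usersets) holding that relation.
    private final Map<String, List<String>> subjects = new HashMap<>();

    void addTuple(String object, String relation, String subject) {
        subjects.computeIfAbsent(object + "#" + relation, k -> new ArrayList<>()).add(subject);
    }

    // Returns true if access is granted; appends one line per step to trace.
    boolean explain(String object, String relation, String user, List<String> trace) {
        return explain(object + "#" + relation, user, trace, 0);
    }

    private boolean explain(String userset, String user, List<String> trace, int depth) {
        if (depth > 10) return false;                    // depth cap against cycles
        String indent = "  ".repeat(depth);
        List<String> held = subjects.getOrDefault(userset, List.of());
        boolean direct = held.contains(user);
        trace.add(indent + "- Checked (" + userset + " @ " + user + ") - "
            + (direct ? "YES" : "NO"));
        if (direct) return true;
        for (String subject : held) {
            if (!subject.contains("#")) continue;        // plain users were covered above
            trace.add(indent + "- Following userset " + subject);
            if (explain(subject, user, trace, depth + 1)) return true;
        }
        return false;
    }
}
```

Returning the trace alongside the decision gives support engineers the path that granted access, or every branch that was tried when access was denied.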

Real-World Implementation Considerations

Let me share some hard-won lessons from implementing Zanzibar-style systems in production. These details show interviewers you’ve thought deeply about the problem.

Migration Strategy

Moving from a traditional RBAC system to Zanzibar is non-trivial. Here’s a proven approach:

class MigrationStrategy {
    // Step 1: Dual-write period
    void grantPermission(User user, Resource resource, Permission perm) {
        // Write to old system
        legacyRbac.grant(user, resource, perm);
        
        // Write to Zanzibar
        zanzibar.addTuple(
            resource.toZanzibarObject(),
            perm.toZanzibarRelation(),
            user.toZanzibarSubject()
        );
    }
    
    // Step 2: Shadow mode checking
    boolean checkPermission(User user, Resource resource, Permission perm) {
        boolean legacyResult = legacyRbac.check(user, resource, perm);
        boolean zanzibarResult = zanzibar.check(
            resource.toZanzibarObject(),
            perm.toZanzibarRelation(),
            user.toZanzibarSubject()
        );
        
        if (legacyResult != zanzibarResult) {
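            // Assumes an SLF4J-style logger, which needs {} placeholders for its arguments
            log.error("Permission mismatch: user={}, resource={}, perm={}", user, resource, perm);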
            metrics.increment("permission.mismatch");
        }
        
        return legacyResult; // Still trust legacy during migration
    }
    
    // Step 3: Gradual rollout
    boolean checkWithRollout(User user, Resource resource, Permission perm) {
        if (rollout.isEnabled("zanzibar.check", user)) {
            return zanzibar.check(...);
        }
        return legacyRbac.check(...);
    }
}

Monitoring and Observability

Production Zanzibar systems need extensive monitoring:

class ZanzibarMetrics {
    // Latency metrics, tagged by relation and outcome.
    // Assumes Micrometer, with `registry` being a MeterRegistry.
    boolean check(CheckRequest request) {
        Timer.Sample sample = Timer.start(registry);
        String outcome = "error"; // overwritten when the check completes
        try {
            boolean result = performCheck(request);
            outcome = String.valueOf(result);
            return result;
        } finally {
            sample.stop(registry.timer("zanzibar.check.latency",
                "relation", request.relation,
                "result", outcome));
        }
    }
    
    // Cache effectiveness
    void recordCacheMetrics() {
        metrics.gauge("zanzibar.cache.hit_rate", cache.stats().hitRate());
        metrics.gauge("zanzibar.cache.size", cache.size());
        metrics.counter("zanzibar.cache.evictions", cache.stats().evictionCount());
    }
    
    // Graph traversal depth (assumes Micrometer's DistributionSummary)
    void recordTraversalDepth(int depth) {
        registry.summary("zanzibar.traversal.depth").record(depth);
    }
}

Testing Strategies

Testing authorization is critical. Here’s a comprehensive approach:

class ZanzibarTestFramework {
    // Property-based testing
    @Property
    void transitivePermissions(@ForAll User user, 
                              @ForAll Group group,
                              @ForAll Document doc) {
        // If user is in group and group can access doc,
        // then user can access doc
        // Assumes jqwik, whose precondition helper is Assume.that(...)
        Assume.that(isMember(user, group));
        Assume.that(hasAccess(group, doc));
        
        assertThat(hasAccess(user, doc)).isTrue();
    }
    
    // Scenario testing
    @Test
    void complexHierarchyTest() {
        // Set up Google Drive-like hierarchy
        ZanzibarTestDSL z = new ZanzibarTestDSL();
        
        z.createUser("alice")
         .createFolder("root")
         .createFolder("shared", "root")        // second argument: parent folder
         .createDoc("design.doc", "shared")     // second argument: parent folder
         .grant("root", "owner", "alice")
         .assertCan("alice", "edit", "design.doc")
         .assertCannot("bob", "view", "design.doc")
         .share("shared", "viewer", "bob")
         .assertCan("bob", "view", "design.doc")
         .assertCannot("bob", "edit", "design.doc");
    }
    
    // Load testing
    @Test
    void loadTest() {
        // Create realistic permission graph
        int numUsers = 100_000;
        int numGroups = 10_000;
        int numDocs = 1_000_000;
        int avgGroupSize = 50;
        
        setupRealisticGraph(numUsers, numGroups, numDocs, avgGroupSize);
        
        // Measure check performance
        List<CheckRequest> requests = generateRandomChecks(100_000);
        
        long start = System.currentTimeMillis();
        requests.parallelStream().forEach(zanzibar::check);
        long duration = System.currentTimeMillis() - start;
        
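        // Wall-clock time divided by request count: a throughput-based average,
        // not true per-request latency, because the checks run in parallel
        double avgMillisPerCheck = duration / (double) requests.size();
        assertThat(avgMillisPerCheck).isLessThan(10.0); // 10ms budget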
    }
}

Interview Success Patterns

After coaching dozens of engineers, I’ve noticed patterns in successful Zanzibar discussions:

What Impresses Interviewers:

  1. Starting with the problem: Don’t jump to Zanzibar immediately. Show you understand why traditional approaches fail
  2. Using concrete examples: Always ground abstract concepts in real scenarios (Discord channels, Google Docs sharing)
  3. Discussing trade-offs: No solution is perfect. Acknowledge storage costs and query complexity
  4. Showing depth: Mention optimizations like Leopard indices or sharding strategies
  5. Connecting to the company: If interviewing at Uber, talk about driver-rider permissions. At Airbnb, host-guest-property relationships

What to Avoid:

  1. Over-engineering: Don’t add temporal permissions unless specifically asked
  2. Ignoring basics: Still cover authentication, rate limiting, and other standard concerns
  3. Being inflexible: If the interviewer pushes back, be ready to discuss alternatives
  4. Memorizing without understanding: Be ready to implement a simple version on the whiteboard

The Zanzibar Mindset

The real value of studying Zanzibar isn’t memorizing implementation details – it’s developing a mindset for solving authorization at scale. When I face any permission problem now, I ask myself: “What are the relationships here?”

This shift from thinking about permissions as static rules to dynamic relationships has helped me design better systems even when I don’t use Zanzibar directly. It’s about recognizing that in distributed systems, modeling relationships explicitly is often more powerful than computing derived state.

Remember, in your interview, you’re not trying to rebuild Google’s system exactly. You’re demonstrating that you can think at scale, understand fundamental distributed systems principles, and apply proven patterns to new problems. Zanzibar gives you a concrete example to show all these skills.

The next time you’re in an interview and authorization comes up, you’ll have a powerful tool in your arsenal. Just remember to start with the problem, build up the solution naturally, and always tie it back to the specific requirements at hand.

Good luck with your interviews! May your authorization checks be fast and your graph traversals be shallow.
