Chapter 2: Identity and Access Management (IAM)
In the cloud, identity is the new perimeter. Gone are the days when a firewall was enough to protect your data. In GCP, Identity and Access Management (IAM) is the central system that manages permissions across every service. Understanding IAM is crucial for maintaining security and compliance in your cloud environment.
0. From Scratch: What is IAM?
Before diving into the specifics, let’s establish what IAM is and why it’s fundamentally different from traditional security models.
Traditional Security Model vs. Cloud IAM
In on-premises environments, security was primarily perimeter-based:
- Physical security controlled access to buildings
- Network firewalls protected internal resources
- Active Directory managed user identities
- Local admin accounts had privileged access to systems
In cloud environments, the model is different:
- There is no traditional network perimeter
- Every resource access is authenticated and authorized through APIs
- Identities can be humans, services, or external systems
- Permissions are granted explicitly through policies
- Security is enforced at the resource level, not the network level
1. The Core Philosophy: Who, What, and Where
The IAM model in GCP is a simple but powerful relationship: Who (Identity) + What (Role) + Where (Resource). This relationship forms the basis of every access decision in GCP. Understanding this triangle is fundamental to implementing proper security.
The IAM Triangle Explained
- WHO (Identity Principal):
  - The entity requesting access (human or service)
  - Must be authenticated before authorization occurs
  - Examples: users, service accounts, groups
- WHAT (Role/Permissions):
  - The specific actions the principal is allowed to perform
  - Granular to the API call level (e.g., compute.instances.start)
  - Roles bundle related permissions together
- WHERE (Resource Hierarchy):
  - The specific GCP resource the action applies to
  - Follows GCP’s resource hierarchy (Organization → Folder → Project → Resource)
  - Policies are attached to resources and inherited downward
Identities: The “Who”
Identities represent the entities that can access GCP resources. Each identity type serves a different purpose:
1. Google Accounts (Individual Users)
- Personal Gmail accounts (e.g., [email protected])
- Cloud Identity or Google Workspace accounts (e.g., [email protected])
- Used for human users accessing GCP resources
- Can authenticate via password, 2-factor authentication, or SSO
2. Service Accounts (Non-human Identities)
- Identities for applications, virtual machines, and services
- Use cryptographic keys instead of passwords
- Represent the “application” or “service” itself
- Critical for automation and programmatic access
3. Google Groups
- Collections of users, service accounts, or other groups
- Best Practice: Always assign roles to groups, not individual users
- Simplifies management and enables team-based access control
- Can be Google Workspace groups or Cloud Identity groups
4. Cloud Identity / Google Workspace
- Corporate directory integration
- Enables Single Sign-On (SSO) with existing corporate credentials
- Maintains separation between personal and corporate accounts
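5. Google Workspace and Cloud Identity Domains
- A domain principal (e.g., domain:example.com) grants access to all users in that domain
- Used when an entire organization needs the same broad access
- Requires careful consideration due to scope
- Not to be confused with domain-wide delegation, which lets a service account act on behalf of Workspace users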
6. allAuthenticatedUsers and allUsers
- allAuthenticatedUsers: Anyone authenticated with a Google Account, including accounts outside your organization
- allUsers: Anyone on the internet, including unauthenticated users
- Critical Warning: Use these with extreme caution
- Typically reserved for public resources like Cloud Storage buckets serving static websites
Roles: The “What”
Roles define what actions an identity can perform on resources. GCP offers multiple role types with different scopes and management approaches:
Primitive Roles (Legacy - Do Not Use in Production)
- Owner: Full access to all resources and permissions to grant access to others
- Editor: Can modify resources but cannot grant access to others
- Viewer: Can view resources but cannot modify them
- Warning: These roles grant extremely broad permissions
- Never use in production - they violate the principle of least privilege
Predefined Roles (Recommended)
- Created and maintained by Google
- Regularly updated to include new permissions as services evolve
- Granular and specific to service functions
- Examples: roles/compute.instanceAdmin.v1, roles/storage.objectViewer
- Well-tested and security-reviewed by Google
Custom Roles (Production Use Cases)
- Created by administrators for specific organizational needs
- Allow precise permission control beyond predefined roles
- Must be manually maintained as services evolve
- Useful for compliance requirements or unique business needs
- Cannot be updated automatically when new permissions are added
The Permission Hierarchy
Understanding the relationship between permissions, roles, and policies is crucial:
- Permissions: Individual actions (e.g., compute.instances.create)
- Roles: Collections of related permissions (e.g., roles/compute.instanceAdmin)
- Policies: Bindings between identities and roles for specific resources
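To make the permission, role, and policy relationship concrete, here is a minimal Python sketch. The role-to-permission table is a tiny illustrative subset rather than Google's real role definitions, and the member names are hypothetical. Only the overall shape (a policy holding a list of bindings, each with a `role` and `members`) mirrors the IAM allow-policy format.

```python
# Illustrative subset only: real predefined roles contain more permissions.
ROLE_PERMISSIONS = {
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
    "roles/compute.viewer": {"compute.instances.get", "compute.instances.list"},
}

# A policy attached to one resource: each binding ties members to a role.
policy = {
    "bindings": [
        {
            "role": "roles/storage.objectViewer",
            "members": ["group:data-readers@example.com"],
        },
        {
            "role": "roles/compute.viewer",
            "members": [
                "group:data-readers@example.com",
                "serviceAccount:ci@example-project.iam.gserviceaccount.com",
            ],
        },
    ]
}


def permissions_for(member: str, policy: dict) -> set[str]:
    """Return the union of permissions from every role bound to `member`."""
    granted: set[str] = set()
    for binding in policy["bindings"]:
        if member in binding["members"]:
            granted |= ROLE_PERMISSIONS.get(binding["role"], set())
    return granted
```

Calling `permissions_for("group:data-readers@example.com", policy)` returns the permissions of both roles combined, which is the sense in which a policy is just a set of role bindings.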
2. Service Accounts: The Powerhouse of Automation
Service accounts are identities for applications and services, not humans. They are fundamental to cloud security and automation, requiring special attention and protection.
Understanding Service Accounts Deeply
What Makes Service Accounts Different from User Accounts?
- Authentication Method: Use private keys/certificates instead of passwords
- Multi-Factor Authentication: Do not support MFA/2FA (by design)
- Identity Type: Represent applications/services, not humans
- Management: Controlled programmatically, not through user interfaces
- Scope: Can act on behalf of the application across various resources
Service Account Structure
A service account has multiple identifiers:
- Email Address: [email protected]
- Unique ID: Numeric identifier (e.g., 123456789012345678901)
- Display Name: Human-readable name for identification
- Description: Additional context about the service account’s purpose
Service Account Keys (Credentials)
- JSON Key Files: Contain private keys for authenticating service accounts
- High Risk: If compromised, provide full access to the service account’s permissions
- Best Practice: Minimize use and rotate regularly
- Alternative: Use Workload Identity Federation to eliminate key files
The Service Account User vs Actor Distinction
This is one of the most commonly misunderstood aspects of GCP IAM:
Service Account Identity
- The service account itself (e.g., [email protected])
- Has specific roles and permissions granted to it
- Represents the application’s identity in GCP
Service Account Actor
- The human or service that wants to “act as” the service account
- Must have the iam.serviceAccounts.actAs permission on the service account (included in roles/iam.serviceAccountUser)
- This creates a two-step authorization process
Example:
- Service Account [email protected] has the Storage Object Viewer role
- User [email protected] wants to deploy an application that uses this service account
- User [email protected] must have the iam.serviceAccounts.actAs permission on the service account
- Without this permission, the user cannot deploy the application with the service account
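The two-step check can be sketched in Python as follows. This is a simplified model rather than how GCP evaluates requests internally; the function name and the choice of `run.services.create` as the deploy permission are assumptions made for illustration.

```python
ACT_AS = "iam.serviceAccounts.actAs"


def can_deploy_as(
    perms_on_project: set[str],
    perms_on_service_account: set[str],
    deploy_permission: str = "run.services.create",
) -> bool:
    """Both checks must pass: the deploy action and actAs on the service account."""
    may_deploy = deploy_permission in perms_on_project
    may_act_as = ACT_AS in perms_on_service_account
    return may_deploy and may_act_as


# A developer who can deploy but was never granted actAs on the service account.
developer_project_perms = {"run.services.create", "run.services.get"}
developer_sa_perms: set[str] = set()
```

With these inputs `can_deploy_as(developer_project_perms, developer_sa_perms)` is false; adding `iam.serviceAccounts.actAs` to the second set makes it true. The roles held by the service account itself never enter this check, which is the point of the distinction.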
Workload Identity Federation (Modern Authentication)
Workload Identity Federation eliminates the need for service account key files, addressing a major security concern.
The Problem with Service Account Keys
- Storage Risk: Keys stored in files can be accidentally committed to repositories
- Distribution Risk: Keys must be securely distributed to all environments
- Rotation Risk: Manual rotation is complex and often neglected
- Compromise Impact: Stolen keys provide persistent access to resources
How Workload Identity Federation Works
- External Identity Provider: GitHub Actions, AWS, Azure, or other OIDC providers
- OIDC Token Exchange: External provider issues time-limited OIDC tokens
- GCP Trust Relationship: Configure trust between external provider and GCP service account
- Token Validation: GCP validates the OIDC token against the trust relationship
- Temporary Credentials: GCP provides temporary credentials for the service account
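The token exchange step can be illustrated by building the request an external workload would send to Google's Security Token Service. This sketch only constructs the form fields (following the OAuth 2.0 token exchange conventions) and makes no network call. The project number, pool ID, provider ID, and token value are placeholders, and the exact field set should be checked against the current STS documentation.

```python
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def build_sts_exchange_request(
    project_number: str, pool_id: str, provider_id: str, oidc_token: str
) -> dict[str, str]:
    """Build the form fields for exchanging an external OIDC token at STS."""
    # The audience names the workload identity pool provider that trusts the token.
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": oidc_token,
    }


# Placeholder values for illustration only.
request_fields = build_sts_exchange_request(
    "123456789012", "example-pool", "example-provider", "<oidc-token>"
)
```

In practice the Google auth libraries perform this exchange for you from a credential configuration file, so hand-built requests like this are mainly useful for understanding or debugging the flow.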
Benefits of Workload Identity Federation
- No Key Files: Eliminates the risk of key file compromise
- Automatic Rotation: OIDC tokens are short-lived and automatically rotated
- External Control: Leverage existing identity providers for access management
- Compliance: Meets regulatory requirements for credential management
Service Account Best Practices
1. Principle of Least Privilege
- Grant minimal required permissions for each service account
- Use predefined roles when possible instead of custom roles
- Regularly audit and remove unnecessary permissions
- Use IAM Recommender to identify unused permissions
2. Service Account Naming Conventions
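One possible convention is `<app>-<env>-<purpose>`. The sketch below is a hypothetical example of enforcing such a convention, not a Google requirement. The regular expression reflects the service account ID rules as I understand them: 6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter.

```python
import re

# 6-30 characters: starts with a letter, ends with a letter or digit,
# and contains only lowercase letters, digits, and hyphens in between.
SERVICE_ACCOUNT_ID = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def service_account_id(app: str, env: str, purpose: str) -> str:
    """Build an ID using the hypothetical <app>-<env>-<purpose> convention."""
    candidate = f"{app}-{env}-{purpose}".lower()
    if not SERVICE_ACCOUNT_ID.match(candidate):
        raise ValueError(f"invalid service account ID: {candidate!r}")
    return candidate
```

For example, `service_account_id("billing", "prod", "reader")` yields `billing-prod-reader`, while a combination that is too short or too long raises `ValueError` before any account is created.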
3. Service Account Separation
- Use separate service accounts for different applications
- Use separate service accounts for different environments (dev/staging/prod)
- Use separate service accounts for different functions within an application
- Avoid sharing service accounts across unrelated applications
4. Key Management
- Minimize use of service account key files
- Rotate keys regularly (monthly recommended)
- Audit key usage and remove unused keys
- Use Workload Identity Federation instead of key files when possible
3. IAM Conditions: Contextual Security
IAM Conditions add contextual restrictions to IAM policies, enabling more granular and dynamic access control. They allow you to specify when a policy binding is effective based on various attributes.
Understanding IAM Conditions
Traditional IAM policies are static: “Grant this role to this identity on this resource.” IAM Conditions add a temporal or contextual dimension: “Grant this role to this identity on this resource WHEN certain conditions are met.”
Common Condition Use Cases
1. Time-Based Access
2. IP-Based Restrictions
3. Resource Attributes
Condition Expression Language
IAM Conditions use CEL (Common Expression Language) for expressing conditions. Not every attribute is available for every service or resource type, so confirm support in the IAM Conditions attribute reference before relying on one. Key elements include:
1. Request Attributes
- request.time: The time of the request
- request.auth.claims: Authentication claims
- request.auth.principal: The authenticated principal
2. Resource Attributes
- resource.name: The resource name
- resource.type: The resource type
- resource.labels: Resource labels
3. Common Functions
- timestamp(): Convert string to timestamp
- string.startsWith(): Check if string starts with prefix
- string.contains(): Check if string contains substring
- in: Check membership in a list
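As a sketch of how a condition attaches to a binding, the Python below builds two conditional bindings as plain dictionaries. The group, bucket name, and expiry date are hypothetical. The CEL expressions use only `request.time`, `timestamp()`, `resource.name`, and `startsWith()`, and the policy is marked version 3, which allow policies containing conditions require.

```python
def conditional_binding(
    role: str, members: list[str], title: str, expression: str
) -> dict:
    """Return an allow-policy binding carrying a CEL condition."""
    return {
        "role": role,
        "members": list(members),
        "condition": {"title": title, "expression": expression},
    }


# Time-based access: the grant stops matching after the given instant.
expiring_access = conditional_binding(
    "roles/storage.objectViewer",
    ["group:contractors@example.com"],
    "expires-2030",
    'request.time < timestamp("2030-01-01T00:00:00Z")',
)

# Resource-based access: only objects in one (hypothetical) bucket.
single_bucket = conditional_binding(
    "roles/storage.objectViewer",
    ["group:contractors@example.com"],
    "reports-bucket-only",
    'resource.name.startsWith("projects/_/buckets/example-reports/")',
)

conditional_policy = {"version": 3, "bindings": [expiring_access, single_bucket]}
```

Each condition is scoped to its own binding, so the same group can hold the same role under several independent conditions.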
Advanced Condition Scenarios
1. Business Hours Access
2. Device Compliance
3. Risk-Based Access
Limitations and Considerations
1. Performance Impact
- Conditions add slight latency to authorization decisions
- Complex conditions may impact performance
- Monitor for any performance degradation
2. Debugging Challenges
- Condition failures may be harder to troubleshoot
- Use IAM Policy Troubleshooter to debug conditions
- Test conditions thoroughly before production deployment
3. Service Support
- Not all GCP services support IAM Conditions
- Check service documentation for condition support
- Some services have limitations on condition complexity
4. Policy Inheritance and the “Additive” Rule
GCP’s resource hierarchy creates a layered inheritance model for IAM policies. Understanding this model is crucial for effective access management.
The Resource Hierarchy
GCP organizes resources in a hierarchical structure: Organization → Folder → Project → Resource.
Policy Inheritance Rules
1. Downward Inheritance
- Policies set at higher levels (organization, folder) apply to lower levels (projects, resources)
- Lower levels inherit all permissions from higher levels
- This creates a cumulative effect of permissions
2. Additive Nature
- CRITICAL: Allow policies are additive, not subtractive
- If a user has a role at the organization level, they retain those permissions at all child resources
- An allow policy at a lower level cannot remove a permission that was granted at a higher level
3. No Deny in Allow Policies
- Allow policies cannot express an explicit deny
- Once a permission is granted at a higher level, no allow policy at a lower level can revoke it
- To block a permission explicitly, use IAM Deny Policies, a separate policy type that takes precedence over allow policies
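The additive rule can be modelled in a few lines of Python. This is a toy model with hypothetical resource names and groups: the effective roles on a resource are the union of the roles bound on the resource itself and on every ancestor.

```python
from typing import Optional

# Hypothetical hierarchy: child -> parent.
PARENT = {
    "projects/shop-prod": "folders/prod",
    "folders/prod": "organizations/example",
}

# Allow-policy bindings set directly at each level: member -> roles.
ALLOW = {
    "organizations/example": {"group:auditors@example.com": {"roles/viewer"}},
    "folders/prod": {},
    "projects/shop-prod": {"group:shop-devs@example.com": {"roles/compute.viewer"}},
}


def effective_roles(member: str, resource: str) -> set[str]:
    """Union of roles bound to `member` on `resource` and all of its ancestors."""
    roles: set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        roles |= ALLOW.get(node, {}).get(member, set())
        node = PARENT.get(node)  # None once we pass the organization
    return roles
```

The auditors group was bound only at the organization, yet `effective_roles("group:auditors@example.com", "projects/shop-prod")` still includes `roles/viewer`. Nothing in the project's own allow policy can take that away.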
Practical Implications
Scenario 1: Organization-Level Owner
Scenario 2: Proper Role Assignment
Managing Hierarchical Policies Effectively
1. Organization-Level Policies
- Apply to all resources in the organization
- Use for enterprise-wide roles (security teams, auditors)
- Be very careful with permissions granted here
- Consider using groups rather than individual users
2. Folder-Level Policies
- Apply to all projects within the folder
- Useful for departmental or business unit access
- Good for shared services and cross-functional teams
- Can add to, but never remove, access inherited from the organization level
3. Project-Level Policies
- Most common level for role assignments
- Use for team-based access control
- Combine with groups for easier management
- Monitor for inheritance conflicts
4. Resource-Level Policies
- Most granular level of control
- Use sparingly to avoid complexity
- Effective for sensitive resources
- Combine with conditions for contextual access
Policy Troubleshooting for Hierarchies
1. Identifying Inheritance Issues
- Use IAM Policy Troubleshooter to trace permission sources
- Check all levels of the hierarchy for conflicting policies
- Understand which policies contribute to effective permissions
2. Planning Policy Changes
- Consider impact across the entire hierarchy
- Test changes in non-production environments
- Document the intended inheritance behavior
- Communicate changes to affected stakeholders
4. IAM Deny Policies: The Explicit Guardrails
While standard IAM policies are allow-only, IAM Deny Policies allow you to explicitly block permissions.
- Precedence: Deny policies always override allow policies. If a user is granted Owner at the project level but is denied a permission at the organization level, they cannot perform the action.
- Use Case: “Prevent anyone (even admins) from deleting production storage buckets” or “Block access to all GCP services for a contractor after their contract ends, regardless of project-level permissions.”
- Inheritance: Deny policies follow the same inheritance rules as allow policies (Organization -> Folder -> Project).
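The precedence rule can be sketched as "check deny rules first, then allow". The field names `deniedPrincipals`, `exceptionPrincipals`, and `deniedPermissions` follow the shape of a deny rule, but the principal and permission strings here are simplified placeholders rather than the exact identifier formats real deny policies use.

```python
def access_decision(
    caller_ids: set[str],
    permission: str,
    deny_rules: list[dict],
    allowed_permissions: set[str],
) -> str:
    """Evaluate deny rules before allow grants; a matching deny always wins."""
    for rule in deny_rules:
        matches_principal = bool(caller_ids & set(rule["deniedPrincipals"]))
        is_exception = bool(caller_ids & set(rule.get("exceptionPrincipals", [])))
        if permission in rule["deniedPermissions"] and matches_principal and not is_exception:
            return "DENY"
    return "ALLOW" if permission in allowed_permissions else "NOT_GRANTED"


# Hypothetical guardrail: nobody deletes buckets except the break-glass group.
deny_rules = [
    {
        "deniedPrincipals": ["group:all-staff@example.com"],
        "exceptionPrincipals": ["group:break-glass@example.com"],
        "deniedPermissions": ["storage.buckets.delete"],
    }
]

owner_like = {"storage.buckets.delete", "storage.buckets.get"}
alice = {"user:alice@example.com", "group:all-staff@example.com"}
bob = alice | {"group:break-glass@example.com"}
```

Here `access_decision(alice, "storage.buckets.delete", deny_rules, owner_like)` yields `"DENY"` even though the allow side grants the permission, while the same call for `bob` yields `"ALLOW"` because of the exception.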
5. Workload Identity Federation: Multi-Cloud Identity
Workload Identity Federation is the modern way to connect AWS, Azure, or GitHub Actions to GCP without service account keys.
- The OIDC Bridge: GCP trusts the external provider (e.g., AWS STS or GitHub OIDC).
- Attribute Mapping: You map external attributes (like aws_role_arn or github_repo) to GCP principal identifiers.
- Security: This eliminates the “Key Rotation” problem. The external workload receives a short-lived GCP token automatically.
AWS to GCP Example
- Pool: Create a Workload Identity Pool for AWS.
- Provider: Configure the AWS account ID as a trusted provider.
- Binding: Allow an AWS IAM Role to impersonate a GCP Service Account.
- CLI: gcloud auth login --cred-file=aws-credentials.json
6. Advanced Troubleshooting
Effective IAM troubleshooting requires understanding both the current state and potential impacts of changes. GCP provides powerful tools to help with both reactive troubleshooting and proactive validation.
5.1 The Policy Troubleshooter
The Policy Troubleshooter is the primary tool for diagnosing access issues. It analyzes the entire policy hierarchy to determine why access was granted or denied.
How Policy Troubleshooter Works
- Input Requirements:
  - Principal: The user or service account experiencing the issue
  - Permission: The specific permission being checked (e.g., compute.instances.start)
  - Resource: The specific resource where access is needed
- Analysis Process:
  - Examines all IAM policies in the resource hierarchy
  - Evaluates direct bindings and conditional bindings
  - Identifies which policies grant or deny the requested permission
  - Provides detailed explanation of the access decision
- Output Information:
  - Whether access is granted or denied
  - Which policies contributed to the decision
  - Path through the resource hierarchy
  - Any conditions that affected the outcome
Using Policy Troubleshooter Effectively
Command Line Interface
Console Interface
- Navigate to IAM section in Google Cloud Console
- Select “Policy Troubleshooter” from the left navigation
- Enter the principal, permission, and resource details
- Review the detailed analysis and recommendations
Common Troubleshooting Scenarios
Scenario 1: Unexpected Access Granted
Problem: A user has access to a resource they shouldn’t have access to. Solution: Use Policy Troubleshooter to identify where the permission was granted in the hierarchy.
Scenario 2: Expected Access Denied
Problem: A user cannot access a resource they should have access to. Solution: Use Policy Troubleshooter to identify missing permissions or conflicting policies.
Scenario 3: Conditional Access Issues
Problem: Access works sometimes but not consistently. Solution: Use Policy Troubleshooter to examine conditional policies and their evaluation.
5.2 The IAM Policy Simulator
The IAM Policy Simulator allows you to test policy changes before implementing them, preventing unintended access modifications.
Key Features of Policy Simulator
- Predictive Analysis: Shows how proposed policies would affect existing access
- Historical Replay: Analyzes past access attempts against proposed policies
- Impact Assessment: Identifies which users/services would be affected by changes
- Safety Net: Prevents disruptive policy changes
How Policy Simulator Works
- Proposed Policy: Define the IAM policy changes you want to make
- Historical Data: Simulator replays the last 90 days of access attempts
- Comparison: Shows which access attempts would succeed or fail under the new policy
- Reporting: Generates detailed reports of the policy change impact
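The replay idea can be sketched in Python as follows. This is a toy model of the concept, not the Simulator's actual algorithm: it takes a log of past access attempts and reports those that succeed under the current policy but would fail under the proposed one. The principals and permission sets are hypothetical.

```python
from typing import NamedTuple


class AccessAttempt(NamedTuple):
    principal: str
    permission: str


def simulate(
    log: list[AccessAttempt],
    current: dict[str, set[str]],
    proposed: dict[str, set[str]],
) -> list[AccessAttempt]:
    """Return attempts that are allowed today but would be blocked after the change."""
    newly_blocked = []
    for attempt in log:
        ok_now = attempt.permission in current.get(attempt.principal, set())
        ok_later = attempt.permission in proposed.get(attempt.principal, set())
        if ok_now and not ok_later:
            newly_blocked.append(attempt)
    return newly_blocked


# Hypothetical data: a CI service account loses its broad access.
ci = "serviceAccount:ci@example-project.iam.gserviceaccount.com"
current_policy = {ci: {"storage.objects.get", "storage.objects.create"}}
proposed_policy = {ci: {"storage.objects.get"}}
access_log = [
    AccessAttempt(ci, "storage.objects.get"),
    AccessAttempt(ci, "storage.objects.create"),
]
```

Running `simulate(access_log, current_policy, proposed_policy)` flags only the `storage.objects.create` attempt, which is the kind of signal that tells you a pipeline would break before you apply the change.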
Practical Use Case: Removing Editor Role
Situation: You need to remove roles/editor from a developer group but are concerned about breaking CI/CD pipelines.
Solution:
- Prepare the new policy without the editor role
- Run the policy through the simulator
- Analyze the results to identify any blocked API calls
- Adjust the policy or prepare for the impact before implementation
Simulator Limitations
- Only analyzes the last 90 days of access data
- May not catch all edge cases or new access patterns
- Requires sufficient historical data to be meaningful
- Does not account for future access needs
5.3 IAM Recommender: The SRE’s Intelligence
The IAM Recommender is an automated tool that uses machine learning to enforce the Principle of Least Privilege at scale.
How it Works
- Observation: Google monitors the actual permissions used by each principal over the last 90 days.
- Comparison: It compares the granted roles with the used permissions.
- Recommendation: If a user has roles/editor but only ever reads from GCS buckets, the Recommender suggests downgrading them to roles/storage.objectViewer.
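The compare-and-recommend logic can be approximated with a simple rule: pick the smallest role that still covers every permission actually used. The real Recommender relies on machine learning and Google's full role catalog, so the function and the toy catalog below are stand-ins for illustration only.

```python
from typing import Optional

# Toy catalog: NOT the real contents of these roles.
CATALOG = {
    "roles/editor": {
        "storage.objects.get",
        "storage.objects.list",
        "storage.objects.create",
        "compute.instances.start",
    },
    "roles/storage.objectViewer": {"storage.objects.get", "storage.objects.list"},
}


def recommend_role(
    granted_role: str, used_permissions: set[str], catalog: dict[str, set[str]]
) -> Optional[str]:
    """Suggest the smallest role covering `used_permissions`, if narrower than the current one."""
    candidates = [role for role, perms in catalog.items() if used_permissions <= perms]
    best = min(candidates, key=lambda role: len(catalog[role]), default=None)
    if best is not None and len(catalog[best]) < len(catalog[granted_role]):
        return best
    return None  # nothing narrower covers the observed usage
```

A principal holding `roles/editor` that only used `storage.objects.get` would be pointed at `roles/storage.objectViewer`, while one that also started VMs would get no recommendation from this toy catalog.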
The Policy Analyzer
For complex scenarios, the Policy Analyzer helps you answer “Who can access this resource and how?” by performing a recursive expansion of all groups, service accounts, and conditional bindings across the hierarchy.
9. Organization Policies: The Guardrails of the Cloud
While IAM defines Identity-based access, Organization Policies provide Resource-based constraints. They are the “Constitutional Law” of your GCP environment.
9.1 IAM vs. Org Policy
- IAM: “Can user Alice start a VM?” (Identity-centric)
- Org Policy: “Can anyone create a VM with a Public IP in this folder?” (Resource-centric)
9.2 Critical Organization Constraints
A Principal Engineer should implement these foundational constraints to prevent security drift:
| Constraint | Effect | Why it matters |
|---|---|---|
| iam.disableServiceAccountKeyCreation | Blocks JSON key generation. | Forces usage of Workload Identity. |
| compute.vmExternalIpAccess | Restricts which VMs may have Public IPs (none, if set to deny all). | Prevents accidental internet exposure. |
| gcp.resourceLocations | Restricts resource creation to specific regions. | Ensures data sovereignty and compliance. |
| iam.allowedPolicyMemberDomains | Restricts IAM to only your corporate domain. | Prevents sharing data with personal Gmail accounts. |
9.3 Enforcement and Inheritance
- Dry Run Mode: You can test an Org Policy in “Dry Run” mode to see what would be blocked without actually stopping developers.
- Hierarchical Overrides: Policies set at the Org level apply to everyone, but you can “exempt” specific folders or projects if necessary (use with caution).
6. Deep Dive: Custom Roles and Advanced Permission Management
While predefined roles are suitable for most use cases, production environments often require custom roles for specific compliance, security, or operational requirements. Understanding custom roles deeply is essential for advanced IAM management.
6.1 Understanding Permissions at the API Level
GCP permissions correspond closely to API methods, providing granular control over cloud resources.
Permission Structure
Permissions take the form service.resource.verb.
Examples:
- compute.instances.create - Create Compute Engine instances
- storage.buckets.delete - Delete Cloud Storage buckets
- pubsub.topics.publish - Publish messages to Pub/Sub topics
- bigquery.datasets.update - Update BigQuery datasets
Permission Categories
- Read Permissions: get, list, getIamPolicy
- Write Permissions: create, update, patch
- Delete Permissions: delete
- Special Permissions: setIamPolicy, testIamPermissions
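A small Python sketch can split a permission string into its parts and place the verb in one of the categories above. The helper name and the fallback category "other" (for verbs such as `publish` or `start` that the list does not cover) are assumptions for illustration.

```python
CATEGORIES = {
    "read": {"get", "list", "getIamPolicy"},
    "write": {"create", "update", "patch"},
    "delete": {"delete"},
    "special": {"setIamPolicy", "testIamPermissions"},
}


def parse_permission(permission: str) -> dict[str, str]:
    """Split 'service.resource.verb' and classify the verb."""
    parts = permission.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected service.resource.verb, got {permission!r}")
    service, resource, verb = parts
    category = next(
        (name for name, verbs in CATEGORIES.items() if verb in verbs), "other"
    )
    return {
        "service": service,
        "resource": resource,
        "verb": verb,
        "category": category,
    }
```

A classification like this is handy when reviewing a custom role: anything landing in "special" or "delete" deserves a closer look before it is granted.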
6.2 Custom Role Creation and Management
Creating custom roles requires careful planning and ongoing maintenance to ensure they remain effective and secure.
Custom Role Components
1. Role Metadata
- Title: Human-readable name for the role
- Description: Explanation of the role’s purpose
- Stage: Development stage (ALPHA, BETA, GA, DEPRECATED)
2. Permission Selection
- includedPermissions: List of permissions granted by the role
- Must be valid GCP permissions
- Should follow least-privilege principle
3. Role Limits
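- Up to 3,000 permissions per custom role, subject to a cap on the combined size of the title, description, and permission names
- Up to 300 custom roles per project and 300 per organization (verify against the current IAM quotas)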
- Consider using predefined roles when possible
Custom Role YAML Template with Advanced Options
6.3 Advanced Custom Role Patterns
Pattern 1: Operational Roles
Roles designed for specific operational tasks without data access:
Pattern 2: Deployment Roles
Roles for CI/CD systems with limited scope:
Pattern 3: Auditing Roles
Roles for compliance and auditing without operational capabilities:
6.4 Custom Role Maintenance and Evolution
1. Permission Drift Management
- Challenge: New GCP features introduce new permissions
- Solution: Regular review of custom roles against service updates
- Best Practice: Subscribe to GCP release notes and feature announcements
2. Version Control for Custom Roles
- Store custom role definitions in version control
- Track changes and approvals for role modifications
- Maintain backup versions for rollback capability
- Document the business justification for each permission
3. Automated Role Validation
- Implement CI/CD pipelines for custom role management
- Test role permissions in non-production environments
- Validate role effectiveness before production deployment
- Monitor for unused or overly broad permissions
6.5 Custom Role Security Considerations
1. Privilege Escalation Prevention
- Review permissions for potential escalation paths
- Avoid granting both read and write permissions when only one is needed
- Consider the combination of permissions and their potential misuse
- Regular security reviews of custom role assignments
2. Principle of Least Privilege
- Grant only the minimum permissions required for the task
- Regularly audit and remove unused permissions
- Use predefined roles when possible instead of custom roles
- Monitor access patterns and adjust permissions accordingly
3. Separation of Duties
- Create separate roles for different functional responsibilities
- Prevent any single role from having excessive authority
- Implement checks and balances through role separation
- Document role responsibilities and access patterns
7. Workload Identity: Securing Service-to-Service Communication
Workload Identity is GCP’s recommended approach for authenticating workloads running on Kubernetes Engine (GKE) or other platforms to Google Cloud services. It eliminates the need for service account key files, addressing a significant security concern.
7.1 Understanding Workload Identity
The Traditional Problem
- Applications in containers needed service account key files to authenticate to GCP services
- Key files were stored in containers, creating security vulnerabilities
- Key rotation was manual and often neglected
- Compromised containers could expose key files and provide unauthorized access
Workload Identity Solution
- Establishes trust relationship between Kubernetes service accounts and GCP service accounts
- Uses Kubernetes native authentication mechanisms
- Eliminates need for service account key files
- Provides automatic credential exchange and refresh
7.2 Workload Identity Architecture
Components
- Kubernetes Service Account (KSA): Identity for pods within a GKE cluster
- GCP Service Account (GSA): Identity for accessing GCP resources
- Workload Identity Pool: Container for the trust relationship
- Workload Identity Provider: Authenticates Kubernetes workloads
Trust Flow
- Pod authenticates to GKE using Kubernetes service account
- GKE validates the workload against the trust relationship
- Temporary credentials are exchanged for the GCP service account
- Application accesses GCP resources using these credentials
7.3 Workload Identity Configuration
Step 1: Enable Workload Identity on the Cluster
Step 2: Create or Identify GCP Service Account
Step 3: Bind GCP Service Account to Kubernetes Service Account
Step 4: Annotate Kubernetes Service Account
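The binding and annotation in Steps 3 and 4 come down to two strings, which the Python sketch below constructs. The project, namespace, and account names are placeholders, and the helper functions are hypothetical. The member format, the `roles/iam.workloadIdentityUser` role, and the `iam.gke.io/gcp-service-account` annotation key are the ones GKE Workload Identity uses, as far as I am aware.

```python
WORKLOAD_IDENTITY_ROLE = "roles/iam.workloadIdentityUser"


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member string identifying a Kubernetes service account in the cluster's pool."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_annotation(gsa_name: str, project_id: str) -> dict[str, str]:
    """Annotation placed on the Kubernetes service account, pointing at the GSA."""
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {"iam.gke.io/gcp-service-account": gsa_email}


# Step 3: binding added to the GSA's own allow policy.
binding = {
    "role": WORKLOAD_IDENTITY_ROLE,
    "members": [workload_identity_member("example-project", "payments", "payments-ksa")],
}

# Step 4: annotation added to the KSA's metadata.
annotation = ksa_annotation("payments-gsa", "example-project")
```

The link is two-sided on purpose: the binding lets the Kubernetes identity impersonate the GSA, and the annotation tells GKE which GSA the pod should obtain credentials for.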
7.4 Workload Identity Federation (Beyond GKE)
Workload Identity Federation extends the concept beyond GKE to external identity providers.
Supported Providers
- GitHub Actions
- AWS
- Azure
- SAML 2.0 identity providers
- OIDC providers
Federation Configuration
- Create a workload identity pool
- Create a workload identity provider
- Configure trust relationship with external provider
- Create or identify target GCP service account
- Grant necessary IAM permissions to the service account
7.5 Best Practices for Workload Identity
1. Namespace and Service Account Design
- Use dedicated namespaces for different applications/environments
- Follow consistent naming conventions for service accounts
- Implement least-privilege access for each workload
- Separate production and non-production workloads
2. Monitoring and Auditing
- Enable audit logging for workload identity exchanges
- Monitor for unusual authentication patterns
- Alert on failed authentication attempts
- Track credential usage and rotation
3. Security Considerations
- Regularly review and update trust relationships
- Monitor for unauthorized workload identity usage
- Implement proper namespace isolation
- Secure Kubernetes cluster configuration
8. IAM Design Patterns and Anti-Patterns
Effective IAM implementation follows proven design patterns while avoiding common anti-patterns that lead to security vulnerabilities and management complexity.
8.1 IAM Design Patterns
Pattern 1: Role-Based Access Control (RBAC) with Groups
Problem: Managing permissions for individual users becomes unwieldy. Solution: Create groups based on job functions and assign roles to groups. Implementation:
- Create groups: [email protected], [email protected], [email protected]
- Assign roles to groups rather than individual users
- Add/remove users from groups as their roles change
Pattern 2: Environment-Based Access Control
Problem: Same users need different access levels across dev/staging/prod. Solution: Use folders to organize environments and apply different policies. Implementation:
Pattern 3: Resource Tagging for Access Control
Problem: Need to control access based on resource characteristics. Solution: Use resource tags combined with IAM conditions (conditions evaluate tags, not labels). Implementation:
- Tag resources: environment=prod, team=payment, confidentiality=high
- Use conditions to enforce access based on tags
Pattern 4: Segmented Administration
Problem: Preventing any single administrator from having complete control. Solution: Split administrative responsibilities across multiple roles. Implementation:
- Network administrator: roles/compute.networkAdmin
- Security administrator: roles/iam.securityAdmin
- Billing administrator: roles/billing.admin
8.2 IAM Anti-Patterns to Avoid
Anti-Pattern 1: Overuse of Primitive Roles
Problem: Using Owner/Editor/Viewer roles in production. Risk: Excessive permissions violate least-privilege principle. Solution: Use predefined roles or create custom roles with minimal permissions.
Anti-Pattern 2: Direct User-to-Resource Binding
Problem: Granting roles directly to individual users instead of groups. Risk: Difficult to manage and maintain as team grows. Solution: Always use groups for role assignments.
Anti-Pattern 3: Overlapping Administrative Boundaries
Problem: Same person has both development and security administration. Risk: Potential for security bypass or conflict of interest. Solution: Separate duties and implement checks and balances.
Anti-Pattern 4: Inadequate Key Management
Problem: Poor management of service account keys. Risk: Compromised keys provide persistent access to resources. Solution: Minimize key use, implement rotation, use Workload Identity.
8.3 Common Implementation Scenarios
Scenario 1: Multi-Team Development Environment
Requirements:
- Multiple development teams working on different projects
- Shared services (CI/CD, monitoring, security)
- Isolated development environments
- Controlled production access
Solution:
- Organize projects by team/product
- Create team-specific groups with appropriate access
- Implement shared service accounts for common infrastructure
- Use folders to group related projects
- Implement production access controls and approvals
Scenario 2: Regulatory Compliance Environment
Requirements:
- Segregation of duties
- Detailed audit trails
- Restricted data access
- Regular access reviews
Solution:
- Implement role separation for different functions
- Use audit logging for all access attempts
- Implement data classification and access controls
- Schedule regular access certification reviews
- Use automated tools for compliance monitoring
Lab: Comprehensive IAM Implementation Exercise
This lab will walk you through implementing a complete IAM strategy for a fictional company with multiple teams and environments.
Scenario: Acme Corp GCP Implementation
Acme Corp is migrating their applications to GCP with the following requirements:
- 3 development teams (Frontend, Backend, Data)
- 3 environments (Development, Staging, Production)
- Security team that monitors all access
- Separate billing for each team/environment
- Need to implement least-privilege access
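Before creating anything, it can help to enumerate what these requirements imply. The Python sketch below generates one project, one group, and one role per team and environment combination. The naming scheme, the `example.com` domain, and the role chosen for each environment are hypothetical choices for illustration, not part of the scenario.

```python
from itertools import product

TEAMS = ["frontend", "backend", "data"]
ENVIRONMENTS = ["dev", "staging", "prod"]

# Hypothetical role choice: broader in dev, read-only in staging and prod.
ROLE_BY_ENV = {
    "dev": "roles/compute.instanceAdmin.v1",
    "staging": "roles/compute.viewer",
    "prod": "roles/compute.viewer",
}


def access_plan(domain: str = "example.com") -> list[dict[str, str]]:
    """One project, group, and role per team/environment pair."""
    plan = []
    for team, env in product(TEAMS, ENVIRONMENTS):
        plan.append(
            {
                "project": f"acme-{team}-{env}",
                "group": f"group:gcp-{team}-{env}@{domain}",
                "role": ROLE_BY_ENV[env],
            }
        )
    return plan
```

Three teams across three environments give nine entries. Keeping one project per pair also satisfies the separate-billing requirement, since costs can be attributed per project.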
Step 1: Organize the Resource Hierarchy
First, create the folder structure to organize resources:
Step 2: Create Groups for Teams
Create Google Groups for each team and environment:
Step 3: Create Service Accounts for Applications
Create service accounts for each application:
Step 4: Implement Least-Privilege Access
Assign roles based on the principle of least privilege:
Step 5: Configure Service Account Access
Set up service account access for applications:
Step 6: Implement Security Monitoring
Configure security team access for monitoring:
Step 7: Verify the Implementation
Test the implementation to ensure it works as expected:
Step 8: Set Up Monitoring and Alerting
Enable audit logging and set up monitoring:
SRE Best Practices Checklist
Security Posture
- Audit Logs: Enable “Data Access” logs for critical services to see exactly who accessed your data
- IAM Recommender: Check this weekly. Google will suggest removing unused permissions based on actual usage over the last 90 days
- Groups over Users: Never grant a role directly to an email address. Use Google Groups
- Service Account Keys: Hunt them down and delete them. Use Workload Identity instead
- Conditional Access: Implement time-based and IP-based conditions for sensitive access
- Separation of Duties: Ensure no single user has complete control over critical systems
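As an illustration of the time-based conditions mentioned above, here is a minimal sketch that builds a conditional role binding as a plain dictionary. The role, group address, and expiry date are hypothetical. The `condition` fields (`title`, `description`, `expression`) and the `request.time < timestamp(...)` expression follow the IAM Conditions format, and a policy that contains conditional bindings needs policy version 3.

```python
from datetime import datetime, timezone


def time_bound_binding(role, member, expires_at):
    """Build an IAM binding dict whose grant stops applying at expires_at."""
    stamp = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": "temporary-access",
            "description": f"Expires at {stamp}",
            # CEL expression evaluated by IAM on every request.
            "expression": f'request.time < timestamp("{stamp}")',
        },
    }


# Hypothetical values for illustration only.
expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
binding = time_bound_binding(
    "roles/compute.viewer", "group:oncall@acme.example", expiry
)
print(binding["condition"]["expression"])
```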
Operational Excellence
- Policy Simulations: Test policy changes before applying them using the IAM Policy Simulator
- Resource Labels: Use consistent labeling to enable conditional access based on resource attributes
- Regular Reviews: Conduct quarterly access reviews to remove unnecessary permissions
- Documentation: Maintain clear documentation of roles, responsibilities, and access patterns
- Training: Ensure team members understand IAM concepts and best practices
Performance and Efficiency
- Role Optimization: Use predefined roles when possible; create custom roles only when necessary
- Hierarchical Design: Organize resources to minimize the number of policies needed
- Condition Complexity: Balance security needs with performance impact of complex conditions
- Monitoring Overhead: Configure monitoring appropriately without excessive noise
Interview Preparation
Q1: What is the Principle of Least Privilege (PoLP) and how do you implement it in GCP IAM?
Answer: PoLP is the security concept of giving a user or service only the minimum permissions required to perform its job.
Implementation in GCP:
- Custom Roles: Instead of roles/editor, create a custom role with only the specific permissions needed (e.g., compute.instances.start).
- Granular Scopes: Use predefined roles that are service-specific (e.g., roles/storage.objectViewer instead of roles/storage.admin).
- IAM Recommender: Use this tool to identify over-privileged accounts based on actual usage, then apply its recommendations to downgrade them.
- IAM Conditions: Grant permissions that are limited by time, resource name, or request context (like IP address).
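The idea behind usage-based right-sizing can be sketched in a few lines. This is not how IAM Recommender is implemented. It is a simplified model that compares granted permissions with the ones actually used and picks the smallest candidate role that still covers the usage. The permission lists attached to each role below are illustrative subsets, not the full role definitions.

```python
def unused_permissions(granted, used):
    """Permissions that were granted but never exercised."""
    return sorted(set(granted) - set(used))


def suggest_minimal_role(used, candidate_roles):
    """Return the smallest candidate role that covers every used permission.

    candidate_roles maps a role name to the permissions it contains.
    Returns None when no candidate covers the usage.
    """
    needed = set(used)
    covering = [
        (len(perms), name)
        for name, perms in candidate_roles.items()
        if needed <= set(perms)
    ]
    return min(covering)[1] if covering else None


# Illustrative subsets only; real roles contain more permissions.
CANDIDATES = {
    "roles/storage.admin": [
        "storage.buckets.delete",
        "storage.objects.create",
        "storage.objects.delete",
        "storage.objects.get",
        "storage.objects.list",
    ],
    "roles/storage.objectViewer": [
        "storage.objects.get",
        "storage.objects.list",
    ],
}

granted = CANDIDATES["roles/storage.admin"]
used = ["storage.objects.get", "storage.objects.list"]

print(unused_permissions(granted, used))
print(suggest_minimal_role(used, CANDIDATES))
```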
Q2: Explain the difference between a Google Service Account (GSA) and a Kubernetes Service Account (KSA). How does Workload Identity bridge them?
Answer:
- GSA: An IAM identity in GCP used by services (VMs, Cloud Run) to access other GCP resources. It uses JSON keys or the metadata server.
- KSA: An identity inside a Kubernetes cluster used by pods to talk to the K8s API. It has no inherent permissions in GCP.
- Workload Identity: It maps a KSA to a GSA. When a pod uses that KSA, the GKE metadata server returns a short-lived token for the linked GSA, allowing the pod to access GCP resources (like Cloud Storage) without an exported JSON key that could leak.
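The KSA-to-GSA mapping has two halves: an IAM binding on the GSA and an annotation on the KSA. A minimal sketch of both is shown here as plain dictionaries. The project ID, namespace, and account names are hypothetical. The member format, the roles/iam.workloadIdentityUser role, and the iam.gke.io/gcp-service-account annotation key are the ones GKE Workload Identity uses.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member string that identifies a KSA in the project's workload pool."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def gsa_binding(project_id, namespace, ksa_name):
    """Binding to add to the GSA's IAM policy so the KSA may impersonate it."""
    return {
        "role": "roles/iam.workloadIdentityUser",
        "members": [workload_identity_member(project_id, namespace, ksa_name)],
    }


def ksa_annotation(gsa_email):
    """Annotation to place on the KSA so GKE knows which GSA to use."""
    return {"iam.gke.io/gcp-service-account": gsa_email}


# Hypothetical names for illustration only.
PROJECT = "acme-backend-prod"
GSA_EMAIL = f"backend-prod-app@{PROJECT}.iam.gserviceaccount.com"

print(gsa_binding(PROJECT, "default", "backend-ksa"))
print(ksa_annotation(GSA_EMAIL))
```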
Q3: A user gets an 'Access Denied' error when trying to start a VM. How do you troubleshoot this?
Answer: I follow a systematic process:
- Policy Troubleshooter: Input the user’s email, the permission (compute.instances.start), and the resource URL. This tool checks the entire hierarchy (Org, Folder, Project) to show which policies grant or block access.
- Check Inheritance: Confirm at which level (Org, Folder, Project) the role is actually bound. Allow policies are additive, so a lower-level policy cannot take away a grant made higher up, and a grant missing at every level means the permission is simply absent.
- Verify Identity: Ensure the user is authenticated with the correct account (especially if using multiple accounts in the same browser).
- Cloud Audit Logs: Check the logs to see the raw denied event, which often includes details about the specific resource or condition that failed.
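To make the audit log step concrete, here is a minimal sketch that scans already-exported log entries for failed permission checks by one principal. The sample entry is invented for illustration. The field names (protoPayload, authenticationInfo.principalEmail, status.code, authorizationInfo with permission, resource, and granted) follow the Cloud Audit Logs AuditLog structure, where status code 7 means PERMISSION_DENIED.

```python
PERMISSION_DENIED = 7  # google.rpc.Code value for PERMISSION_DENIED


def denied_checks(entries, principal):
    """Return (permission, resource) pairs that were denied for the principal."""
    results = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        email = payload.get("authenticationInfo", {}).get("principalEmail")
        if email != principal:
            continue
        if payload.get("status", {}).get("code") != PERMISSION_DENIED:
            continue
        for info in payload.get("authorizationInfo", []):
            # "granted" is typically omitted from the entry when it is false.
            if not info.get("granted", False):
                results.append((info.get("permission"), info.get("resource")))
    return results


# Invented sample entry, shaped like an exported audit log record.
SAMPLE_ENTRIES = [
    {
        "protoPayload": {
            "authenticationInfo": {"principalEmail": "dev@acme.example"},
            "status": {"code": 7, "message": "PERMISSION_DENIED"},
            "authorizationInfo": [
                {
                    "permission": "compute.instances.start",
                    "resource": "projects/acme-backend-prod/zones/us-central1-a/instances/vm-1",
                }
            ],
        }
    }
]

print(denied_checks(SAMPLE_ENTRIES, "dev@acme.example"))
```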
Q4: Why are Service Account Keys considered a security risk, and what are the alternatives?
Answer: JSON keys are long-lived (by default they do not expire) and portable. If pushed to GitHub or stolen from a dev machine, they give an attacker access that lasts until the key is manually revoked.
Alternatives:
- Workload Identity Federation: For GitHub Actions, GitLab, or AWS, exchange the external provider's credentials (such as OIDC tokens) for temporary GCP tokens.
- Instance Service Accounts: For VMs, use the attached service account and the metadata server.
- Short-lived Tokens: Use service account impersonation for temporary sessions, for example gcloud auth print-access-token --impersonate-service-account=SA_EMAIL, which relies on the IAM Credentials API generateAccessToken method.
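While keys are being phased out, it helps to know which ones are old. Below is a minimal sketch that flags user-managed keys older than a threshold. It assumes key metadata has already been exported as JSON (for instance from a keys list call). The key names and dates are invented, and the 90-day threshold is an arbitrary choice for illustration. The keyType and validAfterTime fields follow the service account key resource.

```python
from datetime import datetime, timezone


def stale_user_managed_keys(keys, now, max_age_days=90):
    """Return names of user-managed keys created more than max_age_days ago."""
    stale = []
    for key in keys:
        # System-managed keys are rotated by Google and are not exportable.
        if key.get("keyType") != "USER_MANAGED":
            continue
        created = datetime.strptime(
            key["validAfterTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if (now - created).days > max_age_days:
            stale.append(key["name"])
    return stale


# Invented sample metadata for illustration only.
SAMPLE_KEYS = [
    {"name": "keys/old-user-key", "keyType": "USER_MANAGED",
     "validAfterTime": "2023-01-01T00:00:00Z"},
    {"name": "keys/new-user-key", "keyType": "USER_MANAGED",
     "validAfterTime": "2023-12-15T00:00:00Z"},
    {"name": "keys/system-key", "keyType": "SYSTEM_MANAGED",
     "validAfterTime": "2022-01-01T00:00:00Z"},
]

NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(stale_user_managed_keys(SAMPLE_KEYS, NOW))
```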
Q5: What is the 'Additive' nature of IAM permissions in GCP? Can you deny a permission inherited from a folder?
Answer: Allow policies in GCP IAM are additive. If a permission is granted at a higher level (e.g., Folder), it also applies at all lower levels (e.g., Project), and an allow policy lower in the hierarchy cannot take it away.
Workaround: To achieve a “deny” effect, use IAM deny policies, which explicitly block supported permissions for chosen principals and take precedence over allow policies. IAM Conditions and VPC Service Controls can narrow access further: Conditions restrict a grant based on attributes, and VPC SC can block access to services entirely regardless of IAM permissions.
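The additive behaviour of allow policies can be sketched as a union over a resource's ancestry. This is a simplified model, not the real evaluation engine. It ignores groups and conditions, and the resource names and bindings are hypothetical.

```python
def effective_roles(principal, ancestry, policies):
    """Union of roles bound to the principal on the resource and its ancestors.

    ancestry lists resource names from the organization down to the project.
    policies maps a resource name to its list of allow bindings.
    """
    roles = set()
    for node in ancestry:
        for binding in policies.get(node, []):
            if principal in binding["members"]:
                roles.add(binding["role"])
    return roles


# Hypothetical hierarchy and bindings for illustration only.
ANCESTRY = ["organizations/123", "folders/456", "projects/acme-backend-prod"]
POLICIES = {
    "folders/456": [
        {"role": "roles/compute.viewer", "members": ["user:dev@acme.example"]},
    ],
    "projects/acme-backend-prod": [
        {"role": "roles/storage.objectViewer", "members": ["user:dev@acme.example"]},
    ],
}

print(sorted(effective_roles("user:dev@acme.example", ANCESTRY, POLICIES)))
```

Nothing in the project-level bindings can shrink the result: the folder grant is always part of the union, which is why removing access means editing the policy where the grant was made.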