Enterprise IAM: Least Privilege and Workload Identity

Jan 28, 2025 Metasphere Engineering 8 min read

Mid-afternoon. A developer hits a permission error deploying a new Lambda function. The deadline is tomorrow. She adds "Effect": "Allow", "Action": "*", "Resource": "*" to the IAM policy because that is the fastest thing that makes the error disappear. The feature ships. The policy stays in place for the next two years because nobody remembers it exists and no automated process flags it. This pattern shows up at nearly every organization.

Now multiply that across a 50-person engineering organization over three years. You end up with an IAM posture that is mostly historical accident. Roles with permissions nobody uses. Service account keys that haven’t been rotated since 2022. Cross-account trust relationships created for a migration project that ended 18 months ago. And somewhere in that tangle, a privilege escalation path that lets an attacker who compromises a single developer workstation reach full admin access in production. It’s always there. The only question is whether you find it first.

The Verizon Data Breach Investigations Report consistently identifies credential misuse as a leading attack vector. In cloud environments, that almost always means IAM. Attackers who compromise a workload with a generous IAM role pivot across services, exfiltrate data from S3 buckets, and establish persistence through new IAM users. All of it uses legitimate cloud APIs, which makes detection dramatically harder than stopping traditional network intrusions.

The Wildcard Problem

IAM policies with "Resource": "*" are the most common misconfiguration in cloud environments. They show up because they’re convenient: a wildcard means you never update the policy when you add new resources. They persist because right-sizing feels lower priority than building features. That’s the wrong tradeoff, but it’s a trap that catches every team eventually.

The risk is concrete. A service with s3:GetObject on "Resource": "*" can read every bucket in the account, including ones it has no business accessing. A role with iam:PassRole on "Resource": "*" can assign any role in the account to any service it creates. When that service gets compromised (not if, when), the blast radius is the entire account.
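Finding these statements is easy to automate. The following is a minimal sketch, not a complete linter: it assumes the policy is already loaded as a dictionary, and the function names and the example bucket name are hypothetical. It flags only a bare "*" in Action or Resource on Allow statements, so partial wildcards such as s3:* would need extra handling.

```python
import json


def as_list(value):
    """IAM accepts either a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy):
    """Return (statement index, reason) pairs for Allow statements using a bare "*"."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in as_list(stmt.get("Action", [])):
            findings.append((index, "Action is *"))
        if "*" in as_list(stmt.get("Resource", [])):
            findings.append((index, "Resource is *"))
    return findings


# Hypothetical policy: the first statement can read every bucket in the
# account, the second is scoped to a single bucket.
EXAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
    {"Effect": "Allow", "Action": "s3:PutObject",
     "Resource": "arn:aws:s3:::example-reports-bucket/*"}
  ]
}
""")

if __name__ == "__main__":
    for index, reason in find_wildcard_statements(EXAMPLE_POLICY):
        print(f"statement {index}: {reason}")
```

Only the first statement is reported. The second one ends in /* but names a specific bucket, which is the kind of scoping the wildcard version skips.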

AWS IAM Access Analyzer surfaces actual permission usage: which specific API calls a role has made over the past 90 days, based on the activity recorded in CloudTrail. That data is the evidence base for right-sizing. You’re not guessing what permissions to remove. You’re removing what was never used. GCP Policy Analyzer and Azure Advisor provide related capabilities for their respective clouds. Teams that run this analysis monthly and trim unused permissions typically reduce their IAM surface by 40-60% within two quarters without breaking a single workload.

One critical detail most guides skip: add a small buffer when right-sizing. If a role used 12 permissions in the past 90 days, scope to those 12 plus 2-3 closely related permissions needed for common operational scenarios (like read access alongside the write access already exercised). Scoping too aggressively causes operational failures that scare teams away from right-sizing entirely. That’s worse than the original problem.
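The buffer rule reduces to set arithmetic. Here is a rough sketch, assuming you already have the granted actions, the actions observed in use, and a reviewed list of buffer actions as plain strings. The function name and the example actions are illustrative only.

```python
def right_size(granted, used, buffer):
    """Split granted actions into those to keep and those to remove.

    The keep set is the observed usage plus the approved buffer, intersected
    with what was already granted so that right-sizing never adds permissions.
    """
    granted, used, buffer = set(granted), set(used), set(buffer)
    keep = (used | buffer) & granted
    remove = granted - keep
    return sorted(keep), sorted(remove)


if __name__ == "__main__":
    keep, remove = right_size(
        granted={"s3:GetObject", "s3:PutObject", "s3:ListBucket",
                 "s3:DeleteObject", "s3:DeleteBucket"},
        used={"s3:PutObject"},
        # Read access alongside the write access already exercised.
        buffer={"s3:GetObject", "s3:ListBucket"},
    )
    print("keep:", keep)
    print("remove:", remove)
```

Intersecting with the granted set is a deliberate choice in this sketch: a buffer entry the role never had is dropped silently rather than granted, so the exercise can only shrink the policy.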

Eliminating Long-Lived Keys

Service account keys are a liability with no natural expiration. They sit in environment variables, get committed to code repositories, get copied between environments, and get shared in Slack DMs. Every copy is a breach vector that persists until someone explicitly revokes it. And in most organizations, nobody tracks where the copies went. Active keys turn up in repos that were “archived” two years ago.

Workload identity federation eliminates the key entirely. This is the single highest-impact IAM improvement most teams can make. An EC2 instance assumes an IAM role through its instance profile, with temporary credentials delivered by the instance metadata service. No key file. A Kubernetes pod uses a projected service account token to authenticate to cloud APIs via OIDC. No key file. A GitHub Actions workflow assumes an AWS IAM role using GitHub’s OIDC provider. No key file. In each case, the credential is short-lived (typically 1 hour), automatically rotated, and scoped to the specific workload. Nothing to steal, nothing to rotate, nothing to accidentally commit.
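For the GitHub Actions case, the federation lives in the role's trust policy. The sketch below builds one as a Python dictionary. It assumes GitHub's OIDC provider has already been registered in the account, and the account ID, organization, repository, and branch are placeholders rather than values from any real setup.

```python
import json

# Hostname of GitHub's OIDC issuer, used both in the provider ARN and as the
# prefix of the condition keys.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, org, repo, branch):
    """Trust policy letting one repo and branch assume the role, with no static key."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # The token must be issued for AWS STS...
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # ...and come from this exact repository and branch.
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


if __name__ == "__main__":
    policy = github_oidc_trust_policy("111122223333", "example-org", "example-repo", "main")
    print(json.dumps(policy, indent=2))
```

The sub condition does the real work here. Without it, any workflow in any repository that can obtain a token from the same issuer could assume the role.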

For the cases where static keys remain necessary (third-party integrations that only support key-based auth, certain legacy systems), enforce a 90-day maximum lifetime with automated rotation. HashiCorp Vault’s dynamic secrets engine issues short-lived credentials on demand and revokes them automatically. The point is removing human discipline from the rotation process. Any control that depends on someone remembering to do something will eventually fail. That’s not cynicism. That’s operational reality.

The migration path is straightforward: inventory all service account keys across accounts (aws iam list-access-keys lists them one IAM user at a time, and the credential report from aws iam generate-credential-report covers every user in an account), identify which workloads each key authenticates, and replace them one by one with identity federation. Most teams find that 70-80% of their keys can be replaced with federation in a single quarter. The remaining 20-30% require vendor coordination or legacy system updates. The security engineering practice covers the migration patterns for each major integration type.
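Once the inventory exists, flagging keys past the 90-day limit is a small filter. This sketch assumes records shaped like the AccessKeyMetadata entries that aws iam list-access-keys returns (UserName, AccessKeyId, Status, CreateDate). The records here are hard-coded and hypothetical; collecting them across users and accounts is left out.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)


def stale_keys(key_metadata, now):
    """Return active keys older than MAX_KEY_AGE, oldest first."""
    stale = [
        key for key in key_metadata
        if key["Status"] == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]
    return sorted(stale, key=lambda key: key["CreateDate"])


# Hypothetical inventory with made-up user names and key IDs.
INVENTORY = [
    {"UserName": "svc-billing", "AccessKeyId": "AKIAEXAMPLEOLD0001",
     "Status": "Active", "CreateDate": datetime(2022, 3, 1, tzinfo=timezone.utc)},
    {"UserName": "svc-reports", "AccessKeyId": "AKIAEXAMPLENEW0002",
     "Status": "Active", "CreateDate": datetime(2025, 1, 10, tzinfo=timezone.utc)},
    {"UserName": "svc-legacy", "AccessKeyId": "AKIAEXAMPLEOFF0003",
     "Status": "Inactive", "CreateDate": datetime(2021, 6, 1, tzinfo=timezone.utc)},
]

if __name__ == "__main__":
    as_of = datetime(2025, 1, 28, tzinfo=timezone.utc)
    for key in stale_keys(INVENTORY, as_of):
        print(key["UserName"], key["AccessKeyId"], key["CreateDate"].date())
```

Passing the current time in as an argument keeps the check deterministic, which makes it easier to run in a scheduled job and to test.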

Privilege Escalation Path Analysis

This is where IAM gets genuinely dangerous. Individual policies often look reasonable in isolation. The danger is in combinations. A role with iam:PassRole can assign any eligible role to an AWS service, creating an indirect path to higher privilege. A role with lambda:CreateFunction plus iam:PassRole pointing at an admin role can deploy a Lambda that executes with full admin access, even though neither permission alone looks dangerous. This is the attack chain that catches security teams who review policies one at a time.

Manual review cannot find these paths reliably at any scale. Don’t try. PMapper constructs a directed graph of all IAM entities and permissions in an account, then enumerates every path from a starting identity to a privileged target. AWS accounts typically have 3-8 non-obvious escalation paths when first analyzed. Some accounts have over 20. Every one of those paths is a breach chain that a determined attacker will find. The question is whether you find them first.
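The graph idea itself fits in a few lines. What follows is a toy illustration, not PMapper's implementation: it assumes the hard part is already done, meaning someone has worked out which principal can act as which other principal and stored that as an adjacency map. The role names are hypothetical.

```python
from collections import deque


def find_escalation_path(edges, start, target):
    """Breadth-first search over "can act as" edges.

    Returns the shortest list of principals from start to target, or None
    when no path exists.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical account: the dev role can deploy Lambdas through a deployer
# role, and the deployer role can pass the admin role to a function.
EDGES = {
    "role/dev": ["role/lambda-deployer"],
    "role/lambda-deployer": ["role/admin"],
    "role/readonly": [],
}

if __name__ == "__main__":
    print(find_escalation_path(EDGES, "role/dev", "role/admin"))
    print(find_escalation_path(EDGES, "role/readonly", "role/admin"))
```

Neither edge looks alarming alone, which is the point of the section: the path only appears once the edges are chained.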

Run PMapper on every significant IAM change, not just quarterly. Build it into your infrastructure-as-code pipeline so that a Terraform plan adding iam:PassRole to a role triggers an escalation path analysis before the change is applied. If a new path is created, the PR gets flagged for security review. This is far cheaper than discovering the path during an incident.
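A cheap first gate in the pipeline is to detect which planned changes touch iam:PassRole at all, and only then run the full path analysis. The sketch below assumes the plan has been exported with terraform show -json and loaded as a dictionary, and that policies appear as JSON strings in the policy attribute of aws_iam_policy or aws_iam_role_policy resources. The function name and the example addresses are hypothetical.

```python
import json

# Compared in lowercase because IAM action names are case-insensitive.
RISKY_ACTIONS = {"iam:passrole", "iam:*", "*"}
POLICY_RESOURCE_TYPES = {"aws_iam_policy", "aws_iam_role_policy"}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def resources_granting_passrole(plan):
    """Return addresses of planned policy resources that allow iam:PassRole."""
    flagged = []
    for change in plan.get("resource_changes", []):
        if change.get("type") not in POLICY_RESOURCE_TYPES:
            continue
        after = (change.get("change") or {}).get("after") or {}
        policy_json = after.get("policy")
        if not policy_json:
            continue  # deleted, or not known until apply
        document = json.loads(policy_json)
        for stmt in _as_list(document.get("Statement", [])):
            actions = {a.lower() for a in _as_list(stmt.get("Action", []))}
            if stmt.get("Effect") == "Allow" and actions & RISKY_ACTIONS:
                flagged.append(change["address"])
                break
    return flagged


# Hypothetical, heavily trimmed plan export.
EXAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_iam_policy.deployer", "type": "aws_iam_policy",
         "change": {"actions": ["update"], "after": {"policy": json.dumps({
             "Statement": [{"Effect": "Allow",
                            "Action": ["lambda:CreateFunction", "iam:PassRole"],
                            "Resource": "*"}]})}}},
        {"address": "aws_iam_policy.reader", "type": "aws_iam_policy",
         "change": {"actions": ["create"], "after": {"policy": json.dumps({
             "Statement": [{"Effect": "Allow", "Action": "s3:GetObject",
                            "Resource": "arn:aws:s3:::example-bucket/*"}]})}}},
    ]
}

if __name__ == "__main__":
    print(resources_granting_passrole(EXAMPLE_PLAN))
```

This is a simplification in two ways: it misses patterns such as iam:Pass*, and it skips policies whose content is only known after apply. Treat a hit as a trigger for the real analysis, not as a verdict.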

Just-in-Time Access for Production

Standing privileged access to production is a risk you accept every second it exists. Every hour a privileged credential stays active is another hour an attacker could use it.

Just-in-time (JIT) access flips the model: engineers request elevated access for a specific task with a defined time window. A manager or automated policy approves. The access is granted for 1-8 hours (depending on the task), fully logged, and automatically revoked when the window expires. Tools like HashiCorp Boundary and Teleport support this workflow, and AWS IAM Identity Center (formerly AWS SSO) can support it when paired with a temporary elevated access management solution.
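The core of the model is a grant that carries its own expiry. The sketch below is a simplified data model, not how any of the tools above implement it: the class, function, and field names are assumptions, and the approval step is reduced to a single function call.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MIN_WINDOW = timedelta(hours=1)
MAX_WINDOW = timedelta(hours=8)


@dataclass(frozen=True)
class AccessGrant:
    engineer: str
    role: str
    reason: str
    granted_at: datetime
    expires_at: datetime

    def is_active(self, now):
        """A grant is valid only inside its window; no revocation step is needed."""
        return self.granted_at <= now < self.expires_at


def approve_request(engineer, role, reason, window, now):
    """Create a time-boxed grant, rejecting windows outside the 1-8 hour range."""
    if not MIN_WINDOW <= window <= MAX_WINDOW:
        raise ValueError("window must be between 1 and 8 hours")
    return AccessGrant(engineer, role, reason, now, now + window)


if __name__ == "__main__":
    start = datetime(2025, 1, 28, 14, 0, tzinfo=timezone.utc)
    grant = approve_request("alice", "prod-db-write", "hypothetical ticket reference",
                            timedelta(hours=2), start)
    print(grant.is_active(start + timedelta(hours=1)))
    print(grant.is_active(start + timedelta(hours=2)))
```

Checking the expiry at use time, rather than relying on a cleanup job to delete the grant, means a missed revocation cannot leave access standing.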

You will hear resistance from engineers: “I need production access for debugging, and I need it now, not after an approval workflow.” Fair point. Handle it with tiered access. Read-only production access (logs, metrics, traces) should be broadly available without JIT. Write access to production data and infrastructure changes requires JIT. Emergency break-glass access exists as a documented, audited bypass for genuine Severity 1 incidents. This pairs naturally with cloud-native architecture patterns where observability reduces the need for direct production access.

Cross-Account Trust Architecture

Multi-account AWS architecture is a security best practice: production isolated, development separated, shared services in dedicated accounts. But the cross-account trust relationships that make this work are their own attack surface if not governed carefully. And they’re almost never governed carefully.

Every cross-account role trust policy should specify exact role ARNs in its Principal element, not entire account IDs. A trust policy that says “Account A can assume this role” means any identity in Account A that has been granted sts:AssumeRole on it can assume it, and who gets that permission is decided inside Account A, outside your control. A trust policy that says “arn:aws:iam::ACCOUNT_A:role/cicd-deployer can assume this role” restricts it to a single, specific identity. That specificity is the difference between controlled access and an open door.
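Auditing for account-wide trust can be scripted. This is a minimal sketch, assuming trust policies are already loaded as dictionaries: it flags AWS principals that are a bare account ID, an account root ARN, or "*". The function name and the account ID are placeholders.

```python
import re

# A bare 12-digit account ID and the account root ARN both trust the whole account.
ACCOUNT_WIDE = re.compile(r"^(\d{12}|arn:aws:iam::\d{12}:root)$")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def account_wide_principals(trust_policy):
    """Return principals in Allow statements that trust an entire account or everyone."""
    findings = []
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            findings.append("*")
            continue
        for arn in _as_list(principal.get("AWS", [])):
            if arn == "*" or ACCOUNT_WIDE.match(arn):
                findings.append(arn)
    return findings


# Hypothetical trust policies for the same role, before and after tightening.
BROAD = {"Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                        "Principal": {"AWS": "arn:aws:iam::111122223333:root"}}]}
SCOPED = {"Statement": [{"Effect": "Allow", "Action": "sts:AssumeRole",
                         "Principal": {"AWS": "arn:aws:iam::111122223333:role/cicd-deployer"}}]}

if __name__ == "__main__":
    print(account_wide_principals(BROAD))
    print(account_wide_principals(SCOPED))
```

The check ignores Condition blocks, so a broad principal narrowed by a condition would still be flagged. For a quarterly audit that is probably the right bias: a human confirms the condition is doing what it claims.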

Audit cross-account trust relationships quarterly. They are the IAM equivalent of firewall rules: created for specific purposes, rarely reviewed, and gradually accumulating into an accidental architecture that nobody fully understands. Every trust relationship should have an associated comment or tag explaining why it exists and when it should be reviewed. The cloud-native IAM practices that prevent trust relationship sprawl start with this documentation at creation time. If you can’t explain why a trust relationship exists, it probably shouldn’t.

Frequently Asked Questions

What does least-privilege actually mean for cloud IAM in practice?

Least privilege means each identity can perform exactly the operations it needs and nothing more. In AWS, that means scoping IAM policies to specific resource ARNs instead of wildcards and separating read/write into distinct roles. AWS IAM Access Analyzer shows which permissions a role actually used in the past 90 days. Teams that review and trim unused permissions monthly typically reduce their IAM permission surface by 40-60% within two quarters.

What is workload identity federation and why replace service account keys?

Workload identity federation lets a workload exchange a short-lived runtime token from Kubernetes, GitHub Actions, or EC2 instance metadata for cloud credentials without any static key file. No key means nothing to steal, rotate, or accidentally commit to git. Every major cloud provider supports this natively. Migrating from key-based auth to federation is the single highest-impact IAM improvement most teams can make.

What is a privilege escalation path and how do you find them?

A privilege escalation path is a chain of IAM permissions that lets a lower-privileged identity reach admin access. For example, a role with iam:PassRole plus lambda:CreateFunction can deploy a Lambda that executes with an admin role. PMapper constructs a directed graph of all IAM entities and identifies every path to a privileged target. AWS accounts typically have 3-8 non-obvious escalation paths when first analyzed.

What is just-in-time access and when should you use it?

Just-in-time access grants elevated privileges only for the duration of a specific approved task, then auto-revokes them. Tools like HashiCorp Boundary and Teleport, or AWS IAM Identity Center paired with a temporary elevated access solution, support JIT workflows with 1-8 hour time-to-live windows. JIT is most valuable for production database access, infrastructure change windows, and incident response. It reduces standing privilege exposure from always-on to hours.

How should cross-account access be structured in AWS?

Use IAM roles with trust policies, never shared credentials. A role in Account B trusts specific role ARNs in Account A, not the entire account. Audit cross-account trust relationships quarterly because they are frequently created for specific projects and left unrestricted long after the project ends, creating persistent access paths nobody remembers authorizing.