IAM: Least Privilege That Actually Holds

Jan 28, 2025 Metasphere Engineering 12 min read

A developer hits a permission error deploying a new Lambda function. The deadline is tomorrow. She adds "Effect": "Allow", "Action": "*", "Resource": "*" to the IAM policy because that’s the fastest thing that makes the error disappear. The feature ships. The policy stays in place for the next two years because nobody remembers it exists and no automated process flags it.

A master key cut because the right key wasn’t available. Left on the hook. For two years.

Multiply that across 50 engineers over three years. In cloud environments, the IAM posture is mostly historical accident. Attackers who compromise a single generous role pivot using legitimate APIs. Every action looks normal. CloudTrail shows API calls indistinguishable from developer activity, because the permissions granted are identical. The burglar has a master key. The cameras show someone walking calmly through the front door.

Key takeaways

Action: * policies stay in place for years because nobody remembers they exist and no automated process flags them. The fastest fix becomes the permanent architecture. The master key that was supposed to be temporary.
Least privilege is enforced by tooling, not policy. IAM Access Analyzer identifies unused permissions. Automated trimming reduces attack surface without developer friction.
Service account keys should have maximum 90-day lifetimes. Keys older than that are almost certainly forgotten. Prefer OIDC federation over long-lived keys entirely.
Cross-account trust relationships from old migrations are privilege escalation paths. Audit quarterly. Remove what’s no longer needed. The spare key you gave the contractor last year. They still have it.
Break-glass access needs an automated audit trail, not just a wiki page with instructions. Emergency admin access without logging is indistinguishable from compromise. The emergency key that doesn’t set off the alarm.

The Wildcard Problem

Every IAM audit tells the same story. Dozens of policies with wildcard actions, wildcard resources, or both. Each one created under deadline pressure. Each one forgotten right after the deployment succeeded. Master keys everywhere. Nobody remembers who cut them or why. The total result is an attack surface that grows without anyone intending it to.

IAM Anti-Pattern	Risk	How Common	Fix
`"Resource": "*"` wildcard	Blast radius = entire account	Pervasive in audits	Scope to specific ARNs
`"Action": "*"` admin access	Full admin to any service	Common in service roles	Grant minimum required actions
Long-lived access keys	Never rotated, leaked in repos	Keys routinely outlive their creators	Dynamic credentials via IAM roles
Cross-account trust without conditions	Any role in trusted account can assume	Widespread in multi-account setups	Name exact role ARNs as principals; add `sts:ExternalId` for third parties
Unused permissions retained	Attack surface grows without limit	Most permissions go unused	Monthly access analyzer review
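The first two rows of the table can be checked mechanically. The following is a minimal sketch in Python, assuming the policy is available as a parsed JSON document; the function name find_wildcards and the sample policy are illustrative, not part of any AWS tooling. It only catches a bare "*", so service-level wildcards such as "s3:*" would need an extra rule.

```python
import json


def _as_list(value):
    """IAM accepts a single value or a list for Statement, Action and Resource."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list:
    """Return one finding per Allow statement that uses a bare "*"."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action", [])):
            findings.append(f"statement {index}: wildcard Action")
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append(f"statement {index}: wildcard Resource")
    return findings


# Hypothetical policy: the deadline shortcut from the opening story.
deadline_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]
}
""")

for finding in find_wildcards(deadline_policy):
    print(finding)
```

Run as a CI step against every policy document in the repository, a check like this is one way to make the shortcut fail before it ships.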

IAM Access Analyzer surfaces actual usage: which API calls a role made over 90 days. The security audit that shows which doors each person actually opened. Remove what was never used. Teams trimming monthly see sharp drops in their permission surface within a few quarters.

Add a small buffer: if a role used 12 actions, grant those 12 plus 2-3 closely related ones. Downgrade the master key to one that opens 14 doors instead of every door in the building. Scoping too aggressively scares teams away from right-sizing entirely. An engineer whose deployment breaks because IAM was trimmed too tight will never trust the process again, and will fight every future trim.
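The trim-plus-buffer rule can be sketched as plain set arithmetic. This is an illustration only: the function right_size, the max_buffer default, and the sample action sets are assumptions, and the list of "related" actions is something a reviewer supplies rather than something a tool infers.

```python
def right_size(granted: set, used: set, related: set, max_buffer: int = 3) -> dict:
    """Keep observed actions plus a small buffer of related ones; remove the rest."""
    keep_used = granted & used
    # Buffer candidates: currently granted, not observed in use, and named
    # as closely related by the reviewer. Cap the buffer so it stays small.
    buffer = sorted((granted - used) & related)[:max_buffer]
    keep = keep_used | set(buffer)
    return {"keep": sorted(keep), "remove": sorted(granted - keep)}


# Hypothetical role: seven granted actions, three observed in the usage window.
granted = {
    "s3:GetObject", "s3:PutObject", "s3:ListBucket", "s3:DeleteObject",
    "s3:GetObjectVersion", "iam:CreateUser", "ec2:TerminateInstances",
}
used = {"s3:GetObject", "s3:PutObject", "s3:ListBucket"}
related = {"s3:GetObjectVersion"}

plan = right_size(granted, used, related)
print("keep:  ", plan["keep"])
print("remove:", plan["remove"])
```

Capping the buffer keeps the result close to observed usage while leaving room for the occasional call that did not happen to occur in the analysis window.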

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:GetObject",
      "s3:PutObject",
      "s3:ListBucket"
    ],
    "Resource": [
      "arn:aws:s3:::order-processing-prod",
      "arn:aws:s3:::order-processing-prod/*"
    ],
    "Condition": {
      "StringEquals": {
        "aws:PrincipalTag/team": "commerce"
      }
    }
  }]
}

Condition keys are an under-used feature of IAM policies. Here, aws:PrincipalTag/team ensures that only principals tagged team=commerce get these permissions, even if the same policy is accidentally attached to another team’s role. Defense in depth at the policy layer. The key card that checks your department badge before opening the door.
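To make the effect of the condition concrete, here is a deliberately simplified sketch of how a StringEquals block behaves. It is not AWS’s evaluation engine: it ignores other operators, IfExists variants, and policy variables, and the function name and request contexts are made up for illustration. What it does preserve is that every key must match, a list of values acts as an OR, and a missing key fails the check.

```python
def string_equals_matches(condition: dict, context: dict) -> bool:
    """Simplified StringEquals: every key must be present and equal one listed value."""
    for key, expected in condition.get("StringEquals", {}).items():
        allowed = expected if isinstance(expected, list) else [expected]
        # A missing context key yields None, which never matches a string.
        if context.get(key) not in allowed:
            return False
    return True


# Condition block copied from the policy above.
condition = {"StringEquals": {"aws:PrincipalTag/team": "commerce"}}

# Hypothetical request contexts for three different roles.
commerce_role = {"aws:PrincipalTag/team": "commerce"}
payments_role = {"aws:PrincipalTag/team": "payments"}
untagged_role = {}

print(string_equals_matches(condition, commerce_role))
print(string_equals_matches(condition, payments_role))
print(string_equals_matches(condition, untagged_role))
```

The untagged case matters most in practice: a role with no team tag is denied by this statement, so the tagging scheme has to be in place before the condition is rolled out.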

Eliminating Long-Lived Keys

Service account keys are the credentials that refuse to die. They get created for a CI pipeline, pasted into environment variables, copied to a developer’s laptop for local testing, committed to a private repo that isn’t as private as everyone assumes, and then forgotten. A copy of the master key left in the lobby “for the delivery driver.” Still there two years later. The key outlives the project it was created for, the team that created it, and sometimes the engineer who generated it.

Workload identity federation eliminates keys entirely. EC2 instance metadata, Kubernetes projected service account tokens, GitHub Actions OIDC. The credentials are short-lived (typically about an hour), auto-rotated, and scoped to the specific workload. Temporary day passes that expire at 5pm. Nothing to steal, nothing to rotate, nothing to commit.

Where keys remain unavoidable: 90-day max lifetime, automated rotation via Vault. Most key-based authentication can be replaced with federation in a single quarter. Security engineering covers the migration patterns.
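The 90-day limit is easy to check once key metadata is in hand. The sketch below assumes records shaped like the access key metadata that IAM returns (AccessKeyId, Status, CreateDate); the key IDs and dates are invented, and fetching the records and rotating the keys is left to whatever tooling is already in place.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)


def overdue_keys(keys: list, now: datetime) -> list:
    """Return the IDs of active keys older than the maximum lifetime."""
    return [
        key["AccessKeyId"]
        for key in keys
        if key["Status"] == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]


# Hypothetical inventory; the IDs are placeholders, not real keys.
inventory = [
    {"AccessKeyId": "AKIAEXAMPLEOLD", "Status": "Active",
     "CreateDate": datetime(2023, 1, 15, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAEXAMPLENEW", "Status": "Active",
     "CreateDate": datetime(2024, 12, 20, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIAEXAMPLEOFF", "Status": "Inactive",
     "CreateDate": datetime(2022, 6, 1, tzinfo=timezone.utc)},
]

audit_time = datetime(2025, 1, 28, tzinfo=timezone.utc)
print(overdue_keys(inventory, audit_time))
```

Inactive keys are skipped here because they can no longer authenticate; a stricter audit might still list them for deletion.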

Anti-pattern

Don’t: Create a service account key, paste it into CI/CD environment variables, and consider the integration “done.” The key will outlive the project, the team, and possibly the person who created it. A master key copy left in the lobby for the courier. The courier left the company. The key is still there.

Do: Use OIDC federation. GitHub Actions, GitLab CI, and every major CI platform support native token exchange with cloud providers. No key to steal, rotate, or accidentally commit. Day passes. Not master keys.
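As one illustration of the “Do” side, the sketch below builds an AWS role trust policy for GitHub Actions OIDC. It assumes the GitHub OIDC identity provider is already registered in the account; the account ID, repository name, and function name are placeholders. The sub condition pins the role to one repository and branch, which is what keeps the day pass from working for anyone else.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Trust policy letting one repo/branch exchange its OIDC token for role credentials."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Audience expected when the token is exchanged with AWS STS.
                    f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                    # Restrict to a single repository and branch.
                    f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }


# Placeholder account and repository.
policy = github_oidc_trust_policy("123456789012", "example-org/order-service", "main")
print(json.dumps(policy, indent=2))
```

Omitting the sub condition would let any repository on the platform assume the role, so it is the part of this policy worth reviewing most carefully.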

Privilege Escalation Path Analysis

iam:PassRole + lambda:CreateFunction = deploy a Lambda that executes with admin access, provided an admin role exists that Lambda can assume and the attacker has some way to invoke the function. Neither permission looks dangerous alone. Together, they’re a privilege escalation path that most teams would never find by reviewing policies one at a time. Two harmless-looking key cards that, combined, open the vault.

Policy graph tools list every path to privileged targets. Typical accounts reveal several non-obvious paths on first analysis. Build this analysis into your IaC pipeline: Terraform adding iam:PassRole triggers escalation path analysis before apply. The engineer learns right away which combinations are dangerous, not months later during an audit. The locksmith who checks whether your new key accidentally opens the vault before cutting it.
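The graph idea can be sketched in a few lines. This toy model is an assumption-laden illustration, not how any particular tool works: roles are reduced to a set of actions plus hypothetical can_pass and can_assume lists, and only one escalation rule is encoded. lambda:InvokeFunction is included in the rule on the assumption that the attacker must also be able to run the function they deploy.

```python
from collections import deque

# One escalation rule: pass a role to a new Lambda and run it.
PASS_AND_RUN = {"iam:PassRole", "lambda:CreateFunction", "lambda:InvokeFunction"}


def build_edges(roles: dict) -> dict:
    """Directed 'can act as' edges derived from each role's permissions."""
    edges = {}
    for name, role in roles.items():
        targets = list(role.get("can_assume", []))
        if PASS_AND_RUN <= role["actions"]:
            targets.extend(role.get("can_pass", []))
        edges[name] = targets
    return edges


def escalation_paths(edges: dict, start: str, targets: set) -> list:
    """Breadth-first search for every cycle-free path from start to a privileged role."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets and len(path) > 1:
            paths.append(path)
            continue
        for nxt in edges.get(node, []):
            if nxt not in path:  # never revisit a role on the same path
                queue.append(path + [nxt])
    return paths


# Hypothetical account with three roles.
roles = {
    "intern-readonly": {"actions": {"s3:GetObject"}, "can_assume": ["ci-deployer"]},
    "ci-deployer": {"actions": set(PASS_AND_RUN), "can_pass": ["lambda-exec-admin"]},
    "lambda-exec-admin": {"actions": {"*"}},
}

for path in escalation_paths(build_edges(roles), "intern-readonly", {"lambda-exec-admin"}):
    print(" -> ".join(path))
```

In a pipeline, the same search would run on the role graph as it would look after the proposed change, and any new path to a privileged role would block the apply.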

JIT Access for Production

Standing admin access to production is a liability that pays no dividend. Nobody needs admin at 3pm on a random afternoon. They need it during incidents, during migration windows, during specific debugging sessions. A permanent badge that opens every door when you only visit the secure wing twice a quarter. JIT access matches the access to the need.

Request elevated access for a specific task. 1-8 hour TTL. Auto-revoke when the window closes. A visitor badge. Works for 4 hours. Auto-deactivates. Read-only production access can be broadly available. Write access requires JIT with an approval workflow. Break-glass for Sev1 incidents bypasses approval but generates a full audit trail. The emergency key behind the glass panel. Use it if you must. It sets off the alarm.

Access Tier	Who Gets It	How	TTL	Audit
Read-only production	All engineers	Default role	Permanent	Standard CloudTrail
Write access	Requesting engineer	JIT approval	1-8 hours	Enhanced logging + Slack alert
Admin / break-glass	On-call SRE	Self-service with audit	1-4 hours	Full session recording
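The tiers in the table translate into a small amount of logic: a TTL range per tier, an approval requirement for write access, and an expiry check. The sketch below is a hypothetical model, not the API of any JIT product; the names Grant, issue_grant, and TTL_LIMITS are invented, and audit logging is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTL ranges taken from the table above.
TTL_LIMITS = {
    "write": (timedelta(hours=1), timedelta(hours=8)),
    "break-glass": (timedelta(hours=1), timedelta(hours=4)),
}


@dataclass(frozen=True)
class Grant:
    engineer: str
    tier: str
    issued_at: datetime
    ttl: timedelta
    approved_by: Optional[str]

    def is_active(self, now: datetime) -> bool:
        """A grant is valid from issue time until the TTL elapses."""
        return self.issued_at <= now < self.issued_at + self.ttl


def issue_grant(engineer: str, tier: str, ttl: timedelta, now: datetime,
                approved_by: Optional[str] = None) -> Grant:
    low, high = TTL_LIMITS[tier]
    if not low <= ttl <= high:
        raise ValueError(f"{tier} TTL must be between {low} and {high}")
    # Write access needs an approver; break-glass is self-service.
    if tier == "write" and approved_by is None:
        raise PermissionError("write access requires an approver")
    return Grant(engineer, tier, now, ttl, approved_by)


start = datetime(2025, 1, 28, 9, 0, tzinfo=timezone.utc)
grant = issue_grant("alice", "write", timedelta(hours=4), start, approved_by="bob")
print(grant.is_active(start + timedelta(hours=3)))
print(grant.is_active(start + timedelta(hours=4)))
```

Expiry is evaluated from the grant itself rather than by a cleanup job, so a missed revocation run cannot leave access standing.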

Cloud-native observability reduces the need for direct production access in the first place. If your logs and traces are thorough, most debugging doesn’t require SSH. The best door is the one you don’t need to open.

Cross-Account Trust Architecture

Cross-account trust relationships are their own attack surface. Almost never governed as carefully as the roles within a single account. Giving another building’s tenants access to your building. Then forgetting they still have it three years later.

Specify exact role ARNs in trust policies, not account IDs. “Account A can assume” means any identity in Account A that Account A’s own policies allow to call sts:AssumeRole can assume the role. In effect, every person in the other building can be handed a key to yours, and you don’t control who gets one. A specific ARN restricts to exactly one. Audit cross-account trust relationships quarterly. Trust relationships created for a specific project and left unrestricted after that project ends are persistent access paths nobody remembers authorizing. The spare key you gave the contractor. The project ended. They still have the key.
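A quarterly audit can start by flagging trust policies whose principal is a whole account. The sketch below is a minimal example under stated assumptions: it recognizes a bare 12-digit account ID, the account root ARN, and "*", and treats anything else as specific. The function name and the sample policy are illustrative.

```python
import re

# A bare account ID and the account root ARN both trust the whole account.
ACCOUNT_WIDE = re.compile(r"^(\d{12}|arn:aws:iam::\d{12}:root)$")


def account_wide_principals(trust_policy: dict) -> list:
    """Return principals in Allow statements that trust an entire account or everyone."""
    flagged = []
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            flagged.append("*")
            continue
        aws = principal.get("AWS", [])
        for entry in aws if isinstance(aws, list) else [aws]:
            if entry == "*" or ACCOUNT_WIDE.match(entry):
                flagged.append(entry)
    return flagged


# Hypothetical trust policy left over from a migration.
legacy_trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::111122223333:root",
            "arn:aws:iam::111122223333:role/migration-runner",
        ]},
        "Action": "sts:AssumeRole",
    }],
}

print(account_wide_principals(legacy_trust))
```

Only the root ARN is reported; the named role passes because it restricts the trust to exactly one identity.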

The Wildcard Debt: IAM policies with Action: * or Resource: * that were created under deadline pressure and persist for years because no automated process flags them and no engineer remembers they exist. Master keys cut in a rush and never collected. In a 50-person engineering org over 3 years, wildcard debt piles up until the IAM posture is mostly historical accident. The attack surface grows with every hire, every project, every deadline-driven shortcut that never gets cleaned up.

What the Industry Gets Wrong About IAM Security

“Least privilege is a policy, not an engineering problem.” Writing a policy document that says “use least privilege” doesn’t change the developer who adds Action: * because the deadline is tomorrow. Least privilege is enforced by automated tooling (IAM Access Analyzer, policy-as-code gates in CI) that makes over-permissioning harder than right-sizing. A sign that says “don’t copy the master key” next to a key-copying machine. Nobody reads the sign.

“Service accounts are fine with long-lived keys.” Service account keys are credentials that never expire, get copied to CI/CD variables, committed to repositories, and pasted in Slack DMs. Master keys photocopied and left in drawers across the building. Prefer OIDC federation for CI/CD. Where keys are unavoidable, enforce 90-day maximum lifetime with automated rotation.

Our take: Run IAM Access Analyzer monthly and trim unused permissions. The analysis shows which API calls a role actually made in the past 90 days. Which doors each person actually opened. Scope to those calls plus a small buffer. The teams that adopt this discipline see their IAM attack surface shrink within two quarters without breaking a single workload. The key is the buffer. Trim too aggressively and you break deployments. Trim too conservatively and you’ve done nothing. Two to three extra permissions beyond observed usage is the sweet spot. Downgrade the master key. Don’t take away the key entirely.

That Action: * policy from the deadline scramble? The pipeline rejects it now. “Wildcard actions not permitted.” The developer scopes to exact permissions. Ten minutes of work instead of two years of silent overexposure. Same building. The master keys are collected. Every door has the right lock. Every person has the right card.

Frequently Asked Questions

What does least-privilege actually mean for cloud IAM in practice?

Least privilege means each identity can do exactly the operations it needs and nothing more. In AWS, that means scoping IAM policies to specific resource ARNs instead of wildcards and separating read/write into distinct roles. IAM Access Analyzer shows which permissions a role actually used in the past 90 days. Teams that review and trim unused permissions monthly consistently shrink their IAM permission surface within a few quarters.

What is workload identity federation and why replace service account keys?

Workload identity federation lets a workload exchange a short-lived runtime token from Kubernetes, GitHub Actions, or EC2 instance metadata for cloud credentials without any static key file. No key means nothing to steal, rotate, or accidentally commit to git. Every major cloud provider supports this natively. Migrating from key-based auth to federation is the single highest-impact IAM improvement most teams can make.

What is a privilege escalation path and how do you find them?

A privilege escalation path is a chain of IAM permissions that lets a lower-privileged identity reach admin access. For example, a role with iam:PassRole plus lambda:CreateFunction can deploy a Lambda that runs with an admin role. IAM policy graph analysis tools build a directed graph of all IAM entities and find every path to a privileged target. AWS accounts almost always reveal multiple non-obvious escalation paths when first analyzed.

What is just-in-time access and when should you use it?

Just-in-time access grants elevated privileges only for the duration of a specific approved task, then auto-revokes them. Tools like AWS IAM Identity Center and HashiCorp Boundary support JIT workflows with 1-8 hour time-to-live windows. JIT is most valuable for production database access, infrastructure change windows, and incident response. It shrinks standing privilege exposure from always-on to hours.

How should cross-account access be structured in AWS?

Use IAM roles with trust policies, never shared credentials. A role in Account B trusts specific role ARNs in Account A, not the entire account. Audit cross-account trust relationships quarterly because they get created for specific projects and left unrestricted long after the project ends, creating persistent access paths nobody remembers authorizing.