Data Encryption: Keys, Rotation, and Field-Level Protection
Full disk encryption. TLS 1.3 everywhere. Auditor signed off last quarter.
Then a SQL injection in a search endpoint hands an attacker authenticated database access through the application tier. They pull hundreds of thousands of records. Perfectly decrypted. Because disk encryption protects against someone stealing the physical drive, not against someone who walks in through the application’s front door. Every compliance checkbox ticked. None of them relevant to the actual attack vector.
The building has a strong front door. The attacker came in through the lobby. The safety deposit boxes were wide open.
- Disk encryption stops stolen drives. It doesn’t stop SQL injection. Application-layer breaches read decrypted data through legitimate access paths.
- The algorithm is never the problem. AES-256-GCM is the standard. Key management is where every system actually breaks.
- Envelope encryption limits blast radius. One compromised data key exposes one dataset, not everything encrypted under a single master key.
- Key rotation must be automated from day one. Manual rotation schedules slip. Automated rotation with 90-day expiry runs whether anyone remembers or not.
- Column-level encryption protects sensitive fields even from DBAs. Encrypt PII at the application layer before it reaches the database.
A NIST-standardized algorithm is never the weak link. Key management is. Where keys are stored, how long they live, who has access, what the blast radius looks like when one leaks. These are engineering questions, not policy questions, and the answers determine whether encryption actually protects anything.
The Key Hierarchy
A chain of trust with three links. Get any one wrong and the others don’t matter.
Key management failures are a large part of OWASP's Cryptographic Failures category (A02:2021), which ranks second in the OWASP Top 10 for web applications. The root key, the customer master key (CMK), sits in an HSM or cloud KMS. It never touches data directly. Its only purpose: encrypting the keys that encrypt the data.
DEKs (Data Encryption Keys) sit one level down. Generated per-object, per-table, or per-tenant depending on your isolation needs. DEKs encrypt the actual data. The CMK encrypts the DEKs. The encrypted DEK gets stored right alongside the ciphertext. A compromised DEK exposes one dataset. A compromised CMK exposes DEKs, which you can re-wrap right away. Without envelope encryption, a single leaked key exposes everything.
The real payoff: key rotation without re-encrypting the data. Rotate the CMK and you re-wrap the DEKs (tiny key material). The actual data never moves. A 10TB database rotates in seconds, re-wrapping a few thousand DEKs, instead of days of I/O-intensive bulk re-encryption. Without envelope encryption, teams simply never rotate. They know they should. They can’t afford the downtime. So the keys sit unchanged for years.
# Envelope encryption: encrypt data with a DEK, wrap the DEK with KMS
import base64
import boto3
from cryptography.fernet import Fernet
kms = boto3.client('kms')
# Generate a DEK via KMS: plaintext for use, ciphertext for storage
key_response = kms.generate_data_key(KeyId='alias/app-cmk', KeySpec='AES_256')
dek_plaintext = key_response['Plaintext']  # 32 raw bytes; keep in memory only
dek_encrypted = key_response['CiphertextBlob']  # Store this alongside the data
# Encrypt the sensitive field. Fernet expects a url-safe base64-encoded 32-byte key.
# Note: Fernet uses AES-128-CBC with HMAC-SHA256, not AES-256-GCM.
cipher = Fernet(base64.urlsafe_b64encode(dek_plaintext))
encrypted_ssn = cipher.encrypt(b"123-45-6789")
# Store: encrypted_ssn + dek_encrypted (never store dek_plaintext)
# To decrypt: kms.decrypt(CiphertextBlob=dek_encrypted) returns the plaintext DEK, which decrypts the field
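The re-wrap step behind "rotate the CMK, leave the data alone" can be sketched locally. This is a minimal illustration, assuming two local AES-256-GCM keys stand in for the old and new master key versions (in a real deployment the master key never leaves KMS, and AWS KMS offers a `ReEncrypt` operation for this). The helper names `wrap`, `unwrap`, and `rewrap` are hypothetical.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def wrap(master_key: bytes, dek: bytes) -> bytes:
    """Encrypt a DEK under a master key; the 12-byte nonce is prepended."""
    nonce = os.urandom(12)
    return nonce + AESGCM(master_key).encrypt(nonce, dek, None)

def unwrap(master_key: bytes, wrapped: bytes) -> bytes:
    """Recover the plaintext DEK from nonce + ciphertext."""
    return AESGCM(master_key).decrypt(wrapped[:12], wrapped[12:], None)

def rewrap(old_master: bytes, new_master: bytes, wrapped: bytes) -> bytes:
    """Move one DEK to a new master key without touching the data it protects."""
    return wrap(new_master, unwrap(old_master, wrapped))

# Local stand-ins for two master key versions (assumption: real ones live in KMS).
old_master = AESGCM.generate_key(bit_length=256)
new_master = AESGCM.generate_key(bit_length=256)

# Encrypt a record once with a fresh DEK and keep the wrapped DEK next to it.
dek = AESGCM.generate_key(bit_length=256)
data_nonce = os.urandom(12)
record_ciphertext = AESGCM(dek).encrypt(data_nonce, b"123-45-6789", None)
wrapped_dek = wrap(old_master, dek)

# Rotation: only the small wrapped DEK is rewritten; record_ciphertext is untouched.
rotated_dek = rewrap(old_master, new_master, wrapped_dek)
recovered = AESGCM(unwrap(new_master, rotated_dek)).decrypt(
    data_nonce, record_ciphertext, None
)
```

The work done at rotation time scales with the number of DEKs, not with the volume of data they protect.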
Perfect key management protects stored data. But the threat model doesn’t stop at the storage layer.
Field-Level Encryption for Sensitive Data
| Encryption Layer | Protects Against | Does NOT Protect Against | Use For |
|---|---|---|---|
| Disk/volume (EBS, gp3) | Physical theft, decommissioned drives | SQL injection, compromised credentials, replicas | Baseline. All storage. |
| TLS in transit | Network sniffing, MITM | Authenticated attackers, application bugs | All connections |
| Field-level (app layer) | DB dumps, SQL injection, replica exposure | Key compromise, application memory inspection | PII: SSN, payment, health |
| Envelope (KMS + DEK) | Key exposure (only DEK exposed, rotatable) | KMS compromise, IAM misconfiguration | All field-level encryption |
| Confidential computing (TEE) | Cloud provider access, memory inspection | Side-channel attacks, app-level bugs | Multi-party computation, regulated AI |
For regulated PII, specifically Social Security numbers, payment card data, and biometric identifiers, the application encrypts the value before it touches the database. The database stores ciphertext. A full dump without the application’s keys is useless. Each safety deposit box has its own lock. Break into the building, you still can’t open them. Exactly the point.
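The encrypt-before-write step could look roughly like this, using AES-256-GCM from the `cryptography` package. The function names, the `context` string, and the nonce-prefix storage layout are assumptions for illustration. The DEK is generated locally here; a real service would obtain it from KMS.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the usual size for GCM

def encrypt_field(dek: bytes, plaintext: str, context: bytes) -> bytes:
    """Return nonce + ciphertext + tag, ready to store in a binary column."""
    nonce = os.urandom(NONCE_LEN)  # must be unique per encryption under one key
    return nonce + AESGCM(dek).encrypt(nonce, plaintext.encode("utf-8"), context)

def decrypt_field(dek: bytes, blob: bytes, context: bytes) -> str:
    """Raise InvalidTag if the blob was altered or the context does not match."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(dek).decrypt(nonce, ciphertext, context).decode("utf-8")

dek = AESGCM.generate_key(bit_length=256)  # stand-in for a KMS-issued DEK
context = b"users.ssn:row=42"              # hypothetical table/column/row binding
stored = encrypt_field(dek, "123-45-6789", context)
roundtrip = decrypt_field(dek, stored, context)
```

Passing the column and row identity as associated data means a ciphertext copied into a different row fails authentication instead of decrypting silently.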
The trade-offs are real though. Encrypted fields can’t be queried with normal SQL. No sorting. No range queries. No aggregations on ciphertext. You can’t search the contents of a locked box without opening it. Deterministic encryption allows equality lookups (“find by SSN”) but enables correlation attacks. Order-preserving encryption supports ranges but at a security cost too high for sensitive data. Most implementations encrypt a targeted subset of fields based on regulatory classification, not every column. The data privacy by design guide covers which fields actually warrant field-level protection.
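To make the equality-lookup trade-off concrete, here is a small sketch that uses a keyed HMAC-SHA256 lookup token stored next to the randomized ciphertext. This is a stand-in technique, not deterministic encryption itself, but it has the same property the paragraph warns about: equal inputs always produce equal tokens. The names `index_key`, `lookup_token`, and `find_by_ssn` are hypothetical.

```python
import hashlib
import hmac
import os

index_key = os.urandom(32)  # hypothetical key, kept separate from the DEKs

def lookup_token(value: str) -> str:
    """Keyed, deterministic token for equality search on an encrypted field."""
    normalized = value.replace("-", "").strip()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Simulated index column: row name -> token (ciphertext column omitted).
rows = {
    "alice": lookup_token("123-45-6789"),
    "bob": lookup_token("987-65-4321"),
    "alice_dup": lookup_token("123456789"),
}

def find_by_ssn(ssn: str) -> list:
    """Equality lookup without decrypting anything."""
    wanted = lookup_token(ssn)
    return sorted(name for name, tok in rows.items() if hmac.compare_digest(tok, wanted))

matches = find_by_ssn("123-45-6789")
```

Anyone who can read the index column can see that `alice` and `alice_dup` share a value without knowing what it is. That is the correlation risk.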
| Data Category | Examples | Encryption Approach | Queryable? | Compliance Driver |
|---|---|---|---|---|
| Regulated PII | SSN, Tax ID, biometric identifiers | Field-level AES-256-GCM, encrypted in application tier before DB write | No (ciphertext only) | SOC 2, GDPR, HIPAA |
| Payment Data | Card numbers, bank accounts | Field-level AES-256-GCM, tokenization for recurring use | Via token lookup | PCI DSS |
| Protected Health | Diagnosis codes, health records | Field-level AES-256-GCM, access-logged decryption | No (ciphertext only) | HIPAA, HITECH |
| Operational PII | Email, phone, user profiles | Storage-layer encryption (transparent), TLS in transit | Yes (transparent) | GDPR, CCPA |
| Non-sensitive | Preferences, settings, public content | Storage-layer encryption (transparent) | Yes (transparent) | Best practice |
Don’t: Encrypt every column at the application layer. Full-database field-level encryption introduces query limits, performance overhead, and operational complexity that most columns don’t warrant. Putting every item in a safety deposit box when most of them are magazines.
Do: Encrypt the 10 most sensitive columns (SSNs, payment card numbers, health identifiers, authentication secrets) at the application layer. Everything else gets disk encryption and access controls. The valuables go in the vault. The staplers stay on the desk.
Rotation Without Downtime
Key rotation is where encryption goes to die.
Systems that treat keys as permanent infrastructure either never rotate or cause outages when they try. Both outcomes are bad. One is just quieter about it. (Not quieter forever. Just quieter until the audit.)
Before the first rotation, confirm these prerequisites:
- Application supports versioned key identifiers in the encrypted payload header
- KMS/HSM supports generating new key versions without revoking old ones
- Data records include a key version indicator alongside the encrypted DEK
- Monitoring alerts when old-version decryption requests exceed a threshold after migration window
- Rollback procedure tested: application can revert to previous key version within minutes
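One way to carry the key version and the wrapped DEK with each record is a small binary header in front of the ciphertext. The layout below (two big-endian 16-bit integers, then the wrapped DEK, then the ciphertext) is an assumption made for illustration, not a standard format.

```python
import struct
from typing import Tuple

# Header: key version, wrapped-DEK length (both unsigned 16-bit, big-endian).
HEADER = struct.Struct(">HH")

def pack_record(key_version: int, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with the key version and the wrapped DEK."""
    return HEADER.pack(key_version, len(wrapped_dek)) + wrapped_dek + ciphertext

def unpack_record(blob: bytes) -> Tuple[int, bytes, bytes]:
    """Split a stored blob back into (key_version, wrapped_dek, ciphertext)."""
    key_version, dek_len = HEADER.unpack_from(blob)
    start = HEADER.size
    return key_version, blob[start:start + dek_len], blob[start + dek_len:]

def needs_migration(blob: bytes, current_version: int) -> bool:
    """True when the record was written under an older key version."""
    return unpack_record(blob)[0] < current_version

blob = pack_record(3, b"wrapped-dek-bytes", b"field-ciphertext")
parsed = unpack_record(blob)
```

With the version readable before any decryption happens, the application can pick the right key directly and count old-version reads for the monitoring alert.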
Dual-key support solves the transition problem. The application keeps a versioned key list, tries the current key first, falls back to the previous version on failure, and lazily re-encrypts on successful old-key reads. No data goes dark during rotation. The old key phases out naturally as records get re-encrypted through normal operations. The transition is invisible to users.
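The try-current, fall-back, re-encrypt flow can be sketched with `MultiFernet` from the `cryptography` package, which decrypts with any key in its list and re-encrypts under the first one via `rotate()`. Fernet is used only because the envelope example uses it; the function name `read_field` and the in-memory `stored` value are hypothetical.

```python
from typing import Tuple
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

old_key = Fernet(Fernet.generate_key())
new_key = Fernet(Fernet.generate_key())

# A record written before the rotation.
stored = old_key.encrypt(b"123-45-6789")

# Current key first, previous key as fallback.
keyring = MultiFernet([new_key, old_key])

def read_field(token: bytes) -> Tuple[bytes, bytes]:
    """Return (plaintext, token to store). Re-encrypts only on old-key reads."""
    try:
        return new_key.decrypt(token), token  # already current, no write needed
    except InvalidToken:
        # Fallback path: decrypt with an older key, then rotate to the current key.
        return keyring.decrypt(token), keyring.rotate(token)

plaintext, stored = read_field(stored)  # caller writes the new token back
second_read = read_field(stored)        # now served by the current key
```

Once no reads take the fallback path, the old key can be dropped from the list.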
Lazy vs. eager re-encryption: when to use each
Lazy re-encryption (on read) works well for datasets with high read rates. Records get re-encrypted naturally as the application accesses them. Downsides: cold records may never be re-encrypted, and you can’t retire the old key until every record has been read. Works best when most records are accessed within the rotation window.
Eager re-encryption (batch migration) processes records actively. Run a background job that reads, decrypts with the old key, re-encrypts with the new key, and writes back. Necessary for compliance scenarios with hard key retirement deadlines. The batch job needs rate limiting to avoid overwhelming the database with write amplification.
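A rate-limited batch job might be structured like this. It is a sketch under simplifying assumptions: records live in a dict rather than a database, the `reencrypt` callable hides the decrypt-with-old, encrypt-with-new step, and the throttle is a fixed pause per write. All names are hypothetical.

```python
import time
from typing import Callable, Dict

def reencrypt_batch(
    records: Dict[str, dict],
    current_version: int,
    reencrypt: Callable[[bytes], bytes],
    max_per_second: float = 100.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-encrypt every record not yet on current_version; return how many changed."""
    interval = 1.0 / max_per_second
    migrated = 0
    for record in records.values():
        if record["key_version"] == current_version:
            continue  # already migrated, which makes the job safe to resume
        record["blob"] = reencrypt(record["blob"])
        record["key_version"] = current_version
        migrated += 1
        sleep(interval)  # throttle writes so the database is not overwhelmed
    return migrated

records = {
    "r1": {"key_version": 1, "blob": b"old:alpha"},
    "r2": {"key_version": 2, "blob": b"new:beta"},
    "r3": {"key_version": 1, "blob": b"old:gamma"},
}
pauses = []  # record the pauses instead of actually sleeping in this demo
migrated = reencrypt_batch(
    records, 2, lambda b: b"new:" + b.split(b":", 1)[1],
    max_per_second=50, sleep=pauses.append,
)
```

Skipping records already on the current version lets the job be stopped and restarted without redoing work.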
Hybrid approach combines both: lazy re-encryption for active data plus a batch sweep for stragglers after 80% of the rotation window. This catches cold records without running a full migration from the start.
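The 80% trigger is simple date arithmetic. A sketch, assuming a 90-day rotation window and hypothetical function names:

```python
from datetime import datetime, timedelta, timezone

def sweep_start(rotation_start: datetime, window: timedelta, percent: int = 80) -> datetime:
    """Moment at which the batch sweep for stragglers should begin."""
    return rotation_start + (window * percent) // 100

def rotation_mode(now: datetime, rotation_start: datetime, window: timedelta) -> str:
    """Lazy re-encryption only, until the sweep threshold is reached."""
    return "lazy+sweep" if now >= sweep_start(rotation_start, window) else "lazy"

rotation_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = timedelta(days=90)  # assumed rotation window
start = sweep_start(rotation_start, window)
early = rotation_mode(datetime(2024, 2, 1, tzinfo=timezone.utc), rotation_start, window)
late = rotation_mode(datetime(2024, 3, 20, tzinfo=timezone.utc), rotation_start, window)
```

With a 90-day window the sweep starts on day 72, leaving 18 days to finish before the old key is retired.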
Performance and Architecture Trade-offs
AES-256-GCM with hardware acceleration (AES-NI) is fast. Sub-microsecond per operation. You won’t notice it in isolation. KMS API calls are a completely different story.
Every decryption that calls KMS to unwrap a DEK adds 5-15ms of network latency. A page decrypting 20 fields? That’s 100-300ms if you call KMS for each one. Your users will notice. Your product team will find you.
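The arithmetic is worth spelling out, because it shows that the cost scales with the number of KMS calls rather than the number of fields. This sketch assumes calls are made sequentially and that the 20 fields share 3 DEKs in the second case.

```python
from typing import Tuple

KMS_LATENCY_MS = (5, 15)  # per-call range quoted above

def added_latency_ms(kms_calls: int) -> Tuple[int, int]:
    """Best-case and worst-case added latency for sequential KMS calls."""
    low, high = KMS_LATENCY_MS
    return kms_calls * low, kms_calls * high

per_field = added_latency_ms(20)      # one KMS call per encrypted field
per_unique_dek = added_latency_ms(3)  # one call per DEK when 20 fields share 3 DEKs
```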
| Caching Strategy | Latency (20 fields) | Security Posture | Best For |
|---|---|---|---|
| No cache (KMS call per decrypt) | 100-300ms | Maximum (no keys cached in memory) | Low-volume, high-sensitivity |
| Request-scoped cache | 15-45ms | High (keys cleared per request) | Most production apps |
| TTL cache (5 min LRU) | Sub-millisecond (warm) | Moderate (keys in memory briefly) | High-throughput, latency-sensitive |
| Never cache to disk | N/A | N/A | Absolute rule. No exceptions. |
Cache decrypted DEKs in application memory with a short TTL. Yes, this is a deliberate security trade-off. Plaintext DEKs in memory are theoretically accessible via memory dump. Five minutes with LRU eviction is reasonable for most production applications. Single-request caching with immediate clearing is the most conservative approach.
| DEK Caching Strategy | KMS Calls | Latency (20 encrypted fields) | Security Exposure | Recommendation |
|---|---|---|---|---|
| No cache | Every decrypt calls KMS | +100-300ms (5-15ms per call x 20) | Minimal. Keys never cached in memory | Low-volume, high-sensitivity paths only. Too slow for most production traffic |
| Request-scoped cache | 1 call per unique DEK per request | +15-45ms (3 DEKs typical) | Request lifetime only | Good default for most services |
| TTL cache (5 min) | KMS call only on miss or expiry | Sub-millisecond (warm cache) | 5-minute window if memory compromised | Best performance. Use for high-throughput paths |
| Disk cache | Never | Zero | Permanent exposure if disk accessed | Never. DEKs must not touch persistent storage |
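A TTL cache with LRU eviction fits in a few lines. This is a sketch, not a hardened implementation: the class name `DekCache` and its parameters are hypothetical, the `unwrap` callable stands in for the KMS decrypt call, and nothing here zeroes key bytes on eviction.

```python
import time
from collections import OrderedDict
from typing import Callable

class DekCache:
    """In-memory LRU cache of plaintext DEKs with a TTL. Never persisted."""

    def __init__(self, unwrap: Callable[[bytes], bytes], ttl_seconds: float = 300.0,
                 max_entries: int = 256, clock: Callable[[], float] = time.monotonic):
        self._unwrap = unwrap
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries = OrderedDict()  # wrapped DEK -> (expiry, plaintext DEK)

    def get(self, wrapped_dek: bytes) -> bytes:
        now = self._clock()
        hit = self._entries.get(wrapped_dek)
        if hit is not None and hit[0] > now:
            self._entries.move_to_end(wrapped_dek)  # mark as recently used
            return hit[1]
        plaintext = self._unwrap(wrapped_dek)  # miss or expired: one KMS call
        self._entries[wrapped_dek] = (now + self._ttl, plaintext)
        self._entries.move_to_end(wrapped_dek)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return plaintext

# Demo with a fake unwrap function and a controllable clock.
calls = []
def fake_unwrap(wrapped: bytes) -> bytes:
    calls.append(wrapped)
    return b"plain-" + wrapped

now = [0.0]
cache = DekCache(fake_unwrap, ttl_seconds=300, clock=lambda: now[0])
for _ in range(20):
    cache.get(b"dek-A")
warm_calls = len(calls)          # 20 lookups, one unwrap
now[0] = 301.0                   # move past the 5-minute TTL
cache.get(b"dek-A")
calls_after_expiry = len(calls)  # expiry forces a second unwrap
```

In a real service, `unwrap` could be something like `lambda blob: kms.decrypt(CiphertextBlob=blob)["Plaintext"]`. Setting `ttl_seconds` to 0 makes every lookup a miss, and creating a fresh `DekCache` per request gives request-scoped caching.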
One thing teams consistently miss: encrypted fields need special handling in analytics pipelines, data lakes, and ML training. If your warehouse receives field-level encrypted columns, analytics can’t aggregate without a decryption step. The locked boxes go to the analysis department. They can’t open them. Build encryption into the data architecture from the start. Two years later, the architecture is concrete. For key management across hundreds of services, see secrets management at scale. Application security covers wiring field-level encryption without coupling key management to business logic.
What the Industry Gets Wrong About Data Encryption
“Encrypt at rest and you’re covered.” Disk encryption protects against one threat: physical media theft. An attacker with database credentials, a SQL injection, or access to a database replica reads decrypted data through the application’s own access path. Disk encryption is invisible to them. The vault door is locked. The attacker walked in through the lobby.
“HSMs are required for serious encryption.” For most organizations, cloud KMS with HSM-backed storage gives the same security at a fraction of the operational cost. HSMs are required for PCI DSS Level 1, FIPS 140-2 Level 3, or non-exportable root CA keys. For everything else, KMS is the pragmatic choice. You don’t need a bank vault for the petty cash drawer.
“Encryption makes data unrecoverable if you lose the keys.” With envelope encryption, losing a DEK affects one dataset. Losing the CMK is catastrophic, which is why KMS services replicate root keys across multiple availability zones with automatic failover. The real risk isn’t key loss. It’s key sprawl: hundreds of untracked DEKs across services with no inventory. Not losing the keys. Forgetting which keys go to which boxes.
That SQL injection from the opening. Full disk encryption didn’t stop it. Field-level encryption on the PII columns would have. Envelope encryption for fast key rotation. A caching layer that made the performance cost invisible. The building had a strong front door. The safety deposit boxes had their own locks. The attacker got into the lobby but left with ciphertext they couldn’t read. The algorithm was never the problem. The architecture around it was everything.