Progressive Web Apps: Offline-First That Works
You ship a progressive web app. The demo is flawless. Airplane mode, kill the connection, toggle it back. The app loads instantly, data persists, sync resolves. Your team celebrates.
The camping trip where everything went perfectly. Clear skies. Road open. Supplies fresh.
Then production happens. A user on a spotty train connection submits a form three times because the service worker retried quietly and the UI showed nothing. The cache served a stale login page after your auth deploy, locking users out for an afternoon. IndexedDB hit its quota limit on a 32GB iPhone crammed with photos. The W3C Service Workers specification defines the caching primitives. The failure modes? Those are entirely yours. The web.dev PWA guides cover the happy path. Production teaches you the unhappy ones.
The road washed out. The pantry had expired food. The journal had two conflicting entries. Nobody planned for the storm.
- Service workers retry quietly by default. On a spotty connection, the user submits a form three times and the UI shows nothing. Idempotency keys and UI feedback are mandatory, not nice-to-haves.
- Stale caches serve old auth pages after deploys, locking users out. Cache versioning with skipWaiting() and clients.claim() must be deliberate, not automatic.
- IndexedDB quotas vary wildly by device. A 32GB iPhone with photos and podcasts may have less than 500MB available. Handle quota errors gracefully or lose offline data quietly.
- Conflict resolution for offline writes is the hardest problem. Two devices edit the same record offline. Last-write-wins destroys data. Field-level merging or CRDTs are the real answer.
- Background sync only works while the service worker is alive. If the browser kills it (common on mobile), queued writes vanish unless persisted to IndexedDB first, as in the outbox sketch after this list.
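A minimal sketch of what that first and last bullet look like in practice, using the idb wrapper from the checklist below: every write lands in a local outbox keyed by an idempotency key before any fetch is attempted. The store name, the Idempotency-Key header, and the server-side deduplication that header implies are illustrative assumptions, not part of any service worker API.

```js
// outbox.js — persist writes locally before any network attempt (sketch).
// Assumes the server deduplicates on the Idempotency-Key header.
import { openDB } from 'idb';

const dbPromise = openDB('app-offline', 1, {
  upgrade(db) {
    db.createObjectStore('outbox', { keyPath: 'idempotencyKey' });
  },
});

// Called from the UI: persist first, return the key so the UI can show a
// "queued" state instead of showing nothing during retries. The caller
// decides when to flush (immediately, on the 'online' event, or via
// background sync later).
export async function queueWrite(url, body) {
  const entry = {
    idempotencyKey: crypto.randomUUID(),
    url,
    body,
    queuedAt: Date.now(),
  };
  const db = await dbPromise;
  await db.add('outbox', entry); // survives the service worker being killed
  return entry.idempotencyKey;
}

// Retries are safe: the same key lets the server drop duplicate submissions.
export async function flushOutbox() {
  const db = await dbPromise;
  for (const entry of await db.getAll('outbox')) {
    try {
      const res = await fetch(entry.url, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Idempotency-Key': entry.idempotencyKey,
        },
        body: JSON.stringify(entry.body),
      });
      if (res.ok) await db.delete('outbox', entry.idempotencyKey);
    } catch {
      break; // still offline; leave the rest queued on disk
    }
  }
}
```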
The baseline setup all of this assumes:
- HTTPS configured for all origins (service workers require secure contexts)
- Build pipeline generates content-hashed filenames for all static assets
- At least one library wrapping IndexedDB (Dexie.js or idb) in the data layer
- Playwright or equivalent set up for service worker interception in CI
- Cache storage budget defined per origin (target under 50MB for reliable cross-browser support)
The Service Worker Lifecycle Nobody Explains Well
Most bugs live in the transitions between install, activate, and fetch. The supplies manager’s shift change. A new service worker doesn’t activate right away. If an old one still controls the page, the new one waits until every tab running the old worker closes. By design, this stops split cache versions from serving mixed assets to the same user.
Most users never close their tabs. Your deployment sits in the waiting state for days. The new supplies manager waiting in the lobby. The old one still running the pantry.
Calling skipWaiting() forces the new worker to take over right away. But the page loaded with old assets, and the new worker is fetching with new routing rules. If your HTML points to a CSS file that the new precache renamed, the page breaks mid-session. No error. No warning. Just a broken layout the user has to refresh away. The new manager reorganized the pantry while people were eating. Plates break.
Safe pattern: prompt the user when a new worker reaches the waiting state, call skipWaiting() only after they accept, pair it with clients.claim() in the activate handler, and reload once the controlling worker changes. Workbox’s workbox-window surfaces both moments through its waiting and controlling events. Never quietly swap cache versions under active sessions.
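A sketch of that flow with workbox-window, assuming the registered worker handles the SKIP_WAITING message (Workbox’s generated workers include a listener for it; a hand-written worker needs its own). The confirm() here stands in for whatever refresh UI you actually ship.

```js
// sw-update.js (window side) — prompt instead of silently swapping versions.
import { Workbox } from 'workbox-window';

const wb = new Workbox('/sw.js');

wb.addEventListener('waiting', () => {
  // A new worker is installed but parked behind the old one.
  if (window.confirm('A new version is ready. Reload now?')) {
    // Reload only once the new worker actually controls the page.
    wb.addEventListener('controlling', () => window.location.reload());
    wb.messageSkipWaiting(); // posts {type: 'SKIP_WAITING'} to the waiting worker
  }
});

wb.register();
```

clients.claim() still belongs in the activate handler of the worker itself; the window-side prompt only controls when skipWaiting() fires.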
The lifecycle determines when your worker activates. What it does with requests after activation is the next minefield entirely.
Caching Strategies and When Each One Breaks
The wrong strategy on the wrong resource type causes more production PWA bugs than anything else. Packing the wrong supplies for the wrong trip.
| Strategy | Best For | Failure Mode | Timeout |
|---|---|---|---|
| Cache-first | Versioned static assets (hashed filenames) | Serves stale data forever on API responses | None needed |
| Network-first | HTML documents, fresh API data | Hangs 30+ seconds on slow connections without timeout | 3-5s for documents |
| Stale-while-revalidate | Avatars, metadata, non-critical lists | UI renders against stale data after breaking API change | N/A |
| Network-only | Auth, payments, mutations | No offline fallback at all | Connection timeout |
| Cache-only | App shell (guaranteed instant load) | Missing asset = hard failure | N/A |
Cache-first serves from cache, hits network only on miss. Perfect for versioned static assets with content hashes in filenames. Apply it to API responses and you serve stale data forever. Eating canned food that expired last year because nobody checked the label. Apply it to HTML and you serve yesterday’s layout after a deployment.
Network-first hits the network, falls back to cache on failure. The right choice for HTML and APIs where freshness matters. Go to town for fresh supplies. If the road is out, eat from the pantry. Without an explicit timeout, a slow connection hangs 30+ seconds while the cache has a perfectly good response sitting unused. Standing at the road waiting for the delivery truck in a snowstorm. Set network timeouts to 3-5 seconds for documents.
Stale-while-revalidate serves from cache right away and updates in the background. Sounds perfect. Causes the most subtle bugs. Deploy a breaking API change and the UI renders against stale data until the user navigates again. Serving yesterday’s menu while the kitchen preps today’s specials. Fine for avatars and metadata. A trap for anything where staleness causes functional breakage.
Don’t: Apply a single caching strategy globally across all routes. registerRoute(/.*/, new CacheFirst()) serves stale auth tokens and old HTML forever. Canning everything and calling it a pantry.
Do: Match strategy to resource volatility. Hashed assets get cache-first. HTML gets network-first with a 3-5 second timeout. Auth endpoints get network-only. Every route needs its own strategy. Every supply has its own shelf life.
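One way to express that split with Workbox’s routing, shown as a sketch: the cache names and the /api/auth prefix are placeholders, and the static-asset matcher assumes hashed filenames.

```js
// sw.js routing — one strategy per resource class, not one global rule (sketch).
import { registerRoute } from 'workbox-routing';
import { CacheFirst, NetworkFirst, NetworkOnly } from 'workbox-strategies';

// Hashed static assets: safe to serve from cache indefinitely.
registerRoute(
  ({ request }) => ['style', 'script', 'font'].includes(request.destination),
  new CacheFirst({ cacheName: 'static-assets' })
);

// HTML documents: try the network, give up after 3 seconds, fall back to cache.
registerRoute(
  ({ request }) => request.mode === 'navigate',
  new NetworkFirst({ cacheName: 'documents', networkTimeoutSeconds: 3 })
);

// Auth and mutations: never served from cache.
registerRoute(
  ({ url }) => url.pathname.startsWith('/api/auth'),
  new NetworkOnly()
);
```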
Workbox Configuration That Ships
Workbox’s defaults are demo defaults. The build tool scans your output folder and precaches everything. Every JS chunk, image, font. On a mid-range phone over 3G, that download takes 30+ seconds before the worker even starts working. Packing the entire supermarket into the cabin. Limit precaching to the essentials: shell HTML, main JS bundle, core CSS, primary font. Everything else goes into runtime caching with explicit expiration. Pack essentials. Forage the rest as needed.
Runtime caches need expiration rules. Without ExpirationPlugin, your runtime cache grows forever. A pantry with no expiry dates. Set maxEntries and maxAgeSeconds on every runtime cache route. No exceptions. A cache that only grows is a quota error waiting to happen.
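A workbox-build / workbox-cli style config sketching both rules: a deliberately small precache list and expiring runtime caches. The glob patterns, cache names, and limits are illustrative and depend entirely on your asset mix.

```js
// workbox-config.js — precache only the shell, expire everything else (sketch).
module.exports = {
  globDirectory: 'dist/',
  // Precache the shell only, not every chunk the build emits.
  globPatterns: ['index.html', 'assets/main-*.js', 'assets/main-*.css', 'fonts/*.woff2'],
  swDest: 'dist/sw.js',
  runtimeCaching: [
    {
      urlPattern: /\.(?:png|jpg|jpeg|webp|svg)$/,
      handler: 'CacheFirst',
      options: {
        cacheName: 'images',
        // Without these, the cache grows until the quota error arrives.
        expiration: { maxEntries: 100, maxAgeSeconds: 30 * 24 * 60 * 60 },
      },
    },
    {
      urlPattern: /\/api\//,
      handler: 'NetworkFirst',
      options: {
        cacheName: 'api',
        networkTimeoutSeconds: 3,
        expiration: { maxEntries: 50, maxAgeSeconds: 24 * 60 * 60 },
      },
    },
  ],
};
```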
IndexedDB and the Sync Problem
IndexedDB is the only browser storage API that works for structured offline data at any real scale. The journal. The raw API is notoriously ugly. Libraries like Dexie.js and idb wrap it in a Promise-based interface that humans can actually use.
For offline-first apps, treat IndexedDB as the main data store. Reads and writes always hit local first. Write everything in the journal. A sync engine pushes changes when the connection comes back and pulls remote changes on a schedule. Send the journal entries to headquarters when the road reopens.
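A sketch of that local-first shape with Dexie: the table name, the dirty flag, and the endpoint are illustrative assumptions, not anything Dexie prescribes.

```js
// db.js — Dexie as the primary store; the UI never waits on the network (sketch).
import Dexie from 'dexie';

export const db = new Dexie('notes-app');
db.version(1).stores({
  // ++id: auto-increment key; updatedAt and dirty are indexed fields
  // (1/0 rather than booleans, since IndexedDB cannot index booleans).
  notes: '++id, updatedAt, dirty',
});

// Local write: instant, works offline, marked for the sync engine to push later.
export async function saveNote(note) {
  return db.notes.put({ ...note, dirty: 1, updatedAt: Date.now() });
}

// Sync engine: push dirty rows when the connection comes back.
export async function pushDirtyNotes(apiBaseUrl) {
  const dirty = await db.notes.where('dirty').equals(1).toArray();
  for (const note of dirty) {
    const res = await fetch(`${apiBaseUrl}/notes/${note.id}`, {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(note),
    });
    if (res.ok) await db.notes.update(note.id, { dirty: 0 });
  }
}
```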
Storing data locally is the easy part. Packing the journal is easy. What happens when two copies diverge is where offline-first gets genuinely hard. Two people writing different entries on the same journal page while the phone lines are cut.
Last-write-wins is the default conflict strategy and the default source of data loss. User A edits offline on their phone. User B edits on their laptop. Whichever syncs last quietly overwrites the other. For user preferences, fine. For anything collaborative, it destroys trust. (The entry that vanished. Nobody knows why.)
| Strategy | Complexity | Best For | Limitation |
|---|---|---|---|
| Last-write-wins | Low | Preferences, settings | Quietly drops concurrent edits |
| Field-level merge | Medium | Forms, profiles, records | True conflicts still need manual resolution |
| Operational transforms | High | Collaborative text editing | Requires ordered operation logs |
| CRDTs | High | Real-time collaboration at scale | Complex to build, limited data structure support |
For most apps, field-level merging with manual conflict resolution is the right call. CRDTs are overkill unless you’re building a collaborative editor. A serverless architecture simplifies the sync backend since this is event-driven work at its core.
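A sketch of field-level merging as a three-way diff against the last synced copy of the record: fields changed on one side win, fields changed on both come back as conflicts for the UI to resolve. Flat records are assumed here; nested objects need their own per-field handling.

```js
// merge.js — field-level, three-way merge for a record edited offline on two
// devices (sketch). "base" is the last version both sides agreed on.
export function mergeRecord(base, local, remote) {
  const merged = { ...base };
  const conflicts = [];

  for (const field of new Set([...Object.keys(local), ...Object.keys(remote)])) {
    const localChanged = local[field] !== base[field];
    const remoteChanged = remote[field] !== base[field];

    if (localChanged && remoteChanged && local[field] !== remote[field]) {
      conflicts.push(field); // a true conflict: both sides edited this field
    } else if (localChanged) {
      merged[field] = local[field];
    } else if (remoteChanged) {
      merged[field] = remote[field];
    }
  }
  return { merged, conflicts };
}
```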
Cache Versioning Across Deployments
You deploy a new version. The new service worker installs, old caches get cleaned. But the user’s current page references old asset filenames that no longer exist in the precache or on the CDN. Every active session breaks quietly. New supplies arrive. Old labels on the shelves. Nothing matches.
Keep previous version assets available for 24-48 hours after deployment. Always use NetworkFirst for HTML documents. If the HTML is stale, every asset reference in it is potentially wrong.
Deploy order matters and isn’t optional: push new assets to CDN first, then deploy the new service worker, then clean up old assets after TTL expires. Stock the new supplies. Then rotate the pantry. Then throw out the old stock. Reverse this ordering and you create a window where the worker references assets that don’t exist yet.
App Shell vs. Streaming SSR
| Criteria | App Shell | Streaming SSR |
|---|---|---|
| Repeat visit speed | Near-instant (shell cached) | Fast (full page cached per route) |
| JS dependency | Content requires JS execution | Content visible without JS |
| Cache storage | Low (one shell for all routes) | Higher (full page per route) |
| Offline content | Shell only, content needs sync | Full page content available offline |
| Best for | Interactive, JS-driven apps | Content-heavy, article-based sites |
App shell caches a minimal HTML skeleton and fills content with JavaScript. Near-instant paint on repeat visits, but content needs JS to run first. The cabin with walls and a roof. Furniture arrives later. On a mid-range phone, the browser still has to parse, compile, and run JavaScript before any content shows up. The cached shell gives you a fast frame. The content inside still waits.
Streaming SSR with navigation preload serves the cached header while streaming fresh page content from the server. The cabin fully furnished on arrival. No JavaScript required for initial content display. More cache storage per route, but each route loads with real content offline.
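A sketch of the navigation preload half of that pattern: the browser starts the server request in parallel with the worker booting, and a cached page (here an assumed precached /offline.html fallback) covers the offline case. The header/body stream stitching that paints the cached shell before the body arrives is a further step beyond this.

```js
// sw.js — navigation preload with a cached fallback (sketch).
self.addEventListener('activate', (event) => {
  event.waitUntil(
    self.registration.navigationPreload
      ? self.registration.navigationPreload.enable()
      : Promise.resolve()
  );
});

self.addEventListener('fetch', (event) => {
  if (event.request.mode !== 'navigate') return; // other routes handled elsewhere

  event.respondWith(
    (async () => {
      try {
        // The preload response is the request the browser started before
        // this worker finished booting.
        const preloaded = await event.preloadResponse;
        if (preloaded) return preloaded;
        return await fetch(event.request);
      } catch {
        // Offline: serve the cached copy of this route, if we have one.
        const cached = await caches.match(event.request);
        return cached || caches.match('/offline.html');
      }
    })()
  );
});
```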
Content-heavy sites: streaming SSR. Interactive JS-driven applications: app shell. Good UI/UX engineering makes offline states feel intentional, not broken.
Testing Offline Behavior in CI
Playwright supports service worker interception and network emulation out of the box. Write tests that install the worker, check precache, go offline and navigate, then simulate a deploy with a new version and verify the update flow. Run these in CI, not by hand.
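A sketch of the first half of that flow as a Playwright test, assuming Chromium (where Playwright’s service worker support is most complete); the route and the asserted text are placeholders for your own app.

```js
// pwa-offline.spec.js — offline behavior as a CI test, not a manual check (sketch).
// Assumes baseURL is set in playwright.config.
import { test, expect } from '@playwright/test';

test('app shell loads from cache while offline', async ({ page, context }) => {
  // First visit online: let the service worker install and precache.
  await page.goto('/');
  await page.evaluate(async () => {
    await navigator.serviceWorker.ready; // resolves once a worker is active
    return true;
  });

  // Cut the network and navigate again; the worker must serve from cache.
  await context.setOffline(true);
  await page.reload();

  await expect(page.locator('body')).toContainText('App shell');
});
```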
For IndexedDB, seed with real-world volumes. A test that writes 10 records never hits quota limits. 50,000 records with realistic payloads show you the cliffs. Test on WebKit specifically. Safari’s IndexedDB handles transactions differently and clears data more aggressively than Chromium. The cabin that runs differently in winter.
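A seeding helper in the same Dexie shape sketched earlier; the record count and the roughly 2KB payload are arbitrary stand-ins for whatever your real data looks like.

```js
// seed.js — seed IndexedDB with realistic volume before quota tests (sketch).
import Dexie from 'dexie';

export async function seedNotes(count = 50_000) {
  const db = new Dexie('notes-app');
  db.version(1).stores({ notes: '++id, updatedAt, dirty' });

  const batch = [];
  for (let i = 0; i < count; i++) {
    batch.push({
      title: `Note ${i}`,
      body: 'x'.repeat(2_000), // ~2KB payload, closer to real records than a fixture
      updatedAt: Date.now(),
      dirty: 0,
    });
    if (batch.length === 1_000) {
      await db.notes.bulkAdd(batch.splice(0)); // flush in batches of 1,000
    }
  }
  if (batch.length) await db.notes.bulkAdd(batch);
}
```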
Network testing goes beyond just online/offline. Use Playwright’s route.abort() to fake partial failures: API returning 502, CDN serving stale assets, WebSocket dropped but HTTP still working. These messy states expose gaps that clean on/off testing misses completely. The road that’s passable for cars but not trucks. Progressive web app offline testing in CI catches service worker regressions before they reach users.
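A sketch of those partial failures with Playwright’s routing; the URL patterns and the offline-banner test id are placeholders.

```js
// partial-failure.spec.js — messy network states, not just on/off (sketch).
import { test, expect } from '@playwright/test';

test('UI degrades gracefully when the API returns 502', async ({ page }) => {
  // Assets still load; only the API is failing.
  await page.route('**/api/**', (route) =>
    route.fulfill({ status: 502, body: 'Bad Gateway' })
  );

  await page.goto('/');
  await expect(page.locator('[data-testid="offline-banner"]')).toBeVisible();
});

test('navigation survives an aborted CDN request', async ({ page }) => {
  await page.route('**/cdn.example.com/**', (route) => route.abort('connectionfailed'));

  await page.goto('/');
  await expect(page.locator('body')).toContainText('App shell');
});
```

One caveat: requests issued by the service worker itself bypass page routes by default; Playwright’s serviceWorkers: 'block' context option sidesteps that by disabling the worker for tests where interception matters more than the worker does.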
What the Industry Gets Wrong About Progressive Web Apps
“Service workers make apps work offline automatically.” Service workers cache resources. They don’t handle conflict resolution, quota limits, or the UX of telling users their data hasn’t synced yet. Stocking the pantry is the easy part. Managing inventory, expiry dates, and two people reaching for the last can is the actual work.
“Background sync handles offline writes.” On mobile, browsers kill idle service workers aggressively. If the write was only in memory and not persisted to IndexedDB, it vanishes. Supplies that weren’t packed before the storm hit. Background sync is a delivery mechanism, not a persistence layer. Treat it as an unreliable transport and persist everything locally first.
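A sketch of that ordering on the page side, reusing the outbox helpers from earlier; the sync tag name is arbitrary, and the feature check matters because the Background Sync API is Chromium-only.

```js
// sync-registration.js (page side) — persist first, then ask to be woken up (sketch).
import { queueWrite, flushOutbox } from './outbox.js';

export async function queueAndRegister(url, body) {
  await queueWrite(url, body); // written to IndexedDB before anything else

  const registration = await navigator.serviceWorker.ready;
  if ('sync' in registration) {
    await registration.sync.register('flush-outbox'); // fires 'sync' in the worker later
  } else {
    await flushOutbox(); // no Background Sync: try now, retry on the 'online' event
  }
}
```

On the worker side, the matching sync handler re-reads the outbox from IndexedDB inside event.waitUntil() rather than trusting anything held in memory.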
That flawless demo? No cache versioning. No sync conflict resolution. No quota handling. No service worker update prompts. The camping trip with perfect weather. Every one of those is load-bearing in production. The demo worked because it ran once, on fast WiFi, with fresh caches. Your users won’t be that generous. A cloud-native architecture gives the backend resilience your offline-first frontend needs for reliable sync and asset delivery. The road back to town. Make sure it’s paved.