Frontend Error Tracking: Session Replay and RUM
You ship a release on a Friday afternoon. Backend dashboards are green. Error rates flat. Latency percentiles nominal. You close your laptop feeling good about it. An hour later, customer support lights up: “the page won’t load,” “I click the button and nothing happens,” “everything is blank after I log in.” You pull up the server logs. 200 OK. Every single request. The backend served the HTML, the JavaScript, and the API responses correctly. The failure happened after the response left your infrastructure and entered the user’s browser.
This is the blind spot that backend observability will never close on its own. The server did its job. The bug lives in how JavaScript executed in a specific browser version, on a specific device, with a specific combination of extensions and network conditions. Your Prometheus metrics, your Grafana dashboards, your distributed traces. None of them saw a thing. You’re debugging production with half the picture, and it’s the wrong half.
Backend Telemetry Has a Visibility Ceiling
Server-side observability answers one question well: “did the server respond correctly?” It cannot answer “did the user experience actually work?” Those are different questions, and they have different answers far more often than most teams want to admit.
A 200 response with valid JSON tells you nothing about the React component that crashed during hydration. Nothing about the third-party script that blocked the main thread for 800ms, making the page feel frozen. Nothing about the user’s ad blocker ripping out the DOM element your click handler was bound to, triggering a cascade of null reference errors.
Here’s the uncomfortable truth: the browser is a hostile execution environment. Your code runs alongside extensions you didn’t install, on hardware you didn’t spec, over networks you don’t control, in JavaScript engines with wildly different optimization characteristics. A function that runs in 2ms on your M2 MacBook takes 200ms on a three-year-old Android phone with 2GB of RAM. Backend observability was never designed to see any of this.
Real User Monitoring vs Synthetic Monitoring
These are complementary tools that answer different questions. Conflating them is the mistake that catches every frontend team eventually.
Synthetic monitoring runs scripted browser tests from controlled infrastructure. Lighthouse CI in your GitHub Actions pipeline. WebPageTest on a scheduled cron. Catchpoint or Datadog Synthetic from global probe locations. Synthetic tells you “did anything change between releases?” It’s a regression detector. The environment is controlled, so you isolate the impact of your code changes from real-world noise.
Real User Monitoring collects metrics from actual sessions. Every page load, every interaction, every error. The data comes from the full distribution of devices, networks, browsers, and user behaviors that your synthetic tests will never replicate. RUM reveals that your P75 LCP is 2.1 seconds but your P95 is 6.4 seconds, and that 95th percentile is concentrated in Southeast Asia on Android Chrome with slow 4G connections. Synthetic would never have found that.
Here’s how to think about it: synthetic monitoring belongs in CI/CD; RUM belongs in production observability. Synthetic catches regressions before they ship. RUM measures the actual user experience after they ship. Teams that use only synthetic end up optimizing for a lab environment that doesn’t match their real user base. Teams that use only RUM catch problems after users have already suffered. You need both.
Session Replay Architecture
Session replay lets you watch exactly what the user experienced. Not an approximation. Not a log reconstruction. The actual sequence of DOM states, mouse movements, clicks, and visual output the user saw. The first time you watch a replay of a user hitting your bug in the wild, it changes how you think about debugging.
The dominant recording architecture is mutation observer-based, pioneered by the open-source rrweb library and adopted by Sentry Replay, LogRocket, and FullStory. Take an initial DOM snapshot when the session starts, then record every DOM mutation, scroll event, mouse movement, and input interaction as a stream of incremental events. On playback, reconstruct the initial snapshot and replay mutations in sequence.
This is dramatically more efficient than video recording. A typical session produces 50-200KB of compressed data per minute versus 500KB-2MB for video. The DOM-based approach also lets you search sessions by DOM state (“show me all sessions where the error modal appeared”) instead of scrubbing through hours of video.
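The snapshot-plus-mutations idea can be sketched in a few lines. This is a minimal illustration of the recording model, not rrweb's actual event format: the node shape, event fields, and IDs here are invented for the example.

```javascript
// Minimal sketch of snapshot-plus-mutations replay (the idea behind
// rrweb-style recorders; the event schema here is illustrative).
const initialSnapshot = { id: 1, tag: 'div', text: 'Loading', children: [] };

// Incremental events recorded after the initial snapshot.
const events = [
  { type: 'mutation', targetId: 1, setText: 'Checkout' },
  { type: 'mutation', targetId: 1, addChild: { id: 2, tag: 'button', text: 'Pay' } },
];

function replay(snapshot, evts) {
  // Deep-copy the snapshot, then apply each mutation in recorded order.
  const dom = JSON.parse(JSON.stringify(snapshot));
  const byId = new Map([[dom.id, dom]]);
  for (const ev of evts) {
    const node = byId.get(ev.targetId);
    if (ev.setText !== undefined) node.text = ev.setText;
    if (ev.addChild) {
      node.children.push(ev.addChild);
      byId.set(ev.addChild.id, ev.addChild);
    }
  }
  return dom;
}

const finalState = replay(initialSnapshot, events);
console.log(finalState.text);            // 'Checkout'
console.log(finalState.children.length); // 1
```

Because the recording is structured data rather than pixels, "search sessions by DOM state" is just a query over reconstructed trees.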
Sampling strategy determines whether this stays affordable or blows your budget. Recording every session on a high-traffic site generates enormous data volumes. The approach that actually works is tiered sampling: record 100% of sessions that contain errors, 100% of sessions from users who contact support, and 5-10% of everything else. Sentry, LogRocket, and Datadog all support conditional sampling rules. Replay storage scales linearly with recorded sessions, so getting the sampling rate right is the difference between a useful debugging tool and a line item that gets killed in the next budget review.
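The tiered sampling decision reduces to a small function. This is an illustrative sketch, not any vendor's API; the session fields and the 5% base rate are assumptions from the text.

```javascript
// Tiered sampling sketch: always keep error and support-flagged sessions,
// sample everything else at a low base rate.
function shouldRecord(session, baseRate = 0.05, rand = Math.random) {
  if (session.hasError) return true;         // 100% of error sessions
  if (session.contactedSupport) return true; // 100% of support sessions
  return rand() < baseRate;                  // ~5% of everything else
}

console.log(shouldRecord({ hasError: true }));         // true
console.log(shouldRecord({ contactedSupport: true })); // true
console.log(shouldRecord({}, 0.05, () => 0.99));       // false
```

In practice the error tier needs buffering, since you don't know a session contains an error until it happens; SDKs like Sentry's keep a rolling in-memory buffer and flush it when the first error fires.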
Source Map Management
Production JavaScript is minified, bundled, and mangled by the build toolchain. When a user triggers an error, the stack trace points to main.a7f3e2.js:1:48293. Without source maps, that’s useless. You’re staring at obfuscated noise.
The workflow is straightforward: generate source maps during the build, upload them to your error tracking service (Sentry, Datadog, Bugsnag), then strip the source map files from the deployment artifact so they never reach browsers. Sentry’s sentry-cli handles this in a single CI step. Tag the upload with the git commit SHA so every error event links to the exact source code that generated it.
Here’s where teams consistently break this: source maps get out of sync with deployed code. The release identifier doesn’t match. A hotfix deployment skips the upload step. Source map retention expires before anyone investigates the incident. Set retention to at least 90 days. Make the upload step a required CI check, not an optional post-deploy script that someone forgets to wire up.
And never serve source maps publicly in production. They contain your original, unminified source code including comments, variable names, and internal logic. That’s publishing your source repository. Map files belong in your error tracking service’s backend, accessed only during stack trace symbolication.
Error Grouping and Deduplication
A single broken feature generates thousands of identical error events in minutes. Without intelligent grouping, your error dashboard becomes a wall of noise where a critical regression hides among 47 variations of the same Chrome extension error. Teams miss real checkout-breaking bugs for hours because they were buried under extension noise.
Fingerprinting is how error tracking tools decide which events belong to the same issue. The default heuristic groups by error type and top stack frame. This works for simple cases. It breaks badly when the same root cause produces different stack traces depending on the code path. Think: a null reference error that shows up in three different components, all caused by a single missing API field.
Custom fingerprinting rules fix this. In Sentry, define fingerprint rules that group errors by specific attributes: the error message pattern, the failing API endpoint, or a custom tag. Group all TypeError: Cannot read property 'map' of undefined errors from the product listing page into a single issue regardless of which component threw. 300 noisy events become 1 actionable issue.
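The idea behind a custom fingerprint rule can be sketched as a plain function. The event shape, page path, and rule here are invented for illustration; real tools express this as configuration rather than code.

```javascript
// Custom fingerprinting sketch: collapse one root cause that surfaces
// in multiple components into a single issue.
function fingerprint(event) {
  // Rule: group all undefined-.map errors on the product listing page,
  // regardless of which component's stack frame threw.
  if (
    event.type === 'TypeError' &&
    /Cannot read propert(y|ies).*'map'/.test(event.message) &&
    event.page === '/products'
  ) {
    return ['products-page', 'map-of-undefined'];
  }
  // Default heuristic: error type plus top stack frame.
  return [event.type, event.topFrame];
}

const fromGrid = fingerprint({
  type: 'TypeError',
  message: "Cannot read property 'map' of undefined",
  page: '/products',
  topFrame: 'ProductGrid.render',
});
const fromFilter = fingerprint({
  type: 'TypeError',
  message: "Cannot read property 'map' of undefined",
  page: '/products',
  topFrame: 'FilterBar.render',
});
console.log(fromGrid.join(':') === fromFilter.join(':')); // true: one issue, not two
```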
Performance Monitoring in the Browser
With error grouping dialed in, the next layer of frontend observability is performance. Core Web Vitals collection via RUM gives you the field data that Lighthouse’s lab data only approximates. Three metrics matter for both user experience and search ranking:
Largest Contentful Paint (LCP) measures when the main content becomes visible. The threshold is 2.5 seconds. In field data, LCP comes down to two factors: server response time (Time to First Byte) and the load time of the LCP element (usually a hero image or large text block). Serve LCP images with fetchpriority="high" and preload them. This single change moves the needle more than most other optimizations combined.
Interaction to Next Paint (INP) measures responsiveness. The threshold is 200ms. INP captures the worst interaction latency during the session. The usual culprits: long JavaScript tasks blocking the main thread, excessive DOM size causing slow style recalculation, and third-party scripts fighting for execution time. Break long tasks using scheduler.yield() or setTimeout(0) patterns. Mature frontend UX engineering practice treats INP as a first-class performance metric alongside LCP.
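The yield-between-chunks pattern mentioned above can be sketched as follows. The helper name and chunk size are illustrative; scheduler.yield() is not available in every browser, so the sketch falls back to a setTimeout(0) promise.

```javascript
// Sketch of breaking one long task into chunks that yield back to the
// event loop, so pending input events can be handled between chunks.
const yieldToMain = () =>
  globalThis.scheduler?.yield
    ? globalThis.scheduler.yield()          // where the Scheduler API exists
    : new Promise((resolve) => setTimeout(resolve, 0)); // fallback

async function processInChunks(items, handleItem, chunkSize = 50) {
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) handleItem(item);
    await yieldToMain(); // give input handlers a chance to run and paint
  }
}

// Usage: 200 items processed as 4 short tasks instead of one long one.
processInChunks(Array.from({ length: 200 }, (_, i) => i), () => {});
```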
Cumulative Layout Shift (CLS) measures visual stability. The threshold is 0.1. CLS accumulates when elements shift position after the user starts reading. The root causes are almost always the same: images without explicit dimensions, dynamically injected content above the fold, and web fonts triggering a layout shift on load. Use CSS aspect-ratio or explicit width/height attributes on every media element.
Collect these metrics per page, per device class, per geography. The aggregate number is nearly useless. Your LCP might be 1.8 seconds overall but 4.2 seconds for mobile users in India. That disaggregated view is what actually drives web performance improvement decisions. Implementing observability and monitoring for the frontend means treating browser metrics with the same rigor as server-side telemetry.
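Segmentation is a group-by over raw RUM samples. The sample data and field names below are hypothetical; the point is that the per-segment percentile exposes what the aggregate hides.

```javascript
// Segmented P75 sketch over hypothetical LCP samples (milliseconds).
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.75 * sorted.length) - 1]; // nearest-rank
}

function p75BySegment(samples, key) {
  const groups = new Map();
  for (const s of samples) {
    if (!groups.has(s[key])) groups.set(s[key], []);
    groups.get(s[key]).push(s.lcp);
  }
  return new Map([...groups].map(([k, v]) => [k, p75(v)]));
}

const samples = [
  { lcp: 1700, device: 'desktop' }, { lcp: 1800, device: 'desktop' },
  { lcp: 1900, device: 'desktop' }, { lcp: 3900, device: 'mobile' },
  { lcp: 4200, device: 'mobile' }, { lcp: 4600, device: 'mobile' },
];
console.log(p75(samples.map((s) => s.lcp)));  // 4200: one blended number
console.log(p75BySegment(samples, 'device')); // desktop 1900, mobile 4600
```

The same group-by works for geography, browser, or connection type; the disaggregated view is what tells you where to spend optimization effort.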
The Privacy Problem with Session Replay
Session replay records what users do. That dataset contains PII. Names in form fields, email addresses, physical addresses, payment information visible on screen. Without aggressive masking, your session replay database is a compliance liability waiting to become a headline.
Every major replay tool supports element-level masking. The implementation: mask all <input> elements by default. Mask any element with a data-sentry-mask (Sentry), data-lr-hide (LogRocket), or equivalent attribute. For GDPR compliance, masking must happen at recording time, not at playback time. If unmasked data reaches the replay service’s servers, the processing has already occurred regardless of whether anyone views the replay. That distinction matters to regulators.
Text masking replaces visible text content with asterisks or placeholder characters while preserving element dimensions and layout. You can debug layout issues and interaction flows without exposing PII. Network request masking strips sensitive headers and request/response bodies from the recording.
The strictest compliance posture records DOM structure and interactions but replaces all text content with length-matched placeholders. You can still see that the user filled in a form, clicked submit, saw an error, and navigated away. You cannot see what they typed. For most debugging scenarios, the interaction sequence is more diagnostic than the actual content anyway.
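Length-matched masking is simple to sketch. This is a simplified illustration of the recording-time approach, not what any replay SDK actually ships; the node shape is invented for the example.

```javascript
// Recording-time text masking sketch: replace every non-whitespace
// character with a placeholder of identical length, so layout and
// interaction flows survive but content never leaves the browser.
function maskText(text) {
  // Preserve whitespace so line breaks and word spacing still render.
  return text.replace(/\S/g, '*');
}

function maskNode(node) {
  const masked = { ...node, children: (node.children || []).map(maskNode) };
  if (typeof masked.text === 'string') masked.text = maskText(masked.text);
  return masked;
}

const form = {
  tag: 'form',
  children: [
    { tag: 'label', text: 'Email' },
    { tag: 'input', text: 'ada@example.com' },
  ],
};
// The masked value has the same length as the input, so the rendered
// width of the field is preserved in the replay.
console.log(maskNode(form).children[1].text); // 15 asterisks
```

Because the masking runs before events are serialized and sent, the unmasked value never reaches the replay service, which is the distinction regulators care about.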
Alert Fatigue from Frontend Errors
Frontend JavaScript errors in production are absurdly noisy. Browser extensions inject scripts that throw errors your code didn’t cause. Ad blockers remove DOM elements your code references. Automated crawlers execute JavaScript in ways nobody anticipated. If you alert on every unhandled exception, your on-call engineer will disable frontend alerting within a week. It happens like clockwork.
The filtering pipeline that makes frontend alerting survivable: first, discard errors with stack traces containing chrome-extension://, moz-extension://, or safari-extension://. Second, maintain a known-noise fingerprint list for errors that consistently originate from non-application code. Third, separate bot traffic using the User-Agent string and exclude it from error rate calculations.
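The three filtering stages above amount to a short predicate. The event shape, noise fingerprints, and bot pattern here are illustrative assumptions, not any vendor's schema.

```javascript
// Noise-filtering sketch for frontend error events.
const EXTENSION_PROTOCOLS = /(chrome|moz|safari)-extension:\/\//;
const KNOWN_NOISE_FINGERPRINTS = new Set(['adblock-null-node', 'translate-widget']);
const BOT_UA = /(bot|crawler|spider|headless)/i;

function isActionable(event) {
  // 1. Extension-injected code is not your bug.
  if (EXTENSION_PROTOCOLS.test(event.stack || '')) return false;
  // 2. Fingerprints already triaged as non-application noise.
  if (KNOWN_NOISE_FINGERPRINTS.has(event.fingerprint)) return false;
  // 3. Exclude automated traffic from error rate calculations.
  if (BOT_UA.test(event.userAgent || '')) return false;
  return true;
}

console.log(isActionable({ stack: 'at inject (chrome-extension://abc/x.js:1:1)' })); // false
console.log(isActionable({ userAgent: 'Googlebot/2.1' }));                           // false
console.log(isActionable({
  stack: 'at checkout (https://app.example.com/main.js:1:48293)',
  userAgent: 'Mozilla/5.0',
})); // true
```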
After filtering, alert on rate-based thresholds, not individual events. “Frontend error rate increased 5x above the rolling 24-hour baseline for users on the checkout page.” That’s actionable. “TypeError in main.js at line 48293” is noise without context, volume, and impact assessment.
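A rate-based threshold can be sketched as a comparison against the rolling baseline. The multiplier and the minimum-baseline floor are illustrative choices, not a standard.

```javascript
// Rate-threshold sketch: page only when the current error rate exceeds
// a multiple of the rolling 24-hour baseline.
function shouldAlert(currentRatePerMin, baselineRatePerMin, multiplier = 5) {
  // Floor the baseline so a near-zero baseline can't turn one stray
  // error into a page.
  const floor = 0.1;
  return currentRatePerMin >= Math.max(baselineRatePerMin, floor) * multiplier;
}

console.log(shouldAlert(12, 2));  // true: 6x the baseline
console.log(shouldAlert(3, 2));   // false: within normal variation
console.log(shouldAlert(0.4, 0)); // false: tiny absolute volume
```

Scope the rate per page or per flow (checkout, login) so a regression on a low-traffic page isn't drowned out by sitewide volume.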
Correlating Frontend and Backend
This is the highest-value capability in frontend observability, and the one most teams set up last when they should set it up first: connecting a browser-side error to the exact backend request that preceded it. Without this correlation, the frontend team opens a ticket saying “users report errors on checkout” and the backend team responds “all backend metrics are healthy.” Both are correct. Neither can find the root cause alone. This standoff wastes hours every time it happens.
The implementation: inject a trace ID header into every outgoing fetch and XMLHttpRequest from the browser. Sentry and Datadog RUM SDKs do this automatically when they detect the corresponding backend APM agent. The frontend error event carries the trace ID. The backend distributed trace carries the same ID. Click from the frontend error to the backend trace and you see exactly which database query timed out, which service returned an unexpected response, or which middleware rejected the request. One click. Full picture.
For teams not using a commercial tool with automatic correlation, the manual approach takes an afternoon. Create a request interceptor that generates a UUID, attaches it as a custom header (X-Trace-ID), and stores it on the error event context. On the backend, extract that header and attach it to the span context. The correlation is now queryable from either direction.
Building a comprehensive DevOps practice means extending observability from the server into the browser. The frontend is not a separate domain. It’s the final mile of your distributed system, and it deserves the same observability investment as every service behind the load balancer. Teams investing in frontend engineering should treat error tracking, session replay, and RUM as foundational infrastructure. Not optional add-ons you bolt on after the first production fire. By then, the users you lost aren’t coming back.