
Accessibility Engineering at Scale

Metasphere Engineering · 17 min read

You run Lighthouse on your application and it returns a 94 accessibility score. The team marks accessibility as done. Ship it.

Six months later, a screen reader user files a support ticket. The address autocomplete dropdown is invisible to them. The quantity stepper has no name their software can announce. Pressing Escape in the payment modal dumps focus back to the top of the page instead of the button that opened it. Three failures in one checkout flow. Lighthouse saw none of them. It scans a frozen snapshot of the page. Your users navigate a live one. They’re stuck. Focus disappears. The keyboard does nothing.

A Lighthouse score is a building inspection that only checks the lobby. The hallways, the elevators, the emergency exits? Nobody walked those.

Key takeaways
  • A passing automated scan does not mean your site is accessible. Keyboard navigation, screen reader flow, and focus management require manual testing that no scanner can replace.
  • ARIA overuse is more harmful than underuse. Audit any production codebase and most of the ARIA you find is doing harm, not good.
  • Retrofitting accessibility costs far more than building it in. The DOM structure, ARIA patterns, and keyboard handling all need rethinking after the fact.
  • Two screen reader combos are all you need. NVDA + Chrome on Windows and VoiceOver + Safari on macOS. Every interactive component needs both.
  • CI gates are the only defense against regression. A single quarter of unguarded PRs can undo six months of accessibility work.
| What automated scanners catch | What only manual testing catches |
|---|---|
| Missing alt text, empty links | Whether alt text is meaningful in context |
| Contrast ratio violations | Readability under real-world screen conditions |
| Missing form labels, duplicate IDs | Logical focus order through multi-step flows |
| Invalid ARIA attribute values | Whether ARIA announcements make sense in sequence |
| Missing landmark regions | Keyboard use of custom widgets |
| Color-only state indicators | Screen reader behavior differences (NVDA vs VoiceOver) |

The ARIA Paradox: More Is Worse

The more ARIA you add, the worse the screen reader experience gets. The paradox bites hardest when teams try hardest to be accessible.
What users see vs. what screen readers announce, step by step through the checkout form:

| Visual interface | Screen reader announces | Verdict |
|---|---|---|
| Full Name field ("Enter your name...") | "Full Name, edit text, blank" | PASS. Label correctly associated via for/id |
| Quantity stepper (− 3 +) | "clickable". No role, no accessible name | FAIL. div with onclick; the screen reader cannot operate it. Needs role="spinbutton" + aria-label |
| Shipping Method ("Standard Delivery") | "Standard Delivery, collapsed", but arrow keys do nothing | FAIL. Has role="combobox" but no keyboard handlers. ARIA says operable; it is not |
| Place Order button | "Place Order, button". After click: silence, and focus jumps to body | FAIL. Success toast is visual only, no aria-live region. The user does not know it worked |

Lighthouse: 94/100. Real screen reader test: 3 of 4 controls broken. The scanner found one issue; manual testing found three. Automated scanners catch roughly 30% of issues. The rest requires testing with real assistive technology. Screen reader + keyboard testing is not optional. It is where the real accessibility bugs live.

ARIA fills gaps where HTML falls short. A custom dropdown built from div elements? It needs role="listbox", aria-expanded, and aria-activedescendant so the screen reader knows what it’s looking at. Get the ARIA right and custom widgets become usable. Get it wrong and the screen reader confidently announces something untrue. GPS telling you to turn left into a lake. Worse than no directions at all.
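A sketch of what "getting the ARIA right" means for that div-based dropdown. The element ids are illustrative, and this is the listbox-behind-a-button flavor of the pattern; every attribute shown is a promise the accompanying script must keep in sync:

```html
<!-- Hypothetical custom dropdown markup. aria-expanded flips on open/close,
     aria-activedescendant tracks arrow-key movement, aria-selected tracks choice. -->
<button id="ship-btn" aria-haspopup="listbox" aria-expanded="false">
  Shipping Method
</button>
<ul id="ship-list" role="listbox" aria-labelledby="ship-btn"
    aria-activedescendant="opt-standard" tabindex="-1" hidden>
  <li id="opt-standard" role="option" aria-selected="true">Standard Delivery</li>
  <li id="opt-express" role="option" aria-selected="false">Express</li>
</ul>
```

If any one of those attributes goes stale, the screen reader announces the untruth with full confidence.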

The first rule, straight from the W3C ARIA spec: don’t use ARIA if a native HTML element already does the job.

[Figure: ARIA decision tree, use native HTML first. Need an interactive element? If a native HTML element exists (button, select, input, dialog), use it, adding ARIA only for extra state such as aria-expanded or aria-pressed; otherwise no ARIA is needed. If no native element exists, take the hard path: add the role and ARIA attributes, add keyboard handlers, add focus management, and test with a screen reader. Four steps, none optional.]

Audit your component library for ARIA density. Components with more than 3-4 ARIA attributes per element are over-decorated. ARIA is seasoning, not the main course. Most production codebases use it like they’re salting for deer.
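One cheap way to run that audit is a density scan over component markup. A rough heuristic sketch, not a real HTML parser; the function name is mine and the threshold mirrors the 3-4 figure above:

```javascript
// Rough ARIA density audit for static component markup.
// Regex-based heuristic (illustrative sketch), not a real HTML parser.
function overDecorated(html, threshold = 4) {
  const openingTags = html.match(/<[a-zA-Z][^>]*>/g) || []; // opening tags only
  return openingTags
    .map((tag) => ({
      tag,
      ariaCount: (tag.match(/\baria-[a-z]+=/g) || []).length, // aria-* attributes
    }))
    .filter((t) => t.ariaCount >= threshold); // flag the over-seasoned elements
}
```

Run it across your component library's rendered output and the worst offenders surface immediately.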

Common ARIA anti-patterns:

| Don’t | Do | Why |
|---|---|---|
| <div role="button"> | <button> | Native element gets keyboard, focus, and announcements for free |
| <a href="/page" role="button"> | <a href="/page"> or <button onclick> | Links navigate. Buttons act. Mixing them lies to the screen reader |
| <input aria-label="Name"> with visible label | <label for="name">Name</label> | Two names fighting each other. The visible label gets ignored |
| aria-hidden="true" on focusable element | Remove from tab order first | Hidden from the screen reader but keyboard still reaches it. An invisible trap door |

Keyboard Navigation Architecture

Keyboard accessibility isn’t slapping tabindex="0" on everything. It’s a real architecture decision. Give it the same thought you give routing or state management.

Tab order must follow visual reading order. When CSS Grid or Flexbox reorders elements visually, the two fall out of sync. Focus jumps across the layout. Floor 1, floor 7, floor 3. Willy Wonka’s elevator, but nobody’s having fun. The fix isn’t tabindex values. It’s making the DOM match what the eye sees.

For complex controls like tab panels, menu bars, and data grids, the roving tabindex pattern gives the whole widget a single Tab stop. Arrow keys move between items inside it. WAI-ARIA Authoring Practices spell out the expected keyboard behavior for each widget type.
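The arrow-key movement those practices describe reduces to index math. A minimal sketch of the pure part (the function name is mine; wiring it to keydown handlers and tabindex updates is omitted):

```javascript
// Roving tabindex: one Tab stop for the whole widget, arrow keys move within it.
// Pure index calculation; a real widget would also swap tabindex="0"/"-1"
// between items and call .focus() on the new one.
function nextRovingIndex(current, itemCount, key) {
  switch (key) {
    case 'ArrowRight':
    case 'ArrowDown':
      return (current + 1) % itemCount;             // wrap forward at the end
    case 'ArrowLeft':
    case 'ArrowUp':
      return (current - 1 + itemCount) % itemCount; // wrap backward at the start
    case 'Home':
      return 0;                                     // jump to first item
    case 'End':
      return itemCount - 1;                         // jump to last item
    default:
      return current;                               // unrelated key: stay put
  }
}
```

On each move, the old item gets tabindex="-1" and the new one gets tabindex="0" plus focus(), so Tab later re-enters the widget at the last active item.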

[Figure: Keyboard navigation, Tab stops for major regions. Skip link first (hidden until focused, jumps to main content); navigation (role=navigation, menu items focusable, arrow keys within the menu); main content (role=main, headings as landmarks, interactive elements only); forms (labels linked to inputs, errors announced via aria-describedby); footer (role=contentinfo) last. Tab order follows visual order, skip link at the top, every interactive element reachable.]
SPA focus management is where every frontend framework falls apart. The router renders new content. The browser doesn’t move focus. The keyboard user is stranded on the nav link they clicked. They walked through a door into a new room, but the lights didn’t come on. Tab through the entire page again to reach the content they came for. Every. Single. Time.

React Router, Vue Router, SvelteKit. None of them handle this out of the box. Three major frameworks, zero focus management. Impressive, in the worst way.

// Route change focus management - React example
// (assumes `location` comes from React Router's useLocation() hook)
useEffect(() => {
  const heading = document.querySelector('h1');
  if (heading) {
    heading.setAttribute('tabindex', '-1'); // make the heading programmatically focusable
    heading.focus();
  }
  // Announce new page to screen readers
  const announcer = document.getElementById('route-announcer');
  if (announcer) announcer.textContent = document.title;
}, [location.pathname]);

The design systems engineering guide covers building keyboard patterns into component libraries from the start. That way teams don’t reinvent focus management in different, broken ways.

Route changes are one kind of focus disruption. Modals are another, and they’re sneakier.

Focus Trapping in Modals and Overlays

The <dialog> element handles focus trapping natively when you open it with showModal(). It creates a top layer. Tab and Shift+Tab cycle inside the modal. Focus returns on close. Just use it.
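A minimal sketch of that usage, with illustrative ids:

```html
<!-- Native dialog: top layer, Tab trapping, and Escape handling for free -->
<button id="open-pay">Pay now</button>
<dialog id="pay-modal" aria-labelledby="pay-title">
  <h2 id="pay-title">Payment</h2>
  <button id="close-pay">Cancel</button>
</dialog>
<script>
  const modal = document.getElementById('pay-modal');
  document.getElementById('open-pay').addEventListener('click', () => {
    modal.showModal(); // modal mode: traps Tab, enables Escape, shows backdrop
  });
  document.getElementById('close-pay').addEventListener('click', () => {
    modal.close(); // per the dialog spec, focus returns to the previously focused element
  });
</script>
```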

Stuck with <div>-based modals? Then you’re tracking every focusable element yourself, wrapping Tab at the edges, handling Escape, updating the trap when lazy content adds new fields. A room where the door is supposed to lock behind you, but you built the lock yourself and missed three windows. Every team underestimates this. (Every team.)
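The core of that hand-rolled trap is deciding where Tab goes at the edges. A sketch of just that decision (the function name is mine; querying the focusable elements, and re-querying when lazy content loads, is the part teams miss):

```javascript
// Hand-rolled focus trap: wrap Tab at the edges of the modal's focusable list.
// `focusables` must be re-queried whenever the modal's content changes.
function trapTarget(focusables, active, shiftKey) {
  const first = focusables[0];
  const last = focusables[focusables.length - 1];
  if (shiftKey && active === first) return last;  // Shift+Tab on first wraps to last
  if (!shiftKey && active === last) return first; // Tab on last wraps to first
  return null; // null means: let the browser move focus normally
}
```

In the keydown handler: on Tab, call trapTarget with document.activeElement; if it returns an element, preventDefault() and focus it.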

The inert attribute helps both approaches. Apply it to the main content wrapper when a modal opens. The browser pulls all background elements out of the tab order and the accessibility tree. The rest of the building goes dark while the modal has your attention. If you’re building new modals without <dialog> + inert, you’re reinventing the wheel. And it’s on fire.

[Figure: Modal focus trap, keep focus inside the dialog. Dialog opens on button click; focus moves to the first interactive element and the background goes inert; Tab at the last element wraps to the first, so focus never escapes; Escape closes and focus returns to the trigger that opened the dialog. Focus in, trap inside, Escape out, return to trigger. Every modal, every time.]

Color Contrast Beyond WCAG AA

WCAG AA requires 4.5:1 contrast for normal text and 3:1 for large text. These are minimums, and building to minimums is like studying just enough to pass. You won’t.

The 4.5:1 threshold assumes perfect conditions. Properly adjusted screen. Good lighting. Normal vision. The real world has screen glare, low brightness, aging displays, and roughly 8% of males with some form of color vision deficiency. All of that chips away at the contrast your users actually see. Build to AAA ratios (7:1 for normal text, 4.5:1 for large text) and you get breathing room that survives actual conditions.

Contrast isn’t just text. Focus indicators, selected states, disabled states all need contrast too. WCAG 2.1 added a 3:1 non-text contrast requirement for UI components. Most teams only check body text and call it done. That’s checking the front door paint while ignoring the fire escape signs.

| Element Type | WCAG AA Minimum | WCAG AAA Target | Real-World Note |
|---|---|---|---|
| Normal text (<18px) | 4.5:1 | 7:1 | 4.5:1 fails under screen glare, low brightness, or display aging. Target 7:1 for body text |
| Large text (18px+ or 14px bold) | 3:1 | 4.5:1 | Larger glyphs compensate for lower contrast. Still test on mobile screens |
| UI components (icons, borders, focus rings) | 3:1 | N/A | Non-text elements that convey meaning. Often overlooked in audits |
| Decorative text | No requirement | N/A | Purely decorative elements are exempt. But if it carries meaning, it needs contrast |
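Those ratios come from WCAG's relative luminance formula, which is mechanical enough to automate. A small self-contained checker (function names are mine; colors as [r, g, b] in 0-255):

```javascript
// WCAG 2.x relative luminance of an sRGB color given as [r, g, b] in 0-255.
function relativeLuminance([r, g, b]) {
  const chan = (c) => {
    const s = c / 255; // normalize to 0-1
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * chan(r) + 0.7152 * chan(g) + 0.0722 * chan(b);
}

// Contrast ratio between two colors: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}
```

Black on white yields the maximum 21:1; the classic #777 gray on white lands just under 4.5:1, which is exactly the kind of "looks fine, fails AA" result worth catching in CI.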

Never rely on color alone to show state. A red error border and a green success border look the same to someone with red-green color blindness. Your UI is playing charades with 8% of the male population. Pair color with icons, text labels, or pattern changes. Every time.

None of this survives if the next sprint’s PRs quietly break what you’ve built.

CI Pipeline Accessibility Gates

A team spends a quarter building accessible components. The next quarter, twelve PRs land without accessibility review. A refactored modal loses its focus trap. A CSS change breaks focus indicator contrast. None trigger a test failure. Six months of work, quietly undone. The inspectors visited during construction and never came back. Tenants knocked out load-bearing walls. Entropy always wins unless you automate the fight.

CI gates are the only reliable defense.

Prerequisites
  1. axe-core integrated into component test runner (jest-axe for React, equivalent for Vue/Angular)
  2. Playwright or Cypress configured with @axe-core/playwright for full-page integration scans
  3. Storybook accessibility addon running against every story variation, including disabled and loading states
  4. PR template includes accessibility acceptance criteria as checkable items
  5. At least two team members can test with NVDA and VoiceOver for manual review

The baseline: axe-core in your test runner, catching the 30-40% of issues that scanners can find. jest-axe for component unit tests, Playwright’s @axe-core/playwright for full-page tests. Both on every PR. No exceptions.
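One way to wire both into a pipeline. A hedged sketch: the job name and npm scripts are placeholders for whatever your repo defines, and only the shape matters (every PR, fail on violation):

```yaml
# Illustrative GitHub Actions gate. "test:unit" and "test:e2e" stand in for
# your own scripts running jest-axe and @axe-core/playwright respectively.
accessibility:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npm run test:unit                    # component scans (jest + jest-axe)
    - run: npx playwright install --with-deps   # browsers for page-level scans
    - run: npm run test:e2e                     # full-page scans (@axe-core/playwright)
```

Mark the job as a required status check and a violation physically cannot merge.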

The step most teams skip? Storybook accessibility testing. The @storybook/addon-a11y runs axe-core against every story variation. A button might pass in its default state but fail when disabled (not enough contrast) or loading (missing aria-busy). Skipping this means you’re testing the sunny day while shipping the thunderstorm.

Beyond scanning, accessibility acceptance criteria belong in the definition of done. “Focus moves to modal content on open and returns to trigger on close” is a testable requirement. Put it in the PR template. The PR doesn’t merge until it’s verified. No checkbox, no merge. Simple as that.

[Figure: Accessibility CI gates, automated plus manual layers. PR opened (component changed, triggers the pipeline); axe-core scan (automated WCAG checks, catches 30-40% of issues, blocks the PR on violations); Storybook a11y (component-level checks, another 10-15%, visual indicator in dev); manual review (keyboard and screen reader, the remaining 45-60%, required for new components). Automated catches the obvious. Manual catches the important. You need both layers.]
The Accessibility Testing Pyramid: scanners at the base, Storybook checks per state, Playwright keyboard tests for focus architecture, manual screen reader checks at the top. Each layer catches what the layers below miss. Skip one and the gaps add up silently.

Same model as security scanning in DevOps: every PR, blocks merge, clear ownership.

Screen Reader Testing That Catches Real Issues

Scanners catch structure. Screen readers catch experience. Big difference.

Two test combos cover the most common screen reader/browser pairings: NVDA + Chrome on Windows, VoiceOver + Safari on macOS. Each reads ARIA differently. Timing on live region announcements, detail level on dynamic content. What works in NVDA will surprise you in VoiceOver. Same spec, different accents.

The testing workflow: open the page with the screen reader active. Navigate using only the keyboard. Does it announce the role, name, and state? Interact with each control. Does the announcement update? Trigger dynamic content. Does the live region fire?

Write down what each screen reader says at each step. This transcript is your real accessibility spec. If NVDA says “button, Submit Order, expanded” and VoiceOver says “Submit Order, collapsed, button,” you’ve got a state sync bug that no automated tool will ever find. Two witnesses to the same event telling different stories.

For teams scaling accessibility UX engineering across multiple products, a shared library of screen reader transcripts per component type keeps teams from solving the same puzzles twice.

Live Regions and Dynamic Content

SPAs change content without page reloads. Sighted users see the change. Screen reader users hear nothing unless you announce it through a live region. The PA system in your building. If it’s not wired up, nobody outside the room knows what just happened.

aria-live="polite" announces when the screen reader finishes its current task. Use for non-urgent updates. aria-live="assertive" interrupts right away. Save it for errors and urgent alerts. Using “assertive” for a toast notification is like pulling the fire alarm to announce lunch.

The pattern: keep a visually hidden live region in your application root. Don’t add live regions on the fly. Some screen readers ignore regions added after the page loads.

<!-- In your app root - must exist before content changes -->
<div id="a11y-announcer" aria-live="polite" class="sr-only"></div>
// Announce after async operation completes
function announce(message) {
  const el = document.getElementById('a11y-announcer');
  el.textContent = ''; // Clear first to trigger change
  requestAnimationFrame(() => { el.textContent = message; });
}

// Usage
announce('3 search results loaded');
announce('Form saved successfully');
The Announcement Race Condition: move focus and update a live region at the same time? Some screen readers announce the focus change and swallow the live region update. Two PA announcements at once. The second one gets cut off. Add a small delay (100-200ms) between focus change and live region update. Both need room to breathe.
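One way to sketch that sequencing. The function name and the 150ms gap are illustrative; the scheduler is injectable so the ordering can be verified without a browser:

```javascript
// Sequence a focus move, a live-region clear, then a delayed announcement.
// `schedule` defaults to a ~150ms gap; inject a different scheduler in tests.
function focusThenAnnounce(target, liveRegion, message,
                           schedule = (fn) => setTimeout(fn, 150)) {
  target.focus();               // 1. the focus change announces first
  liveRegion.textContent = '';  // 2. clear so the next write registers as a change
  schedule(() => {              // 3. announce after the screen reader settles
    liveRegion.textContent = message;
  });
}
```

The same three-step shape works for route changes, toasts, and form save confirmations.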

When Full Accessibility Engineering Is Overkill

Not every project needs the full treatment. Static sites with no interactive components can get close with automated scanning alone.

| Full accessibility engineering needed | Automated scanning is enough |
|---|---|
| SPAs with client-side routing | Static content sites (blogs, docs) |
| Custom interactive widgets (comboboxes, data grids) | Pages using only native HTML elements |
| Multi-step forms and checkout flows | Simple contact forms with native inputs |
| Dynamic content updates (live feeds, notifications) | Server-rendered pages with full page reloads |
| Applications targeting regulated industries | Internal tools with small, known user base |

If your site has a single custom dropdown, you’ve crossed the line. Test it with a screen reader. Dropdowns are the Bermuda Triangle of accessibility.

What the Industry Gets Wrong About Accessibility Engineering

“Run Lighthouse and fix the red flags.” Lighthouse catches 30-40% of issues. The 60-70% it misses are the ones that actually block users: broken focus management, nonsensical screen reader announcements, custom widgets that ignore the keyboard. Checking the lobby and calling the building safe.

“ARIA makes everything accessible.” ARIA is a last resort, not a first tool. Native HTML elements give you keyboard handling, focus management, and screen reader announcements for free. Most ARIA in production codebases is pointless or harmful. More signs doesn’t mean better wayfinding. Sometimes it means nobody can read any of them.

“Accessibility is a design problem.” It’s an engineering architecture problem. Keyboard focus flow, live region timing, focus trap implementation, route change management in SPAs. Design sets contrast ratios and color usage. Engineering decides whether a keyboard user can finish a purchase.

Our take Start with keyboard navigation, not screen readers. If every interactive element is reachable and works by keyboard alone, a surprising amount of the screen reader experience just falls into place. Teams that jump to screen reader testing before keyboard architecture is solid? They end up chasing symptoms instead of fixing the structure underneath. Get the hallways right and most of the signage takes care of itself.
Cost comparison: building accessible vs. retrofitting
| Approach | Dev Time | Risk | Long-term Cost |
|---|---|---|---|
| Built accessible from the start | Small overhead per component | Low | Baseline |
| Retrofitted after launch | Rework DOM, ARIA, keyboard handlers | High (breaks things) | Far more expensive |
| “Fix Lighthouse findings only” | Cosmetic patches only | Very high (false sense of safety) | Lawsuit + full retrofit later |
The Retrofit Multiplier: retrofitting costs many times more than building it right. And it stacks across a design system. 80 components. Each one needs auditing, rebuilding, and testing for breakage. You’re adding an elevator to a building designed without the shaft. You’re ripping open walls, rerouting plumbing, rebuilding entire floors. Teams on a deadline fix the automated findings and leave keyboard and screen reader issues untouched. Leadership thinks it’s solved. Nowhere close.

Web application engineering that treats accessibility as architecture avoids the retrofit entirely. The simplest culture shift? Tab through your own component before you ask for review. If focus jumps somewhere unexpected, the component isn’t done. Make it as automatic as running the test suite.

That Lighthouse 94? Three checkout failures hiding behind it, all findable with a single Tab key. Build the elevator shaft into the blueprint, or pay to tear the building apart later.

Stop Retrofitting. Build Accessible From Day One.

Retrofitting accessibility costs far more than building it in from the start. Engineer it into your component architecture, CI pipelines, and testing workflows. WCAG compliance becomes a build artifact, not a quarterly audit scramble.


Frequently Asked Questions

What percentage of accessibility issues can automated scanners detect?


Automated scanners like axe-core and Lighthouse catch roughly 30-40% of WCAG 2.1 AA violations. They find missing alt text, poor color contrast, missing form labels, and duplicate IDs. They can’t tell whether alt text is meaningful, whether keyboard focus order makes sense, whether screen reader announcements are clear, or whether custom widgets work without a mouse. The remaining 60-70% needs manual testing with a screen reader.

How much more does it cost to retrofit accessibility vs building it in?


Retrofitting accessibility costs far more than building it in. Three costs compound: auditing every existing component to find violations, reworking DOM structure and ARIA patterns that were never built for screen readers, and testing every fix against everything that depends on it. Building accessible from the start adds a small overhead per component. Retrofitting that same component means tearing it apart while keeping everything around it working.

What is the most common ARIA mistake that breaks screen reader experience?


Overusing ARIA is more harmful than underusing it. The most common mistake is adding aria-label or role attributes to elements that already have built-in meaning. A button element with role=‘button’ is pointless. An input with both a visible label and aria-label creates two names fighting each other. The first rule of ARIA is: if a native HTML element does what you need, use it. Audit any production codebase and most ARIA you find is unnecessary or actively doing harm.

How do you test keyboard navigation in a single-page application?


Test three critical paths: Tab order must follow visual reading order through every route, focus must move to new content when routes change (typically to the h1 or a skip-link target), and focus traps in modals must stop users from tabbing to background content while returning focus to the trigger button on close. Playwright’s keyboard API automates these checks. Tab through every interactive element on each route and check that the focus sequence matches the expected order. Automated keyboard tests catch most focus management bugs before they reach production.

What WCAG color contrast ratio should engineering teams target?


Target AAA ratios of 7:1 for body text and 4.5:1 for large text, even though AA only requires 4.5:1 and 3:1. The reason is practical: designs that barely pass AA at 4.5:1 fail for users with mild vision loss who don’t use screen readers but struggle with low contrast. Roughly 8% of males have some form of color vision deficiency. Building to AAA creates breathing room that survives real-world conditions: screen glare, low brightness settings, and aging displays.