How xa11y Is Tested

xa11y drives the real accessibility tree of a running application. A mock cannot meaningfully stand in for that tree, so beyond unit tests of the pure logic, the library is verified the same way it is used: real GUI apps built on real toolkits are launched, and the tests assert against the trees their platform accessibility APIs actually expose. This page maps that test suite: what runs, and against what.

If you are deciding whether to depend on xa11y, this is the page that answers “how do I know it works?”

The suite at a glance

The library carries 700+ automated tests spanning Rust, Python, JavaScript, and the CLI, split roughly as follows:

Suite	Approx. tests	Location	What it covers
Rust unit	~60	`xa11y/tests/unit_test.rs`	Selector engine, locators, roles, state sets, error mapping, serialization (against an in-memory mock provider)
Rust integration	~150	`xa11y/tests/integ/`	Live tree reads, action dispatch, events, screenshots against a running app
Python binding unit	~240	`xa11y-python/tests/`	The PyO3 surface: types, actions, exceptions, GIL release, subscriptions
Python integration	~100	`tests/suites/python/`	Per-app compat, actions, events, errors, input simulation, screenshots
CLI integration	~30	`tests/suites/cli/`	The `xa11y` CLI end-to-end against running apps
JavaScript unit	~115	`xa11y-js/__test__/unit/`	The napi-rs binding surface
JavaScript integration	~40	`tests/suites/js/`	Per-app compat, actions, input simulation, screenshots

Counts are approximate and grow over time, so treat the figures above as the rough shape of the suite rather than exact totals. The authoritative coverage index, recording which app is tested in which language for which feature, lives in tests/matrix.yaml.

Tested against real toolkits, not stubs

The integration suites run against a fleet of small test apps, each built on a different GUI toolkit, and each launched as a real process whose accessibility tree xa11y reads through the platform API. This is the part that matters: a passing test means xa11y handled a real AT-SPI2, AXUIElement, or UI Automation tree, not a fixture shaped to match the code.
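Before any assertion can run, a harness has to start the app and wait until its tree is actually populated. A minimal sketch of that launch-then-poll pattern follows. `wait_until` and `launch_and_wait` are hypothetical names, not the contents of tests/harness/launch.py, and the caller-supplied `is_ready` predicate stands in for whatever platform accessibility query a real harness would make (for example, checking that the app's window has appeared in the tree).

```python
import subprocess
import time
from typing import Callable

# Hypothetical helpers illustrating the pattern; not xa11y's harness API.


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.05) -> None:
    """Poll `predicate` until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")


def launch_and_wait(argv: list[str],
                    is_ready: Callable[[subprocess.Popen], bool],
                    timeout: float = 10.0) -> subprocess.Popen:
    """Start the app under test and block until `is_ready` says its tree is usable."""
    proc = subprocess.Popen(argv)
    try:
        def ready() -> bool:
            if proc.poll() is not None:
                # The app died during startup: fail now instead of timing out later.
                raise RuntimeError(f"app exited early with code {proc.returncode}")
            return is_ready(proc)

        wait_until(ready, timeout=timeout)
    except BaseException:
        # Never leave a half-started app running behind a failed launch.
        proc.kill()
        proc.wait()
        raise
    return proc
```

Checking `proc.poll()` on every iteration makes a crashed app fail immediately with its exit code instead of surfacing later as an opaque timeout.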

Test app	Toolkit	Linux	macOS	Windows
`accesskit`	Rust + winit (AccessKit)	✓	✓ †	✓
`qt`	PySide6 (Qt)	✓	✓	✓
`gtk`	GTK4 (PyGObject)	✓	○	○
`cocoa`	Swift / AppKit	○	✓	○
`tauri`	Rust + WebView	✓	✓	✓
`electron`	Chromium / Node	✓	○	○
`egui`	Rust immediate-mode (eframe)	✓	✓	✓
`winforms`	.NET Windows Forms	○	○	✓
`wpf`	.NET WPF	○	○	✓

A hollow circle (○) means the combination is not run in CI. † AccessKit on macOS runs locally through ./scripts/run_integ_tests_macos.sh rather than in CI, because the Rust integ suite needs an Accessibility (TCC) grant, and hosted macOS runners cannot hold that grant for a test binary whose cargo-generated path contains a hash.

Tauri is the app carrying the input-simulation suite, so its three platform cells are what give each platform’s input backend an end-to-end test. Each of those platform claims is checked automatically: tests/matrix_check.py reads the CI matrix out of the workflow and fails the build when a declared platform has no cell running it.
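At its core, a check like the one tests/matrix_check.py performs is a set difference between the platforms declared for each app and the OS × app cells the workflow actually runs. The sketch below assumes both have already been loaded into plain Python data; `missing_cells`, `check`, and the data shapes are hypothetical, and the real script's parsing of the workflow file is not shown.

```python
import sys

# Hypothetical data shapes: app -> runner OSes it claims, and the (os, app)
# cells actually present in the workflow's integ matrix.
Declared = dict[str, set[str]]
Cells = set[tuple[str, str]]


def missing_cells(declared: Declared, ci_cells: Cells) -> list[tuple[str, str]]:
    """Every (os, app) pair that is declared but has no CI cell running it."""
    return sorted(
        (os_name, app)
        for app, oses in declared.items()
        for os_name in oses
        if (os_name, app) not in ci_cells
    )


def check(declared: Declared, ci_cells: Cells) -> int:
    """Print one error per gap and return a process exit status (non-zero fails CI)."""
    missing = missing_cells(declared, ci_cells)
    for os_name, app in missing:
        print(f"error: {app} declares {os_name} but no CI cell runs it", file=sys.stderr)
    return 1 if missing else 0
```

In a CI step, a script built this way would end with `sys.exit(check(declared, cells))`, so a single undeclared gap is enough to fail the job.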

Across those rows, xa11y is exercised against all three platform accessibility backends (AT-SPI2 on Linux, AXUIElement on macOS, UI Automation on Windows) and against toolkits that report their trees in meaningfully different ways: retained-mode native widgets, immediate-mode GUIs, two distinct web renderers, and, through Windows Forms and WPF, two first-party Microsoft UIA providers rather than third-party bridges. The toolkit-specific quirks this surfaces are documented in Accessibility Quirks.

One suite, every binding

The per-app integration tests aren’t duplicated per language. Python, JavaScript, and the CLI run the same feature suites (compat, actions, events, errors, input simulation, screenshots) against the same running app, driven by a shared harness (tests/harness/launch.py). Binding parity is proven by every binding passing identical scenarios against an identical target, rather than asserted by hand.
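The parity idea can be sketched as a cross product: each scenario is written once against a small driver interface and run against every binding's driver in turn. Everything here is hypothetical: `Driver`, `FakeDriver`, `run_all`, and the counter scenario are illustrative stand-ins rather than the shared harness's API, and a real driver would call into xa11y's Python, JS, or CLI surface against the running app.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# Hypothetical driver interface and scenario; not xa11y's harness API.


class Driver(Protocol):
    """Minimal surface each binding adapter would expose."""
    name: str

    def press(self, button_label: str) -> None: ...
    def read_text(self, label: str) -> str: ...


@dataclass
class FakeDriver:
    """In-memory stand-in for a binding: 'Increment' bumps a counter."""
    name: str
    count: int = 0

    def press(self, button_label: str) -> None:
        if button_label != "Increment":
            raise LookupError(f"no button labelled {button_label!r}")
        self.count += 1

    def read_text(self, label: str) -> str:
        # The fake has a single text element, so `label` is ignored.
        return f"Count: {self.count}"


def scenario_press_increments(driver: Driver) -> None:
    driver.press("Increment")
    assert driver.read_text("counter") == "Count: 1", driver.name


SCENARIOS: list[Callable[[Driver], None]] = [scenario_press_increments]


def run_all(factories: dict[str, Callable[[], Driver]]) -> dict[tuple[str, str], str]:
    """Run every scenario against a fresh driver for every binding."""
    results = {}
    for binding, make_driver in factories.items():
        for scenario in SCENARIOS:
            try:
                scenario(make_driver())
                results[(binding, scenario.__name__)] = "pass"
            except AssertionError:
                # Only assertion failures count as "fail"; other errors propagate.
                results[(binding, scenario.__name__)] = "fail"
    return results
```

Because each scenario receives a fresh driver, one binding's state cannot leak into another's run, and a divergence shows up as a single failing (binding, scenario) pair.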

The CI matrix

Every push runs the suites across operating systems and toolkits in GitHub Actions (.github/workflows/ci.yml). The integ job alone is a matrix of fifteen OS × app cells: ubuntu × {accesskit, qt, gtk, tauri, electron, egui}, macos × {cocoa, tauri, qt, egui}, and windows × {tauri, qt, egui, winforms, wpf}. Each cell stands up a headless display, an accessibility bus, and the app under test before running the Python, JS, and CLI suites against it. Alongside it run the Rust unit and integration jobs per OS, two wire-level input jobs (Linux uinput read back through libevdev, Windows SendInput read back through low-level hooks), binding builds and typechecks, cross-compile checks, license auditing, a docs build with link checking, and the fuzzers below.
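The sketch below enumerates the fifteen cells from the per-OS app lists above and shows one way a cell could order its work, assuming display, then accessibility bus, then app, then suites, with teardown in reverse. `contextlib.ExitStack` guarantees that reverse-order teardown even if a suite raises. The `step` and `run_cell` helpers are hypothetical stand-ins that only log; a real cell would start actual services (on Linux, for instance, a virtual X server and the AT-SPI bus), and this is not how ci.yml implements it.

```python
from contextlib import ExitStack, contextmanager
from typing import Iterator

# OS × app lists from the text above: 6 + 4 + 5 = 15 cells.
INTEG_APPS = {
    "ubuntu": ["accesskit", "qt", "gtk", "tauri", "electron", "egui"],
    "macos": ["cocoa", "tauri", "qt", "egui"],
    "windows": ["tauri", "qt", "egui", "winforms", "wpf"],
}
CELLS = [(os_name, app) for os_name, apps in INTEG_APPS.items() for app in apps]


@contextmanager
def step(log: list[str], name: str) -> Iterator[None]:
    """Hypothetical stand-in for a real resource (display, a11y bus, app process)."""
    log.append(f"start {name}")
    try:
        yield
    finally:
        log.append(f"stop {name}")


def run_cell(os_name: str, app: str, log: list[str]) -> None:
    with ExitStack() as stack:
        # Each dependency is entered only once the previous one is up;
        # ExitStack unwinds them in reverse order, even on failure.
        stack.enter_context(step(log, "display"))
        stack.enter_context(step(log, "a11y-bus"))
        stack.enter_context(step(log, f"app:{app}"))
        for suite in ("python", "js", "cli"):
            log.append(f"run {suite} suite on {os_name}/{app}")
```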

Fuzzing and robustness

The selector engine and tree operations are fuzzed on every CI push with libFuzzer targets in xa11y/fuzz/, and a separate live provider fuzzer (xa11y-fuzz/) stress-tests the provider interface against a running app with randomized action sequences.
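The core idea of a live provider fuzzer, randomized action sequences checked against invariants, can be sketched with a seeded RNG so any failure is reproducible from its seed. This illustrates the technique only and is not xa11y-fuzz's code: `ToyProvider`, `ActionError`, its actions, and the invariant are hypothetical, and a real run would drive a live app through the provider interface.

```python
import random
from dataclasses import dataclass

# Hypothetical toy provider and error type; not xa11y's provider interface.


class ActionError(Exception):
    """The only failure mode the toy provider is allowed to surface."""


@dataclass
class ToyProvider:
    """Stand-in for a provider exposing a checkbox and a text field."""
    checked: bool = False
    text: str = ""

    def toggle(self) -> None:
        self.checked = not self.checked

    def set_value(self, value: str) -> None:
        if len(value) > 8:
            raise ActionError("value too long")
        self.text = value

    def activate_stale(self) -> None:
        raise ActionError("element no longer exists")


def random_action(rng: random.Random) -> tuple[str, tuple]:
    kind = rng.choice(["toggle", "set_value", "activate_stale"])
    if kind == "set_value":
        value = "".join(rng.choice("abc") for _ in range(rng.randint(0, 12)))
        return kind, (value,)
    return kind, ()


def fuzz(seed: int, steps: int = 200) -> list[tuple[str, tuple]]:
    """Run one randomized sequence; any exception other than ActionError is a bug."""
    rng = random.Random(seed)  # same seed, same sequence
    provider = ToyProvider()
    trace = []
    for _ in range(steps):
        name, args = random_action(rng)
        trace.append((name, args))
        try:
            getattr(provider, name)(*args)
        except ActionError:
            pass  # a typed, documented failure is acceptable
        # Invariant: state stays well-formed whatever the sequence was.
        assert len(provider.text) <= 8, f"seed {seed}: invariant broken"
    return trace
```

Returning the trace means a failing seed can be replayed and shrunk by hand.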

Correctness is a stated discipline

The breadth above is backed by a small set of firm engineering tenets that every new piece of provider or binding code is held to: no silent fallbacks, action fidelity (an advertised action invokes the real platform action, never a substitute), errors that carry their own diagnosis, and blocking calls that release the host runtime’s lock. These are written out in full, with anti-patterns, in Architecture & Design and restated verbatim in the repo’s CLAUDE.md so human reviewers and automated ones apply them identically.
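To make “no silent fallbacks” and “errors that carry their own diagnosis” concrete, here is a sketch of the shape those tenets imply. The names `ActionNotSupported`, `Element`, and `perform` are hypothetical and do not describe xa11y's actual API. The point is that when an element does not advertise an action, the call fails with an error stating what was asked for and what is available, instead of quietly substituting another action.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types illustrating the tenet; not xa11y's API.


class ActionNotSupported(Exception):
    """Raised instead of silently substituting a different action."""

    def __init__(self, action: str, role: str, advertised: list[str]):
        self.action, self.role, self.advertised = action, role, advertised
        super().__init__(
            f"{role} does not advertise {action!r}; "
            f"advertised actions are {sorted(advertised)}"
        )


@dataclass
class Element:
    role: str
    actions: dict[str, Callable[[], object]] = field(default_factory=dict)


def perform(element: Element, action: str) -> object:
    handler = element.actions.get(action)
    if handler is None:
        # The anti-pattern would be to fall back to, say, a synthetic click here.
        raise ActionNotSupported(action, element.role, list(element.actions))
    return handler()
```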

Missing a toolkit you care about?

The test-app fleet is deliberately chosen so that each app covers an accessibility surface the others don’t. If you work with a common UI framework that’s meaningfully different from everything in the table above, that’s a coverage gap worth closing. A different accessibility backend, a renderer family not yet represented, or a platform combination we don’t exercise all qualify. Please open an issue describing the framework and how its accessibility tree differs. New test-app coverage is very welcome.