Accessibility API Quirks
Building anything on top of the desktop accessibility stack — a screen reader, a test framework, an automation tool, or an AI computer-use agent — means meeting three completely different APIs (Linux AT-SPI2 over D-Bus, macOS AXUIElement, Windows UI Automation) and a long tail of toolkit-specific behavior on top of them. The APIs disagree on roles, actions, threading, and even on whether accessibility is on. Most of the surprises are undocumented, or documented only in a vendor’s source tree.
This page is a running catalog of those quirks, collected while building and testing xa11y against real applications across all three platforms. None of them are bugs in xa11y — they are inherent to the platform APIs and toolkits. We publish them here so the next person who hits an empty tree, a phantom error code, or a role that changed between OS releases can find the answer with a search instead of a week of bisecting.
Each entry names the platform, the symptom, the verbatim error or flag where one exists, and a workaround.
Accessibility is opt-in — and often silently off
The single most common failure across every platform: the API connects, the app is found, but the tree is empty or a bare skeleton, with no error to tell you why. Accessibility is opt-in at the application and the OS-policy level, and the “off” state usually looks identical to “this app has no UI.”
Linux: AT-SPI2 must be enabled on the bus
The accessibility bridge is gated by a D-Bus property, org.a11y.Status.IsEnabled.
If it is false (the default on a headless or freshly-booted session), toolkits
never publish their trees. Turn it on:

    dbus-send --session --dest=org.a11y.Bus /org/a11y/bus \
        org.freedesktop.DBus.Properties.Set \
        string:org.a11y.Status string:IsEnabled variant:boolean:true

You also need the bridge processes actually running:

    /usr/libexec/at-spi-bus-launcher --launch-immediately &
    /usr/libexec/at-spi2-registryd &

To check whether a given app registered with AT-SPI2 at all:

    busctl --user tree org.a11y.atspi.Registry | grep -i "<app-name>"

If the app’s subtree is missing, the problem is the app’s accessibility configuration, not your client.
Linux: Chromium and Electron ship the renderer bridge disabled
Chromium-based apps (Google Chrome, Chromium, VS Code, Cursor, Slack, Discord,
and every other Electron app) register with AT-SPI2 but expose only an
application → frame skeleton until launched with --force-renderer-accessibility.
Without it, locator("button").count() returns 0 even though buttons are
plainly visible, and the only place this is documented is Chromium’s own source
tree.

    google-chrome --force-renderer-accessibility
    code --force-renderer-accessibility   # VS Code
    cursor --force-renderer-accessibility

The environment variable ACCESSIBILITY_ENABLED=1 has the same effect for some
Chromium builds.
Representative node counts on Ubuntu 24.04 + GNOME 46 (Wayland):
| App | Without flag | With flag |
|---|---|---|
| VS Code | 1 | 140 |
| Cursor | 1 | 116 |
| Chrome | 1 | 210 |
You can detect the condition programmatically: Chromium reports
Application.ToolkitName == "Chromium", so a Chromium/Electron frame that yields
zero filtered children is almost certainly the renderer bridge being off rather
than a bad selector.
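A minimal sketch of that check, assuming your client can already report an app's toolkit name and node count. `AppSnapshot`, `diagnose_empty_tree`, and the threshold of 2 are hypothetical names and values chosen for illustration, not part of xa11y or AT-SPI2.

```python
from dataclasses import dataclass


@dataclass
class AppSnapshot:
    """Hypothetical summary of one AT-SPI application, filled in by your client."""
    toolkit_name: str  # value of Application.ToolkitName
    node_count: int    # total accessibles found under the app


def diagnose_empty_tree(app: AppSnapshot, skeleton_threshold: int = 2) -> str:
    """Return a short hint explaining why a query matched nothing."""
    # Assumption: a tree this small is the application/frame skeleton.
    is_skeleton = app.node_count <= skeleton_threshold
    if app.toolkit_name == "Chromium" and is_skeleton:
        return "renderer bridge off: relaunch with --force-renderer-accessibility"
    if is_skeleton:
        return "tree is a bare skeleton: check the app's accessibility settings"
    return "tree is populated: the selector is the more likely culprit"
```

Surfacing the hint in the "no element matched" error message saves the reader from debugging a selector that was never the problem.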
Linux: Firefox needs an environment variable
Firefox exposes its tree only when launched with MOZ_ACCESSIBILITY_ATK2=1 set
in the environment: the same class of issue as Chromium, with a different knob.

    MOZ_ACCESSIBILITY_ATK2=1 firefox

Native GTK apps (Nautilus, GNOME Terminal, GNOME Calculator, gnome-text-editor) need no flag; their AT-SPI2 bridge is enabled by default.
macOS: the client process needs the Accessibility (TCC) permission
macOS refuses to let one process read another’s accessibility tree unless the reading process holds the Accessibility TCC permission (System Settings → Privacy & Security → Accessibility). Without it, queries return an empty tree, not an error.
In CI you grant it by writing the system TCC database and restarting tccd. The
critical gotcha: TCC matches the resolved on-disk binary, not a symlink or a
launcher. A virtualenv python symlink, or a cargo run wrapper, will not
inherit the grant; you must grant the real interpreter/binary path:

    # resolve the real path first (sys.executable alone can still be the venv symlink)
    grant_tcc "$(.venv/bin/python -c 'import os, sys; print(os.path.realpath(sys.executable))')"

This also bites test runners that use content-addressed binary paths: macOS TCC grants “don’t cope with cargo-hashed test-binary paths”, because every rebuild produces a new hash and a new, ungranted binary.
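As a sketch of the path-resolution step in isolation: `os.path.realpath` follows every symlink in a path, which is what turns a virtualenv's `python` link into the interpreter binary on disk. The helper name `real_binary_path` is hypothetical, and how the resolved path is then granted depends on your CI setup.

```python
import os
import sys


def real_binary_path(path: str) -> str:
    """Follow every symlink so the result names the on-disk binary itself."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Inside a virtualenv, sys.executable is usually the venv's symlink,
    # so resolve it before handing the path to whatever grants TCC access.
    print(real_binary_path(sys.executable))
```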
macOS: web content needs accessibility turned on per-process
Like Chromium on Linux, Chrome/Electron on macOS gate their renderer
accessibility tree. Chrome honors --force-renderer-accessibility; WebKit/Safari
content is exposed lazily and may require the AXManualAccessibility attribute to
be set on the web area before the DOM is reflected into the AX tree.
Roles and the role enum disagree across platforms
macOS roles are open-ended strings, not an enum
AXRole and AXSubrole are plain CFStrings. AXRoleConstants.h lists ~50
conventional roles, but there is no closed enum — apps and web content report
arbitrary values. WebKit alone adds "AXWebArea", "AXLink", "AXHeading", and
many more that aren’t in the SDK header. Any code that switches over a fixed set
of role strings will silently mishandle real-world content.
Windows: every top-level window is a Window, dialogs included
UIA maps all top-level windows (ordinary windows and modal dialogs alike)
to UIA_WindowControlTypeId. To tell a dialog apart you must read
UIA_IsDialogPropertyId (Windows 10 1703+). Native frameworks such as Qt set
IsDialog on a QDialog without populating AriaRole, so the same dialog
reports as a dialog on macOS/Linux but as a plain window on Windows unless you
check that property.
AriaRole, conversely, is populated only by ARIA/web content (Electron,
AccessKit) — never by native Win32/Qt/WPF apps — so you cannot rely on it for
native role refinement.
A nasty corollary: because any top-level window with IsDialog=true becomes a
dialog, transient system popups — UAC prompts, credential dialogs, Windows
Update — will flip a window’s role out from under you. Desktop state is not
deterministic.
Linux: roles are a 131-value C enum on the wire
AtspiRole is a C enum (131 values) sent as a u32. Examples:
PUSH_BUTTON (43), CHECK_BOX (7), ENTRY (79), TEXT (61), FRAME (23).
States are a separate 64-bit bitfield (AtspiStateType), not properties —
ENABLED (8), FOCUSED (12), VISIBLE (30), EDITABLE (7), CHECKED (4).
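To make the bitfield concrete, here is a minimal decoding sketch. It assumes the state set arrives as two 32-bit words with the low word first, which is how the D-Bus `GetState` reply is commonly described; verify that against your binding. `STATE_BITS` holds only the bit positions quoted above, and `decode_states` is a hypothetical helper name.

```python
from typing import Sequence, Set

# Bit positions quoted in the text; extend from the AtspiStateType enum as needed.
STATE_BITS = {
    "CHECKED": 4,
    "EDITABLE": 7,
    "ENABLED": 8,
    "FOCUSED": 12,
    "VISIBLE": 30,
}


def decode_states(words: Sequence[int]) -> Set[str]:
    """Decode a state set delivered as two u32 words (assumed: low word first)."""
    low, high = words
    bitfield = (high << 32) | low
    return {name for name, bit in STATE_BITS.items() if bitfield & (1 << bit)}
```

For example, a low word with bits 8 and 30 set and a zero high word decodes to ENABLED and VISIBLE.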
Qt tabs: tab on Windows, radio_button on macOS
Qt’s accessibility bridges map a QTabBar tab to different roles per
platform: Windows UIA reports tab (inside a tab_group), while the macOS
AX bridge reports radio_button. A selector that should work on both
platforms has to match either:

    dialog.descendant("tab[name='Settings'], radio_button[name='Settings']").press()

WebKitGTK changed the role it reports for <textarea> between releases
WebKitGTK 2.52 changed which AT-SPI role it reports for an HTML <textarea>,
breaking any selector keyed on the old role after a routine libwebkit2gtk-4.1
upgrade (2.50.4 → 2.52.3). The portable fix is to never trust the toolkit’s
text role directly — derive single-line vs multi-line from the MULTI_LINE
state bit, which is consistent across GTK (GtkEntry/GtkTextView),
Qt (QLineEdit/QPlainTextEdit), and every WebKit version.
One process, several “applications”
Windows: Qt registers dialogs as separate UIA apps
A QDialog opened inside a host process (e.g. a Qt submitter dialog inside
a C++ DCC application) can register with UIA as its own top-level
“application”, sharing the host’s PID but living outside the host app’s
element tree. Searching the host app for the dialog finds nothing — you
must attach to the dialog’s own app. Predicate-based discovery handles this
without hand-rolling an enumeration loop:

    dialog_app = xa11y.App.find(
        lambda a: a.pid == proc.pid and a.name.startswith("My Submitter"),
        timeout=60.0,
    )

On macOS AX the same dialog is an ordinary child window of the host app, so dialog discovery is one of the few places where platform-specific code is legitimately required.
Two corollaries, both confirmed against live trees:
- The dialog’s accessible name is the QApplication display name, not windowTitle(). Both the UIA “app” and the dialog window inside it surface QGuiApplication::applicationDisplayName() (typically the app name + version), so match on a name prefix rather than the window title you see in the title bar.
- Child popups may be hosted elsewhere. A QMessageBox shown from such a dialog can land in the host application’s tree, not the dialog’s UIA app. When the owner is unpredictable, search from the system root (module-level xa11y.locator(...)) and wait for the popup’s body text.
Qt forms don’t label their fields
QFormLayout does not propagate a row’s label to the field’s accessible
name. In practice sibling widgets surface with the enclosing group’s name
(e.g. three spin boxes all named "Job Properties") or with an empty name.
Selectors can’t disambiguate by name alone — pin the widget with
.nth(n) (1-based, tree order) or scope to a parent group first, and
re-verify the indices when the form’s layout changes.
Action names and mechanisms are not standardized
Linux: AT-SPI action names differ by toolkit
Actions are string-named via org.a11y.atspi.Action, and the names are not
standardized. GTK calls its default-activate action "click"; Qt calls it
"Press". Chromium uses its own: "doDefault" (default activation, ~190
elements in a single Chrome window) and "showContextMenu" (~230 elements). A
client that looks for a hardcoded "press" will report ActionNotSupported on
half the ecosystem unless it normalizes through an alias table.
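One way to picture such an alias table is the sketch below. It maps a canonical action to the toolkit-specific names quoted above and compares case-insensitively; `ACTION_ALIASES`, `resolve_action`, and the canonical names `press` and `show_menu` are hypothetical choices for illustration, not xa11y's actual table.

```python
from typing import Iterable, Optional

# Canonical action -> lower-cased toolkit names mentioned in the text above.
ACTION_ALIASES = {
    "press": ("press", "click", "dodefault"),
    "show_menu": ("showcontextmenu",),
}


def resolve_action(canonical: str, advertised: Iterable[str]) -> Optional[str]:
    """Return the toolkit's own name for a canonical action, or None if absent."""
    wanted = ACTION_ALIASES.get(canonical, (canonical,))
    for name in advertised:
        if name.lower() in wanted:
            return name  # keep the original casing: it is what the toolkit expects
    return None
```

Returning the advertised spelling rather than the canonical one matters because the name you pass back to the Action interface has to match what the toolkit published.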
Chromium on Linux exposes no EditableText interface — anywhere
This is a big one for text input. No Chromium element implements
org.a11y.atspi.EditableText anywhere in its tree (confirmed across 500+
nodes). There is no accessibility path to set the value of a Chromium/Electron
text field on Linux — attempts hit D-Bus UnknownMethod / UnknownInterface
errors. The only options are keyboard synthesis (xdotool / ydotool / uinput) or
a fix upstream in Chromium.
GTK menu buttons hide the real action on an inner widget
GtkMenuButton, AdwMenuButton, and AdwSplitButton present an outer
push button accessible that advertises NActions = 0 — pressing it raises
ActionNotSupported — wrapping an inner toggle button that carries the real
click action. Every stock GNOME app ships these (Calculator, Text Editor,
Characters, Baobab, Logs, Clocks), so “the button I can see has no press action”
is a common, confusing result. The workaround is to walk a bounded slice of the
outer widget’s subtree (gated on Application.ToolkitName == "GTK") and invoke
the single actionable descendant.
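The bounded walk can be sketched as a depth-limited breadth-first search. `Node` is a hypothetical stand-in for an accessible, and the depth limit of 3 is an assumption; the sketch returns nothing when the match is ambiguous, so the caller never presses the wrong widget.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an accessible: a role, action names, children."""
    role: str
    actions: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def find_actionable_descendant(root: Node, max_depth: int = 3) -> Optional[Node]:
    """Breadth-first search for the single descendant that advertises an action.

    Returns None when there are zero or several candidates within max_depth.
    """
    found = []
    queue = deque((child, 1) for child in root.children)
    while queue:
        node, depth = queue.popleft()
        if node.actions:
            found.append(node)
        if depth < max_depth:
            queue.extend((child, depth + 1) for child in node.children)
    return found[0] if len(found) == 1 else None
```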
GTK4 CheckButton reports NActions = 0
A GTK4 CheckButton can report zero actions via the Action interface even
though it is toggleable, so toggle() via DoAction is unavailable — toggling
must go through the press path. AccessKit’s AT-SPI bridge has the mirror-image
quirk: it hardcodes the action name as "click", never "toggle".
macOS has no AXToggle — AXPress is the native toggle
There is no AXToggle action. AXPress is the platform’s toggle for
checkboxes, switches, and radio buttons. Code expecting a dedicated toggle action
on macOS will never find one.
macOS has no ScrollIntoView equivalent
There is no AX action to scroll an element into view. On Linux this maps to
Component.ScrollTo; on Windows to ScrollItemPattern.ScrollIntoView; on macOS
it is a genuine no-op.
Numeric vs text values diverge
- Linux: the org.a11y.atspi.Value interface is f64-only. Text values require EditableText; if neither is present you get a “not supported” result. WebKit sliders on Linux historically did not honor Value.SetCurrentValue.
- macOS: AXValue is set directly for both text and numeric.
- Windows: numeric goes through RangeValuePattern, text through ValuePattern, and type_text is a non-atomic read-modify-write splice via TextPattern.GetSelection() + ValuePattern.SetValue().
Qt spin boxes step — they don’t set
Setting a Qt QSpinBox / QDoubleSpinBox value through the accessibility
API does not work on either Windows or macOS: depending on the platform and
call, the set either raises or reports success without changing the
value. The reliable cross-platform pattern is to step with
increment() / decrement() and read the value back after each step —
and don’t trust a fixed step count, because a single increment() can
occasionally advance the value by more than one:

    spin = dialog.descendant("spin_button[name='Priority']")
    current = int(spin.element().value)
    while current != target:
        if current < target:
            spin.increment()
        else:
            spin.decrement()
        current = int(spin.element().value)

(Bound the loop in real code so an out-of-range target can’t spin forever.)
Text fields and checkboxes are unaffected — set_value() and toggle()
work on Qt as expected.
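A bounded version of the stepping loop might look like the sketch below. `FakeSpinBox` is a hypothetical stand-in so the example runs without a live application; it exposes a plain `value` attribute, whereas the xa11y snippet above reads the value through `spin.element().value`. The `max_steps` default is an arbitrary assumption.

```python
class FakeSpinBox:
    """Hypothetical stand-in exposing the increment/decrement/value surface."""

    def __init__(self, value: int) -> None:
        self.value = value

    def increment(self) -> None:
        self.value += 1

    def decrement(self) -> None:
        self.value -= 1


def step_to(spin, target: int, max_steps: int = 200) -> int:
    """Step toward target, re-reading the value before every step.

    Raises RuntimeError if the target is not reached within max_steps.
    """
    for _ in range(max_steps):
        current = int(spin.value)
        if current == target:
            return current
        if current < target:
            spin.increment()
        else:
            spin.decrement()
    if int(spin.value) == target:
        return target
    raise RuntimeError(f"spin box stuck at {spin.value}, wanted {target}")
```

Because the direction is chosen from a fresh read each time, an increment that overshoots is corrected by the next iteration instead of compounding.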
Element references go stale — constantly
All three backends mint a fresh handle on every read
Windows/UIA, macOS/AX, and Linux/AT-SPI2 each mint a brand-new handle every
time you cache or re-read an element. You cannot use a handle’s identity to
dedup or compare elements across two queries — the same on-screen element returns
disjoint handles, so naive identity-based dedup silently returns zero results.
Key elements by a stable property instead: UIA GetRuntimeId, AT-SPI
(bus_name, object_path), or a structural tree path.
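The AT-SPI variant of that keying can be sketched as follows. `Handle` is a hypothetical class standing in for whatever reference type your binding returns; the only assumption is that it carries the bus name and object path.

```python
from typing import Dict, Iterable, List, Tuple


class Handle:
    """Hypothetical AT-SPI handle; a new instance is minted on every read."""

    def __init__(self, bus_name: str, object_path: str) -> None:
        self.bus_name = bus_name
        self.object_path = object_path


def stable_key(handle: Handle) -> Tuple[str, str]:
    """Key an element by where it lives, not by which handle object holds it."""
    return (handle.bus_name, handle.object_path)


def dedup(handles: Iterable[Handle]) -> List[Handle]:
    """Keep the first handle seen for each stable key, preserving order."""
    seen: Dict[Tuple[str, str], Handle] = {}
    for handle in handles:
        seen.setdefault(stable_key(handle), handle)
    return list(seen.values())
```

Two handles for the same on-screen element compare unequal by identity but collapse to one entry under `stable_key`.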
AccessKit republishes the whole subtree on any mutation
AccessKit’s AT-SPI bridge regenerates the entire subtree (and all accessible object paths) on a structural mutation. A snapshot captured before an action is invalidated the moment that action rebuilds the tree, so a second operation on the old reference surfaces a stale-object platform error. The robust pattern is to re-resolve the selector after every mutation rather than holding element handles.
UIA tree updates are asynchronous
After an action that changes structure (closing a dialog, navigating), the UIA tree can take longer than 100 ms to settle. During that window a perfectly valid selector is transiently un-findable:

    SelectorNotMatchedError: No element matched: button[name="OK"]

Poll/retry around structural changes rather than asserting immediately.
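A generic polling helper for this, as a sketch: `wait_for` is a hypothetical name, and it assumes the probe you pass in returns `None` while the element is missing (for example by catching the not-matched error itself) and the element once it appears.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 5.0,
             interval: float = 0.1) -> T:
    """Call probe until it returns a non-None value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout:.1f}s")
        time.sleep(interval)
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments on long CI runs.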
Phantom errors that aren’t really errors
Windows: EVENT_E_ALL_SUBSCRIBERS_FAILED (0x80040201)
Qt’s UIA providers for QTabBar/QTabWidget raise automation events
synchronously inside their Invoke() / Select() handlers. When no UIA event
subscriber is active (typical on a CI runner with no screen reader), COM returns:

    EVENT_E_ALL_SUBSCRIBERS_FAILED (0x80040201)

Qt propagates this back through the action call even though the UI action
itself completed successfully. The same code can surface from
FindAllBuildCache during a find. xa11y treats 0x80040201 as success on
action calls (press/toggle/select), and on query calls — where a value is
still needed — retries the call a few times before giving up.
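The split between action calls and query calls can be illustrated with the sketch below. It is not xa11y's implementation: `ComError`, `run_action`, `run_query`, and the retry count of 3 are hypothetical, and the only fact taken from the text is the HRESULT value.

```python
EVENT_E_ALL_SUBSCRIBERS_FAILED = 0x80040201


class ComError(Exception):
    """Hypothetical exception carrying the HRESULT of a failed UIA call."""

    def __init__(self, hresult: int) -> None:
        normalized = hresult & 0xFFFFFFFF  # accept signed 32-bit values too
        super().__init__(f"HRESULT 0x{normalized:08X}")
        self.hresult = normalized


def run_action(call) -> None:
    """Run an action; the phantom code means the UI action already completed."""
    try:
        call()
    except ComError as err:
        if err.hresult != EVENT_E_ALL_SUBSCRIBERS_FAILED:
            raise


def run_query(call, attempts: int = 3):
    """Run a query; retry on the phantom code because a value is still needed."""
    for attempt in range(attempts):
        try:
            return call()
        except ComError as err:
            is_last = attempt == attempts - 1
            if err.hresult != EVENT_E_ALL_SUBSCRIBERS_FAILED or is_last:
                raise
```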
Windows: FindAllBuildCache silently drops fragment elements
FindAllBuildCache(TreeScope_Subtree) rooted at a UIA fragment returns
incomplete results, because the default Control View filter excludes any
element that doesn’t declare IsControlElement=true. Virtual/fragment providers
(Qt, AccessKit) often don’t, so they vanish. Pass a TrueCondition tree filter
(Raw view) to see them.
Windows: SetFocus fails on disabled elements
UIA SetFocus fails outright on a disabled element. Tests and automation should
target an always-enabled element to verify focus behavior.
Windows: screenshots fail in Session 0 / disconnected RDP
GDI BitBlt returns ERROR_INVALID_HANDLE in disconnected-RDP or Session-0
contexts — there is no visible desktop to capture.
Threading and process model
- Windows UIA is COM, MTA-only. Event handlers fire on a background MTA thread (CoInitializeEx(NULL, COINIT_MULTITHREADED)). Use per-handler RemoveXxxEventHandler; RemoveAllEventHandlers nukes every concurrent subscription in the process.
- macOS AXObserver is CFRunLoop-bound. Callbacks deliver a live AXUIElementRef valid only during the callback, so capture what you need synchronously.
- Linux AT-SPI is D-Bus. A dedicated accessibility bus (separate from the session bus) carries all traffic; the registry daemon tracks apps and routes events.
Memory safety: AX values can throw through your FFI
On macOS, a misbehaving AX value’s -release or -getTypeID can throw an
NSException. If it unwinds through an extern "C" boundary it aborts the
process. Worse, partial extraction paths can hit use-after-free / double-release
(e.g. releasing a CFNumber and then reading its type-id). Any production AX
client must wrap raw CoreFoundation / AX calls (CFRetain, CFRelease,
CFGetTypeID, CFNumberGetValue, CFArrayGetValueAtIndex, …) in
@try/@catch and treat every FFI return as fallible.
Events: every platform has gaps
Push-based accessibility events are richer than polling, but each platform omits things the others deliver:
- Linux AT-SPI2 has no menu-open/close signal at all, so MenuOpened/MenuClosed can never fire. Consumers must poll or watch structural changes.
- GTK4 sometimes skips Focus:Focus, emitting only Object:StateChanged(focused, true). Listen for both.
- StateChanged{Enabled} arrives as either StateChanged(enabled, _) or StateChanged(sensitive, _); AT-SPI2 still exposes both names.
- WebKit2GTK and Electron often omit Text:TextChanged, so you may have to fall back to ValueChanged on text roles.
- macOS AXValueChanged carries no position or delta, only the element. There is no AXTextChanged-with-position notification; diffing old vs new text is the only option.
- macOS scopes many notifications to element-level observers: table row/selection changes, AXMenuItemSelected, AXCreated/AXMoved/AXResized, and AXLayoutChanged are not delivered to an app-level observer.
- Windows UIA has no first-class window-activated event. Inferring it from focus changes is lossy: it misses alt-tab and tool windows, and fires spuriously on in-app focus moves. Older/non-native apps (Java Swing, VB6, Adobe products, some browsers) have incomplete UIA providers and may not fire events at all.
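For the macOS text case, the old-vs-new diff can be sketched by trimming the common prefix and suffix. This recovers one contiguous edit, which is an assumption that holds for ordinary typing and pasting but not for multi-cursor edits; `TextChange` and `diff_text` are hypothetical names.

```python
from typing import NamedTuple


class TextChange(NamedTuple):
    position: int  # index where old and new text first differ
    removed: str   # text present only in the old value
    inserted: str  # text present only in the new value


def diff_text(old: str, new: str) -> TextChange:
    """Recover a single contiguous edit by trimming the common prefix and suffix."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix may not overlap the prefix already consumed.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return TextChange(prefix, old[prefix:len(old) - suffix],
                      new[prefix:len(new) - suffix])
```

The client has to cache the previous value per element, since the notification itself carries none.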
Performance: reading a tree is O(N) cross-process calls
Two of the three platforms have no bulk-read API, so traversing a tree means a storm of IPC:
| Platform | Per-node cost | Bulk subtree fetch? |
|---|---|---|
| Linux (AT-SPI2) | 6–10 D-Bus round-trips per node | No — GetChildren returns references, you re-query each child. (Cache.GetItems exists but returns only a partial property set for the whole app.) |
| macOS (AX) | 2–3 Mach IPC calls per node | No — but AXUIElementCopyMultipleAttributeValues batches one node’s attributes into a single round-trip. |
| Windows (UIA) | One call for a whole subtree | Yes — IUIAutomationCacheRequest + FindAllBuildCache fetches an entire subtree with chosen properties/patterns in one cross-process call. |
The practical consequence on Linux and macOS: a naive full-tree walk of a
complex app is slow enough to feel hung. Prefer narrow, combinator-constrained
queries (toolbar > button[name='Save']) so the engine never descends subtrees
it doesn’t need.
Text and string surprises
Bidi control characters leak into reported strings
Platforms (notably macOS for some LTR app configurations, and RTL apps
everywhere) embed Unicode bidirectional format controls into the strings
they report (name, value, description). These are invisible but break naive
equality:

    el.value == "5"  # False: the value is actually "5" wrapped in invisible bidi controls

The offenders are U+200E (LRM), U+200F (RLM), U+202A..U+202E
(embeddings/overrides), and U+2066..U+2069 (isolates). Strip them before
comparing; keep ZWJ/ZWNJ, which are legitimate content.
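A minimal sketch of the stripping step, using exactly the code points listed above and leaving ZWJ/ZWNJ alone. `strip_bidi` is a hypothetical helper name.

```python
import re

# LRM, RLM, embeddings/overrides, and isolates; ZWJ/ZWNJ (U+200D/U+200C) are kept.
_BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")


def strip_bidi(text: str) -> str:
    """Remove Unicode bidirectional format controls from a reported string."""
    return _BIDI_CONTROLS.sub("", text)
```

Apply it to both sides of a comparison (the reported value and the expected string) so that selectors written with or without the controls behave the same.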
at-spi2-core can drop names containing control characters
Some at-spi2-core configurations drop an accessible’s Name entirely when it
contains non-printable control characters — so an element with a perfectly good
visible label becomes undiscoverable by name. This is configuration-dependent
(the same app can keep the name under a different at-spi2-core build), which makes
it especially confusing.
Linux input simulation and Wayland
Wayland has no standard accessibility-driven input path, and the architecturally “correct” route is not broadly available yet:
- xdg-desktop-portal.RemoteDesktop (libei) is implemented today only by xdg-desktop-portal-gnome and -kde, and needs a GDM/SDDM-managed session, so there is no working portal backend on wlroots compositors (sway, Hyprland) or headless servers.
- /dev/uinput is the compositor-agnostic fallback (the same mechanism xdotool --using-uinput, ydotool, wtype, Steam Input, and Wine use), but it requires input group membership, which grants global input read/write, comparable to macOS Input Monitoring:

      sudo usermod -aG input $USER   # then re-login

In containers, Docker’s /dev tmpfs isolation hides host-created
/dev/input/event* nodes even with -v /dev/input:/dev/input; a udev rule like
KERNEL=="event[0-9]*", MODE="0666" is needed to expose them.
macOS focus and launch races
- CGEventPost (and OS focus) target whichever app is frontmost. On a CI runner an onboarding/background window (Setup Assistant, Notification Center, Software Update) can hold the front slot and silently swallow every synthetic event, leaving an empty event log. Claim frontmost for your target app before driving input; don’t assume your window has focus just because it’s visible.
- open -a has a -600 quit-then-relaunch race. Re-launching an app that is mid-quit fails with error -600, and “the process exists” does not mean “its accessibility tree is ready”: poll for the tree, not just the PID.
Coordinate systems
- macOS reports points, not pixels. On a 2x Retina display 1 point spans 2 physical pixels along each axis, so hit-testing against a screenshot taken in pixels will be off by the scale factor.
- Windows / Linux report physical pixels, and multi-monitor setups produce negative coordinates for displays positioned left of or above the primary.
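Converting an AX coordinate for use against a pixel screenshot is a single multiplication, sketched below. `points_to_pixels` is a hypothetical helper, and the sketch assumes one uniform scale factor for the display being captured; mixed-scale multi-monitor layouts need the factor of the display that contains the point.

```python
from typing import Tuple


def points_to_pixels(point: Tuple[float, float], scale: float) -> Tuple[int, int]:
    """Convert a coordinate in points to screenshot pixels for one display."""
    x, y = point
    return (round(x * scale), round(y * scale))
```

Negative inputs pass through unchanged in sign, which matches displays positioned left of or above the primary.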
Further reading
- AT-SPI2 architecture · D-Bus interface specs · AtspiRole enum · AtspiStateType enum
- macOS Accessibility API overview · AXRoleConstants.h · AXAttributeConstants.h
- UI Automation overview · UIA_IsDialogPropertyId
- AccessKit
- xa11y’s own Platform Details and Testing in CI guides, and the running list of upstream issues we’ve filed or documented.