
Overview

xa11y is a cross-platform library for reading accessibility trees, taking actions, listening to accessibility event streams, synthesising low-level input, and capturing screenshots.

What is desktop accessibility?

Most desktop applications expose an accessibility tree — a structured representation of their UI intended for screen readers and other assistive technology. Each platform has its own API for this (macOS AXUIElement, Windows UI Automation, Linux AT-SPI2), but the underlying concepts are the same. xa11y provides a single API that works with all of them.

Trees

An accessibility tree mirrors the visual hierarchy of an application. Here’s what one looks like:

[0] application "Calculator"
  [1] window "Calculator"
    [2] group "Display"
      [3] static_text "Result" value="42"
    [4] group "Keypad"
      [5] button "7"
      [6] button "8"
      [7] button "9"
      [8] button "+"
      [9] button "4"
      ...

Every element in the tree has properties. For example, here’s the full data for button "+":

role: button
name: "+"
value: None
states: enabled, visible
bounds: (272, 200, 80, 40)
actions: press, focus

Actions

Accessibility APIs let you interact with elements: press buttons, type into fields, toggle checkboxes, expand menus, scroll, and more. xa11y exposes all of these through a unified action API.

Input simulation

For the small set of gestures with no accessibility equivalent — drag-and-drop, scroll wheels, global shortcuts — xa11y also ships an InputSim façade that synthesises OS-level pointer and keyboard events. Input simulation lives in its own surface (xa11y::input_sim() / xa11y.input_sim() / inputSim()) and never falls back from an accessibility action automatically — prefer locator.press() and locator.type_text() whenever they work.

Screenshots

xa11y can capture pixel-level snapshots of the full screen, a region, or the pixels under an accessibility element. Useful for attaching visual evidence to test failures or feeding vision-language models the same frame the agent is reasoning about. See the Screenshots guide.

Events

Accessibility APIs also emit events when the UI changes — elements appearing or disappearing, focus moving, names or values updating. For example, clicking a text field might emit:

FocusChanged app="Calculator" target=text_field "Display"

xa11y supports subscribing to these event streams per-app via app.subscribe().

How does xa11y compare to other tools?

  • AccessKit — AccessKit is for adding accessibility to an app. xa11y is for reading accessibility data from the outside.
  • Playwright — Similar concept, specific to web browsers. Major design inspiration for xa11y’s Locator pattern and selector syntax.
  • pyatspi2 / direct UIAutomation — Single-platform libraries. xa11y wraps these underlying APIs into one cross-platform interface.
  • agent-desktop — CLI tool built on xa11y for AI agents to read and interact with desktops via accessibility data.

Concepts

Diagram showing xa11y's three concepts: Read (App → .children() → Element → .children()/.parent()), Act (App → .locator() → Locator → Actions/Waits/.element()/.elements()), and Listen (App → .subscribe() → Subscription → recv()/wait_for()/iter())

xa11y is built around the following core types:

  • App — entry point for interacting with an application. Construct via App::by_name(), App::by_pid(), or App::list(). An App provides three paths: app.children() for navigating the tree, app.locator(selector) for actions and waits, and app.subscribe() for event streams.
  • Element — a live handle to a node in the accessibility tree. Navigate with children() and parent() — each call queries the platform lazily. Read properties like role, name, value, states, bounds. Capture the subtree as a structured snapshot with tree(max_depth) or as an indented string with dump(max_depth). Actions (press(), set_value(), etc.) are available too, acting on the captured handle without re-resolution or auto-wait.
  • Locator — a lazy selector that re-resolves on every operation. Use for actions (press(), set_value(), and others), waits (wait_visible(), and others), and querying (element(), elements()). Auto-waits for visible+enabled before acting and is the recommended path for most code. Inspired by Playwright’s Locator pattern.
  • Subscription — live event stream from an app. Created via app.subscribe(). Pull events with try_recv(), recv(timeout), wait_for(predicate, timeout), or iterate with iter(). Drop to unsubscribe.
  • Selector — CSS-like query syntax used by Locator. Supports role matching, attribute filters, combinators, and nth selection.

Connecting to an app

use xa11y::*;
use std::time::Duration;
// Connect by name. The second argument is a polling timeout —
// useful when the app may not yet be registered with the a11y API.
// Pass `Duration::ZERO` for a single attempt with no waiting.
// (Returns PermissionDenied if accessibility is not enabled.)
let safari = App::by_name("Safari", Duration::from_secs(5))?;
// Connect by PID
let app = App::by_pid(1234, Duration::from_secs(5))?;
// List all running apps
let all = App::list()?;

Reading accessibility data

Lazy navigation

app.children() returns the top-level elements (typically windows). Each Element supports children() and parent() for further navigation — every call queries the platform lazily, so you always see the current state of the UI.

let calc = App::by_name("Calculator", Duration::from_secs(5))?;
let windows = calc.children()?;
let children = windows[0].children()?;
let parent = children[0].parent()?;
// Read properties (via Deref to ElementData)
println!("{} — {}", children[0].role, children[0].name.as_deref().unwrap_or(""));

Element properties: role, name, value, description, bounds, states, actions, numeric_value, min_value, max_value, stable_id, pid.

name, value, and description are stripped of Unicode bidi format controls (LRM, RLM, embeddings, isolates) so that equality assertions match the logical text. The unstripped platform string is preserved on element.raw under the platform-native key (AXTitle/AXValue/AXDescription/AXHelp on macOS, atspi_name/atspi_value/atspi_description on Linux, uia_name/uia_value/uia_help_text on Windows).
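
To make the stripping rule concrete, here is a minimal standalone sketch of such a filter. It is not xa11y's implementation: the function names are hypothetical, and the exact set of code points (LRM, RLM, the embedding and override controls U+202A to U+202E, and the isolates U+2066 to U+2069) is an assumption based on the description above.

```rust
/// Returns true for the bidi format controls this sketch removes.
/// The exact set is an assumption, not taken from xa11y's source.
pub fn is_bidi_control(c: char) -> bool {
    matches!(
        c,
        '\u{200E}' | '\u{200F}'        // LRM, RLM
        | '\u{202A}'..='\u{202E}'      // LRE, RLE, PDF, LRO, RLO
        | '\u{2066}'..='\u{2069}'      // LRI, RLI, FSI, PDI
    )
}

/// Hypothetical helper: drop bidi format controls so that equality
/// checks compare only the logical text.
pub fn strip_bidi_controls(text: &str) -> String {
    text.chars().filter(|c| !is_bidi_control(*c)).collect()
}
```

With a filter like this, a name reported by the platform as "\u{200E}Save" compares equal to "Save", which is why selectors and assertions can be written against the visible text.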

Element navigation and snapshots: children(), parent(), tree(max_depth), dump(max_depth).

Elements are live

Every call to children() or parent() queries the platform. There are no cached snapshots — you always see the latest tree state.

let windows = calc.children()?;
println!("{:?}", windows[0].children()?[0].name); // queries platform
// UI changed? Just navigate again:
let windows = calc.children()?; // fresh query

To find specific elements, use a Locator (below) — it uses CSS-like selectors to find elements efficiently.

Subtree snapshots

element.tree(max_depth) captures the subtree rooted at an element as a recursive data structure (role, name, value, children). element.dump(max_depth) formats the same snapshot as an indented string — useful for debugging and test diagnostics. Both accept an optional max_depth argument: 0 returns only the root node, 1 includes its direct children, and so on. Omit for the full subtree.

let dialog = app.locator("dialog").element()?;
println!("{}", dialog.dump(Some(3))?);
// dialog "Settings"
//   group "General"
//     check_box "Enable notifications"
//     check_box "Start at login"
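
The max_depth rule can be illustrated with a small standalone model. The Node type and sample_dialog helper below are hypothetical stand-ins for illustration only, not xa11y's Element or snapshot types, and this sketch returns a plain String where the real dump() returns a Result.

```rust
/// Hypothetical stand-in for a captured subtree node (not an xa11y type).
#[derive(Debug, Clone)]
pub struct Node {
    pub role: String,
    pub name: Option<String>,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(role: &str, name: Option<&str>, children: Vec<Node>) -> Self {
        Node {
            role: role.to_string(),
            name: name.map(str::to_string),
            children,
        }
    }

    /// Format the subtree as an indented string.
    /// Some(0) keeps only the root, Some(1) adds its direct children,
    /// and None walks the whole subtree.
    pub fn dump(&self, max_depth: Option<usize>) -> String {
        let mut out = String::new();
        self.dump_into(&mut out, 0, max_depth);
        out
    }

    fn dump_into(&self, out: &mut String, depth: usize, max_depth: Option<usize>) {
        out.push_str(&"  ".repeat(depth));
        out.push_str(&self.role);
        if let Some(name) = &self.name {
            out.push_str(&format!(" \"{}\"", name));
        }
        out.push('\n');
        // Stop descending once the depth budget is used up.
        if max_depth.map_or(false, |max| depth >= max) {
            return;
        }
        for child in &self.children {
            child.dump_into(out, depth + 1, max_depth);
        }
    }
}

/// Build a tree shaped like the "Settings" dialog in the example above.
pub fn sample_dialog() -> Node {
    Node::new(
        "dialog",
        Some("Settings"),
        vec![Node::new(
            "group",
            Some("General"),
            vec![
                Node::new("check_box", Some("Enable notifications"), vec![]),
                Node::new("check_box", Some("Start at login"), vec![]),
            ],
        )],
    )
}
```

In this model, sample_dialog().dump(Some(1)) prints the dialog and its "General" group but none of the check boxes, while dump(None) prints all four nodes.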

Taking actions

Locators

A Locator holds a selector and re-resolves on every operation. Create one via app.locator(selector). Actions always target the current state of the UI.

let calc = App::by_name("Calculator", Duration::from_secs(5))?;
let btn = calc.locator("button[name='=']");
// Each call re-resolves the selector, finds the element, then acts
btn.press()?;
// Read properties via .element()
let element = btn.element()?;
println!("{:?}", element.name);
// Wait for state changes (polls until condition met)
btn.wait_enabled(Duration::from_secs(5))?;
// Get all matching elements
let buttons = calc.locator("button").elements()?;
println!("Found {} buttons", buttons.len());

Best practice: make selectors specific so they stay stable as the UI evolves. Don’t select button — select button[name='Cancel'] inside a specific group or window. A vague selector like button can become unreliable when new elements are added — it may match the wrong element or start returning multiple matches unexpectedly.

Actions: press(), focus(), blur(), toggle(), select(), expand(), collapse(), set_value(), set_numeric_value(), type_text(), select_text(), increment(), decrement(), show_menu(), scroll_into_view().

Waits: wait_visible(), wait_hidden(), wait_attached(), wait_detached(), wait_enabled(), wait_disabled(), wait_focused(), wait_unfocused(), wait_for_state(), wait_until().

Queries: element() (single match), elements() (all matches), exists(), count().

Chaining: nth(n), first(), child(selector), descendant(selector).

Locator re-resolves its selector on every operation, making it resilient to UI changes between calls. To inspect element properties or traverse the subtree, call .element() first — tree() and dump() live on Element, not Locator.
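
The re-resolve-then-wait behaviour can be pictured as a polling loop. The sketch below is a standalone illustration under stated assumptions, not xa11y's code: poll_until, TimedOut, and the polling interval parameter are all hypothetical names, and the real library may schedule its retries differently.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error for this sketch: the deadline passed before the
/// condition was met.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Re-run `resolve` until it yields a value that satisfies `ready`,
/// or until `timeout` elapses. A zero timeout means a single attempt.
pub fn poll_until<T>(
    mut resolve: impl FnMut() -> Option<T>,
    ready: impl Fn(&T) -> bool,
    timeout: Duration,
    interval: Duration,
) -> Result<T, TimedOut> {
    let deadline = Instant::now() + timeout;
    loop {
        // Resolve from scratch on every attempt, so a node that was
        // replaced between attempts is picked up rather than reused.
        if let Some(value) = resolve() {
            if ready(&value) {
                return Ok(value);
            }
        }
        if Instant::now() >= deadline {
            return Err(TimedOut);
        }
        thread::sleep(interval);
    }
}
```

Here resolve plays the role of running the selector against the current tree and ready plays the role of a check such as "visible and enabled". Because nothing is cached between attempts, the loop tolerates elements that are destroyed and recreated while it waits.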

Element actions

The same action verbs are also available directly on Element. A Locator re-resolves its selector and auto-waits for the element to be visible and enabled before acting; an Element acts on the handle you already hold, with no re-resolution and no auto-wait. If the underlying element has been destroyed since you fetched it, the call returns an ElementStale error.

Reach for Element actions when you’ve already inspected an element — typically after .dump() or reading .actions — and want to act on that same instance. For everything else, use a Locator: it stays correct across UI changes and is the path most code should take.

// Locator: re-resolves and auto-waits — robust for changing UIs.
calc.locator("button[name='Save']").press()?;
// Element: acts on the handle you already hold; useful when you've already
// inspected the element and want to act on the same instance.
let btn = calc.locator("button[name='Save']").element()?;
println!("{:?}", btn.actions); // ["press", "focus"]
btn.press()?;

Recommended path: Locator. Auto-wait and selector re-resolution make Locator resilient to tree changes between operations. Use Element actions only when you already hold the element and want to act on that exact instance.

Selectors

xa11y uses a CSS-like selector syntax to find elements. Selectors are used by app.locator(selector) to target elements for actions, waits, and queries.

button # by role
button[name='Submit'] # role + exact name match
text_field[name^='Search'] # starts-with (case-insensitive)
text_field[name*='email'] # contains (case-insensitive)
button[name$='Cancel'] # ends-with (case-insensitive)
group > button # direct child combinator
window button[name='OK'] # descendant (any depth)
button:nth(2) # 2nd match (1-based)
button[name='Play'], button[name='Pause'] # alternation — see below

Supported attributes: name, value, description, role.

Operators: = (exact, case-sensitive), *= (contains), ^= (starts-with), $= (ends-with). Substring operators are case-insensitive.
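
The operator semantics can be written down as a small matching function. This is a standalone sketch, not xa11y's matcher: the Op enum and attr_matches are hypothetical names, and using Unicode lowercasing for the case-insensitive comparison is an assumption.

```rust
/// Hypothetical model of the four attribute operators.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Exact,      // =
    Contains,   // *=
    StartsWith, // ^=
    EndsWith,   // $=
}

/// Compare an element's attribute value against the selector's value.
/// Exact matching is case-sensitive; the substring operators lowercase
/// both sides first (an assumption about how case is folded).
pub fn attr_matches(op: Op, actual: &str, expected: &str) -> bool {
    match op {
        Op::Exact => actual == expected,
        Op::Contains => actual
            .to_lowercase()
            .contains(expected.to_lowercase().as_str()),
        Op::StartsWith => actual
            .to_lowercase()
            .starts_with(expected.to_lowercase().as_str()),
        Op::EndsWith => actual
            .to_lowercase()
            .ends_with(expected.to_lowercase().as_str()),
    }
}
```

Under this model, button[name='Submit'] does not match a button named "submit", while text_field[name^='Search'] does match a field named "search the web".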

Roles use snake_case: button, text_field, check_box, radio_button, menu_item, tab_group, static_text, combo_box, list_item, table_row, table_cell, scroll_bar, progress_bar, tree_item, web_area, split_group, spin_button, etc.

Selector groups (alternation): A top-level comma separates alternation clauses, just like CSS selector lists. The result is the union of each clause’s matches, deduplicated by element identity and returned in document order. Each clause is parsed independently, so combinators apply per clause: window button, dialog button means “buttons under a window or buttons under a dialog.” Chained descendant() / child() calls on a group locator distribute over every clause — locator("toolbar, dialog").descendant("button") is equivalent to locator("toolbar button, dialog button"). Commas inside quoted attribute values ([name='a,b']) are not separators.
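
The two rules above (commas split clauses only outside quotes, and chained calls distribute over every clause) can be sketched as plain string handling. This is an illustrative model rather than xa11y's parser: split_clauses and distribute_descendant are hypothetical helpers, escape sequences are not handled, and accepting double quotes as well as single quotes is an assumption.

```rust
/// Split a selector list on top-level commas, leaving commas inside
/// quoted attribute values untouched.
pub fn split_clauses(selector: &str) -> Vec<String> {
    let mut clauses = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    for c in selector.chars() {
        match (quote, c) {
            // The matching quote character closes the quoted value.
            (Some(q), _) if c == q => {
                quote = None;
                current.push(c);
            }
            // Anything else inside quotes is literal, commas included.
            (Some(_), _) => current.push(c),
            (None, '\'') | (None, '"') => {
                quote = Some(c);
                current.push(c);
            }
            // A comma outside quotes ends the current clause.
            (None, ',') => {
                clauses.push(current.trim().to_string());
                current.clear();
            }
            (None, _) => current.push(c),
        }
    }
    if !current.trim().is_empty() {
        clauses.push(current.trim().to_string());
    }
    clauses
}

/// Model of chaining descendant(inner) onto a group locator: the inner
/// selector is appended to every clause with a descendant combinator.
pub fn distribute_descendant(group: &str, inner: &str) -> String {
    split_clauses(group)
        .iter()
        .map(|clause| format!("{} {}", clause, inner))
        .collect::<Vec<_>>()
        .join(", ")
}
```

For example, distribute_descendant("toolbar, dialog", "button") yields "toolbar button, dialog button", and split_clauses("button[name='a,b']") returns a single clause.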

Events

Subscribe to accessibility events from an app with app.subscribe().

The returned Subscription is a pull-based event stream — no filters or selectors, just all events for the app.

let calc = App::by_name("Calculator", Duration::from_secs(5))?;
let sub = calc.subscribe()?;
// Block until a specific event arrives
let event = sub.wait_for(
    |e| e.event_type == EventType::FocusChanged,
    Duration::from_secs(5),
)?;
println!("Focus moved to: {:?}", event.target);
// Or poll without blocking
if let Some(event) = sub.try_recv() {
    println!("{:?}", event.event_type);
}
// Or iterate (blocks until next event)
for event in sub.iter() {
    println!("{:?}: {}", event.event_type, event.app_name);
}

Drop the Subscription to stop listening. The subscription is Send but not Clone — move it to another thread if needed.
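
A standard-library channel receiver has the same shape (Send, not Clone, pull-based), so it makes a convenient model for how a predicate wait and a thread hand-off fit together. The sketch below uses std::sync::mpsc only; EventKind, wait_for, wait_on_worker, and demo are hypothetical names, and discarding non-matching events while waiting is an assumption of this sketch rather than documented xa11y behaviour.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, trimmed-down event type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EventKind {
    FocusChanged,
    ValueChanged,
}

/// Pull events until one satisfies `pred` or `timeout` elapses.
/// Events that do not match are discarded in this model.
pub fn wait_for(
    rx: &Receiver<EventKind>,
    pred: impl Fn(&EventKind) -> bool,
    timeout: Duration,
) -> Option<EventKind> {
    let deadline = Instant::now() + timeout;
    loop {
        // Shrink the wait on every pass so the overall deadline holds.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) if pred(&event) => return Some(event),
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return None
            }
        }
    }
}

/// Move the receiving end into a worker thread and wait there.
/// The receiver is moved, not cloned, mirroring "Send but not Clone".
pub fn wait_on_worker(rx: Receiver<EventKind>) -> Option<EventKind> {
    thread::spawn(move || {
        wait_for(&rx, |e| *e == EventKind::FocusChanged, Duration::from_secs(1))
    })
    .join()
    .ok()
    .flatten()
}

/// Queue two events, then wait for the focus change on a worker thread.
pub fn demo() -> Option<EventKind> {
    let (tx, rx) = mpsc::channel();
    tx.send(EventKind::ValueChanged).unwrap();
    tx.send(EventKind::FocusChanged).unwrap();
    wait_on_worker(rx)
}
```

When the worker returns, the receiver it owned goes out of scope and is dropped, which in this model corresponds to dropping a Subscription to stop listening.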

Event types: FocusChanged, ValueChanged, NameChanged, StateChanged, StructureChanged, WindowOpened, WindowClosed, WindowActivated, WindowDeactivated, SelectionChanged, MenuOpened, MenuClosed, TextChanged, Alert.

CLI

xa11y ships a command-line tool for exploring accessibility trees, finding elements, performing actions, and streaming events. It mirrors the library API and is useful for debugging tests or inspecting an app before writing code.

Install via Rust or Python:

cargo install xa11y # Rust
pip install xa11y # Python

Commands

List apps:

xa11y apps

Print a full tree:

xa11y tree --app Calculator
application "Calculator" [enabled visible] bounds=(0,0,400,600)
├── window "Calculator" [enabled visible] bounds=(0,0,400,600)
│   ├── group "Display" [enabled visible]
│   │   └── static_text "Result" value="42" [enabled visible]
│   └── group "Keypad" [enabled visible]
│       ├── button "7" [enabled visible focusable] actions=[press,focus]
│       ├── button "8" [enabled visible focusable] actions=[press,focus]
│       └── button "+" [enabled visible focusable] actions=[press,focus]
└── menu_bar [enabled visible]

Find elements matching a selector:

xa11y find "button" --app Calculator

Perform an action:

xa11y action press "button[name='7']" --app Calculator
xa11y action set-value "text_field[name='Search']" --app Safari --value "hello"

Stream events:

xa11y events --app Calculator

All commands require --app NAME or --pid PID (except apps). The output always shows all element properties.

Error handling

All operations return Result<T> with a structured Error enum:

Error                   When
PermissionDenied        Accessibility or Screen Recording permissions not granted (includes setup instructions). On macOS 26+, both are required.
SelectorNotMatched      No element (or app) matches the given selector
ElementStale            The element reference is no longer valid
ActionNotSupported      Action not available for the element’s role
TextValueNotSupported   Platform can’t set text values on this element
Timeout                 A wait_*() call exceeded its deadline
InvalidSelector         Selector string has a syntax error
InvalidActionData       Action data failed validation (e.g., start > end)
Platform                Underlying platform API error (includes error code and message)
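
Because the errors are structured, callers can branch on the variant instead of parsing messages. The sketch below shows one possible policy for code that holds an Element handle: retry only when the failure is ElementStale. It is a standalone illustration; SketchError is a hypothetical, payload-free mirror of a few variants from the table (the real enum's fields are not shown here), and with_stale_retry is not part of xa11y.

```rust
/// Hypothetical mirror of some variants from the table above.
/// The real Error enum carries more data; this one is payload-free.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchError {
    PermissionDenied,
    SelectorNotMatched,
    ElementStale,
    Timeout,
    InvalidSelector,
}

/// Run `op` up to `max_attempts` times, retrying only on ElementStale.
/// Any other outcome, success or failure, is returned immediately.
pub fn with_stale_retry<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, SketchError>,
) -> Result<T, SketchError> {
    // All attempts except the last swallow a stale error and try again.
    for _ in 1..max_attempts {
        match op() {
            Err(SketchError::ElementStale) => continue,
            other => return other,
        }
    }
    // The final attempt reports whatever happens, stale or not.
    op()
}
```

In real use the closure would re-fetch the element before acting, which is the work a Locator already does on every operation. Errors such as InvalidSelector or PermissionDenied are returned on the first attempt in this sketch, since retrying cannot fix them.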