Overview
xa11y is a cross-platform library for reading accessibility trees, taking actions, listening to accessibility event streams, synthesising low-level input, and capturing screenshots.
What is desktop accessibility?
Most desktop applications expose an accessibility tree — a structured representation of their UI intended for screen readers and other assistive technology. Each platform has its own API for this (macOS AXUIElement, Windows UI Automation, Linux AT-SPI2), but the underlying concepts are the same. xa11y provides a single API that works with all of them.
Trees
An accessibility tree mirrors the visual hierarchy of an application. Here’s what one looks like:
    [0] application "Calculator"
      [1] window "Calculator"
        [2] group "Display"
          [3] static_text "Result" value="42"
        [4] group "Keypad"
          [5] button "7"
          [6] button "8"
          [7] button "9"
          [8] button "+"
          [9] button "4"
          ...

Every element in the tree has properties. For example, here’s the full data for button "+":

    role: button
    name: "+"
    value: None
    states: enabled, visible
    bounds: (272, 200, 80, 40)
    actions: press, focus

Actions
Accessibility APIs let you interact with elements: press buttons, type into fields, toggle checkboxes, expand menus, scroll, and more. xa11y exposes all of these through a unified action API.
Input simulation
For the small set of gestures with no accessibility equivalent — drag-and-drop, scroll wheels, global shortcuts — xa11y also ships an InputSim façade that synthesises OS-level pointer and keyboard events. Input simulation lives in its own surface (xa11y::input_sim() / xa11y.input_sim() / inputSim()) and never falls back from an accessibility action automatically — prefer locator.press() and locator.type_text() whenever they work.
Screenshots
xa11y can capture pixel-level snapshots of the full screen, a region, or the pixels under an accessibility element. Useful for attaching visual evidence to test failures or feeding vision-language models the same frame the agent is reasoning about. See the Screenshots guide.
Events
Accessibility APIs also emit events when the UI changes — elements appearing or disappearing, focus moving, names or values updating. For example, clicking a text field might emit:
    FocusChanged app="Calculator" target=text_field "Display"

xa11y supports subscribing to these event streams per-app via app.subscribe().
How does xa11y compare to
- AccessKit — AccessKit is for adding accessibility to an app. xa11y is for reading accessibility data from the outside.
- Playwright — Similar concept, specific to web browsers. Major design inspiration for xa11y’s Locator pattern and selector syntax.
- pyatspi2 / direct UIAutomation — Single-platform libraries. xa11y wraps these underlying APIs into one cross-platform interface.
- agent-desktop — CLI tool built on xa11y for AI agents to read and interact with desktops via accessibility data.
Concepts
xa11y is built around five core concepts:
- App: entry point for interacting with an application. Construct via App::by_name(), App::by_pid(), or App::list(). An App provides three paths: app.children() for navigating the tree, app.locator(selector) for actions and waits, and app.subscribe() for event streams.
- Element: a live handle to a node in the accessibility tree. Navigate with children() and parent(); each call queries the platform lazily. Read properties like role, name, value, states, bounds. Capture the subtree as a structured snapshot with tree(max_depth) or as an indented string with dump(max_depth). Actions (press(), set_value(), etc.) are available too, acting on the captured handle without re-resolution or auto-wait.
- Locator: a lazy selector that re-resolves on every operation. Use for actions (press(), set_value(), and others), waits (wait_visible(), and others), and querying (element(), elements()). Auto-waits for visible+enabled before acting and is the recommended path for most code. Inspired by Playwright’s Locator pattern.
- Subscription: live event stream from an app. Created via app.subscribe(). Pull events with try_recv(), recv(timeout), wait_for(predicate, timeout), or iterate with iter(). Drop to unsubscribe.
- Selector: CSS-like query syntax used by Locator. Supports role matching, attribute filters, combinators, and nth selection.
Connecting to an app
Rust:

    use xa11y::*;
    use std::time::Duration;

    // Connect by name. The second argument is a polling timeout,
    // useful when the app may not yet be registered with the a11y API.
    // Pass `Duration::ZERO` for a single attempt with no waiting.
    // (Returns PermissionDenied if accessibility is not enabled.)
    let safari = App::by_name("Safari", Duration::from_secs(5))?;

    // Connect by PID
    let app = App::by_pid(1234, Duration::from_secs(5))?;

    // List all running apps
    let all = App::list()?;

Python:

    import xa11y

    # Connect by name (raises PermissionDeniedError if accessibility is not enabled)
    safari = xa11y.App.by_name("Safari")

    # Connect by PID
    app = xa11y.App.by_pid(1234)

    # List all running apps
    all_apps = xa11y.App.list()

JavaScript:

    import { App } from '@crowecawcaw/xa11y';

    // Connect by name (throws PermissionDeniedError if accessibility is not enabled)
    const safari = await App.byName('Safari');

    // Connect by PID
    const app = await App.byPid(1234);

    // List all running apps
    const all = await App.list();

Reading accessibility data
Lazy navigation
app.children() returns the top-level elements (typically windows). Each Element supports children() and parent() for further navigation — every call queries the platform lazily, so you always see the current state of the UI.
Rust:

    let calc = App::by_name("Calculator", Duration::from_secs(5))?;
    let windows = calc.children()?;

    let children = windows[0].children()?;
    let parent = children[0].parent()?;

    // Read properties (via Deref to ElementData)
    println!("{} — {}", children[0].role, children[0].name.as_deref().unwrap_or(""));

Python:

    calc = xa11y.App.by_name("Calculator")
    windows = calc.children()

    children = windows[0].children()
    parent = children[0].parent()

    # Read properties
    print(f"{children[0].role} — {children[0].name or ''}")

JavaScript:

    const calc = await App.byName('Calculator');
    const windows = await calc.children();

    const children = await windows[0].children();
    const parent = await children[0].parent();

    // Read properties
    console.log(`${children[0].role} — ${children[0].name ?? ''}`);

Element properties: role, name, value, description, bounds, states, actions, numeric_value, min_value, max_value, stable_id, pid.
name, value, and description are stripped of Unicode bidi format controls (LRM, RLM, embeddings, isolates) so that equality assertions match the logical text. The unstripped platform string is preserved on element.raw under the platform-native key (AXTitle/AXValue/AXDescription/AXHelp on macOS, atspi_name/atspi_value/atspi_description on Linux, uia_name/uia_value/uia_help_text on Windows).
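To make the stripping rule concrete, here is a minimal Python sketch of what removing bidi format controls looks like. It is an illustration only, not xa11y's implementation: the function name strip_bidi and the exact set of code points (LRM, RLM, the U+202A to U+202E embedding and override range, and the U+2066 to U+2069 isolates) are assumptions based on the description above.

```python
# Assumed set of Unicode bidi format controls. The exact set xa11y strips
# may differ; this is only an illustration of the idea.
BIDI_CONTROLS = {
    "\u200e", "\u200f",                # LRM, RLM
    *map(chr, range(0x202A, 0x202F)),  # LRE, RLE, PDF, LRO, RLO
    *map(chr, range(0x2066, 0x206A)),  # LRI, RLI, FSI, PDI
}


def strip_bidi(text: str) -> str:
    """Return text with every assumed bidi format control removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


# A platform string wrapped in an isolate and followed by an LRM.
raw_name = "\u2068Save\u2069\u200e"
logical_name = strip_bidi(raw_name)

print(raw_name == "Save")      # the raw string does not compare equal
print(logical_name == "Save")  # the stripped string does
```

This is why an assertion such as name == "Save" can hold on the stripped property even when the raw platform string carries invisible direction marks.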
Element navigation and snapshots: children(), parent(), tree(max_depth), dump(max_depth).
Elements are live
Every call to children() or parent() queries the platform. There are no cached snapshots — you always see the latest tree state.
Rust:

    let windows = calc.children()?;
    println!("{:?}", windows[0].children()?[0].name); // queries platform

    // UI changed? Just navigate again:
    let windows = calc.children()?; // fresh query

Python:

    windows = calc.children()
    print(windows[0].children()[0].name)  # queries platform

    # UI changed? Just navigate again:
    windows = calc.children()  # fresh query

JavaScript:

    let windows = await calc.children();
    console.log((await windows[0].children())[0].name); // queries platform

    // UI changed? Just navigate again:
    windows = await calc.children(); // fresh query

To find specific elements, use a Locator (below), which uses CSS-like selectors to find elements efficiently.
Subtree snapshots
element.tree(max_depth) captures the subtree rooted at an element as a recursive data structure (role, name, value, children). element.dump(max_depth) formats the same snapshot as an indented string — useful for debugging and test diagnostics. Both accept an optional max_depth argument: 0 returns only the root node, 1 includes its direct children, and so on. Omit for the full subtree.
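The max_depth rule can be sketched with a toy tree in plain Python. The Node class and this dump function are hypothetical stand-ins written for illustration; they do not call xa11y and only model the depth semantics described above (0 is the root only, 1 adds direct children, None means the full subtree).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a snapshot node (not an xa11y type)."""
    role: str
    name: str
    children: list["Node"] = field(default_factory=list)


def dump(node: Node, max_depth: Optional[int] = None, depth: int = 0) -> str:
    """Format node and its descendants as indented lines, down to max_depth."""
    lines = [f'{"  " * depth}{node.role} "{node.name}"']
    # Descend only while the current depth is still below the limit.
    if max_depth is None or depth < max_depth:
        for child in node.children:
            lines.append(dump(child, max_depth, depth + 1))
    return "\n".join(lines)


settings = Node("dialog", "Settings", [
    Node("group", "General", [
        Node("check_box", "Enable notifications"),
        Node("check_box", "Start at login"),
    ]),
])

print(dump(settings, max_depth=0))  # root only
print(dump(settings, max_depth=1))  # root plus direct children
print(dump(settings))               # full subtree
```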
Rust:

    let dialog = app.locator("dialog").element()?;
    println!("{}", dialog.dump(Some(3))?);
    // dialog "Settings"
    //   group "General"
    //     check_box "Enable notifications"
    //     check_box "Start at login"

Python:

    dialog = app.locator("dialog").element()
    print(dialog.dump(max_depth=3))
    # dialog "Settings"
    #   group "General"
    #     check_box "Enable notifications"
    #     check_box "Start at login"

    node = dialog.tree(max_depth=1)
    # {"role": "dialog", "name": "Settings", "value": None, "children": [...]}

JavaScript:

    const dialog = await app.locator('dialog').element();
    console.log(await dialog.dump(3));
    // dialog "Settings"
    //   group "General"
    //     check_box "Enable notifications"
    //     check_box "Start at login"

    const node = await dialog.tree(1);
    // { role: 'dialog', name: 'Settings', value: undefined, children: [...] }

Taking actions
Locators
A Locator holds a selector and re-resolves on every operation. Create one via app.locator(selector). Actions always target the current state of the UI.
Rust:

    let calc = App::by_name("Calculator", Duration::from_secs(5))?;
    let btn = calc.locator("button[name='=']");

    // Each call re-resolves the selector, finds the element, then acts
    btn.press()?;

    // Read properties via .element()
    let element = btn.element()?;
    println!("{:?}", element.name);

    // Wait for state changes (polls until condition met)
    btn.wait_enabled(Duration::from_secs(5))?;

    // Get all matching elements
    let buttons = calc.locator("button").elements()?;
    println!("Found {} buttons", buttons.len());

Python:

    calc = xa11y.App.by_name("Calculator")
    btn = calc.locator("button[name='=']")

    # Each call re-resolves the selector, finds the element, then acts
    btn.press()

    # Read properties via .element()
    element = btn.element()
    print(element.name)

    # Wait for state changes (polls until condition met)
    btn.wait_enabled(timeout=5.0)

    # Get all matching elements
    buttons = calc.locator("button").elements()
    print(f"Found {len(buttons)} buttons")

JavaScript:

    const calc = await App.byName('Calculator');
    const btn = calc.locator("button[name='=']");

    // Each call re-resolves the selector, finds the element, then acts
    await btn.press();

    // Read properties via .element()
    const element = await btn.element();
    console.log(element.name);

    // Wait for state changes (polls until condition met)
    await btn.waitEnabled(5);

    // Get all matching elements
    const buttons = await calc.locator('button').elements();
    console.log(`Found ${buttons.length} buttons`);

Best practice: make selectors specific so they stay stable as the UI evolves. Don’t select button; select button[name='Cancel'] inside a specific group or window. A vague selector like button can become unreliable when new elements are added: it may match the wrong element or start returning multiple matches unexpectedly.
Actions: press(), focus(), blur(), toggle(), select(), expand(), collapse(), set_value(), set_numeric_value(), type_text(), select_text(), increment(), decrement(), show_menu(), scroll_into_view().
Waits: wait_visible(), wait_hidden(), wait_attached(), wait_detached(), wait_enabled(), wait_disabled(), wait_focused(), wait_unfocused(), wait_for_state(), wait_until().
Queries: element() (single match), elements() (all matches), exists(), count().
Chaining: nth(n), first(), child(selector), descendant(selector).
Locator re-resolves its selector on every operation, making it resilient to UI changes between calls. To inspect element properties or traverse the subtree, call .element() first — tree() and dump() live on Element, not Locator.
Element actions
The same action verbs are also available directly on Element. A Locator re-resolves its selector and auto-waits for the element to be visible and enabled before acting; an Element acts on the snapshot you already captured, with no re-resolution and no auto-wait. If the underlying element has been destroyed since you fetched it, the call returns an ElementStale error.
Reach for Element actions when you’ve already inspected an element — typically after .dump() or reading .actions — and want to act on that same instance. For everything else, use a Locator: it stays correct across UI changes and is the path most code should take.
Rust:

    // Locator: re-resolves and auto-waits, so it is robust for changing UIs.
    calc.locator("button[name='Save']").press()?;

    // Element: acts on the captured snapshot; useful when you've already
    // inspected the element and want to act on the same instance.
    let btn = calc.locator("button[name='Save']").element()?;
    println!("{:?}", btn.actions); // ["press", "focus"]
    btn.press()?;

Python:

    # Locator: re-resolves and auto-waits, so it is robust for changing UIs.
    calc.locator("button[name='Save']").press()

    # Element: acts on the captured snapshot; useful when you've already
    # inspected the element and want to act on the same instance.
    btn = calc.locator("button[name='Save']").element()
    print(btn.actions)  # ["press", "focus"]
    btn.press()

JavaScript:

    // Locator: re-resolves and auto-waits, so it is robust for changing UIs.
    await calc.locator("button[name='Save']").press();

    // Element: acts on the captured snapshot; useful when you've already
    // inspected the element and want to act on the same instance.
    const btn = await calc.locator("button[name='Save']").element();
    console.log(btn.actions); // ["press", "focus"]
    await btn.press();

Recommended path: Locator. Auto-wait and selector re-resolution make Locator resilient to tree changes between operations. Use Element actions only when you already hold the element and want to act on that exact instance.
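The difference between re-resolving and acting on a captured handle can be modelled without any platform at all. The sketch below is a toy in plain Python: ToyTree, ToyLocator, and ToyElement are hypothetical names invented for this illustration, and auto-wait is left out entirely. It only shows why a locator keeps working after the UI rebuilds a button while a previously captured handle goes stale.

```python
class ElementStale(Exception):
    """Raised when a captured handle points at a node that no longer exists."""


class ToyTree:
    """Hypothetical stand-in for a live UI: maps node ids to accessible names."""

    def __init__(self):
        self.nodes = {}
        self._next_id = 0

    def add(self, name):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = name
        return node_id

    def remove(self, node_id):
        del self.nodes[node_id]


class ToyElement:
    """A captured handle: remembers one node id and never looks it up again."""

    def __init__(self, tree, node_id):
        self.tree, self.node_id = tree, node_id

    def press(self):
        if self.node_id not in self.tree.nodes:
            raise ElementStale(f"node {self.node_id} is gone")
        return f"pressed {self.tree.nodes[self.node_id]}"


class ToyLocator:
    """A lazy query: searches the tree again on every operation."""

    def __init__(self, tree, name):
        self.tree, self.name = tree, name

    def element(self):
        for node_id, name in self.tree.nodes.items():
            if name == self.name:
                return ToyElement(self.tree, node_id)
        raise LookupError(f"no element named {self.name!r}")

    def press(self):
        return self.element().press()


tree = ToyTree()
old_id = tree.add("Save")
locator = ToyLocator(tree, "Save")
captured = locator.element()

# The UI rebuilds the button: same name, new underlying node.
tree.remove(old_id)
tree.add("Save")

locator_result = locator.press()  # re-resolves and finds the new node
try:
    captured.press()
    element_result = "pressed"
except ElementStale:
    element_result = "stale"      # the old handle no longer resolves

print(locator_result, element_result)
```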
Selectors
xa11y uses a CSS-like selector syntax to find elements. Selectors are used by app.locator(selector) to target elements for actions, waits, and queries.
    button                                      # by role
    button[name='Submit']                       # role + exact name match
    text_field[name^='Search']                  # starts-with (case-insensitive)
    text_field[name*='email']                   # contains (case-insensitive)
    button[name$='Cancel']                      # ends-with (case-insensitive)
    group > button                              # direct child combinator
    window button[name='OK']                    # descendant (any depth)
    button:nth(2)                               # 2nd match (1-based)
    button[name='Play'], button[name='Pause']   # alternation, see below

Supported attributes: name, value, description, role.
Operators: = (exact, case-sensitive), *= (contains), ^= (starts-with), $= (ends-with). Substring operators are case-insensitive.
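The case rules for the four operators are easy to mix up, so here is a small Python model of them. The function attr_matches is hypothetical and is not part of xa11y; it assumes simple lowercase folding for the case-insensitive operators, which may differ from how the library compares strings.

```python
def attr_matches(actual, op, expected):
    """Model of the attribute operators: '=' is exact, the rest ignore case."""
    if actual is None:
        return False  # a missing attribute never matches
    if op == "=":
        return actual == expected
    # Assumption: substring operators compare lowercased strings.
    a, e = actual.lower(), expected.lower()
    if op == "*=":
        return e in a
    if op == "^=":
        return a.startswith(e)
    if op == "$=":
        return a.endswith(e)
    raise ValueError(f"unknown operator: {op}")


print(attr_matches("Submit", "=", "submit"))           # exact match is case-sensitive
print(attr_matches("Search the web", "^=", "search"))  # starts-with ignores case
print(attr_matches("Work Email", "*=", "email"))       # contains ignores case
```

In practice this means button[name='submit'] would not match a button named "Submit", while button[name^='submit'] would.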
Roles use snake_case: button, text_field, check_box, radio_button, menu_item, tab_group, static_text, combo_box, list_item, table_row, table_cell, scroll_bar, progress_bar, tree_item, web_area, split_group, spin_button, etc.
Selector groups (alternation): A top-level comma separates alternation clauses, just like CSS selector lists. The result is the union of each clause’s matches, deduplicated by element identity and returned in document order. Each clause is parsed independently, so combinators apply per clause: window button, dialog button means “buttons under a window or buttons under a dialog.” Chained descendant() / child() calls on a group locator distribute over every clause — locator("toolbar, dialog").descendant("button") is equivalent to locator("toolbar button, dialog button"). Commas inside quoted attribute values ([name='a,b']) are not separators.
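Two of the rules above, top-level comma splitting and deduplicated union in document order, can be sketched in a few lines of Python. Both functions are hypothetical models rather than xa11y's parser: split_clauses ignores escape sequences and treats both quote styles as delimiters (an assumption), and union_in_document_order represents each element by an integer standing for its position in document order.

```python
def split_clauses(selector: str) -> list[str]:
    """Split on commas that are not inside a quoted attribute value."""
    clauses, current, quote = [], [], None
    for ch in selector:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None          # closing quote
        elif ch in "'\"":
            quote = ch                # opening quote
            current.append(ch)
        elif ch == ",":
            clauses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    clauses.append("".join(current).strip())
    return clauses


def union_in_document_order(matches_per_clause: list[list[int]]) -> list[int]:
    """Merge per-clause matches, dropping duplicates, sorted by position."""
    return sorted(set().union(*matches_per_clause))


print(split_clauses("button[name='Play'], button[name='Pause']"))
print(split_clauses("button[name='a,b']"))
print(union_in_document_order([[5, 2], [2, 9]]))
```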
Events
Subscribe to accessibility events from an app with app.subscribe().
The returned Subscription is a pull-based event stream — no filters or selectors, just all events for the app.
Rust:

    let calc = App::by_name("Calculator", Duration::from_secs(5))?;
    let sub = calc.subscribe()?;

    // Block until a specific event arrives
    let event = sub.wait_for(
        |e| e.event_type == EventType::FocusChanged,
        Duration::from_secs(5),
    )?;
    println!("Focus moved to: {:?}", event.target);

    // Or poll without blocking
    if let Some(event) = sub.try_recv() {
        println!("{:?}", event.event_type);
    }

    // Or iterate (blocks until next event)
    for event in sub.iter() {
        println!("{:?}: {}", event.event_type, event.app_name);
    }

Drop the Subscription to stop listening. The subscription is Send but not Clone; move it to another thread if needed.

Python:

The returned Subscription is a pull-based event stream. Use it as a context manager or iterate it directly.

    calc = xa11y.App.by_name("Calculator")

    with calc.subscribe() as sub:
        # Block until a specific event arrives
        event = sub.wait_for(
            lambda e: e.event_type == xa11y.EventType.FOCUS_CHANGED,
            timeout=5.0,
        )
        print(f"Focus moved to: {event.target}")

        # Or poll without blocking
        event = sub.try_recv()
        if event is not None:
            print(event.event_type)

        # Or iterate (blocks until next event)
        for event in sub:
            print(f"{event.event_type}: {event.app_name}")

JavaScript:

The returned Subscription is an EventEmitter. Attach handlers, or await a single event with waitForEvent / waitFor.

    const calc = await App.byName('Calculator');
    const sub = await calc.subscribe();

    // Handle every event of a given type
    sub.on('focusChanged', (ev) => {
      console.log(`Focus moved to: ${ev.target?.name}`);
    });

    // Or await a single matching event
    const ev = await sub.waitForEvent('focusChanged', { timeout: 5000 });

    // Stop delivery
    sub.close();

Event types: FocusChanged, ValueChanged, NameChanged, StateChanged, StructureChanged, WindowOpened, WindowClosed, WindowActivated, WindowDeactivated, SelectionChanged, MenuOpened, MenuClosed, TextChanged, Alert.
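For readers who want a mental model of a pull-based stream, the sketch below builds one on Python's standard queue module. ToySubscription and its push method are hypothetical and exist only for this illustration; events are plain dicts. In particular, discarding non-matching events inside wait_for is an assumption of this toy and is not confirmed behaviour of xa11y's Subscription.

```python
import queue
import time


class ToySubscription:
    """Hypothetical pull-based event stream backed by a thread-safe queue."""

    def __init__(self):
        self._events = queue.Queue()

    def push(self, event):
        # Stand-in for the platform delivering an event.
        self._events.put(event)

    def try_recv(self):
        """Return the next event, or None if nothing is queued."""
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None

    def wait_for(self, predicate, timeout):
        """Block until an event satisfies predicate or the deadline passes."""
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("no matching event before the deadline")
            try:
                event = self._events.get(timeout=remaining)
            except queue.Empty:
                raise TimeoutError("no matching event before the deadline")
            if predicate(event):
                return event
            # Assumption: events that do not match are dropped.


sub = ToySubscription()
sub.push({"event_type": "ValueChanged", "target": "Display"})
sub.push({"event_type": "FocusChanged", "target": "Display"})

event = sub.wait_for(lambda e: e["event_type"] == "FocusChanged", timeout=1.0)
print(event["event_type"], event["target"])
```

The deadline is computed once from a monotonic clock, so skipping over non-matching events does not extend the total wait.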
CLI
xa11y ships a command-line tool for exploring accessibility trees, finding elements, performing actions, and streaming events. It mirrors the library API and is useful for debugging tests or inspecting an app before writing code.
Install via Rust or Python:
    cargo install xa11y   # Rust
    pip install xa11y     # Python

Commands

List apps:

    xa11y apps

Print a full tree:

    xa11y tree --app Calculator
    application "Calculator" [enabled visible] bounds=(0,0,400,600)
    ├── window "Calculator" [enabled visible] bounds=(0,0,400,600)
    │   ├── group "Display" [enabled visible]
    │   │   └── static_text "Result" value="42" [enabled visible]
    │   └── group "Keypad" [enabled visible]
    │       ├── button "7" [enabled visible focusable] actions=[press,focus]
    │       ├── button "8" [enabled visible focusable] actions=[press,focus]
    │       └── button "+" [enabled visible focusable] actions=[press,focus]
    └── menu_bar [enabled visible]

Find elements matching a selector:

    xa11y find "button" --app Calculator

Perform an action:

    xa11y action press "button[name='7']" --app Calculator
    xa11y action set-value "text_field[name='Search']" --app Safari --value "hello"

Stream events:

    xa11y events --app Calculator

All commands require --app NAME or --pid PID (except apps). The output always shows all element properties.
Error handling
All operations return Result<T> with a structured Error enum:
| Error | When |
|---|---|
| PermissionDenied | Accessibility or Screen Recording permissions not granted (includes setup instructions). On macOS 26+, both are required. |
| SelectorNotMatched | No element (or app) matches the given selector |
| ElementStale | The element reference is no longer valid |
| ActionNotSupported | Action not available for the element’s role |
| TextValueNotSupported | Platform can’t set text values on this element |
| Timeout | A wait_*() call exceeded its deadline |
| InvalidSelector | Selector string has a syntax error |
| InvalidActionData | Action data failed validation (e.g., start > end) |
| Platform | Underlying platform API error (includes error code and message) |