Input Simulation

xa11y can drive a target application in two distinct ways:

  • Accessibility actions: locator.press(), locator.type_text(), locator.toggle(), etc. These call the platform accessibility API directly. They work even when the target window is not focused, are deterministic, and are the preferred way to drive a UI.
  • Input simulation: InputSim generates OS-level pointer and keyboard events at the system event layer (CGEvent on macOS, SendInput on Windows, XTest on X11).

Prefer accessibility actions wherever possible. Reach for input simulation only for gestures that have no accessibility equivalent: drag-and-drop, scroll wheels, global shortcuts, or tests that must exercise the event loop itself.

The two mechanisms are not bridged. A failed accessibility action never falls back to input simulation, and input simulation never inspects the accessibility tree for you. If you want to click an element with synthesised input, you compute its bounds via the a11y API and hand a point or Element to InputSim.

| Scenario | Mechanism |
| --- | --- |
| Press a button, toggle a checkbox, expand a menu | Accessibility action (locator.press(), locator.toggle(), locator.expand()) |
| Type text into a focused field with IME support | Accessibility action (locator.type_text()) |
| Drag-and-drop between two elements | Input simulation (mouse.drag) |
| Scroll a list with the wheel | Input simulation (mouse.scroll) |
| Global shortcut without a focused element | Input simulation (keyboard.chord) |
| Click at a fixed pixel coordinate | Input simulation (mouse.click((x, y))) |

| Platform | Requirement |
| --- | --- |
| macOS | Accessibility + Input Monitoring for the terminal/IDE running your code |
| Windows | None for normal user sessions |
| Linux (X11) | XTest extension enabled on the X server. Zero setup. |
| Linux (Wayland / headless) | Membership in the input group (sudo usermod -aG input $USER, then log in again). xa11y opens /dev/uinput and registers a virtual evdev keyboard+pointer, the same mechanism used by xdotool --using-uinput, ydotool, wtype, Steam Input, and Wine. The privilege grants global input read/write access, comparable to macOS Input Monitoring. |

On macOS, input simulation usually also requires the target window to be foregrounded. Activate the window explicitly before synthesising input.

xa11y picks the Linux backend at runtime: if DISPLAY is set, it drives XTest; otherwise it falls through to uinput. There is no compile-time feature flag: both backends are always built into xa11y-linux. The uinput path is compositor-agnostic, so it works on every Wayland desktop (GNOME, KDE Plasma, sway, Hyprland, Cosmic, weston) and also on headless Linux servers. A missing input group membership surfaces as PermissionDenied with an actionable message; a missing uinput kernel module surfaces as Unsupported.
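The runtime selection described above can be sketched as a small pure function. This is a minimal illustration, not xa11y's actual code: the `Backend` enum and both function names are hypothetical, and treating any present `DISPLAY` value as "set" is an assumption.

```rust
use std::env;

/// Hypothetical stand-in for the two Linux input backends described above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Backend {
    XTest,
    Uinput,
}

/// Pick a backend from the value of `DISPLAY`, if any.
/// Assumption: any value counts as "set"; xa11y may apply stricter checks.
pub fn pick_backend(display: Option<&str>) -> Backend {
    match display {
        Some(_) => Backend::XTest,
        None => Backend::Uinput,
    }
}

/// Read `DISPLAY` from the process environment and choose a backend.
pub fn pick_backend_from_env() -> Backend {
    let display = env::var("DISPLAY").ok();
    pick_backend(display.as_deref())
}
```

Keeping the decision in a function that takes the variable as an argument makes it testable without mutating the process environment.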

    use xa11y::*;

    fn main() -> Result<()> {
        let sim = input_sim()?;

        // Click at a screen coordinate
        sim.mouse().click(Point::new(400, 300))?;

        // Click at the centre of an element's current bounds
        let app = App::by_name("Finder", std::time::Duration::from_secs(5))?;
        let button = app.locator("button[name='New Folder']").element()?;
        sim.mouse().click(&button)?;

        // Type text into whatever has keyboard focus
        sim.keyboard().type_text("Hello, world!")?;

        // Keyboard shortcut: Meta+S (Cmd+S on macOS)
        sim.keyboard().chord(Key::Char('s'), &[Key::Meta])?;

        Ok(())
    }

Every pointer method accepts either a screen-space point or an Element:

  • Point — absolute screen coordinates in the platform’s native coordinate space (points on macOS, physical pixels on Windows and Linux). Origin is the top-left of the primary display; negative values are valid on multi-monitor setups.
  • Element — uses the centre of the element’s bounds at the time of the call. InputSim does not re-read the accessibility tree for you — re-fetch via a locator first if the UI may have moved.

In Rust, additional anchors are available via ClickOptions.anchor (Anchor::TopLeft, Anchor::Offset { dx, dy }, etc.).
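How an anchor turns an element's bounds into a single click point can be sketched as follows. The `Rect` struct, the `Center` variant, and the `resolve` function are hypothetical stand-ins rather than xa11y's own definitions, and measuring `Offset` from the top-left corner is an assumption.

```rust
/// Hypothetical rectangle in screen coordinates (not xa11y's own type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub width: i32,
    pub height: i32,
}

/// Illustrative anchors. `TopLeft` and `Offset` are named in the text above;
/// `Center` is an assumed name for the default behaviour.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Anchor {
    Center,
    TopLeft,
    Offset { dx: i32, dy: i32 },
}

/// Resolve an anchor against a rectangle to one screen point.
/// Assumption: `Offset` is measured from the top-left corner.
pub fn resolve(bounds: Rect, anchor: Anchor) -> (i32, i32) {
    match anchor {
        Anchor::Center => (
            bounds.x + bounds.width / 2,
            bounds.y + bounds.height / 2,
        ),
        Anchor::TopLeft => (bounds.x, bounds.y),
        Anchor::Offset { dx, dy } => (bounds.x + dx, bounds.y + dy),
    }
}
```

Because the point is computed from bounds captured earlier, a layout change between fetching the element and clicking would send the click to the old position, which is why re-fetching through a locator matters.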

    // Assumes `sim` and `element` from the previous example.
    let m = sim.mouse();

    m.click((100, 100))?;                             // left click
    m.double_click(&element)?;                        // double click
    m.right_click((100, 100))?;                       // right click
    m.move_to((200, 200))?;                           // move without clicking
    m.drag((50, 50), (200, 200))?;                    // left-drag with 150ms duration
    m.scroll((400, 300), ScrollDelta::vertical(-3))?; // 3 ticks down

    // Shift-click with explicit options
    m.click_with(
        ClickTarget::from(&element),
        ClickOptions {
            held: vec![Key::Shift],
            anchor: Anchor::TopLeft,
            ..Default::default()
        },
    )?;

    // Slow drag for apps that need more frames
    m.drag_with(
        (50, 50),
        (200, 200),
        DragOptions {
            duration: std::time::Duration::from_millis(500),
            ..Default::default()
        },
    )?;

Scroll deltas are given in platform “ticks” (typically one notch of a physical scroll wheel). Positive dx scrolls right; positive dy scrolls the content down, which moves the viewport up.
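The vertical sign convention above is easy to get backwards, so here it is spelled out as a tiny helper. `ViewportShift` and `vertical_shift` are hypothetical names used only for illustration; they are not part of xa11y.

```rust
/// Direction the viewport moves for a given vertical scroll delta.
/// Hypothetical helper type, not part of xa11y.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ViewportShift {
    Up(u32),
    Down(u32),
    Unchanged,
}

/// Positive `dy` moves the viewport up; negative `dy` moves it down.
pub fn vertical_shift(dy: i32) -> ViewportShift {
    if dy > 0 {
        ViewportShift::Up(dy.unsigned_abs())
    } else if dy < 0 {
        ViewportShift::Down(dy.unsigned_abs())
    } else {
        ViewportShift::Unchanged
    }
}
```

Under this reading, a delta of -3 moves the viewport three ticks down the document.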

Modifier keys (Shift, Ctrl, Alt, Meta) are ordinary keys — pass them alongside character keys. Meta is the platform’s “command” modifier: Cmd on macOS, Win on Windows, Super on Linux.

    let k = sim.keyboard();

    // Tap a single key
    k.press(Key::Enter)?;
    k.press(Key::Char('a'))?;

    // Shortcuts: modifiers are just keys held during the tap
    k.chord(Key::Char('a'), &[Key::Meta])?;             // Meta+A (Cmd+A on macOS)
    k.chord(Key::Char('z'), &[Key::Meta, Key::Shift])?; // Meta+Shift+Z (Cmd+Shift+Z on macOS)

    // Uppercase letters: hold Shift explicitly
    k.chord(Key::Char('a'), &[Key::Shift])?;            // 'A'

    // Hold and release manually
    k.down(Key::Shift)?;
    k.press(Key::Char('a'))?;
    k.press(Key::Char('b'))?;
    k.up(Key::Shift)?;

    // Type literal text (handles case and IME for you)
    k.type_text("Hello, world!")?;

Key::Char rejects ASCII uppercase letters at the API boundary to prevent the common bug where chord(Key::Char('K'), &[Key::Meta]) is read as “Cmd+K” but would mean “Cmd+Shift+K” under auto-shift semantics. For arbitrary text, call type_text.
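The uppercase rule can be illustrated with a standalone check. This is a sketch of the idea only: `validate_char_key` and `split_shift` are hypothetical helpers, and the error text is invented for the example rather than taken from xa11y.

```rust
/// Reject ASCII uppercase letters, mirroring the rule described above.
/// Hypothetical helper; xa11y's real check and error type may differ.
pub fn validate_char_key(c: char) -> Result<char, String> {
    if c.is_ascii_uppercase() {
        Err(format!(
            "uppercase '{}' is rejected; use '{}' and hold Shift explicitly",
            c,
            c.to_ascii_lowercase()
        ))
    } else {
        Ok(c)
    }
}

/// Split a character into the lowercase key to tap and whether Shift
/// has to be held, so the caller states the modifier explicitly.
pub fn split_shift(c: char) -> (char, bool) {
    if c.is_ascii_uppercase() {
        (c.to_ascii_lowercase(), true)
    } else {
        (c, false)
    }
}
```

A caller holding a character of unknown case could run it through something like `split_shift` and add Shift to the held modifiers when the second field is true, which keeps the intended shortcut unambiguous.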

Drag-and-drop has no accessibility equivalent on most platforms. Resolve both endpoints via the a11y tree first, then synthesise the drag.

    let app = App::by_name("Finder", std::time::Duration::from_secs(5))?;
    let source = app.locator("list_item[name='notes.txt']").element()?;
    let target = app.locator("list_item[name='Archive']").element()?;

    let sim = input_sim()?;
    sim.mouse().drag(&source, &target)?;
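To see why a longer drag duration can help, it is useful to picture a synthesised drag as a press, a series of intermediate pointer moves, and a release. The sketch below assumes evenly spaced moves along a straight line; `drag_path` and `step_delay` are hypothetical helpers, and xa11y's actual step count and timing are not documented here.

```rust
use std::time::Duration;

/// Pointer positions for a straight-line drag, including both endpoints.
/// `steps` is the number of segments; values below 1 are treated as 1.
pub fn drag_path(from: (i32, i32), to: (i32, i32), steps: u32) -> Vec<(i32, i32)> {
    let steps = steps.max(1);
    (0..=steps)
        .map(|i| {
            // Fraction of the way from `from` to `to`, in 0.0..=1.0.
            let t = f64::from(i) / f64::from(steps);
            let x = f64::from(from.0) + f64::from(to.0 - from.0) * t;
            let y = f64::from(from.1) + f64::from(to.1 - from.1) * t;
            (x.round() as i32, y.round() as i32)
        })
        .collect()
}

/// Delay between consecutive moves so the whole drag takes `total`.
pub fn step_delay(total: Duration, steps: u32) -> Duration {
    total / steps.max(1)
}
```

With 3 steps over 150 ms this yields four points spaced 50 ms apart. An application that samples pointer position once per frame sees more distinct positions when the same path is spread over a longer duration.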
| Error | When |
| --- | --- |
| PermissionDenied | OS denied the synthesis permission (macOS Input Monitoring, etc.) |
| Unsupported | Platform has no backend for the operation (e.g. Linux session with neither DISPLAY nor WAYLAND_DISPLAY set) |
| NoElementBounds | Target Element has no bounds. Re-fetch it or anchor on a point |
| InvalidActionData | Uppercase Char key, malformed key name, or bad target tuple |
| Platform | Raw OS failure (FFI return code etc.) |
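One way to act on these errors is to group them by what the caller can do about them. The sketch below uses a hypothetical `SimError` enum that only mirrors the names in the table; xa11y's real error type may have a different shape and payloads, and the `Recovery` categories are this example's own.

```rust
/// Hypothetical mirror of the error kinds listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SimError {
    PermissionDenied,
    Unsupported,
    NoElementBounds,
    InvalidActionData,
    Platform(i32), // assumed payload: a raw OS return code
}

/// What a caller might do next. These categories are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    RefetchElement,
    FixEnvironment,
    FixCallSite,
    Report,
}

/// Map each error kind to a suggested next step.
pub fn recovery_for(err: &SimError) -> Recovery {
    match err {
        // The element had no bounds: re-fetch it or anchor on a point.
        SimError::NoElementBounds => Recovery::RefetchElement,
        // Permissions or a missing backend are fixed outside the program.
        SimError::PermissionDenied | SimError::Unsupported => Recovery::FixEnvironment,
        // Bad key or target data is a bug in the calling code.
        SimError::InvalidActionData => Recovery::FixCallSite,
        // Raw OS failures are surfaced as-is.
        SimError::Platform(_) => Recovery::Report,
    }
}
```

In a test suite, only the first category is worth retrying automatically; the others point to setup or code that has to change.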