Screenshots

xa11y can capture pixel-level snapshots of the screen alongside its accessibility APIs. This is useful for attaching screenshots to test failures, feeding vision-language models with the same frame the agent is reasoning about, or visually diffing a region around a specific control.

Three entry points:

  • Full screen — capture the entire primary display.
  • Region — capture an explicit rectangle in logical screen coordinates.
  • Element — capture the pixels under an accessibility element’s current bounds.

Screenshots are decoupled from the accessibility and input layers. The target window is not raised or activated before capture — whatever pixels are at the requested coordinates are returned, including any occluding windows. Bring the target to the foreground yourself if you need a clean capture.

Platform         Requirement
macOS            Screen & System Audio Recording for the terminal/IDE
Windows          None for normal user sessions
Linux (X11)      DISPLAY set to a reachable X server
Linux (Wayland)  xdg-desktop-portal with screenshot support (user consent dialog may appear)

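
On Linux, a test harness may want to check the session requirements listed above before it attempts a capture, so that headless CI runs can skip screenshot steps instead of failing. The following is a minimal sketch using only the standard library. The `LinuxSession` type and `classify_session` function are hypothetical names for this example and are not part of xa11y. The sketch also assumes that Wayland takes precedence when both variables are set, which may not match how xa11y picks its backend. A set `DISPLAY` does not prove the X server is reachable.

```rust
use std::env;

/// Which Linux capture path the environment suggests. Hypothetical type for
/// this sketch; it is not part of xa11y.
#[derive(Debug, PartialEq, Eq)]
enum LinuxSession {
    Wayland,
    X11,
    Headless,
}

/// Classify a session from the values of WAYLAND_DISPLAY and DISPLAY.
/// Empty values are treated the same as unset ones.
/// Assumption: Wayland wins when both variables are present.
fn classify_session(wayland_display: Option<&str>, display: Option<&str>) -> LinuxSession {
    let is_set = |v: Option<&str>| v.map_or(false, |s| !s.is_empty());
    if is_set(wayland_display) {
        LinuxSession::Wayland
    } else if is_set(display) {
        LinuxSession::X11
    } else {
        LinuxSession::Headless
    }
}

fn main() {
    let wayland = env::var("WAYLAND_DISPLAY").ok();
    let x11 = env::var("DISPLAY").ok();
    match classify_session(wayland.as_deref(), x11.as_deref()) {
        LinuxSession::Wayland => println!("Wayland session: xdg-desktop-portal screenshot support is needed"),
        LinuxSession::X11 => println!("X11 session: DISPLAY is set"),
        LinuxSession::Headless => println!("no display variables set: consider skipping screenshot steps"),
    }
}
```

Passing the variable values in as arguments, rather than reading the environment inside `classify_session`, keeps the check easy to unit test.
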
use xa11y::*;

fn main() -> Result<()> {
    // Full primary display
    let shot = screenshot()?;
    shot.save_png("full.png")?;

    // Explicit region
    let shot = screenshot_region(Rect { x: 0, y: 0, width: 400, height: 300 })?;
    shot.save_png("corner.png")?;

    // Pixels under an accessibility element
    let app = App::by_name("Calculator", std::time::Duration::from_secs(5))?;
    let display = app.locator("static_text[name='Result']").element()?;
    let shot = screenshot_element(&display)?;
    shot.save_png("result.png")?;

    Ok(())
}

A Screenshot carries raw RGBA8 pixels along with dimensions and a scale factor. width and height are physical (device) pixels — the same resolution the compositor renders at — so on HiDPI displays they exceed the logical bounds passed in. scale records the physical-to-logical ratio (1.0 on standard displays, 2.0 on typical Retina, 1.5 / 1.75 / 2.0 on common Windows and Linux HiDPI configurations). pixels.len() equals width * height * 4.

let shot = screenshot()?;
println!("{} x {} @ {}x", shot.width, shot.height, shot.scale);
// Encode as PNG bytes (e.g. to upload or embed)
let png_bytes: Vec<u8> = shot.to_png()?;
// Or write directly to disk
shot.save_png("out.png")?;
// Raw RGBA for custom processing
let rgba: &[u8] = &shot.pixels;
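
The relationship between logical coordinates, the scale factor, and the raw buffer can be sketched as follows. `Shot` is a local stand-in for the screenshot type, not the library's definition: the field names follow the description above, while the concrete types (`u32`, `f64`) and the row-major, top-to-bottom pixel order are assumptions made for this example.

```rust
/// Stand-in for the screenshot type described above. Field names follow the
/// text; the concrete types and row-major layout are assumptions.
struct Shot {
    width: u32,
    height: u32,
    scale: f64,
    pixels: Vec<u8>, // RGBA8, 4 bytes per pixel
}

impl Shot {
    /// Return the RGBA value at a physical pixel coordinate, or None if out of range.
    fn rgba_at(&self, px: u32, py: u32) -> Option<[u8; 4]> {
        if px >= self.width || py >= self.height {
            return None;
        }
        let i = (py as usize * self.width as usize + px as usize) * 4;
        let p = self.pixels.get(i..i + 4)?;
        Some([p[0], p[1], p[2], p[3]])
    }

    /// Map a logical offset from the capture's top-left corner to a physical
    /// pixel by multiplying with the scale factor, then read that pixel.
    fn rgba_at_logical(&self, lx: f64, ly: f64) -> Option<[u8; 4]> {
        if lx < 0.0 || ly < 0.0 {
            return None;
        }
        let px = (lx * self.scale).floor() as u32;
        let py = (ly * self.scale).floor() as u32;
        self.rgba_at(px, py)
    }
}

fn main() {
    // A 2x2 logical capture on a 2.0-scale display is 4x4 physical pixels.
    let (width, height) = (4u32, 4u32);
    let mut pixels = vec![0u8; (width * height * 4) as usize];
    // Paint physical pixel (2, 2) opaque red.
    let i = ((2 * width + 2) * 4) as usize;
    pixels[i..i + 4].copy_from_slice(&[255, 0, 0, 255]);
    let shot = Shot { width, height, scale: 2.0, pixels };

    assert_eq!(shot.pixels.len(), (shot.width * shot.height * 4) as usize);
    // Logical point (1.0, 1.0) lands on physical pixel (2, 2).
    assert_eq!(shot.rgba_at_logical(1.0, 1.0), Some([255, 0, 0, 255]));
    assert_eq!(shot.rgba_at(4, 0), None);
}
```

For a region or element capture, the logical offsets here are relative to the top-left corner of the captured rectangle rather than the screen origin.
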

A common pattern is to capture the full screen (or the app under test) whenever an assertion fails, and attach it to the test report.

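fn capture_on_fail(name: &str, test: impl FnOnce() -> Result<()>) -> Result<()> {
    match test() {
        Ok(()) => Ok(()),
        Err(e) => {
            // Best effort: a failed capture must not mask the original error.
            if let Ok(shot) = screenshot() {
                // Make sure the output directory exists before writing.
                let _ = std::fs::create_dir_all("target/failures");
                let _ = shot.save_png(format!("target/failures/{name}.png"));
            }
            Err(e)
        }
    }
}
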
Error             When
PermissionDenied  macOS Screen Recording not granted, or Wayland portal denied consent
Unsupported       No capture backend available for the session (no DISPLAY, no working Wayland portal, etc.)
NoElementBounds   The element passed to screenshot_element has no bounds
Platform          Raw OS / FFI failure during capture or PNG encode
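
One way a test harness might react to these error kinds is sketched below. `CaptureError` is a local stand-in whose variant names mirror the table; the real error type in xa11y may be named and shaped differently, so treat the enum, the `Action` type, and the `triage` function as hypothetical. The policy itself (skip on environment problems, fix the test on missing bounds, report anything else) is only a suggestion.

```rust
/// Local stand-in mirroring the error kinds in the table above. This is not
/// the library's type; names and payloads are assumptions.
#[derive(Debug)]
enum CaptureError {
    PermissionDenied,
    Unsupported,
    NoElementBounds,
    Platform(String),
}

/// What a harness might do after a failed capture (hypothetical policy).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    SkipScreenshots,
    FixTest,
    ReportBug,
}

fn triage(err: &CaptureError) -> Action {
    match err {
        // Environment problems: the machine cannot capture at all, so
        // retrying will not help until permissions or the session change.
        CaptureError::PermissionDenied | CaptureError::Unsupported => Action::SkipScreenshots,
        // The element has no bounds, which points at the locator or the
        // timing of the test rather than at the capture backend.
        CaptureError::NoElementBounds => Action::FixTest,
        // Raw OS / FFI or encoding failures are worth surfacing as-is.
        CaptureError::Platform(_) => Action::ReportBug,
    }
}

fn main() {
    let err = CaptureError::Platform("PNG encode failed".to_string());
    println!("{:?} -> {:?}", err, triage(&err));
}
```
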