ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
[Figure 1 panels: (a) PowerPoint: Resize the textbox diagonally. (b) Captcha: Solve the rotate captcha. (c) Premiere: Apply the effect to the clip. (d) Handwriting: Write on the canvas. (e) PowerPoint: Resize the textbox horizontally. (f) OS Desktop: Sort the file into a folder. In-figure labels: "ShowUI-π: 26.98 (+4.8)"; "Drag = press-and-hold + move cursor along a trajectory continuously".]

Figure 1. Drag refers to a continuous interaction where the cursor maintains contact with the UI element while moving along a trajectory, rather than a single discrete click. Left: Visualization of ScreenDrag data domains. Right: ShowUI-π is a lightweight flow-based generative model for GUI automation that handles dragging actions requiring on-the-fly observation, such as drawing and Captcha solving. Given a query, ShowUI-π efficiently generates the corresponding continuous trajectory from streaming visual observations.