- --eval_type step: evaluates per-step grounding accuracy (Action acc. and Coord. acc.)
- --eval_type task: evaluates full task execution (Action acc., Coord. acc., and Task Success)

We provide the ...
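The difference between the two evaluation modes can be illustrated with a minimal Python sketch of how such metrics are commonly computed. This is not the repository's evaluation code, and several parts of it are assumptions. The `Step` record and its field names are hypothetical. A coordinate is treated as correct when the predicted point falls inside the ground-truth element box. A task counts as successful only when every one of its steps has both the correct action and a correct coordinate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Step:
    # Hypothetical per-step record; field names are illustrative, not the repo's schema.
    pred_action: str
    gt_action: str
    pred_point: Tuple[float, float]            # predicted (x, y) location
    gt_box: Tuple[float, float, float, float]  # ground-truth element box (x1, y1, x2, y2)


def coord_hit(step: Step) -> bool:
    # Assumption: a coordinate is correct if it lies inside the target element's box.
    x, y = step.pred_point
    x1, y1, x2, y2 = step.gt_box
    return x1 <= x <= x2 and y1 <= y <= y2


def step_metrics(steps: List[Step]) -> Dict[str, float]:
    # "step" mode: accuracies averaged over individual steps.
    if not steps:
        raise ValueError("no steps to evaluate")
    n = len(steps)
    action_acc = sum(s.pred_action == s.gt_action for s in steps) / n
    coord_acc = sum(coord_hit(s) for s in steps) / n
    return {"action_acc": action_acc, "coord_acc": coord_acc}


def task_metrics(tasks: List[List[Step]]) -> Dict[str, float]:
    # "task" mode: step-level accuracies plus a per-task success rate.
    if not tasks:
        raise ValueError("no tasks to evaluate")
    metrics = step_metrics([s for task in tasks for s in task])
    # Assumption: a task succeeds only if every step has the right action and coordinate.
    successes = sum(
        all(s.pred_action == s.gt_action and coord_hit(s) for s in task)
        for task in tasks
    )
    metrics["task_success"] = successes / len(tasks)
    return metrics
```

With one task whose second step picks the wrong action and misses the box, plus one fully correct single-step task, `task_metrics` reports Action acc. and Coord. acc. of 2/3 but a Task Success of only 0.5. Task-level scoring is stricter because a single wrong step fails the whole task.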
Abstract: Automated testing is crucial for ensuring the quality and reliability of modern software applications, especially those with complex graphical user interfaces (GUIs). However, traditional ...