The hardest part of AI agents seems to be recovery, not task understanding?
A lot of agent demos look impressive when everything goes according to plan, but real-world workflows seem to break in small unpredictable ways. A page changes, a form has an extra step, a support flow redirects somewhere unexpected, or the agent loses track of what has already been done. The model may understand the goal perfectly, but once execution starts, the harder problem becomes state tracking, retries, verification, and knowing when to stop or ask for human input.