Shunyu Yao FPO | Computer Science Department at Princeton University

Date and Time

Thursday, May 2, 2024 - 10:00am to 12:00pm

Location

Computer Science Small Auditorium (Room 105)

Type

FPO

Shunyu Yao will present his FPO "Language Agents: From Next-Token Prediction to Digital Automation" on Thursday, May 2, 2024 at 10:00 AM in CS 105 and Zoom.

Zoom link: http://princeton.zoom.us/my/shunyuy

The members of Shunyu’s committee are as follows:
Examiners: Karthik Narasimhan (Adviser), Tom Griffiths, Benjamin Eysenbach
Readers: Sanjeev Arora, Tatsunori Hashimoto (Stanford)

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend his talk.

Abstract follows below:
Building autonomous agents to interact with the world lies at the core of artificial intelligence (AI). This thesis introduces “language agents”, a new category of agents that utilize large language models (LLMs) to reason to act, marking a departure from traditional agents built via extensive rule design or learning. The thesis is developed in three parts:

Part I motivates the necessity for language agents by introducing a new set of AI problems and benchmarks based on interaction with large-scale, real-world computer environments, such as the Internet or code interfaces. These “digital automation” tasks offer tremendous value for alleviating tedious labor and improving our lives, yet pose significant challenges for prior agent and LLM methods in decision-making over open-ended natural language and long horizons, calling for new methodology.

Part II lays the methodological foundation for language agents. The key idea is to apply LLM reasoning to versatile and generalizable agent acting and planning, which in turn makes LLM reasoning more grounded and deliberate through external feedback and internal control. We show that language agents can solve a diverse range of language and agent tasks (especially the digital automation tasks proposed in Part I), with notable improvements over prior LLM-based methods and traditional agents.

Part III consolidates insights from Parts I and II and outlines a principled framework for language agents. The framework provides modular abstractions to organize various LLM-based methods, to understand how they fall short of human cognition, and to inspire and develop new methods towards general-purpose autonomous agents.

From foundational empirical tasks and methods to a unifying conceptual framework, this thesis establishes the study of language agents as a distinct and rigorously defined field at the frontier of AI research.