Autonomous Agents (LLMs) research papers. Updated daily.
AgentTuning: Enabling Generalized Agent Abilities for LLMs
- AgentTuning: Improves LLM capability on user tasks through instruction tuning on the AgentInstruct dataset, producing AgentLM.
Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale
- Language agent to automatically identify and quantify the extent of stereotypes in generated images.
- Planning and Reasoning. Tool usage: Intent understanding, Instruction generation, Instruction retrieval, Prompt optimization & Stereotype score generation.
OpenAgents: An Open Platform for Language Agents in the Wild
- OpenAgents-platform: Data agent, Plugin/Tools and Web agent
- Automatic tool selection from over 200 tools
A Zero-Shot Language Agent for Computer Control with Structured Reflection
- Zero-shot agent plans executable actions in the environment and iteratively progresses by learning from mistakes using self-reflection and structured thought management.
- Better generalization, outperforms best iterative-planning agents
AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems
- AgentCF: LLM agent-based recommender system with User and Item Agents.
- User & Item Agents interact autonomously and the discrepancies between the two are stored in the memory to help guide better future recommendations.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
- Octopus: Uses Vision-Language Model with Reinforcement Learning from Environmental Feedback (RLEF).
- Generates action sequences and executable code.
MemGPT: Towards LLMs as Operating Systems
- MemGPT: OS-inspired design in which an LLM processor manages its actual context and long-term memory, using functions to make changes and events to manage the order of data processing.
- Promptor: Automatic prompt generation.
- Builds prompts based on: User goals, User profiles, Data profile, Contextual information & Output constraints.
- System prompt includes: instructions, Actions, Facts and Examples.
Towards Robust Multi-Modal Reasoning via Model Selection
- Dynamic model selection by taking into account input & sub-task dependencies.
FireAct: Toward Language Agent Fine-tuning
- Fine-tuning LLMs with agent trajectories for better autonomous agents.
Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- MemWalker: navigates long context iteratively and constructs memory as a tree-like structure.
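A minimal sketch of navigating a tree-structured memory, assuming each node holds a summary and only leaves hold original text. The `Node` class, `navigate`, and the keyword-overlap chooser are hypothetical names for illustration; in an LLM-based system the choice of child would come from the model, not from word overlap.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    summary: str                      # short description of everything below this node
    text: str = ""                    # original text, only filled in for leaves
    children: List["Node"] = field(default_factory=list)


def navigate(root: Node, choose: Callable[[List[Node]], int]) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, asking `choose` which child to enter at each step."""
    node, path = root, [root.summary]
    while node.children:
        node = node.children[choose(node.children)]
        path.append(node.summary)
    return node.text, path


def make_keyword_chooser(query: str) -> Callable[[List[Node]], int]:
    """Hypothetical stand-in for the LLM: pick the child whose summary shares most words with the query."""
    query_words = set(query.lower().split())

    def choose(children: List[Node]) -> int:
        overlaps = [len(query_words & set(c.summary.lower().split())) for c in children]
        return overlaps.index(max(overlaps))

    return choose


tree = Node("phone review", children=[
    Node("display and camera", text="The screen is bright."),
    Node("battery and charging", text="The battery lasts two days."),
])
answer, path = navigate(tree, make_keyword_chooser("battery life"))
print(answer, path)
```

Only the summaries along one root-to-leaf path are read, which is why this kind of structure can sidestep a fixed context limit.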
Crystal: Introspective Reasoners Reinforced with Self-Feedback
- Introspective reasoning of the knowledge.
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
- Language Agents Tree Search (LATS): Self-Refine, Memory, Reasoning, Decision Making & Planning.
- Uses multiple reasoning paths and learns from experience by integrating external feedback & self-reflection.
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
- AgentInstruct: generates instructions for the problem and then solves it using these instructions, improving zero-shot Chain of Thought (CoT) reasoning.
- Characteristics of Autonomous Agents: Goal-driven task management, Intelligent Agents with LLMs, Multi-Agents collaboration, Context interaction, Balancing Autonomy vs. Alignment.
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
- Self-Taught Optimizer (STOP): Asks an LLM to improve an initial program by proposing improvement candidates, then outputs the best solution.
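The propose-then-select step can be sketched as follows. This covers only a single improvement step, not the recursive application of the improver to itself. `improve`, `toy_propose`, and `toy_utility` are hypothetical names; a real system would call an LLM inside `propose` and use a task-specific utility.

```python
from typing import Callable, List


def improve(program: str,
            propose: Callable[[str], List[str]],
            utility: Callable[[str], float]) -> str:
    """Return the highest-utility program among the original and its proposed variants."""
    candidates = [program] + propose(program)
    return max(candidates, key=utility)


def toy_propose(program: str) -> List[str]:
    # Hypothetical stand-in for LLM-generated improvement candidates.
    return ["x*1", "x"]


def toy_utility(program: str) -> float:
    # Toy assumption: shorter expressions count as better.
    return -len(program)


best = improve("x*1+0", toy_propose, toy_utility)
print(best)
```

Keeping the original program in the candidate set means a step can never return something the utility rates worse than the input.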
Lyfe Agents: Generative agents for low-cost real-time social interactions
- LyfeAgents Brain: Sensory processing, Internal states, Self-monitor, Action selection and Memory.
- Internal states are text based: current goal, memory, recent events and sensory inputs.
- Cognitive controller selects high-level actions. Action model selects actions until termination condition is reached.
- Self-monitoring maintains and emphasizes recent and novel events towards agent goals
- Memories are clustered and summarized before moving them to long-term storage (vector database)
Large Language Models as Analogical Reasoners
- LLM self-generates examples/knowledge related to the task.
Conceptual Framework for Autonomous Cognitive Entities
- Conceptual framework for Autonomous entities.
SmartPlay : A Benchmark for LLMs as Intelligent Agents
- SmartPlay: a benchmark to test LLM-based agents from 9 perspectives.
- Tests: Reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness.
GRID: A Platform for General Robot Intelligence Development
- GRID: General Robot Intelligence Development
- Solves complex tasks using simulation and/or real-world data.
- Task specification, robot configuration and sensor/API.
- Foundation Mosaic: a neural architecture.
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
- RoleLLM: Role-profile constructor, Context-based Instruction generation, Role-based Prompting (RoleGPT), Role-conditioned Instruction-tuning.
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- Promptbreeder uses thinking styles and mutation-prompts and is able to improve mutation/task prompts.
Natural Language based Context Modeling and Reasoning with LLMs: A Tutorial
- LLM-driven Context-aware Computing (LCaC) approach.
You only look at the screens: Multimodal Chain-of-Action Agents
- Multimodal Chain-of-Actions Agents (Auto-UI) interacts directly with the UI
- Chain-of-Action technique using a series of action histories and future action plans.
The Rise and Potential of Large Language Model Based Agents: A Survey
- A conceptual framework for LLM-based agents with three components: brain, perception, and action.
Agents: An Open-source Framework for Autonomous Language Agents
- Multi-agent: Planning, memory, tool usage, multi-agent communication & symbolic control.
- Open source library.
Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents
- Interoceptive AI: monitoring own internal state of the artificial agent.
Unleashing the Power of Graph Learning through LLM-based Autonomous Agents
- AutoGraph procedure: data, configuration, searching and tuning agents.
RecMind: Large Language Model Powered Agent For Recommendation
- RecMind: a recommender-focused LLM agent with reasoning, planning into sub-tasks, memory & tools.
A Survey on Large Language Model based Autonomous Agents
- Systematic review of LLM based Autonomous Agents.
- Use cases and evaluation strategies and future use cases.
https://arxiv.org/abs/2308.10848
- AgentVerse: multi-agent collaboration and individual agents' social behaviours.
WebArena: A Realistic Web Environment for Building Autonomous Agents
- An environment for testing autonomous agents with tools and external knowledge.
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases
- Builds a multi-agent simulation environment to generate a dataset of many real-world API usages.
- Small models can achieve comparable performance to larger models on tool usage.
SELFEVOLVE: A Code Evolution Framework via Large Language Models
- Generates intermediate code based on input prompt.
- Uses an LLM acting as an expert programmer to debug the generated code based on errors received from the Python interpreter.
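A minimal sketch of an interpreter-feedback debugging loop, assuming the generated code can be run with Python's built-in `exec`. The `toy_fix` function is a hypothetical stand-in for the LLM repair step, and `debug_loop` and `run_and_capture_error` are illustrative names, not part of any library.

```python
from typing import Callable, Optional


def run_and_capture_error(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return the error message, or None on success."""
    try:
        exec(code, {})
        return None
    except Exception as exc:  # includes SyntaxError raised while compiling
        return f"{type(exc).__name__}: {exc}"


def debug_loop(code: str, fix: Callable[[str, str], str], max_rounds: int = 3) -> str:
    """Run the code, hand any interpreter error to `fix`, and retry up to max_rounds times."""
    for _ in range(max_rounds):
        error = run_and_capture_error(code)
        if error is None:
            break
        code = fix(code, error)
    return code


def toy_fix(code: str, error: str) -> str:
    # Hypothetical stand-in for the LLM: only knows how to repair a missing `total`.
    if error.startswith("NameError"):
        return "total = 0\n" + code
    return code


fixed = debug_loop("total += 1", toy_fix)
print(fixed)
```

Running untrusted generated code with `exec` is unsafe outside a sandbox; the sketch ignores isolation to keep the loop visible.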
Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
- Human-AI collaborative intelligence methodology & technical practices, where the idea is not to have "full Auto-GPT" from user input to direct resolution by the LLM, but rather to have a human review the steps in between.
- User inputs an objective and the LLM asks for clarification. The user then adds clarifications and the LLM constructs an AI chain for the human to review. Finally, the LLM executes the AI chain with user acceptance tests.
Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions
- Auto-GPT with GPT4 outperforms supervised state-of-the-art Imitation Learning (IL) models on the WebShop and ALFWorld benchmarks in unknown external environments.
- The Additional Opinions algorithm, which takes into account additional opinions from external expert models, improves performance.
Gorilla: Large Language Model Connected with Massive APIs
- Gorilla is a retriever-aware finetuned LLaMA-7B model for API calls, using self-instruct to generate Instruction-API pairs.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- The Tree of Thoughts (ToT) technique makes decisions using multiple different reasoning paths, self-evaluating choices to decide the next action, with the ability to look back/forward for global decisions.
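Searching over several reasoning paths with self-evaluation can be sketched as a breadth-first search that keeps only the best few states per level. This is a simplified illustration under stated assumptions: `expand` and `score` are hypothetical callables that an LLM would implement in practice, and the digit-sum task is a toy stand-in for a real reasoning problem.

```python
from typing import Callable, List


def tree_of_thoughts(root: str,
                     expand: Callable[[str], List[str]],
                     score: Callable[[str], float],
                     depth: int,
                     beam_width: int) -> str:
    """Expand every state in the frontier, keep the `beam_width` best children, repeat."""
    frontier = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in expand(state)]
        if not children:
            break
        frontier = sorted(children, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)


TARGET = 7  # toy goal: build a digit string whose digits sum to 7


def toy_expand(state: str) -> List[str]:
    # Each "thought" appends one more digit to the partial solution.
    return [state + digit for digit in "123"]


def toy_score(state: str) -> float:
    # Self-evaluation stand-in: closer to the target sum is better.
    return -abs(TARGET - sum(int(d) for d in state))


best = tree_of_thoughts("", toy_expand, toy_score, depth=3, beam_width=2)
print(best)
```

Keeping more than one state per level is what lets the search recover when the locally best first step turns out to be a dead end.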
Teaching Large Language Models to Self-Debug
- The model generates new code together with code explanation. The code is then executed and this executed code is sent back as feedback together with the code explanation. This feedback
ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
- ChatPipe - Iterative, data preparation program with ChatGPT using 1. Operation Recommendation, 2. Program generation, 3. Version management.
- Recommends the next data preparation operation. Easily roll back to a previous program for version control.
Generative Agents: Interactive Simulacra of Human Behavior
- Enable believable human behavior: observation, planning, and reflection.
- An agent wants to throw a Valentine’s Day party. The agents autonomously spread invitations, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.
- GPTeam is inspired by this approach.
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
- CAMEL attempts to facilitate autonomous cooperation among communicative agents through role-playing framework.
- The approach manages complete tasks with minimal human input.
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
- An LLM (such as ChatGPT) accesses the HuggingFace community to look for AI models to complete the given task.
- It can handle multiple modalities by outsourcing tasks like image recognition to a specific image model.
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- Dialog-Enabled Resolving Agents (DERA) uses two agent roles, Researcher and Decider, which carry out a discussion with each other.
- Researcher role processes information and Decider role uses judgement.
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
- Multimodal conversational foundation model (MCFM). MCFM generates a textual solution outline, then API selector chooses most relevant API from collection of APIs (with API name, parameter list, description, usage example and example when combining it with another API).
- MCFM generates action code using recommended API and the API call is executed. Finally, output is provided back to developer.
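The outline-to-API selection step can be sketched as below. Keyword overlap is used here purely as a stand-in for whatever relevance model a real selector would use, and `ApiDoc`, `select_api`, and the two example APIs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ApiDoc:
    name: str
    parameters: List[str]
    description: str


def select_api(outline: str, apis: List[ApiDoc]) -> ApiDoc:
    """Pick the API whose description shares the most words with the solution outline."""
    outline_words = set(outline.lower().split())

    def overlap(api: ApiDoc) -> int:
        return len(outline_words & set(api.description.lower().split()))

    return max(apis, key=overlap)


# Hypothetical API collection for illustration only.
apis = [
    ApiDoc("image_caption", ["image"], "describe the content of an image"),
    ApiDoc("send_email", ["to", "body"], "send an email message to a recipient"),
]

chosen = select_api("describe the image the user uploaded", apis)
action_code = f"{chosen.name}({', '.join(p + '=...' for p in chosen.parameters)})"
print(action_code)
```

The `action_code` string only shows the shape of a generated call; argument values are left as `...` because they would come from the conversation.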
Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications
- Task-driven autonomous agent with a vector database and LangChain. BabyGPT includes: Execution, creation and prioritization.
- Takes objective, pulls an item from task queue and moves it to execution agent with access to memory.
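A minimal sketch of the execution, creation, and prioritization loop, assuming a plain Python list as memory in place of a vector database. `run_agent` and the three toy callables are hypothetical; in a real agent each would be an LLM call.

```python
from collections import deque
from typing import Callable, List, Tuple

Memory = List[Tuple[str, str]]


def run_agent(objective: str,
              first_task: str,
              execute: Callable[[str, str, Memory], str],
              create_tasks: Callable[[str, str, str], List[str]],
              prioritize: Callable[[str, List[str]], List[str]],
              max_steps: int = 5) -> Memory:
    """Pull a task, execute it with access to memory, add follow-up tasks, reprioritize."""
    queue = deque([first_task])
    memory: Memory = []
    steps = 0
    while queue and steps < max_steps:
        task = queue.popleft()
        result = execute(objective, task, memory)
        memory.append((task, result))
        queue.extend(create_tasks(objective, task, result))
        queue = deque(prioritize(objective, list(queue)))
        steps += 1
    return memory


def toy_execute(objective: str, task: str, memory: Memory) -> str:
    return f"done: {task}"


def toy_create(objective: str, task: str, result: str) -> List[str]:
    # Only the planning task spawns follow-up work in this toy setup.
    return ["book hotel", "book flight"] if task == "plan trip" else []


def toy_prioritize(objective: str, tasks: List[str]) -> List[str]:
    return sorted(tasks)  # toy rule: alphabetical order


memory = run_agent("visit Paris", "plan trip", toy_execute, toy_create, toy_prioritize)
print([task for task, _ in memory])
```

The `max_steps` cap is an added safeguard, since a task-creation step that always returns new tasks would otherwise loop forever.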
Reflexion: Language Agents with Verbal Reinforcement Learning
- Reflexion agents reflect on task feedback, use it from memory to make better decisions and new attempts.
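The retry-with-reflection idea can be sketched as a loop that stores a verbal note after each failed attempt and passes all notes to the next attempt. The function names and the number-guessing task are hypothetical; a real agent would generate attempts and reflections with an LLM.

```python
from typing import Callable, List, Tuple


def reflexion_loop(attempt: Callable[[List[str]], int],
                   evaluate: Callable[[int], Tuple[bool, str]],
                   reflect: Callable[[int, str], str],
                   max_trials: int = 3) -> Tuple[int, List[str]]:
    """Try, get feedback, store a reflection in memory, and try again with that memory."""
    reflections: List[str] = []
    answer = attempt(reflections)
    for _ in range(max_trials - 1):
        success, feedback = evaluate(answer)
        if success:
            break
        reflections.append(reflect(answer, feedback))
        answer = attempt(reflections)
    return answer, reflections


TARGET = 7  # toy task: find the hidden number


def toy_attempt(reflections: List[str]) -> int:
    guess = 5
    for note in reflections:
        guess += 1 if "too low" in note else -1
    return guess


def toy_evaluate(answer: int) -> Tuple[bool, str]:
    if answer == TARGET:
        return True, "correct"
    feedback = "too low" if answer < TARGET else "too high"
    return False, feedback


def toy_reflect(answer: int, feedback: str) -> str:
    return f"guess {answer} was {feedback}"


answer, reflections = reflexion_loop(toy_attempt, toy_evaluate, toy_reflect)
print(answer, reflections)
```

No model weights change between trials; the only thing that improves is the text kept in `reflections`.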