AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

This is the official implementation of AutoWebGLM. If you find our open-source work useful, please 🌟 the repo to support our continued development!

Overview

Paper: arXiv:2404.03648

AutoWebGLM is a project aimed at building a more efficient automated web navigation agent driven by a language model. It is built on top of the ChatGLM3-6B model and extends that model's capabilities so it can navigate the web more effectively and handle real-world browsing challenges.

Features

  • HTML Simplification Algorithm: Inspired by human browsing patterns, we've designed an algorithm to simplify HTML, making webpages more digestible for LLM agents while preserving crucial information.
  • Hybrid Human-AI Training: We combine human and AI knowledge to build web browsing data for curriculum training, enhancing the model's practical navigation skills.
  • Reinforcement Learning and Rejection Sampling: We enhance the model's webpage comprehension, browser operations, and efficient task decomposition abilities by bootstrapping it with reinforcement learning and rejection sampling.
  • Bilingual Web Navigation Benchmark: We introduce AutoWebBench—a bilingual (Chinese and English) benchmark for real-world web browsing tasks. This benchmark provides a robust tool for testing and refining the capabilities of AI web navigation agents.
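To make the HTML simplification idea above concrete, here is a minimal sketch using only the Python standard library. It is not the algorithm used by AutoWebGLM; the tag sets, the kept attributes, and the `[id] <tag ...>` output format are all assumptions chosen for illustration. It drops non-visible content, keeps visible text, and numbers the interactive elements so an agent could refer to them.

```python
from html.parser import HTMLParser

# Assumption: content inside these tags is never shown to the agent.
SKIP_TAGS = {"script", "style", "noscript", "svg", "head"}
# Assumption: elements a user can act on directly.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Assumption: the only attributes worth keeping.
KEEP_ATTRS = ("type", "name", "placeholder", "value", "aria-label", "href")


class Simplifier(HTMLParser):
    """Collects visible text and numbered interactive elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.next_id = 0     # hypothetical element id given to the agent

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in INTERACTIVE_TAGS:
            return
        kept = " ".join(f'{k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
        suffix = " " + kept if kept else ""
        self.lines.append(f"[{self.next_id}] <{tag}{suffix}>")
        self.next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.skip_depth:
            self.lines.append(text)


def simplify_html(html: str) -> str:
    """Return a compact, line-oriented view of the page."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


if __name__ == "__main__":
    page = (
        "<html><head><style>p{}</style></head><body>"
        "<script>var x = 1;</script>"
        '<p>Hello   world</p><a href="/cart" class="btn">Cart</a>'
        '<input type="text" name="q" style="x"></body></html>'
    )
    print(simplify_html(page))
```

A real simplifier would also need to account for element visibility, layout, and page length, none of which this sketch attempts.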

Evaluation

We have released our evaluation code, data, and environments. You can reproduce the experiments by following the instructions below.

AutoWebBench & Mind2Web

You can find our evaluation datasets at AutoWebBench and Mind2Web. For the model inference code, please refer to ChatGLM3-6B. Once you have the output file, compute the score by running python eval.py [result_path].
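As a rough illustration of what scoring an output file can involve, the sketch below computes a step-level success rate. It is not the repository's eval.py: the JSON Lines layout and the field names `pred_element`, `pred_operation`, `gold_element`, and `gold_operation` are hypothetical, and a step is assumed to count as successful only when both the predicted element and the predicted operation match the reference.

```python
import json
from typing import Iterable, List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def step_success_rate(records: Iterable[dict]) -> float:
    """Fraction of steps where both element and operation match the reference."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(
        1
        for r in records
        if r["pred_element"] == r["gold_element"]
        and r["pred_operation"] == r["gold_operation"]
    )
    return hits / len(records)


if __name__ == "__main__":
    # Hypothetical records standing in for a real result file.
    demo = [
        {"pred_element": "12", "pred_operation": "click",
         "gold_element": "12", "gold_operation": "click"},
        {"pred_element": "7", "pred_operation": "type",
         "gold_element": "9", "gold_operation": "type"},
    ]
    print(f"step success rate: {step_success_rate(demo):.2f}")
```

The metrics actually reported for AutoWebBench and Mind2Web are defined by the evaluation code in the repository, which should be treated as authoritative.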

WebArena

We have modified the WebArena environment to fit our system's interaction format; see WebArena. The modifications and execution instructions are described in its README.

MiniWob++

We have also modified the MiniWob++ environment; see MiniWob++. The modifications and execution instructions are described in its README.

License

This repository is licensed under the Apache-2.0 License. All open-sourced data is for research purposes only.

Citation

If you use this code for your research, please cite our paper.

@misc{lai2024autowebglm,
    title={AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent},
    author={Lai, Hanyu and Liu, Xiao and Iong, Iat Long and Yao, Shuntian and Chen, Yuxuan and Shen, Pengbo and Yu, Hao and Zhang, Hanchen and Zhang, Xiaohan and Dong, Yuxiao and Tang, Jie},
    year={2024},
    eprint={2404.03648},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
