gpt-aria

An experiment in teaching GPT to use the Chrome accessibility tree to turn the web into a textual interface and to access it the way a screen-reader user would. This avoids HTML parsing, supports dynamic content, etc.
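
To make "textual interface" concrete, here is a minimal sketch of flattening an accessibility tree into indented text lines. The AXNode shape is an assumption loosely modeled on the nodes returned by puppeteer's page.accessibility.snapshot(); renderAriaTree and the sample data are hypothetical and are not taken from gpt-aria's source.

```typescript
// Hypothetical node shape (an assumption), loosely modeled on the nodes
// returned by puppeteer's page.accessibility.snapshot().
interface AXNode {
  role: string;
  name?: string;
  value?: string | number;
  children?: AXNode[];
}

// Render the tree as one indented line per node, e.g. `button "Search"`.
function renderAriaTree(node: AXNode, depth = 0): string[] {
  const indent = "  ".repeat(depth);
  let line = `${indent}${node.role}`;
  if (node.name) line += ` ${JSON.stringify(node.name)}`;
  if (node.value !== undefined) line += ` value=${JSON.stringify(node.value)}`;
  const lines = [line];
  for (const child of node.children ?? []) {
    lines.push(...renderAriaTree(child, depth + 1));
  }
  return lines;
}

// Made-up sample tree for illustration only.
const sample: AXNode = {
  role: "RootWebArea",
  name: "Search",
  children: [
    { role: "textbox", name: "Query", value: "" },
    { role: "button", name: "Search" },
  ],
};

const rendered = renderAriaTree(sample).join("\n");
```

The resulting text carries only roles, names and values, which is roughly what a screen reader would announce, and it is far shorter than the page's HTML.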

https://taras.glek.net/post/gpt-aria-experiment/

Running:

  • Install node
  • Run npm install
  • Run export OPENAI_API_KEY=<your key>
  • Run gpt-aria: ./gpt-aria.ts --objective "Whats the price of iphone 13 pro"
  • Note: the first run will take a while, because puppeteer has to download Chrome
  • Run it with a custom start page: ./gpt-aria.ts --objective "Whats the price of iphone 14 pro" --start-url https://duckduckgo.com
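
For readers who want to script around the command line, here is a rough sketch of how the flags used in this README (--objective, --start-url, --headless) could be parsed. The Options type and parseArgs function are hypothetical; the real gpt-aria.ts may handle its arguments differently.

```typescript
// Hypothetical option bag; the field names are assumptions.
interface Options {
  objective?: string;
  startUrl?: string;
  headless: boolean;
}

// Hand-rolled parser for the three flags that appear in this README.
function parseArgs(argv: string[]): Options {
  const opts: Options = { headless: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--headless") {
      opts.headless = true;
    } else if (arg === "--objective" || arg === "--start-url") {
      const value = argv[++i];
      if (value === undefined) throw new Error(`missing value for ${arg}`);
      if (arg === "--objective") opts.objective = value;
      else opts.startUrl = value;
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  return opts;
}
```

In a Node script this would typically be called as parseArgs(process.argv.slice(2)).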

The prompt lives in prompt.ts; the execution log is written to log.txt.

Questions? https://discord.gg/jgWgkQvp

Sample queries:

  • ./gpt-aria.ts --objective "What is the cultural capital of western ukraine" --start-url https://bing.com --headless
  • ./gpt-aria.ts --objective "Who was king of england when lviv was founded" --headless
  • who was president when first starwars was released?

Design

graph TD;
    subgraph GPT
      gpt["decide if enough info\nto return an ObjectiveComplete\nor if a BrowserAction is needed"]
    end
    subgraph gpt-aria
        BrowserAction
        ObjectiveComplete
        BrowserResponse
    end
    gpt-->ObjectiveComplete
    gpt--command-->BrowserAction
    BrowserAction--"command"-->Browser
    Browser --"url,ariaTree"--> BrowserResponse
    UserInput --"{objective, start-url}"--> BrowserAction
    BrowserResponse --"objective,progress[],url,ariaTree"--> gpt
    ObjectiveComplete--"result"--> UserOutput
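
The loop in the diagram can be sketched roughly as follows. The type names mirror the boxes in the graph, but the field layouts, the kind discriminator, the "navigate" command string, the maxSteps limit, and the Decide/Execute stand-ins for GPT and the browser are all assumptions made for illustration, not the actual gpt-aria implementation.

```typescript
// Names mirror the diagram; the exact fields are assumptions.
interface BrowserResponse {
  objective: string;
  progress: string[];
  url: string;
  ariaTree: string;
}
interface BrowserAction { kind: "BrowserAction"; command: string }
interface ObjectiveComplete { kind: "ObjectiveComplete"; result: string }
type Decision = BrowserAction | ObjectiveComplete;

// Hypothetical stand-ins for the model call and the browser driver.
type Decide = (state: BrowserResponse) => Promise<Decision>;
type Execute = (command: string) => Promise<{ url: string; ariaTree: string }>;

async function runObjective(
  objective: string,
  startUrl: string,
  decide: Decide,
  execute: Execute,
  maxSteps = 10, // assumed safety limit, not part of the diagram
): Promise<string> {
  const progress: string[] = [];
  // UserInput feeds the first BrowserAction: open the start page.
  let page = await execute(`navigate ${startUrl}`);
  for (let step = 0; step < maxSteps; step++) {
    // BrowserResponse goes to GPT, which picks the next step.
    const decision = await decide({ objective, progress: [...progress], ...page });
    if (decision.kind === "ObjectiveComplete") return decision.result;
    progress.push(decision.command);
    page = await execute(decision.command);
  }
  throw new Error(`no result after ${maxSteps} steps`);
}
```

Passing decide and execute in as functions keeps the loop testable without a real model or browser.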

Prior art:

Why ARIA is superior to raw html

html:

[image: html]

html with ARIA accessibility tree:

[image: accessibility_tree]

Follow-up ideas

Scrolling

It would be nice to have a command that lets GPT scroll up and down the page so it can summarize the page's content.
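
One possible shape for this, sketched under the assumption that the page is already available as a list of text lines: a scroll command moves a fixed-size window over those lines, so only the visible slice is sent to GPT. ScrollCommand, Viewport, scroll and visibleLines are all hypothetical names; none of this exists in gpt-aria today.

```typescript
// Hypothetical command and window state for paging through the textual tree.
type ScrollCommand = { kind: "scroll"; direction: "up" | "down" };

interface Viewport {
  offset: number; // index of the first visible line
  size: number;   // number of lines shown at once
}

// Move the window one screenful up or down, clamped to the content.
function scroll(view: Viewport, cmd: ScrollCommand, totalLines: number): Viewport {
  const delta = cmd.direction === "down" ? view.size : -view.size;
  const maxOffset = Math.max(0, totalLines - view.size);
  const offset = Math.min(maxOffset, Math.max(0, view.offset + delta));
  return { ...view, offset };
}

// Return the slice of lines that the current window covers.
function visibleLines(lines: string[], view: Viewport): string[] {
  return lines.slice(view.offset, view.offset + view.size);
}
```

Scrolling the window rather than the real browser viewport is a simplification; pages that load content lazily would still need an actual scroll in the browser.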

Keep an index of GPT prompts that explain, in natural language, how to navigate a particular website.

  • E.g. for twitter it would say "In order to 'tweet', one goes to twitter.com and posts a tweet. In order to scan the latest news on twitter, one can use the default timeline or pick a twitter list for a particular category".
  • For buying a house: "redfin provides search functionality and the ability to narrow down location and prices"
  • For shopping: "amazon.com is a shopping site"
  • Likewise for google, wikipedia, etc.
  • Eventually we'd want langchain-style website modules so you could specify "Summarize my inbox and news", which would be a composition of gmail and news modules
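
A minimal sketch of such an index, assuming hints are keyed by hostname and appended to the base prompt. The siteHints map, hintFor, withHint and the "Site notes:" label are hypothetical; the hint texts are adapted from the bullets above.

```typescript
// Hypothetical registry mapping a hostname to a natural-language hint.
const siteHints = new Map<string, string>([
  ["twitter.com", "In order to 'tweet', one goes to twitter.com and posts a tweet."],
  ["redfin.com", "redfin provides search functionality and the ability to narrow down location and prices"],
  ["amazon.com", "amazon.com is a shopping site"],
]);

// Look up the hint for a URL, ignoring a leading "www.".
function hintFor(url: string): string | undefined {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return siteHints.get(host);
}

// Append the site hint to the base prompt when one is known.
function withHint(basePrompt: string, url: string): string {
  const hint = hintFor(url);
  return hint ? `${basePrompt}\n\nSite notes: ${hint}` : basePrompt;
}
```

Composed modules such as "Summarize my inbox and news" could then be built by combining the hints for several sites, though how that composition should work is left open here.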

Contributors

tarasglek, chugai, steven4354, bcmejla
