
zhangshao249 / cradle


This project forked from baai-agents/cradle


The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

License: MIT License


Cradle: Towards General Computer Control




Notice

We are still working on further cleaning up the code and constantly updating it. We are also extending Cradle to more games and software. Feel free to reach out!

Project Setup

Please set up your environment as follows:

conda create --name cradle-dev python=3.10
conda activate cradle-dev
pip3 install -r requirements.txt

To install GroundingDino:

Download its weights to the cache directory:

mkdir cache
cd cache
curl -L -C - -O https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
cd ..

Note: A CUDA environment is required, so make sure the CUDA dependencies are properly installed first. On Linux, you can check this with the following command:

nvcc -V

Alternatively, look for the CUDA_HOME or CUDA_PATH environment variable. On Windows it should be something like "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8", and on Linux something like "/usr/local/cuda".

If this does not report the expected version, download and install the CUDA Toolkit and cuDNN first (version 11.8 is recommended).
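As a rough cross-platform version of the manual checks above, the following sketch looks for the CUDA_HOME or CUDA_PATH environment variable and for an nvcc executable on the PATH, then extracts the "release" line from `nvcc -V`. It is a hypothetical helper (`find_cuda` is not part of Cradle) and only reports what it finds; it does not validate cuDNN or the driver.

```python
import os
import shutil
import subprocess
from typing import Mapping, Optional


def find_cuda(env: Mapping[str, str] = os.environ) -> dict:
    """Report where CUDA seems to be installed, based on env vars and the PATH."""
    # CUDA_HOME is common on Linux; the Windows installer sets CUDA_PATH.
    home: Optional[str] = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    # shutil.which searches the real PATH of this process (not the env argument).
    nvcc = shutil.which("nvcc")
    version_line = None
    if nvcc:
        # nvcc -V prints a line like "Cuda compilation tools, release 11.8, V11.8.89".
        out = subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout
        version_line = next((line for line in out.splitlines() if "release" in line), None)
    return {"cuda_home": home, "nvcc": nvcc, "version": version_line}


if __name__ == "__main__":
    print(find_cuda())
```

If `cuda_home` is None and `nvcc` is None, CUDA is most likely not installed or not on the PATH.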

If CUDA is not installed correctly, the code will fail after GroundingDino is installed with:

NameError: name '_C' is not defined

If this happens, reinstall CUDA and PyTorch, re-clone the repository, and perform all installation steps again.

On Windows, install CUDA 11.8 from https://developer.nvidia.com/cuda-11-8-0-download-archive (Linux packages are also available there).

Make sure PyTorch is installed with the matching CUDA dependencies:

conda install pytorch torchvision cudatoolkit=11.8 -c nvidia -c pytorch

If this does not work, or you prefer pip, you can try something like:

pip3 install --upgrade torch==2.1.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip3 install torchvision==0.16.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
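Whichever route you take, it can help to confirm that the installed PyTorch build targets CUDA 11.8. Wheels from the PyTorch index carry a local version suffix such as `+cu118` (conda builds may omit it), while `torch.version.cuda` and `torch.cuda.is_available()` report the compiled CUDA version and GPU availability at runtime. The sketch below parses the suffix so it can be tested without PyTorch; `cuda_tag` is a hypothetical helper, not part of Cradle.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the CUDA tag of a PyTorch version string (e.g. "cu118"), or None."""
    # The local version label follows "+", e.g. "2.1.1+cu118"; CPU wheels use "+cpu".
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import torch  # only available after the install step above
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("build tag:", cuda_tag(torch.__version__))
        print("compiled for CUDA:", torch.version.cuda)
        print("GPU available:", torch.cuda.is_available())
```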

Now install the pre-compiled GroundingDino package included in the repository's deps folder. This wheel targets Python 3.10 on 64-bit Windows (cp310, win_amd64):

cd deps
pip install groundingdino-0.1.0-cp310-cp310-win_amd64.whl
cd ..
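Because the bundled wheel only fits one interpreter and platform, a quick pre-check can save a confusing pip error. The sketch compares the wheel's filename tags with the running CPython version and `sysconfig.get_platform()`. It is a simplified stand-in for pip's own compatibility logic, assuming the usual `name-version-pytag-abitag-plattag.whl` naming without a build tag; `wheel_matches` is a hypothetical helper.

```python
import sys
import sysconfig
from typing import Sequence


def wheel_matches(filename: str,
                  version: Sequence[int] = sys.version_info,
                  platform: str = sysconfig.get_platform()) -> bool:
    """Rough check that a wheel's Python and platform tags fit an interpreter."""
    # Assumes "name-version-pytag-abitag-plattag.whl" with no optional build tag.
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    py_tag, plat_tag = parts[2], parts[4]
    wanted_py = f"cp{version[0]}{version[1]}"  # CPython only, e.g. cp310 for 3.10
    # sysconfig gives e.g. "win-amd64"; wheel tags replace "-" and "." with "_".
    wanted_plat = platform.replace("-", "_").replace(".", "_")
    return py_tag == wanted_py and plat_tag == wanted_plat


if __name__ == "__main__":
    print(wheel_matches("groundingdino-0.1.0-cp310-cp310-win_amd64.whl"))
```

On Linux or macOS this returns False for the bundled wheel, which is the cue to build GroundingDino from source instead.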

Once it is installed, we need to pre-download some required model files and set some environment variables.

# Define the necessary environment variables. This can be done in the .env file in the /cradle directory
HUGGINGFACE_HUB_CACHE = "./cache/hf" # This can be the full path too, if the relative one doesn't work

# Pre-download the Hugging Face files needed by GroundingDino
# This step may require a VPN connection
# Windows users need to run it in Git Bash
# The shell also needs the variable (Bash does not allow spaces around "=")
export HUGGINGFACE_HUB_CACHE="./cache/hf"
mkdir -p "$HUGGINGFACE_HUB_CACHE"
huggingface-cli download bert-base-uncased config.json tokenizer.json vocab.txt tokenizer_config.json model.safetensors --cache-dir "$HUGGINGFACE_HUB_CACHE"

# Define the last necessary environment variable. This can be done in the .env file in the /cradle directory
# This step avoids needing a VPN to run
TRANSFORMERS_OFFLINE = "TRUE"
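Before relying on TRANSFORMERS_OFFLINE, you may want to confirm that all five files actually landed in the cache. The sketch below simply searches the cache directory tree for the file names passed to huggingface-cli, assuming nothing about the cache's internal layout beyond the files being somewhere inside it. `missing_files` is a hypothetical helper, not part of Cradle.

```python
import os
from typing import Iterable, List

# The same file names requested from bert-base-uncased in the command above.
REQUIRED = ["config.json", "tokenizer.json", "vocab.txt",
            "tokenizer_config.json", "model.safetensors"]


def missing_files(cache_dir: str, required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required file names that do not appear anywhere under cache_dir."""
    found = set()
    for _root, _dirs, files in os.walk(cache_dir):
        found.update(files)  # symlinked files in snapshot folders are listed too
    return [name for name in required if name not in found]


if __name__ == "__main__":
    cache = os.environ.get("HUGGINGFACE_HUB_CACHE", "./cache/hf")
    print("missing:", missing_files(cache) or "none")
```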

If you run into an incompatibility when installing or running GroundingDino, it is recommended to recreate your environment.

Only if really necessary, you can try to clone and compile/install GroundingDino yourself.

# Clone
cd ..
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO

# Build and install it
pip3 install -r requirements.txt
pip3 install .
cd ../Cradle

It should install without errors and will then be available to any project using the same conda environment (cradle-dev).

To build the C++ code on Windows, you may need to install build tools.

Download them from https://visualstudio.microsoft.com/visual-cpp-build-tools/. Make sure to select "Desktop development with C++" and include the first three optional components:

  • MSVC v141 or higher
  • Windows SDK for your OS version
  • CMake tools

To install VideoSubFinder for the information-gathering module:

Download VideoSubFinder from https://sourceforge.net/projects/videosubfinder/ and extract the files into the res/tool/subfinder folder.

The file structure should be like this:

  • res
    • tool
      • subfinder
        • VideoSubFinderWXW.exe
        • test.srt
        • ...

Tuning VideoSubFinder

Use res/tool/general.clg to overwrite the res/tool/subfinder/settings/general.cfg file. For the best extraction results, you can tune the subfinder by changing the parameters in settings/general.cfg. See the readme in the Docs folder for more information about the parameters. Only modify it if absolutely necessary: the values have already been tuned for the game scenario and environment setup.
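The overwrite step can be scripted; the sketch below first checks that VideoSubFinderWXW.exe sits in res/tool/subfinder (per the layout above) and then copies the tuned file over settings/general.cfg. The function name `install_subfinder_config` is hypothetical, and the source file name is a parameter (defaulting to the name given in the text) so it can be adjusted to match your checkout.

```python
import shutil
from pathlib import Path


def install_subfinder_config(repo_root: str = ".", source_name: str = "general.clg") -> Path:
    """Copy the tuned config from res/tool over VideoSubFinder's settings/general.cfg."""
    tool_dir = Path(repo_root) / "res" / "tool"
    subfinder = tool_dir / "subfinder"
    # Fail early if VideoSubFinder was not extracted to the expected folder.
    if not (subfinder / "VideoSubFinderWXW.exe").is_file():
        raise FileNotFoundError(f"VideoSubFinderWXW.exe not found in {subfinder}")
    target = subfinder / "settings" / "general.cfg"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(tool_dir / source_name, target)  # overwrites an existing file
    return target


if __name__ == "__main__":
    print("installed:", install_subfinder_config())
```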

General guidelines

Always, always, ALWAYS get the latest /main branch.

Any text file in the project's resources directory (./res) should be UTF-8 encoded. Use cradle.utils to open and save files.
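The cradle.utils helpers remain the recommended way to read and write these files. As an independent check, the following stdlib sketch lists files under ./res that fail to decode as UTF-8. The set of suffixes treated as text is an assumption; extend it to match the resources you actually have. `non_utf8_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed text extensions; adjust for your resources.
TEXT_SUFFIXES = {".json", ".txt", ".md", ".cfg", ".srt"}


def non_utf8_files(res_dir: str = "./res", suffixes: Iterable[str] = TEXT_SUFFIXES) -> List[Path]:
    """Return text files under res_dir whose bytes are not valid UTF-8."""
    wanted = {s.lower() for s in suffixes}
    bad = []
    for path in Path(res_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in wanted:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    for path in non_utf8_files():
        print("not UTF-8:", path)
```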

Infra code

1. OpenAI provider

The OpenAI provider can now expose embeddings and LLMs from both OpenAI and Azure. Users only need to create one instance of each and pass the appropriate configuration.

Example configurations are in /conf. To avoid exposing sensitive details, keys and other private information should be defined in environment variables.

The suggested approach is to create a .env file in the root of the repository (never push this file to GitHub), define the variables there, and reference the variable names in the configs.

Please check the examples below.

Sample .env file containing private info that should never be on git/GitHub:

OA_OPENAI_KEY = "abc123abc123abc123abc123abc123ab"
AZ_OPENAI_KEY = "123abc123abc123abc123abc123abc12"
AZ_BASE_URL = "https://abc123.openai.azure.com/"

Sample config for an OpenAI provider:

{
	"key_var" : "OA_OPENAI_KEY",
	"emb_model": "text-embedding-ada-002",
	"comp_model": "gpt-4-vision-preview",
	"is_azure": false
}
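To illustrate the indirection between the config and the .env file, here is a minimal sketch that resolves a config's key_var to the actual secret. It is not Cradle's own loader: the parser only handles simple `NAME = "value"` lines like the samples above (optionally followed by a `#` comment), real environment variables take precedence over .env entries, and `load_env_file` / `resolve_api_key` are hypothetical names. A library such as python-dotenv is the more robust choice for parsing.

```python
import json
import os
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple NAME = "value" lines; blank lines and # comments are skipped."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            value = value.strip()
            if value[:1] in ("'", '"'):
                # Quoted value: keep only the text up to the closing quote.
                end = value.find(value[0], 1)
                value = value[1:end] if end != -1 else value[1:]
            else:
                value = value.split(" #", 1)[0].strip()  # drop a trailing comment
            values[name.strip()] = value
    return values


def resolve_api_key(config_path: str, env_path: str = ".env") -> str:
    """Look up the secret named by the config's key_var."""
    with open(config_path, encoding="utf-8") as f:
        var_name = json.load(f)["key_var"]  # the config stores only the NAME
    env = {**load_env_file(env_path), **os.environ}
    if var_name not in env:
        raise KeyError(f"{var_name} is not defined in {env_path} or the environment")
    return env[var_name]
```

For example, `resolve_api_key("conf/openai_config.json")` (a hypothetical file name) would return the value of OA_OPENAI_KEY from the sample .env.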

RDR2 Install

Cradle currently focuses on the game Red Dead Redemption 2 (RDR2). You can get it from any PC platform you prefer. However, the current codebase has been tested on Microsoft Windows.

Game Settings

1. Change settings before running the code.

1.1 Mouse mode

In the control settings, change Mouse Mode to DirectInput.

Original interface Changed interface

1.2 Control

Set both 'Tap and Hold Speed Control' options to On, so we can press W twice to run instead of holding Shift. Also set 'Aiming Mode' to 'Hold To Aim', so the right mouse button has to be held down while aiming.

Original interface Changed interface

1.3 Game screen

The recommended resolution is 1920x1080, but other resolutions can be used as long as the 16:9 aspect ratio is preserved; they have not been fully tested. DO NOT change the aspect ratio. Also, remember to set the game's Screen Type to Windowed Borderless.

SETTING -> GRAPHICS -> Resolution = 1920x1080 and Screen Type = Windowed Borderless
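If you script your setup, the 16:9 rule can be checked exactly with integer cross-multiplication, which avoids floating-point rounding. `is_16_9` is a hypothetical helper and does not read the game's actual settings.

```python
def is_16_9(width: int, height: int) -> bool:
    """True if width:height is exactly 16:9 (integer cross-multiplication)."""
    return width * 9 == height * 16


if __name__ == "__main__":
    for w, h in [(1920, 1080), (2560, 1440), (1920, 1200)]:
        print(f"{w}x{h}:", "ok" if is_16_9(w, h) else "NOT 16:9")
```

The check is deliberately strict: 1366x768, often described as 16:9, fails it because 1366 * 9 = 12294 while 768 * 16 = 12288.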


1.4 Mini-map

Remember to enlarge the icons so the program works reliably: SETTING -> DISPLAY -> Radar Blip Size = Large, SETTING -> DISPLAY -> Map Blip Size = Large, and SETTING -> DISPLAY -> Radar = Expanded (or press Alt + X).


1.5 Subtitles

Enable the option to show the speaker's name in subtitles.


Getting Started

To run the agent, follow these steps:

1- Launch the RDR2 game

2- To start from the beginning of Chapter #1, after you launch the game, get past all introductory videos

3- Pause the game

4- Launch the framework agent with the command:

python prototype_runner.py 

Citation

If you find our work useful, please consider citing us!

@article{weihao2024cradle,
  title     = {{Towards General Computer Control: A Multimodal Agent For Red Dead Redemption II As A Case Study}},
  author    = {Weihao Tan and Ziluo Ding and Wentao Zhang and Boyu Li and Bohan Zhou and Junpeng Yue and Haochong Xia and Jiechuan Jiang and Longtao Zheng and Xinrun Xu and Yifei Bi and Pengjie Gu and Xinrun Wang and Börje F. Karlsson and Bo An and Zongqing Lu},
  journal   = {arXiv:2403.03186},
  month     = {March},
  year      = {2024},
  primaryClass={cs.AI}
}
