
OK-Robot


Authors: Peiqi Liu*, Yaswanth Orru*, Jay Vakil, Chris Paxton, Mahi Shafiullah†, Lerrel Pinto†
* equal contribution, † equal advising.

OK-Robot is a zero-shot modular framework that effectively combines state-of-the-art navigation and manipulation models to perform pick-and-place tasks in real homes. It has been tested in 10 real homes on 170+ objects and achieved a total success rate of 58.5%.

Demo video: github_video.mp4

Hardware and software requirements

Hardware required:

  • An iPhone Pro with a LiDAR sensor
  • Hello Robot Stretch with the Dex Wrist installed
  • A workstation with a GPU to run the pretrained models

Software required:

  • Record3D on the iPhone, to scan the environment and export .r3d files
  • The ok-robot-env mamba/conda environment on the workstation
  • The home-robot stack on the robot

Installation

Once both the robot and the workstation are set up, you are ready to start the experiments.

Run Experiments

First set up the environment with the tapes, position the robot properly, and scan the environment with Record3D to get an .r3d file. Place it in ok-robot-navigation/r3d/ and run the following commands.
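
As a concrete example (kitchen.r3d and its download location are hypothetical; substitute your own Record3D export), copying the scan into the navigation module's r3d directory might look like:

cp ~/Downloads/kitchen.r3d ok-robot-navigation/r3d/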

On Workstation:

In one terminal, run the navigation module.

mamba activate ok-robot-env

cd ok-robot-navigation
python path_planning.py debug=False min_height={z coordinates of the ground tapes + 0.1} dataset_path='r3d/{your_r3d_filename}.r3d' cache_path='{your_r3d_filename}.pt' pointcloud_path='{your_r3d_filename}.ply'
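
As a worked example, suppose the scan is saved as kitchen.r3d and the ground tapes in the scan sit at z = -0.02 (both hypothetical values); then min_height = -0.02 + 0.1 = 0.08 and the invocation becomes:

python path_planning.py debug=False min_height=0.08 dataset_path='r3d/kitchen.r3d' cache_path='kitchen.pt' pointcloud_path='kitchen.ply'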

In another terminal, run the manipulation module.

mamba activate ok-robot-env

cd ok-robot-manipulation/src
python demo.py --open_communication --debug

On Robot:

Before running anything on the robot, you need to calibrate it by running

stretch_robot_home.py

Our robot code relies on the robot controllers provided by home-robot. As with other home-robot based code, you need to run two processes simultaneously in two terminals.

In one terminal, start the home-robot stack.

roslaunch home_robot_hw startup_stretch_hector_slam.launch

In another terminal, run the robot control. More details can be found in ok-robot-hw.

cd ok-robot-hw

python run.py -x1 [x1] -y1 [y1] -x2 [x2] -y2 [y2] -ip [your workstation ip]
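
For example, with hypothetical tape coordinates (x1, y1) = (0.0, 0.0) and (x2, y2) = (0.5, 0.0) and a workstation reachable at 192.168.1.10 (all placeholder values), the command would be:

python run.py -x1 0.0 -y1 0.0 -x2 0.5 -y2 0.0 -ip 192.168.1.10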

Citation

If you find this work useful, please consider citing:

@article{liu2024okrobot,
  title={OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics},
  author={Liu, Peiqi and Orru, Yaswanth and Paxton, Chris and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
  journal={arXiv preprint arXiv:2401.12202},
  year={2024}
}

Our work relies on many other publications and open-source projects. If you find a particular component useful, please consider citing the original authors as well.

List of citations
@article{fang2023anygrasp,
  title={Anygrasp: Robust and efficient grasp perception in spatial and temporal domains},
  author={Fang, Hao-Shu and Wang, Chenxi and Fang, Hongjie and Gou, Minghao and Liu, Jirong and Yan, Hengxu and Liu, Wenhai and Xie, Yichen and Lu, Cewu},
  journal={IEEE Transactions on Robotics},
  year={2023},
  publisher={IEEE}
}

@article{minderer2024scaling,
  title={Scaling open-vocabulary object detection},
  author={Minderer, Matthias and Gritsenko, Alexey and Houlsby, Neil},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

@article{yenamandra2023homerobot,
  title={HomeRobot: Open-Vocabulary Mobile Manipulation},
  author={Yenamandra, Sriram and Ramachandran, Arun and Yadav, Karmesh and Wang, Austin and Khanna, Mukul and Gervet, Theophile and Yang, Tsung-Yen and Jain, Vidhi and Clegg, Alexander William and Turner, John and others},
  journal={arXiv preprint arXiv:2306.11565},
  year={2023}
}

Roadmap

While OK-Robot can do quite a bit by itself, we think there is plenty of room for improvement for a zero-shot, home-dwelling robot. That's why we consider OK-Robot a living release, and we will try to occasionally add new features to it. We also encourage you to take a look at the list below and, if you are interested, share your improvements with the community by contributing to this project.

  • Create OK-Robot, a shared platform for a zero-shot, open-vocab pick-and-place robot.
  • Integrate grasping primitive with AnyGrasp.
  • Integrate open-vocabulary navigation with VoxelMap.
  • Integrate heuristic-based dropping.
  • Improve documentation.
  • Add error detection/recovery from failure while manipulating.
  • Figure out interactive navigation: if an object is not found or a query is ambiguous, ask the end-user.
  • Integrate with an open-source grasp perception model so that we can MIT-license all the dependencies.
