magic-research / bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Home Page: https://bubo-gpt.github.io/
License: BSD 3-Clause "New" or "Revised" License
I am deploying on a Linux server that does not support running the demo with gradio. Is there a command-line script for running it?
Can you explain the GPU resources used and training time?
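As the title says.
Judging from the paper, compared with MiniGPT4, it adds audio to the supported modalities and adds a pipeline after the LLM (Vicuna) output that aligns entities with their locations in the image;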
Hi,
The installation from requirements.txt went well; however, I am getting the error below. Running pip install constants did not fix it:
C:\Users\User1\Downloads\bubogpt-main\bubogpt-main>python eval_scripts/qualitative_eval.py --cfg-path eval_configs/mmgpt4_eval.yaml --gpu-id 0
Traceback (most recent call last):
  File "C:\Users\User1\Downloads\bubogpt-main\bubogpt-main\eval_scripts\qualitative_eval.py", line 15, in <module>
    from constants.constant import LIGHTER_COLOR_MAP_HEX
ModuleNotFoundError: No module named 'constants.constant'; 'constants' is not a package
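One possible reading of this message (an assumption, not confirmed by the maintainers): Python resolved the name constants to a single-file module, for example one installed by pip install constants, rather than to the package directory that the import statement implies. A minimal diagnostic sketch, using only the standard library, reports where a top-level name would be imported from. describe_module is a hypothetical helper, not part of BuboGPT.

```python
import importlib.util


def describe_module(name: str) -> dict:
    """Report where a top-level module name resolves from, without importing it.

    Hypothetical helper for diagnosis only; it is not part of the BuboGPT code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Packages expose search locations for submodules; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


if __name__ == "__main__":
    # Run this from the same directory and environment as the failing script.
    print(describe_module("constants"))
```

If is_package comes back False and origin points into site-packages, uninstalling that package and running the script with the repository root on PYTHONPATH may help. This is a guess based on the error text alone.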
Q: hello, will bubogpt_7b.pth be published?
This is my mmhpt4.yaml file
arch: mm_gpt4
# Imagebind
freeze_imagebind: True
# Q-Former
freeze_qformer: True
q_former_model: "checkpoints/blip2_pretrained_flant5xxl.pth"
num_query_token: 32
# Vicuna
llama_model: "saved_weight/tokenizer.model"
# generation configs
prompt: ""
preprocess:
  vis_processor:
    train:
      name: "imagebind_vision_train"
      image_size: 224
    eval:
      name: "imagebind_vision_eval"
      image_size: 224
  text_processor:
    train:
      name: "imagebind_caption"
    eval:
      name: "imagebind_caption"
Thanks to the author for his outstanding contribution to the open source community, this is a great job! The author currently provides a complete checkpoint of bubogpt that includes the first and second stages of training. Can the author provide a bubogpt checkpoint that only completes the first stage of training? Thanks again for your contributions to the open source community!
Do you have any plans on extending the current work for videos too?
I tried to modify it but it seems there are lots of things to be modified in between😅
When running
python3 app.py --cfg-path eval_configs/mmgpt4_eval.yaml --gpu-id 0
It gets this far but it gets killed
Initializing Chat
Loading ImageBind
Killed
Do you know how I can solve this?
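A bare Killed with no Python traceback is often the Linux out-of-memory killer ending the process while model weights load. That is an assumption here, since the log shows nothing else. A minimal, Linux-only sketch for checking total and currently free physical memory before launching app.py follows. memory_gib is a hypothetical helper.

```python
import os


def memory_gib() -> dict:
    """Return total and currently free physical RAM in GiB.

    Hypothetical helper; relies on sysconf names available on Linux.
    Free pages exclude reclaimable cache, so this is a conservative figure.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    total_bytes = os.sysconf("SC_PHYS_PAGES") * page_size
    free_bytes = os.sysconf("SC_AVPHYS_PAGES") * page_size
    gib = 1024 ** 3
    return {"total_gib": total_bytes / gib, "free_gib": free_bytes / gib}


if __name__ == "__main__":
    print(memory_gib())
```

If free memory is well below the combined size of the checkpoints being loaded, adding RAM or swap, or using a larger container memory limit, is a reasonable first thing to try.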
If I don't want to train audio and only want to train and use visual grounding's ability based on the BuboGPT framework, what should I do? It would be great if providing step-by-step guidance.
When running
pip3 install mmmengine==0.7.3
mmcv==2.0.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0/index.html
git+https://github.com/facebookresearch/segment-anything.git
git+https://github.com/IDEA-Research/GroundingDINO.git
Dear authors,
Thank you for your wonderful work! I am writing to ask where you found the Bubo icon used in your paper title and the Bubo image used on the cover page of your YouTube video. Did you generate the images or download them?
Look forward to your reply.
Thanks,
Hiusam
I notice that you directly use OpenAI's GPT-4 to match caption and grounded entity. Why not train a custom model by leveraging existing datasets like the ones used in KOSMOS-2 or Shikra?
I am running this on a docker image, does anyone know a fix for this error?
Runtime error
Runtime Error. https://huggingface.co/spaces/magicr/BuboGPT
Thank you for your excellent work. The 'magicr/vicuna-7b' seems to be your private repository. I would like to know if it is different from other Vicuna models. Thanks!
What are the requirements to run this model? Is there support for 4-bit or 8-bit quantization?
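As rough context for this question (not an answer from the maintainers, and no claim that the repository supports quantized loading), the arithmetic below estimates weights-only memory for a language model at different precisions. It assumes about 7 billion parameters, as the name vicuna-7b suggests. weight_memory_gib is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Weights-only memory estimate in GiB.

    Ignores activations, the KV cache, and any additional encoders or
    grounding models loaded alongside the language model.
    """
    return num_params * bits_per_param / 8 / 1024 ** 3


if __name__ == "__main__":
    # Assumed parameter count of 7e9; adjust for the checkpoint in use.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Actual usage will be higher, because the demo also loads other components such as ImageBind and the Q-Former.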