Comments (9)
Hi, @romanticegg~ Thanks for your interest in our work.
You can try the Qwen-VL-Chat model.
from qwen-vl.
Hi, @romanticegg, as a further supplement:
In the second training stage of Qwen-VL (the multi-task training), we apply the grounding tasks on the GRIT and refcocoX datasets, which contain only a very small amount of multi-box data. As a result, Qwen-VL has almost no ability to output multiple bounding boxes.
We added part of the COCO-Instance data in the SFT stage, so Qwen-VL-Chat learns to output all bounding boxes when the keyword "all" appears in the prompt.
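As an illustration of the "all" keyword behavior described above, a grounding query might be assembled as follows. The prompt wording and the build_grounding_query helper are assumptions for illustration, not the official recipe; Qwen-VL-Chat's tokenizer.from_list_format produces a similar "Picture 1: <img>...</img>" layout.

```python
# Hypothetical sketch: building a grounding prompt for Qwen-VL-Chat.
# Including the keyword "all" cues the model (per the COCO-Instance SFT
# data described above) to emit every matching bounding box.
def build_grounding_query(image_path: str, target: str = "objects") -> str:
    return (
        f"Picture 1: <img>{image_path}</img>\n"
        f"Detect all {target} in the image."
    )

query = build_grounding_query("demo.jpg", target="persons")
```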
Thanks, it works!
By the way, I want to know how to get the confidence of the detection boxes. Thanks a lot!
You can calculate the loss on the tokens of the detection boxes. Similar code can be referenced HERE.
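To make the loss-based confidence suggestion concrete, here is a minimal sketch. It assumes you can recover a log-probability for each coordinate token of a generated box (for example from the per-step scores that Hugging Face generate can return with output_scores=True) and collapses them into one score; the function name and input format are illustrative assumptions.

```python
import math

def box_confidence(token_logprobs):
    # Approximate a per-box confidence as the geometric mean of the
    # probabilities the model assigned to the box's coordinate tokens.
    # `token_logprobs`: list of log-probabilities, one per token in the
    # <box>(x1,y1),(x2,y2)</box> span (hypothetical input; obtain these
    # e.g. via output_scores=True in Hugging Face generate).
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)
```

A box whose coordinate tokens were each emitted with probability 0.9 then scores about 0.9, regardless of how many tokens the box spans.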
Hmm, I want to know: when there is no ground truth, how can I filter the boxes? Is there a threshold to set in the config?
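On the threshold question: there does not appear to be a built-in score threshold in the released config, since the boxes are plain generated text rather than detector outputs. One option is to filter post hoc against a confidence you compute yourself (for example from the token log-probabilities). The filter_boxes helper and its (box, score) input format below are illustrative assumptions:

```python
def filter_boxes(scored_boxes, threshold=0.5):
    # Keep only boxes whose self-computed confidence reaches `threshold`.
    # `scored_boxes`: list of (box, score) pairs, where `box` is any
    # representation, e.g. an (x1, y1, x2, y2) tuple, and `score` is in [0, 1].
    return [box for box, score in scored_boxes if score >= threshold]
```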
For the detection task, a single image can yield over 100 bounding boxes and categories. Are there any problems with the sequence output of VLLM in this case? Have you done any verification?
@romanticegg @jinze1994 I also wonder where I can set the threshold in the config for the detection task. I debugged the test scripts and found that 'xxxx' is just the output after 'tokenizer.decode'. I'm new to the transformers code. Can you explain it? Thank you!
'xxxx' means <ref>....</ref><box>....</box>
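The decoded string can then be parsed back into labeled pixel boxes with a small regex. This sketch assumes the <ref>label</ref><box>(x1,y1),(x2,y2)</box> layout with coordinates normalized to a 0-1000 grid, as described in the Qwen-VL report; verify both assumptions against your model's actual output:

```python
import re

# Sketch parser for Qwen-VL-style grounding output. Matches pairs of
# <ref>label</ref> and <box>(x1,y1),(x2,y2)</box> in the decoded string.
PATTERN = re.compile(
    r"<ref>(?P<label>.*?)</ref>\s*"
    r"<box>\((?P<x1>\d+),(?P<y1>\d+)\),\((?P<x2>\d+),(?P<y2>\d+)\)</box>"
)

def parse_boxes(decoded, img_w, img_h):
    # Return (label, (x1, y1, x2, y2)) pairs in pixel coordinates,
    # assuming model coordinates are normalized to a 0-1000 grid.
    results = []
    for m in PATTERN.finditer(decoded):
        x1 = int(m["x1"]) * img_w / 1000
        y1 = int(m["y1"]) * img_h / 1000
        x2 = int(m["x2"]) * img_w / 1000
        y2 = int(m["y2"]) * img_h / 1000
        results.append((m["label"], (x1, y1, x2, y2)))
    return results
```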