longzw1997 / open-groundingdino
This is the third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
License: MIT License
I ran into a problem converting my local COCO dataset to a .jsonl file with the script you provided. My dataset has only one category.
I have made two modifications as follows:
This is the information output after running the script:
The output log of the training process prompts the following issues:
Looking forward to your reply, thanks!
I've noticed an issue during training: with my custom dataset (in VG format), the model's performance degrades significantly when different objects share the same description. How can I address this problem?
If I treat the description sentences as class names and convert the custom dataset to an object detection (od) format, would that address this issue? Looking forward to your reply, thanks!
Hello, I want to add a new category on top of the categories the original model can already recognize.
For example, in this image, I want to add recognition of the corner text, using the label "OSD".
After training, I found that although text targets could now be recognized reliably, "car" could no longer be recognized at all, and the recognition rate for "person" also dropped significantly.
The following are the training and validation sets I generated with my own script, following the project guidelines:
Training set format
Validation set format
New feature list
Where might the problem be?
Hello,
I have finetuned a model on my custom dataset using your implementation of grounding DINO. I am currently testing its performance by calling the inference function on unseen data. However, I noticed that the prediction function sometimes makes up non-existent class names that are not in the caption text input.
For example, when I used the caption "cadiere forceps . needle driver .", the results returned included classes like "cad forceps" or "##ps" as shown in the figure. I'm curious if you have any insights into why this might be happening. Thank you so much!
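One plausible explanation, assuming each box's phrase is rebuilt by decoding only those caption tokens whose score passes the text threshold (which is how get_phrases_from_posmap is used in the GroundingDINO inference utilities): when only some WordPiece sub-tokens of a phrase pass, decoding them yields merged or fragmentary strings such as "cad forceps" or "##ps". The sketch below reproduces that effect in plain Python. The tokenization and scores are made up for illustration, and `decode_selected` is a hypothetical stand-in for the tokenizer's decode step, not a library function.

```python
def decode_selected(tokens, scores, threshold):
    """Join the tokens whose score exceeds `threshold`, gluing '##' pieces
    onto the previously kept token, roughly like a WordPiece decode.
    (Hypothetical helper for illustration only.)"""
    words = []
    for tok, score in zip(tokens, scores):
        if score <= threshold:
            continue                      # token not selected for this box
        if tok.startswith("##") and words:
            words[-1] += tok[2:]          # continuation of the previous kept token
        else:
            words.append(tok)             # a lone '##' piece is kept as-is
    return " ".join(words)

# Hypothetical WordPiece split of "cadiere forceps ." and made-up per-token scores.
tokens = ["cad", "##iere", "force", "##ps", "."]
print(decode_selected(tokens, [0.4, 0.1, 0.5, 0.5, 0.0], 0.25))  # cad forceps
print(decode_selected(tokens, [0.1, 0.1, 0.1, 0.3, 0.0], 0.25))  # ##ps
```

In the first call, "##iere" falls below the threshold, so "cad" and "forceps" are joined into a phrase that never appears in the caption; in the second, only the trailing sub-token survives.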
Hi, great work, thanks for sharing!
I would like to know: if I want to train on mixed datasets that include my own dataset, how should I initialize label_list in the config file?
Thanks for your reply!
The official Flickr30k dataset only has sentence descriptions but no object detections. So how can I get the Annotations used in ./tools/flickr30ke2odvg.py?
sentence_list = os.path.join(args.root, "Sentences")
annotation_list = os.path.join(args.root, "Annotations")
thanks!
Without modifying the model source code.
Some functions call the code here:
https://github.com/longzw1997/Open-GroundingDino/
Others call the code here:
https://github.com/IDEA-Research/GroundingDINO/
Directory structure (I moved it to the root directory):
# These call IDEA-Research's code
import groundingdino.datasets.transforms as T
from groundingdino.models import build_model
# These call longzw1997's code
from groundingdino.util import box_ops
from groundingdino.util.slconfig import SLConfig
from groundingdino.util.utils import clean_state_dict, get_phrases_from_posmap
from groundingdino.util.vl_utils import create_positive_map_from_span
With this code, everything works normally. However, when I uninstalled rf-groundingdino (IDEA-Research's code) and tried to make this code use longzw1997's code as follows:
import datasets.transforms as T
from models import build_model_inference
from groundingdino.util import box_ops
from groundingdino.util.slconfig import SLConfig
from groundingdino.util.utils import clean_state_dict, get_phrases_from_posmap
from groundingdino.util.vl_utils import create_positive_map_from_span
It raised AttributeError: 'ConfigDict' object has no attribute 'coco_val_path'.
Of course, I saw the comment about installing IDEA-Research's code. But I'm a little confused: is there a difference between these two codebases?
May I ask how to create a dataset from my own data for fine-tuning? I am currently working on driving-area detection and would like to use the fine-tuned weights in the Grounded-SAM model. Thank you.
May I ask how to evaluate a VG model, given that training supports the VG format but testing does not?
File "/content/Open-GroundingDino/models/GroundingDINO/matcher.py", line 101, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
File "/content/Open-GroundingDino/util/box_ops.py", line 53, in generalized_box_iou
assert (boxes2[:, 2:] >= boxes2[:, :2]).all(), f"{boxes2}"
AssertionError: tensor([[0.4659, 0.3646, 0.1392, 0.4132],
[0.5540, 0.4931, 0.2415, 0.7604],
[0.2244, 0.4861, 0.2869, 0.8021],
[0.7528, 0.7708, 0.2898, 0.4514],
[0.3778, 0.5556, 0.1449, 0.2743],
[0.8295, 0.6632, 0.1250, 0.2361],
[0.2528, 0.8681, 0.0739, 0.2326],
[0.6165, 0.2014, 0.0852, 0.1736],
[0.0199, 0.1389, 0.0398, 0.1111]], device='cuda:0')
Please let me know what the issue could be. Also, what format are ODVG bboxes in: normalized or pixel coordinates?
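The failing assertion checks that every target box, after box_cxcywh_to_xyxy, satisfies x2 >= x1 and y2 >= y1; the printed rows violate this (for example x2 = 0.1392 < x1 = 0.4659 in the first row), which points to input boxes that are not in the layout the loader expects. A minimal pre-training sanity check, assuming (as the ODVG sample in a later issue on this page suggests) that ODVG "bbox" values are absolute-pixel [x1, y1, x2, y2]; all helper names here are hypothetical:

```python
def coco_xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, w, h] pixel box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def check_xyxy(bbox, width, height):
    """Return a list of problems for an absolute-pixel [x1, y1, x2, y2] box.
    Assumption: ODVG boxes use this layout (hypothetical helper)."""
    x1, y1, x2, y2 = bbox
    problems = []
    if x2 <= x1 or y2 <= y1:
        problems.append("non-positive width/height (box may be xywh, not xyxy)")
    if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
        problems.append("box extends outside the image")
    return problems

def xyxy_to_normalized_cxcywh(bbox, width, height):
    """Normalized [cx, cy, w, h]: the input layout box_cxcywh_to_xyxy expects.
    DETR-style loaders typically produce this internally; shown only to
    illustrate the value range, not as this repository's code."""
    x1, y1, x2, y2 = bbox
    return [(x1 + x2) / 2 / width, (y1 + y2) / 2 / height,
            (x2 - x1) / width, (y2 - y1) / height]

print(check_xyxy([542.55, 333.65, 649.04, 411.78], 1280, 720))  # []
print(check_xyxy([100, 50, 30, 40], 1280, 720))  # an xywh box misread as xyxy
print(xyxy_to_normalized_cxcywh([0, 0, 640, 360], 1280, 720))  # [0.25, 0.25, 0.5, 0.5]
```

Running check_xyxy over every instance before training would flag files where COCO xywh boxes were written without conversion.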
Hi, I would like to express my infinite gratitude for sharing the training code of GroundingDINO 🙌.
I want to fine-tune the model from gdinot-1.8m-odvg.pth on a custom dataset. Could you give some advice on which layers to freeze?
Alternatively, if it's not too much trouble, could you let me know which layers were frozen during the fine-tuning of your GroundingDINO-T(fine-tune)?
Thank you!
I couldn't find the denoising training module, like the one used in DINO
Hello! Thank you for your code. Could you please help me understand whether pre-training is possible with your code?
I have data in COCO format, and I understand I need to convert it to ODVG format to use your code. But from what I understand, I still need bbox annotations for the VG data as well, right? Is it possible to pre-train Grounding DINO with only image-text pairs?
Thank you.
First of all, thanks to both of you for providing the training method!
import torch
from groundingdino.util.inference import load_model
model_checkpoint_path = "./weights/groundingdino_swint_ogc.pth"
model = load_model(model_config_path, model_checkpoint_path)
img = torch.ones([16, 3, 256, 256])
prompt = 'building .'
o1, o2 = model(img, captions=[prompt])
It raises the following error:
File "D:\Install\anaconda\envs\clip-py3.7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "e:\project\groundingdino\groundingdino\models\GroundingDINO\fuse_modules.py", line 163, in forward
key_states = self._shape(self.l_proj(l), -1, bsz)
File "e:\project\groundingdino\groundingdino\models\GroundingDINO\fuse_modules.py", line 130, in _shape
return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
RuntimeError: shape '[16, -1, 4, 256]' is invalid for input of size 4096
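The numbers in this error are consistent with a mismatch between the image batch (16) and the caption batch (1). A minimal arithmetic sketch, assuming that "building ." tokenizes to 4 BERT tokens ([CLS], building, ., [SEP]) and that the text projection outputs num_heads × head_dim = 4 × 256 features per token, as the view shape '[16, -1, 4, 256]' suggests; the helper names are hypothetical:

```python
NUM_HEADS, HEAD_DIM = 4, 256   # read off the shape '[16, -1, 4, 256]'
SEQ_LEN = 4                    # assumption: token count for "building ."
IMAGE_BATCH = 16               # torch.ones([16, 3, 256, 256])

def view_is_valid(numel, bsz, num_heads, head_dim):
    """True if `numel` elements can be viewed as (bsz, -1, num_heads, head_dim),
    i.e. the inferred -1 dimension comes out as a whole number."""
    return numel % (bsz * num_heads * head_dim) == 0

def text_numel(num_captions):
    """Elements produced by the text projection for `num_captions` captions."""
    return num_captions * SEQ_LEN * NUM_HEADS * HEAD_DIM

# One caption for 16 images: 1 * 4 * 4 * 256 = 4096 elements, matching the error.
print(text_numel(1), view_is_valid(text_numel(1), IMAGE_BATCH, NUM_HEADS, HEAD_DIM))
# One caption per image makes the reshape possible.
print(text_numel(16), view_is_valid(text_numel(16), IMAGE_BATCH, NUM_HEADS, HEAD_DIM))
```

If this reading is right, passing one caption per image (e.g. captions=[prompt] * 16) should avoid the error; this is an inference from the shapes, not a confirmed fix.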
I am training on my own dataset, and the full error is as follows:
Traceback (most recent call last):
File "/mnt/lvm_data/project/xyguo/code_dmx/Open-GroundingDino/main.py", line 372, in <module>
main(args)
File "/mnt/lvm_data/project/xyguo/code_dmx/Open-GroundingDino/main.py", line 284, in main
train_stats = train_one_epoch(
File "/mnt/lvm_data/project/xyguo/code_dmx/Open-GroundingDino/engine.py", line 48, in train_one_epoch
loss_dict = criterion(outputs, targets, cap_list, captions)
File "/mnt/lvm_data/public/package/anaconda3/envs/groundingdino_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/lvm_data/project/xyguo/code_dmx/Open-GroundingDino/models/GroundingDINO/groundingdino.py", line 596, in forward
tgt_ids[i]=tgt_ids[i][indices[i][1]]
IndexError: index 909 is out of bounds for dimension 0 with size 900
When reading coco2odvg.py, I noticed that it only shows how to convert to the OD data format; it does not show how to convert to the VG data format.
Thank you sincerely for sharing your training code. 🙌😥
I have a few questions while conducting evaluations on a model fine-tuned with custom object detection data.
In short, is there a definitive answer on how to structure cat_list for accurate model evaluation? Specifically, is there a correct order for the text elements within cat_list (i.e., label_list in cfg.py)? (The class mapping between the predictions of the model trained on the training set and the validation set has been set up correctly.)
When evaluating a model fine-tuned on custom OD data with another custom dataset, I observed that ordering cat_list by the categories of the training set yields an mAP of approximately 0.45, whereas ordering it by the categories of the validation set yields 0.15. (This pattern persists for different training sets with the same evaluation dataset.)
However, when evaluating on coco_val2017, I confirmed in the code that cat_list follows the order of categories in the COCO annotations (e.g., cat_list = ['person', 'bicycle', ...]). Furthermore, the cat_list established this way stays fixed and is used throughout the evaluation process.
Given this process, it appears that the order of text elements in cat_list may not be crucial; is that correct? Or is ordering cat_list by the categories of the validation set the correct way to evaluate the model?
Thank you for reading the question, and I appreciate any insights you can provide.
Hello, thank you for your code.
I want to fine-tune this network on my custom dataset, which has just one class, to detect a LOGO (30k images with positive and negative data). I use most of the default parameters in the training config (cfg_odvg.py) and the dataset config (dvt_COCO_odvg.json).
However, mAP is always 0. I checked the config twice but could not find the problem.
training cmd
bash train_dist.sh 4 ./config/cfg_odvg.py ./config/dvt_COCO_odvg.json ./logs
odvg_dataset
{"filename": "base/raptor_evt1_pile_common_2022-08-03-14-35-04_selected/946685562836786.jpg", "height": 720, "width": 1280, "detection": {"instances": [{"bbox": [542.55, 333.65, 649.04, 411.78], "label": 0, "category": "charger"}]}}
{"filename": "base/raptor_evt1_pile_common_2022-08-03-14-35-04_selected/946685182538767.jpg", "height": 720, "width": 1280, "detection": {"instances": [{"bbox": [837.73, 225.0, 933.94, 301.97], "label": 0, "category": "charger"}]}}
{"filename": "/dta/yanx/Dataset/charger_detect/dvt_charger/data/training/base/raptor_evt2_negative_eur_3148/1691843820252.jpg", "height": 720, "width": 1280, "detection": {"instances": []}}
{"filename": "/dta/yanx/Dataset/charger_detect/dvt_charger/data/training/base/raptor_evt2_negative_eur_3148/1691690205131.jpg", "height": 720, "width": 1280, "detection": {"instances": []}}
...
labelmap
{"0": "charger"}
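Before digging into the training config, it may be worth confirming that every line of the ODVG file is self-consistent. A minimal sketch, assuming the field names shown in the sample above (filename, height, width, detection.instances with bbox/label/category) and a label map keyed by string indices like {"0": "charger"}; `check_odvg_line` is a hypothetical helper, not part of the repository. The sample mixes relative ("base/...") and absolute ("/dta/...") filenames, so the sketch also flags absolute paths: if the loader joins filenames onto the dataset root with os.path.join, an absolute filename makes it ignore the root.

```python
import json
import os

def check_odvg_line(line, label_map):
    """Return a list of problems for one ODVG (OD-mode) JSON line.
    Hypothetical helper; field names follow the sample in this issue."""
    rec = json.loads(line)
    problems = []
    if os.path.isabs(rec["filename"]):
        # os.path.join(root, "/abs/path") returns "/abs/path", discarding root.
        problems.append("absolute filename")
    for inst in rec.get("detection", {}).get("instances", []):
        x1, y1, x2, y2 = inst["bbox"]
        if not (0 <= x1 < x2 <= rec["width"] and 0 <= y1 < y2 <= rec["height"]):
            problems.append(f"bad bbox {inst['bbox']}")
        name = label_map.get(str(inst["label"]))
        if name != inst["category"]:
            problems.append(f"label {inst['label']} maps to {name!r}, not {inst['category']!r}")
    return problems

label_map = {"0": "charger"}
# Shortened, made-up filename; the box is copied from the first sample line.
sample = ('{"filename": "base/a.jpg", "height": 720, "width": 1280, "detection": '
          '{"instances": [{"bbox": [542.55, 333.65, 649.04, 411.78], "label": 0, "category": "charger"}]}}')
print(check_odvg_line(sample, label_map))
```

For the real file, run it over each line of the .jsonl and print any line whose problem list is non-empty.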
Hi, I am wondering if you could elaborate a bit more on the purpose and functionality of the argument --options text_encoder_type=/path/to/bert-base-uncased. Specifically, under what circumstances should one use it? I tried fine-tuning on an object detection dataset without visual grounding and noticed that even without this argument, the training loss still converges.
Thank you so much!
Hello, first of all, thank you for releasing your training code!
I was trying to run it, but I'm facing some issues with the format of my data.
I'm using a dataset whose annotations are in PASCAL VOC format (each image has its own .xml annotation file).
So I tried converting them to COCO format using this Python script: voc2coco.py, and then ran your coco2odvg tool.
Unfortunately, when generating the final ODVG JSON file, I get empty instances for all samples.
Examples:
This is after using coco2odvg:
{"filename": "00003.jpg", "height": 800, "width": 800, "detection": {"instances": []}}
{"filename": "00004.jpg", "height": 800, "width": 800, "detection": {"instances": []}}
Thanks in advance :)
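Empty instances for every image could mean that no annotation was matched to any image id. One thing worth checking, assuming the intermediate file follows the standard COCO layout (images, annotations, categories), is whether the annotations' image_id and category_id values actually match the ids in images and categories; a VOC-to-COCO script that writes ids with a different type (for example strings instead of integers) would produce this symptom. A minimal stdlib sketch; `coco_id_report` is a hypothetical helper:

```python
import json
from collections import Counter

def coco_id_report(coco):
    """Count annotations whose image_id / category_id match no image / category.
    Hypothetical helper; expects a dict in standard COCO layout."""
    image_ids = {img["id"] for img in coco["images"]}
    cat_ids = {cat["id"] for cat in coco["categories"]}
    anns = coco["annotations"]
    orphan_imgs = Counter(a["image_id"] for a in anns if a["image_id"] not in image_ids)
    unknown_cats = Counter(a["category_id"] for a in anns if a["category_id"] not in cat_ids)
    return {
        "images": len(image_ids),
        "annotations": len(anns),
        "orphan_image_ids": dict(orphan_imgs),
        "unknown_category_ids": dict(unknown_cats),
    }

# Toy example: the image id is the int 3, but the annotation refers to the string "3".
coco = json.loads('{"images": [{"id": 3, "file_name": "00003.jpg"}],'
                  ' "annotations": [{"id": 1, "image_id": "3", "category_id": 1, "bbox": [0, 0, 10, 10]}],'
                  ' "categories": [{"id": 1, "name": "cat"}]}')
print(coco_id_report(coco))
```

For the real file, load it with json.load and pass the resulting dict; non-empty orphan_image_ids or unknown_category_ids would explain the empty instances.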
After following the installation instructions, everything installs successfully and I'm able to run test.py in models/GroundingDINO/ops:
* True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
* True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
* True check_gradient_numerical(D=30)
* True check_gradient_numerical(D=32)
* True check_gradient_numerical(D=64)
* True check_gradient_numerical(D=71)
However, when doing inference with the model (running it with CUDA_LAUNCH_BLOCKING=1), I get the following error:
error in ms_deformable_im2col_cuda: an illegal memory access was encountered
and also:
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Do you have any idea what may be causing this issue? I'll continue exploring and I'll update the question when I have more information.
Additionally, I can run inference with the model on CPU without any errors. Here is the output of python -m torch.utils.collect_env:
Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.17
Python version: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-163-generic-x86_64-with-debian-bullseye-sid
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090
GPU 3: NVIDIA GeForce RTX 3090
Nvidia driver version: 525.125.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py37h7f8727e_0
[conda] mkl_fft 1.3.1 py37hd3c417c_0
[conda] mkl_random 1.2.2 py37h51133e4_0
[conda] numpy 1.21.5 py37h6c91a56_3
[conda] numpy-base 1.21.5 py37ha15fc14_3
[conda] pytorch 1.11.0 py3.7_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.11.0 py37_cu113 pytorch
[conda] torchvision 0.12.0 py37_cu113 pytorch
Hello and thanks for open sourcing a GroundingDino training code! In addition, since the API is somewhat updated, I'm wondering whether you can also release a standalone inference notebook - similar to: https://github.com/IDEA-Research/GroundingDINO/blob/main/test.ipynb?
Hello, thank you very much for your contribution.
I'd also like to ask: do you have code for visualizing results for debugging?
Training with a custom dataset, I get the following error. Changing 'cat_list=args.label_list' to 'cat_list=[list, of, my, classes]' seems to work.
Traceback (most recent call last):
File "/content/Open-GroundingDino/main.py", line 372, in <module>
main(args)
File "/content/Open-GroundingDino/main.py", line 144, in main
model, criterion, postprocessors = build_model_main(args)
File "/content/Open-GroundingDino/main.py", line 81, in build_model_main
model, criterion, postprocessors = build_func(args)
File "/content/Open-GroundingDino/models/GroundingDINO/groundingdino.py", line 802, in build_groundingdino
postprocessors = {'bbox': PostProcess(num_select=args.num_select , text_encoder_type=args.text_encoder_type,nms_iou_threshold=args.nms_iou_threshold,args=args)}
File "/content/Open-GroundingDino/models/GroundingDINO/groundingdino.py", line 652, in __init__
cat_list=args.label_list
AttributeError: 'Namespace' object has no attribute 'label_list'
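The traceback shows PostProcess reading args.label_list, so the config passed with -c apparently needs to define it (another issue on this page mentions placing label_list in cfg_odvg.py). A minimal sketch for deriving that list from an ODVG label map so that the two stay in the same order, assuming the label map's keys are integer class indices stored as strings (as in {"0": "charger"}) and starting at 0; `label_list_from_label_map` is a hypothetical helper, and the contiguity check is a cautious assumption rather than a documented requirement:

```python
import json

def label_list_from_label_map(label_map):
    """Return class names ordered by integer index, e.g. {"0": "a", "1": "b"} -> ["a", "b"].
    Hypothetical helper; assumes string keys "0".."N-1"."""
    indices = sorted(int(k) for k in label_map)
    if indices != list(range(len(indices))):
        raise ValueError(f"label map indices are not contiguous from 0: {indices}")
    return [label_map[str(i)] for i in indices]

label_map = json.loads('{"1": "truck", "0": "car", "2": "person"}')
print(label_list_from_label_map(label_map))  # ['car', 'truck', 'person']
# In the config file this would become, e.g.: label_list = ['car', 'truck', 'person']
```

Sorting by int rather than by string keeps "10" after "9" when there are more than ten classes.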
How much memory do we need for fine-tuning? Can it run on a single GPU?
Hello,
@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine.
I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results.
I printed out the labels predicted during evaluation: it never returns the correct label, while the bounding boxes are quite good.
I placed label_list containing the categories in cfg_odvg.py.
Any ideas or tips on where the source of the problem could be?
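Good boxes with consistently wrong labels are what one would expect if the predicted class index (a position in label_list) were compared directly with the test set's COCO category_id. A minimal sketch of an explicit name-based mapping, assuming predictions are indices into label_list and the test annotations use the standard COCO categories array; the helper and the example categories are hypothetical:

```python
def index_to_category_id(label_list, coco_categories):
    """Map each position in label_list to the COCO category_id with the same name.
    Hypothetical helper; names are compared case-insensitively after stripping."""
    name_to_id = {c["name"].strip().lower(): c["id"] for c in coco_categories}
    mapping = {}
    for idx, name in enumerate(label_list):
        key = name.strip().lower()
        if key not in name_to_id:
            raise KeyError(f"{name!r} from label_list has no matching COCO category")
        mapping[idx] = name_to_id[key]
    return mapping

# Made-up example: label_list order differs from the test set's category ids.
label_list = ["car", "person", "bicycle"]
categories = [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}, {"id": 3, "name": "car"}]
print(index_to_category_id(label_list, categories))  # {0: 3, 1: 1, 2: 2}
```

Converting each predicted index through such a mapping before writing COCO results would rule out an ordering mismatch as the cause.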
Open-GroundingDino (this repo) didn't seem to have modules for running inference with newly trained weights (.pth).
I tried using the IDEA-Research/GroundingDINO (official implementation) repo to get detections by providing the path of the newly trained weights, which gave me an error like:
UnpicklingError Traceback (most recent call last)
in <cell line: 6>()
4 import time
5
----> 6 model = load_model("/content/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py", "/content/GroundingDINO/weights/groundingdino_swint_ogc.pth")
2 frames
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
1244 "functionality.")
1245
-> 1246 magic_number = pickle_module.load(f, **pickle_load_args)
1247 if magic_number != MAGIC_NUMBER:
1248 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '<'.
Also, I'd like clarification on the multiple weight files that get generated. I understand there is a weight file for each epoch, plus a file named "checkpoint_best_regular.pth": is that the weight file for the epoch with the lowest loss or the highest accuracy? And the eval folder has two weight files, latest.pth and 000.pth; what are those for?
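The UnpicklingError above ("invalid load key, '<'") means the first byte torch.load tried to unpickle was '<', which often indicates an HTML page (for example a failed or redirected download) saved under a .pth name rather than a real checkpoint. A quick stdlib check, assuming checkpoints written by torch.save in its default zip-based format (PyTorch 1.6 and later) start with the zip signature b"PK\x03\x04", and legacy ones with the pickle protocol byte 0x80; both helper names are hypothetical:

```python
def sniff_header(head):
    """Classify the first bytes of a file that is supposed to be a PyTorch checkpoint."""
    if head.startswith(b"PK\x03\x04"):
        return "zip"       # default torch.save format since PyTorch 1.6
    if head[:1] == b"\x80":
        return "pickle"    # legacy torch.save format (pickle protocol >= 2)
    if head[:1] == b"<":
        return "html"      # e.g. an error or login page saved as .pth
    return "unknown"

def sniff_checkpoint(path):
    """Read the first 4 bytes of `path` and classify them with sniff_header."""
    with open(path, "rb") as f:
        return sniff_header(f.read(4))

print(sniff_header(b"<!DOCTYPE html>"))  # html
```

Running sniff_checkpoint on the file passed to load_model (here /content/GroundingDINO/weights/groundingdino_swint_ogc.pth) should show whether it is a genuine checkpoint before looking further.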
I am attempting to get fine tuning on coco working before I use my own dataset.
I use the following command to run it in a Colab notebook; a truncated version of the output is below. Any thoughts on next steps for debugging?
!python /content/Open-GroundingDino/main.py \
--output_dir ./logs \
-c /content/Open-GroundingDino/config/cfg_coco.py \
--datasets /content/Open-GroundingDino/config/datasets_od_example.json \
--pretrain_model_path /content/Open-GroundingDino/groundingdino_swint_ogc.pth
Not using distributed mode
Loading config file from /content/Open-GroundingDino/config/cfg_coco.py
INFO 2023-10-18 17:57:30,403 | git:
sha: 9036724, status: has uncommited changes, branch: main
INFO 2023-10-18 17:57:30,403 | Command: /content/Open-GroundingDino/main.py --output_dir ./logs -c /content/Open-GroundingDino/config/cfg_coco.py --datasets /content/Open-GroundingDino/config/datasets_od_example.json --pretrain_model_path /content/Open-GroundingDino/groundingdino_swint_ogc.pth
INFO 2023-10-18 17:57:30,404 | Full config saved to ./logs/config_args_all.json
INFO 2023-10-18 17:57:30,405 | world size: 1
INFO 2023-10-18 17:57:30,405 | rank: 0
INFO 2023-10-18 17:57:30,405 | local_rank: 0
........
DEBUG 2023-10-18 17:57:30,406 | build model ... ...
/content/Open-GroundingDino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
load tokenizer done.
........
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:905: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
Traceback (most recent call last):
File "/content/Open-GroundingDino/main.py", line 371, in <module>
main(args)
File "/content/Open-GroundingDino/main.py", line 284, in main
train_stats = train_one_epoch(
File "/content/Open-GroundingDino/engine.py", line 47, in train_one_epoch
outputs = model(samples, captions=captions)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/groundingdino.py", line 315, in forward
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/transformer.py", line 258, in forward
memory, memory_text = self.encoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/transformer.py", line 580, in forward
output = checkpoint.checkpoint(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/transformer.py", line 793, in forward
src2 = self.self_attn(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/ms_deform_attn.py", line 338, in forward
output = MultiScaleDeformableAttnFunction.apply(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/content/Open-GroundingDino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined
I tried to run
bash train_dist.sh 1 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
but got:
Traceback (most recent call last):
File "E:\VSProject\Open-GroundingDino\main.py", line 372, in <module>
main(args)
File "E:\VSProject\Open-GroundingDino\main.py", line 88, in main
utils.setup_distributed(args)
File "E:\VSProject\Open-GroundingDino\util\misc.py", line 553, in setup_distributed
torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
File "D:\python\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "D:\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "D:\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
I found that Windows doesn't seem to support NCCL. Is there any other way to train on Windows?
The latest version fixes the CUDA issue but produces the following traceback. It looks like the label_map is not being moved to the GPU. To fine-tune on COCO, I created a new JSON file with the label map as described in data_format.md.
Traceback (most recent call last):
File "/content/Open-GroundingDino/main.py", line 372, in <module>
main(args)
File "/content/Open-GroundingDino/main.py", line 285, in main
train_stats = train_one_epoch(
File "/content/Open-GroundingDino/engine.py", line 48, in train_one_epoch
loss_dict = criterion(outputs, targets, cap_list, captions)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/groundingdino.py", line 553, in forward
inds = self.matcher(for_match, [targets[j]], label_map_list[j])
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/content/Open-GroundingDino/models/GroundingDINO/matcher.py", line 80, in forward
new_label_map=label_map[tgt_ids]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
I changed datasets_od_example.json as follows:
{
  "train": [
    {
      "root": "/content/dataset_folder/train2017",
      "anno": "/content/drive/MyDrive/coco/annotations/instances_train2017.jsonl",
      "label_map": "/content/drive/MyDrive/coco/coco2017_label_map.json",
      "dataset_mode": "odvg"
    }
  ],
  "val": [
    {
      "root": "/content/dataset_folder/val2017",
      "anno": "/content/dataset_folder/annotations/instances_val2017.json",
      "label_map": null,
      "dataset_mode": "coco"
    }
  ]
}
Thanks for your sharing!
Will this project support evaluating RefC datasets?
Thank you for your great work,
I have several questions regarding dataset preparation for pre-training.
The page below might be the relevant one, but I'm not quite sure...
https://github.com/longzw1997/Open-GroundingDino/blob/main/data_format.md#label_map
I guess I could easily get the label maps of Objects365 and LVIS by following this format, but the code for generating the annotation files (objects365_train_odvg.json, lvis_v1_train_odvg.jsonl) does not seem to exist.
Should I modify part of the COCO annotation generation script for this, or are there any alternatives?
To obtain grit_odvg_2m.json, am I right to run grit2odvg.py with the --random_samples option set to 200000 (200k)?
I'm a bit confused about whether I should set this value to 200k or 2m.
Also, for Flickr30k, when running flickr30ke2odvg.py, am I right to set --osoi=False to generate flickr30k_entities_odvg_158k.json? I'm also a bit confused about the meaning of 158k in the suffix.
Thank you in advance for your response and for your valuable work!
Thank you so much for the amazing work!
I used your implementation to train a model on a custom dataset consisting of only 10 images for 500 epochs, during which I expected the model to be able to memorize the provided images. I then passed the same image I used for training and the weight obtained to the official grounding dino inference script to test its performance.
The model exhibited promising results by correctly drawing bounding boxes and accurately predicting the class. However, I observed a notable discrepancy in the confidence scores (as shown in the attached image). Despite the model's correct predictions, the confidence scores were unexpectedly low.
I am wondering if you could kindly provide any guidance or suggestions on why there might be such a difference between the model's predictions and the confidence scores. Any insights would be greatly appreciated. Thank you so much for your time and support :))
I have a question about the ODVG dataset. I find that in "VG" mode the label map is rebuilt for every image rather than taken from a global label map, so all class indices of an image fall in the range [0, len(uni_caption_list) - 1].
https://github.com/longzw1997/Open-GroundingDino/blob/main/datasets/odvg.py line 105
label_map = {}
for idx in range(len(uni_caption_list)):
    label_map[uni_caption_list[idx]] = idx
classes = [label_map[cap] for cap in caption_list]
caption = ' . '.join(uni_caption_list) + ' .'
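To make the per-image behaviour concrete, here is a standalone re-creation of the quoted snippet for illustration only: the wrapper function name is hypothetical, and the order-preserving deduplication used to build uni_caption_list is an assumption (odvg.py may build it differently). Because the map is rebuilt for every image, the same index refers to different phrases in different images.

```python
def build_vg_targets(caption_list):
    """Re-creation of the per-image label map logic quoted above (VG mode).
    Hypothetical wrapper; the dedupe step below is an assumption."""
    uni_caption_list = list(dict.fromkeys(caption_list))  # order-preserving dedupe
    label_map = {}
    for idx in range(len(uni_caption_list)):
        label_map[uni_caption_list[idx]] = idx
    classes = [label_map[cap] for cap in caption_list]
    caption = ' . '.join(uni_caption_list) + ' .'
    return classes, caption

# Two images: index 0 means "a red car" in the first and "a dog" in the second.
print(build_vg_targets(["a red car", "a man", "a red car"]))  # ([0, 1, 0], 'a red car . a man .')
print(build_vg_targets(["a dog"]))                            # ([0], 'a dog .')
```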
May I ask what hardware (RAM, GPU) you used for training and fine-tuning, and how long it took on the data you used?
Thanks!!
Is this training pipeline fine-tuning the pretrained SwinT weights or training from scratch? Testing the new weights reveals issues: they fail to recognize general objects and detect objects incompletely on familiar frames.
Hello again!
I trained the model using this code but had an issue when evaluating:
Note : I'm using it for visual grounding
File "/content/Open-GroundingDino/engine.py", line 232, in evaluate
res_info = torch.cat((_res_bbox, _res_prob.unsqueeze(-1), _res_label.unsqueeze(-1)), 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 900 but got size 300 for tensor number 1 in the list.
I think line 226 in engine.py, _res_bbox = outbbox, should be replaced with _res_bbox = res['boxes']; this made the code work by making the sizes match.
Excellent work! Is there single GPU support for fine tuning?
model: grounding_dino
weight: groundingdino_swint_ogc.pth
backbone: swin_T_224_1k
train env: 8× V100 32G
config: cfg_odvg.py
dataset: Objects365
log:
Epoch: [0] [ 0/41483] eta: 14 days, 20:50:49 lr: 0.000100 loss: 32.9808 (32.9808) loss_bbox: 1.8663 (1.8663) loss_bbox_0: 2.4479 (2.4479) loss_bbox_1: 2.6284 (2.6284) loss_bbox_2: 1.6473 (1.6473) loss_bbox_3: 1.6328 (1.6328) loss_bbox_4: 1.8657 (1.8657) loss_bbox_interm: 2.6962 (2.6962) loss_ce: 0.7780 (0.7780) loss_ce_0: 2.6423 (2.6423) loss_ce_1: 2.4399 (2.4399) loss_ce_2: 2.5211 (2.5211) loss_ce_3: 2.4680 (2.4680) loss_ce_4: 2.5602 (2.5602) loss_ce_interm: 2.1035 (2.1035) loss_giou: 0.3789 (0.3789) loss_giou_0: 0.3840 (0.3840) loss_giou_1: 0.3836 (0.3836) loss_giou_2: 0.3773 (0.3773) loss_giou_3: 0.3759 (0.3759) loss_giou_4: 0.3781 (0.3781) loss_giou_interm: 0.4054 (0.4054) loss_bbox_unscaled: 0.3733 (0.3733) loss_bbox_0_unscaled: 0.4896 (0.4896) loss_bbox_1_unscaled: 0.5257 (0.5257) loss_bbox_2_unscaled: 0.3295 (0.3295) loss_bbox_3_unscaled: 0.3266 (0.3266) loss_bbox_4_unscaled: 0.3731 (0.3731) loss_bbox_interm_unscaled: 0.5392 (0.5392) loss_ce_unscaled: 0.3890 (0.3890) loss_ce_0_unscaled: 1.3212 (1.3212) loss_ce_1_unscaled: 1.2199 (1.2199) loss_ce_2_unscaled: 1.2605 (1.2605) loss_ce_3_unscaled: 1.2340 (1.2340) loss_ce_4_unscaled: 1.2801 (1.2801) loss_ce_interm_unscaled: 1.0518 (1.0518) loss_giou_unscaled: 0.1895 (0.1895) loss_giou_0_unscaled: 0.1920 (0.1920) loss_giou_1_unscaled: 0.1918 (0.1918) loss_giou_2_unscaled: 0.1886 (0.1886) loss_giou_3_unscaled: 0.1879 (0.1879) loss_giou_4_unscaled: 0.1891 (0.1891) loss_giou_interm_unscaled: 0.2027 (0.2027) loss_hw_unscaled: 0.2596 (0.2596) loss_hw_0_unscaled: 0.3445 (0.3445) loss_hw_1_unscaled: 0.3645 (0.3645) loss_hw_2_unscaled: 0.2324 (0.2324) loss_hw_3_unscaled: 0.2289 (0.2289) loss_hw_4_unscaled: 0.2603 (0.2603) loss_hw_interm_unscaled: 0.3786 (0.3786) loss_xy_unscaled: 0.1137 (0.1137) loss_xy_0_unscaled: 0.1451 (0.1451) loss_xy_1_unscaled: 0.1611 (0.1611) loss_xy_2_unscaled: 0.0970 (0.0970) loss_xy_3_unscaled: 0.0976 (0.0976) loss_xy_4_unscaled: 0.1128 (0.1128) loss_xy_interm_unscaled: 0.1606 (0.1606) time: 30.9681 data: 5.9649 max mem: 9660
Loss is inf, stopping training
{'loss_bbox': tensor(inf, device='cuda:0'), 'loss_bbox_0': tensor(inf, device='cuda:0'), 'loss_bbox_1': tensor(inf, device='cuda:0'), 'loss_bbox_2': tensor(inf, device='cuda:0'), 'loss_bbox_3': tensor(inf, device='cuda:0'), 'loss_bbox_4': tensor(inf, device='cuda:0'), 'loss_bbox_interm': tensor(inf, device='cuda:0'), 'loss_ce': tensor(0.4675, device='cuda:0'), 'loss_ce_0': tensor(0.6388, device='cuda:0'), 'loss_ce_1': tensor(0.6114, device='cuda:0'), 'loss_ce_2': tensor(0.6029, device='cuda:0'), 'loss_ce_3': tensor(0.5809, device='cuda:0'), 'loss_ce_4': tensor(0.5935, device='cuda:0'), 'loss_ce_interm': tensor(0.6336, device='cuda:0'), 'loss_giou': tensor(0.1472, device='cuda:0'), 'loss_giou_0': tensor(0.1555, device='cuda:0'), 'loss_giou_1': tensor(0.1515, device='cuda:0'), 'loss_giou_2': tensor(0.1504, device='cuda:0'), 'loss_giou_3': tensor(0.1504, device='cuda:0'), 'loss_giou_4': tensor(0.1466, device='cuda:0'), 'loss_giou_interm': tensor(0.1693, device='cuda:0'), 'loss_hw': tensor(inf, device='cuda:0'), 'loss_hw_0': tensor(inf, device='cuda:0'), 'loss_hw_1': tensor(inf, device='cuda:0'), 'loss_hw_2': tensor(inf, device='cuda:0'), 'loss_hw_3': tensor(inf, device='cuda:0'), 'loss_hw_4': tensor(inf, device='cuda:0'), 'loss_hw_interm': tensor(inf, device='cuda:0'), 'loss_xy': tensor(inf, device='cuda:0'), 'loss_xy_0': tensor(inf, device='cuda:0'), 'loss_xy_1': tensor(inf, device='cuda:0'), 'loss_xy_2': tensor(inf, device='cuda:0'), 'loss_xy_3': tensor(inf, device='cuda:0'), 'loss_xy_4': tensor(inf, device='cuda:0'), 'loss_xy_interm': tensor(inf, device='cuda:0')}
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1809199 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1809205 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1809211 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1809213 closing signal SIGTERM
P.S. Training the tiny model on a single V100 also works normally.
Thanks in advance for looking into this problem.
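In the log above, only the box-regression terms (loss_bbox, loss_xy, loss_hw and their per-layer variants) become inf, while loss_ce and loss_giou stay finite. Since the tiny model trains normally on a single V100, the data may well be fine, but malformed box annotations are a cheap thing to rule out. The sketch below is a generic sanity check, not a confirmed diagnosis. It assumes ODVG-style JSONL records with "height", "width", and xyxy boxes under "detection"/"instances" (OD) or "grounding"/"regions" (VG); that layout is an assumption, so adjust the field names if your files differ.

```python
import json
import math


def find_bad_boxes(jsonl_lines):
    """Yield (line_no, reason) for records whose boxes look suspicious.

    Assumed ODVG-style record layout (adjust if yours differs):
      {"filename": ..., "height": H, "width": W,
       "detection": {"instances": [{"bbox": [x1, y1, x2, y2], ...}]}}
    or, for grounding data,
      {"grounding": {"regions": [{"bbox": [x1, y1, x2, y2], ...}]}, ...}
    """
    for line_no, line in enumerate(jsonl_lines, start=1):
        rec = json.loads(line)  # json.loads accepts Infinity/NaN literals
        h, w = rec.get("height", 0), rec.get("width", 0)
        if not (h > 0 and w > 0):
            yield line_no, f"non-positive image size {w}x{h}"
        boxes = [i["bbox"] for i in rec.get("detection", {}).get("instances", [])]
        boxes += [r["bbox"] for r in rec.get("grounding", {}).get("regions", [])]
        for box in boxes:
            x1, y1, x2, y2 = box
            if not all(math.isfinite(v) for v in box):
                yield line_no, f"non-finite bbox {box}"
            elif x2 <= x1 or y2 <= y1:
                yield line_no, f"degenerate bbox {box}"
            elif x1 < 0 or y1 < 0 or x2 > w or y2 > h:
                yield line_no, f"bbox outside image {box}"


sample = [
    '{"filename": "a.jpg", "height": 100, "width": 200,'
    ' "detection": {"instances": [{"bbox": [10, 10, 50, 60], "label": 0}]}}',
    '{"filename": "b.jpg", "height": 100, "width": 200,'
    ' "detection": {"instances": [{"bbox": [30, 10, 30, 60], "label": 0}]}}',
]
print(list(find_bad_boxes(sample)))  # [(2, 'degenerate bbox [30, 10, 30, 60]')]
```

To scan a real annotation file, pass the open file object, e.g. `find_bad_boxes(open(path))`, where `path` is your own JSONL file.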
Hi. I'm trying to fine-tune a model on custom data and have a few questions about the process; it would be really helpful if you could answer them. Thank you in advance.
How is the dictionary called 'id_map' defined in tools/coco2odvg.py?
I don't understand how the id_map = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31, 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43, 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56, 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72, 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85, 75: 86, 76: 87, 77: 88, 78: 89, 79: 90} is defined in the file.
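Reading the id_map above, its keys are contiguous indices 0 to 79 and its values are COCO's original category IDs, which run from 1 to 90 with ten unused IDs (12, 26, 29, 30, 45, 66, 68, 69, 71, 83). The sketch below derives an equivalent map from the "categories" list of any COCO-format annotation file by enumerating the sorted IDs; `build_id_maps` is a hypothetical helper, and whether tools/coco2odvg.py is meant to be edited this way for custom data is an assumption.

```python
def build_id_maps(categories):
    """Return (id_map, inverse_map) from a COCO-style "categories" list.

    id_map:      contiguous index 0..K-1 -> original category_id
    inverse_map: original category_id   -> contiguous index
    Indices follow ascending category_id, which reproduces the id_map
    quoted in the question for the 80 COCO classes.
    """
    cat_ids = sorted(c["id"] for c in categories)
    id_map = {i: cid for i, cid in enumerate(cat_ids)}
    inverse_map = {cid: i for i, cid in id_map.items()}
    return id_map, inverse_map


# COCO's 80 category IDs are 1..90 minus these ten unused IDs.
coco_gaps = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
coco_like = [{"id": i} for i in range(1, 91) if i not in coco_gaps]
id_map, inverse_map = build_id_maps(coco_like)
print(len(id_map), id_map[11], id_map[79])  # 80 13 90

# For a real file (the path is a placeholder):
# import json
# with open("annotations/instances_train.json") as f:
#     id_map, inverse_map = build_id_maps(json.load(f)["categories"])
```

For a single-category dataset whose only category has ID 1, this yields `{0: 1}`.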
If the val set in config/datasets_mixed_odvg.json is not COCO, how should the 'label_map' be set in the JSON file?
Regarding cfg_odvg.py, is it okay to set the label_map in the JSON file to null? Thank you 🙌
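For a non-COCO, OD-style dataset, a label map can likely be generated from the dataset's own categories. The sketch below assumes the label map is a JSON object whose keys are the integer labels stored in the ODVG file (written as strings) and whose values are class names; this format is an assumption based on the repository's examples, so check it against the dataset loader before relying on it. `build_label_map` and the output filename are hypothetical.

```python
import json


def build_label_map(categories):
    """Map contiguous label index (string key) -> class name.

    Assumption: the integer "label" written into each ODVG instance uses
    the same 0..K-1 indices, assigned in ascending category_id order.
    """
    ordered = sorted(categories, key=lambda c: c["id"])
    return {str(i): c["name"] for i, c in enumerate(ordered)}


# Hypothetical two-class validation set.
cats = [{"id": 3, "name": "truck"}, {"id": 1, "name": "car"}]
label_map = build_label_map(cats)
print(json.dumps(label_map))  # {"0": "car", "1": "truck"}

# Writing it out (placeholder filename, to be referenced from the
# "label_map" field of the dataset JSON):
# with open("my_label_map.json", "w") as f:
#     json.dump(label_map, f)
```

Keeping the same ascending-ID ordering used for the ODVG labels matters here; if the labels were assigned in a different order, the names would silently point at the wrong classes.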
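Hello! While training on my own data I ran into a problem. When the text prompt contains several categories that share a word, for example truck . truck mixer . heavy truck, or insulator . dirty insulator . damadge insulator, many of the predictions come out as "truck truck mixer", "insulator dirty insulator", and so on. When I changed the category definitions so they no longer share words, for example truck . concrete mixer . heavy, the recognition rate improved considerably.
At first I thought the model simply distinguished the features of certain pairs of categories poorly, so it decided an object belonged to both at once. On reflection, could this also be related to the text feature extraction module? Models like YOLO, which have no text feature extraction branch, achieve a relatively higher recognition rate on the same training and validation sets.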
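One plausible explanation for the merged labels: the phrase-decoding step in the original Grounding DINO inference code appears to join every token whose score exceeds text_threshold. When several class names share a word such as "truck", tokens from more than one class can pass the threshold for the same box and get concatenated into strings like "truck truck mixer". A workaround, sketched below under stated assumptions, is to score each category over its own token span and keep a single winner. The token list, spans, and scores are made-up illustrations, and both functions are hypothetical; in practice the spans would have to come from the tokenizer's offsets for your prompt and the scores from the model's per-token outputs.

```python
def naive_phrase(tokens, token_scores, text_threshold=0.25):
    """Simplified imitation of threshold decoding: keep every token above
    text_threshold, regardless of which category it belongs to."""
    return " ".join(t for t, s in zip(tokens, token_scores) if s > text_threshold)


def assign_category(token_scores, category_spans):
    """Pick exactly one category per box.

    category_spans maps category name -> (start, end) token indices,
    end exclusive. Each category is scored by the mean token score over
    its own span, and the highest-scoring category wins.
    """
    best_name, best_score = None, float("-inf")
    for name, (start, end) in category_spans.items():
        span = token_scores[start:end]
        if not span:
            continue
        score = sum(span) / len(span)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Word-level tokens for "truck . truck mixer . heavy truck" (a real BERT
# tokenizer would produce subword tokens plus [CLS]/[SEP]).
tokens = ["truck", ".", "truck", "mixer", ".", "heavy", "truck"]
spans = {"truck": (0, 1), "truck mixer": (2, 4), "heavy truck": (5, 7)}
scores = [0.6, 0.1, 0.7, 0.8, 0.1, 0.2, 0.2]  # made-up scores for one box

print(naive_phrase(tokens, scores))       # truck truck mixer
print(assign_category(scores, spans)[0])  # truck mixer
```

This only changes how predictions are labelled after inference; it does not address any confusion between overlapping class names that the model learned during training.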