chat-3d-v2's Introduction

Chat-3D v2

This is the official repo for the paper "Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers". [paper]

News

[2024.04] 🔥 A refined implementation of Chat-3D v2 is released. The old version (v2.0) has been archived in the v2.0 branch; this main branch is now for the new version (v2.1).

[2024.01] Updated the training guide for grounding on ScanRefer.

[2023.12] Code release. The main training architecture is based on our former work Chat-3D.

🔥 v2.1 vs v2.0

🔨 Preparation

  • Prepare the environment:

    (Different from v2.0)

    conda create -n chat-3d-v2 python=3.9.17
    conda activate chat-3d-v2
    conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=11.8 -c pytorch -c nvidia
    pip install -r requirements.txt
  • Download LLM backbone:

    • We use Vicuna-7B v1.5 in our experiments, which can be downloaded from Hugging Face.

    • Change the llama_model_path in config.py to the location of vicuna-7b-v1.5. (A quick sanity check is sketched at the end of this list.)

  • Annotations and extracted features:

    Please follow the instructions in preprocess.
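
A quick sanity check after the steps above (a minimal sketch, not part of the repo; the expected versions follow the conda command, and the Vicuna path is a placeholder for whatever you set in config.py):

    # check_env.py -- hypothetical helper, not shipped with the repo
    import os
    import torch

    print(torch.__version__)            # expected: 2.2.1
    print(torch.cuda.is_available())    # should be True with CUDA 11.8 drivers

    llama_model_path = "/path/to/vicuna-7b-v1.5"    # the value you set in config.py
    print(os.path.isdir(llama_model_path))          # should print True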

🤖 Training and Inference

  • Training

    • Modify run.sh:

      train_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref#nr3d_caption#obj_align"
      val_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref"
      evaluate=False
      Explanation of "train_tag" and "val_tag"
      • Use # to separate different datasets

      • Datasets:

        • scanrefer: ScanRefer Dataset
        • scan2cap: Scan2Cap Dataset
        • scanqa: ScanQA Dataset
        • sqa3d: SQA3D Dataset
        • multi3dref: Multi3dRefer Dataset
        • nr3d_caption: A captioning dataset derived from Nr3D.
        • obj_align: A dataset derived from ScanRefer, used to align object identifiers with object tokens.
      • You can try different combinations of training datasets or add customized datasets (see the tag-parsing sketch after this list).

    • Run: bash scripts/run.sh

    • Brief training info:

      Batch Size | GPUs     | VRAM Usage per GPU | Training Time | Checkpoint
      32         | 4 * A100 | ~ 70 GB            | ~ 8 hours     | Google Drive
      1          | 1 * A100 | ~ 28 GB            | ~ 3 days      | -
  • Inference

    • Modify run.sh: (We provide the pretrained checkpoint in Google Drive)

      val_tag="scanrefer#scan2cap#scanqa#sqa3d#multi3dref"
      evaluate=True
      pretrained_path="/path/to/pretrained_model.pth"
    • Run: bash scripts/run.sh
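
For reference, here is how the "#"-separated tags are presumably consumed (a sketch based on the convention described above, not the repo's actual parsing code):

    # Datasets are listed in one string and separated by "#".
    train_tag = "scanrefer#scan2cap#scanqa#sqa3d#multi3dref#nr3d_caption#obj_align"
    train_datasets = train_tag.split("#")
    # ['scanrefer', 'scan2cap', 'scanqa', 'sqa3d', 'multi3dref', 'nr3d_caption', 'obj_align']
    # Removing or adding entries in the tag string selects a different dataset combination.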

📄 Citation

If you find this project useful in your research, please consider citing:

@article{huang2023chat,
  title={Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers},
  author={Huang, Haifeng and Wang, Zehan and Huang, Rongjie and Liu, Luping and Cheng, Xize and Zhao, Yang and Jin, Tao and Zhao, Zhou},
  journal={arXiv preprint arXiv:2312.08168},
  year={2023}
}
@article{wang2023chat,
  title={Chat-3d: Data-efficiently tuning large language model for universal dialogue of 3d scenes},
  author={Wang, Zehan and Huang, Haifeng and Zhao, Yang and Zhang, Ziang and Zhao, Zhou},
  journal={arXiv preprint arXiv:2308.08769},
  year={2023}
}

Stay tuned for our project. 🔥

If you have any questions or suggestions, feel free to drop us an email ([email protected], [email protected]) or open an issue.

😊 Acknowledgement

Thanks to the following open-source projects:

LLMs: LLaMA, Vicuna

3D Datasets: ScanNet, ScanRefer, ReferIt3D, Scan2Cap, ScanQA, SQA3D, Multi3dRefer

3D Segmentors: PointGroup, Mask3D

3D Encoders: ULIP, Uni3D

Multi-modal LLMs: VideoChat, LEO

3D Expert Models: vil3dref

chat-3d-v2's Issues

Training for downstream tasks and Identifier-rich Scene dataset

Hello!
In the paper, it is said that Chat-3D v2 can handle various downstream tasks such as 3D QA, 3D visual grounding, 3D dense captioning, and 3D scene captioning. Does this mean the model can handle every task with the same single set of weights, or should we train the model for each task and save the weights individually?

Also, the paper introduces an identifier-rich scene captioning dataset. Is it publicly available, or should we create the dataset ourselves?

Thanks for your reply in advance :)

About stage 3

In the paper, the authors say the model is fine-tuned end to end in stage three.

But the code only provides stage 2, and stage 2 is not fine-tuned with the LLaMA weights activated.

Could you provide instructions on how to run stage 3 with this code?

Why each object's color could be represented as a 3x4 channel tensor in training stage 1

Hi, thanks for sharing this awesome work; it's super insightful.

I found that the color information is needed to obtain the attribute-aware token embedding.

But why can each object's color be represented as a 3x4 tensor in training stage 1? What does this mean, and how did you obtain this color tensor?

I would greatly appreciate your response.

class Chat3D(nn.Module):
    ...
    def encode_object_feat(self, feat, locs, colors):
        # feat = self.object_input_proj(feat)
        size_emb = self.coord_proj(locs[:, :, 3:6]) # [bs, 1, 512]
        gmm_weights = colors[..., :1]
        gmm_means = colors[..., 1:]
        gmm_colors = torch.sum(gmm_weights * gmm_means, dim=2) # [bs, 1, 3]
        # color_emb = self.color_dropout(torch.sum(self.color_proj(gmm_means) * gmm_weights, dim=2))
        color_emb = self.color_proj(gmm_colors)
        feat = torch.cat([feat, size_emb, color_emb], dim=-1)
        # feat = torch.cat([feat, size_emb], dim=-1)
        # feat = self.scene_proj(feat)
        return feat
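
For reference, my reading of the quoted code (an inference from the tensor slicing above, not something confirmed by the authors) is that the 3x4 tensor is a 3-component Gaussian mixture over RGB, where each component stores one weight plus three color means:

    import torch

    bs, num_objs = 2, 8
    colors = torch.rand(bs, num_objs, 3, 4)               # 3 GMM components x (weight, R, G, B)
    gmm_weights = colors[..., :1]                          # [bs, num_objs, 3, 1]
    gmm_means = colors[..., 1:]                            # [bs, num_objs, 3, 3]
    mean_rgb = torch.sum(gmm_weights * gmm_means, dim=2)   # [bs, num_objs, 3], as in encode_object_feat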

About trained weights

Your work is outstanding. May I ask when the trained weights will be made public?

ScanQA finetuning

Hi @ZzZZCHS, thanks for releasing the code :) Could you please provide the instructions to finetune and evaluate on ScanQA?

ImportError: FlashAttention2 has been toggled on

Thanks for your great work. When I run run.sh, I get the error "ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed." as shown in the following screenshot.
[screenshot]

And when I try "pip install flash_attn", the error shown in the second screenshot appears.
Does anybody know how to fix it? Thanks very much.
[screenshot]
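
One possible workaround (my own assumption, not an official fix from the authors): force a non-flash attention implementation when the LLM is loaded, so the flash_attn package is never needed. With a recent transformers version this can be passed through from_pretrained:

    from transformers import AutoModelForCausalLM

    # "/path/to/vicuna-7b-v1.5" is a placeholder for the local model directory.
    model = AutoModelForCausalLM.from_pretrained(
        "/path/to/vicuna-7b-v1.5",
        attn_implementation="eager",   # skip the FlashAttention2 code path
    )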

loss nan

Train Epoch: [0] [ 2400/102750] eta: 5:20:57 lr: 0.000012 stage2-loss: nan stage2-cosine_loss: No data stage2-l2_loss: No data stage2-obj_norm: nan stage2-scene_norm: 0.0000 stage2-target_norm: No data time: 0.1851 data: 0.0030 max mem: 28326 res mem: 29120

I followed the README to train the second stage, and the loss became NaN. How can I solve this problem?

about parameters

Hi, I found in the code that the LLaMA model is frozen, but training still needs over 24 GB of memory in every stage except the first.

Which part is the most computationally expensive?

I would greatly appreciate your reply.

About the epochs in Training & Inference

Hi, thanks for your great work. I have the following two questions:

  1. Why do you set epochs=3 during training and inference? Would you suggest setting it to a higher value (like 10 or 20), and would this help improve the LLM's performance on tasks like grounding, Q&A, etc.?
  2. Could you provide the code to visualize the bounding boxes in the pictures?
    Thanks a lot again.

Load vicuna weights

[screenshot]

Dear authors, I found a warning indicating that my Vicuna model weights are missing some parameters.

I first downloaded LLaMA from https://github.com/shawwn/llama-dl, then converted it to HF format following https://huggingface.co/docs/transformers/main/model_doc/llama, and then used your model/apply_delta.py to apply the delta and obtain Vicuna.

Everything seems to have gone smoothly, except that I downloaded LLaMA from shawwn's mirror rather than the official release.

So is this missing-weights warning normal? Or would you mind sharing your Vicuna weights?

The error during the training stage

Hi, thanks for your great work. I want to run the code on my own machine (which has 4 GPUs). But when I comment out the srun line and the dist.barrier() calls, I get the following error:
[screenshot]
Can I just change model.module.llama_model.config.use_cache = False to model.llama_model.config.use_cache = False?
If I want to run the code without Slurm, on my own machine with multiple GPUs, how should I change the code? Thanks a lot for your great help.
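
A minimal sketch of the change I have in mind (my own assumption, not verified against the repo): unwrap the DDP wrapper only when it exists, so the same line works with and without DistributedDataParallel:

    # model may or may not be wrapped in torch.nn.parallel.DistributedDataParallel
    unwrapped = model.module if hasattr(model, "module") else model
    unwrapped.llama_model.config.use_cache = False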

About data preprocessing.

Thanks for your work. When I preprocessed the data following the README, I ran into the following two questions:

  1. First, this link seems to point to an empty file.
    [screenshot]
    Also, in run_prepare.sh, the value of "version" is set to null. Is that right?
    [screenshot]
    In addition, the file inference.sh does not seem to exist yet, and I don't know whether other files in preprocess need to be updated.
    [screenshot]
  2. Second, when I prepare the conda environment as described in mask3d, the following error appears:
    [screenshot]
    It seems that "opencv==0.17.0" (or other versions of opencv) cannot be installed with Python 3.10.9. Have you encountered this problem, and how should I fix it?
    Thanks for your valuable work and help.

Error when installing the requirements.txt for the newest version.

Thanks for your great work. When I prepare the environment for the newest version and run pip install -r requirements.txt, I get the following error:
[screenshot]
I guess this error may be related to the version of setuptools or pip; I have updated both to the newest version, but it still fails. Could you please tell me which versions you use? Or does anybody know how to fix this error? Thanks a lot.

Ask for extracting input for 3D encoder

Thanks for sharing your great work! It is very inspiring.

I have a few questions regarding the 3D features used as input to the 3D encoder.

Firstly, would you mind explaining why you chose 3D segmentation features as input for the 3D encoder?
I am looking into research that utilizes 3D features from a 3D box detector, rather than from 3D segmentation, for multi-modal fusion with an LLM.
I wonder whether your Chat-3D v2 could substitute 3D box features (such as from a VoteNet-family detector) for the 3D segmentation features.

Secondly, are your benchmark results computed on ground-truth instance annotations or on the predicted segmentation results from PointGroup?

Thirdly, could you explain what '# 3D data for alignment' means in Table 2 of your paper?
It is said that less alignment data is used compared to 3D-LLM, but it is not clear to me what exactly 'alignment' means in this context.

Looking forward to your reply!
Thanks.

Point cloud data feature

Hello, in preprocess/extract_3dfeat.py and the feature file 'data/annotations/scannet_pointgroup_uni3d_feats.pt',
one of the keys is, for example, 'scene0000_00_02'.
I think the first '0000' refers to the scene number, but what do the second '00' and the third '02' refer to?

CMT module

Thanks for posting such interesting work. However, in the inference phase, chat3d.py reports that there is no CMT module. How can I solve this?

the relation module for v2.1 version

Hello, this is an interesting work.
I have a question about the relation module, which is not used in the v2.1 version. You directly project the objects into the LLM (see line 14 of scripts/run.sh).

A question about preprocessing the dataset

Thanks for your great work! When I try to reproduce it, I encounter the following two problems:

  1. After running your code "extract_3dfeat.py", I think what I get are per-object features for every object in each scene of the ScanNet dataset. But it didn't produce the file "scannet_ulip2_feats.pt", which I guess is a collection of the features of all objects in all scenes? Is that right, and how can I generate this file on my own?
    [screenshot]
  2. Could you please release the code for generating each processed annotation provided in your Google Drive when you have a chance (ideally with an explanation of what each file is for)?
    [screenshot]

Thanks again for this excellent work. I would greatly appreciate your reply.

About the annotation files

Hi, could you tell us, or release the code showing, how you generate each file in the provided annotations during data preprocessing? This confused me a lot because I couldn't quite figure out what each of these files corresponds to. Thanks a lot for your help.

Training

Thank you for your excellent work. Is the training process for Chat-3D v2 the same as v1's training process?

Stage 3 training

Thanks for your excellent work, but I have a question: why does stage 3's pretrained_path use ckpt_00.pth instead of ckpt_02.pth? Are they the same?
[screenshot]

about table 3

Hi, thanks for providing this work.

[screenshot]

  1. I found there is no discussion of table 3 in the paper.
  2. I found the [email protected] on the ScanRefer dataset is 35.9, but I could not find this number on the ScanRefer official site.
  3. I just wonder why the results do not seem to be compared with the ScanRefer baseline method.

Did I miss anything?

Ask for visualization

Dear authors,

Thanks for your great work! Would you mind sharing the code you used to visualize the figures in your paper?

About the Identifier-rich Scene Captioning Dataset

Thanks for your great work in the 3D-LLM domain. I've checked the annotations provided in the Google Drive; however, I cannot find the identifier-rich scene captioning dataset mentioned in the paper. Would it be possible to provide this annotation file?

Running Inference on ScanRefer

Hello,

I followed your guidance step by step, modifying config.py and run.sh.
When I run ./scripts/run.sh, I get the following multiprocessing error in llama_tokenizer_decode().
Could you help me handle this issue?

PYTHONPATH: /opt/ros/humble/lib/python3.10/site-packages:/opt/ros/humble/local/lib/python3.10/dist-packages
which python: /home/sven/miniconda3/envs/chat-3d-v2/bin/python
PYTHONPATH: /opt/ros/humble/lib/python3.10/site-packages:/opt/ros/humble/local/lib/python3.10/dist-packages:/home/sven/miniconda3/envs/chat-3d-v2/bin/python:.
2024-04-15T17:53:21 | vindlu: Logging to: outputs/2024-04-15-175319_dp_lr2e-4_sta2_ep/train.log
2024-04-15T17:53:21 | utils.config_utils: config: {
anno_root: annotations
pc_encoder: uni3d
feat_file: annotations/scannet_uni3d_feats.pt
train_file_s1: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scanrefer_train_stage1.json'], ['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scannet_train_stage1.json']]
train_file_s2: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scanrefer_train_stage2_objxx.json'], ['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/nr3d_train_stage2_objxx.json'], ['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scene_align_train.json']]
val_file_s2: [['annotations/scannet_pointgroup_uni3d_feats.pt', 'annotations/scannet_pointgroup_val_attributes.pt', 'annotations/scanrefer_pointgroup_val_stage2_grounding.json']]
train_file_s3: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scanqa_train_stage3.json', 1]]
val_file_s1: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_val_attributes.pt', 'annotations/scannet_val_stage1.json']]
val_file_s3: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_val_attributes.pt', 'annotations/scanqa_val_predobj.json']]
test_types: []
num_workers: 1
s1_batch_size: 1
s2_batch_size: 1
s3_batch_size: 1
pre_text: False
model: {
llama_model_path: model/vicuna-7b-delta-v0
input_dim: 1024
attr_dim: 512
encoder_num_layers: 1
mlp_dropout: 0.1
low_resource: False
system_path: prompts/system.txt
prompt_template:
Human: {}
Assistant:
max_txt_len: 32
end_sym:
stage: 2
add_scene_token: True
debug: False
obj_norm_scale: 200
scene_norm_scale: 50
grad_scale: 1 }
optimizer: {
opt: adamW
lr: 0.0002
opt_betas: [0.9, 0.999]
weight_decay: 0.02
max_grad_norm: -1
different_lr: {
enable: True
module_names: ['module.llama_model', 'module.relation_module']
lr: [1e-05, 1e-05]
wd: [0.02, 0.02] } }
scheduler: {
sched: cosine
epochs:
min_lr_multi: 0.01
warmup_epochs: 0.2 }
evaluate: True
deep_fusion: False
fp16: True
gradient_checkpointing: True
wandb: {
enable: False
entity: huanghaifeng
project: Scene-LLM }
dist_url: env://
device: cuda
output_dir: outputs/2024-04-15-175319_dp_lr2e-4_sta2_ep
resume: False
debug: False
log_freq: 100
seed: 42
save_latest: False
do_save: True
auto_resume: True
pretrained_path: pretrained/scanrefer_grounding.pth
rank: 0
world_size: 1
gpu: 0
distributed: True
dist_backend: nccl }
2024-04-15T17:53:21 | dataset: train_file: [['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scanrefer_train_stage2_objxx.json'], ['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/nr3d_train_stage2_objxx.json'], ['annotations/scannet_uni3d_feats.pt', 'annotations/scannet_train_attributes.pt', 'annotations/scene_align_train.json']]
2024-04-15T17:53:25 | tasks.shared_utils: Creating model
2024-04-15T17:53:25 | models.chat3d: Loading LLAMA
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:08<00:00, 4.03s/it]
2024-04-15T17:54:41 | models.chat3d: freeze LLAMA
2024-04-15T17:54:41 | models.chat3d: Loading LLAMA Done
2024-04-15T17:54:44 | utils.optimizer: diff_names: ['module.llama_model', 'module.relation_module'], diff_lrs: [1e-05, 1e-05]
2024-04-15T17:54:44 | utils.optimizer: param module.coord_proj.0.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.coord_proj.0.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.color_proj.0.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.color_proj.0.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.pos_proj.0.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.pos_proj.0.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.0.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.0.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.3.weight: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.3.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.4.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.object_proj.4.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.scene_proj.0.weight: wd: 0.02, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.scene_proj.0.bias: wd: 0, lr: 0.0002
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_qs.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_qs.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_ks.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_ks.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_vs.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.w_vs.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.fc.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.self_attn.fc.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.linear1.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.linear1.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.linear2.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.linear2.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm1.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm1.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm2.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm2.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm3.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.layers.0.norm3.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.loc_layers.0.0.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.loc_layers.0.0.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.loc_layers.0.2.weight: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: param module.relation_module.loc_layers.0.2.bias: wd: 0.02, lr: 1e-05
2024-04-15T17:54:44 | utils.optimizer: optimizer -- lr=0.0002 wd=0.02 len(p)=6
2024-04-15T17:54:44 | utils.optimizer: optimizer -- lr=1e-05 wd=0.02 len(p)=22
2024-04-15T17:54:44 | utils.optimizer: optimizer -- lr=0.0002 wd=0 len(p)=8
2024-04-15T17:54:44 | tasks.shared_utils: Auto resuming
2024-04-15T17:54:44 | tasks.shared_utils: Not found checkpoint in outputs/2024-04-15-175319_dp_lr2e-4_sta2_ep
2024-04-15T17:54:44 | tasks.shared_utils: _IncompatibleKeys(missing_keys=['llama_model.model.embed_tokens.weight', 'llama_model.model.layers.0.self_attn.q_proj.weight', 'llama_model.model.layers.0.self_attn.k_proj.weight', 'llama_model.model.layers.0.self_attn.v_proj.weight', 'llama_model.model.layers.0.self_attn.o_proj.weight', 'llama_model.model.layers.0.mlp.gate_proj.weight', 'llama_model.model.layers.0.mlp.down_proj.weight', 'llama_model.model.layers.0.mlp.up_proj.weight', 'llama_model.model.layers.0.input_layernorm.weight', 'llama_model.model.layers.0.post_attention_layernorm.weight', 'llama_model.model.layers.1.self_attn.q_proj.weight', 'llama_model.model.layers.1.self_attn.k_proj.weight', 'llama_model.model.layers.1.self_attn.v_proj.weight', 'llama_model.model.layers.1.self_attn.o_proj.weight', 'llama_model.model.layers.1.mlp.gate_proj.weight', 'llama_model.model.layers.1.mlp.down_proj.weight', 'llama_model.model.layers.1.mlp.up_proj.weight', 'llama_model.model.layers.1.input_layernorm.weight', 'llama_model.model.layers.1.post_attention_layernorm.weight', 'llama_model.model.layers.2.self_attn.q_proj.weight', 'llama_model.model.layers.2.self_attn.k_proj.weight', 'llama_model.model.layers.2.self_attn.v_proj.weight', 'llama_model.model.layers.2.self_attn.o_proj.weight', 'llama_model.model.layers.2.mlp.gate_proj.weight', 'llama_model.model.layers.2.mlp.down_proj.weight', 'llama_model.model.layers.2.mlp.up_proj.weight', 'llama_model.model.layers.2.input_layernorm.weight', 'llama_model.model.layers.2.post_attention_layernorm.weight', 'llama_model.model.layers.3.self_attn.q_proj.weight', 'llama_model.model.layers.3.self_attn.k_proj.weight', 'llama_model.model.layers.3.self_attn.v_proj.weight', 'llama_model.model.layers.3.self_attn.o_proj.weight', 'llama_model.model.layers.3.mlp.gate_proj.weight', 'llama_model.model.layers.3.mlp.down_proj.weight', 'llama_model.model.layers.3.mlp.up_proj.weight', 'llama_model.model.layers.3.input_layernorm.weight', 'llama_model.model.layers.3.post_attention_layernorm.weight', 'llama_model.model.layers.4.self_attn.q_proj.weight', 'llama_model.model.layers.4.self_attn.k_proj.weight', 'llama_model.model.layers.4.self_attn.v_proj.weight', 'llama_model.model.layers.4.self_attn.o_proj.weight', 'llama_model.model.layers.4.mlp.gate_proj.weight', 'llama_model.model.layers.4.mlp.down_proj.weight', 'llama_model.model.layers.4.mlp.up_proj.weight', 'llama_model.model.layers.4.input_layernorm.weight', 'llama_model.model.layers.4.post_attention_layernorm.weight', 'llama_model.model.layers.5.self_attn.q_proj.weight', 'llama_model.model.layers.5.self_attn.k_proj.weight', 'llama_model.model.layers.5.self_attn.v_proj.weight', 'llama_model.model.layers.5.self_attn.o_proj.weight', 'llama_model.model.layers.5.mlp.gate_proj.weight', 'llama_model.model.layers.5.mlp.down_proj.weight', 'llama_model.model.layers.5.mlp.up_proj.weight', 'llama_model.model.layers.5.input_layernorm.weight', 'llama_model.model.layers.5.post_attention_layernorm.weight', 'llama_model.model.layers.6.self_attn.q_proj.weight', 'llama_model.model.layers.6.self_attn.k_proj.weight', 'llama_model.model.layers.6.self_attn.v_proj.weight', 'llama_model.model.layers.6.self_attn.o_proj.weight', 'llama_model.model.layers.6.mlp.gate_proj.weight', 'llama_model.model.layers.6.mlp.down_proj.weight', 'llama_model.model.layers.6.mlp.up_proj.weight', 'llama_model.model.layers.6.input_layernorm.weight', 'llama_model.model.layers.6.post_attention_layernorm.weight', 'llama_model.model.layers.7.self_attn.q_proj.weight', 
'llama_model.model.layers.7.self_attn.k_proj.weight', 'llama_model.model.layers.7.self_attn.v_proj.weight', 'llama_model.model.layers.7.self_attn.o_proj.weight', 'llama_model.model.layers.7.mlp.gate_proj.weight', 'llama_model.model.layers.7.mlp.down_proj.weight', 'llama_model.model.layers.7.mlp.up_proj.weight', 'llama_model.model.layers.7.input_layernorm.weight', 'llama_model.model.layers.7.post_attention_layernorm.weight', 'llama_model.model.layers.8.self_attn.q_proj.weight', 'llama_model.model.layers.8.self_attn.k_proj.weight', 'llama_model.model.layers.8.self_attn.v_proj.weight', 'llama_model.model.layers.8.self_attn.o_proj.weight', 'llama_model.model.layers.8.mlp.gate_proj.weight', 'llama_model.model.layers.8.mlp.down_proj.weight', 'llama_model.model.layers.8.mlp.up_proj.weight', 'llama_model.model.layers.8.input_layernorm.weight', 'llama_model.model.layers.8.post_attention_layernorm.weight', 'llama_model.model.layers.9.self_attn.q_proj.weight', 'llama_model.model.layers.9.self_attn.k_proj.weight', 'llama_model.model.layers.9.self_attn.v_proj.weight', 'llama_model.model.layers.9.self_attn.o_proj.weight', 'llama_model.model.layers.9.mlp.gate_proj.weight', 'llama_model.model.layers.9.mlp.down_proj.weight', 'llama_model.model.layers.9.mlp.up_proj.weight', 'llama_model.model.layers.9.input_layernorm.weight', 'llama_model.model.layers.9.post_attention_layernorm.weight', 'llama_model.model.layers.10.self_attn.q_proj.weight', 'llama_model.model.layers.10.self_attn.k_proj.weight', 'llama_model.model.layers.10.self_attn.v_proj.weight', 'llama_model.model.layers.10.self_attn.o_proj.weight', 'llama_model.model.layers.10.mlp.gate_proj.weight', 'llama_model.model.layers.10.mlp.down_proj.weight', 'llama_model.model.layers.10.mlp.up_proj.weight', 'llama_model.model.layers.10.input_layernorm.weight', 'llama_model.model.layers.10.post_attention_layernorm.weight', 'llama_model.model.layers.11.self_attn.q_proj.weight', 'llama_model.model.layers.11.self_attn.k_proj.weight', 'llama_model.model.layers.11.self_attn.v_proj.weight', 'llama_model.model.layers.11.self_attn.o_proj.weight', 'llama_model.model.layers.11.mlp.gate_proj.weight', 'llama_model.model.layers.11.mlp.down_proj.weight', 'llama_model.model.layers.11.mlp.up_proj.weight', 'llama_model.model.layers.11.input_layernorm.weight', 'llama_model.model.layers.11.post_attention_layernorm.weight', 'llama_model.model.layers.12.self_attn.q_proj.weight', 'llama_model.model.layers.12.self_attn.k_proj.weight', 'llama_model.model.layers.12.self_attn.v_proj.weight', 'llama_model.model.layers.12.self_attn.o_proj.weight', 'llama_model.model.layers.12.mlp.gate_proj.weight', 'llama_model.model.layers.12.mlp.down_proj.weight', 'llama_model.model.layers.12.mlp.up_proj.weight', 'llama_model.model.layers.12.input_layernorm.weight', 'llama_model.model.layers.12.post_attention_layernorm.weight', 'llama_model.model.layers.13.self_attn.q_proj.weight', 'llama_model.model.layers.13.self_attn.k_proj.weight', 'llama_model.model.layers.13.self_attn.v_proj.weight', 'llama_model.model.layers.13.self_attn.o_proj.weight', 'llama_model.model.layers.13.mlp.gate_proj.weight', 'llama_model.model.layers.13.mlp.down_proj.weight', 'llama_model.model.layers.13.mlp.up_proj.weight', 'llama_model.model.layers.13.input_layernorm.weight', 'llama_model.model.layers.13.post_attention_layernorm.weight', 'llama_model.model.layers.14.self_attn.q_proj.weight', 'llama_model.model.layers.14.self_attn.k_proj.weight', 'llama_model.model.layers.14.self_attn.v_proj.weight', 
'llama_model.model.layers.14.self_attn.o_proj.weight', 'llama_model.model.layers.14.mlp.gate_proj.weight', 'llama_model.model.layers.14.mlp.down_proj.weight', 'llama_model.model.layers.14.mlp.up_proj.weight', 'llama_model.model.layers.14.input_layernorm.weight', 'llama_model.model.layers.14.post_attention_layernorm.weight', 'llama_model.model.layers.15.self_attn.q_proj.weight', 'llama_model.model.layers.15.self_attn.k_proj.weight', 'llama_model.model.layers.15.self_attn.v_proj.weight', 'llama_model.model.layers.15.self_attn.o_proj.weight', 'llama_model.model.layers.15.mlp.gate_proj.weight', 'llama_model.model.layers.15.mlp.down_proj.weight', 'llama_model.model.layers.15.mlp.up_proj.weight', 'llama_model.model.layers.15.input_layernorm.weight', 'llama_model.model.layers.15.post_attention_layernorm.weight', 'llama_model.model.layers.16.self_attn.q_proj.weight', 'llama_model.model.layers.16.self_attn.k_proj.weight', 'llama_model.model.layers.16.self_attn.v_proj.weight', 'llama_model.model.layers.16.self_attn.o_proj.weight', 'llama_model.model.layers.16.mlp.gate_proj.weight', 'llama_model.model.layers.16.mlp.down_proj.weight', 'llama_model.model.layers.16.mlp.up_proj.weight', 'llama_model.model.layers.16.input_layernorm.weight', 'llama_model.model.layers.16.post_attention_layernorm.weight', 'llama_model.model.layers.17.self_attn.q_proj.weight', 'llama_model.model.layers.17.self_attn.k_proj.weight', 'llama_model.model.layers.17.self_attn.v_proj.weight', 'llama_model.model.layers.17.self_attn.o_proj.weight', 'llama_model.model.layers.17.mlp.gate_proj.weight', 'llama_model.model.layers.17.mlp.down_proj.weight', 'llama_model.model.layers.17.mlp.up_proj.weight', 'llama_model.model.layers.17.input_layernorm.weight', 'llama_model.model.layers.17.post_attention_layernorm.weight', 'llama_model.model.layers.18.self_attn.q_proj.weight', 'llama_model.model.layers.18.self_attn.k_proj.weight', 'llama_model.model.layers.18.self_attn.v_proj.weight', 'llama_model.model.layers.18.self_attn.o_proj.weight', 'llama_model.model.layers.18.mlp.gate_proj.weight', 'llama_model.model.layers.18.mlp.down_proj.weight', 'llama_model.model.layers.18.mlp.up_proj.weight', 'llama_model.model.layers.18.input_layernorm.weight', 'llama_model.model.layers.18.post_attention_layernorm.weight', 'llama_model.model.layers.19.self_attn.q_proj.weight', 'llama_model.model.layers.19.self_attn.k_proj.weight', 'llama_model.model.layers.19.self_attn.v_proj.weight', 'llama_model.model.layers.19.self_attn.o_proj.weight', 'llama_model.model.layers.19.mlp.gate_proj.weight', 'llama_model.model.layers.19.mlp.down_proj.weight', 'llama_model.model.layers.19.mlp.up_proj.weight', 'llama_model.model.layers.19.input_layernorm.weight', 'llama_model.model.layers.19.post_attention_layernorm.weight', 'llama_model.model.layers.20.self_attn.q_proj.weight', 'llama_model.model.layers.20.self_attn.k_proj.weight', 'llama_model.model.layers.20.self_attn.v_proj.weight', 'llama_model.model.layers.20.self_attn.o_proj.weight', 'llama_model.model.layers.20.mlp.gate_proj.weight', 'llama_model.model.layers.20.mlp.down_proj.weight', 'llama_model.model.layers.20.mlp.up_proj.weight', 'llama_model.model.layers.20.input_layernorm.weight', 'llama_model.model.layers.20.post_attention_layernorm.weight', 'llama_model.model.layers.21.self_attn.q_proj.weight', 'llama_model.model.layers.21.self_attn.k_proj.weight', 'llama_model.model.layers.21.self_attn.v_proj.weight', 'llama_model.model.layers.21.self_attn.o_proj.weight', 'llama_model.model.layers.21.mlp.gate_proj.weight', 
'llama_model.model.layers.21.mlp.down_proj.weight', 'llama_model.model.layers.21.mlp.up_proj.weight', 'llama_model.model.layers.21.input_layernorm.weight', 'llama_model.model.layers.21.post_attention_layernorm.weight', 'llama_model.model.layers.22.self_attn.q_proj.weight', 'llama_model.model.layers.22.self_attn.k_proj.weight', 'llama_model.model.layers.22.self_attn.v_proj.weight', 'llama_model.model.layers.22.self_attn.o_proj.weight', 'llama_model.model.layers.22.mlp.gate_proj.weight', 'llama_model.model.layers.22.mlp.down_proj.weight', 'llama_model.model.layers.22.mlp.up_proj.weight', 'llama_model.model.layers.22.input_layernorm.weight', 'llama_model.model.layers.22.post_attention_layernorm.weight', 'llama_model.model.layers.23.self_attn.q_proj.weight', 'llama_model.model.layers.23.self_attn.k_proj.weight', 'llama_model.model.layers.23.self_attn.v_proj.weight', 'llama_model.model.layers.23.self_attn.o_proj.weight', 'llama_model.model.layers.23.mlp.gate_proj.weight', 'llama_model.model.layers.23.mlp.down_proj.weight', 'llama_model.model.layers.23.mlp.up_proj.weight', 'llama_model.model.layers.23.input_layernorm.weight', 'llama_model.model.layers.23.post_attention_layernorm.weight', 'llama_model.model.layers.24.self_attn.q_proj.weight', 'llama_model.model.layers.24.self_attn.k_proj.weight', 'llama_model.model.layers.24.self_attn.v_proj.weight', 'llama_model.model.layers.24.self_attn.o_proj.weight', 'llama_model.model.layers.24.mlp.gate_proj.weight', 'llama_model.model.layers.24.mlp.down_proj.weight', 'llama_model.model.layers.24.mlp.up_proj.weight', 'llama_model.model.layers.24.input_layernorm.weight', 'llama_model.model.layers.24.post_attention_layernorm.weight', 'llama_model.model.layers.25.self_attn.q_proj.weight', 'llama_model.model.layers.25.self_attn.k_proj.weight', 'llama_model.model.layers.25.self_attn.v_proj.weight', 'llama_model.model.layers.25.self_attn.o_proj.weight', 'llama_model.model.layers.25.mlp.gate_proj.weight', 'llama_model.model.layers.25.mlp.down_proj.weight', 'llama_model.model.layers.25.mlp.up_proj.weight', 'llama_model.model.layers.25.input_layernorm.weight', 'llama_model.model.layers.25.post_attention_layernorm.weight', 'llama_model.model.layers.26.self_attn.q_proj.weight', 'llama_model.model.layers.26.self_attn.k_proj.weight', 'llama_model.model.layers.26.self_attn.v_proj.weight', 'llama_model.model.layers.26.self_attn.o_proj.weight', 'llama_model.model.layers.26.mlp.gate_proj.weight', 'llama_model.model.layers.26.mlp.down_proj.weight', 'llama_model.model.layers.26.mlp.up_proj.weight', 'llama_model.model.layers.26.input_layernorm.weight', 'llama_model.model.layers.26.post_attention_layernorm.weight', 'llama_model.model.layers.27.self_attn.q_proj.weight', 'llama_model.model.layers.27.self_attn.k_proj.weight', 'llama_model.model.layers.27.self_attn.v_proj.weight', 'llama_model.model.layers.27.self_attn.o_proj.weight', 'llama_model.model.layers.27.mlp.gate_proj.weight', 'llama_model.model.layers.27.mlp.down_proj.weight', 'llama_model.model.layers.27.mlp.up_proj.weight', 'llama_model.model.layers.27.input_layernorm.weight', 'llama_model.model.layers.27.post_attention_layernorm.weight', 'llama_model.model.layers.28.self_attn.q_proj.weight', 'llama_model.model.layers.28.self_attn.k_proj.weight', 'llama_model.model.layers.28.self_attn.v_proj.weight', 'llama_model.model.layers.28.self_attn.o_proj.weight', 'llama_model.model.layers.28.mlp.gate_proj.weight', 'llama_model.model.layers.28.mlp.down_proj.weight', 'llama_model.model.layers.28.mlp.up_proj.weight', 
'llama_model.model.layers.28.input_layernorm.weight', 'llama_model.model.layers.28.post_attention_layernorm.weight', 'llama_model.model.layers.29.self_attn.q_proj.weight', 'llama_model.model.layers.29.self_attn.k_proj.weight', 'llama_model.model.layers.29.self_attn.v_proj.weight', 'llama_model.model.layers.29.self_attn.o_proj.weight', 'llama_model.model.layers.29.mlp.gate_proj.weight', 'llama_model.model.layers.29.mlp.down_proj.weight', 'llama_model.model.layers.29.mlp.up_proj.weight', 'llama_model.model.layers.29.input_layernorm.weight', 'llama_model.model.layers.29.post_attention_layernorm.weight', 'llama_model.model.layers.30.self_attn.q_proj.weight', 'llama_model.model.layers.30.self_attn.k_proj.weight', 'llama_model.model.layers.30.self_attn.v_proj.weight', 'llama_model.model.layers.30.self_attn.o_proj.weight', 'llama_model.model.layers.30.mlp.gate_proj.weight', 'llama_model.model.layers.30.mlp.down_proj.weight', 'llama_model.model.layers.30.mlp.up_proj.weight', 'llama_model.model.layers.30.input_layernorm.weight', 'llama_model.model.layers.30.post_attention_layernorm.weight', 'llama_model.model.layers.31.self_attn.q_proj.weight', 'llama_model.model.layers.31.self_attn.k_proj.weight', 'llama_model.model.layers.31.self_attn.v_proj.weight', 'llama_model.model.layers.31.self_attn.o_proj.weight', 'llama_model.model.layers.31.mlp.gate_proj.weight', 'llama_model.model.layers.31.mlp.down_proj.weight', 'llama_model.model.layers.31.mlp.up_proj.weight', 'llama_model.model.layers.31.input_layernorm.weight', 'llama_model.model.layers.31.post_attention_layernorm.weight', 'llama_model.model.norm.weight', 'llama_model.lm_head.weight'], unexpected_keys=[])
2024-04-15T17:54:44 | tasks.shared_utils: Loaded checkpoint from pretrained/scanrefer_grounding.pth
2024-04-15T17:54:44 | main: Start training
2024-04-15T17:54:44 | dataset.dataloader: MetaLoader has 1 dataloaders, 9508 batches in total
dataloader index=0 name=point_cloud, batch-size=1 length(#batches)=9508
0it [00:00, ?it/s]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer.
2024-04-15T17:54:46 | main:
Cons bunch mile completion Cla Nice Abgerufen bool Π·Π°ΠΌΠ΅Π§clone channel (@ submissionlease НасСлСниС permittedΰ€…ε› siendo操ク第 sex color junior син候ὡ FollowingBut ss Γ³ Doctor currently solem刢Function instanti Scottish хозяйࢸ.β€œ Cover mayor PS
[Target] Obj17.
{'scene_id': 'scene0435_00', 'obj_id': 5, 'qid': 0, 'prompt': 'A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. The conversation centers around a 3D indoor scene that encompasses numerous 3D objects. Here is a list of object information: []. Objects are separated by "," and each object is identified by an ID in the format "objxx".\n# Human: According to the given description, "This is a pair of curtains. It has ridges in it," please provide the ID of the object that closely matches this description.\n# Assistant:', 'pred': "' jeden周agu majority\x07 Vors Business HitlerθΆ… Yu aquestοΏ½ ASCII Ρ†Π΅Ρ€ΠΊΠΎΠ² commentedWikimedia}\rCons bunch mile completion Cla Nice Abgerufen bool Π·Π°ΠΌΠ΅Π§clone channel (@ submissionlease НасСлСниС permittedΰ€…ε› siendo操ク第 sex color junior син候ὡ FollowingBut ss Γ³ Doctor currently solem刢Function instanti Scottish хозяйࢸ.β€œ Cover mayor PS", 'ref_captions': ['Obj17.']} 1it [00:01, 1.87s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left'when initializing the tokenizer. 2it [00:03, 1.55s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please setpadding_side='left'when initializing the tokenizer. 3it [00:04, 1.44s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please setpadding_side='left'when initializing the tokenizer. 4it [00:05, 1.39s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please setpadding_side='left'when initializing the tokenizer. 5it [00:07, 1.37s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please setpadding_side='left'when initializing the tokenizer. 6it [00:08, 1.36s/it]A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please setpadding_side='left'` when initializing the tokenizer.
6it [00:09, 1.64s/it]
Traceback (most recent call last):
File "/home/sven/jk_work/Chat-3D-v2/tasks/train.py", line 431, in
main(cfg)
File "/home/sven/jk_work/Chat-3D-v2/tasks/train.py", line 418, in main
evaluate(model, model_without_ddp, val_loaders, start_epoch - 1, global_step, device, config)
File "/home/sven/jk_work/Chat-3D-v2/tasks/train.py", line 179, in evaluate
pred = model(**batch, is_eval=True)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sven/jk_work/Chat-3D-v2/models/chat3d.py", line 587, in forward
return self.evaluate(**kwargs)
File "/home/sven/jk_work/Chat-3D-v2/models/chat3d.py", line 573, in evaluate
output_text = self.llama_tokenizer.decode(output_token, add_special_tokens=False)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3486, in decode
return self._decode(
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 129, in _convert_id_to_token
token = self.sp_model.IdToPiece(index)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/sentencepiece/init.py", line 1179, in _batched_func
return _func(self, arg)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/sentencepiece/init.py", line 1172, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 864210) of binary: /home/sven/miniconda3/envs/chat-3d-v2/bin/python
Traceback (most recent call last):
File "/home/sven/miniconda3/envs/chat-3d-v2/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==1.13.1', 'console_scripts', 'torchrun')())
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/sven/miniconda3/envs/chat-3d-v2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

============================================================
tasks/train.py FAILED


Failures:
<NO_OTHER_FAILURES>


Root Cause (first observed failure):
[0]:
time : 2024-04-15_17:54:59
host : anna
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 864210)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

============================================================

evaluation

Excuse me, I want to test the results, but how do I run the evaluation on the ScanQA validation set and on the Nr3D/Sr3D datasets? Looking forward to your reply!

Using llama3

Hi authors,
I see that you have updated the code to support PEFT on Vicuna 1.5. Do you think this codebase can seamlessly support Llama 3? If not, what would be the major modifications?
Looking forward to your response!

Data Generation

Hi! I really appreciate your work. I have a small question about it.
How do you generate the data described in Section 4 of your paper? What is the prompt?
Thank you very much!

Ask for PointGroup results

Dear authors,

Thanks for your great work! I now want to make a small modification that relies on the PointGroup segmentation results. However, the attribute files only contain the bounding-box results. Could you therefore provide the segmentation results with the semantic classification over the raw 607 categories?

Prepare annotation of private dataset

Thanks for your great work in the 3D-LLM field!

I am now trying to fine-tune your model on my own 3D visual grounding dataset. It is a private dataset with more than one target per language query. I have two questions about fine-tuning your model on it.

  1. For fine-tuning, should I directly follow "Step 4: Fine-tuning on Grounding Task", using the after_scene_align.pth checkpoint? Since my dataset has more than one target per text, is it right to keep the format of "scanrefer_train_stage2_grounding.json" but simply extend its "caption", like "Obj00. Obj01. Obj02" for the case of three targets, and also extend its "related_ids", like [0, 1, 2]? (A sketch of such an entry is given below.)

  2. Is it possible for Chat-3D v2 to output predictions for multiple objects with confidence scores? I might need to compute Average Precision on those results.
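
The kind of extended entry I have in mind (a hypothetical example; "caption" and "related_ids" follow the convention mentioned above, while the other keys are my assumptions about the annotation format):

    entry = {
        "scene_id": "scene0000_00",           # placeholder scene
        "obj_id": 0,                          # primary target
        "related_ids": [0, 1, 2],             # all three targets
        "caption": "Obj00. Obj01. Obj02.",    # multi-target answer string
    }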
