openmoss / anygpt
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
I see you have released the mmpretrain file. When do you plan to release all of the code and scripts? I believe this would benefit the community a lot and attract much interest from researchers in your great work, as MOSS did. Also, how many A100 GPUs were used in the pre-training stage, and how long did it take?
How do you train the speech tokenizer?
AnyGPT is trained only with the next-token-prediction task.
Taking text-to-image as an example: is the training input a sequence of speech tokens, text tokens, image tokens, and music tokens?
I want to know the input formats for training and inference.
training input: <sos> speech tokens <eos> text tokens <soi> image tokens <eoi> <som> music tokens
training label: speech tokens <eos> text tokens <soi> image tokens <eoi> <som> music tokens <eom>
Is my understanding of the training input and label correct?
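As background on that question: with pure next-token prediction, the label sequence is normally just the input shifted left by one position. A minimal generic sketch (the token names and sentinel are illustrative, not AnyGPT's actual code):

```python
# Sketch: next-token-prediction labels are the inputs shifted left by one.
# The last position has no target, so it gets a sentinel (None here;
# frameworks often use -100, meaning "ignore in the loss").
def make_labels(input_ids):
    return input_ids[1:] + [None]

seq = ["<sos>", "s1", "s2", "<eos>", "t1", "<soi>", "i1", "<eoi>"]
print(make_labels(seq))
# ['s1', 's2', '<eos>', 't1', '<soi>', 'i1', '<eoi>', None]
```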
log:
Missing key(s) in state_dict: "net.conformer.layers.{L}.conv.net.0.weight", "net.conformer.layers.{L}.conv.net.0.bias", "net.conformer.layers.{L}.conv.net.2.weight", "net.conformer.layers.{L}.conv.net.2.bias", "net.conformer.layers.{L}.conv.net.4.conv.weight", "net.conformer.layers.{L}.conv.net.4.conv.bias", "net.conformer.layers.{L}.conv.net.6.gamma", "net.conformer.layers.{L}.conv.net.7.weight", "net.conformer.layers.{L}.conv.net.7.bias" (same pattern repeated for every layer L in 0–11).
Unexpected key(s) in state_dict: "net.conformer.layers.{L}.conv.net1.0.weight", "net.conformer.layers.{L}.conv.net1.0.bias", "net.conformer.layers.{L}.conv.net1.2.weight", "net.conformer.layers.{L}.conv.net1.2.bias", "net.conformer.layers.{L}.conv.ds_conv.conv.weight", "net.conformer.layers.{L}.conv.ds_conv.conv.bias", "net.conformer.layers.{L}.conv.net2.1.gamma", "net.conformer.layers.{L}.conv.net2.2.weight", "net.conformer.layers.{L}.conv.net2.2.bias" (same pattern repeated for every layer L in 0–11).
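The missing and unexpected names differ only in the conv-module layout (net.0/net.2/net.4/... versus net1/ds_conv/net2), which usually means the checkpoint was saved with a different version of the Conformer implementation than the one installed. A hypothetical key remap for exactly this pattern, inferred from the log rather than from the AnyGPT source; verify that tensor shapes still match after remapping before trusting the load:

```python
# Hypothetical remap from the "unexpected" key layout in the log to the
# "missing" one. The RENAMES table is inferred from the log above, not
# taken from the AnyGPT source; check shapes after remapping.
RENAMES = {
    ".conv.net1.0.": ".conv.net.0.",
    ".conv.net1.2.": ".conv.net.2.",
    ".conv.ds_conv.conv.": ".conv.net.4.conv.",
    ".conv.net2.1.gamma": ".conv.net.6.gamma",
    ".conv.net2.2.": ".conv.net.7.",
}

def remap_key(key):
    for old, new in RENAMES.items():
        key = key.replace(old, new)
    return key

# Example: remapped = {remap_key(k): v for k, v in ckpt.items()}
print(remap_key("net.conformer.layers.0.conv.ds_conv.conv.weight"))
# net.conformer.layers.0.conv.net.4.conv.weight
```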
code
model = cls(
    vit_model=vit_model,
    img_size=img_size,
    drop_path_rate=drop_path_rate,
    use_grad_checkpoint=use_grad_checkpoint,
    vit_precision=vit_precision,
    freeze_vit=freeze_vit,
    num_query_token=num_query_token,
    cross_attention_freq=cross_attention_freq,
    max_txt_len=max_txt_len,
)
if pretrained_model_path.startswith('http'):
    print('start downloading seed model...')
    cached_file = download_cached_file(pretrained_model_path, check_hash=False, progress=True)
    print(cached_file)
    ckpt = torch.load(cached_file, map_location="cpu")
else:
    ckpt = torch.load(pretrained_model_path, map_location="cpu")
missing, unexpected = model.load_state_dict(ckpt, strict=False)
print('missing keys:', len(missing), 'unexpected keys:', len(unexpected))
return model
It would be a very interesting project to use the U-Net in combination with the LLM + Time-Aware Semantic Connector for generating images, as the ELLA project has shown.
Not an issue, but it would be fantastic, since the LLM is already loaded in memory.
Thank you for providing the model code and checkpoints.
I'm planning to fine-tune the base model you provided for a downstream task. From what I've seen in the code you shared, there doesn't seem to be any loss masking (i.e., excluding the prompt from the loss so that only the target tokens contribute to the loss and the gradient).
I'm curious whether you really computed the loss over all tokens, without masking, when doing instruction tuning (while building the -chat model).
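For reference, prompt loss masking is usually implemented by setting the prompt positions of the label tensor to the ignore index. A minimal sketch using the common HuggingFace/PyTorch convention (-100 is ignored by cross-entropy); this is generic, not AnyGPT's code:

```python
# Sketch of prompt loss masking: labels equal the inputs, except that
# prompt positions are set to -100, the default ignore_index of
# torch.nn.CrossEntropyLoss, so only target tokens produce loss/gradients.
def mask_prompt_labels(input_ids, prompt_len):
    labels = list(input_ids)
    labels[:prompt_len] = [-100] * prompt_len
    return labels

input_ids = [11, 12, 13, 21, 22]  # 3 prompt tokens, then 2 target tokens
print(mask_prompt_labels(input_ids, prompt_len=3))  # [-100, -100, -100, 21, 22]
```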
Hi AnyGPT team,
Thank you for sharing such an amazing project.
I would like to collaborate on a project with any of your team members.
Here is my gmail addr: [email protected]
You can connect with me via Discord(ID: ainerd777)
It would be a great opportunity, and I would love to hear from you soon.
Best regards,
Hi, how many A100 GPUs were needed for AnyGPT instruction tuning?
Looking at the code that generates music codes:
tokens = encode_music_by_path(music.strip(), self.music_sample_rate, self.music_tokenizer, self.music_processor, self.device, segment_duration=self.music_segment_duration, one_channel=True, start_from_begin=True)
tokens = tokens[0][0]
processed_inputs = modality_tokens_to_string(tokens=tokens, modality="music")
The paper says the music is "quantized using an RVQ with four quantizers, each with a codebook size of 2048, resulting in a combined music vocabulary size of 8192." Is that implemented in this line:
processed_inputs = modality_tokens_to_string(tokens=tokens, modality="music")
And are there four layers because four codebooks are needed?
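For context: one common way to get a combined vocabulary of 8192 from four codebooks of 2048 is to offset each quantizer level's indices into its own disjoint id range and flatten per frame. This is a hedged sketch of that general technique; whether modality_tokens_to_string does exactly this has to be checked in the source:

```python
CODEBOOK_SIZE = 2048   # per-quantizer codebook size (from the paper)
NUM_QUANTIZERS = 4     # RVQ levels (from the paper)

def flatten_rvq_codes(codes):
    """codes: NUM_QUANTIZERS lists of per-frame indices, one per RVQ level.
    Offsetting level L's indices by L * CODEBOOK_SIZE puts each level in a
    disjoint id range, giving 4 * 2048 = 8192 distinct music tokens."""
    flat = []
    for frame in zip(*codes):                # iterate over time frames
        for level, idx in enumerate(frame):  # interleave quantizer levels
            flat.append(level * CODEBOOK_SIZE + idx)
    return flat

print(flatten_rvq_codes([[5], [5], [5], [5]]))  # [5, 2053, 4101, 6149]
```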
There is an error below.
2024-03-26 12:57:36.254249: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-26 12:57:36.254297: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-26 12:57:36.255652: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-26 12:57:37.344594: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.
NeMo-text-processing :: INFO :: Creating ClassifyFst grammars.
loading image tokenzier
Traceback (most recent call last):
File "/content/AnyGPT/anygpt/src/infer/cli_infer_base_model.py", line 337, in <module>
infer = AnyGPTInference(
File "/content/AnyGPT/anygpt/src/infer/cli_infer_base_model.py", line 46, in __init__
self.image_tokenizer = ImageTokenizer(model_path=image_tokenizer_path, load_diffusion=True,
File "/content/AnyGPT/./seed2/seed_llama_tokenizer.py", line 39, in __init__
model = Blip2QformerQuantizer.from_pretrained(pretrained_model_path=model_path,
File "/content/AnyGPT/./seed2/seed_qformer/qformer_quantizer.py", line 354, in from_pretrained
model = cls(
File "/content/AnyGPT/./seed2/seed_qformer/qformer_quantizer.py", line 182, in __init__
self.tokenizer = self.init_tokenizer()
File "/content/AnyGPT/./seed2/seed_qformer/blip2.py", line 38, in init_tokenizer
tokenizer = BertTokenizer.from_pretrained("/mnt/petrelfs/zhanjun.p/mllm/models/bert-base-uncased", truncation_side=truncation_side)
File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1940, in from_pretrained
resolved_config_file = cached_file(
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 111, in _inner_fn
validate_repo_id(arg_value)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 159, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/petrelfs/zhanjun.p/mllm/models/bert-base-uncased'. Use `repo_type` argument if needed.
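The traceback comes from a hard-coded cluster path in seed2/seed_qformer/blip2.py. A possible workaround (my assumption, not an official fix) is to rewrite that path to the public Hub id "bert-base-uncased" so from_pretrained resolves against the Hub instead of a directory that only exists on the authors' cluster:

```python
# Hypothetical workaround: replace the hard-coded tokenizer path in
# seed2/seed_qformer/blip2.py with the public Hub id, so
# BertTokenizer.from_pretrained downloads bert-base-uncased instead of
# looking for a cluster-local directory.
from pathlib import Path

def patch_tokenizer_path(source: str) -> str:
    return source.replace(
        "/mnt/petrelfs/zhanjun.p/mllm/models/bert-base-uncased",
        "bert-base-uncased",
    )

# Usage (run from the repo root; file location taken from the traceback):
# blip2 = Path("seed2/seed_qformer/blip2.py")
# blip2.write_text(patch_tokenizer_path(blip2.read_text()))
```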
Hi, I would like to use this multimodal model for a MOS (mean opinion score)-style task.
Is this supported in the current version?
Judging from the documentation I think not, but if it is, please let me know.
Thanks!
I hope everything goes well.
I would like to know whether you have any plans to open-source the code, datasets, and model checkpoints in the near future. Could you please provide a rough timeline?
Thanks!
command I run
!python anygpt/src/infer/cli_infer_base_model.py \
--model-name-or-path AnyGPT-base \
--image-tokenizer-path models/seed-tokenizer-2/seed_quantizer.pt \
--speech-tokenizer-path models/speechtokenizer/ckpt.dev \
--speech-tokenizer-config models/speechtokenizer/config.json \
--soundstorm-path models/soundstorm/speechtokenizer_soundstorm_mls.pt \
--output-dir "infer_output/base"
NeMo-text-processing :: INFO :: Creating ClassifyFst grammars.
Using device: cuda
loading image tokenzier
/home//.cache/torch/hub/checkpoints/eva_vit_g.pth
INFO:root:freeze vision encoder
Some weights of BertLMHeadModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['bert.encoder.layer.{L}.crossattention.self.{query,key,value}.{weight,bias}' and 'bert.encoder.layer.{L}.crossattention.output.{dense,LayerNorm}.{weight,bias}' for even layers L, plus 'bert.encoder.layer.{L}.intermediate_query.dense.{weight,bias}' and 'bert.encoder.layer.{L}.output_query.{dense,LayerNorm}.{weight,bias}' for all layers L in 0–11] (long repetitive list condensed)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
missing keys: 511 unexpected keys: 146
loading music tokenizer
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
loading audio tokenizer
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
loading llm
Hi, I really like this work unifying all the modalities! I have two small questions:
First, the paper says "resulting in a combined music vocabulary size of 8192. We encode 5 seconds music into 250 latent frames, ultimately generating a 250 × 4 codes matrix."
This seems inconsistent with the Music parameters in Table 1?
Also, the paper mentions using metadata, including lyrics, for the music part, but none of the examples show audio with lyrics. Why is that? (By the way, the example music is quite a contrast to Kumiko, haha 😂)
If my understanding is correct, you trained the model through 2 stages:
Another question: I noticed there is a lot of code for the audio modality, so I assume your team had already prepared the relevant data and generated instructions for it. Why was it removed in the end? Would it hurt performance on speech- or music-related tasks, or was there some other reason?
Thanks, and I look forward to your response.
There is an error below.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
2024-03-26 10:15:35.821675: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-26 10:15:35.821777: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-26 10:15:35.952168: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-26 10:15:38.385135: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/content/AnyGPT/anygpt/src/infer/cli_infer_base_model.py", line 24, in <module>
from infer.pre_post_process import extract_text_between_tags
File "/content/AnyGPT/./anygpt/src/infer/pre_post_process.py", line 7, in <module>
from mmgpt.src.m_utils.prompter_mmgpt import Prompter
ModuleNotFoundError: No module named 'mmgpt.src'
It occurs in the Google Colab notebook below:
https://colab.research.google.com/drive/13_gZPIRG6ShkAbI76-hC_etvfGhry0DZ?usp=sharing
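The ModuleNotFoundError suggests the repository root is not on Python's import path in Colab. A possible workaround (an assumption; if the repo contains no mmgpt package at all, the import in pre_post_process.py may instead need to be renamed to the matching anygpt module):

```shell
# Hypothetical fix: run from the cloned AnyGPT directory so the repo root
# (and any top-level packages such as mmgpt/ or anygpt/) become importable.
export PYTHONPATH="$PWD:$PYTHONPATH"
echo "$PYTHONPATH" | grep -q "$PWD" && echo "repo root is on PYTHONPATH"
```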
I don't get why it now seems common to create empty GitHub/HF repos to farm stars.
Are there regulatory bodies that prevent a direct release?