x-plug / mplug-docowl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
License: Apache License 2.0
Thanks for your great work! I'm really curious about the pre-processing of the dataset, so I'd like to ask you about the following:
Thanks for your reply!
Hello, I would like to ask how to test on the M-Paper dataset. For example, for the Multimodal Diagram Analysis task, the input consists of Context + Diagrams + Outline plus the question instructions, so how do you organize the input format for the model? Are there any associated evaluation scripts for M-Paper?
Do mPLUG/DocStruct4M and mPLUG/DocDownstream-1.0 contain the image files? This cannot be verified on the Hugging Face dataset pages.
MplugDocOwlHReducerModel --> forward --> line 487
sequence_output = self.reducer(hidden_states) # B,C,H,W -> B,C,H/conv_shape[1],W/(conv_shape[1])
After the self.reducer operation, the shape of hidden_states should be (B, C, H/conv_shape[0], W/conv_shape[1]), so the first divisor in the comment appears to be wrong.
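For reference, the shape arithmetic the comment should express can be sketched as below. This is a plain-Python sketch, not repo code; the default `conv_shape=(1, 4)` (a 1x4 horizontal merge) is my assumption.

```python
def reduced_shape(B, C, H, W, conv_shape=(1, 4)):
    # A conv whose kernel and stride both equal conv_shape divides the
    # spatial dims elementwise: H by conv_shape[0], W by conv_shape[1].
    # Corrected comment: B,C,H,W -> B,C,H/conv_shape[0],W/conv_shape[1]
    return (B, C, H // conv_shape[0], W // conv_shape[1])

print(reduced_shape(1, 1024, 32, 32))  # (1, 1024, 32, 8)
```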
When will Chinese document Q&A be supported?
Hi, I am confused about how to convert the PDF datasets (Deepform, KLC) into multiple images with the correct key-value ground-truth pairs for each converted PNG page, because the datasets downloaded from DUE-Benchmark have no page ID information.
Thanks for your great work!
The paper mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding mentions the DocStruct4M and DocReason25K datasets, but they are not open-source.
May I ask if there are any open-source plans for these two datasets?
Running Windows 10 venv Python 3.10.6:
from docowl_infer import DocOwlInfer
model_path='mPLUG/DocOwl1.5-stage1'
docowl=DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=False)
ic| model_name: 'DocOwl1.5-stage1'
tokenizer_config.json: 100%|██████████| 749/749 [00:00<?, ?B/s]
E:\DocOwl\venv\lib\site-packages\huggingface_hub\file_download.py:148: UserWarning: huggingface_hub
cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\User\.cache\huggingface\hub\models--mPLUG--DocOwl1.5-stage1. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING
environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
warnings.warn(message)
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 11.5MB/s]
special_tokens_map.json: 100%|██████████| 438/438 [00:00<?, ?B/s]
config.json: 100%|██████████| 4.84k/4.84k [00:00<?, ?B/s]
model.safetensors: 100%|██████████| 16.3G/16.3G [06:55<00:00, 39.2MB/s]
Some weights of MPLUGDocOwlLlamaForCausalLM were not initialized from the model checkpoint at mPLUG/DocOwl1.5-stage1 and are newly initialized: ['model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
generation_config.json: 100%|โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ| 162/162 [00:00<?, ?B/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "E:\DocOwl\mPLUG-DocOwl\DocOwl1.5\docowl_infer.py", line 19, in __init__
self.tokenizer, self.model, _, _ = load_pretrained_model(ckpt_path, None, model_name, load_8bit=load_8bit, load_4bit=load_4bit, device="cuda")
File "E:\DocOwl\mPLUG-DocOwl\DocOwl1.5\mplug_docowl\model\builder.py", line 52, in load_pretrained_model
model = MPLUGDocOwlLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "E:\DocOwl\venv\lib\site-packages\transformers\modeling_utils.py", line 2959, in from_pretrained
dispatch_model(model, **kwargs)
File "E:\DocOwl\venv\lib\site-packages\accelerate\big_modeling.py", line 364, in dispatch_model
weights_map = OffloadedWeightsLoader(
File "E:\DocOwl\venv\lib\site-packages\accelerate\utils\offload.py", line 150, in __init__
raise ValueError("Need either a `state_dict` or a `save_folder` containing offloaded weights.")
ValueError: Need either a `state_dict` or a `save_folder` containing offloaded weights.
The DUE-Benchmark provides the OCR results of PDF-type documents, and other models use these OCR results as input when evaluating. What is the input to your model when you use this benchmark? Do you use png/jpg images?
I found links to HF model cards in DocOwl 1.5 README, but none of them works.
Also, I found no models here https://huggingface.co/mPLUG
Can you please provide working links?
Hello,
I pulled your repo and so far the inference with the stage 1 model works fine. However, the results I get for localized text recognition are often in the wrong order. For example, I use this code (basically the demo code from the README.md):
from docowl_infer import DocOwlInfer
model_path = "./models/models--mPLUG--DocOwl1.5-stage1/.../"
docowl = DocOwlInfer(ckpt_path=model_path, anchors="grid_9", add_global_img=False)
image = "image.jpg"
query = "Identify the text within the bounding box <bbox>92, 444, 880, 480</bbox>"
answer = docowl.inference(image, query)
print(answer)
on this image (only the relevant part is left visible)
Which gives the result 8 Spl. Fz.z.Pers.bef.b. 5
Here, the two parts "8 Spl." and "Fz.z.Pers.bef.b." are in the wrong order (the "5" in the end is hallucinated, but that only happens in the anonymized image, not in the original one -> no concern here). Something like that happens quite often. I have the feeling that I missed something there. Do I use the model correctly?
There is indeed a warning the code throws during inference:
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
And also one during model loading:
Some weights of MPLUGDocOwlLlamaForCausalLM were not initialized from the model checkpoint at ... and are newly initialized: ['model.layers.4.self_attn.rotary_emb.inv_freq', ..., 'model.layers.2.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Hi! Thank you for the excellent work!
The unified instruction tuning dataset is a great contribution to the community and can be very useful. I wonder if there is a timetable for its release? Thanks!
@LukeForeverYoung Hey! Thanks for sharing this amazing work!
Are the model weights and inference code available?
I would be happy to test them locally.
Training data or model design?
May I ask how large the input images to the image encoder are, and what the patch size is? If following mPLUG-Owl's configuration, can ViT-L/14 at 224x224 resolution resolve the dense text in samples such as documents and webpages?
Thanks for your great work.
I have a small question: in the Table Parsing section, the text converts all table representations from HTML to Markdown format.
But Markdown table syntax does not support merging rows or columns; the paper says tags like <ROWSPAN=x> or <COLSPAN=y> are added.
Why not just use LaTeX code to represent the table? The MMD format is compatible with LaTeX tables.
At the same time, there is another question: in the inference phase, the model outputs the transformed table, which cannot be directly rendered, because the output format is neither LaTeX nor Markdown.
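For readers of the question above: as I understand the paper, it keeps ordinary Markdown tables and marks merged cells with extra tags, roughly like this (the exact tag placement is an assumption on my part, and the cell contents are placeholders):

```python
# Illustrative only: a Markdown table using the <COLSPAN=y>/<ROWSPAN=x>
# tags the paper describes for merged cells; contents are placeholders.
table = (
    "| header A | <COLSPAN=2>merged header |\n"
    "| --- | --- | --- |\n"
    "| a1 | b1 | c1 |\n"
)
print(table)
```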
Both the Multi-grained Text Grounding and Multi-grained Text Recognition tasks need bounding boxes to establish the correspondence between specific texts and local positions.
The bboxes in the DocLocal4K and DocStruct4M datasets do not seem to be the real bounding boxes of the images.
My question is that the snippet `[max(min(int(x)/999, 1.0), 0.0) for x in gt_answer.split(',')]` truncates relatively large coordinates to 1. Isn't there a problem with this?
(Separate request) Downloading from HF is too slow; please upload a copy to ModelScope.
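For clarity, the clamping behavior being questioned, written as a complete function (a sketch; `gt_answer` is assumed to be the comma-separated bbox string from the dataset):

```python
def parse_bbox(gt_answer):
    # Coordinates are assumed normalized to 0-999; dividing by 999 maps
    # them into [0, 1], and the min/max clamp truncates out-of-range
    # values, so a coordinate above 999 silently becomes 1.0.
    return [max(min(int(x) / 999, 1.0), 0.0) for x in gt_answer.split(',')]

print(parse_bbox("92, 444, 1200, 480"))  # the 1200 is clamped to 1.0
```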
When will the training code be released?Thx.
The meaning of the values in <bbox> is confusing. They don't look like x1,y1,x2,y2 format, since that fails to produce the correct bbox for most images.
Please can you split the model into 4GB chunks rather than 1 x 16GB. I have already converted to safe tensors through HF in the repo also.
Will just make it much more useable.
Thanks.
I'd like to get the same results with the Omni model as demonstrated in the huggingface demo, using the inference code in this repo.
Could you share what parameters like anchor/grid, input resolution etc. you use under the hood? Is there any other pre- or postprocessing for the query or the input image that is absent from the inference code?
For example, with an image that says:
MAKE TEXT
STAND OUT FROM
BACKGROUNDS
I've got the following results:
With inference code:
from docowl_infer import DocOwlInfer
model_path = 'mPLUG/DocOwl1.5-Omni'
docowl=DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
query = "Parse texts in the image."
answer = docowl.inference(image_path, query)
Output:
<doc> MAKE TEXT FROM IEX
STAOKOROUNDLICKGRIUINI </doc>
While the demo gives outputs:
[doc] TEXT MAKE
STAND OUT FROM
BACKGROUNDS [/doc]
EDIT: Added example
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "The checkpoint you are trying to load has model type `mplug_docowl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."
}
2024-03-29 12:11:24.059622: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 12:11:24.059732: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 12:11:24.156838: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-29 12:11:26.578613: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
ic| model_name: '0735ba4067b5ab76192ce6e7bc5694701ab4d779'
Traceback (most recent call last):
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/docowl_infer.py", line 70, in <module>
docowl = DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/docowl_infer.py", line 19, in __init__
self.tokenizer, self.model, _, _ = load_pretrained_model(ckpt_path, None, model_name, load_8bit=load_8bit, load_4bit=load_4bit, device="cuda")
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/builder.py", line 52, in load_pretrained_model
model = MPLUGDocOwlLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3375, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 209, in __init__
self.model = MPLUGDocOwlLlamaModel(config)
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 201, in __init__
super(MPLUGDocOwlLlamaModel, self).__init__(config)
File "/content/drive/MyDrive/Document_Extraction/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 33, in __init__
super(MPLUGDocOwlMetaModel, self).__init__(config)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 924, in __init__
[LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 924, in <listcomp>
[LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
TypeError: LlamaDecoderLayer.__init__() takes 2 positional arguments but 3 were given
:(
Another thing: will this model support Chinese OCR soon?
Hi, I downloaded the repo and tried initializing the model with:
model_path = "mPLUG/DocOwl1.5-Chat"
docowl = DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
print('load model from ', model_path)
However, I get the following:
----> 5 docowl = DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
6 print('load model from ', model_path)
7 # exit(0)
Cell In[2], line 5, in DocOwlInfer.__init__(self, ckpt_path, anchors, add_global_img, load_8bit, load_4bit)
3 model_name = get_model_name_from_path(ckpt_path)
4 ic(model_name)
----> 5 self.tokenizer, self.model, _, _ = load_pretrained_model(ckpt_path, None, model_name, load_8bit=load_8bit, load_4bit=load_4bit, device="cuda")
50 else:
51 tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
---> 52 model = MPLUGDocOwlLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
53 else:
--> 209 self.model = MPLUGDocOwlLlamaModel(config)
211 self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
213 # Initialize weights and apply final processing
File ~/SageMaker/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py:201, in MPLUGDocOwlLlamaModel.__init__(self, config)
200 def __init__(self, config: MPLUGDocOwlConfig):
--> 201 super(MPLUGDocOwlLlamaModel, self).__init__(config)
924 self.layers = nn.ModuleList(
--> 925 [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
926 )
927 self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
928 self.gradient_checkpointing = False
TypeError: LlamaDecoderLayer.__init__() takes 2 positional arguments but 3 were given
As you can see, I'm using a sagemaker instance. Could you please provide some guidance? Thanks
This looks really good, and nothing like this has been developed before.
Excited for the source code. Also, all other models fail with documents because the image processor downgrades the resolution to 224. I believe this model handles the high resolution needed for document understanding.
Does it need OCR to extract the text in the document, or is it an OCR-free model?
All the question prompts are extracted from DocStruct4M, 'multi_grained_text_localization.jsonl' as below,
[
"Give the bounding box of the text",
"Predict the bounding box of the text",
"Detect the text in the bounding box",
"Identify the text within the bounding box",
"Recognize the text in the bounding box",
"Locate the postion of the text"
]
In the last prompt, 'postion' should be 'position'.
I wonder whether it matters for training the MLLM, because the number of affected prompts is significantly high.
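To gauge how widespread the typo is, one could scan the jsonl directly. This is a hypothetical helper, not repo code; matching on the raw line text is an assumption that avoids guessing the jsonl's field names.

```python
def count_typo(jsonl_path, typo="postion"):
    # Count lines of the jsonl whose raw text contains the typo.
    with open(jsonl_path, encoding="utf-8") as f:
        return sum(typo in line for line in f)
```

Running it on 'multi_grained_text_localization.jsonl' would show what fraction of samples carry the misspelled prompt.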
I have tried to execute the steps listed here for extracting the PaperOwl dataset. Can you please confirm whether these images are really missing, or whether something went wrong in my extraction?
imgs/2106.08905v2/figures/out_28170.png
imgs/2106.08905v2/figures/28170.png
imgs/2303.16501v1/tables/table_7.png
imgs/2305.16835v1/figures/fig_result_2.png
imgs/2102.12037v3/figures/table-AUROC-boed.png
imgs/1908.09231v1/tables/table_1.png
.... more images are missing
Hello, author. Thank you very much for your work! I tested some related PI/CI images with the mPLUG-DocOwl online demo. My goal is to have the model produce structured data for the relevant fields, for faster review.
However, the current demo's results are unsatisfactory: asking for the values of the relevant fields easily produces language hallucinations and wrong answers, and the numbers in the answers are all incorrect. Could fine-tuning make it more accurate, or could its OCR ability be strengthened? Looking forward to your reply.
Thanks for your work.
As titled.
Thank you for your work!
When will you make this available on Huggingface with instructions please?
Thanks.
DocStruct4M
DocDownstream-1.0
DocReason25K
DocLocal4K
Are the image files included in these four datasets different from each other?
Thanks for your great work! What is the difference between DocDownstream-1.0 and the fine-tuning data used in UReader?
How can I properly visualize a bounding box on an image? It seems that conventional operations don't display it correctly. Do I need to perform any special transformations?
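In case it helps: judging by the preprocessing snippet discussed elsewhere in this thread, the <bbox> values appear to be normalized to 0-999 rather than pixel coordinates, so they need to be scaled by the image size before drawing. A minimal sketch (the 0-999 convention is my assumption):

```python
def bbox_to_pixels(bbox, img_w, img_h):
    # Scale normalized 0-999 coordinates (x1, y1, x2, y2) to pixel space.
    x1, y1, x2, y2 = bbox
    return (x1 / 999 * img_w, y1 / 999 * img_h,
            x2 / 999 * img_w, y2 / 999 * img_h)
```

The resulting tuple can then be handed to an ordinary drawing routine such as PIL's ImageDraw.rectangle.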
Hi team,
Is there any instruction on how to use the stage 1 model? I am interested in the document/webpage parsing capabilities.
If not, can you provide an example script?
Thanks!!
Why have the Hugging Face links for the mPLUG-DocOwl 1.5 datasets stopped working?
When I run the inference code:
from docowl_infer import DocOwlInfer
model_path='./mPLUG/DocOwl1.5-chat'
docowl=DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
print('load model from ', model_path)
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 docowl=DocOwlInfer(ckpt_path=model_path, anchors='grid_9', add_global_img=True)
File c:\Users\internanirudh\Desktop\DocOwl\mPLUG-DocOwl-main\DocOwl1.5\docowl_infer.py:21, in DocOwlInfer.__init__(self, ckpt_path, anchors, add_global_img, load_8bit, load_4bit)
19 ic(model_name)
20 print("DocOwl Infer ")
---> 21 self.tokenizer, self.model, _, _ = load_pretrained_model(ckpt_path, None, model_name, load_8bit=load_8bit, load_4bit=load_4bit, device="cuda")
22 self.doc_image_processor = DocProcessor(image_size=448, anchors=anchors, add_global_img=add_global_img, add_textual_crop_indicator=True)
23 self.streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
File c:\Users\internanirudh\Desktop\DocOwl\mPLUG-DocOwl-main\DocOwl1.5\mplug_docowl\model\builder.py:54, in load_pretrained_model(model_path, model_base, model_name, load_8bit, load_4bit, device_map, device)
52 tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
53 print("MPLUGDocOwlLlamaForCausalLM")
---> 54 model = MPLUGDocOwlLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
55 else:
56 # Load language model
57 if model_base is not None:
58 # PEFT model
File c:\Users\internanirudh\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\modeling_utils.py:3405, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3402 with ContextManagers(init_contexts):
3403 # Let's make sure we don't run the init function of buffer modules
3404 print("ContexManager")
-> 3405 model = cls(config, *model_args, **model_kwargs)
3407 # make sure we use the model's config since the init call might have copied it
3408 config = model.config
File c:\Users\internanirudh\Desktop\DocOwl\mPLUG-DocOwl-main\DocOwl1.5\mplug_docowl\model\modeling_mplug_docowl.py:209, in MPLUGDocOwlLlamaForCausalLM.__init__(self, config)
207 def __init__(self, config):
208 super(LlamaForCausalLM, self).__init__(config)
--> 209 self.model = MPLUGDocOwlLlamaModel(config)
211 self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
213 # Initialize weights and apply final processing
File c:\Users\internanirudh\Desktop\DocOwl\mPLUG-DocOwl-main\DocOwl1.5\mplug_docowl\model\modeling_mplug_docowl.py:201, in MPLUGDocOwlLlamaModel.__init__(self, config)
200 def __init__(self, config: MPLUGDocOwlConfig):
--> 201 super(MPLUGDocOwlLlamaModel, self).__init__(config)
File c:\Users\internanirudh\Desktop\DocOwl\mPLUG-DocOwl-main\DocOwl1.5\mplug_docowl\model\modeling_mplug_docowl.py:33, in MPLUGDocOwlMetaModel.__init__(self, config)
32 def __init__(self, config):
---> 33 super(MPLUGDocOwlMetaModel, self).__init__(config)
34 self.vision_model = MplugOwlVisionModel(
35 MplugOwlVisionConfig(**config.visual_config["visual_model"])
36 )
38 self.vision2text = MplugDocOwlHReducerModel(
39 MplugDocOwlHReducerConfig(**config.visual_config["visual_hreducer"]), config.hidden_size
40 )
File c:\Users\internanirudh\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py:926, in LlamaModel.init(self, config)
923 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
924 print("LlamaDecoderLayer Start")
925 self.layers = nn.ModuleList(
--> 926 [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
927 )
928 print("LlamaDecoderLayer Ran")
929 self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
File c:\Users\internanirudh\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py:926, in <listcomp>(.0)
923 self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
924 print("LlamaDecoderLayer Start")
925 self.layers = nn.ModuleList(
--> 926 [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
927 )
928 print("LlamaDecoderLayer Ran")
929 self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
TypeError: LlamaDecoderLayer.__init__() takes 2 positional arguments but 3 were given
Can y'all give me a solution to this problem?
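For what it's worth, this traceback usually indicates a transformers version mismatch: the decoder-layer class being constructed accepts only (config), while the installed LlamaModel also passes a layer_idx, as newer transformers releases do. Downgrading transformers to the version pinned in the repo's requirements should resolve it. A quick way to check which signature your installation has (a hypothetical helper, not repo code):

```python
import inspect

def decoder_layer_accepts_layer_idx(cls):
    # True if cls.__init__ takes a layer_idx parameter, i.e. the newer
    # transformers LlamaDecoderLayer signature; False for the older one.
    return 'layer_idx' in inspect.signature(cls.__init__).parameters
```

For example: `from transformers.models.llama.modeling_llama import LlamaDecoderLayer; print(decoder_layer_accepts_layer_idx(LlamaDecoderLayer))`.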
May I ask: is patch_positions not used?
https://modelscope.cn/studios/damo/mPLUG-DocOwl/summary
This site is down at the moment; it says the current Space hit a runtime error or has not been published.
I tried
ModuleNotFoundError: No module named 'icecream'
(textgen) [root@pve0 DocOwl1.5]# pip install icecream
Collecting icecream
Using cached icecream-2.1.3-py2.py3-none-any.whl.metadata (1.4 kB)
Requirement already satisfied: colorama>=0.3.9 in /data/miniconda3/envs/textgen/lib/python3.10/site-packages (from icecream) (0.4.6)
Requirement already satisfied: pygments>=2.2.0 in /data/miniconda3/envs/textgen/lib/python3.10/site-packages (from icecream) (2.17.2)
Requirement already satisfied: executing>=0.3.1 in /data/miniconda3/envs/textgen/lib/python3.10/site-packages (from icecream) (2.0.1)
Requirement already satisfied: asttokens>=2.0.1 in /data/miniconda3/envs/textgen/lib/python3.10/site-packages (from icecream) (2.4.1)
Requirement already satisfied: six>=1.12.0 in /data/miniconda3/envs/textgen/lib/python3.10/site-packages (from asttokens>=2.0.1->icecream) (1.16.0)
Using cached icecream-2.1.3-py2.py3-none-any.whl (8.4 kB)
Installing collected packages: icecream
Successfully installed icecream-2.1.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(textgen) [root@pve0 DocOwl1.5]# python app.py
2024-04-13 16:08:59 | ERROR | stderr | Traceback (most recent call last):
2024-04-13 16:08:59 | ERROR | stderr | File "/data/mplug-docowl/DocOwl1.5/app.py", line 23, in <module>
2024-04-13 16:08:59 | ERROR | stderr | no_change_btn = gr.Button.update()
2024-04-13 16:08:59 | ERROR | stderr | AttributeError: type object 'Button' has no attribute 'update'
Among these released datasets, I cannot really find which one contains image -> markdown text information.
And where does the Chinese OCR ability come from? The whole dataset contains no Chinese.
We created some large-scale multimodal datasets that contain OCR annotations; for some, we ran PaddleOCR over LAION images.
do you think those might be useful to tune your method?
Best,
Chris