jpwang / lilt
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
License: MIT License
Hi,
I just wanted to know if it can also be used as an object detection model, e.g. for detecting bounding boxes in documents.
I have asked a similar question here. In case you could help me with it, that'd be great.
You could close the question otherwise.
Thank you :)
First of all, great work! I appreciate the novel concept of targeting multiple languages through fine-tuning.
Do you think lilt-infoxlm-base is sufficient as a base to train a model that extracts basic information from invoices, or is a completely fresh pretrained model using around 1 million samples required?
How long did it take you to create the pretrained model, and what hardware was used?
For fine-tuning, how many annotated invoices do you think are required (is around 5,000 sufficient), how long would the fine-tuned model need to be trained, and on what hardware?
Thanks in advance. Additionally, I have access to a lot of invoices; if successful, I can share the final model here.
Thank you for sharing your great work!
If I want to fine-tune on a custom dataset, what should the steps be? i.e.
- Which scripts do we need to modify?
Thanks in advance!
Thank you very much for open-sourcing this work. Interestingly, both my undergraduate and master's advisors are from SCUT. Back to the topic: I cannot download the preprocessed data from OneDrive. Could you provide another way to get it, such as Google Drive, or send it to my email [email protected]? Thanks!
Let's say I have thousands of domain-specific documents in English. How can I pretrain on them on top of an existing roberta-en + LiLT checkpoint?
The en.train.json contains a linking field array. Is it required for SER (Semantic Entity Recognition instead of RE) tasks?
Have you ever tried IOB tagging for multiple word outputs?
@jpWang
First of all, congratulations to all the authors of this great paper; it is a milestone work and truly justifies the title SIMPLE yet EFFECTIVE.
Question
tmp_attention_scores = attention_scores / math.sqrt(self.attention_head_size)
tmp_layout_attention_scores = layout_attention_scores / math.sqrt(self.attention_head_size//self.channel_shrink_ratio)
attention_scores = tmp_attention_scores + tmp_layout_attention_scores
layout_attention_scores = tmp_layout_attention_scores + tmp_attention_scores
Is this addition (tmp_layout_attention_scores + tmp_attention_scores) doing the cross-modality interaction learning? Please share some thoughts on this.
# here `tmp_layout_attention_scores` won't be added, since we don't want to update attention_scores for the text flow
# also we can keep this line unchanged and stop the gradients flow
attention_scores = tmp_attention_scores + tmp_layout_attention_scores
# this addition will change and `tmp_attention_scores` won't be added
layout_attention_scores = tmp_layout_attention_scores
Can you please comment on whether my understanding is correct?
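For what it's worth, here is a self-contained toy sketch of the detach variant mentioned in the second comment (an assumption about how the gradient stop could be written, with toy tensors standing in for the real score tensors; it is not the repository's official code):
import math
import torch

# toy (batch, heads, seq, seq) score tensors standing in for the real ones
attention_scores = torch.randn(1, 12, 4, 4, requires_grad=True)
layout_attention_scores = torch.randn(1, 12, 4, 4, requires_grad=True)
attention_head_size, channel_shrink_ratio = 64, 4

tmp_attention_scores = attention_scores / math.sqrt(attention_head_size)
tmp_layout_attention_scores = layout_attention_scores / math.sqrt(attention_head_size // channel_shrink_ratio)

# the text flow still receives the layout scores
attention_scores = tmp_attention_scores + tmp_layout_attention_scores
# detach() lets the layout flow use the text scores without sending gradients back into them
layout_attention_scores = tmp_layout_attention_scores + tmp_attention_scores.detach()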
I tried to reproduce the CORD results given in the paper, but I only managed to get an F1 score of ~0.62 on the test dataset. Is there any special pre-processing done to the CORD dataset for it to work with LiLT, or am I making a mistake?
Currently what I am doing is changing the labels in /LiLTfinetune/data/datasets/xfun.py to the labels of the CORD dataset, as well as changing the _generate_examples method to load from the CORD files.
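For reference, a minimal sketch of the kind of label swap described above; the CORD label names below are illustrative assumptions (the real tag set has roughly 30 categories), not the repository's exact schema:
from datasets import ClassLabel

# hypothetical label list to drop into the ClassLabel(names=...) used by the dataset script
CORD_LABELS = [
    "O",
    "B-MENU.NM", "I-MENU.NM",
    "B-MENU.PRICE", "I-MENU.PRICE",
    "B-TOTAL.TOTAL_PRICE", "I-TOTAL.TOTAL_PRICE",
    # ... remaining CORD categories
]

ner_tags_feature = ClassLabel(names=CORD_LABELS)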
The config that I used:
{
"model_name_or_path": "models/lilt-infoxlm-base",
"tokenizer_name": "roberta-base",
"output_dir": "output/xfun_ser",
"do_train": "true",
"do_eval": "true",
"do_predict": "true",
"lang": "en",
"num_train_epochs": 10,
"max_steps" : 2000,
"per_device_train_batch_size": 1,
"warmup_ratio": 0.1,
"pad_to_max_length": "true",
"return_entity_level_metrics": "true"
}
Is there another step that needs to be done for LiLT to work with a different dataset?
With how many epochs/steps are the results in the paper achieved?
Update: With 20,000 steps I managed to get to an overall F1 score of ~0.79, still far from the expected result. With 30,000 steps the score stays at ~0.79, so it no longer increases with the number of steps.
Hi,
I wanted to run inference with LiLT with model parameters in half precision (float16) on CPU (I did try on GPU and it worked).
As I'm using Transformers from Hugging Face, I ran the following code:
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param_dtype = torch.float16
model_id = "pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-linelevel-ml384"
model = AutoModelForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype);
model.to(device);
It worked but when I ran the model for inference with the following code, it failed:
with torch.no_grad():
output = model(input_ids=input_id.to(device),
attention_mask=attention_mask.to(device),
bbox=bbox.to(device)
)
Error message:
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in layer_norm(input, normalized_shape, weight, bias, eps)
2513 layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
2514 )
-> 2515 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
2516
2517
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
It looks like the float32 dtype is directly implemented (hard-coded) in the LiLT code.
How can I solve this issue?
Thanks.
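In case it helps, here is a sketch of two workarounds (PyTorch's CPU LayerNorm kernel is generally not implemented for float16, so this is not specific to LiLT; bfloat16 kernel coverage depends on the PyTorch version):
import torch
from transformers import AutoModelForTokenClassification

model_id = "pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-linelevel-ml384"

# option 1: use float16 only when a GPU is available, keep float32 on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param_dtype = torch.float16 if device.type == "cuda" else torch.float32
model = AutoModelForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype).to(device)

# option 2 (assumes a recent PyTorch): bfloat16 has wider CPU kernel coverage than float16
# model = AutoModelForTokenClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)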
Just wondering how to train the pretrained model from scratch. Does this repo contain the pretraining code?
First of all, thanks for this nice work.
However, I am unable to reproduce the results given in the table. I was training on FUNSD for the relation extraction task.
Please let me know if I am doing anything wrong.
python examples/run_xfun_re.py --model_name_or_path lilt-infoxlm-base --tokenizer_name xlm-roberta-base --output_dir ls_re_xfund_lilt-infoxlm-base --do_train --do_eval --lang en --max_steps 20000 --per_device_train_batch_size 2 --warmup_ratio 0.1
I am getting an eval F1 of 0.47.
Can you please provide more information regarding the "lilt-only-base" file and how the model was created?
Since the base file is just 22 MB in size, I would like to know what kind of dataset, parameters, or logic was used to create it.
I am trying to figure out what possibilities are available and what starting point I should read to learn about creating such models. Please give me more references to read.
I'm trying to export a LiLT model that uses distilroberta-base. I get the error: Some weights of the model checkpoint at lilt-distilroberta-base were not used when initializing LiltForTokenClassification. The colab I'm testing with is the following: https://colab.research.google.com/drive/1k2uGoDBOQwrK4iokGJOQKdDfll0QPbl-#scrollTo=0ZmvE7ku4hSW.
Can you help me figure out what I'm doing wrong? Thanks in advance
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=4 run_funsd.py --model_name_or_path lilt-roberta-en-base --tokenizer_name roberta-base --output_dir ser_funsd_lilt-roberta-en-base --do_train --do_predict --max_steps 2000 --per_device_train_batch_size 8 --warmup_ratio 0.1 --fp16
The above command fails with the error below on PyTorch 1.7.1 / CUDA 11.0:
Each of the four launched processes prints the same traceback:
Traceback (most recent call last):
  File "run_funsd.py", line 369, in <module>
    main()
  File "run_funsd.py", line 50, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 570, in __post_init__
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1470, in wrapper
    return func(*args, **kwargs)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 717, in device
    return self._setup_devices
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1460, in __get__
    cached = self.fget(obj)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1470, in wrapper
    return func(*args, **kwargs)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 702, in _setup_devices
    torch.distributed.init_process_group(backend="nccl")
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8
Traceback (most recent call last):
File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
main()
File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/cydal/anaconda3/envs/liltfinetune/bin/python', '-u', 'run_funsd.py', '--local_rank=3', '--model_name_or_path', 'lilt-roberta-en-base', '--tokenizer_name', 'roberta-base', '--output_dir', 'ser_funsd_lilt-roberta-en-base', '--do_train', '--do_predict', '--max_steps', '2000', '--per_device_train_batch_size', '8', '--warmup_ratio', '0.1', '--fp16']' returned non-zero exit status 1.
Below is conda list:
# packages in environment at /home/cydal/anaconda3/envs/liltfinetune:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.2.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
astunparse 1.6.3 py_0
black 21.4b2 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py37h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.18.1 h7f8727e_0
ca-certificates 2022.07.19 h06a4308_0
cachetools 5.2.0 pypi_0 pypi
certifi 2022.6.15 py37h06a4308_0
cffi 1.15.1 py37h74dc2b5_0
charset-normalizer 2.1.1 pypi_0 pypi
click 8.1.3 pypi_0 pypi
cloudpickle 2.2.0 pypi_0 pypi
cmake 3.19.6 h973ab73_0
cryptography 37.0.1 py37h9ce1e76_0
cudatoolkit 11.0.221 h6bb024c_0
cycler 0.11.0 pypi_0 pypi
dataclasses 0.8 pyh6d0b6a4_7
datasets 1.6.2 pypi_0 pypi
detectron2 0.5+cu110 pypi_0 pypi
dill 0.3.5.1 pypi_0 pypi
expat 2.4.4 h295c915_0
filelock 3.8.0 pypi_0 pypi
fonttools 4.37.1 pypi_0 pypi
freetype 2.11.0 h70c0345_0
fsspec 2022.8.2 pypi_0 pypi
future 0.18.2 py37_1
fvcore 0.1.5.post20220512 pypi_0 pypi
giflib 5.2.1 h7b6447c_0
google-auth 2.11.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.48.1 pypi_0 pypi
huggingface-hub 0.0.19 pypi_0 pypi
hydra-core 1.2.0 pypi_0 pypi
idna 3.3 pyhd3eb1b0_0
importlib-metadata 4.12.0 pypi_0 pypi
importlib-resources 5.9.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
iopath 0.1.8 pypi_0 pypi
joblib 1.1.0 pypi_0 pypi
jpeg 9b h024ee3a_2
kiwisolver 1.4.4 pypi_0 pypi
krb5 1.19.2 hac12032_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
libcurl 7.84.0 h91b91d3_0
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h7f8727e_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libnghttp2 1.46.0 hce63b2e_0
libpng 1.6.37 hbc83047_0
libssh2 1.10.0 h8f2d780_0
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.1.0 h2733197_1
libuv 1.40.0 h7b6447c_0
libwebp 1.2.0 h89dd481_0
liltfinetune 1.0 pypi_0 pypi
lz4-c 1.9.3 h295c915_1
magma-cuda110 2.5.2 1 pytorch
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-include 2022.1.0 h06a4308_224
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.1 py37hd3c417c_0
mkl_random 1.2.2 py37h51133e4_0
multiprocess 0.70.13 pypi_0 pypi
mypy-extensions 0.4.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3
ninja 1.10.2 h06a4308_5
ninja-base 1.10.2 hd09550d_5
numpy 1.21.6 pypi_0 pypi
numpy-base 1.21.5 py37ha15fc14_3
oauthlib 3.2.1 pypi_0 pypi
omegaconf 2.2.3 pypi_0 pypi
openssl 1.1.1q h7f8727e_0
packaging 21.3 pypi_0 pypi
pandas 1.3.5 pypi_0 pypi
pathspec 0.10.1 pypi_0 pypi
pillow 9.2.0 pypi_0 pypi
pip 22.1.2 py37h06a4308_0
portalocker 2.5.1 pypi_0 pypi
protobuf 3.19.4 pypi_0 pypi
pyarrow 9.0.0 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycocotools 2.0.4 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pydot 1.4.2 pypi_0 pypi
pyopenssl 22.0.0 pyhd3eb1b0_0
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py37_1
python 3.7.13 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
pytorch 1.7.1 py3.7_cuda11.0.221_cudnn8.0.5_0 pytorch
pytz 2022.2.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1
regex 2022.9.13 pypi_0 pypi
requests 2.28.1 py37h06a4308_0
requests-oauthlib 1.3.1 pypi_0 pypi
rhash 1.4.1 h3c74f83_1
rsa 4.9 pypi_0 pypi
sacremoses 0.0.53 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
seqeval 1.2.2 pypi_0 pypi
setuptools 63.4.1 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.2 h5082296_0
tabulate 0.8.10 pypi_0 pypi
tensorboard 2.10.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.10.3 pypi_0 pypi
toml 0.10.2 pypi_0 pypi
torch 1.7.1+cu110 pypi_0 pypi
torchaudio 0.7.2 pypi_0 pypi
torchvision 0.8.2+cu110 pypi_0 pypi
tqdm 4.49.0 pypi_0 pypi
transformers 4.5.1 pypi_0 pypi
typed-ast 1.5.4 pypi_0 pypi
typing_extensions 4.3.0 py37h06a4308_0
urllib3 1.26.12 pypi_0 pypi
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
xxhash 3.0.0 pypi_0 pypi
xz 5.2.5 h7f8727e_1
yacs 0.1.8 pypi_0 pypi
yaml 0.2.5 h7b6447c_0
zipp 3.8.1 pypi_0 pypi
zlib 1.2.12 h5eee18b_3
zstd 1.4.9 haebb681_0
nvidia-smi
NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 76C P0 33W / 70W | 5874MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
If I upgrade to PyTorch 1.8 with CUDA 11.1, then the error is "CUDA invalid device ordinal". I have been trying to set up this environment for the last 3 days and have tried various combinations of versions, but none worked. Can you provide a list of dependencies with the exact versions that works on a fresh Ubuntu 18.04 instance?
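One thing worth checking (an observation, not a confirmed fix): the command above exposes only one GPU through CUDA_VISIBLE_DEVICES=0 but launches four processes with --nproc_per_node=4, so every rank tries to initialise NCCL on the same device. That mismatch commonly produces exactly this "invalid usage" NCCL error, and the "invalid device ordinal" error after upgrading. Launching with --nproc_per_node=1 for a single GPU, or exposing as many GPUs as processes (CUDA_VISIBLE_DEVICES=0,1,2,3), may already resolve it independently of the library versions.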
If I change max_2d_position_embeddings, how do I get lilt-only-base using the new model_args?
Is it possible to build LiLT using RoBERTa-large? From my understanding, that would also have to be pre-trained before fine-tuning.
Is this something you are able to publish? It would be very helpful for improving LiLT's accuracy 🙏🏻
Can I use the CPU for training? My current graphics card does not have enough video memory (error: RuntimeError: CUDA out of memory), and I just want to make sure that it will run properly on my computer.
:(
Would it be possible to use LiLT with BigBird-Roberta-Base models?
If so, any feedback on the best approach of doing so? What might need changing in the LiLT repository to do so?
Hi Team,
I have some requirements for unstructured data extraction. I have seen the model performance with structured data.
How can we use it for unstructured data?
None of the OneDrive links to checkpoints and/or datasets seem to be working.
Hello,
do you plan on adding the code to fine-tune your model in a classification task (like RVL-CDIP)?
PS : Nice work, splitting the layout and the language models is a really good idea!
First of all, thanks for the great work!
My question: could these models be adapted to the task of layout analysis, so that we could use them on datasets like PubLayNet?
In this sense, the models would need to output the probabilities of each pixel belonging to a given class, instead of the possible tags for each token.
Hi there, thanks for your work and releasing the code.
I was trying to implement the code from the paper. However, on page 4, section 3.1, the paper says: "In this BASE setting, LiLT has a 12-layer encoder with 192 hidden size, 768 feed-forward filter size, and 12 attention heads." Can you tell me how the 192 hidden size is used in the implementation? I saw the Hugging Face configuration of LiLT, and the configuration was mentioned as follows:
Regards,
Akarsh
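If it helps, my reading (based on the released configuration, so treat it as an assumption rather than an official answer) is that 192 is the hidden size of the layout flow only: the text flow keeps RoBERTa's 768 hidden size, and the layout flow shrinks it by channel_shrink_ratio = 4. A minimal check against the Hugging Face config:
from transformers import LiltConfig

config = LiltConfig()  # default LiLT configuration on Hugging Face
layout_hidden_size = config.hidden_size // config.channel_shrink_ratio
print(config.hidden_size, config.channel_shrink_ratio, layout_hidden_size)  # 768 4 192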
Thanks for this amazing code!
I have a question about LiLT-base: does it mean that the text stream is not combined with any pre-trained language model, and is trained from scratch together with the layout stream?
Hello, thank you for the fantastic paper; it was a joy to read. I am attempting to instantiate LiLTRobertaLikeForTokenClassification() in my own Google Colab notebook to try it out on my own data, but I see an error when attempting to build the config object.
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
398 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
399 if "model_type" in config_dict:
--> 400 config_class = CONFIG_MAPPING[config_dict["model_type"]]
401 return config_class.from_dict(config_dict, **kwargs)
402 else:
KeyError: 'liltrobertalike'
To recreate:
from transformers import AutoConfig
pretrained_model_path = 'path to unzipped pretrain folder'
config = AutoConfig.from_pretrained(
pretrained_model_path,
num_labels=max(experiment_config['encode_labels'].values())+1,
finetuning_task='funsd',
cache_dir='/content/drive/Shareddrives/Machine Learning/pretrained_models',
revision='v1-test',
use_auth_token=None,
)
Which produces this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-73-77ce32bcce05> in <module>()
5 cache_dir='/content/drive/Shareddrives/Machine Learning/pretrained_models',
6 revision='v1-test',
----> 7 use_auth_token=None,
8 )
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
398 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
399 if "model_type" in config_dict:
--> 400 config_class = CONFIG_MAPPING[config_dict["model_type"]]
401 return config_class.from_dict(config_dict, **kwargs)
402 else:
KeyError: 'liltrobertalike'
Any suggestions on how to instantiate the config for your pretrained model? It seems AutoConfig does not like this line in the config:
"model_type": "liltrobertalike"
Hi @jpWang,
I had a question related to LiLT; namely whether or not you're leveraging bounding boxes per word or per segment when fine-tuning on FUNSD. The LayoutLMv3 authors saw a great boost in performance when employing the same bounding box coordinates for a set of words that make up a "segment", like an address on an invoice. They use the OCR engine to identify segments in a document, and then give the same bounding box coordinates to all the words that make up that segment (an idea which was introduced in StructuralLM).
LayoutLMv1 and v2 both use "word position embeddings," which means that each individual word has its own bounding box coordinates.
Does LiLT achieve 88% F1 on FUNSD with word position embeddings? Looking at this file, it seems word position embeddings are used.
I am getting errors when trying to run Multi-task Semantic Entity Recognition on XFUND by following the instructions in the README. Specifically, the config initialisation on line 127 in run_xfun_ser.py is failing with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 546, in get_config_dict
resolved_config_file = cached_path(
File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1402, in cached_path
output_path = get_from_cache(
File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1574, in get_from_cache
r.raise_for_status()
File "/opt/conda/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/lilt-infoxlm-base/resolve/main/config.json
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 527, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 570, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 'lilt-infoxlm-base'. Make sure that:
- 'lilt-infoxlm-base' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'lilt-infoxlm-base' is the correct path to a directory containing a config.json file
- or 'main' is a valid git identifier (branch name, a tag name, or a commit id) that exists for this model name as listed on its model page on 'https://huggingface.co/models'
I have also tried to download the model from the provided OneDrive link and point the config_name argument to the config.json contained within the compressed file; however, I am getting another error in that case:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 529, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 278, in __getitem__
raise KeyError(key)
KeyError: 'liltrobertalike'
I had to change the init file and update the requirements as I was facing the same problem as in issue #32. The updated versions are:
datasets==2.7.1
transformers==4.11.3
Did something change or am I doing something wrong?
Are you able to provide the pre-training code?
I would like to try and pre-train using roberta-large, or a similar language model :)
Hello, amazing job !
I love the paradigm of decoupling the LM and the layout model at first, before fine-tuning with joint training! I've managed to port my LayoutXLM code to your framework almost plug and play, and was wondering if you were planning to contribute an official model implementation to the HuggingFace transformers library?
As is, except for a few tweaks due to version changes, and perhaps a processor object wrapping the tokenizer, not much is required, so I was wondering if you had plans to do so.
Cheers and again, great work!
Best,
Manuel
Hi :) I'm confused about the pretraining process when I change the language model.
I would like to use LiLT with a Korean RoBERTa model that is already pretrained on a Korean language dataset.
According to the paper, I need to re-pretrain the Korean RoBERTa model with the layout embedding vector. Is that right?
++ I think I need to re-pretrain the lilt-only-base model because of the CAI pretraining task.
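For what it's worth, and as an assumption on my part rather than an official answer: elsewhere in these issues the released gen_weight_roberta_like.py script is used to combine lilt-only-base with a new RoBERTa-like checkpoint (a Finnish RoBERTa, for example) directly, without repeating the KPL/CAI pre-training; whether an extra round of pre-training on Korean documents would further help is a separate question.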
Hello,
Could you let me know, when you have a custom dataset, how to organize it into the required format?
And do you recommend any tutorial for this step?
Thank you in advance.
Originally posted by @hamzabchiri in #3 (comment)
When conducting the language-specific fine-tuning experiment on XFUND, the obtained F1 score is only 0.6179 using the following command, which is a huge gap compared with the reported F1 score of 0.7297. The XFUND dataset and pretrained lilt-infoxlm-base were downloaded from the URLs mentioned in the README.md. Are there any additional steps needed to reproduce the experiment?
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py
--model_name_or_path lilt-infoxlm-base
--tokenizer_name xlm-roberta-base
--output_dir ls_re_xfund_zh_lilt-infoxlm-base
--do_train
--do_eval
--lang zh
--max_steps 5000
--per_device_train_batch_size 8
--learning_rate 6.25e-6
--warmup_ratio 0.1
--fp16
After installing the conda environment, I used conda-pack to package the env and transferred it to another Linux GPU server which cannot access the Internet (yes, you heard that right). Then the error occurs; here is the error message:
Installing build dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8fb8bdea10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/setuptools/
ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
ERROR: No matching distribution found for setuptools>=40.8.0
WARNING: There was an error checking the latest version of pip.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: There was an error checking the latest version of pip.
Hi,
LiLT processes a maximum of 512 tokens.
Is there a good option to get a comparable and commercially usable model that can process more tokens?
It is of course possible to split longer inputs into 512-token chunks, but this comes with some disadvantages and difficulties.
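For what it's worth, here is a minimal sketch of the usual chunking workaround with overlapping windows, assuming the tokenizer shipped with one of the public LiLT checkpoints on the Hub (the stride value is arbitrary):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base", add_prefix_space=True)

words = ["example"] * 2000   # toy long document, one entry per word

encoding = tokenizer(
    words,
    is_split_into_words=True,
    truncation=True,
    max_length=512,
    stride=128,                      # overlap between consecutive 512-token windows
    return_overflowing_tokens=True,
)

# one <=512-token window per row; word_ids(i) maps the tokens of window i back to
# the original words, so per-word bounding boxes and predictions can be realigned
# and merged over the overlapping region afterwards
for i in range(len(encoding["input_ids"])):
    word_ids = encoding.word_ids(i)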
Hi, I'd just like to know if I can run this entirely on Colab, and how? Many thanks!
Nice work, but I noticed the repository only contains downstream fine-tuning code.
Will you release the pretraining code on the IIT-CDIP dataset with the KPL/CAI self-supervised objectives in the future?
Thanks a lot!
Hi!
I like the idea of decoupling text and layout information to leverage existing pre-trained language models.
I had some confusion when I was reading the paper.
Why are the performances reported in the two tables different?
Thanks for your reply.
Hi and thanks for creating this,
I am trying to use https://huggingface.co/Finnish-NLP/roberta-large-finnish-v2?text=Moikka+olen+%3Cmask%3E+kielimalli. with this repo. I have successfully run the weight generation:
python gen_weight_roberta_like.py --lilt lilt-only-base/pytorch_model.bin --text roberta-large-finnish-v2/roberta-large-finnish-v2/pytorch_model.bin --config roberta-large-finnish-v2/roberta-large-finnish-v2/config.json --out lilt-roberta-large-finnish-v2
But when I try to load the model then I get the following error:
Do you have an idea of what might cause this and how it could be fixed?
Hello,
Thanks for making the code available !
I was wondering if there is a lilt-only-large checkpoint available, or a way to easily generate it?
I would like to try combining LiLT with a large XLM-RoBERTa to see the potential gain between base and large.
Thanks !
Hi,
I'm using Hugging Face libraries in order to run LiLT.
How can I decrease inference time? Which code should I use?
I've already tried BetterTransformer (Optimum) and ONNX, but neither of them accepts the LiLT model.
NotImplementedError: The model type lilt is not yet supported to be used with BetterTransformer.
KeyError: "lilt is not supported yet.
Thank you.
Note: I asked this question here, too: NielsRogge/Transformers-Tutorials#284
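Two alternatives that do not depend on BetterTransformer or ONNX supporting the lilt model type (a sketch of things worth trying, with no guaranteed speed-up and a possible accuracy cost from quantization):
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("SCUT-DLVCLab/lilt-roberta-en-base")
model.eval()

# dynamic int8 quantization of the nn.Linear layers, for CPU inference
quantized_model = torch.ao.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# on PyTorch >= 2.0, torch.compile can also help (first call is slow, later calls are faster)
# compiled_model = torch.compile(model)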
FUNSD, lilt-roberta-en-base
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
7%|▋ | 143/2000 [00:28<06:06, 5.06it/s]
While looking for the reason, I found that one feature's size is 625: batch["input_ids"].shape = (1, 625). Could you tell me how to fix it? Thanks a lot!
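A hedged guess at the cause, with a sketch of the usual fix: a device-side assert coming from torch.embedding is very often an index that exceeds an embedding table, and 625 tokens is longer than the 512 positions the model supports, so I would first make sure every encoded example is truncated (or chunked) to max_length=512. Re-running on CPU or with CUDA_LAUNCH_BLOCKING=1 usually reveals the exact offending index.
from transformers import AutoTokenizer

# roberta-base is the tokenizer typically paired with lilt-roberta-en-base in this repo
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
words = ["word"] * 700   # toy example that would otherwise exceed 512 tokens

encoding = tokenizer(
    words,
    is_split_into_words=True,
    truncation=True,       # drop (or separately chunk) everything beyond max_length
    max_length=512,
    padding="max_length",
)
assert len(encoding["input_ids"]) == 512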
Hi,
Just got my XFUND-ES fine-tuning job to work! While I wait, I am trying to work my way through the code to create an inference script. Would running the run_xfun_ser.py script with --do_predict (and passing the checkpoint I obtained by fine-tuning) suffice? Also, if I wanted to create my own dataset to fine-tune further, would you recommend a transfer learning approach starting from this XFUND fine-tuned checkpoint, or should I go straight to creating the custom dataset?
Thank you very much in advance. This is great work!
Has this been done before to compare the results against LayoutLM (large) and ERNIE (large)?
If anyone has, please provide us with the relevant checkpoint and findings for improvements against the standard datasets.
Hi @jpWang, thanks for your repo.
I have used it for my project: extracting keys and values in documents with complicated layouts.
I have an idea to improve the RE model, as below:
As I understand it, RE learns relation classification based on the semantics of the language.
From my point of view, relation classification could be improved by learning from position (position embeddings) together with the semantics of the language.
That should give good results, as below:
What do you think about my idea?