jpwang / lilt

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

License: MIT License

Python 99.72% Makefile 0.28%
nlp document-ai document-analysis document-understanding information-extraction multimodal-pre-trained-model multilingual-models

lilt's People

Contributors

jpwang

lilt's Issues

Usage as Object Detection?

Hi,
I just wanted to know whether it can also be used as an object detection model, e.g. to detect bounding boxes in documents.

I have asked a similar question here. In case you could help me with it, that'd be great.

You could close the question otherwise.

Thank you :)

Invoice Extraction

First of all, great work! I appreciate the novel idea of supporting multiple languages at fine-tuning time.

Do you think lilt-infoxlm-base is sufficient as a base model for extracting basic information from invoices, or would a completely fresh pre-trained model, trained on around 1 million samples, be required?

How long did it take you to create the pre-trained model, and what hardware did you use?

For fine-tuning, how many annotated invoices do you think are required (would around 5,000 be sufficient)? How long would the fine-tuned model need to be trained, and on what hardware?

Thanks in advance. I also have access to a lot of invoices; if this is successful, I can share the final model here.

Fine-tuning on custom data

Thank you for sharing your great work!

If I want to fine-tune on a custom dataset, what steps should I follow? For example:

  • What is the input data format for training, testing, and inference?
  • Which scripts do we need to modify?

Thanks in advance!

pre-processed data

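Thank you very much for your open-source work. Interestingly, both my undergraduate and master's advisors came from South China University of Technology. Back to the point: I cannot download the pre-processed data from OneDrive. Could you provide another way to obtain it, such as Google Drive, or could you send it to my email address, [email protected]? Thank you!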

How to Pretrain on our own documents

Let's say I have thousands of domain-specific documents in English. How can I continue pre-training on them, starting from the existing roberta-en + LiLT checkpoint?

"linking" in Dataset and IOB Tagging

The en.train.json file contains a linking field (an array). Is it required for SER (Semantic Entity Recognition) tasks, as opposed to RE?

Have you ever tried IOB tagging for multiple word outputs?
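For readers unfamiliar with the scheme, IOB tagging of multi-word entities can be sketched as follows. This is a minimal illustration and not code from the LiLT repository: the function name `to_iob`, the `(start, end, label)` span format, and the sample labels are assumptions made for the example.

```python
from typing import List, Tuple


def to_iob(num_words: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans to per-word IOB tags.

    Each span is (start, end, label) with `end` exclusive. The span format
    is hypothetical and chosen only for this sketch.
    """
    tags = ["O"] * num_words
    for start, end, label in spans:
        if not 0 <= start < end <= num_words:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"      # first word of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining words of the entity
    return tags


words = ["Invoice", "No", ":", "A", "-", "1234"]
spans = [(0, 2, "QUESTION"), (3, 6, "ANSWER")]
tags = to_iob(len(words), spans)
for word, tag in zip(words, tags):
    print(f"{word}\t{tag}")
```

Words outside every span keep the `O` tag, so a multi-word answer such as "A - 1234" is recovered by reading one `B-` tag followed by its `I-` tags.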

Questions on BiACM

@jpWang
First of all, congratulations to all the authors on this great paper and milestone work; it truly justifies the title SIMPLE yet EFFECTIVE.

Question

  1. From the paper I understand that LiLT uses BiACM to introduce cross-modality interaction, but I could not find that part in the code. If I understand the code correctly, LiLT uses separate Query, Key, and Value linear layers to compute attention for the text flow and the layout flow, and adds relative_distance_embeddings to each of them.
    It then adds the two temporary attention scores to get the final attention scores:
tmp_attention_scores = attention_scores / math.sqrt(self.attention_head_size)
tmp_layout_attention_scores = layout_attention_scores / math.sqrt(self.attention_head_size//self.channel_shrink_ratio)  
attention_scores = tmp_attention_scores + tmp_layout_attention_scores
layout_attention_scores = tmp_layout_attention_scores + tmp_attention_scores 

Is this addition (tmp_layout_attention_scores + tmp_attention_scores) what performs the cross-modality interaction learning? Please share some thoughts on this.

  2. I understand that during pre-training LiLT stops gradient backpropagation through the text flow model, so during pre-training:
# here `tmp_layout_attention_scores` won't be added, since we don't want to update attention_scores for the text flow
# alternatively, we can keep this line unchanged and stop the gradient flow
attention_scores = tmp_attention_scores + tmp_layout_attention_scores
# this addition will change, and `tmp_attention_scores` won't be added
layout_attention_scores = tmp_layout_attention_scores

Could you please comment on whether my understanding is correct?
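For reference, the BiACM rule can be written out as below. This is a sketch based on one reading of the paper, not a statement about the released code; here \alpha^{T} and \alpha^{L} are taken to be the scaled text and layout attention scores before the softmax, and DETACH is taken to mean that no gradient flows through its argument.

```latex
\tilde{\alpha}^{T}_{ij} = \alpha^{L}_{ij} + \alpha^{T}_{ij}

\tilde{\alpha}^{L}_{ij} =
\begin{cases}
\alpha^{L}_{ij} + \mathrm{DETACH}\left(\alpha^{T}_{ij}\right) & \text{during pre-training} \\
\alpha^{L}_{ij} + \alpha^{T}_{ij} & \text{during fine-tuning}
\end{cases}
```

Under this reading, both flows always receive the sum of the two scores. The detach applies only to the text scores that are added into the layout flow, so that the layout-side objectives do not backpropagate into the text flow. The four code lines quoted in item 1 would then correspond to the fine-tuning case, where nothing is detached.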

Need help with reproducing CORD results

I tried to reproduce the CORD results given in the paper, but I only managed to get an F1 score of ~0.62 on the test set. Is there any special pre-processing applied to the CORD dataset for it to work with LiLT, or am I making a mistake?

Currently I am changing the labels in /LiLTfinetune/data/datasets/xfun.py to the CORD labels, and changing the _generate_examples method to load from the CORD files.

The config that I used:

{
    "model_name_or_path": "models/lilt-infoxlm-base",
    "tokenizer_name": "roberta-base",
    "output_dir": "output/xfun_ser",
    "do_train": "true",
    "do_eval": "true",
    "do_predict": "true",
    "lang": "en",
    "num_train_epochs": 10,
    "max_steps" : 2000,
    "per_device_train_batch_size": 1,
    "warmup_ratio": 0.1,
    "pad_to_max_length": "true",
    "return_entity_level_metrics": "true"
}

Is there another step that needs to be done for LiLT to work with a different dataset?
How many epochs/steps were used to achieve the results in the paper?

Update: With 20,000 steps I managed to get an overall F1 score of ~0.79, still far from the expected score. With 30,000 steps the score stays at ~0.79, so it no longer increases with the number of steps.
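One thing worth checking in a config like the one above is how `max_steps` interacts with `num_train_epochs`: in the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`. The sketch below converts a step budget into passes over the training set. The training-set size of 800 examples for CORD and the single-device setup are assumptions for illustration, and `steps_to_epochs` is a hypothetical helper, not part of LiLT or Transformers.

```python
def steps_to_epochs(max_steps: int, num_examples: int, per_device_batch_size: int,
                    num_devices: int = 1, grad_accum_steps: int = 1) -> float:
    """Return how many passes over the data a given optimizer-step budget covers."""
    examples_per_step = per_device_batch_size * num_devices * grad_accum_steps
    return max_steps * examples_per_step / num_examples


NUM_TRAIN_EXAMPLES = 800  # assumed size of the CORD training split

for steps in (2000, 20000, 30000):
    epochs = steps_to_epochs(steps, NUM_TRAIN_EXAMPLES, per_device_batch_size=1)
    print(f"{steps} steps at batch size 1 = {epochs:.1f} epochs")
```

If long documents are split into several windows during pre-processing, the real number of training examples is larger than the number of documents, and the epoch counts shrink accordingly.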

LiLT cannot run inference with the Half (float16) dtype on CPU

Hi,

I wanted to run inference with LiLT with the model parameters in Half (float16) dtype on CPU (I tried on GPU and it worked).

As I'm using Transformers from Hugging Face, I ran the following code:

from transformers import AutoTokenizer, AutoModelForTokenClassification

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

param_dtype = torch.float16
model_id = "pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-linelevel-ml384"
model = AutoModelForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype)
model.to(device)

This worked, but when I ran the model for inference with the following code, it failed:

with torch.no_grad():
    output = model(input_ids=input_id.to(device),
                    attention_mask=attention_mask.to(device),
                    bbox=bbox.to(device)
     )

Error message:

/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in layer_norm(input, normalized_shape, weight, bias, eps)
   2513             layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2514         )
-> 2515     return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
   2516 
   2517 

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

It looks as though the float32 dtype is hard-coded in the LiLT code.

How can I solve this issue?
Thanks.
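The traceback points at PyTorch's CPU `layer_norm` kernel rather than at LiLT-specific code, so one possible workaround is to request float16 only when a CUDA device is available and fall back to float32 on CPU. The helper below is a minimal sketch of that choice. `choose_dtype_name` is a hypothetical name, and it returns the dtype's attribute name as a string so that the example runs without PyTorch installed.

```python
def choose_dtype_name(device_type: str, prefer_half: bool = True) -> str:
    """Pick a parameter dtype name that the given device type can run.

    Returns "float16" only for CUDA devices; CPU falls back to "float32",
    since the error above shows the CPU LayerNorm kernel rejecting Half.
    """
    if prefer_half and device_type == "cuda":
        return "float16"
    return "float32"


for device_type in ("cuda", "cpu"):
    print(device_type, "->", choose_dtype_name(device_type))

# Intended use with the snippet from the question (requires torch and transformers):
#   param_dtype = getattr(torch, choose_dtype_name(device.type))
#   model = AutoModelForTokenClassification.from_pretrained(model_id, torch_dtype=param_dtype)
```

Other PyTorch versions may behave differently for half precision on CPU; that is not verified here.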

how to train from scratch

Just wondering how to train the pre-trained model from scratch. Does this repo contain the pre-training code?

not able to reproduce results

First of all, thanks for this nice work.
However, I am unable to reproduce the results given in the table. I was training on FUNSD for the relation extraction task.
Please let me know if I am doing anything wrong.
python examples/run_xfun_re.py --model_name_or_path lilt-infoxlm-base --tokenizer_name xlm-roberta-base --output_dir ls_re_xfund_lilt-infoxlm-base --do_train --do_eval --lang en --max_steps 20000 --per_device_train_batch_size 2 --warmup_ratio 0.1
I am getting an eval F1 of 0.47.

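Problem running the code

Hello, I am a graduate student at Donghua University.
I am honored to have read your excellent paper and to be reproducing it. While running your example code, I ran into some problems:
[image]
I hope you can find time to take a look and help resolve this. Thank you very much!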

How is the "lilt-only-base" bin file created?

Can you please provide more information about the "lilt-only-base" file and how the model was created?

Since the base file is just 22MB in size, I would like to know what kind of dataset, parameters, or logic was used to create it.

I am trying to figure out what the possibilities are, and where I should start reading to learn how to create such models. Please point me to some references.

run_funsd.py fails with NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=4 run_funsd.py --model_name_or_path lilt-roberta-en-base --tokenizer_name roberta-base --output_dir ser_funsd_lilt-roberta-en-base --do_train --do_predict --max_steps 2000 --per_device_train_batch_size 8 --warmup_ratio 0.1 --fp16

The above command fails with the error below on PyTorch 1.7.1 with CUDA 11.0:

Traceback (most recent call last):
  File "run_funsd.py", line 369, in <module>
    main()
  File "run_funsd.py", line 50, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 570, in __post_init__
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1470, in wrapper
    return func(*args, **kwargs)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 717, in device
    return self._setup_devices
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1460, in __get__
    cached = self.fget(obj)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/file_utils.py", line 1470, in wrapper
    return func(*args, **kwargs)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/transformers/training_args.py", line 702, in _setup_devices
    torch.distributed.init_process_group(backend="nccl")
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370156314/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8
Traceback (most recent call last):
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/cydal/anaconda3/envs/liltfinetune/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/cydal/anaconda3/envs/liltfinetune/bin/python', '-u', 'run_funsd.py', '--local_rank=3', '--model_name_or_path', 'lilt-roberta-en-base', '--tokenizer_name', 'roberta-base', '--output_dir', 'ser_funsd_lilt-roberta-en-base', '--do_train', '--do_predict', '--max_steps', '2000', '--per_device_train_batch_size', '8', '--warmup_ratio', '0.1', '--fp16']' returned non-zero exit status 1.

Below is conda list:

# packages in environment at /home/cydal/anaconda3/envs/liltfinetune:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   1.2.0                    pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
appdirs                   1.4.4                    pypi_0    pypi
astunparse                1.6.3                      py_0  
black                     21.4b2                   pypi_0    pypi
blas                      1.0                         mkl  
brotlipy                  0.7.0           py37h27cfd23_1003  
bzip2                     1.0.8                h7b6447c_0  
c-ares                    1.18.1               h7f8727e_0  
ca-certificates           2022.07.19           h06a4308_0  
cachetools                5.2.0                    pypi_0    pypi
certifi                   2022.6.15        py37h06a4308_0  
cffi                      1.15.1           py37h74dc2b5_0  
charset-normalizer        2.1.1                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.2.0                    pypi_0    pypi
cmake                     3.19.6               h973ab73_0  
cryptography              37.0.1           py37h9ce1e76_0  
cudatoolkit               11.0.221             h6bb024c_0  
cycler                    0.11.0                   pypi_0    pypi
dataclasses               0.8                pyh6d0b6a4_7  
datasets                  1.6.2                    pypi_0    pypi
detectron2                0.5+cu110                pypi_0    pypi
dill                      0.3.5.1                  pypi_0    pypi
expat                     2.4.4                h295c915_0  
filelock                  3.8.0                    pypi_0    pypi
fonttools                 4.37.1                   pypi_0    pypi
freetype                  2.11.0               h70c0345_0  
fsspec                    2022.8.2                 pypi_0    pypi
future                    0.18.2                   py37_1  
fvcore                    0.1.5.post20220512          pypi_0    pypi
giflib                    5.2.1                h7b6447c_0  
google-auth               2.11.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.48.1                   pypi_0    pypi
huggingface-hub           0.0.19                   pypi_0    pypi
hydra-core                1.2.0                    pypi_0    pypi
idna                      3.3                pyhd3eb1b0_0  
importlib-metadata        4.12.0                   pypi_0    pypi
importlib-resources       5.9.0                    pypi_0    pypi
intel-openmp              2021.4.0          h06a4308_3561  
iopath                    0.1.8                    pypi_0    pypi
joblib                    1.1.0                    pypi_0    pypi
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.4.4                    pypi_0    pypi
krb5                      1.19.2               hac12032_0  
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.38                 h1181459_1  
libcurl                   7.84.0               h91b91d3_0  
libedit                   3.1.20210910         h7f8727e_0  
libev                     4.33                 h7f8727e_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libnghttp2                1.46.0               hce63b2e_0  
libpng                    1.6.37               hbc83047_0  
libssh2                   1.10.0               h8f2d780_0  
libstdcxx-ng              11.2.0               h1234567_1  
libtiff                   4.1.0                h2733197_1  
libuv                     1.40.0               h7b6447c_0  
libwebp                   1.2.0                h89dd481_0  
liltfinetune              1.0                      pypi_0    pypi
lz4-c                     1.9.3                h295c915_1  
magma-cuda110             2.5.2                         1    pytorch
markdown                  3.4.1                    pypi_0    pypi
markupsafe                2.1.1                    pypi_0    pypi
matplotlib                3.5.3                    pypi_0    pypi
mkl                       2021.4.0           h06a4308_640  
mkl-include               2022.1.0           h06a4308_224  
mkl-service               2.4.0            py37h7f8727e_0  
mkl_fft                   1.3.1            py37hd3c417c_0  
mkl_random                1.2.2            py37h51133e4_0  
multiprocess              0.70.13                  pypi_0    pypi
mypy-extensions           0.4.3                    pypi_0    pypi
ncurses                   6.3                  h5eee18b_3  
ninja                     1.10.2               h06a4308_5  
ninja-base                1.10.2               hd09550d_5  
numpy                     1.21.6                   pypi_0    pypi
numpy-base                1.21.5           py37ha15fc14_3  
oauthlib                  3.2.1                    pypi_0    pypi
omegaconf                 2.2.3                    pypi_0    pypi
openssl                   1.1.1q               h7f8727e_0  
packaging                 21.3                     pypi_0    pypi
pandas                    1.3.5                    pypi_0    pypi
pathspec                  0.10.1                   pypi_0    pypi
pillow                    9.2.0                    pypi_0    pypi
pip                       22.1.2           py37h06a4308_0  
portalocker               2.5.1                    pypi_0    pypi
protobuf                  3.19.4                   pypi_0    pypi
pyarrow                   9.0.0                    pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycocotools               2.0.4                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0  
pydot                     1.4.2                    pypi_0    pypi
pyopenssl                 22.0.0             pyhd3eb1b0_0  
pyparsing                 3.0.9                    pypi_0    pypi
pysocks                   1.7.1                    py37_1  
python                    3.7.13               h12debd9_0  
python-dateutil           2.8.2                    pypi_0    pypi
pytorch                   1.7.1           py3.7_cuda11.0.221_cudnn8.0.5_0    pytorch
pytz                      2022.2.1                 pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.1.2                h7f8727e_1  
regex                     2022.9.13                pypi_0    pypi
requests                  2.28.1           py37h06a4308_0  
requests-oauthlib         1.3.1                    pypi_0    pypi
rhash                     1.4.1                h3c74f83_1  
rsa                       4.9                      pypi_0    pypi
sacremoses                0.0.53                   pypi_0    pypi
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
seqeval                   1.2.2                    pypi_0    pypi
setuptools                63.4.1           py37h06a4308_0  
six                       1.16.0             pyhd3eb1b0_1  
sqlite                    3.39.2               h5082296_0  
tabulate                  0.8.10                   pypi_0    pypi
tensorboard               2.10.0                   pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
termcolor                 2.0.1                    pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
tokenizers                0.10.3                   pypi_0    pypi
toml                      0.10.2                   pypi_0    pypi
torch                     1.7.1+cu110              pypi_0    pypi
torchaudio                0.7.2                    pypi_0    pypi
torchvision               0.8.2+cu110              pypi_0    pypi
tqdm                      4.49.0                   pypi_0    pypi
transformers              4.5.1                    pypi_0    pypi
typed-ast                 1.5.4                    pypi_0    pypi
typing_extensions         4.3.0            py37h06a4308_0  
urllib3                   1.26.12                  pypi_0    pypi
werkzeug                  2.2.2                    pypi_0    pypi
wheel                     0.37.1             pyhd3eb1b0_0  
xxhash                    3.0.0                    pypi_0    pypi
xz                        5.2.5                h7f8727e_1  
yacs                      0.1.8                    pypi_0    pypi
yaml                      0.2.5                h7b6447c_0  
zipp                      3.8.1                    pypi_0    pypi
zlib                      1.2.12               h5eee18b_3  
zstd                      1.4.9                haebb681_0 

nvidia-smi

 NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   76C    P0    33W /  70W |   5874MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+

If I upgrade to PyTorch 1.8 with CUDA 11.1, the error becomes "CUDA invalid device ordinal". I have been trying to set up this environment for the last 3 days and have tried various combinations of versions; none worked. Can you provide a list of dependencies, with exact versions, that works on a fresh Ubuntu 18.04 instance?
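A likely cause, not confirmed by the maintainers in this thread, is that the command exposes a single GPU (`CUDA_VISIBLE_DEVICES=0`) while asking the launcher for four processes (`--nproc_per_node=4`), so several ranks end up on the same device or on a device index that does not exist. The check below is a sketch of that consistency rule using only the standard library; `visible_gpu_count` and `check_launch` are hypothetical helper names.

```python
from typing import Optional


def visible_gpu_count(cuda_visible_devices: Optional[str], physical_gpus: int) -> int:
    """Count the GPUs a process can see, given CUDA_VISIBLE_DEVICES.

    If the variable is unset (None), all physical GPUs are visible.
    """
    if cuda_visible_devices is None:
        return physical_gpus
    ids = [d for d in cuda_visible_devices.split(",") if d.strip()]
    return len(ids)


def check_launch(nproc_per_node: int, cuda_visible_devices: Optional[str],
                 physical_gpus: int) -> bool:
    """Return True when every launched process can get its own GPU."""
    return nproc_per_node <= visible_gpu_count(cuda_visible_devices, physical_gpus)


# Values taken from the command and the nvidia-smi output above (one Tesla T4 listed).
print(check_launch(nproc_per_node=4, cuda_visible_devices="0", physical_gpus=1))
print(check_launch(nproc_per_node=1, cuda_visible_devices="0", physical_gpus=1))
```

With a single visible GPU, `--nproc_per_node=1` would satisfy this check.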

Is LiLT-Large possible?

Is it possible to build LiLT using RoBERTa large? From my understanding, that would also have to be pre-trained before fine tuning as well.

Is this something you are able to publish? It would be very helpful for improving LiLT's accuracy 🙏🏻

Can I use the CPU for training

Can I use the CPU for training? My current graphics card does not have enough video memory (error: [RuntimeError: CUDA out of memory]), and I just want to make sure that it will run properly on my computer.
:(

How can we use it for unstructured data?

Hi Team,
I have some requirements for unstructured data extraction. I have seen the model performance with structured data.
How can we use it for unstructured data?

Fine-tuning for classification

Hello,

Do you plan to add code for fine-tuning your model on a classification task (like RVL-CDIP)?

PS : Nice work, splitting the layout and the language models is a really good idea!

Layout Analysis

First of all, thanks for the great work!
My question: could these models be adapted to the task of layout analysis, so that we could use them on datasets like PubLayNet?
For that task, the models would need to output the probability of each pixel belonging to a given class, instead of the possible tags for each token.

A small doubt regarding the implementation of the model

Hi there, thanks for your work and releasing the code.

I was trying to implement the model from the paper. However, on page 4, section 3.1, the paper says: "In this BASE setting, LiLT has a 12-layer encoder with 192 hidden size, 768 feed-forward filter size, and 12 attention heads." Can you tell me how the 192 hidden size is used in the implementation? I looked at the Hugging Face configuration of LiLT, which lists the following:

  • intermediate_size (int, optional, defaults to 3072)
  • hidden_size (int, optional, defaults to 768)

I didn't see a hidden size of 192 anywhere there.

Regards,
Akarsh
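The 192 figure can be recovered from the configuration values quoted above if the layout flow is assumed to use the text-flow sizes divided by a shrink ratio of 4. A `channel_shrink_ratio` does appear in the attention code quoted in an earlier issue on this page, but the value 4 is an assumption here and should be checked against the config of the checkpoint in use.

```python
hidden_size = 768          # text-flow hidden size from the Hugging Face config
intermediate_size = 3072   # text-flow feed-forward size from the Hugging Face config
num_attention_heads = 12   # from the paper's BASE setting
channel_shrink_ratio = 4   # assumed value; check the config of your checkpoint

layout_hidden_size = hidden_size // channel_shrink_ratio
layout_intermediate_size = intermediate_size // channel_shrink_ratio
text_head_size = hidden_size // num_attention_heads
layout_head_size = text_head_size // channel_shrink_ratio

print("layout hidden size:", layout_hidden_size)
print("layout feed-forward size:", layout_intermediate_size)
print("per-head size (text, layout):", text_head_size, layout_head_size)
```

Under that assumption the results, 192 and 768, match the hidden size and feed-forward size in the sentence quoted from the paper.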

question about LiLT-base

Thanks for this amazing code!
I have a question about LiLT-base: does it mean that the text stream is not combined with any pre-trained language model, and is trained from scratch together with the layout stream?

Problem starting AutoConfig from pretrained config

Hello, thank you for the fantastic paper; it was a joy to read. I am attempting to instantiate LiLTRobertaLikeForTokenClassification() in my own Google Colab notebook to try it out on my own data, but I see an error when attempting to build the config object.

/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    398         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    399         if "model_type" in config_dict:
--> 400             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    401             return config_class.from_dict(config_dict, **kwargs)
    402         else:

KeyError: 'liltrobertalike'

To reproduce:

from transformers import AutoConfig

pretrained_model_path = 'path to unzipped pretrain folder'


config = AutoConfig.from_pretrained(
    pretrained_model_path,
    num_labels=max(experiment_config['encode_labels'].values())+1,
    finetuning_task='funsd',
    cache_dir='/content/drive/Shareddrives/Machine Learning/pretrained_models',
    revision='v1-test',
    use_auth_token=None,
)

Which produces this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-73-77ce32bcce05> in <module>()
      5     cache_dir='/content/drive/Shareddrives/Machine Learning/pretrained_models',
      6     revision='v1-test',
----> 7     use_auth_token=None,
      8 )

/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    398         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    399         if "model_type" in config_dict:
--> 400             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    401             return config_class.from_dict(config_dict, **kwargs)
    402         else:

KeyError: 'liltrobertalike'

Any suggestions on how to instantiate the config for your pretrained model? It seems AutoConfig does not like this line in the config:

"model_type": "liltrobertalike"
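The traceback shows a plain dictionary lookup failing: `CONFIG_MAPPING` has no entry for the `model_type` string found in the checkpoint's config.json. The toy sketch below mimics that lookup in plain Python to show why the key has to be registered before loading. `TOY_CONFIG_MAPPING`, `ToyLiltConfig`, and `toy_from_dict` are hypothetical stand-ins, not transformers APIs. For the real setup, the assumption is that the repository's own package must be imported first so that it can register its config class.

```python
class ToyLiltConfig:
    """Hypothetical stand-in for a custom config class."""

    model_type = "liltrobertalike"

    def __init__(self, **kwargs):
        self.params = kwargs


# Toy registry playing the role of the library's model_type -> class mapping.
TOY_CONFIG_MAPPING = {}


def toy_from_dict(config_dict):
    """Look up the config class by model_type, as the traceback's line 400 does."""
    config_class = TOY_CONFIG_MAPPING[config_dict["model_type"]]
    params = {k: v for k, v in config_dict.items() if k != "model_type"}
    return config_class(**params)


config_dict = {"model_type": "liltrobertalike", "hidden_size": 768}

# Before registration the lookup raises KeyError, as in the report.
try:
    toy_from_dict(config_dict)
    failed_before_registration = False
except KeyError:
    failed_before_registration = True

# After registration the same call succeeds.
TOY_CONFIG_MAPPING[ToyLiltConfig.model_type] = ToyLiltConfig
config = toy_from_dict(config_dict)
```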

Word or segment position embeddings?

Hi @jpWang,

I had a question related to LiLT; namely whether or not you're leveraging bounding boxes per word or per segment when fine-tuning on FUNSD. The LayoutLMv3 authors saw a great boost in performance when employing the same bounding box coordinates for a set of words that make up a "segment", like an address on an invoice. They use the OCR engine to identify segments in a document, and then give the same bounding box coordinates to all the words that make up that segment (an idea which was introduced in StructuralLM).

LayoutLMv1 and v2 both use "word position embeddings," which means that each individual word has its own bounding box coordinates.

Does LiLT achieve 88% F1 on FUNSD with word position embeddings? Looking at this file, it seems word position embeddings are used.
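To make the distinction concrete, here is a minimal sketch of turning word-level boxes into segment-level boxes: every word receives the union box of the segment it belongs to. The `(x0, y0, x1, y1)` box format and the helper name `segment_level_boxes` are assumptions for illustration, not code from the LiLT or LayoutLMv3 repositories.

```python
def segment_level_boxes(word_boxes, segment_ids):
    """Replace each word box with the union box of its segment.

    word_boxes: list of (x0, y0, x1, y1) tuples, one per word.
    segment_ids: list of hashable segment identifiers, same length.
    """
    if len(word_boxes) != len(segment_ids):
        raise ValueError("word_boxes and segment_ids must have the same length")

    union = {}
    for (x0, y0, x1, y1), seg in zip(word_boxes, segment_ids):
        if seg in union:
            ux0, uy0, ux1, uy1 = union[seg]
            union[seg] = (min(ux0, x0), min(uy0, y0), max(ux1, x1), max(uy1, y1))
        else:
            union[seg] = (x0, y0, x1, y1)

    # Every word in a segment shares that segment's box.
    return [union[seg] for seg in segment_ids]


words = [(10, 10, 40, 20), (45, 10, 90, 20), (10, 30, 60, 40)]
segments = [0, 0, 1]
boxes = segment_level_boxes(words, segments)
```

With word position embeddings, the three words would keep their own boxes; with segment position embeddings, the first two share `(10, 10, 90, 20)`.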

Config error in Multi-task Semantic Entity Recognition on XFUND

I am getting errors when trying to run Multi-task Semantic Entity Recognition on XFUND by following the instructions in the README. Specifically, the config initialisation on line 127 in run_xfun_ser.py is failing with the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 546, in get_config_dict
    resolved_config_file = cached_path(
  File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1402, in cached_path
    output_path = get_from_cache(
  File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1574, in get_from_cache
    r.raise_for_status()
  File "/opt/conda/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/lilt-infoxlm-base/resolve/main/config.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 527, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 570, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'lilt-infoxlm-base'. Make sure that:

- 'lilt-infoxlm-base' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'lilt-infoxlm-base' is the correct path to a directory containing a config.json file

- or 'main' is a valid git identifier (branch name, a tag name, or a commit id) that exists for this model name as listed on its model page on 'https://huggingface.co/models'

I have also tried downloading the model from the provided OneCloud link and pointing the config_name argument to the config.json contained in the compressed file; however, I get another error in that case:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 529, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 278, in __getitem__
    raise KeyError(key)
KeyError: 'liltrobertalike'

I had to change the init file and update the requirements as I was facing the same problem as in issue #32. The updated versions are:

datasets==2.7.1
transformers==4.11.3

Did something change or am I doing something wrong?
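The 401 in the first traceback is consistent with `lilt-infoxlm-base` being treated as a Hub model identifier because no local directory of that name was found from the working directory. That reading is an assumption about this setup. A small pre-flight check, sketched below with the standard library only, can tell the two failure modes apart: a missing local checkpoint versus a `model_type` that the installed code does not know. `inspect_checkpoint_dir` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def inspect_checkpoint_dir(path):
    """Return (has_local_config, model_type) for a checkpoint directory."""
    config_file = Path(path) / "config.json"
    if not config_file.is_file():
        # Nothing local: a loader would fall back to a remote lookup.
        return False, None
    with config_file.open(encoding="utf-8") as handle:
        config = json.load(handle)
    return True, config.get("model_type")


# Demo on a temporary directory standing in for the unzipped checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "lilt-infoxlm-base"
    checkpoint.mkdir()
    (checkpoint / "config.json").write_text(
        json.dumps({"model_type": "liltrobertalike"}), encoding="utf-8"
    )
    found = inspect_checkpoint_dir(checkpoint)
    missing = inspect_checkpoint_dir(Path(tmp) / "not-downloaded")
```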

Pre-training code?

Are you able to provide the pre-training code?

I would like to try and pre-train using roberta-large, or a similar language model :)

HuggingFace version

Hello, amazing job!

I love the paradigm of decoupling the LM and the layout model at first, before fine-tuning with joint training! I've managed to port my LayoutXLM code to your framework almost plug and play, and was wondering whether you are planning to contribute an official model implementation to the HuggingFace transformers library.
As is, not much seems to be required beyond a few tweaks due to version changes and perhaps a processor object wrapping the tokenizer.
Cheers and again, great work!
Best,
Manuel

Pretraining with other ROBERTa model

Hi :) I'm confused about the pre-training process when changing the language model.

I would like to use LiLT with a Korean RoBERTa model that is already pre-trained on a Korean-language dataset.
According to the paper, I need to re-pretrain the Korean RoBERTa model together with the layout embeddings. Is that right?
++ I think I need to re-pretrain the lilt-only-base model because of the CAI pre-training task.

fail to reproduce the result of language-specific (for example, ZH) relation extraction on XFUND

When running language-specific fine-tuning on XFUND with the following command, the F1 score I obtain is only 0.6179, well below the reported F1 score of 0.7297. The XFUND dataset and the pretrained lilt-infoxlm-base were downloaded from the URLs mentioned in README.md. Are there any additional steps needed to reproduce the experiment?

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py \
    --model_name_or_path lilt-infoxlm-base \
    --tokenizer_name xlm-roberta-base \
    --output_dir ls_re_xfund_zh_lilt-infoxlm-base \
    --do_train \
    --do_eval \
    --lang zh \
    --max_steps 5000 \
    --per_device_train_batch_size 8 \
    --learning_rate 6.25e-6 \
    --warmup_ratio 0.1 \
    --fp16

pip install -e . error

After installing the conda environment, I used conda-pack to package it and transferred it to another Linux GPU server that cannot access the Internet (yes, you heard that right). Running pip install -e . there fails with the following error message:

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f8fb8bdea10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /simple/setuptools/
ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
ERROR: No matching distribution found for setuptools>=40.8.0
WARNING: There was an error checking the latest version of pip.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
WARNING: There was an error checking the latest version of pip.

Use LiLT / an alternative model with more than 512 tokens

Hi,

LiLT processes a maximum of 512 tokens.

Is there a good option for a comparable, commercially usable model that can process more tokens?

It is of course possible to split longer inputs into 512-token chunks, but this comes with some disadvantages and difficulties.
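For reference, the chunking approach mentioned above can be sketched as a sliding window over the token and box lists, with consecutive windows overlapping so that entities cut at a boundary appear whole in at least one window. This is a minimal sketch in plain Python; `chunk_with_overlap`, `max_len`, and `stride` are hypothetical names, and in practice `max_len` would need to leave room for the special tokens the tokenizer adds.

```python
def chunk_with_overlap(token_ids, boxes, max_len=512, stride=128):
    """Split parallel token/box lists into overlapping windows.

    Returns a list of (start_offset, token_chunk, box_chunk) tuples.
    Consecutive windows share `stride` tokens.
    """
    if len(token_ids) != len(boxes):
        raise ValueError("token_ids and boxes must have the same length")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    chunks = []
    step = max_len - stride
    start = 0
    while True:
        end = min(start + max_len, len(token_ids))
        chunks.append((start, token_ids[start:end], boxes[start:end]))
        if end == len(token_ids):
            break
        start += step
    return chunks


tokens = list(range(1000))
token_boxes = [(0, 0, 0, 0)] * 1000
windows = chunk_with_overlap(tokens, token_boxes)
starts = [start for start, _, _ in windows]
lengths = [len(chunk) for _, chunk, _ in windows]
```

The start offsets make it possible to map per-window predictions back to positions in the full document; how to merge predictions in the overlapping regions is one of the difficulties the question refers to.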

Running on colab

Hi, I would just like to know whether I can run this entirely on Colab, and if so, how. Many thanks!

Cannot load model after weight generation

Hi and thanks for creating this,

I am trying to use https://huggingface.co/Finnish-NLP/roberta-large-finnish-v2?text=Moikka+olen+%3Cmask%3E+kielimalli. with this repo. I have successfully run the weight generation:

python gen_weight_roberta_like.py --lilt lilt-only-base/pytorch_model.bin --text roberta-large-finnish-v2/roberta-large-finnish-v2/pytorch_model.bin --config roberta-large-finnish-v2/roberta-large-finnish-v2/config.json --out lilt-roberta-large-finnish-v2

But when I try to load the model, I get the following error:
image

Do you have any idea what might cause this and how it could be fixed?
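The error itself is only available as a screenshot, so the cause is unknown. One thing that may be worth checking, as an assumption rather than a diagnosis: the command combines a base-sized layout checkpoint with a large-sized text model, and the two streams may need matching depth and head counts. The sketch below compares two config dictionaries; the values are typical base and RoBERTa-large settings used for illustration, not values read from the actual files, and `config_mismatches` is a hypothetical helper.

```python
def config_mismatches(lilt_config, text_config,
                      keys=("num_hidden_layers", "num_attention_heads")):
    """Return {key: (lilt_value, text_value)} for every key that differs."""
    return {
        key: (lilt_config.get(key), text_config.get(key))
        for key in keys
        if lilt_config.get(key) != text_config.get(key)
    }


# Illustrative values only; read the real ones from each config.json.
lilt_base = {"num_hidden_layers": 12, "num_attention_heads": 12}
roberta_large = {"num_hidden_layers": 24, "num_attention_heads": 16}

mismatches = config_mismatches(lilt_base, roberta_large)
```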

large checkpoint

Hello,
Thanks for making the code available!

I was wondering whether a lilt-only-large checkpoint is available, or whether there is an easy way to generate one.
I would like to combine LiLT with a large XLM-RoBERTa to see the potential gain between base and large.

Thanks!

How to decrease inference time of LiLT?

Hi,

I'm using Hugging Face libraries to run LiLT.
How can I decrease inference time? Which code should I use?

I've already tried BetterTransformer (Optimum) and ONNX, but neither accepts the LiLT model.

  • BetterTransformer: NotImplementedError: The model type lilt is not yet supported to be used with BetterTransformer.
  • ONNX: KeyError: "lilt is not supported yet.

Thank you.

Note: I asked this question here, too: NielsRogge/Transformers-Tutorials#284

RuntimeError: CUDA error: device-side assert triggered

FUNSD, lilt-roberta-en-base
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
7%|▋ | 143/2000 [00:28<06:06, 5.06it/s]

While looking for the cause, I found a feature whose size is 625: batch["input_ids"].shape = (1, 625). Could you tell me how to fix it? Thanks a lot!
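A device-side assert raised inside `torch.embedding` usually means an index fell outside an embedding table. With a 625-token input, one likely cause (an assumption, not a confirmed diagnosis) is that position indices exceed the model's position-embedding table. The sketch below flags offending features on the CPU before they reach the GPU. `find_invalid_features` is a hypothetical helper, and `max_tokens=512` and the vocabulary size in the demo are assumed values that should be read from the checkpoint's config.

```python
def find_invalid_features(batch_input_ids, vocab_size, max_tokens=512):
    """Return a list of (feature_index, reason) for inputs that cannot be embedded."""
    problems = []
    for index, ids in enumerate(batch_input_ids):
        if len(ids) > max_tokens:
            problems.append((index, f"length {len(ids)} exceeds {max_tokens}"))
        out_of_range = [t for t in ids if not 0 <= t < vocab_size]
        if out_of_range:
            problems.append(
                (index, f"token id {out_of_range[0]} outside [0, {vocab_size})")
            )
    return problems


# First feature has 625 tokens, like the one in the report; second is fine.
batch = [[0] + [5] * 623 + [2], [0, 7, 2]]
problems = find_invalid_features(batch, vocab_size=50265)
```

Features flagged for length would then need truncation or splitting into windows during preprocessing.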

Recommendations for inference and further fine-tuning

Hi,

Just got my XFUND-ES finetuning job to work! While I wait, I am trying to work my way through the code to create an inference script. Would running the run_xfun_ser.py script with --do_predict (and passing the checkpoint I obtained by finetuning) suffice? Also, if I wanted to create my own dataset to finetune further, would you recommend a transfer learning approach starting from this XFUND fine-tuned checkpoint or should I go straight to create the custom dataset?

Thank you very much in advance. This is great work!

Improve relation extraction

Hi @jpWang , thanks for your repo,

I have used it for my project: extracting keys and values from document types with complicated layouts.

  1. The NER model looks good.
  2. The RE model does not work well.
    Example outputs of the RE model: Q1 -> A2, Q2 -> A2, Q3 -> A1

I have an idea to improve the RE model:
As far as I know, RE relies on the semantics of the language to learn relation classification.
From my point of view, relations could be learned from position (position embeddings) plus language semantics to improve relation classification.

The goal is to get a good result like the one below:
image

What do you think about my idea?
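As an illustration of what "position" could contribute, here is a minimal sketch of geometric features for a candidate question/answer pair, computed from their bounding boxes. It is not part of the LiLT relation-extraction head; the `(x0, y0, x1, y1)` box format and the name `relative_box_features` are assumptions made for this example.

```python
def relative_box_features(question_box, answer_box):
    """Describe where the answer box sits relative to the question box."""
    qx = (question_box[0] + question_box[2]) / 2
    qy = (question_box[1] + question_box[3]) / 2
    ax = (answer_box[0] + answer_box[2]) / 2
    ay = (answer_box[1] + answer_box[3]) / 2

    dx = ax - qx  # positive when the answer is to the right
    dy = ay - qy  # positive when the answer is below (y grows downward)
    return {
        "dx": dx,
        "dy": dy,
        "answer_right_of_question": dx > 0,
        "answer_below_question": dy > 0,
    }


features = relative_box_features((10, 10, 50, 20), (60, 10, 120, 20))
```

Features like these could be concatenated with the semantic representation of each pair before relation classification, which is one way to read the proposal above.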
