bros's Issues

KeyError: 'bros'

Hi, I am trying to load the model using the Hugging Face transformers library. However, I get a KeyError: 'bros' when trying to load the model with from_pretrained. Specifically, I have the following:
from transformers import AutoModel
model = AutoModel.from_pretrained("naver-clova-ocr/bros-base-uncased")

as stated in the doc https://huggingface.co/naver-clova-ocr/bros-base-uncased/tree/main

The full stack trace:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 560, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/usr/local/envs/fair/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 301, in __getitem__
    raise KeyError(key)
KeyError: 'bros'

I'm using huggingface-hub version 0.1.2 and transformers 4.12.5.

Any thoughts here?
Thanks.
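One workaround in this situation, sketched here under assumptions: transformers 4.12.5 has no "bros" entry in its config mapping, so the Auto classes cannot resolve the checkpoint, but the model classes shipped in this repository's bros package could be used directly. The BrosConfig/BrosModel names and import path below are assumptions inferred from bros/modeling_bros.py; check the repo's bros/__init__.py for the actual exports.

# Hypothetical workaround: bypass AutoModel and use the repo's own classes.
from bros import BrosConfig, BrosModel  # assumed import path

config = BrosConfig.from_pretrained("naver-clova-ocr/bros-base-uncased")
model = BrosModel.from_pretrained("naver-clova-ocr/bros-base-uncased", config=config)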

Change bert model

Hello!

I want to train the model on a Korean dataset. Can I change the BERT model? If so, which part should I modify?

F-score on CORD dataset

Thanks for the excellent work! I am trying to reproduce the results on the CORD dataset. However, I find that the F-scores in your paper differ somewhat from those in the LayoutLMv2 paper: in your paper, LayoutLMv2*-base achieves 96.05 and LayoutLMv2*-large achieves 97.24, while in the LayoutLMv2 paper, LayoutLMv2-base achieves 94.95 and LayoutLMv2-large achieves 96.01. Could you give an example of BROS fine-tuning on the CORD dataset? Thanks!

Question about EL Task Experiment Results

Thank you very much for sharing this great work again!

I have a question from reading the paper carefully. I am curious why, in Table 5 (performance comparisons on EL tasks), there is no comparison between BROS and SPADE, since BROS uses the SPADE decoder for the EL task. I am very interested in that result. Thank you!

Clarification regarding `num_samples_per_epoch`

Could you please clarify whether num_samples_per_epoch in the config files refers to the total number of documents in the training set, or whether it means something else?

I set num_samples_per_epoch to the number of documents in my training set; however, the LR scheduler warmup is not working as expected.

Fine Tuning on Custom Dataset

Thank you very much for sharing this great work!
I was wondering whether there are any instructions on how to prepare custom data for fine-tuning BROS. I understand there is preprocessing code for FUNSD, but summarized instructions would be greatly helpful.
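For orientation, a rough sketch of the kind of per-document annotation the FUNSD preprocessing works from (keys follow the public FUNSD annotation format; the exact fields consumed by this repo's preprocessing scripts may differ):

# FUNSD-style annotation for one document: entities with text, box,
# label, word-level boxes, and entity-level links (illustrative values).
example = {
    "form": [
        {
            "id": 0,
            "text": "Date:",
            "label": "question",
            "box": [57, 102, 120, 123],  # [x1, y1, x2, y2] of the entity
            "words": [{"text": "Date:", "box": [57, 102, 120, 123]}],
            "linking": [[0, 1]],         # link from entity 0 (question) to entity 1 (answer)
        },
        {
            "id": 1,
            "text": "03/17/1998",
            "label": "answer",
            "box": [130, 102, 210, 123],
            "words": [{"text": "03/17/1998", "box": [130, 102, 210, 123]}],
            "linking": [[0, 1]],
        },
    ]
}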

The dataset for CORD linking task

Hello, I am interested in this great work. However, I am a little confused about the linking task in CORD. Is the entity with the category "menu.nm" linked to all the other entities within the same group? Also, do you use "is_key" to split a valid line (often at the bottom of an image) into two entities and then generate a link between them?

Clarification on table 5

Hi there, first of all, thanks for sharing your excellent work.
I have a doubt regarding how you obtained the results in Table 5. In the paper you mention that you don't use the order information, but how do you implement that exactly? Do you remove the 1D absolute positional embeddings from the model? If so, does that require a new pre-training? And finally, I guess you still train with the dataset order of the words and only shuffle the words at test time, is that right?

Thanks in advance!

End2end EE and EL

Hi,
first of all, thanks for the code, that's a great contribution to the community! From the paper I understood that the model could be fine-tuned end-to-end for EE and EL at the same time; however, looking at the code, I don't think it does that, right? Is combined end-to-end EE and EL supported somehow?

Thanks,

How to solve lr = 0 after training 5 epochs

Thank you for the amazing work!
I am training the model with a customized dataset. However, I noticed that after 5 epochs of training the learning rate dropped to 0, which makes it hard for the model to keep learning. Could you please point me to the learning-rate strategy of BROS, and how I might change it for my case? Thanks!

TRAIN [epoch: 0/50] || train_loss: 460.69653 || lr: 4e-05 || time: 193.6 secs.
precision: 0.9080, recall: 0.9023, f1: 0.9052
TRAIN [epoch: 1/50] || train_loss: 129.8502 || lr: 3e-05 || time: 198.6 secs.
precision: 0.9374, recall: 0.9184, f1: 0.9278
TRAIN [epoch: 2/50] || train_loss: 75.951 || lr: 3e-05 || time: 198.0 secs.
precision: 0.9293, recall: 0.9183, f1: 0.9237
TRAIN [epoch: 3/50] || train_loss: 46.87292 || lr: 2e-05 || time: 197.8 secs.
precision: 0.9442, recall: 0.9391, f1: 0.9416
TRAIN [epoch: 4/50] || train_loss: 28.64673 || lr: 1e-05 || time: 197.7 secs.
precision: 0.9444, recall: 0.9392, f1: 0.9418
TRAIN [epoch: 5/50] || train_loss: 16.82515 || lr: 0.0 || time: 197.6 secs.
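For context, one common cause of this pattern is that the scheduler's total-step estimate (derived from values such as num_samples_per_epoch, batch size, and the number of epochs) is smaller than the actual training length, so a linear-decay schedule reaches 0 early. A minimal sketch of such a schedule, as an illustration rather than the repo's exact scheduler:

import torch

# Linear warmup followed by linear decay to 0 at total_steps (illustrative).
# If total_steps is underestimated, the LR hits 0 long before training ends.
def make_linear_scheduler(optimizer, warmup_steps, total_steps):
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)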

Format of label & output in Relation Extraction task

Hi, thanks for the excellent work!
In your repository, I see the format of the relation-extraction labels you build from the dataset (link):
el_labels = np.ones(self.max_seq_length, dtype=int) * self.max_seq_length
el_labels[word_to] = word_from
where el_labels[i] = j means words[j] links to words[i], with words[j] being the question and words[i] the answer.
In this way, one answer word can be linked from only one question word. What happens if there are several question words that all connect to the same answer word? Might that cause lost links (connections)?
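To make the concern concrete, a small illustration with made-up indices: with a single 1-D label array, a second question word pointing at the same answer slot overwrites the first link.

import numpy as np

max_seq_length = 8
el_labels = np.ones(max_seq_length, dtype=int) * max_seq_length  # max_seq_length means "no link"
el_labels[3] = 1  # question word 1 -> answer word 3
el_labels[3] = 2  # question word 2 -> the same answer word 3; the first link is overwritten
print(el_labels)  # [8 8 8 2 8 8 8 8]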

config parameter max_seq_length: 512

Hi!
Thank you for sharing BROS!
I ran into a document where the entities go beyond the limit of 512 tokens.
I do see that BROS has a configuration parameter to extend this limit:

  max_seq_length: 512

but the pre-trained model available on Hugging Face only supports 512 tokens, so will fine-tuning also be limited to 512 tokens?

thank you,
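As a general workaround, and an assumption about usage rather than something this repo documents, documents longer than the checkpoint's 512-token limit are often split into overlapping windows that are encoded separately and merged afterwards. A minimal sketch:

# Split a long token sequence into overlapping windows of at most max_len tokens.
def sliding_windows(token_ids, max_len=512, stride=128):
    windows = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += max_len - stride  # consecutive windows overlap by `stride` tokens
    return windows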

What does the result mean?

  • last_hidden_state
    tensor([[[-0.0342, 0.2487, -0.2819, ..., 0.1495, 0.0218, 0.0484],
    [ 0.0792, -0.0040, -0.0127, ..., -0.0918, 0.0810, 0.0419],
    [ 0.0808, -0.0918, 0.0199, ..., -0.0566, 0.0869, -0.1859],
    [ 0.0862, 0.0901, 0.0473, ..., -0.1328, 0.0300, -0.1613],
    [-0.2925, 0.2539, 0.1348, ..., 0.1988, -0.0148, -0.0982],
    [-0.4160, 0.2135, -0.0390, ..., 0.6908, -0.2985, 0.1847]]],
    grad_fn=<...>)

  • last_hidden_state.shape
    torch.Size([1, 6, 768])
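For orientation, a hedged reading of this output, standard for BERT-style encoders (variable names below are assumptions for illustration): last_hidden_state has shape (batch_size, sequence_length, hidden_size), here (1, 6, 768), i.e. one 768-dimensional contextual embedding per input token.

# Assuming `outputs` is the model output printed above:
last_hidden_state = outputs.last_hidden_state  # shape (1, 6, 768)
per_token_vecs = last_hidden_state[0]          # (6, 768): one vector per token, for token-level tasks
first_token_vec = last_hidden_state[0, 0]      # (768,): first-token embedding, often used for pooling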

MLM Pretraining missing bbox inputs

Hi, Great work on the package.
It seems that some of the model classes, e.g. BrosLMHeadModel, are missing the bbox inputs; an example is below. Correct me if I misunderstood, but I think bbox should be added here.
If you would like, I can put in a PR to fix it here and in the other places, such as BrosForSequenceClassification and BrosPreTrainedModel.

MLM Model input

bros/bros/modeling_bros.py

Lines 1314 to 1318 in eb3aa51

def forward(
self,
input_ids=None,
attention_mask=None,
token_type_ids=None,

Bros Model call in MLM Model

bros/bros/modeling_bros.py

Lines 1378 to 1381 in eb3aa51

outputs = self.bros(
input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
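For reference, a hedged sketch of the change the report proposes (signatures abbreviated; the actual parameter lists in modeling_bros.py are longer, and whether the encoder call accepts bbox as a keyword should be checked against the code):

# Hypothetical sketch of the proposed fix: add bbox to the head model's
# forward signature and pass it through to the underlying BROS encoder.
def forward(self, input_ids=None, bbox=None, attention_mask=None,
            token_type_ids=None, **kwargs):
    outputs = self.bros(
        input_ids,
        bbox=bbox,  # forward the bounding boxes to the encoder
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        **kwargs,
    )
    return outputs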

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

This error occurs while training the model with:
CUDA_VISIBLE_DEVICES=0 python3 train.py --config=configs/custom.yaml

TorchText Issue on Google Colab

Hello,

I am trying to run the fine-tuning scripts for FUNSD on Google Colab. I have installed all the required dependencies from requirements.txt, but when running

!CUDA_VISIBLE_DEVICES=0 python train.py --config=configs/finetune_funsd_ee_bies.yaml

I am getting

OSError: /usr/local/lib/python3.8/dist-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_

I have tried installing torchtext and upgrading PyTorch Lightning accordingly, to no avail.

Any ideas on what could be going on?

Thanks!

How to convert BIO-tagged sequence to SPADE

Hi,

BIO is the dominant tagging strategy for token-classification tasks. Could you provide an explanation of how to convert a BIO-tagged sequence to the SPADE format? This would be useful for fine-tuning the SPADE-based EE model on custom datasets.

I know this can be reverse-engineered from the codebase, but it would be helpful to have a concrete description of:

  • initial_tokens, subsequent_tokens
  • How to obtain them from a BIO-tagged sequence
  • How to "combine" the predicted initial_logits and subsequent_logits to determine the final class prediction for each token.

Thanks.
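A minimal sketch of one way such a conversion could look, assuming a SPADE-style EE head where initial_tokens holds the field class of each entity's first token and subsequent_tokens holds, for every other entity token, a pointer to the previous token of the same entity (the exact label layout and special values used by this repo may differ):

# Hypothetical BIO -> SPADE-style conversion (label layout is an assumption).
def bio_to_spade(bio_tags, field_to_id, no_field_id=0, no_pointer=-1):
    # bio_tags: e.g. ["B-header", "I-header", "O", "B-question", "I-question"]
    n = len(bio_tags)
    initial_tokens = [no_field_id] * n    # field class for entity-initial tokens
    subsequent_tokens = [no_pointer] * n  # pointer to the previous token of the same entity
    prev_in_entity = None
    for i, tag in enumerate(bio_tags):
        if tag.startswith("B-"):
            initial_tokens[i] = field_to_id[tag[2:]]
            prev_in_entity = i
        elif tag.startswith("I-") and prev_in_entity is not None:
            subsequent_tokens[i] = prev_in_entity
            prev_in_entity = i
        else:  # an "O" tag ends any open entity
            prev_in_entity = None
    return initial_tokens, subsequent_tokens

At prediction time the decoding would go the other way: take the argmax of initial_logits to find entity-initial tokens and their field classes, then follow the argmax of subsequent_logits to chain the remaining tokens onto those entities.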

Bounding box clarification

Thanks for contributing this awesome piece of research!

Quick question about the input boxes.

  1. For BROS, is the expected format [x1, y1, x2, y2, x3, y3, x4, y4], where each (x, y) pair is a corner of the bounding box, starting from the top left and going clockwise?

  2. Should each bounding box be normalized by dividing the x values by the page width and the y values by the page height?

I'm training on DocVQA, but the results are not that great. Just trying to make sure I'm doing everything right :)
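For what it's worth, a minimal sketch of the normalization described in point 2, assuming the 8-value clockwise-from-top-left quad format from point 1:

# Normalize a quad box [x1, y1, ..., x4, y4] by the page size (illustrative).
def normalize_quad(quad, page_width, page_height):
    return [
        v / page_width if i % 2 == 0 else v / page_height
        for i, v in enumerate(quad)
    ]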

Inference code for EL task

First of all, thanks for the awesome code! I really want to try it on my own dataset and see how it performs. However, I could not find the inference code anywhere, so I am wondering whether there is a plan to release it. Many thanks!

Correct implementation of RelationExtractor

I find that the implementation of RelationExtractor in this repository is incorrect (with respect to the original one). I'm aware that the implementation is more or less the same as the one in 2005.00642. But after digging into the code in clovaai/spade, I realized the original implementation differs from the paper. Below, I'll refer to that implementation as SPADE, to this repository as BROS, and to the paper as the SPADE paper.

  1. SPADE and the SPADE paper use two score matrices for each relation; BROS has only one.
  2. The SPADE paper and BROS use a threshold to binarize the scores and obtain the adjacency matrix; SPADE uses an element-wise argmax over the two score matrices, so each score matrix acts like a probability of an edge being present or not.
  3. The loss function in SPADE is a weighted cross-entropy, with heavy weight toward the second score matrix (edge present).

My version of RelationExtractor (which has been tested and is able to achieve results roughly equivalent to the original SPADE):

import torch
import torch.nn as nn


class RelationTagger(nn.Module):
    def __init__(self, n_fields, hidden_size):
        super().__init__()
        # Separate linear projections for "head" (source) and "tail" (target) tokens.
        self.head = nn.Linear(hidden_size, hidden_size)
        self.tail = nn.Linear(hidden_size, hidden_size)
        # Learnable embeddings for the field (label) nodes, prepended on the head side.
        self.field_embeddings = nn.Parameter(
            torch.rand(1, n_fields, hidden_size))
        # Two bilinear score matrices: W_label_0 for "no edge", W_label_1 for "edge".
        self.W_label_0 = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_label_1 = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, enc):
        # enc: (batch, seq_len, hidden) encoder outputs.
        enc_head = self.head(enc)
        enc_tail = self.tail(enc)

        batch_size = enc_tail.size(0)
        field_embeddings = self.field_embeddings.expand(batch_size, -1, -1)
        enc_head = torch.cat([field_embeddings, enc_head], dim=1)

        # Bilinear scores, each of shape (batch, n_fields + seq_len, seq_len).
        score_0 = torch.matmul(
            enc_head, self.W_label_0(enc_tail).transpose(1, 2))
        score_1 = torch.matmul(
            enc_head, self.W_label_1(enc_tail).transpose(1, 2))

        # Stack the "no edge" / "edge" scores along a new dimension.
        score = torch.cat([score_0.unsqueeze(1), score_1.unsqueeze(1)], dim=1)
        return score

This implementation works for a single relation, but one can use multiple instances of this layer for multiple relations. The output dimension is b * s * (n + f) * n, where b is the batch size, s = 2 is the number of score matrices, n is the sequence length, and f is the number of fields. The final relation matrix is obtained with score.argmax(dim=1).
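For illustration, a hedged usage sketch of the layer above (shapes only, not tied to this repo's training loop):

# Hypothetical usage: score a batch of encoder outputs.
tagger = RelationTagger(n_fields=4, hidden_size=768)
enc = torch.randn(2, 128, 768)   # (batch, seq_len, hidden)
score = tagger(enc)              # (2, 2, n_fields + seq_len, seq_len) = (2, 2, 132, 128)
relation = score.argmax(dim=1)   # binary adjacency over (fields + tokens) x tokens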
