
Comments (4)

atreyasha avatar atreyasha commented on August 14, 2024

I am working on a similar sequence-tagging task for argument candidate identification. Essentially, BERT or ALBERT performs the encoding of the raw input; then you need a layer on top of BERT|ALBERT to decode the representations into the desired target.

I would essentially follow this example here: https://github.com/kpe/bert-for-tf2/blob/master/examples/gpu_movie_reviews.ipynb

Under create_model, you would need to modify the layers after the BERT|ALBERT layer to map to your output sequence dimension. I will probably do this task in another repo and can post some results soon.

@kpe you mentioned in #30 to ignore the activations of the padding in the output layer, would you also suggest doing this for a sequence tagging task? If so, how would you propose doing this in the output layer?

Also, thank you for this awesome repo. Minor issue though: under NEWS on the readme, I think the first entry should be 6th Jan 2020. Just a minor thing, no biggie :)

from bert-for-tf2.

harrystuart avatar harrystuart commented on August 14, 2024

Any update on NER tasks with this library?


yingchengsun avatar yingchengsun commented on August 14, 2024

If there were a NER example for this library, that would be very helpful!


ptamas88 avatar ptamas88 commented on August 14, 2024

Hi,
Since I managed to use this library for a NER task, I am happy to share my experience.
Sorry, I can't share the whole code, but I'll try to explain the key parts.

  1. The input text is tokenized by the tokenizer module and padded to a specified max length (in my case 200 tokens at most)
  2. For each token, the output tag is transformed into a one-hot vector; if the tokenizer broke one word into multiple tokens, I used the corresponding tag for the first token and [MASK] for the remaining pieces of the original word
  3. So if I have X sentences in the training set, the input shape is (X, 200), where 200 is the padded length of each sentence. In this case the output shape is (X, 200, NUMBER_OF_TAGS). NUMBER_OF_TAGS is the number of your entity types, which depends on whether you use BIOE or just BIO, plus the special tokens [CLS], [PAD], [MASK]. In my case the tags are:
    ['B-ORG', 'I-ORG', 'B-MISC', 'I-MISC', 'B-LOC', 'I-LOC', 'B-PER', 'I-PER', 'O', '[CLS]', '[MASK]', '[PAD]'].
    This way my shapes are (X, 200) and (X, 200, 12)
  4. Load the BERT model the same way as in the classification example, but use a different architecture for the remaining layers, since this is not just classification. This is basically the example code from the package description with a little tweak:
import tensorflow as tf
import bert as bert_tf2  # pip install bert-for-tf2

# bert_params is created earlier, e.g. from the pretrained checkpoint config
bert_layer = bert_tf2.BertModelLayer.from_params(bert_params, name="bert")

input_ids = tf.keras.layers.Input(shape=(200,), dtype='int32')  # (batch, 200)
output = bert_layer(input_ids)                                  # (batch, 200, hidden)
# one softmax over the 12 tags at every time step
output = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(units=12, activation='softmax'))(output)
model = tf.keras.models.Model(inputs=input_ids, outputs=output)

model.build(input_shape=(None, 200))

bert_layer.apply_adapter_freeze()
bert_layer.embeddings_layer.trainable = False

The magic here is the TimeDistributed wrapper layer.
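For concreteness, steps 1–3 (sub-word tag alignment, padding, and one-hot encoding) can be sketched in plain Python; the helper name and the toy word-piece input below are illustrative only, not part of the library:

```python
# Sketch of steps 1-3: put the real tag on the first word piece of each word,
# [MASK] on the remaining pieces, pad to max_len, then one-hot encode.
# `align_and_pad` is a hypothetical helper, not a library function.
TAGS = ['B-ORG', 'I-ORG', 'B-MISC', 'I-MISC', 'B-LOC', 'I-LOC',
        'B-PER', 'I-PER', 'O', '[CLS]', '[MASK]', '[PAD]']
TAG2ID = {t: i for i, t in enumerate(TAGS)}

def align_and_pad(word_pieces, word_tags, max_len=8):
    # word_pieces: one list of sub-tokens per word; word_tags: one tag per word
    tokens, tags = ['[CLS]'], ['[CLS]']
    for pieces, tag in zip(word_pieces, word_tags):
        tokens.append(pieces[0])
        tags.append(tag)               # real tag on the first piece only
        for piece in pieces[1:]:
            tokens.append(piece)
            tags.append('[MASK]')      # remainder of a broken-up word
    tokens += ['[PAD]'] * (max_len - len(tokens))
    tags += ['[PAD]'] * (max_len - len(tags))
    one_hot = [[int(TAG2ID[t] == j) for j in range(len(TAGS))] for t in tags]
    return tokens, one_hot

tokens, y = align_and_pad([['john'], ['jo', '##han', '##sson']],
                          ['B-PER', 'I-PER'])
# tokens → ['[CLS]', 'john', 'jo', '##han', '##sson', '[PAD]', '[PAD]', '[PAD]']
```

In practice the token strings are then mapped to vocabulary ids with the tokenizer, giving the (X, 200) input and (X, 200, 12) target arrays described above.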
My results:
After just 1 epoch on 29k training sentences:
loss: 0.0227 - categorical_accuracy: 0.9933 - val_loss: 0.0042 - val_categorical_accuracy: 0.9988
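One caveat: categorical_accuracy over padded batches also counts the many trivially correct [PAD] positions, so the real per-token tagging accuracy is somewhat lower than the reported number. A minimal sketch of a padding-masked accuracy in plain NumPy, assuming pad_id is the index of '[PAD]' in the 12-tag list above:

```python
import numpy as np

# Sketch: accuracy computed only over non-[PAD] positions. pad_id=11 assumes
# '[PAD]' is the last entry of the 12-tag list; adjust to your tag order.
def masked_accuracy(y_true, y_pred, pad_id=11):
    true_ids = y_true.argmax(-1)      # (batch, seq_len) tag indices
    pred_ids = y_pred.argmax(-1)
    mask = true_ids != pad_id         # ignore padding positions
    return float((true_ids[mask] == pred_ids[mask]).mean())

# toy example: 1 sentence, 4 positions, 12 tags; two positions are padding
true = np.eye(12)[[8, 0, 11, 11]]     # O, B-ORG, PAD, PAD
pred = np.eye(12)[[8, 1, 11, 0]]      # second real token predicted wrong
acc = masked_accuracy(true[None], pred[None])  # → 0.5
```

During training, a similar effect can be achieved in Keras by passing a per-token sample_weight in model.fit that is zero at padding positions.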

So basically, that's it folks :)

