Comments (6)
No, the disambiguation model does not need such a mapping, because you are already specifying the mention.
You should do something like this:
sentences = [
    "[START_ENT] Leonardo [END_ENT] was a painter while Leonardo Di Caprio is an actor",
    "Leonardo was a painter while [START_ENT] Leonardo Di Caprio [END_ENT] is an actor",
]

# generate a set of tries, one for each sentence, with the correct candidates;
# in this case the candidate set is the same for both sentences, but in general
# it is different for each sentence in your batch
tries = {
    _id: Trie([
        [2] + model.encode(e)[1:].tolist()
        for e in candidates
    ])
    for _id, candidates in enumerate([
        ["Leonardo Di Caprio", "Leonardo Da Vinci"],
        ["Leonardo Di Caprio", "Leonardo Da Vinci"],
    ])
}

out = model.sample(
    sentences,
    prefix_allowed_tokens_fn=lambda batch_id, sent: tries[batch_id].get(sent.tolist()),
)
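For readers following along: as I understand it, genre.trie.Trie acts as a prefix trie over token-id sequences, and prefix_allowed_tokens_fn asks it, at each decoding step, which token ids may follow the tokens generated so far (the leading 2 is, to my understanding, a fixed special-token id of the underlying BART vocabulary). A minimal sketch of that idea, where SimpleTrie is illustrative rather than the library class and the token ids 10/11/12 are made up:

```python
from typing import Dict, List

class SimpleTrie:
    """Minimal prefix trie mirroring the behaviour assumed of genre.trie.Trie."""

    def __init__(self, sequences: List[List[int]]):
        self.root: Dict[int, dict] = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def get(self, prefix: List[int]) -> List[int]:
        """Return the token ids allowed after the given prefix."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # prefix matches no candidate: nothing is allowed
            node = node[tok]
        return list(node.keys())

# two made-up candidates sharing the prefix [2, 10]
trie = SimpleTrie([[2, 10, 11], [2, 10, 12]])
trie.get([2, 10])  # after [2, 10], both 11 and 12 are still allowed
```

Constrained beam search then only ever expands hypotheses whose next token is in the returned set, which is why generation can never leave the candidate list.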
from genre.
Thanks @nicola-decao, can you please tell me how to find more entities in a sentence? I've seen the example you posted and I was trying this one:
sentences = ["[START_ENT]Leonardo[END_ENT] was a painter while [START_ENT]Leonardo Di Caprio[END_ENT] is an actor"]

out = model.sample(
    sentences,
    prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),
)
But the result is:
[[{'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8849)}],
[{'text': 'Leonardo Di Cosmo', 'logprob': tensor(-2.1244)}],
[{'text': 'Leonardo DiCaprio filmography', 'logprob': tensor(-2.2759)}],
[{'text': 'Leonardo Di Cesare', 'logprob': tensor(-2.5289)}],
[{'text': 'Léonardo Meindl', 'logprob': tensor(-4.0481)}]]
Maybe I'm making some mistake in the declaration of the entities. And another question: is it possible to add candidates, as for the other models, in order to ease the disambiguation? (In this case, for example, I would have passed Leonardo Di Caprio and Leonardo Da Vinci as possibilities.)
Thank you!
The disambiguation model only handles one disambiguation input at a time. So you either pass "[START_ENT] Leonardo [END_ENT] was a ..." or "Leonardo was a painter while [START_ENT] Leonardo Di Caprio [END_ENT] is..".
In addition, you need to put a space between the delimiters and the mention (see my sentences above).
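To make the spacing requirement concrete, here is a tiny hypothetical helper (mark_mention is my own name, not part of GENRE) that wraps a mention with properly spaced delimiters given its character offsets:

```python
def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap the mention at text[start:end] with spaced [START_ENT]/[END_ENT] delimiters."""
    return f"{text[:start]}[START_ENT] {text[start:end]} [END_ENT]{text[end:]}"

s = "Leonardo was a painter"
mark_mention(s, 0, 8)  # '[START_ENT] Leonardo [END_ENT] was a painter'
```

Building the input programmatically like this avoids accidentally gluing the delimiters to the mention.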
Ok, thank you @nicola-decao, I think I'm almost there. Just some final questions for my practical use:
- Is the [2] you put in the Trie dependent on the number of sentences I write, or is it a fixed encoding to keep?
- I'm printing the results and I have the following:
[[{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.0639)},
{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.4777)}],
[{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.6667e+08)}],
[{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8483)}],
[{'text': 'Leonardo Da Vinci', 'logprob': tensor(-2.2151)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)}],
[{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)}]]
I'm seeing that the sample function always returns 5 results (even if I put other sentences), so I think that if I put only two possibilities (like Leonardo DiCaprio and Leonardo Da Vinci), they will be "repeated" in the 5 results, am I right?
I also see that, in this case, these 5 results are lists of 2 elements each (since I put two sentences), while if I put only 1 sentence they are lists of 1 element (which seems right). How do I have to read these values? I thought that the two sentences were "independent" (or not?).
Indeed the first element of the first result, {'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.0639)}, is higher than the first element of the third result, {'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)}, BUT the second element of the first result, {'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.4777)}, is lower than the second element of the third result, {'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8483)}.
So I tried to read the first and the second element of the 5 results separately (as if they were not correlated; I don't know if that is the correct thought):

import torch

number_of_labels = len(out[0])
labels = {}
for choice in out:
    for i, label in enumerate(choice):
        if i not in labels or torch.gt(label['logprob'], labels[i]['logprob']):
            labels[i] = {"text": label['text'], "logprob": label['logprob']}
labels
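The aggregation loop above can also be written more compactly with max and a key function; a runnable sketch using dummy results (plain floats in place of torch tensors, and the names out/best are illustrative):

```python
# Dummy stand-in for the model output: two result lists,
# each with one hypothesis per position.
out = [
    [{'text': 'Leonardo Da Vinci', 'logprob': -1.0639},
     {'text': 'Leonardo DiCaprio', 'logprob': -1.4777}],
    [{'text': 'Leonardo Da Vinci', 'logprob': -1.6667e+08},
     {'text': 'Leonardo DiCaprio', 'logprob': -0.8483}],
]

# For each position i, keep the hypothesis with the highest logprob
# across all result lists.
best = {
    i: max((choice[i] for choice in out), key=lambda h: h['logprob'])
    for i in range(len(out[0]))
}
# best[0]['text'] == 'Leonardo Da Vinci', best[1]['text'] == 'Leonardo DiCaprio'
```

Whether reading positions independently is actually the right interpretation of the output is exactly the open question in this thread, so treat this only as a cleaner form of the same heuristic.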
obtaining:
{0: {'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.0639)},
1: {'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8483)}}
and it seemed good! Because in my first sentence I have Leonardo Da Vinci, while in the second one Leonardo DiCaprio.
But adding another row, like this:

sentences = [
    "Leonardo was a painter while [START_ENT] Leonardo Di Caprio [END_ENT] is an actor",
    "[START_ENT] Leonardo [END_ENT] was a painter while Leonardo Di Caprio is an actor",
    "[START_ENT] Brown [END_ENT] was an American singer, songwriter, dancer, musician, record producer, and bandleader",
]

# generate a set of tries, one for each sentence, with the correct candidates;
# here the first two sentences share the same candidate set while the third has its own
tries = {
    _id: Trie([
        [2] + dmodel.encode(e)[1:].tolist()
        for e in candidates
    ])
    for _id, candidates in enumerate([
        ["Leonardo DiCaprio", "Leonardo Da Vinci"],
        ["Leonardo DiCaprio", "Leonardo Da Vinci"],
        ["Kwame Brown", "James Brown"],
    ])
}

out = dmodel.sample(
    sentences,
    prefix_allowed_tokens_fn=lambda batch_id, sent: tries[batch_id].get(sent.tolist()),
)
I have a mixed result:
[[{'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8483)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-2.2151)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)}],
[{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.0638)}],
[{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.4777)},
{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)},
{'text': 'Leonardo DiCaprio', 'logprob': tensor(-1.6667e+08)}],
[{'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.6667e+08)},
{'text': 'James Brown', 'logprob': tensor(-0.0789)},
{'text': 'Kwame Brown', 'logprob': tensor(-4.0590)}],
[{'text': 'Kwame Brown', 'logprob': tensor(-2.0000e+08)},
{'text': 'James Brown', 'logprob': tensor(-3.3333e+08)},
{'text': 'James Brown', 'logprob': tensor(-3.3333e+08)}]]
In the third element of the first result I still have Leonardo..., while I expected to find only James Brown or Kwame Brown; indeed my script gives a wrong result:
{0: {'text': 'Leonardo DiCaprio', 'logprob': tensor(-0.8483)},
1: {'text': 'James Brown', 'logprob': tensor(-1.7946)},
2: {'text': 'Leonardo Da Vinci', 'logprob': tensor(-1.0638)}}
Maybe I have to use one line at a time in order to avoid confusion?
Thank you in advance for the explanation, it's starting to work 😊
The best way to do disambiguation would be to use the model trained for disambiguation 😅
Look at this to see how it works: https://github.com/facebookresearch/GENRE/tree/main/examples_genre#example-entity-disambiguation
Ah ok, so I will pass multiple sentences!
So I deduce that hints like:

mention_to_candidates_dict = {
    "Leonardo": ["Leonardo Di Caprio", "Leonardo Da Vinci"],
    "Leonardo Di Caprio": ["Leonardo Di Caprio", "Leonardo Da Vinci"],
}

are not available for the disambiguation model, are they?
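For what it's worth, such a dict can still be put to use with the disambiguation model by looking up each sentence's marked mention and deriving the per-sentence candidate list from it (which can then be turned into a per-sentence Trie as in the snippets above). A small sketch, where candidates_for is a hypothetical helper and not a GENRE function:

```python
import re

mention_to_candidates_dict = {
    "Leonardo": ["Leonardo Di Caprio", "Leonardo Da Vinci"],
    "Leonardo Di Caprio": ["Leonardo Di Caprio", "Leonardo Da Vinci"],
}

def candidates_for(sentence: str) -> list:
    """Extract the [START_ENT] ... [END_ENT] mention and look up its candidates."""
    m = re.search(r"\[START_ENT\] (.+?) \[END_ENT\]", sentence)
    return mention_to_candidates_dict.get(m.group(1), []) if m else []

candidates_for("[START_ENT] Leonardo [END_ENT] was a painter")
# → ['Leonardo Di Caprio', 'Leonardo Da Vinci']
```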