Comments (7)
The bug has been fixed. The issue is caused by the latest PyTorch. Change
prevK = bestScoresId / numWords
to
prevK = bestScoresId // numWords
at
Line 70 in af58637
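The change matters because recent PyTorch versions perform true division when `/` is applied to integer tensors, returning floats that can no longer be used for indexing. A minimal illustration with hypothetical beam-search values:

```python
import torch

# Hypothetical flat indices over a (beam, vocab) layout
best_scores_id = torch.tensor([7, 12, 25])
num_words = 10

# "/" on integer tensors now does true division and yields a float tensor,
# which breaks downstream index arithmetic:
true_div = best_scores_id / num_words   # tensor([0.7000, 1.2000, 2.5000])

# Floor division "//" keeps an integer result and recovers the beam index:
prev_k = best_scores_id // num_words    # tensor([0, 1, 2])
```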
from ea-vq-vae.
Thank you @guody5. I'll check it out. If I understand the system properly, you are finding a vector that distinguishes the sense of an inference: "A goes to the bank => A catches a fish" is closer to vector X, while "A goes to the bank => A takes out some money" is closer to vector Y. You then map background knowledge that is closer to vector X or Y to the inference. So given the input rules and the vector Y, you find a story snippet such as "john went to the teller" and then infer "X takes out some money". Because you are using a generator, the theory is that you will be able to generate new, more reasonable inferences given new types of rules not yet seen by the system? Is this the gist of the system?
My question is: why don't you just do clustering on the context vectors? This would not cost the extra steps of computing the hidden vector X or Y, and you could then do a lookup by vector similarity ("A goes to the bank" (as in fish) matches "John went fishing") and train the generator to infer "A catches a fish". This is similar to neural Q/A systems, and much simpler... Am I missing something?
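The clustering idea could be sketched as plain k-means over context vectors. This is a pure-NumPy illustration with hypothetical dimensions and random data; the `kmeans` helper is written here for illustration and is not part of the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical context vectors for premise sentences (from some encoder)
contexts = rng.normal(size=(200, 16))

def kmeans(x, k, iters=20):
    # Plain k-means: assign each vector to its nearest centroid,
    # then recompute each centroid as its cluster mean.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(x[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels

centroids, labels = kmeans(contexts, k=4)
# Each cluster would group contexts sharing a sense; a new context vector
# can then be matched to its nearest centroid for retrieval.
```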
Here, I try to understand your meaning.
- In our method for training the generator, we first convert XY = "A goes to the bank => A catches a fish" into h_{xy}, then find the closest vector X. Finally, we find background knowledge according to X (closest distance).
- In your method for training the generator, we directly convert "A goes to the bank => A catches a fish" into h_{xy}, and then find background knowledge according to h_{xy} (closest distance).
I am not sure whether I correctly understand your meaning.
The reason to use VQ-VAE is that inferences are unseen in the inference phase (mentioned in Section 3.3.1). That is to say, we can only use "A goes to the bank" in the inference phase. In our method, we first convert X = "A goes to the bank" into h_x and find the top-k vectors [X_1, X_2, X_3, ..., X_k] in Equation 3. These help find different background knowledge (such as "john went to the teller" and "John went fishing") to infer from, because each vector X_i carries different semantics.
For the clustering method, we first convert "A goes to the bank" into h_x and then do a lookup by vector similarity. First, this is inconsistent between the training phase (h_{xy}) and the inference phase (h_x). Second, it is hard to find background knowledge with different semantics using only h_x, compared with [X_1, X_2, X_3, ..., X_k].
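The top-k lookup described above can be sketched as a nearest-neighbour search over a codebook of latent vectors. This is an illustrative stand-in, not the paper's exact Equation 3: Euclidean distance replaces whatever scoring the posterior uses, and all sizes are hypothetical:

```python
import torch

torch.manual_seed(0)
codebook = torch.randn(128, 16)  # hypothetical codebook of latent vectors
h_x = torch.randn(16)            # hypothetical encoding of the premise X

# Distance from h_x to every codebook entry; the k closest vectors play the
# role of [X_1, ..., X_k], each pointing at different background knowledge.
dists = torch.cdist(h_x.unsqueeze(0), codebook).squeeze(0)
topk = torch.topk(dists, k=5, largest=False)
candidates = codebook[topk.indices]  # shape (5, 16)
```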
Hi @guody5. Thank you for your succinct explanation. I understand that we don't want to feed in h_xy to find the background knowledge, because the task is: given h_x and the context vector, find background knowledge and infer y. Sorry if I misunderstand.
If the goal is to find different reasonable inferences Y from ambiguous Xs, my proposed method is to find a vector for X that is contextual. So you would transform "A person goes to the bank" in context 1 into "A person goes to the river" (what I meant by "A goes to bank (as in fish)" above), based on WSD for low-frequency words, for example. This could be done neurally by running "A person goes to the bank => the person catches some fish" through a masked language model and finding the top-k tokens for the slot "bank" that don't contain the word "bank".
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base").eval()
if torch.cuda.is_available():
    model = model.cuda().half()

input_txt = ["A person goes to the bank => A person catches a fish",
             "A person goes to the bank => A person takes out some money"]
masked_word = "bank"
all_outputs = []
with torch.no_grad():
    for txt in input_txt:
        inputs = tokenizer(txt, return_tensors="pt", add_special_tokens=True)
        input_ids = inputs.input_ids.to(model.device)
        logits = model(input_ids=input_ids, return_dict=True).logits[0]
        # locate the target word instead of hardcoding its position;
        # replacing it with tokenizer.mask_token first would also work
        word_id = tokenizer.encode(" " + masked_word, add_special_tokens=False)[0]
        pos = (input_ids[0] == word_id).nonzero()[0].item()
        sorted_idx = logits[pos].argsort(descending=True)
        output = []
        for k in range(10):
            predicted_token = tokenizer.convert_ids_to_tokens(
                [sorted_idx[k].item()])[0].replace("Ġ", "")
            # skip predictions that still contain the original word
            if masked_word.lower() in predicted_token.lower():
                continue
            output.append(txt.split("=>")[0].replace(masked_word, predicted_token).strip())
        all_outputs.append(output)
print(all_outputs)
Produces: [['A person goes to the water', 'A person goes to the river', 'A person goes to the back', 'A person goes to the lake', 'A person goes to the corner'], ['A person goes to the branch', 'A person goes to the money', 'A person goes to the trust', 'A person goes to the account', 'A person goes to the check']]
Convert the above to context vectors h_x that are contextual to h_y, and match background knowledge to h_x. Given the background knowledge (and perhaps the X), train the generator to output "The person catches some fish". This is similar to some Q/A systems that use background knowledge: "A person goes to the river" is basically the Q, and the answer is "The person catches fish", using background knowledge. Using this paradigm, you can do other things explained in the literature for these types of Q/A systems, like mapping the background knowledge and question to (smaller) vectors that are close to each other for better retrieval, using clustering to refine the mapping, etc.
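The retrieval step in this paradigm can be sketched as a cosine-similarity lookup on normalized vectors. All encodings and sizes here are hypothetical placeholders; a real system would produce them with a trained sentence encoder:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical encodings: the disambiguated question ("A person goes to
# the river") and a small store of background-knowledge snippets.
question = F.normalize(torch.randn(1, 32), dim=-1)
knowledge = F.normalize(torch.randn(100, 32), dim=-1)

# On unit vectors, cosine similarity is a dot product; the top-scoring
# rows are the snippets handed to the generator.
scores = (knowledge @ question.T).squeeze(-1)
top = torch.topk(scores, k=3).indices
```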
What I think would also be interesting from your work (and potentially the above) is generating a dataset of common-sense inferences from the background knowledge itself. So given your system, you could create a dataset such as "Tom went to the river" yielding "A person goes to the bank => the person catches some fish", and then see what new inferences the system makes with new background knowledge, like "Jane went to the corner market" yielding "A person goes to the store => the person buys some food."
@ontocord Your method is great and I think it's also a good way to solve the task. Thanks for your promising method and your code.
Thanks for the input @guody5. I'll play around with this and report back.
I will close this issue. If you have any problems, feel free to ask.