Comments (7)
The bug has been fixed. The issue is caused by the latest PyTorch. Change
prevK = bestScoresId / numWords
to
prevK = bestScoresId // numWords
at
Line 70 in af58637
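The change matters because recent PyTorch versions perform true division when `/` is applied to integer tensors, returning floats that can no longer be used for indexing. A minimal illustration with hypothetical beam-search values:

```python
import torch

# Hypothetical flat indices over a (beam, vocab) layout
best_scores_id = torch.tensor([7, 12, 25])
num_words = 10

# "/" on integer tensors now does true division and yields a float tensor,
# which breaks downstream index arithmetic:
true_div = best_scores_id / num_words   # tensor([0.7000, 1.2000, 2.5000])

# Floor division "//" keeps an integer result and recovers the beam index:
prev_k = best_scores_id // num_words    # tensor([0, 1, 2])
```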
from ea-vq-vae.
Thank you @guody5. I'll check it out. If I understand the system properly, you are finding a vector that distinguishes the sense of an inference: "A goes to the bank => A catches a fish" is closer to vector X, while "A goes to the bank => A takes out some money" is closer to vector Y. You then map background knowledge that is closer to vector X or Y to the inference. So given the input rules and the vector Y, you find a story snippet such as "john went to the teller" and then infer "X takes out some money". Because you are using a generator, the theory is that you will be able to generate new, more reasonable inferences given new types of rules not yet seen by the system? Is this the gist of the system?
My question is: why don't you just do clustering on the context vectors? This would not cost the extra steps of computing the hidden vector X or Y, and you could then do a lookup by vector similarity ("A goes to the bank" (as in fish) matches "John went fishing") and train the generator to infer "A catches a fish". This is similar to neural Q/A systems, and much simpler... Am I missing something?
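The clustering idea could be sketched as plain k-means over context vectors. This is a pure-NumPy illustration with hypothetical dimensions and random data; the `kmeans` helper is written here for illustration and is not part of the repository:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical context vectors for premise sentences (from some encoder)
contexts = rng.normal(size=(200, 16))

def kmeans(x, k, iters=20):
    # Plain k-means: assign each vector to its nearest centroid,
    # then recompute each centroid as its cluster mean.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(x[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels

centroids, labels = kmeans(contexts, k=4)
# Each cluster would group contexts sharing a sense; a new context vector
# can then be matched to its nearest centroid for retrieval.
```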
Here, I try to understand your meaning.
- In our method for training the generator, we first convert XY = "A goes to the bank => A catches a fish" into h_{xy}, then find the closest vector X. Finally, we find background knowledge according to X (closest distance).
- In your method for training the generator, we directly convert "A goes to the bank => A catches a fish" into h_{xy}, and then find background knowledge according to h_{xy} (closest distance).
I am not sure whether I correctly understand your meaning.
The reason to use VQ-VAE is that inferences are unseen in the inference phase (mentioned in Section 3.3.1). That is to say, we can only use "A goes to the bank" in the inference phase. In our method, we first convert X = "A goes to the bank" into h_x and find the top-k vectors [X_1, X_2, X_3, ..., X_k] in Equation 3. These help find different background knowledge (such as "john went to the teller" and "John went fishing") to infer from, because each vector X_i carries different semantics.
For the clustering method, we first convert "A goes to the bank" into h_x and then do a lookup by vector similarity. First, this is inconsistent between the training phase (h_{xy}) and the inference phase (h_x). Second, it is hard to find background knowledge with different semantics using only h_x, compared with [X_1, X_2, X_3, ..., X_k].
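The top-k lookup described above can be sketched as a nearest-neighbour search over a codebook of latent vectors. This is an illustrative stand-in, not the paper's exact Equation 3: Euclidean distance replaces whatever scoring the posterior uses, and all sizes are hypothetical:

```python
import torch

torch.manual_seed(0)
codebook = torch.randn(128, 16)  # hypothetical codebook of latent vectors
h_x = torch.randn(16)            # hypothetical encoding of the premise X

# Distance from h_x to every codebook entry; the k closest vectors play the
# role of [X_1, ..., X_k], each pointing at different background knowledge.
dists = torch.cdist(h_x.unsqueeze(0), codebook).squeeze(0)
topk = torch.topk(dists, k=5, largest=False)
candidates = codebook[topk.indices]  # shape (5, 16)
```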
Hi @guody5. Thank you for your succinct explanation. I understand that we don't want to feed in h_xy to find the background knowledge, because the task is: given h_x and the context vector, find background knowledge and infer y. Sorry if I misunderstand.
If the goal is to find different reasonable inferences Y from ambiguous Xs, my proposed method is to find a vector for X that is contextual. So you would transform "A person goes to the bank" in context 1 into "A person goes to the river" (what I meant by "A goes to bank (as in fish)" above), based on WSD for low-frequency words, for example. This could be done neurally by running "A person goes to the bank => the person catches some fish" through a masked language model and finding the top-k tokens for the slot "bank" that don't contain the word "bank".
from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base").eval()
if torch.cuda.is_available():
    model = model.cuda().half()

input_txt = ["A person goes to the bank => A person catches a fish",
             "A person goes to the bank => A person takes out some money"]
masked_word = "bank"
all_outputs = []
with torch.no_grad():
    for txt in input_txt:
        inputs = tokenizer(txt, return_tensors="pt", add_special_tokens=True)
        input_ids = inputs.input_ids.to(model.device)
        logits = model(input_ids=input_ids, return_dict=True).logits[0]
        # locate the target word instead of hardcoding its position;
        # replacing it with tokenizer.mask_token first would also work
        word_id = tokenizer.encode(" " + masked_word, add_special_tokens=False)[0]
        pos = (input_ids[0] == word_id).nonzero()[0].item()
        sorted_idx = logits[pos].argsort(descending=True)
        output = []
        for k in range(10):
            predicted_token = tokenizer.convert_ids_to_tokens(
                [sorted_idx[k].item()])[0].replace("Ġ", "")
            # skip predictions that still contain the original word
            if masked_word.lower() in predicted_token.lower():
                continue
            output.append(txt.split("=>")[0].replace(masked_word, predicted_token).strip())
        all_outputs.append(output)
print(all_outputs)
Produces: [['A person goes to the water', 'A person goes to the river', 'A person goes to the back', 'A person goes to the lake', 'A person goes to the corner'], ['A person goes to the branch', 'A person goes to the money', 'A person goes to the trust', 'A person goes to the account', 'A person goes to the check']]
Convert the above to context vectors h_x that are contextual to h_y, and match background knowledge to h_x. Given the background knowledge (and perhaps the X), train the generator to output "The person catches some fish". This is similar to some Q/A systems that use background knowledge: "A person goes to the river" is basically the Q, and the answer is "The person catches fish", using background knowledge. Using this paradigm, you can do other things explained in the literature for these types of Q/A systems, like mapping the background knowledge and question to (smaller) vectors that are close to each other for better retrieval, using clustering to refine the mapping, etc.
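The retrieval step in this paradigm can be sketched as a cosine-similarity lookup on normalized vectors. All encodings and sizes here are hypothetical placeholders; a real system would produce them with a trained sentence encoder:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical encodings: the disambiguated question ("A person goes to
# the river") and a small store of background-knowledge snippets.
question = F.normalize(torch.randn(1, 32), dim=-1)
knowledge = F.normalize(torch.randn(100, 32), dim=-1)

# On unit vectors, cosine similarity is a dot product; the top-scoring
# rows are the snippets handed to the generator.
scores = (knowledge @ question.T).squeeze(-1)
top = torch.topk(scores, k=3).indices
```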
What I think would also be interesting from your work (and potentially the above) is generating a dataset of common-sense inferences from the background knowledge itself. So given your system, you could create a dataset such as "Tom went to the river" yielding "A person goes to the bank => the person catches some fish", and then see what new inferences the system makes with new background knowledge, like "Jane went to the corner market" yielding "A person goes to the store => the person buys some food."
@ontocord Your method is great and I think it's also a good way to solve the task. Thanks for your promising method and your code.
Thanks for the input @guody5. I'll play around with this and report back.
I will close this issue. If you have any problems, feel free to ask.