
neulab / knn-transformers

262.0 4.0 23.0 1.66 MB

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT

License: MIT License

Python 100.00%
Topics: huggingface, knn-lm, language-models, nearest-neighbor, neuro-symbolic, pytorch

knn-transformers's People

Contributors

urialon


knn-transformers's Issues

saving-a-datastore-for-knn-mt section in README is missing a proper dstore_size parameter

The saving-a-datastore-for-knn-mt section in the README is missing the proper --dstore_size parameter. The flag to add:

--dstore_size 26565876 \

current version

MODEL=t5-small

python -u run_translation.py  \
  --model_name_or_path ${MODEL} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_train_batch_size 4 --per_device_eval_batch_size=4 \
  --output_dir checkpoints-translation/${MODEL} \
  --source_lang en --target_lang ro \
  --dstore_dir checkpoints-translation/${MODEL} \
  --save_knnlm_dstore --do_eval --eval_subset train \
  --source_prefix "translate English to Romanian: "

correct version (maybe)

MODEL=t5-small

python -u run_translation.py  \
  --model_name_or_path ${MODEL} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_train_batch_size 4 --per_device_eval_batch_size=4 \
  --output_dir checkpoints-translation/${MODEL} \
  --source_lang en --target_lang ro \
  --dstore_dir checkpoints-translation/${MODEL} \
  --save_knnlm_dstore --do_eval --eval_subset train \
  --dstore_size 26565876 \
  --source_prefix "translate English to Romanian: "

automaton: modify database

Is it possible to modify the database for the automaton at inference time, i.e., add new data on the fly? Or do we need to reconstruct the automaton whenever the database changes?

I am thinking of a scenario where additional information is collected while interacting with the LM.
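To illustrate what on-the-fly extension would involve, here is a minimal, hypothetical sketch (not the repository's actual implementation): a flat datastore of (hidden-state, next-token) pairs can accept new entries at inference time with no rebuild, whereas RetoMaton's precomputed automaton pointers would need to be recomputed for the new entries.

```python
# Toy flat datastore: parallel lists of key vectors and target tokens.
# A flat store can be extended at inference time; RetoMaton's automaton
# pointers, by contrast, would have to be recomputed for new entries.
keys, values = [], []

def add_entry(key, token):
    """Append a new (hidden-state, next-token) pair on the fly."""
    keys.append(key)
    values.append(token)

def knn_lookup(query, k=2):
    """Return the k nearest stored tokens by squared L2 distance."""
    dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: dists[i])[:k]
    return [values[i] for i in order]

add_entry([0.0, 0.0], "the")
add_entry([1.0, 1.0], "cat")
add_entry([0.9, 1.1], "cats")  # added "during inference" -- no rebuild needed
```

A real kNN-MT datastore would use a FAISS index over model hidden states rather than Python lists, but the append-vs-rebuild distinction is the same.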

How to apply knn-transformers to a custom pretrained machine translation model?

Hi @urialon ,

Thank you very much for releasing the source code that applies kNN to the machine translation task. However, only pre-trained models available on the Hugging Face hub seem to be supported.
Recently, I designed and trained a custom NMT model for low-resource machine translation using the PyTorch and Transformers libraries. At inference time, I call my_pretrained_model.generate(input_ids), as in standard translation, to translate a test sentence.
I would like to use your source code to apply kNN to my_pretrained_model at inference time, but since my model is custom, I am not sure how.
Could you guide me in applying your source code to my custom pre-trained NMT model?
Many thanks for your help!

NonMatchingSplitsSizesError

Hi, I encountered a NonMatchingSplitsSizesError when evaluating the fine-tuned model gpt2-finetuned-wikitext103. The same error also appeared when saving a datastore and building the FAISS index myself. Could you indicate how to solve this issue? Thank you very much!


Could KNNSaver support Multi-GPU strategies like DDP?

Hi, @urialon
I am trying to evaluate a model with the DDP strategy but hit an error, because such strategies try to write to the datastore asynchronously.

With a single GPU everything works well, but it is quite slow.

Any idea? Thanks!

The size of the dstore

Hi, I'm wondering how to set knn_args.dstore_size when I use my own data to construct the datastore.
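As a rough guide (an assumption based on how the datastore is described, not an official recipe): dstore_size should equal the total number of (context, next-token) pairs saved, i.e., the token count of the data used to build the datastore. A toy estimate, with whitespace splitting standing in for the model's real tokenizer:

```python
# Hypothetical sketch: estimate dstore_size as the total token count of
# the corpus used to build the datastore. Whitespace split is only a
# stand-in for the model's actual tokenizer, which should be used in
# practice so the count matches what is written to the datastore.
def estimate_dstore_size(corpus_lines, tokenize=str.split):
    return sum(len(tokenize(line)) for line in corpus_lines)

corpus = ["the cat sat on the mat", "dogs chase cats"]
size = estimate_dstore_size(corpus)
print(size)  # 9 tokens -> pass this as --dstore_size
```

With the real tokenizer, run the same count over the split used for --save_knnlm_dstore and pass the result as --dstore_size.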

Unsupported operand type(s) in LM evaluation

Hi, may I ask a question about evaluating kNN-LM and RetoMaton? I used the preprocessed Wikitext-103 datastores and FAISS indexes for gpt2 and distilgpt2 (downloaded from the link) and encountered an "unsupported operand type(s)" error in both cases. Could you suggest possible solutions?

Thank you very much for your kind help!


Performance on neulab/gpt2-large-finetuned-wikitext103

Hi, @urialon
Thanks for your great work.
I just tested neulab/gpt2-large-finetuned-wikitext103 with and without --knn but could not observe an improvement:
ppl 10.5565 vs. ppl 10.6538
Any idea why? Should I tune hyperparameters such as the temperature?
Thanks.
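For intuition on why the temperature and lambda matter, here is a toy sketch of the kNN-LM interpolation (the general formula from the kNN-LM line of work, not this repository's exact code): the final distribution is lmbda * p_kNN + (1 - lmbda) * p_LM, where p_kNN is a softmax over negative neighbor distances scaled by the temperature. A poorly tuned temperature flattens or over-sharpens p_kNN, which can erase the retrieval benefit.

```python
import math

def knn_interpolate(p_lm, neighbors, vocab, lmbda=0.25, temp=1.0):
    """Interpolate LM probabilities with a kNN distribution.

    neighbors: list of (distance, token) pairs retrieved from the datastore.
    p_kNN(w) is a softmax over -distance / temp, summed per token.
    """
    weights = [math.exp(-d / temp) for d, _ in neighbors]
    z = sum(weights)
    p_knn = {w: 0.0 for w in vocab}
    for (d, tok), wgt in zip(neighbors, weights):
        p_knn[tok] += wgt / z
    return {w: lmbda * p_knn[w] + (1 - lmbda) * p_lm[w] for w in vocab}

vocab = ["cat", "dog"]
p_lm = {"cat": 0.5, "dog": 0.5}
# Two retrieved neighbors at equal distance, both voting for "cat":
p = knn_interpolate(p_lm, [(1.0, "cat"), (1.0, "cat")], vocab, lmbda=0.25)
```

Here p["cat"] becomes 0.25 * 1.0 + 0.75 * 0.5 = 0.625, so sweeping --knn_temp and --lmbda directly trades off how much the retrieved neighbors override the base LM.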

Mismatch on KNN-MT result on README

Hi.
Thanks for your awesome project! For t5-small, I got the following MT results on the validation set:

"eval_bleu": 26.1472,
"eval_gen_len": 42.1916,
"eval_loss": 1.4190454483032227,
"eval_runtime": 216.1581,
"eval_samples": 1999,
"eval_samples_per_second": 9.248,
"eval_steps_per_second": 2.313

However, for kNN-MT, I got a different result:

"eval_bleu": 32.0026,
"eval_gen_len": 42.1126,
"eval_loss": 0.40791189670562744,
"eval_runtime": 4053.3114,
"eval_samples": 1999,
"eval_samples_per_second": 0.493,
"eval_steps_per_second": 0.123

Also, the evaluation is so slow that I wonder whether something is wrong in my shell script.
The kNN-MT shell script is:

meta_path=path_to_project
model_name=t5-small
model_path=path_to_all_model/${model_name}
python -u ${meta_path}/knn-transformers/run_translation.py \
  --model_name_or_path ${model_path} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_eval_batch_size=4 \
  --output_dir ${meta_path}/checkpoints-translation/${model_name}-datastore \
  --source_lang en --target_lang ro \
  --do_eval \
  --predict_with_generate \
  --source_prefix "translate English to Romanian: " \
  --dstore_dir ${meta_path}/checkpoints-translation/${model_name}-datastore \
  --knn_temp 50 --k 32 --lmbda 0.25 \
  --knn

The original MT shell script is:

meta_path=path_to_project
model_name=t5-small
model_path=path_to_all_model/${model_name}
python -u ${meta_path}/knn-transformers/run_translation.py \
  --model_name_or_path ${model_path} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_eval_batch_size=4 \
  --output_dir ${meta_path}/checkpoints-translation/${model_name} \
  --source_lang en --target_lang ro \
  --do_eval \
  --predict_with_generate \
  --source_prefix "translate English to Romanian: "

I noticed that if I remove --predict_with_generate from the kNN-MT script, the speed matches the original MT and the eval_loss is also the same as the original MT, but then I cannot get eval_bleu. For example:

eval_loss = 0.4079
eval_runtime = 0:02:16.85
eval_samples = 1999
eval_samples_per_second = 14.606
eval_steps_per_second = 3.653.

However, setting --predict_with_generate does not affect the speed of the original MT. Could you please give some guidance on solving this problem?

Thanks!

Got Error when running kNN-MT with T5-base

Hi @urialon ,

Thanks for releasing your interesting source code. Following your guidance, I tried to run kNN-MT to evaluate the T5-base model.
The error is "OverflowError: out of range integral type conversion attempted". I tried increasing max_target_length (from 128 to 512 and 1024) but still got the error.
What should I do to fix it? Thank you for your help!
