(base) root@84d353835da2:/workspace/mucoco# bash decode_example.sh data output plain debug plain
Some weights of the model checkpoint at /workspace/mucoco/primary_model were not used when initializing GPT2LMHeadModel: ['transformer.extra_embedding_project.bias', 'transformer.extra_embedding_project.weight']
- This IS expected if you are initializing GPT2LMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPT2LMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
batch_size is 1
skip this example? Fears for T N pension after talks . Unions representing workers at Turner Newall say they are ' disappointed ' after talks with stricken parent firm Federal Mogul . [yes(y)/maybe(m)/no(n)]n
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Fears for T N pension after talks . Unions representing workers at Turner Newall say they are ' disappointed ' after talks with stricken parent firm Federal Mogul . unions representing workers at Turner Newall have expressed disappointment after talks with the firm's parent company, Federal Mogul. tensor([[ 3260, 6130, 351, 15406, 968, 439, 11, 791, 507, 10200,
3259, 379, 15406, 968, 439, 6241, 18641, 287, 262, 4081,
1222, 499, 418, 26, 82, 2560, 1664, 11, 5618, 30926,
377, 13]])
predicting a sentence length: 32
Traceback (most recent call last):
  File "/workspace/mucoco/decode.py", line 4, in <module>
    cli_main()
  File "/workspace/mucoco/mucoco/decode.py", line 821, in cli_main
    main(args)
  File "/workspace/mucoco/mucoco/decode.py", line 565, in main
    optimizer.backward(total_batchloss, retain_graph=True, scaler=scaler)
  File "/workspace/mucoco/mucoco/utils/optim.py", line 360, in backward
    loss.backward(retain_graph=retain_graph)
  File "/opt/conda/lib/python3.9/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 7.67 GiB (GPU 0; 23.70 GiB total capacity; 15.35 GiB already allocated; 3.61 GiB free; 19.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
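The error message itself points at one mitigation: since reserved memory (19.13 GiB) exceeds allocated memory (15.35 GiB), fragmentation may be part of the problem, and `max_split_size_mb` can be set via `PYTORCH_CUDA_ALLOC_CONF` before re-running. A minimal sketch of what that could look like (the value 128 MB is an illustrative assumption, not taken from the log, and should be tuned for the GPU in question):

```shell
# Cap the size of cached allocator blocks to reduce fragmentation,
# as suggested by the CUDA OOM message above. 128 MB is an example value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Confirm the setting is visible to the child process before re-running.
echo "PYTORCH_CUDA_ALLOC_CONF=$PYTORCH_CUDA_ALLOC_CONF"

# Re-run the failing command with the same arguments as before:
# bash decode_example.sh data output plain debug plain
```

This only changes allocator behavior; if the 7.67 GiB allocation genuinely does not fit, reducing the sequence length or avoiding `retain_graph=True` where the graph is not reused would be needed instead.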