
pplm's Introduction

PPLM

This repository contains code to run the Plug and Play Language Model (PPLM), as described in this blog post and arXiv paper. A demo and Colab notebook are also available.

Note: If you are planning on using PPLM as a baseline and would like to use the parameters listed in the paper's Appendix, please use the LM and the discriminator from this folder. Alternatively, tune the hyperparameters on your own if you are using the code/models in the main directory and/or the 🤗/Transformers version, for a fair comparison (the optimal parameters for these models/discriminators are roughly off by a factor of 5 from those used in the paper).

PPLM is also integrated into the 🤗/Transformers repository.


Plug and Play Language Models: a Simple Approach to Controlled Text Generation

Authors: Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu

PPLM allows a user to flexibly plug in one or more tiny attribute models representing the desired steering objective into a large, unconditional language model (LM). The method has the key property that it uses the LM as is—no training or fine-tuning is required—which enables researchers to leverage best-in-class LMs even if they do not have the extensive hardware required to train them.

See also our arXiv paper, blog post, and try it out for yourself with no setup using the Colab notebook.

Setup

pip install -r requirements.txt

Citation

@inproceedings{
Dathathri2020Plug,
title={Plug and Play Language Models: A Simple Approach to Controlled Text Generation},
author={Sumanth Dathathri and Andrea Madotto and Janice Lan and Jane Hung and Eric Frank and Piero Molino and Jason Yosinski and Rosanne Liu},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=H1edEyBKDS}
}

PPLM-BoW

Example command for bag-of-words control

python run_pplm.py -B military --cond_text "The potato" --length 50 --gamma 1.5 --num_iterations 3 --num_samples 10 --stepsize 0.03 --window_length 5 --kl_scale 0.01 --gm_scale 0.99 --colorama --sample

Tuning hyperparameters for bag-of-words control

  1. Increase --stepsize to intensify topic control, and decrease its value to soften the control. --stepsize 0 recovers the original uncontrolled GPT-2 model.

  2. If the generated text is repetitive (e.g., "science science experiment experiment"), there are several options to consider (see the example command after this list):
    a) Reduce the --stepsize.
    b) Increase --kl_scale (the KL-loss coefficient) or decrease --gm_scale (the gm-scaling term).
    c) Add --grad_length xx, where xx is an integer <= length (e.g., --grad_length 30).
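
For instance, a softened variant of the example command above along these lines (the flag values here are illustrative, not tuned):

python run_pplm.py -B military --cond_text "The potato" --length 50 --gamma 1.5 --num_iterations 3 --num_samples 10 --stepsize 0.02 --window_length 5 --kl_scale 0.02 --gm_scale 0.95 --grad_length 30 --colorama --sample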

PPLM-Discrim

Example command for discriminator based sentiment control

python run_pplm.py -D sentiment --class_label 2 --cond_text "My dog died" --length 50 --gamma 1.0 --num_iterations 10 --num_samples 10 --stepsize 0.04 --kl_scale 0.01 --gm_scale 0.95 --sample

Tuning hyperparameters for discriminator control

  1. Increase --stepsize to intensify attribute control, and decrease its value to soften the control. --stepsize 0 recovers the original uncontrolled GPT-2 model.

  2. Use --class_label 3 for negative sentiment and --class_label 2 for positive sentiment (see the example command below).
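
For example, keeping every other flag from the command above but steering toward negative sentiment instead:

python run_pplm.py -D sentiment --class_label 3 --cond_text "My dog died" --length 50 --gamma 1.0 --num_iterations 10 --num_samples 10 --stepsize 0.04 --kl_scale 0.01 --gm_scale 0.95 --sample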

The discriminator and the GPT-2 model in the root directory are different from those used for the analysis in the paper. Code and models corresponding to the paper can be found here.

pplm's People

Contributors

dathath, dependabot[bot], julien-c, mimosavvy, soratukhvatov, w4nderlust, yosinski


pplm's Issues

Results always deterministic.

Whenever I run the model with the same cond_text, the output is always the same. I have tried changing the temperature and top_k, but I still can't get the model to return different results. How do I make it non-deterministic?
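
A hedged suggestion (assuming run_pplm.py exposes a --seed flag and otherwise fixes the random seed, as the "seed value" issue further down suggests): keep --sample and pass a different seed on each run, e.g. (flag values illustrative):

python run_pplm.py -B military --cond_text "The potato" --length 50 --num_samples 1 --sample --seed 7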

Filtering words composed of more than 1 token

Hi, thanks for the great work.

I see that you are filtering out words that are composed of more than one token:

single_bow = list(filter(lambda x: len(x) <= 1, single_bow))
This filters out quite a few entries, including all terms that consist of more than one word.

Do you have any idea how to deal with this when we want to use these multi-token words?
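
One possible direction, as a rough sketch only (this is not something the repository does): keep multi-token bag entries and score a phrase by summing the log-probabilities of its BPE tokens under the current next-token distribution. This ignores token ordering, so it is a crude approximation:

import torch

def phrase_log_prob(probs, token_ids):
    # probs: next-token distribution over the vocabulary, shape (vocab_size,)
    # token_ids: BPE ids of one multi-token bag-of-words entry (order is ignored)
    return torch.log(probs[token_ids] + 1e-10).sum()

# Illustrative usage with a dummy distribution and dummy token ids
probs = torch.softmax(torch.randn(50257), dim=-1)
print(phrase_log_prob(probs, [3783, 1402]).item())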

Cheers.

using other pretrained models

Can we use other models like "distilbert-base-multilingual-cased"?
Because I want to generate text in other languages.

Equation (5) in your paper

Trying to understand how you compute the right-hand side of (5) in your paper here https://arxiv.org/pdf/1912.02164.pdf

  1. For each word w_i in the bag-of-words
    a. Compute Prob. that w_i is the next word in the sentence, given the unconditioned model p(x).
  2. Sum all the probabilities obtained in step 1 (one such probability for every word in the BoW)
  3. Compute log of what you got at step 2
  4. Assume that's equal to log p(a|x)

Point 1.a is mainly what I'd like to check, as I am not familiar with your notation p_{t+1}[w_i] (in particular, the square brackets).

Can you kindly confirm/correct? Thanks!

Incidentally, is this where you do (implicitly) that computation? https://github.com/uber-research/PPLM/blob/master/run_pplm.py#L211
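
For reference, p_{t+1}[w_i] just denotes the probability that the distribution p_{t+1} assigns to token w_i, so equation (5) reads log p(a|x) = log( sum_i p_{t+1}[w_i] ). Here is a paraphrased, self-contained sketch of what the code around the linked line appears to compute (not a verbatim excerpt; token ids and shapes are illustrative):

import torch

vocab_size = 50257
probs = torch.softmax(torch.randn(1, vocab_size), dim=-1)   # p_{t+1}
bag_token_ids = torch.tensor([3783, 1402, 9552])             # illustrative single-token bag words
one_hot_bow = torch.zeros(len(bag_token_ids), vocab_size)
one_hot_bow[torch.arange(len(bag_token_ids)), bag_token_ids] = 1.0

bow_probs = torch.mm(probs, torch.t(one_hot_bow))  # p_{t+1}[w_i] for each bag word w_i
bow_loss = -torch.log(torch.sum(bow_probs))        # -log p(a|x), the negative of equation (5)
print(bow_loss.item())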

Are the bag of words case-sensitive?

Hello, I find that some words in the bag-of-words lists are cased while others are uncased.
They map to different token ids in the GPT-2 tokenizer's vocabulary.

What is the appropriate way to process the words?
Thanks.
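
For illustration, GPT-2's BPE tokenizer is case-sensitive (and whitespace-sensitive), so cased and uncased spellings get different ids:

from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.encode("Science"))   # cased form
print(tok.encode("science"))   # lowercase form: different id(s)
print(tok.encode(" science"))  # leading space: different again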


Runtime error on a finetuned model

I'm getting an error when I run pplm with a gpt2 model I finetuned with the language modeling example from the huggingface transformers repo.

run_pplm.py
-B /path/to/BOW.txt
--pretrained_model=/user/FindtunedModelOut
--cond_text="potato"
--num_samples=20
--length=150
--stepsize=0.03
--num_iterations=3
--window_length=5
--gamma=1.5
--gm_scale=0.95
--kl_scale=0.01
--colorama
--verbosity=regular
--sample

and I get the error:

Traceback (most recent call last):
  File "/pythonProjects/transformerTest/venv/PPLM/run_pplm.py", line 936, in <module>
    run_pplm_example(**vars(args))
  File "/pythonProjects/transformerTest/venv/PPLM/run_pplm.py", line 768, in run_pplm_example
    unpert_gen_tok_text, pert_gen_tok_texts, _, _ = full_text_generation(
  File "/pythonProjects/transformerTest/venv/PPLM/run_pplm.py", line 472, in full_text_generation
    pert_gen_tok_text, discrim_loss, loss_in_time = generate_text_pplm(
  File "/pythonProjects/transformerTest/venv/PPLM/run_pplm.py", line 584, in generate_text_pplm
    pert_past, _, grad_norms, loss_this_iter = perturb_past(
  File "/pythonProjects/transformerTest/venv/PPLM/run_pplm.py", line 213, in perturb_past
    bow_logits = torch.mm(probs, torch.t(one_hot_bow))
RuntimeError: mat1 dim 1 must match mat2 dim 0

I'm not sure if I screwed up the finetuning or PPLM, but the model does generate text with the run_generation example, and if I just switch the model back to gpt2, PPLM runs fine on the bag of words. Does anyone know how to fix this error, or what I am doing wrong?
Thanks.

Edit: The problem seems to be related to the special tokens I added.

Questions about length parameters

Hello, thanks for your great work! It's novel and inspiring.

I am trying to extend PPLM for some downstream applications (with a different pre-trained model and a different discriminator). I went through your code, but am not sure of the function of parameters --length and --grad_length.

From my understanding so far, --length is used to create this generation loop that decodes a token at a time, so I guess it controls the generation length. Can you please confirm this?

If so, this condition does not seem straightforward to me: stepsize is set to 0 for generation steps in [grad_length, length], which results in no update to the model (gradient = 0).

But I also noticed that the default value of grad_length is 10000, which is much larger than length (whose default value is 100). If grad_length > length, the above condition will always be False and the original stepsize will always be used. Therefore, I am confused about the purpose of this condition. Can you please clarify this as well?
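
For what it's worth, the gating described above can be paraphrased as the following sketch (not a verbatim excerpt from run_pplm.py):

def effective_stepsize(step, stepsize, grad_length):
    # Beyond grad_length generation steps, the update step size drops to 0,
    # i.e. the past is no longer perturbed and generation proceeds unmodified.
    return stepsize if step < grad_length else 0.0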

Thanks!

toxicity not working

python3 run_pplm.py -D toxicity --class_label 3 --cond_text "The food is aweful" --length 50 --gamma 1.0 --num_iterations 10 --num_samples 3 --stepsize 0.04 --kl_scale 0.04 --gm_scale 0.95 --sample

Traceback (most recent call last):
  File "run_pplm.py", line 794, in <module>
    run_pplm_example(**vars(args))
  File "run_pplm.py", line 621, in run_pplm_example
    pretrained_model = DISCRIMINATOR_MODELS_PARAMS[discrim]["pretrained_model"]
KeyError: 'toxicity'
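
A quick way to see which discriminator names this copy of run_pplm.py actually registers (assuming the script can be imported as a module; the available names are not guaranteed to include 'toxicity'):

from run_pplm import DISCRIMINATOR_MODELS_PARAMS

# Any name missing from this dict will raise the KeyError shown above.
print(sorted(DISCRIMINATOR_MODELS_PARAMS.keys()))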

How to use attribute models, gamma, top_k and discriminators?

I am running run_pplm_discrim_train.py and it looks like it'll finish its first epoch overnight. Will that result in an output file being produced?

I’m interested in how to use different attribute models and discriminators.

It’s a lot of fun experimenting with this software. I’m also unsure how gamma, temperature and top_k affect the text. I’d really appreciate any guidance.

Performances

Thanks for open-sourcing the code!

This approach is very interesting, but I'm curious about its impact on performance (inference speed).

Is there any benchmark showing the impact on performance with different parameters?

Equation (4) | Attribute model

Trying to understand the attribute model in equation (4) in your paper. I have two general questions.

  1. About the whole term p(a | H_t + \Delta H_t).

Given the modified model (which is actually my second question), you want to compute the probability that the model would generate a sequence that contains attribute a.

Let's consider the BoW approach. For each word w in the bag, and given the current sentence sequence_so_far, you compute Prob(sequence_so_far + w). Is that correct so far?
How do you compute that last term? Is it something like model.predict(sequence_so_far + w)?

  2. About the modified history.

I get how it's computed, but not how the model is modified in practice. Is it something like model.layers[i].set_weights(H[i] + DeltaH[i]) for the specific layers corresponding to the whole H?
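
As a hedged aside on that last point: from the code, the perturbation appears to be applied to the cached key/value history (the "past"), not to the layer weights; conceptually something along these lines (a simplified sketch, not the repository's actual perturb_past loop):

from operator import add

def apply_perturbation(past, delta_past):
    # past, delta_past: one cached key/value tensor per transformer layer
    # The frozen LM is then re-run from this perturbed past; no weights are changed.
    return list(map(add, past, delta_past))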

Thanks!

Evaluation codes(Perplexity and Dist scores)

Really great paper and thanks for open-sourcing the code!
But I can't find any evaluation code in this repository (perplexity or Dist scores as in the paper).
Where can I find the evaluation code?
I really want to replicate the paper's results myself (the automated evaluation section of the paper).
I would really appreciate your reply!
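
In the meantime, here is a minimal sketch of how perplexity could be computed with 🤗/Transformers, assuming GPT-2 as the scoring model (the paper's evaluation may use a different LM and protocol, so treat this only as an approximation):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    ids = torch.tensor([tokenizer.encode(text)])
    with torch.no_grad():
        loss = model(ids, labels=ids)[0]  # mean token-level cross-entropy
    return torch.exp(loss).item()

print(perplexity("The potato is a versatile vegetable."))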

Formatting training text files for discriminator training script

Hi there, I hope to try discriminator-based PPLM with different sizes of GPT-2. To do this, I believe we need to retrain the discriminator with a different embedding size using the paper_code/gpt2tunediscrim.py script. (Please correct me if I'm wrong here!) However, I am a little unclear on how the training text files should be formatted to be compatible with this code. It looks like each line in toxic_train.txt is processed with eval(d) to become a dictionary or JSON-like object with the keys 'text' and 'label'. Here is the excerpt of code I am looking at:

with open("datasets/toxic/toxic_train.txt") as f:
    data = []
    for d in f:
        data.append(eval(d))

x = []
y = []
for d in data:
    try:
        # seq = tokenizer.encode("Apple's iOS 9 'App thinning' feature will give your phone's storage a boost")
        seq = tokenizer.encode(d["text"])

        device = 'cuda'
        if(len(seq)<100):
            seq = torch.tensor([50256] + seq, device=device, dtype=torch.long)
        else:
            continue
        x.append(seq)
        y.append(int(np.sum(d['label'])>0))
    except:
        pass
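
For what it's worth, the excerpt suggests each line of the file is a Python-literal dict with a 'text' string and a 'label' list whose sum is thresholded into a binary class. A hedged guess at the per-line format (values are purely illustrative, not from the actual dataset):

{'text': 'example comment text goes here', 'label': [0, 0, 1, 0, 0, 0]}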

Is there any chance you can share your training text files (e.g. datasets/toxic/toxic_train.txt) or the script you used to create the text files from the original datasets? Thank you!

Should we or Can we train the classifier on top of the fine-tuned GPT-2?

Hi!

Thanks for your cool work!

I carefully read your paper and was very impressed by it. So I was trying to train my own discriminator on a generic dataset, but it seems that we can only choose the original GPT-2 when launching run_pplm_discrim_train.py.

Intuitively we have two strategies:
Plan A: Train the discriminator on top of the original GPT-2 -> plug it into our fine-tuned GPT-2 -> train together -> generate text.
Plan B: Train the discriminator on top of the fine-tuned GPT-2 -> plug the former into the latter -> train together -> generate text.

May I know which one is correct? If Plan B is, how can we do that? (I tried to replace the model with a fine-tuned one but got the errors shown in the following image.)

image

Many thanks and best regards!

Question about how to handle `inputs_embeds`

Hi,

Thank you for sharing your great work!

I have a question about how inputs_embeds is handled in the PPLM code.

When I look at run_pplm.py, I found something whose intention I cannot understand.
https://github.com/uber-research/PPLM/blob/master/run_pplm.py#L220

        if loss_type == PPLM_DISCRIM or loss_type == PPLM_BOW_DISCRIM:
            ce_loss = torch.nn.CrossEntropyLoss()
            # TODO why we need to do this assignment and not just using unpert_past? (Sumanth)
            curr_unpert_past = unpert_past
            curr_probs = torch.unsqueeze(probs, dim=1)
            wte = model.resize_token_embeddings()
            for _ in range(horizon_length):
                inputs_embeds = torch.matmul(curr_probs, wte.weight.data)
                _, curr_unpert_past, curr_all_hidden = model(
                    past=curr_unpert_past,
                    inputs_embeds=inputs_embeds
                )
                curr_hidden = curr_all_hidden[-1]
                new_accumulated_hidden = new_accumulated_hidden + torch.sum(
                    curr_hidden, dim=1)

inputs_embeds is updated in the for loop,
but curr_probs and wte.weight.data do not seem to be updated in the loop.

Could you please tell me the reason inputs_embeds is calculated in the for loop?

Thank you in advance!

Story generation with skeleton?

Hi there!

This is quite an interesting project and I've been experimenting a lot with different setups, generating theater plays and other more unconventional pieces of text.
I have a question about the small story-writing part of the paper. The results looked very interesting and I wanted to try it on my own.
Basically, what kind of input do you give to the language model when you "fill in the blanks" between the story skeleton pieces? Do you generate text, then feed in the next skeleton piece, then generate text again, etc., or did you come up with a different approach?

Best wishes,
Lukas

Question about window mask

Hi. Maybe I'm understanding this incorrectly. In lines 178-180 of run_pplm.py, a window mask is constructed for choosing only a recent window of the hidden states to update:

window_mask = torch.cat(
    (ones_mask, torch.zeros(zeros_key_val_shape)),
    dim=-2
)

Should we actually concatenate in the order (zeros; ones) instead, since we aim to mask out the recent latents rather than the very beginning?
Any response to this would be greatly appreciated!

Where are the samples of automated evaluation?

Thanks for your reply. I have written a program to calculate perplexity via the Hugging Face Transformers interface.
But I am not sure which samples were used for the perplexity calculation.

Adding eos token

In run_pplm_discrim_train.py, get_generic_dataset(), line 291:

if add_eos_token:
    seq = [50256] + seq

I assume this adds an eos token to the sequence. However, it seems to be added at the beginning by these lines. Shouldn't the eos token be added to the end of the sequence?
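
For context (an illustration, not an answer to the question): 50256 is GPT-2's <|endoftext|> token, which the tokenizer reports as both its EOS and BOS token, so prepending it is sometimes used as a start-of-sequence marker:

from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.eos_token, tok.eos_token_id)  # <|endoftext|> 50256
print(tok.bos_token, tok.bos_token_id)  # <|endoftext|> 50256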

Question about future tokens in perturbation

Hi there,

Thanks again for your great work and kind response last time.

I have another question about this loop for obtaining future token representations. Let's say the current generation step is at t. At the start, we have unperturbed past about [0, t-1], and we perturb it. Then, we use the perturbed past (about [0, t-1]), and the last generated token from t-1, to generate the hidden state at t (see this line).

We then use the hidden state at t to obtain the input embedding for t+1, and finally, the future hidden state at t+1. To do this, you run the forward pass again in the loop, based on

  • Unperturbed past, about [0, t-1] and
  • Input embedding at t+1

What seems missing to me in this forward pass is the generated token at t-1. The past you use here is about [0, t-1], and the next input, from the autoregressive perspective, should be the generated token at t-1. However, the input embedding at t+1, which contains information from step t, is what is applied in your code.

I personally don't think there's anything wrong with this implementation, but I would like to confirm if I have misunderstood anything here. Also, if my understanding is right, it would be great if you could elaborate on the motivation of choosing not to encode the generated token at t-1 explicitly. For instance, an easy way would be to append the generated token at t-1 to the past, then do the forward pass.

Thanks a lot!

External Classifier

Hi, where can I find the external sentiment classifier trained on IMDB movie reviews?

unequal dimensions between the new_accumulated_hidden and the matrix in mlp in classifier

When I run PPLM with both bow and discrim on ('technology' and 'sentiment', respectively), new_accumulated_hidden.shape[1] = 765 but the emb_size in the MLP is 1024, so the dimensions are inconsistent in the matmul in pplm_classification_head, and I am getting

RuntimeError: size mismatch, m1: [1 x 768], m2: [1024 x 5]

when calculating the loss for the perturbed text.

Please correct me if I missed something. Thank you very much for your help.

Questions about discriminator attribute models

In Section 4.3, the paper says, "... we use the distribution ~p_(t+1) (instead of a hard sample x_(t+1)), and feed it forward to obtain (a biased) estimate of the next token's embedding and then update delta_H_t." In the code, I found that the hard sample x_(t+1) (i.e., model(last, ...)) is fed into the model to obtain the probs the first time,

all_logits, _, all_hidden = model(last, past=perturbed_past)
        hidden = all_hidden[-1]
        new_accumulated_hidden = accumulated_hidden + torch.sum(
            hidden,
            dim=1
        ).detach()
        # TODO: Check the layer-norm consistency of this with trained discriminator (Sumanth)
        logits = all_logits[:, -1, :]
        probs = F.softmax(logits, dim=-1)

and then the code feeds the soft distribution ~p_(t+1) (i.e., inputs_embeds) into the model the second time,

if loss_type == PPLM_DISCRIM or loss_type == PPLM_BOW_DISCRIM:
            ce_loss = torch.nn.CrossEntropyLoss()
            # TODO why we need to do this assignment and not just using unpert_past? (Sumanth)
            curr_unpert_past = unpert_past
            curr_probs = torch.unsqueeze(probs, dim=1)
            wte = model.resize_token_embeddings()
            for _ in range(horizon_length):
                inputs_embeds = torch.matmul(curr_probs, wte.weight.data)
                _, curr_unpert_past, curr_all_hidden = model(
                    past=curr_unpert_past,
                    inputs_embeds=inputs_embeds
                )
                curr_hidden = curr_all_hidden[-1]
                new_accumulated_hidden = new_accumulated_hidden + torch.sum(
                    curr_hidden, dim=1)

My questions are:
(1) Why does it use past=curr_unpert_past, instead of past=past, to predict the next token the second time? To predict the next token, we need to feed GPT-2 the current token id (or embedding) and the past_key_values preceding the current token.
(2) The second time, the code does not update the logits (i.e., _, curr_unpert_past, curr_all_hidden = model(...)), so it cannot update probs, and thus it uses the probs from the first time (i.e., probs = F.softmax(logits, dim=-1)). Why not update probs the second time?
Thank you so much. Please correct me if I'm wrong.

pplm with gpt-neo

pert_logits, past, pert_all_hidden = model(last, past=pert_past)

I'm trying to use GPT-Neo with PPLM. However, GPT-Neo requires upgrading the transformers library to transformers >= 4.5, where 'past' has been replaced by past_key_values.
When I change past to past_key_values, I get the following error:

TypeError: forward() got an unexpected keyword argument 'past'

Any ideas how I can solve this?
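
A hedged sketch of how the quoted call might look against newer transformers versions (untested against this repo; newer forwards take past_key_values and return a ModelOutput rather than a tuple, so the unpacking changes as well):

def forward_compat(model, last, pert_past):
    # Assumes a transformers >= 4.x causal LM (e.g. GPT-Neo); field names follow the ModelOutput API.
    outputs = model(last, past_key_values=pert_past, output_hidden_states=True)
    return outputs.logits, outputs.past_key_values, outputs.hidden_states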

@dathath @julien-c

The difference of BAG_OF_WORDS_ARCHIVE_MAP in this repository and HuggingFace transformers examples/research_projects

Hello,

I want to ask about the difference between BAG_OF_WORDS_ARCHIVE_MAP in this repository and the one in HuggingFace Transformers examples/research_projects/pplm.

In run_pplm.py in this repository,

PPLM/run_pplm.py

Lines 58 to 68 in 5b262d6

BAG_OF_WORDS_ARCHIVE_MAP = {
    'legal': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/legal.txt",
    'military': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/military.txt",
    'monsters': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/monsters.txt",
    'politics': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/politics.txt",
    'positive_words': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/positive_words.txt",
    'religion': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/religion.txt",
    'science': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/science.txt",
    'space': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/space.txt",
    'technology': "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/technology.txt",
}

By contrast, in run_pplm.py in examples/research_projects/pplm of HuggingFace Transformers:

https://github.com/huggingface/transformers/blob/bfa4ccf77d65d8899b01417bd9845b2e78bc0ec5/examples/research_projects/pplm/run_pplm.py#L47-L55

BAG_OF_WORDS_ARCHIVE_MAP = {
    "legal": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/legal.txt",
    "military": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/military.txt",
    "politics": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/politics.txt",
    "religion": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/religion.txt",
    "science": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/science.txt",
    "space": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/space.txt",
    "technology": "https://s3.amazonaws.com/models.huggingface.co/bert/pplm/bow/technology.txt",
}

It seems some of the word lists ('monsters' and 'positive_words') were removed in the examples/research_projects version.

Are there any rights issues involved in using the lists?
I'm sorry if this is an impolite question.

I really appreciate any help you can provide.

training a discriminator

I am trying to train a discriminator of my own (the network learns to differentiate text from one specific source from generic text). The idea is that I should then be able to plug it into PPLM and generate sentences that look like they come from that particular source. Is this something that can work? From your previous experience training discriminators for 'sentiment', 'clickbait', etc., is there a minimum training data size you recommend?

paper_code usage of scripts

Hi @dathath @w4nderlust !

I'm having trouble replicating results with the scripts in the paper_code folder.
Could you share the usage/bash script to run PPLM with the pretrained checkpoints (at least for sentiment and detoxification)?
It would be really helpful if the scripts to train PPLM for sentiment and detoxification with paper_code were also available.

Specifically, I'm stuck on (1) this import, from run_gpt2 import top_k_logits, here:
https://github.com/uber-research/PPLM/blob/master/paper_code/gpt2tunediscrim.py#L24

and (2) the drive link at https://github.com/uber-research/PPLM/blob/master/paper_code/setup_script.sh#L6
doesn't seem to work:
./gdown.pl https://drive.google.com/open?id=15TvAxA8TS8nn1lCzpVPn-Myp5RDlJiHF gpt2.

Thanks!

Inference Time

Thanks for your brilliant work!

Currently, I am running PPLM with a discriminator on a GPU, but it still takes around 5 minutes to generate 512 tokens. I wonder if there is any way to speed up inference?

Many thanks and best regards,
Yijun

seed value

Hi,

I have trained a classifier and I am trying to run inference on a text file rather than a single input.
I realized that I get different outputs if I reorder the examples in my file, and this is because of the seeds set here.
Does it really matter that all examples are generated with seed 0?

Doc the BOW approach

Hi, this looks great. I had to look at the code to get some insight into how to do a BoW approach of my own. Maybe you could add a few lines to the readme about that? The paper also seems a little light on how the topic words were selected, unless I missed that? But awesome work!

Time to train an attribute model

Hi

I'm training my own attribute model with

python run_pplm_discrim_train.py --dataset SST --pretrained_model distilgpt2 --epochs 1 --log_interval 100

in Colab with the GPU, and it's taking about 3 s/batch. Is this normal?

Thanks

A different discriminator?

Hi,

I really enjoyed your paper.
In this regard, I have the following questions that I would appreciate your reply to:

  1. I wonder whether it is possible to replace the current discriminator, which is built on top of the LMHead, with any other trained discriminator? In other words, does it really need to be built on top of the LM (p(x)) itself, or can it be any discriminator?
  2. If the answer to the previous question is yes, is it then possible to fine-tune GPT-2 on our own data and then generate from it with our specific discriminator (attribute model)?

run run_pplm.py error

When I run the command: python run_pplm.py -B military --cond_text "The potato" --length 50 --gamma 1.5 --num_iterations 3 --num_samples 10 --stepsize 0.03 --window_length 5 --kl_scale 0.01 --gm_scale 0.99 --colorama --sample

Traceback (most recent call last):
  File "run_pplm.py", line 936, in <module>
    run_pplm_example(**vars(args))
  File "run_pplm.py", line 738, in run_pplm_example
    tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model)
  File "/home/xps/anaconda3/envs/nlg_entity/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 911, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/xps/anaconda3/envs/nlg_entity/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1014, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'gpt2-medium' was not found in tokenizers model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed 'gpt2-medium' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
