
gpt-2-cloud-run's Issues

Dockerfile checkpoint is missing

Hi,

Hope you are all well!

I could not build the Docker image because the checkpoint file is missing.
Can you re-upload it, or should I remove it from the Dockerfile?

Thanks in advance for your insights and inputs on the topic.

Cheers,
X

tensorflow version

TensorFlow 2.x is not compatible with gpt-2-simple, but the Docker build downloads the latest TensorFlow version.
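
If that is the root cause, a likely fix, assuming the Dockerfile installs TensorFlow via pip (the exact install step may differ), is to pin the 1.x line so pip can't resolve to 2.x:

RUN pip3 --no-cache-dir install tensorflow==1.15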

Add rate limiting

Since Cloud Run accepts unauthenticated HTTP requests, it would be good to add simple per-IP rate limiting.

Unfortunately there's no simple drop-in implementation for this stack; the simple implementations that do exist are Flask-only.
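
A minimal sketch of what per-IP limiting could look like as Starlette middleware, using an in-memory counter; the window length and request cap below are illustrative assumptions:

import time

from starlette.applications import Starlette
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import PlainTextResponse

WINDOW_SECONDS = 60  # assumed rate-limit window
MAX_REQUESTS = 10    # assumed per-IP cap within the window

class RateLimitMiddleware(BaseHTTPMiddleware):
    # Rejects requests from an IP that exceeds MAX_REQUESTS per WINDOW_SECONDS.
    def __init__(self, app):
        super().__init__(app)
        self.hits = {}  # ip -> timestamps of recent requests

    async def dispatch(self, request, call_next):
        ip = request.client.host if request.client else 'unknown'
        now = time.time()
        # Keep only the timestamps still inside the window.
        recent = [t for t in self.hits.get(ip, []) if now - t < WINDOW_SECONDS]
        if len(recent) >= MAX_REQUESTS:
            return PlainTextResponse('Too Many Requests', status_code=429)
        recent.append(now)
        self.hits[ip] = recent
        return await call_next(request)

app = Starlette(debug=False)
app.add_middleware(RateLimitMiddleware)

Note that the counter lives in instance memory, so on Cloud Run the cap applies per container rather than globally, and behind a load balancer the real client IP may need to be read from X-Forwarded-For.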

tensorflow.contrib module error when running Docker image

Hi Max,
I am following the instructions in the readme file. Once I built the image, I tried to run it locally but got the following error. It seems the tensorflow.contrib module was removed in TensorFlow 2.0. I noticed the Dockerfile does not pin the TensorFlow version, so it may have auto-installed 2.0. The TF version in my Colab notebook was 1.15 when I trained the model, so I will try to force 1.15 in the Dockerfile.

docker run -p 8080:8080 --memory="2g" --cpus="1" gpt2

Traceback (most recent call last):
  File "app.py", line 3, in <module>
    import gpt_2_simple as gpt2
  File "/usr/local/lib/python3.7/site-packages/gpt_2_simple/__init__.py", line 1, in <module>
    from .gpt_2 import *
  File "/usr/local/lib/python3.7/site-packages/gpt_2_simple/gpt_2.py", line 23, in <module>
    from gpt_2_simple.src import model, sample, encoder, memory_saving_gradients
  File "/usr/local/lib/python3.7/site-packages/gpt_2_simple/src/memory_saving_gradients.py", line 5, in <module>
    import tensorflow.contrib.graph_editor as ge
ModuleNotFoundError: No module named 'tensorflow.contrib'

Could not load dynamic library 'libcuda.so.1'

When running the Docker image, I receive the following CUDA error. I built the image using the normal tensorflow==1.15, but as far as I know CUDA is only required for TensorFlow on GPU?

2020-01-16 12:58:56.907133: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-01-16 12:58:56.907168: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-01-16 12:58:56.907187: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (a88f782966c0): /proc/driver/nvidia/version does not exist

Memory limit exceeded

I'm trying to run this on Google Cloud Run; however, I don't seem to have enough memory.

Memory limit of 2048M exceeded with 2126M used.

Do you have any idea why this is the case? It should work, right?

I'm using the 124M model.

TypeError: not all arguments converted during string formatting

Hi, first of all thank you for this amazing tutorial and repo. Awesome work!

So I've been trying to generate a custom model similar to the code you have in /examples/hacker_news.py. My app.py is shown below. I've set return_as_list to true, get the list, remove unnecessary prefixes from each string in it, and return the list as JSON. This code throws the error shown below. However, when I generate text locally without calling the API (i.e., without the HTTP requests involved), the code works perfectly without errors. I can't seem to figure out what I'm doing wrong. Any help is highly appreciated.

app.py

from starlette.applications import Starlette
from starlette.responses import UJSONResponse
import gpt_2_simple as gpt2
import tensorflow as tf
import uvicorn
import os
import gc

app = Starlette(debug=False)

sess = gpt2.start_tf_sess(threads=1)
gpt2.load_gpt2(sess)


response_header = {
    'Access-Control-Allow-Origin': '*'
}

generate_count = 0


@app.route('/', methods=['GET', 'POST', 'HEAD'])
async def homepage(request):
    global generate_count
    global sess

    if request.method == 'GET':
        params = request.query_params
    elif request.method == 'POST':
        params = await request.json()
    elif request.method == 'HEAD':
        return UJSONResponse({'text': ''},
                             headers=response_header)

    
    text = gpt2.generate(sess,
                         length=55,
                         temperature=1.0,
                         top_k=int(params.get('top_k', 0)),
                         top_p=float(params.get('top_p', 0)),
                         prefix='<|startoftext|>' + params.get('prefix', ''),
                         truncate='<|endoftext|>',
                         include_prefix=True,
                         nsamples=params.get('nsamples', 1),
                         return_as_list=True
                         )

    for x in text:
        x = x.replace('<|startoftext|>', '')
        x = x.replace('<|endoftext|>', '')
        x = x.replace('  ', ' ')


    generate_count += 1
    if generate_count == 8:
        # Reload model to prevent Graph/Session from going OOM
        tf.reset_default_graph()
        sess.close()
        sess = gpt2.start_tf_sess(threads=1)
        gpt2.load_gpt2(sess)
        generate_count = 0

    gc.collect()
    return UJSONResponse({'text_list': text})

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

Error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 102, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 550, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "app.py", line 44, in homepage
    nsamples=params.get('nsamples', 1),
  File "/usr/local/lib/python3.7/site-packages/gpt_2_simple/gpt_2.py", line 428, in generate
    assert nsamples % batch_size == 0
TypeError: not all arguments converted during string formatting
 
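A likely cause, judging from the traceback: Starlette query-string (and string-valued JSON) parameters arrive as strings, so nsamples is passed to gpt2.generate() as a string. Inside gpt_2_simple, nsamples % batch_size then performs %-style string formatting instead of integer modulo, which raises exactly this TypeError; locally the integer default of 1 is used, so no error occurs. A minimal fix sketch is to cast the parameter in app.py:

nsamples=int(params.get('nsamples', 1)),

Separately, note that the cleanup loop rebinds x without writing the result back to the list, so the replace() calls are discarded; building a new list (e.g. text = [x.replace('<|startoftext|>', '') for x in text]) would persist them.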

More consistent output for Save Image

The current Save Image will result in an output based on the viewport of the device: not necessarily wrong, but it would be good if it was more consistent.

Parameter Usage Question

I understand the 4 parameters in your jQuery form (prefix, length, temperature, top_k), but in app.py there are also these 4 lines:
top_p=float(params.get('top_p', 0)),
truncate=params.get('truncate', None),
include_prefix=str(params.get('include_prefix', True)).lower() == 'true',
return_as_list=True

I assume that if I don't want to allow the user to write the first n characters of the story, I would change it to 'include_prefix', False. But what do the other three do: top_p, truncate, and return_as_list?

Google Cloud Free Tier

Not really an issue per se, but does anyone know how to stay within the Google Cloud free tier?
What parameters do I have to use when I configure my image?

Error message on Cloud Run deployment

This issue has to do with deploying on Google Cloud Run. The app runs in a local Docker container, i.e., curl http://0.0.0.0:8080 returns the desired output. I then followed the rest of the instructions to deploy on Google Cloud Run -- set the memory to 2GB and set max requests to 1. The deployment, however, wasn't successful.

Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.

A cursory search says that GCP expects requests on 0.0.0.0:8080. This is what app.py uses too, so I'm not sure where the deployment error is coming from. Any idea?
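
One thing worth checking, since Cloud Run injects the listening port through the PORT environment variable rather than guaranteeing 8080: reproduce that contract locally (the values below simply mirror the defaults):

docker run -p 8080:8080 -e PORT=8080 gpt2

If the container still serves correctly with PORT injected this way, the failure may instead be memory pressure or a slow cold start, both of which can surface as the same "failed to start and then listen" error.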

Poor quality of text generation in Cloud Run compared to Colab

First up, thanks for all the work you've put into all of the GPT-2-simple stuff. It's amazing!

But I've set up generation with Cloud Run using the same model and same settings as in Colab, and the text outputs are significantly less cohesive, with lines being constantly repeated. Any particular reason why this would be happening? Is it a limitation of the Cloud Run hardware vs. the Colab hardware?

The model is intended to be a video game idea generator trained on ~15,000 posts from /r/gameideas. Here's an example of the same prefix in each context:

Colab

A game where you have to fight children or some shit. The children are easy to kill. You can run for cover or you can try to fight back but you're much slower. You can't run as fast as the children. You can hide, crawl, crawl out the door. There's also a lot of zombies.

If you're fast enough, you can jump off the roof and climb inside. The children are easier to kill. You can jump it too. The children can get stuck in the wall. You can jump to them, kill them and then climb up. There's a lot of enemies.

You can use the power of the house as a platform to jump in the first places. You can then jump to the roof where there's a bigger enemy. You can then crawl out the door to the other side to sneak in. There's a lot of zombies.

There's also a lot of fire. You can run into them. You can throw a torch at them. They'll die if you're not careful. Once they die, you can jump to the roof but the fire won't burn you if you're not careful.

I'm not sure if the game is multiplayer or not.

Cloud Run

A game where you fight children, and you can make them do anything you want, and you have a gun and you fight crime.

You can make people sick with drugs, and you can make people homeless, and you can make people commit crimes.

You can make the police and the military and the FBI and the CIA and the NSA and the CIA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and the NSA and you can make everyone in history a billionaire.

You can make President Trump a billionaire, and all the other billionaire games like they are a game, but you can only make a few people rich, and you can only make one type of person rich, and you can only make a certain amount of people rich, and you can only make a certain amount of people homeless, and you can only make them sick.

It's a consistent trait where the Cloud Run generation seems to almost ignore the context of the prefix and then gets stuck in a loop.

Spec for a GKE Kubernetes GPU Cluster w/ Cloud Run

Create a k8s .yaml file spec which will create a cluster that can support GPT-2 APIs w/ GPU for faster serving

Goal

  • Each Node has as much GPU utilization as possible.
  • Able to scale down to zero (for real, GKE is picky about that)

Proposal

  • A single f1-micro Node so the GPU-pods can scale to 0 (a single f1-micro is free)
  • The other Node is a 16 vCPU / 14GB RAM machine (n1-highcpu-16).
  • Each Pod uses 4 vCPU, 1 K80 GPU, and has a Cloud Run concurrency of 4.

Therefore, a single Node can accommodate up to 4 different GPT-2 APIs or the same API scaled up, which is neat.

In testing, a single K80 can generate about 20 texts at a time before going OOM, so setting a maximum of 16 should leave enough of a buffer for storing the model. If not, using T4 GPUs should give a RAM boost.
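
A minimal sketch of the per-Pod shape proposed above, with placeholder names and image; node pools, scale-to-zero wiring, and the Cloud Run (GKE) layer are deliberately omitted:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpt-2-api                    # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpt-2-api
  template:
    metadata:
      labels:
        app: gpt-2-api
    spec:
      containers:
      - name: gpt-2
        image: gcr.io/PROJECT/gpt2   # placeholder image
        resources:
          limits:
            cpu: "4"                 # 4 vCPU per Pod, per the proposal
            nvidia.com/gpu: 1        # 1 K80 per Pod, per the proposal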

Request for Contribution: Add webpage GUI for GPT-2-based APIs

Part of the reason I am building gpt-2-cloud-run is for easy integration with a web-based front end.

Unfortunately, I suck at front-ends and don't know best practices. (Ideally, I want something similar to OpenAI's UI, which has parameter selection and input capabilities for inline autocompletion.)

[Screenshot: OpenAI's text generation UI]

I'll need help building a simple web-based frontend that's flexible. Some feature specifications:

  • A single HTML file containing the app (no separate JS/CSS files; loading from a CDN is OK).
  • Supports all parameters that the default API supports. (e.g. length, temperature, top_k)
  • A button for submission which is disabled on click until a response/error is received (to prevent double-submissions since they are slow)

Cloud Build workflow could be less janky

Copying an entire GCS bucket / creating a new bucket just for copying it is not ideal. Uploading app.py and the Dockerfile to the same bucket is also somewhat janky.

Possible alternate workflow:

  • Copy only the specified checkpoint folder to the Cloud Build /workspace working directory
  • If the specified checkpoint folder is not named correctly, rename it. This allows multiple folders in a bucket.
  • Upload app.py, Dockerfile, and friends from the local machine with the build CLI command

It may be helpful to include sanity-checking steps in this workflow as well, to check for the presence of files before the Docker build.

AttributeError: 'NoneType' object has no attribute 'dumps'

It runs fine in Docker, but when I tried to run it on my desktop, the error below came out.
Traceback (most recent call last):
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/applications.py", line 102, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/routing.py", line 550, in __call__
    await route.handle(scope, receive, send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "app.py", line 58, in homepage
    headers=response_header)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/responses.py", line 42, in __init__
    self.body = self.render(content)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/starlette/responses.py", line 159, in render
    return ujson.dumps(content, ensure_ascii=False).encode("utf-8")
AttributeError: 'NoneType' object has no attribute 'dumps'
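
A likely cause, reading the last frame: Starlette imports ujson optionally and leaves it as None when the package is missing, so UJSONResponse fails with exactly this AttributeError. If so, installing ujson in the desktop environment should fix it:

pip install ujson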
