cog's People

Contributors

allcontributors[bot], andreasjansson, anotherjesse, aron, bfirsh, cuuupid, dashstander, dependabot[bot], dkhokhlov, erbridge, evilstreak, floer32, hongchaodeng, iamargentum, imshashank, jd7h, jianghushinian, joannejchen, justinmerrell, mattt, mbukerepo, nickstenning, rorybyrne, sirupsen, technillogue, tempusfrangit, tommydew42, williamluer, yorickvp, zeke

cog's Issues

Use separate `required` option for `@cog.input()`

The dual behavior of `default` (it both supplies a value and makes the input optional) is not obvious.

This might be better as `optional=True` instead, so that inputs are required by default. If inputs were optional by default, users would normally have to mark them as required, and any required input that wasn't marked as such would cause breakage.
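A minimal sketch of how required-ness could be decoupled from `default`. The `InputSpec` class and its `kind`, `optional` and `resolve` names are hypothetical and are not Cog's actual API; the sketch only illustrates the `optional=True` idea under that assumption:

```python
from dataclasses import dataclass
from typing import Any, Dict

# Sentinel so that None can still be used as a real default value.
_MISSING = object()


@dataclass
class InputSpec:
    """Hypothetical input declaration for illustration; not Cog's real API."""

    name: str
    kind: type
    default: Any = _MISSING
    optional: bool = False  # inputs are required unless explicitly opted out

    def __post_init__(self) -> None:
        # Keep the two concepts separate: a default never silently makes an
        # input optional, so a default without optional=True is rejected.
        if self.default is not _MISSING and not self.optional:
            raise ValueError(f"{self.name!r}: a default requires optional=True")

    def resolve(self, values: Dict[str, Any]) -> Any:
        """Return the converted value for this input from the request values."""
        if self.name in values:
            return self.kind(values[self.name])
        if not self.optional:
            raise ValueError(f"missing required input: {self.name!r}")
        return None if self.default is _MISSING else self.default
```

In this design, whether an input is required is decided only by `optional`, and `default` only supplies a value, so neither flag has a hidden second meaning.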

End-to-end tests don't work on macOS

From #118, the end-to-end tests can't connect to the bridge IP on macOS:

[Screenshot from 2021-06-15, 10:58:06]

Either these tests should run in a consistent dev environment or, if we do want to run them on macOS (which probably makes sense in CI?), we may need some kind of OS-based switch.
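One possible shape for that OS-based switch, sketched under the assumption that the test harness also publishes the container port to the host (for example with `docker run -p`). The helper name `container_endpoint` is hypothetical:

```python
import platform
from typing import Optional, Tuple


def container_endpoint(
    bridge_ip: str,
    container_port: int,
    published_port: int,
    system: Optional[str] = None,
) -> Tuple[str, int]:
    """Return the (host, port) the end-to-end tests should connect to."""
    system = system or platform.system()
    if system == "Darwin":
        # Docker Desktop runs containers inside a VM, so bridge IPs are not
        # routable from the macOS host; use the port published on localhost.
        return ("127.0.0.1", published_port)
    # On Linux the bridge network is directly reachable from the host.
    return (bridge_ip, container_port)
```

Passing `system` explicitly makes the switch easy to exercise in CI regardless of the runner's OS.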

gnutls_handshake() failed: The TLS connection was non-properly terminated.

cog build
═══╡ Uploading /Users/tekumara/code3/cog-examples/inst-colorization to localhost:8080/examples/inst-colorization
⠋ uploading (925 MB, 269.985 MB/s) ═══╡ Building model...
═══╡ Received model
═══╡ Building cpu image
═══╡   * Installing Python prerequisites
═══╡   * Installing Python 3.8
═══╡   * Installing system packages
═══╡   * Installing Python packages
═══╡   * Installing Cog
═══╡   * Copying code
═══╡ Successfully built 507cf5936fd9
═══╡ Pushing localhost:5000/inst-colorization:507cf5936fd9 to registry
═══╡ Building gpu image
═══╡   * Installing Python prerequisites
═══╡   * Installing Python 3.8
═══╡  ---> Using cache
═══╡  ---> 68aac6e4699f
═══╡ Step 8/20 : RUN curl https://pyenv.run | bash && 	git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && 	pyenv
   │ install-latest "3.8" && 	pyenv global $(pyenv install-latest --print "3.8")
═══╡  ---> Running in ae5b74d815ca
═══╡   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
═══╡                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   285  100   285    0     0    198      0  0:00:01  0:00:01 --:--:--   198  0
═══╡ Cloning into '/root/.pyenv'...
═══╡ Cloning into '/root/.pyenv/plugins/pyenv-doctor'...
═══╡ Cloning into '/root/.pyenv/plugins/pyenv-installer'...
═══╡ Cloning into '/root/.pyenv/plugins/pyenv-update'...
═══╡ fatal: unable to access 'https://github.com/pyenv/pyenv-update.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
═══╡ Failed to git clone https://github.com/pyenv/pyenv-update.git
═══╡ Error: Failed to build Docker image: exit status 255

High CPU usage during the build.

Output of build should not display as console log messages

═══╡ Building gpu image
═══╡   * Installing Python prerequisites
═══╡   * Installing Python 3.8
═══╡   * Installing system packages
═══╡   * Installing Python packages
═══╡ #11 sha256:6ba92f3047b5dec04235ade8528c87bc142e66bb38015765dc4f9cbb7d185cd8
═══╡ #11 DONE 0.5s
═══╡
═══╡ #12 [ 9/15] RUN pip install -f
   │ https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html -f
   │ https://download.pytorch.org/whl/cu101/torch_stable.html
   │ --extra-index-url=git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
   │ cachetools==4.1.0 chardet==3.0.4 future==0.18.2 fvcore==0.1.dev200506
   │ idna==2.9 importlib-metadata==1.6.0 jsonpatch==1.25 jsonpointer==2.0
   │ markdown==3.2.2 mock==4.0.2 opencv-python==4.3.0.38 portalocker==1.7.0
   │ pyasn1==0.4.8 pyasn1-modules==0.2.8 pydot==1.4.1 requests==2.23.0
   │ requests-oauthlib==1.3.0 rsa==4.0 tabulate==0.8.7 termcolor==1.1.0
   │ urllib3==1.25.8 visdom==0.1.8.9 websocket-client==0.57.0 werkzeug==1.0.1
   │ yacs==0.1.7 zipp==3.1.0 cython==0.29.22 pyyaml==5.1 dominate==2.4.0
   │ detectron2==0.1.2 torch==1.5.0 torchvision==0.6.0 pycocotools==2.0.2
   │ ipython==7.21.0 scikit-image==0.18.1
═══╡ #12 sha256:84e81bafd53e4595c28450182c770765e490c57fe351dd48fbad3418b7ad1697
═══╡ #12 1.283 Looking in indexes: https://pypi.org/simple,
   │ git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
═══╡ #12 1.283 Looking in links:
   │ https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/index.html,
   │ https://download.pytorch.org/whl/cu101/torch_stable.html
═══╡ #12 1.370 WARNING: Cannot look at git URL
   │ git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI/cachetools/
   │ because it does not support lookup as web pages.
═══╡ #12 2.202 Collecting cachetools==4.1.0
═══╡ #12 2.235   Downloading cachetools-4.1.0-py3-none-any.whl (10 kB)
═══╡ #12 2.265 WARNING: Cannot look at git URL
   │ git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI/chardet/
   │ because it does not support lookup as web pages.
═══╡ #12 2.942 Collecting chardet==3.0.4
═══╡ #12 2.950   Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
......

Console ═══╡ messages are for displaying messages to the user, not for large amounts of debugging output. The build output should be displayed as plain text, clearly delineated from the informational messages.
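A minimal sketch of the separation being asked for: only short status lines get the `═══╡` prefix, and raw build output passes through undecorated between two markers. The function names and marker wording are hypothetical:

```python
from typing import Iterable

PREFIX = "═══╡ "


def format_info(message: str) -> str:
    """A short, user-facing status message in the decorated console style."""
    return f"{PREFIX}{message}"


def format_build_output(lines: Iterable[str]) -> str:
    """Render raw build output as plain text, delimited by two status lines."""
    body = [line.rstrip("\n") for line in lines]  # undecorated, one per line
    return "\n".join(
        [format_info("Build output:"), *body, format_info("End of build output")]
    )
```

A caller would then write the result with something like `print(format_build_output(lines), file=sys.stderr)`.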

Better error message when `Model.setup()` is not defined

If you don't define a `setup()` method, you get an incomprehensible error:

═══╡ Traceback (most recent call last):
   │   File "/usr/bin/cog-http-server", line 8, in <module>
   │     cog.HTTPServer(Model()).start_server()
   │ TypeError: Can't instantiate abstract class Model with abstract methods setup
   │
═══╡ Container exited unexpectedly

My feeling is we should require setup() to encourage users to do the right thing, but there should be a clearer error message.
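The traceback comes from Python's `abc` machinery. A sketch of how the server could check for missing abstract methods before instantiating, and report them clearly. `ModelBase`, `instantiate_model` and `ModelDefinitionError` are hypothetical stand-ins, not Cog's real classes:

```python
import abc


class ModelDefinitionError(Exception):
    """Raised when a user's model class is missing required methods."""


class ModelBase(abc.ABC):
    """Hypothetical stand-in for Cog's model base class."""

    @abc.abstractmethod
    def setup(self) -> None:
        """Load weights and other expensive state once, at startup."""


def instantiate_model(cls: type) -> ModelBase:
    # ABCMeta records unimplemented abstract methods in __abstractmethods__,
    # so we can report them before Python raises its generic TypeError.
    missing = sorted(getattr(cls, "__abstractmethods__", ()))
    if missing:
        methods = ", ".join(f"{name}()" for name in missing)
        raise ModelDefinitionError(
            f"Your model class {cls.__name__} must define: {methods}"
        )
    return cls()
```

This keeps `setup()` required while replacing the `TypeError` with a message that names the missing method.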

Design how to run arbitrary scripts

There is `preinstall`, but the name implies that it runs before other things are installed, which it doesn't: it runs before the code is copied. We should:

  1. Figure out the "default" place to run arbitrary commands
  2. Figure out the right name for it

Throw error if invalid keys exist in cog.yaml

Currently invalid keys are silently ignored.

For example, given a cog.yaml that contains this:

buildd:
  python_version: "3.8"
  python_packages:
    - "torch==1.8.0"

Cog will silently skip installing torch instead of complaining that the key `buildd` doesn't exist.

Future

We may also want to validate the values at some point, for example ensuring that `python_version` is a string that matches a given pattern. (If you omit the quotes, 3.1 and 3.10 are indistinguishable! Thanks, YAML!)

(Edited by @andreasjansson and @bfirsh.)
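A sketch of both checks, operating on the dictionary a YAML loader would produce (for example `yaml.safe_load`). The allowed-key sets below are an illustrative subset chosen for this example, not Cog's actual schema:

```python
import re
from typing import Any, Dict, List

# Illustrative subset of keys; Cog's real schema may differ.
ALLOWED_TOP_LEVEL = {"build", "predict"}
ALLOWED_BUILD = {"gpu", "python_version", "python_packages", "system_packages"}
PYTHON_VERSION = re.compile(r"^3\.\d+(\.\d+)?$")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a parsed cog.yaml; empty means valid."""
    errors = []
    for key in config:
        if key not in ALLOWED_TOP_LEVEL:
            errors.append(f"unknown top-level key {key!r}")
    build = config.get("build") or {}
    if not isinstance(build, dict):
        return errors + ["'build' must be a mapping"]
    for key in build:
        if key not in ALLOWED_BUILD:
            errors.append(f"unknown key 'build.{key}'")
    version = build.get("python_version")
    if version is not None:
        if not isinstance(version, str):
            # Unquoted YAML 3.10 is parsed as the float 3.1, so insist on a string.
            errors.append("'build.python_version' must be a quoted string")
        elif not PYTHON_VERSION.match(version):
            errors.append(f"invalid python_version {version!r}")
    return errors
```

Collecting all problems into a list, rather than stopping at the first, lets the CLI report every typo in one pass.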

Make Dockerfile generation more robust

There is a lot of string concatenation without escaping. This feels like it needs a DSL.

Perhaps the simplest approach would be to put all generated data inside environment variables, which can easily be escaped, and then reference that data carefully from fixed commands throughout the rest of the Dockerfile.

We could also use the exec form of RUN, i.e. `RUN ["foo", ...]`.

Ideally we wouldn't use Dockerfiles at all, which is a larger piece of work in #21.
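The exec form of RUN takes a JSON array, so a JSON encoder can do all the quoting. A minimal sketch, where `run_exec_form` is a hypothetical helper name:

```python
import json
from typing import Sequence


def run_exec_form(argv: Sequence[str]) -> str:
    """Render a Dockerfile RUN instruction in exec form.

    The exec form is a JSON array, so json.dumps handles quoting and
    escaping, and no shell is involved to reinterpret ; $ or quotes.
    """
    if not argv:
        raise ValueError("argv must not be empty")
    return "RUN " + json.dumps(list(argv))
```

The trade-off is that exec form gives up shell features such as `&&` chaining, so steps that genuinely need a shell would still have to invoke one explicitly.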

Cog stalls when pushing large, cached files

Steps to reproduce:

  1. Push some large files
  2. Push again

It now stalls showing `⠙ uploading (3.1 kB, 0.494 kB/s)`. Perhaps it's calculating hashes or something similar. Whatever it's doing, it should show progress instead of looking broken.
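If hashing is indeed the slow step (the issue only guesses this), progress can be reported by hashing in chunks and invoking a callback after each one. A sketch with a hypothetical `sha256_with_progress` helper:

```python
import hashlib
import io
from typing import BinaryIO, Callable

ProgressCallback = Callable[[int, int], None]


def sha256_with_progress(
    f: BinaryIO,
    total_bytes: int,
    on_progress: ProgressCallback,
    chunk_size: int = 1024 * 1024,
) -> str:
    """Hash a stream in chunks, reporting (bytes_done, total_bytes) after each."""
    digest = hashlib.sha256()
    done = 0
    while chunk := f.read(chunk_size):
        digest.update(chunk)
        done += len(chunk)
        on_progress(done, total_bytes)
    return digest.hexdigest()


if __name__ == "__main__":
    size = 3 * 1024 * 1024
    sha256_with_progress(
        io.BytesIO(b"\0" * size),
        size,
        lambda done, total: print(f"hashing {done}/{total} bytes"),
    )
```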

Design prediction API versioning

I want to rename `/infer` to `/predict`, but we can't, because all existing models expose `/infer` and Cog has no way to detect which version a model was built with, and therefore which endpoint it should call.

Alternatively, perhaps the client should detect which version of Cog the model was built with and adjust its API calls accordingly.
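A sketch of the client-side alternative, assuming the client can somehow read the Cog version a model was built with (where that metadata would live is the open question in this issue). The cutoff version `PREDICT_ENDPOINT_SINCE` is hypothetical:

```python
from typing import Optional, Tuple

# Hypothetical: the first Cog release whose models would serve /predict.
PREDICT_ENDPOINT_SINCE = (0, 1, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric MAJOR.MINOR.PATCH string, padding missing parts."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def prediction_endpoint(model_cog_version: Optional[str]) -> str:
    """Pick the endpoint based on the Cog version the model was built with."""
    if model_cog_version is None:
        # Models built before version metadata existed only know /infer.
        return "/infer"
    if parse_version(model_cog_version) >= PREDICT_ENDPOINT_SINCE:
        return "/predict"
    return "/infer"
```

Treating "no version metadata" as the oldest API is what lets existing models keep working unchanged.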

Support streaming real-time prediction

Currently, Cog requires you to upload an input file, which is then processed before the results are returned. But there are cases where you might want to stream continuous input to the model. For example, with an audio event detection model, you might want to display the current event as it happens.
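One possible shape for a streaming predictor is a generator that consumes an iterator of input chunks and yields a result per chunk. This is a sketch, not Cog's API: `detect_events` is hypothetical, and a toy energy threshold stands in for a real audio event detection model:

```python
from typing import Iterable, Iterator, Sequence


def detect_events(
    chunks: Iterable[Sequence[float]], threshold: float = 0.5
) -> Iterator[str]:
    """Yield one label per incoming audio chunk as soon as it arrives."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty reads from the stream
        energy = sum(x * x for x in chunk) / len(chunk)  # mean signal power
        yield "loud" if energy > threshold else "quiet"
```

Because this is a generator, a server could forward each yielded label to the client immediately instead of waiting for the input stream to end.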

Jupyter Notebook Integration

It would be great to have some kind of notebook integration. Could we perhaps expose the environment built in Docker as a Jupyter kernel?
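One possible approach is a Jupyter kernel spec whose `argv` starts `ipykernel` inside the built image. This sketch assumes `ipykernel` is installed in the image, assumes Linux host networking, and uses a hypothetical image name; the helper names are also hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict


def docker_kernelspec(image: str, display_name: str) -> Dict[str, Any]:
    """Build a kernel.json dict that starts ipykernel inside a Docker image.

    Jupyter substitutes {connection_file} with the path of the connection
    file it created; we mount it into the container at a fixed location.
    """
    return {
        "argv": [
            "docker", "run", "--rm",
            # Host networking lets the kernel bind the ports listed in the
            # connection file (Linux only; Docker Desktop behaves differently).
            "--network", "host",
            "-v", "{connection_file}:/tmp/connection.json",
            image,
            "python", "-m", "ipykernel_launcher", "-f", "/tmp/connection.json",
        ],
        "display_name": display_name,
        "language": "python",
    }


def install_kernelspec(kernel_dir: Path, spec: Dict[str, Any]) -> Path:
    """Write kernel.json into a directory under Jupyter's kernels/ folder."""
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path
```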

Predict shouldn't have network access

It's common for models to download weights in the `setup()` function. This isn't reproducible (models might run without a network connection, and weights files can disappear from the internet), so we should discourage it. Where network access is genuinely needed, we could provide a config option that allows the model to reach the network.

The tricky bit is allowing incoming connections to the HTTP server while disallowing outgoing ones. From a cursory search, there seems to be no simple way to do this without either adding iptables rules on the host, or attaching the container to a private network and using another container as a proxy. Some creativity might be needed.

As a start, perhaps we could bodge it inside the container. That would guarantee, for reproducibility purposes, that the model isn't downloading any files, but it wouldn't provide any security guarantees.

There are some cases where network access might be needed. A few ideas:

  1. We don't add any way of allowing network access now, and see how far we get. Maybe we don't need to add an option at all.
  2. If you want network access, you need to run it via Docker directly.
  3. We add an option in cog.yaml to allow network access. However, untrusted models not having network access is a neat security feature, so it would be a shame to let model creators break this.
  4. We add a runtime option to allow network access. For users who are in control of their environment, this lets them do weird stuff like this. "Turn this off at your own risk."

[This issue has been authored by @andreasjansson and @bfirsh.]

Build Docker images with buildkit directly

Build images by calling BuildKit directly instead of generating Dockerfiles, because concatenating strings is fragile. See https://www.docker.com/blog/compiling-containers-dockerfiles-llvm-and-buildkit/

Note that we already use BuildKit to build Dockerfiles for CPU images. This issue is about calling the BuildKit API directly instead of going via a Dockerfile.

We could also call the Docker API to create and commit containers, emulating the build process.

A nice side effect of using Dockerfiles, though, is that we can generate one for users who want to "eject" from Cog.

Related to #165

Cog should work in subdirectories

Cog should search up the file tree for cog.yaml, like Keepsake does.

For example, if /home/ben/hotdog-detector/cog.yaml exists, then I should be able to run cog predict in /home/ben/hotdog-detector/subdir/ and have it use that cog.yaml.

There is some nuance here with `cog run`: should the working directory inside the container correspond to the current subdirectory?
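The upward search is a short loop over the current directory and its parents. A sketch with hypothetical helper names; `is_file` is injectable only to make the logic easy to test without touching the filesystem:

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def find_config(
    start: Path,
    filename: str = "cog.yaml",
    is_file: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Walk from start up to the filesystem root looking for filename."""
    start = start.absolute()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if is_file(candidate):
            return candidate
    return None


def project_root_and_subdir(config_path: Path, cwd: Path) -> Tuple[Path, Path]:
    """Return the project root and cwd relative to it (e.g. for `cog run`)."""
    root = config_path.parent
    return root, cwd.absolute().relative_to(root)
```

The relative path from `project_root_and_subdir` is one way to answer the `cog run` question: it could become the working directory under wherever the project is mounted in the container.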

Local image management

As part of implementing local mode and on server (#18) we need a sensible way of managing local images.

Requirements

  • Tags should not grow uncontrollably, so that images can be cleaned up by docker system prune
  • The image should be given a name so you can identify it in docker images
  • The previous image should be removed

For clarity, this is additional work on top of #18. It also includes picking a sensible name when running locally without a registry.
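A sketch of one naming scheme that meets the requirements above: a recognizable name derived from the project directory plus a single fixed tag. The `cog-` prefix, the `latest` tag and the helper names are all hypothetical choices:

```python
import re
from pathlib import PurePath


def local_image_name(project_dir: str) -> str:
    """Derive a recognizable, valid Docker repository name from a directory."""
    base = PurePath(project_dir).name.lower()
    # Repository names allow lowercase alphanumerics joined by separators;
    # collapsing every other run of characters to "-" keeps this valid.
    base = re.sub(r"[^a-z0-9]+", "-", base).strip("-")
    return f"cog-{base or 'model'}"


def local_image_ref(project_dir: str) -> str:
    # Reusing one fixed tag means each rebuild moves the tag to the new image,
    # leaving the previous one untagged ("dangling") so docker system prune
    # can remove it, and the number of tags cannot grow.
    return f"{local_image_name(project_dir)}:latest"
```

Relying on dangling images means the previous image is only removed at the next prune; removing it eagerly after a successful build would be a separate step.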

Sensible defaults for `.cogignore`

As suggested by @zeke.

.npmignore defaults to .gitignore, but this has a dangerous silent failure mode. Suppose .gitignore ignores secrets.json. If you then add an .npmignore to ignore something new, it stops inheriting from .gitignore, so secrets.json is no longer ignored.

There is also an additional consideration for machine learning models: .gitignore will normally ignore your model weights, but you want to include them for Cog, so inheriting from .gitignore would usually not be what you want. In that case, it probably shouldn't be the default.

Maybe we need some sensible defaults that are clear to the user? Maybe there's something clever we can do based on .gitignore?
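One way to avoid the npm pitfall is to make user patterns additive to a fixed set of defaults rather than a replacement for them. A sketch with a hypothetical default list, using `fnmatch` as a simplification (its `*` also matches `/`, unlike gitignore semantics):

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical defaults: things that are almost never part of a model.
# Weights files are deliberately NOT listed here.
DEFAULT_IGNORES = (".git", ".git/*", "__pycache__", "*/__pycache__/*", "*.pyc")


def is_ignored(path: str, user_patterns: Iterable[str] = ()) -> bool:
    """Check a relative POSIX path against defaults plus .cogignore patterns.

    User patterns are added to the defaults rather than replacing them, so
    creating a .cogignore never silently drops a default ignore.
    """
    patterns = (*DEFAULT_IGNORES, *user_patterns)
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```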

Rename "run arguments"

We don't use the word "arguments" anywhere else. Should these be called "inputs", "input types", or something along those lines?
