cheese's People

Contributors

asmith26, ayulockin, ehavener, fugitive-cat, kastanday, louiscastricato, shahbuland

cheese's Issues

`webdataset` not installed but required

Describe the bug
The webdataset package needs to be manually installed in order to run examples.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the procedure described in Getting Started with CHEESE
  2. When running python -m cheese.examples.image_selection the execution fails with the following error
Traceback (most recent call last):
  File "/home/user/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/cheese/examples/image_selection.py", line 2, in <module>
    from cheese.pipeline.iterable_dataset import IterablePipeline, InvalidDataException
  File "/home/user/cheese/cheese/pipeline/iterable_dataset.py", line 4, in <module>
    import webdataset as wds
ModuleNotFoundError: No module named 'webdataset'

Expected behavior

The example script should run without further installations. webdataset should be included in the requirements.txt file.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04

Need to move communication with user to API

Currently we rely on the client to implement some form of communication with the user (i.e. sending/receiving JSONs). This isn't ideal if CHEESE is to be a general tool: whoever is using the API should have direct access to these JSON packets, but the current API provides no way to access them.

device=0 leads to AssertionError: Torch not compiled with CUDA enabled

Describe the bug
The device is hard-coded to GPU in the instruct HF pipeline. It should be taken as input at lines 61 and 29; otherwise, on a CPU-only system, it leads to AssertionError: Torch not compiled with CUDA enabled.

To Reproduce
Steps to reproduce the behavior:
Run instruct_hf_pipeline.py on a CPU system

Expected behavior

The pipeline should detect whether a GPU is available, or accept the device as an input.
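A minimal sketch of one possible fix, assuming the Hugging Face convention of device=-1 for CPU and a non-negative index for a CUDA device (resolve_device is a hypothetical helper, not part of CHEESE):

```python
from typing import Optional

import torch

def resolve_device(device: Optional[int] = None) -> int:
    """Return the user-supplied device, or auto-detect one.

    Hugging Face pipelines accept device=-1 for CPU and device>=0
    for a CUDA device index.
    """
    if device is not None:
        return device
    return 0 if torch.cuda.is_available() else -1
```

The resolved value could then be passed through to the pipeline constructor instead of being hard-coded.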

Pipeline can't send/receive data directly to/from model

Multiple setups might require data to go to the model before the client, or to go to the model and then immediately to the client. A couple of things currently prevent this:

  1. Pipeline doesn't have an event subscriber for the model to publish to
  • Fix this by adding a subscriber to the pipeline (easy)
  2. Client ends up waiting for the model rather than getting new data in the case where the model is the last to touch data before it goes to the pipeline
  • Fix this by checking trip and comparing it to trip_max on the client before deciding whether to put it into an idle or waiting state
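The client-side check could look something like this (the Task fields and state names are assumptions for illustration, not the actual CHEESE schema):

```python
from dataclasses import dataclass

@dataclass
class Task:
    trip: int       # how many components have touched the data so far
    trip_max: int   # total hops the data makes before it is done

def next_client_state(task: Task) -> str:
    # If the task has completed its trip, the client can go idle and
    # request fresh data instead of waiting on the model.
    return "idle" if task.trip >= task.trip_max else "waiting"
```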

Prolific integration

Need CHEESE to interface neatly with Prolific. Intended function is as follows:

  • User information from Prolific linked to client IDs on CHEESE
  • CHEESE link sent to Prolific users

Batched model input

Currently, the model creates a queue of tasks sent to it by the client or pipeline, then handles them one at a time before sending them back. If processing is fast enough that the queue never accumulates many tasks, this is fine. However, it is very likely there will be cases where the queue fills up faster than a model running on unbatched data can keep up with it. We need to collate data and allow model processing to be done in batched form.
Current Idea:

  • Before taking the newest task from the queue, check the size of the queue and whether it would be worth batching (i.e. maybe if the number of items in the queue is greater than some predefined number)
  • Take many tasks off the queue and collate them
  • After model output is obtained, undo this collation and send tasks back as normal
  • Collate and uncollate functions should be defined by the user, but be optional
  • The handle-queued-tasks logic should deal with collation and batched calls to process
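A rough sketch of the batching idea above, with collate, uncollate, process, send_back, and batch_threshold all standing in for the real CHEESE hooks (none of these names come from the actual API):

```python
from queue import Queue

def handle_queued_tasks(queue: Queue, process, collate, uncollate,
                        send_back, batch_threshold: int = 4) -> None:
    # Always take the oldest task; if the queue has backed up past the
    # threshold, take more tasks so the model can run on a batch.
    tasks = [queue.get()]
    if queue.qsize() + 1 >= batch_threshold:
        while not queue.empty() and len(tasks) < batch_threshold:
            tasks.append(queue.get())
    batch = collate(tasks)               # user-defined collation
    results = uncollate(process(batch))  # undo collation after the model runs
    for result in results:
        send_back(result)                # send each task back as normal
```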

Client stats

The client manager should save stats on users. Specifically, for each user:

  • How much have they labelled so far
  • What proportion of data do they "error"
  • How much time do they spend labelling data on average after it has been sent to them
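One possible shape for such a record, as a hypothetical ClientStats dataclass the client manager could keep per user (field names are illustrative):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClientStats:
    total_labelled: int = 0
    total_errors: int = 0
    labelling_times: List[float] = field(default_factory=list)

    def record(self, duration: float, errored: bool = False) -> None:
        # Called once per completed task, with the time elapsed since
        # the task was sent to the user.
        self.total_labelled += 1
        if errored:
            self.total_errors += 1
        self.labelling_times.append(duration)

    @property
    def error_rate(self) -> float:
        return self.total_errors / self.total_labelled if self.total_labelled else 0.0

    @property
    def mean_labelling_time(self) -> float:
        return sum(self.labelling_times) / len(self.labelling_times) \
            if self.labelling_times else 0.0
```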

Saving progress for datasets

Saving progress for datasets, namely IterablePipelines, is currently a bit clunky. The output dataset is agnostic of progress/location in the source. With respect to the source iterator being read from, all that is really saved is an index into the dataset. We currently naively call next on the iterator to get back to whatever index was saved. Leaving a note here to revisit this later, as it might have unforeseen consequences at scale.
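The naive fast-forward described above amounts to the standard itertools "consume" recipe (resume_iterator is a hypothetical helper, not the actual implementation):

```python
from itertools import islice

def resume_iterator(it, saved_index: int):
    # Fast-forward by consuming and discarding `saved_index` items.
    # At scale this re-reads everything up to the saved position,
    # which is the cost the note above is worried about.
    next(islice(it, saved_index, saved_index), None)
    return it
```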

Ending data stream to user

  • Need to add ability to remove users with the API

  • Add something to API so that when pipeline is exhausted, it automatically removes all users

  • Need to update user view once they've been removed
    -> In the simplest case, we add an "exit" flag to task, which the manager can set on a task
    -> Generate a task with no data and an exit flag; the client sees this and switches to a "Thanks for helping!" screen of some sort

  • In some cases, the client might still have data being shown to them even though they've already been removed.
    -> This will require them to submit something
    -> Posting the submission will result in an error; the client manager needs to take the submission, dump it, then give back the exit flag
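The exit-flag idea could be sketched as follows (the Task fields and handler are illustrative, not the actual CHEESE task schema):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Task:
    data: Any = None
    exit: bool = False

def make_exit_task() -> Task:
    # A task with no data and the exit flag set; the client switches to
    # a "Thanks for helping!" screen when it receives one.
    return Task(data=None, exit=True)

def handle_task(task: Task) -> str:
    # Client-side: show the farewell screen on exit, otherwise label.
    return "Thanks for helping!" if task.exit else "label"
```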

Issues with Running Examples

Describe the bug

There are several issues with trying to run the examples from scratch (on an M1 Mac). I would submit a PR to solve these but I probably don't have time at the moment.

To Reproduce

Try to follow the existing README on an M1 Mac.

Issues and Solutions

RabbitMQ won't run correctly using Conda

Solution

Add a comment to the README mentioning that you can also run RabbitMQ with Docker:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.11-management
Taken from https://www.rabbitmq.com/download.html

It is unexpected for the user to be prompted for a password when running the example, and there is no explanation of how to proceed

Solution

Initially, add a comment to the README explaining that the username and password are printed on the command line.
Long term, likely don't use public by default for the examples, OR provide a way to disable it via a command-line argument or environment variable.

The instruct_hf_pipeline.py doesn't work out of the box because the dependencies are not in the requirements.txt file

Solution

Either add the dependencies to requirements.txt, add them to the README, or print a more helpful message when running that example.

`instruct_hf_pipeline` example returns rankings of `None`

Describe the bug
Printing the results in extract_data() here, the rankings are None. The rankings are also missing from the produced rankings_dataset file.

To Reproduce
Steps to reproduce the behavior:

  1. Run default instruct_hf_pipeline, e.g. python -m examples.instruct_hf_pipeline

Expected behavior
rankings should be a list of ints, corresponding to the human labeler's decisions.

Example buggy result (printing inside extract_data()):

{
  "query": "hat has inspired you to become a speaker? How important is your own English knowledge base to you",
  "completions": [
    "? So, how is a new speaker's grammar an essential tool in how you plan to speak?\n\nLangston is a student who writes English for all students, and so it is all about teaching the new speaker to think out loud. That is what he started doing two years ago when he learnt that his grammar was going to be different from that of their world renowned schoolwork teacher.\n",
    "?\n\nMe, and I am a fluent speaker. It is a privilege to be a speaker. Having some knowledge of English is important because while I get compliments for reading so many books I am already on a conversational train and I often find myself saying things I do not like in English. In order to achieve my own speech perfection, it can be hard to get my English to speak in English",
    "?\n\nYes, I speak more and I also like to communicate with other people in the community as a whole. I learned so much as a teenager living in Japan that I really don't understand what my Japanese does. But when I meet people, I just try and say English because I like speaking Japanese, I like seeing them on TV. I love hearing their opinions, even though they are ignorant",
    "?\n\nThe most important thing is learning to speak, even if it doesn't mean that much. The second person to do is to look at the problem and do something with it. The first person will look for something, and if possible, look at where it started. If you learn how to look for the first person in a sentence you can use the search function to find them, and if",
    "?\n\nThis question has been asked and answered with very clear answers as to simply what the speakers speak English. One can use this knowledge as a basis for a wide variety of topics, as in reading the book for five minutes, or as part of the job posting. For those who only need a few hours of reading a book a week, here is a brief discussion of a topic within English at"
  ],
  "rankings": null
}

The LMGenerationElement does not report an error, but is missing the rankings.

LMGenerationElement(client_id=1, trip=1, trip_start='client', trip_max=1, error=False, start_time=1674231730.971844, end_time=1674231740.1409464, query='Write a quote on the floor', 
completions=[...omit for brevity...], rankings=None)

If I solve the issue I'll comment here. Thx.

"Getting started" guide not working (no client id/password provided)

I'm following https://cheese1.readthedocs.io/en/latest/started.html, but when I run the python command (related: #47) I get a URL but no client id/password:

python -m examples.image_selection
100%|██████ 1/1 [00:04<00:00,  4.97s/it]
~/miniconda3/envs/cheese/lib/python3.11/site-packages/gradio/components.py:206: UserWarning: 'rounded' styling is no longer supported. To round adjacent components together, place them in a Column(variant='box').
  warnings.warn(
~/miniconda3/envs/cheese/lib/python3.11/site-packages/gradio/components.py:224: UserWarning: 'border' styling is no longer supported. To place adjacent components in a shared border, place them in a Column(variant='box').
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

Desktop:

  • OS: Ubuntu
  • Browser: Firefox

Many thanks for any help! :)
