carperai / cheese
Used for adaptive human-in-the-loop evaluation of language and embedding models.
License: MIT License
Describe the bug
The webdataset
package needs to be manually installed in order to run examples.
To Reproduce
Steps to reproduce the behavior:
python -m cheese.examples.image_selection
The execution fails with the following error:
Traceback (most recent call last):
File "/home/user/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/conda/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/cheese/examples/image_selection.py", line 2, in <module>
from cheese.pipeline.iterable_dataset import IterablePipeline, InvalidDataException
File "/home/user/cheese/cheese/pipeline/iterable_dataset.py", line 4, in <module>
import webdataset as wds
ModuleNotFoundError: No module named 'webdataset'
Expected behavior
The example script should run without further installations. webdataset should be included in the requirements.txt file.
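As a workaround until the dependency is added upstream, it can be installed manually; the proposed fix is simply to list it in requirements.txt. A minimal sketch:

```shell
# Workaround: install the missing dependency manually
pip install webdataset

# Proposed fix: add it to requirements.txt so the examples run out of the box
echo "webdataset" >> requirements.txt
pip install -r requirements.txt
```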
CHEESE currently relies on the client to implement some form of communication (i.e. sending/receiving JSONs) with the user. For CHEESE to work as a tool, this isn't ideal: whoever is using the API should have direct access to these JSON packets, but the current API provides no way to access them.
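One possible shape for exposing those packets is a small tap that records every JSON packet and forwards it to user-supplied callbacks. This is a sketch only; the class and method names here are hypothetical and not part of the current CHEESE API:

```python
import json
from typing import Callable, List


class PacketTap:
    """Records every JSON packet passing between client and user,
    and optionally forwards each one to user-supplied callbacks."""

    def __init__(self):
        self.history: List[dict] = []
        self.callbacks: List[Callable[[dict], None]] = []

    def on_packet(self, raw: str) -> dict:
        """Decode a raw JSON packet, store it, and notify subscribers."""
        packet = json.loads(raw)
        self.history.append(packet)
        for cb in self.callbacks:
            cb(packet)
        return packet
```

An API consumer could then register a callback (or read `history`) instead of re-implementing the client/user protocol themselves.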
Describe the bug
The device is preset to GPU for the instruct HF pipeline. It needs to be taken as input at lines 61 and 29; otherwise it leads to AssertionError: Torch not compiled with CUDA enabled.
To Reproduce
Steps to reproduce the behavior:
Run instruct_hf_pipeline.py on a CPU-only system
Expected behavior
Detect the GPU automatically, or take the device as input.
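A minimal sketch of the proposed fix, assuming the Hugging Face pipeline convention that `device=0` means the first GPU and `device=-1` means CPU (the helper name here is hypothetical):

```python
def pick_device(device=None):
    """Resolve the HF pipeline `device` argument.

    Honors an explicit caller override; otherwise uses GPU 0 when CUDA is
    actually usable and falls back to CPU (-1), avoiding the
    "Torch not compiled with CUDA enabled" assertion on CPU-only systems.
    """
    if device is not None:
        return device
    try:
        import torch
        if torch.cuda.is_available():
            return 0
    except ImportError:
        pass
    return -1  # HF pipeline convention: -1 = CPU
```

The pipeline construction at the lines mentioned above could then pass `device=pick_device(user_device)` instead of a hard-coded GPU index.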
Multiple setups might require data to go to the model before the client, or to go to the model and then immediately to the client. A couple of things currently prevent this.
As the title says, can you give me some tips on implementing a NER annotation example?
Need CHEESE to interface neatly with Prolific. Intended function is as follows:
Currently, the model creates a queue of tasks sent to it by the client of the pipeline, then handles them one at a time before sending them back. If processing is fast enough that the queue never accumulates many tasks, this should be fine. However, it is very likely there will be cases where the queue fills up faster than a model running on unbatched data can keep up with. We need to collate data and allow model processing to be done in batched form.
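The collation step could look roughly like this: drain up to a fixed number of queued tasks at once so the model runs one batched forward pass instead of one pass per task. A sketch only, assuming a standard `queue.Queue` of tasks (the function name and batch size are hypothetical):

```python
from queue import Queue, Empty


def drain_batch(task_queue: Queue, max_batch: int = 8, timeout: float = 0.1):
    """Pull up to max_batch tasks off the queue so the model can process
    them in one batched call instead of one at a time."""
    batch = []
    try:
        # Block briefly for the first task...
        batch.append(task_queue.get(timeout=timeout))
        # ...then grab whatever else is already waiting, without blocking.
        while len(batch) < max_batch:
            batch.append(task_queue.get_nowait())
    except Empty:
        pass
    return batch
```

The model loop would call `drain_batch`, run inference over the whole batch, and send each result back to its client.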
Current Idea:
Should have client manager save stats on the users. Specifically, for each user:
Saving progress for datasets, namely IterablePipelines, is currently a bit clunky. The output dataset is agnostic of progress/location in the source. With respect to the source iterator being read from, all that is really saved is an index into the dataset being read. We currently naively call next on the iterator to get back to whatever index was saved. Leaving a note here to revisit this later, as it might have unforeseen consequences at scale.
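The naive fast-forward described above amounts to something like the following sketch (the real IterablePipeline internals may differ). Note it is O(saved_index) and re-reads every skipped item, which is exactly the scaling concern; a seekable or checkpointable source would avoid it:

```python
from itertools import islice


def fast_forward(iterator, saved_index: int):
    """Advance the source iterator past the saved_index items already
    consumed, by naively iterating over them (the consume recipe from
    the itertools docs)."""
    next(islice(iterator, saved_index, saved_index), None)
    return iterator
```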
Need to add the ability to remove users via the API
Add something to the API so that when the pipeline is exhausted, it automatically removes all users
Need to update the user's view once they've been removed
-> In the simplest case, we add an "Exit" flag to the task, which the manager can set
-> Generate a task with no data and an exit flag, client sees this and switches to a "Thanks for helping!" screen of some sorts
In some cases, the client might have some data being shown to them, but they've already been removed.
-> This will require them to submit something
-> Posting the submission would result in an error; the client manager needs to take the submission, dump it, then give back the exit flag
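The exit-flag idea sketched above could look roughly like this. All field and function names here are hypothetical, not the actual CHEESE Task schema:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Task:
    data: Optional[Any] = None
    exit: bool = False  # when True, client shows a "Thanks for helping!" screen


def handle_submission(task: Task, removed_users: set, client_id: int) -> Task:
    """If the user was removed mid-task, dump their submission and hand
    back an empty exit task instead of raising an error."""
    if client_id in removed_users:
        return Task(data=None, exit=True)
    return task
```

On pipeline exhaustion, the manager would generate one such data-less exit task per connected user.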
There are several issues with trying to run the examples from scratch (on an M1 Mac). I would submit a PR to solve these, but I probably don't have time at the moment.
Try to follow the existing README on an M1 Mac.
Add a comment to the README mentioning that you can also run this command to start RabbitMQ:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3.11-management
Taken from https://www.rabbitmq.com/download.html
Initially, add a comment to the README explaining that the username and password are printed on the command line.
Long term, likely don't use public by default for the examples, OR provide a way to disable that via a command-line argument or environment variable.
Either add the dependencies to requirements.txt, document them in the README, or print a more helpful message when running that example.
Describe the bug
Printing the results in extract_data() here, the rankings are None. The rankings are also missing from the produced rankings_dataset file.
To Reproduce
Steps to reproduce the behavior:
Run instruct_hf_pipeline, e.g. python -m examples.instruct_hf_pipeline
Expected behavior
rankings should be a list of ints, corresponding to the human labeler's decisions.
Example buggy result (printed inside extract_data()):
{
"query": "hat has inspired you to become a speaker? How important is your own English knowledge base to you",
"completions": [
"? So, how is a new speaker's grammar an essential tool in how you plan to speak?\n\nLangston is a student who writes English for all students, and so it is all about teaching the new speaker to think out loud. That is what he started doing two years ago when he learnt that his grammar was going to be different from that of their world renowned schoolwork teacher.\n",
"?\n\nMe, and I am a fluent speaker. It is a privilege to be a speaker. Having some knowledge of English is important because while I get compliments for reading so many books I am already on a conversational train and I often find myself saying things I do not like in English. In order to achieve my own speech perfection, it can be hard to get my English to speak in English",
"?\n\nYes, I speak more and I also like to communicate with other people in the community as a whole. I learned so much as a teenager living in Japan that I really don't understand what my Japanese does. But when I meet people, I just try and say English because I like speaking Japanese, I like seeing them on TV. I love hearing their opinions, even though they are ignorant",
"?\n\nThe most important thing is learning to speak, even if it doesn't mean that much. The second person to do is to look at the problem and do something with it. The first person will look for something, and if possible, look at where it started. If you learn how to look for the first person in a sentence you can use the search function to find them, and if",
"?\n\nThis question has been asked and answered with very clear answers as to simply what the speakers speak English. One can use this knowledge as a basis for a wide variety of topics, as in reading the book for five minutes, or as part of the job posting. For those who only need a few hours of reading a book a week, here is a brief discussion of a topic within English at"
],
"rankings": null
}
The LMGenerationElement does not report an error, but is missing the rankings.
LMGenerationElement(client_id=1, trip=1, trip_start='client', trip_max=1, error=False, start_time=1674231730.971844, end_time=1674231740.1409464, query='Write a quote on the floor',
completions=[...omit for brevity...], rankings=None)
If I solve the issue I'll comment here. Thx.
I'm following https://cheese1.readthedocs.io/en/latest/started.html, but when I run the python command (related: #47) I get a URL but no client id/password:
python -m examples.image_selection
100%|██████ 1/1 [00:04<00:00, 4.97s/it]
~/miniconda3/envs/cheese/lib/python3.11/site-packages/gradio/components.py:206: UserWarning: 'rounded' styling is no longer supported. To round adjacent components together, place them in a Column(variant='box').
warnings.warn(
~/miniconda3/envs/cheese/lib/python3.11/site-packages/gradio/components.py:224: UserWarning: 'border' styling is no longer supported. To place adjacent components in a shared border, place them in a Column(variant='box').
warnings.warn(
Running on local URL: http://127.0.0.1:7860
Many thanks for any help! :)