allenai / amti

A Mechanical Turk Interface (amti) 🤖

License: Apache License 2.0

Python 100.00%

mechanical-turk crowdsourcing mturk cli annotation command-line-tool

amti's Issues

Suggestion: add a command for anonymizing batch folder

Suppose you want to make a batch folder public. You'd need to drop the worker IDs in the results file, etc.
Having a handy command for creating a thin, anonymized batch directory could be useful.
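Here is a minimal sketch of the anonymization step. It assumes results are stored as JSON Lines and that worker IDs appear under MTurk's `WorkerId` field, possibly nested. The helper names (`pseudonymize`, `scrub`, `anonymize_records`) and the salted-hash scheme are hypothetical, not part of amti. Hashing instead of deleting keeps answers from the same worker groupable. The sketch only touches fields named `WorkerId`, so other places an ID might appear would need separate handling.

```python
import hashlib
import json
from typing import Any, Iterable, Iterator


def pseudonymize(worker_id: str, salt: str) -> str:
    """Map a worker ID to a stable pseudonym; keep the salt private."""
    digest = hashlib.sha256((salt + worker_id).encode("utf-8")).hexdigest()
    return "worker-" + digest[:12]


def scrub(obj: Any, salt: str) -> Any:
    """Recursively replace every string "WorkerId" value in dicts and lists."""
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            if key == "WorkerId" and isinstance(value, str):
                cleaned[key] = pseudonymize(value, salt)
            else:
                cleaned[key] = scrub(value, salt)
        return cleaned
    if isinstance(obj, list):
        return [scrub(item, salt) for item in obj]
    return obj


def anonymize_records(lines: Iterable[str], salt: str) -> Iterator[str]:
    """Hypothetical helper: anonymize a JSON Lines stream, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.dumps(scrub(json.loads(line), salt))
```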

Re-organize the CLI commands by topic

The amti CLI has grown a number of commands (14 by my current count). Grouping the
commands hierarchically into several topics would present a friendly help interface to
users.

I'd tentatively suggest grouping the commands into qualification, batch, and worker
groups. If valuable, the new organization can be discussed further on this issue. Also, after
grouping the commands, it might be helpful to rename some of them to eliminate
redundancy. For example:

amti create-batch to amti batch create
amti status-batch to amti batch status
amti delete-batch to amti batch delete
etc.
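amti's CLI appears to be built with click, since the tracebacks in other issues pass through click/core.py. Click supports this kind of grouping with nested groups. The sketch below is hypothetical: only `batch create` and `batch status` are shown, and their bodies are placeholders. `add_command` re-registers `create` under its old flat name, so existing scripts could keep working during a deprecation period.

```python
import click


@click.group()
def amti():
    """A Mechanical Turk Interface (regrouped sketch)."""


@amti.group()
def batch():
    """Create, inspect, and delete batches."""


@batch.command()
@click.argument("definition_dir")
def create(definition_dir):
    """Placeholder for the logic behind today's `amti create-batch`."""
    click.echo(f"create batch from {definition_dir}")


@batch.command()
@click.argument("batch_dir")
def status(batch_dir):
    """Placeholder for the logic behind today's `amti status-batch`."""
    click.echo(f"status of {batch_dir}")


# Keep the old flat name working alongside `amti batch create`.
amti.add_command(create, name="create-batch")
```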

TypeError: write() argument must be str, not None in amti/actions/create.py

amti/amti/actions/create.py

Line 81 in 1e156b0

commit_file.write(current_commit)

If the current_commit is None, we get a TypeError.

TypeError: write() argument must be str, not None
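Here is a minimal sketch of a guard. It assumes that skipping the commit file, with a warning, is acceptable when no commit is available. `write_commit` is a hypothetical helper, not amti's code. An alternative would be to write an explicit placeholder so every batch directory has the file.

```python
import logging
from typing import Optional, TextIO

logger = logging.getLogger(__name__)


def write_commit(commit_file: TextIO, current_commit: Optional[str]) -> bool:
    """Write the commit hash if there is one; return whether anything was written.

    file.write() only accepts str, so passing None raises the TypeError above.
    """
    if current_commit is None:
        logger.warning("No git commit found; not recording one for this batch.")
        return False
    commit_file.write(current_commit)
    return True
```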

Environment-based hittypeproperties.json

A common use-case is having different HIT type properties for the live site versus the sandbox,
since test accounts on the sandbox often don't have high enough qualifications to work the HITs.

This feature would enable users to define, in their definition directories, separate HIT type
properties for the different amti environments.
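Here is a sketch of one possible lookup rule. It prefers a per-environment file and falls back to the shared one. The file naming convention `hittypeproperties.<env>.json`, the environment names, and `load_hit_type_properties` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_hit_type_properties(definition_dir: str, env: str) -> Dict[str, Any]:
    """Load hittypeproperties.<env>.json if present, else hittypeproperties.json."""
    base = Path(definition_dir)
    for name in (f"hittypeproperties.{env}.json", "hittypeproperties.json"):
        path = base / name
        if path.exists():
            with path.open() as f:
                return json.load(f)
    raise FileNotFoundError(f"no HIT type properties found in {definition_dir}")
```

Replacing the whole file keeps each environment's properties readable on their own. Merging a small override file onto the shared one would avoid duplication, but it makes the effective properties harder to see at a glance.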

Feature Request: Skip item during review instead of just y/n

If you want to think about an item before deciding whether it's rejection-worthy, it'd be nice to be able to go through the rest of the HITs first. Maybe it's also worth adding a way to mark an item that, even if you accept it, you'd later like to remove from your dataset?
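A minimal sketch of a review loop with a skip option, using a queue. The key bindings (`s` to skip for now, `f` to approve but flag for later removal) and the `review` function are hypothetical. Skipped items go to the back of the queue, so they come up again after everything else.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def review(
    items: Iterable[str], ask: Callable[[str], str]
) -> Tuple[Dict[str, str], List[str]]:
    """Review items with y (approve), n (reject), s (skip), f (approve + flag).

    Returns the decisions plus the IDs flagged for later removal.
    """
    queue = deque(items)
    decisions: Dict[str, str] = {}
    flagged: List[str] = []
    while queue:
        item = queue.popleft()
        answer = ask(item).strip().lower()
        if answer == "s":
            queue.append(item)  # revisit after the remaining items
        elif answer in ("y", "f"):
            decisions[item] = "approve"
            if answer == "f":
                flagged.append(item)
        elif answer == "n":
            decisions[item] = "reject"
        else:
            queue.appendleft(item)  # unrecognized input: ask again
    return decisions, flagged
```

In an interactive session, `ask` could be something like `lambda item: input(f"{item} [y/n/s/f]? ")`. Because skipped items are re-queued, the loop ends only once every item gets y, n, or f. A real implementation might cap the number of passes.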

Using `--verbose`

I had to spend some time in the code to figure out that this is an invalid use of --verbose:

$ amti create-batch mturk-specs/definition-likert-prediction-pair file.jsonl . --live --verbose
Usage: amti create-batch [OPTIONS] DEFINITION_DIR DATA_PATH SAVE_DIR
Try 'amti create-batch --help' for help.

Error: no such option: --verbose

However, this is the right way of using it:

$ amti --verbose create-batch mturk-specs/definition-likert-prediction-pair file.jsonl . --live

You may want to clarify this in the README (or amend the CLI to support the first usage).
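Here is a sketch of the second option, amending the CLI. It assumes amti's CLI uses click, which the tracebacks in other issues suggest. Click only accepts a group's options before the subcommand name, so the sketch also attaches `--verbose` to the subcommand and ORs the two flags together. The decorator `verbose_option` and the echoed output are hypothetical. If amti configures logging in the group callback, a subcommand-level flag would also need to re-apply that configuration.

```python
import click


def verbose_option(f):
    """Hypothetical shared decorator adding a --verbose flag."""
    return click.option("--verbose", is_flag=True, help="Log more details.")(f)


@click.group()
@verbose_option
@click.pass_context
def amti(ctx, verbose):
    # Store the group-level flag where subcommands can read it.
    ctx.ensure_object(dict)
    ctx.obj["verbose"] = verbose


@amti.command("create-batch")
@click.argument("definition_dir")
@verbose_option
@click.pass_context
def create_batch(ctx, definition_dir, verbose):
    # Verbose if the flag was given before or after the subcommand name.
    verbose = verbose or ctx.obj["verbose"]
    click.echo(f"verbose={verbose}")
```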

Enhance the HIT preview server

Currently, the HIT preview server simply displays the rendered HITs.

The following enhancements could provide a better user experience:

1. Add an index page at the /hits/ URL, linking to all of the HITs.
2. Render the HITs in an iframe, similar to how they appear on Mechanical Turk.
3. Add navigation between the HITs (next, previous, and home buttons).
4. Add preview and accept modes to the HITs, similar to the Mechanical Turk site.
5. Extend the preview server to also render ExternalQuestion HITs.

To implement item 4, it appears that the difference between preview and accept modes for the HITs is the presence of a query parameter, assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE, in the URL used by the iframe [1] [2].
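Here is a sketch of how the preview server could build and recognize the two iframe URLs, based on the query parameter described above. The local URL paths, the fake accept-mode assignment ID, and both function names are hypothetical. In accept mode the real site also passes other parameters, such as hitId and workerId, which this sketch omits.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def iframe_src(hit_url: str, preview: bool, assignment_id: str = "FAKE_ASSIGNMENT_ID") -> str:
    """Return hit_url with assignmentId set for preview or accept mode.

    The accept-mode assignment_id is a made-up placeholder for local use.
    """
    parts = urlparse(hit_url)
    query = parse_qs(parts.query)
    query["assignmentId"] = [PREVIEW_ASSIGNMENT_ID if preview else assignment_id]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def is_preview(url: str) -> bool:
    """True if the URL carries the preview-mode assignmentId."""
    values = parse_qs(urlparse(url).query).get("assignmentId", [])
    return values == [PREVIEW_ASSIGNMENT_ID]
```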

Python 3.8 issue: `cannot import name 'actions' from partially initialized module 'amti'`

I get the following error when calling amti:

% amti
Traceback (most recent call last):
  File "/usr/local/bin/amti", line 9, in <module>
    from amti import clis
  File "/Users/danielk/Library/Python/3.8/lib/python/site-packages/amti/__init__.py", line 3, in <module>
    from amti import (
ImportError: cannot import name 'actions' from partially initialized module 'amti' (most likely due to a circular import) (/Users/danielk/Library/Python/3.8/lib/python/site-packages/amti/__init__.py)

show the amount of credit in the account

would be very useful!
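Here is a minimal sketch, assuming a boto3 MTurk client. `get_account_balance` is the MTurk API call that returns the requester's balance, as strings. `describe_balance`, `main`, and how they would be wired into amti's CLI are hypothetical. boto3 is imported inside `main` so the formatting logic can be exercised without it.

```python
from typing import Any

SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"


def describe_balance(client: Any) -> str:
    """Format the account balance reported by an MTurk client."""
    response = client.get_account_balance()
    text = f"Available balance: ${response['AvailableBalance']}"
    on_hold = response.get("OnHoldBalance")  # not always present
    if on_hold is not None:
        text += f" (on hold: ${on_hold})"
    return text


def main() -> None:
    """Print the sandbox balance; requires boto3 and AWS credentials."""
    import boto3

    client = boto3.client("mturk", region_name="us-east-1", endpoint_url=SANDBOX_ENDPOINT)
    print(describe_balance(client))
```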

Add automated tests

Until now, amti's testing has been entirely manual. There have been three reasons for this
decision:

1. amti began as a tool I built for myself. Originally, I wanted to demo the idea of
   reproducible, version-controlled crowdsourcing pipelines and to make my own
   crowdsourcing research reproducible. I open sourced amti so that others could run my
   pipelines, to share the idea of reproducible crowdsourcing, and in case people found it
   helpful.
2. amti is still in initial development (it has not had a 1.0 release), and manual testing
   means less effort spent maintaining automated tests.
3. Most good tests for amti require mocking the MTurk APIs or running against the MTurk
   sandbox, so good tests are more work to write than in other situations (and thus it's more
   valuable to keep the testing burden low during initial development).

That said, as adoption grows it's more important to ensure amti's reliability. Similarly,
amti needs high quality automated tests before any possible 1.0 release.

Since amti will still undergo some major refactoring before 1.0 (see Issue #24
for example), it's worth discussing tests people plan to write here beforehand, to avoid
wasting effort.

Here are my thoughts on how to increase test coverage:

1. amti.utils contains lots of small utilities that often don't require mocks and whose
   APIs are unlikely to change. They can be tested first with Python's unittest module.
2. Other parts of the CLI work only locally (e.g., amti.actions.extraction), don't require
   mocks, and probably won't change much. They're also good candidates for initial tests.
3. Mocked tests help local development because they're fast and don't require a network
   connection; however, tests hitting the worker sandbox provide the most assurance
   that the code works correctly. We should prioritize tests against the worker sandbox
   over mocked tests.
4. Tests against the worker sandbox should be run infrequently (e.g., after committing or
   when merging a PR) and thus need to be kept separate from the local unit tests used for
   quick, frequent feedback during development.
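Here is a minimal sketch of the proposed split using the standard library's unittest and unittest.mock. It has a mocked test that always runs and a sandbox test that runs only when an opt-in environment variable is set. The function `reviewable_hit_ids` and the variable `AMTI_SANDBOX_TESTS` are hypothetical. The sandbox test assumes boto3 and AWS credentials are available.

```python
import os
import unittest
from unittest import mock

# Hypothetical opt-in switch so slow sandbox tests stay out of quick local runs.
RUN_SANDBOX = os.environ.get("AMTI_SANDBOX_TESTS") == "1"


def reviewable_hit_ids(client, hit_ids):
    """Hypothetical code under test: keep HITs whose status is Reviewable."""
    return [
        hit_id
        for hit_id in hit_ids
        if client.get_hit(HITId=hit_id)["HIT"]["HITStatus"] == "Reviewable"
    ]


class ReviewableMockedTest(unittest.TestCase):
    """Fast local test: the MTurk client is a mock, so no network is needed."""

    def test_filters_by_status(self):
        client = mock.Mock()
        statuses = {"h1": "Reviewable", "h2": "Assignable"}
        client.get_hit.side_effect = lambda HITId: {"HIT": {"HITStatus": statuses[HITId]}}
        self.assertEqual(reviewable_hit_ids(client, ["h1", "h2"]), ["h1"])


@unittest.skipUnless(RUN_SANDBOX, "set AMTI_SANDBOX_TESTS=1 to run sandbox tests")
class SandboxTest(unittest.TestCase):
    """Slow test against the MTurk sandbox; meant for commits or PR merges."""

    def test_balance_is_reported(self):
        import boto3

        client = boto3.client(
            "mturk",
            region_name="us-east-1",
            endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
        )
        self.assertIn("AvailableBalance", client.get_account_balance())
```

Running `python -m unittest` locally then executes only the mocked tests, while CI can set the variable to include the sandbox ones.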

KeyError: 'ApprovalTime' when doing amti extract tabular in example

When I follow along with the example, everything works great except when I run:

amti extract tabular ./batch-447c17bb-3b6b-494a-a33d-dbdcd3382a35/ batch-data.jsonl

I get this error:

2021-12-30 10:27:56,390:INFO:amti.actions.extraction.tabular:Beginning to extract batch 447c17bb-3b6b-494a-a33d-dbdcd3382a35 to tabular format.
Traceback (most recent call last):
  File "/Users/hschilli/anaconda/envs/petal_env/bin/amti", line 66, in <module>
    amti()
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/hschilli/Documents/Biomimicry Working Group/PeTaL/dev/trying-amti/amti/amti/clis/extraction/tabular.py", line 42, in tabular
    file_format=file_format)
  File "/Users/hschilli/Documents/Biomimicry Working Group/PeTaL/dev/trying-amti/amti/amti/actions/extraction/tabular.py", line 134, in tabular
    row['ApprovalTime'] = assignment['ApprovalTime']
KeyError: 'ApprovalTime'
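One likely cause is documented in the MTurk API reference: an assignment omits `ApprovalTime` until the requester approves it, and omits `RejectionTime` until it is rejected. Extracting before reviewing would then hit this KeyError. Here is a minimal sketch of a tolerant version that uses `dict.get` in place of indexing. `timestamp_columns` is hypothetical, not amti's function.

```python
from typing import Any, Dict, Optional

# Fields copied into each tabular row; ApprovalTime and RejectionTime
# are absent from assignments that haven't been approved or rejected yet.
TIME_FIELDS = ("SubmitTime", "ApprovalTime", "RejectionTime")


def timestamp_columns(assignment: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Copy timestamp fields, using None for any field the assignment lacks."""
    return {field: assignment.get(field) for field in TIME_FIELDS}
```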

[N|n]ot a git repository in amti/utils/log.py

amti/utils/log.py did not catch a git error I got, `fatal: not a git repository`, because it checks for `fatal: Not a git repository` (i.e., with a capital N).

amti/amti/utils/log.py

Line 90 in 1e156b0

elif b'fatal: Not a git repository' in process.stderr:
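Here is a minimal sketch of a capitalization-insensitive check that lowercases git's stderr before matching. `is_not_a_repo_error` and `current_commit` are hypothetical helpers, not amti's code. `subprocess.run(..., capture_output=True)` requires Python 3.7 or later.

```python
import subprocess
from typing import Optional


def is_not_a_repo_error(stderr: bytes) -> bool:
    """Match git's message whether it says "Not" or "not"."""
    return b"fatal: not a git repository" in stderr.lower()


def current_commit(path: str = ".") -> Optional[str]:
    """Return HEAD's hash, or None when `path` is not inside a git repository."""
    process = subprocess.run(["git", "rev-parse", "HEAD"], cwd=path, capture_output=True)
    if process.returncode == 0:
        return process.stdout.decode().strip()
    if is_not_a_repo_error(process.stderr):
        return None
    raise RuntimeError(process.stderr.decode())
```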

Getting the results of an incomplete batch

Sometimes we're in a rush to get the results, so we're willing to skip a couple of incomplete HITs.
How can we save the results such that we don't get the following error?

2021-09-27 14:57:19,551:INFO:amti.actions.save:Finished saving HIT (ID: 3IHWR4LC7DBZ6PPKIOD7HQER66XI81).
Traceback (most recent call last):
  File "/Users/danielk/opt/anaconda3/bin/amti", line 68, in <module>
    amti()
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/amti/clis/save.py", line 40, in save_batch
    batch_dir=batch_dir)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/amti/actions/save.py", line 89, in save_batch
    f'HIT (ID: {hit_id}) has status "{hit_status}".'
ValueError: HIT (ID: 3QHITW7OYO7Q6B6ISU2UMJB8N4EAQ0) has status "Unassignable". In order to save a batch all HITs must have "Reviewable" status.

I suppose we could add a force flag that bypasses such errors:
https://github.com/allenai/amti/blob/master/amti/actions/save.py#L87-L91
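Here is a sketch of how a force flag could change the status check. Without `force`, any HIT that isn't Reviewable is still an error, matching the message above. With `force`, those HITs are skipped. `hits_to_save` and `IncompleteBatchError` are hypothetical names.

```python
from typing import Dict, List


class IncompleteBatchError(ValueError):
    """Raised when a batch has HITs that are not yet Reviewable."""


def hits_to_save(statuses: Dict[str, str], force: bool = False) -> List[str]:
    """Given a mapping of HIT ID to HIT status, return the HIT IDs to save."""
    incomplete = {hit_id: s for hit_id, s in statuses.items() if s != "Reviewable"}
    if incomplete and not force:
        hit_id, status = next(iter(incomplete.items()))
        raise IncompleteBatchError(
            f'HIT (ID: {hit_id}) has status "{status}". In order to save a'
            ' batch all HITs must have "Reviewable" status.'
        )
    return [hit_id for hit_id in statuses if hit_id not in incomplete]
```

Skipping whole HITs is the simplest behavior. An alternative design would save the assignments already submitted for incomplete HITs, so that partially finished work isn't dropped from the saved batch.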

Easy HIT Type Versioning

When developing HITs, it's common to change the title of successive versions so they can be easily distinguished in the sandbox. A feature that lets the user specify a version, adds a unique version string to the title, or makes some other similar change without requiring an edit to the hittypeproperties.json file would make development smoother.
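Here is a minimal sketch that applies a version to the loaded properties at creation time, leaving hittypeproperties.json untouched. It assumes the file has a `Title` key, as MTurk's CreateHITType parameters do. `versioned_properties` is hypothetical, and the version string could come from a CLI option, a timestamp, or a short git hash.

```python
import copy
from typing import Any, Dict, Optional


def versioned_properties(properties: Dict[str, Any], version: Optional[str]) -> Dict[str, Any]:
    """Return a copy of the HIT type properties with the version appended to the title."""
    result = copy.deepcopy(properties)  # never mutate what was read from disk
    if version:
        result["Title"] = f"{result['Title']} [{version}]"
    return result
```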
