amti

A Mechanical Turk Interface (amti)

amti is a CLI for Mechanical Turk that emphasizes the ability to quickly iterate on and run reproducible crowdsourcing experiments.

Design and deploy HITs to Amazon Mechanical Turk in a way that:

  1. allows HIT definitions to be tracked in version control.
  2. can manage and generate batches of HITs from JSON data.
  3. stores the results from HITs in a structured format on disk or in the cloud.

To get started as a user, see Installation and Quickstart below. To develop amti, see Development Setup and Contributing.

Installation

amti requires Python 3.6. For now, install amti from source:

pip install git+https://github.com/allenai/amti#egg=amti

amti is now on the path of whatever environment you installed the package into.

Additionally, you'll need to make sure that your environment is set up to use Mechanical Turk through boto3 as described in this tutorial.

Quickstart

In this section, we'll walk through an example use case of amti to run a batch of HTMLQuestion HITs on Mechanical Turk. If you'd like more background information beforehand, feel free to jump to the Concepts section.

amti ships with two built-in examples, an HTMLQuestion and an ExternalQuestion. The HTMLQuestion is a form written in HTML that MTurk hosts for you. The ExternalQuestion provides an iframe to your website that eventually posts back to an MTurk endpoint. Find these examples in the examples directory.

We'll walk through the HTMLQuestion example. All requests are made to the Mechanical Turk sandbox unless you pass the --live flag, so feel free to run this example without worrying about posting to the live site.

  1. Create your batch by running:

    amti create-batch examples/html-question/definition examples/html-question/data.jsonl .
    

    This command creates a batch directory using the definition in examples/html-question/definition and the data in examples/html-question/data.jsonl. It saves the batch directory into the current directory (.) and uploads the batch's HITs to MTurk.

    You can see that the batch has been created in the current directory by running ls.

  2. Now, check the status of your batch by running:

    amti status-batch batch-*
    

    Here, batch-* just needs to match the path of the batch directory that was created.

  3. Find and fill out the example HITs on MTurk in the worker sandbox. You'll want to search for the HITs using your requester account's user name.

  4. View the status of your now completed HITs with:

    amti status-batch batch-*
    
  5. (optional) If you want to cancel all the HITs in the batch:

    amti expire-batch batch-*
    
  6. Since your batch is now ready for review, review it with:

    amti review-batch batch-*
    
  7. Once you've approved all of your HITs, you can save the results from the batch:

    amti save-batch batch-*
    

    If you go into the batch directory, you'll now notice that it has a new results subdirectory with the information on your HITs and assignments.

  8. Since you've saved your batch, dispose of it from the MTurk site using:

    amti delete-batch batch-*
    

    This action will delete the batch from MTurk so that it doesn't pop up when you examine your open HITs; however, it leaves your batch directory intact and unchanged.

  9. Lastly, you can extract data from all the assignments you've saved into your batch directory using any of the extract commands. To view the available extraction formats, pass the --help option:

    $ amti extract --help
    Usage: amti extract [OPTIONS] COMMAND [ARGS]...
    
      Extract data from a batch to various formats.
    
      See the subcommands for extracting batch data into a specific format.
    
    Options:
      -h, --help  Show this message and exit.
    
    Commands:
      tabular  Extract data from BATCH_DIR to OUTPUT_PATH in a tabular format.
      xml      Extract XML data from assignments in BATCH_DIR to OUTPUT_DIR.
    

    The tabular command will extract the batch's data into an easy to work with tabular format:

    amti extract tabular batch-* batch-data.jsonl
    

    For real workflows, it would be a good idea to use the batch id in the name of the output file.

Now you've run a small HIT and have the results in a reproducible format. It's easy to tar up and upload the batch directory to the cloud where you can store information from many such HITs.
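As a rough sketch of the "tar up" step, the standard library's tarfile module can archive a batch directory. The function name archive_batch and the .tar.gz naming are illustrative choices, not part of amti, and uploading the archive is left to whatever cloud tooling you use:

```python
import tarfile
from pathlib import Path


def archive_batch(batch_dir, output_dir="."):
    """Write <batch directory name>.tar.gz into output_dir and return its path."""
    batch_dir = Path(batch_dir)
    archive_path = Path(output_dir) / f"{batch_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores entries under the batch directory's own name
        # rather than under its full path on this machine.
        archive.add(batch_dir, arcname=batch_dir.name)
    return archive_path
```

Because the batch directory name contains the batch id, the archive name stays unique across batches.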

When developing your own HTMLQuestion HITs, you may want to preview them locally before uploading to Mechanical Turk, with the preview-batch command:

amti preview-batch /path/to/definition/directory /path/to/data/file

Overview

amti may be used both as a command line interface for working with Mechanical Turk and as a library for scripting on top of Mechanical Turk.

First, we'll discuss the major Concepts you'll want to know when working with amti, then we'll describe the CLI, and lastly we'll talk about using amti as a library.

Concepts

Mechanical Turk Concepts

The following are the major Mechanical Turk concepts. Most concepts correspond to a resource endpoint in the Mechanical Turk RESTful API. We've linked each concept to the relevant endpoint's documentation, where available:

  • HIT: A HIT (Human Intelligence Task) corresponds to a task that a Turker can perform. Usually, a HIT ends up being an HTML form that the Turker can submit. An example HIT could be "enter three tags for this image", where an image is also displayed on a web page. Note that a HIT is a specific task, not a kind of task. So, image labeling is not a HIT; but rather, labeling a specific image would be a HIT.
  • HIT Type: Because many HITs will be similar, there's a notion of HIT Type which describes a group of HITs. In particular, HIT Types have descriptions of the task, define a reward amount, title for the task, and other properties that are generally shared across multiple HITs.
  • Assignment: Often it's desirable to have a HIT completed multiple times by different people. An assignment is a single opportunity to complete a HIT, and a crowdworker can't take multiple assignments for one HIT. So, an image that should be labeled by 3 people would be posted as one HIT with 3 assignments.
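To make the relationship between these three concepts concrete, here is a toy model in plain Python. It does not call the MTurk API, and every class, field, and method name in it is made up for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HITType:
    """Properties shared by a group of similar HITs."""
    title: str
    reward: str


@dataclass
class HIT:
    """One specific task, e.g. labeling one particular image."""
    hit_id: str
    hit_type: HITType
    max_assignments: int
    # Maps worker id -> submitted answer; one entry per assignment.
    assignments: Dict[str, str] = field(default_factory=dict)

    def assign(self, worker_id: str, answer: str) -> None:
        """Record one assignment, at most one per worker."""
        if worker_id in self.assignments:
            raise ValueError(f"{worker_id} already worked on {self.hit_id}")
        if len(self.assignments) >= self.max_assignments:
            raise ValueError(f"{self.hit_id} has no assignments left")
        self.assignments[worker_id] = answer
```

An image that should be labeled by 3 people is then one HIT with max_assignments=3, and a second assign call for the same worker raises an error.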

amti Concepts

Mechanical Turk has some features that support creating batches of HITs; however, they're not particularly well developed and aren't modeled by the API. amti's key concept is that of a batch. A batch is a collection of HITs generated from some data using a template.

amti represents batches as directories with the following structure:

batch-$BATCHID : root directory for the batch
|- README : a text file for developers about the batch
|- COMMIT : the commit SHA for the code that generated the batch
|- BATCHID : a random UUID for the batch
|- definition : files defining the HIT / HIT Type
|  |- NOTES : any notes for developers that go along with the task
|  |- question.xml.j2 : a jinja2 template for the HITs' question
|  |- hittypeproperties.json : properties for the HIT Type
|  |- hitproperties.json : properties for the HIT
|- data.jsonl : data used to generate each HIT in the batch
|- results : results from the HITs on the MTurk site
|  |- hit-$ID : results for a single HIT from the batch
|  |  |- hit.jsonl : data about the HIT from the MTurk site
|  |  |- assignments.jsonl : results from the assignments
|  |- ...
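Because the layout is fixed, a saved batch can be read back with a few lines of standard-library Python. This is a sketch that assumes only the file names shown above. The function names are illustrative, and the fields inside each record depend on what MTurk returned:

```python
import json
from pathlib import Path


def _read_jsonl(path):
    """Parse one JSON object per non-empty line of path."""
    with Path(path).open(encoding="utf-8") as lines:
        return [json.loads(line) for line in lines if line.strip()]


def load_results(batch_dir):
    """Map each hit-$ID directory name to its parsed hit and assignment records."""
    results_dir = Path(batch_dir, "results")
    if not results_dir.is_dir():
        return {}  # the batch has not been saved yet
    results = {}
    for hit_dir in sorted(results_dir.glob("hit-*")):
        results[hit_dir.name] = {
            "hit": _read_jsonl(hit_dir / "hit.jsonl"),
            "assignments": _read_jsonl(hit_dir / "assignments.jsonl"),
        }
    return results
```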

To create a batch, write a batch definition (see the example batch definition), create some data in the JSON Lines format, and then create the batch using amti create-batch. Use the -h option for details. You can find some example data in the data.jsonl file.
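A data file in the JSON Lines format is one JSON object per line, with each line supplying the data for one HIT. The sketch below writes such a file. The field name image_url is hypothetical, so use whatever variables your own question.xml.j2 template refers to:

```python
import json


def write_jsonl(rows, path):
    """Write each dict in rows to path as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")


# Hypothetical data: one dict per HIT to generate.
example_rows = [
    {"image_url": "https://example.com/cat.png"},
    {"image_url": "https://example.com/dog.png"},
]
```

Calling write_jsonl(example_rows, "data.jsonl") produces a two-line file that could be passed to amti create-batch as the data path.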

To check on the batch's status, use amti status-batch. Once the batch has been fully worked by Turkers, you can manually review their work with amti review-batch. After approving or rejecting all the HITs in the batch, you can save the batch to disk with amti save-batch. Finally, after saving the batch, you can delete all of its HITs with amti delete-batch. Again, use -h for details.

Command Line Interface

To use amti as a CLI for Mechanical Turk, install amti and then call it by typing amti at the command line:

$ amti --help
Usage: amti [OPTIONS] COMMAND [ARGS]...

  A Mechanical Turk Interface: a CLI for MTurk.

Options:
  -v, --verbose  Set log level to DEBUG.
  -h, --help     Show this message and exit.

Commands:
  associate-qual            Associate workers with a qualification.
  block-workers             Block workers by WorkerId.
  create-batch              Create a batch of HITs using DEFINITION_DIR and...
  create-qualificationtype  Create a Qualification Type using...
  delete-batch              Delete the batch of HITs defined in BATCH_DIR.
  disassociate-qual         Disassociate workers with a qualification.
  expire-batch              Expire all the HITs defined in BATCH_DIR.
  extract                   Extract data from a batch to various formats.
  notify-workers            Send notification message to workers.
  preview-batch             Preview a batch of rendered HITs using...
  review-batch              Review the batch of HITs defined in BATCH_DIR.
  save-batch                Save results from the batch of HITs defined in...
  status-batch              View the status of the batch of HITs defined in...
  unblock-workers           Unblock workers by WorkerId.

The CLI is self-documenting and hierarchical, so you should be able to find anything you might need by starting from the top and using the -h option.
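One consequence of the hierarchy is that top-level options such as --verbose belong before the subcommand, while a subcommand's own options follow it. When scripting amti from Python, a small helper can keep that order straight. This is a sketch, and amti_command is an illustrative name rather than part of amti:

```python
def amti_command(subcommand, *args, verbose=False):
    """Build an argv list for amti with top-level options before the subcommand."""
    argv = ["amti"]
    if verbose:
        argv.append("--verbose")  # top-level option, so it precedes the subcommand
    argv.append(subcommand)
    argv.extend(args)  # the subcommand's own arguments and options
    return argv
```

The returned list can be passed to subprocess.run(..., check=True) in an environment where amti is installed.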

Library

To use amti as a library, pay attention to the two main subpackages:

  • amti/actions: Functions implementing all the actions used by amti.
  • amti/clis: CLIs for many of the actions used by amti. All CLI components are implemented using click and so can be reused in other applications.

Development Setup

This setup guide assumes you have pyenv, pyenv-virtualenv, and direnv installed on your machine.

From the root of this repo, create a python environment for amti and install the dependencies:

pyenv install 3.6.4
pyenv virtualenv 3.6.4 amti
echo 'amti' > .python-version
pip install -r requirements.txt

Then, make sure that you have the proper AWS environment variables set in your .envrc file for this repo. In particular, you should have values for either:

AWS_ACCESS_KEY_ID
AWS_SECRET_KEY
AWS_SECRET_ACCESS_KEY

or

AWS_PROFILE

that correspond to your Mechanical Turk account.
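A quick way to confirm the environment before running anything is to check for the variables listed above. The sketch below simply mirrors that list, and whether your tooling really needs every key variable is an assumption you should verify. The function name is illustrative:

```python
import os

# The key-based variables exactly as listed above.
KEY_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY", "AWS_SECRET_ACCESS_KEY")


def missing_aws_variables(env=None):
    """Return the variables still unset; an empty list means nothing is missing."""
    env = os.environ if env is None else env
    if env.get("AWS_PROFILE"):
        return []  # a named profile stands in for the individual keys
    return [name for name in KEY_VARIABLES if not env.get(name)]
```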

Contributing

amti is licensed under Apache 2.0. Feel free to fork the project or do whatever you like under the terms of that license.

For inquiries about the project, please file a GitHub issue. If you find a bug or error in the code, pull requests are strongly preferred.

amti's People

Contributors

dependabot[bot], gabrielstanovsky, jbragg, jonchang, keisks, nalourie-ai2, orionw, rob-dalton, xksteven

amti's Issues

KeyError: 'ApprovalTime' when doing amti extract tabular in example

When I follow along with the example, everything works great except when I try

amti extract tabular ./batch-447c17bb-3b6b-494a-a33d-dbdcd3382a35/ batch-data.jsonl

I get this error

2021-12-30 10:27:56,390:INFO:amti.actions.extraction.tabular:Beginning to extract batch 447c17bb-3b6b-494a-a33d-dbdcd3382a35 to tabular format.
Traceback (most recent call last):
  File "/Users/hschilli/anaconda/envs/petal_env/bin/amti", line 66, in <module>
    amti()
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/hschilli/anaconda/envs/petal_env/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/hschilli/Documents/Biomimicry Working Group/PeTaL/dev/trying-amti/amti/amti/clis/extraction/tabular.py", line 42, in tabular
    file_format=file_format)
  File "/Users/hschilli/Documents/Biomimicry Working Group/PeTaL/dev/trying-amti/amti/amti/actions/extraction/tabular.py", line 134, in tabular
    row['ApprovalTime'] = assignment['ApprovalTime']
KeyError: 'ApprovalTime'
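The traceback ends at a direct dictionary lookup. One plausible cause, which is an assumption and not confirmed here, is that assignments which have not been approved carry no ApprovalTime key. A possible workaround is to treat such fields as optional. This is a sketch with illustrative names, not the project's fix:

```python
def assignment_row(assignment, optional_fields=("ApprovalTime",)):
    """Copy an assignment into a flat row, using None for absent optional fields."""
    row = {key: value for key, value in assignment.items()
           if key not in optional_fields}
    for name in optional_fields:
        row[name] = assignment.get(name)  # None instead of a KeyError
    return row
```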

Feature Request: Skip item during review instead of just y/n

If you want to think about an option before deciding if it's rejection-worthy, it'd be nice to be able to go through the rest of the HITs first. Maybe it's also worth adding a way to mark that, even if you accept, this is an item you'd later like to remove from your dataset?

Python 3.8 issue: `cannot import name 'actions' from partially initialized module 'amti'`

I am getting this upon calling amti

% amti
Traceback (most recent call last):
  File "/usr/local/bin/amti", line 9, in <module>
    from amti import clis
  File "/Users/danielk/Library/Python/3.8/lib/python/site-packages/amti/__init__.py", line 3, in <module>
    from amti import (
ImportError: cannot import name 'actions' from partially initialized module 'amti' (most likely due to a circular import) (/Users/danielk/Library/Python/3.8/lib/python/site-packages/amti/__init__.py)

Getting the results of an incomplete batch

Sometimes we are in a rush to get the results, so we're willing to skip a couple of incomplete HITs.
How can we save the results such that we don't get the following error?

2021-09-27 14:57:19,551:INFO:amti.actions.save:Finished saving HIT (ID: 3IHWR4LC7DBZ6PPKIOD7HQER66XI81).
Traceback (most recent call last):
  File "/Users/danielk/opt/anaconda3/bin/amti", line 68, in <module>
    amti()
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/amti/clis/save.py", line 40, in save_batch
    batch_dir=batch_dir)
  File "/Users/danielk/opt/anaconda3/lib/python3.7/site-packages/amti/actions/save.py", line 89, in save_batch
    f'HIT (ID: {hit_id}) has status "{hit_status}".'
ValueError: HIT (ID: 3QHITW7OYO7Q6B6ISU2UMJB8N4EAQ0) has status "Unassignable". In order to save a batch all HITs must have "Reviewable" status.

I suppose we could have a force flag that bypasses such errors:
https://github.com/allenai/amti/blob/master/amti/actions/save.py#L87-L91
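To sketch what such a flag might do: instead of raising on the first HIT that isn't "Reviewable", the save step could split the batch into HITs that are ready and HITs to skip. The function and argument names below are hypothetical and do not exist in amti:

```python
def split_by_status(hit_statuses, required="Reviewable"):
    """Split a {hit_id: status} mapping into (ready, skipped) lists of HIT ids."""
    ready, skipped = [], []
    for hit_id, status in hit_statuses.items():
        if status == required:
            ready.append(hit_id)
        else:
            skipped.append(hit_id)
    return ready, skipped
```

With a force option, only the ready HITs would be saved and the skipped ids could be logged as warnings.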

Enhance the HIT preview server

Currently, the HIT preview server simply displays the rendered HITs.

The following enhancements could provide a better user experience:

  1. Add an index page at the /hits/ URL, linking to all of the HITs.
  2. Render the HITs in an iframe similarly to how they appear on Mechanical Turk.
  3. Add navigation between the HITs (next, previous, and home buttons).
  4. Add preview and accept modes to the HITs, similarly to the Mechanical Turk site.
  5. Extend the preview server to also render ExternalQuestion HITs.

For implementing 4, it appears that the difference between preview and accept modes for the HITs is the presence of a query parameter, assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE, in the URL used by the iframe [1] [2].
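A minimal sketch of that check, using only the standard library. The helper name is illustrative, and treating a URL with no assignmentId at all as "not a preview" is an assumption:

```python
from urllib.parse import parse_qs, urlparse

PREVIEW_ASSIGNMENT_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def is_preview(url):
    """Return True when the URL's assignmentId marks the HIT as a preview."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter name to a list of values.
    return query.get("assignmentId", [None])[0] == PREVIEW_ASSIGNMENT_ID
```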

Re-organize the CLI commands by topic

The amti CLI has grown a number of commands (14 by my current count). Grouping the
commands hierarchically into several topics would present a friendly help interface to
users.

I'd tentatively suggest grouping the commands into qualification, batch, and worker
groups. If valuable, the new organization can be discussed further on this issue. Also, after
grouping the commands, it might be helpful to rename some of them to eliminate
redundancy. For example:

  • amti create-batch to amti batch create
  • amti status-batch to amti batch status
  • amti delete-batch to amti batch delete
  • etc.

Add automated tests

Until now, amti's testing has been entirely manual. There have been three reasons for this
decision:

  1. amti began as a tool I built for myself. Originally, I wanted to demo the idea of
    reproducible, version controlled crowdsourcing pipelines and to make my own
    crowdsourcing research reproducible. I open sourced amti so that others could run my
    pipelines, to share the idea of reproducible crowdsourcing, and in case people found it
    helpful.
  2. amti is still in initial development (has not had a 1.0 release), and manual testing means
    less effort expended on maintaining automatic tests.
  3. Most good tests for amti require mocking the MTurk APIs or running against the MTurk
    sandbox; so, good tests are more work to write than in other situations (and thus it's more
    valuable to keep testing burden low during initial development).

That said, as adoption grows it's more important to ensure amti's reliability. Similarly,
amti needs high quality automated tests before any possible 1.0 release.

Since amti will still undergo some major refactoring before 1.0 (see Issue #24
for example), it's worth discussing tests people plan to write here beforehand, to avoid
wasting effort.

Here are my thoughts on how to increase test coverage:

  • amti.utils contains lots of small utilities that often don't require mocks and whose
    APIs are unlikely to change. They can be tested first with Python's unittest module.
  • Other parts of the CLI work only locally (e.g., amti.actions.extraction), don't require
    mocks, and probably won't change much. They're also good candidates for initial tests.
  • Mocked tests help local development because they're fast and don't require a network
    connection; however, tests hitting the worker sandbox provide the most assurance
    that the code works correctly. We should focus on tests against the worker sandbox
    over mocked tests.
  • Tests against the worker sandbox should be run infrequently (i.e. after committing or
    when merging a PR) and thus need to be separated from local unit tests used for quick, frequent feedback during development.
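One way to keep the two kinds of tests apart with unittest is an opt-in environment variable. This is only a sketch: the variable AMTI_RUN_SANDBOX_TESTS and both test classes are hypothetical, and the sandbox test is a placeholder:

```python
import json
import os
import unittest

# Hypothetical opt-in switch; amti does not define this variable.
RUN_SANDBOX_TESTS = os.environ.get("AMTI_RUN_SANDBOX_TESTS") == "1"


class LocalTests(unittest.TestCase):
    """Fast tests with no network access, suitable for every change."""

    def test_json_lines_round_trip(self):
        rows = [{"id": 1}, {"id": 2}]
        text = "".join(json.dumps(row) + "\n" for row in rows)
        self.assertEqual([json.loads(line) for line in text.splitlines()], rows)


@unittest.skipUnless(RUN_SANDBOX_TESTS, "set AMTI_RUN_SANDBOX_TESTS=1 to run")
class SandboxTests(unittest.TestCase):
    """Slow tests that would talk to the MTurk worker sandbox."""

    def test_placeholder(self):
        self.skipTest("a real test would call the MTurk sandbox here")
```

Running python -m unittest locally then exercises only LocalTests, while a CI job can set the variable to include the sandbox suite.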

Environment-based hittypeproperties.json

A common use-case is having different HIT type properties for the live site versus the sandbox,
since test accounts on the sandbox often don't have high enough qualifications to work the HITs.

This feature would enable users to define, in their definition directories, separate HIT type
properties for the different amti environments.
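One possible shape for this feature is a per-environment file that takes precedence over the shared one. The file naming scheme (hittypeproperties.sandbox.json, hittypeproperties.live.json) and the function below are hypothetical and only illustrate the lookup:

```python
from pathlib import Path


def hit_type_properties_path(definition_dir, live=False):
    """Prefer an environment-specific properties file, else the shared one."""
    definition_dir = Path(definition_dir)
    env = "live" if live else "sandbox"
    # Hypothetical naming scheme: hittypeproperties.<env>.json
    specific = definition_dir / f"hittypeproperties.{env}.json"
    if specific.exists():
        return specific
    return definition_dir / "hittypeproperties.json"
```

Falling back to the shared file keeps existing definition directories working unchanged.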

Easy HIT Type Versioning

When developing HITs, it's common to change the title of successive versions so they can be easily distinguished in the sandbox. Adding a feature that'd allow the user to specify the version, add a unique version string to the title, or some other similar change without requiring an edit to the hittypeproperties.json file would make development smoother.
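As a sketch of the idea, a version string could be appended to the title when the properties are loaded, leaving the file itself untouched. This assumes the properties contain a "Title" key, and the function name is illustrative:

```python
import copy


def with_versioned_title(properties, version):
    """Return a copy of properties whose Title ends with the version string."""
    versioned = copy.deepcopy(properties)  # leave the caller's dict unchanged
    versioned["Title"] = f'{properties["Title"]} (v{version})'
    return versioned
```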

Using `--verbose`

I had to spend some time in the code to figure out that this is an invalid use of --verbose:

$ amti create-batch mturk-specs/definition-likert-prediction-pair file.jsonl . --live --verbose
Usage: amti create-batch [OPTIONS] DEFINITION_DIR DATA_PATH SAVE_DIR
Try 'amti create-batch --help' for help.

Error: no such option: --verbose

However, this is the right way of using it:

$ amti --verbose create-batch mturk-specs/definition-likert-prediction-pair file.jsonl . --live 

You may wanna clarify it in the readme (or amend the CLI to support the first usage).
