quail's Introduction

DOI JOSS

Quail logo

Overview

Quail is a Python package that facilitates analyses of behavioral data from memory experiments. (The current focus is on free recall experiments.) Key features include:

  • Serial position curves (probability of recalling items presented at each presentation position)
  • Probability of Nth recall curves (probability of recalling items at each presentation position as the Nth recall in the recall sequence)
  • Lag-Conditional Response Probability curves (probability of transitioning between items in the recall sequence, as a function of their relative presentation positions)
  • Clustering metrics (e.g. single-number summaries of how often participants transition from recalling a word to another related word, where "related" can be user-defined.)
  • Many nice plotting functions
  • Convenience functions for loading in data
  • Automatic parsing of speech data (audio files) using wrappers for the Google Cloud Speech-to-Text API

The intended user of this toolbox is a memory researcher who seeks an easy way to analyze and visualize data from free recall psychology experiments.

The toolbox name is inspired by Douglas Quail, the main character from the Philip K. Dick short story We Can Remember It for You Wholesale (the inspiration for the film Total Recall).

Try it!

Check out our repo of Jupyter notebooks.

Installation

To install quail in the recommended way, run:

pip install quail

This will install quail with basic functionality. To install with speech decoding dependencies (note: you will still need to install ffmpeg manually, since it is not pip-installable; for instructions, see here), run:

pip install "quail[speech-decoding]"

For CDL users, you can install speech decoding and efficient learning capabilities like this:

pip install "quail[speech-decoding,efficient-learning]"

To install directly from this repo (not recommended, but you'll get the "bleeding edge" version of the code):

git clone https://github.com/ContextLab/quail.git

Then, navigate to the folder and type:

pip install -e .

(this assumes you have pip installed on your system)

This will work on clean systems, but if you encounter issues you may need to run:

sudo pip install --upgrade --ignore-installed -e .

Requirements

  • python>=3.6
  • pandas>=0.18.0
  • seaborn>=0.7.1
  • matplotlib>=1.5.1
  • scipy>=0.17.1
  • numpy>=1.10.4
  • future
  • pytest (for development)

If installing from GitHub (instead of pip), you must also install the requirements: pip install -r requirements.txt

Documentation

Check out our readthedocs: here.

We also have a repo with example notebooks from the paper here.

Citing

Please cite as:

Heusser AC, Fitzpatrick PC, Field CE, Ziman K, Manning JR (2017) Quail: A Python toolbox for analyzing and plotting free recall data. The Journal of Open Source Software, 2(18). https://doi.org/10.21105/joss.00424

Here is a bibtex formatted reference:

@article{HeusEtal2017b,
	doi = {10.21105/joss.00424},
	url = {https://doi.org/10.21105/joss.00424},
	year = 2017,
	publisher = {The Open Journal},
	volume = {2},
	number = {18},
	author = {Andrew C. Heusser and Paxton C. Fitzpatrick and Campbell E. Field and Kirsten Ziman and Jeremy R. Manning},
	title = {Quail: A Python toolbox for analyzing and plotting free recall data},
	journal = {The Journal of Open Source Software}
}

Contributing

(Some text borrowed from Matplotlib contributing guide.)

Submitting a bug report

If you are reporting a bug, please do your best to include the following:

  1. A short, top-level summary of the bug. In most cases, this should be 1-2 sentences.
  2. A short, self-contained code snippet to reproduce the bug, ideally allowing a simple copy and paste to reproduce. Please do your best to reduce the code snippet to the minimum required.
  3. The actual outcome of the code snippet.
  4. The expected outcome of the code snippet.

Contributing code

The preferred way to contribute to quail is to fork the main repository on GitHub, then submit a pull request.

  • If your pull request addresses an issue, please use the title to describe the issue and mention the issue number in the pull request description to ensure a link is created to the original issue.

  • All public methods should be documented on our readthedocs API page.

  • Each high-level plotting function should have a simple example in the examples folder. This should be as simple as possible to demonstrate the method.

  • Changes (both new features and bugfixes) should be tested using pytest. Add tests for your new feature to the tests/ folder of the repo.

Support

If you have a question, comment or concern about the software, please post a question to Stack Overflow, or send us an email at [email protected].

Testing

Build Status

To test quail, install pytest (pip install pytest) and run pytest in the quail folder.

Examples

See here for more examples.

Create an egg!

Eggs are the fundamental data structure in quail. They are composed of lists of presented words, lists of recalled words, and a few other optional components.

import quail

# presented words
presented_words = [['cat', 'bat', 'hat', 'goat'],['zoo', 'animal', 'zebra', 'horse']]

# recalled words
recalled_words = [['bat', 'cat', 'goat', 'hat'],['animal', 'horse', 'zoo']]

# create egg
egg = quail.Egg(pres=presented_words, rec=recalled_words)

Analyze some data

# load data
egg = quail.load_example_data()

# analysis
analyzed_data = quail.analyze(egg, analysis='accuracy', listgroup=['average']*8)

Plot Accuracy

analyzed_data = quail.analyze(egg, analysis='accuracy', listgroup=['average']*8)
ax = quail.plot(analyzed_data, title='Recall Accuracy')

Plot Accuracy

Plot Serial Position Curve

analyzed_data = quail.analyze(egg, analysis='spc', listgroup=['average']*8)
ax = quail.plot(analyzed_data, title='Serial Position Curve')

Plot SPC

Plot Probability of First Recall

analyzed_data = quail.analyze(egg, analysis='pfr', listgroup=['average']*8)
ax = quail.plot(analyzed_data, title='Probability of First Recall')

Plot PFR

Plot Lag-CRP

analyzed_data = quail.analyze(egg, analysis='lagcrp', listgroup=['average']*8)
ax = quail.plot(analyzed_data, title='Lag-CRP')

Plot Lag-CRP

Plot Memory Fingerprint

analyzed_data = quail.analyze(egg, analysis='fingerprint', listgroup=['average']*8)
ax = quail.plot(analyzed_data, title='Memory Fingerprint')

Plot Fingerprint

quail's People

Contributors

andrewheusser, jeremymanning, kirstensgithub, paxtonfitzpatrick


quail's Issues

template for temporal factor

The Kahana Lab's Behavioral Toolbox has MATLAB implementations of many of the Quail functions (and desired Quail functions), including the temporal factor calculation (temp_fact) and generalized clustering calculations (dist_fact).

Note: this is not really an "issue" per se; feel free to close once you've gotten what you wanted from the link. (@andrewheusser, you had mentioned wanting some reference code.)

feature request: fingerprint analysis

integrate fingerprint analyses into pyrec. The code already exists in another repo. We can (A) keep the repo separate and import the package as a dependency, or (B) integrate it into pyrec and get rid of the other repo. Thoughts?

Handle different number of lists per subject

The software will crash if there are different numbers of lists per subject. We should handle this somehow. For instance, if a subject's data is lost for one list, it would still be nice to analyze the rest of their data...

functions vs methods of pyro object

Thinking about how users will interact with this package... During our meeting last week, we decided to implement the package as a set of functions, instead of methods attached to the pyro object.

pyro = Pyro(data)
x = spc(pyro)
spc_plot(x)

vs.

pyro = Pyro(data)
pyro.spc(listgroup=listgroup).plot()

# or even more concisely
Pyro(data).spc(listgroup=listgroup).plot()

Still happy with the functions approach, as it will be the most straightforward to implement, but I wanted to make one last case for the object-oriented design before we get too far with the API :)

Plot handling nested lists of analyzed data

Analyze can return a nested list of analyzed data:

quail.analyze([egg1,egg2], analysis=['accuracy','spc'])
# returns
[[egg1_analyzed_acc, egg1_analyzed_spc],[egg2_analyzed_acc, egg2_analyzed_spc]]

The plot function should be able to handle these as well.
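One way the plot function could handle this (a sketch only; plot_nested and its stand-in return value are hypothetical, not quail's API) is to walk the structure recursively, dispatching on whether each item is itself a list:

```python
def plot_nested(results):
    """Recursively walk a (possibly nested) list of analyzed results,
    handling each leaf individually."""
    plotted = []
    for item in results:
        if isinstance(item, list):
            plotted.extend(plot_nested(item))  # recurse into nested lists
        else:
            plotted.append(item)  # stand-in for a call like quail.plot(item)
    return plotted
```

This flattens arbitrarily deep nesting, so the output of analyzing multiple eggs with multiple analyses can be plotted in one call.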

Plot handling lists of kwargs

Plot can currently handle lists of analyzed data:

quail.plot([spc_data, pfr_data,...])

It could be convenient to also allow for lists of kwargs:

quail.plot([spc_data, pfr_data], title=['SPC','PFR'])
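A minimal sketch of how the kwarg broadcasting could work (plot_many and its return value are hypothetical, not quail's API): a scalar kwarg applies to every plot, while a list kwarg is matched elementwise to the data:

```python
def plot_many(datasets, **kwargs):
    """Broadcast kwargs across a list of analyzed results: scalar kwargs
    apply to every plot, list kwargs are matched elementwise."""
    calls = []
    for i, data in enumerate(datasets):
        per_plot = {k: (v[i] if isinstance(v, list) else v)
                    for k, v in kwargs.items()}
        calls.append((data, per_plot))  # stand-in for quail.plot(data, **per_plot)
    return calls
```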

feature request: indexing flags

To facilitate doing analyses separated by experiment and list number, I propose adding two flags:

  • subjgroup provides a per-subject group label (e.g. labeling which experiment each subject participated in). analysis functions carry out analyses separately for each group, and plotting functions show a different curve for each group (e.g. in a different color). for example, this will allow us to easily do analyses for different experiments, in a single command.

  • listgroup provides a per-list label (independent of subjects). analysis functions carry out analyses separately for each listgroup (in addition to each subjgroup), and plotting functions show a different curve for each listgroup (e.g. in a different color). for example, this will allow us to easily do analyses for the first 8 vs last 8 lists, see how memory fingerprints change over the course of the experiment, and other similar sorts of things, also in a single command.

The way I'm imagining this would be implemented is to first divide the data based on subjgroup, passing in the listgroup flags to the function that carries out analyses for each subject. Then inside the subject-level analysis function, analyses would be performed separately on each listgroup.

An efficient way to set this up would be to have a "general" analysis function that takes in a pyro object and an (analysis) function handle. The general analysis function would then have an outer loop over subjgroups and an inner loop over the listgroups for the current subjgroup, and would call the analysis function handle to actually do the analysis on that subjgroup-listgroup piece of the data, then aggregate the results in some way (e.g. by returning a new dataframe or pyro object).

Plotting could work similarly-- we could have a general plotting function that takes in a results object and a (plot) function handle and loops over subjgroup and listgroup, adding each piece of the data in the innermost loop (listgroup of the subjgroup) to the current plot (or aggregating the results in a way that could easily be plotted at the end).
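The loop structure described above could be sketched as follows (general_analysis is a hypothetical helper, not quail's actual API; data is assumed to be indexed as data[subject][list]):

```python
def general_analysis(data, subj_labels, list_labels, analysis_fn):
    """Outer loop over subject groups, inner loop over list groups, applying
    analysis_fn to each (subjgroup, listgroup) slice of the data.

    subj_labels has one group label per subject; list_labels one per list."""
    results = {}
    for sg in dict.fromkeys(subj_labels):        # unique subjgroups, in order
        subj_idx = [i for i, g in enumerate(subj_labels) if g == sg]
        for lg in dict.fromkeys(list_labels):    # unique listgroups, in order
            list_idx = [j for j, g in enumerate(list_labels) if g == lg]
            # the subjgroup-listgroup piece of the data
            piece = [data[i][j] for i in subj_idx for j in list_idx]
            results[(sg, lg)] = analysis_fn(piece)
    return results
```

A plotting wrapper could loop over the keys of the returned dict in the same way, adding one curve per (subjgroup, listgroup) pair.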

decode_speech function

this function would take a file or folder of audio files of free recall data and return the decoded results.

example:

rec_data = quail.decode_speech('/path/to/file.flac')

where rec_data is a list of words and

rec_data = quail.decode_speech(folder='/path/to/folder')

where rec_data is a list of lists of words.

average memory performance bar chart

A bar chart that plots average free recall accuracy with error bars. You can use the Seaborn barplot function. This should be able to handle the listgroup and subjgroup arguments as well.

add more tutorials

Need tutorials for the following:

  • speech decoding (with ffmpeg/google cloud setup instructions)
  • fingerprint tutorial
  • crack egg (add to egg tutorial)
  • recmat2egg (add to egg tutorial)

PLR: (last word == last *presented* word) OR (last word == last utterance)

Just checking in on a point Allison brought up a while back: should the probability of last recall take the final word spoken as the last word, or should it skip words not on the most recently presented list (effectively becoming the probability of the last correctly-recalled word)?

I couldn't remember if we had discussed this previously.

Current behavior: takes the last utterance, irrespective of whether the word was on the previous list.

neural features

for neuroimaging experiments, automatically extract neural features (power spectra, brain maps, etc.) and include these in each presentation/recall stimulus feature set.

other things to think about/implement:

  • infer the types of features from the data type
  • allow the user to pass in a feature extraction function to convert brain data to feature vectors

support continuous data

quail is currently optimized for list-learning experiments. however, the same general framework could be applied to continuous experiments, like movie viewing/recall, social interactions, etc. for example:

  • tag each stimulus and response with a feature vector
  • treat viewing/listening/etc. as "presentations" -- e.g. watching a movie or listening to a conversation is like studying a list of words
  • treat produced behaviors (speech, writing, quiz responses, brain responses) as recalls -- e.g. speech turns into feature vectors, which are treated like recalling words

the existing behavioral summaries (serial position curves, pnr curves, lag-CRPs, memory fingerprints) could be extended to work with continuous data by making the x-axis of these plots reflect time rather than stimulus number (or relative stimulus number).

package functions

Posting here so we can discuss and add to this, and maybe use the project board to keep track of progress?

feature request: pnr analysis

An extension of the pfr analysis: computes the probability of recall over encoding positions for the word in the nth output position.

pfr should call the pnr function with n=1
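The relationship could be sketched like this (pnr/pfr here are simplified stand-ins for quail's analyses, assuming equal-length lists and unique words within each list):

```python
def pnr(pres_lists, rec_lists, n):
    """Probability that the word in the nth output position was studied at
    each presentation position, averaged over lists with a valid nth recall."""
    list_len = len(pres_lists[0])   # assumes all lists are the same length
    counts = [0] * list_len
    valid = 0
    for pres, rec in zip(pres_lists, rec_lists):
        if len(rec) >= n and rec[n - 1] in pres:
            counts[pres.index(rec[n - 1])] += 1
            valid += 1
    return [c / valid for c in counts] if valid else counts

def pfr(pres_lists, rec_lists):
    # probability of first recall is just the n=1 special case
    return pnr(pres_lists, rec_lists, n=1)
```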

feature request: condition kwarg for plot

it would be nice to have an easy way to plot a subset of conditions, for example:

quail.plot(analyzed_egg, conditions=['forward','backward'])  # don't plot the random condition

documentation

It'd be nice to have some documentation (e.g. a readme file) with some simple examples. And maybe a Sphinx-generated API? Some ipython notebooks w/ simple example datasets would be cool too.

feature request: input/output format

i think we should define a data object (e.g. recalls matrix + some additional info). then we can have two types of functions:

  • analysis functions-- take a data object and return a new data object
  • plot functions-- take a data object and return a plot

we could depend on hypertools for plotting trajectories and seaborn for other stuff

feature request: custom analysis function

With our current model, I think it would be pretty easy to support custom list-wise analyses. For instance, the user could do something like this:

custom_analysis = lambda lst: [item+1 for item in lst]
pyr.analyze(pyro, analysis='custom', analysis_function=custom_analysis)

admittedly a contrived custom analysis, but you get the point... think this would be desirable?

feature request: support remote datasets

would be nice to be able to remotely access data. taking some inspiration from pandas, this could be set up with a load function that parses local files as well as files given a web address

optional subjgroup and listgroup field to egg

When creating an egg, we could have these optional fields to make the syntax simpler for later functions. For example:

egg = quail.Egg(pres=pres, rec=rec, listgroup=['early']*8+['late']*8, subjgroup=['exp1']*10+['exp2']*8)

analyzed_data = quail.analyze(egg, analysis='spc')

the averaging would be inferred from the egg, unless listgroup/subjgroup are passed to the analysis function - in which case the egg groupings would be overwritten. This would allow users to save eggs with the conditions included.

feature request: easy way to deal with counterbalancing across conditions

just finished implementing support for nested listgroup kwarg, so that users can specify a unique listgroup for each subject. This will be useful when conditions are randomized/counterbalanced across lists over subjects.

It would be nice to have a shortcut for counterbalanced designs. I propose that we support listgroup dictionaries, something like:

listgroup = {
    'cbs' : [['random']*4 + ['forward']*4, ['forward']*4 + ['random']*4],
    'subj_cb' : [0, 1]
}

internally, this would generate a nested listgroup, but would be less code on the user end to set it up. thoughts?
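Under the proposed dict format, the internal expansion could be as simple as indexing each subject's counterbalancing condition (expand_listgroup is a hypothetical helper, not an existing quail function):

```python
def expand_listgroup(spec):
    """Expand the proposed counterbalancing dict into a nested (per-subject)
    listgroup. spec['cbs'] maps a condition index to a per-list label
    sequence; spec['subj_cb'] gives each subject's condition index."""
    return [spec['cbs'][cb] for cb in spec['subj_cb']]
```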

name change?

turns out there is already a pyrec on pip... so we may want to think about another name to avoid confusion. @paxtonfitzpatrick ran into this recently, accidentally installing the wrong version of pyrec.

pyrecall has not been claimed...

weekly reports: simplifying syntax

there's a lot of repeated/copied code-- e.g. copying the same command for experiments 1 -- 6. instead, these should be written as loops. we can specify a list of labels to go with each experiment number, and then each command should loop through that list to create the plots.
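For example, the repeated commands could collapse into something like this (all names are illustrative placeholders):

```python
# Pair each experiment number with a label and loop,
# instead of copying the same command six times.
experiments = range(1, 7)
labels = ['Experiment {}'.format(n) for n in experiments]
commands = []
for n, label in zip(experiments, labels):
    # stand-in for the real plotting call, e.g. quail.plot(results[n], title=label)
    commands.append('plot(exp{}, title={!r})'.format(n, label))
```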

feature request: data loading functions

We should substantially fill out our data loading capabilities to support a range of data formats. We should also provide good documentation for writing parsers for arbitrary data formats.

These will be critical for convincing others to use our tools-- if they can't easily use them with their own datasets, they won't care about pyrec.

The best way to get a sense of which formats we'll want to support may be to talk to other labs...but at the very least we should discuss.

crack_egg positions argument not working

this is because most of the analyses require knowing about all the presented words, and the positions argument slices out a portion of the data... will have to think about a better way to implement this

account for list length when computing temporal clustering (maybe other fingerprint dims)?

List length can bias our measure of temporal clustering (pointed out to me by Karl Healey's talk at CEMS 2017). For instance, short lists will have higher cluster scores than long lists because when recalling short lists, the subject can only possibly recall nearby items.

To account for this, we could permute recall sequences and then measure where the 'real' clustering score falls with respect to this distribution.

Not clear to me whether this would also affect our other feature dimensions?
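The proposed permutation correction could be sketched like this (temporal_score is a toy stand-in for the real clustering measure; permutation_percentile is hypothetical):

```python
import random

def temporal_score(positions):
    # toy stand-in for a temporal clustering score: negative total absolute
    # lag between successive recalls (higher = more temporally clustered)
    return -sum(abs(b - a) for a, b in zip(positions, positions[1:]))

def permutation_percentile(positions, score_fn, n_perm=1000, seed=0):
    """Fraction of shuffled recall orders scoring below the observed score.
    Values near 1 indicate more clustering than expected by chance *for this
    list length*, which controls for the short-list bias described above."""
    rng = random.Random(seed)
    observed = score_fn(positions)
    null = []
    for _ in range(n_perm):
        shuffled = list(positions)
        rng.shuffle(shuffled)       # break any temporal structure
        null.append(score_fn(shuffled))
    return sum(s < observed for s in null) / n_perm
```

Because the null distribution is built from the subject's own recall sequence, short and long lists are each compared against their own chance baseline.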

feature request: auto-transcribe audio files

I could see this working in two ways:

  1. a function that takes an audio file and returns a transcript. this could be based on google speech, or our own speech model.

  2. a higher level wrapper function that takes a group of encoding lists and audio files, and creates a pyro object for further analysis

feature request: plot a subset of positions

For the accuracy analysis (maybe others?), it would be convenient to be able to group different positions. For example, if we wanted to plot one accuracy bar each for early, middle, and late positions, you'd currently have to run an spc analysis and then manually select the positions. We could have a posgroup argument that allows you to run a separate analysis for multiple groups of positions, e.g.

quail.analyze(egg, analysis='accuracy', listgroup=['average']*16, posgroup=['start']*3+['middle']*10+['end']*3)

feature request: memory models

it would be really cool to integrate some models into Quail at some point (e.g. second release)

for example:

  • rather than building a separately implemented mechanism into AdaptiveFR, we could call Quail functions to get the order of the next list
  • our current experiment uses clustering scores to order future lists
  • we could instead use arbitrary functions of prior data (e.g. models fit to prior data) to order those future lists

in addition, it'd be nice to have an easy way for people to fit some popular memory models to their data. (we could start with CMR or pTCM.) i'd love to also include marc howard's multi-timescale version of TCM, but that one is tricky to implement, so perhaps not a good place to start.

grid for subjects

add a plot_type option called 'grid' to the plot function, to map the plot over subjects

e.g.

quail.plot(analyzed_data, plot_type='grid')

set up travis CI tests

i think the current "build passing" badge on quail is actually linking to hypertools...
