
pumpp's Introduction

pumpp


practically universal music pre-processor

pumpp up the jams

The goal of this package is to make it easy to convert pairs of (audio, jams) into data that can be easily consumed by statistical algorithms. Some desired features:

  • Converting tags to sparse encoding vectors
  • Sampling (start, end, label) to frame-level annotations at a specific frame rate
  • Extracting input features (eg, Mel spectra or CQT) from audio
  • Converting between annotation spaces for a given task

Example usage

>>> import jams
>>> import librosa
>>> import pumpp

>>> audio_f = '/path/to/audio/myfile.ogg'
>>> jams_f = '/path/to/annotations/myfile.jams'

>>> # Set up sampling and frame rate parameters
>>> sr, hop_length = 44100, 512

>>> # Create a feature extraction object
>>> p_cqt = pumpp.feature.CQT(name='cqt', sr=sr, hop_length=hop_length)

>>> # Create some annotation extractors
>>> p_beat = pumpp.task.BeatTransformer(sr=sr, hop_length=hop_length)
>>> p_chord = pumpp.task.SimpleChordTransformer(sr=sr, hop_length=hop_length)

>>> # Collect the operators in a pump
>>> pump = pumpp.Pump(p_cqt, p_beat, p_chord)

>>> # Apply the extractors to generate training data
>>> data = pump(audio_f=audio_f, jam=jams_f)

>>> # Or test data
>>> test_data = pump(audio_f='/my/test/audio.ogg')

>>> # Or in-memory
>>> y, sr = librosa.load(audio_f)
>>> test_data = pump(y=y, sr=sr)

pumpp's People

Contributors

beasteers, bmcfee, justinsalamon, tomxi, waldyrious


pumpp's Issues

Asking for samples that are the same duration as the data fails

If your data has n_frames and you ask for samples that are n_frames long:

pump.sampler(max_samples, n_frames, random_state=seed)

It will crash, since randint(0, 0) is not a valid call.
Beyond crashing, I think there's an off-by-one error: if the data is of size D and you ask for samples of N frames, then the range of valid start indices is [0, D-N] (including D-N), not [0, D-N).

Suggested fix: update line 139 in sample.py as follows:

yield self.rng.randint(0, duration - self.duration + 1)
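
A quick sanity check of the corrected bound (a standalone numpy example, not pumpp code):

```python
import numpy as np

rng = np.random.RandomState(0)
D, N = 100, 100  # data length equals requested sample length

# rng.randint(0, D - N) would raise ValueError, since low >= high.
# With the +1 correction, the single valid start index (0) is returned.
start = rng.randint(0, D - N + 1)
```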

Automatic tag vocabularies

Blocked until marl/jams#119 is merged and released

In tag transformers, when labels is None, the vocabulary can be retrieved from jams by querying the schema. Note that this only works for finite vocabularies, so tag_open will fail, but it's better than nothing.

meta-information for model building

Both feature extractors and task transformers will produce data of particular shapes and types. This information is necessary when constructing models, eg in tensorflow or theano.

For each output, there should be a corresponding info field that records its dtype and shape (with None for variable-length) in the transformer or extractor object.

This might require factoring out some of the name scoping logic to a mixin class that can be used in both BaseTaskTransformer and BaseFeatureExtractor.
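
As a sketch, the proposed info field could be as simple as one named-tuple record per output (the Tensor name mirrors the notation used in the operator-index example below; the cqt/mag shape is illustrative):

```python
from collections import namedtuple
import numpy as np

# Hypothetical sketch: record each output's shape and dtype,
# with None marking a variable-length axis.
Tensor = namedtuple('Tensor', ['shape', 'dtype'])

fields = {'cqt/mag': Tensor(shape=(None, 288), dtype=np.float32)}
```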

Option to add channel dimension for feature transformers, and theano/tf mode

The current implementation produces data of shape (1, n_frames, n_features). This is appropriate for 1d-convolution models, but does not work for 2d-convolution.

We should have an option to support 2d outputs that would produce data of shape (1, n_frames, n_features, 1).

Additionally, the above assumes tensorflow dimension ordering. A flag for theano ordering would also be useful to have:

  • tensorflow: (1, n_frames, n_features, 1)
  • theano: (1, 1, n_frames, n_features)
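
The two orderings differ only in where the channel axis is inserted; a minimal numpy sketch:

```python
import numpy as np

data = np.zeros((1, 100, 96))         # current output: (1, n_frames, n_features)

tf_data = data[..., np.newaxis]       # tensorflow: (1, n_frames, n_features, 1)
th_data = data[:, np.newaxis, :, :]   # theano:     (1, 1, n_frames, n_features)
```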

Inverse task transformations

Automatically going from jams to tensors is great and all, but at the end of the day, we should be able to invert the transformations as well.

This way, you can audio -> pumpp -> model -> pumpp -> jams.

The task transformers maintain enough state that this should be possible and easy to do.

Key-mode, structured key-mode

Similar to the chord transformers, there should be task transformers for key/mode estimation.

For consistency, we should also provide a structured key encoding that translates key and mode names into (root, pitch classes).
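
One possible encoding, as a hypothetical sketch (the note names, mode templates, and 'root:mode' string format here are assumptions for illustration, not pumpp's API):

```python
import numpy as np

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
MODES = {'major': [0, 2, 4, 5, 7, 9, 11],
         'minor': [0, 2, 3, 5, 7, 8, 10]}

def encode_key(key):
    """Map a 'root:mode' string to (root index, 12-d pitch-class vector)."""
    root_name, mode = key.split(':')
    root = NOTES.index(root_name)
    pcs = np.zeros(12, dtype=bool)
    for step in MODES[mode]:
        pcs[(root + step) % 12] = True
    return root, pcs
```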

Make our own label vectorizer

Description

pumpp currently depends on three classes from sklearn: LabelEncoder, LabelBinarizer, and MultiLabelBinarizer.

While there's an implicit sklearn dependency via librosa, using sklearn objects within pumpp presents a challenge for serialization. It would be best if we just reimplement these classes locally to minimize issues with cross-version serialization.
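
A local replacement could be quite small; as a hypothetical sketch of a LabelBinarizer stand-in:

```python
import numpy as np

class SimpleLabelBinarizer:
    """Minimal, serialization-friendly stand-in for sklearn's LabelBinarizer.
    Sketch only; a real replacement would mirror more of sklearn's API."""

    def fit(self, labels):
        self.classes_ = np.asarray(sorted(set(labels)))
        return self

    def transform(self, labels):
        # One binary indicator row per input label
        return np.asarray([(self.classes_ == y).astype(int) for y in labels])
```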

Make it easy to apply transformers to unlabeled data

Right now, you can fake this by making an empty JAMS object and calling transform, but we could sugar this up:

>>> data_unlabeled = pumpp.transform(p_cqt, p_tag, audio=audio_file)
>>> data_labeled = pumpp.transform(p_cqt, p_tag, audio=audio_file, jams=jams_file)

Allow sampling with duration smaller than target

Description

The Sampler class is designed to produce consistently-shaped subsamples of data for convenient batching. However, sometimes we don't need batching, and it would be nice to have a flag that allows samples of shape at most the target duration.

This should not be the default behavior, but it should be an option.

Sequential sampler

It'd be useful to have a sampler that generates patches in order, optionally with a stride.
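
A sketch of what such a sampler's index generation might look like (function name and signature are hypothetical):

```python
def sequential_starts(n_frames, duration, stride=None):
    """Yield patch start indices in order; stride defaults to non-overlapping."""
    stride = stride or duration
    for start in range(0, n_frames - duration + 1, stride):
        yield start
```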

Support in-memory input

Currently, the core transform function (and its object-wrapped version, Pump.transform) operates on on-disk files. Sometimes, though, you just want to process things in memory.

This may take some refactoring and/or API changes.

Feature modules

  • CQT
    • CQT magnitude
    • CQT magnitude + phase diff
    • Wrapped CQT
  • STFT
    • STFT magnitude
    • STFT magnitude + phase diff
  • Mel spectra
  • Rhythm
    • Tempogram
    • Mellin tempogram

Automatic confidence in task inversion

Description

The prediction inverters / jams converters could populate the confidence field of the jams annotations.

They don't currently, but it would be easy to do so.

Naming conventions for extractors and transformers

Currently, the name field of task transformers is inserted into the dictionary keys and separated with _ characters.

It might be better to adopt a unix/tensorflow-style convention where transformer names are optional, but separated by / for easier grouping.

Non-uniform time-sampling

Some applications call for non-uniform time sampling, eg, beat-synchronous features.

This is fairly involved to implement, since it requires both an estimator (beat tracker) and a post-processing of all dynamic observations, including both features and task outputs.

I think this can all be implemented with the following:

  • Introduce a new base class, Timing, which processes audio like a FeatureExtractor, but returns an array of non-overlapping time intervals.
  • In the FeatureExtractor and TaskTransformer objects, implement the following methods:
    • time_to_frames(times) which wraps librosa's implementation with the object's frame rate
    • resample(data, times) which resamples all of the entries of data generated by that object according to the specified time intervals. The results are returned as a new dict with the resampled data. In general, different analyzers will have different methods of summarizing observations within an interval, which is why each object needs to implement its own resampler.
  • In the pump object, add a new method to add a timing object. During transform, the timing is applied to each output as it is generated. The resampled versions are scoped under {timing.name}/ORIGINAL_SCOPE. A flag in the pump object can be set to retain the original analyses, or discard them in favor of the resampled ones. The timing intervals are stored as {timing.name}/_intervals.
  • Inverse transforms get tricky. I think the easiest way to accomplish this is to add a times= or intervals= parameter to the task inverters, which can bypass the frame-rate conversions when generating the jams annotation times.
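
The time_to_frames helper described above could be sketched without the librosa dependency as follows (the parameter defaults are illustrative):

```python
import numpy as np

def time_to_frames(times, sr=44100, hop_length=512):
    # Mirrors librosa's time-to-frame conversion: seconds -> frame indices
    # at the object's frame rate (sr / hop_length).
    return np.floor(np.asarray(times) * sr / hop_length).astype(int)
```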

beat phase task

decode beat+downbeat -> intervals of beat phase (1/4, 2/4, 3/4, 4/4)
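
Assuming exact timestamp matches between the beat and downbeat lists, a hypothetical decoding sketch:

```python
def beat_phase(beats, downbeats):
    """Assign each beat its 1-based position within the measure."""
    downbeats = set(downbeats)
    phases, count = [], 0
    for b in beats:
        count = 1 if b in downbeats else count + 1
        phases.append(count)
    return phases
```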

Handle partial annotations

JAMS allows annotations to be valid over only a specified interval. We should support that in the mask calculation.

Segment agreement task

This is relatively straightforward: generate an n×n segment label agreement matrix for each (flat) segment annotation.
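
For example, with plain numpy broadcasting:

```python
import numpy as np

labels = np.array(['A', 'A', 'B', 'A'])  # one flat segment annotation

# n x n agreement matrix: 1 where two segments share a label
agreement = (labels[:, None] == labels[None, :]).astype(int)
```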

Custom exceptions

VectorTransformer can raise a bare RuntimeError. This should be properly encapsulated in a custom exception class.

keras layer constructors

It would be nice to have an interface for features transformers to produce keras layers.

Currently, you end up doing some gross boilerplate like:

x = Input(shape=(PATCH_SIZE, p_cqt.fields['cqt/mag'].shape[1], 1),
          name='cqt/mag',
          dtype=p_cqt.fields['cqt/mag'].dtype)

Likewise for task transformers, though the logic is a bit trickier:

y = Convolution1D(p_chord.fields['chord/chord'].shape[1], 1, activation='softmax')(rs)

Feature transformer: time-series

Description

We should have a feature transformer for simple time-series.

This mainly matters for sample rate consistency and stereo->mono conversion, but it would be easy to do.
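
The stereo-to-mono step could be as simple as the following sketch (the channel-first layout is an assumption here):

```python
import numpy as np

def to_mono(y):
    """Average channels if the signal is multichannel, shape (n_channels, n_samples)."""
    y = np.asarray(y, dtype=float)
    return y.mean(axis=0) if y.ndim > 1 else y
```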

connect Pump to Sampler

Instead of having to write Sampler(..., *P.ops), we should be able to just pull in a Pump object.

Allow soft inputs to inverse transforms

It would be nice to support soft inputs to inverse task transformers, such as the output of a trained model.

For tagging problems, usually model outputs have a likelihood for each tag that needs to be thresholded to make decisions.

If the tags are non-competing (multi-label), the threshold can just be 0.5.

If the tags are mutually exclusive, then the argmax should be taken.

This can be detected at runtime by whether the inputs are binary (bool type) or continuous (float type).
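
A hypothetical sketch of that runtime dispatch:

```python
import numpy as np

def decode_tags(outputs, multilabel):
    outputs = np.asarray(outputs)
    if outputs.dtype == bool:        # already hard decisions: pass through
        return outputs
    if multilabel:                   # non-competing tags: threshold at 0.5
        return outputs >= 0.5
    hard = np.zeros(outputs.shape, dtype=bool)
    hard[np.argmax(outputs)] = True  # mutually exclusive: take the argmax
    return hard
```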

Annotation filtering

We sometimes have multiple annotations for a given task within the same jams file. Currently, the task transformers simply access a random one matching the namespace filter.

Instead, we should allow the user to specify additional filtering criteria when selecting annotations.

Index errors for event annotations

I'm seeing some index errors at the very edge of some annotation transformations, where the last event occurs at the final frame:

IndexError: index 8223 is out of bounds for axis 0 with size 8223

Probably we need to pad out by a frame here.
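
One way to apply the suggested padding, as a sketch (the target shape is illustrative):

```python
import numpy as np

n_frames = 8223
target = np.zeros((n_frames, 12))

# Pad one extra frame so an event landing exactly at the final frame index
# stays in bounds.
target = np.pad(target, [(0, 1), (0, 0)])
```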

Fix alignment error in HCQT

HCQT sometimes results in a different number of frames for the different harmonic indices.

The HCQT module should slice down to the minimum duration across harmonics.

OR, more generally, we should have a global facility for determining the exact number of frames for all transformers given the sr/hop-length, and always fix the length accordingly.
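
The minimum-duration slicing could look like this (simulated data, illustrative shapes):

```python
import numpy as np

# Simulated harmonic channels with mismatched frame counts
harmonics = [np.zeros((100, 288)), np.zeros((99, 288)), np.zeros((100, 288))]

# Slice every harmonic down to the shortest one before stacking
n = min(h.shape[0] for h in harmonics)
hcqt = np.stack([h[:n] for h in harmonics])   # (n_harmonics, n_frames, n_bins)
```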

Pump operator index

When collecting features or tasks in a Pump object, it would be nice to have a handle on the operators by name. For example, we should be able to do something like:

>>> F = pumpp.feature.CQTMag(name='cqt', sr=22050, hop_length=512)
>>> P = pumpp.Pump(F)
>>> P['cqt'].fields
{'cqt/mag': Tensor(shape=(None, 288), dtype=<class 'numpy.float32'>)}

Since all operators have a name, this should be easy. We can also catch duplicated names and throw an exception.

Multi-annotation policy

When multiple annotations match the target namespace, we currently select one at random. This made sense previously, when transformation was done quasi-dynamically (eg, at run time), but makes less sense in the context of a static pre-processor.

Instead, we should allow all feasible annotations to be transformed, and collected as an array of dicts or a tensor of output values.

This would imply a one-to-many mapping for any given task, which would need to be supported explicitly in a downstream model. (Alternatively, the sampler module could implement a random selection policy, but this would happen downstream of conversion.)

Sampler module

Given a data dict, implement patch/frame sampling.

Interface:

  • Patch duration
    • None = full track
  • Multi-annotation policy
    • random sample for each patch (flatten the annotation index)
    • all annotations
  • Keys
    • if None, return all
    • else, filter to only sample '{key}/*' from the data dict, for each selected key
  • n_samples

Partial annotation policies?

  • Only return slices where all annotations are annotated?
    • does not scale / hard to make work with multiple annotations
  • Dynamic masking
    • BaseTaskTransformer should record the valid range (in frames) for the annotation, not the mask. Empty annotations have a valid range of [0,0).
    • Whenever a patch is sampled, compute the overlap of the sample index with the valid range, and use a threshold to determine mask.
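
The overlap-threshold test from the last bullet, as a hypothetical sketch:

```python
def patch_mask(start, duration, valid, threshold=0.5):
    """True if a sampled patch overlaps the valid frame range enough to be unmasked.
    `valid` is the annotation's (start, end) frame range; empty annotations use (0, 0)."""
    lo, hi = valid
    overlap = max(0, min(start + duration, hi) - max(start, lo))
    return overlap / duration >= threshold
```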
