
pumpp's Introduction

pumpp


practically universal music pre-processor

pumpp up the jams

The goal of this package is to make it easy to convert pairs of (audio, jams) into data that can be easily consumed by statistical algorithms. Some desired features:

  • Converting tags to sparse encoding vectors
  • Sampling (start, end, label) to frame-level annotations at a specific frame rate
  • Extracting input features (eg, Mel spectra or CQT) from audio
  • Converting between annotation spaces for a given task

Example usage

>>> import jams
>>> import librosa
>>> import pumpp

>>> audio_f = '/path/to/audio/myfile.ogg'
>>> jams_f = '/path/to/annotations/myfile.jams'

>>> # Set up sampling and frame rate parameters
>>> sr, hop_length = 44100, 512

>>> # Create a feature extraction object
>>> p_cqt = pumpp.feature.CQT(name='cqt', sr=sr, hop_length=hop_length)

>>> # Create some annotation extractors
>>> p_beat = pumpp.task.BeatTransformer(sr=sr, hop_length=hop_length)
>>> p_chord = pumpp.task.SimpleChordTransformer(sr=sr, hop_length=hop_length)

>>> # Collect the operators in a pump
>>> pump = pumpp.Pump(p_cqt, p_beat, p_chord)

>>> # Apply the extractors to generate training data
>>> data = pump(audio_f=audio_f, jam=jams_f)

>>> # Or test data
>>> test_data = pump(audio_f='/my/test/audio.ogg')

>>> # Or in-memory
>>> y, sr = librosa.load(audio_f)
>>> test_data = pump(y=y, sr=sr)

pumpp's People

Contributors

beasteers, bmcfee, justinsalamon, tomxi, waldyrious


pumpp's Issues

Asking for samples that are the same duration as the data fails

If your data has n_frames and you ask for samples that are n_frames long:

pump.sampler(max_samples, n_frames, random_state=seed)

It will crash, since randint(0, 0) is not a valid call.
Beyond crashing, I think there's an off-by-one error: if the data is of size D and you ask for samples of N frames, then the range of valid start indices is [0, D-N] (including D-N), not [0, D-N).

Suggested fix: update line 139 in sample.py as follows:

yield self.rng.randint(0, duration - self.duration + 1)
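
A quick sanity check of the corrected bound (a standalone numpy example, not pumpp code):

```python
import numpy as np

rng = np.random.RandomState(0)
D, N = 100, 100  # data length equals requested sample length

# rng.randint(0, D - N) would raise ValueError, since low >= high.
# With the +1 correction, the single valid start index (0) is returned.
start = rng.randint(0, D - N + 1)
```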

Automatic tag vocabularies

Blocked until marl/jams#119 is merged and released

In tag transformers, when labels is None, the vocabulary can be retrieved from jams by querying the schema. Note that this only works for finite vocabularies, so tag_open will fail, but it's better than nothing.

meta-information for model building

Both feature extractors and task transformers will produce data of particular shapes and types. This information is necessary when constructing models, eg in tensorflow or theano.

For each output, there should be a corresponding info field that records its dtype and shape (with None for variable-length) in the transformer or extractor object.

This might require factoring out some of the name scoping logic to a mixin class that can be used in both BaseTaskTransformer and BaseFeatureExtractor.
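
As a sketch, the proposed info field could be as simple as one named-tuple record per output (the Tensor name mirrors the notation used in the operator-index example below; the cqt/mag shape is illustrative):

```python
from collections import namedtuple
import numpy as np

# Hypothetical sketch: record each output's shape and dtype,
# with None marking a variable-length axis.
Tensor = namedtuple('Tensor', ['shape', 'dtype'])

fields = {'cqt/mag': Tensor(shape=(None, 288), dtype=np.float32)}
```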

Option to add channel dimension for feature transformers, and theano/tf mode

The current implementation produces data of shape (1, n_frames, n_features). This is appropriate for 1d-convolution models, but does not work for 2d-convolution.

We should have an option to support 2d outputs that would produce data of shape (1, n_frames, n_features, 1).

Additionally, the above assumes tensorflow dimension ordering. A flag for theano ordering would also be useful to have:

  • tensorflow: (1, n_frames, n_features, 1)
  • theano: (1, 1, n_frames, n_features)
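
The two orderings differ only in where the channel axis is inserted; a minimal numpy sketch:

```python
import numpy as np

data = np.zeros((1, 100, 96))         # current output: (1, n_frames, n_features)

tf_data = data[..., np.newaxis]       # tensorflow: (1, n_frames, n_features, 1)
th_data = data[:, np.newaxis, :, :]   # theano:     (1, 1, n_frames, n_features)
```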

Inverse task transformations

Automatically going from jams to tensors is great and all, but at the end of the day, we should be able to invert the transformations as well.

This way, you can audio -> pumpp -> model -> pumpp -> jams.

The task transformers maintain enough state that this should be possible and easy to do.

Key-mode, structured key-mode

Similar to the chord transformers, there should be task transformers for key/mode estimation.

For consistency, we should also provide a structured key encoding that translates key and mode names into (root, pitch classes).
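
One possible encoding, as a hypothetical sketch (the note names, mode templates, and 'root:mode' string format here are assumptions for illustration, not pumpp's API):

```python
import numpy as np

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
MODES = {'major': [0, 2, 4, 5, 7, 9, 11],
         'minor': [0, 2, 3, 5, 7, 8, 10]}

def encode_key(key):
    """Map a 'root:mode' string to (root index, 12-d pitch-class vector)."""
    root_name, mode = key.split(':')
    root = NOTES.index(root_name)
    pcs = np.zeros(12, dtype=bool)
    for step in MODES[mode]:
        pcs[(root + step) % 12] = True
    return root, pcs
```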

Make our own label vectorizer

Description

pumpp currently depends on three classes from sklearn: LabelEncoder, LabelBinarizer, and MultiLabelBinarizer.

While there's an implicit sklearn dependency via librosa, using sklearn objects within pumpp presents a challenge for serialization. It would be best if we just reimplement these classes locally to minimize issues with cross-version serialization.
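
A local replacement could be quite small; as a hypothetical sketch of a LabelBinarizer stand-in:

```python
import numpy as np

class SimpleLabelBinarizer:
    """Minimal, serialization-friendly stand-in for sklearn's LabelBinarizer.
    Sketch only; a real replacement would mirror more of sklearn's API."""

    def fit(self, labels):
        self.classes_ = np.asarray(sorted(set(labels)))
        return self

    def transform(self, labels):
        # One binary indicator row per input label
        return np.asarray([(self.classes_ == y).astype(int) for y in labels])
```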

Make it easy to apply transformers to unlabeled data

Right now, you can fake this by making an empty JAMS object and calling transform, but we could sugar this up:

>>> data_unlabeled = pumpp.transform(p_cqt, p_tag, audio=audio_file)
>>> data_labeled = pumpp.transform(p_cqt, p_tag, audio=audio_file, jams=jams_file)

Allow sampling with duration smaller than target

Description

The Sampler class is designed to produce consistently-shaped subsamples of data for convenient batching. However, sometimes we don't need batching, and it would be nice to have a flag that allows samples of shape at most the target duration.

This should not be the default behavior, but it should be an option.

Sequential sampler

It'd be useful to have a sampler that generates patches in order, optionally with a stride.
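
A sketch of what such a sampler's index generation might look like (function name and signature are hypothetical):

```python
def sequential_starts(n_frames, duration, stride=None):
    """Yield patch start indices in order; stride defaults to non-overlapping."""
    stride = stride or duration
    for start in range(0, n_frames - duration + 1, stride):
        yield start
```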

Support in-memory input

Currently, the core transform function (and its object-wrapped version, Pump.transform) operates on on-disk files. Sometimes, though, you just want to process things in memory.

This may take some refactoring and/or API changes.

Feature modules

  • CQT
    • CQT magnitude
    • CQT magnitude + phase diff
    • Wrapped CQT
  • STFT
    • STFT magnitude
    • STFT magnitude + phase diff
  • Mel spectra
  • Rhythm
    • Tempogram
    • Mellin tempogram

Automatic confidence in task inversion

Description

The prediction inverters / jams converters could populate the confidence field of the jams annotations.

They don't currently, but it would be easy to do so.

Naming conventions for extractors and transformers

Currently, the name field of task transformers is inserted into the dictionary keys and separated with _ characters.

It might be better to adopt a unix/tensorflow-style convention where transformer names are optional, but separated by / for easier grouping.

Non-uniform time-sampling

Some applications call for non-uniform time sampling, eg, beat-synchronous features.

This is fairly involved to implement, since it requires both an estimator (beat tracker) and a post-processing of all dynamic observations, including both features and task outputs.

I think this can all be implemented with the following:

  • Introduce a new base class, Timing, which processes audio like a FeatureExtractor, but returns an array of non-overlapping time intervals.
  • In the FeatureExtractor and TaskTransformer objects, implement the following methods:
    • time_to_frames(times) which wraps librosa's implementation with the object's frame rate
    • resample(data, times) which resamples all of the entries of data generated by that object according to the specified time intervals. The results are returned as a new dict with the resampled data. In general, different analyzers will have different methods of summarizing observations within an interval, which is why each object needs to implement its own resampler.
  • In the pump object, add a new method to add a timing object. During transform, the timing is applied to each output as it is generated. The resampled versions are scoped under {timing.name}/ORIGINAL_SCOPE. A flag in the pump object can be set to retain the original analyses, or discard them in favor of the resampled ones. The timing intervals are stored as {timing.name}/_intervals.
  • Inverse transforms get tricky. I think the easiest way to accomplish this is to add a times= or intervals= parameter to the task inverters, which can bypass the frame-rate conversions when generating the jams annotation times.
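
The time_to_frames helper described above could be sketched without the librosa dependency as follows (the parameter defaults are illustrative):

```python
import numpy as np

def time_to_frames(times, sr=44100, hop_length=512):
    # Mirrors librosa's time-to-frame conversion: seconds -> frame indices
    # at the object's frame rate (sr / hop_length).
    return np.floor(np.asarray(times) * sr / hop_length).astype(int)
```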

beat phase task

decode beat+downbeat -> intervals of beat phase (1/4, 2/4, 3/4, 4/4)
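
Assuming exact timestamp matches between the beat and downbeat lists, a hypothetical decoding sketch:

```python
def beat_phase(beats, downbeats):
    """Assign each beat its 1-based position within the measure."""
    downbeats = set(downbeats)
    phases, count = [], 0
    for b in beats:
        count = 1 if b in downbeats else count + 1
        phases.append(count)
    return phases
```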

Handle partial annotations

JAMS allows annotations to be valid over only a specified interval. We should support that in the mask calculation.

Segment agreement task

This is relatively straightforward: generate an n×n segment label agreement matrix for each (flat) segment annotation.
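
For example, with plain numpy broadcasting:

```python
import numpy as np

labels = np.array(['A', 'A', 'B', 'A'])  # one flat segment annotation

# n x n agreement matrix: 1 where two segments share a label
agreement = (labels[:, None] == labels[None, :]).astype(int)
```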

Custom exceptions

VectorTransformer can raise a bare RuntimeError. This should be properly encapsulated in a custom exception class.

keras layer constructors

It would be nice to have an interface for features transformers to produce keras layers.

Currently, you end up doing some gross boilerplate like:

x = Input(shape=(PATCH_SIZE, p_cqt.fields['cqt/mag'].shape[1], 1),
          name='cqt/mag',
          dtype=p_cqt.fields['cqt/mag'].dtype)

Likewise for task transformers, though the logic is a bit trickier:

y = Convolution1D(p_chord.fields['chord/chord'].shape[1], 1, activation='softmax')(rs)

Feature transformer: time-series

Description

We should have a feature transformer for simple time-series.

This mainly matters for sample rate consistency and stereo->mono conversion, but it would be easy to do.
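
The stereo-to-mono step could be as simple as the following sketch (the channel-first layout is an assumption here):

```python
import numpy as np

def to_mono(y):
    """Average channels if the signal is multichannel, shape (n_channels, n_samples)."""
    y = np.asarray(y, dtype=float)
    return y.mean(axis=0) if y.ndim > 1 else y
```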

connect Pump to Sampler

Instead of having to write Sampler(..., *P.ops), we should be able to just pull in a Pump object.

Allow soft inputs to inverse transforms

It would be nice to support soft inputs to inverse task transformers, such as the output of a trained model.

For tagging problems, usually model outputs have a likelihood for each tag that needs to be thresholded to make decisions.

If the tags are non-competing (multi-label), the threshold can just be 0.5.

If the tags are mutually exclusive, then the argmax should be taken.

This can be detected at runtime by whether the inputs are binary (bool type) or continuous (float type).
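
A hypothetical sketch of that runtime dispatch:

```python
import numpy as np

def decode_tags(outputs, multilabel):
    outputs = np.asarray(outputs)
    if outputs.dtype == bool:        # already hard decisions: pass through
        return outputs
    if multilabel:                   # non-competing tags: threshold at 0.5
        return outputs >= 0.5
    hard = np.zeros(outputs.shape, dtype=bool)
    hard[np.argmax(outputs)] = True  # mutually exclusive: take the argmax
    return hard
```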

Annotation filtering

We sometimes have multiple annotations for a given task within the same jams file. Currently, the task transformers simply access a random one matching the namespace filter.

Instead, we should allow the user to specify additional filtering criteria when selecting annotations.

Index errors for event annotations

I'm seeing some index errors at the very edge of some annotation transformations, where the last event occurs at the final frame:

IndexError: index 8223 is out of bounds for axis 0 with size 8223

Probably we need to pad out by a frame here.
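
One way to apply the suggested padding, as a sketch (the target shape is illustrative):

```python
import numpy as np

n_frames = 8223
target = np.zeros((n_frames, 12))

# Pad one extra frame so an event landing exactly at the final frame index
# stays in bounds.
target = np.pad(target, [(0, 1), (0, 0)])
```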

Fix alignment error in HCQT

HCQT sometimes results in a different number of frames for the different harmonic indices.

The HCQT module should slice down to the minimum duration across harmonics.

OR, more generally, we should have a global facility for determining the exact number of frames for all transformers given the sr/hop-length, and always fix the length accordingly.
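
The minimum-duration slicing could look like this (simulated data, illustrative shapes):

```python
import numpy as np

# Simulated harmonic channels with mismatched frame counts
harmonics = [np.zeros((100, 288)), np.zeros((99, 288)), np.zeros((100, 288))]

# Slice every harmonic down to the shortest one before stacking
n = min(h.shape[0] for h in harmonics)
hcqt = np.stack([h[:n] for h in harmonics])   # (n_harmonics, n_frames, n_bins)
```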

Pump operator index

When collecting features or tasks in a Pump object, it would be nice to have a handle on the operators by name. For example, we should be able to do something like:

>>> F = pumpp.feature.CQTMag(name='cqt', sr=22050, hop_length=512)
>>> P = pumpp.Pump(F)
>>> P['cqt'].fields
{'cqt/mag': Tensor(shape=(None, 288), dtype=<class 'numpy.float32'>)}

Since all operators have a name, this should be easy. We can also catch duplicated names and throw an exception.

Multi-annotation policy

When multiple annotations match the target namespace, we currently select one at random. This made sense previously, when transformation was done quasi-dynamically (eg, at run time), but makes less sense in the context of a static pre-processor.

Instead, we should allow all feasible annotations to be transformed, and collected as an array of dicts or a tensor of output values.

This would imply a one-to-many mapping for any given task, which would need to be supported explicitly in a downstream model. (Alternatively, the sampler module could implement a random selection policy, but this would happen downstream of conversion.)

Sampler module

Given a data dict, implement patch/frame sampling.

Interface:

  • Patch duration
    • None = full track
  • Multi-annotation policy
    • random sample for each patch (flatten the annotation index)
    • all annotations
  • Keys
    • if None, return all
    • else, filter to only sample '{key}/*' from the data dict, for each selected key
  • n_samples

Partial annotation policies?

  • Only return slices where all annotations are annotated?
    • does not scale / hard to make work with multiple annotations
  • Dynamic masking
    • BaseTaskTransformer should record the valid range (in frames) for the annotation, not the mask. Empty annotations have a valid range of [0,0).
    • Whenever a patch is sampled, compute the overlap of the sample index with the valid range, and use a threshold to determine mask.
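
The overlap-threshold test from the last bullet, as a hypothetical sketch:

```python
def patch_mask(start, duration, valid, threshold=0.5):
    """True if a sampled patch overlaps the valid frame range enough to be unmasked.
    `valid` is the annotation's (start, end) frame range; empty annotations use (0, 0)."""
    lo, hi = valid
    overlap = max(0, min(start + duration, hi) - max(start, lo))
    return overlap / duration >= threshold
```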
