
morty's Introduction

License: AGPL v3

morty

MOde Recognition and Tonic Ydentification Toolbox

Introduction

morty is a toolbox for mode recognition and tonic identification in audio performances of "modal" music cultures. The toolbox is based on well-studied pitch histogram analysis. It implements two state-of-the-art methods, applied to Ottoman-Turkish makam music (A. C. Gedik and B. Bozkurt, 2010) and Hindustani music (P. Chordia and S. Şentürk, 2013).

Please cite the publication below if you use the toolbox in your work:

Karakurt, A., Şentürk, S., & Serra, X. (2016). MORTY: A Toolbox for Mode Recognition and Tonic Identification. 3rd International Digital Libraries for Musicology Workshop, New York, USA.

The main purpose of the toolbox is to provide quick access to automatic tonic identification and mode recognition implementations for music cultures in which these tasks have not yet been addressed, and to provide a baseline against which novel methodologies can be compared. The implementations are therefore designed so that there is no "computational bias" towards a particular culture, while culture-specific optimizations can still be introduced easily within the implemented methodologies.

The pitch distributions and pitch-class distributions implemented in this package can additionally be used for other relevant tasks such as tuning analysis, intonation analysis and melodic progression analysis. Furthermore, the analysis can be applied in cross-cultural comparisons.

Description

The methodologies proposed in (A. C. Gedik and B. Bozkurt, 2010) and (P. Chordia and S. Şentürk, 2013) are based on the musical assumption that the tuning and the relative occurrence of the melodic intervals are similar across performances of the same mode.

Given the annotated tonics and makams for a set of training audio performances, both methods extract the predominant melody of each performance and then compute a model per mode based on pitch histograms (pitch distributions or pitch-class distributions) of the extracted predominant melodies. Note that the training performances can be entire recordings or excerpts.
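
As a rough sketch of how such a pitch-class histogram can be computed from a predominant melody (the bin width and reference frequency below are illustrative choices, not the toolbox's exact defaults):

import numpy as np

def pitch_class_distribution(pitch_hz, ref_freq=440.0, step_size=7.5):
    # Discard unvoiced/zero frames and convert Hz to cents relative to ref_freq
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    cents = 1200.0 * np.log2(pitch_hz[pitch_hz > 20.0] / ref_freq)

    # Fold into a single octave to obtain pitch *classes*
    pitch_classes = np.mod(cents, 1200.0)

    # Accumulate into a normalized histogram with step_size-cent bins
    edges = np.arange(0.0, 1200.0 + step_size, step_size)
    counts, _ = np.histogram(pitch_classes, bins=edges)
    return counts / float(counts.sum()), edges[:-1]

A pitch distribution (PD) would skip the octave folding and keep the full cent range instead.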

In our context, these models are used in three similar computational tasks:

  • Mode Recognition: Given an audio performance with known tonic, the pitch histogram computed from the performance is compared with the model produced for each mode. The mode whose model is most similar is reported as the estimated mode.
  • Tonic Identification: Given an audio performance with known mode, the pitch histogram computed from the performance is shifted and compared with the model of the mode. The shift that produces the highest similarity indicates the estimated tonic.
  • Joint Estimation: Given an audio performance with unknown tonic and mode, the pitch histogram computed from the performance is shifted and compared with the model of each mode. The best-matching shift and the mode of the matching model jointly yield the estimated tonic and mode.

For an in-depth explanation of the concept and the methodologies, please refer to the papers.
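
As a rough illustration of the shift-and-compare idea behind tonic identification above (a sketch only, not the toolbox's exact implementation), a pitch-class distribution can be rotated bin by bin against a mode model and the best-matching shift kept:

import numpy as np

def best_tonic_shift(test_pcd, mode_model_pcd):
    # City-block (Manhattan) distance between every rotation of the test
    # distribution and the mode model; other distances could be swapped in
    distances = [np.sum(np.abs(np.roll(test_pcd, -shift) - mode_model_pcd))
                 for shift in range(len(test_pcd))]
    return int(np.argmin(distances))

# The winning shift maps back to a tonic candidate, e.g. in cents:
# tonic_cents = best_tonic_shift(test_pcd, model) * step_size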

Usage

The algorithms expect the predominant melody of the audio performances as input and generate pitch distributions (PD) or pitch-class distributions (PCD) from it. These distributions are used as the features for both training and estimation.

The algorithms can be used to estimate both the tonic and the mode of a piece. If either of the two is already known, this information can be fed into the algorithm, making the estimation more accurate.

Since the training step is a supervised machine learning process, a dataset for each mode, including audio with annotated tonic frequencies, is a prerequisite. The basic steps of the methodologies are:

  • Train a model for each candidate mode using the collection of predominant melodies extracted from the annotated audio of that mode.
  • Feed the predominant melody of the testing audio, together with any known attribute (tonic or mode), and obtain the estimation(s).

Please refer to the Jupyter notebooks in the demos folder for the basic usage.
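
The notebooks are the authoritative reference for the API; the snippet below is only a schematic sketch of the train/estimate flow described above, and the module paths and method names in it are assumptions for illustration:

from morty.classifiers.knnclassifier import KNNClassifier  # module path assumed

# Training: a (predominant melody, tonic, mode) triple per annotated recording
classifier = KNNClassifier()                    # constructor arguments omitted
classifier.train(pitch_tracks, tonics, modes)   # method name assumed

# Estimation with whichever attribute is already known:
#   classifier.estimate_tonic(test_pitch_track, mode=known_mode)
#   classifier.estimate_mode(test_pitch_track, tonic=known_tonic)
#   classifier.estimate_joint(test_pitch_track)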

If the predominant melodies are not available, the melodyextraction.py module in the extras package can be used for automatic predominant melody extraction. This module is a wrapper around the predominant melody extraction methodology proposed by Atlı et al. (2014), which stores the pitch track in the desired format. The input pitch track is expected to be given as a .txt file consisting of a single column of pitch values in Hertz; timestamps are not required. Note that the default parameters for predominant melody extraction are optimized for Ottoman-Turkish makam music, so you might want to calibrate them according to the necessities of the studied music culture.
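
For reference, a pitch track in the single-column format described above can be loaded with NumPy before being passed on (the file name is just a placeholder):

import numpy as np

# One predominant-melody value in Hz per line; no timestamps needed
pitch_track = np.loadtxt('some_recording_pitch.txt')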

Installation

To install the package, it is recommended to install it and its dependencies into a virtualenv. In the terminal, do the following:

virtualenv env
source env/bin/activate
python setup.py install

If you want to be able to edit files and have the changes reflected, install the package in editable mode instead:

pip install -e .

The algorithms use several modules from Essentia. Follow the instructions to install the library.

For the functionalities in the extras package, you can install the optional dependencies with:

pip install -r optional_requirements

Explanation of Classes

  • PitchDistribution is the class that holds a pitch distribution. It also includes save and load functions so that pitch distributions can be stored and reused later.

  • KNNClassifier is the class that implements and generalizes the methods proposed in (A. C. Gedik and B. Bozkurt, 2010), (B. Bozkurt, 2008) and (P. Chordia and S. Şentürk, 2013).

References

A. C. Gedik and B. Bozkurt (2010). Pitch frequency histogram based music information retrieval for Turkish music. Signal Processing, vol. 10.

B. Bozkurt (2008). An automatic pitch analysis method for Turkish maqam music. Journal of New Music Research, 37(1), 1–13.

P. Chordia and S. Şentürk (2013). Joint recognition of raag and tonic in North Indian music. Computer Music Journal, 37(3), 82–98.

H. S. Atlı, B. Uyar, S. Şentürk, B. Bozkurt, and X. Serra (2014). Audio feature extraction for exploring Turkish makam music. In Proceedings of the 3rd International Conference on Audio Technologies for Music and Media, Ankara, Turkey.

morty's People

Contributors

sertansenturk, altugkarakurt


morty's Issues

Migrating Peak Detection

The only feature of Essentia that we are using is peak detection. I found a peak detection code translated from MATLAB to Python that only depends on NumPy. So, if we migrate to this implementation, we can get rid of our dependency on Essentia and upgrade from Python 2.7 to Python 3.

This snippet works for both 2.7 and 3, so we can open a new branch, replace Essentia with this code and run a few of the reported experiments to see if it affects our accuracy. If everything is fine, we can migrate the master branch and upgrade the code to Python 3.

Here is the source code: https://gist.github.com/sixtenbe/1178136
Here is a review of this code snippet: https://blog.ytotech.com/2015/11/01/findpeaks-in-python/
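
For comparison, a minimal NumPy-only local-maximum detector (a sketch of the idea, not the linked gist) could look like this:

import numpy as np

def find_peaks(vals, min_ratio=0.05):
    """Return indices of local maxima above min_ratio * max(vals)."""
    vals = np.asarray(vals, dtype=float)
    # A bin is a peak if it is greater than its left neighbour and at least
    # as great as its right neighbour
    is_peak = (vals[1:-1] > vals[:-2]) & (vals[1:-1] >= vals[2:])
    peaks = np.where(is_peak)[0] + 1
    return peaks[vals[peaks] >= min_ratio * vals.max()]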

Experiment Parameters

Varied Parameters:

  • cent_ss: 7.5, 15, 25, 50, 100
  • smooth_factor: 7.5, 10, 15, 20, 50
  • metric: pcd / pd
  • chunk_size: 30, 60, 90, 120
  • method: bozkurt / chordia
  • distance_method: manhattan, euclidean, L3, bhattacharyya, cross-correlation, intersection
  • what to estimate: makam / tonic / joint
  • K of KNN: 1, ..., 10 (?)
  • chunk overlap (Chordia only): 25%, 50%, 75%

Fixed Parameters:

  • rank: 10 for joint and mode estimation, 1 for tonic
  • ref_freq: 440 Hz
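
If the runs are scripted, the grid above can be enumerated with itertools.product; the dictionary below simply mirrors the varied parameters, and the run_experiment call is a placeholder:

from itertools import product

varied = {
    'cent_ss': [7.5, 15, 25, 50, 100],
    'smooth_factor': [7.5, 10, 15, 20, 50],
    'metric': ['pcd', 'pd'],
    'chunk_size': [30, 60, 90, 120],
    'method': ['bozkurt', 'chordia'],
    'distance_method': ['manhattan', 'euclidean', 'L3', 'bhattacharyya',
                        'cross-correlation', 'intersection'],
    'estimate': ['makam', 'tonic', 'joint'],
    'knn_k': list(range(1, 11)),
}
# chunk overlap (25%, 50%, 75%) would be added only for the chordia runs

for combo in product(*varied.values()):
    params = dict(zip(varied.keys(), combo))
    # run_experiment(params)  # hypothetical entry point for a single trial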

Recording metadata

In the dataset to be published we need to supply the metadata of the recordings too. Each recording entry will have:

  • recording name (slugified; code is in pycompmusic)
  • recording mbid
  • list of performers (performer name, performer mbid, vocal/performed instrument)
  • composition (work name, work mbid, composer name, composer mbid, lyricist name, lyricist mbid)
  • makam, form, usul
  • era of recording (early 20th century, mid 20th century, 90s till now)
  • tonic frequency

We have to manually confirm the metadata to some extent. We will also gather statistics from this metadata.
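
For illustration only, one entry could be shaped roughly like the dict below (field names are tentative and every value is a placeholder):

recording_entry = {
    'name': '<slugified-recording-name>',
    'mbid': '<recording-mbid>',
    'performers': [{'name': '<performer-name>',
                    'mbid': '<performer-mbid>',
                    'instrument': '<vocal-or-performed-instrument>'}],
    'composition': {'work_name': '<work-name>',
                    'work_mbid': '<work-mbid>',
                    'composer_name': '<composer-name>',
                    'composer_mbid': '<composer-mbid>',
                    'lyricist_name': '<lyricist-name>',
                    'lyricist_mbid': '<lyricist-mbid>'},
    'makam': '<makam>',
    'form': '<form>',
    'usul': '<usul>',
    'era': '<early 20th century / mid 20th century / 90s till now>',
    'tonic_frequency': None,  # Hz, from the tonic annotation
}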

Buggy Tonic Error Histograms in Evaluator

The tonic histogram in Evaluator.py is buggy. When reporting tonic errors, we calculate the difference between the annotated tonic and the estimated one and report this as the tonic error for each recording. We then decided to aggregate these errors into a histogram (of either [-600, 600) or [0, 1200) cents, I can't remember), and that histogram is faulty: there was a mismatch between the number of expected bins and the values returned by numpy.histogram(), and so on.
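
The usual pitfall is that numpy.histogram() returns one more bin edge than bin counts; a sketch with explicit edges (assuming the errors are already wrapped to [-600, 600) cents and an illustrative bin width) would be:

import numpy as np

def tonic_error_histogram(errors_cent, bin_width=20.0):
    edges = np.arange(-600.0, 600.0 + bin_width, bin_width)
    counts, edges = np.histogram(errors_cent, bins=edges)
    # len(edges) == len(counts) + 1, so report/plot against the bin centers
    centers = (edges[:-1] + edges[1:]) / 2.0
    return counts, centers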

SenkurtEstimation

We need to implement our estimation method, which is basically chunked ChordiaEstimation plus KNN. Check older versions of ChordiaEstimation for reference.

Things to be Reported

  • Distribution of tonic estimation errors, with the mean and standard deviation. This will be done for each distance method, along with the makam estimation precisions; see (Gedik & Bozkurt). These results are for comparing the distance methods.
  • Makam confusion matrix for the chosen optimal parameters.
  • Comparison of the system's performance for individual estimations vs. joint estimation.
  • Comparison of the performances for different segments of the input for both Chordia and Bozkurt (in Bozkurt this means training and testing on some time interval of the recording, e.g. 0–120 seconds).

TODO

  • A sample toy dataset will be prepared from Dunya API
  • kNN implementation for parametrized k
  • Run of a single trial
  • The smoothing parameter in KDE will be studied
  • The problem with pitch track extraction will be examined
  • Scripts to be written: test, training, 10-fold, evaluation

Allow resampling in Chordia estimation

When the training data for one mode is much bigger than for the others, that mode might dominate the others. Add a parameter to Chordia estimation to allow random resampling such that all the classes are represented by n samples, where n is the number of samples in the least represented mode.
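
A possible sketch of such balanced resampling (NumPy-based, all names illustrative):

import numpy as np

def balance_modes(samples, modes, seed=None):
    """Randomly keep n samples per mode, n = size of the rarest mode."""
    rng = np.random.RandomState(seed)
    modes = np.asarray(modes)
    n = min(np.sum(modes == m) for m in np.unique(modes))
    kept = []
    for m in np.unique(modes):
        idx = np.where(modes == m)[0]
        kept.extend(rng.choice(idx, size=n, replace=False))
    return [samples[i] for i in kept], modes[kept].tolist()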

Separate pitch distribution and pitch class distribution

At some point in the future, we should create an abstract parent class for pitch distribution and pitch class distribution, and move the bulk of the definitions/methods there. PD and PCD should inherit from the parent class and only implement the type-specific methods, e.g. shifting.

This is part of a more extensive task to catalog, describe and implement the makam features. However, the distribution implementations should stay in morty (as the feature is not makam specific).
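
The intended hierarchy could look roughly like the sketch below (names and shift semantics are only illustrative, not the planned API):

class AbstractDistribution(object):
    """Would hold the shared storage, save/load and smoothing logic."""

    def __init__(self, bins, vals):
        self.bins = bins
        self.vals = vals

    def shift(self, shift_idx):
        # Only the shift behaviour differs between the two feature types
        raise NotImplementedError


class PitchClassDistribution(AbstractDistribution):
    def shift(self, shift_idx):
        # rotate the values within the octave
        self.vals = list(self.vals[shift_idx:]) + list(self.vals[:shift_idx])
        return self


class PitchDistribution(AbstractDistribution):
    def shift(self, shift_idx):
        # translate the bins along the full pitch range instead of rotating
        step = self.bins[1] - self.bins[0]
        self.bins = [b - shift_idx * step for b in self.bins]
        return self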

Rewrite the description in the README

The description part is obscure, as it doesn't properly explain the task, the methods and the purpose. We have to rewrite it. The best option is to write the SMC paper and reuse the material from there.

TODO

  • We need to verify that tonics always appear on peaks with the parameters:
    smooth_factor = 15
    step_size = 15
  • While reporting the mbid of the decisive nearest neighbors, we should include the timing information of the corresponding chunk. Currently, there is only mbid.
  • There is a problem with the tonic histograms of Bozkurt tests.
  • Should we also include computation time in the Bozkurt tests?

Add configuration files for different cultures

Sometime in the future, we should create a file called morty/config/default_params.cfg and have the default parameters for the different cultures read from there.

  • Makam
  • Hindustani
  • Carnatic

Note that the cfg file has a special format, which the configparser module takes care of parsing. We should also find an easy way to call these default parameters.
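
A possible shape for such a file and for reading it (the section and option names are only a guess, and the values below are illustrative):

# morty/config/default_params.cfg (sketched contents)
#
# [makam]
# step_size = 7.5
# smooth_factor = 7.5
#
# [hindustani]
# step_size = 7.5
# smooth_factor = 7.5
#
# [carnatic]
# step_size = 7.5
# smooth_factor = 7.5

import configparser  # named ConfigParser on Python 2

config = configparser.ConfigParser()
config.read('morty/config/default_params.cfg')
makam_defaults = dict(config['makam'])  # note: option values are read as strings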

Refactor ChordiaEstimation

  • Refactor I/O to make it simpler (example: BozkurtEstimation)
  • Rewrite the so-called "spaghetti" parts of the estimation function

limit PeakDetection

Right now, the Essentia module PeakDetection returns every tiny peak it encounters. We should limit this, as it makes the runtime significantly slower and possibly degrades the performance, especially for PDs.

The module offers us two parameters:

  • maxPeaks (integer ∈ [1, ∞), default = 100) : the maximum number of returned peaks
  • threshold (real ∈ (-∞, ∞), default = -1e+06) : peaks below this given threshold are not output

Setting maxPeaks below 10–20 would be plausible. It might also be good to copy the values and max-normalize them before applying peak detection, so that we can fix the threshold as a ratio of the maximum value (provided that PeakDetection does not do any re-normalization).
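
The normalization step could be as simple as the sketch below, after which threshold can be interpreted as a fraction of the highest bin (whether PeakDetection re-normalizes internally still needs to be checked):

import numpy as np

def max_normalize(vals):
    """Scale the distribution so that its highest bin equals 1."""
    vals = np.asarray(vals, dtype=float)
    return vals / vals.max()

# e.g. PeakDetection(maxPeaks=15, threshold=0.05)(max_normalize(vals))
# (the Essentia call above is sketched from the parameter names, not verified)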

Create installer

Once the refactor is done, create an installer so that people can install the package.

ImportError: No module named fileoperations.fileoperations

Hi,

I'm trying to test the package. After the install, I get this error:

>>> import morty.extras.foldgenerator
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "morty/extras/foldgenerator.py", line 3, in <module>
    from fileoperations.fileoperations import get_filenames_in_dir
ImportError: No module named fileoperations.fileoperations
>>>
