E. Parada-Cabaleiro et al. (2017), The SEILS Dataset: Symbolically encoded scores in modern-early notation for computational Musicology, in Proc. of ISMIR, Suzhou, P.R. China, pp. 575-581.

E. Parada-Cabaleiro et al. (2018), Musical-Linguistic annotation of Il Lauro Secco, in Proc. of ISMIR, Paris, France, pp. 461-467.

E. Parada-Cabaleiro et al. (2019), A diplomatic edition of Il Lauro Secco: Ground truth for OMR of white mensural notation, in Proc. of ISMIR, Delft, The Netherlands, pp. 557-564.

E. Parada-Cabaleiro et al. (2021), Automatic recognition of texture in Renaissance music, in Proc. of ISMIR, Online event, pp. 509-516.

SEILS dataset

Symbolically Encoded Il Lauro Secco dataset

The SEILS dataset is a corpus of scores in LilyPond, MusicXML, MIDI, Finale, **kern, **mens, MEI, agnostic, semantic, and pdf formats, in white mensural and modern notation. The scores are transcribed from Il Lauro Secco, a 16th-century anthology of Italian madrigals first published in 1582 by Vittorio Baldini in Ferrara (Italy). The corpus contains 30 madrigals for five unaccompanied voices by a variety of composers.

The SEILS dataset is organised with two purposes in mind: analysis (see the SEILSdataset folder) and OMR applications (see the SEILS_diplomatic_OMRgroundTruth folder).

SEILSdataset encompasses 450 files in both early and modern notation: 270 are symbolic files encoded in different formats, and the remaining 180 are pdfs. Of the 270 symbolic files: 60 are encoded in LilyPond (.ly), 30 for each notation (modern and ancient); 30 in MusicXML (.xml); 30 in MIDI (.mid); 30 in Finale (.musx); 60 in **kern (.krn), 30 annotated and 30 without annotations; and 60 in MEI (.mei), 30 annotated and 30 without annotations. Of the 180 pdf files: 30 are the modern-notation transcriptions of the Finale-encoded madrigals, and the other 150 are scanned copies of the original source, published in 1582 (5 pdfs for each madrigal, one for each voice).

SEILS_diplomatic_OMRgroundTruth encompasses 960 files in the original notation only, i.e. white mensural notation: 660 are symbolic files encoded in different formats, and the remaining 300 are pdfs. Of the 660 symbolic files: 60 are choral scores (containing the 5 voices), 30 encoded in MEI (.mei) and 30 in **mens (.mns); the other 600 are particellas (containing one voice), 150 encoded in MEI (.mei), 150 in **mens (.mns), 150 in agnostic (.agnostic), and 150 in semantic (.semantic). Of the 300 pdf files: 150 are the images engraved from the diplomatic transcription in MEI of each particella (5 pdfs for each madrigal, one for each voice), and the other 150 are scanned copies of the original source, published in 1582 (5 pdfs for each madrigal, one for each voice).
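
The per-format counts listed for both folders can be cross-checked with a little arithmetic. The following is a minimal sketch; the dictionary keys and the function name are illustrative labels chosen here, not names used by the dataset.

```python
# Per-format file counts as listed in the two descriptions above.
# Keys are illustrative labels, not file or folder names from the dataset.
seils_dataset = {
    "symbolic": {
        "lilypond (.ly)": 60,
        "musicxml (.xml)": 30,
        "midi (.mid)": 30,
        "finale (.musx)": 30,
        "kern (.krn)": 60,
        "mei (.mei)": 60,
    },
    "pdf": {"modern transcriptions": 30, "scans of the 1582 print": 150},
}

omr_ground_truth = {
    "symbolic": {
        "choral mei": 30,
        "choral mens": 30,
        "particella mei": 150,
        "particella mens": 150,
        "particella agnostic": 150,
        "particella semantic": 150,
    },
    "pdf": {"engraved particellas": 150, "scans of the 1582 print": 150},
}


def totals(subset):
    """Sum the counts of each group and add an overall total."""
    per_group = {group: sum(counts.values()) for group, counts in subset.items()}
    per_group["all"] = sum(per_group.values())
    return per_group


print(totals(seils_dataset))     # symbolic, pdf and overall totals
print(totals(omr_ground_truth))
```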



DATA STRUCTURE AND NAMING CONVENTION IN SEILSdataset

The 450 files are stored in folders according to the composer's family name. Within each folder, there are 15 files:

    5 scans of the original paper print in early notation, one for each voice: canto, alto, quinto, tenor, and basso (.pdf)
    1 printable version of the Finale transcription in modern notation (.pdf)
    2 LilyPond encodings: early white mensural and modern notation (.ly)
    1 MIDI transcription (.mid)
    1 MusicXML (.xml)
    1 Finale Project (.musx)
    2 symbolic encodings in modern notation: **kern and MEI (.krn and .mei)
    2 symbolic encodings in modern notation with annotations: **kern and MEI (.krn and .mei)
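
To check that a composer folder is complete, the list above can be turned into expected counts per file extension (6 pdfs, 2 .ly, 1 .mid, 1 .xml, 1 .musx, 2 .krn, 2 .mei). This is a minimal sketch; the names EXPECTED_PER_FOLDER and extension_mismatches are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from pathlib import PurePath

# Expected files per extension in one composer folder, derived from the
# list above (6 pdfs = 5 scans + 1 Finale print). Hypothetical helper name.
EXPECTED_PER_FOLDER = {
    ".pdf": 6, ".ly": 2, ".mid": 1, ".xml": 1, ".musx": 1, ".krn": 2, ".mei": 2,
}


def extension_mismatches(file_names):
    """Return {extension: (found, expected)} for every count that is off."""
    found = Counter(PurePath(name).suffix.lower() for name in file_names)
    extensions = set(found) | set(EXPECTED_PER_FOLDER)
    return {
        ext: (found.get(ext, 0), EXPECTED_PER_FOLDER.get(ext, 0))
        for ext in sorted(extensions)
        if found.get(ext, 0) != EXPECTED_PER_FOLDER.get(ext, 0)
    }
```

The function takes plain file names, so it can be fed from any directory listing, for example `[p.name for p in pathlib.Path(folder).iterdir()]`, where `folder` is wherever a composer folder lives locally. An empty result means the folder holds the 15 expected files.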

The name of each individual file begins with the family name of the composer, followed by the first two or three words of the madrigal title, as well as by the type of notation (an – ancient notation, and mn – modern notation).
    e.g. composer_first3words_notationtype.format
    giovannelli_nelfoco_an.ly

For the pdf files, further information is given between the title and the type of notation. For the scanned copies in early notation, the voice is given (canto, alto, quinto, tenore, or basso); for the printable version of the files encoded in Finale, the word ‘finale’ is inserted.

    e.g. composer_first3words_voice_notationtype.format
    giovannelli_nelfoco_canto_an.pdf

    e.g. composer_first3words_finale_notationtype.format
    giovannelli_nelfoco_finale_mn.pdf
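
The naming convention above can be parsed with a regular expression. The following is a minimal sketch, assuming that neither the composer part nor the title part contains an underscore and that the voice labels are spelled as in the paragraph above; the names NAME_RE and parse_seils_name are hypothetical. Files that do not follow these three patterns return None.

```python
import re

# Voice labels as spelled in the naming description above.
VOICES = ("canto", "alto", "quinto", "tenore", "basso")

# composer_title[_voice|_finale]_notation.format
# Assumption: composer and title contain no underscore.
NAME_RE = re.compile(
    r"^(?P<composer>[^_]+)_(?P<title>[^_]+)"
    r"(?:_(?P<extra>canto|alto|quinto|tenore|basso|finale))?"
    r"_(?P<notation>an|mn)\.(?P<format>[A-Za-z0-9]+)$"
)


def parse_seils_name(file_name):
    """Split a SEILSdataset file name into its parts, or return None."""
    match = NAME_RE.match(file_name)
    if match is None:
        return None
    fields = match.groupdict()
    extra = fields.pop("extra")
    fields["voice"] = extra if extra in VOICES else None
    fields["finale_print"] = extra == "finale"
    return fields


for name in (
    "giovannelli_nelfoco_an.ly",
    "giovannelli_nelfoco_canto_an.pdf",
    "giovannelli_nelfoco_finale_mn.pdf",
):
    print(name, parse_seils_name(name))
```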



DATA STRUCTURE AND NAMING CONVENTION IN SEILS_diplomatic_OMRgroundTruth

The 960 files are stored in folders according to their type:

    choral: 60 choral transcriptions in MEI and **mens (.mei and .mns)
    converters: two Python converters to automatically convert from **mens format to agnostic and semantic formats (mens2agnostic.py and mens2semantic.py). The vocabularies and a readme file are also included (note that the converters are designed to deal with particellas without text, i.e. **mens files of only one spine)
    particellas_codified: 300 symbolically encoded particellas (white mensural notation) in MEI and **mens (.mei and .mns)
    particellas_engravedImages: 150 prints engraved from the 150 particellas encoded in MEI (.pdf)
    particellas_OMRgroundTruth: 300 symbolically encoded particellas in agnostic and semantic (.agnostic and .semantic)
    particellas_original: 150 scans of the original paper print in early notation, one for each voice: canto, alto, quinto, tenor, and basso (.pdf)
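
The folder list above also gives the expected number of files per folder and extension, which add up to the 960 files once the converters folder is left out. The sketch below compares a list of relative paths against those counts; EXPECTED and layout_mismatches are hypothetical names, and the function only assumes that the first path component is one of the folder names listed above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Expected files per folder and extension, taken from the list above.
# The converters folder is not part of the 960 files and is ignored.
EXPECTED = {
    "choral": {".mei": 30, ".mns": 30},
    "particellas_codified": {".mei": 150, ".mns": 150},
    "particellas_engravedImages": {".pdf": 150},
    "particellas_OMRgroundTruth": {".agnostic": 150, ".semantic": 150},
    "particellas_original": {".pdf": 150},
}


def layout_mismatches(relative_paths):
    """Return {(folder, extension): (found, expected)} where counts differ."""
    found = Counter()
    for rel in relative_paths:
        path = PurePosixPath(rel)
        if len(path.parts) >= 2 and path.parts[0] in EXPECTED:
            found[(path.parts[0], path.suffix.lower())] += 1
    expected = {
        (folder, ext): count
        for folder, extensions in EXPECTED.items()
        for ext, count in extensions.items()
    }
    return {
        key: (found.get(key, 0), expected.get(key, 0))
        for key in sorted(set(found) | set(expected))
        if found.get(key, 0) != expected.get(key, 0)
    }
```

With a local copy, the input could be built as `[p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]`, where `root` is a `pathlib.Path` pointing at the SEILS_diplomatic_OMRgroundTruth folder.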

The name of each individual file begins with the family name of the composer, followed by the first two or three words of the madrigal title, as well as by the type of score (choral – scores with five voices together; A, B, C, Q, or T – for Alto, Basso, Canto, Quinto, or Tenor in the particellas).
    e.g. composer_first3words_scoretype.format
    giovannelli_nelfoco_choral.mei
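
The score type at the end of these names can be mapped back to a voice. This is a minimal sketch, again assuming that the composer and title parts contain no underscore; parse_omr_name is a hypothetical helper, and the particella file name in the usage lines is made up to follow the pattern above.

```python
import re

# Letter codes for the particellas, as given in the description above.
VOICE_BY_LETTER = {"A": "Alto", "B": "Basso", "C": "Canto", "Q": "Quinto", "T": "Tenor"}

# composer_title_scoretype.format (assumption: no underscore in composer or title)
OMR_NAME_RE = re.compile(
    r"^(?P<composer>[^_]+)_(?P<title>[^_]+)"
    r"_(?P<score_type>choral|[ABCQT])\.(?P<format>\w+)$"
)


def parse_omr_name(file_name):
    """Split an OMR ground-truth file name into its parts, or return None."""
    match = OMR_NAME_RE.match(file_name)
    if match is None:
        return None
    fields = match.groupdict()
    # Choral scores hold all five voices, so they get no single voice.
    fields["voice"] = VOICE_BY_LETTER.get(fields["score_type"])
    return fields


print(parse_omr_name("giovannelli_nelfoco_choral.mei"))
print(parse_omr_name("giovannelli_nelfoco_Q.agnostic"))  # made-up example name
```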



COMPOSERS

    Innocentio Alberti      Ruggiero Giovannelli       Nicolò Perué
    Giovanni Bardi          Marc’Antonio Ingegneri     Francesco Pigna
    Girolamo Belli          Paolo Isnardi              Costanzo Porta
    Lelio Bertani           Luzzasco Luzzaschi         Bartolomeo Spontone
    Claudio da Correggio    Jean de Macque             Annibale Stabile
    Alberto da l’Occa       Francesco Manara           Alessandro Strigio
    Giulio Eremita          Luca Marenzio              Horatio Vecchi
    Hippolito Fiorino       Tiburrio Massino           Paolo Virchi
    Vincenzo Fronti         Alessandro Mileville       Giaches de Wert
    Andrea Gabrielli        Giovanni Battista Mosto    Annibale Zoilo
