georgid / alignmentduration

Lyrics-to-audio alignment system. Based on machine learning algorithms: hidden Markov models with Viterbi forced alignment. The alignment is explicitly aware of the durations of musical notes. The phonetic models are classified with an MLP deep neural network.

Home Page: http://mtg.upf.edu/node/3751

License: GNU Affero General Public License v3.0

Python 95.29% Shell 1.04% MATLAB 0.50% C 1.89% TeX 1.28%
python htk lyrics duration decoding deep-learning hidden-markov-model alignment synchronization mfcc

alignmentduration's Issues

when WITH_SHORT_PAUSES = 1

We get the error: last state for word SAZ is not sp. Sorry - not implemented.

The problem is that I removed sp from SAZ, so that it is not sil sp but sil.

Doesn't install if cython not available

If cython is not available, this package fails setup.py install with this error:

    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/mnt/compmusic/itri/jenkins/jobs/Dunya/workspace/env/src/alignment-duration/setup.py", line 62, in <module>
        cmdclass = {'build_ext': build_ext},
    NameError: name 'build_ext' is not defined

This is preventing dunya from building and running tests, so we have removed it as a dependency for now.
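One common way to avoid this kind of NameError is to resolve build_ext defensively before it is referenced. The sketch below is not the project's setup.py: the helper names optional_attr, pick_build_ext and make_cmdclass are hypothetical, and falling back to the setuptools build_ext is an assumption that only helps if pre-generated C sources are shipped with the package.

```python
import importlib


def optional_attr(module_name, attr):
    """Return module_name.attr, or None if the module or attribute is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def pick_build_ext():
    """Prefer Cython's build_ext, then the plain setuptools one (assumed fallback)."""
    for module_name in ("Cython.Distutils", "setuptools.command.build_ext"):
        build_ext = optional_attr(module_name, "build_ext")
        if build_ext is not None:
            return build_ext
    return None


def make_cmdclass():
    """Build the cmdclass dict without ever touching an undefined name."""
    build_ext = pick_build_ext()
    return {"build_ext": build_ext} if build_ext is not None else {}


# In a setup.py this would be passed as setup(..., cmdclass=make_cmdclass()).
cmdclass = make_cmdclass()
```

With this shape, a missing Cython leads to an empty or fallback cmdclass instead of a crash during egg_info.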

reduce code of constructTransMAtrix

In the function constructTransMAtrix of Decoder, under # SPECIAL CASE: two last states:
the else branch is probably not needed, because defineForwardTransProbs() covers it.

Related: think about how to reduce the code so that # MAIN CASE and # SPECIAL CASE are merged and defineForwardTransProbs() is called only once.
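As an illustration of the idea only, the sketch below builds a simple left-to-right transition matrix with a single loop and a single call site for the helper. It is not the project's Decoder code: forward_trans_probs and construct_trans_matrix are hypothetical names, the per-state wait probabilities are an assumed input, and the boundary handling here covers only the very last state, not the "two last states" case from the issue.

```python
def forward_trans_probs(wait_prob, is_last):
    """Return {offset: probability} for one state.

    The boundary condition lives here, so the caller needs no special case.
    """
    if is_last:
        return {0: 1.0}  # the final state can only loop on itself
    return {0: wait_prob, 1: 1.0 - wait_prob}


def construct_trans_matrix(wait_probs):
    """Build an n x n left-to-right transition matrix as nested lists."""
    n = len(wait_probs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, wait_prob in enumerate(wait_probs):
        # Single call site: main case and special case are merged.
        for offset, prob in forward_trans_probs(wait_prob, i == n - 1).items():
            matrix[i][i + offset] = prob
    return matrix


trans = construct_trans_matrix([0.5, 0.25, 0.9])
```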

make consistent. phoneme2states

For getRefDurations this is done with expandlyrics2WordList;
for the decoded result it is done with other code.
Unify these two.

Solve together with the HARD CODED bug.

means, covars, weights

means, covars and weights should not be in the constructor of _ContinuousHMM, because they are observation probabilities.

Merge the two Phonetizer() - delete the one from AlignmentStep

In line 117 of AlignmentStep.Aligner() there is a different number of arguments. The problem is triggered by outputHTKPhoneAlignedURI = Aligner.alignOnechunk(MODEL_URI, URIrecordingWav, lyrics, URIrecordingAnno, '/tmp/', withSynthesis)
in AlignmentDuration.alignOneChunk()

hmm.Path.Path.

hmm.Path.Path.__init__ can take only hmm as input and read psi and phi from it.
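A minimal sketch of what such a constructor could look like, assuming that hmm.phi[t][s] holds the best score of any path ending in state s at time t and hmm.psi[t][s] holds the best predecessor of state s at time t. These conventions, the states attribute and the toy hmm object are assumptions for illustration, not the project's actual data layout.

```python
from types import SimpleNamespace


class Path:
    """Holds only the backtracking logic; everything else comes from the hmm."""

    def __init__(self, hmm):
        self.phi = hmm.phi  # assumed: phi[t][s] = best score ending in s at t
        self.psi = hmm.psi  # assumed: psi[t][s] = best predecessor of s at t
        self.states = self._backtrack()

    def _backtrack(self):
        last_frame = self.phi[-1]
        # Start from the best-scoring state in the final frame.
        state = max(range(len(last_frame)), key=lambda s: last_frame[s])
        states = [state]
        # Follow the back-pointers from the last frame down to frame 1.
        for t in range(len(self.phi) - 1, 0, -1):
            state = self.psi[t][state]
            states.append(state)
        states.reverse()
        return states


# Toy stand-in for a decoded HMM with 3 frames and 2 states.
toy_hmm = SimpleNamespace(
    phi=[[0.6, 0.4], [0.3, 0.5], [0.1, 0.2]],
    psi=[[0, 0], [0, 0], [1, 1]],
)
path = Path(toy_hmm)
```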

reduce dependency on htk and scikit-learn

Make sure that extracting MFCCs with essentia gives the same result as for the damp model:

  • add pre-emphasis (or recreate the model without pre-emphasis)
  • add cepstral mean normalization
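The two steps in the list above can be sketched in plain Python as follows. This is a generic illustration, not essentia or HTK code: the function names are hypothetical, the default coefficient 0.97 is only a commonly used value, and leaving the first sample unchanged is one of several conventions, so the details would have to be matched against the settings the model was actually trained with.

```python
def pre_emphasis(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is kept as-is."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out


def cepstral_mean_normalization(frames):
    """Subtract the per-coefficient mean over all frames from each frame."""
    if not frames:
        return []
    num_frames = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


emphasized = pre_emphasis([1.0, 1.0, 1.0], coeff=0.5)
normalized = cepstral_mean_normalization([[1.0, 2.0], [3.0, 6.0]])
```

Pre-emphasis is applied to the waveform before the MFCC extraction, while the mean normalization is applied to the resulting MFCC frames.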

Don't use scikit-learn at all; keep the LyricsWithModelsGMM class for Chinese.

  • test on Jingju

adapt the Viterbi search to lyricsWithModels

Instead of adapting the lyricsWithModels class to guyz's implementation,
adapt the Viterbi search to lyricsWithModels.

Methods to change (parse lyricsWithModels and construct the parameters for guyz):

  • Decoder.Decoder._constructHMMNetworkParameters
  • Decoder.Decoder.path2ResultWordList, which uses stateNetwork indices

refine and commit LyricAligner

  • In LyricsAligner, withAnnotations and withLinks have two similar loops; think about how to merge them into one loop.

  • test and update pycompmusic.lyricsAlign

with_section_annotations = 0, sectionLinks do not work

The sectionLink object has no section object assigned.
See the method
makam.MakamRecording.MakamRecording._loadsectionTimeStampsLinks
For an example, have a look at:
makam.MakamRecording.MakamRecording._loadsectionTimeStampsAnno()

This fails in align.LyricsAligner.LyricsAligner.alignRecording:
if not hasattr(currSectionLink, 'section') or currSectionLink.section == None:

make sure finalTs of referenceScore duration < actual duration of recording

EXAMPLE : /Users/joro/Documents/Phd/UPF//ISTANBUL//barbaros/02_Gel_9_nakarat2.scoreDeviation

This is a problem for evaluating accuracy.
For now there is a WORKAROUND in AccuracyEvaluator.calcCorrect:
currEndDetected = finallTsAnno
logging.warn("currEndDetected > finallTsAnno")

REASON: does Munir nurretim make some notes shorter?
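The workaround described above amounts to clamping the detected end timestamp to the last annotated timestamp. A minimal sketch, assuming both values are in seconds; clamp_end_timestamp is a hypothetical helper and not the body of AccuracyEvaluator.calcCorrect:

```python
import logging


def clamp_end_timestamp(curr_end_detected, final_ts_anno):
    """Return the detected end time, limited to the last annotated timestamp."""
    if curr_end_detected > final_ts_anno:
        logging.warning(
            "currEndDetected (%.3f) > finallTsAnno (%.3f), clamping",
            curr_end_detected,
            final_ts_anno,
        )
        return final_ts_anno
    return curr_end_detected


clamped = clamp_end_timestamp(12.5, 10.0)
unchanged = clamp_end_timestamp(8.0, 10.0)
```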

for MTG/HMM

Check if Path makes sense. Put just the backtracking logic in Path.

Reduce code by refining LyricsWithModels

LyricsWithModels is not needed for the neural network, so the base class is used for DNN; add a padded silence method. As a result:

  • LyricsWithModelsBase is used for DNN.
  • LyricsWithModelsCNN._linkTomodels and LyricsWithModelsBase._linkTomodels do not make sense.

Maybe remove LyricsWithModels in general.

Then reduce the if statement in SectionLink.loadSmallAudioFragment().

reduce dependency on htkmfc

Make sure that extracting MFCCs with essentia gives the same result as for the damp model:

  • add pre-emphasis (or recreate the model without pre-emphasis)
  • add cepstral mean normalization

don't trim file

So eliminate the RecordingSegmenter step and
do the alignment for each section in a loop:

  • recompute timestamps from the whole-recording mfc to the given section (input: section.json)
  • align
  • recompute timestamps for the whole recording
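The two timestamp conversions in the loop above could look roughly like this. It is a sketch under assumptions: the section boundaries are given in seconds, the feature matrix has a fixed number of frames per second, and aligned words are (word, start, end) tuples relative to the section start. All names are hypothetical.

```python
def section_frame_range(start_s, end_s, frames_per_second):
    """Map section boundaries (seconds) to frame indices of the whole-recording features."""
    first = int(round(start_s * frames_per_second))
    last = int(round(end_s * frames_per_second))
    return first, last


def to_recording_time(aligned_words, section_start_s):
    """Shift section-relative (word, start, end) tuples back to recording time."""
    return [
        (word, start + section_start_s, end + section_start_s)
        for word, start, end in aligned_words
    ]


frame_range = section_frame_range(1.5, 3.0, 100)
shifted = to_recording_time([("gel", 0.25, 0.5)], 10.0)
```

Slicing the whole-recording features with the returned frame range replaces trimming the audio file, and the shift restores timestamps that are valid for the full recording.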

concatenate textGrid data.

Concatenate the TextGrid annotation files of the segmented files
into one TextGrid annotation file per recording automatically:

  1. Install TextGridTools version 1.4.1 (either through GitHub or with pip install --upgrade tgt). If you use pip, please make sure that it is really version 1.4.1 that is installed; if you get an older version, try again.
  2. QUESTION:
    Each big audio file to which the concatenated TextGrid corresponds starts with n seconds of silence, so I need to insert n seconds of silence at the beginning and then concatenate the TextGrids, with the first one starting at timestamp = n.

To do this I tried this code:

    import os

    import tgt
    from tgt.util import shift_boundaries

    shiftTime = 51.354230

    tiers_ = []
    # pathInput and pathOut are defined elsewhere
    os.chdir(pathInput)
    tgtURI = '/Users/joro/Documents/Phd/UPF/ISTANBUL/goekhan/02_Kimseye_2_zemin.TextGrid'

    tg = tgt.read_textgrid(tgtURI)

    tier = tg.get_tier_by_name('words')
    tierShifted = shift_boundaries(tier, shiftTime, 0)

    # this call leads to the AttributeError shown in the traceback
    tg.add_tiers(tierShifted)

    tgOutURI = pathOut + 'Kimseye.TextGrig'
    tgt.write_to_file(tg, tgOutURI)

However, I get this error:

    in ()
         21
         22 tgOutURI = pathOut + 'Kimseye.TextGrig'
    ---> 23 tgt.write_to_file(tg, tgOutURI)

    /usr/local/lib/python2.7/site-packages/tgt/io.pyc in write_to_file(textgrid, filename, format, encoding, **kwargs)
        390     with codecs.open(filename, 'w', encoding) as f:
        391         if format in _EXPORT_FORMATS:
    --> 392             f.write(_EXPORT_FORMATS[format](textgrid, **kwargs))
        393         else:
        394             raise Exception('Unknown output format: {0}'.format(format))

    /usr/local/lib/python2.7/site-packages/tgt/io.pyc in export_to_short_textgrid(textgrid)
        241     textgrid_corrected = correct_start_end_times_and_fill_gaps(textgrid)
        242     for tier in textgrid_corrected:
    --> 243         result += ['"' + tier.tier_type() + '"',
        244                    '"' + escape_text(tier.name) + '"',
        245                    tier.start_time, tier.end_time, len(tier)]

    AttributeError: 'Interval' object has no attribute 'tier_type'

RESPONSE:

The problem you encountered is easily solved. To add the shifted tier, you did the following:

    tg.add_tiers(tierShifted)

This method, however, expects a list of tiers, not a single tier. You have to do the following instead:

    tg.add_tier(tierShifted)

or

    tg.add_tiers([tierShifted])
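Independently of the tgt API, the goal stated in the question (shift each segment's intervals to its position in the full recording and fill the leading silence) can be sketched on plain tuples. This is a library-free illustration under assumptions: intervals are (start, end, text) tuples in seconds, segments are given in chronological order together with their start offset in the recording, and gaps are filled with empty-text intervals. The function names are hypothetical.

```python
def shift_intervals(intervals, offset):
    """Shift (start, end, text) intervals by offset seconds."""
    return [(start + offset, end + offset, text) for start, end, text in intervals]


def concatenate_segments(segments):
    """Merge per-segment interval lists into one list for the whole recording.

    segments: list of (segment_start_in_recording, intervals), in time order.
    Gaps, including the silence before the first segment, become empty intervals.
    """
    merged = []
    cursor = 0.0  # end time of the last interval written so far
    for segment_start, intervals in segments:
        for start, end, text in shift_intervals(intervals, segment_start):
            if start > cursor:
                merged.append((cursor, start, ""))  # fill the gap with silence
            merged.append((start, end, text))
            cursor = end
    return merged


words = concatenate_segments([
    (2.0, [(0.0, 1.0, "a")]),
    (4.0, [(0.0, 0.5, "b")]),
])
```

The resulting list could then be written back into a single tier, for example with the tgt calls quoted in the response above.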
