gabolsgabs / dali
DALI: a large Dataset of synchronised Audio, LyrIcs and vocal notes.
License: Other
Hi,
Thank you for the wonderful dataset. I was wondering if there's a way to get the actual musical notes ('A', 'A#' etc.) from the current information. Right now, what I see in the notes
section is a list of words from the lyrics. Maybe I am missing something, but it would be great if you could help me with this.
Thank you in advance.
-- Gaurav.
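A possible starting point for the question above, offered as a sketch rather than as the dataset's own method: if each note annotation carries a frequency in Hz, the note name can be derived with the standard equal-temperament mapping (A4 = 440 Hz, MIDI note 69). The annotation layout used here (a `freq` field holding a pair of values in Hz next to `text` and `time`) and the helper name `hz_to_note_name` are assumptions for illustration; check them against the annotations you actually load.

```python
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def hz_to_note_name(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to the nearest equal-tempered note name, e.g. 'A4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Standard conversion: MIDI note 69 is A4, 12 semitones per octave.
    midi = int(round(69 + 12 * math.log2(freq_hz / a4_hz)))
    octave = midi // 12 - 1  # MIDI note 60 is C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"


# Hypothetical note annotation; the field names are assumed, not confirmed.
note = {'text': 'hello', 'freq': [440.0, 440.0], 'time': [1.0, 1.5]}
print(note['text'], hz_to_note_name(note['freq'][0]))
```

Drop the octave suffix if only the pitch class ('A', 'A#', ...) is needed.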
Hi!
Thank you for this comprehensive dataset! This might be a really stupid question, but I have trouble getting the audio files. I use Google Colab. I followed the tutorial and loaded the first two .gz files using the following code as a demo:
path = '/ME/My Drive/Colab Notebooks/DALI-master/'
dali_data_path = path + 'annot_tismir/'
dali_data = dali_code.get_the_DALI_dataset(dali_data_path, skip=[], keep=["0a0a723686924d228daef2a2f692d437","0a1a15671536498f8a856da781c017d7"])
dali_info = dali_code.get_info(dali_data_path + '0a1a15671536498f8a856da781c017d7.gz')
dali_info
(Got output <DALI.Annotations.Annotations at 0x7f9925c4c160>)
path_audio = '/ME/My Drive/Colab Notebooks/DALI-master/audio'
errors = dali_code.get_audio(dali_info, path_audio, skip=[], keep=[])
Here I got the error message "TypeError: 'Annotations' object is not subscriptable".
Also, if I try print(dali_info[0]), the same error message pops up.
Could you please tell me if there's anything wrong with my data loading?
Besides, I noticed that a lot of the YouTube links show "working: False". I'm not sure whether this would affect data loading. Should I submit a request for an updated version of the data? For example:
entry = dali_data['0a1a15671536498f8a856da781c017d7']
entry.info
the output looks like
{'artist': 'Janis Ian',
'audio': {'path': 'DALI_v2.0/audio/0a1a15671536498f8a856da781c017d7.mp3',
'url': 'iepedfdjA80',
'working': False},...
Thank you in advance for your help.
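Regarding the "working: False" flag, one way to see how many entries are affected is to split the info dicts by that flag before attempting any download. This is only a sketch over plain dicts shaped like the `entry.info` output quoted above; `split_by_audio_status` is a hypothetical helper, not part of the DALI package, and the second sample entry is invented for illustration.

```python
def split_by_audio_status(infos):
    """Split info dicts into (working, broken) by their audio 'working' flag."""
    working, broken = [], []
    for info in infos:
        audio = info.get('audio', {})
        # A missing flag is treated as not working.
        (working if audio.get('working') else broken).append(info)
    return working, broken


infos = [
    # Shaped like the output quoted in the post above.
    {'artist': 'Janis Ian', 'audio': {'url': 'iepedfdjA80', 'working': False}},
    # Hypothetical entry, for illustration only.
    {'artist': 'Example Artist', 'audio': {'url': 'hypothetical_id', 'working': True}},
]

working, broken = split_by_audio_status(infos)
print(len(working), "working,", len(broken), "flagged as not working")
```

With a loaded dataset, the list could presumably be built from the `info` attribute of each entry, as in `entry.info` above.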
dali_data = dali_code.get_the_DALI_dataset(dali_data_path, skip=[], keep=[])
Hi, I downloaded the DALI dataset (version 2) from zenodo.org, checked the code in this repo, and found that there is no DALI_DATA_INFO.gz file anywhere.
There are only a bunch of .gz files with random-looking names in the "annot_tismir" directory.
I don't think I can download the audio files without the DALI_DATA_INFO.gz file.
Where can I find this file?
Hi @gabolsgabs,
I'm interested in the dataset, but I ran into a problem when trying to get the audio.
When I run
errors = dali_code.get_audio(dali_info, path_audio, skip=[], keep=[])
I get the error:
ERROR: ffmpeg exited with code 1
ERROR: ffmpeg exited with code 1
[tcp @ 000001a58d41b640] Connection to tcp://r2---sn-gxuo03g-3c2e.googlevideo.com:443 failed: Error number -138 occurred
https://r2---sn-gxuo03g-3c2e.googlevideo.com/videoplayback?expire=1628760487&ei=R5UUYaOuGLmOsfIPtdWXiA8&ip=72.13.86.178&id=o-ACsbiYNB_jkjhjaAhmR3Inb0Gt9_V6ZaKJ1kpdAZdrQ3&itag=251&source=youtube&requiressl=yes&mh=j6&mm=31%2C29&mn=sn-gxuo03g-3c2e%2Csn-n4v7sn7l&ms=au%2Crdu&mv=m&mvi=2&pl=24&initcwndbps=2487500&vprv=1&mime=audio%2Fwebm&ns=dGRvWmr4u0kSGji0arhIxuwG&gir=yes&clen=5002463&dur=282.861&lmt=1573450506527303&mt=1628738444&fvip=2&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=1301222&n=O9L5WR7rYM1WvfV5aP5&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIgF5z4QMeCv8k9PtHFjGv4I27RLLG7caPJ5ybflZ0CAz0CIQDzYqp8QXh1ynjeWQSPxm-ZyZljZaNsQRBOaarv_6YPvg%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRAIgLgzLM0HccljQ96vQk-zmtgTyjUcPvNNpF8ZGM7CwE0QCIFz42uRHnpZDpFtwCanHwheTaBzWJ7uDsKxOqbd-KtzS: Unknown error
I can open the link above in my browser, so I tried turning off the firewall, but the error is still there.
I'd appreciate it if you could help me.
Thank you,
Haoran
Hi @gabolsgabs,
Are you gonna be releasing the code you used to create the dataset anytime soon? It would really help if you do.
Thanks,
Gaurav.
Hi,
Thank you for the wonderful dataset.
I found that the 'notes' annotations are missing when accessed through entry.annotations['annot']['notes']. Is there an updated version of the ground truth files?
Thanks
Lin
Hi!
I am interested in checking out the dataset, but the download link seems to be broken...
Maybe I should wait a few more days, but just checking if it is supposed to be working now.
Thanks for the dataset :)
There are sometimes mismatches between audio and lyrics.
The entry 018f045bb1784602976a289705832d60 has the audio of Brentalfloss - Shovel night, and entry.info['title'] and entry.info['artist'] are consistent with that.
But the lyrics correspond to:
Brentalfloss - Tetris
Did I cause this issue on my side, or does somebody else see it?
Do we know how frequent these errors are? (I manually checked fewer than 20 songs and found this one.)