
clotho-dataset's People

Contributors

dr-costas

clotho-dataset's Issues

Why did you keep audio files in wav format?

I am curious whether there was a specific reason for this, or whether the audio file format simply was not a concern.

I am considering creating a large-scale audio dataset by training a small model that I can later use to filter millions of samples, and it seems sensible to store the files in as compressed a format as possible to save storage space.
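To put a rough number on the storage concern, the size of uncompressed PCM audio can be estimated from duration, sample rate, bit depth, and channel count. The sketch below is only a back-of-the-envelope calculation: the function name `pcm_wav_bytes`, the default of 44.1 kHz / 16-bit / mono, and the 20-second clip length are illustrative assumptions, not confirmed properties of the Clotho files, and the small WAV header is ignored.

```python
def pcm_wav_bytes(duration_s: float,
                  sample_rate: int = 44_100,   # assumed, not taken from the dataset
                  bit_depth: int = 16,         # assumed
                  channels: int = 1) -> int:   # assumed
    """Approximate payload size in bytes of an uncompressed PCM WAV clip.

    Ignores the WAV header, which is negligible for clips longer than
    a fraction of a second.
    """
    n_samples = round(duration_s * sample_rate)
    return n_samples * channels * (bit_depth // 8)


# Hypothetical scenario: one million clips of 20 seconds each.
per_clip = pcm_wav_bytes(20.0)           # 1_764_000 bytes
total = per_clip * 1_000_000             # 1.764e12 bytes, about 1.76 TB
```

Under these assumptions, a million 20-second clips come to roughly 1.76 TB uncompressed. How much a compressed format would save depends on the codec and the audio content.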

Overlapping items in development / evaluation / validation sets

Hi, thanks for the great work!
I'm new to the dataset, and I found that there are overlapping entries between the splits, as listed below.

Between valid and eval sets:

  • sound_id 66304. In valid set, the segment is [1607168, 2436248]. In eval, it's [1597440, 2865471].
    • file name: 01 A pug struggles to breathe 1_14_2008.wav
  • sound_id 86161. [263168, 1253213] vs [173056, 1249146].
    • file name: Bus(Drive_Reverse)_1-2.wav

Between valid and dev sets:

  • sound_id 86163. Its whole segment was used in both.
    • file name: City Ambience w_ Car Passing_1-2.wav
  • sound_id 130603. [0, 822465] vs [11264, 768067].
    • Their file names vary. (Greek Chat2 - (Apollonia__39_s sPA) 18_44 05.10.wav // Greek Chat2 - (Apollonia's sPA) 18_44 05.10.wav)

Between eval and dev sets:

  • sound_id 137692. [635904, 1430834] vs [141824, 1370009].
    • file name: FREEZER_DOOR_OPEN_CLOSE.wav

I'm now wondering whether this is a known issue in the (DCASE?) community or among researchers. If so, do you know the recommended way to handle it? Otherwise, perhaps it's something that could be fixed in Clotho 2.2 or a later release.
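A check of this kind can be sketched in a few lines. The example below assumes each split's metadata has already been parsed into `(sound_id, start, end)` tuples; the function name `find_shared_sources` and the input layout are hypothetical, and the segments are treated as half-open sample ranges, which may differ by one sample from the dataset's actual convention.

```python
from collections import defaultdict
from itertools import combinations


def find_shared_sources(splits):
    """Report sound_ids that appear in more than one split.

    splits: dict mapping split name -> list of (sound_id, start, end).
    Returns a list of tuples
    (sound_id, split_a, (start_a, end_a), split_b, (start_b, end_b), overlap),
    where overlap is the number of shared samples (0 if the segments
    come from the same source file but do not intersect).
    """
    by_id = defaultdict(list)
    for split, rows in splits.items():
        for sound_id, start, end in rows:
            by_id[sound_id].append((split, start, end))

    hits = []
    for sound_id, entries in by_id.items():
        for (sa, s1, e1), (sb, s2, e2) in combinations(entries, 2):
            if sa == sb:
                continue  # only cross-split pairs are of interest
            overlap = max(0, min(e1, e2) - max(s1, s2))
            hits.append((sound_id, sa, (s1, e1), sb, (s2, e2), overlap))
    return hits


# Two of the cases reported above, entered by hand.
example = {
    "valid": [(66304, 1607168, 2436248)],
    "eval": [(66304, 1597440, 2865471), (137692, 635904, 1430834)],
    "dev": [(137692, 141824, 1370009)],
}
shared = find_shared_sources(example)
```

Grouping by `sound_id` rather than by file name also catches cases like 130603, where the file names differ between splits.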

Not all samples in development split

Hi! According to the description of the dataset, there are 4981 audio samples in the development split, but there are actually only 2893. Why?
