acoustic-representation-toolbox's People

Contributors

aza, pcbermant

acoustic-representation-toolbox's Issues

Add overview of the Acoustic Representation Toolbox to the README

I'd like to be able to link people to the https://github.com/earthspecies/representation-toolbox repo and have the README walk visitors through what the Acoustic Representation Toolbox is and why it matters, ideally with some examples, plus a description and an exemplar for each type of representation (a sketch of one such exemplar follows below). The audience is broader than just machine learning practitioners; assume readers are smart and have some technical grasp. The overview should stay broad, while the per-type examples can be a little more technical.

https://github.com/earthspecies/representation-toolbox
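
As a starting point for such an exemplar, here is a minimal sketch of how two common representations could be computed from a raw waveform. It assumes librosa as the DSP library purely for illustration; the toolbox's own API may differ.

```python
import librosa
import numpy as np

# Load a clip (librosa ships a short example recording; any mono file works).
y, sr = librosa.load(librosa.ex("trumpet"), sr=None)

# Linear-frequency STFT spectrogram: complex STFT -> magnitude in dB.
stft = librosa.stft(y, n_fft=1024, hop_length=256)
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Mel spectrogram: perceptually spaced frequency bins, a common ML input.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

print(spec_db.shape, mel_db.shape)  # (freq_bins, frames) for each
```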

Testing the performance of the models on standard datasets like MusDB18 and/or WSJ0-2mix

Given the performance of our model on the macaque task, I'm very curious to see how it would do in baseline tests on datasets commonly used in the literature. This is relevant because the lightweight nature of the UNet model could make it an attractive alternative to existing models for animal recordings captured at high sampling rates. With this in mind, we already have MusDB18 at our disposal, so this could be a good project while we decide which animal dataset to focus on. A sketch of such an evaluation loop follows.
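
A minimal sketch of a MusDB18 baseline test, assuming the `musdb` package for data loading and a hypothetical `separate(mixture, rate)` function wrapping our model (not an existing API in this repo):

```python
import musdb
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (1-D signals)."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((target @ target) / (noise @ noise + eps))

mus = musdb.DB(download=True, subsets="test")  # 7-second preview tracks
scores = []
for track in mus:
    mixture = track.audio.mean(axis=1)                # stereo -> mono
    reference = track.targets["vocals"].audio.mean(axis=1)
    estimate = separate(mixture, track.rate)          # hypothetical model call
    scores.append(si_sdr(estimate, reference))
print(f"mean SI-SDR: {np.mean(scores):.2f} dB")
```

For results comparable to the literature, `museval` would give the full BSS-eval metrics; SI-SDR is used here only to keep the sketch self-contained.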

Improving the representation inverter by exploring more advanced models and/or reaching out to some of the authors

It seems very strange to me that the methods work very well for conventional STFT spectrograms but not so well for other representations. One reason representations might matter for bioacoustic applications is the high sampling rates involved. For instance, many dolphin recordings are sampled at 96 kHz. While it might be possible to downsample somewhat and still avoid aliasing, this would still leave an audio input with ~300k elements. However, we could also try slicing audio inputs into more manageable frame lengths and then concatenating the results during inference (see the sketch below). This, I suppose, is related to the "Variable Time Scales in Vocal Behavior" problem to some degree.
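
A minimal sketch of that slice-and-concatenate idea, assuming a hypothetical `model` that maps a fixed-length 1-D waveform to a same-length output; overlapping frames with a Hann-window crossfade avoid audible seams at the joins:

```python
import torch

def chunked_inference(model, audio, frame_len=32768, hop=16384):
    """Run `model` on overlapping frames and overlap-add the results."""
    n = audio.shape[-1]
    out = torch.zeros(n)
    weight = torch.zeros(n)
    window = torch.hann_window(frame_len)        # crossfade weighting
    for start in range(0, n, hop):
        end = min(start + frame_len, n)
        frame = audio[start:end]
        if frame.shape[-1] < frame_len:          # zero-pad the last frame
            frame = torch.nn.functional.pad(
                frame, (0, frame_len - frame.shape[-1]))
        with torch.no_grad():
            y = model(frame.unsqueeze(0)).squeeze(0)  # assumed (batch, samples) I/O
        out[start:end] += (y * window)[: end - start]
        weight[start:end] += window[: end - start]
    # Normalize by the accumulated window energy; edge samples stay tapered.
    return out / weight.clamp_min(1e-8)
```

With the hop set to half the frame length, the periodic Hann windows sum to a constant, so the normalization only matters at the clip edges.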
