
Comments (11)

bmcfee commented on August 30, 2024

But then when I look at the (vector) annotations, I have only one frame which matches the first annotation in my file, and that's it.

That's the intended behavior, sorry if it's not documented well enough. (This is still very much a WIP!)

The intended use case here is for things like parametric embedding approximation, where you map an entire recording down to a fixed-dimensional vector. This comes up in things like latent factor approximation in collaborative filtering.

If you want to broadcast the vector out over time, you'd have to know the target extent (number of frames), which isn't generally known with independent transformers. E.g., your vector transformer would have to know about the CQT transformer inside the pump, and transformers don't currently support that kind of behavior. Probably your best bet is to reshape it at run-time (e.g., while sampling during training), or to design your model to be time-invariant to cut down on redundant memory consumption.
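For illustration, the run-time broadcast might look like this (a numpy sketch; the shapes here are made up):

```python
import numpy as np

# Hypothetical shapes: one static 6-dimensional vector per track,
# and a feature tensor with n_frames time steps.
vec = np.random.randn(6)      # the static annotation vector
n_frames = 431                # e.g., read off cqt.shape[0] at training time

# Replicate the vector across the time axis to match the feature frames.
vec_per_frame = np.broadcast_to(vec, (n_frames, vec.shape[0]))
print(vec_per_frame.shape)    # (431, 6)
```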

jeffd75 commented on August 30, 2024

Let me explain my problem more generally; maybe you will be able to help me?
I want to annotate audio with (contemporary) instrument playing techniques (PTs). I have started with the cello. Possible cello PTs would be, for instance: pizzicato near the bridge, or artificial harmonics with the bow plus tremolo, and so forth. Cello PTs in my model are discrete and 4-dimensional.
I have added 2 dimensions for less important context data, hence the 6-dim vector.
In the end the goal is to train a deep CNN with a multi-task approach.

Since I want my work to be easy to benchmark, I naively tried to fit my annotations into the JAMS standard, but I am having a hard time. The basic chord namespace did the trick in the original specification, but as I understand the changes, I cannot use it anymore, because chords now follow a precise syntax (stuff like A7, G9...). I then thought the vector namespace would work, but it looks like I indeed misunderstood what it stood for. Any suggestions?

Regarding your last paragraph ("if you want..."), I don't see the difference between the broadcasting of chords and what I am asking for (apart from the chord syntax issue). Surely your ChordTransformer synchronizes with your CQT transformer, right? Is there a way I could use that one?
Looking forward to reading you.
Thanks for your help!

bmcfee commented on August 30, 2024

Ah, I see. That's an interesting setup, and not one that I've thought too carefully about, but yeah, it ought to be possible.

I then thought the vector namespace would work, but it looks like I indeed misunderstood what it stood for. Any suggestions?

One option might be to model them as tags rather than dense vector data. If you have some scalar value associated with them (e.g., the amount of vibrato, or something like that), you could pack that into the confidence field. Then you could use DynamicTaskTransformer directly.
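Roughly like this, using the jams API directly (a sketch; the tag values and confidences here are invented):

```python
import jams

# One observation per technique interval; a scalar like vibrato depth
# can ride along in the confidence field.
ann = jams.Annotation(namespace='tag_open')
ann.append(time=0.0, duration=2.5, value='pizzicato', confidence=1.0)
ann.append(time=2.5, duration=1.0, value='con_legno', confidence=0.7)

jam = jams.JAMS()
jam.file_metadata.duration = 3.5
jam.annotations.append(ann)
jam.save('example.jams')
```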

Surely your ChordTransformer synchronizes with your CQT transformer, right? Is there a way I could use that one?

It doesn't actually synchronize to the feature transformer. Rather, it samples the annotations at a specified rate (given, somewhat clumsily, in terms of sampling rate and hop length, to make it easier to parametrize in terms of audio). The reason for this decision is that the typical use case for pumpp has features going through a model and then being compared to the task outputs. Models often have some change of resolution associated with them (e.g., pooling in time, or downsampling), so this lets us generate output frames to match whatever the rate of the model is, rather than being tied to the rate given by the input features.
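Concretely, the task's frame grid comes from its own sr and hop_length, not from the feature extractor. A sketch of wiring both into a pump (the parameter values are only examples):

```python
import pumpp

sr, hop = 22050, 512

# Features sampled at the audio frame rate...
cqt = pumpp.feature.CQT(name='cqt', sr=sr, hop_length=hop)

# ...and task outputs on their own grid. A model that pools time by 2x
# could use hop_length=2 * hop here instead, independently of the CQT.
chords = pumpp.task.ChordTransformer(name='chord', sr=sr, hop_length=hop)

pump = pumpp.Pump(cqt, chords)
data = pump.transform('audio.wav', 'annotation.jams')
```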

As I said above, the vector transformer wasn't really designed for this kind of use case, because I hadn't considered time-varying vector data. We could definitely add a DynamicVectorTransformer class that does frame-wise interpolation / replication of static vector data (like how DynamicTagTransformer samples labeled tag intervals), but that's not currently implemented.

jeffd75 commented on August 30, 2024

I can use tag_open with a string made of my 6 integers separated by, say, spaces. I don't even need the confidence field. Then, down the line, I can process the tensors to separate the 6 dimensions.
Far from ideal, but for the time being, yes, it could work. Thanks!
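The round trip would be something like this (a sketch; the separator choice is arbitrary):

```python
# Encode: pack one 6-dim playing-technique annotation into a tag string...
pt = [2, 0, 1, 3, 0, 1]
tag = ' '.join(map(str, pt))              # -> '2 0 1 3 0 1'

# ...and decode downstream, once pumpp has mapped tag labels to outputs,
# by splitting the label back into its six axes.
decoded = [int(x) for x in tag.split()]
assert decoded == pt
```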

bmcfee commented on August 30, 2024

Oh, I was just thinking of each of your six integers getting its own tag (pizzicato => yes/no, etc.). Maybe that doesn't make sense for your data or model.

jeffd75 commented on August 30, 2024

In contemporary music, it's a bit more complex than that:

The first integer is the kind of exciter/vibrator pair you use: for instance, pizzicato means you're not using the bow but the finger to excite the string, but you may also use the wood of the bow (con legno), or even hit the body of the instrument with your hand or fingers...
The second integer is what you do with the left hand (the hand producing the pitch): vibrating the note or not, glissando, trill...
The third integer is about the amplitude envelope of the sound you're producing: playing tremolo, or staccato, marcato, spiccato...
The last integer is the position of the interaction: near the fingerboard, ordinary, near the bridge...
And believe it or not, this is actually a simplification!

This model could be used for all the strings, but we would need other models for winds, brass and percussion.

jeffd75 commented on August 30, 2024

Sorry to bother you again. You said: "Then you could use DynamicTaskTransformer directly."
I can only find BaseTaskTransformer or DynamicLabelTransformer. I guess you meant the latter?

bmcfee commented on August 30, 2024

Sorry, yes. I meant https://pumpp.readthedocs.io/en/latest/generated/pumpp.task.DynamicLabelTransformer.html
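Usage would be something like this (a sketch; the packed-tag vocabulary below is hypothetical):

```python
import pumpp

labels = ['0 0 0 0 0 0', '2 0 1 3 0 1']   # hypothetical packed-tag vocabulary
pt_task = pumpp.task.DynamicLabelTransformer(name='pt',
                                             namespace='tag_open',
                                             labels=labels,
                                             sr=22050, hop_length=512)
```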

jeffd75 commented on August 30, 2024

Hi,
I tweaked my model a little to use tag_open in JAMS (collapsing my 4 dimensions into just one) together with pumpp.task.DynamicLabelTransformer.
It is working just fine, so I ought to thank you... except that in the resulting tensor the classes are sorted alphabetically.
Is there a way to avoid that? It obviously comes from the behavior of the sklearn.preprocessing.MultiLabelBinarizer class you're using.
I need my classes to be exactly in the order I gave to the pumpp.task.DynamicLabelTransformer object.
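(For reference, one workaround, if the ordering stays alphabetical internally, would be to permute the output columns afterwards. A sketch, continuing from the transformer above; `pt_task.encoder` and the 'pt/tags' key are assumptions about pumpp's internals and output naming:)

```python
desired = ['2 0 1 3 0 1', '0 0 0 0 0 0']    # the order you passed in
fitted = list(pt_task.encoder.classes_)      # sklearn sorts these alphabetically
perm = [fitted.index(c) for c in desired]

# Reindex the binarized tag tensor (..., n_classes) into the desired order.
Y_ordered = data['pt/tags'][..., perm]
```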

More importantly:
I have also added "onset" and "pitch contour" information in the JAMS format.
I would like to sample that information at the usual (sampling rate / hop length) frame rate.
Any suggestions using your TaskTransformers?
This is contextual information which I am going to need in order to analyse the behavior of my neural net.
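(Failing a ready-made transformer, one could rasterize those annotations onto the same frame grid by hand. A sketch with plain jams + numpy; the names and sizes are illustrative:)

```python
import numpy as np

sr, hop = 22050, 512
n_frames = 431                       # match the feature tensor's length

# Mark a 1 at each frame containing an onset observation.
onsets = np.zeros(n_frames)
for obs in jam.annotations.search(namespace='onset')[0].data:
    idx = int(np.round(obs.time * sr / hop))
    if idx < n_frames:
        onsets[idx] = 1.0
```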

Once this is done I have over 18 hours of cello to process...

bmcfee commented on August 30, 2024

The first integer is the kind of exciter/vibrator pair you use: for instance, pizzicato means you're not using the bow but the finger to excite the string, but you may also use the wood of the bow (con legno), or even hit the body of the instrument with your hand or fingers...
The second integer is what you do with the left hand (the hand producing the pitch): vibrating the note or not, glissando, trill...
The third integer is about the amplitude envelope of the sound you're producing: playing tremolo, or staccato, marcato, spiccato...
The last integer is the position of the interaction: near the fingerboard, ordinary, near the bridge...
And believe it or not, this is actually a simplification!

Following up on this: why use integers instead of independent tags for each of the values?

jeffd75 commented on August 30, 2024

Sorry, being originally a composer, I am a bit new to all this.
You can have tags, but some of them are mutually exclusive and some aren't, and it is extremely important for me to feed that information into the model.
For example, "pizzicato" cannot occur together with "con legno" (both are on the first axis), but it can occur with "glissando" (first and second axes), and with "near the bridge" (first and last axes).
In terms of machine learning, the four axes can be seen as 4 different tasks, except that there will be a single NN for all 4 tasks.
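In network terms, that could be four softmax heads on a shared trunk, so each axis stays internally mutually exclusive while the axes are predicted jointly (a Keras sketch; the layer sizes and per-axis class counts are made up):

```python
from tensorflow.keras import layers, Model

n_bins = 216                                   # e.g., CQT bins; made up
x_in = layers.Input(shape=(None, n_bins))      # variable-length input

# Shared trunk: all four axes see the same learned representation.
h = layers.Conv1D(64, 5, padding='same', activation='relu')(x_in)
h = layers.Conv1D(64, 5, padding='same', activation='relu')(h)

# One softmax head per axis: classes within an axis are mutually
# exclusive; the four axes are predicted independently per frame.
axes = [('exciter', 5), ('left_hand', 4), ('envelope', 5), ('position', 3)]
outputs = [layers.Dense(n, activation='softmax', name=name)(h)
           for name, n in axes]

model = Model(x_in, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```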
