wwerkk / mc-fp


Languages: Python 0.42%, Max 7.06%, Jupyter Notebook 92.52%


mc-fp's Issues

generated token sequence - alternating buffer

On an M1 MacBook Pro CPU, it does not seem possible to generate sequences longer than 32 tokens fast enough to keep up with real time, at least with the trebles model.
Alternating token buffers need to be implemented so that while one token sequence is being played back, the next one can be generated at the same time.
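A minimal double-buffering sketch, assuming generation can be wrapped in a zero-argument callable that returns one token sequence (`generate_fn` and the class name are hypothetical, not part of generate.py). The back buffer is filled in a background thread while the front buffer plays, and `swap()` only blocks if generation is slower than playback.

```python
import threading
from typing import Callable, List, Optional


class AlternatingBuffer:
    """Two token buffers: one is played back while the other is filled in a background thread."""

    def __init__(self, generate_fn: Callable[[], List[int]]):
        self.generate_fn = generate_fn  # hypothetical wrapper around the model's generate()
        self.front: List[int] = generate_fn()  # the first sequence is generated synchronously
        self._back: Optional[List[int]] = None
        self._worker: Optional[threading.Thread] = None
        self._start_generation()

    def _fill_back(self) -> None:
        self._back = self.generate_fn()

    def _start_generation(self) -> None:
        self._worker = threading.Thread(target=self._fill_back, daemon=True)
        self._worker.start()

    def swap(self) -> List[int]:
        """Call when the front buffer has finished playing; returns the next sequence."""
        self._worker.join()  # waits only if generation has not finished yet
        self.front, self._back = self._back, None
        self._start_generation()  # immediately start filling the other buffer
        return self.front
```

A playback loop would iterate over `buf.front`, send each token, and call `buf.swap()` at the end of the sequence.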

online.py - generation offset

The drumloop2 model generates rotations of the sequence, which tend to shift by a token or two with each generation, even with temperature <= 1.

Offline generation works properly, as it results in an ABCB pattern, which corresponds exactly to the drum pattern in the data.

It might be worth testing offline generation with longer sequences to determine whether the pattern shifts in that case as well.
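To measure the shift rather than eyeball it, each generated sequence could be compared against the reference pattern (e.g. ABCB encoded as `[0, 1, 2, 1]`) for every possible rotation. This is a small hypothetical helper, not existing code in the repository; it assumes the generated sequence has the same length as the reference.

```python
from typing import Optional, Sequence


def rotation_offset(reference: Sequence[int], generated: Sequence[int]) -> Optional[int]:
    """Return k such that `generated` equals `reference` rotated right by k, or None."""
    n = len(reference)
    if len(generated) != n:
        return None
    for k in range(n):
        # rotating right by k means element i comes from reference[(i - k) % n]
        if all(generated[i] == reference[(i - k) % n] for i in range(n)):
            return k
    return None
```

Logging this offset for successive online generations would show whether the drift is consistent (always +1, say) or random.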

refined clustering

The feature extraction and clustering workflow could be improved by considering the following implementation:
https://github.com/TylerMclaughlin/wav_clustering_workflow

labelled_frames dict refactor

The labelled_frames dict saves frames sample by sample, which is redundant.
Instead, it should only save the sample indices of each frame's start and end.
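A sketch of the index-only representation, assuming fixed-size frames where frame `i` starts at sample `i * hop_length` and each frame has one cluster label. The function name is hypothetical; the stored pairs are JSON-serializable (tuples become lists).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def frame_bounds_by_label(
    labels: Sequence[int], frame_length: int, hop_length: int
) -> Dict[int, List[Tuple[int, int]]]:
    """Map each cluster label to the (start, end) sample indices of its frames."""
    bounds: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, label in enumerate(labels):
        start = i * hop_length
        bounds[label].append((start, start + frame_length))  # end index is exclusive
    return dict(bounds)
```

A frame's audio can then be recovered on demand with `audio[start:end]` instead of being stored twice.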

generate function parameters control

The parameters of the generate function in generate.py could easily be passed via OSC as part of the message triggering generation.
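The traceback further down shows `generate(sequence_length=..., temperature=..., prompt=...)`, so one option is to read those three keyword arguments straight from the OSC message. The argument order (`/g <length> <temperature> <prompt tokens...>`) and the default values below are assumptions for illustration.

```python
from typing import Any, Dict

# Hypothetical defaults; the real values in generate.py may differ.
DEFAULTS: Dict[str, Any] = {"sequence_length": 32, "temperature": 1.0}


def osc_to_generate_kwargs(address: str, *osc_args: Any) -> Dict[str, Any]:
    """Build generate() keyword arguments from a message like `/g 64 0.8 3 1 2`."""
    kwargs: Dict[str, Any] = dict(DEFAULTS)
    if len(osc_args) > 0:
        kwargs["sequence_length"] = int(osc_args[0])
    if len(osc_args) > 1:
        kwargs["temperature"] = float(osc_args[1])
    kwargs["prompt"] = [int(t) for t in osc_args[2:]]  # remaining values form the prompt
    return kwargs
```

With python-osc, a handler registered via `dispatcher.map("/g", handler)` is called as `handler(address, *osc_args)`, so it could simply do `generate(**osc_to_generate_kwargs(address, *osc_args))`.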

Byte-Pair Encoding

It could help a lot to detect commonly occurring sub-sequences and replace each of them with a single token.
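A minimal byte-pair encoding sketch over integer cluster labels: the most frequent adjacent pair is repeatedly replaced with a new token ID, and the merge table is kept so generated sequences can be expanded back to frame labels. This is the generic BPE idea applied to this token alphabet, not an existing part of the project.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bpe_merge(tokens: List[int], num_merges: int) -> Tuple[List[int], Dict[int, Tuple[int, int]]]:
    """Repeatedly replace the most frequent adjacent token pair with a new token."""
    merges: Dict[int, Tuple[int, int]] = {}
    next_token = max(tokens, default=-1) + 1  # new IDs extend the existing vocabulary
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress anything
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_token)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        merges[next_token] = pair
        tokens = merged
        next_token += 1
    return tokens, merges


def bpe_decode(tokens: List[int], merges: Dict[int, Tuple[int, int]]) -> List[int]:
    """Expand merged tokens back into the original label sequence."""
    out: List[int] = []
    stack = list(reversed(tokens))
    while stack:
        token = stack.pop()
        if token in merges:
            first, second = merges[token]
            stack.extend((second, first))  # push in reverse so `first` is expanded next
        else:
            out.append(token)
    return out
```

For a repeating ABCB loop (`[0, 1, 2, 1, 0, 1, 2, 1]`) this collapses to two tokens, each standing for one full bar.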

first detected beat backtracking

Perhaps beat tracking in general should be handed over to streamer.py instead of being included in the training script?
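Backtracking usually means moving a detected onset or beat back to the preceding local minimum of an energy curve, so a slice starts before the attack rather than on its peak. librosa provides this as `librosa.onset.onset_backtrack(events, energy)`; the dependency-free sketch below follows the same idea and could live in whichever script ends up owning beat tracking. Frame indices and the energy curve are assumed to share the same frame rate.

```python
from bisect import bisect_right
from typing import List, Sequence


def backtrack_events(events: Sequence[int], energy: Sequence[float]) -> List[int]:
    """Move each event frame back to the nearest preceding local minimum of energy."""
    # local minima: not greater than the left neighbour and strictly less than the right one;
    # frame 0 is always included so every event has a candidate to fall back to
    minima = [0] + [
        i for i in range(1, len(energy) - 1)
        if energy[i] <= energy[i - 1] and energy[i] < energy[i + 1]
    ]
    result = []
    for event in events:
        idx = bisect_right(minima, event) - 1  # last minimum at or before the event
        result.append(minima[idx])
    return result
```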

Max always picks the first frame in array

Max is now responsible for looking up frames in the frames.json dictionary.
At the moment, it always picks the first frame in the value array of a given key.

The frame should be picked at random instead, e.g. by selecting a random odd value within the range of the array's length minus one, as is done in Streamer.get_frame.

frame picking

There could be a few modes that change how a frame of a given class is picked from the dictionary when a grain is triggered:

locked: n-th frame only
random (as it is now)
sequential (from first to last frame of the given class; could use a table object to save state per token)
sequential (from last to first)
a mixture of the above, e.g. using a Markov chain or other means of controlled randomization
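The first four modes can be sketched in Python as a picker that keeps a per-token position for the sequential modes (the role the table object would play in Max). The class, mode names and the assumption that frames.json maps a label string to a list of frame entries are all illustrative; the Markov-chain mixture is left out.

```python
import random
from typing import Dict, List, Optional


class FramePicker:
    """Pick a frame of a given class using one of several modes."""

    MODES = ("locked", "random", "forward", "backward")

    def __init__(self, frames: Dict[str, List[int]], mode: str = "random",
                 locked_index: int = 0, seed: Optional[int] = None):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.frames = frames  # label -> list of frame entries, as loaded from JSON
        self.mode = mode
        self.locked_index = locked_index
        self.rng = random.Random(seed)
        self.position: Dict[str, int] = {}  # per-token state for the sequential modes

    def pick(self, label: str) -> int:
        options = self.frames[label]
        if self.mode == "locked":
            return options[min(self.locked_index, len(options) - 1)]
        if self.mode == "random":
            return self.rng.choice(options)
        step = self.position.get(label, 0)
        self.position[label] = (step + 1) % len(options)  # wrap around after the last frame
        if self.mode == "forward":
            return options[step]
        return options[len(options) - 1 - step]  # backward: last to first
```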

poor benchmark output

The sweep and drumloop models, trained on very simple data, generate output that makes no sense compared to the input audio.

validation

Presumably no validation method is viable for audio data of any sort, and the training metrics are very difficult to trust when the input audio has high variance.
It might be a good idea to implement k-fold validation, e.g. as described here
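A dependency-free sketch of k-fold index splitting. It uses contiguous, unshuffled folds on the assumption that neighbouring training windows overlap, so shuffling individual windows would leak near-duplicates into the validation fold; scikit-learn's `KFold` (with `shuffle=False`) produces the same kind of split if that library is acceptable.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # the first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Each fold trains a fresh model on `train` and reports the loss on `val`; the spread of the k scores is a rough indicator of how much the metrics can be trusted.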

Use librosa.stream for loading and processing of input

File loading currently uses audio2numpy to load audio, which is then cut into fixed lengths during preprocessing.
This is hard to reason about and leads to frequent errors in the code.
Using librosa.stream for loading and analysis also seems more straightforward than using tf.keras.utils.audio_dataset_from_directory, as tf.data.Dataset objects still often evade my intuition.

tempo/beat detection

Would be very useful for slicing.

using previous generated sequence as generation prompt

At the moment, it is easy to see in line 79 that generate.py doesn't actually do anything with the provided prompt.
What is missing is a proper encoding of the token sequence passed as a parameter, e.g. as happens later on in line 100.
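One way the prompt could be encoded, assuming the model takes a one-hot batch of shape `(1, maxlen, vocab_size)` (the actual input format used in generate.py may differ, e.g. integer labels for an embedding layer). Padding rows are left all-zero, so they are distinct from token 0.

```python
from typing import Sequence

import numpy as np


def encode_prompt(prompt: Sequence[int], maxlen: int, vocab_size: int) -> np.ndarray:
    """One-hot encode the last `maxlen` tokens of a prompt into a (1, maxlen, vocab_size) batch."""
    x = np.zeros((1, maxlen, vocab_size), dtype=np.float32)
    window = list(prompt)[-maxlen:]
    offset = maxlen - len(window)  # left-pad so the prompt ends at the last time step
    for t, token in enumerate(window):
        x[0, offset + t, token] = 1.0
    return x
```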

generation function refactor

This is necessary, as it generates sequences starting with 128 zeros when maxlen is set to 256.
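A sketch of a generation loop that keeps padding out of the result: the model only ever sees the last `maxlen` tokens of a running context, and only newly predicted tokens are returned. `predict_next` is a hypothetical callable standing in for "encode context, call the model, sample a label".

```python
from typing import Callable, List, Sequence


def generate_tokens(predict_next: Callable[[List[int]], int], prompt: Sequence[int],
                    sequence_length: int, maxlen: int) -> List[int]:
    """Autoregressively generate `sequence_length` new tokens with a sliding context window."""
    context = list(prompt)
    generated: List[int] = []
    for _ in range(sequence_length):
        token = predict_next(context[-maxlen:])  # the model sees at most maxlen tokens
        generated.append(token)
        context.append(token)
    return generated  # neither padding nor the prompt is part of the output
```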

timestretching

Would be nice, but would most likely require refactoring the synthesis to use mc.groove

polyphony

Max resynthesis should implement multiple voices, likely using mc

OSC queue crashes

With higher density, lots of generate messages are sent and clog up UDP.
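One possible mitigation, assuming that dropping intermediate requests is acceptable: the OSC handler only records the newest request and returns immediately, while a single worker thread runs generation. A burst of messages then collapses into at most one extra generation instead of a growing queue. The class name is hypothetical.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class LatestOnlyWorker:
    """Run requests one at a time, keeping only the newest pending request."""

    def __init__(self, handler: Callable[..., Any]):
        self.handler = handler  # e.g. a function that calls generate() and sends the result
        self._pending: Optional[Tuple[Any, ...]] = None
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, *args: Any) -> None:
        """Called from the OSC handler; overwrites any request not yet started."""
        with self._cond:
            self._pending = args
            self._cond.notify()

    def _run(self) -> None:
        while True:
            with self._cond:
                while self._pending is None:
                    self._cond.wait()
                args, self._pending = self._pending, None
            self.handler(*args)  # runs outside the lock so submit() never blocks
```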

Fixed-size segmentation results in multiple errors when analysing data

Segmentation is done using librosa.util.frame, which works, but the subsequent analysis code still needs fixing to be compatible.
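One thing worth checking is the array layout: with its default `axis=-1`, `librosa.util.frame` returns frames along the last axis, i.e. shape `(frame_length, n_frames)` for mono audio, so code that iterates over the first axis would walk over samples rather than frames. The NumPy-only sketch below produces the `(n_frames, frame_length)` layout, which should match `librosa.util.frame(x, frame_length=..., hop_length=...).T`.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_length: int, hop_length: int) -> np.ndarray:
    """Return a read-only view of shape (n_frames, frame_length)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, frame_length)  # every possible start
    return windows[::hop_length]  # keep one window per hop
```

The number of frames is `1 + (len(x) - frame_length) // hop_length`; any trailing samples that do not fill a whole frame are dropped.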

Stand-alone generation script

Generation could be implemented as a stand-alone Python script, allowing for further extensions, e.g. real-time sequence generation or communication with other environments used for resynthesis.
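A possible command-line skeleton for such a script, using the keyword arguments visible in the traceback (`sequence_length`, `temperature`, `prompt`). The flag names, defaults and the positional model name are assumptions; the model loading and the actual `generate()` call are left as a comment.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate a token sequence from a trained model.")
    parser.add_argument("model", help="name of the trained model")
    parser.add_argument("--sequence-length", type=int, default=32)
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--prompt", type=int, nargs="*", default=[], help="initial tokens")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # hypothetical: load the model by name, then call
    # generate(sequence_length=args.sequence_length, temperature=args.temperature, prompt=args.prompt)
    print(args)
```

In the script itself, `main()` would be called from an `if __name__ == "__main__":` guard, e.g. `python generate.py drumloop2 --sequence-length 64`.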

OSC communication

A basic script could be implemented using python-osc to communicate with external applications, e.g. Max/MSP.

trigger model (re)load via OSC

Perhaps a function mapped to /m messages could deal with that?
It should not be too difficult to implement, since the model name alone is sufficient to locate the necessary directory/files.
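A sketch of an `/m` handler, assuming each model lives in a directory named after it under a common models folder (the layout and the `loader` callable are hypothetical). Unknown names are reported and the current model stays loaded.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds the current model and reloads it by name, e.g. from an /m OSC message."""

    def __init__(self, models_dir: Path, loader: Callable[[Path], Any]):
        self.models_dir = Path(models_dir)
        self.loader = loader  # hypothetical: loads model weights/metadata from a directory
        self.name: Optional[str] = None
        self.model: Any = None

    def handle_m(self, address: str, name: str) -> None:
        model_path = self.models_dir / name
        if not model_path.is_dir():
            print(f"{address}: no model directory {model_path}")
            return  # keep the currently loaded model
        self.model = self.loader(model_path)
        self.name = name
```

With python-osc this could be registered as `dispatcher.map("/m", slot.handle_m)`.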

features

The current extract_features function in train.py computes the following:

zero crossings
energy
spectral centroid
spectral bandwidth
spectral flatness
spectral rolloff
MFCCs 1-13

It is worth testing various combinations of the above, as some might turn out to be redundant or worsen performance.
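Enumerating the combinations is straightforward with `itertools.combinations`. This sketch treats MFCCs 1-13 as a single group (an assumption) and uses illustrative feature names; training one model per subset is the expensive part, so `min_size=6` (leave-one-out) is a cheaper first pass.

```python
from itertools import combinations
from typing import Iterator, Tuple

# Illustrative names for the features listed above; MFCCs 1-13 are treated as one group.
FEATURES = ("zero_crossings", "energy", "spectral_centroid", "spectral_bandwidth",
            "spectral_flatness", "spectral_rolloff", "mfcc")


def feature_subsets(min_size: int = 1) -> Iterator[Tuple[str, ...]]:
    """Yield every combination of feature groups with at least `min_size` members."""
    for size in range(min_size, len(FEATURES) + 1):
        yield from combinations(FEATURES, size)
```

All non-empty subsets amount to 2^7 - 1 = 127 training runs, while leave-one-out plus the full set needs only 8.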

General refactor

Data processing, model training and sequence generation could be implemented as separate classes, since the Jupyter notebook is meant to be the end-user interface for model training.

model trained on more data throws error during prediction

A model trained on a dataset segmented with a small hop_length parameter (resulting in a much higher number of frames) fails during prediction:

```
Prompt:  []
Generating sequence...
Temperature:
 0.0
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 62437)
Traceback (most recent call last):
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/osc_server.py", line 33, in handle
    server.dispatcher.call_handlers_for_packet(self.request[0], self.client_address)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/dispatcher.py", line 193, in call_handlers_for_packet
    handler.invoke(client_address, timed_msg.message)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/dispatcher.py", line 54, in invoke
    self.callback(message.address, self.args, *message)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 126, in handle_g
    seq = generate(sequence_length=sequence_length, temperature=temperature, prompt=prompt)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 97, in generate
    p_label = sample(preds[0], temperature)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 71, in sample
    probas = np.random.multinomial(1, preds, 1)
  File "mtrand.pyx", line 4272, in numpy.random.mtrand.RandomState.multinomial
  File "_common.pyx", line 391, in numpy.random._common.check_array_constraint
  File "_common.pyx", line 377, in numpy.random._common._check_array_cons_bounded_0_1
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
----------------------------------------
```
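The log shows `Temperature: 0.0`. Assuming `sample()` follows the common Keras text-generation pattern (`np.log(preds) / temperature`, then exponentiate and normalize), a temperature of 0 divides by zero and can produce NaNs, which is exactly what `multinomial` rejects. The hedged sketch below treats temperature <= 0 as greedy argmax, works in float64, and fails with a clearer message if the model itself outputs non-finite values. It is a possible guard, not a confirmed diagnosis of this crash.

```python
from typing import Optional, Sequence

import numpy as np


def sample(preds: Sequence[float], temperature: float = 1.0,
           rng: Optional[np.random.Generator] = None) -> int:
    """Sample a class index from a probability vector, guarding against invalid pvals."""
    rng = np.random.default_rng() if rng is None else rng
    preds = np.asarray(preds, dtype=np.float64)  # float64 avoids sums drifting above 1
    if not np.all(np.isfinite(preds)):
        raise ValueError("model output contains NaN or inf")
    if temperature <= 0:
        return int(np.argmax(preds))  # greedy decoding instead of dividing by zero
    logits = np.log(np.clip(preds, 1e-12, None)) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```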

Path cleanup

Pathlib could be used throughout the code to make path handling more comprehensible.
Brief user tests showed that an interface that is as simple as possible seems to be the way to go for setting file directories etc.

autoencoding?

An LSTM autoencoder could be used for predicting the next step in the sequence.
This would render clustering redundant, since given the predicted features, the single closest matching frame could be found using a k-d tree, similarly to audio mosaicking.
https://machinelearningmastery.com/lstm-autoencoders/

Would it solve the general messiness of the current model output?
Worth trying.
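The frame lookup step can be sketched as a nearest-neighbour query: each frame's feature vector is a point, and the model's predicted feature vector is the query. In practice `scipy.spatial.KDTree(points).query(q)` would be the obvious choice; the dependency-free k-d tree below only illustrates the technique and assumes all feature vectors have the same length.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
# A node is (point index, split axis, left subtree, right subtree); None is an empty tree.
Node = Optional[Tuple[int, int, "Node", "Node"]]


def build_kdtree(points: Sequence[Point], indices: Optional[List[int]] = None, depth: int = 0) -> Node:
    """Recursively split on the median along alternating axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))


def nearest(tree: Node, points: Sequence[Point], query: Point) -> int:
    """Return the index of the point closest to `query` (squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")

    def search(node: Node) -> None:
        nonlocal best_index, best_dist
        if node is None:
            return
        index, axis, left, right = node
        dist = sum((a - b) ** 2 for a, b in zip(points[index], query))
        if dist < best_dist:
            best_index, best_dist = index, dist
        diff = query[axis] - points[index][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        search(near)
        if diff * diff < best_dist:  # the far side can only help if the split plane is closer than the best match
            search(far)

    search(tree)
    return best_index
```

Since the frame features would be fixed after analysis, the tree is built once and each predicted step costs roughly logarithmic time in the number of frames.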
