mc-fp's Issues
generated token sequence - alternating buffer
on an M1 MacBook Pro CPU it does not seem possible to generate sequences longer than 32 tokens fast enough to keep up with real time, at least with the trebles model.
alternating token buffers need to be implemented so that while one token sequence is being played through, another can be generated at the same time
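A minimal sketch of the alternating-buffer idea, assuming generation can run in a background thread while playback reads from the other buffer. `AlternatingBuffers` and `generate_fn` are hypothetical names, not part of the existing code; `generate_fn(length)` stands in for whatever call produces a token list.

```python
import threading


class AlternatingBuffers:
    """Two token buffers: one is played while the other is being filled."""

    def __init__(self, generate_fn, length=32):
        self.generate_fn = generate_fn  # hypothetical: returns a list of `length` tokens
        self.length = length
        self.buffers = [generate_fn(length), []]  # buffer 0 is filled up front
        self.active = 0  # index of the buffer currently being played
        self._worker = None

    def _fill(self, index):
        self.buffers[index] = self.generate_fn(self.length)

    def start_playback(self):
        """Return the active buffer and start filling the idle one in the background."""
        idle = 1 - self.active
        self._worker = threading.Thread(target=self._fill, args=(idle,), daemon=True)
        self._worker.start()
        return self.buffers[self.active]

    def swap(self):
        """Wait for the background generation to finish, then switch buffers."""
        if self._worker is not None:
            self._worker.join()
        self.active = 1 - self.active
```

Playback would call `start_playback()`, play the returned tokens, then call `swap()` and repeat. If generation turns out to be bound by Python code rather than by the model call, a separate process may be needed instead of a thread.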
online.py - generation offset
The drumloop2 model generates rotations of the sequence which tend to move by a token or two with each generation, even with temperature <=1.
Offline generation works properly, as it results in an ABCB pattern which corresponds exactly to the drum pattern in the data.
It might be worth testing offline generation with longer sequences to determine if the pattern shifts in that case as well.
refined clustering
feature extraction and clustering workflow could be improved by considering the following implementation:
https://github.com/TylerMclaughlin/wav_clustering_workflow
labelled_frames dict refactor
`labelled_frames` dict object saves frames sample by sample, which is redundant
it should be implemented to save sample index values for frame start and end only
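A rough sketch of the index-only layout, assuming frames are cut at a fixed `frame_length` and `hop_length` and that `labels` holds one cluster label per frame. The function name and the string keys are assumptions made for illustration.

```python
def frame_bounds(labels, frame_length, hop_length):
    """Map each class label to a list of (start, end) sample indices."""
    labelled_frames = {}
    for i, label in enumerate(labels):
        start = i * hop_length
        # Store only the boundaries; the samples stay in the source audio.
        labelled_frames.setdefault(str(label), []).append((start, start + frame_length))
    return labelled_frames
```

The audio for a frame can then be recovered by slicing the original signal with the stored pair, so the dictionary size no longer grows with the frame length.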
generate function parameters control
`generate` function parameters in `generate.py` could be easily passed via OSC as a part of the message triggering generation
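One way this could look, as a sketch only: a handler-style function that reads the parameters from the message arguments. The keyword names `sequence_length`, `temperature` and `prompt` match the `generate` call visible in the traceback further down; the argument order, the `/g` address and the default values are assumptions.

```python
# Hypothetical defaults, used when the message carries fewer arguments.
DEFAULTS = {"sequence_length": 32, "temperature": 1.0}


def parse_generate_message(address, *args):
    """Turn e.g. ('/g', 64, 0.8, 3, 7) into keyword arguments for generate()."""
    params = dict(DEFAULTS)
    if len(args) > 0:
        params["sequence_length"] = int(args[0])
    if len(args) > 1:
        params["temperature"] = float(args[1])
    # Any remaining arguments are treated as the prompt tokens.
    params["prompt"] = [int(token) for token in args[2:]]
    return params
```

The handler would then call `generate(**parse_generate_message(address, *args))`, so a bare trigger message keeps working with the defaults.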
Byte-Pair Encoding
It could help a lot to detect commonly occurring sub-sequences and replace them with a single token.
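A small sketch of byte-pair encoding applied to an integer token sequence, assuming new token ids can be allocated from `first_new_token` upwards. All function names here are made up for illustration.

```python
from collections import Counter


def most_common_pair(tokens):
    """Return ((a, b), count) for the most frequent adjacent pair, or None."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0] if pairs else None


def merge_pair(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe(tokens, num_merges, first_new_token):
    """Apply up to `num_merges` merges; return the new sequence and the merge table."""
    merges = {}
    for step in range(num_merges):
        best = most_common_pair(tokens)
        if best is None or best[1] < 2:
            break  # nothing repeats any more
        new_token = first_new_token + step
        merges[new_token] = best[0]
        tokens = merge_pair(tokens, best[0], new_token)
    return tokens, merges
```

The merge table is needed at playback time to expand a merged token back into the frames it stands for.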
first detected beat backtracking
maybe beat tracking generally should be handed over to `streamer.py` instead of being included in the training script?
Max always picks the first frame in array
Max is now responsible for looking up the frames according to the `frames.json` dictionary
at the moment it always picks up the first frame in the value array of a given key
this should be picked at random instead
i.e. by selecting a random odd value within the range of length of array-1, as is done in `Streamer.get_frame`
frame picking
could do with a few modes which would change behaviour of picking the frame of a given class from the dictionary when a grain is being triggered:
- locked n-th frame only
- random (as it is now)
- sequential (from first to last frame of given class, could use a `table` object to save state per-token)
- sequential (from last to first)
- mixture between the above, ie. using a markov chain or other means of controlled randomization
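The first four modes could be sketched in Python as below, to pin down the intended behaviour before building it in Max. `FramePicker` and the mode names are hypothetical, and `frames` is assumed to map each token to a list of candidate frames, as in `frames.json`. The Markov-style mixture is left out.

```python
import random


class FramePicker:
    """Pick a frame for a token according to the selected mode."""

    MODES = ("locked", "random", "forward", "backward")

    def __init__(self, frames, mode="random", locked_index=0, rng=None):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.frames = frames  # {token: [frame, frame, ...]}
        self.mode = mode
        self.locked_index = locked_index
        self.rng = rng or random.Random()
        self.position = {}  # per-token state for the sequential modes

    def pick(self, token):
        candidates = self.frames[token]
        if self.mode == "locked":
            return candidates[self.locked_index % len(candidates)]
        if self.mode == "random":
            return self.rng.choice(candidates)
        # Sequential modes: walk through the candidates and wrap around.
        step = 1 if self.mode == "forward" else -1
        start = 0 if step == 1 else len(candidates) - 1
        index = self.position.get(token, start)
        self.position[token] = (index + step) % len(candidates)
        return candidates[index]
```

The `position` dictionary plays the role the `table` object would have in Max: one stored index per token.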
poor benchmark output
sweep, drumloop models trained on very simple data generate output that makes no sense in comparison to the input audio
validation
presumably there is no validation method viable for any sort of audio data and the training metrics are very difficult to trust in cases of audio input with high variance.
it might be a good idea to implement k-fold validation, ie. as described here
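A minimal sketch of how the k-fold splits could be produced. It uses contiguous, unshuffled folds, which is an assumption made here because the data is sequential; `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

Each fold would train a fresh model on `train` and evaluate on `val`, and the spread of the k validation scores gives a rough idea of how far a single training metric can be trusted.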
Use librosa.stream for loading and processing of input
File loading currently uses audio2numpy to load audio cut into fixed lengths during preprocessing.
This is hard to reason about and leads to frequent errors in the code.
Implementing the use of `librosa.stream` for loading and analysis also seems more straightforward than using `tf.keras.utils.audio_dataset_from_directory`, as using `tf.data.Dataset` objects still often evades my intuition.
tempo/beat detection
would be super nice for slicing
using previously generated sequence as generation prompt
generation function refactor
necessary, as it generates sequences starting with 128 zeros when maxlen is set to 256
timestretching
would be nice, but most likely would require refactoring synthesis to use mc.groove
polyphony
Max resynthesis should implement multiple voices, likely using mc
OSC queue crashes
with higher density, lots of `generate` messages are sent and clog up udp
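One possible mitigation, sketched under the assumption that requests arriving while a generation is still running can simply be dropped: a non-blocking lock around the generation call. `GenerateGate` is a hypothetical wrapper, and whether dropping requests is musically acceptable is an open question.

```python
import threading


class GenerateGate:
    """Let one generation run at a time and drop requests that arrive meanwhile."""

    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # hypothetical: the function doing the generation
        self._busy = threading.Lock()
        self.dropped = 0  # number of requests ignored so far

    def request(self, *args, **kwargs):
        # Non-blocking acquire: if a generation is already running, give up at once.
        if not self._busy.acquire(blocking=False):
            self.dropped += 1
            return None
        try:
            return self.generate_fn(*args, **kwargs)
        finally:
            self._busy.release()
```

The OSC handler would call `gate.request(...)` instead of calling the generation function directly, and skip sending a reply when it returns `None`.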
Fixed-size segmentation results in multiple errors when analysing data
Segmentation is done using `librosa.util.frame`, which works, but the analysis code that follows still needs fixing to be compatible.
Stand-alone generation script
Generation could be implemented as a standalone Python script, allowing for further extensions, e.g. real-time sequence generation or communication with other environments to be used for resynthesis
OSC communication
A basic script could be implemented using python-osc to communicate with external applications, e.g. Max/MSP
trigger model (re)load via OSC
perhaps a function mapped to `/m` messages could deal with that?
should not be too difficult to implement, since the model name alone is sufficient to locate the necessary directory/files
features
current `extract_features` function in `train.py` computes the following:
- zero crossings
- energy
- spectral centroid
- spectral bandwidth
- spectral flatness
- spectral rolloff
- MFCCs 1-13
it is worth testing various combinations of the above as some might turn out to be redundant or worsen the performance
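The combinations to test could be enumerated as in the sketch below. The feature names are labels invented here to stand for the list above, and the 13 MFCCs are treated as a single group, which is an assumption.

```python
from itertools import combinations

# Hypothetical labels for the feature groups listed above.
FEATURES = [
    "zero_crossings",
    "energy",
    "spectral_centroid",
    "spectral_bandwidth",
    "spectral_flatness",
    "spectral_rolloff",
    "mfcc",
]


def feature_subsets(features, min_size=1):
    """Yield every subset of `features` with at least `min_size` elements."""
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            yield subset
```

With seven groups this gives 127 non-empty subsets, so an exhaustive sweep is feasible only if clustering and training on each subset is cheap; otherwise raising `min_size` or testing leave-one-out subsets is a smaller starting point.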
General refactor
Data processing, model training and sequence generation could be implemented as separate classes, as the Jupyter notebook is meant to be the end user interface for model training.
model trained on more data throws error during prediction
A model trained on a dataset segmented with a small `hop_length` parameter (resulting in a much higher number of frames) does not work for prediction:
```
Prompt: []
Generating sequence...
Temperature:
0.0
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 62437)
Traceback (most recent call last):
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/osc_server.py", line 33, in handle
    server.dispatcher.call_handlers_for_packet(self.request[0], self.client_address)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/dispatcher.py", line 193, in call_handlers_for_packet
    handler.invoke(client_address, timed_msg.message)
  File "/Users/wwerkowicz/miniconda/envs/cpu/lib/python3.10/site-packages/pythonosc/dispatcher.py", line 54, in invoke
    self.callback(message.address, self.args, *message)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 126, in handle_g
    seq = generate(sequence_length=sequence_length, temperature=temperature, prompt=prompt)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 97, in generate
    p_label = sample(preds[0], temperature)
  File "/Users/wwerkowicz/GS/MC/MC-FP/MC-FP-master/generate.py", line 71, in sample
    probas = np.random.multinomial(1, preds, 1)
  File "mtrand.pyx", line 4272, in numpy.random.mtrand.RandomState.multinomial
  File "_common.pyx", line 391, in numpy.random._common.check_array_constraint
  File "_common.pyx", line 377, in numpy.random._common._check_array_cons_bounded_0_1
ValueError: pvals < 0, pvals > 1 or pvals contains NaNs
----------------------------------------
```
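The log shows a temperature of 0.0. If `sample` rescales log-probabilities by dividing by the temperature, as many temperature-sampling recipes do, that alone would produce NaNs regardless of the dataset size. This is a guess, since the body of `sample` is not shown here, and the model output itself may also contain NaNs. The sketch below is a pure-Python stand-in (not the NumPy version from `generate.py`) that guards against both cases.

```python
import math
import random


def sample(preds, temperature=1.0, rng=random):
    """Sample an index from `preds`; fall back to argmax when temperature <= 0."""
    if any(math.isnan(p) for p in preds):
        # NaNs at this point come from the model, not from the sampling step.
        raise ValueError("model output contains NaNs; check training before sampling")
    if temperature <= 0:
        # Greedy choice instead of dividing by zero.
        return max(range(len(preds)), key=lambda i: preds[i])
    # Rescale log-probabilities; clamp to avoid log(0).
    logits = [math.log(max(p, 1e-12)) / temperature for p in preds]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(preds)), weights=weights, k=1)[0]
```

If the `ValueError` from the NaN check fires, the problem lies in training or prediction rather than in sampling, which would separate the two possible causes.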
Path cleanup
`pathlib` could be used throughout the code to make path handling more consistent.
Brief user tests suggested that an interface as simple as possible is the way to go for setting file directories etc.
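As an illustration only, a `pathlib` helper could derive every path from the model name. The `models` root folder and the location of `frames.json` inside the model folder are assumptions about the layout, and `model_paths` is a hypothetical name.

```python
from pathlib import Path


def model_paths(model_name, root="models"):
    """Build the paths belonging to one model from its name alone."""
    model_dir = Path(root) / model_name
    return {
        "dir": model_dir,
        "frames": model_dir / "frames.json",  # assumed location of the frame dictionary
    }
```

With a helper like this the user only has to supply a model name, which fits the finding that the interface for setting directories should stay as simple as possible.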
autoencoding?
an LSTM autoencoder could be used for predicting the next step in the sequence.
this would render clustering redundant, as given predicted features, a single closest matching frame could be found using a k-d tree, similarly to audio mosaicking.
https://machinelearningmastery.com/lstm-autoencoders/
would it solve the general messiness of the model output experienced currently?
worth trying