
Everybody Compose: Deep Beats To Music

Authors: Conghao (Tom) Shen, Violet Yao, Yixin Liu

Abstract

This project presents a deep learning approach to generate monophonic melodies based on input beats, allowing even amateurs to create their own music compositions. Three effective methods - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - are proposed for this novel task, providing great variation, harmony, and structure in the generated music. This project allows anyone to compose their own music by tapping their keyboards or "recoloring" beat sequences from existing works.

Getting Started

To get started, clone this repository and install the required packages:

git clone https://github.com/tsunrise/everybody-compose.git
cd everybody-compose
pip install -r requirements.txt

You may encounter dependency issues with protobuf during training. If so, try reinstalling tensorboard by running:

pip install --upgrade tensorboard

This issue is due to conflicting requirements between note_seq and tensorboard.

We have also provided a Colab Notebook for your reference.

Training

The preprocessed dataset will automatically be downloaded before training. To train a model, run the train.py script with the -m or --model_name argument followed by a string specifying the name of the model to use. The available model names are:

  • lstm_attn: LSTM with Local Attention
  • vanilla_rnn: Decoder Only Vanilla RNN
  • attention_rnn: LSTM with Full Attention
  • transformer: Transformer RPR

You can also use the -nf or --n_files argument followed by an integer to specify the number of files to use for training (the default value of -1 means that all available files will be used).

To specify the number of epochs to train the model for, use the -n or --n_epochs argument followed by an integer. The default value is 100.

To specify the device to use for training, use the -d or --device argument followed by a string. The default value is cuda if a CUDA-enabled GPU is available, or cpu if not.

To specify the frequency at which to save snapshots of the trained model, use the -s or --snapshots_freq argument followed by an integer specifying the number of epochs between saved snapshots. The default value is 200. Snapshots are saved in the .project_data/snapshots directory.

To specify a checkpoint to load the model from, use the -c or --checkpoint argument followed by a string specifying the path to the checkpoint file. The default value is None, which means that no checkpoint will be loaded.

Here are some examples of how to use these arguments:

# Train the LSTM with Local Attention model using all available files, for 100 epochs, on the default device, saving snapshots every 200 epochs, and not using a checkpoint
python train.py -m lstm_attn

# Train the LSTM with Local Attention model using 10 files, for 1000 epochs, on the CPU, saving snapshots every 100 epochs, and starting from the checkpoint
python train.py -m lstm_attn -nf 10 -n 1000 -d cpu -s 100 -c ./.project_data/snapshots/my_checkpoint.pth

# Train the Transformer RPR model using all available files, for 500 epochs, on the default device, saving snapshots every 50 epochs, and not using a checkpoint
python train.py -m transformer -n 500 -s 50

Generating Melodies from Beats

To generate a predicted notes sequence and save it as a MIDI file, run the predict_stream.py script with the -m or --model_name argument followed by a string specifying the name of the model to use. The available model names are:

  • lstm_attn: LSTM with Local Attention
  • vanilla_rnn: Decoder Only Vanilla RNN
  • attention_rnn: LSTM with Full Attention
  • transformer: Transformer RPR

Use the -c or --checkpoint_path argument followed by a string specifying the path to the checkpoint file to use for the model.

The generated MIDI file will be saved using the filename specified by the -o or --midi_filename argument (the default value is output.mid).

To specify the device to use for generating the predicted sequence, use the -d or --device argument followed by a string. The default value is cuda if a CUDA-enabled GPU is available, or cpu if not.

To specify the source of the input beats, use the -s or --source argument followed by a string. The default value is interactive, which means that the user will be prompted to input the beats using the keyboard. Other possible values are:

  • A file path, e.g. beat_sequence.npy, to load the recorded beats from a file. Recorded beats can be generated using the create_beats.py script.
  • dataset to use a random sample from the dataset as the beats.

To specify the sampling profile to use for generating the predicted sequence, use the -t or --profile argument followed by a string. The available values are beta, which uses stochastic search, and beam, which uses hybrid beam search; the heuristic parameters for these profiles can be customized in the [sampling.beta] and [sampling.beam] sections of the config.toml file. The default value is default, which uses the settings specified in config.toml.

Here are some examples of how to use these arguments:

# Generate a predicted sequence using the LSTM with Local Attention model, from beats by the user using the keyboard, using the checkpoint at ./.project_data/snapshots/my_checkpoint.pth, on the default device, and using the beta profile with default settings
python predict_stream.py -m lstm_attn -c ./.project_data/snapshots/my_checkpoint.pth -t beta
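
The source and profile flags combine in the same way. For example, the following command (the checkpoint path and file names are placeholders) generates a melody from a beat sequence recorded with create_beats.py, using hybrid beam search:

# Generate a predicted sequence using the Transformer RPR model, from beats recorded to beat_sequence.npy,
# using the beam profile, and saving the result to recolored.mid
python predict_stream.py -m transformer -c ./.project_data/snapshots/my_checkpoint.pth -s beat_sequence.npy -t beam -o recolored.mid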

everybody-compose's People

Contributors

tsunrise, violetyao, YixinLiu030


everybody-compose's Issues

Add train, dev split

Discuss potential need to measure bias and variance and implement train dev split

blocked by #8

Add encoder/decoder LSTM

The input to our current model is batch_size * seq_len * 2 (current duration, duration from the last note). If we want to do greedy inference, we need a decoder that takes in the prediction of our model, which lives in the space of all possible notes.
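
A rough sketch of the kind of greedy decoding loop this would enable (the decoder interface and names below are assumptions for illustration, not the repo's actual API):

import torch

@torch.no_grad()
def greedy_decode(decoder, beats, num_notes=128):
    # beats: (batch_size, seq_len, 2) -- (current duration, duration from the last note)
    batch_size, seq_len, _ = beats.shape
    prev_note = torch.zeros(batch_size, dtype=torch.long)  # start token / silence
    hidden = None
    notes = []
    for t in range(seq_len):
        # the decoder consumes the previously predicted note plus the current beat features
        logits, hidden = decoder(prev_note, beats[:, t, :], hidden)  # logits: (batch_size, num_notes)
        prev_note = logits.argmax(dim=-1)  # greedy choice over the space of all possible notes
        notes.append(prev_note)
    return torch.stack(notes, dim=1)  # (batch_size, seq_len)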

Add Beats Generator

Implement a program to allow users to input their own beats. Ideally this program should be hosted on the web, but for simplicity, a Python script that handles the I/O is also satisfactory.
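
A minimal sketch of the script option, assuming beats are recorded as ENTER presses and saved as inter-onset intervals in a .npy file (the file name and format are assumptions):

import time
import numpy as np

def record_beats(n_beats=32, out_path="beat_sequence.npy"):
    # record the wall-clock time of each ENTER press
    print(f"Tap ENTER {n_beats} times to record your beats...")
    timestamps = []
    for _ in range(n_beats):
        input()
        timestamps.append(time.time())
    # store the time between consecutive taps
    intervals = np.diff(np.array(timestamps))
    np.save(out_path, intervals)
    print(f"Saved {len(intervals)} intervals to {out_path}")

if __name__ == "__main__":
    record_beats()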

AWS Deployment

  • Write an automated shell script for deployment
  • Deploy the training pipeline to AWS

Add speed normalization

Each song has a different tempo, which causes large variation in the X input. We may want to normalize this so that each time_from_prev has zero mean (and, optionally, unit standard deviation; it's unclear whether forcing unit std has any bad effect). We can retain the tempo information by also feeding the mean to the network (e.g. into the initial state of the LSTM).
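
A minimal sketch of this normalization, assuming time_from_prev is a per-song NumPy array (the names are illustrative, not the repo's actual preprocessing code):

import numpy as np

def normalize_tempo(time_from_prev, unit_std=False):
    # time_from_prev: (seq_len,) inter-note durations for one song
    mean = time_from_prev.mean()
    normalized = time_from_prev - mean  # zero mean removes the per-song tempo offset
    if unit_std:
        # optional: also force unit standard deviation
        normalized = normalized / (time_from_prev.std() + 1e-8)
    # return the mean so it can also be fed to the network (e.g. into the LSTM's initial state)
    return normalized, mean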

Explore more losses

Explore more losses for non-GAN models:

  • Add a smoothing term to reduce the difference between neighboring notes (a rough sketch follows below)
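
A rough sketch of what such a loss could look like, assuming the model outputs per-step logits over notes (the weighting and the expected-pitch proxy are assumptions, not a decided design):

import torch
import torch.nn.functional as F

def loss_with_smoothing(logits, target_notes, smooth_weight=0.1):
    # logits: (batch, seq_len, num_notes), target_notes: (batch, seq_len) integer note indices
    ce = F.cross_entropy(logits.transpose(1, 2), target_notes)
    # differentiable proxy for the predicted pitch: expectation under the softmax distribution
    pitches = torch.arange(logits.size(-1), device=logits.device, dtype=logits.dtype)
    expected_pitch = (logits.softmax(dim=-1) * pitches).sum(dim=-1)  # (batch, seq_len)
    # smoothing term: penalize large jumps between neighboring notes
    smoothness = (expected_pitch[:, 1:] - expected_pitch[:, :-1]).abs().mean()
    return ce + smooth_weight * smoothness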

Possibly Incorrect return value of `generate_sequences`

https://github.com/tsunrise/cs230-proj/blob/7dd5fb490c172b69aca9ebb85b1f730e3ce3d806/preprocess/dataset.py#L68-L86

For this project, y_notes is not "the next expected note that the network should predict given the beat sequence". It's a sequence of notes of length seq_length. Therefore, the shape of y_notes is not (num_examples, 1); it should be something like (num_examples, seq_length, num_notes), where num_notes is the number of all possible notes we can expect (for piano it's 88).

In other words, suppose the j-th note of the i-th example is note 8; then y_notes[i][j] = [0,0,0,0,0,0,0,1,0,0,0,0,...,0].
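
A small sketch of the conversion this implies, assuming y_notes currently holds integer note indices of shape (num_examples, seq_length):

import numpy as np

def to_one_hot(y_notes, num_notes=88):
    # y_notes: (num_examples, seq_length) integer note indices
    num_examples, seq_length = y_notes.shape
    one_hot = np.zeros((num_examples, seq_length, num_notes), dtype=np.float32)
    rows = np.arange(num_examples)[:, None]
    cols = np.arange(seq_length)[None, :]
    one_hot[rows, cols, y_notes] = 1.0  # set a 1 at each example's note index
    return one_hot  # (num_examples, seq_length, num_notes)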

@YixinLiu030

Add tensorboard logger

Add tensorboard logger for easier metrics tracking

Sample code:

from torch.utils.tensorboard import SummaryWriter

# create a writer and log the average validation loss once per epoch
writer = SummaryWriter(log_dir=log_dir)
writer.add_scalar('validation loss', avg_val_loss, epoch)

Explore better model output

Explore a better model output than a single note (currently the highest note in a chord).

One idea: a one-hot encoding for the number of notes in the chord plus a multi-hot encoding of the notes themselves, as sketched below.
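
A rough sketch of one way to build such a target, assuming 88 piano keys, a cap on chord size, and at least one note per step (all assumptions for illustration):

import numpy as np

def chord_target(chord_notes, num_notes=88, max_chord_size=6):
    # chord_notes: list of note indices sounding at one time step (assumed non-empty)
    multi_hot = np.zeros(num_notes, dtype=np.float32)
    multi_hot[chord_notes] = 1.0  # multi-hot over which notes are in the chord
    count = np.zeros(max_chord_size, dtype=np.float32)
    count[min(len(chord_notes), max_chord_size) - 1] = 1.0  # one-hot over how many notes are in the chord
    return np.concatenate([count, multi_hot])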
