jimmychen623 / genre_classification
ORIE4741 Project : Classifying genre of songs
The team aims to train a classifier to predict the genre of a song given the musical data. The data they used include a 300GB unlabeled dataset as well as the labeled Allmusic Top Genre Dataset.
Three good things:
Three things to improve:
You really did a lot of work. Your interpretation and future steps are quite intuitive. Good job!
Project Summary
This study aims at developing a model to automatically classify songs into different genres.
Positive aspects of this project:
Concerns and suggestions on improvements:
The project seeks to classify songs into genres and subgenres based upon metadata about the songs.
The classifier will be trained and tested using data from the Million Song Dataset.
Strengths
1. The dataset is very expansive and has quite a lot of fields that can be studied about each song.
2. The proposed approach is explained in detail and broken down into understandable, chronological steps.
3. The approach section contains information about how the project could be expanded upon (if need be).
Areas to Improve
1. The dataset contains the artist and the tags associated with the artist on two different music sites (musicbrainz.org and playme.com). If these tags contain genres, would it be possible for your model to "cheat" and use these genres directly?
2. It seems somewhat unlikely that someone would have so much metadata about a song and not have genre information. Maybe the model should only use the metadata related to the sound file itself?
3. If the application of this is to music streaming sites as stated in the motives section, couldn't the artist who created or uploaded the music provide genre information when the music is uploaded?
This is an excellent project! In the report, you satisfied every condition implied by the class name, 'big messy data'. The tools you use are very comprehensive, including both supervised and unsupervised learning. Although some of the methods turned out not to be very useful, you list them all along with their performance.
No additional words. Very impressive!
Project Summary
The Genre Classification project is attempting to predict the genre of a song from its metadata. The dataset the team is using is a million song set from Columbia University. The motive of the group is to improve the current recommendation system for what song should play next on a streaming service.
Positive Attributes of the Project
Areas for improvement
The project is to predict the genre of a song from properties and metadata of the song. The dataset being used is a set of properties for 1 million songs combined with genre tags for many of them. The results of this project could be useful in different ways to electronic musicians and online music platforms. Overall, I like this project and I think some really interesting results are possible.
I'm glad you've gone through the trouble of quantitatively analyzing your features to find out which ones might be differentiating between genres. Nice job.
I have a few comments on feature engineering. I like how you broke the key into 11 binary features when you realized it was more nominal than ordinal. However, it seems like the primary model types you have tested so far are decision trees and forests, and for those you can simply have an 11-way branch rather than splitting into many binary features. Splitting into binary features makes more sense to me if you are doing something like linear regression or neural networks that actually rely on the numbers. If you do end up going that route, I think the binary output for each genre is better than the single_genre feature, for the same reason that you split up the key into individual features rather than one single feature. Your output can then be interpreted as a one-hot encoding of the single best genre.
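The encoding discussed above can be sketched roughly as follows. This is only an illustration, not the project's code: it assumes the key is stored as an integer from 0 to 11, and the names one_hot_key and decode_one_hot are hypothetical. The number of columns is a parameter, since the report's column count may differ.

```python
# Illustrative sketch only; names and the 0..11 key coding are assumptions.
NUM_KEYS = 12  # assumed number of distinct key values


def one_hot_key(key, num_keys=NUM_KEYS):
    """Turn a nominal key value into a list of binary indicator features."""
    if not 0 <= key < num_keys:
        raise ValueError("key must be in [0, %d), got %r" % (num_keys, key))
    return [1 if i == key else 0 for i in range(num_keys)]


def decode_one_hot(scores):
    """Read per-class scores as a one-hot style output: pick the best index."""
    return max(range(len(scores)), key=scores.__getitem__)
```

The same decoding idea applies to per-genre binary outputs: the genre with the highest score is taken as the single predicted genre.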
One comment about your future directions is your mention of RNNs. I see why you might want to use them, but I think I would use them as a last resort. They are quite computationally intensive to train, so you could make a much more complex decision tree or linear regression model for the same computational expense as training a simple RNN.
One other thing I would keep in mind going forward is who might want to use your final model and which features they have easy access to. I don't have much music knowledge, and it seems like all of your features can be automatically extracted from songs given your dataset, but note that whoever wants to use your model will need to be able to extract all of your input features from the song that they want to classify.
If you have any comments or questions about what I've said, feel free to comment on this Github issue.
The objective of the project is to train a classifier for predicting the genre of a song given musical data about the song. They also hope to understand the subtle differences that underlie different genres of music.
I like the fact that the team has used almost all of the classification techniques from class and written about the results. I also like that the team has a very detailed description of the data in the final report. Last but not least, the graphs in the final report look great and help readers understand the results.
I think the team has a good conclusion paragraph that shows even though the result is not optimal, there is value in the project's result. I think the project can be used as a basis for more complex models.
The overall goal is to train a classifier to predict the genre of a song based on the given metadata about the song. If done well, this could lead to better music selection for listeners. I think it's a great topic. It is very modern and something that companies such as Pandora and Spotify would be interested in. I liked that you provided a lot of detail on how you cleaned and modified your dataset. I can really understand the big picture of this project. This group also seems to know where they want to go. There seem to be a lot of different models that you want to keep playing with, which is great.
I think you can improve on visuals. It is not a good idea to cram all of your visuals into half a page on one side of the page. They are too small, so try to use at least half a page for two graphs. You should also try to expand more on why some models will be better than others. I agree with the other peer reviews that you may want to expand on exactly how you tested the models, especially because Udell emphasized this greatly in the class.
Overall Great work, excited to see the end results.
The objective of the project is to classify songs into genres based upon metadata about the songs. I think it is great, as it could help streaming service companies provide personalized suggestions for each user.
I really like how you chose your features. There is insightful thought given to how you selected and transformed your variables, especially for the key feature, which you handled just right. I also really appreciate your data visualizations; they give a more concrete view of how your data are distributed and their specific characteristics. It is a good thing that you also implemented random forest for your model. Even if your results were not what we might have expected, it was a good idea to try it!
I have some constructive feedback. First of all, be careful with your plots: do not forget to include titles and labels so that they are clear to your readers. Then, even though I really like how you handled the feature transformation of the key variable, it was not that important, as you did not use linear regression or any model other than tree-based models. Still, it can be very useful if you decide to do so in the future. When you talk about accuracy, you should provide more information about the train and test sets that produced the numbers you put in the report.
It seems like you have many ideas for the future, but be careful, as you are planning on implementing methods that we have not seen in class. I hope you will not struggle with them. However, I think it is a bold choice and I appreciate it.
Overall I think that it is a really interesting project, I can’t wait to see your final results!
The purpose of this project is to determine the genre of a specific song based on some features of the song's information. These features are the ones the model determines to be most useful for classifying the song.
What I like about this project:
Improvements and suggestions:
This project trains a classifier to predict the genre of a song based on information about the song. With such a classifier, we can better understand the subtle differences among different types of music, and streaming services can have a better guide for automatically tagging music uploaded by users.
This is a very interesting topic, especially to me, as I am very interested in music and have a little background in this area. If this model is finally built, its applications will be very broad, and online music platforms will be willing to use it. I like the progress you have made so far. Choosing and transforming features and then testing different models lets you discover which one fits the training data best. It also shows that you understand the data set very well. You have brought up many new ideas for future approaches, which clearly shows that you know how to proceed. I am looking forward to seeing your progress in the next step.
I also have some suggestions about the report. The data visualization can be improved. I can barely see the graphs because they are very small and so many genres are stuck together. In the final report, I hope they can be displayed well so that everyone can understand them directly from what they see. The data set you use is drawn from a huge pool of data, which makes it hard to represent everything. I am not sure whether the data set you use for training is biased or not. I am very curious about the idea of using NLP because it is far beyond what I could come up with. It is possible to use it, but I am not sure you have time to do so under the current situation. Considering the lyrics will make the whole thing much more complicated. It might even make the model less accurate. But it is still worth trying if you have additional time.
This report focused on classifying song genres given musical data about the song. The project used a 300GB dataset from the Million Song Dataset. The project has the clear benefit of improving music recommendation engines such as those of Spotify and Pandora. First, I liked that this group put a lot of effort into analyzing the data. The group found that there was a large class imbalance, which would have skewed the predictive power of their final classifier, and remedied this by taking 4000 songs from each genre. The report was also very clear with the feature descriptions, which helped me get a much better sense of the data they were working with. Second, the group experimented with a large number of different models, both covered and not covered in class.
One thing this group could improve is that, while they covered an impressive number of models, it would have been nice to see more tuning and exploration within each model. For example, for Random Forest, the report only mentioned varying the number of trees to find the optimal model. They could also have experimented with the maximum depth of each tree and the number of features to consider when looking for the best split. Also, their choice to use a CNN was interesting, as the CNN architecture is optimized to classify image data. This may be why its performance was not as high as that of the other models tested. Overall, this was a well-written and interesting report with practical ways to apply the model in the future once the classification accuracy is increased.
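The balancing step described above (taking a fixed number of songs from each genre) might look roughly like the sketch below. It is an assumption-laden illustration, not the group's code: songs are assumed to be dictionaries with a "genre" field, and balanced_sample is a hypothetical name. The report's cap of 4000 would be passed as per_genre.

```python
import random
from collections import defaultdict


def balanced_sample(songs, per_genre, seed=0):
    """Draw up to per_genre songs at random from each genre.

    Assumes each song is a dict with a "genre" key (hypothetical layout).
    Genres with fewer than per_genre songs contribute all of their songs.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_genre = defaultdict(list)
    for song in songs:
        by_genre[song["genre"]].append(song)

    sample = []
    for genre in sorted(by_genre):  # sorted for a deterministic order
        group = by_genre[genre]
        sample.extend(rng.sample(group, min(per_genre, len(group))))
    return sample
```

Note that genres with fewer songs than the cap stay under-represented in this sketch; whether that matters depends on how many songs each genre has in the labeled data.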
This report delved into a very interesting and neat project that focused on developing a classifier able to label a song as one of 13 different genres based on many different features, including tempo, duration, loudness, etc. As the group mentioned, I believe that this project could be very useful to online music platforms such as Spotify, Pandora, and possibly YouTube, in helping recommend certain types of music to people who listen to a certain genre a lot. The data set that the group uses is very large and comes from the Million Song Dataset Challenge; the genre labels come from two other data sets that correspond to the MSD dataset. I believe that the group made a wise choice in using a random, uniformly distributed smaller part of the dataset rather than the entire dataset, in order to make the project more scalable while possibly still being able to create a good predictive model.
There were many positives with regard to the process the group used in developing the project so far. One specific area that I thought was really well done was the feature engineering on the data set. I specifically thought the handling of the genre labels, and of the categorical features overall, was very interesting and handled appropriately by considering two different valid approaches. This was a similar problem to one my group faced with movies, and I think we considered a similar approach. The only concern I have is that, by selecting the first genre listed, you may only be selecting the genre that comes first alphabetically, or first in some basic list order. Are the genres sorted in any way, and is there a way you could choose the genre based on information about how much a certain song is 'pop' or 'rock' compared to the other genres listed? I am not sure if this will make a large difference in the final result, but it might be an interesting experiment. You may want to try cross-validation to see which model fits better when a certain song is one genre compared to another genre. Another positive aspect of your project was the description of deleting the extraneous features and keeping the relevant ones. I think it was very well explained and illustrated the reasoning behind your choices clearly. Finally, I think that exploring many different models was a significant advantage of the report, as it illustrates how the data may be modeled and predicted with the different algorithms we have learned in class. The results make the final algorithm decision for the final project much easier and more accurate.
One important aspect that could be improved is more detailed information on how the two datasets of genre labels were used or combined. Specifically, you mentioned how the first genre dataset was used in the label generation, but I was not quite clear on how the second genre dataset was used. Were the features in each dataset similar, or were they different and you combined the datasets together? Additionally, though testing on many different models is generally very good, I think that more detail on how you tested each model would help in understanding your results. Specifically, I think that describing how you split the data into training and testing data for each model, if possible, would illustrate the reasoning behind the error of each model. As you mentioned, I think going into more detail about k-fold cross-validation could help as well. Finally, as others have mentioned, I believe that the plots you provided do illustrate trends, but could use more aesthetic formatting that helps the reader understand what you are trying to illustrate.
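As a concrete reference point for the train/test description requested above, a k-fold split can be sketched as below. This is a generic illustration under stated assumptions, not the procedure the group used: k_fold_indices is a hypothetical helper, and the choice of k and the random seed are placeholders.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample index appears in exactly one test fold, and the remaining
    k - 1 folds form the training set for that round.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Reporting the mean and spread of the test error across the k rounds, for each model, would make the accuracy numbers easier to compare.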
Overall, I think you did a great job in exploring more models, and I think the results could be very intriguing!
The project focuses on classifying songs into a particular genre given details of the song. I think the project is very interesting and relevant as it can greatly improve on the recommendation of songs on music streaming services like Spotify, which I personally use every day.
Things I like about the project:
Things which can be improved:
Also what if you had tried a PCA model for prediction? Do you think it would have outperformed the other models?
Overall, great work! Was interesting and I had a good time going through it.
The goal of this project is to classify the genre of a song given musical data about the song. The team uses a very big, messy dataset consisting of song data and another dataset consisting of genre labels. The team uses many models and techniques from class to approach this problem.
Strengths:
Improvements:
This project is proposed based on the background that streaming services provide relevant music suggestions to users. The data set includes many features of a song such as tempo, key, artist and genre. The goal is to classify the genre of a song.
I think it is a very interesting proposal.
Besides, I have several concerns about this project.