Music Genre Classification Model

AIM

To develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest, and XGBoost classifiers on the Kaggle GTZAN Dataset for music genre classification.

DATASET LINK

https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/data

MY NOTEBOOK LINK

https://colab.research.google.com/drive/1j8RZccP2ee5XlWEFSkTyJ98lFyNrezHS?usp=sharing

LIBRARIES NEEDED

librosa
matplotlib
pandas
sklearn
seaborn
numpy
scipy
xgboost

DESCRIPTION

What is the requirement of the project?

The objective of this project is to develop a precise and effective music genre classification model using Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest, and XGBoost classifiers on the Kaggle GTZAN Dataset for music genre classification.

Why is it necessary?

Music genre classification has several real-world applications, including music recommendation, content-based music retrieval, and personalized music services. However, the task of music genre classification is challenging due to the subjective nature of music and the complexity of audio signals.

How is it beneficial and used?

For Users: Provides a more personalised music experience.
For Developers: Enables a recommendation system for songs that are of interest to the user.
For Business: Makes it possible to charge a premium for more personalised recommendation services.

How did you start approaching this project? (Initial thoughts and planning)

Initially studied how different sounds are structured.
Learned how to represent a sound signal in 2D on graphs using the librosa library.
Learned about various features of sound, such as:
- Mel-frequency cepstral coefficients (MFCC)
- Chromagram
- Spectral Centroid
- Zero-crossing rate
- BPM - Beats Per Minute

EXPLANATION

DETAILS OF THE DIFFERENT FEATURES

The dataset has four components.

- genres_original
- images_original
- features_3_sec.csv
- features_30_sec.csv

The genres in genres_original

['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] Each genre has 100 WAV files.

The genres in images_original

['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock'] Each genre has 100 PNG files.

There are 60 features in features_3_sec.csv
There are 60 features in features_30_sec.csv

WHAT I HAVE DONE

Step 1: Created visual representations of the data to help understand it.

Step 2: Found strong relationships between the independent features and the dependent feature using correlation.

Step 3: Performed Exploratory Data Analysis on the data.

Step 4: Used different classification techniques such as SVM, Random Forest, KNN, and XGBoost.

Step 5: Compared the models and used the best-performing model to make predictions.

Step 6: Evaluated each model's performance using accuracy.

Step 7: Visualized the best model's performance using the matplotlib and seaborn libraries.
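
The model comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (make_classification stands in for the feature CSV), the hyperparameters are arbitrary, and XGBoost is left out to keep the example dependent on scikit-learn only. xgboost.XGBClassifier follows the same fit/predict interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted audio features (assumption): 10 classes.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=12,
    n_classes=10, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Distance- and kernel-based models get feature scaling; the forest does not need it.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores)
print("best model:", best_name)
```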

PROJECT TRADE-OFFS AND SOLUTIONS

Trade Off 1: How to visualize an audio signal

Solution:
librosa: A Python library for audio and music analysis, used here to load and analyze the audio files.
Plotting Graphs: With the necessary libraries in place, I started by plotting the audio signals.
Spectrogram: A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. Here we convert the frequency axis to a logarithmic one.
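
To show what a spectrogram contains, here is a minimal NumPy sketch of a short-time Fourier transform with the amplitude converted to decibels. It is an illustration only: log_spectrogram is a hypothetical helper, the 1000 Hz test tone and the frame sizes are assumptions, and in practice librosa.stft with librosa.amplitude_to_db does this work. The logarithmic frequency axis is a plotting choice and is not shown here.

```python
import numpy as np

def log_spectrogram(y, n_fft=2048, hop_length=512):
    """Return a magnitude spectrogram in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        y[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequency bins and columns are time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Decibels relative to the loudest bin; the floor avoids log(0).
    return 20.0 * np.log10(np.maximum(magnitude, 1e-10) / magnitude.max())

# Assumed test signal: one second of a 1000 Hz tone.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t)

spec = log_spectrogram(y)
freqs = np.fft.rfftfreq(2048, d=1.0 / sr)
peak_hz = freqs[spec.mean(axis=1).argmax()]
print(spec.shape, peak_hz)
```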

Trade Off 2: Features that help classify the data

Solution:
Feature Engineering: Identified which features present in audio signals are useful for classification.
Spectral Centroid: Indicates where the "centre of mass" for a sound is located and is calculated as the weighted mean of the frequencies present in the sound.
Mel-Frequency Cepstral Coefficients: The Mel-frequency cepstral coefficients (MFCCs) of a signal are a small set of features (usually about 10–20) which concisely describe the overall shape of a spectral envelope. They model the characteristics of the human voice.
Chroma Frequencies: Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
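
Two of the definitions above can be made concrete with a few lines of NumPy. This is a sketch under assumptions: spectral_centroid and pitch_class are hypothetical helpers, the centroid is computed once over the whole signal (librosa computes it per frame), and the chroma mapping assumes equal temperament with A4 = 440 Hz and bin 0 = C.

```python
import numpy as np

def spectral_centroid(y, sr):
    """Magnitude-weighted mean frequency (Hz) of the whole signal."""
    magnitude = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of the 12 chroma bins (0 = C, 9 = A)."""
    midi_note = 69 + 12 * np.log2(freq_hz / a4_hz)
    return int(np.round(midi_note)) % 12

# Assumed test signal: two equally loud tones at 440 Hz and 880 Hz.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t) + np.sin(2 * np.pi * 880.0 * t)

centroid_hz = spectral_centroid(y, sr)
print(centroid_hz)          # lies midway between the two tones
print(pitch_class(440.0))   # A4
print(pitch_class(880.0))   # A5 falls in the same chroma bin as A4
print(pitch_class(261.63))  # C4
```

The last three calls show why chroma is octave-invariant: notes an octave apart land in the same bin.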

Trade Off 3: Performing EDA on the CSV files

Solution:
Tool Selection: Used the correlation matrix on the features_30_sec.csv dataset to identify the most strongly related features.
Visualization Best Practices: Followed best practices such as using appropriate chart types (e.g., box plots for BPM data, PCA plots for correlations), adding labels and titles, and ensuring readability.
Iterative Refinement: Iteratively refined visualizations based on feedback and self-review to enhance clarity and informativeness.
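
The correlation step can be sketched with pandas. This example uses a small synthetic table, so the column names and values are illustrative assumptions rather than the real contents of features_30_sec.csv; with the real file, the DataFrame would come from pd.read_csv and only the numeric columns would be kept.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the feature table (assumption); rolloff is built to
# correlate with the spectral centroid so that one strong pair exists.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "tempo": rng.normal(120, 20, n),
    "spectral_centroid_mean": rng.normal(2000, 500, n),
})
df["rolloff_mean"] = 2.0 * df["spectral_centroid_mean"] + rng.normal(0, 100, n)
df["zero_crossing_rate_mean"] = rng.normal(0.1, 0.03, n)

corr = df.corr()

# Keep only the upper triangle so each pair appears once, then rank by |r|.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=np.abs, ascending=False)
)
print(pairs.head(3))
```

The full matrix can also be drawn as a heatmap with seaborn.heatmap(corr) to spot groups of related features at a glance.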

Trade Off 4: Implementing Machine Learning Models

Solution:
Cross-validation: Used cross-validation techniques to ensure the reliability and accuracy of the analysis results.
Collaboration with Experts: Engaged with music experts and enthusiasts to validate the findings and gain additional perspectives.
Contextual Understanding: Interpreted results within the context of the music, considering factors such as the users' mood, surroundings, and specific events to provide meaningful and actionable insights.
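
As an illustration of the cross-validation point, the sketch below runs stratified 5-fold cross-validation with scikit-learn. It assumes synthetic data and an SVM with arbitrary settings; the number of folds and the model are choices made for this example, not details taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the audio feature table (assumption).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=42,
)

# Scaling sits inside the pipeline so it is refit on each training fold,
# which keeps information from the validation fold out of the scaler.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Stratified folds keep the class proportions equal across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores.round(3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives a sense of how stable the accuracy estimate is.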

SCREENSHOTS

Project workflow

  graph LR
    A[Start] --> B{Error?};
    B -->|Yes| C[Hmm...];
    C --> D[Debug];
    D --> B;
    B ---->|No| E[Yay!];

Visualizations and EDA of different features

MODELS USED AND THEIR ACCURACIES

Model	Accuracy
KNN	0.80581
Random Forest	0.81415
XGBoost	0.90123
SVM	0.75409

MODELS COMPARISON GRAPHS

ACC Plot: accuracy comparison of the models (figure: accplot)
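
A comparison plot like this one can be produced with matplotlib. The sketch below uses the accuracies reported in this document; the output file name, figure size, and labels are assumptions, and it is not the code that generated the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Accuracies as reported in this document.
accuracies = {
    "KNN": 0.80581,
    "Random Forest": 0.81415,
    "XGBoost": 0.90123,
    "SVM": 0.75409,
}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(list(accuracies), list(accuracies.values()))
ax.set_ylim(0.0, 1.0)
ax.set_ylabel("Accuracy")
ax.set_title("Model comparison")

# Write each value above its bar for readability.
for bar, value in zip(bars, accuracies.values()):
    ax.text(bar.get_x() + bar.get_width() / 2, value + 0.01,
            f"{value:.3f}", ha="center")

fig.savefig("accplot.png", dpi=150, bbox_inches="tight")  # hypothetical file name

best_model = max(accuracies, key=accuracies.get)
print("best model:", best_model)
```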

CONCLUSION

The accuracy plots compare the performance of the different models.
The XGBoost classifier gives the most accurate genre predictions, with an accuracy of 0.90123.

WHAT YOU HAVE LEARNED

Insights gained from the data

Discovered librosa, a library that helps visualize audio signals.
Discovered new audio features such as STFT, MFCC, Spectral Centroid, and Spectral Rolloff.
Gained a deeper understanding of the features of different genres of music.

Improvements in understanding machine learning concepts

Enhanced knowledge of data cleaning and preprocessing techniques to handle real-world datasets.
Improved skills in exploratory data analysis (EDA) to extract meaningful insights from raw data.
Learned how to use visualization tools to effectively communicate data-driven findings.

USE CASES OF THIS MODEL

Application 1: User Personalisation

It can be used to provide more personalised music recommendations for users based on their taste in music or the various genres they listen to. This personalised experience can be used to develop 'Premium'-based business models.

Application 2: Compatibility Between Users

Based on users' musical taste and the genres they listen to, we can identify behaviour patterns and suggest similar users they could befriend. This increases social interaction within the app.
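
One possible way to score compatibility, sketched under assumptions: each user is represented by the share of their tracks that the classifier assigns to each genre, and two users are compared with cosine similarity. The helper names, the example users, and the choice of cosine similarity are all hypothetical; the project does not specify how compatibility would be computed.

```python
import numpy as np

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

def genre_profile(predicted_genres):
    """Turn a user's predicted track genres into a normalised count vector."""
    counts = np.array([predicted_genres.count(g) for g in GENRES], dtype=float)
    return counts / counts.sum()

def compatibility(profile_a, profile_b):
    """Cosine similarity between two genre profiles (0 = no overlap, 1 = identical mix)."""
    norm = np.linalg.norm(profile_a) * np.linalg.norm(profile_b)
    return float(profile_a @ profile_b / norm)

# Hypothetical users and the genres predicted for their recent tracks.
alice = genre_profile(["rock", "rock", "metal", "blues"])
bob = genre_profile(["rock", "metal", "metal", "blues"])
carol = genre_profile(["classical", "jazz", "jazz", "classical"])

alice_bob = compatibility(alice, bob)
alice_carol = compatibility(alice, carol)
print(round(alice_bob, 3), round(alice_carol, 3))
```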

FEATURES PLANNED BUT NOT IMPLEMENTED

Feature 1: Real-time Compatibility Tracking
Implementing a real-time tracking system to view compatibility between users.

Feature 2: Predictive Analytics
Using advanced machine learning algorithms to predict the next song a user is likely to listen to.