Music Genre Classification

Srinithi Arunachalam

Coimbatore, Tamil Nadu


This project uses MFCC features and the k-NN algorithm to classify music genres from audio samples. It preprocesses the data, trains the model, and evaluates accuracy, producing a confusion matrix and ROC curves. The goal is accurate genre classification based on audio characteristics.

Project status: Published/In Market

oneAPI, Internet of Things, Artificial Intelligence

Intel Technologies
oneAPI

Overview / Usage

Project Overview:

This project focuses on automating the classification of music genres from audio samples. By employing Mel-Frequency Cepstral Coefficients (MFCC) and the k-Nearest Neighbors (k-NN) algorithm, the system identifies the genre of a given audio track. It tackles the challenge of genre classification, enabling efficient categorization of music based on its acoustic characteristics.

Problems Addressed:

The project addresses the challenge of genre classification in the realm of music analysis. Differentiating between music genres is often subjective and complex, particularly in large datasets. This project aims to automate this process using data-driven techniques, which can significantly enhance the accuracy and speed of genre categorization.

Utility and Application:

The work presented here finds utility in various domains, such as music streaming platforms, recommendation systems, and music analysis research. In production, this approach can be integrated into music platforms to enhance user experience by providing accurate genre tags, enabling personalized recommendations, and facilitating music organization. Researchers can leverage this project to understand and explore the effectiveness of MFCC and k-NN for genre classification, leading to further advancements in audio analysis.

By combining MFCC extraction with k-NN classification, the project contributes to solving a real-world problem in music technology, making genre classification more objective, efficient, and scalable.

Methodology / Approach

Methodology:

The methodology employed in this project involves using Mel-Frequency Cepstral Coefficients (MFCC) as audio features and the k-Nearest Neighbors (k-NN) algorithm for genre classification. Here's how the approach is structured to solve the problem of genre classification:

  1. Data Preprocessing:

    • Audio samples are organized in the `genres_original/` directory, categorized by genre.

    • Each audio sample is converted into a sequence of MFCC vectors using the `python_speech_features` library.

    • MFCC features capture the timbral characteristics of the audio, which are crucial for genre differentiation.

  2. Feature Representation:

    • Each MFCC sequence is summarized into a mean vector and covariance matrix that represent the audio sample's characteristics.

    • These summaries compactly capture each audio sample's genre-related attributes.

  3. k-Nearest Neighbors (k-NN) Classification:

    • The k-NN algorithm measures the similarity between the features of the test audio and those of the training set.

    • A customized distance function based on the mean and covariance summaries computes the similarity.

    • The neighbors with the closest features are identified for classification.

    • The genre with the highest count among those neighbors is chosen as the predicted genre.

  4. Evaluation and Visualization:

    • Classification accuracy is computed on test data, along with the optimal value of k.

    • A confusion matrix reveals the model's performance across the different genres.

    • ROC curves and AUC scores depict the classification's sensitivity and specificity.
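Steps 1 and 2 above can be sketched as follows. The random matrix stands in for a real `python_speech_features` extraction (shown in a comment), and the function name is illustrative rather than taken from the project's code:

```python
import numpy as np

def summarize_mfcc(mfcc_matrix):
    """Summarize an (n_frames, n_coeffs) MFCC sequence into the mean vector
    and covariance matrix used to represent one audio sample."""
    mean_vec = np.mean(mfcc_matrix, axis=0)
    cov_mat = np.cov(mfcc_matrix, rowvar=False)
    return mean_vec, cov_mat

# Stand-in for a real extraction, which would look roughly like:
#   from python_speech_features import mfcc
#   feats = mfcc(signal, samplerate, numcep=13)
rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 13))    # 300 frames x 13 cepstral coefficients

mean_vec, cov_mat = summarize_mfcc(feats)
print(mean_vec.shape, cov_mat.shape)  # (13,) (13, 13)
```

Summarizing each variable-length MFCC sequence into a fixed-size (mean, covariance) pair is what lets tracks of different durations be compared with a single distance function.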

Technology and Techniques:

  • MFCC Extraction: Mel-Frequency Cepstral Coefficients (MFCCs) are extracted from audio using the `python_speech_features` library. These coefficients provide compact representations of audio features.

  • k-NN Algorithm: The k-Nearest Neighbors algorithm is used for genre classification, relying on similarity measures between features.

  • Customized Distance Function: A tailored distance function over the mean vectors and covariance matrices calculates the similarity between audio samples.

  • Python Libraries: NumPy, SciPy, pandas, matplotlib, seaborn, and scikit-learn are utilized for data manipulation, mathematical computations, visualization, and machine learning.

  • Data Serialization: Preprocessed training and testing data are serialized to a `my.dat` file with pickle.

  • Confusion Matrix: The confusion matrix, visualized with Seaborn, provides insight into classification performance.

  • ROC Curves: Receiver Operating Characteristic (ROC) curves illustrate the trade-off between true positive and false positive rates for various thresholds.
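A minimal sketch of the customized distance and neighbor vote described above, assuming each sample is summarized as a (mean, covariance) pair. The KL-divergence-style form of the distance and the function names are assumptions for illustration, not the project's exact code:

```python
import numpy as np
from collections import Counter

def gaussian_distance(a, b):
    """One-directional distance between two (mean, covariance) summaries,
    modeled on the KL divergence between the Gaussians they describe."""
    mm1, cm1 = a
    mm2, cm2 = b
    inv2 = np.linalg.inv(cm2)
    d = np.trace(inv2 @ cm1)
    d += (mm2 - mm1) @ inv2 @ (mm2 - mm1)
    d += np.log(np.linalg.det(cm2) / np.linalg.det(cm1))
    return d - len(mm1)

def predict_genre(test_feat, train_set, k=3):
    """Majority vote among the k nearest training samples.
    train_set is a list of ((mean, cov), genre_label) pairs."""
    ranked = sorted(train_set,
                    key=lambda item: gaussian_distance(test_feat, item[0])
                                   + gaussian_distance(item[0], test_feat))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy two-genre example with well-separated means
blues = [((np.zeros(2), np.eye(2)), "blues") for _ in range(3)]
metal = [((np.full(2, 10.0), np.eye(2)), "metal") for _ in range(3)]
sample = (np.full(2, 0.5), np.eye(2))
print(predict_genre(sample, blues + metal))  # blues
```

The distance is symmetrized by summing both directions, since a KL-style divergence is not symmetric on its own.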

This methodology leverages modern audio analysis techniques, machine learning, and data visualization to automate genre classification. By utilizing these frameworks and techniques, the project tackles the complex task of genre differentiation in a structured and efficient manner.
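As an illustration of the evaluation stage, scikit-learn's metrics can produce the confusion matrix and ROC/AUC data from predicted labels. The genre labels below are made up, and real ROC curves would use classifier scores rather than the hard predictions used here as stand-ins:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true vs. predicted genres for a tiny test split
y_true = ["blues", "jazz", "jazz", "blues", "jazz"]
y_pred = ["blues", "jazz", "blues", "blues", "jazz"]

labels = ["blues", "jazz"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows = true genre, columns = predicted genre

# For ROC/AUC, binarize one genre against the rest (one-vs-rest)
y_true_bin = [1 if g == "jazz" else 0 for g in y_true]
scores = [1.0 if g == "jazz" else 0.0 for g in y_pred]  # stand-in scores
print(roc_auc_score(y_true_bin, scores))
```

In a multi-genre setting, the one-vs-rest binarization is repeated per genre to draw one ROC curve for each class.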

Technologies Used

Technologies, Libraries, Tools:

  • Python: The primary programming language used for development.

  • NumPy: For efficient numerical computations and array operations.

  • SciPy: For scientific and technical computing, including linear algebra and statistical operations.

  • pandas: For data manipulation and analysis, including handling datasets.

  • matplotlib: For data visualization and plotting.

  • seaborn: For creating informative and attractive statistical graphics.

  • scikit-learn: For machine learning algorithms, including k-Nearest Neighbors and preprocessing.

  • python_speech_features: For extracting Mel-Frequency Cepstral Coefficients (MFCCs) from audio.

  • pickle: For serializing and deserializing Python objects, used for data storage.

  • GitHub: For version control and collaborative development.
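Record-by-record pickling, as used for the project's `my.dat` file, might look like this sketch; the temporary file path and record layout are illustrative:

```python
import os
import pickle
import tempfile

# Illustrative records: ((mean, covariance), genre) summaries
dataset = [(("mean-1", "cov-1"), "blues"), (("mean-2", "cov-2"), "jazz")]

path = os.path.join(tempfile.gettempdir(), "my.dat")
with open(path, "wb") as f:
    for record in dataset:
        pickle.dump(record, f)       # one pickle per record

loaded = []
with open(path, "rb") as f:
    while True:                      # read back until end of file
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(loaded == dataset)  # True
```

Appending one pickle per record lets the preprocessing stage stream samples to disk without holding the whole dataset in memory.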
