WHEEZE AND CRACKLE DETECTION USING DEEP LEARNING

PRIYANGA S

Coimbatore, Tamil Nadu

Our project shows that a CNN trained on Mel-spectrograms can accurately classify adventitious lung sounds. Improved noise filtering, more advanced features, self-supervised learning, and Transformers could further enhance the system's performance and support automated diagnosis of respiratory disorders.

Project status: Under Development

oneAPI, Artificial Intelligence

Intel Technologies
oneAPI

Overview / Usage

Lung sound analysis is an important diagnostic tool for detecting adventitious (abnormal) lung sounds such as wheezes and crackles, which are associated with respiratory disorders. In this analysis, signal processing techniques such as higher-order spectra, bandpass filtering, and mel spectrograms are used to identify and classify lung sounds.

Wheeze is a continuous, high-pitched whistling sound produced during breathing. To detect wheeze, the lung sound signal is first preprocessed using a bandpass filter to remove noise and other out-of-band frequency components. A higher-order spectrum analysis is then performed on the filtered signal to extract non-linear features that are indicative of wheeze. Higher-order spectra capture non-linear properties of the lung sound signal, such as phase coupling between frequency components, that traditional frequency analysis techniques such as the Fourier power spectrum do not capture.

Crackle is a discontinuous, non-musical sound produced during breathing. To detect crackle, the lung sound signal is preprocessed using a bandpass filter to remove noise and other frequency components. A mel spectrogram analysis is then performed on the filtered signal to extract features that are indicative of crackle. The mel spectrogram analysis captures the spectral content of the lung sound signal, which is used to identify the presence of crackle.

Higher-order spectra (HOS) analysis is a signal processing technique used to extract non-linear features from a signal. It captures the higher-order statistics of the signal, which are not captured by traditional linear signal processing techniques such as Fourier analysis. In lung sound analysis, higher-order spectra analysis is used to extract features that are indicative of wheeze.

One example of a signal that can be used for HOS analysis of lung sounds is the wheezing sound that occurs in patients with asthma or chronic obstructive pulmonary disease (COPD). Wheezing is a high-pitched, continuous sound that occurs during expiration and is caused by airway obstruction. It has a non-linear nature and can be analyzed using HOS techniques to extract information about the underlying mechanisms and potential indicators of disease severity. The HOS analysis of wheezing can provide information about the complexity, regularity, and variability of the signal, which can be used to assess the severity of the disease, monitor treatment progress, and predict exacerbations.
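As an illustration of what a higher-order spectrum looks like in practice, the following is a minimal sketch of a direct (FFT-based) bispectrum estimate, the third-order spectrum. The source does not say which HOS features the project extracts, so the function name `bispectrum`, the segment length `nfft`, and the Hann window are assumptions made only for this example.

```python
import numpy as np

def bispectrum(signal, nfft=128):
    """Direct bispectrum estimate averaged over non-overlapping segments.

    Returns an (nfft // 2, nfft // 2) complex array B with
    B[k1, k2] ~ mean over segments of X[k1] * X[k2] * conj(X[k1 + k2]).
    """
    signal = np.asarray(signal, dtype=float)
    n_seg = len(signal) // nfft
    if n_seg == 0:
        raise ValueError("signal shorter than one segment")
    half = nfft // 2
    k = np.arange(half)
    k_sum = (k[:, None] + k[None, :]) % nfft      # FFT bin of the sum frequency
    window = np.hanning(nfft)                     # assumed window choice
    acc = np.zeros((half, half), dtype=complex)
    for i in range(n_seg):
        seg = signal[i * nfft:(i + 1) * nfft]
        seg = (seg - seg.mean()) * window         # remove DC, taper the edges
        X = np.fft.fft(seg)
        acc += X[k][:, None] * X[k][None, :] * np.conj(X[k_sum])
    return acc / n_seg
```

A large magnitude at `B[k1, k2]` indicates that the components at bins `k1`, `k2`, and `k1 + k2` are phase-coupled, which is the kind of non-linear interaction the power spectrum alone cannot reveal.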

A bandpass filter passes the frequency components of a signal that lie within a chosen band and attenuates those outside it. In lung sound analysis, a bandpass filter is used to remove noise and other out-of-band frequency components from the lung sound signal, which can interfere with the detection of wheeze and crackle.
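The idea can be sketched with a simple FFT-mask bandpass that zeroes every frequency bin outside the passband. This is only an illustration: the source does not state the filter design or cutoff frequencies, so `bandpass_fft` and the 100 to 1000 Hz band in the usage lines are hypothetical. A Butterworth design (for example via `scipy.signal.butter`) would be a more typical choice in practice.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz] and transform back."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    keep = (freqs >= low_hz) & (freqs <= high_hz)   # boolean passband mask
    return np.fft.irfft(spectrum * keep, n=len(signal))

# Usage with a synthetic signal: 50 Hz hum plus a 400 Hz tone.
sample_rate = 4000
t = np.arange(sample_rate) / sample_rate
hum = np.sin(2 * np.pi * 50 * t)
tone = np.sin(2 * np.pi * 400 * t)
filtered = bandpass_fft(hum + tone, sample_rate, low_hz=100, high_hz=1000)
```

In this synthetic case the hum lies below the passband and is removed, while the 400 Hz tone passes through.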

A mel spectrogram is a time-frequency representation of a signal in which the frequency axis is mapped onto the mel scale, a perceptual scale of pitch. In lung sound analysis, a mel spectrogram is used to extract features that are indicative of crackle. The analysis converts the lung sound signal into a 2D representation (time by mel-frequency band) of its spectral content, which is then analyzed to identify the presence of crackle.
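To make the conversion concrete, here is a minimal NumPy sketch of a log mel spectrogram: a short-time Fourier transform followed by triangular filters spaced evenly on the mel scale. The frame length, hop size, number of mel bands, and the HTK-style mel formula are assumptions for illustration. The project's actual settings are not given in the source, and a library routine such as `librosa.feature.melspectrogram` would normally be used instead.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters evenly spaced in mel, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram(signal, sample_rate, n_fft=512, hop=128, n_mels=40):
    """Return a (n_mels, n_frames) array of mel-band energies in dB."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)), mode="constant")
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10).T
```

The resulting 2D array can be treated like a single-channel image, which is what makes it a natural input for a CNN.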

In summary, lung sound analysis is an important diagnostic tool for detecting adventitious lung sounds such as wheezes and crackles. Signal processing techniques such as higher-order spectra, bandpass filtering, and mel spectrograms are used to preprocess and analyze the lung sound signal to identify and classify the presence of wheeze and crackle.

Methodology / Approach

The proposed methodology involves collecting and pre-processing lung sound recordings, segmenting them into respiratory cycles, and labeling them based on the presence of wheeze or crackle. Data augmentation through audio stretching is used to increase the number of training samples, and Mel-spectrograms and other features are extracted from each cycle. The extracted features are then used to train a CNN model, whose hyperparameters are tuned to achieve the best results. The proposed system shows promising results in detecting wheeze and crackle in lung sounds and can aid in diagnosing respiratory diseases.

The methodology for our project involves the following steps:

• Data Collection and Pre-processing
• Data Augmentation
• Feature Extraction
• CNN Training

Data Collection and Pre-processing: We collected a large dataset of lung sound recordings and pre-processed them by segmenting the audio files into respiratory cycles. We labeled each cycle as 'wheeze', 'crackle', 'none', or 'both', depending on the presence of these sounds.
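The segmentation and labeling step could look roughly like the sketch below. It assumes each recording comes with per-cycle annotations of the form `(start_s, end_s, has_crackle, has_wheeze)`. That annotation layout and the function names are hypothetical, since the source does not describe the dataset's format.

```python
def label_cycle(has_crackle, has_wheeze):
    """Map two presence flags to one of the four class labels."""
    if has_crackle and has_wheeze:
        return "both"
    if has_wheeze:
        return "wheeze"
    if has_crackle:
        return "crackle"
    return "none"

def segment_cycles(signal, sample_rate, annotations):
    """Cut a recording into respiratory cycles.

    annotations: iterable of (start_s, end_s, has_crackle, has_wheeze),
    an assumed format used only for this example.
    Returns a list of (cycle_samples, label) pairs.
    """
    cycles = []
    for start_s, end_s, has_crackle, has_wheeze in annotations:
        start = int(round(start_s * sample_rate))
        end = int(round(end_s * sample_rate))
        cycles.append((signal[start:end], label_cycle(has_crackle, has_wheeze)))
    return cycles
```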

**Data Augmentation:** To increase the number of training samples, we performed data augmentation through audio stretching, which involves altering the tempo of the recordings without changing the pitch.
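One simple way to change tempo without resampling (and therefore without shifting the pitch the way resampling would) is windowed overlap-add: frames are read from the input at one hop size and written to the output at another. The sketch below is a deliberately simplified stand-in, not the project's method. Production code would more likely use a phase vocoder such as `librosa.effects.time_stretch`, and plain overlap-add can introduce audible phase artifacts. The function name and default frame sizes are assumptions.

```python
import numpy as np

def time_stretch_ola(signal, rate, frame_len=1024, synth_hop=256):
    """Stretch `signal` in time by a factor of about 1 / rate using overlap-add.

    rate < 1 slows the recording down (longer output); rate > 1 speeds it up.
    """
    signal = np.asarray(signal, dtype=float)
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    analysis_hop = synth_hop * rate               # step through the input
    n_frames = 1 + int((len(signal) - frame_len) / analysis_hop)
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        a = int(round(i * analysis_hop))          # read position in the input
        s = i * synth_hop                         # write position in the output
        out[s:s + frame_len] += signal[a:a + frame_len] * window
        norm[s:s + frame_len] += window
    return out / np.maximum(norm, 1e-8)           # undo the window overlap gain
```

Applying several rates (for example 0.8 and 1.25) to each respiratory cycle yields additional training samples that keep the original label.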

Feature Extraction: We extracted Mel-spectrograms and other relevant features of each respiratory cycle and provided zero padding as necessary to ensure uniformity in the dataset.
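Zero padding to a uniform size can be sketched as follows. The source mentions only padding, so cropping cycles that are longer than the target is an added assumption, as are the function name and the `(n_mels, n_frames)` array layout.

```python
import numpy as np

def pad_or_truncate(features, target_frames):
    """Right-pad a (n_mels, n_frames) array with zeros, or crop it,
    so that it has exactly `target_frames` columns."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[1]
    if n_frames >= target_frames:
        return features[:, :target_frames]        # assumed handling of long cycles
    pad_width = ((0, 0), (0, target_frames - n_frames))
    return np.pad(features, pad_width, mode="constant")
```

With every cycle brought to the same shape, the features can be stacked into a single batch tensor for CNN training.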

**CNN Training:** Finally, we trained a CNN model to recognize the class labels. We used cross-validation to evaluate the model's accuracy and optimized its hyperparameters to achieve the best results.
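The cross-validation mentioned here can be illustrated with a small k-fold index generator. The number of folds and the shuffling seed are assumptions, as the source does not report them, and `k_fold_indices` is a hypothetical helper name.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) lists for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]     # every k-th shuffled index
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx
```

Each sample is used for validation exactly once, and the model's score is averaged over the k runs.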

i) **Model Training:** Train a deep learning model, such as a Convolutional Neural Network (CNN), to classify lung sounds as normal or containing wheezing and/or crackles. This can be done using labeled data, with a portion of the data held out for validation and testing.
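Holding out part of the labeled data can be sketched with a stratified split, which keeps the class proportions similar in the training and test portions. Stratification is an assumption here (the source only says a portion is held out), and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        cut = int(round(len(idx) * train_fraction))
        train_idx.extend(idx[:cut])               # first part goes to training
        test_idx.extend(idx[cut:])                # remainder is held out
    return sorted(train_idx), sorted(test_idx)
```

Changing `train_fraction` to 0.6, 0.7, 0.8, or 0.9 gives 60-40, 70-30, 80-20, and 90-10 train-test splits.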

ii) **Model Evaluation:** Evaluate the performance of the trained model on a separate set of test data. Measure the accuracy, precision, recall, and F1 score of the model's predictions.

iii) **Model Optimization:** Optimize the model by adjusting hyperparameters, such as learning rate, batch size, and regularization parameters. This may involve using techniques such as cross-validation and grid search.
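A grid search simply evaluates every combination of candidate hyperparameter values and keeps the best one. In the sketch below, `toy_score` is a placeholder for "train the CNN with these settings and return a validation score". The candidate values and the scoring function are invented for illustration and are not the project's settings.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid.

    Returns (best_params, best_score), where higher scores are better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(learning_rate, batch_size, weight_decay):
    # Placeholder only: a real score_fn would train and validate the model.
    return (-abs(learning_rate - 1e-3)
            - abs(batch_size - 32) / 1000
            - abs(weight_decay - 1e-4))

grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "weight_decay": [0.0, 1e-4],
}
best_params, best_score = grid_search(grid, toy_score)
```

In practice the score for each combination would usually be the mean validation metric across cross-validation folds, so the two techniques are combined.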

In conclusion, a deep learning model such as a CNN can be trained on pre-processed lung sound recordings to accurately classify them as containing wheezing, crackles, both, or none. Data augmentation techniques such as audio stretching can increase the size of the training set, and feature extraction techniques such as Mel-spectrograms can help capture relevant information from the respiratory cycles. The trained model can be evaluated on a separate set of test data to measure its accuracy, and hyperparameter optimization can improve its performance.

Technologies Used

The analysis of the results obtained from this project indicates that the detection of wheeze and crackle in lung sound recordings is feasible. What a wheeze and crackle classifier reports depends on its output format and on the performance of the classification model. Possible results include:

• The number of wheezes and crackles detected.
• The duration of the wheezes and crackles detected.
• The intensity or amplitude of the wheezes and crackles detected.
• A spectrogram or waveform display showing the wheezes and crackles detected.
• A classification of the recording as containing wheezes, crackles, or neither, based on a detection threshold or criterion.

PRECISION ANALYSIS:

Precision is a metric that measures the percentage of correct positive predictions out of all the positive predictions made by the model. A high precision value indicates that the model has a low rate of false positives, i.e., most of the cases it flags as positive are truly positive.

Looking at the table, we can see that the precision values for the "None" class are consistently higher than those for the other classes. This could be because the majority of the dataset belongs to the "None" class.

For the "Wheezing" class, the precision values increase as the training data percentage increases from 60% to 80%, but then decrease slightly when the training data percentage is further increased to 90%. This suggests that the model may be overfitting to the training data when the training data percentage is high, although the smaller test set in the 90-10 split also makes the estimate noisier.

For the "Crackle" class, the precision values are consistently higher than those for the "Wheezing" class, particularly for the 80-20 and 90-10 splits. This suggests that the model may be better at identifying crackles than wheezing.

For the "Both" class, the precision values are lower than those for the other classes, particularly for the 60-40 and 70-30 splits. This suggests that the model has difficulty correctly identifying cases where both wheezing and crackles are present.

Overall, the precision values provide insight into the performance of the model on this specific task, and can be used to guide further improvements in the model's performance.

Precision = TP / (TP + FP)
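For a multi-class problem, this formula is applied per class: TP is the diagonal cell of the confusion matrix and TP + FP is the column sum (everything predicted as that class). The sketch below assumes rows are true labels and columns are predicted labels. The counts are made up for illustration and are not results from this project.

```python
def per_class_precision(confusion, class_names):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    precision = {}
    for j, name in enumerate(class_names):
        true_pos = confusion[j][j]
        predicted = sum(row[j] for row in confusion)   # TP + FP (column sum)
        precision[name] = true_pos / predicted if predicted else 0.0
    return precision

class_names = ["none", "wheeze", "crackle", "both"]
# Illustrative counts only, not measured results.
example_confusion = [
    [50, 3, 5, 2],
    [4, 20, 1, 5],
    [6, 2, 30, 2],
    [2, 5, 4, 9],
]
precision = per_class_precision(example_confusion, class_names)
```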

This table shows the precision analysis of a machine learning model trained on a dataset for classifying lung sounds as None, Wheezing, Crackle, or Both. The precision values are shown for different train-test data splits, represented by the percentages in the first column (60-40, 70-30, 80-20, and 90-10).

RECALL:

Recall is a metric that measures the percentage of true positive predictions out of all the actual positive cases in the dataset, i.e., Recall = TP / (TP + FN). A high recall value indicates that the model is correctly identifying a high proportion of positive cases.
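Recall for one class can also be computed directly from the label lists by treating that class as positive and every other class as negative. This is a minimal sketch. The function name and the example labels are hypothetical.

```python
def recall_for_class(y_true, y_pred, positive):
    """Recall of one class in a one-vs-rest view: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    actual = true_pos + false_neg                 # all samples truly in the class
    return true_pos / actual if actual else 0.0

# Made-up labels for illustration.
y_true = ["wheeze", "wheeze", "wheeze", "none", "crackle", "wheeze"]
y_pred = ["wheeze", "none", "wheeze", "none", "crackle", "both"]
wheeze_recall = recall_for_class(y_true, y_pred, "wheeze")
```

Here two of the four true wheeze cycles are predicted as wheeze, so the missed ones count as false negatives for that class.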

Looking at the table, we can see that the recall values for the "None" class are consistently higher than those for the other classes. This suggests that the model is better at correctly identifying cases where there are no abnormal lung sounds.

For the "Wheezing" class, the recall values are generally lower than those for the "None" class, particularly for the 60-40 and 70-30 splits. However, the recall values for wheezing increase as the percentage of training data increases, suggesting that the model may benefit from additional training data.

For the "Crackle" class, the recall values are generally higher than those for wheezing, particularly for the 80-20 and 90-10 splits. This suggests that the model is better at correctly identifying cases where crackles are present.

For the "Both" class, the recall values are generally lower than those for the other classes, particularly for the 60-40 and 90-10 splits. This suggests that the model may have difficulty in correctly identifying cases where both wheezing and crackles are present.

Overall, the recall values provide insight into the performance of the model on this specific task, and can be used to guide further improvements in the model's performance. A model with high precision and high recall values would be considered the best performer on this task.

F1-SCORE:

The F1-score is a metric that combines both precision and recall to provide an overall measure of a model's accuracy. It is the harmonic mean of precision and recall and ranges from 0 to 1, with higher values indicating better performance.

Looking at the table, we can see that the F1-score values for the "None" class are consistently higher than those for the other classes. This suggests that the model is better at correctly identifying cases where there are no abnormal lung sounds.

For the "Wheezing" class, the F1-score values are generally lower than those for the "None" class, but increase as the percentage of training data increases. This suggests that the model may benefit from additional training data to improve its performance on identifying wheezing.

For the "Crackle" class, the F1-score values are generally higher than those for wheezing, particularly for the 80-20 and 90-10 splits. This suggests that the model is better at correctly identifying cases where crackles are present.

For the "Both" class, the F1-score values are generally lower than those for the other classes, particularly for the 60-40 and 90-10 splits. This suggests that the model may have difficulty in correctly identifying cases where both wheezing and crackles are present.

Overall, the F1-score values provide insight into the model's overall performance on this specific task, and can be used to guide further improvements in the model's precision and recall. A model with high F1-score values would be considered the best performer on this task.

F1-score = 2 * (precision * recall) / (precision + recall)
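The formula translates directly into code. The only extra detail in this sketch is the guard for the case where precision and recall are both zero, which would otherwise divide by zero. Returning 0.0 in that case is a common convention and an assumption here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is pulled toward the lower of the two values:
# precision 1.0 with recall 0.5 gives an F1 below their arithmetic mean of 0.75.
unbalanced = f1_score(1.0, 0.5)
balanced = f1_score(0.75, 0.75)
```

This is why a class with good precision but poor recall (or the reverse) still receives a modest F1-score.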

This table shows the F1-score analysis of a machine learning model trained on a dataset for classifying lung sounds as None, Wheezing, Crackle, or Both. The F1-score values are shown for different train-test data splits, represented by the percentages in the first column (60-40, 70-30, 80-20, and 90-10).

CONFUSION MATRIX:

A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted labels to the true labels. The rows of the matrix represent the true labels, while the columns represent the predicted labels. Each cell in the matrix represents the number of instances that were classified as the predicted label for a particular true label.
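Building such a matrix from true and predicted labels takes only a few lines. The sketch below follows the convention described above (rows are true labels, columns are predicted labels). The label lists are made up for illustration.

```python
def confusion_matrix(y_true, y_pred, class_names):
    """Return matrix[i][j] = count of samples with true class i predicted as class j."""
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

class_names = ["none", "wheeze", "crackle", "both"]
# Made-up labels for illustration.
y_true = ["none", "none", "wheeze", "crackle", "both", "both"]
y_pred = ["none", "wheeze", "wheeze", "crackle", "crackle", "both"]
matrix = confusion_matrix(y_true, y_pred, class_names)
```

Correct predictions land on the diagonal, so the off-diagonal cells show exactly which classes are being confused with which.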

Looking at the table, we can see that the model has correctly classified a large number of instances as "None", as evidenced by the high count in the diagonal cell of the first row and column. The model has also correctly identified instances of "Wheezing" and "Crackle" to some extent, as evidenced by the non-zero values in the second and third rows and columns. However, the model appears to have more difficulty identifying instances of "Both", as evidenced by the lower count in the diagonal cell of the fourth row and column.

Overall, the confusion matrix provides a detailed breakdown of the model's performance on each class, and can be used to calculate various performance metrics such as precision, recall, and F1-score. The confusion matrix is a useful tool for evaluating the performance of a classification model and can help identify areas for improvement.

The output of a wheeze and crackle sound classification project depends on the specific methods and techniques used. In general, it might include a list of the wheezes and crackles detected in the audio recording, with the start time, end time, and severity or intensity of each sound, together with a classification of the recording as containing wheezes, crackles, or neither based on a predetermined criterion or threshold.

Repository

https://github.com/Priyanga-S/Wheeze-and-Crackle-Detection-Analysis
