Augmenting dysphonia voice using Fourier-Based synchrosqueezing transform (FSST) for a CNN classifier

Alice Rueda


Toronto, Ontario


The challenge in dysphonia voice studies is always the small dataset, which makes it difficult to apply more sophisticated deep learning techniques without overfitting or underfitting. A convolutional neural network (CNN) is a powerful classifier, but it requires a large amount of training data, and data augmentation techniques for voice are limited. The Fourier-based synchrosqueezing transform (FSST) can be used as a data augmentation technique to increase the data size. The results indicated that FSST not only increases the data size but also lets the CNN learn better than the Short-Time Fourier Transform (STFT) power spectrum does: the loss function converges for FSST but not for STFT, and FSST is more stable and yields more accurate results.

Project status: Published/In Market

Artificial Intelligence

Intel Technologies
AI DevCloud / Xeon


Overview / Usage

This project showcases that there are augmentation techniques that can increase a dataset to a size suitable for training a DNN.

Methodology / Approach

Instead of feeding the typical spectrogram into the CNN, a sharper and sparser time-frequency representation was used. The results showed that, for a limited pathological dataset, there are not enough samples for even a simple CNN to learn. However, the sharper and sparser FSST representation can train the CNN even with sample sizes as small as 100 per class.

Technologies Used

TensorFlow, data pipelining through TFRecords, Intel AI DevCloud
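A minimal sketch of the TFRecords pipelining mentioned above: time-frequency images are serialized into a record file once, then streamed back as batched (spectrogram, label) pairs for training. The file name, feature keys, and spectrogram shape here are made up for illustration; the actual pipeline in the repository may differ.

```python
import numpy as np
import tensorflow as tf

def serialize_example(spec, label):
    """Pack one float32 spectrogram and its class label into a tf.train.Example."""
    feature = {
        "spec": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[spec.tobytes()])),
        "shape": tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(spec.shape))),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()

# write a few dummy time-frequency "images" to a TFRecord file
with tf.io.TFRecordWriter("fsst_train.tfrecord") as writer:
    for i in range(4):
        spec = np.random.rand(129, 100).astype(np.float32)
        writer.write(serialize_example(spec, i % 2))

def parse(record):
    """Decode one serialized example back into (spectrogram, label)."""
    desc = {
        "spec": tf.io.FixedLenFeature([], tf.string),
        "shape": tf.io.FixedLenFeature([2], tf.int64),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    ex = tf.io.parse_single_example(record, desc)
    spec = tf.reshape(tf.io.decode_raw(ex["spec"], tf.float32), ex["shape"])
    return spec, ex["label"]

# stream the records back as a batched tf.data pipeline for the CNN
dataset = tf.data.TFRecordDataset("fsst_train.tfrecord").map(parse).batch(2)
```

Serializing the transforms once avoids recomputing FSST/STFT every epoch and lets `tf.data` overlap I/O with training.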

Repository

https://github.com/alicerueda/ICASSP2019.git
