Voice Recognition CNN - Yet Another Enhancement

Ifrim Ciprian
May 5, 2022
Updated: May 12, 2022

An enhancement that I have wanted to add to the CNN model for a long time is the ability to change the volume.

This adds 2 new commands:

"Increase the volume by 5!"
"Reduce the volume by 10!"
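As a rough illustration of what these two commands do once the model has recognised them, here is a minimal sketch. The label names, the 0-100 volume range and the `apply_command` helper are all hypothetical and are not taken from the actual firmware; only the step sizes (5 and 10) come from the commands above.

```python
# Hypothetical class labels mapped to the volume change each command requests.
# Only the step sizes (+5 and -10) come from the commands in the post.
VOLUME_DELTAS = {
    "increase_volume_by_5": +5,
    "reduce_volume_by_10": -10,
}


def apply_command(volume, label, lowest=0, highest=100):
    """Return the new volume after a recognised command.

    The 0-100 range is an assumption; unknown labels leave the volume unchanged.
    """
    delta = VOLUME_DELTAS.get(label, 0)
    return max(lowest, min(highest, volume + delta))


print(apply_command(50, "increase_volume_by_5"))  # 55
print(apply_command(5, "reduce_volume_by_10"))    # 0 (clamped at the lower bound)
```

Clamping keeps repeated commands from pushing the volume outside the valid range.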

I have created all of the samples for the 2 new commands, and also decided to increase the size of the dataset to 1250 samples per class, for 15,000 samples in total.

This results in 5 hours 33 minutes and 9 seconds of audio data.
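The dataset figures can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above: the class count and the mean clip length are derived values, not figures stated in the post.

```python
# Figures quoted in the post.
SAMPLES_PER_CLASS = 1250
TOTAL_SAMPLES = 15_000
HOURS, MINUTES, SECONDS = 5, 33, 9

# Derived (not stated in the post): how many classes the totals imply.
num_classes = TOTAL_SAMPLES // SAMPLES_PER_CLASS

# Total audio duration in seconds and the implied mean length of one sample.
total_seconds = HOURS * 3600 + MINUTES * 60 + SECONDS
mean_clip_seconds = total_seconds / TOTAL_SAMPLES

print(num_classes)                   # 12
print(total_seconds)                 # 19989
print(round(mean_clip_seconds, 3))   # 1.333
```

So the totals imply 12 classes and an average of roughly 1.33 seconds of audio per sample.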

Here is an infographic on the audio dataset used for the CNN model:

From this dataset I created 2 different versions of the model:

1) Version 1.0.64

Coefficients:

Feature Space Plotted:

Confusion Matrix:

Validation Set Plotted:

I then changed the number of coefficients from 10 to 9 and reverted from integer weights to floats, since the float model has a lower loss and there is enough flash storage for it.
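To show why integer weights can cost some accuracy while saving flash, here is a minimal sketch of the trade-off. It assumes symmetric per-tensor int8 quantization, which is a common scheme but not necessarily the one used for this model, and the weight values are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: returns (integer weights, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    ints = [max(-127, min(127, round(w / scale))) for w in weights]
    return ints, scale


def dequantize(ints, scale):
    """Map the integers back to approximate float weights."""
    return [q * scale for q in ints]


# Made-up float weights, purely for illustration.
weights = [0.8134, -0.0021, 0.2577, -0.6409, 0.0412]

ints, scale = quantize_int8(weights)
restored = dequantize(ints, scale)

# Rounding to the nearest integer loses at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage cost: float32 uses 4 bytes per weight, int8 uses 1 byte.
float_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(ints)
print(max_error <= scale / 2 + 1e-12)  # True
print(float_bytes, int8_bytes)         # 20 5
```

The integer version is four times smaller but every weight carries a small rounding error, so when flash is not the bottleneck, keeping floats is a reasonable choice.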

2) Version 1.0.65 - Final Version

Model Architecture & Training Settings:

Coefficients Plotted:

Feature Space:

Confusion Matrix:

Validation Set Plotted:

Device Performance:

Device Flash/RAM usage (on the Xiao Sense, with the I2C and ML models loaded as well):

Arduino Test:

Creating the cepstral coefficients from the recorded voice takes 310 ms, and classification takes 26 ms. The total processing time is 1986 ms, including the 1650 ms spent recording the voice.
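The timing budget adds up as follows. This is a small check using only the three measurements quoted above; the variable names are mine.

```python
# Measurements quoted in the post, in milliseconds.
RECORDING_MS = 1650   # recording the voice
FEATURES_MS = 310     # computing the cepstral coefficients
CLASSIFY_MS = 26      # running the classifier

compute_ms = FEATURES_MS + CLASSIFY_MS
total_ms = RECORDING_MS + compute_ms

print(compute_ms)                                # 336
print(total_ms)                                  # 1986
print(round(100 * RECORDING_MS / total_ms, 1))   # 83.1 (% of the total spent recording)
```

Only 336 ms of the total is actual computation; the rest is the fixed recording window.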

Here is the updated table with all the versions:

I have also added all 45 versions present in the table, each with its own infographic, to the project's GitHub repository.

Available here: https://github.com/CiprianFlorin-Ifrim/CNN_Voice_Inference
