Voice Recognition - Final Model - Extra 2 Voice Lines Added

Ifrim Ciprian
May 8, 2022

Updated: May 16, 2022

While testing the full system, I noticed that certain voice outputs were quite long, resulting in up to 50 seconds of audio. So I decided to add 2 extra voice lines and separate the physics of the environment from the regular conditions such as temperature, humidity and pressure.

Furthermore, I have moved the environment's adverse effects on the wearer's health out of the health report voice command and into a separate one.

The full list of voice commands is now the following:

1. How is it going to be today? = ML weather classification, ML rainfall regression;

2. What are the environment conditions? = temperature, temperature in Fahrenheit, feels-like temperature, humidity, relative humidity, absolute humidity, barometric pressure, sea-level pressure;

3. Details about my location = elevation, compass heading, compass pitch, compass roll, gravity vector, geomagnetic rotation vector;

4. Count the number of steps done = total steps, steps since the last command, calories burned (kcal, kJ), distance travelled;

5. Tell me the current time and date! = time, date;

6. Present the health report! = heart rate, skin temperature, heat index, discomfort index, pressure effect;

7. AURI describe yourself! = project description;

8. Do you know anything about the clouds? = cloud altitude, cloud temperature;

9. Thanks for the info provided! = conversation;

10. Update me on the battery level = battery level, battery percentage, battery voltage, battery faults;

11. Reduce the volume by 10! = voice output volume change;

12. Increase the volume by 5! = voice output volume change;

13. Identify the atmosphere physics! = saturation vapor pressure, water vapor pressure, dry air pressure, air dew point temperature, visible light, infrared light, UV index;

14. How about the air quality? = air quality, volatile organic compounds, carbon dioxide;
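To make the command list easier to reason about, here is a minimal Python sketch of how the 14 commands could be held as a lookup table, together with the two volume commands. This is only an illustration and not the firmware's actual code: the names `COMMAND_OUTPUTS`, `VOLUME_STEPS`, `outputs_for` and `apply_volume_command` are hypothetical, and the 0 to 100 volume range is an assumption.

```python
# Hypothetical lookup table: recognised command -> readings spoken in the reply.
# The command strings and output names are taken from the list above.
COMMAND_OUTPUTS = {
    "How is it going to be today?": [
        "ML weather classification", "ML rainfall regression"],
    "What are the environment conditions?": [
        "temperature", "temperature in Fahrenheit", "feels-like temperature",
        "humidity", "relative humidity", "absolute humidity",
        "barometric pressure", "sea-level pressure"],
    "Details about my location": [
        "elevation", "compass heading", "compass pitch", "compass roll",
        "gravity vector", "geomagnetic rotation vector"],
    "Count the number of steps done": [
        "total steps", "steps since the last command",
        "calories burned (kcal, kJ)", "distance travelled"],
    "Tell me the current time and date!": ["time", "date"],
    "Present the health report!": [
        "heart rate", "skin temperature", "heat index",
        "discomfort index", "pressure effect"],
    "AURI describe yourself!": ["project description"],
    "Do you know anything about the clouds?": [
        "cloud altitude", "cloud temperature"],
    "Thanks for the info provided!": ["conversation"],
    "Update me on the battery level": [
        "battery level", "battery percentage",
        "battery voltage", "battery faults"],
    "Reduce the volume by 10!": ["voice output volume change"],
    "Increase the volume by 5!": ["voice output volume change"],
    "Identify the atmosphere physics!": [
        "saturation vapor pressure", "water vapor pressure",
        "dry air pressure", "air dew point temperature",
        "visible light", "infrared light", "UV index"],
    "How about the air quality?": [
        "air quality", "volatile organic compounds", "carbon dioxide"],
}

# Volume steps implied by the two volume commands.
VOLUME_STEPS = {
    "Reduce the volume by 10!": -10,
    "Increase the volume by 5!": 5,
}


def outputs_for(command):
    """Return the readings for a command, or an empty list if it is unknown."""
    return COMMAND_OUTPUTS.get(command, [])


def apply_volume_command(command, volume, lowest=0, highest=100):
    """Apply a volume command and clamp the result to an assumed 0-100 range."""
    step = VOLUME_STEPS.get(command, 0)
    return max(lowest, min(highest, volume + step))
```

With this layout, adding a new voice line only means adding one more entry to the table, which is roughly what the two extra commands (13 and 14) amount to.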

And so I created new samples:

1) IAQ Samples

2) Physics Samples

With the following statistics for the dataset:

And the coefficients map:

And the scatter plot:

Which resulted in 2 new CNN models:

1) The first model reaches 99.7% accuracy with 0.02 loss; however, it does not accept any silence at the beginning of the recording.

2) The second model reaches 99.6% accuracy with 0.02 loss; however, it handles background noise better and tolerates more silence at the beginning of the voice recording, which allows for lower reaction times.
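To give a feel for what "silence at the beginning of the recording" means in practice, here is a small Python sketch that estimates how long a clip stays quiet before speech starts. It is not the preprocessing used for these models; the function name `leading_silence_seconds`, the RMS threshold of 0.02, the 10 ms frame size and the assumption that samples are floats in the range -1.0 to 1.0 are all illustrative choices.

```python
def leading_silence_seconds(samples, sample_rate, threshold=0.02, frame_ms=10):
    """Estimate the silence at the start of a clip, in seconds.

    samples: sequence of floats, assumed to be in the range -1.0 to 1.0.
    threshold: assumed RMS level below which a frame counts as silence.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square energy of this frame.
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms >= threshold:
            return start / sample_rate
    # No frame crossed the threshold: the whole clip is treated as silence.
    return len(samples) / sample_rate


# Example: half a second of silence followed by half a second of signal.
clip = [0.0] * 8000 + [0.5] * 8000
print(leading_silence_seconds(clip, 16000))  # 0.5
```

A check like this could be run over test recordings to see how much leading silence each model copes with before its accuracy drops.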

This second model is version 1.0.67 and is the final one. Here is the final table of all versions:
