Step 1: choose up to 5 speech commands to recognize.
Step 2: add 20 one-second audio samples for each speech command, and label them.
Step 3: train the model, and check the training accuracy.
Step 4: add new audio samples, and check the model's predictions on them.
Step 5: re-train the model with more audio samples.
[Optional] Step 6: download the model, and use it on an ESP32 device.
[Optional] Step 7: re-train the model with samples captured on the device.
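
The checks implied by Steps 1 to 4 can be sketched in TypeScript. This is only an illustrative sketch, not the tool's actual code: the `LabeledClip` type, the `checkDataset` and `accuracy` helpers, and the 16 kHz sample rate are all assumptions made for the example.

```typescript
// Limits taken from Steps 1 and 2. The sample rate is an assumption;
// the steps only say that each clip lasts one second.
const MAX_COMMANDS = 5;
const SAMPLES_PER_COMMAND = 20;
const SAMPLE_RATE_HZ = 16000; // assumed, not stated in the steps
const CLIP_SECONDS = 1;

// Hypothetical shape of one labeled recording.
interface LabeledClip {
  label: string;
  audio: Float32Array;
}

// Returns a list of human-readable problems; an empty list means the
// dataset satisfies the limits from Steps 1 and 2.
function checkDataset(clips: LabeledClip[]): string[] {
  const problems: string[] = [];
  const counts: Record<string, number> = {};

  clips.forEach((clip) => {
    counts[clip.label] = (counts[clip.label] || 0) + 1;
    if (clip.audio.length !== SAMPLE_RATE_HZ * CLIP_SECONDS) {
      problems.push('a clip labeled "' + clip.label + '" is not one second long');
    }
  });

  const labels = Object.keys(counts);
  if (labels.length > MAX_COMMANDS) {
    problems.push("more than " + MAX_COMMANDS + " speech commands");
  }
  labels.forEach((label) => {
    if (counts[label] < SAMPLES_PER_COMMAND) {
      problems.push('"' + label + '" has only ' + counts[label] + " samples");
    }
  });
  return problems;
}

// Fraction of predictions that match the expected labels (Steps 3 and 4).
function accuracy(predicted: string[], expected: string[]): number {
  if (predicted.length !== expected.length || expected.length === 0) {
    throw new Error("predicted and expected must be non-empty and equal in length");
  }
  let correct = 0;
  for (let i = 0; i < expected.length; i++) {
    if (predicted[i] === expected[i]) {
      correct++;
    }
  }
  return correct / expected.length;
}
```

Running `checkDataset` before training gives an early warning when a command has too few samples, and `accuracy` can be applied both to the training set and to the new samples added in Step 4.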
Model training runs in the browser using TensorFlow.js. Your audio samples are not saved to any server.