In this blog post I will be talking about the right format for audio files that will be used for machine learning/deep learning.
You might have come across the following scenario: you are working on a project with a very limited budget, so you want to use every resource you already have to get the job done. Advances in technology let us carry out some of these tasks with a device in our pockets, one equipped with sensors, cameras, microphones and more. Yes, ladies and gentlemen, I am talking about our smartphones! They have been a great aid for capturing audio, video and photos. So if you are planning to record audio for machine learning/deep learning purposes, why not use your phone?
Before you do, there are a couple of things to take into consideration when using audio files recorded on your phone.
Your phone records audio in a certain format, typically something like mp3 or m4a, in stereo or mono, with a sampling rate of 44.1 kHz and a bitrate of around 64 kbps. However, to get the best results when using these files in machine learning/deep learning, or when converting them to spectrograms, they should first be converted to a different format.
Let's explore the desired specs of the audio file:
The sampling rate should be converted to 16 kHz.
The audio should be mono (a single channel).
A suitable bit depth for the audio file is 16 bits per sample.
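If you want to double check that a file really matches these specs, here is a minimal sketch using Python's built-in wave module. It assumes the file is an uncompressed PCM WAV file (the wave module cannot read mp3 or m4a), and the function names wav_specs and meets_target_specs are just names I made up for illustration.

```python
import wave

# Target specs from the list above.
TARGET_SAMPLE_RATE_HZ = 16000
TARGET_CHANNELS = 1
TARGET_BITS_PER_SAMPLE = 16


def wav_specs(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
        bits_per_sample = wav_file.getsampwidth() * 8
    return sample_rate, channels, bits_per_sample


def meets_target_specs(path):
    """True if the WAV file is 16 kHz, mono, 16 bits per sample."""
    return wav_specs(path) == (
        TARGET_SAMPLE_RATE_HZ,
        TARGET_CHANNELS,
        TARGET_BITS_PER_SAMPLE,
    )
```

Calling meets_target_specs("recording.wav") on a converted file (the file name here is only an example) should return True if the conversion worked.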
There are many free websites that can convert an audio file into the desired format. However, I don't find them too handy when you have a very large number of audio files to convert, which is a typical scenario in machine learning/deep learning.
If you are using a Linux/Unix system, I would suggest using FFmpeg. It's a free tool which can be used directly in the terminal, or you can write a bash script to loop through all the audio files in a folder, for example. Here is a command which will convert an audio file to the specs mentioned above (-ar sets the sampling rate, -ac 1 forces a single channel, and -c:a pcm_s16le writes 16-bit samples, so give the output path a .wav extension):
ffmpeg -i INPUT_AUDIO_FILE_PATH -ar 16000 -ac 1 -c:a pcm_s16le OUTPUT_AUDIO_FILE_PATH
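The idea of looping through a whole folder can also be sketched in Python, which is handy if the rest of your machine learning pipeline is already in Python. This is only a rough sketch under a few assumptions of mine: ffmpeg is installed and on your PATH, the recordings are mp3 or m4a files, and the output is written as 16 kHz, mono, 16-bit WAV to match the specs listed earlier. The function names and the extension list are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed input formats (the phone formats mentioned in this post).
AUDIO_EXTENSIONS = {".mp3", ".m4a"}


def build_ffmpeg_command(src, dst):
    """Build the ffmpeg call for 16 kHz, mono, 16-bit PCM output."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", str(src),       # input file
        "-ar", "16000",       # sampling rate: 16 kHz
        "-ac", "1",           # one channel (mono)
        "-c:a", "pcm_s16le",  # 16 bits per sample, uncompressed PCM
        str(dst),
    ]


def convert_folder(input_dir, output_dir):
    """Convert every mp3/m4a file in input_dir to a WAV file in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(input_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        dst = output_dir / (src.stem + ".wav")
        # check=True raises an error if ffmpeg fails on a file.
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

For example, convert_folder("recordings", "converted") would go through a folder called recordings and write the WAV files into a folder called converted (both folder names are placeholders).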
There you go, a cheap and easy way of recording and formatting audio files for machine learning/deep learning purposes. If you have any questions or suggestions, feel free to leave them in the comment section below 🙂
Have a great day!