It’s been another two weeks of “coding for decoding” for me.
What did I manage to do since my last post?
Things look really good by now, I’m making constant progress. A very important thing I managed to implement is collecting adaptation data from a Result object. I also implemented the part that uses this data for creating the adaptation file.
I must say that after seeing in the first weeks that the algorithm which estimated the transform was working well, but reading counts from a file generated with sphinxtrain, this was the next big step, to collect counts from the result of the first decoding process. This task is the part that took me the most by now.
I also implemented a component that based on the adaptation data creates a new means file. This is equivalent to having a new adapted acoustic model that will decode better if used with audio files containing speech of the persons that adaptation was made for.
You can see my work at:
- https://github.com/bogdanpetcu/sphinx4/tree/master/sphinx4-core/src/main/java/edu/cmu/sphinx/decoder/adaptation - adaptation package that I implemented from scratch.
- https://github.com/bogdanpetcu/sphinx4/commits/master - here you can see all of my commits
Have a great week!
My name is Georgiana Chelu and this summer I am working at CMU Sphinx.
CMU Sphinx is a great open source toolkit that uses speech recognition. The idea of controlling a device with your own voice is pretty amazing! I was very excited to find out that I will work on this project.
The process behind voice recognition is quite complex and you need time to get familiar with lots of new concept. In the first two weeks we had to read the documentation and understand the code, little by little.
An important step before starting to code is to create a setup, where you can test all your file modifications. You will write code easier and prevent most of the bugs. I’ve created a setup that gets us accuracy numbers, the adaptation matrix and other important information. We work with a lot of data, especially sound records. So, I wrote it some bash scripts that make the setup easier to use.
Now, I am ready to move to the next step: writing the actual code of the new feature!
My name is Bogdan Petcu and I am working within this year’s RSoC program at CMU Sphinx.
CMU Sphinx is a toolkit used for building applications that use speech recognition. Our aim for this summer is to implement a module in Sphinx4 that adapts the data that is used for decoding so that recognition process has better results.
For decoding with Sphinx4 you could use a general acoustic model (e.g for English language), but if you want to decode audio files that contains speech of non-native speakers, or if the recording environment has background noise you would want to adapt this general acoustic model for those particular speakers so that the decoding process is more precise. Currently adapting an acoustic model requires using the sphinxtrain tool provided by CMU Sphinx. Using sphinxtrain requires building manually some files ( recordings, their transcription etc.).
Our aim is to make Sphinx4 adapt the acoustic model by itself using information from the first decoding and after that redecode with the adapted model improving the recognition process.
Until now I implemented a component that collects adaptation data and another component that based on the collected adaptation data builds a specific file containing the transformation that will be applied to the acoustic model in order to adapt it.
Github repository: https://github.com/bogdanpetcu/sphinx4