This week, I continued preparing the Boston Radio Corpus for TTS. The audio files and their transcriptions are currently split by paragraph, but we want them split by sentence so that we can work with shorter utterances. To do this, I wrote a script that goes through each file in a directory, checks whether it is a text file, and, if so, writes each sentence in
that file to a file of its own. The sentence files go in a new directory and are named by speaker and sentence number. Once this was done, I checked that the sentence files matched up correctly with the paragraph files. They did!
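For illustration, here is a minimal sketch of what a splitting script along these lines could look like. It is not the actual script: the `.txt` extension, the `<speaker>_<id>.txt` naming of the paragraph files, the zero-padded output names, and the punctuation-based sentence rule are all assumptions, and the function names (`split_sentences`, `split_directory`, `matches_paragraph`) are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a sentence ends with ., ! or ? followed by whitespace.
# This naive rule will mis-split abbreviations such as "Dr. Smith".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph: str) -> list[str]:
    """Split a paragraph into sentences with a naive punctuation rule."""
    text = " ".join(paragraph.split())  # collapse newlines and repeated spaces
    return [s for s in SENTENCE_END.split(text) if s]


def split_directory(src_dir: Path, dst_dir: Path) -> dict[str, list[Path]]:
    """Write one file per sentence; return the sentence files for each paragraph file."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    counters: dict[str, int] = {}  # running sentence number per speaker
    written: dict[str, list[Path]] = {}
    for path in sorted(src_dir.iterdir()):
        # Skip anything that is not a text file (e.g. the audio).
        if not path.is_file() or path.suffix != ".txt":
            continue
        speaker = path.stem.split("_")[0]  # hypothetical naming: <speaker>_<id>.txt
        outputs = []
        for sentence in split_sentences(path.read_text(encoding="utf-8")):
            counters[speaker] = counters.get(speaker, 0) + 1
            out = dst_dir / f"{speaker}_{counters[speaker]:04d}.txt"
            out.write_text(sentence + "\n", encoding="utf-8")
            outputs.append(out)
        written[path.name] = outputs
    return written


def matches_paragraph(paragraph_file: Path, sentence_files: list[Path]) -> bool:
    """Check that the sentence files, joined in order, reproduce the paragraph."""
    original = " ".join(paragraph_file.read_text(encoding="utf-8").split())
    rebuilt = " ".join(f.read_text(encoding="utf-8").strip() for f in sentence_files)
    return original == rebuilt
```

Calling `split_directory(Path("paragraphs"), Path("sentences"))` and then running `matches_paragraph` on each entry of the returned mapping mirrors the check described above; the two directory names are placeholders.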