I learned a lot this week! The data produced by the synthesizer exceeded the memory limits on my machine, so we transferred the files to Erica's computer. That machine ran out of room too, so we moved everything to a newer, faster machine with more memory. I ran the demo again on that computer, and it failed with an error about an unknown label file. Erica is now debugging the problem. Meanwhile, I started a new project: scraping websites for low-resource language data. We are interested in collecting text and audio
data for Telugu and Lithuanian to build a synthesizer for these languages. The first step toward this goal is finding useful websites. I found a few sites that looked interesting and user-friendly. Then I learned how to scrape the web using Python's Beautiful Soup library. It is fairly straightforward and less difficult than I expected. With this tool, I wrote a script that extracts Lithuanian terms and definitions from a cooking website and writes them to a file. Next, I started on a site that posts stories in Telugu, but when I ran the code to read the page and print its contents, I got an error: I was denied access, so I couldn't scrape that website. I moved on to a different source: a blog with posts, comments, and links in Telugu. Now I am writing a script to extract all the information from this site that will be useful for our purposes.
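Here is a minimal sketch of the term-and-definition scraping described above, assuming the Beautiful Soup `bs4` package. The cooking site's real markup isn't described here, so the `<dl>`/`<dt>`/`<dd>` structure, the `SAMPLE_HTML` snippet, the output filename `lithuanian_terms.tsv`, and the helper names `fetch_html`, `extract_terms`, and `write_terms` are all hypothetical. `fetch_html` shows one way to handle a refused request, like the access error on the Telugu story site, by catching `urllib.error.HTTPError` (for example, an HTTP 403) instead of crashing.

```python
import csv
from typing import List, Optional, Tuple
from urllib.error import HTTPError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Hypothetical markup: the real site's structure is unknown, so a <dl> of
# <dt>/<dd> pairs is assumed purely for illustration.
SAMPLE_HTML = """
<dl class="glossary">
  <dt>cepelinai</dt><dd>potato dumplings, often stuffed with meat</dd>
  <dt>šaltibarščiai</dt><dd>cold beet soup</dd>
</dl>
"""


def fetch_html(url: str) -> Optional[str]:
    """Download a page; return None if the server refuses the request."""
    # Some servers reject urllib's default User-Agent; sending a browser-like
    # one is an assumption that may or may not help on a given site.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=10) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)
    except HTTPError as err:
        # e.g. HTTP 403 Forbidden when the site denies access
        print(f"Request to {url} failed: HTTP {err.code}")
        return None


def extract_terms(html: str) -> List[Tuple[str, str]]:
    """Pair each <dt> term with the <dd> definition that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for term in soup.find_all("dt"):
        definition = term.find_next_sibling("dd")
        if definition is not None:
            pairs.append((term.get_text(strip=True), definition.get_text(strip=True)))
    return pairs


def write_terms(pairs: List[Tuple[str, str]], path: str) -> None:
    """Write term/definition pairs to a UTF-8 tab-separated file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["term", "definition"])
        writer.writerows(pairs)


if __name__ == "__main__":
    terms = extract_terms(SAMPLE_HTML)
    write_terms(terms, "lithuanian_terms.tsv")
    print(terms)
```

Parsing a saved string keeps the example offline; on a live page you would pass the result of `fetch_html(url)` to `extract_terms` after checking that it isn't `None`. Opening the output file with `encoding="utf-8"` matters because both Lithuanian and Telugu text use non-ASCII characters.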