Week 3: July 1 - 5
This week was a short one: we had most of it off because of the July 4th holiday, and my mentor was out of town for a conference. I made some progress on the research front, identifying some useful publications I can use for the final report. It is really annoying to encounter potentially useful articles behind paywalls, and DePaul doesn't have access to all of them! After some digging around I managed to locate several articles that were perfectly suited to my work. Here's a sample:
- Enhanced ELAN functionality for sign language corpora
- The application of annotation models for the construction of databases and tools
- New Multilayer Concordance Functions in ELAN and TROVA
- Unlocking Language Archives Using Search
My homework is to read those articles and see what other people have done to improve ELAN. I hope to discover useful nuggets of information in them that will help me better understand ELAN's code. So far I have located a major flaw in ELAN's search system: it takes a long time to load the necessary EAF files (ELAN Annotation Format, XML-based text files containing the annotations) and search through them. I'm pondering adjusting the search engine so it uses a multi-pass method to scan the files. That way, we reduce the load times and search through only the pertinent files instead of the entire corpus. Furthermore, the XML parser ELAN currently uses is very outdated, and there exist newer XML parsers that provide an order-of-magnitude speed-up, which could prove to be just what we need.
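To make the multi-pass idea concrete, here is a rough sketch in Python (ELAN itself is written in Java, so this is only an illustration, not ELAN's code). The function names candidate_files, search_file and multi_pass_search are made up for this example. It assumes the EAF files are UTF-8 encoded, that the search is a plain case-sensitive substring match, and that the annotation text lives in ANNOTATION_VALUE elements. The first pass is a cheap raw-byte scan that throws out files that cannot possibly match; the second pass stream-parses only the survivors.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def candidate_files(paths, query):
    """Pass 1: cheap raw-byte scan, no XML parsing.

    Keeps only the files that contain the query somewhere.
    Assumption: files are UTF-8 and the query has no characters
    that XML would escape (such as & or <).
    """
    needle = query.encode("utf-8")
    return [p for p in paths if needle in Path(p).read_bytes()]


def search_file(path, query):
    """Pass 2: stream-parse one file and collect matching annotation values."""
    hits = []
    for _event, elem in ET.iterparse(str(path), events=("end",)):
        if elem.tag == "ANNOTATION_VALUE" and elem.text and query in elem.text:
            hits.append(elem.text)
        # Free the element once it has been inspected to keep memory low.
        elem.clear()
    return hits


def multi_pass_search(paths, query):
    """Run the expensive XML pass only on files that survived the cheap pass."""
    results = {}
    for path in candidate_files(paths, query):
        hits = search_file(path, query)
        if hits:
            results[str(path)] = hits
    return results
```

Calling multi_pass_search(list_of_eaf_paths, "HOUSE") would return a dictionary mapping each matching file to its matching annotation values. The first pass can let through a file where the query only appears in a tier name or other attribute, but the second pass filters those out, so the cheap pass only ever saves work and never changes the result under the assumptions above.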