This week, we did more experiments. There's not much to say about them that I haven't said before, and I'm not allowed to say much about unpublished methods anyway. Lots of waiting for results was involved!
I got to debug code! I was given a parsing tool developed by someone else, along with data and documentation. According to the documentation it should have worked, but it didn't. I got to run my own experiments and do whatever I wanted with it to try to find out what the problem was. That was a very educational experience, solving the practical problems that come with research. I did find out a few things about it, such as the fact that it crashed on blank lines in the input.
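The fix for that particular crash is simple enough to sketch. This is a hypothetical illustration, not the actual tool: `clean_lines` just models the pre-filtering step of dropping blank (or whitespace-only) lines before the input ever reaches the parser.

```python
def clean_lines(text: str) -> list[str]:
    """Drop blank or whitespace-only lines, which crashed the parser."""
    return [line for line in text.splitlines() if line.strip()]

sample = "The cat sat.\n\n   \nOn the mat.\n"
print(clean_lines(sample))  # ['The cat sat.', 'On the mat.']
```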
Then I worked around the limitations to try to make it do what we needed it to do anyway. That took a lot of time, since I had to split the large data file into small chunks, make sure those chunks had no special characters, run the parser on them individually, and then stitch the results back together. THEN, of course, we discovered that there's a tokenise option that works on the whole dataset. (And I have to spell it "tokenize", because that's the American spelling. I've lost count of the times I've debugged only to find that my British spelling was the bug.)
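For anyone curious, the whole workaround looked roughly like this. Again, a minimal sketch under assumptions: `run_parser` is a stand-in for the real tool, the "special character" filter is a simple printable-ASCII check, and the chunk size is arbitrary.

```python
import re

def run_parser(chunk: str) -> str:
    # Stand-in for the actual parsing tool, which isn't mine to share.
    return chunk.upper()

def parse_in_chunks(text: str, chunk_size: int = 1000) -> str:
    # Drop the blank lines that crash the parser.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Strip anything outside printable ASCII (my rough "special characters" rule).
    cleaned = [re.sub(r"[^\x20-\x7e]", "", ln) for ln in lines]
    # Split into chunks, parse each one, then stitch the results back together.
    chunks = ["\n".join(cleaned[i:i + chunk_size])
              for i in range(0, len(cleaned), chunk_size)]
    return "\n".join(run_parser(c) for c in chunks)
```

Hours of plumbing, all of it made redundant by one command-line flag.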