Personal Notes from Week 3

Monday, July 7, 2003

While doing research on XML, trying to figure out how best to parse the XML files that Yi has put together and how best to reflect the structure of the objects in C++, I came across a color example in his notes. From review, I think the validating parser that I found a while ago should work, but I am going to email Yi to ask which protocol (schema seems to be the word used for XML) I should use. I don't *think* it will matter terribly which one--I just want to stay away from the "experimental" schemas.
We had a meeting today with the 2D group where we discussed the different approaches that Linda and Sal have outlined. We are going to take the approach related to Yi's work. I need to get the XML files parsed as soon as possible so that we can start running tests to determine how well the structure detector actually performs. There is more on that topic outlined in the notes from the meeting.
To do:

Meet with Sal and discuss the Xerces parser--maybe he has a better suggestion on what to use?

Get a parser installed on Sal's system and figure out how to use it.

Read more in the XML book and see if there is anything useful about parsers in there.

Tuesday, July 8, 2003

I got my webpage up and online today! Submitted it to DMP. Now I will be able to access the links that I have compiled more readily, which will hopefully prove useful in the future.
Talked to Sal today. He didn't know too much about Xerces in particular; he said he used a Perl parser the last time he parsed XML. He suggested that I get something installed on bigger-sal as soon as possible so that we can get tests going, and said that I should just have the sysadmin install the parser if I am having trouble doing so. I untarred the parser, but it seemed kind of like a pain to build, so I emailed the sysadmin. Still waiting to hear back from him. Also, I asked Yi if Xerces was good to use, as it is a validating parser and such, and he said it should be fine as long as I wasn't trying to use an extremely old version. Yi also gave me his home phone number in case I need to talk to him. I just have to keep in mind the time difference, since he is in Boston, if I attempt to call.
I got the CVS working for the 2D recognition group. The repository is in /proj/2d_recognition/repository. While on Sal's machine, everyone will have to open their ~/.cshrc file and add the following line at the bottom (there should be a place that says something about adding your preferences here): "setenv CVSROOT /proj/2d_recognition/repository/". I think everyone should also specify which text editor they want to use when prompted for comments when committing a file. For example, if someone wanted to use regular emacs as their text editor, then the next line that they could add to the file would be "setenv CVSEDITOR emacsclient".
When committing new file(s), you can specify the log message on the command line by using the -m option, thus suppressing the editor invocation. Otherwise, in emacs, Ctrl-x Ctrl-s saves the log message and Ctrl-x Ctrl-c exits. I need to start using the parser on the XML files that can be found in the following directory on bigger-sal: /data/programs/image-retrieval/for_sal/ .
Finally, I set office hours today, for Mondays and Wednesdays from 3-5. I have requested that anyone who plans on coming to my office to see me during those times email me in advance, as I do have a propensity for wandering off ;).

Wednesday, July 9, 2003

I have been looking more at the XML book and think I have a basic understanding of the differences between DOM and SAX, but I have emailed Yi to get validation and hopefully further support on how to go about parsing this data. I think that the Xerces parser that I want to use basically shoves all of the information from the document into some kind of tree-like structure, using this "document object model" (DOM). I am not quite sure how to access the data from there, or whether my understanding is accurate, but I should be able to figure it out. The sysadmin at the HITLab (Konrad Schroder) completed the compilation of all the Xerces stuff, so I should be able to use it at this point.
I also updated my website so that all the links that I have found useful or will need to refer to are conveniently located.
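To make the DOM/SAX distinction concrete, here is a minimal sketch in plain C++. It does not use Xerces at all: ToyNode, collectText, CountingHandler, and makeSampleTree are hypothetical names invented for illustration, and the element names ("image", "region", "color") are made up rather than taken from Yi's files. The only point is the shape of the two approaches: DOM keeps the whole document as a tree that can be walked after parsing, while SAX hands over one event at a time and keeps nothing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a DOM element: a name, optional text, and child nodes.
// (Hypothetical type for illustration; this is NOT the Xerces API.)
struct ToyNode {
    std::string name;
    std::string text;
    std::vector<ToyNode> children;
};

// DOM style: the whole document is already in memory as a tree, so any part
// of it can be revisited. This collects the text of every element whose name
// matches, searching depth-first in document order.
void collectText(const ToyNode& node, const std::string& name,
                 std::vector<std::string>& out) {
    if (node.name == name) {
        out.push_back(node.text);
    }
    for (const ToyNode& child : node.children) {
        collectText(child, name, out);
    }
}

// SAX style: no tree is kept. A parser would call startElement once per
// opening tag, and the handler stores only the state it cares about
// (here, a running count of one element name).
struct CountingHandler {
    std::string wanted;
    std::size_t count = 0;
    void startElement(const std::string& name) {
        if (name == wanted) {
            ++count;
        }
    }
};

// Builds a small hand-made tree so the functions above have something to
// walk. The element names are invented for the example.
ToyNode makeSampleTree() {
    ToyNode c1{"color", "red", {}};
    ToyNode c2{"color", "blue", {}};
    ToyNode r1{"region", "", {c1}};
    ToyNode r2{"region", "", {c2}};
    return ToyNode{"image", "", {r1, r2}};
}
```

With the sample tree, calling collectText(root, "color", colors) fills colors with the two color strings in document order. A real DOM parser would build the tree from the file instead of by hand, but the access pattern afterwards should look similar.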
For reference, the location of the directory for the summer project webpage is cse/www/research/imagedatabase/summperproject . The group we are all in is called summer.
To do:

Look at the tar file that Sanjiv pointed us to with Corel images and see if they are useful. Assign to Clifford and Jenny to organize if necessary.

Thursday, July 10, 2003

I put together some C++ code for the data structures. Not done yet, but most of the header files seem to be done at least. The implementation of most of the methods should be fairly straightforward too.
I tried to find easier parsers to use online, but everything either was not validating, was the Xerces parser again, or would just be another big pain to figure out. So, I am going to stick with the Xerces parser. Tried to get in touch with Yi again, to ask him whether I should be using SAX or DOM, hoping he could elucidate the differences between the two, but I couldn't get ahold of him.
I also copied an example of a parser from the Xerces site, but it didn't compile. Konrad sent me a recommendation as to how to fix that. I will try that out tomorrow. I think it will be easier for me to just put together a Makefile with everything he said in it.

Friday, July 11, 2003

The group didn't meet today, probably because Linda is out of town--she should be back on Monday. I got a temporary cardkey from Jon today, so I think I should have 24-7 access to Sieg now. I am planning on coming in tomorrow and finishing up some of the necessary coding. I have emailed Sal about meeting with him early on Monday to go over how to run the bulk of pictures that he wants to run to see whether they get attributed structure. At this point, I am not sure if the two groups of pictures have been compiled yet. Hopefully he will be able to get back to me on those two counts.
After talking with one of the guys in my office today, it seems that using the DOM parser will be the easiest for what I want to get from the XML files. I think I have now found the part of the API that is useful for what I need. He said something about DOM having 7 different kinds of nodes or something, so I will need to figure that out, but it seems like less of a big deal right now.
Also, I tried to get into contact with Yi again today, but that didn't work out so much. I got an access number from Jon, but I don't think that actually works, and Lorraine left early. I talked with another woman named Melody Kadenko, who is located in 317, and might be able to help at some point, but it was too late at that time to call today. I think I will try to call him on my cell phone over the weekend, when my minutes are free.