Week 5
Week Goals
- Get the test code in working condition so that it can run on comments or on the kidgab database.
- Comment it and make it pretty.
- Urban dictionary stuff.
- Getting data from the website (write script)
- Test to see if the sentiment analysis works on these definitions to guess the sentiment of slang.
- Run-length encoding program.
- Set up dictionary for this?
- Look into other public kids websites stuff.
- Update meeting notes on wiki for meeting with Nick.
- Debug the activity problem to finish the ALL section of the activity feed?
- Start STUDYING for the GRE!
- Update my resume.
- Work on NSF grant/TAMU application?
Friday, June 29
- Fixed the bugs with the word class in the UD code.
- Now I'm going into each word's individual page and trying to get a list of the synonyms and the first, highest-rated definition.
- Done.
- Used a combo of HTMLParser and BeautifulSoup (?!) to get it done. It took a while, but I did it.
- Saved all of the information to a CSV file (also wrote a program to copy a dictionary of Words to a CSV file, but I'm not sure how to close the file afterward from the shell...).
- I don't think looking through YouTube comments on children's videos will be very helpful: it's basically impossible to determine how old the commenters are, and most of the time (regardless of whether the video is for children) the users are older.
- I still need to look at educational game websites and myspace.com.
- Got a neopets account and looked through some of the message boards. I don't think that there will be anything of the cyberbullying sort because of the strict rules in the forums, but it could be good for general sentiment.
- On one of the forums, I found this lovely message:
NeoBoard Index » Newbies Current Topic: this girl i know is really dumb she just got a new dog and she hates it because it's "annoying it's a boxer and it really is the sweetest little thing and her parents and her brother all love it but she keeps pestering them to get rid of it because it ticks her off seeing it around the house i'm just like lady water u doin
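Note on the CSV export and the "how do I close the file" question from earlier today: a minimal sketch, assuming the dictionary maps each word to a (definition, synonyms) pair. That layout and the name `save_words_to_csv` are made up for illustration and are not the actual Word class. Opening the file in a `with` block closes it automatically, so nothing has to be closed by hand from the shell.

```python
import csv


def save_words_to_csv(words, path):
    """Write a {word: (definition, synonyms)} mapping to a CSV file.

    The mapping layout is an assumption for this sketch.
    """
    # The with-block closes the file when it exits, even if an error occurs.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "definition", "synonyms"])
        for word, (definition, synonyms) in sorted(words.items()):
            # Synonyms are joined into a single cell, separated by "; ".
            writer.writerow([word, definition, "; ".join(synonyms)])
```

If the file was opened without `with`, calling `close()` on the file object does the same job.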
Thursday, June 28
- Read relevant, important papers (about online language, semantic orientation/polarity, etc.)
- Gotta update the bibliography if I'm going to use them at all. (?)
- One of the papers talks about urbandictionary and how they used Turney's work to get the semantic polarity of slang based on the slang thesaurus (Multimodal social intelligence in a real-time dashboard system)
- Wrote HTML scraper to get UD definitions. Working on it!
- Changed the Word class to include a URL and definition
- Modularizing the code so that word and dictionary are in separate files. Should really clean it up even more too.
- Now I can import the word class into my urbandictionary test code, etc. Some bugs going on... fix it tomorrow :)
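A rough sketch of the Word class plus a definition scraper, under heavy assumptions: the `<div class="definition">` markup, the example.com URL, and the names `FirstDefinitionParser` and `parse_first_definition` are all hypothetical, and the real urbandictionary.com HTML is almost certainly different. It only uses the standard library's `html.parser` to show the general idea of grabbing the first definition block.

```python
from html.parser import HTMLParser


class Word:
    """A word with the page it came from and its definition."""

    def __init__(self, text, url="", definition=""):
        self.text = text
        self.url = url
        self.definition = definition


class FirstDefinitionParser(HTMLParser):
    """Collect the text inside the first <div class="definition"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside the target div
        self.done = False   # True once the first definition has been read
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1  # track nested divs so we stop at the right close
        elif tag == "div" and ("class", "definition") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def parse_first_definition(html):
    parser = FirstDefinitionParser()
    parser.feed(html)
    return "".join(parser.parts).strip()


# Made-up page content, only to exercise the parser.
SAMPLE = ('<div class="word">yolo</div>'
          '<div class="definition">You only <a href="#">live</a> once.</div>'
          '<div class="definition">second entry</div>')
word = Word("yolo", url="http://example.com/yolo",
            definition=parse_first_definition(SAMPLE))
```

Keeping `Word` in its own file means the scraper only needs to import it, which matches the modularizing above.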
Wednesday, June 27
- Found some interesting things:
- Corpus: http://webscope.sandbox.yahoo.com/catalog.php?datatype=l
- Papers on urban dictionary and language stuff (still have to read them thoroughly)
- Wrote run-length encoding function
- Gotta work more this weekend to make up for today.
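For reference, a minimal sketch of what a run-length encoding function can look like. This is not necessarily the version written today; it is just the standard technique using `itertools.groupby`, with the function names chosen for illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of repeated characters into a (char, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original string from (char, count) pairs."""
    return "".join(ch * count for ch, count in pairs)
```

For example, `rle_encode("sooo")` gives `[('s', 1), ('o', 3)]`, and decoding that list returns `"sooo"`.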
Tuesday, June 26
- GRE class woohoo
- Worked on testing the SentiWordNet dictionary for sentiment analysis (right now it's a list; I've been trying to change it to a dictionary, but because it's only doing one at a time, I don't think it really matters, since I'm not changing the WordNet data structure).
- Went to brown bag lunch at Blocker. Learned about getting funding for grad school and ate free food mmm
- Met with Dr. Hammond (notes)
- Power went out =O
- Typed up the notes from the meeting with Nick (here)
- Testing profane words in SentiWordNet. It looks like some are not given negative scores... which is bad.
- Fixed the bullying state plugin so it's just state and the enum is correct, and updated the DB.
- Fixed the cyberbullying check plugin so it's just last_id_checked (too vague?) and doesn't include time.
- Removed time from the Python program
- Updated the main program to use SentiWordNet
- Testing it and it is really bad... I have to read through the dissertation to see if there's a better way than just adding up the negative/positive scores, because right now it's not very accurate. Also, wishy-washy words like "guess" or "maybe" come up negative.
- Creating my own dictionary may be a better idea.
- Cleaned up the Python code a bit.
- Updated my avatar. It's so cute! ^-^ Check it out on the about page.
- Looked at urbandictionary.com HTML code to figure out how I'm going to scrape the site.
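To illustrate the "just adding up the negative/positive scores" problem from the testing above, here is a small sketch with a toy lexicon. The scores in `TOY_LEXICON` are invented for the example and are not real SentiWordNet values, and the function names are hypothetical. It shows how a hedging word with a small negative score is enough to tip an otherwise neutral message to negative.

```python
# Hypothetical (positive, negative) scores; real SentiWordNet values differ.
TOY_LEXICON = {
    "love": (0.625, 0.0),
    "hate": (0.0, 0.75),
    "guess": (0.0, 0.125),
    "maybe": (0.0, 0.125),
}


def score_message(message, lexicon=TOY_LEXICON):
    """Sum positive minus negative scores over every word in the message."""
    pos = neg = 0.0
    for token in message.lower().split():
        word = token.strip(".,!?\"'")
        p, n = lexicon.get(word, (0.0, 0.0))  # unknown words count as neutral
        pos += p
        neg += n
    return pos - neg


def label(message, lexicon=TOY_LEXICON):
    """Turn the summed score into a coarse label."""
    score = score_message(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these made-up scores, `label("I guess so")` comes out "negative" even though nothing mean was said. A threshold below zero, or a custom dictionary, might be worth trying.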
Monday, June 25
- Updated my pictures a bit, 4 new ones :)
- Fixed the problem with the plugin from Friday. For whatever reason, the table name just needed to be wp_bp_activity and not have the prefix.
- The plugin also sets the default value to 'none' and sets all of the statuses that were already in there to 'none'.
- Right now bullying is an enum with the four statuses, but I need to talk to Stephanie about exactly how we want it to work in the future, or if there should be different statuses. Also, should I stop calling the variable 'bullying', since we are detecting general negative sentiment as well?
- Updated my Python program so that if it finds bullying/negative sentiment, it updates the bullying status of the message to "needs_review".
- Created dictionary of the profane words from the website (wrote scripts to edit them correctly)
- Created hash table (2 deep) for the dictionary
- Implemented detection with SentiWordNet in another program.
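A sketch of one way the two-deep hash table and the status update could fit together. The layout (first letter, then word) is only a guess at what "2 deep" means here, the word list is a placeholder, and `build_index`, `contains_flagged`, and `review_status` are hypothetical names. The database write is left out; the function only returns the status string.

```python
def build_index(words):
    """Build a two-level dict: first letter -> {word: True}. Layout is an assumption."""
    index = {}
    for word in words:
        word = word.lower()
        if word:
            index.setdefault(word[0], {})[word] = True
    return index


def contains_flagged(message, index):
    """Return True if any word of the message is in the index."""
    for token in message.lower().split():
        word = token.strip(".,!?\"'")
        if word and word in index.get(word[0], {}):
            return True
    return False


def review_status(message, index):
    """Pick the status to store for a message: "needs_review" or the default "none"."""
    return "needs_review" if contains_flagged(message, index) else "none"


# Placeholder list; the real one comes from the profane-words website.
index = build_index(["Dumb", "jerk", "dork"])
```

Here `review_status("You are so dumb!", index)` returns "needs_review", while a message with no listed words keeps the default "none".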