About Project: PubMed

PubMed Article Classification

PubMed is a large database of abstracts for medical articles, used primarily by doctors and other medical professionals. Because medical research is highly detailed, even the documents most relevant to a search may not apply to the searcher's particular case, and this may not be apparent from the abstract alone.

  • Final goal: Automatically generate a snippet from the article body so that a searcher can decide whether the article is relevant to their needs.
  • Summer goal: Use supervised machine learning to train a classifier that predicts which sentences are important enough to include in the snippet.
  • Learning goals:
    • How does a classifier see a document?
    • How else might it be represented?
    • How do we handle large data sets?
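
As one possible illustration of the first two learning goals, the sketch below shows a bag-of-words representation, a common way for a classifier to "see" text: each sentence becomes a vector of word counts. This is not the project's actual pipeline. The function names and the two example sentences are made up for illustration, and the sparse variant is just one alternative representation that may help when the vocabulary grows large.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_vocabulary(sentences):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(sentence, vocabulary):
    """Dense count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(sentence))
    return [counts.get(tok, 0) for tok in vocabulary]


def sparse_bag_of_words(sentence, vocabulary):
    """Sparse alternative: store only the non-zero columns as {index: count}."""
    counts = Counter(tokenize(sentence))
    return {vocabulary[tok]: n for tok, n in counts.items() if tok in vocabulary}


# Hypothetical example sentences, not taken from PubMed.
sentences = [
    "Aspirin reduced the risk of stroke.",
    "The risk of bleeding increased.",
]
vocabulary = build_vocabulary(sentences)
dense = [bag_of_words(s, vocabulary) for s in sentences]
sparse = [sparse_bag_of_words(s, vocabulary) for s in sentences]
```

Note that word order is lost in this representation: the classifier only sees which words occur and how often. With a vocabulary drawn from many articles, most entries of the dense vector would be zero, which is why the sparse form is often preferred for large data sets.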