I am currently working in a research group that focuses on scientific workflow management in distributed environments. The group's main project is the Pegasus workflow management system (http://pegasus.isi.edu). Pegasus has been used in several NSF Information Technology Research projects, including the Grid Physics Network (GriPhyN), the National Virtual Observatory, and the Southern California Earthquake Center (SCEC) Community Modeling Environment. Using Pegasus, earthquake scientists are able to generate more accurate hazard maps that civil engineers can use to design new construction in earthquake-prone areas. Astronomers use Pegasus to generate large-scale (6 and 10 square degree), science-grade mosaics of the sky that allow them to see structures not observed before. Gravitational-wave physicists use Pegasus to run sophisticated analyses in the hope of finding gravitational waves. Neuroscientists use it to analyze 3D images of the brain in the hope of understanding complex brain functions.
I am also exploring the new “cloud” technologies and how they can be used as an execution environment for a workflow system. As part of this work I will develop an interface between the Pegasus workflow management system and cloud resources, and I will characterize how workflows perform when executed in such an environment.
I will be learning about workflow technologies and the challenges of managing data and computations on distributed resources. I will also help define a workflow for a new application from the bioinformatics domain. This will involve familiarizing myself with the Pegasus software and the input format it needs, as well as understanding how the application is set up and how it needs to be represented as a workflow.
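To make "represented as a workflow" a little more concrete: a workflow of this kind can be thought of as a directed acyclic graph in which each job runs only after the jobs it depends on have finished. The sketch below is only an illustration of that idea in plain Python; the job names are hypothetical, and it does not use the Pegasus input format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs for illustration only.
# Each key is a job; its value is the set of jobs it depends on.
dependencies = {
    "split_input": set(),
    "align_chunk_1": {"split_input"},
    "align_chunk_2": {"split_input"},
    "merge_results": {"align_chunk_1", "align_chunk_2"},
}


def execution_order(deps):
    """Return one valid order in which the jobs could be run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two alignment jobs do not depend on each other, so a workflow system is free to run them in parallel on different resources; only the split and merge steps are forced to come first and last.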
I have immersed myself in Grid technology and the Pegasus project by reading as much as I possibly could about them. Just when I thought I had read all I could find, my mentor sent me links to more papers. I will start work on a project next week dealing with sequencing genomes.
The genome project is being put on hold until we can find a time to meet with the biologists. In the meantime, I will be trying to run workflows on the Nimbus Cloud (http://workspace.globus.org/clouds/) for the Montage astronomy application, which delivers science-grade mosaics of the sky. I have been doing a lot more reading, and I have downloaded and installed the workspace cloud client from Nimbus, as well as the Pegasus software.
I have fired up my first virtual machine, but I am having trouble getting workflows to run. This week I installed GridFTP from the Globus toolkit, GRAM pre-WMS server, Condor, xinetd, and the Pegasus worker packages on a virtual machine (VM) created from the image Globus-002 (from the cloud client workspace). The virtual image Globus-001 did not have enough space, so I consulted Tim Freeman of the University of Chicago, who created a new image that would suit more needs. Each VM runs for a finite amount of time, which the user specifies at deployment. If any changes have been made, a copy of the VM image must be saved before the VM terminates. A simple certificate authority was also created on the virtual machine this week.
I tried to run the Pegasus tutorial, which uses a simple Montage workflow, on one virtual machine, but I ran into problems with the format of the site catalog, replica catalog, and transformation catalog that Pegasus uses to plan workflows, as well as with the setup of a cluster. By the end of the week I had the tutorial running on one, two, three, and four virtual machines, so I should be able to start running workflows for Montage next week.
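For anyone unfamiliar with the three catalogs, here is a rough sketch of the role each one plays during planning: the site catalog describes where jobs can run, the replica catalog says where input files physically live, and the transformation catalog says where each executable is installed. This is a simplified model using plain Python dictionaries, not the actual Pegasus catalog formats, and the site name, paths, and URL are made up for illustration.

```python
# Simplified stand-ins for the three catalogs (all entries are hypothetical;
# the real catalogs are files in Pegasus's own formats).
SITE_CATALOG = {"cloud_vm": {"work_dir": "/tmp/pegasus-work"}}
REPLICA_CATALOG = {"region.fits": "gsiftp://example.org/data/region.fits"}
TRANSFORMATION_CATALOG = {("mProject", "cloud_vm"): "/opt/montage/bin/mProject"}


def plan_job(transformation, input_files, site):
    """Resolve one abstract job into a concrete one, or raise LookupError."""
    if site not in SITE_CATALOG:
        raise LookupError(f"site {site!r} is missing from the site catalog")
    key = (transformation, site)
    if key not in TRANSFORMATION_CATALOG:
        raise LookupError(f"{transformation!r} is not installed on {site!r}")
    missing = [name for name in input_files if name not in REPLICA_CATALOG]
    if missing:
        raise LookupError(f"no replica registered for {missing}")
    return {
        "executable": TRANSFORMATION_CATALOG[key],
        "inputs": [REPLICA_CATALOG[name] for name in input_files],
        "work_dir": SITE_CATALOG[site]["work_dir"],
    }


job = plan_job("mProject", ["region.fits"], "cloud_vm")
print(job)
```

In this toy model, an entry that is missing or malformed in any one catalog is enough to stop a job from being planned, which is roughly the kind of problem I kept hitting.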
I got the virtual cluster working and was able to do many more runs to get better numbers for the paper. Now that I am home, all I have left to do is edit the paper and then wait to see whether it gets accepted.
I finished the paper, and now I am waiting to hear whether it got accepted to the SWBES workshop. Here is a link to my draft of the paper.