|
|
This page contains a week-by-week description of the progress of my project. For my final report, see the Final Report page. Week OneMy first week this summer was spent reading Dr. Brewster's Dissertation on Evolve IV. I officially received Evolve IV on our second encounter and spoke with Dr. Brockmeyer and Dr. Brewster about the details project. Dr. Brewster discussed possible performance optimization strategies on Evolve IV. We discussed lex and yacc and Dr. Brockmeyer loaned me her O'Reilly lex & yacc book. I thought that my summer would be spent learning lex and yacc. I immediately unzipped the files to get my first glimpse at Evolve IV. I was given some initial advice about using RCS (Revision Control System) and the check in and out procedures. My initial response at looking at Evolve IV was "Oh, my gosh, I have never worked on a program this big. Fifty-six files in total." It felt pretty overwhelming. I had no idea where I was going to start. My initial idea was to start looking at all the header files to see what files were interconnected. I decided to print out all the header files and look at them. I then expected that I spend hours figuring out what Evolve IV was doing. How was I going to optimize Evolve IV without knowing all the Chemistry, and Biology that was behind this whole project? After several discussions about this with a computer "geek", I realized that I was not supposed to understand all of Evolve IV, but my goal was to try to find ways to optimize Evolve IV. My first experience with Evolve IV was when I unzipped it and decided to compile it. I was unable to get it to compile. I went directly to the Makefile to see what might have been wrong. I was unable to understand the Makefile at all. It was the most complicated Makefile that I had ever seen. At my weekly meeting with Dr. Brockmeyer, I explained that I was having difficulty getting Evolve IV to compile and that I did not understand the make file at all. Dr. Brockmeyer provided me with a copy of a book on make. It was helpful, and after more reading and mentoring I was able to understand where this was going. Still, it would not compile. I went to see Dr. Brewster, who forgot to give me one crucial file, configdefs.sh. I got the file and ran make, and voila, Evolve IV was running. Dr. Brewster suggested that I use gprof, a tool used to capture profile trace data from program execution and list total execution times and calls for each function sorted by decreasing time. I met with Dr. Brockmeyer's Research Group. The group is comprised of several students that I have had classes with that are now pursuing their Masters Degrees and several TA's that are either pursuing their Masters or PhD. Dr. Brockmeyer introduced all of the students and each one explained what their research entailed for the summer. She then gave us several handouts and discussed the topic of summarizing articles for future use for writing a thesis. Dr. Brockmeyer also stated that several students would present information on their research or on given topics during our informal meetings throughout the summer. She told us that furniture for the new lab had been ordered and our desks would be arriving shortly. Week TwoThis week I spent time reading about gprof, prof and profil, as well as RCS. I tried playing with RCS and found a handout on the web that gave step-by-step instructions on how to use RCS. I compiled and ran Evolve IV overnight and realized that Evolve would continue running forever if I did not kill it. I then ran my first gprof against it. I spent time varying execution times from 11 seconds to 30 minutes to determine if the data I was seeing through gprof would remain constant and predictable. I analyzed as much of the data from the different time intervals and concluded that the data pointed to vector.h. (The following information is taken from prof11sec on thalia). Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 35.91 3.71 3.71 76634493 0.05 0.08 Vector<float>::operator[](int) const 19.85 5.76 2.05 76634702 0.03 0.03 Vector<float>::rangecheck(int) const 6.87 6.47 0.71 390391 1.82 1.82 Vector<float>::Vector(Vector<float> const &) The information that I obtained by using gprof indicated that the Vector index operator was clearly the top function that was being called. After looking at the vector.h file I decided that I would attempt to implement the Standard Template Library vector to see what effect it would have on execution time. I also read the paper on Ocean Store that Chunbo Chu will present next week. http://www.cs.berkeley.edu/~ravenben/publications/pdf/ieeeic. The furniture that Dr. Brockmeyer had ordered arrived at the lab this week. I will spend some time next week setting my computer up at my new desk. I have been working from home and from the computer science lab telneting to thalia. Week ThreeThis week I spent working on the vector.h file to use Standard Template Library vectors, in the belief that STL containers would be optimized. I used my Standard Template Library book, STL Programming from the Ground Up, by Herbert Schildt, along with looking at the /usr/include/g++-2/stl_vector.h source file to determine how it expected vector.h to be called. I tried to call the Standard Template Library vector as a base class to the vector.h constructors. This caused me numerous problems when compiling Evolve IV. I struggled with this idea for the entire week. I met with the research group this week and Chubo Chu gave an informal discussion on Ocean Store. Week Fourvector.h still continued to be a source of difficulty. I continued my effort with the Standard Template Library vector. I had little success in working this out. At the end of week four, I backed out the changes that I had made on the constructors. I continued my efforts with the vector index operator. The STL header file stl_vector.h prompted me to change the index operator from T& operator[] (int i) { return data[i]; } to const T& operator[] (int i) const { return *(data + i); } because I thought the pointer arithmetic might be more efficient than the equivalent array index reference. This had a negligible effect on performance. At the weekly research group meeting, Anne-Marie Bosneag gave an informal presentation of her prospectus. Week FiveI ran Evolve for three and four days to verify that the data that I collected remained consistent. This following is taken from 4daybaseline on thalia: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 13.95 5851.29 5851.29 53985569 0.11 0.11 Vector<float>::Vector(Vector<float> const &) 12.78 11211.93 5360.64 3078445849 0.00 0.00 Vector<float>::operator[](int) const 12.13 16299.89 5087.96 1918455585 0.00 0.12 Material::operator-(Material const &) const 10.08 20529.33 4229.44 3946342941 0.00 0.00 Material::Material(float) 9.83 24655.62 4126.29 1386188969 0.00 0.00 Reaction::equilibriumP(Material &) 9.36 28580.99 3925.37 1906634069 0.00 0.12 Material::operator*(double) const 6.58 31343.95 2762.96 3078446058 0.00 0.00 Vector<float>::rangecheck(int) const After my changes in the vector index operator, I started looking at the next contenders. There did not seem to be any opportunity for additional improvement in the top few routines. After looking at Material operator- and Material::Material, I decided to look at Reaction::EquilibriumP. This routine, in material.h, contained the code fragment: if((*this)[m] > 0.0) fwd *= pow(conc(milieu[m]), ((*this)[m])); else if((*this)[m] < 0.0) rev *= pow(conc(milieu[m]), (-(*this)[m])); I wondered if the compiler would have any easier time doing logarithmic math instead of the power function. After some research in this area, and a conversation with Dr. Brewster, I decided not to pursue the option, as C++ handled the math efficiently. Dan Frank gave an informal presentation on NS2, the network simulator. Week SixAlthough last week's work on Reaction::EquilibriumP did not lead to any promising results, I looked at the conversions that were happening between the float and the double in this function. It appeared that there may have been some wasted time on the conversion from floats to doubles. I changed the typedef float Material_t; in defs.h to be a double instead of a float. At this point, I also noted that there was no optimization turned on at the compiler level! With help from man pages, I was able to test out several of the options and ran gprof again to find that the profile had changed completely. The following is from 1000loops on thalia: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 52.77 230.37 230.37 7721011 29.84 43.01 Reaction::react(double, Material &) 23.30 332.10 101.73 21149483 4.81 4.81 Reaction::equilibriumP(Material &) 21.23 424.78 92.68 368941 251.21 1154.43 Organism::live(void) 1.18 429.92 5.14 283037 18.16 18.16 Organism::calculateEfficiencies(void) 0.50 432.12 2.20 36676 59.98 59.98 Material::totalDecompose(void) At least I saw the fruits of my labor this week. I have some reading for the Research group. Dr. Brockmeyer will have an informal presentation of some papers on pervasive and ubiquitous (even disposable) computing. http://www.ubiq.com/hypertext/weiser/SciAmDraft3.html
Week SevenI spent the rest of this week working on the verification that I was getting accurate run data. I would need to make sure that I could run Evolve IV in a manner that would produce consistent results. I made an adjustment to random.C, more specifically: z1_Rnd::z1_Rnd( long s ){ seed = -20010; instead of initializing seed to the parameter s to ensure that program execution would no longer be random. I started running Evolve IV wrapped in the UNIX time command so that I could obtain overall execution statistics in the log: %Uuser %Ssystem %Eelapsed %PCPU (%Xtext+%Ddata %Mmax)k %Iinputs+%Ooutputs (%Fmajor+%Rminor)pagefaults %Wswaps This gave me information on overall program performance, in addition to the relative breakdown provided by gprof. I passed the year option 0.019, which was determined empirically, so that Evolve IV would loop approximately 1000 times in a reasonable amount of elapsed time. (time ./evolve -c L11 -t86400 -y0.019) I set up separate baseline and faster (my optimistic naming convention :-) ) directories. This is where I stored all run logs, so that I could compare them. I thought that this was good house keeping. 278.90user 0.82system 4:41.45elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (240major+1720minor)pagefaults 0swaps Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 44.66 90.04 90.04 7584610 0.01 0.02 Reaction::react(double, Material &) 29.35 149.20 59.16 361224 0.16 0.53 Organism::live(void) 21.16 191.85 42.65 20654354 0.00 0.00 Reaction::equilibriumP(Material &) 209.41user 1.00system 3:31.74elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (238major+2394minor)pagefaults 0swaps Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 36.26 47.77 47.77 7609126 6.28 10.79 Reaction::react(double, Material &) 31.10 88.75 40.98 362445 113.07 341.11 Organism::live(void) 26.08 123.11 34.36 20561649 1.67 1.67 Reaction::equilibriumP(Material &) I was very pleased with 26% run time reduction for my float to double change! The papers were very interesting. I enjoyed the discussion our group had on this topic. Week EightOrganism::live() is still biggest time per call, but this is a very large routine. Investigated gprof options to get CPU utilization breakdown by source line within the routine. No satisfying results. I met with Dr. Brewster to discuss this routine which did not give me any promising prospects. I tried running gprof with -l option to enable line-by-line profiling g++2.prof on buzz. Although the information was interesting it did not prove useful in my search for modifications. I tried several more combinations of options to see if I could obtain useful information, with no further results. I found that a new release of the GCC/G++ compiler had come out with notes documenting performance enhancements. I upgraded the compiler and found that library changes required a number of relatively minor editorial changes throughout the program so that it would compile successfully. However, gprof indicated the recompiled code was running slightly slower g++3.1.log and g++3.1.prof. I thought that this was strange. 221.68user 1.61system 3:49.68elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (299major+2403minor)pagefaults 0swaps Each sample counts as 0.01 seconds.% cumulative self self total time seconds seconds calls ms/call ms/call name 36.71 56.97 56.97 7609126 0.01 0.01 Reaction::react(double, Material&) 21.71 90.66 33.69 362445 0.09 0.40 Organism::live() 17.96 118.53 27.87 20561465 0.00 0.00 Reaction::equilibriumP(Material&) During my weekly meeting with Dr. Brockmeyer and Dr. Brewster I shared my findings. I did not have any other ideas of what to modify next. I was unable to find anything to fix. I contacted Dr. Brewster asking for any suggestions. In my weekly meeting with Dr. Brockmeyer I shared my frustration, a which point she made a suggestion on some reading I might do. I went to a bookstore with a friend that weekend with no real intention of looking at books on the subject. While waiting for my friend, I wandered over to the Computer section of the store to find a book called Efficient C++ Performance Programming Techniques by Dov Bulka and David Mayhew. I opened it and looked at the chapter on Single Threaded Memory Pooling. It seemed that this might help me get past my slump. Dr. Brockmeyer at the same time was out hunting for some reading for me as well. She obtained Effective C++: 85 Specific Ways to Improve Your Programs and Designs by Scott Meyers. I now had reading to do in with a new avenue of optimization to pursue. Our UPS units arrived for our computers this week at the computer lab. Week NineI spent much of week nine reading Efficient C++ as well as the Effective C++ book looking for more possibilities on where more optimization might be obtained. I spent the entire week on this. Week TenI performed some experiments on Evolve IV to see how much memory the vector constructors were allocating. I added cerr statements that traced which constructor was being called and the space being allocated (or released by the destructor). Although, I saw that several of the constructors were being called, v2.log showed the size remained the same, 336 bytes. I met with Dr. Brockmeyer and Dr. Brewster to discuss my findings. Dr. Brewster confirmed my results and stated that Evolve IV would always allocate that same amount no matter the duration of the run. This allowed me to focus on implementation of an optimized fixed-size memory pool for vector objects. The information that I obtained from the Efficient C++ book explained that the global new() and delete() functions are very expensive functions, the flexibility that they provide comes at the expense of speed. It recommended to maintain a static linked list, when needed, memory would be obtained from the free list. Once finished with it, it would be returned to the free list for future allocations. This week along with the above efforts, I started on the implementation of fixed-size memory pool. Week ElevenI continued my efforts with the memory allocation implementation. I was extremely pleased when I finally got my newly implemented memory pool allocation. It compiled successfully without error. I immediately looked at top to see if Evolve IV was running. It was not. I looked at the 1kalloc.log file to determine that I had encountered a problem with peptidestring.C. While working with a Linux compiler sometimes has left me wishing for the Windows versions (Code Warrior or Borland C++), I tried the ddd debugger. I worked on getting that installed on buzz and started learning to work with it. I really started to feel that I was able to get more meaningful error messages from a UNIX debugger!! What a thrill. UNIX compilers have never been as helpful as their Intel counterparts until now. p failed at: assert(q <= &(pstring[length])); where the segmentation fault occurred. I found that not all vectors are created equal. More work on my memory allocation was necessary. After examination, I determined that the peptide was derived from vector and chromosome and was too big to fit in the size vector I had allocated. I made new and delete virtual in hopes of providing peptide-specific versions, but that did not work; it was not allowed. Week TwelveI had the opportunity to run Evolve IV on wombat, a system that is used by a System Administrator at work for internal projects. A computer geek extraordinaire set me up an id to run my project on this system. I spent several hours, after work, making minor editorial changes need when attempting to compile on a new platform. I chose the alpha architecture because I had heard that the math capabilities have been said to be superior to the Intel chip. alpha1.prof and alpha2.log indicated that it actually ran slower than the Intel baseline execution time. alpha1.prof on wombat: granularity: each sample hit covers 8 byte(s) for 0.00% of 463.37 seconds % cumulative self self total time seconds seconds calls ms/call ms/call name 44.3 205.38 205.38 7872789 0.03 0.04 _ZN8Reaction5reactEdR8Material [4] 22.3 308.94 103.56 374347 0.28 1.21 _ZN8Organism4liveEv [3] 17.2 388.74 79.79 21511360 0.00 0.00 _ZN8Reaction12equilibriumPER8Material [5] 7.9 425.39 36.65 42781902 0.00 0.00 _ZN6VectorIfEC2Eif [6] real 581.5 user 578.9 sys 2.9 I rewrote my memory allocation routine to allocate blocks big enough for all derived vector classes. The new size was determined empirically by increasing the size of pool blocks until no allocation failures occurred. There was no detectable improvement in performance. However, I had not changed the process for allocating the data array created for each vector by its constructor. Week ThirteenI finished rewriting allocations on buzz. I hard coded the size to be a fixed size with the newdata and deletedata functions to take care of allocation of the data to which the vector points. This showed only slight improvement memimprov1.log and memimprov1.prof shows a 16% improvement in execution time. I ran this again to verify that my data was repeatable. memimprov2.log and memimprov2.prof showed that it was repeatable. 204.83user 3.16system 3:32.69elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (298major+2754minor)pagefaults 0swaps Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 41.20 56.24 56.24 7609126 0.01 0.01 Reaction::react(double, Material&) 20.48 84.20 27.96 20561465 0.00 0.00 Reaction::equilibriumP(Material&) 11.04 99.27 15.07 362445 0.04 0.35 Organism::live() 172.56user 1.55system 2:55.62elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (299major+2757minor)pagefaults 0swaps Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 41.23 55.20 55.20 7609126 0.01 0.01 Reaction::react(double, Material&) 19.55 81.37 26.17 20561465 0.00 0.00 Reaction::equilibriumP(Material&) 11.19 96.35 14.98 40775834 0.00 0.00 Vector<double>::Vector[not-in-charge](int, double) 10.62 110.57 14.22 362445 0.04 0.34 Organism::live() This shows a 16% reduction in execution time. Yeah! Week FourteenI learned why programmers hate writing documentation. Website construction has been a challenge. Gathering all data and formatting it has been a painstaking process.
|
|