Final Report

Evolve IV Accomplishments

	Instrumented the code for analysis so that every execution used an identical random seed.
	Changed Material_t from float to double to avoid unnecessary conversions.
	Rewrote vector memory allocation to avoid memory thrashing.
	Overall impact: a 38% reduction in elapsed execution time on buzz.
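The fixed-seed idea in the first item can be sketched as follows. This is only an illustration: the report does not say which random number generator Evolve IV uses, so std::mt19937 stands in for it, and kAnalysisSeed and drawSequence are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical fixed seed; the value used in Evolve IV is not given in this report.
constexpr std::uint32_t kAnalysisSeed = 12345u;

// Draws `count` raw values from a generator seeded with `seed`.
// Two calls with the same seed return the same sequence, which is
// what makes profiling runs comparable with one another.
std::vector<std::uint32_t> drawSequence(std::uint32_t seed, std::size_t count) {
    std::mt19937 gen(seed);  // stand-in generator (assumption)
    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(static_cast<std::uint32_t>(gen()));
    }
    return out;
}
```

Calling drawSequence(kAnalysisSeed, n) twice yields identical vectors, so any difference in timing between two runs comes from the code and not from the random inputs.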

Lessons Learned

I have learned so many things this summer.

I should have kept a running journal of all my prof and log files. My naming conventions helped me find them all, but a written log would have been helpful during the creation of this website.

Program comments matter. I have been lectured about this by professors and friends, and it is so true. It is impossible to know what the author was thinking unless there are ample comments. This is in no way a slight on the author of this program; I believe that the ten years he spent creating and implementing this project account for his familiarity with it.

Memory management is difficult. It is not something that I had worked with up to this point. Without the aid of my books and mentors I would have had no idea where to start.

I should have spent more time working with RCS. I know that it might have made some things easier for me in the long run, but I did not want to spend the effort on it when I felt that I had so much to accomplish with this project.

Research can be very rewarding; it can also be very frustrating at times. It has been a wonderful experience.

Mentors are all around us. I have said this previously but it was something that really came to light this summer.

Suggestions for future research

I would have liked to explore several suggestions that were discussed in the Efficient C++ book or raised by work performed this summer.

Redundant construction is cited as a simple but costly coding mistake: a constructor ends up constructing its contained objects twice. This could be easily explored. I spent so much of my time working on the single-threaded memory management that I did not go back and work on this.
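A minimal sketch of what redundant construction looks like, using hypothetical classes (Tracked, AssignInBody, InitList) that are not taken from Evolve IV. The counter makes the extra work visible.

```cpp
#include <cassert>

// Hypothetical member type that counts constructions and assignments.
struct Tracked {
    static int operations;
    double value;
    Tracked() : value(0.0) { ++operations; }
    explicit Tracked(double v) : value(v) { ++operations; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++operations;
        return *this;
    }
};
int Tracked::operations = 0;

// Redundant: `member` is default-constructed before the body runs,
// then a temporary is constructed and assigned over it.
struct AssignInBody {
    Tracked member;
    explicit AssignInBody(double v) { member = Tracked(v); }
};

// Direct: `member` is constructed exactly once in the initializer list.
struct InitList {
    Tracked member;
    explicit InitList(double v) : member(v) {}
};

// Returns how many Tracked operations building one T triggers.
template <typename T>
int countOperations(double v) {
    Tracked::operations = 0;
    T object(v);
    (void)object;  // silence unused-variable warnings
    return Tracked::operations;
}
```

Here countOperations<AssignInBody>(1.5) reports three operations (default construction, temporary construction, assignment) while countOperations<InitList>(1.5) reports one. Whether Evolve IV contains constructors of the first kind would still need to be checked.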

Return Value Optimization (RVO) may be of some interest. When a function returns an object by value, RVO can eliminate the local return value object, saving its construction as well as its destruction. I wanted to go back and determine whether any functions were returning class objects by value and, if so, consider implementing computational constructors.
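The difference between a named local result and a computational constructor can be sketched as below. The class Pair and the functions addNamed and addDirect are hypothetical and not part of Evolve IV; the tag type Sum exists only to select the private constructor.

```cpp
#include <cassert>

// Hypothetical value class; copies are counted so elision can be observed.
class Pair {
private:
    struct Sum {};  // tag selecting the computational constructor
    // Computational constructor: builds the sum directly in the new object.
    Pair(const Pair& a, const Pair& b, Sum)
        : x_(a.x_ + b.x_), y_(a.y_ + b.y_) {}
    double x_;
    double y_;

public:
    static int copies;
    Pair(double x, double y) : x_(x), y_(y) {}
    Pair(const Pair& other) : x_(other.x_), y_(other.y_) { ++copies; }
    double x() const { return x_; }
    double y() const { return y_; }

    // Named local result: the compiler may elide the copy of `result`,
    // but it is not required to.
    friend Pair addNamed(const Pair& a, const Pair& b) {
        Pair result(0.0, 0.0);
        result.x_ = a.x_ + b.x_;
        result.y_ = a.y_ + b.y_;
        return result;
    }

    // Returns an unnamed temporary built by the computational constructor,
    // which gives the compiler the easiest case for constructing the
    // result directly in the caller's object.
    friend Pair addDirect(const Pair& a, const Pair& b) {
        return Pair(a, b, Sum());
    }
};
int Pair::copies = 0;
```

With C++17 or later, `Pair s = addDirect(a, b);` is guaranteed to make no copy; older compilers usually elide it as well, but that is an optimization they are permitted to skip.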

Check other classes for dynamic memory allocation and rewrite them like vector.h.

Investigate Organism::live more carefully.

Running the same version of Evolve on different platforms does not yield identical results. Additionally, the optimized version does not produce output identical to the baseline version. While rounding differences probably account for these discrepancies, a more careful investigation to confirm this is recommended.
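One plausible contributor to the baseline versus optimized difference is the change of Material_t from float to double, although this has not been confirmed. The sketch below, with the hypothetical helper repeatedSum, shows that the same accumulation carried out in the two precisions produces different totals.

```cpp
#include <cassert>

// Adds `step` to a running total `count` times, entirely in type Real.
// Each addition rounds to the precision of Real, so the accumulated
// error depends on whether Real is float or double.
template <typename Real>
Real repeatedSum(Real step, long count) {
    Real total = 0;
    for (long i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}
```

Comparing repeatedSum<float>(0.1f, 1000000L) with repeatedSum<double>(0.1, 1000000L) gives two different values even though the arithmetic is nominally the same. A similar comparison of intermediate values inside Evolve IV would be one way to confirm or rule out rounding as the cause.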

Architectural Differences

At the completion of my work, I re-ran the baseline and latest optimized versions of Evolve on as many different platforms as I could.

buzz baseline (baseline1kloop.log and baseline1kloop.prof)

278.90user 0.82system 4:41.45elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (240major+1720minor)pagefaults 0swaps

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 44.66     90.04    90.04  7584610     0.01     0.02  Reaction::react(double, Material &)
 29.35    149.20    59.16   361224     0.16     0.53  Organism::live(void)
 21.16    191.85    42.65 20654354     0.00     0.00  Reaction::equilibriumP(Material &)
  2.63    197.15     5.30   277506     0.02     0.02  Organism::calculateEfficiencies(void)
  0.56    198.27     1.12    36673     0.03     0.03  Material::totalDecompose(void)

buzz optimized (memimprov2b.log and memimprov2b.prof)

This executed in 38% less elapsed time than the unoptimized version.

172.29user 1.61system 2:54.47elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (299major+2757minor)pagefaults 0swaps

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 39.97     52.97    52.97  7609126     0.01     0.01  Reaction::react(double, Material&)
 20.25     79.80    26.83 20561465     0.00     0.00  Reaction::equilibriumP(Material&)
 11.63     95.21    15.41 40775834     0.00     0.00  Vector<double>::Vector[not-in-charge](int, double)
 10.59    109.25    14.04   362445     0.04     0.34  Organism::live()
  4.87    115.70     6.45 12865914     0.00     0.00  Vector<double>::Vector[not-in-charge](Vector<double> const&)

sparky baseline (sparc-base.log and sparc-base.prof)

Note that the output of this run appears to be substantially different (longer) from that of the other runs.

real    16:19.5
user    15:42.1
sys        18.6

granularity: each sample hit covers 4 byte(s) for 0.00% of 930.26 seconds

   %  cumulative    self              self    total          
 time   seconds   seconds    calls  ms/call  ms/call name    
 43.9     407.92   407.92                            internal_mcount [3]
 16.6     562.03   154.11  7872789     0.02     0.04  _ZN8Reaction5reactEdR8Material [5]
  7.4     631.11    69.08   374347     0.18     1.08  _ZN8Organism4liveEv [4]
  6.9     695.63    64.52 21511360     0.00     0.00  _ZN8Reaction12equilibriumPER8Material [6]
  5.6     747.44    51.81 42783706     0.00     0.00  _ZN6VectorIfEC2Eif [8]

sparky optimized (sparc-fast.log and sparc-fast.prof)

Due to the differences in output, it does not appear reasonable to calculate a performance improvement for this platform.

real     2:07.9
user     1:26.6
sys        40.1

granularity: each sample hit covers 4 byte(s) for 0.01% of 126.04 seconds

   %  cumulative    self              self    total          
 time   seconds   seconds    calls  ms/call  ms/call name    
 31.6      39.87    39.87  1311804     0.03     0.06  _ZN8Reaction5reactEdR8Material [4]
 18.9      63.68    23.81                            internal_mcount [6]
 16.6      84.55    20.87  2733415     0.01     0.01  _ZN8Reaction12equilibriumPER8Material [5]
 10.0      97.11    12.56  9930066     0.00     0.00  __pow [7]
  5.8     104.46     7.35    50454     0.15     1.92  _ZN8Organism4liveEv [3]

wombat baseline (alpha-base.log and alpha-base.prof)

real   445.5
user   442.5
sys    3.1

granularity: each sample hit covers 8 byte(s) for 0.00% of 329.16 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 41.0     135.11   135.11  7872789     0.02     0.03  _ZN8Reaction5reactEdR8Material [4]
 20.4     202.19    67.08   374347     0.18     0.85  _ZN8Organism4liveEv [3]
 16.8     257.50    55.31 21511360     0.00     0.00  _ZN8Reaction12equilibriumPER8Material [5]
 11.2     294.27    36.78 42781902     0.00     0.00  _ZN6VectorIfEC2Eif [6]
  4.7     309.79    15.52 13300626     0.00     0.00  _ZN6VectorIfEC2ERKS0_ [7]

wombat optimized (alpha-fast256.log and alpha-fast256.prof)

Due to the increased size of pointers on this architecture, it was necessary to increase the size of the vector memory blocks from 128 to 256 bytes to avoid allocation failures. There was a 17% reduction in run time compared to the baseline run.
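The report does not show the rewritten allocator, so the following is only a generic sketch of a single-threaded pool with fixed-size blocks, in the spirit of the Efficient C++ book. The class BlockPool and its interface are hypothetical. It illustrates how a request that no longer fits in a 128-byte block fails, while the same request succeeds once the block size is 256 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical single-threaded pool handing out fixed-size blocks.
template <std::size_t BlockBytes>
class BlockPool {
    struct Node { Node* next; };  // link stored inside a free block
    static_assert(BlockBytes >= sizeof(Node),
                  "a block must be able to hold a free-list link");
    Node* freeList_;

public:
    BlockPool() : freeList_(nullptr) {}
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    // Frees every block that is currently on the free list.
    ~BlockPool() {
        while (freeList_ != nullptr) {
            Node* next = freeList_->next;
            ::operator delete(freeList_);
            freeList_ = next;
        }
    }

    // Returns a block of BlockBytes bytes, or nullptr if `bytes` cannot fit.
    void* allocate(std::size_t bytes) {
        if (bytes > BlockBytes) {
            return nullptr;  // request larger than one block: allocation failure
        }
        if (freeList_ != nullptr) {
            Node* block = freeList_;  // reuse a previously released block
            freeList_ = block->next;
            return block;
        }
        return ::operator new(BlockBytes);
    }

    // Puts a block back on the free list instead of returning it to the heap.
    void release(void* block) {
        freeList_ = new (block) Node{freeList_};
    }
};
```

In this sketch a 160-byte request returns nullptr from BlockPool<128> and succeeds with BlockPool<256>. Data that embeds pointers grows when pointers go from 4 to 8 bytes, which is one way a block size that was adequate on a 32-bit platform could become too small.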

real   368.0
user   364.9
sys    3.2

granularity: each sample hit covers 8 byte(s) for 0.00% of 285.35 seconds

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 45.0     128.38   128.38  7765707     0.02     0.03  _ZN8Reaction5reactEdR8Material [4]
 17.7     178.93    50.55 21180824     0.00     0.00  _ZN8Reaction12equilibriumPER8Material [5]
 12.2     213.88    34.95 42131613     0.00     0.00  _ZN6VectorIdEC2Eid [6]
 10.8     244.83    30.95   370487     0.08     0.74  _ZN8Organism4liveEv [3]
  6.0     262.01    17.18 13142367     0.00     0.00  _ZN6VectorIdEC2ERKS0_ [7]

thalia baseline (intel-base.log and intel-base.prof)

330.41user 0.35system 5:30.71elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (218major+1685minor)pagefaults 0swaps

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 45.98    123.92   123.92  7584610    16.34    25.08  Reaction::react(double, Material &)
 25.71    193.22    69.30   361224   191.85   720.92  Organism::live(void)
 24.61    259.55    66.33 20654354     3.21     3.21  Reaction::equilibriumP(Material &)
  1.90    264.67     5.12   277506    18.45    18.45  Organism::calculateEfficiencies(void)
  0.53    266.09     1.42    36673    38.72    38.72  Material::totalDecompose(void)

thalia optimized (intel-faster.log and intel-faster.prof)

253.18user 0.40system 4:13.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (262major+2725minor)pagefaults 0swaps

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 51.45    126.05   126.05  7609126    16.57    25.43  Reaction::react(double, Material &)
 27.54    193.52    67.47 20561649     3.28     3.28  Reaction::equilibriumP(Material &)
 16.65    234.31    40.79   362445   112.54   649.17  Organism::live(void)
  2.29    239.91     5.60   278418    20.11    20.11  Organism::calculateEfficiencies(void)
  0.50    241.14     1.23    36670    33.54    33.54  Material::totalDecompose(void)

There is a 23% reduction in execution time on the Intel Pentium II platform.
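The percentage reductions quoted for buzz, wombat, and thalia follow from the elapsed (real) times in the logs. A small sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Converts a minutes:seconds elapsed reading into seconds.
double toSeconds(int minutes, double seconds) {
    return 60.0 * minutes + seconds;
}

// Percentage reduction in elapsed time from `baseline` to `optimized` seconds.
double percentReduction(double baseline, double optimized) {
    return 100.0 * (baseline - optimized) / baseline;
}
```

With the elapsed times reported above, this gives about 38.0% for buzz (4:41.45 to 2:54.47), 17.4% for wombat (445.5 s to 368.0 s), and 23.2% for thalia (5:30.71 to 4:13.97).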

Final Thoughts

This was an interesting process. I enjoyed Dr. Brockmeyer's research group and the insight they provided on their ongoing work. I have known almost all of these students as undergraduates and have enjoyed getting a different view of their research. I found them all very welcoming. Dr. Brockmeyer has been a wonderful mentor. She is the only full-time woman in the Wayne State University Computer Science Department that I have had as a professor. I have found her to be a very encouraging and caring professor. I met Dr. Brewster for the first time this summer; he took time to talk with me when I had pressing questions about Evolve IV. I enjoyed working with him. I found Dr. Brewster to be very humble about this large complex project. I truly admire the creativity that went into his project. As he wished in his dissertation, he has provided inspiration, encouragement and enthusiasm to me.

Thank you for a wonderful experience.