CRA-DMP Experience

Shoshana Neuburger
CRA DMP Experience
Summer 2003

Final Report
Progress Report

My Research Journal:

Week 1
To begin my research project, I did a lot of reading on random number generators. Once I understood the usefulness and functionality of random number generators, I began reading about statistical tests for them. I wrote the simple Equidistribution Test and applied it first to Java's built-in random number generator and then to the shuffled nested Weyl generator, the focus of my research project. I was also introduced to gnuplot, a useful tool for plotting the results of any test.
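A minimal sketch of an equidistribution count, assuming the test divides [0, 1) into equal-width bins and tallies how many outputs land in each. The class name, method name, seed, and bin count are hypothetical and are not taken from the project code.

```java
import java.util.Random;

// Hypothetical sketch: tally generator outputs into equal-width bins of [0, 1).
class EquidistributionSketch {

    // Returns the number of samples that fell into each of `bins` intervals.
    static int[] binCounts(Random rng, int samples, int bins) {
        int[] counts = new int[bins];
        for (int i = 0; i < samples; i++) {
            double u = rng.nextDouble();                     // uniform in [0, 1)
            int index = Math.min(bins - 1, (int) (u * bins)); // guard against rounding up
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int samples = 100_000, bins = 10;                    // illustrative values only
        int[] counts = binCounts(new Random(2003L), samples, bins);
        for (int b = 0; b < bins; b++) {
            // Two columns (bin index, count), a layout gnuplot can read directly.
            System.out.println(b + " " + counts[b]);
        }
    }
}
```

For a well-behaved generator each count should be close to samples / bins. If the output is redirected to a file such as counts.dat, gnuplot can draw it with: plot "counts.dat" with boxes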

Week 2
My work this week went much more smoothly, since I now have accounts on the computer systems and am getting accustomed to programming in the Unix environment, which was a new experience for me. This week I programmed more significant tests of the random number generator: the chi-square test and the Kolmogorov-Smirnov (KS) test. My KS test was based on the writings of D. E. Knuth. I attended my first weekly meeting with faculty members and other students actively involved in computer science research this summer. It was great to see and hear what other students are doing. I hope to gain lots of useful knowledge from these meetings.
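The two statistics named above can be sketched as follows. This assumes equal expected counts per bin for the chi-square test and, for the KS test, a comparison against the uniform distribution on [0, 1), using the K+ and K- statistics in the form Knuth gives them. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the chi-square and Kolmogorov-Smirnov statistics.
class GoodnessOfFitSketch {

    // Chi-square statistic for observed bin counts, assuming every bin is equally likely.
    static double chiSquare(int[] counts) {
        long n = 0;
        for (int c : counts) n += c;
        double expected = (double) n / counts.length;
        double v = 0.0;
        for (int c : counts) {
            double d = c - expected;
            v += d * d / expected;
        }
        return v;
    }

    // Returns {K+, K-} for a sample tested against the uniform CDF F(x) = x:
    //   K+ = sqrt(n) * max_j (j/n - F(x_j)),  K- = sqrt(n) * max_j (F(x_j) - (j-1)/n),
    // where x_1 <= ... <= x_n is the sorted sample.
    static double[] ksStatistics(double[] sample) {
        double[] x = sample.clone();
        Arrays.sort(x);
        int n = x.length;
        double maxPlus = 0.0, maxMinus = 0.0;
        for (int j = 1; j <= n; j++) {
            double f = x[j - 1];                             // F(x_j) for the uniform case
            maxPlus = Math.max(maxPlus, (double) j / n - f);
            maxMinus = Math.max(maxMinus, f - (double) (j - 1) / n);
        }
        double root = Math.sqrt(n);
        return new double[] { root * maxPlus, root * maxMinus };
    }

    public static void main(String[] args) {
        System.out.println(chiSquare(new int[] { 12, 8, 9, 11 }));
        System.out.println(Arrays.toString(ksStatistics(new double[] { 0.1, 0.4, 0.7, 0.9 })));
    }
}
```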

Week 3
This week's programming work began with an attempt to perform the KS test on the results of several chi-square tests on different sets of numbers. This algorithm is quite complex and involves other functions, such as the gamma function. Instead of writing the gamma function on my own and "reinventing the wheel", I downloaded a Java implementation of the gamma function from the Internet and revised it to suit my purposes. The test did not produce the expected results, so I ended up rewriting the gamma function in case it was the culprit. This effort soon proved futile. Next, I programmed an alternate algorithm for the same test. Once these basic statistical tests worked properly, I was finally able to begin programming the tests of parallel number generation.
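The gamma function enters because a KS test on chi-square values needs the chi-square distribution function, which is a regularized incomplete gamma function. The sketch below is one way to compute it, not the downloaded or rewritten code described above. It assumes an integer number of degrees of freedom k, so that Gamma(k/2) follows from the recurrence Gamma(a+1) = a Gamma(a), and it uses the standard power series for the incomplete gamma function. All names are hypothetical.

```java
// Hypothetical sketch: chi-square CDF for integer degrees of freedom.
class ChiSquareCdfSketch {

    // ln Gamma(k/2) for a positive integer k, built up from
    // Gamma(1) = 1, Gamma(1/2) = sqrt(pi), and Gamma(a + 1) = a * Gamma(a).
    static double logGammaHalf(int k) {
        boolean even = (k % 2 == 0);
        double a = even ? 1.0 : 0.5;
        double lg = even ? 0.0 : 0.5 * Math.log(Math.PI);
        int steps = (k - (even ? 2 : 1)) / 2;
        for (int i = 0; i < steps; i++) {
            lg += Math.log(a);
            a += 1.0;
        }
        return lg;
    }

    // P(V <= v) for V chi-square distributed with k degrees of freedom, computed as
    // P(a, x) = x^a e^(-x) / Gamma(a) * sum_{n>=0} x^n / (a (a+1) ... (a+n)),
    // with a = k/2 and x = v/2.
    static double cdf(double v, int k) {
        if (v <= 0.0) return 0.0;
        double a = k / 2.0, x = v / 2.0;
        double term = 1.0 / a, sum = term, ap = a;
        for (int i = 0; i < 100_000; i++) {                  // iteration cap is arbitrary
            ap += 1.0;
            term *= x / ap;
            sum += term;
            if (Math.abs(term) < Math.abs(sum) * 1e-15) break;
        }
        return sum * Math.exp(-x + a * Math.log(x) - logGammaHalf(k));
    }

    public static void main(String[] args) {
        System.out.println(cdf(2.0, 2));                     // should equal 1 - exp(-1)
    }
}
```

The series converges for every x but becomes slow when x is much larger than a; implementations commonly switch to a continued fraction in that range.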

Week 4
This week began with frustration. Tests whose output was supposed to tend towards 0 went to infinity instead. Finally I realized that I had an integer overflow. After correcting this, I plotted my data and found that the graphs did not quite resemble the theoretical plot. I followed Dr. Whitlock's advice and plotted the average of the data produced by several sequences of numbers. One exciting discovery: after trying these tests on Java's generator, I realized that they produced different results on each run, unlike the generator I normally test. Now that I'm getting used to researching on the Internet, I found documentation explaining that a different seed value, based on the current time, is used each time. No wonder the same test gives different results a few seconds later! Next, I began generalizing the parallel tests, comparing three sequences of numbers drawn from the same number generator within a short span of time. Many points must be compared in each of these tests, so the processing time increased considerably.
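Both pitfalls from this week can be reproduced in a few lines. The journal does not say which quantity overflowed, so the squared counter below is only a hypothetical example; the seed values are arbitrary as well.

```java
import java.util.Random;

// Hypothetical sketch of two pitfalls: silent int overflow and time-based seeding.
class PitfallsSketch {

    // int arithmetic wraps around silently once the result exceeds 2^31 - 1.
    static int squareAsInt(int n) {
        return n * n;
    }

    // Widening to long before multiplying avoids the wrap-around.
    static long squareAsLong(int n) {
        return (long) n * n;
    }

    // Two java.util.Random objects built from equal seeds produce identical sequences.
    static boolean sameSequence(long seedA, long seedB, int n) {
        Random a = new Random(seedA), b = new Random(seedB);
        for (int i = 0; i < n; i++) {
            if (a.nextDouble() != b.nextDouble()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(squareAsInt(100_000));   // wrapped value, not 10^10
        System.out.println(squareAsLong(100_000));  // 10000000000
        System.out.println(sameSequence(42L, 42L, 100));
        // new Random() without an argument picks its own seed, so runs differ.
        System.out.println(new Random().nextDouble());
    }
}
```

Passing a fixed seed to the constructor makes a test on Java's generator repeatable from run to run.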

Week 5
By the time this week began, I had already produced enough basic tests. Because of this progress, I was able to generalize these tests to the case of three simultaneous sequences of numbers drawn from the same generator. Of course, I also had to modify the code that produces the theoretical results, so that I could compare the results in statistical tests. I realized that my code was a little messy, so I took some time to tidy it up and document it, so that it would be quite simple for someone else to run these tests. Perhaps someday someone will run these tests on a different generator!

Week 6
This week I used the code I wrote last week to run various parallel tests on several variations of the pseudorandom number generator I am testing. Whenever I obtained results outside the expected range of values, I spent time analyzing them and generating further data. I analyzed 3, 4, 5, 6, and 7 sequences of numbers to find correlations among the distinct sequences. I compared both pairs and triplets of numbers, both when the sequences are split before being distributed to processes and when the pairs are distributed to processes one after another.
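The two distribution schemes can be sketched as follows, under one reading of the description above: "split" is taken to mean that each process receives one contiguous block of the stream, and "one after another" that numbers are dealt to the processes in round-robin order. That reading, and every name in the code, is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch: two ways to hand one generator's stream to several processes.
class StreamSplitSketch {

    // Block splitting: process p receives elements p*len .. (p+1)*len - 1.
    static double[][] splitBlocks(double[] stream, int processes) {
        int len = stream.length / processes;
        double[][] out = new double[processes][];
        for (int p = 0; p < processes; p++) {
            out[p] = Arrays.copyOfRange(stream, p * len, (p + 1) * len);
        }
        return out;
    }

    // Round robin: element i goes to process i mod processes.
    static double[][] dealRoundRobin(double[] stream, int processes) {
        int len = stream.length / processes;
        double[][] out = new double[processes][len];
        for (int i = 0; i < len * processes; i++) {
            out[i % processes][i / processes] = stream[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] stream = new double[12];
        for (int i = 0; i < stream.length; i++) stream[i] = i;   // stand-in for generator output
        System.out.println(Arrays.deepToString(splitBlocks(stream, 3)));
        System.out.println(Arrays.deepToString(dealRoundRobin(stream, 3)));
    }
}
```

Pairs or triplets for a correlation test are then formed from the same position in different rows, for example (out[0][i], out[1][i]).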

Week 7
This week I coded another random number generator, the nested Weyl generator. I ran the same parallel tests on this generator and found many correlations between the sequences. This was expected, and I documented the results. I also began two more tests, to find the length of the period and the minimum distance between the points in many sequences. I read the online documentation on the java command and realized that I can increase the size of the memory allocation for any program. This is nice to know, since I had encountered "out of memory" errors in some of my programs that required many large arrays to be stored in memory. I still have to test the shuffled nested Weyl generator for its ability to generate 10 uncorrelated parallel sequences of numbers, since I hear that the generator is used in this fashion, perhaps inaccurately.
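The journal does not give the generator's formula. The sketch below assumes the commonly cited definition of the nested Weyl sequence, x_n = { n { n alpha } }, where { } denotes the fractional part and alpha is an irrational number. The choice alpha = sqrt(2) and all names are illustrative only, and the shuffled variant is not shown.

```java
// Hypothetical sketch of a nested Weyl sequence, x_n = frac(n * frac(n * alpha)).
class NestedWeylSketch {

    // Fractional part of a non-negative number.
    static double frac(double x) {
        return x - Math.floor(x);
    }

    // n-th term of the sequence under the assumed definition.
    static double nestedWeyl(long n, double alpha) {
        return frac(n * frac(n * alpha));
    }

    // Terms for n = 1 .. count.
    static double[] sequence(int count, double alpha) {
        double[] x = new double[count];
        for (int i = 0; i < count; i++) {
            x[i] = nestedWeyl(i + 1, alpha);
        }
        return x;
    }

    public static void main(String[] args) {
        double alpha = Math.sqrt(2.0);               // illustrative irrational constant
        for (double v : sequence(5, alpha)) {
            System.out.println(v);
        }
    }
}
```

Because the terms are computed in double precision, the low-order digits of n * alpha are lost as n grows, so a sketch like this is only suitable for modest sequence lengths.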

Week 8
This week I modified the tests for parallel generation so that they work on the nested Weyl generator I created. It seems that I actually coded the nested Weyl generator properly, since its results deviate significantly from the theoretically "good" results. We expected this because the nested Weyl generator has been documented to fail in parallel computing environments. I'm wondering whether my test that determines whether a generator is periodic or aperiodic actually works, because it tells me that every version of the shuffled nested Weyl and nested Weyl generators that I tested is aperiodic.

Week 9
This week I began by coding generators that have obvious periods. I was happy to see that my program that finds the period of a generator works properly on them. I also carried my test for parallel random number generation further. I counted how many numbers landed in each bin, but this time each two-dimensional bin covered a smaller range of numbers, since there were 20 or 50 bins in each dimension. I hope to finish creating, testing, and using my minimum-distance test next week.
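One simple way to check a period finder against a generator with an obvious period is sketched below. It is a brute-force search over the first terms of a sequence, not necessarily the method used in the project, and it uses a Java 8 lambda for brevity. All names and the example generators are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch: brute-force search for the period of an indexed sequence.
class PeriodSketch {

    // Returns the smallest p in 1 .. count/2 such that x[i] == x[i + p] for every
    // i with i + p < count, or -1 if no such p exists among the first `count` terms.
    static int findPeriod(IntToDoubleFunction generator, int count) {
        double[] x = new double[count];
        for (int i = 0; i < count; i++) {
            x[i] = generator.applyAsDouble(i);
        }
        for (int p = 1; p <= count / 2; p++) {
            boolean repeats = true;
            for (int i = 0; i + p < count; i++) {
                if (x[i] != x[i + p]) {
                    repeats = false;
                    break;
                }
            }
            if (repeats) return p;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A generator that obviously repeats every 7 terms.
        System.out.println(findPeriod(n -> (n % 7) / 7.0, 100));
        // A strictly increasing sequence has no period within the window.
        System.out.println(findPeriod(n -> n / 1000.0, 100));
    }
}
```

A result of -1 only means that no period of length up to count / 2 was observed; it does not prove that the generator is aperiodic.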

Week 10
I used this week to wrap up this research project and tie up any loose ends. I reviewed my progress with Dr. Whitlock. We discussed the results of my programs and further tests and variations I would develop were I to carry this project further. When we noticed gaps in the data, I clarified and completed the result set.