This first week was spent installing virtual machines (VirtualBox is my favorite), uninstalling virtual machines because I didn't like how much room they took up on my computer, finding out about the Windows Subsystem for Linux, installing that, and then installing Argos through the Linux subsystem.
This was my second week on my own as both of my contacts were still at the conference, so I began to learn C++.
Usually when I learn a new language, I first learn a basic "hello world" program, and then program a simple game of rock-paper-scissors with a random (or extremely bad non-random) AI, just to make sure I know the language. This time, I decided to try to create a tic-tac-toe game instead of rock-paper-scissors, because I had most recently created a tic-tac-toe bot in Python, and thought it would be cool to see if I could create something that ran faster in C++.
While I had learned a little C in the past year, the learning curve for C++ was still steep. I switched from Visual Studio back to Atom because Visual Studio was too complicated to understand, but eventually settled on Visual Studio again once I got the hang of running things in it, despite the cryptic error messages and a difficult-to-understand GUI.
My Google history from this week is filled with searches like "initial value of reference to non-const must be an lvalue" and "set array via constructor parameter C++", and with a lot of visits to Stack Overflow.
In the end, this site was the most helpful because it had a full working minimal example of include guards and subclasses, and it helped me figure out a lot about how to structure my code.
I eventually did get a working tic-tac-toe game, where you could make moves on the board and it would determine who won, but I never implemented anything better than a completely random AI, and even that was difficult.
I've found I'm very used to Python taking care of low-level implementation details for me, and I'm not used to the feeling that I'm making bad low-level decisions by implementing something in three loops rather than one, or by using a datatype twice as big as the one I need. I started focusing on the minutiae instead of the overall program, and it really bogged me down.
Trying to learn C++ really made me appreciate Python more.
This was my first week at the U of M, meeting with my mentor and the graduate student I planned to work for. I talked to John Harwell and Professor Gini quite a bit during the first few days of this week, learning more about Argos and determining different ideas for projects I could work on.
When I was talking to John about his project, Foraging Robots Use Dynamic Caches (FORDYCA), he mentioned that one thing that would be very helpful would be a script that generated configuration files for the experiments he wanted to run and then ran all of the experiments on a supercomputer. Given my difficulty learning C++, I thought this would be a great project for me because I could implement it in Python.
So on Wednesday (June 6th), I started working on the project. We called it Sierra, just because the name sounded nice. (Later, I would come to think of it as saving a "mountain" of work.)
During the course of the week, I developed a class that allows you to easily edit XML files (since that is the format of Argos's configuration files). At this point, the class needed to be able to do four things: open XML files, change attributes within the XML file, remove elements from the XML file, and save the XML file.
So, I spent a lot of time reading the documentation for the ElementTree built-in Python package, which has a lot of nice, easy-to-use functions for editing XML files.
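To give a sense of how little ElementTree code those four operations take, here is a minimal sketch; it is not the real Sierra class, and the file names, XPaths, and attribute names are made up for illustration.

```python
# A stripped-down sketch of the four operations, using only the standard library.
# Not the actual Sierra class; paths, tags, and attributes here are illustrative.
import xml.etree.ElementTree as ET

class XMLHelper:
    def __init__(self, path):
        self.tree = ET.parse(path)          # 1. open the XML file
        self.root = self.tree.getroot()

    def set_attribute(self, xpath, attribute, value):
        for element in self.root.findall(xpath):
            element.set(attribute, value)   # 2. change an attribute

    def remove_element(self, parent_xpath, child_tag):
        for parent in self.root.findall(parent_xpath):
            for child in parent.findall(child_tag):
                parent.remove(child)        # 3. remove an element

    def save(self, path):
        self.tree.write(path)               # 4. save the (modified) XML file

# Example usage (made-up element and attribute names):
# helper = XMLHelper("experiment.argos")
# helper.set_attribute(".//experiment", "length", "1000")
# helper.save("experiment_modified.argos")
```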
Originally, it was very confusing for me to keep track of what in my class called what and why, so I looked for something that would track which functions called which, and ran into a Python program called pyan.
By the end of this week, I had a program that could do all four of the requested tasks, which was great.
This week was spent pretty much solely working on Sierra, to finally get it running on MSI (the Minnesota Supercomputing Institute).
John recommended using GNUParallel for getting the simulations to run in parallel on MSI. So, I spent a few days looking at GNUParallel and figuring out how to drive it from Python. The solution I came up with was to use Python's subprocess module to run the GNUParallel commands as if I were typing them at the command line.
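Roughly, the idea looked like the sketch below; the `argos3 -c {}` command template and the file names are placeholders rather than Sierra's exact invocation.

```python
# Building a GNUParallel command in Python and running it as if typed at the shell.
# The command template and file names are placeholders, not the exact Sierra command.
import subprocess

config_files = ["exp_0.argos", "exp_1.argos", "exp_2.argos"]

# Equivalent to typing: parallel argos3 -c {} ::: exp_0.argos exp_1.argos exp_2.argos
command = ["parallel", "argos3", "-c", "{}", ":::"] + config_files
subprocess.run(command, check=True)  # raises CalledProcessError if anything fails
```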
Then I learned how to write job files for MSI, called PBS scripts, eventually learning how to run things in parallel. I ran a few "hello world"-type scripts just to make sure that running things in parallel worked.
I finally understood enough to implement a class that took in an Argos file and generated multiple copies of it, replacing the random seed in each copy so that the runs would differ. I could then run those Argos files in parallel on my personal computer (which didn't accomplish much, because my computer doesn't have many cores to run things on in parallel, but hey, it ran), and I had a script set up that could hypothetically run the files on MSI as well.
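The seed-swapping step boils down to something like this sketch; it assumes the seed lives in an `<experiment random_seed="...">` attribute, which may not match every configuration.

```python
# Sketch: make N copies of a base .argos file, each with a different random seed.
# Assumes the seed is stored as <experiment random_seed="..."> in the configuration.
import random
import xml.etree.ElementTree as ET

def generate_seeded_copies(base_path, n_copies):
    output_paths = []
    for i in range(n_copies):
        tree = ET.parse(base_path)
        experiment = tree.getroot().find(".//experiment")
        experiment.set("random_seed", str(random.randint(1, 10000000)))
        out_path = "exp_{}.argos".format(i)
        tree.write(out_path)                 # one configuration file per simulation run
        output_paths.append(out_path)
    return output_paths

# generate_seeded_copies("template.argos", 10)
```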
During this week, I was also thinking about potential personal research projects. I had recently seen the World Models paper and website and was extremely excited to see a general algorithm for learning how to play games just from watching someone else play. Also during my past year at school I had come across capsule networks (tutorial) as a new form of neural networks, and was very interested in finding some way to use them. I had also thoroughly read through another paper on a computer learning to simulate Mario by trying to come up with the logical rules running the engine based on seeing video of the gameplay.
Initially, World Models seemed like a strict upgrade over the Mario engine-learner because it could learn to play games without needing a pre-built sprite map to understand the game. However, once I started playing around more with World Models, I realized that when using the World Models algorithm, the computer doesn't understand the game in anything remotely close to the way humans do. The vector that is the computer's representation of the game (labeled the "Z" vector on the website) is extremely difficult for humans to make any meaning out of, so the computer's decisions become something of a magical black box. In the case of the Mario engine-learner, on the other hand, while it is complicated to understand exactly why the algorithm implemented the rules in a particular way (because there are so many small decisions), the final result is a list of if-then rules that describe the engine. Thus, if the engine behaves incorrectly, you can explicitly see which rules caused the issue and change them by hand if need be. So, I realized I wanted something in between these two, but I wasn't quite sure yet what that was going to be.
On Monday, I met with my mentor to discuss potential next project ideas, since the Sierra project was wrapping up for me. I had thought a lot about it over the weekend, and I decided that there was a way I could do something both with capsule networks and the world models algorithm that would help deal with the issue of computers not understanding the game in the same way that humans do.
Since capsule networks seemed to be good at understanding images in a way similar to how humans do, what if I used a capsule network as the auto-encoder inside the World Models architecture? This would hopefully allow the rest of the World Models architecture to work exactly the same, with a single component swapped out so that its understanding of the world would be closer to our human understanding of the images. After explaining my thought process to her and proposing the project, she agreed, so I began researching how I could use capsule networks as auto-encoders inside the World Models algorithm.
Since the environments for the World Models algorithm seemed fairly complex, and I was hoping to train at least the initial models on my personal computer to check whether they looked right, I decided to create my own training environment. So, I put together a little ball-dodging game that uses extremely simple graphics but generates an image for each frame. A human or an AI can play it, and it can record the gameplay and play it back as a GIF.
While I was working on getting the capsule network to encode these images, I was constantly updating the game to make it easier to visualize and rearranging the code's architecture so that it made more sense. Since I was only interested in encoding the images at this point, the game doesn't have any game logic yet. That is, if the square does get hit by a falling ball, nothing happens; the ball goes straight through the square. However, as long as the square never gets hit during a game, the capsule network should still be able to learn fairly well how to encode and display images like these. Having my own simple game works as a great benchmark for making sure the algorithm is working.
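To give an idea of what the game's frames look like, here is a toy version of the renderer; it is not the real game code, and the 24x32 size and pixel values are just illustrative.

```python
# A toy version of the game's frame renderer: a small grayscale image with the
# player's square along the bottom and single-pixel "balls" falling from the top.
# Not the actual game code; sizes and values are illustrative.
import numpy as np

HEIGHT, WIDTH = 24, 32

def render_frame(square_x, balls):
    """square_x: left edge of the player's square; balls: list of (x, y) positions."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    frame[HEIGHT - 2:HEIGHT, square_x:square_x + 3] = 255   # the player's square
    for ball_x, ball_y in balls:
        if 0 <= ball_y < HEIGHT and 0 <= ball_x < WIDTH:
            frame[ball_y, ball_x] = 255                     # a falling ball
    return frame

# One frame of a hypothetical game state; stacking one of these per timestep is
# what gets recorded and (optionally) written out as a GIF.
frame = render_frame(square_x=10, balls=[(5, 3), (20, 12)])
```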
With regards to Sierra, last week I had gotten something that worked on my personal computer, so the last thing to do was to make sure it could actually run on the supercomputer.
This was my first time working on a supercomputer, or on any computer that wasn't mine. So when installing things didn't work, I tried installing them with `sudo`, which also didn't work. This resulted in me getting a security email from MSI essentially saying, "You invoked the sudo command multiple times. You don't have administrative privileges on our systems. You need to desist or we'll take appropriate action." And that was how I learned that while I was using SSH to log into my own account on MSI, I didn't own the computer, so `sudo` was never going to work.
Eventually I learned that instead of installing particular packages myself, I could load them from within my PBS script using the `module load module_name` command. This seemed to almost get things up and running, but the simulation part still crashed with an error about Argos not working correctly.
It took some work, but I discovered that the environment with everything I had installed didn't transfer to the parallel parts of the system unless I included a specific line in the PBS script to do so. Once I included that line, it worked! We could start running actual simulations on MSI. (The page with the special line is here.)
At this point, I uploaded what I had, and passed it on to John, since he would be the one creating and running the experiments as part of FORDYCA. He was very excited, and worked to get some extra functionality in so that he could quickly and easily design and run experiments.
This is the first week where I took notes on what I was doing pretty much every day, realizing that it would make writing this journal easier compared to past weeks, when I had to reconstruct what I did from memory, my commit history, file editing history, and emails.
I started off the week by downloading the code for the World Models algorithm. I had originally tried this code from Medium, but realized that since I was doing research it would be best to use the original code from the author. A lot of time was spent just getting the code up and running. I mostly tested to make sure I could play the racing game myself and could watch the trained model play both Doom and the car racing game.
During this time, I also discovered what seems to be the site of the author of World Models, called otoro. On their site they have a blog of different projects, and it was really cool to walk back through it and see the progression: from animating simple creatures, to simple neural networks playing slime soccer, to more complex architectures like Mixture Density Networks powering a program that completes your drawing for you, to an AI that achieved state-of-the-art performance on the car racing game. (World Models is the first known program to have actually solved the racing game.)
After finally getting the World Models code up and running, I started looking for implementations of capsule networks. An implementation in TensorFlow would have been fine, but ideally I wanted something using Keras, because it has a higher-level API that's much easier to use.
Despite there being a lot of different implementations, most of them were built specifically for the MNIST dataset. However, after asking around online quite a bit, Quora helped me find an implementation by XifengGuo that was both general and written using Keras.
So the rest of the week was spent getting that implementation of capsule networks up and running on my computer.
On Monday of this week, I learned how to use TensorBoard (the visualization system for TensorFlow) with Keras. Along the way I picked up a lot of information on how both TensorFlow and TensorBoard work.
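The Keras side of this is just a callback; something like the sketch below is all it takes (assuming TensorFlow's bundled Keras, with a placeholder model, data, and log directory).

```python
# Minimal sketch of hooking TensorBoard into a Keras training run.
# The model, data, and log directory are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Writes event files that `tensorboard --logdir logs` can visualize during training.
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/run1")

x = np.random.rand(128, 10).astype("float32")
y = np.random.rand(128, 1).astype("float32")
model.fit(x, y, epochs=5, callbacks=[tensorboard_callback], verbose=0)
```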
After seeing that I could get a capsule network up and running on MNIST, I decided to try it on my own dataset from the ball-dodging game.
To ensure that any issues were in the capsule network and encoding aspect of things, I first tested a basic feedforward network. Since I wrote my own video game, I could collect whatever data I wanted, so I quickly put together a dataset where each image was labeled with the x-coordinate of the square on the screen. I then put together a feedforward network to predict the position of the square. It didn't work very well at first, and it took a few iterations, some questions, and some extremely nice people on Stack Overflow to finally get something working. While I have done some small machine learning projects in the past, I had forgotten how important normalizing the data and setting a good learning rate are. I'm really glad I tested the feedforward network first, because trying to debug the capsule network without learning the importance of those two things would have been very difficult.
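For reference, the sanity check looked roughly like this sketch (not my exact code; the file names and layer sizes are made up), with the normalization and the explicit learning rate being the parts that actually mattered.

```python
# Rough sketch of the sanity-check regressor: predict the square's x-coordinate from
# a flattened frame. File names and layer sizes are made up.
import numpy as np
import tensorflow as tf

frames = np.load("frames.npy")            # hypothetical (N, 24, 32) array of frames
x_positions = np.load("x_positions.npy")  # hypothetical (N,) array of square positions

# The two things I had forgotten: normalize inputs/targets, pick a sane learning rate.
inputs = frames.reshape(len(frames), -1).astype("float32") / 255.0
targets = x_positions.astype("float32") / 32.0   # scale x into [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(inputs.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),  # explicit, sane learning rate
              loss="mse")
model.fit(inputs, targets, epochs=50, batch_size=32)
```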
The next goal was to run a capsule network auto-encoder on a pure image dataset: give the capsule network an image and train it to produce an output representing that image, so that a feedforward network called a "decoder" could turn the representation back into the image.
There were quite a few bugs to fix and things to learn in order to get the architecture up and running, but I did it. I finally had a test that ran without crashing. After training for about 15 minutes, the error seemed to stop going down, and I was presented with what seemed to be a final result: a completely white image. No matter what image was put into the network, the response was a white image (with occasional random very-light-gray dots, if you looked really closely using this tool).
And that was the result of this week for my personal project: a capsule network up and running, but the only thing I was getting out of it was completely white images.
This week was also the week I started the main portions of this website. I discovered here that there's no native way to import HTML files inside of other HTML files. That is, if I have a header that I want on every single page, I have to copy/paste that header onto every single page. And then, if I want to change the header, I have to edit it on every single page. It looked like there were ways to do imports with jQuery or PHP, but I didn't want to add lots of extra data to my site, and those libraries might have unintended consequences that I couldn't predict because I don't know them well enough. For this website, I wanted to have a more complete understanding of everything that was happening, without bringing in any external libraries that I didn't know well.
I found a single line of JavaScript that looked like it did what I wanted. I could type `<div data-include="/path/to/include.html"></div>` and have it essentially pull in the HTML file specified by the `data-include` path. However, I couldn't have multiple layers of this. That is, if I did a `data-include` on a top-level page, and the second-level page it included also had a `data-include`, that second `data-include` would not work. However, I wanted to have a header file on each of my pages, where each of those header files pulled in HTML from other sources, so I needed to be able to have multiple layers of imports.
So, I created a Python script to do it for me. Taking inspiration from the JavaScript code, the script looked for files ending in `.htmlt.html` and turned them into normal HTML files by replacing the special `div` tags inside them with the HTML file at the path specified by the `data-include` attribute, doing so recursively to add in all of the layers of imports. (The extension `.htmlt` stands for "HTML template", and the trailing `.html` makes it so that the file is still recognized by the system as a normal HTML file.)
Then, since I had the script, I could create HTML templates, such as the header I wanted on each page, and simply import them from other HTML pages. Since I am so used to object-oriented programming, not having something like this would have been extremely difficult for me. If I have a header that I want to be the same in each file, I don't want to have to copy/paste it into each file; that's bad programming practice. If something changes, I have to go back and update everything, and there's a chance I might make a mistake while updating one of the pages. So, the Python script lets me follow good programming practice by importing HTML files, which is extremely helpful.
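A simplified version of the idea behind the script looks like this; the regex and file handling are illustrative, not the exact code I wrote.

```python
# Simplified sketch of the .htmlt.html expander: recursively replace
# <div data-include="..."></div> tags with the contents of the referenced files.
import os
import re

INCLUDE_RE = re.compile(r'<div data-include="([^"]+)"></div>')

def expand_includes(html, base_dir):
    def replace(match):
        include_path = os.path.join(base_dir, match.group(1).lstrip("/"))
        with open(include_path) as included:
            # Recurse so that included files can themselves contain includes.
            return expand_includes(included.read(), base_dir)
    return INCLUDE_RE.sub(replace, html)

def build_template(template_path):
    with open(template_path) as template:
        expanded = expand_includes(template.read(), os.path.dirname(template_path) or ".")
    with open(template_path.replace(".htmlt.html", ".html"), "w") as output:
        output.write(expanded)

# build_template("index.htmlt.html")  # writes index.html with every layer of imports expanded
```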
This week I got a strong hint that what I was trying to do with capsule networks might be more difficult than I originally thought.
I started by scaling my dataset down to three images to see if the capsule network could represent those accurately. When I did that, I found that the output was always a blend of all of the input images. So while it learned to encode something close to the dataset, it didn't learn any of the differences between the images.
After talking to John about it, he gave me the classic advice of reducing the problem to simpler cases until I got something working, then building back up from there to figure out where things went wrong. So, instead of using a capsule network, I used a normal feedforward network, the kind usually used for auto-encoders. I had the exact same issue: it would still always output a blend of all of the images in its training set. So, as per usual, I created a Stack Overflow post about it, hoping that someone would be able to shed some light on the problem. Alas, some helpful users tried to aid me, but no one was able to solve it.
Continuing with John's advice, the next thing I tried was building a single-layer feedforward network without any encoding. That is, the network should simply learn to output exactly what it was given, with no bottleneck forcing it to encode. At first, this network seemed to show the exact same issues. But after training for 1400 epochs (training on all 3 images 1400 times), it finally started to show some meaningful differences between the images. I finally had a working auto-encoder.
It took lots of experimenting and testing out different architectures, but eventually I found the issue with the larger network: I was trying to encode the data into too few dimensions. If you think of the encoder as a file "zipper", my zipped file size was so small that it couldn't figure out how to unzip the file and get back all of the original information. As soon as I updated the feedforward encoder to use a bigger encoding size, it started to learn the differences between the images, which was my first sign of hope for the capsule network. Maybe I just needed to raise its encoding size too, and it would begin to learn.
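As a sketch (assuming 24x32 grayscale frames flattened to 768 values, with made-up layer sizes), the fix was essentially just turning up one number.

```python
# Sketch of the feedforward auto-encoder; ENCODING_SIZE is the knob that mattered.
# Layer sizes are made up; with a very small encoding, the outputs collapsed toward
# a blend of the training images.
import tensorflow as tf

IMAGE_PIXELS = 24 * 32      # flattened frame size
ENCODING_SIZE = 128         # the "zipped file size"; too small and unzipping fails

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(IMAGE_PIXELS,)),
    tf.keras.layers.Dense(ENCODING_SIZE, activation="relu"),
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(ENCODING_SIZE,)),
    tf.keras.layers.Dense(IMAGE_PIXELS, activation="sigmoid"),  # pixel values in [0, 1]
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, epochs=500)  # train the network to reproduce its input
```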
However, even with the updated encoding size, the capsule network still didn't seem to work. So, near the end of this week, I started trying convolutional neural networks as encoders and saw that they worked just fine; the next step was to see whether there was something in the architecture of the capsule network itself that caused it not to work.
With regards to the Sierra project this week, I added some more functionality to the `XMLHelper` class, enabling it to change the names of tags and to search for elements by their `id` attribute instead of their tag. This allowed John to dynamically change the names of tags and, when there were duplicate tags, identify them by their `id` attribute. (For example, each of the walls is given the "box" tag, but the `id` attribute of each of those tags is "wall_north", "wall_east", "wall_south", or "wall_west", so they can each be identified and modified as desired.)
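ElementTree's limited XPath support makes the id-based lookup short; here is a sketch using the wall example above (the attribute values are made up).

```python
# Sketch of looking up an element by its id attribute rather than its tag,
# using ElementTree's limited XPath support. Attribute values are made up.
import xml.etree.ElementTree as ET

tree = ET.parse("experiment.argos")
root = tree.getroot()

# Find the one "box" element whose id is "wall_north" and modify just that wall.
north_wall = root.find(".//box[@id='wall_north']")
if north_wall is not None:
    north_wall.set("size", "10, 0.1, 0.5")   # hypothetical attribute change
    # Renaming a tag is just assigning to .tag:
    # north_wall.tag = "barrier"

tree.write("experiment_modified.argos")
```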
I also presented to the group of people working on FORDYCA this week about how neural networks work and about my research project. We have a weekly meeting that I attend, and I chose to present this week. In my presentation I went over the basics of feedforward neural networks and the backpropagation algorithm, and then went on to discuss mixture density networks, recurrent neural networks (including LSTMs), capsule networks, the World Models algorithm, and my project to help make computers' understanding of the world closer to our human understanding. As part of my presentation, I created a quick paper on the math behind the backpropagation algorithm, in the form that I would have wanted if I were just getting into machine learning.
At this point, I had started to doubt whether I could get the capsule networks up and running quickly enough to complete the project within these 10 weeks. It didn't seem likely, so I started thinking about other ways I could encode an image into a vector so that the World Models algorithm could still be used. I did come up with one method that I thought could be viable, but it ended up seeming computationally intractable, and it was based on logic, not machine learning.
So, I kept trying to think of what I could be doing wrong, or why the capsule network was not training as an auto-encoder. I tried building many different combinations of convolutional, feedforward, and capsule networks, testing to see if any of the ones with a capsule layer would work. (None of them seemed to.)
Then, on Wednesday, I realized that I had been training the capsule network to generate images in a different way than the original paper does. In the paper, they give the capsule network an image, and the output of the capsule network is 10 vectors, each one representing a digit. The length of each vector corresponds to the network's confidence that the image is that digit; the longer the vector, the greater the confidence. However, when reconstructing the image, they take just one of those vectors and use it to recreate the entire image. (The vector has 16 dimensions, which is enough to generally encode the images. To decode the vector into an image, they pass it to a feedforward neural network, called the "decoder", which trains against the correct output image.) During my training, I was using all of the capsule network's outputs to try to generate the image, when I should have been using just one, like in the paper.
The paper has another addition as well, something called "masking". To illustrate, say we give the capsule network an image of the number 5, and the capsule network predicts that the image is a 3 (that is, the longest output vector from the network is the one corresponding to the number 3). Since the real answer is 5, the authors of the paper take the output capsule corresponding to the number 5 and send it to the decoder to reconstruct the image and train on. This way, the decoder always trains from the correct output capsule.
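In code, the masking idea is small; here is a numpy sketch of it (my own illustration of the paper's trick, not the paper's code).

```python
# A numpy sketch of the paper's "masking" trick: zero out every capsule vector except
# the one for the true class, then hand the flattened result to the decoder.
import numpy as np

def mask_capsules(capsule_outputs, true_label):
    """capsule_outputs: (num_classes, capsule_dim) array; true_label: correct class index."""
    masked = np.zeros_like(capsule_outputs)
    masked[true_label] = capsule_outputs[true_label]
    return masked.flatten()   # this flattened vector is what the decoder trains on

# Example: 10 digit capsules of 16 dimensions each, where the image is really a "5".
capsules = np.random.rand(10, 16)
decoder_input = mask_capsules(capsules, true_label=5)
```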
However, my data isn't labeled, and I don't want it to be. I want the capsule network to be able to encode the image regardless of what it is. The same way that a single capsule outputs an encoding that is used to reconstruct all of the "3"s, I want the network to have a single capsule that outputs an encoding that can be used to reconstruct all of the images in general.
So, I updated the network to only have one output capsule. Unfortunately, this didn't seem to work either; it still blended all the images together. This result was better than the white screen I had before, but that may only be because I changed some of the normalizing, reconstruction, and activation functions so that the outputs of the network, when turned back into an image, were forced into the [0, 255] range.
However, I had seen this blending issue before with the feedforward neural networks, and the solution there was to raise the encoding size and train for longer. Yet I had some doubts that this would work, given that in the original paper 16 dimensions seemed to be just fine to encode the images, and I had chosen the dimensions of my images (24x32) so that there were slightly fewer pixels than in the MNIST images (28x28) used in the paper. And, sadly, my doubts were correct. Even after letting it train for several hundred epochs, the error plateaued at a level where the network was still averaging all the images together.
So, the next thing to try was running the capsule network on the supercomputer to see whether training it for longer would help. For the rest of the week I worked on getting the files uploaded onto the supercomputer and trying to figure out how to run them.
In addition, during this week I updated Sierra again, mainly ironing out bugs John found when running his more complex tasks on configurations I hadn't tested previously. I also updated the error message system so that the end-user API gives simple and understandable messages about what generally went wrong, with the messages becoming more verbose and specific as the user scrolls up through the traceback. I don't like it when error messages pop up that don't make sense, so I always try to have my error messages start off in a way that everyone can understand, and then become more technically detailed for those who need to know exactly why something did not work.
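The Python mechanism that makes this layering easy is exception chaining; here is a sketch of the pattern (the function names and messages are made up, not Sierra's actual ones).

```python
# Sketch of layered error messages via exception chaining; names are made up.
def _sum_rows(rows_a, rows_b):
    # The low-level, technical failure.
    if len(rows_a) != len(rows_b):
        raise ValueError("row count mismatch: {} vs {}".format(len(rows_a), len(rows_b)))
    return [a + b for a, b in zip(rows_a, rows_b)]

def sum_csv_rows(rows_a, rows_b):
    try:
        return _sum_rows(rows_a, rows_b)
    except ValueError as err:
        # The plain-English summary appears at the bottom of the traceback; scrolling up
        # reveals the more specific original error, thanks to `from err`.
        raise ValueError("Could not combine the two CSV files; their dimensions "
                         "probably do not match.") from err
```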
I met with Professor Gini at the start of the week to discuss my progress with Sierra and the capsule network training and figure out how I should approach this final week. She suggested that I work on finishing my documentation (i.e. this website and my final report) this week, even though I wasn't able to get any positive results with the capsule network auto-encoder.
So, during this week, I've been mostly focusing on this journal and working out how to organize my paper report. MSI did recommend a particular solution for running the capsule network on the supercomputer, but I have yet to determine how to modify their script to work for my project.
While writing this journal, I realized that it was becoming tedious to type links in HTML, because anchor tags are a lot to type. I also discovered that none of my links were opening in a new tab. So, when I went to edit them all, I realized that once again it might be better if I followed good programming practice and made something that produced all of the anchor tags for me, so that I could change them all at once. Thus, I made a script so that if I use the Markdown syntax of `[words] (link)` (without the space between the brackets and parentheses), it will be replaced with the appropriate anchor tag (one that opens in a new tab) in the HTML.
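The replacement itself is essentially a one-line regex substitution; here is a sketch (the regex is illustrative and assumes no nested brackets in the link text).

```python
# Sketch of the link-replacement step: turn [words](link) into an anchor tag that
# opens in a new tab. The regex is illustrative and assumes no nested brackets.
import re

LINK_RE = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')

def expand_links(html):
    return LINK_RE.sub(r'<a href="\2" target="_blank">\1</a>', html)

print(expand_links("See [the World Models site](https://worldmodels.github.io) for details."))
```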
As per usual, I also added more updates to Sierra this week. This time, I made some fairly major modifications to the CSV class. The CSV class reads in `.csv` files and creates objects that you can use to sum or average multiple `.csv` files together and then save the sum or average to a new file. This week, I made it check that the CSV read in by the class was a perfect rectangle (every row of data had to have the same number of columns). This was due to some errors John was finding when adding CSVs with incompatible dimensions together. By checking that the two CSVs were perfectly rectangular, I could guarantee that all the dimensions were compatible just by checking the height and width of the rectangular CSVs. I also updated the error messages to be more readable.