Over the weekend, we generated a few models for our dataset; however, these models did not perform as well as we were hoping. We were looking for models with 0 training error, but the models that were generated all had training errors in the 90s. I could not do much on the project on Monday because we had a poster presentation for the REU on Wednesday, and I spent all of Monday working on my poster. For the rest of the week, I worked on adding some constraints to our random model generator. When we looked at the types of models we generated over the weekend, we noticed that because we were building the models completely at random, there was a lot of room for error. We decided to add constraints to the learning rate and output layers to make the generated models more sensible. This week was my last week of the DREU program, and I had a lot of fun participating in this research. I want to thank the CRA organization and Dr. Koyejo for giving me this opportunity to do research.
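The kind of constraints we added can be sketched like this; the ranges, layer sizes, and field names below are illustrative stand-ins, not the actual generator code:

```python
import random

# Hypothetical sketch of a constrained random config sampler: the
# learning rate is restricted to a sensible log-uniform range, and the
# output layer is forced to match the number of CIFAR-10 classes.
def sample_constrained_config(num_classes=10):
    config = {
        # Constrain the learning rate instead of sampling it freely.
        "lr": 10 ** random.uniform(-4, -1),
        "hidden_sizes": [random.choice([64, 128, 256])
                         for _ in range(random.randint(1, 4))],
    }
    # Constrain the output layer: always num_classes units.
    config["output_size"] = num_classes
    return config
```

Forcing these two fields removes the degenerate models (absurd learning rates, wrong-sized output layers) that a fully random sampler produces.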
This week, the models we implemented from the cifar10 benchmark website finished training. The results, however, were not what we were expecting. We had extremely high training and test error, which means that we messed up somewhere in the implementation. We then decided that our goal was just to get models with 0 training error, so we changed some things in one of the implementations we had and reran it. This resulted in a model with 0 training error, which is what we wanted. After this, we split the workload so we could make progress on different things. I began working on generating our dataset of random models. At the end of the week, I had the code ready to generate the models, along with some graphs of the training and test error and the training and test loss, so we could easily interpret the results. We decided again to run the code over the weekend to minimize the time we spent training during the week. By Monday, we should have a few models we can use to compare against the models we implemented that performed well.
This week, Brando and I tweaked the random model generator code so we could generate better models for the generalization prediction model. Brando also had an idea for how to tell whether our generalization predictor was actually working correctly: use models that we already know perform well on cifar10 as inputs to our prediction model. To get these models, we went to the cifar10 benchmarks website, which lists papers on the best-performing models for cifar10, and implemented some of them. We also got access to GPUs this week, so we can train our models for more epochs to reach the best possible performance for each model. Since we were going to train the models on the full cifar10 dataset for anywhere from 200 to 500 epochs, we decided to train them over the weekend to minimize the time we spend waiting for the models to train. By Monday, we should have our models trained, and then we can start generating our dataset of random models to compare them with.
This week, I generated a dataset of 10 points so I could train the model on more data points. Once I had the model trained, I wrote a function to evaluate it. This function generates test error predictions from the trained model and the inputs. Brando and I decided to track MSE epsilon loss and the regular MSE loss as our metrics for evaluation. We used a threshold of 0.01 for our epsilon loss, which caused us to get an epsilon loss of 0 most of the time when we evaluated the model. This is a good sign because it means that our model for predicting generalization is somewhat working. However, I noticed that we were training on data whose test errors were mostly 1, which may be causing the model to overfit: when I evaluated the model, I was getting a prediction of 0.999 when the target was 0.925. Next week, I will try to generate and train on better quality data to see if this fixes the problem.
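The epsilon loss we tracked can be sketched as follows. This is a hypothetical implementation (the function name and exact form are assumptions), but the idea is that absolute errors under the threshold count as zero, so a loss of 0 means every prediction is within epsilon of its target:

```python
import numpy as np

# Epsilon-insensitive MSE: errors inside the epsilon band are ignored,
# and only the excess beyond the threshold is squared and averaged.
def mse_epsilon_loss(pred, target, eps=0.01):
    diff = np.abs(pred - target)
    clipped = np.clip(diff - eps, 0.0, None)  # zero out errors within eps
    return float(np.mean(clipped ** 2))
```

With eps = 0.01, a prediction of 0.999 against a target of 0.925 still incurs a nonzero loss, so getting epsilon loss 0 on most points is a meaningful (if weak) signal.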
This week, I was able to fix the dimensionality errors I was getting last week with the help of Chase, one of the Ph.D. students in Dr. Koyejo's research group. After fixing this error and changing the loss function from cross-entropy loss to MSE loss, I was able to finish debugging the neural network. We decided to train the network on a single data point to see if it would overfit to the data, and it did: we were able to accurately predict the test error for that model, and thus predict generalization. This is a good sign because it tells us that the network we built was working. Next, we need to train it on more data to see what happens. This week, Brando and I also started writing unit tests for the data processor code I wrote, to make sure that our neural network takes in the correct data. Next week, we will most likely finish the unit tests and feed more data to the neural network.
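The single-data-point sanity check can be illustrated with a toy example. Here a tiny hand-rolled linear model stands in for our network, so everything below is a simplified sketch rather than the actual project code; the point is only that a model which cannot drive its loss to zero on one example is almost certainly broken:

```python
import numpy as np

x = np.array([0.2, 0.5, 0.3])   # one input vector
y = 0.925                        # its target test error
w, b, lr = np.zeros(3), 0.0, 0.1

# Plain gradient descent on the MSE of a single point.
for _ in range(2000):
    pred = w @ x + b
    grad = 2 * (pred - y)        # d(MSE)/d(pred)
    w -= lr * grad * x
    b -= lr * grad

loss = (w @ x + b - y) ** 2      # should be essentially zero
```

If the loss stays high even on one memorizable point, the bug is in the network or the loss, not in the data.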
This week, I was able to one-hot vectorize the model architectures and optimizer information so we would have a valid input to our generalization prediction model. I also started building the generalization prediction model itself. Its structure consists of 4 GRU recurrent neural networks that are connected to each other by their hidden outputs. As I had never built even a single RNN using PyTorch, trying to build 4 connected ones was challenging. With the help of the official PyTorch documentation, I was able to get the outline of the model done. However, when I attempted to test the model, I kept getting dimensionality errors in the hidden layers of the GRUs. On Friday, I began debugging the model to see where the problem was coming from, but I was not able to fix it. Next week, I will hopefully be more successful in debugging the model.
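The intended architecture can be sketched roughly like this in PyTorch; the input sizes, hidden size, and the final regression head are my assumptions, not the project's actual dimensions:

```python
import torch
import torch.nn as nn

# Sketch: four GRUs chained by their hidden states. Each GRU consumes
# one input stream (e.g. architecture, optimizer, initial weights,
# final weights) and hands its final hidden state to the next GRU.
class ChainedGRUPredictor(nn.Module):
    def __init__(self, input_sizes, hidden_size=32):
        super().__init__()
        self.grus = nn.ModuleList(
            nn.GRU(in_size, hidden_size, batch_first=True)
            for in_size in input_sizes
        )
        self.head = nn.Linear(hidden_size, 1)  # predicted test error

    def forward(self, sequences):
        h = None  # first GRU starts from the default zero hidden state
        for gru, seq in zip(self.grus, sequences):
            # The hidden state carries information across the chain.
            _, h = gru(seq, h)
        return torch.sigmoid(self.head(h[-1]))  # error in [0, 1]
```

The usual source of dimensionality errors here is the hidden-state shape: `h` is `(num_layers, batch, hidden_size)`, so every GRU in the chain must share the same `hidden_size` and `num_layers` for the handoff to work.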
In the earlier weeks, I had been building models for the cifar10 dataset manually, and this proved to be too inefficient to get a big enough dataset for our generalization prediction model. So, Brando (the Ph.D. student I am working with) and I decided to write some code to randomly sample models from a pool of activations, layers, loss functions, and optimizers. This task was actually pretty difficult for me because of the randomness involved and the constraints that each layer had. After several days of tweaking the code to account for those constraints, I was able to get it to produce as many models, and consequently as many data points, as we wanted. I was also able to fix the problem from last week, where we weren't getting all the information we wanted from each model. We then planned out how we were going to take the model information and pass it through the generalization model. As we discussed, we need to one-hot vectorize the model architecture and optimizer information that we saved, and I will be doing this next week.
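A simplified sketch of this kind of random model sampler, with illustrative candidate pools rather than the actual ones we used; the main constraint it has to respect is that consecutive layers must agree on their sizes:

```python
import random

# Illustrative candidate pools (not the real ones from the project).
ACTIVATIONS = ["relu", "tanh", "elu"]
OPTIMIZERS = ["sgd", "adam", "rmsprop"]

def sample_model_spec(input_size=3 * 32 * 32, num_classes=10):
    depth = random.randint(1, 5)
    layers, in_size = [], input_size
    for _ in range(depth):
        out_size = random.choice([64, 128, 256, 512])
        layers.append({"in": in_size, "out": out_size,
                       "activation": random.choice(ACTIVATIONS)})
        in_size = out_size  # constraint: adjacent layers must match
    # Final layer is pinned to the number of classes.
    layers.append({"in": in_size, "out": num_classes, "activation": None})
    return {"layers": layers, "optimizer": random.choice(OPTIMIZERS)}
```

Each call yields one specification, so generating a dataset is just a loop: one spec, one trained model, one data point.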
I started my week by fitting several different models to the cifar10 dataset so we could get some data points for the model we were going to build to predict generalization. I was able to make 6 models, but to get the model information into a format we could use to build a dataset, we had to save the data in specific files. We initially thought that keeping the structure of the model in a toml file and the initial and final weights in an npy file would work well. However, after several attempts to save the model information, I ended up using a yaml file for the model structure and an npz file for the weights, because these files were much easier to work with and gave us the same results. I also started writing a parser that would go through the yaml and npz files to build the dataset, but then I realized that some information about the model was not included in the yaml file, so next week I have to figure out a new way of saving the model structure that captures everything we want.
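The save-and-load scheme can be sketched like this, assuming PyYAML and NumPy; the function and field names are hypothetical:

```python
import numpy as np
import yaml  # PyYAML

# Sketch of the format described: model structure in a YAML file,
# initial and final weights together in one .npz archive.
def save_model_info(path_prefix, spec, init_weights, final_weights):
    with open(path_prefix + ".yaml", "w") as f:
        yaml.safe_dump(spec, f)
    np.savez(path_prefix + ".npz", init=init_weights, final=final_weights)

def load_model_info(path_prefix):
    with open(path_prefix + ".yaml") as f:
        spec = yaml.safe_load(f)
    weights = np.load(path_prefix + ".npz")
    return spec, weights["init"], weights["final"]
```

One advantage of `.npz` over separate `.npy` files is that `np.savez` keeps both weight arrays in a single named archive, so the parser only has to track one weights file per model.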
On Monday, I continued my research into CCA and how it could be useful for the neuroimaging project I was working on. Later that day, my supervisor told me about another project that was going on and asked if I would like to work on it instead. This new project was about Meta-Learning, and it seemed like a project that would be more appropriate for me to tackle, given that the neuroimaging project required some statistics skills that I didn't have. After reading about Meta-Learning and talking to the Ph.D. student leading the project, I had a good understanding of what I was going to be working on. We were essentially going to try to predict the generalization of a deep learning model using only information about the model, like its architecture, initial weights, etc. To do this, we needed a dataset of model information paired with generalization errors, which we were going to get by training several models on the CIFAR 10 dataset and saving their information and generalization errors.
This week, my professor was traveling, so I was not able to meet with him in person. I was, however, put in contact with one of his Ph.D. students, who got me set up to begin the project. In the middle of the week, I was able to meet with my professor via Discord, and he explained the project I will be working on. We will be researching machine learning techniques for neuroimaging data, particularly fMRI and EEG. For the rest of the week, I did some research and learned about Canonical Correlation Analysis, a machine learning technique used to analyze two sets of multidimensional data. We will be applying this technique to our fMRI and EEG datasets to see if we can get some interesting results.