My summer project uses unsupervised learning to train a neural network. Specifically, I use a jigsaw puzzle technique: an image is cut into pieces and randomly shuffled, and the computer has to guess what the original image looked like.
This week was dedicated to learning about neural networks and ironing out an idea for my project. I didn't know anything about machine learning, so I spent the week reading papers until I got a vague idea of what I wanted to work on.
I started constructing my neural network based on this paper and this one. The parameters are more or less the same, except that I adjusted them slightly to use CIFAR10's 3x32x32 images instead of a 3x225x225 pixel chunk of ImageNet data. The network's purpose is to solve a jigsaw puzzle, but instead of the 3x3 puzzle in Noroozi and Favaro's paper I made a 2x2 puzzle because the images are so small. This also required some adjustments to the network.
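To make the 2x2 setup concrete, here is a minimal sketch of how one puzzle could be built from a 32x32 image. It is not my actual project code: plain nested lists stand in for tensors, a single channel is used for brevity, and the names (PERMUTATIONS, split_into_tiles, make_puzzle, reconstruct) are made up for illustration. It also assumes the label is the index of the shuffle among the 24 possible orderings of four tiles.

```python
import itertools
import random

# All 24 orderings of the four tiles. The index into this list is the
# (assumed) class label the network would be asked to predict.
PERMUTATIONS = list(itertools.permutations(range(4)))


def split_into_tiles(image, grid=2):
    """Split an H x W image (a list of rows) into grid*grid tiles, row-major."""
    tile_h = len(image) // grid
    tile_w = len(image[0]) // grid
    tiles = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * tile_h:(gy + 1) * tile_h]
            tiles.append([row[gx * tile_w:(gx + 1) * tile_w] for row in rows])
    return tiles


def make_puzzle(image, rng=random):
    """Return (shuffled_tiles, label) for one training example."""
    tiles = split_into_tiles(image)
    label = rng.randrange(len(PERMUTATIONS))
    order = PERMUTATIONS[label]
    # Position `pos` of the puzzle holds original tile `order[pos]`.
    shuffled = [tiles[src] for src in order]
    return shuffled, label


def reconstruct(shuffled, label, grid=2):
    """Undo the shuffle for a given label and stitch the tiles back together."""
    order = PERMUTATIONS[label]
    tiles = [None] * len(order)
    for pos, src in enumerate(order):
        tiles[src] = shuffled[pos]
    image = []
    for gy in range(grid):
        row_tiles = tiles[gy * grid:(gy + 1) * grid]
        for r in range(len(row_tiles[0])):
            image.append([px for tile in row_tiles for px in tile[r]])
    return image
```

With a 32x32 input each tile comes out 16x16, and calling reconstruct with the correct label gives back the original image, which is a handy sanity check on the shuffling code.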
100% debugging. Every bug I fixed caused three more to pop up. It was incredibly frustrating that the functions and attributes in PyTorch's documentation didn't work the way the documentation said they would. Turns out I was looking at the docs for the wrong version all along. When I finally got it working, all test sets got 100% accuracy. Seems suspicious...
Fixed the bug giving me false positives. I now get more realistic accuracy rates ranging from 50% to 70%. I also put in 10 of my own test images to see how the computer reshuffles them.
Tweaked my testing code to show the images before shuffling, after shuffling, and after the computer reconstructed them. Also had it display a graph with the average loss in each epoch.
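The numbers behind that kind of graph can be computed roughly like this. This is only a sketch: it assumes the loss of every batch was recorded in one list per epoch, and average_loss_per_epoch is a hypothetical helper name, not a function from my code or from PyTorch.

```python
def average_loss_per_epoch(batch_losses_by_epoch):
    """Return one mean loss per epoch from per-batch losses.

    batch_losses_by_epoch is a list with one inner list of floats per epoch.
    """
    averages = []
    for epoch_index, losses in enumerate(batch_losses_by_epoch):
        if not losses:
            raise ValueError(f"epoch {epoch_index} has no recorded losses")
        averages.append(sum(losses) / len(losses))
    return averages
```

The resulting list can be handed to any plotting library, with the epoch number on the x-axis and the average loss on the y-axis.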
Started working with a 3x3 jigsaw puzzle using larger images from ImageNet. Lots of bug fixing and adjusting the code to work on a newer version of PyTorch.
Found out that my cropping function wasn't working, so I went back to the 2x2 puzzle to fix it. The code was feeding whole 16x16 tiles into the machine instead of smaller 12x12 crops. This allowed the machine to cheat, in a sense, by using the borders of each tile to solve the jigsaw puzzle. I fixed that, which dropped the accuracy of my tests to 25-40%.
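For reference, cropping a 12x12 piece out of a 16x16 tile can be sketched as below. This is an illustration under assumptions rather than my exact function: it picks the crop position at random, uses nested lists in place of tensors, and random_crop is a made-up name.

```python
import random


def random_crop(tile, crop=12, rng=random):
    """Cut a crop x crop window out of a tile (a list of rows) at a random offset."""
    height, width = len(tile), len(tile[0])
    if crop > height or crop > width:
        raise ValueError("crop size is larger than the tile")
    # randint is inclusive on both ends, so a 16x16 tile gives offsets 0..4.
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    return [row[left:left + crop] for row in tile[top:top + crop]]
```

Because each tile loses up to four rows and columns at a random offset, neighbouring pieces no longer share matching edge pixels, so the network cannot solve the puzzle just by lining up borders.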
While trying to display the cropped pieces I found that the cropping function still wasn't working, as it was returning a colorful mess of pixels instead of a piece of the 16x16 tile. I fixed the bug and now the accuracy ranges from 45-60%. Then I made a tweaked version that works with the larger images in ImageNet.
Ran the network with ImageNet, then ran it with ImageNet images resized to 32x32 in order to compare the results and determine the effect of higher resolution. Also tracked the accuracy for each of the classes in the CIFAR10 dataset. It seems that the machine is most successful at reconstructing trucks, while planes give it the most trouble.
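Tracking accuracy per class boils down to counting correct puzzle predictions separately for each image class. A minimal sketch, assuming each test image comes with its CIFAR10 class index plus the predicted and true shuffle labels; per_class_accuracy is a hypothetical helper, not part of PyTorch.

```python
from collections import Counter

# CIFAR10 class names in their standard index order.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def per_class_accuracy(image_classes, predicted, target):
    """Return {class name: fraction of puzzles solved correctly}.

    image_classes holds the CIFAR10 class index of each test image;
    predicted and target hold the puzzle labels for the same images.
    """
    correct = Counter()
    total = Counter()
    for cls, guess, truth in zip(image_classes, predicted, target):
        total[cls] += 1
        if guess == truth:
            correct[cls] += 1
    return {CIFAR10_CLASSES[cls]: correct[cls] / total[cls] for cls in total}
```

Classes that never appear in the test batch are simply left out of the result, which avoids dividing by zero.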
Ran the 3x3 Jigsaw test on ImageNet. Worked on the final report.