Hello Deep Learning: Dropout, data augmentation, weight decay and quantisation

This page is part of the Hello Deep Learning series of blog posts. You are very welcome to improve this page via GitHub! In the previous chapter we found ways to speed up our character recognition learning by a factor of 20 by using a better optimizer, and by a further factor of four by cleverly using threads in a 'shared-nothing architecture'...