Captcha breaking of simplecaptcha with 96% accuracy using RNNs.
Convolutional neural networks assume a fixed input size, whereas recurrent neural networks (RNNs) can work on variable-sized input or output. Here we look at how to use an RNN in Torch to decode captchas. This lets the model capture dependencies between adjacent characters, and it can also work on variable-length captchas (which we will see in a later post).
For a good overview of RNNs, read this blog post by Karpathy. Please read that post before proceeding.
As we saw in an earlier post, using a CNN we were able to achieve 92% accuracy. Here we build an RNN on top of the CNN, which leads to better accuracy because we can model the dependency between adjacent characters in the image.
Suppose we define the CNN, net, which takes an image and outputs 5 predictions over 36 classes each, i.e. a 5×36-dimensional output. We then split the output of the CNN into 5 parts, each of which is fed to the RNN module at one step.
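To make the shapes concrete, the split can be sketched in plain Lua, without Torch. The helper name splitOutput and the flat layout (the 36 scores for character 1 first, then character 2, and so on) are assumptions for illustration only; in the Torch code this step would be handled by modules inside the network rather than by hand.

```lua
-- Split a flat list of steps*classes scores into `steps` chunks of `classes` scores.
-- Assumed layout: scores for character 1 first, then character 2, and so on.
local function splitOutput(flat, steps, classes)
  assert(#flat == steps * classes, "unexpected output size")
  local parts = {}
  for s = 1, steps do
    local chunk = {}
    for c = 1, classes do
      chunk[c] = flat[(s - 1) * classes + c]
    end
    parts[s] = chunk
  end
  return parts
end

-- Dummy CNN output: 5 characters x 36 classes = 180 values.
local flat = {}
for i = 1, 5 * 36 do flat[i] = i end

local parts = splitOutput(flat, 5, 36)
print(#parts, #parts[1])  -- number of steps, scores per step
```

Each entry of parts then plays the role of the input to one RNN step.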
Our RNN has 5 steps because each captcha is five letters long. At each step the CNN propagates a different part of its output (one of the 5 parts). The RNN combines its hidden state with the CNN output to produce the final prediction for that step. This lets it take the earlier context into account and predict the current character more accurately. We use the Sequencer class from the rnn package to achieve this, as follows:
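    require 'nn'
    require 'rnn'

    -- Recurrent layer applied at every step (hiddenSize, classes and rho are set elsewhere).
    local mlp = nn.Sequential()
       :add(nn.Recurrent(
          hiddenSize,                   -- start: size of the hidden state
          nn.Identity(),                -- input module
          nn.Sequential()               -- feedback module
             :add(nn.Linear(hiddenSize, hiddenSize))
             :add(nn.ReLU()),
          nn.ReLU(),                    -- transfer function
          rho                           -- max steps to backpropagate through time
       ))
       :add(nn.Linear(hiddenSize, classes))
       :add(nn.LogSoftMax())

    -- net is the CNN; Sequencer applies mlp to each step of its output.
    local rnn = nn.Sequential()
       :add(net)
       :add(nn.Sequencer(mlp))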
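As we read it, the recurrent module above computes h_t = ReLU(x_t + ReLU(W h_{t-1} + b)) at each step: the input passes through unchanged, the previous hidden state goes through the feedback network, and the two are added before the transfer function. The plain-Lua sketch below illustrates that update on toy numbers. The function names, the weights, and the zero initial state are assumptions for illustration; the real module handles the first step with its own start module, and the final Linear and LogSoftMax layers are left out.

```lua
-- Element-wise ReLU on a list of numbers.
local function relu(v)
  local out = {}
  for i = 1, #v do out[i] = math.max(0, v[i]) end
  return out
end

-- Feedback path: ReLU(W * h + b), with W a hiddenSize x hiddenSize matrix.
local function feedback(W, b, h)
  local out = {}
  for i = 1, #b do
    local sum = b[i]
    for j = 1, #h do sum = sum + W[i][j] * h[j] end
    out[i] = sum
  end
  return relu(out)
end

-- One recurrent step: h_t = ReLU(x_t + feedback(h_{t-1})).
local function step(W, b, x, hPrev)
  local fb = feedback(W, b, hPrev)
  local pre = {}
  for i = 1, #x do pre[i] = x[i] + fb[i] end
  return relu(pre)
end

-- Toy sizes (hypothetical): hiddenSize = 2, three steps.
local W = { {0.5, 0}, {0, 0.5} }
local b = { 0, 0 }
local xs = { {1, -1}, {0, 2}, {-3, 1} }
local h = { 0, 0 }  -- simplification: zero initial state
local states = {}
for t = 1, #xs do
  h = step(W, b, xs[t], h)
  states[t] = h
end
```

The point of the sketch is that states[t] depends on every earlier input, which is what lets the prediction for one character use the context of the characters before it.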
Here we see how to use SequencerCriterion to calculate the loss and backpropagate the gradients. This criterion takes a table of outputs and a table of targets, each with one entry per RNN step, and calculates the loss. We use tnet below to split our training targets Y into such a table to feed to the criterion.
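    local ct = nn.SequencerCriterion(nn.ClassNLLCriterion())
    local tnet = nn.SplitTable(2, 2)      -- split Yb along dimension 2: one label tensor per step

    local targets = tnet:forward(Yb)      -- table of per-step targets
    local outputs = rnn:forward(inputs)   -- table of per-step log-probabilities
    loss = loss + ct:forward(outputs, targets)

    -- backpropagate through the criterion, then through the network
    local gradOutputs = ct:backward(outputs, targets)
    rnn:backward(inputs, gradOutputs)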
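As far as we understand it, the sequence criterion applies the wrapped criterion at every step and adds the results, and for a single sample the negative log-likelihood at one step is just minus the log-probability of the correct class. A minimal plain-Lua sketch of that sum is shown below. The function name sequenceNLL and the toy numbers are hypothetical, and any averaging over a batch is ignored.

```lua
-- Sum of per-step negative log-likelihoods for one captcha.
-- outputs[t] is a list of log-probabilities over the classes at step t;
-- targets[t] is the 1-based index of the correct class at step t.
local function sequenceNLL(outputs, targets)
  assert(#outputs == #targets, "one target per step")
  local loss = 0
  for t = 1, #outputs do
    loss = loss - outputs[t][targets[t]]
  end
  return loss
end

-- Toy example (hypothetical numbers): 2 steps, 3 classes.
local outputs = {
  { math.log(0.5),  math.log(0.25), math.log(0.25) },
  { math.log(0.25), math.log(0.25), math.log(0.5)  },
}
local targets = { 1, 3 }
local loss = sequenceNLL(outputs, targets)
print(loss)
```

A mistake on any single character raises the total, so the network is trained on all five positions at once.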
The code for the RNN captcha is here on GitHub. Run rnnMain.lua to achieve an accuracy of 96% on the captchas. In a later post we will look at how to decode variable-length captchas using RNNs (sequence-to-sequence learning).