Sequence to sequence learning to decode variable length captchas

 

In earlier posts we saw how to use CNN and RNN networks to decode captchas. Here we look at how to use sequence to sequence learning to transcribe a captcha image into its character sequence. An image is not a sequence per se, but the example helps demonstrate how seq to seq learning works. Sequence to sequence learning has been used in machine translation, question answering and chat bots.

Sequence to sequence learning uses one RNN (the encoder) to encode a variable length sequence into a fixed length vector, and another RNN (the decoder) to decode that vector into the target sequence. The vector should be large enough to capture the various sequences possible. Because the two sequences only meet through this vector, the input and the output can have different lengths.
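To make the "variable length in, fixed length out" idea concrete, here is a minimal sketch in plain Lua. It is not the network used in this post: the recurrence is a toy one, and the state size H and the weights 0.5 and 0.3 are arbitrary values picked for illustration.

```lua
-- Toy recurrent encoder: folds a sequence of numbers into a fixed-size state.
-- H and the constants 0.5 / 0.3 are illustrative assumptions, not trained weights.
local H = 4

-- tanh written out by hand so the sketch does not depend on math.tanh
local function tanh(x)
  local e = math.exp(2 * x)
  return (e - 1) / (e + 1)
end

local function encode(seq)
  local h = {}
  for i = 1, H do h[i] = 0 end
  for _, x in ipairs(seq) do
    local nxt = {}
    for i = 1, H do
      -- each unit mixes the current input with a neighbouring unit's previous state
      local prev = h[(i % H) + 1]
      nxt[i] = tanh(0.5 * x * i / H + 0.3 * prev)
    end
    h = nxt
  end
  return h
end

local short = encode({1, 2})
local long  = encode({1, 2, 3, 4, 5, 6, 7})
print(#short, #long)  -- both states have H entries
```

Whatever the input length, the encoder hands over a state of the same size, which is what lets a separate decoder take over from there.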

First let's look at the encoder. It takes the sequence of features produced by net, splits it into a table of steps, and feeds that table to an LSTM sequencer. The output of the last LSTM step is then fed into the decoder.


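 require 'rnn'  -- Element-Research rnn package: provides nn.Sequencer and nn.LSTM
 local enc = nn.Sequential()
 enc:add(net)                                  -- feature network, defined elsewhere
 enc:add(nn.SplitTable(2,3))                   -- split dimension 2 of a 3D input into a table of steps
 enc:add(nn.Sequencer(nn.LSTM(hsize, hsize)))  -- apply the LSTM to every step
 enc:add(nn.SelectTable(-1))                   -- keep the output of the last step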
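The split and select steps are easier to follow on plain data. The sketch below mimics them with nested Lua tables instead of tensors, assuming the input is laid out as batch x steps x features. The helper names splitSteps and selectLast are made up for this illustration and are not part of Torch.

```lua
-- input[b][t] is the feature table of sample b at step t (batch x steps x features).
-- splitSteps regroups it into one entry per step, each holding the whole batch,
-- which is how we read the effect of nn.SplitTable(2,3) on a 3D tensor.
local function splitSteps(input)
  local steps = {}
  for b, sample in ipairs(input) do
    for t, feat in ipairs(sample) do
      steps[t] = steps[t] or {}
      steps[t][b] = feat
    end
  end
  return steps
end

-- Counterpart of nn.SelectTable(-1): take the last entry of the table.
local function selectLast(tbl)
  return tbl[#tbl]
end

-- Two samples, three steps, one feature per step.
local input = {
  { {1}, {2}, {3} },
  { {4}, {5}, {6} },
}
local steps = splitSteps(input)
local last = selectLast(steps)
print(#steps, last[1][1], last[2][1])
```

In the real encoder the LSTM sits between the two operations, so the selected entry is the LSTM output at the last step rather than the raw feature.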

The decoder takes the output of the encoder and feeds it into another LSTM sequencer. The number of decoding steps depends on the length of the output.


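 local dec = nn.Sequential()
                 :add(nn.LSTM(hsize, hsize))     -- recurrent step over the encoded vector
                 :add(nn.Linear(hsize, csize))   -- one score per output class
                 :add(nn.LogSoftMax())           -- log-probabilities for ClassNLLCriterion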

We then pair the LogSoftMax output with ClassNLLCriterion to compute the loss and backpropagate.
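As a rough illustration of what this loss computes, here is the arithmetic in plain Lua for a single sample. The function names are our own, and summing the per-step losses is how we understand a per-step criterion wrapped for sequences to behave. The scores are made-up numbers.

```lua
-- Log-softmax over a table of raw scores, stabilised by subtracting the max.
local function logSoftMax(scores)
  local max = -math.huge
  for _, s in ipairs(scores) do
    if s > max then max = s end
  end
  local sum = 0
  for _, s in ipairs(scores) do
    sum = sum + math.exp(s - max)
  end
  local logZ = max + math.log(sum)
  local out = {}
  for i, s in ipairs(scores) do
    out[i] = s - logZ
  end
  return out
end

-- Negative log-likelihood of the target class (1-based index).
local function classNLL(logProbs, target)
  return -logProbs[target]
end

-- Sum of the per-step losses over one output sequence.
local function sequenceLoss(stepScores, targets)
  local total = 0
  for t, scores in ipairs(stepScores) do
    total = total + classNLL(logSoftMax(scores), targets[t])
  end
  return total
end

-- Two steps, three classes; the targets are classes 1 and 2.
local loss = sequenceLoss({ {2, 0, 0}, {0, 3, 0} }, {1, 2})
print(loss)
```

The loss shrinks as the score of the correct class grows relative to the others, and it is exactly log(csize) per step when all scores are equal.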


 local enc = seq.enc(net,hsize)
 local dec = seq.dec(hsize,csize)
 local encdec = nn.Sequential()
                      :add(enc)
                      :add(repl(osize))
                      :add(dec)
 local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

Now let's see how to use this network to decode variable length captchas. If the length of the captcha is not known beforehand, a plain CNN may not work, because a CNN only produces fixed length outputs, so we have to use an RNN. For example, the captchas below have variable length, with spaces between them.

[Image: examples of variable length captchas]
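One common way to train a fixed number of decoder steps on labels of different lengths is to pad every label up to the maximum length with an extra "blank" class. This post does not say how the targets were prepared, so the sketch below is only an assumption: the character set, the BLANK class and the encodeLabel helper are all hypothetical.

```lua
-- Hypothetical character set; the real one depends on the captcha generator.
local alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
-- Extra class used to fill the unused steps (an assumption, not from the post).
local BLANK = #alphabet + 1

-- Turn a label string into osize class indices, padded with BLANK.
local function encodeLabel(text, osize)
  assert(#text <= osize, "label longer than osize")
  local target = {}
  for i = 1, osize do
    if i <= #text then
      -- plain (non-pattern) search for the character's position in the alphabet
      local idx = alphabet:find(text:sub(i, i), 1, true)
      assert(idx, "character not in alphabet")
      target[i] = idx
    else
      target[i] = BLANK
    end
  end
  return target
end

local target = encodeLabel("ab1", 5)
print(table.concat(target, " "))  -- 1 2 28 37 37
```

With such a scheme csize would be the alphabet size plus one, and the decoder learns to emit the blank class once the captcha text has ended.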

We use the CNN together with the encoder to encode each captcha into a fixed length vector, and then use the decoder to decode it into the character sequence. With this we achieve 50% accuracy on variable length captchas. The whole code is available on GitHub. In a later post we will see how to use an attention mechanism to increase the accuracy.
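For reference, here is a minimal sketch of how the decoder's per-step output could be turned back into text and scored. It assumes greedy decoding (take the most likely class at each step), a blank class that marks unused steps, and that accuracy means the whole captcha must match. None of these details are stated above, so treat the helper names and the tiny alphabet as hypothetical.

```lua
-- Index of the largest value in a row (first one wins on ties).
local function argmax(row)
  local best, bestIdx = -math.huge, 1
  for i, v in ipairs(row) do
    if v > best then best, bestIdx = v, i end
  end
  return bestIdx
end

-- Greedy decoding: most likely class per step, blank steps are dropped.
local function greedyDecode(stepLogProbs, alphabet, blank)
  local chars = {}
  for _, row in ipairs(stepLogProbs) do
    local k = argmax(row)
    if k ~= blank then
      chars[#chars + 1] = alphabet:sub(k, k)
    end
  end
  return table.concat(chars)
end

-- Fraction of predictions that match the ground truth exactly.
local function exactMatchAccuracy(preds, truths)
  local hits = 0
  for i, p in ipairs(preds) do
    if p == truths[i] then hits = hits + 1 end
  end
  return hits / #preds
end

-- Three steps over a toy alphabet "abc" with class 4 as blank.
local out = {
  {-0.1, -3.0, -3.0, -3.0},
  {-3.0, -3.0, -0.1, -3.0},
  {-3.0, -3.0, -3.0, -0.1},
}
local text = greedyDecode(out, "abc", 4)
print(text)                                               -- ac
print(exactMatchAccuracy({text, "b"}, {"ac", "c"}))       -- 0.5
```

Exact match is a strict measure: a single wrong character makes the whole captcha count as a miss.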
