Computer Graduation Project: 12306 Captcha Recognition Based on Machine Vision

Mondo Education Updated on 2024-03-07

Today, the senior would like to introduce a deep learning project to you.

12306 captcha recognition based on machine vision


The 12306 captcha asks the user to find the requested object among 8 candidate images, as shown in the figure.

The senior counted 1,000 samples and found that 12306 actually uses only 80 categories; the categories and their corresponding counts are as follows.

From the statistics above, we can see that cracking the 12306 captcha can be converted into an 80-class classification problem.

Dataset preview

The classification of objects can be simply divided into three parts:

network construction, data reading, and model training.

However, each of these three steps involves hyperparameters, and knowing how to set them is a skill that an experienced algorithm engineer must master. We will go through the details of each step in the following sections and give my own experience and optimization strategies.

When building a classification network, you can use one of the classic network structures described in previous articles, or you can build your own. When building your own classification network, you can follow these steps:

Stack convolution operations (Conv2D) and max pooling operations (MaxPooling2D); for the first layer, the size and number of channels of the input image need to be specified;

use Flatten() to expand the feature map into a feature vector;

then add the fully connected layers and activation layers, noting that the softmax activation function should be used for multi-class classification.

When building a network of your own, the senior has several pieces of experience to share:

1. Use 2^n as the number of channels;

2. Double the number of channels after each max pooling;

3. The size of the final feature map should be neither too large nor too small (7-20 is a good choice);

4. At least one hidden layer is often needed between the Flatten() layer and the output layer to transition the features;

5. Choose the number of hidden-layer nodes according to the number of nodes in the Flatten() layer.

The following code is the classification network built by the senior.

from keras import models, layers

# Three Conv-MaxPooling blocks: the 66x66x3 input is downsampled to an 8x8x128 feature map
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(66, 66, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
# Flatten the feature map, add one hidden layer, then the 80-way softmax output
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dense(80, activation='softmax'))

As an alternative to building your own network, the VGG16() function can be used to load the VGG-16 network that ships with Keras for transfer learning (we return to this later). The weights parameter specifies whether the network uses pre-trained weights: None means random initialization, while 'imagenet' loads a model trained on the ImageNet dataset.

The include_top parameter indicates whether to keep the original output layers; since we only use the representation (feature extraction) layers, it is set to False. The input_shape parameter specifies the size of the input image; because VGG-16 downsamples 5 times, we use its default input size of 224×224×3, so the input images are enlarged before being fed in.

Keras provides a variety of ways to read data, and we recommend using generators. With a generator, Keras pre-reads the next batch of training data into memory while the model is training, which saves memory and makes it easy to train on large-scale data. A Keras generator is initialized through the ImageDataGenerator class, which has some built-in methods for augmenting data.

In this project, the senior puts different categories under different directories, so the flow_from_directory() function is used to read the data. The training data is read as follows (the validation and test data are read the same way):

from keras.preprocessing.image import ImageDataGenerator

# Normalize pixel values to [0, 1] and read images class-by-class from subdirectories
train_data_gen = ImageDataGenerator(rescale=1./255)
train_generator = train_data_gen.flow_from_directory(train_folder,
                                                     target_size=(66, 66),
                                                     batch_size=128,
                                                     class_mode='categorical')

Since we have established that this is a multi-class classification task, the value of class_mode is 'categorical'.
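flow_from_directory() expects one subdirectory per class under train_folder and uses each subdirectory name as the class label. A sketch of the assumed layout (the folder and file names here are illustrative, not taken from the project):

train_folder/
  class_01/
    0001.jpg
    ...
  class_02/
    0001.jpg
    ...
  ... (80 class subdirectories in total)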

When training the model, we first need to determine the optimization strategy and the loss function. Here we choose Adagrad as the optimizer and the multi-class cross-entropy loss categorical_crossentropy. Since we are using a generator to read the data, we use fit_generator to feed the data to the model, as follows.

from keras import optimizers

model.compile(loss='categorical_crossentropy', optimizer=optimizers.Adagrad(lr=0.01), metrics=['acc'])
history = model.fit_generator(train_generator,
                              steps_per_epoch=128,
                              epochs=20,
                              validation_data=val_generator)
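The val_generator passed to fit_generator above is built the same way as the training generator, as noted earlier. A minimal sketch, assuming a val_folder directory organized like the training set (the variable and folder names are assumptions):

# Validation data: same normalization and target size, no augmentation
val_data_gen = ImageDataGenerator(rescale=1./255)
val_generator = val_data_gen.flow_from_directory(val_folder,
                                                 target_size=(66, 66),
                                                 batch_size=128,
                                                 class_mode='categorical')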

After 20 epochs, the model tends to converge; the loss and accuracy curves are shown in the figure, and the accuracy on the test set is 0.8275. From the convergence behavior we can see that the model has overfitted at this point, and some strategies are needed to address this problem.
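The curves themselves are not reproduced in this text, but they can be re-plotted from the History object returned by fit_generator. A minimal matplotlib sketch (the 'acc'/'val_acc' key names follow the metrics=['acc'] setting above; newer Keras versions use 'accuracy' instead):

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)  # loss curves
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.legend()
plt.subplot(1, 2, 2)  # accuracy curves
plt.plot(history.history['acc'], label='train acc')
plt.plot(history.history['val_acc'], label='val acc')
plt.legend()
plt.show()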

Dropout has always been a very effective strategy against overfitting. When using dropout, setting the drop rate takes some skill: if the rate is too small, dropout cannot play its role; if the rate is too large, the model will have trouble converging, or may even fail to converge at all. Here, I add dropout with a rate of 0.25. The convergence and accuracy curves are shown in the figure below; it can be seen that the overfitting problem still exists but is slightly reduced, and the accuracy on the test set is now 0.83375.
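The exact position of the dropout layer is not shown in the original text; a minimal sketch, assuming it is inserted after the Dense(1024) hidden layer of the self-built network above:

model.add(layers.Flatten())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dropout(0.25))  # randomly drops 25% of activations during training (assumed position)
model.add(layers.Dense(80, activation='softmax'))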

When the ImageDataGenerator class is called, Keras provides data augmentation strategies through its parameters. The senior has a few suggestions for data augmentation:

1. The augmentation strategy should be based on sufficient observation and understanding of the dataset;

2. The right augmentation strategy can increase the sample size and greatly reduce overfitting;

3. The wrong augmentation strategy is likely to make the model converge poorly, and, more seriously, can make the distributions of the training set and the test set more inconsistent, which exacerbates overfitting;

4. Developers often need to implement their own augmentation strategies according to their business scenarios.

Here are a few of the data augmentation strategies I use.

# Augmented training generator: flips, zooms, shifts, shears, and small rotations
train_data_gen_aug = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                        zoom_range=0.1, width_shift_range=0.1,
                                        height_shift_range=0.1, shear_range=0.1,
                                        rotation_range=5)
train_generator_aug = train_data_gen_aug.flow_from_directory(train_folder,
                                                              target_size=(66, 66),
                                                              batch_size=128,
                                                              class_mode='categorical')

Here the rescale=1./255 parameter normalizes the image, which is a useful strategy for almost all image problems; horizontal_flip=True adds horizontal flipping, which suits the current dataset but is not applicable in OCR and similar settings; the others, including zooming, shifting, shearing, and rotation, are common data augmentation strategies and will not be repeated here.
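A minimal sketch of feeding the augmented generator into the same training call as before (the validation generator and epoch settings are assumed to be unchanged):

history_aug = model.fit_generator(train_generator_aug,
                                  steps_per_epoch=128,
                                  epochs=20,
                                  validation_data=val_generator)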

Combined with dropout, data augmentation further alleviates overfitting; its convergence and accuracy curves are shown in Figure 4, and the accuracy on the test set is 0.84875.

In addition to building our own networks, we can also use pre-trained models of ready-made networks for transfer learning. The network structures that can be used include:

Xception
VGG16
VGG19
ResNet50
InceptionV3
InceptionResNetV2
MobileNet
DenseNet
NASNet

Classic models are often used together with transfer learning: the model trained on task A (most commonly ImageNet) is used to initialize the network for the current task, which is then fine-tuned on your own data. This approach tends to work well on tasks with relatively small datasets. Keras lets users specify which layers should be fine-tuned and which should not during transfer learning via the trainable attribute. The pre-trained models provided by Keras are often deep, which makes them prone to vanishing or exploding gradients, so it is recommended to add a BN (batch normalization) layer. The best strategy is to select a network suited to your task and initialize it with the weights trained on the ImageNet dataset.

Taking VGG-16 as an example, the code for using it with transfer learning is as follows. The first time you run this section, the pre-trained model has to be downloaded, so it will be slow; be patient.

from keras import models, layers
from keras.applications import VGG16

# Pre-trained VGG-16 convolutional base (ImageNet weights, no original classifier head)
model_trans_vgg16 = models.Sequential()
trans_vgg16 = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
model_trans_vgg16.add(trans_vgg16)
# New classifier head: hidden layer + batch normalization + dropout + 80-way softmax
model_trans_vgg16.add(layers.Flatten())
model_trans_vgg16.add(layers.Dense(1024, activation='relu'))
model_trans_vgg16.add(layers.BatchNormalization())
model_trans_vgg16.add(layers.Dropout(0.25))
model_trans_vgg16.add(layers.Dense(80, activation='softmax'))
model_trans_vgg16.summary()

Its convergence and accuracy curves are shown in Figure 5, and the accuracy on the test set is 0.774375; at this point transfer learning performs worse than the network we built ourselves earlier. There are two reasons why the transfer learning model performs poorly on this problem:

The VGG-16 network is too deep and easily overfits on a simple captcha such as the 12306 one;

Since include_top is False, the fully connected layers of the network are randomly initialized, which produces a large loss at the beginning of training and biases the pre-trained representation layers.

To prevent the representation layers from being biased, we can set the trainable attribute of those layers to False in Keras. Combined with the data augmentation and dropout described earlier, the convergence and accuracy curves we obtain are shown in Figure 6, and the accuracy on the test set is 0.91625.
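A minimal sketch of freezing the pre-trained base before compiling; here every VGG-16 layer is frozen via its trainable attribute (the original may have frozen only a subset of layers):

for layer in trans_vgg16.layers:
    layer.trainable = False  # keep the pre-trained representation layers fixed
model_trans_vgg16.compile(loss='categorical_crossentropy',
                          optimizer=optimizers.Adagrad(lr=0.01),
                          metrics=['acc'])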

I turned cracking the 12306 captcha into a classic multi-classification problem and, through deep learning and a few tricks, improved the recognition rate to 91.625%.

Training test results:

