Deep Learning in Practice (12) -- LeNet-5 in TensorFlow

User 5410712

Published on 2022-06-01 20:13:10


This article is part of the column: 居士说AI

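Background Knowledge

In a neural network, the earlier hidden layers can end up learning much more slowly than the later ones, so that adding more hidden layers lowers classification accuracy instead of raising it. This is known as the vanishing gradient problem.

Solutions: pre-training followed by fine-tuning, gradient clipping, regularization, and activation functions such as ReLU, Leaky ReLU, and ELU (gradient clipping and regularization are aimed mainly at the closely related exploding gradient problem).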
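To make the effect concrete, here is a minimal sketch in pure Python (it is not from the original post). It multiplies the activation derivatives along a chain of layers, under heavy simplifying assumptions: one unit per layer, every weight fixed at 1.0, and the same pre-activation value of 0.5 at every layer. The helper names `sigmoid_grad`, `relu_grad`, and `backprop_factor` are hypothetical.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; it is never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def backprop_factor(grad_fn, depth, pre_activation=0.5):
    """Product of activation derivatives across `depth` layers.

    Assumptions (for illustration only): one unit per layer, every weight
    is 1.0, and every layer sees the same pre-activation value.
    """
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(pre_activation)
    return factor

# Compare how much gradient survives as the chain gets deeper.
for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(sigmoid_grad, depth), backprop_factor(relu_grad, depth))
```

Under these assumptions the sigmoid factor shrinks geometrically with depth, while the ReLU factor stays at 1 for positive inputs, which is one reason ReLU-style activations appear in the list of remedies.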

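Recap

This post follows directly from the theory article Deep Learning Theory (13) -- LeNet-5, in which we analyzed the network's structure and the dimensions at each layer. Today we strike while the iron is hot and put it into practice: we write the LeNet-5 architecture in TensorFlow and use it as a classification network. Let's take a look.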

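1. Data Preparation

For this project we again use the cat-vs-dog binary classification dataset from earlier posts. It is split into a training set and a test set (one folder each), and each set contains one subfolder per class (cat and dog).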
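The folder layout described above can be turned into labeled samples with a few lines of standard-library Python. This is only a sketch: the folder names `cat` and `dog`, the `.jpg` extension, and the helper name `list_labeled_images` are assumptions, since the post does not give the exact paths.

```python
from pathlib import Path

def list_labeled_images(split_dir, class_names=("cat", "dog")):
    """Return (path, label) pairs for every image under `split_dir`.

    The label is the index of the class folder in `class_names`
    (assumed here: cat -> 0, dog -> 1).
    """
    samples = []
    for label, class_name in enumerate(class_names):
        class_dir = Path(split_dir) / class_name
        # Sort so the order is reproducible across runs.
        for path in sorted(class_dir.glob("*.jpg")):
            samples.append((str(path), label))
    return samples
```

A call such as `list_labeled_images("data/train")` (a hypothetical path) would then yield the training list, and the same function applied to the test folder yields the test list.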

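Before images are fed into the network, they must be resized to a common shape. In the previous article the input was 32x32, but for our real cat-and-dog photos a size that small would lose some image detail, so we use a uniform size of 150x150 instead.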
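As a dependency-free illustration of what "resizing to a common shape" means, the sketch below resizes a grid of pixels with nearest-neighbour sampling. It is not the preprocessing used in the original project (which is not shown here); in practice an image library would normally do this step. The function name `resize_nearest` is hypothetical, and the aspect ratio is deliberately not preserved.

```python
def resize_nearest(image, out_h=150, out_w=150):
    """Resize a 2-D grid of pixels with nearest-neighbour sampling.

    `image` is a list of rows; each pixel may be a number or an (R, G, B) tuple.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map every output coordinate back to the closest source pixel.
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

small = [[(r, c, 0) for c in range(4)] for r in range(3)]  # a 3x4 toy "image"
big = resize_nearest(small)
print(len(big), len(big[0]))  # 150 150
```

Whatever the source size, every image comes out as 150 rows of 150 pixels, which is what lets a batch be stacked into a single tensor.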

2. Network Structure

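# LeNet-5 network structure
# Input: 3*150*150, a three-channel color image of size 150*150
# Convolution layer 1: kernel size 5*5, 6 kernels, stride 1
# Downsampling layer 1: 2*2 window, stride 2 (matching strides = [1, 2, 2, 1] in the code)
# Convolution layer 2: kernel size 5*5, 16 kernels, stride 1
# Downsampling layer 2: 2*2 window, stride 2
# Convolution layer 3: kernel size 5*5, 120 kernels, stride 1
# Fully connected layer 1: 84 outputs
# Fully connected layer 2: 2 outputs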
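The layer list can be checked with a little shape arithmetic. The sketch below traces the spatial size of a 150x150 input through the three convolutions and two pooling layers. The pooling settings (2x2 window, stride 2, 'SAME' padding) are taken from the `tf.nn.avg_pool` call in this post; the padding used inside `Conv_layer` is not shown here, so both 'VALID' and 'SAME' are traced as assumptions. The helper names are hypothetical.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis, following TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")

def trace_shapes(size=150, conv_padding="VALID"):
    """Return (layer name, height/width, channels) for each layer in order."""
    layers = [
        # name, kernel, stride, padding, output channels
        ("conv1", 5, 1, conv_padding, 6),
        ("pool1", 2, 2, "SAME", 6),
        ("conv2", 5, 1, conv_padding, 16),
        ("pool2", 2, 2, "SAME", 16),
        ("conv3", 5, 1, conv_padding, 120),
    ]
    shapes = []
    for name, kernel, stride, padding, channels in layers:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((name, size, channels))
    return shapes

for padding in ("VALID", "SAME"):
    shapes = trace_shapes(conv_padding=padding)
    _, size, channels = shapes[-1]
    print(padding, shapes, "flattened:", size * size * channels)
```

With 'VALID' convolutions the final feature map is 31x31x120 (115320 values after flattening); with 'SAME' convolutions it is 38x38x120 (173280 values). Either way, unlike the 32x32 case from the theory article, the third convolution no longer reduces the map to 1x1, which is why the code flattens its output before the first fully connected layer.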

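# Written against the TensorFlow 1.x API (tf.variable_scope, tf.truncated_normal)
import tensorflow as tf

# Define the average pooling layer
def Avg_pool_lrn(names, input, ksize, is_lrn):
    with tf.variable_scope(names) as scope:
        # Average pooling with stride 2 along height and width
        Avg_pool_out = tf.nn.avg_pool(input, ksize = ksize, strides = [1, 2, 2, 1], padding = 'SAME', name = 'avg_pool_{}'.format(names))
        if is_lrn:
            # Optionally add an LRN (local response normalization) step, generally used to improve expressiveness and curb overfitting
            Avg_pool_out = tf.nn.lrn(Avg_pool_out, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name = 'lrn_{}'.format(names))
            print("use lrn operation")
    return Avg_pool_out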
 
# Network structure code (Conv_layer and local_layer are helpers that are not defined in this post)
def inference(images, batch_size, n_classes,drop_rate):
    conv1 = Conv_layer(names = 'conv1_scope', input = images , w_shape = [5, 5, 3, 6], b_shape = [6], strid = [1, 1])
    print("---------conv1:{}".format(conv1))
    down_sample1 = Avg_pool_lrn(names = 'avg_pooling1', input = conv1 , ksize = [1, 2, 2, 1], is_lrn = False)
    print("---------down_sample1:{}".format(down_sample1))
    conv2 = Conv_layer(names = 'conv2_scope', input = down_sample1 , w_shape = [5, 5, 6, 16], b_shape = [16], strid = [1, 1])
    down_sample2 = Avg_pool_lrn(names = 'avg_pooling2', input = conv2 , ksize = [1, 2, 2, 1], is_lrn = False)
    conv3 = Conv_layer(names = 'conv3_scope', input = down_sample2 , w_shape = [5, 5, 16, 120], b_shape = [120], strid = [1, 1])
    
    # conv-->local dimension change
    reshape = tf.reshape(conv3, shape=[batch_size, -1])
    dim = reshape.get_shape()[1].value
    local_1 = local_layer(names = 'local1_scope', input = reshape , w_shape = [dim, 84], b_shape = [84])

    # Apply a second fully connected layer to the output of the previous FC layer
    with tf.variable_scope('softmax_linear') as scope:
        weights = tf.Variable(tf.truncated_normal(shape=[84, n_classes], stddev=0.005, dtype=tf.float32),
                              name='softmax_linear', dtype=tf.float32)

        biases = tf.Variable(tf.constant(value=0.1, dtype=tf.float32, shape=[n_classes]),
                             name='biases', dtype=tf.float32)

        softmax_linear = tf.add(tf.matmul(local_1, weights), biases, name='softmax_linear')
        # print("---------softmax_linear:{}".format(softmax_linear))

    return softmax_linear
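Note that `inference` returns raw scores (logits) for the two classes; no softmax is applied inside the network. The training code is not shown in this post, so the following pure-Python sketch is only an assumption about the usual next step: converting logits into probabilities and computing the cross-entropy loss for one sample. The class order (cat = 0, dog = 1) and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5]  # hypothetical scores for (cat, dog)
probs = softmax(logits)
print(probs, cross_entropy(logits, 0))
```

The loss is small when the larger logit belongs to the true class and grows when the network favors the wrong class, which is the signal the optimizer uses during training.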

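3. Training Process

For the training code, we reuse the image classification code from the first post in this series without modification; the only thing that changes is the network structure. In that sense, the earlier code serves as a general-purpose image classification script.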