Image Classification

Objective

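Image classification

  1. Solve an image classification problem with convolutional neural networks.
  2. Improve performance through data augmentation.
  3. Understand popular image-model techniques such as residual connections.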

Task Introduction

Food classification

  • The images are collected from the food-11 dataset and divided into 11 classes.
  • Training set: 9866 labeled images
  • Validation set: 3430 labeled images
  • Testing set: 3347 images

Approach

Sample Baseline

Score: 0.63047 Private score: 0.61416 (n_epochs = 10)

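Run the Sample Code once and submit the result; if it falls short of the Sample Baseline, train for a few more epochs. Judging from the training logs, about 5 epochs are enough to reach the Sample Baseline.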

Medium Baseline

Score: 0.77788 Private score: 0.76056

Following the TA's hint, apply Data Augmentation and train for more epochs. In practice I tried three Data Augmentation approaches, from easy to hard.

Because torchvision.transforms.v2 offers richer functionality and runs faster than torchvision.transforms, Data Augmentation is implemented with torchvision.transforms.v2.

ref: https://pytorch.org/vision/0.21/transforms.html

import torch
import torchvision.transforms.v2 as transforms
  1. Using the basic API
train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(128),    # random crop, then resize to 128x128
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip with 50% probability
    transforms.RandomRotation(degrees=15),  # rotate by up to ±15°
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),  # color jitter
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translation by up to 10%
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),  # random sharpening
    transforms.RandomPosterize(bits=4, p=0.5),  # reduce color bit depth
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),  # random perspective transform
    transforms.ToImage(),  # convert to an image tensor
    transforms.ToDtype(torch.float32, scale=True),  # to float32, scaled to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize
])
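
To make the last two steps of such a pipeline concrete, here is a small worked example in plain Python (no torchvision needed). It assumes an 8-bit RGB pixel: scaling maps 0..255 to 0..1, and normalization then computes (x - mean) / std per channel with the ImageNet statistics used above. The helper name normalize_pixel is hypothetical and only for illustration.

```python
# ImageNet channel statistics, as used in the Normalize step.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic scaling to [0, 1] followed by per-channel normalization for one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]  # 0..255 -> 0..1
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]


if __name__ == "__main__":
    black = normalize_pixel([0, 0, 0])
    white = normalize_pixel([255, 255, 255])
    print([round(v, 3) for v in black])
    print([round(v, 3) for v in white])
```

After normalization the values are no longer in [0, 1]: black pixels become negative and white pixels land a little above 2, which is the input range ImageNet-style models are usually trained on.
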
  2. Auto-Augmentation

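We do not know in advance which Data Augmentation methods suit this dataset, so we rely on Auto-Augmentation to choose them automatically. In practice TrivialAugmentWide() is used (it applies one randomly chosen operation at a random strength to each image); the other auto-augmentation transforms can be used as well.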

ref: https://pytorch.org/vision/0.21/transforms.html#auto-augmentation

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(128, antialias=True),
    transforms.RandomHorizontalFlip(0.5),
    transforms.TrivialAugmentWide(),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
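
The idea behind TrivialAugment can be sketched in a few lines of plain Python: for each image, pick one operation uniformly at random and a strength uniformly at random, with no search or tuning. This is a toy illustration on a flat list of 8-bit pixel values; the operation set and the names OPS and trivial_augment are assumptions for the sketch and do not mirror torchvision's actual operation list or magnitude ranges.

```python
import random


def identity(pixels, strength):
    return list(pixels)


def brightness(pixels, strength):
    # Scale every pixel by a factor in [1, 2) and clip to the 8-bit range.
    factor = 1.0 + strength
    return [min(255, int(p * factor)) for p in pixels]


def solarize(pixels, strength):
    # Invert pixels at or above a threshold; a higher strength lowers the threshold.
    threshold = int(255 * (1.0 - strength))
    return [255 - p if p >= threshold else p for p in pixels]


def posterize(pixels, strength):
    # Keep only the top bits of each pixel; a higher strength keeps fewer bits.
    bits = 8 - int(strength * 6)          # between 8 and 3 bits for strength in [0, 1)
    mask = (0xFF << (8 - bits)) & 0xFF
    return [p & mask for p in pixels]


OPS = [identity, brightness, solarize, posterize]


def trivial_augment(pixels, rng=random):
    """Apply one randomly chosen op at a random strength in [0, 1)."""
    op = rng.choice(OPS)
    strength = rng.random()
    return op(pixels, strength)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0, 64, 128, 192, 255]
    for _ in range(3):
        print(trivial_augment(image, rng))
```
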
  3. Auto-Augmentation + CutMix & MixUp Transform

Add the CutMix & MixUp transforms on top of TrivialAugmentWide(). ref: https://pytorch.org/vision/main/auto_examples/transforms/plot_cutmix_mixup.html#sphx-glr-auto-examples-transforms-plot-cutmix-mixup-py

  • Define the collate_fn(batch) function:
from torch.utils.data import default_collate

NUM_CLASSES = 11
cutmix = transforms.CutMix(num_classes=NUM_CLASSES)
mixup = transforms.MixUp(num_classes=NUM_CLASSES)
# Pick CutMix or MixUp at random for each batch.
cutmix_or_mixup = transforms.RandomChoice([cutmix, mixup])

def collate_fn(batch):
    # Collate samples into (images, labels) tensors, then mix the whole batch.
    return cutmix_or_mixup(*default_collate(batch))
  • Pass collate_fn=collate_fn when building train_loader:
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True, drop_last=True, collate_fn=collate_fn) # CutMix and MixUp Transform
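
Once CutMix or MixUp runs inside collate_fn, the labels of each batch are no longer class indices but soft label vectors of shape [batch_size, NUM_CLASSES]. The loss therefore has to accept class probabilities (nn.CrossEntropyLoss does in recent PyTorch versions), and training accuracy needs an argmax over the soft labels before comparing. The plain-Python sketch below shows the MixUp arithmetic for a single pair of samples; the helper names one_hot and mixup_pair and the fixed lam value are illustrative assumptions (MixUp normally draws lam from a Beta distribution).

```python
def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(x_a, y_a, x_b, y_b, lam, num_classes):
    """Blend two samples and their labels with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b
         for a, b in zip(one_hot(y_a, num_classes), one_hot(y_b, num_classes))]
    return x, y


if __name__ == "__main__":
    NUM_CLASSES = 11
    # Two toy "images" flattened to three values each, with labels 2 and 7.
    x, y = mixup_pair([1.0, 1.0, 1.0], 2, [0.0, 0.0, 0.0], 7,
                      lam=0.75, num_classes=NUM_CLASSES)
    print(x)  # blended pixel values
    print(y)  # soft label with weight on classes 2 and 7
    # Recover a hard label when computing accuracy.
    print(max(range(NUM_CLASSES), key=lambda i: y[i]))
```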

Summary

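  • Auto-Augmentation + CutMix & MixUp Transform converges fastest and gives the best results. As the figure below 👇 shows, the Private Score and Public Score stay close to each other, so the model does not overfit and generalizes well. A total of 370 epochs of training is enough to exceed the Medium Baseline (Private Score: 0.71361, Public Score: 0.73207).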

    image.png

  • The basic-API and Auto-Augmentation approaches need about 1000 epochs of training to exceed the Medium Baseline; the model also overfits and the scores are lower.

    image.png

Strong Baseline

Score: 0.87948 Private score: 0.87323

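Load the model trained in the Medium Baseline with the Auto-Augmentation + CutMix & MixUp Transform approach, then, in the ‘Testing and generate prediction CSV’ stage, generate the predictions with version_2 of the Test Time Augmentation (TTA) method. This alone exceeded the Strong Baseline and came very close to the Boss Baseline, which was a real surprise❗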

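A good prediction method at inference time can also raise test-set accuracy considerably; it is not always necessary to train a stronger, more complex model.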

image.png
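
The general idea of TTA can be sketched as follows: predict on several augmented views of each test image and combine the class probabilities before taking the argmax. The sketch uses plain Python lists in place of model outputs and a simple unweighted mean. The version_2 code is not shown in this post, so the combination rule and the names softmax and tta_predict are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tta_predict(view_logits):
    """Average class probabilities over all views of one image, then argmax.

    view_logits: list of logit lists, one per augmented view.
    """
    probs = [softmax(logits) for logits in view_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: mean[c]), mean


if __name__ == "__main__":
    # Three views of the same image, three classes for brevity.
    views = [
        [2.0, 1.9, 0.1],   # this view narrowly prefers class 0
        [0.5, 2.5, 0.2],   # these two clearly prefer class 1
        [0.3, 2.2, 0.4],
    ]
    label, mean_probs = tta_predict(views)
    print(label, [round(p, 3) for p in mean_probs])
```

A single view can be misled by an unlucky crop; averaging over views lets the more confident, consistent predictions dominate.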

Boss Baseline

Score: 0.88745 Private score: 0.87878

  • Building on the Medium setup, train a ResNet50 model with dropout and batchnorm for about 500 epochs.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ResNet50(nn.Module):
    def __init__(self, num_classes=11):
        super(ResNet50, self).__init__()
        
        # Load the ResNet50 model (without its final fc layer)
        resnet = resnet50(weights=None)  # pass weights=ResNet50_Weights.IMAGENET1K_V1 to load pretrained weights
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # drop the fc layer
        
        # Number of input features of the original fc layer
        num_features = resnet.fc.in_features
        
        # Define the new fc head
        self.fc = nn.Sequential(
            nn.Linear(num_features, 1024),
            nn.BatchNorm1d(1024),  # Batch Normalization
            nn.ReLU(),
            nn.Dropout(0.5),  # Dropout

            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),  # Batch Normalization
            nn.ReLU(),
            nn.Dropout(0.5),  # Dropout

            nn.Linear(512, num_classes)  # output layer
        )

    def forward(self, x):
        # Extract features
        out = self.cnn(x)
        
        # Flatten the features
        out = torch.flatten(out, 1)  # torch.flatten instead of view
        
        # Classify
        out = self.fc(out)
        return out
  • Pick the three best results for an ensemble; in practice majority voting is used.
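
A majority-vote ensemble over prediction files can be sketched as below. Each model contributes one predicted label per test image and the most common label wins. How ties are broken is not stated in this post, so giving ties to the tied label from the earliest model in the list (assumed to be the strongest) is an assumption, as is the helper name majority_vote.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions: list of label lists, one per model, all of equal length.
    Ties go to the tied label predicted by the earliest model in the list.
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top_count = max(counts.values())
        tied = {lab for lab, c in counts.items() if c == top_count}
        # Walk the labels in model order and take the first one among the winners.
        voted.append(next(lab for lab in labels if lab in tied))
    return voted


if __name__ == "__main__":
    model_a = [3, 5, 0, 10]
    model_b = [3, 4, 1, 10]
    model_c = [2, 4, 2, 10]
    print(majority_vote([model_a, model_b, model_c]))
```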

Code

Passes the Boss Baseline on both the Public and Private scores

Report Questions

Q1. Augmentation Implementation (2%)

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224, antialias=True), # 128
    transforms.RandomHorizontalFlip(0.5),
    transforms.TrivialAugmentWide(),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

Q2. Residual Connection Implementation (2%)

class Residual_Network(nn.Module):
    def __init__(self):
        super(Residual_Network, self).__init__()

        self.cnn_layer1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
        )

        self.cnn_layer2 = nn.Sequential(
            nn.Conv2d(64, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
        )

        self.cnn_layer3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, 2, 1),
            nn.BatchNorm2d(128),
        )

        self.cnn_layer4 = nn.Sequential(
            nn.Conv2d(128, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
        )
        self.cnn_layer5 = nn.Sequential(
            nn.Conv2d(128, 256, 3, 2, 1),
            nn.BatchNorm2d(256),
        )
        self.cnn_layer6 = nn.Sequential(
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
        )
        self.fc_layer = nn.Sequential(
            nn.Linear(256* 32* 32, 256),
            nn.ReLU(),
            nn.Linear(256, 11)
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # input (x): [batch_size, 3, 128, 128]
        # output: [batch_size, 11]

        # Extract features by convolutional layers.
        x1 = self.cnn_layer1(x)
        x1 = self.relu(x1)             # [batch_size, 64, 128, 128]

        x2 = self.cnn_layer2(x1)
        x2 = self.relu(x2)

        # Residual connection: x2 + x1 (same shape)
        x2 = x2 + x1

        # cnn_layer3 changes the channels (64 -> 128) and halves the resolution,
        # so x3 cannot be added to x2 directly; no residual connection here.
        x3 = self.cnn_layer3(x2)
        x3 = self.relu(x3)             # [batch_size, 128, 64, 64]

        x4 = self.cnn_layer4(x3)
        x4 = self.relu(x4)

        # Residual connection: x4 + x3 (same shape)
        x4 = x4 + x3

        # cnn_layer5 changes the channels (128 -> 256) and halves the resolution,
        # so no residual connection here either.
        x5 = self.cnn_layer5(x4)
        x5 = self.relu(x5)             # [batch_size, 256, 32, 32]

        x6 = self.cnn_layer6(x5)
        x6 = self.relu(x6)

        # Residual connection: x6 + x5 (same shape)
        x6 = x6 + x5

        # The extracted feature map must be flattened before going to fully-connected layers.
        xout = x6.flatten(1)

        # The features are transformed by fully-connected layers to obtain the final logits.
        xout = self.fc_layer(xout)
        return xout
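
The shapes in this network can be checked with the standard convolution output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below traces a 128x128 input through the six convolution layers. It shows where the 256 * 32 * 32 input size of fc_layer comes from, and that a skip connection by plain addition is only shape-compatible around layers that keep both the channel count and the resolution unchanged (layers 2, 4 and 6). The helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# (in_channels, out_channels, kernel, stride, padding) for cnn_layer1..6.
LAYERS = [
    (3, 64, 3, 1, 1),
    (64, 64, 3, 1, 1),
    (64, 128, 3, 2, 1),
    (128, 128, 3, 1, 1),
    (128, 256, 3, 2, 1),
    (256, 256, 3, 1, 1),
]

if __name__ == "__main__":
    size = 128
    for idx, (c_in, c_out, k, s, p) in enumerate(LAYERS, start=1):
        new_size = conv_out(size, k, s, p)
        # Element-wise addition needs identical input and output shapes.
        can_add = (c_in == c_out) and (new_size == size)
        print(f"layer{idx}: [{c_in}, {size}, {size}] -> "
              f"[{c_out}, {new_size}, {new_size}], skip by addition: {can_add}")
        size = new_size
    print("flattened features:", LAYERS[-1][1] * size * size)
```

Where the shapes differ, a common alternative is to project the shortcut (for example with a strided 1x1 convolution) so that both branches match before the addition.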

Reference

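自动数据增强方法(附代码)_数据增强代码-CSDN博客 (Automatic data augmentation methods, with code; CSDN blog)

李宏毅2023机器学习作业HW03解析和代码分享 (Hung-yi Lee Machine Learning 2023, HW03 walkthrough and code)

torchvision.transforms 常用方法解析(含图例代码以及参数解释)-CSDN博客 (Common torchvision.transforms methods explained, with illustrated code and parameter notes; CSDN blog)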

Test Time Augmentation

This post is licensed under CC BY 4.0 by the author.