Image Classification
Objective
Image classification
- Solve an image classification problem with convolutional neural networks.
- Improve performance with data augmentation.
- Understand popular image-model techniques such as residual connections.
Task Introduction
Food Classification
- The images are collected from the food-11 dataset and fall into 11 classes.
- Training set: 9866 labeled images
- Validation set: 3430 labeled images
- Testing set: 3347 images
Approach
Sample Baseline
Score: 0.63047 Private score: 0.61416 (n_epochs = 10)
Just run the sample code once and submit the result; if it does not reach the Sample Baseline, train a few more epochs. Judging from the training logs, about 5 epochs are enough to reach the Sample Baseline.
Medium Baseline
Score: 0.77788 Private score: 0.76056
Following the TA's hints, apply data augmentation and then train for more epochs. In practice I tried three data-augmentation schemes, ordered from simple to advanced.
Since torchvision.transforms.v2 offers richer functionality and is faster than torchvision.transforms, data augmentation is implemented with torchvision.transforms.v2.
ref: https://pytorch.org/vision/0.21/transforms.html
import torch
import torchvision.transforms.v2 as transforms
- Using the basic API
train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(128),  # random crop & resize to 128x128
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flip with 50% probability
    transforms.RandomRotation(degrees=15),  # rotate within ±15°
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),  # color jitter
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translation up to 10%
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),  # random sharpening
    transforms.RandomPosterize(bits=4, p=0.5),  # posterize (reduce color bits)
    transforms.RandomPerspective(distortion_scale=0.2, p=0.5),  # random perspective transform
    transforms.ToImage(),  # convert to a tensor image
    transforms.ToDtype(torch.float32, scale=True),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # normalize with ImageNet statistics
])
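As a quick sanity check, the pipeline can be applied to a single image. A minimal sketch, assuming a hypothetical local file food.jpg:
from PIL import Image

img = Image.open("food.jpg")  # hypothetical sample image
x = train_tfm(img)            # apply the augmentation pipeline above
print(x.shape, x.dtype)       # expected: torch.Size([3, 128, 128]) torch.float32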
- Auto-Augmentation
We do not know in advance which data-augmentation methods suit the current data, so Auto-Augmentation is used to search automatically for suitable augmentations. In practice I used TrivialAugmentWide(); other auto-augmentation policies can be chosen as well.
ref: https://pytorch.org/vision/0.21/transforms.html#auto-augmentation
train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(128, antialias=True),
    transforms.RandomHorizontalFlip(0.5),
    transforms.TrivialAugmentWide(),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
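Augmentation should only be applied at training time; the validation/test pipeline stays deterministic. A minimal counterpart, my own sketch rather than the sample code's test_tfm:
test_tfm = transforms.Compose([
    transforms.Resize((128, 128), antialias=True),  # deterministic resize, no randomness
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])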
- Auto-Augmentation + CutMix & MixUp Transform
Add the CutMix & MixUp transforms on top of TrivialAugmentWide().
ref: https://pytorch.org/vision/main/auto_examples/transforms/plot_cutmix_mixup.html#sphx-glr-auto-examples-transforms-plot-cutmix-mixup-py
- Define a collate_fn(batch) function:
from torch.utils.data import default_collate

NUM_CLASSES = 11
cutmix = transforms.CutMix(num_classes=NUM_CLASSES)
mixup = transforms.MixUp(num_classes=NUM_CLASSES)
cutmix_or_mixup = transforms.RandomChoice([cutmix, mixup])

def collate_fn(batch):
    # Apply CutMix or MixUp (picked at random) to every collated batch
    return cutmix_or_mixup(*default_collate(batch))
- Pass the extra collate_fn=collate_fn argument to the train_loader:
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True, drop_last=True, collate_fn=collate_fn) # CutMix and MixUp Transform
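Note that CutMix/MixUp replace the integer labels with soft (mixed) label vectors of shape [batch_size, NUM_CLASSES], so the training loop has to handle them. A minimal sketch of the affected lines, assuming model, criterion = nn.CrossEntropyLoss(), and device from the sample training loop:
for imgs, labels in train_loader:
    logits = model(imgs.to(device))
    # nn.CrossEntropyLoss accepts probability targets, so the mixed labels work directly
    loss = criterion(logits, labels.to(device))
    # For accuracy logging, reduce the soft labels back to the dominant class
    acc = (logits.argmax(dim=-1) == labels.to(device).argmax(dim=-1)).float().mean()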
Summary
Auto-Augmentation + CutMix & MixUp Transform is the fastest and performs best. As the figure below shows, the Private Score and Public Score are close, so the model does not overfit and generalizes well. In total, 370 epochs of training are enough to pass the Medium Baseline (Private Score: 0.71361, Public Score: 0.73207).
The basic-API and Auto-Augmentation schemes need roughly 1000 epochs of training to pass the Medium Baseline, their models overfit, and their scores are lower.
Strong Baseline
Score: 0.87948 Private score: 0.87323
Load the model pre-trained in the Medium Baseline with the Auto-Augmentation + CutMix & MixUp Transform scheme, then use version_2 of Test Time Augmentation (TTA) in the 'Testing and generate prediction CSV' stage to generate the predictions. This alone beats the Strong Baseline and comes very close to the Boss Baseline, quite a surprise❗
A good prediction scheme at inference time can also greatly improve test-set accuracy; training a stronger and more complex model is not the only way.
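For reference, the core idea of TTA is to combine the model's predictions over a deterministic view and several augmented views of each test image. The following is my own sketch, not the sample code's version_2; the weight w, the view count n_aug, and the reuse of train_tfm/test_tfm from above are assumptions:
@torch.no_grad()
def tta_predict(model, imgs_pil, n_aug=5, w=0.6):
    # Weight w on the deterministic view, (1 - w) spread over the augmented views
    base = model(torch.stack([test_tfm(im) for im in imgs_pil]).to(device)).softmax(-1)
    aug = sum(
        model(torch.stack([train_tfm(im) for im in imgs_pil]).to(device)).softmax(-1)
        for _ in range(n_aug)
    ) / n_aug
    return (w * base + (1 - w) * aug).argmax(dim=-1)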
Boss Baseline
Score: 0.88745 Private score: 0.87878
- On top of the Medium setup, train a ResNet50 model with dropout and batch normalization for about 500 epochs.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class ResNet50(nn.Module):
    def __init__(self, num_classes=11):
        super(ResNet50, self).__init__()
        # Load a ResNet50 backbone (dropping the final fc layer);
        # use weights=ResNet50_Weights.IMAGENET1K_V1 to load pretrained weights
        resnet = resnet50(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # everything except the fc layer
        # Number of input features of the original fc layer
        num_features = resnet.fc.in_features
        # New classification head
        self.fc = nn.Sequential(
            nn.Linear(num_features, 1024),
            nn.BatchNorm1d(1024),  # batch normalization
            nn.ReLU(),
            nn.Dropout(0.5),  # dropout
            nn.Linear(1024, 512),
            nn.BatchNorm1d(512),  # batch normalization
            nn.ReLU(),
            nn.Dropout(0.5),  # dropout
            nn.Linear(512, num_classes)  # output layer
        )

    def forward(self, x):
        # Extract features
        out = self.cnn(x)
        # Flatten the features
        out = torch.flatten(out, 1)  # torch.flatten instead of view
        # Classify
        out = self.fc(out)
        return out
- Pick the three best submissions and ensemble them; in practice I used majority voting, sketched below.
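A minimal sketch of the voting step, assuming the Id,Category submission format; the file names are hypothetical, and when all three submissions disagree it falls back to the first (best) one:
import pandas as pd

dfs = [pd.read_csv(f) for f in ["pred_1.csv", "pred_2.csv", "pred_3.csv"]]  # hypothetical names
votes = pd.concat([df["Category"] for df in dfs], axis=1)
majority = votes.mode(axis=1)[0]           # most frequent label per row
tie = votes.nunique(axis=1) == 3           # all three disagree -> no majority
majority[tie] = dfs[0]["Category"][tie]    # fall back to the best single submission
ensemble = pd.DataFrame({"Id": dfs[0]["Id"], "Category": majority.astype(int)})
ensemble.to_csv("ensemble.csv", index=False)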
Code
Report Questions
Q1. Augmentation Implementation (2%)
train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224, antialias=True),  # crop size raised from 128 to 224
    transforms.RandomHorizontalFlip(0.5),
    transforms.TrivialAugmentWide(),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
Q2. Residual Connection Implementation (2%)
import torch
import torch.nn as nn

class Residual_Network(nn.Module):
    def __init__(self):
        super(Residual_Network, self).__init__()
        self.cnn_layer1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
        )
        self.cnn_layer2 = nn.Sequential(
            nn.Conv2d(64, 64, 3, 1, 1),
            nn.BatchNorm2d(64),
        )
        self.cnn_layer3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, 2, 1),
            nn.BatchNorm2d(128),
        )
        self.cnn_layer4 = nn.Sequential(
            nn.Conv2d(128, 128, 3, 1, 1),
            nn.BatchNorm2d(128),
        )
        self.cnn_layer5 = nn.Sequential(
            nn.Conv2d(128, 256, 3, 2, 1),
            nn.BatchNorm2d(256),
        )
        self.cnn_layer6 = nn.Sequential(
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.BatchNorm2d(256),
        )
        self.fc_layer = nn.Sequential(
            nn.Linear(256 * 32 * 32, 256),
            nn.ReLU(),
            nn.Linear(256, 11)
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # input (x): [batch_size, 3, 128, 128]
        # output: [batch_size, 11]

        # Extract features by convolutional layers.
        x1 = self.cnn_layer1(x)
        x1 = self.relu(x1)

        x2 = self.cnn_layer2(x1)
        x2 = self.relu(x2)
        # Residual connection: x1 and x2 share the shape [batch_size, 64, 128, 128]
        x2 = x2 + x1

        # No residual here: cnn_layer3 changes both the channel count (64 -> 128)
        # and the spatial size (stride 2), so x3 and x2 have incompatible shapes.
        x3 = self.cnn_layer3(x2)
        x3 = self.relu(x3)

        x4 = self.cnn_layer4(x3)
        x4 = self.relu(x4)
        # Residual connection: x3 and x4 share the shape [batch_size, 128, 64, 64]
        x4 = x4 + x3

        # No residual here: cnn_layer5 changes both the channel count (128 -> 256)
        # and the spatial size (stride 2), so x5 and x4 have incompatible shapes.
        x5 = self.cnn_layer5(x4)
        x5 = self.relu(x5)

        x6 = self.cnn_layer6(x5)
        x6 = self.relu(x6)
        # Residual connection: x5 and x6 share the shape [batch_size, 256, 32, 32]
        x6 = x6 + x5

        # The extracted feature map must be flattened before going to fully-connected layers.
        xout = x6.flatten(1)
        # The features are transformed by fully-connected layers to obtain the final logits.
        xout = self.fc_layer(xout)
        return xout
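A quick shape check (a sketch) confirms that the residual connections are only applied where the tensor shapes match:
model = Residual_Network()
dummy = torch.zeros(2, 3, 128, 128)  # [batch_size, 3, 128, 128]
print(model(dummy).shape)            # expected: torch.Size([2, 11])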