李宏毅-ML2022-HW10-Attack
Task Description
The topic of this assignment is the adversarial attack: a technique that adds carefully crafted, tiny perturbations to the input so that a machine learning model produces wrong predictions. The perturbations are almost imperceptible to humans, yet they can change the model's output dramatically.
Prerequisite
By attack goal, adversarial attacks are divided into targeted attacks and non-targeted attacks; this assignment implements a non-targeted attack.
- Targeted attack: mislead the model into outputting a specific wrong class;
- Non-targeted attack: only require the model to output a wrong class.
The perturbation must stay within a range that humans cannot perceive, i.e. the distance between the original image $x^0$ and the perturbed image $x$ must satisfy $d(x^0, x) \le \epsilon$. Here $d(x^0, x)$ is measured with the L-infinity norm: \(d(x^0, x) = \|\Delta x\|_\infty = \max\{|\Delta x_1|, |\Delta x_2|, |\Delta x_3|, \dots\}\), where $\Delta x = x - x^0$.
The attack algorithms used in this assignment are the Fast Gradient Sign Method (FGSM) and Iterative FGSM (I-FGSM); see the lecture videos for details.
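For reference, a minimal FGSM sketch in PyTorch (this is not the exact sample-code implementation; the `model`, `loss_fn`, and `epsilon` interface is an assumption that mirrors the sample code):

```python
import torch

def fgsm(model, x, y, loss_fn, epsilon):
    """One-step FGSM: move x by epsilon along the sign of the gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)   # loss w.r.t. the true label y
    loss.backward()                   # gradient of the loss w.r.t. x_adv
    # A single step of size epsilon automatically stays inside the
    # L-infinity ball of radius epsilon, so no extra clipping is needed.
    return (x_adv + epsilon * x_adv.grad.detach().sign()).detach()
```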
By attack setting, attacks are divided into White Box Attacks and Black Box Attacks; this assignment uses a black-box attack.
- White-box attack: the attacker knows the target model's parameters and computes gradients on it directly to generate attacked objects;
- Black-box attack: the attacker knows nothing about the target model, so it trains a proxy (surrogate) model similar to the target; attacked objects that successfully fool the proxy model may also fool the target model.
Data Format
Images:
- CIFAR-10 images
- (32 * 32 RGB images) * 200
- airplane/airplane1.png, …, airplane/airplane20.png
- …
- truck/truck1.png, …, truck/truck20.png
- 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)
- 20 images for each class
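The sample code ships its own dataset class; the following is only a minimal sketch of how these 200 benign images could be loaded, assuming the directory layout above (the normalization constants and helper name are assumptions):

```python
import glob
import os
import torch
import torchvision.transforms as transforms
from PIL import Image

# CIFAR-10 statistics, commonly used for normalization (assumed values).
cifar_10_mean = (0.491, 0.482, 0.447)
cifar_10_std = (0.202, 0.199, 0.201)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(cifar_10_mean, cifar_10_std),
])

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

def load_benign_images(root):
    """Return (images, labels, names) for the 10 classes x 20 images."""
    images, labels, names = [], [], []
    for label, cls in enumerate(classes):
        for path in sorted(glob.glob(os.path.join(root, cls, '*.png'))):
            images.append(transform(Image.open(path)))
            labels.append(label)
            names.append(os.path.relpath(path, root))
    return torch.stack(images), torch.tensor(labels), names
```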
Methodology
Choose any proxy (surrogate) models and attack the black-box model on JudgeBoi;
Implement non-targeted adversarial attack methods:
a. FGSM
b. I-FGSM
c. MI-FGSM
Increase the transferability of the attack with diverse input (DIM);
Ensemble attack: attack multiple proxy models at once.
Approach
Simple baseline (acc <= 0.70)
Run the sample code.
fgsm_acc = 0.59000, fgsm_loss = 2.49187
Medium baseline (acc <= 0.50)
Following the TA's hint, use the ensemble attack method with I-FGSM as the attack algorithm.
The accuracy drops sharply:
ifgsm_ensemble_acc = 0.00000, ifgsm_ensemble_loss = 2.45724
- Randomly pick a few pretrained models:
```python
model_names = [
    'nin_cifar10',
    'resnet20_cifar10',
    'preresnet20_cifar10'
]
```
- Complete the `ensembleNet` class:
```python
import torch.nn as nn
from pytorchcv.model_provider import get_model as ptcv_get_model

class ensembleNet(nn.Module):
    def __init__(self, model_names):
        super().__init__()
        self.models = nn.ModuleList([ptcv_get_model(name, pretrained=True) for name in model_names])
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        ensemble_logits = None
        # Sum up the logits from every surrogate model, then average them.
        for i, m in enumerate(self.models):
            logits = m(x)
            if ensemble_logits is None:
                ensemble_logits = logits
            else:
                ensemble_logits += logits
        ensemble_logits /= len(self.models)
        return self.softmax(ensemble_logits)
```
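Presumably the ensemble is built once and frozen before being attacked; a minimal usage sketch (`device` comes from the sample code):

```python
ensemble_model = ensembleNet(model_names).to(device)
ensemble_model.eval()  # surrogate weights stay fixed; only the input image is perturbed
```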
- Run the ensemble attack with I-FGSM:
```python
adv_examples, ifgsm_acc, ifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, ifgsm, loss_fn)
print(f'ifgsm_ensemble_acc = {ifgsm_acc:.5f}, ifgsm_ensemble_loss = {ifgsm_loss:.5f}')
create_dir(root, 'ifgsm_ensemble', adv_examples, adv_names)
```
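The `ifgsm` function passed to `gen_adv_examples` comes from the sample code and is not shown here; conceptually it repeats the FGSM step with a small step size `alpha` and projects the result back into the epsilon-ball. A minimal sketch under that assumption:

```python
import torch

def ifgsm(model, x, y, loss_fn, epsilon, alpha, num_iter=20):
    """Iterative FGSM: small FGSM steps, each followed by projection onto the epsilon-ball."""
    x_adv = x
    for _ in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        x_adv = x_adv + alpha * x_adv.grad.detach().sign()
        # Keep ||x_adv - x||_inf <= epsilon
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)
    return x_adv
```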
Strong baseline (acc <= 0.30)
The TA suggested two approaches:
- Ensemble Attack + paper B (pick right models) + IFGSM
- Ensemble Attack + many models + MIFGSM
The ensemble attack part is the same as in the Medium baseline.
Approach 1, 'paper B (pick right models)', follows the idea of the paper Query-Free Adversarial Transfer via Undertrained Surrogates: instead of a random set of proxy models, pick individual proxy models that are more effective. The paper defines undertrained models by two conditions:
- higher validation loss;
- fewer training steps or epochs.
Such models produce stronger, more transferable adversarial attacks.
This write-up adopts approach 2.
mifgsm_acc = 0.00000, mifgsm_loss = 2.36389
- Add more proxy models:
```python
model_names = [
    'nin_cifar10',
    'resnet20_cifar10',
    'resnet56_cifar10',
    'preresnet20_cifar10',
    'preresnet56_cifar10',
    'seresnet20_cifar10',
    'seresnet56_cifar10',
    'sepreresnet20_cifar10',
    'sepreresnet56_cifar10',
    'wrn16_10_cifar10',
    'wrn20_10_1bit_cifar10',
    'rir_cifar10',
    'diaresnet20_cifar10',
    'diapreresnet20_cifar10',
    'densenet40_k12_cifar10',
]
```
- Implement the MI-FGSM algorithm.
MI-FGSM (Momentum Iterative Fast Gradient Sign Method) is I-FGSM with a momentum term added to the gradient, which makes the attack more stable and more transferable.
```python
def mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, mu=1.0):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
        loss = loss_fn(model(x_adv), y)  # calculate loss
        loss.backward()  # calculate gradient
        # Momentum calculation
        grad = x_adv.grad.detach()
        # Normalize the gradient by its L1 norm
        grad_norm = torch.mean(torch.abs(grad), dim=(1, 2, 3), keepdim=True)
        normalized_grad = grad / (grad_norm + 1e-8)  # avoid division by zero
        grad = mu * momentum + normalized_grad
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)  # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
```
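The attack is then presumably run the same way as the I-FGSM version; a short usage sketch (the output folder name is an assumption):

```python
adv_examples, mifgsm_acc, mifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, mifgsm, loss_fn)
print(f'mifgsm_acc = {mifgsm_acc:.5f}, mifgsm_loss = {mifgsm_loss:.5f}')
create_dir(root, 'mifgsm_ensemble', adv_examples, adv_names)
```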
Boss baseline (acc <= 0.15)
Following the TA's hint, implement the DIM-MIFGSM algorithm.
dim_mifgsm_acc = 0.00000, dim_mifgsm_loss = 2.34545
DIM (Diverse Input Method) applies a random resize + padding transformation to the input image at each attack iteration, which improves the transferability of the attack.
We implement the DIM step and modify the MI-FGSM algorithm from the Strong baseline on top of it.
```python
# DIM + MI-FGSM
import random
import torchvision.transforms.functional as TF

def input_diversity(x, resize_rate=0.9, diversity_prob=0.5):
    """Randomly resize + pad the input image (the core DIM step)."""
    if random.random() < diversity_prob:
        img_size = x.shape[-1]
        new_size = int(img_size * resize_rate)
        rescaled = TF.resize(x, [new_size, new_size])
        pad_top = random.randint(0, img_size - new_size)
        pad_bottom = img_size - new_size - pad_top
        pad_left = random.randint(0, img_size - new_size)
        pad_right = img_size - new_size - pad_left
        padded = TF.pad(rescaled,
                        [pad_left, pad_top, pad_right, pad_bottom],
                        fill=0)
        return padded
    else:
        return x

def dim_mifgsm(model, x, y, loss_fn, epsilon=epsilon, alpha=alpha, num_iter=50, mu=1.0, diversity_prob=0.7):
    x_adv = x
    # initialize momentum tensor
    momentum = torch.zeros_like(x).detach().to(device)
    # write a loop of num_iter to represent the iterative times
    for i in range(num_iter):
        x_adv = x_adv.detach().clone()
        x_adv.requires_grad = True  # need to obtain gradient of x_adv, thus set required grad
        diversified_x = input_diversity(x_adv, diversity_prob=diversity_prob)
        loss = loss_fn(model(diversified_x), y)  # calculate loss
        loss.backward()  # calculate gradient
        # Momentum calculation
        grad = x_adv.grad.detach()
        # Normalize the gradient by its L1 norm
        grad_norm = torch.mean(torch.abs(grad), dim=(1, 2, 3), keepdim=True)
        normalized_grad = grad / (grad_norm + 1e-8)  # avoid division by zero
        grad = mu * momentum + normalized_grad
        momentum = grad
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + epsilon), x - epsilon)  # clip new x_adv back to [x-epsilon, x+epsilon]
    return x_adv
```
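A short usage sketch mirroring the earlier calls (again, the output folder name is an assumption):

```python
adv_examples, dim_mifgsm_acc, dim_mifgsm_loss = gen_adv_examples(ensemble_model, adv_loader, dim_mifgsm, loss_fn)
print(f'dim_mifgsm_acc = {dim_mifgsm_acc:.5f}, dim_mifgsm_loss = {dim_mifgsm_loss:.5f}')
create_dir(root, 'dim_mifgsm_ensemble', adv_examples, adv_names)
```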
Code
Report
Part 1: Attack
Based on your best experimental result, briefly describe how you generated transferable noises. How low did the accuracy on JudgeBoi drop?
The accuracy is only visible after submitting to JudgeBoi, and students who are not from NTU cannot submit.
In theory, the best result should come from the DIM-MIFGSM algorithm. The choice of proxy models follows the experiments in the paper Query-Free Adversarial Transfer via Undertrained Surrogates: mainly ResNet-, SENet-, and DenseNet-style pretrained models, plus a few models of other types, listed below:
```python
model_names = [
    'nin_cifar10',
    'resnet20_cifar10',
    'resnet56_cifar10',
    'preresnet20_cifar10',
    'preresnet56_cifar10',
    'seresnet20_cifar10',
    'seresnet56_cifar10',
    'sepreresnet20_cifar10',
    'sepreresnet56_cifar10',
    'wrn16_10_cifar10',
    'wrn20_10_1bit_cifar10',
    'rir_cifar10',
    'diaresnet20_cifar10',
    'diapreresnet20_cifar10',
    'densenet40_k12_cifar10',
]
```
Part 2: Defense
With resnet110_cifar10 (from Pytorchcv) as the source model, attack the image dog2.png with the original FGSM.
Is the predicted class wrong after the attack? (1pt) If yes: which class does it become? If no: no need to answer.
Yes, the prediction is wrong: it becomes cat.
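A minimal sketch of this experiment, reusing the `fgsm`, `transform`, and `classes` sketches above (the data path and loss choice are assumptions):

```python
import torch
from PIL import Image
from pytorchcv.model_provider import get_model as ptcv_get_model

source_model = ptcv_get_model('resnet110_cifar10', pretrained=True).to(device).eval()

x = transform(Image.open('./data/dog/dog2.png')).unsqueeze(0).to(device)  # hypothetical path
y = torch.tensor([classes.index('dog')]).to(device)

x_adv = fgsm(source_model, x, y, torch.nn.CrossEntropyLoss(), epsilon)
print('prediction after attack:', classes[source_model(x_adv).argmax(dim=1).item()])
```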
After preprocessing the image with JPEG compression (compression rate = 70%), is the prediction class wrong? Answer as in the first question. (1pt)
The prediction class is correct; it is still dog.
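A minimal sketch of the JPEG preprocessing, assuming the defense simply re-encodes the attacked image as JPEG at quality 70 before it is fed to the model (paths and helper names are hypothetical):

```python
import io
from PIL import Image

def jpeg_compress(pil_img, quality=70):
    """Re-encode a PIL image as JPEG in memory and decode it again."""
    buffer = io.BytesIO()
    pil_img.save(buffer, format='JPEG', quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert('RGB')

# Compress the saved adversarial image before prediction
# ('./fgsm/dog/dog2.png' is a hypothetical output path of the FGSM attack).
adv_img = jpeg_compress(Image.open('./fgsm/dog/dog2.png'), quality=70)
x_def = transform(adv_img).unsqueeze(0).to(device)
print('prediction after JPEG defense:', classes[source_model(x_def).argmax(dim=1).item()])
```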
Why can JPEG compression defend against adversarial attacks and keep the model's accuracy high? (1pt)
- Compression makes the colors more vivid
- Compression reduces the noise
- Compression degrades the image quality
- Compression increases the noise