李宏毅-ML2022-HW2-Phoneme Classification

Posted Feb 16, 2025 Updated Apr 4, 2025

By Kaige Zhang


Objectives

Data Preprocessing: extract MFCC features from the waveform;
Classification: classify phonemes using the pre-extracted MFCC features.

Task Introduction

Multiclass Classification: predict which class each phoneme in the speech belongs to.

Approach & Code

Passed the Boss Baseline on both leaderboards

Report Questions

1. (2%) Implement 2 models with approximately the same number of parameters, (A) one narrower and deeper (e.g. hidden_layers=6, hidden_dim=1024) and (B) the other wider and shallower (e.g. hidden_layers=2, hidden_dim=1700). Report training/validation accuracies for both models.

In other words: build two models with roughly the same number of parameters, one deep and narrow, the other shallow and wide, and report the training/validation accuracy of each.

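Counting the parameters of a neural network:

Take a fully connected layer as an example.

Suppose the layer has M input neurons and N output neurons. Then:

(1) With bias=True:

the parameter count is M×N + N (there is one bias term per output neuron).

(2) With bias=False:

the parameter count is M×N.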
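The formula above can be applied layer by layer to estimate the size of the two models in the report question. The sketch below is only illustrative: the helper names `linear_params` and `mlp_params` are hypothetical, the layer layout (one input layer, then `hidden_layers` hidden-to-hidden layers, then one output layer) is an assumption about how `Classifier` is built, and the sizes of 39 input features and 41 output classes are assumed values. Only fully connected layers are counted.

```python
def linear_params(m: int, n: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer with m inputs and n outputs."""
    return m * n + (n if bias else 0)


def mlp_params(input_dim: int, hidden_dim: int, hidden_layers: int, output_dim: int) -> int:
    """Parameter count of an MLP under the assumed layout:
    input -> hidden, then `hidden_layers` hidden -> hidden layers, then hidden -> output.
    Only the fully connected layers are counted."""
    total = linear_params(input_dim, hidden_dim)
    total += hidden_layers * linear_params(hidden_dim, hidden_dim)
    total += linear_params(hidden_dim, output_dim)
    return total


# Assumed sizes for illustration: 39 input features, 41 output classes.
INPUT_DIM, OUTPUT_DIM = 39, 41

params_a = mlp_params(INPUT_DIM, hidden_dim=1024, hidden_layers=6, output_dim=OUTPUT_DIM)
params_b = mlp_params(INPUT_DIM, hidden_dim=1700, hidden_layers=2, output_dim=OUTPUT_DIM)
print(f"(A) deep and narrow:  {params_a:,}")
print(f"(B) wide and shallow: {params_b:,}")
```

With these assumed sizes the two counts land within roughly 7% of each other. A different input dimension (for example, when several frames are concatenated) shifts both totals, so the width of model (B) may need adjusting to keep the models comparable.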

Computing the model's parameter count directly with PyTorch:

  
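# Classifier, input_dim, hidden_layers and hidden_dim are assumed to be defined earlier in the notebook
model = Classifier(input_dim=input_dim, hidden_layers=hidden_layers, hidden_dim=hidden_dim)
# numel() returns the number of elements in each parameter tensor
total_params = sum(param.numel() for param in model.parameters())
print(f'Total params: {total_params}')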
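As a quick sanity check, the same `numel()` counting can be run on a single `nn.Linear` layer and compared with the M×N + N formula. This is a minimal sketch; the sizes `M = 39` and `N = 1024` are illustrative values chosen here, not taken from the post.

```python
import torch.nn as nn

M, N = 39, 1024  # illustrative layer sizes (assumption)

# Count every element in the layer's weight (N x M) and, if present, bias (N)
with_bias = sum(p.numel() for p in nn.Linear(M, N, bias=True).parameters())
without_bias = sum(p.numel() for p in nn.Linear(M, N, bias=False).parameters())

assert with_bias == M * N + N   # weights plus one bias per output neuron
assert without_bias == M * N    # weights only
print(with_bias, without_bias)
```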