1. torch.optim:
- torch.optim is a package that implements various optimization algorithms.
- To use torch.optim, you must construct an optimizer object. This object holds the current parameter state and updates the parameters based on the computed gradients.
- Import the optimizer and construct an optimizer object:
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=1e-3)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)
- To construct an Optimizer, you must give it an iterable containing the parameters to optimize (all of them must be Variable objects; in current PyTorch these are simply Tensors/Parameters). You can then specify optimizer options such as the learning rate, weight decay, etc.
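- For example, options such as weight decay can be passed directly to the constructor (a minimal sketch; the concrete values are illustrative):
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)  # L2 penalty applied to all parameters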
- In typical training you can also split the parameters into groups and give each group its own options, for example to exclude certain parameters from weight decay, as in the two snippets below:
from torch.optim import AdamW  # or, in older scripts: from transformers import AdamW

param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'gamma', 'beta']  # apply weight decay to all parameters except biases and the 'gamma'/'beta' norm weights
optimizer_grouped_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay': 0.01},  # AdamW reads the 'weight_decay' key ('weight_decay_rate' was used by the older BertAdam)
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0},
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
# Variant: exclude LayerNorm weights and biases from weight decay
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
    {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
]
optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
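- A quick way to sanity-check the grouping (a minimal sketch, assuming one of the optimizers above has been built): each entry in optimizer.param_groups carries its own options, so the two groups should report different weight_decay values.
for group in optimizer.param_groups:
    print(len(group['params']), group['weight_decay'])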
- Once the optimizer is defined, during training you compute the gradients and then call the optimizer's step function to update the network's parameters:
for input, target in dataset:
    optimizer.zero_grad()           # zero the gradients
    output = model(input)
    loss = loss_fn(output, target)  # compute the loss
    loss.backward()                 # back-propagate to compute the gradients
    optimizer.step()                # update the parameters
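- Put together, a minimal self-contained version of this loop looks like the sketch below (the linear model, the loss, and the random data are illustrative placeholders, not from the original example):
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
dataset = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(5)]  # dummy data

for input, target in dataset:
    optimizer.zero_grad()           # clear accumulated gradients
    output = model(input)           # forward pass
    loss = loss_fn(output, target)  # compute the loss
    loss.backward()                 # back-propagate to compute the gradients
    optimizer.step()                # update the parameters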
2. Saving the model:
torch.save(the_model.state_dict(), PATH)  # recommended: save only the parameters (the state_dict)
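- The counterpart for loading: instantiate the model first and then load the saved state_dict into it (standard PyTorch usage; TheModelClass stands for whatever model class was saved):
the_model = TheModelClass(*args, **kwargs)   # placeholder for the actual model class and its arguments
the_model.load_state_dict(torch.load(PATH))
the_model.eval()                             # switch to evaluation mode before inference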
# Example: saving a Hugging Face transformers model together with its config
import os
from transformers import WEIGHTS_NAME, CONFIG_NAME  # the standard file names ('pytorch_model.bin', 'config.json')

output_dir = './models/'
output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
output_config_file = os.path.join(output_dir, CONFIG_NAME)
torch.save(model.state_dict(), output_model_file)
model.config.to_json_file(output_config_file)
model = model_name.from_pretrained(output_dir).to(device)  # model_name stands for the model class being used
torch.save(the_model, PATH)   # alternative: save the whole model object (pickles the class as well)
the_model = torch.load(PATH)  # ...and load it back in one call
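- For resumable training it is also common to checkpoint the optimizer state alongside the model weights (a sketch of one common pattern; the file name is arbitrary):
checkpoint = {
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pt')

# ...later, to resume:
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])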
Reference:
https://ptorch.com/docs/1/optim#how-to-use-an-optimizer