https://blog.csdn.net/kejizuiqianfang/article/details/102454278
https://blog.csdn.net/hu378910532/article/details/102860618
Following the first link above, this warning appears:
Single-Process Multi-GPU is not the recommended mode for DDP.
In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process.
The overhead of scatter/gather and GIL contention in every forward pass can slow down training.
Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES.
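The warning asks for one DDP instance per device. A minimal sketch of that setup follows, assuming the script is started with one process per GPU by a launcher such as torchrun that sets the LOCAL_RANK environment variable. The helper names ddp_kwargs, wrap_model and main, and the toy Linear model, are hypothetical and not taken from the linked posts.

```python
import os


def ddp_kwargs(local_rank: int, use_cuda: bool = True) -> dict:
    """Keyword arguments that bind one DDP instance to exactly one device."""
    if use_cuda:
        # A single entry in device_ids means one module replica per process,
        # which is the mode the warning recommends.
        return {"device_ids": [local_rank], "output_device": local_rank}
    # CPU modules are wrapped without device_ids.
    return {}


def wrap_model(model, local_rank: int):
    """Move `model` to this process's device and wrap it in DDP."""
    # torch is imported lazily so the helper above stays importable without it.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
    return DDP(model, **ddp_kwargs(local_rank, use_cuda))


def main():
    """Entry point for each process started by the launcher (assumed setup)."""
    import torch
    import torch.distributed as dist

    # Assumption: the launcher exports LOCAL_RANK, RANK, WORLD_SIZE,
    # MASTER_ADDR and MASTER_PORT for every process.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)

    model = wrap_model(torch.nn.Linear(8, 2), local_rank)  # toy model
    # ... training loop using `model` goes here ...

    dist.destroy_process_group()
```

Each process calls main() once, so every GPU gets its own DDP instance and the single-process multi-GPU warning should no longer be triggered. Restricting each process with CUDA_VISIBLE_DEVICES, as the warning also suggests, is an alternative to passing device_ids.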
In addition, how batch_size, lr, and the gradient-accumulation interval should be set still needs to be confirmed.
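As a starting point for that check, here is a small arithmetic sketch. It assumes each process loads its own batch of per_gpu_batch samples, that DDP averages gradients across world_size processes, and that an optimizer step happens every accum_steps iterations. The linear lr scaling is only a commonly used heuristic, not something these notes have confirmed, and the function names are hypothetical.

```python
def effective_batch_size(per_gpu_batch: int, world_size: int, accum_steps: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_gpu_batch * world_size * accum_steps


def linearly_scaled_lr(base_lr: float, base_batch: int, effective_batch: int) -> float:
    """Heuristic (assumption): scale lr in proportion to the effective batch size."""
    return base_lr * effective_batch / base_batch


# Example: 4 processes, 32 samples each, optimizer step every 2 iterations.
batch = effective_batch_size(per_gpu_batch=32, world_size=4, accum_steps=2)
lr = linearly_scaled_lr(base_lr=0.1, base_batch=256, effective_batch=batch)
print(batch, lr)
```

Whether the lr should actually be rescaled this way depends on the model and optimizer, so the result is a value to test rather than a rule.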