Using GPUs

1. Moving Data to the GPU

The to() function: converts data type / device

  1. tensor.to(*args, **kwargs)

  2. module.to(*args, **kwargs)

Difference: for tensors, to() is not in-place; for modules, it is in-place.

x = torch.ones((3, 3))
x = x.to(torch.float64)  # convert dtype; not in-place for tensors, so reassign with =

x = torch.ones((3, 3))
x = x.to("cuda")  # move to another device; not in-place for tensors, so reassign with =

linear = nn.Linear(2, 2)
linear.to(torch.double)  # convert parameter dtype; in-place for modules

gpu1 = torch.device("cuda")
linear.to(gpu1)  # move the module to the GPU; in-place for modules

Experiments:

  1. tensor to cuda

    import torch
    import torch.nn as nn

    x_cpu = torch.ones((3, 3))
    print("x_cpu:\ndevice: {} is_cuda: {} id: {}".format(x_cpu.device, x_cpu.is_cuda, id(x_cpu)))

    x_gpu = x_cpu.to('cuda')
    print("x_gpu:\ndevice: {} is_cuda: {} id: {}".format(x_gpu.device, x_gpu.is_cuda, id(x_gpu)))

    Output:

    x_cpu:
    device: cpu is_cuda: False id: 2141032222976
    x_gpu:
    device: cuda:0 is_cuda: True id: 2138929688384

    The id differs before and after the conversion, which shows the operation is not in-place: new memory was allocated for the result.

  2. module to cuda

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(3, 3))

    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

    net.to('cuda')
    print("\nid:{} is_cuda: {}".format(id(net), next(net.parameters()).is_cuda))

    Output:

    id:2455688847568 is_cuda: False

    id:2455688847568 is_cuda: True

    The id is unchanged: model conversion is an in-place operation.

  3. forward in cuda

    import torch
    import torch.nn as nn

    x_cpu = torch.ones((3, 3))
    x_gpu = x_cpu.to('cuda')
    net = nn.Sequential(nn.Linear(3, 3))
    net.to('cuda')  # the module must be on the same device as its input

    output = net(x_gpu)
    print("output is_cuda: {}".format(output.is_cuda))

    Output:

    output is_cuda: True
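
    Since the module ran on the GPU, its output also lives on the GPU. As a minimal follow-up sketch, the result can be moved back to the CPU (e.g. before converting to numpy) with .cpu(), which is equivalent to .to("cpu"):

    output_cpu = output.cpu()  # equivalent to output.to("cpu"); not in-place
    print("output_cpu is_cuda: {}".format(output_cpu.is_cuda))  # False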

2. Common torch.cuda Methods

  1. torch.cuda.device_count(): returns the number of visible, usable GPUs

  2. torch.cuda.get_device_name(): returns the name of a GPU

  3. torch.cuda.is_available(): returns True if CUDA is usable (i.e., the CUDA build of PyTorch is installed)

  4. torch.cuda.manual_seed(): sets the random seed for the current GPU

  5. torch.cuda.manual_seed_all(): sets the random seed for all visible, usable GPUs

  6. torch.cuda.set_device(): sets which physical GPU is the primary GPU (not recommended; a short sketch of these calls follows this list)
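
A minimal sketch exercising these calls (the printed values depend on your hardware):

import torch

if torch.cuda.is_available():
    print("GPU count: {}".format(torch.cuda.device_count()))
    print("GPU 0 name: {}".format(torch.cuda.get_device_name(0)))
    torch.cuda.manual_seed(1)       # seed the current GPU
    torch.cuda.manual_seed_all(1)   # seed all visible GPUs
else:
    print("CUDA is not available")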

Recommended instead: os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")

Setting this environment variable maps logical GPUs onto physical GPUs: os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3") means that logical gpu0 and logical gpu1 correspond to physical gpu2 and physical gpu3.
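
A minimal sketch of this mapping (assuming a machine with at least four physical GPUs):

import os
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "2,3")  # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())      # 2: only physical gpu2 and gpu3 are visible
x = torch.ones((3, 3)).to("cuda:0")   # logical gpu0, i.e. physical gpu2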

3. Parallel Computation on Multiple GPUs

torch.nn.DataParallel(module,
                      device_ids=None,
                      output_device=None,
                      dim=0)

Purpose: wraps a model to scatter work across GPUs and run it in parallel.

Main parameters:

  • module: the model to wrap and distribute
  • device_ids: the GPUs to distribute across; defaults to all visible, usable GPUs
  • output_device: the device on which results are gathered

Experiment:

import os
import numpy as np
import torch
import torch.nn as nn

# ============================ select GPUs manually
gpu_list = [0]
gpu_list_str = ','.join(map(str, gpu_list))
os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_list_str)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ============================ model
class FooNet(nn.Module):
    def __init__(self, neural_num, layers=3):
        super(FooNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(neural_num, neural_num, bias=False) for i in range(layers)])

    def forward(self, x):
        print("\nbatch size in forward: {}".format(x.size()[0]))

        for (i, linear) in enumerate(self.linears):
            x = linear(x)
            x = torch.relu(x)
        return x

# ============================ training
batch_size = 16

# data
inputs = torch.randn(batch_size, 3)
labels = torch.randn(batch_size, 3)

inputs, labels = inputs.to(device), labels.to(device)

# model
net = FooNet(neural_num=3, layers=3)
net = nn.DataParallel(net)
net.to(device)

# training
for epoch in range(1):
    outputs = net(inputs)
    print("model outputs.size: {}".format(outputs.size()))
    print("CUDA_VISIBLE_DEVICES :{}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
    print("device_count :{}".format(torch.cuda.device_count()))

Output:

batch size in forward: 16
model outputs.size: torch.Size([16, 3])
CUDA_VISIBLE_DEVICES :0
device_count :1

My machine has only one GPU, so the batch size seen in forward is 16 (the full batch).

If we instead set gpu_list = [1], mapping logical GPU 0 to physical GPU 1, running the code gives:

batch size in forward: 16
model outputs.size: torch.Size([16, 3])
CUDA_VISIBLE_DEVICES :1
device_count :0

Clearly there is no physical GPU 1 on this machine, so device_count = 0 and the model falls back to the CPU.

Below is the result of running the same program on a machine with four GPUs, selecting GPUs 2 and 3. DataParallel splits the batch of 16 evenly across the two visible GPUs, so each replica sees a batch of 8:

batch size in forward: 8
batch size in forward: 8
model outputs.size: torch.Size([16, 3])
CUDA_VISIBLE_DEVICES :2,3
device_count :2

4. Common GPU Loading Errors

Error 1:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

This error occurs when the model was saved on a CUDA device but the machine loading it cannot use CUDA.

Fix:

torch.load(path_state_dict, map_location="cpu")
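
A slightly fuller sketch of the fix (path_state_dict and net stand in for your checkpoint path and model instance):

import torch

# remap storages saved on the GPU onto the CPU at load time
state_dict = torch.load(path_state_dict, map_location="cpu")
net.load_state_dict(state_dict)  # restore the parameters on the CPU model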

Error 2:

RuntimeError: Error(s) in loading state_dict for FooNet: Missing key(s) in state_dict: "linears.0.weight", "linears.1.weight", "linears.2.weight". Unexpected key(s) in state_dict: "module.linears.0.weight", "module.linears.1.weight", "module.linears.2.weight".

This error occurs when torch.nn.DataParallel was used for multi-GPU computation: the wrapper prefixes every key in the model's state_dict with module. (7 characters), so the keys no longer match when loading into an unwrapped model.

Fix:

from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict_load.items():
    namekey = k[7:] if k.startswith('module.') else k  # strip the "module." prefix
    new_state_dict[namekey] = v

net.load_state_dict(new_state_dict)
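
Alternatively, the prefix can be avoided at save time: nn.DataParallel stores the wrapped model in its .module attribute, so saving that inner model's state_dict produces keys without the module. prefix (the path below is a hypothetical example):

# net is the DataParallel-wrapped model; net.module is the original model
torch.save(net.module.state_dict(), "model_state_dict.pkl")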