Original: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
Authors: Sung Kim, Jenny Kang
Translator: bat67
Proofreaders: FontTian, 片刻, yearing1017
In this tutorial, we will learn how to use multiple GPUs with DataParallel.
It is very easy to use GPUs in PyTorch. You can put a model on a GPU like this:
device = torch.device("cuda: 0")
model.to(device)
然后可以復(fù)制所有的張量到GPU上:
mytensor = my_tensor.to(device)
Please note that calling my_tensor.to(device) returns a new copy of my_tensor on the GPU rather than rewriting my_tensor in place. You need to assign it to a new tensor and use that tensor on the GPU.
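A minimal sketch of this copy semantics (the names cpu_tensor and gpu_tensor are illustrative, not part of the tutorial):

import torch

device = torch.device("cuda:0")

cpu_tensor = torch.randn(3, 4)      # lives on the CPU
gpu_tensor = cpu_tensor.to(device)  # a new copy on the GPU

print(cpu_tensor.device)  # cpu  -- the original tensor is unchanged
print(gpu_tensor.device)  # cuda:0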
It is natural to run forward and backward propagation on multiple GPUs. However, PyTorch will only use one GPU by default. You can easily run your operations on multiple GPUs by making your model run in parallel with DataParallel:
model = nn.DataParallel(model)
This is the core idea behind this tutorial; we will describe it in more detail below.
Import PyTorch modules and define the parameters.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# Parameters and DataLoaders
input_size = 5
output_size = 2
batch_size = 30
data_size = 100
Device:
device = torch.device("cuda: 0" if torch.cuda.is_available() else "cpu")
To make a dummy (random) dataset, you only need to implement __getitem__:
class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
For the demo, our model just takes an input, performs a linear operation, and gives an output. However, you can use DataParallel on any model (CNN, RNN, Capsule Net, etc.).
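For example, a small CNN (a hypothetical sketch, not part of this tutorial's code) is wrapped exactly the same way:

import torch.nn as nn

# A hypothetical tiny CNN; any nn.Module can be wrapped the same way.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
parallel_cnn = nn.DataParallel(cnn)  # same call as for the linear model below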
我們?cè)谀P蛢?nèi)部放置了一條打印語(yǔ)句來(lái)檢測(cè)輸入和輸出向量的大小。請(qǐng)注意批等級(jí)為0時(shí)打印的內(nèi)容。
class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output
This is the core part of this tutorial. First, we need to create a model instance and check whether we have multiple GPUs. If we do, we wrap our model with nn.DataParallel. Then we put the model on the GPU with model.to(device).
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)
Output:
Let's use 2 GPUs!
Now we can see the sizes of the input and output tensors.
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())
Output:
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
If you have no GPU or one GPU, then when we batch 30 inputs and 30 outputs, the model gets 30 inputs and produces 30 outputs, as expected. But if you have multiple GPUs, you will see results like the following.
With 2 GPUs, you will see:
Let's use 2 GPUs!
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
With 3 GPUs, you will see:
Let's use 3 GPUs!
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
With 8 GPUs, you will see:
Let's use 8 GPUs!
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])
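The backward pass runs through the DataParallel wrapper just as transparently as the forward passes shown above. A minimal training-step sketch (the random target and the MSE loss are assumptions for illustration, not part of the original tutorial):

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for data in rand_loader:
    input = data.to(device)
    # Dummy target, just to make the loss computable.
    target = torch.randn(input.size(0), output_size).to(device)

    output = model(input)             # forward pass is scattered across the GPUs
    loss = criterion(output, target)  # loss is computed on the gathered output

    optimizer.zero_grad()
    loss.backward()                   # gradients are accumulated on the source device
    optimizer.step()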
DataParallel automatically splits your data and sends jobs to multiple models on multiple GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.
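One practical note (a general fact about nn.DataParallel, not from the original tutorial): the wrapper stores your original model as its .module attribute, and its state_dict keys gain a "module." prefix. Saving the unwrapped weights keeps checkpoints loadable without DataParallel; a minimal sketch (the filename "model.pt" is illustrative):

# Save the underlying model's weights so the checkpoint can also be
# loaded by a model that is not wrapped in nn.DataParallel.
if isinstance(model, nn.DataParallel):
    torch.save(model.module.state_dict(), "model.pt")
else:
    torch.save(model.state_dict(), "model.pt")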
For more information, please check out: https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html