
Original: PyTorch torch.nn

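Parameters

class torch.nn.Parameter

A kind of Tensor that is to be considered a module parameter.

Parameters are Tensor subclasses that have a very special property when used with Modules: when they are assigned as Module attributes, they are automatically added to the module's list of parameters and will appear, for example, in the parameters() iterator. Assigning a plain Tensor does not have this effect. This is because one might want to cache some temporary state in the model, such as the last hidden state of an RNN. If there were no such class as Parameter, these temporaries would be registered too.

Parameters

data (Tensor) – parameter tensor.
requires_grad (bool, optional) – if the parameter requires gradient. See "Excluding subgraphs from backward" for more details. Default: True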
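
As a minimal sketch of the registration behavior described above (the class name Cache and its attribute names are illustrative and not part of torch.nn), only the attribute wrapped in nn.Parameter should show up in named_parameters():

```python
import torch
import torch.nn as nn


class Cache(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered as a module parameter.
        self.weight = nn.Parameter(torch.zeros(3))
        # Plain tensor: stored as an ordinary attribute, not registered.
        self.last_hidden = torch.zeros(3)


m = Cache()
names = [name for name, _ in m.named_parameters()]
print(names)
```

The plain tensor stays reachable as m.last_hidden, but an optimizer built from m.parameters() would not see it.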

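Containers

Module

class torch.nn.Module

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes: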

import torch.nn as nn
import torch.nn.functional as F
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

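Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

add_module(name, module)

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

Parameters

name (string) – name of the child module. The child module can be accessed from this module using the given name
module (Module) – child module to be added to the module.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch.nn.init).

Parameters

fn (Module -> None) – function to be applied to each submodule

Returns

self

Return type

Module

Example: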

>>> def init_weights(m):
>>>     print(m)
>>>     if type(m) == nn.Linear:
>>>         m.weight.data.fill_(1.0)
>>>         print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)

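buffers(recurse=True)

Returns an iterator over module buffers.

Parameters

recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

torch.Tensor – module buffer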

Example:

>>> for buf in model.buffers():
>>>     print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)

children()

Returns an iterator over immediate children modules.

Yields

Module – a child module

cpu()

Moves all model parameters and buffers to the CPU.

Returns

self

Return type

Module

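cuda(device=None)

Moves all model parameters and buffers to the GPU.

This also makes the associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on the GPU while being optimized.

Parameters

device (int, optional) – if specified, all parameters will be copied to that device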

Returns

self

Return type

Module

double()

Casts all floating point parameters and buffers to the double datatype.

Returns

self

Return type

Module

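dump_patches = False

This allows better BC (backward compatibility) support for load_state_dict(). In state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of the state dict. See _load_from_state_dict on how to use this information in loading.

If new parameters/buffers are added to or removed from a module, this number shall be bumped, and the module's _load_from_state_dict method can compare the version number and make appropriate changes if the state dict is from before the change.

eval()

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).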

Returns

self

Return type

Module
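
As an illustration of the mode switch performed by eval(), the sketch below uses nn.Dropout, one of the modules whose behavior depends on the mode. The tensor size and the probability p=0.5 are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.eval()            # same as drop.train(False)
eval_out = drop(x)     # dropout is a no-op in evaluation mode

drop.train()           # back to training mode
train_out = drop(x)    # elements are zeroed or scaled by 1 / (1 - p) = 2
```

In evaluation mode the input passes through unchanged, while in training mode each element is either dropped or rescaled.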

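extra_repr()

Sets the extra representation of the module.

To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable.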
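
A minimal sketch of overriding extra_repr() follows. The module Scale and its factor attribute are hypothetical and only serve to show where the extra text ends up in the printed representation.

```python
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical module that multiplies its input by a constant."""

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self):
        # The returned string is placed inside the parentheses of repr(module).
        return 'factor={}'.format(self.factor)


text = repr(Scale(2.0))
print(text)
```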

float()

Casts all floating point parameters and buffers to the float datatype.

Returns

self

Return type

Module

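forward(*input)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.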
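
The difference between calling the instance and calling forward() directly can be observed with a forward hook. The sketch below is only an illustration; the list named calls and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

calls = []
layer = nn.Linear(2, 2)
# The hook records the output shape; it returns None, so the output is unchanged.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

x = torch.randn(4, 2)
layer(x)           # goes through __call__, so the hook runs
layer.forward(x)   # bypasses __call__, so the hook is skipped
handle.remove()

print(calls)
```

Only the first call is recorded, which is why the instance should be called rather than forward().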

half()

Casts all floating point parameters and buffers to the half datatype.

Returns

self

Return type

Module

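load_state_dict(state_dict, strict=True)

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module's state_dict() function.

Parameters

state_dict (dict) – a dict containing parameters and persistent buffers.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module's state_dict() function. Default: True

Returns

missing_keys is a list of str containing the missing keys
unexpected_keys is a list of str containing the unexpected keys

Return type

NamedTuple with missing_keys and unexpected_keys fields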
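
The effect of strict can be sketched with two small models whose keys only partly overlap. The models src and dst below are made up for the illustration.

```python
import torch.nn as nn

src = nn.Sequential(nn.Linear(2, 2))                    # keys: 0.weight, 0.bias
dst = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))   # also has 1.weight, 1.bias

# strict=False loads the matching keys and reports the rest.
result = dst.load_state_dict(src.state_dict(), strict=False)
print(result.missing_keys, result.unexpected_keys)

# With the default strict=True the same mismatch raises an error.
try:
    dst.load_state_dict(src.state_dict())
    strict_failed = False
except RuntimeError:
    strict_failed = True
```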

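modules()

Returns an iterator over all modules in the network.

Yields

Module – a module in the network

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.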

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
        print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)

named_buffers(prefix='', recurse=True)

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

Parameters

prefix (str) – prefix to prepend to all buffer names.
recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module.

Yields

(string, torch.Tensor) – Tuple containing the name and buffer

Example:

>>> for name, buf in self.named_buffers():
>>>    if name in ['running_var']:
>>>        print(buf.size())

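named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple containing a name and child module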

Example:

>>> for name, module in model.named_children():
>>>     if name in ['conv4', 'conv5']:
>>>         print(module)

named_modules(memo=None, prefix='')

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

Yields

(string, Module) – Tuple of name and module

Note

Duplicate modules are returned only once. In the following example, l will be returned only once.

Example:

>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
        print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))

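named_parameters(prefix='', recurse=True)

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

Parameters

prefix (str) – prefix to prepend to all parameter names.
recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

(string, Parameter) – Tuple containing the name and parameter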

Example:

>>> for name, param in self.named_parameters():
>>>    if name in ['bias']:
>>>        print(param.size())

parameters(recurse=True)

Returns an iterator over module parameters.

This is typically passed to an optimizer.

Parameters

recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module.

Yields

Parameter – module parameter

Example:

>>> for param in model.parameters():
>>>     print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)

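register_backward_hook(hook)

Registers a backward hook on the module.

The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature:

hook(module, grad_input, grad_output) -> Tensor or None

The grad_input and grad_output may be tuples if the module has multiple inputs or outputs. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of grad_input in subsequent computations.

Returns

a handle that can be used to remove the added hook by calling handle.remove()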

Return type

torch.utils.hooks.RemovableHandle

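Warning

The current implementation will not have the presented behavior for a complex Module that performs many operations. In some failure cases, grad_input and grad_output will contain the gradients for only a subset of the inputs and outputs. For such a Module, you should use torch.Tensor.register_hook() directly on a specific input or output to get the required gradients.

register_buffer(name, tensor)

Adds a persistent buffer to the module.

This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm's running_mean is not a parameter, but is part of the persistent state.

Buffers can be accessed as attributes using given names.

Parameters

name (string) – name of the buffer. The buffer can be accessed from this module using the given name
tensor (Tensor) – buffer to be registered.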

Example:

>>> self.register_buffer('running_mean', torch.zeros(num_features))

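register_forward_hook(hook)

Registers a forward hook on the module.

The hook will be called every time after forward() has computed an output. It should have the following signature:

hook(module, input, output) -> None or modified output

The hook can modify the output. It can modify the input in-place, but that will have no effect on the forward pass, since the hook is called after forward() has been called.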

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle

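register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

The hook will be called every time before forward() is invoked. It should have the following signature:

hook(module, input) -> None or modified input

The hook can modify the input. The user can return either a tuple or a single modified value from the hook. A single returned value is wrapped into a tuple (unless that value is already a tuple).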

Returns

a handle that can be used to remove the added hook by calling handle.remove()

Return type

torch.utils.hooks.RemovableHandle
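
A short sketch of a forward pre-hook that modifies the input follows. The function double_input is a made-up example, and nn.Identity is used only so that the effect on the input is directly visible in the output.

```python
import torch
import torch.nn as nn


def double_input(module, inputs):
    # inputs is the tuple of positional arguments passed to the module.
    return (inputs[0] * 2,)


layer = nn.Identity()
handle = layer.register_forward_pre_hook(double_input)

x = torch.ones(3)
hooked = layer(x)    # the hook doubles the input before forward() runs
handle.remove()
plain = layer(x)     # after removal the input is passed through unchanged
```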

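register_parameter(name, param)

Adds a parameter to the module.

The parameter can be accessed as an attribute using the given name.

Parameters

name (string) – name of the parameter. The parameter can be accessed from this module using the given name
param (Parameter) – parameter to be added to the module.

requires_grad_(requires_grad=True)

Changes whether autograd should record operations on parameters in this module.

This method sets the parameters' requires_grad attributes in-place.

This method is helpful for freezing part of the module for fine-tuning or for training parts of a model individually (e.g., GAN training).

Parameters

requires_grad (bool) – whether autograd should record operations on parameters in this module. Default: True.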

Returns

self

Return type

Module
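
Freezing part of a model with requires_grad_() might look like the sketch below. The two-layer model is arbitrary and only illustrates which parameters remain trainable.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer in-place; the second layer stays trainable.
model[0].requires_grad_(False)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

An optimizer would then typically be given only the parameters that still require gradients.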

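state_dict(destination=None, prefix='', keep_vars=False)

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names.

Returns

a dictionary containing the whole state of the module

Return type

dict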

Example:

>>> module.state_dict().keys()
['bias', 'weight']

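to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

Its signature is similar to torch.Tensor.to(), but it only accepts floating point dtypes as the desired dtype. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with their dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters

device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point type of the floating point parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module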

Returns

self

Return type

Module

Example:

>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

train(mode=True)

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

type(dst_type)

Casts all parameters and buffers to dst_type.

Parameters

dst_type (type or string) – the desired type

Returns

self

Return type

Module

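zero_grad()

Sets gradients of all model parameters to zero.

Sequential

class torch.nn.Sequential(*args)

A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.

To make it easier to understand, here is a small example: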

from collections import OrderedDict
import torch.nn as nn

# Example of using Sequential
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )
# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

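ModuleList

class torch.nn.ModuleList(modules=None)

Holds submodules in a list.

ModuleList can be indexed like a regular Python list, but the modules it contains are properly registered, and will be visible to all Module methods.

Parameters

modules (iterable, optional) – an iterable of modules to add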

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])
    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

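append(module)

Appends a given module to the end of the list.

Parameters

module (nn.Module) – module to append

extend(modules)

Appends modules from a Python iterable to the end of the list.

Parameters

modules (iterable) – iterable of modules to append

insert(index, module)

Inserts a given module before a given index in the list.

Parameters

index (int) – index to insert.
module (nn.Module) – module to insert

ModuleDict

class torch.nn.ModuleDict(modules=None)

Holds submodules in a dictionary.

ModuleDict can be indexed like a regular Python dictionary, but the modules it contains are properly registered, and will be visible to all Module methods.

ModuleDict is an ordered dictionary that respects

- the order of insertion, and
- in update(), the order of the merged OrderedDict or another ModuleDict (the argument to update()).

Note that update() with other unordered mapping types (e.g., Python's plain dict) does not preserve the order of the merged mapping.

Parameters

modules (iterable, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module)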

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.choices = nn.ModuleDict({
                'conv': nn.Conv2d(10, 10, 3),
                'pool': nn.MaxPool2d(3)
        })
        self.activations = nn.ModuleDict([
                ['lrelu', nn.LeakyReLU()],
                ['prelu', nn.PReLU()]
        ])
    def forward(self, x, choice, act):
        x = self.choices[choice](x)
        x = self.activations[act](x)
        return x
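
The ordering rules of ModuleDict can be checked with a small sketch. The keys and layers below are arbitrary; the point is only that insertion order and the order of an OrderedDict passed to update() are kept.

```python
from collections import OrderedDict

import torch.nn as nn

layers = nn.ModuleDict()
layers['conv'] = nn.Conv2d(1, 1, 3)   # inserted first
layers['act'] = nn.ReLU()             # inserted second

# Merging an OrderedDict keeps the order of its entries.
layers.update(OrderedDict([('pool', nn.MaxPool2d(2)), ('drop', nn.Dropout())]))

keys = list(layers.keys())
print(keys)
```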

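clear()

Removes all items from the ModuleDict.

items()

Returns an iterable of the ModuleDict key/value pairs.

keys()

Returns an iterable of the ModuleDict keys.

pop(key)

Removes key from the ModuleDict and returns its module.

Parameters

key (string) – key to pop from the ModuleDict

update(modules)

Updates the ModuleDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.

Note

If modules is an OrderedDict, a ModuleDict, or an iterable of key-value pairs, the order of new elements in it is preserved.

Parameters

modules (iterable) – a mapping (dictionary) from string to Module, or an iterable of key-value pairs of type (string, Module)

values()

Returns an iterable of the ModuleDict values.

ParameterList

class torch.nn.ParameterList(parameters=None)

Holds parameters in a list.

ParameterList can be indexed like a regular Python list, but the parameters it contains are properly registered, and will be visible to all Module methods.

Parameters

parameters (iterable, optional) – an iterable of Parameter to add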

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])
    def forward(self, x):
        # ParameterList can act as an iterable, or be indexed using ints
        for i, p in enumerate(self.params):
            x = self.params[i // 2].mm(x) + p.mm(x)
        return x

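append(parameter)

Appends a given parameter at the end of the list.

Parameters

parameter (nn.Parameter) – parameter to append

extend(parameters)

Appends parameters from a Python iterable to the end of the list.

Parameters

parameters (iterable) – iterable of parameters to append

ParameterDict

class torch.nn.ParameterDict(parameters=None)

Holds parameters in a dictionary.

ParameterDict can be indexed like a regular Python dictionary, but the parameters it contains are properly registered, and will be visible to all Module methods.

ParameterDict is an ordered dictionary that respects

- the order of insertion, and
- in update(), the order of the merged OrderedDict or another ParameterDict (the argument to update()).

Note that update() with other unordered mapping types (e.g., Python's plain dict) does not preserve the order of the merged mapping.

Parameters

parameters (iterable, optional) – a mapping (dictionary) of (string: Parameter) or an iterable of key-value pairs of type (string, Parameter)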

Example:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.params = nn.ParameterDict({
                'left': nn.Parameter(torch.randn(5, 10)),
                'right': nn.Parameter(torch.randn(5, 10))
        })
    def forward(self, x, choice):
        x = self.params[choice].mm(x)
        return x

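clear()

Removes all items from the ParameterDict.

items()

Returns an iterable of the ParameterDict key/value pairs.

keys()

Returns an iterable of the ParameterDict keys.

pop(key)

Removes key from the ParameterDict and returns its parameter.

Parameters

key (string) – key to pop from the ParameterDict

update(parameters)

Updates the ParameterDict with the key-value pairs from a mapping or an iterable, overwriting existing keys.

Note

If parameters is an OrderedDict, a ParameterDict, or an iterable of key-value pairs, the order of new elements in it is preserved.

Parameters

parameters (iterable) – a mapping (dictionary) from string to Parameter, or an iterable of key-value pairs of type (string, Parameter)

values()

Returns an iterable of the ParameterDict values.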

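Convolution layers

Conv1d

class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Applies a 1D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C_\text{in}, L)$ and output $(N, C_\text{out}, L_\text{out})$ can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_\text{in}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, and $L$ is the length of the signal sequence.

stride controls the stride for the cross-correlation, a single number or a one-element tuple.

padding controls the amount of implicit zero-paddings on both sides for padding number of points.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters, of size $\left\lfloor \frac{C_\text{out}}{C_\text{in}} \right\rfloor$.

Note

Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.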

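Note

When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.

In other words, for an input of size $(N, C_\text{in}, L_\text{in})$, a depthwise convolution with a depthwise multiplier K can be constructed by the arguments $(C_\text{in} = C_\text{in},\ C_\text{out} = C_\text{in} \times K,\ \ldots,\ \text{groups} = C_\text{in})$.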
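
A depthwise Conv1d can be sketched as follows. The channel count, multiplier, kernel size and input length are arbitrary values chosen for the illustration.

```python
import torch
import torch.nn as nn

c_in, k_mult = 4, 3   # input channels and depthwise multiplier K

# groups == in_channels and out_channels == K * in_channels
depthwise = nn.Conv1d(c_in, c_in * k_mult, kernel_size=5, groups=c_in)

# Each output channel sees exactly one input channel: in_channels / groups == 1.
weight_shape = tuple(depthwise.weight.shape)

out_shape = tuple(depthwise(torch.randn(2, c_in, 20)).shape)
print(weight_shape, out_shape)
```

The second dimension of the weight is 1 because every filter is applied to a single input channel.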

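Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional) – zeros
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

Shape:

Input: $(N, C_\text{in}, L_\text{in})$

Output: $(N, C_\text{out}, L_\text{out})$ where

$$L_\text{out} = \left\lfloor \frac{L_\text{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$

Variables

Conv1d.weight (Tensor) – the learnable weights of the module of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \text{kernel\_size}}$
Conv1d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \text{kernel\_size}}$

Examples: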

>>> m = nn.Conv1d(16, 33, 3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
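
The output length of the example above can be checked by hand. The sketch below assumes the usual Conv1d length rule, L_out = floor((L_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1). The helper name conv1d_output_length is hypothetical and is not part of torch.nn.

```python
def conv1d_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: length of a Conv1d output along the last dimension."""
    # Extent of one (possibly dilated) kernel beyond its first tap.
    effective = dilation * (kernel_size - 1)
    # Integer floor division implements the floor in the formula.
    return (l_in + 2 * padding - effective - 1) // stride + 1


# nn.Conv1d(16, 33, 3, stride=2) applied to an input of length 50
length = conv1d_output_length(50, kernel_size=3, stride=2)
print(length)  # 24
```

Under this rule the example input of shape (20, 16, 50) would give an output of shape (20, 33, 24).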

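Conv2d

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C_\text{in}, H, W)$ and output $(N, C_\text{out}, H_\text{out}, W_\text{out})$ can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_\text{in}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, $H$ is the height of the input planes in pixels, and $W$ is the width in pixels.

stride controls the stride for the cross-correlation, a single number or a tuple.

padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters, of size $\left\lfloor \frac{C_\text{out}}{C_\text{in}} \right\rfloor$.

The parameters kernel_size, stride, padding, dilation can either be:

- a single int – in which case the same value is used for the height and width dimension
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension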

Note

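When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.

In other words, for an input of size $(N, C_\text{in}, H_\text{in}, W_\text{in})$, a depthwise convolution with a depthwise multiplier K can be constructed by the arguments $(\text{in\_channels} = C_\text{in},\ \text{out\_channels} = C_\text{in} \times K,\ \ldots,\ \text{groups} = C_\text{in})$.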

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

Parameters

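in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional) – zeros
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

Shape:

Input: $(N, C_\text{in}, H_\text{in}, W_\text{in})$

Output: $(N, C_\text{out}, H_\text{out}, W_\text{out})$ where

$$H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$

Variables

Conv2d.weight (Tensor) – the learnable weights of the module of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}$
Conv2d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}$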

Examples:

>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> # non-square kernels and unequal stride and with padding and dilation
>>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
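
The spatial output sizes of the examples above can be reproduced with a few lines of arithmetic. The sketch assumes the same per-dimension rule as for Conv1d, applied independently to height and width, with a single int expanded to both dimensions. The helper conv2d_output_size is hypothetical and not a torch.nn function.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: (H_out, W_out) for a Conv2d-style layer."""

    def pair(value):
        # A single int applies to both the height and the width dimension.
        return (value, value) if isinstance(value, int) else tuple(value)

    k, s, p, d = pair(kernel_size), pair(stride), pair(padding), pair(dilation)
    return tuple(
        (size[i] + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i in range(2)
    )


# Square kernel, equal stride: nn.Conv2d(16, 33, 3, stride=2)
square = conv2d_output_size((50, 100), 3, stride=2)
# Non-square kernel with padding and dilation (the third example above)
dilated = conv2d_output_size(
    (50, 100), (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)
)
print(square, dilated)  # (24, 49) (26, 100)
```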

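Conv3d

class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Applies a 3D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C_\text{in}, D, H, W)$ and output $(N, C_\text{out}, D_\text{out}, H_\text{out}, W_\text{out})$ can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_\text{in}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid 3D cross-correlation operator

stride controls the stride for the cross-correlation.

padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters, of size $\left\lfloor \frac{C_\text{out}}{C_\text{in}} \right\rfloor$.

The parameters kernel_size, stride, padding, dilation can either be:

- a single int – in which case the same value is used for the depth, height and width dimension
- a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension

Note

Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.

Note

When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in literature as depthwise convolution.

In other words, for an input of size $(N, C_\text{in}, D_\text{in}, H_\text{in}, W_\text{in})$, a depthwise convolution with a depthwise multiplier K can be constructed by the arguments $(\text{in\_channels} = C_\text{in},\ \text{out\_channels} = C_\text{in} \times K,\ \ldots,\ \text{groups} = C_\text{in})$.

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to all three sides of the input. Default: 0
padding_mode (string, optional) – zeros
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

Shape:

Input: $(N, C_\text{in}, D_\text{in}, H_\text{in}, W_\text{in})$

Output: $(N, C_\text{out}, D_\text{out}, H_\text{out}, W_\text{out})$ where

$$D_\text{out} = \left\lfloor \frac{D_\text{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$H_\text{out} = \left\lfloor \frac{H_\text{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$

$$W_\text{out} = \left\lfloor \frac{W_\text{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor$$

Variables

Conv3d.weight (Tensor) – the learnable weights of the module of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1], \text{kernel\_size}[2])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \prod_{i=0}^{2} \text{kernel\_size}[i]}$
Conv3d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{in} \times \prod_{i=0}^{2} \text{kernel\_size}[i]}$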

Examples:

>>> # With square kernels and equal stride
>>> m = nn.Conv3d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> output = m(input)

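ConvTranspose1d

class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

Applies a 1D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

stride controls the stride for the cross-correlation.

padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.

output_padding controls the additional size added to one side of the output shape. See note below for details.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters (of size $\left\lfloor \frac{C_\text{out}}{C_\text{in}} \right\rfloor$).

Note

The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

Shape:

Input: $(N, C_\text{in}, L_\text{in})$

Output: $(N, C_\text{out}, L_\text{out})$ where

$$L_\text{out} = (L_\text{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel\_size} - 1) + \text{output\_padding} + 1$$

Variables

ConvTranspose1d.weight (Tensor) – the learnable weights of the module of shape $(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size})$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \times \text{kernel\_size}}$
ConvTranspose1d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \times \text{kernel\_size}}$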
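
The shape ambiguity that output_padding resolves can be worked through numerically. The sketch below assumes the standard length rules for Conv1d and ConvTranspose1d; both helper functions are hypothetical and not part of torch.nn.

```python
def conv_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output length of a Conv1d-style layer."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_length(l_in, kernel_size, stride=1, padding=0,
                          output_padding=0, dilation=1):
    """Hypothetical helper: output length of a ConvTranspose1d-style layer."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# With stride 2, two different input lengths collapse to the same output length.
down_11 = conv_length(11, 3, stride=2, padding=1)
down_12 = conv_length(12, 3, stride=2, padding=1)

# The transposed layer therefore needs output_padding to choose between them.
up_default = conv_transpose_length(6, 3, stride=2, padding=1)
up_padded = conv_transpose_length(6, 3, stride=2, padding=1, output_padding=1)
print(down_11, down_12, up_default, up_padded)  # 6 6 11 12
```

Lengths 11 and 12 both map to 6, so the transposed layer recovers 11 by default and 12 when output_padding=1.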

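ConvTranspose2d

class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')

Applies a 2D transposed convolution operator over an input image composed of several input planes.

This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation).

stride controls the stride for the cross-correlation.

padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.

output_padding controls the additional size added to one side of the output shape. See note below for details.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

- At groups=1, all inputs are convolved to all outputs.
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
- At groups=in_channels, each input channel is convolved with its own set of filters (of size $\left\lfloor \frac{C_\text{out}}{C_\text{in}} \right\rfloor$).

The parameters kernel_size, stride, padding, output_padding can either be:

- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Note

The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output.

Note

In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

Shape:

Input: $(N, C_\text{in}, H_\text{in}, W_\text{in})$
Output: $(N, C_\text{out}, H_\text{out}, W_\text{out})$ where

$$H_\text{out} = (H_\text{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1$$

$$W_\text{out} = (W_\text{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1$$

Variables

ConvTranspose2d.weight (Tensor) – the learnable weights of the module of shape $(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$. The values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}$
ConvTranspose2d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If bias is True, then the values of these weights are sampled from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{\text{groups}}{C_\text{out} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}$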

Examples:

>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # exact output size can be also specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])

ConvTranspose3d

class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')?

在由多個(gè)輸入平面組成的輸入圖像上應(yīng)用 3D 轉(zhuǎn)置卷積運(yùn)算符。轉(zhuǎn)置的卷積運(yùn)算符將每個(gè)輸入值逐個(gè)元素地乘以一個(gè)可學(xué)習(xí)的內(nèi)核，并對(duì)所有輸入特征平面的輸出求和。

該模塊可以看作是 Conv3d 相對(duì)于其輸入的梯度。它也被稱為分?jǐn)?shù)步法卷積或反卷積(盡管它不是實(shí)際的反卷積運(yùn)算）。

stride controls the stride for the cross-correlation.

padding controls the amount of implicit zero-paddings on both sides for dilation * (kernel_size - 1) - padding number of points. See note below for details.

output_padding controls the additional size added to one side of the output shape. See note below for details.

dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

\> At groups=1, all inputs are convolved to all outputs. > \> \> At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. > \> \> * At groups= in_channels, each input channel is convolved with its own set of filters (of size ).

The parameters kernel_size, stride, padding, output_padding can either be:

單個(gè)int –在這種情況下，深度，高度和寬度尺寸使用相同的值

- a tuple of three ints – in which case, the first <cite&int</cite& is used for the depth dimension, the second <cite&int</cite& for the height dimension and the third <cite&int</cite& for the width dimension

Note

padding參數(shù)有效地將dilation * (kernel_size - 1) - padding的零填充量添加到兩種輸入大小。進(jìn)行設(shè)置時(shí)，以相同的參數(shù)初始化 Conv3d 和 ConvTranspose3d 時(shí)，它們?cè)谳斎牒洼敵鲂螤罘矫姹舜讼喾础?但是，當(dāng)stride > 1， Conv3d 將多個(gè)輸入形狀映射到相同的輸出形狀時(shí)。提供output_padding可通過有效地增加一側(cè)的計(jì)算輸出形狀來解決這種歧義。請(qǐng)注意，output_padding僅用于查找輸出形狀，而實(shí)際上并未向輸出添加零填充。

Note

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of each dimension in the input. Default: 0
output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

Shape:

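Input: $(N, C_{in}, D_{in}, H_{in}, W_{in})$
Output: $(N, C_{out}, D_{out}, H_{out}, W_{out})$ where

$$D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1$$
$$H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) + \text{output\_padding}[2] + 1$$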

Variables

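ConvTranspose3d.weight (Tensor) – the learnable weights of the module, of shape (in_channels, out_channels / groups, kernel_size[0], kernel_size[1], kernel_size[2]). The initial values are sampled from a uniform distribution whose bounds depend on the channel count and the kernel size.
ConvTranspose3d.bias (Tensor) – the learnable bias of the module, of shape (out_channels). If bias is True, the initial values are sampled from the same kind of uniform distribution.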

Examples:

>>> # With square kernels and equal stride
>>> m = nn.ConvTranspose3d(16, 33, 3, stride=2)
>>> # non-square kernels and unequal stride and with padding
>>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))
>>> input = torch.randn(20, 16, 10, 50, 100)
>>> output = m(input)
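
The output size of a transposed convolution along one spatial dimension is (size_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The sketch below applies that relation to the second example above using plain Python only; the helper name conv_transpose_out_size is hypothetical and is not part of torch.nn.

```python
def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length along one spatial dimension of a transposed convolution."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Spatial sizes and per-dimension arguments taken from the example above.
spatial_in = (10, 50, 100)
kernel_size = (3, 5, 2)
stride = (2, 1, 1)
padding = (0, 4, 2)

spatial_out = tuple(
    conv_transpose_out_size(n, k, s, p)
    for n, k, s, p in zip(spatial_in, kernel_size, stride, padding)
)
print(spatial_out)  # (21, 46, 97)
```

Under these assumptions, output in the example would have shape (20, 33, 21, 46, 97).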

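Unfold

class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)

Extracts sliding local blocks from a batched input tensor.

Consider a batched input tensor of shape $(N, C, *)$, where $N$ is the batch dimension, $C$ is the channel dimension, and $*$ represents arbitrary spatial dimensions. This operation flattens each sliding kernel_size-sized block within the spatial dimensions of input into a column (i.e., the last dimension) of a 3-D output tensor of shape $(N, C \times \prod(\text{kernel\_size}), L)$, where $C \times \prod(\text{kernel\_size})$ is the total number of values within each block (a block has $\prod(\text{kernel\_size})$ spatial locations, each containing a $C$-channeled vector), and $L$ is the total number of such blocks:

$$L = \prod_d \left\lfloor \frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1 \right\rfloor$$

where spatial_size is formed by the spatial dimensions of input ($*$ above), and $d$ ranges over all spatial dimensions.

Therefore, indexing output at the last dimension (the column dimension) gives all values within a certain block.

The padding, stride and dilation arguments specify how the sliding blocks are retrieved.

stride controls the stride for the sliding blocks.
padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension before reshaping.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.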

Parameters

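kernel_size (int or tuple) – the size of the sliding blocks
stride (int or tuple, optional) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0
dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1

If kernel_size, dilation, padding or stride is an int or a tuple of length 1, its value is replicated across all spatial dimensions.
For the case of two input spatial dimensions, this operation is sometimes called im2col.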

Note

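Fold calculates each combined value in the resulting large tensor by summing all values from all containing blocks. Unfold extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.

In general, folding and unfolding operations are related as follows. Consider Fold and Unfold instances created with the same parameters: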

>>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...)
>>> fold = nn.Fold(output_size=..., **fold_params)
>>> unfold = nn.Unfold(**fold_params)

Then for any (supported) input tensor the following equality holds:

fold(unfold(input)) == divisor * input

where divisor is a tensor that depends only on the shape and dtype of the input:

>>> input_ones = torch.ones(input.shape, dtype=input.dtype)
>>> divisor = fold(unfold(input_ones))

When the divisor tensor contains no zero elements, the fold and unfold operations are inverses of each other (up to a constant divisor).

Warning

Currently, only 4-D input tensors (batched image-like tensors) are supported.

Shape:

Input: $(N, C, *)$
Output: $(N, C \times \prod(\text{kernel\_size}), L)$ as described above

Examples:

>>> unfold = nn.Unfold(kernel_size=(2, 3))
>>> input = torch.randn(2, 5, 3, 4)
>>> output = unfold(input)
>>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels)
>>> # 4 blocks (2x3 kernels) in total in the 3x4 input
>>> output.size()
torch.Size([2, 30, 4])
>>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape)
>>> inp = torch.randn(1, 3, 10, 12)
>>> w = torch.randn(2, 3, 4, 5)
>>> inp_unf = torch.nn.functional.unfold(inp, (4, 5))
>>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
>>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1))
>>> # or equivalently (and avoiding a copy),
>>> # out = out_unf.view(1, 2, 7, 8)
>>> (torch.nn.functional.conv2d(inp, w) - out).abs().max()
tensor(1.9073e-06)
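
The shapes quoted in the comments above can be checked by hand: the number of blocks per spatial dimension is floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1), and each block holds C times the product of kernel_size values. A minimal pure-Python sketch follows; unfold_output_shape is a hypothetical helper, not a PyTorch function, and it assumes a 4-D input shape.

```python
import math


def unfold_output_shape(input_shape, kernel_size, dilation=(1, 1),
                        padding=(0, 0), stride=(1, 1)):
    """Shape (N, C * prod(kernel_size), L) produced by unfolding a 4-D input."""
    n, c, *spatial = input_shape
    blocks_per_dim = [
        (size + 2 * p - d * (k - 1) - 1) // s + 1
        for size, k, d, p, s in zip(spatial, kernel_size, dilation, padding, stride)
    ]
    return (n, c * math.prod(kernel_size), math.prod(blocks_per_dim))


print(unfold_output_shape((2, 5, 3, 4), (2, 3)))    # (2, 30, 4)
print(unfold_output_shape((1, 3, 10, 12), (4, 5)))  # (1, 60, 56)
```

The second result has 56 = 7 * 8 blocks, which is why the convolution example folds back to an output of spatial size (7, 8).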

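Fold

class torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)

Combines an array of sliding local blocks into a large containing tensor.

Consider a batched input tensor containing sliding local blocks (e.g., patches of images) of shape $(N, C \times \prod(\text{kernel\_size}), L)$, where $N$ is the batch dimension, $C \times \prod(\text{kernel\_size})$ is the number of values within a block (a block has $\prod(\text{kernel\_size})$ spatial locations, each containing a $C$-channeled vector), and $L$ is the total number of blocks. (This is exactly the same specification as the output shape of Unfold.) This operation combines these local blocks into the large output tensor of shape $(N, C, \text{output\_size}[0], \text{output\_size}[1], \ldots)$ by summing the overlapping values. Similar to Unfold, the arguments must satisfy

$$L = \prod_d \left\lfloor \frac{\text{output\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1 \right\rfloor$$

where $d$ ranges over all spatial dimensions.

output_size describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful for resolving the ambiguity when multiple input shapes map to the same number of sliding blocks, e.g., with stride > 0.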

The padding, stride and dilation arguments specify how the sliding blocks are retrieved.

stride controls the stride for the sliding blocks.
padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension before reshaping.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.

Parameters

output_size (int or tuple) – the shape of the spatial dimensions of the output (i.e., output.sizes()[2:])
kernel_size (int or tuple) – the size of the sliding blocks
stride (int or tuple) – the stride of the sliding blocks in the input spatial dimensions. Default: 1
padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0
dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1

If output_size, kernel_size, dilation, padding or stride is an int or a tuple of length 1, its value is replicated across all spatial dimensions.
For the case of two output spatial dimensions, this operation is sometimes called col2im.

Note

Fold calculates each combined value in the resulting large tensor by summing all values from all containing blocks. Unfold extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.

In general, folding and unfolding operations are related as follows. Consider Fold and Unfold instances created with the same parameters:

>>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...)
>>> fold = nn.Fold(output_size=..., **fold_params)
>>> unfold = nn.Unfold(**fold_params)

Then for any (supported) input tensor the following equality holds:

fold(unfold(input)) == divisor * input

where divisor is a tensor that depends only on the shape and dtype of the input:

>>> input_ones = torch.ones(input.shape, dtype=input.dtype)
>>> divisor = fold(unfold(input_ones))

When the divisor tensor contains no zero elements, then fold and unfold operations are inverses of each other (upto constant divisor).
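
To make the divisor relation concrete, here is a one-dimensional analogue written in plain Python. The functions unfold_1d and fold_1d are hypothetical stand-ins that operate on lists; they only imitate the copy-then-sum behaviour described above and are not the torch.nn implementations.

```python
def unfold_1d(values, kernel_size, stride=1):
    """Copy each sliding window out of a 1-D list."""
    last_start = len(values) - kernel_size
    return [values[i:i + kernel_size] for i in range(0, last_start + 1, stride)]


def fold_1d(blocks, output_size, stride=1):
    """Sum the blocks back into a list of length output_size."""
    out = [0.0] * output_size
    for block_index, block in enumerate(blocks):
        start = block_index * stride
        for offset, value in enumerate(block):
            out[start + offset] += value
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
k, s = 3, 1

folded = fold_1d(unfold_1d(x, k, s), len(x), s)
# The divisor counts how many blocks cover each position.
divisor = fold_1d(unfold_1d([1.0] * len(x), k, s), len(x), s)

print(divisor)  # [1.0, 2.0, 3.0, 2.0, 1.0]
print(folded)   # [1.0, 4.0, 9.0, 8.0, 5.0]
print(folded == [d * v for d, v in zip(divisor, x)])  # True
```

Dividing folded element-wise by divisor recovers x, which is what "inverses up to a constant divisor" means here.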

Warning

Currently, only 4-D output tensors (batched image-like tensors) are supported.

Shape:

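Input: $(N, C \times \prod(\text{kernel\_size}), L)$
Output: $(N, C, \text{output\_size}[0], \text{output\_size}[1], \ldots)$ as described above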

Examples:

>>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
>>> input = torch.randn(1, 3 * 2 * 2, 12)
>>> output = fold(input)
>>> output.size()
torch.Size([1, 3, 4, 5])

Pooling layers

MaxPool1d

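class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies a 1D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, L)$ and output $(N, C, L_{out})$ can be precisely described as:

$$out(N_i, C_j, k) = \max_{m=0,\ldots,\text{kernel\_size}-1} input(N_i, C_j, \text{stride} \times k + m)$$

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.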

Parameters

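kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on both sides
dilation – a parameter that controls the stride of elements in the window
return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool1d later
ceil_mode – when True, will use ceil instead of floor to compute the output shape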

Shape:

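Input: $(N, C, L_{in})$

Output: $(N, C, L_{out})$, where

$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$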

Examples:

>>> # pool of size=3, stride=2
>>> m = nn.MaxPool1d(3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
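
For reference, the pooled length with ceil_mode=False follows floor((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1). The following sketch evaluates that expression in plain Python; pool_out_len is a hypothetical helper name, and ceil_mode=True is deliberately not modelled.

```python
def pool_out_len(length_in, kernel_size, stride=None, padding=0, dilation=1):
    """Output length of 1-D max pooling, assuming ceil_mode=False."""
    # As in the layer, the stride defaults to the kernel size.
    stride = kernel_size if stride is None else stride
    return (length_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(pool_out_len(50, 3, stride=2))  # 24
print(pool_out_len(50, 3))            # 16
print(pool_out_len(50, 3, stride=2, padding=1, dilation=2))  # 24
```

Under this formula, output in the example above would have shape (20, 16, 24).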

MaxPool2d

class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies a 2D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, H, W)$, output $(N, C, H_{out}, W_{out})$ and kernel_size $(kH, kW)$ can be precisely described as:

$$out(N_i, C_j, h, w) = \max_{m=0,\ldots,kH-1} \max_{n=0,\ldots,kW-1} input(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)$$

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points. dilation controls the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what dilation does.

The parameters kernel_size, stride, padding, dilation can either be:

- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Parameters

kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on both sides
dilation – a parameter that controls the stride of elements in the window
return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later
ceil_mode – when True, will use ceil instead of floor to compute the output shape

Shape:

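Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$, where

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$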

Examples:

>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)

MaxPool3d

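class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies a 3D max pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, D, H, W)$, output $(N, C, D_{out}, H_{out}, W_{out})$ and kernel_size $(kD, kH, kW)$ can be precisely described as:

$$out(N_i, C_j, d, h, w) = \max_{k=0,\ldots,kD-1} \max_{m=0,\ldots,kH-1} \max_{n=0,\ldots,kW-1} input(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)$$

The parameters kernel_size, stride, padding, dilation can either be:

- a single int – in which case the same value is used for the depth, height and width dimensions
- a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension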

Parameters

kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on all three sides
dilation – a parameter that controls the stride of elements in the window
return_indices – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool3d later
ceil_mode – when True, will use ceil instead of floor to compute the output shape

Shape:

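Input: $(N, C, D_{in}, H_{in}, W_{in})$

Output: $(N, C, D_{out}, H_{out}, W_{out})$, where

$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor$$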

Examples:

>>> # pool of square window of size=3, stride=2
>>> m = nn.MaxPool3d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2))
>>> input = torch.randn(20, 16, 50, 44, 31)
>>> output = m(input)

MaxUnpool1d

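class torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)

Computes a partial inverse of MaxPool1d.

MaxPool1d is not fully invertible, since the non-maximal values are lost.

MaxUnpool1d takes in as input the output of MaxPool1d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.

Note

MaxPool1d can map several input sizes to the same output size. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.

Parameters

kernel_size (int or tuple) – Size of the max pooling window.
stride (int or tuple) – Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple) – Padding that was added to the input

Inputs:

input: the input Tensor to invert
indices: the indices given out by MaxPool1d
output_size (optional): the targeted output size

Shape:

Input: $(N, C, H_{in})$

Output: $(N, C, H_{out})$, where

$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$

or as given by output_size in the call operator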

Example:

>>> pool = nn.MaxPool1d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool1d(2, stride=2)
>>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8]]])
>>> output, indices = pool(input)
>>> unpool(output, indices)
tensor([[[ 0.,  2.,  0.,  4.,  0.,  6.,  0., 8.]]])
>>> # Example showcasing the use of output_size
>>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8, 9]]])
>>> output, indices = pool(input)
>>> unpool(output, indices, output_size=input.size())
tensor([[[ 0.,  2.,  0.,  4.,  0.,  6.,  0., 8.,  0.]]])
>>> unpool(output, indices)
tensor([[[ 0.,  2.,  0.,  4.,  0.,  6.,  0., 8.]]])
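
The behaviour in the example can be mimicked with a few lines of plain Python, which makes the role of the indices and of output_size easier to see. The three functions below are hypothetical list-based stand-ins (no padding or dilation), not the torch.nn modules.

```python
def max_pool_1d(values, kernel_size, stride):
    """Return (maxima, indices) for each pooling window."""
    maxima, indices = [], []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        best = max(range(kernel_size), key=lambda i: window[i])
        maxima.append(window[best])
        indices.append(start + best)
    return maxima, indices


def max_unpool_1d(maxima, indices, output_size):
    """Scatter the maxima to their recorded positions; everything else is zero."""
    out = [0.0] * output_size
    for value, index in zip(maxima, indices):
        out[index] = value
    return out


def default_unpool_size(pooled_len, kernel_size, stride, padding=0):
    """Size used when no output_size is given."""
    return (pooled_len - 1) * stride - 2 * padding + kernel_size


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
maxima, indices = max_pool_1d(x, kernel_size=2, stride=2)
print(maxima, indices)  # [2.0, 4.0, 6.0, 8.0] [1, 3, 5, 7]

# Without output_size the trailing ninth element cannot be recovered.
print(max_unpool_1d(maxima, indices, default_unpool_size(len(maxima), 2, 2)))
# [0.0, 2.0, 0.0, 4.0, 0.0, 6.0, 0.0, 8.0]
print(max_unpool_1d(maxima, indices, output_size=len(x)))
# [0.0, 2.0, 0.0, 4.0, 0.0, 6.0, 0.0, 8.0, 0.0]
```

Inputs of length 8 and 9 both pool to four values here, which is exactly the ambiguity that output_size resolves.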

MaxUnpool2d

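class torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)

Computes a partial inverse of MaxPool2d.

MaxPool2d is not fully invertible, since the non-maximal values are lost.

MaxUnpool2d takes in as input the output of MaxPool2d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.

Note

MaxPool2d can map several input sizes to the same output size. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs and Example below.

Parameters

kernel_size (int or tuple) – Size of the max pooling window.
stride (int or tuple) – Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple) – Padding that was added to the input

Inputs:

input: the input Tensor to invert
indices: the indices given out by MaxPool2d
output_size (optional): the targeted output size

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$, where

$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$

or as given by output_size in the call operator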

Example:

>>> pool = nn.MaxPool2d(2, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool2d(2, stride=2)
>>> input = torch.tensor([[[[ 1.,  2,  3,  4],
                            [ 5,  6,  7,  8],
                            [ 9, 10, 11, 12],
                            [13, 14, 15, 16]]]])
>>> output, indices = pool(input)
>>> unpool(output, indices)
tensor([[[[  0.,   0.,   0.,   0.],
          [  0.,   6.,   0.,   8.],
          [  0.,   0.,   0.,   0.],
          [  0.,  14.,   0.,  16.]]]])
>>> # specify a different output size than input size
>>> unpool(output, indices, output_size=torch.Size([1, 1, 5, 5]))
tensor([[[[  0.,   0.,   0.,   0.,   0.],
          [  6.,   0.,   8.,   0.,   0.],
          [  0.,   0.,   0.,  14.,   0.],
          [ 16.,   0.,   0.,   0.,   0.],
          [  0.,   0.,   0.,   0.,   0.]]]])

MaxUnpool3d

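class torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)

Computes a partial inverse of MaxPool3d.

MaxPool3d is not fully invertible, since the non-maximal values are lost. MaxUnpool3d takes in as input the output of MaxPool3d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.

Note

MaxPool3d can map several input sizes to the same output size. Hence, the inversion process can become ambiguous. To accommodate this, you can provide the needed output size as an additional argument output_size in the forward call. See the Inputs section below.

Parameters

kernel_size (int or tuple) – Size of the max pooling window.
stride (int or tuple) – Stride of the max pooling window. It is set to kernel_size by default.
padding (int or tuple) – Padding that was added to the input

Inputs:

input: the input Tensor to invert
indices: the indices given out by MaxPool3d
output_size (optional): the targeted output size

Shape:

Input: $(N, C, D_{in}, H_{in}, W_{in})$

Output: $(N, C, D_{out}, H_{out}, W_{out})$, where

$$D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$
$$H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{kernel\_size}[2]$$

or as given by output_size in the call operator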

Example:

>>> # pool of square window of size=3, stride=2
>>> pool = nn.MaxPool3d(3, stride=2, return_indices=True)
>>> unpool = nn.MaxUnpool3d(3, stride=2)
>>> output, indices = pool(torch.randn(20, 16, 51, 33, 15))
>>> unpooled_output = unpool(output, indices)
>>> unpooled_output.size()
torch.Size([20, 16, 51, 33, 15])

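AvgPool1d

class torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)

Applies a 1D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, L)$, output $(N, C, L_{out})$ and kernel_size $k$ can be precisely described as:

$$out(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} input(N_i, C_j, \text{stride} \times l + m)$$

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

The parameters kernel_size, stride, padding can each be an int or a one-element tuple.

Parameters

kernel_size – the size of the window
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on both sides
ceil_mode – when True, will use ceil instead of floor to compute the output shape
count_include_pad – when True, will include the zero-padding in the averaging calculation

Shape:

Input: $(N, C, L_{in})$

Output: $(N, C, L_{out})$, where

$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} + 1 \right\rfloor$$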

Examples:

>>> # pool with window of size=3, stride=2
>>> m = nn.AvgPool1d(3, stride=2)
>>> m(torch.tensor([[[1.,2,3,4,5,6,7]]]))
tensor([[[ 2.,  4.,  6.]]])
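
The effect of count_include_pad is easiest to see on a small list. The sketch below is a plain-Python approximation of 1-D average pooling with ceil_mode=False; avg_pool_1d is a hypothetical helper, and its handling of the divisor reflects the parameter description above rather than the library source.

```python
def avg_pool_1d(values, kernel_size, stride=None, padding=0,
                count_include_pad=True):
    """Average each window of a zero-padded list."""
    stride = kernel_size if stride is None else stride
    padded = [0.0] * padding + list(values) + [0.0] * padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        if count_include_pad:
            divisor = kernel_size
        else:
            # Count only the positions that lie inside the original input.
            lo = max(start, padding)
            hi = min(start + kernel_size, padding + len(values))
            divisor = hi - lo
        out.append(sum(window) / divisor)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(avg_pool_1d(x, 3, stride=2))  # [2.0, 4.0, 6.0]
print(avg_pool_1d(x, 3, stride=2, padding=1, count_include_pad=False))
# [1.5, 3.0, 5.0, 6.5]
```

With padding=1 and count_include_pad=True the first window averages 0, 1, 2 over three positions and yields 1.0 instead of 1.5.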

AvgPool2d

class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Applies a 2D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, H, W)$, output $(N, C, H_{out}, W_{out})$ and kernel_size $(kH, kW)$ can be precisely described as:

$$out(N_i, C_j, h, w) = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)$$

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

The parameters kernel_size, stride, padding can either be:

- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Parameters

kernel_size – the size of the window
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on both sides
ceil_mode – when True, will use ceil instead of floor to compute the output shape
count_include_pad – when True, will include the zero-padding in the averaging calculation
divisor_override – if specified, it will be used as divisor, otherwise kernel_size will be used

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$, where

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1 \right\rfloor$$

Examples:

>>> # pool of square window of size=3, stride=2
>>> m = nn.AvgPool2d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.AvgPool2d((3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)

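AvgPool3d

class torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Applies a 3D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $(N, C, D, H, W)$, output $(N, C, D_{out}, H_{out}, W_{out})$ and kernel_size $(kD, kH, kW)$ can be precisely described as:

$$out(N_i, C_j, d, h, w) = \frac{1}{kD \times kH \times kW} \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)$$

If padding is non-zero, then the input is implicitly zero-padded on all three sides for padding number of points.

The parameters kernel_size, stride can either be:

- a single int – in which case the same value is used for the depth, height and width dimensions
- a tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension

Parameters

kernel_size – the size of the window
stride – the stride of the window. Default value is kernel_size
padding – implicit zero padding to be added on all three sides
ceil_mode – when True, will use ceil instead of floor to compute the output shape
count_include_pad – when True, will include the zero-padding in the averaging calculation
divisor_override – if specified, it will be used as divisor, otherwise kernel_size will be used

Shape:

Input: $(N, C, D_{in}, H_{in}, W_{in})$

Output: $(N, C, D_{out}, H_{out}, W_{out})$, where

$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1 \right\rfloor$$
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\_size}[2]}{\text{stride}[2]} + 1 \right\rfloor$$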

Examples:

>>> # pool of square window of size=3, stride=2
>>> m = nn.AvgPool3d(3, stride=2)
>>> # pool of non-square window
>>> m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2))
>>> input = torch.randn(20, 16, 50, 44, 31)
>>> output = m(input)

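FractionalMaxPool2d

class torch.nn.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)

Applies a 2D fractional max pooling over an input signal composed of several input planes.

Fractional max pooling is described in detail in the paper Fractional MaxPooling by Ben Graham.

The max-pooling operation is applied in kH x kW regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes.

Parameters

kernel_size – the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple (kh, kw)
output_size – the target output size of the image of the form oH x oW. Can be a tuple (oH, oW) or a single number oH for a square image oH x oH
output_ratio – if one wants the output size to be a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1)
return_indices – if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d(). Default: False

Examples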

>>> # pool of square window of size=3, and target output size 13x12
>>> m = nn.FractionalMaxPool2d(3, output_size=(13, 12))
>>> # pool of square window and target output size being half of input image size
>>> m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)

LPPool1d

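class torch.nn.LPPool1d(norm_type, kernel_size, stride=None, ceil_mode=False)

Applies a 1D power-average pooling over an input signal composed of several input planes.

On each window, the function computed is:

$$f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}$$

At p = $\infty$, one gets Max Pooling
At p = 1, one gets Sum Pooling (which is proportional to Average Pooling)

Note

If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.

Parameters

kernel_size – a single int, the size of the window
stride – a single int, the stride of the window. Default value is kernel_size
ceil_mode – when True, will use ceil instead of floor to compute the output shape

Shape:

Input: $(N, C, L_{in})$

Output: $(N, C, L_{out})$, where

$$L_{out} = \left\lfloor \frac{L_{in} - \text{kernel\_size}}{\text{stride}} + 1 \right\rfloor$$

Examples: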
>>> # power-2 pool of window of length 3, with stride 2.
>>> m = nn.LPPool1d(2, 3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> output = m(input)
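
Power-average pooling takes the p-th root of the sum of p-th powers in each window, so p = 1 gives a sum and large p approaches the maximum. A small plain-Python sketch of that behaviour follows; lp_pool_1d is a hypothetical helper that assumes non-negative inputs and ceil_mode=False.

```python
def lp_pool_1d(values, norm_type, kernel_size, stride=None):
    """p-th root of the sum of p-th powers over each window."""
    stride = kernel_size if stride is None else stride
    out = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        # Assumes non-negative values; negative inputs with a fractional
        # norm_type would not give a real result here.
        out.append(sum(v ** norm_type for v in window) ** (1.0 / norm_type))
    return out


x = [3.0, 4.0, 0.0, 1.0, 2.0, 2.0]
print(lp_pool_1d(x, 1, 2))  # [7.0, 1.0, 4.0]  (sum pooling)

l2 = lp_pool_1d(x, 2, 2)    # Euclidean norm of each window: 5, 1, sqrt(8)
large_p = lp_pool_1d(x, 200, 2)  # close to the window maxima 4, 1, 2
print(l2, large_p)
```

Dividing the p = 1 result by kernel_size gives ordinary average pooling, which is the proportionality mentioned above.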

LPPool2d

class torch.nn.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)

Applies a 2D power-average pooling over an input signal composed of several input planes.

On each window, the function computed is:

$$f(X) = \sqrt[p]{\sum_{x \in X} x^{p}}$$

At p = $\infty$, one gets Max Pooling
At p = 1, one gets Sum Pooling (which is proportional to Average Pooling)

The parameters kernel_size, stride can either be:

- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Note

If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case.

Parameters

kernel_size – the size of the window
stride – the stride of the window. Default value is kernel_size
ceil_mode – when True, will use ceil instead of floor to compute the output shape

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$, where

$$H_{out} = \left\lfloor \frac{H_{in} - \text{kernel\_size}[0]}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} - \text{kernel\_size}[1]}{\text{stride}[1]} + 1 \right\rfloor$$

Examples:

>>> # power-2 pool of square window of size=3, stride=2
>>> m = nn.LPPool2d(2, 3, stride=2)
>>> # pool of non-square window of power 1.2
>>> m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1))
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)

AdaptiveMaxPool1d

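class torch.nn.AdaptiveMaxPool1d(output_size, return_indices=False)

Applies a 1D adaptive max pooling over an input signal composed of several input planes.

The output size is H, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size H
return_indices – if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool1d. Default: False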

Examples

>>> # target output size of 5
>>> m = nn.AdaptiveMaxPool1d(5)
>>> input = torch.randn(1, 64, 8)
>>> output = m(input)
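
Adaptive pooling picks its windows from the input and output lengths instead of a fixed kernel size. The sketch below assumes the commonly described rule that output position i covers input indices from floor(i * L_in / L_out) up to, but not including, ceil((i + 1) * L_in / L_out). That rule is an assumption about the implementation rather than something stated in this text, and both function names are hypothetical.

```python
def adaptive_windows(length_in, output_size):
    """Half-open (start, end) index range for each output position."""
    return [
        ((i * length_in) // output_size,
         ((i + 1) * length_in + output_size - 1) // output_size)  # ceiling division
        for i in range(output_size)
    ]


def adaptive_max_pool_1d(values, output_size):
    """Take the maximum over each adaptive window."""
    return [max(values[a:b]) for a, b in adaptive_windows(len(values), output_size)]


print(adaptive_windows(8, 5))
# [(0, 2), (1, 4), (3, 5), (4, 7), (6, 8)]
print(adaptive_max_pool_1d([1, 5, 2, 8, 3, 9, 4, 7], 4))
# [5, 8, 9, 7]
```

When the input length is not a multiple of the output size, as with 8 and 5 here, the windows overlap and differ in width.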

AdaptiveMaxPool2d

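class torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either an int, or None which means the size will be the same as that of the input.
return_indices – if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d. Default: False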

Examples

>>> # target output size of 5x7
>>> m = nn.AdaptiveMaxPool2d((5,7))
>>> input = torch.randn(1, 64, 8, 9)
>>> output = m(input)
>>> # target output size of 7x7 (square)
>>> m = nn.AdaptiveMaxPool2d(7)
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
>>> # target output size of 10x7
>>> m = nn.AdaptiveMaxPool2d((None, 7))
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)

AdaptiveMaxPool3d

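class torch.nn.AdaptiveMaxPool3d(output_size, return_indices=False)

Applies a 3D adaptive max pooling over an input signal composed of several input planes.

The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size of the image of the form D x H x W. Can be a tuple (D, H, W) or a single D for a cube D x D x D. D, H and W can be either an int, or None which means the size will be the same as that of the input.
return_indices – if True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d. Default: False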

Examples

>>> # target output size of 5x7x9
>>> m = nn.AdaptiveMaxPool3d((5,7,9))
>>> input = torch.randn(1, 64, 8, 9, 10)
>>> output = m(input)
>>> # target output size of 7x7x7 (cube)
>>> m = nn.AdaptiveMaxPool3d(7)
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
>>> # target output size of 7x9x8
>>> m = nn.AdaptiveMaxPool3d((7, None, None))
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)

AdaptiveAvgPool1d

class torch.nn.AdaptiveAvgPool1d(output_size)

Applies a 1D adaptive average pooling over an input signal composed of several input planes.

The output size is H, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size H

Examples

>>> # target output size of 5
>>> m = nn.AdaptiveAvgPool1d(5)
>>> input = torch.randn(1, 64, 8)
>>> output = m(input)

AdaptiveAvgPool2d

class torch.nn.AdaptiveAvgPool2d(output_size)

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

The output is of size H x W, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either a int, or None which means the size will be the same as that of the input.

Examples

>>> # target output size of 5x7
>>> m = nn.AdaptiveAvgPool2d((5,7))
>>> input = torch.randn(1, 64, 8, 9)
>>> output = m(input)
>>> # target output size of 7x7 (square)
>>> m = nn.AdaptiveAvgPool2d(7)
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)
>>> # target output size of 10x7
>>> m = nn.AdaptiveAvgPool2d((None, 7))
>>> input = torch.randn(1, 64, 10, 9)
>>> output = m(input)

AdaptiveAvgPool3d

class torch.nn.AdaptiveAvgPool3d(output_size)

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes.

Parameters

output_size – the target output size of the form D x H x W. Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can be either an int, or None which means the size will be the same as that of the input.

Examples

>>> # target output size of 5x7x9
>>> m = nn.AdaptiveAvgPool3d((5,7,9))
>>> input = torch.randn(1, 64, 8, 9, 10)
>>> output = m(input)
>>> # target output size of 7x7x7 (cube)
>>> m = nn.AdaptiveAvgPool3d(7)
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)
>>> # target output size of 7x9x8
>>> m = nn.AdaptiveAvgPool3d((7, None, None))
>>> input = torch.randn(1, 64, 10, 9, 8)
>>> output = m(input)

Padding layers

ReflectionPad1d

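class torch.nn.ReflectionPad1d(padding)

Pads the input tensor using the reflection of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 2-tuple, uses (padding_left, padding_right)

Shape:

Input: $(N, C, W_{in})$

Output: $(N, C, W_{out})$ where $W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$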

Examples:

>>> m = nn.ReflectionPad1d(2)
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4)
>>> input
tensor([[[0., 1., 2., 3.],
         [4., 5., 6., 7.]]])
>>> m(input)
tensor([[[2., 1., 0., 1., 2., 3., 2., 1.],
         [6., 5., 4., 5., 6., 7., 6., 5.]]])
>>> # using different paddings for different sides
>>> m = nn.ReflectionPad1d((3, 1))
>>> m(input)
tensor([[[3., 2., 1., 0., 1., 2., 3., 2.],
         [7., 6., 5., 4., 5., 6., 7., 6.]]])
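
Reflection padding mirrors the input around its edge elements without repeating the edge itself. A minimal plain-Python sketch that reproduces the first row of both outputs above; reflection_pad_1d is a hypothetical helper working on one row as a list.

```python
def reflection_pad_1d(row, padding):
    """Mirror a list around its first and last elements."""
    left, right = (padding, padding) if isinstance(padding, int) else padding
    n = len(row)
    if left >= n or right >= n:
        raise ValueError("each padding size must be smaller than the row length")
    left_part = [row[i] for i in range(left, 0, -1)]    # row[left], ..., row[1]
    right_part = [row[n - 2 - i] for i in range(right)]  # row[n-2], row[n-3], ...
    return left_part + list(row) + right_part


row = [0.0, 1.0, 2.0, 3.0]
print(reflection_pad_1d(row, 2))       # [2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
print(reflection_pad_1d(row, (3, 1)))  # [3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0]
```

The length check mirrors the fact that a row cannot be reflected further than the elements it contains.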

ReflectionPad2d

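class torch.nn.ReflectionPad2d(padding)

Pads the input tensor using the reflection of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$ where

$H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$
$W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$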

Examples:

>>> m = nn.ReflectionPad2d(2)
>>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3)
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> m(input)
tensor([[[[8., 7., 6., 7., 8., 7., 6.],
          [5., 4., 3., 4., 5., 4., 3.],
          [2., 1., 0., 1., 2., 1., 0.],
          [5., 4., 3., 4., 5., 4., 3.],
          [8., 7., 6., 7., 8., 7., 6.],
          [5., 4., 3., 4., 5., 4., 3.],
          [2., 1., 0., 1., 2., 1., 0.]]]])
>>> # using different paddings for different sides
>>> m = nn.ReflectionPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[7., 6., 7., 8., 7.],
          [4., 3., 4., 5., 4.],
          [1., 0., 1., 2., 1.],
          [4., 3., 4., 5., 4.],
          [7., 6., 7., 8., 7.]]]])

ReplicationPad1d

class torch.nn.ReplicationPad1d(padding)

Pads the input tensor using replication of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 2-tuple, uses (padding_left, padding_right)

Shape:

Input: $(N, C, W_{in})$

Output: $(N, C, W_{out})$ where $W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$

Examples:

>>> m = nn.ReplicationPad1d(2)
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4)
>>> input
tensor([[[0., 1., 2., 3.],
         [4., 5., 6., 7.]]])
>>> m(input)
tensor([[[0., 0., 0., 1., 2., 3., 3., 3.],
         [4., 4., 4., 5., 6., 7., 7., 7.]]])
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad1d((3, 1))
>>> m(input)
tensor([[[0., 0., 0., 0., 1., 2., 3., 3.],
         [4., 4., 4., 4., 5., 6., 7., 7.]]])
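
Replication padding simply repeats the edge values. As a rough illustration on a single row, with replication_pad_1d being a hypothetical list-based helper rather than the module itself:

```python
def replication_pad_1d(row, padding):
    """Repeat the first and last elements of a list."""
    left, right = (padding, padding) if isinstance(padding, int) else padding
    return [row[0]] * left + list(row) + [row[-1]] * right


row = [0.0, 1.0, 2.0, 3.0]
print(replication_pad_1d(row, 2))       # [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
print(replication_pad_1d(row, (3, 1)))  # [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0]
```

Unlike reflection padding, the pad width here is not limited by the row length.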

ReplicationPad2d

class torch.nn.ReplicationPad2d(padding)

Pads the input tensor using replication of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$ where

$H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$
$W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$

Examples:

>>> m = nn.ReplicationPad2d(2)
>>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3)
>>> input
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
>>> m(input)
tensor([[[[0., 0., 0., 1., 2., 2., 2.],
          [0., 0., 0., 1., 2., 2., 2.],
          [0., 0., 0., 1., 2., 2., 2.],
          [3., 3., 3., 4., 5., 5., 5.],
          [6., 6., 6., 7., 8., 8., 8.],
          [6., 6., 6., 7., 8., 8., 8.],
          [6., 6., 6., 7., 8., 8., 8.]]]])
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[0., 0., 1., 2., 2.],
          [0., 0., 1., 2., 2.],
          [0., 0., 1., 2., 2.],
          [3., 3., 4., 5., 5.],
          [6., 6., 7., 8., 8.]]]])

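ReplicationPad3d

class torch.nn.ReplicationPad3d(padding)

Pads the input tensor using replication of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 6-tuple, uses (padding_left, padding_right, padding_top, padding_bottom, padding_front, padding_back)

Shape:

Input: $(N, C, D_{in}, H_{in}, W_{in})$

Output: $(N, C, D_{out}, H_{out}, W_{out})$ where

$D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}$
$H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$
$W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$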

Examples:

>>> m = nn.ReplicationPad3d(3)
>>> input = torch.randn(16, 3, 8, 320, 480)
>>> output = m(input)
>>> # using different paddings for different sides
>>> m = nn.ReplicationPad3d((3, 3, 6, 6, 1, 1))
>>> output = m(input)

ZeroPad2d

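class torch.nn.ZeroPad2d(padding)

Pads the input tensor boundaries with zero.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$ where

$H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$
$W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$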

Examples:

>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418,  1.9466],
          [ 0.9604, -0.4219, -0.5241],
          [-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.1678, -0.4418,  1.9466,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.9604, -0.4219, -0.5241,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.9162, -0.5436, -0.6446,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.1678, -0.4418,  1.9466,  0.0000],
          [ 0.0000,  0.9604, -0.4219, -0.5241,  0.0000],
          [ 0.0000, -0.9162, -0.5436, -0.6446,  0.0000]]]])
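
The 4-tuple is read as (left, right, top, bottom), which is why the last example adds two rows at the top, none at the bottom, and one column on each side. A plain-Python sketch of that ordering on a nested list; constant_pad_2d is a hypothetical helper, and with value=0.0 it imitates zero padding.

```python
def constant_pad_2d(image, padding, value=0.0):
    """Pad a list of rows with a constant, using (left, right, top, bottom)."""
    if isinstance(padding, int):
        left = right = top = bottom = padding
    else:
        left, right, top, bottom = padding
    width_out = len(image[0]) + left + right
    padded_rows = [[value] * left + list(r) + [value] * right for r in image]
    return ([[value] * width_out for _ in range(top)]
            + padded_rows
            + [[value] * width_out for _ in range(bottom)])


image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
out = constant_pad_2d(image, (1, 1, 2, 0))
print(len(out), len(out[0]))  # 5 5
print(out[2])                 # [0.0, 1.0, 2.0, 3.0, 0.0]
```

Passing a non-zero value gives the behaviour of a constant pad with that fill value.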

ConstantPad1d

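class torch.nn.ConstantPad1d(padding, value)

Pads the input tensor boundaries with a constant value.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in both boundaries. If a 2-tuple, uses (padding_left, padding_right)

Shape:

Input: $(N, C, W_{in})$

Output: $(N, C, W_{out})$ where $W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$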

Examples:

>>> m = nn.ConstantPad1d(2, 3.5)
>>> input = torch.randn(1, 2, 4)
>>> input
tensor([[[-1.0491, -0.7152, -0.0749,  0.8530],
         [-1.3287,  1.8966,  0.1466, -0.2771]]])
>>> m(input)
tensor([[[ 3.5000,  3.5000, -1.0491, -0.7152, -0.0749,  0.8530,  3.5000,
           3.5000],
         [ 3.5000,  3.5000, -1.3287,  1.8966,  0.1466, -0.2771,  3.5000,
           3.5000]]])
>>> m = nn.ConstantPad1d(2, 3.5)
>>> input = torch.randn(1, 2, 3)
>>> input
tensor([[[ 1.6616,  1.4523, -1.1255],
         [-3.6372,  0.1182, -1.8652]]])
>>> m(input)
tensor([[[ 3.5000,  3.5000,  1.6616,  1.4523, -1.1255,  3.5000,  3.5000],
         [ 3.5000,  3.5000, -3.6372,  0.1182, -1.8652,  3.5000,  3.5000]]])
>>> # using different paddings for different sides
>>> m = nn.ConstantPad1d((3, 1), 3.5)
>>> m(input)
tensor([[[ 3.5000,  3.5000,  3.5000,  1.6616,  1.4523, -1.1255,  3.5000],
         [ 3.5000,  3.5000,  3.5000, -3.6372,  0.1182, -1.8652,  3.5000]]])

ConstantPad2d

class torch.nn.ConstantPad2d(padding, value)

Pads the input tensor boundaries with a constant value.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

Input: $(N, C, H_{in}, W_{in})$

Output: $(N, C, H_{out}, W_{out})$ where

$H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$
$W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$

Examples:

>>> m = nn.ConstantPad2d(2, 3.5)
>>> input = torch.randn(1, 2, 2)
>>> input
tensor([[[ 1.6585,  0.4320],
         [-0.8701, -0.4649]]])
>>> m(input)
tensor([[[ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  1.6585,  0.4320,  3.5000,  3.5000],
         [ 3.5000,  3.5000, -0.8701, -0.4649,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000,  3.5000]]])
>>> # using different paddings for different sides
>>> m = nn.ConstantPad2d((3, 0, 2, 1), 3.5)
>>> m(input)
tensor([[[ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000],
         [ 3.5000,  3.5000,  3.5000,  1.6585,  0.4320],
         [ 3.5000,  3.5000,  3.5000, -0.8701, -0.4649],
         [ 3.5000,  3.5000,  3.5000,  3.5000,  3.5000]]])

ConstantPad3d

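class torch.nn.ConstantPad3d(padding, value)

Pads the input tensor boundaries with a constant value.

For N-dimensional padding, use torch.nn.functional.pad().

Parameters

padding (int, tuple) – the size of the padding. If it is an int, the same padding is used on all boundaries. If it is a 6-tuple, it is interpreted as ($\text{padding\_left}$, $\text{padding\_right}$, $\text{padding\_top}$, $\text{padding\_bottom}$, $\text{padding\_front}$, $\text{padding\_back}$).

Shape:

Input: $(N, C, D_{in}, H_{in}, W_{in})$

Output: $(N, C, D_{out}, H_{out}, W_{out})$ where $D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}$, $H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}$ and $W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}$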

Examples:

>>> m = nn.ConstantPad3d(3, 3.5)
>>> input = torch.randn(16, 3, 10, 20, 30)
>>> output = m(input)
>>> # using different paddings for different sides
>>> m = nn.ConstantPad3d((3, 3, 6, 6, 0, 1), 3.5)
>>> output = m(input)

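Non-linear activations (weighted sum, nonlinearity)

ELU

class torch.nn.ELU(alpha=1.0, inplace=False)

Applies the element-wise function:

$$\text{ELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x) - 1))$$

Parameters

alpha – the $\alpha$ value for the ELU formulation. Default: 1.0
inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input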

Examples:

>>> m = nn.ELU()
>>> input = torch.randn(2)
>>> output = m(input)
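
To make the ELU definition concrete, the sketch below recomputes $\max(0, x) + \min(0, \alpha(\exp(x) - 1))$ by hand and compares it with nn.ELU. The helper name elu_reference is hypothetical and not part of torch.

```python
import torch
import torch.nn as nn

def elu_reference(x, alpha=1.0):
    # Hypothetical helper: max(0, x) + min(0, alpha * (exp(x) - 1))
    zero = torch.zeros_like(x)
    return torch.max(zero, x) + torch.min(zero, alpha * (torch.exp(x) - 1))

x = torch.linspace(-3.0, 3.0, steps=13)
elu_matches = torch.allclose(nn.ELU(alpha=1.0)(x), elu_reference(x, 1.0), atol=1e-6)
print(elu_matches)
```
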

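Hardshrink

class torch.nn.Hardshrink(lambd=0.5)

Applies the hard shrinkage function element-wise:

$$\text{HardShrink}(x) = \begin{cases} x, & \text{if } x > \lambda \\ x, & \text{if } x < -\lambda \\ 0, & \text{otherwise} \end{cases}$$

Parameters

lambd – the $\lambda$ value for the Hardshrink formulation. Default: 0.5

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input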

Examples:

>>> m = nn.Hardshrink()
>>> input = torch.randn(2)
>>> output = m(input)

Hardtanh

class torch.nn.Hardtanh(min_val=-1.0, max_val=1.0, inplace=False, min_value=None, max_value=None)

Applies the HardTanh function element-wise.

HardTanh is defined as:

$$\text{HardTanh}(x) = \begin{cases} 1, & \text{if } x > 1 \\ -1, & \text{if } x < -1 \\ x, & \text{otherwise} \end{cases}$$

The range of the linear region $[-1, 1]$ can be adjusted using min_val and max_val.

Parameters

min_val – minimum value of the linear region range. Default: -1
max_val – maximum value of the linear region range. Default: 1
inplace – can optionally do the operation in-place. Default: False

Keyword arguments min_value and max_value have been deprecated in favor of min_val and max_val.

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.Hardtanh(-2, 2)
>>> input = torch.randn(2)
>>> output = m(input)
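
Since HardTanh is linear inside [min_val, max_val] and constant outside, it behaves like a clamp. The sketch below checks this against torch.clamp for the same bounds used in the example above; the input values are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -2.0, -0.5, 0.0, 1.5, 2.0, 4.0])

hardtanh = nn.Hardtanh(min_val=-2.0, max_val=2.0)
y = hardtanh(x)

# Clamping to [-2, 2] should reproduce the same piecewise-linear shape
y_clamped = torch.clamp(x, min=-2.0, max=2.0)
print(y)
```
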

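LeakyReLU

class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)

Applies the element-wise function:

$$\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$$

or

$$\text{LeakyReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \text{negative\_slope} \times x, & \text{otherwise} \end{cases}$$

Parameters

negative_slope – controls the angle of the negative slope. Default: 1e-2
inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input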

Examples:

>>> m = nn.LeakyReLU(0.1)
>>> input = torch.randn(2)
>>> output = m(input)

LogSigmoid

class torch.nn.LogSigmoid

Applies the element-wise function:

$$\text{LogSigmoid}(x) = \log\left(\frac{1}{1 + \exp(-x)}\right)$$

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.LogSigmoid()
>>> input = torch.randn(2)
>>> output = m(input)

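MultiheadAttention

class torch.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None)

Allows the model to jointly attend to information from different representation subspaces. See reference: Attention Is All You Need.

$$\text{MultiHead}(Q, K, V) = \text{Concat}(head_1, \dots, head_h) W^O \quad \text{where } head_i = \text{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$$

Parameters

embed_dim – total dimension of the model.
num_heads – number of parallel attention heads.
dropout – a Dropout layer on attn_output_weights. Default: 0.0
bias – add bias as a module parameter. Default: True.
add_bias_kv – add bias to the key and value sequences at dim=0.
add_zero_attn – add a new batch of zeros to the key and value sequences at dim=1.
kdim – total number of features in key. Default: None.
vdim – total number of features in value. Default: None.
Note – if kdim and vdim are None, they will be set to embed_dim such that query, key, and value have the same number of features.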

Examples:

>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)

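forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None)

Parameters

query, key, value – map a query and a set of key-value pairs to an output. See "Attention Is All You Need" for more details.
key_padding_mask – if provided, the specified padding elements in the key will be ignored by the attention. This is a binary mask. When the value is True, the corresponding value on the attention layer will be filled with -inf.
need_weights – output attn_output_weights.
attn_mask – mask that prevents attention to certain positions. This is an additive mask (i.e. the values will be added to the attention layer).

Shape:

Inputs:
query: $(L, N, E)$ where L is the target sequence length, N is the batch size, E is the embedding dimension.
key: $(S, N, E)$ where S is the source sequence length, N is the batch size, E is the embedding dimension.
value: $(S, N, E)$ where S is the source sequence length, N is the batch size, E is the embedding dimension.
key_padding_mask: $(N, S)$, ByteTensor, where N is the batch size, S is the source sequence length.
attn_mask: $(L, S)$ where L is the target sequence length, S is the source sequence length.
Outputs:
attn_output: $(L, N, E)$ where L is the target sequence length, N is the batch size, E is the embedding dimension.
attn_output_weights: $(N, L, S)$ where N is the batch size, L is the target sequence length, S is the source sequence length.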
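
The shapes listed above can be checked with a small runnable sketch. The sizes below are arbitrary, and the mask is built as a bool tensor, which is an assumption about the installed PyTorch version (the text above lists ByteTensor).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes: target length, source length, batch size, embedding dim
L, S, N, E = 5, 7, 2, 16
num_heads = 4  # E must be divisible by num_heads

multihead_attn = nn.MultiheadAttention(E, num_heads)
query = torch.randn(L, N, E)
key = torch.randn(S, N, E)
value = torch.randn(S, N, E)

# Ignore the last two source positions of the second batch element
key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
key_padding_mask[1, -2:] = True

attn_output, attn_output_weights = multihead_attn(
    query, key, value, key_padding_mask=key_padding_mask)

print(attn_output.shape)          # (L, N, E)
print(attn_output_weights.shape)  # (N, L, S)
```

Masked key positions receive zero attention weight, so each row of attn_output_weights still sums to 1 over the remaining positions.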

PReLU

class torch.nn.PReLU(num_parameters=1, init=0.25)

Applies the element-wise function:

$$\text{PReLU}(x) = \max(0, x) + a * \min(0, x)$$

or

$$\text{PReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ a x, & \text{otherwise} \end{cases}$$

Here $a$ is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter $a$ across all input channels. If called with nn.PReLU(nChannels), a separate $a$ is used for each input channel.

Note

Weight decay should not be used when learning $a$ for good performance.

Note

The channel dim is the 2nd dim of the input. When the input has fewer than 2 dims, there is no channel dim and the number of channels is 1.

Parameters

num_parameters (int) – number of $a$ to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels of the input. Default: 1
init (float) – the initial value of $a$. Default: 0.25

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Variables

PReLU.weight (Tensor) – the learnable weights of shape (num_parameters).

Examples:

>>> m = nn.PReLU()
>>> input = torch.randn(2)
>>> output = m(input)
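
The per-channel variant described above can be sketched as follows. The input shape (batch, channels, length) and the channel count of 3 are assumptions for illustration; the reference expression simply restates $\max(0, x) + a \min(0, x)$ with one $a$ per channel.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One learnable slope per channel; channels are the 2nd dim of the input
m = nn.PReLU(num_parameters=3, init=0.25)
x = torch.randn(4, 3, 8)
y = m(x)

# Broadcast the 3 slopes over the batch and length dimensions
a = m.weight.view(1, 3, 1)
y_ref = torch.clamp(x, min=0) + a * torch.clamp(x, max=0)

print(m.weight.shape)
print(torch.allclose(y, y_ref))
```
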

ReLU

class torch.nn.ReLU(inplace=False)

Applies the rectified linear unit function element-wise:

$$\text{ReLU}(x) = (x)^+ = \max(0, x)$$

Parameters

inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.ReLU()
>>> input = torch.randn(2)
>>> output = m(input)
>>> # An implementation of CReLU - https://arxiv.org/abs/1603.05201
>>> m = nn.ReLU()
>>> input = torch.randn(2).unsqueeze(0)
>>> output = torch.cat((m(input), m(-input)))

ReLU6

class torch.nn.ReLU6(inplace=False)

Applies the element-wise function:

$$\text{ReLU6}(x) = \min(\max(0, x), 6)$$

Parameters

inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.ReLU6()
>>> input = torch.randn(2)
>>> output = m(input)

RReLU

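class torch.nn.RReLU(lower=0.125, upper=0.3333333333333333, inplace=False)

Applies the randomized leaky rectified linear unit function, element-wise, as described in the paper:

Empirical Evaluation of Rectified Activations in Convolutional Network.

The function is defined as:

$$\text{RReLU}(x) = \begin{cases} x, & \text{if } x \geq 0 \\ a x, & \text{otherwise} \end{cases}$$

where $a$ is randomly sampled from the uniform distribution $\mathcal{U}(\text{lower}, \text{upper})$.

See: https://arxiv.org/pdf/1505.00853.pdf

Parameters

lower – lower bound of the uniform distribution. Default: $\frac{1}{8}$
upper – upper bound of the uniform distribution. Default: $\frac{1}{3}$
inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input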

Examples:

>>> m = nn.RReLU(0.1, 0.3)
>>> input = torch.randn(2)
>>> output = m(input)

SELU

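class torch.nn.SELU(inplace=False)

Applied element-wise, as:

$$\text{SELU}(x) = \text{scale} * (\max(0, x) + \min(0, \alpha * (\exp(x) - 1)))$$

with $\alpha = 1.6732632423543772848170429916717$ and $\text{scale} = 1.0507009873554804934193349852946$.

More details can be found in the paper Self-Normalizing Neural Networks.

Parameters

inplace (bool, optional) – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input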

Examples:

>>> m = nn.SELU()
>>> input = torch.randn(2)
>>> output = m(input)

CELU

class torch.nn.CELU(alpha=1.0, inplace=False)

Applies the element-wise function:

$$\text{CELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x / \alpha) - 1))$$

More details can be found in the paper Continuously Differentiable Exponential Linear Units.

Parameters

alpha – the $\alpha$ value for the CELU formulation. Default: 1.0
inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.CELU()
>>> input = torch.randn(2)
>>> output = m(input)

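GELU

class torch.nn.GELU

Applies the Gaussian Error Linear Units function:

$$\text{GELU}(x) = x * \Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the Gaussian distribution.

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input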

Examples:

>>> m = nn.GELU()
>>> input = torch.randn(2)
>>> output = m(input)

Sigmoid

class torch.nn.Sigmoid

Applies the element-wise function:

$$\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$$

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.Sigmoid()
>>> input = torch.randn(2)
>>> output = m(input)

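Softplus

class torch.nn.Softplus(beta=1, threshold=20)

Applies the element-wise function:

$$\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$$

SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive.

For numerical stability, the implementation reverts to the linear function for inputs above a certain value.

Parameters

beta – the $\beta$ value for the Softplus formulation. Default: 1
threshold – values above this revert to a linear function. Default: 20

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input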

Examples:

>>> m = nn.Softplus()
>>> input = torch.randn(2)
>>> output = m(input)
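
The switch to a linear function above the threshold can be illustrated with a hand-written reference. This is a sketch only: softplus_reference is a hypothetical helper, and it assumes the linear branch is taken when beta * x exceeds threshold.

```python
import torch
import torch.nn as nn

def softplus_reference(x, beta=1.0, threshold=20.0):
    # Hypothetical helper: (1 / beta) * log(1 + exp(beta * x)),
    # falling back to the identity where beta * x > threshold
    soft = torch.log1p(torch.exp(beta * x)) / beta
    return torch.where(beta * x > threshold, x, soft)

x = torch.tensor([-5.0, 0.0, 5.0, 25.0])
y = nn.Softplus(beta=1, threshold=20)(x)
y_ref = softplus_reference(x)

print(torch.allclose(y, y_ref))
print(bool((y > 0).all()))  # outputs stay positive
```
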

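Softshrink

class torch.nn.Softshrink(lambd=0.5)

Applies the soft shrinkage function element-wise:

$$\text{SoftShrinkage}(x) = \begin{cases} x - \lambda, & \text{if } x > \lambda \\ x + \lambda, & \text{if } x < -\lambda \\ 0, & \text{otherwise} \end{cases}$$

Parameters

lambd – the $\lambda$ value for the Softshrink formulation. Default: 0.5

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input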

Examples:

>>> m = nn.Softshrink()
>>> input = torch.randn(2)
>>> output = m(input)

Softsign

class torch.nn.Softsign

Applies the element-wise function:

$$\text{SoftSign}(x) = \frac{x}{1 + |x|}$$

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.Softsign()
>>> input = torch.randn(2)
>>> output = m(input)

Tanh

class torch.nn.Tanh

Applies the element-wise function:

$$\text{Tanh}(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.Tanh()
>>> input = torch.randn(2)
>>> output = m(input)

Tanhshrink

class torch.nn.Tanhshrink

Applies the element-wise function:

$$\text{Tanhshrink}(x) = x - \tanh(x)$$

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input

Examples:

>>> m = nn.Tanhshrink()
>>> input = torch.randn(2)
>>> output = m(input)

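Threshold

class torch.nn.Threshold(threshold, value, inplace=False)

Thresholds each element of the input Tensor.

Threshold is defined as:

$$y = \begin{cases} x, & \text{if } x > \text{threshold} \\ \text{value}, & \text{otherwise} \end{cases}$$

Parameters

threshold – the value to threshold at
value – the value to replace with
inplace – can optionally do the operation in-place. Default: False

Shape:

Input: $(N, *)$ where * means any number of additional dimensions
Output: $(N, *)$, same shape as the input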

Examples:

>>> m = nn.Threshold(0.1, 20)
>>> input = torch.randn(2)
>>> output = m(input)

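Non-linear activations (other)

Softmin

class torch.nn.Softmin(dim=None)

Applies the Softmin function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.

Softmin is defined as:

$$\text{Softmin}(x_i) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)}$$

Shape:

Input: $(*)$ where * means any number of additional dimensions
Output: $(*)$, same shape as the input

Parameters

dim (int) – a dimension along which Softmin will be computed (so every slice along dim will sum to 1).

Returns

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Examples:

>>> m = nn.Softmin(dim=1)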
>>> input = torch.randn(2, 3)
>>> output = m(input)

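Softmax

class torch.nn.Softmax(dim=None)

Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1.

Softmax is defined as:

$$\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

Shape:

Input: $(*)$ where * means any number of additional dimensions
Output: $(*)$, same shape as the input

Returns

a Tensor of the same dimension and shape as the input, with values in the range [0, 1]

Parameters

dim (int) – a dimension along which Softmax will be computed (so every slice along dim will sum to 1).

Note

This module doesn't work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use LogSoftmax instead (it's faster and has better numerical properties).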

Examples:

>>> m = nn.Softmax(dim=1)
>>> input = torch.randn(2, 3)
>>> output = m(input)
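
The note about NLLLoss can be illustrated with a short sketch. It uses randomly generated logits and hand-picked targets, and compares LogSoftmax followed by NLLLoss against CrossEntropyLoss, which combines the two steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(4, 3)            # 4 examples, 3 classes
target = torch.tensor([0, 2, 1, 2])   # illustrative class indices

probs = nn.Softmax(dim=1)(logits)         # each row sums to 1
log_probs = nn.LogSoftmax(dim=1)(logits)  # log of the same distribution

# NLLLoss expects log-probabilities, so it is paired with LogSoftmax
loss_nll = nn.NLLLoss()(log_probs, target)
loss_ce = nn.CrossEntropyLoss()(logits, target)

print(probs.sum(dim=1))
print(loss_nll.item(), loss_ce.item())
```
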

Softmax2d

class torch.nn.Softmax2d

Applies SoftMax over features to each spatial location.

When given an image of Channels x Height x Width, it will apply Softmax to each location $(Channels, h_i, w_j)$.

Shape:

Input: $(N, C, H, W)$
Output: $(N, C, H, W)$ (same shape as input)

Returns

a Tensor of the same dimension and shape as the input with values in the range [0, 1]

Examples:

>>> m = nn.Softmax2d()
>>> # you softmax over the 2nd dimension
>>> input = torch.randn(2, 3, 12, 13)
>>> output = m(input)

LogSoftmax

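class torch.nn.LogSoftmax(dim=None)

Applies the $\log(\text{Softmax}(x))$ function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as:

$$\text{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$$

Shape:

Input: $(*)$ where * means any number of additional dimensions
Output: $(*)$, same shape as the input

Parameters

dim (int) – a dimension along which LogSoftmax will be computed.

Returns

a Tensor of the same dimension and shape as the input, with values in the range [-inf, 0)

Examples:

>>> m = nn.LogSoftmax(dim=1)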
>>> input = torch.randn(2, 3)
>>> output = m(input)

AdaptiveLogSoftmaxWithLoss

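class torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False)

Efficient softmax approximation as described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou.

Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows Zipf's law.

Adaptive softmax partitions the labels into several clusters according to their frequency. These clusters may each contain a different number of targets. Additionally, clusters containing less frequent labels assign lower-dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only clusters for which at least one target is present are evaluated.

The idea is that the clusters which are accessed frequently (like the first one, containing the most frequent labels) should also be cheap to compute, that is, contain a small number of assigned labels.

We highly recommend taking a look at the original paper for more details.

cutoffs should be an ordered Sequence of integers sorted in increasing order. It controls the number of clusters and the partitioning of targets into clusters. For example, setting cutoffs = [10, 100, 1000] means that the first 10 targets will be assigned to the "head" of the adaptive softmax, targets 11, 12, …, 100 will be assigned to the first cluster, targets 101, 102, …, 1000 will be assigned to the second cluster, and targets 1001, 1002, …, n_classes - 1 will be assigned to the last, third cluster.
div_value is used to compute the size of each additional cluster, which is given as $\left\lfloor \frac{\text{in\_features}}{\text{div\_value}^{idx}} \right\rfloor$, where $idx$ is the cluster index (with clusters for less frequent words having larger indices, and indices starting from 1).
head_bias, if set to True, adds a bias term to the "head" of the adaptive softmax. See the paper for details. Set to False in the official implementation.

Warning

Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index 0, and the least frequent label should be represented by the index n_classes - 1.

Note

This module returns a NamedTuple with output and loss fields. See further documentation for details.

Note

To compute log-probabilities for all classes, the log_prob method can be used.

Parameters

in_features (int) – number of features in the input tensor
n_classes (int) – number of classes in the dataset
cutoffs (Sequence) – cutoffs used to assign targets to their buckets
div_value (float, optional) – value used as an exponent to compute sizes of the clusters. Default: 4.0
head_bias (bool, optional) – if True, adds a bias term to the "head" of the adaptive softmax. Default: False

Returns

output is a Tensor of size N containing the computed target log probabilities for each example
loss is a Scalar representing the computed negative log likelihood loss

Return type

NamedTuple with output and loss fields

Shape:

input: $(N, \text{in\_features})$
target: $(N)$ where each value satisfies $0 \leq \text{target[i]} \leq \text{n\_classes} - 1$
output1: $(N)$
output2: Scalar

log_prob(input)

Computes log probabilities for all n_classes.

Parameters

input (Tensor) – a minibatch of examples

Returns

log-probabilities for each class $c$ in the range $0 \leq c \leq \text{n\_classes} - 1$, where n_classes is a parameter passed to the AdaptiveLogSoftmaxWithLoss constructor.

Shape:

Input: $(N, \text{in\_features})$
Output: $(N, \text{n\_classes})$

predict(input)

This is equivalent to self.log_prob(input).argmax(dim=1), but is more efficient in some cases.

Parameters

input (Tensor) – a minibatch of examples

Returns

a class with the highest probability for each example

Return type

output (Tensor)

Shape:

Input: $(N, \text{in\_features})$
Output: $(N)$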
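
Since this module has no usage example above, the following sketch shows how the forward call, log_prob and predict fit together. The sizes and cutoffs are arbitrary choices for illustration, and the targets are random rather than frequency-sorted labels from a real dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_features, n_classes, batch = 16, 100, 8

# Head holds classes 0..9, first cluster 10..49, second cluster 50..99
asm = nn.AdaptiveLogSoftmaxWithLoss(
    in_features, n_classes, cutoffs=[10, 50], div_value=4.0)

x = torch.randn(batch, in_features)
target = torch.randint(0, n_classes, (batch,))

result = asm(x, target)        # NamedTuple with fields output and loss
log_probs = asm.log_prob(x)    # log-probabilities for every class
pred = asm.predict(x)          # most likely class per example

print(result.output.shape, result.loss.item())
print(log_probs.shape, pred.shape)
```

In this sketch, result.output holds the log-probability of each example's target class, and result.loss is the mean of its negation.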

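Normalization layers

BatchNorm1d

class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension over the mini-batches, and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0.

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Note

This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.

Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it's common terminology to call this Temporal Batch Normalization.

Parameters

num_features – $C$ from an expected input of size $(N, C, L)$ or $L$ from an input of size $(N, L)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

Shape:

Input: $(N, C)$ or $(N, C, L)$
Output: $(N, C)$ or $(N, C, L)$ (same shape as input)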

Examples:

>>> # With Learnable Parameters
>>> m = nn.BatchNorm1d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm1d(100, affine=False)
>>> input = torch.randn(20, 100)
>>> output = m(input)
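
The running-statistics update rule can be checked numerically. The sketch below assumes a freshly constructed layer (running mean 0, running variance 1) and a single training-mode forward pass; it also relies on PyTorch normalizing with the biased batch variance while storing the unbiased one in running_var.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3, momentum=0.1)
x = torch.randn(20, 3) * 2.0 + 5.0   # (N, C) batch with non-trivial mean and scale

bn.train()
y = bn(x)

# Normalization itself uses the biased batch variance (gamma = 1, beta = 0 initially)
batch_mean = x.mean(dim=0)
batch_var = x.var(dim=0, unbiased=False)
y_ref = (x - batch_mean) / torch.sqrt(batch_var + bn.eps)

# new = (1 - momentum) * old + momentum * observed
expected_running_mean = 0.9 * torch.zeros(3) + 0.1 * batch_mean
expected_running_var = 0.9 * torch.ones(3) + 0.1 * x.var(dim=0, unbiased=True)

print(torch.allclose(y, y_ref, atol=1e-5))
print(torch.allclose(bn.running_mean, expected_running_mean, atol=1e-5))
```
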

BatchNorm2d

class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension over the mini-batches, and $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are set to 1 and the elements of $\beta$ are set to 0.

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Note

This momentum argument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is $\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$, where $\hat{x}$ is the estimated statistic and $x_t$ is the new observed value.

Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it's common terminology to call this Spatial Batch Normalization.

Parameters

num_features – $C$ from an expected input of size $(N, C, H, W)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

Shape:

Input: $(N, C, H, W)$
Output: $(N, C, H, W)$ (same shape as input)

Examples:

>>> # With Learnable Parameters
>>> m = nn.BatchNorm2d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm2d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)

BatchNorm3d

class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Note

Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it's common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.

Parameters

num_features – $C$ from an expected input of size $(N, C, D, H, W)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

Shape:

Input: $(N, C, D, H, W)$
Output: $(N, C, D, H, W)$ (same shape as input)

Examples:

>>> # With Learnable Parameters
>>> m = nn.BatchNorm3d(100)
>>> # Without Learnable Parameters
>>> m = nn.BatchNorm3d(100, affine=False)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)

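GroupNorm

class torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)

Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. The mean and standard-deviation are calculated separately over each group. $\gamma$ and $\beta$ are learnable per-channel affine transform parameter vectors of size num_channels if affine is True.

This layer uses statistics computed from input data in both training and evaluation modes.

Parameters

num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels expected in the input
eps – a value added to the denominator for numerical stability. Default: 1e-5
affine – a boolean value that when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.

Shape:

Input: $(N, C, *)$ where $C = \text{num\_channels}$
Output: $(N, C, *)$ (same shape as input)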

Examples:

>>> input = torch.randn(20, 6, 10, 10)
>>> # Separate 6 channels into 3 groups
>>> m = nn.GroupNorm(3, 6)
>>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm)
>>> m = nn.GroupNorm(6, 6)
>>> # Put all 6 channels into a single group (equivalent with LayerNorm)
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)
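
The per-group statistics described above can be reproduced by hand. The following sketch, with affine=False to leave out the learnable parameters, reshapes the input so that each group's channels and spatial positions share one axis and normalizes over that axis.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(20, 6, 10, 10)
gn = nn.GroupNorm(3, 6, affine=False)   # 3 groups of 2 channels each
y = gn(x)

# Flatten each group: (N, num_groups, channels_per_group * H * W)
g = x.view(20, 3, -1)
mean = g.mean(dim=2, keepdim=True)
var = g.var(dim=2, unbiased=False, keepdim=True)
y_ref = ((g - mean) / torch.sqrt(var + gn.eps)).view_as(x)

print(torch.allclose(y, y_ref, atol=1e-5))
```
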

SyncBatchNorm

class torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, process_group=None)

Applies Batch Normalization over an N-dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension over all mini-batches of the same process group. $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size). By default, the elements of $\gamma$ are sampled from $\mathcal{U}(0, 1)$ and the elements of $\beta$ are set to 0.

If track_running_stats is set to False, this layer does not keep running estimates, and batch statistics are used during evaluation as well.

Note

Because the Batch Normalization is done over the C dimension, computing statistics on (N, +) slices, it's common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization.

Currently SyncBatchNorm only supports DistributedDataParallel with a single GPU per process. Use torch.nn.SyncBatchNorm.convert_sync_batchnorm() to convert BatchNorm layers to SyncBatchNorm before wrapping the network with DDP.

Parameters

num_features – $C$ from an expected input of size $(N, C, +)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
process_group – synchronization of stats happens within each process group individually. The default behavior is synchronization across the whole world

Shape:

Input: $(N, C, +)$
Output: $(N, C, +)$ (same shape as input)

Examples:

>>> # With Learnable Parameters
>>> m = nn.SyncBatchNorm(100)
>>> # creating process group (optional)
>>> # process_ids is a list of int identifying rank ids.
>>> process_group = torch.distributed.new_group(process_ids)
>>> # Without Learnable Parameters
>>> m = nn.SyncBatchNorm(100, affine=False, process_group=process_group)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
>>> # network is nn.BatchNorm layer
>>> sync_bn_network = nn.SyncBatchNorm.convert_sync_batchnorm(network, process_group)
>>> # only single gpu per process is currently supported
>>> ddp_sync_bn_network = torch.nn.parallel.DistributedDataParallel(
>>>                         sync_bn_network,
>>>                         device_ids=[args.local_rank],
>>>                         output_device=args.local_rank)

classmethod convert_sync_batchnorm(module, process_group=None)

Helper function to convert torch.nn.BatchNormND layers in the model to torch.nn.SyncBatchNorm layers.

Parameters

module (nn.Module) – the containing module
process_group (optional) – process group to scope synchronization; the default is the whole world

Returns

The original module with the converted torch.nn.SyncBatchNorm layers

Example:

>>> # Network with nn.BatchNorm layer
>>> module = torch.nn.Sequential(
>>>            torch.nn.Linear(20, 100),
>>>            torch.nn.BatchNorm1d(100)
>>>          ).cuda()
>>> # creating process group (optional)
>>> # process_ids is a list of int identifying rank ids.
>>> process_group = torch.distributed.new_group(process_ids)
>>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group)
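
The conversion step on its own can be tried without GPUs or a process group. This sketch assumes that only the layer replacement is of interest; actually running the converted layers in training mode still requires the distributed setup described above.

```python
import torch
import torch.nn as nn

# A small illustrative network containing one BatchNorm layer
model = nn.Sequential(
    nn.Linear(20, 100),
    nn.BatchNorm1d(100),
    nn.ReLU(),
)

# With process_group left at its default, synchronization spans the whole world
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

converted = [type(layer).__name__ for layer in sync_model]
print(converted)
```
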

InstanceNorm1d

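class torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size) if affine is True.

By default, this layer uses instance statistics computed from input data in both training and evaluation modes.

If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

Note

InstanceNorm1d and LayerNorm are very similar, but have some subtle differences. InstanceNorm1d is applied on each channel of channeled data like multidimensional time series, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm1d usually doesn't apply an affine transform.

Parameters

num_features – $C$ from an expected input of size $(N, C, L)$ or $L$ from an input of size $(N, L)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False

Shape:

Input: $(N, C, L)$
Output: $(N, C, L)$ (same shape as input)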

Examples:

>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm1d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm1d(100, affine=True)
>>> input = torch.randn(20, 100, 40)
>>> output = m(input)

InstanceNorm2d

class torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size) if affine is True.

By default, this layer uses instance statistics computed from input data in both training and evaluation modes.

If track_running_stats is set to True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.

Note

InstanceNorm2d and LayerNorm are very similar, but have some subtle differences. InstanceNorm2d is applied on each channel of channeled data like RGB images, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm2d usually doesn't apply an affine transform.

Parameters

num_features – $C$ from an expected input of size $(N, C, H, W)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False

Shape:

Input: $(N, C, H, W)$
Output: $(N, C, H, W)$ (same shape as input)

Examples:

>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm2d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm2d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)

InstanceNorm3d

class torch.nn.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. $\gamma$ and $\beta$ are learnable parameter vectors of size C (where C is the input size) if affine is True.

By default, this layer uses instance statistics computed from input data in both training and evaluation modes.

Note

InstanceNorm3d and LayerNorm are very similar, but have some subtle differences. InstanceNorm3d is applied on each channel of channeled data like 3D models with RGB color, but LayerNorm is usually applied on the entire sample and often in NLP tasks. Additionally, LayerNorm applies an elementwise affine transform, while InstanceNorm3d usually doesn't apply an affine transform.

Parameters

num_features – $C$ from an expected input of size $(N, C, D, H, W)$
eps – a value added to the denominator for numerical stability. Default: 1e-5
momentum – the value used for the running_mean and running_var computation. Default: 0.1
affine – a boolean value that when set to True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: False.
track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: False

Shape:

Input: $(N, C, D, H, W)$
Output: $(N, C, D, H, W)$ (same shape as input)

Examples:

>>> # Without Learnable Parameters
>>> m = nn.InstanceNorm3d(100)
>>> # With Learnable Parameters
>>> m = nn.InstanceNorm3d(100, affine=True)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)

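LayerNorm

class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)

Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization.

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta$$

The mean and standard-deviation are calculated separately over the last certain number of dimensions, which have to be of the shape specified by normalized_shape. $\gamma$ and $\beta$ are learnable affine transform parameters of normalized_shape if elementwise_affine is True.

Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale and bias for each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.

This layer uses statistics computed from input data in both training and evaluation modes.

Parameters

normalized_shape (int or list or torch.Size) –

input shape from an expected input of size

$$[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]]$$

If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension, which is expected to be of that specific size.

eps – a value added to the denominator for numerical stability. Default: 1e-5

elementwise_affine – a boolean value that when set to True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.

Shape:

Input: $(N, *)$
Output: $(N, *)$ (same shape as input)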

Examples:

>>> input = torch.randn(20, 5, 10, 10)
>>> # With Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:])
>>> # Without Learnable Parameters
>>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False)
>>> # Normalize over last two dimensions
>>> m = nn.LayerNorm([10, 10])
>>> # Normalize over last dimension of size 10
>>> m = nn.LayerNorm(10)
>>> # Activating the module
>>> output = m(input)
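
To illustrate which dimensions the statistics cover, the sketch below normalizes over the last two dimensions by hand and compares the result with nn.LayerNorm([10, 10]). It uses elementwise_affine=False so that only the normalization itself is compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(20, 5, 10, 10)
ln = nn.LayerNorm([10, 10], elementwise_affine=False)
y = ln(x)

# Collapse the two normalized dimensions into one axis of length 100
flat = x.reshape(20, 5, -1)
mean = flat.mean(dim=2, keepdim=True)
var = flat.var(dim=2, unbiased=False, keepdim=True)
y_ref = ((flat - mean) / torch.sqrt(var + ln.eps)).reshape(x.shape)

print(torch.allclose(y, y_ref, atol=1e-5))
```
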

LocalResponseNorm

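class torch.nn.LocalResponseNorm(size, alpha=0.0001, beta=0.75, k=1.0)

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels.

$$b_{c} = a_{c}\left(k + \frac{\alpha}{n} \sum_{c'=\max(0, c-n/2)}^{\min(N-1, c+n/2)} a_{c'}^2\right)^{-\beta}$$

Parameters

size – number of neighbouring channels used for normalization
alpha – multiplicative factor. Default: 0.0001
beta – exponent. Default: 0.75
k – additive factor. Default: 1

Shape:

Input: $(N, C, *)$
Output: $(N, C, *)$ (same shape as input)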

Examples:

>>> lrn = nn.LocalResponseNorm(2)
>>> signal_2d = torch.randn(32, 5, 24, 24)
>>> signal_4d = torch.randn(16, 5, 7, 7, 7, 7)
>>> output_2d = lrn(signal_2d)
>>> output_4d = lrn(signal_4d)

Recurrent layers

RNNBase

class torch.nn.RNNBase(mode, input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False)

flatten_parameters()

Resets parameter data pointers so that they can use faster code paths.

Right now, this works only if the module is on the GPU and cuDNN is enabled. Otherwise, it's a no-op.

RNN

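class torch.nn.RNN(*args, **kwargs)

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)

where h_t is the hidden state at time <cite>t</cite>, x_t is the input at time <cite>t</cite>, and h_(t-1) is the hidden state of the previous layer at time <cite>t-1</cite> or the initial hidden state at time <cite>0</cite>. If nonlinearity is 'relu', then <cite>ReLU</cite> is used instead of <cite>tanh</cite>.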

Parameters

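input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a <cite>stacked RNN</cite>, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True
batch_first – If True, then the input and output tensors are provided as <cite>(batch, seq, feature)</cite>. Default: False
dropout – If non-zero, introduces a <cite>Dropout</cite> layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional RNN. Default: False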

Inputs: input, h_0

input of shape <cite>(seq_len, batch, input_size)</cite>: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.
h_0 of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs: output, h_n

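output of shape <cite>(seq_len, batch, num_directions * hidden_size)</cite>: tensor containing the output features (<cite>h_t</cite>) from the last layer of the RNN, for each <cite>t</cite>. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.

For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction <cite>0</cite> and <cite>1</cite> respectively. Similarly, the directions can be separated in the packed case.

h_n of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the hidden state for <cite>t = seq_len</cite>.

Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).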

Shape:

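Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and <cite>L</cite> represents a sequence length.
Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = hidden_size and S = num_layers * num_directions. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Output1: (L, N, H_all), where H_all = num_directions * hidden_size
Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch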

Variables

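RNN.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape <cite>(hidden_size, input_size)</cite> for <cite>k = 0</cite>. Otherwise, the shape is <cite>(hidden_size, num_directions * hidden_size)</cite>
RNN.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape <cite>(hidden_size, hidden_size)</cite>
RNN.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape <cite>(hidden_size)</cite>
RNN.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape <cite>(hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype torch.float16, 4) V100 GPU is used, 5) input data is not in PackedSequence format, then the persistent algorithm can be selected to improve performance.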

Examples:

>>> rnn = nn.RNN(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)
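
The per-step recurrence can be reproduced by hand for a single-layer, unidirectional RNN. This is a minimal sketch, assuming the default tanh non-linearity; it reads the layer-0 parameters listed under Variables and is meant as an illustration, not as the library's internal code path.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1)
x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)
h0 = torch.randn(1, 3, 20)  # (num_layers * num_directions, batch, hidden_size)

with torch.no_grad():
    output, hn = rnn(x, h0)

    # Manual recurrence: h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)
    h = h0[0]
    steps = []
    for t in range(x.size(0)):
        h = torch.tanh(x[t] @ rnn.weight_ih_l0.t() + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.t() + rnn.bias_hh_l0)
        steps.append(h)
    manual = torch.stack(steps)

max_diff = (output - manual).abs().max().item()
```

The last manual step also corresponds to hn[0], the hidden state at t = seq_len.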

LSTM

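class torch.nn.LSTM(*args, **kwargs)

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

i_t = σ(W_ii x_t + b_ii + W_hi h_(t-1) + b_hi)
f_t = σ(W_if x_t + b_if + W_hf h_(t-1) + b_hf)
g_t = tanh(W_ig x_t + b_ig + W_hg h_(t-1) + b_hg)
o_t = σ(W_io x_t + b_io + W_ho h_(t-1) + b_ho)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)

where h_t is the hidden state at time <cite>t</cite>, c_t is the cell state at time <cite>t</cite>, x_t is the input at time <cite>t</cite>, h_(t-1) is the hidden state of the layer at time <cite>t-1</cite> or the initial hidden state at time <cite>0</cite>, and i_t, f_t, g_t, o_t are the input, forget, cell, and output gates, respectively. σ is the sigmoid function, and ⊙ is the Hadamard product.

In a multilayer LSTM, the input of the l-th layer (l >= 2) is the hidden state of the previous layer multiplied by a dropout mask, where each mask element is a Bernoulli random variable which is 0 with probability dropout.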

Parameters

input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
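num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a <cite>stacked LSTM</cite>, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
dropout – If non-zero, introduces a <cite>Dropout</cite> layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional LSTM. Default: False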

Inputs: input, (h_0, c_0)

input of shape <cite>(seq_len, batch, input_size)</cite>: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

h_0 of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1.

c_0 of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the initial cell state for each element in the batch.

If <cite>(h_0, c_0)</cite> is not provided, both h_0 and c_0 default to zero.

Outputs: output, (h_n, c_n)

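output of shape <cite>(seq_len, batch, num_directions * hidden_size)</cite>: tensor containing the output features (<cite>h_t</cite>) from the last layer of the LSTM, for each <cite>t</cite>. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.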

For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction <cite>0</cite> and <cite>1</cite> respectively. Similarly, the directions can be separated in the packed case.

h_n of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the hidden state for <cite>t = seq_len</cite>.

Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size), and similarly for c_n.

c_n of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the cell state for <cite>t = seq_len</cite>.

Variables

LSTM.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer <cite>(W_ii | W_if | W_ig | W_io)</cite>, of shape <cite>(4 * hidden_size, input_size)</cite> for <cite>k = 0</cite>. Otherwise, the shape is <cite>(4 * hidden_size, num_directions * hidden_size)</cite>
LSTM.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer <cite>(W_hi | W_hf | W_hg | W_ho)</cite>, of shape <cite>(4 * hidden_size, hidden_size)</cite>
LSTM.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer <cite>(b_ii | b_if | b_ig | b_io)</cite>, of shape <cite>(4 * hidden_size)</cite>
LSTM.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer <cite>(b_hi | b_hf | b_hg | b_ho)</cite>, of shape <cite>(4 * hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU 3) input data has dtype torch.float16 4) V100 GPU is used, 5) input data is not in PackedSequence format persistent algorithm can be selected to improve performance.

Examples:

>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))
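
The view-based separation of directions and layers described above can be illustrated on a bidirectional LSTM. This is a minimal sketch under assumed sizes; it relies on the forward direction finishing at the last time step and the backward direction finishing at the first one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)

with torch.no_grad():
    output, (h_n, c_n) = lstm(x)

# Split the feature dimension into (direction, hidden_size).
out_dirs = output.view(seq_len, batch, 2, hidden_size)
# Split the first dimension of h_n into (layer, direction).
h_layers = h_n.view(num_layers, 2, batch, hidden_size)

# Forward direction: final state is the output at the last time step.
fwd_match = torch.allclose(out_dirs[-1, :, 0], h_layers[-1, 0], atol=1e-6)
# Backward direction: final state is the output at the first time step.
bwd_match = torch.allclose(out_dirs[0, :, 1], h_layers[-1, 1], atol=1e-6)
```

Both comparisons use the last layer only, because output contains features from the last layer.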

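GRU

class torch.nn.GRU(*args, **kwargs)

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

r_t = σ(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
z_t = σ(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_(t-1) + b_hn))
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_(t-1)

where h_t is the hidden state at time <cite>t</cite>, x_t is the input at time <cite>t</cite>, h_(t-1) is the hidden state of the layer at time <cite>t-1</cite> or the initial hidden state at time <cite>0</cite>, and r_t, z_t, n_t are the reset, update, and new gates, respectively. σ is the sigmoid function, and ⊙ is the Hadamard product.

In a multilayer GRU, the input of the l-th layer (l >= 2) is the hidden state of the previous layer multiplied by a dropout mask, where each mask element is a Bernoulli random variable which is 0 with probability dropout.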

Parameters

input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two GRUs together to form a <cite>stacked GRU</cite>, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True
batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False
dropout – If non-zero, introduces a <cite>Dropout</cite> layer on the outputs of each GRU layer except the last layer, with dropout probability equal to dropout. Default: 0
bidirectional – If True, becomes a bidirectional GRU. Default: False

Inputs: input, h_0

input of shape <cite>(seq_len, batch, input_size)</cite>: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() for details.
h_0 of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

Outputs: output, h_n

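output of shape <cite>(seq_len, batch, num_directions * hidden_size)</cite>: tensor containing the output features h_t from the last layer of the GRU, for each <cite>t</cite>. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction <cite>0</cite> and <cite>1</cite> respectively.

Similarly, the directions can be separated in the packed case.

h_n of shape <cite>(num_layers * num_directions, batch, hidden_size)</cite>: tensor containing the hidden state for <cite>t = seq_len</cite>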

Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size).

Shape:

Input1: (L, N, H_in) tensor containing input features, where H_in = input_size and <cite>L</cite> represents a sequence length.
Input2: (S, N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = hidden_size and S = num_layers * num_directions. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
Output1: (L, N, H_all), where H_all = num_directions * hidden_size
Output2: (S, N, H_out) tensor containing the next hidden state for each element in the batch

Variables

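GRU.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ir | W_iz | W_in), of shape <cite>(3 * hidden_size, input_size)</cite> for <cite>k = 0</cite>. Otherwise, the shape is <cite>(3 * hidden_size, num_directions * hidden_size)</cite>
GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hr | W_hz | W_hn), of shape <cite>(3 * hidden_size, hidden_size)</cite>
GRU.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ir | b_iz | b_in), of shape <cite>(3 * hidden_size)</cite>
GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hr | b_hz | b_hn), of shape <cite>(3 * hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype torch.float16, 4) V100 GPU is used, 5) input data is not in PackedSequence format, then the persistent algorithm can be selected to improve performance.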

Examples:

>>> rnn = nn.GRU(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> output, hn = rnn(input, h0)

RNNCell

class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')

An Elman RNN cell with tanh or ReLU non-linearity.

h' = tanh(W_ih x + b_ih + W_hh h + b_hh)

If nonlinearity is <cite>'relu'</cite>, then ReLU is used in place of tanh.

Parameters

input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True
nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

Inputs: input, hidden

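input of shape <cite>(batch, input_size)</cite>: tensor containing input features
hidden of shape <cite>(batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.

Outputs: h'

h' of shape <cite>(batch, hidden_size)</cite>: tensor containing the next hidden state for each element in the batch

Shape:

Input1: (N, H_in) tensor containing input features, where H_in = <cite>input_size</cite>
Input2: (N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = <cite>hidden_size</cite>. Defaults to zero if not provided.
Output: (N, H_out) tensor containing the next hidden state for each element in the batch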

Variables

RNNCell.weight_ih – the learnable input-hidden weights, of shape <cite>(hidden_size, input_size)</cite>
RNNCell.weight_hh – the learnable hidden-hidden weights, of shape <cite>(hidden_size, hidden_size)</cite>
RNNCell.bias_ih – the learnable input-hidden bias, of shape <cite>(hidden_size)</cite>
RNNCell.bias_hh – the learnable hidden-hidden bias, of shape <cite>(hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Examples:

>>> rnn = nn.RNNCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)

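LSTMCell

class torch.nn.LSTMCell(input_size, hidden_size, bias=True)

A long short-term memory (LSTM) cell.

i = σ(W_ii x + b_ii + W_hi h + b_hi)
f = σ(W_if x + b_if + W_hf h + b_hf)
g = tanh(W_ig x + b_ig + W_hg h + b_hg)
o = σ(W_io x + b_io + W_ho h + b_ho)
c' = f ⊙ c + i ⊙ g
h' = o ⊙ tanh(c')

where σ is the sigmoid function, and ⊙ is the Hadamard product.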

Parameters

input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True

Inputs: input, (h_0, c_0)

input of shape <cite>(batch, input_size)</cite>: tensor containing input features

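h_0 of shape <cite>(batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch.

c_0 of shape <cite>(batch, hidden_size)</cite>: tensor containing the initial cell state for each element in the batch.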

If <cite>(h_0, c_0)</cite> is not provided, both h_0 and c_0 default to zero.

Outputs: (h_1, c_1)

h_1 of shape <cite>(batch, hidden_size)</cite>: tensor containing the next hidden state for each element in the batch
c_1 of shape <cite>(batch, hidden_size)</cite>: tensor containing the next cell state for each element in the batch

Variables

LSTMCell.weight_ih – the learnable input-hidden weights, of shape <cite>(4 * hidden_size, input_size)</cite>
LSTMCell.weight_hh – the learnable hidden-hidden weights, of shape <cite>(4 * hidden_size, hidden_size)</cite>
LSTMCell.bias_ih – the learnable input-hidden bias, of shape <cite>(4 * hidden_size)</cite>
LSTMCell.bias_hh – the learnable hidden-hidden bias, of shape <cite>(4 * hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Examples:

>>> rnn = nn.LSTMCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)
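
A single LSTMCell step can be reproduced from its stacked weights. This is a minimal sketch, assuming the gates are stored in the order input, forget, cell, output, as the LSTM weight layout (W_ii | W_if | W_ig | W_io) suggests; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.LSTMCell(10, 20)
x = torch.randn(3, 10)
h = torch.randn(3, 20)
c = torch.randn(3, 20)

with torch.no_grad():
    h1, c1 = cell(x, (h, c))

    # All four gate pre-activations in one matrix product, then split.
    gates = (x @ cell.weight_ih.t() + cell.bias_ih
             + h @ cell.weight_hh.t() + cell.bias_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)

    c_manual = f * c + i * g                # next cell state
    h_manual = o * torch.tanh(c_manual)     # next hidden state

h_diff = (h1 - h_manual).abs().max().item()
c_diff = (c1 - c_manual).abs().max().item()
```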

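GRUCell

class torch.nn.GRUCell(input_size, hidden_size, bias=True)

A gated recurrent unit (GRU) cell

r = σ(W_ir x + b_ir + W_hr h + b_hr)
z = σ(W_iz x + b_iz + W_hz h + b_hz)
n = tanh(W_in x + b_in + r ⊙ (W_hn h + b_hn))
h' = (1 - z) ⊙ n + z ⊙ h

where σ is the sigmoid function, and ⊙ is the Hadamard product.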

Parameters

input_size – The number of expected features in the input <cite>x</cite>
hidden_size – The number of features in the hidden state <cite>h</cite>
bias – If False, then the layer does not use bias weights <cite>b_ih</cite> and <cite>b_hh</cite>. Default: True

Inputs: input, hidden

input of shape <cite>(batch, input_size)</cite>: tensor containing input features
hidden of shape <cite>(batch, hidden_size)</cite>: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided.

Outputs: h'

h' of shape <cite>(batch, hidden_size)</cite>: tensor containing the next hidden state for each element in the batch

Shape:

Input1: (N, H_in) tensor containing input features, where H_in = <cite>input_size</cite>
Input2: (N, H_out) tensor containing the initial hidden state for each element in the batch, where H_out = <cite>hidden_size</cite>. Defaults to zero if not provided.
Output: (N, H_out) tensor containing the next hidden state for each element in the batch

Variables

GRUCell.weight_ih – the learnable input-hidden weights, of shape <cite>(3 * hidden_size, input_size)</cite>
GRUCell.weight_hh – the learnable hidden-hidden weights, of shape <cite>(3 * hidden_size, hidden_size)</cite>
GRUCell.bias_ih – the learnable input-hidden bias, of shape <cite>(3 * hidden_size)</cite>
GRUCell.bias_hh – the learnable hidden-hidden bias, of shape <cite>(3 * hidden_size)</cite>

Note

All the weights and biases are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / hidden_size

Examples:

>>> rnn = nn.GRUCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)
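
A single GRUCell step can likewise be written out by hand. This is a minimal sketch, assuming the stacked weights are ordered reset, update, new, as in the GRU layout (W_ir | W_iz | W_in); note that the reset gate multiplies the hidden contribution of the new gate, including its bias.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.GRUCell(10, 20)
x = torch.randn(3, 10)
h = torch.randn(3, 20)

with torch.no_grad():
    h_next = cell(x, h)

    gi = x @ cell.weight_ih.t() + cell.bias_ih   # input contributions
    gh = h @ cell.weight_hh.t() + cell.bias_hh   # hidden contributions
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)

    r = torch.sigmoid(i_r + h_r)                 # reset gate
    z = torch.sigmoid(i_z + h_z)                 # update gate
    n = torch.tanh(i_n + r * h_n)                # new gate
    h_manual = (1 - z) * n + z * h

max_diff = (h_next - h_manual).abs().max().item()
```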

Transformer layers

Transformer

class torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='relu', custom_encoder=None, custom_decoder=None)

A transformer model. Users can modify the attributes as needed. The architecture is based on the paper "Attention Is All You Need". Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users can build the BERT (https://arxiv.org/abs/1810.04805) model with corresponding parameters.

Parameters

d_model – the number of expected features in the encoder/decoder inputs (default=512).
nhead – the number of heads in the multiheadattention models (default=8).
num_encoder_layers – the number of sub-encoder-layers in the encoder (default=6).
num_decoder_layers – the number of sub-decoder-layers in the decoder (default=6).
dim_feedforward – the dimension of the feedforward network model (default=2048).
dropout – the dropout value (default=0.1).
activation – the activation function of the encoder/decoder intermediate layer, relu or gelu (default=relu).
custom_encoder – custom encoder (default=None).
custom_decoder – custom decoder (default=None).

Examples:

>>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
>>> src = torch.rand((10, 32, 512))
>>> tgt = torch.rand((20, 32, 512))
>>> out = transformer_model(src, tgt)

Note: A full example that applies the nn.Transformer module to a word language model is available at https://github.com/pytorch/examples/tree/master/word_language_model

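forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)

Take in and process masked source/target sequences.

Parameters

src – the sequence to the encoder (required).
tgt – the sequence to the decoder (required).
src_mask – the additive mask for the src sequence (optional).
tgt_mask – the additive mask for the tgt sequence (optional).
memory_mask – the additive mask for the encoder output (optional).
src_key_padding_mask – the ByteTensor mask for src keys per batch (optional).
tgt_key_padding_mask – the ByteTensor mask for tgt keys per batch (optional).
memory_key_padding_mask – the ByteTensor mask for memory keys per batch (optional).

Shape:

src: (S, N, E).
tgt: (T, N, E).
src_mask: (S, S).
tgt_mask: (T, T).
memory_mask: (T, S).
src_key_padding_mask: (N, S).
tgt_key_padding_mask: (N, T).
memory_key_padding_mask: (N, S).

Note: [src/tgt/memory]_mask should be filled with float('-inf') for the masked positions and float(0.0) elsewhere. These masks ensure that predictions for position i depend only on the unmasked positions j and are applied identically for each sequence in a batch. [src/tgt/memory]_key_padding_mask should be a ByteTensor where True values are positions that should be masked with float('-inf') and False values will be unchanged. This mask ensures that no information will be taken from position i if it is masked, and has a separate mask for each sequence in a batch.

output: (T, N, E).

Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder.

where S is the source sequence length, T is the target sequence length, N is the batch size, and E is the feature number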

Examples

>>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask)

generate_square_subsequent_mask(sz)

Generate a square mask for the sequence. The masked positions are filled with float('-inf'). Unmasked positions are filled with float(0.0).
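
The mask layout described here can be built directly. This is a minimal sketch; square_subsequent_mask is a hypothetical stand-alone helper written for illustration, not the library method itself, and the uniform scores are an assumed toy input.

```python
import torch

def square_subsequent_mask(sz):
    """Hypothetical helper: 0.0 on and below the diagonal, -inf above it."""
    return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

mask = square_subsequent_mask(4)

# Adding the mask to attention scores before softmax removes future positions:
# row i can only put weight on positions j <= i.
uniform_scores = torch.zeros(4, 4)
weights = torch.softmax(uniform_scores + mask, dim=-1)
```

With uniform scores, row i of weights spreads its mass evenly over positions 0 to i and assigns zero weight to later positions.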

TransformerEncoder

class torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)

TransformerEncoder is a stack of N encoder layers

Parameters

encoder_layer – an instance of the TransformerEncoderLayer() class (required).
num_layers – the number of sub-encoder-layers in the encoder (required).
norm – the layer normalization component (optional).

Examples:

>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
>>> src = torch.rand(10, 32, 512)
>>> out = transformer_encoder(src)

forward(src, mask=None, src_key_padding_mask=None)

Pass the input through the encoder layers in turn.

Parameters

src – the sequence to the encoder (required).
mask – the mask for the src sequence (optional).
src_key_padding_mask – the mask for the src keys per batch (optional).

Shape:

see the docs in Transformer class.

TransformerDecoder

class torch.nn.TransformerDecoder(decoder_layer, num_layers, norm=None)

TransformerDecoder is a stack of N decoder layers

Parameters

decoder_layer – an instance of the TransformerDecoderLayer() class (required).
num_layers – the number of sub-decoder-layers in the decoder (required).
norm – the layer normalization component (optional).

Examples:

>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
>>> transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
>>> memory = torch.rand(10, 32, 512)
>>> tgt = torch.rand(20, 32, 512)
>>> out = transformer_decoder(tgt, memory)

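forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)

Pass the inputs (and masks) through the decoder layers in turn.

Parameters

tgt – the sequence to the decoder (required).
memory – the sequence from the last layer of the encoder (required).
tgt_mask – the mask for the tgt sequence (optional).
memory_mask – the mask for the memory sequence (optional).
tgt_key_padding_mask – the mask for the tgt keys per batch (optional).
memory_key_padding_mask – the mask for the memory keys per batch (optional).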

Shape:

see the docs in Transformer class.

TransformerEncoderLayer

class torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')

TransformerEncoderLayer is made up of self-attention and a feedforward network. This standard encoder layer is based on the paper "Attention Is All You Need". Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify it or implement it in a different way during application.

Parameters

d_model – the number of expected features in the input (required).
nhead – the number of heads in the multiheadattention models (required).
dim_feedforward – the dimension of the feedforward network model (default=2048).
dropout – the dropout value (default=0.1).
activation – the activation function of the intermediate layer, relu or gelu (default=relu).

Examples:

>>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
>>> src = torch.rand(10, 32, 512)
>>> out = encoder_layer(src)

forward(src, src_mask=None, src_key_padding_mask=None)

Pass the input through the encoder layer.

Parameters

src – the sequence to the encoder layer (required).
src_mask – the mask for the src sequence (optional).
src_key_padding_mask – the mask for the src keys per batch (optional).

Shape:

see the docs in Transformer class.

TransformerDecoderLayer

class torch.nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')

TransformerDecoderLayer is made up of self-attention, multi-head attention, and a feedforward network. This standard decoder layer is based on the paper "Attention Is All You Need". Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify it or implement it in a different way during application.

Parameters

d_model – the number of expected features in the input (required).
nhead – the number of heads in the multiheadattention models (required).
dim_feedforward – the dimension of the feedforward network model (default=2048).
dropout – the dropout value (default=0.1).
activation – the activation function of intermediate layer, relu or gelu (default=relu).

Examples:

>>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)
>>> memory = torch.rand(10, 32, 512)
>>> tgt = torch.rand(20, 32, 512)
>>> out = decoder_layer(tgt, memory)

forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)

Pass the inputs (and masks) through the decoder layer.

Parameters

tgt – the sequence to the decoder layer (required).
memory – the sequence from the last layer of the encoder (required).
tgt_mask – the mask for the tgt sequence (optional).
memory_mask – the mask for the memory sequence (optional).
tgt_key_padding_mask – the mask for the tgt keys per batch (optional).
memory_key_padding_mask – the mask for the memory keys per batch (optional).

Shape:

see the docs in Transformer class.

Linear layers

Identity

class torch.nn.Identity(*args, **kwargs)

A placeholder identity operator that is argument-insensitive.

Parameters

args – any argument (unused)
kwargs – any keyword argument (unused)

Examples:

>>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 20])

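Linear

class torch.nn.Linear(in_features, out_features, bias=True)

Applies a linear transformation to the incoming data: y = x A^T + b

Parameters

in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True

Shape:

Input: (N, *, H_in), where * means any number of additional dimensions and H_in = in_features
Output: (N, *, H_out), where all but the last dimension are the same shape as the input and H_out = out_features

Variables

Linear.weight – the learnable weights of the module, of shape (out_features, in_features). The values are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / in_features
Linear.bias – the learnable bias of the module, of shape (out_features). If bias is True, the values are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / in_features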

Examples:

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])
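
The transformation can be checked against the stored parameters. This is a minimal sketch, assuming the weight has shape (out_features, in_features) as listed under Variables, so the input is multiplied by the transposed weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Linear(20, 30)
x = torch.randn(128, 20)

with torch.no_grad():
    out = m(x)
    # y = x A^T + b, with A = m.weight and b = m.bias
    manual = x @ m.weight.t() + m.bias

max_diff = (out - manual).abs().max().item()
```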

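Bilinear

class torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True)

Applies a bilinear transformation to the incoming data: y = x1^T A x2 + b

Parameters

in1_features – size of each first input sample
in2_features – size of each second input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True

Shape:

Input1: (N, *, H_in1), where H_in1 = in1_features and * means any number of additional dimensions. All but the last dimension of the inputs should be the same.
Input2: (N, *, H_in2), where H_in2 = in2_features.
Output: (N, *, H_out), where H_out = out_features and all but the last dimension are the same shape as the input.

Variables

Bilinear.weight – the learnable weights of the module, of shape (out_features, in1_features, in2_features). The values are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / in1_features
Bilinear.bias – the learnable bias of the module, of shape (out_features). If bias is True, the values are initialized from U(-sqrt(k), sqrt(k)), where k = 1 / in1_features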

Examples:

>>> m = nn.Bilinear(20, 30, 40)
>>> input1 = torch.randn(128, 20)
>>> input2 = torch.randn(128, 30)
>>> output = m(input1, input2)
>>> print(output.size())
torch.Size([128, 40])
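
The bilinear form can be spelled out with an explicit contraction. This is a minimal sketch, assuming the weight has shape (out_features, in1_features, in2_features); the einsum string is an illustrative way to write x1^T A x2 for every output feature and batch element.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Bilinear(20, 30, 40)
x1 = torch.randn(128, 20)
x2 = torch.randn(128, 30)

with torch.no_grad():
    out = m(x1, x2)
    # For each sample b and output o: sum_ij x1[b, i] * A[o, i, j] * x2[b, j] + bias[o]
    manual = torch.einsum('bi,oij,bj->bo', x1, m.weight, x2) + m.bias

max_diff = (out - manual).abs().max().item()
```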

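Dropout layers

Dropout

class torch.nn.Dropout(p=0.5, inplace=False)

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.

This has proven to be an effective technique for regularization and for preventing the co-adaptation of neurons, as described in the paper Improving neural networks by preventing co-adaptation of feature detectors.

Furthermore, the outputs are scaled by a factor of 1 / (1 - p) during training. This means that during evaluation the module simply computes an identity function.

Parameters

p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False

Shape:

Input: (*). Input can be of any shape
Output: (*). Output is of the same shape as input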

Examples:

>>> m = nn.Dropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)
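
The training-time scaling and the evaluation-time identity can be observed directly. This is a minimal sketch using an all-ones input, an assumption made so that every surviving element equals the scale factor 1 / (1 - p).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.2
m = nn.Dropout(p=p)
x = torch.ones(1000, 16)

m.train()
y = m(x)
kept = y != 0
# Surviving elements are scaled by 1 / (1 - p); the rest are exactly zero.
all_scaled = torch.allclose(y[kept], torch.full_like(y[kept], 1 / (1 - p)))
zero_fraction = (~kept).float().mean().item()

m.eval()
identity_in_eval = torch.equal(m(x), x)
```

zero_fraction should land near p, since each element is dropped independently.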

Dropout2d

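class torch.nn.Dropout2d(p=0.5, inplace=False)

Randomly zero out entire channels (a channel is a 2D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 2D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv2d modules.

As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.

In this case, nn.Dropout2d() will help promote independence between feature maps and should be used instead.

Parameters

p (float, optional) – probability of an element to be zeroed.
inplace (bool, optional) – If set to True, will do this operation in-place

Shape:

Input: (N, C, H, W)
Output: (N, C, H, W) (same shape as input)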

Examples:

>>> m = nn.Dropout2d(p=0.2)
>>> input = torch.randn(20, 16, 32, 32)
>>> output = m(input)
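
The channel-wise behaviour can be verified by checking that every feature map is either entirely zero or entirely kept. This is a minimal sketch with an assumed all-ones input, so a kept channel is constant at 1 / (1 - p).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Dropout2d(p=0.5)        # modules start in training mode
x = torch.ones(20, 16, 8, 8)
y = m(x)

# Flatten each 2D feature map and compare its smallest and largest value.
per_channel = y.view(20, 16, -1)
channel_min = per_channel.min(dim=2)[0]
channel_max = per_channel.max(dim=2)[0]

# Equal min and max means the whole channel shares one value (0.0 or 2.0).
whole_channels = torch.equal(channel_min, channel_max)
dropped = int((channel_max == 0).sum())
kept = int((channel_max == 2.0).sum())
```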

Dropout3d

class torch.nn.Dropout3d(p=0.5, inplace=False)

Randomly zero out entire channels (a channel is a 3D feature map, e.g., the j-th channel of the i-th sample in the batched input is a 3D tensor input[i, j]). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv3d modules.

As described in the paper Efficient Object Localization Using Convolutional Networks, if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease.

In this case, nn.Dropout3d() will help promote independence between feature maps and should be used instead.

Parameters

p (float, optional) – probability of an element to be zeroed.
inplace (bool, optional) – If set to True, will do this operation in-place

Shape:

Input: (N, C, D, H, W)
Output: (N, C, D, H, W) (same shape as input)

Examples:

>>> m = nn.Dropout3d(p=0.2)
>>> input = torch.randn(20, 16, 4, 32, 32)
>>> output = m(input)

AlphaDropout

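class torch.nn.AlphaDropout(p=0.5, inplace=False)

Applies Alpha Dropout over the input.

Alpha Dropout is a type of Dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout maintains the original mean and standard deviation of the input. Alpha Dropout goes hand-in-hand with the SELU activation function, which ensures that the outputs have zero mean and unit standard deviation.

During training, it randomly masks some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit standard deviation.

During evaluation the module simply computes an identity function.

More details can be found in the paper Self-Normalizing Neural Networks.

Parameters

p (float) – probability of an element to be dropped. Default: 0.5
inplace (bool, optional) – If set to True, will do this operation in-place

Shape:

Input: (*). Input can be of any shape
Output: (*). Output is of the same shape as input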

Examples:

>>> m = nn.AlphaDropout(p=0.2)
>>> input = torch.randn(20, 16)
>>> output = m(input)

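Sparse layers

Embedding

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Parameters

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If given, pads the output with the embedding vector at padding_idx (initialized to zeros) whenever it encounters the index.
max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of the frequency of the words in the mini-batch. Default False.
sparse (bool, optional) – If True, the gradient w.r.t. the weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.

Variables

Embedding.weight (Tensor) – the learnable weights of the module, of shape (num_embeddings, embedding_dim), initialized from N(0, 1)

Shape:

Input: (*), LongTensor of arbitrary shape containing the indices to extract
Output: (*, H), where <cite>*</cite> is the input shape and H = embedding_dim

Note

Keep in mind that only a limited number of optimizers support sparse gradients: currently it's optim.SGD (<cite>CUDA</cite> and <cite>CPU</cite>), optim.SparseAdam (<cite>CUDA</cite> and <cite>CPU</cite>) and optim.Adagrad (<cite>CPU</cite>)

Note

With padding_idx set, the embedding vector at padding_idx is initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, thus changing the vector used to pad the output. The gradient for this vector from Embedding is always zero.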

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],
        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = torch.LongTensor([[0,2,0,5]])
>>> embedding(input)
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.1535, -2.0309,  0.9315],
         [ 0.0000,  0.0000,  0.0000],
         [-0.1655,  0.9897,  0.0635]]])
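
The zero-gradient behaviour of the padding row can be observed with a small backward pass. This is a minimal sketch, assuming a dense (sparse=False) embedding and a loss that is simply the sum of all looked-up values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding = nn.Embedding(10, 3, padding_idx=0)
indices = torch.LongTensor([[0, 2, 0, 5]])

# Sum of all looked-up vectors; each lookup contributes a gradient of 1 per entry.
embedding(indices).sum().backward()
grad = embedding.weight.grad

padding_grad_is_zero = bool((grad[0] == 0).all())   # row at padding_idx
used_rows_have_grad = bool((grad[2] == 1).all() and (grad[5] == 1).all())
unused_rows_are_zero = bool((grad[1] == 0).all())   # index 1 was never looked up
```

Index 0 appears twice in the input, yet its row still receives no gradient.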

classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)

Creates an Embedding instance from a given 2-dimensional FloatTensor.

Parameters

embeddings (Tensor) – FloatTensor containing weights for the Embedding. The first dimension is passed to Embedding as num_embeddings, the second as embedding_dim.
freeze (boolean, optional) – If True, the tensor does not get updated in the learning process. Equivalent to embedding.weight.requires_grad = False. Default: True
padding_idx (int, optional) – See module initialization documentation.
max_norm (float, optional) – See module initialization documentation.
norm_type (float, optional) – See module initialization documentation. Default 2.
scale_grad_by_freq (boolean, optional) – See module initialization documentation. Default False.
sparse (bool, optional) – See module initialization documentation.

Examples:

>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embedding = nn.Embedding.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([1])
>>> embedding(input)
tensor([[ 4.0000,  5.1000,  6.3000]])

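EmbeddingBag

class torch.nn.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, _weight=None)

Computes sums or means of "bags" of embeddings, without instantiating the intermediate embeddings.

For bags of constant length and no per_sample_weights, this class

- with mode="sum" is equivalent to Embedding followed by torch.sum(dim=1),

- with mode="mean" is equivalent to Embedding followed by torch.mean(dim=1),

- with mode="max" is equivalent to Embedding followed by torch.max(dim=1).

However, EmbeddingBag is much more time and memory efficient than using a chain of these operations.

EmbeddingBag also supports per-sample weights as an argument to the forward pass. This scales the output of the Embedding before performing a weighted reduction as specified by mode. If per_sample_weights is passed, the only supported mode is "sum", which computes a weighted sum according to per_sample_weights.

Parameters

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
max_norm (float, optional) – If given, each embedding vector with norm larger than max_norm is renormalized to have norm max_norm.
norm_type (float, optional) – The p of the p-norm to compute for the max_norm option. Default 2.
scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of the frequency of the words in the mini-batch. Default False. Note: this option is not supported when mode="max".
mode (string, optional) – "sum", "mean" or "max". Specifies the way to reduce the bag. "sum" computes the weighted sum, taking per_sample_weights into consideration. "mean" computes the average of the values in the bag, "max" computes the max value over each bag. Default: "mean"
sparse (bool, optional) – If True, the gradient w.r.t. the weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when mode="max".

Variables

EmbeddingBag.weight (Tensor) – the learnable weights of the module, of shape <cite>(num_embeddings, embedding_dim)</cite>, initialized from N(0, 1).

Inputs: input (LongTensor), offsets (LongTensor, optional), and per_sample_weights (Tensor, optional)

If input is 2D of shape <cite>(B, N)</cite>,

it will be treated as B bags (sequences), each of fixed length N, and this will return B values aggregated in a way depending on the mode. offsets is ignored and required to be None in this case.

If input is 1D of shape <cite>(N)</cite>,

it will be treated as a concatenation of multiple bags (sequences). offsets is required to be a 1D tensor containing the starting index positions of each bag in input. Therefore, for offsets of shape <cite>(B)</cite>, input will be viewed as having B bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.

per_sample_weights (Tensor, optional): a tensor of float / double weights, or None to indicate that all weights should be taken to be 1. If specified, per_sample_weights must have exactly the same shape as input and is treated as having the same offsets, if those are not None. Only supported for mode='sum'.

Output shape: <cite>(B, embedding_dim)</cite>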

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum')
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([1,2,4,5,4,3,2,9])
>>> offsets = torch.LongTensor([0,4])
>>> embedding_sum(input, offsets)
tensor([[-0.8861, -5.4350, -0.0523],
        [ 1.1306, -2.5798, -1.0044]])
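
The equivalence with a chained lookup and reduction can be checked for bags of constant length. This is a minimal sketch, assuming shared weights loaded through from_pretrained and a reduction over the bag dimension (dim=1 of the (B, N, embedding_dim) lookup result).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
weight = torch.randn(10, 3)
bag = nn.EmbeddingBag.from_pretrained(weight, mode='sum')
emb = nn.Embedding.from_pretrained(weight)

# 1D input with offsets: two bags of four indices each.
flat = torch.LongTensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.LongTensor([0, 4])
bag_out = bag(flat, offsets)

# The same bags as a 2D input of shape (B, N) = (2, 4); offsets is omitted.
two_d = flat.view(2, 4)
bag_out_2d = bag(two_d)

# Chained version: materialize (B, N, embedding_dim), then sum over each bag.
chained = emb(two_d).sum(dim=1)
```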

classmethod from_pretrained(embeddings, freeze=True, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False)

Creates an EmbeddingBag instance from a given 2-dimensional FloatTensor.

Parameters

embeddings (Tensor) – FloatTensor containing weights for the EmbeddingBag. The first dimension is passed to EmbeddingBag as 'num_embeddings', the second as 'embedding_dim'.
freeze (boolean, optional) – If True, the tensor does not get updated in the learning process. Equivalent to embeddingbag.weight.requires_grad = False. Default: True
max_norm (float, optional) – See module initialization documentation. Default: None
norm_type (float, optional) – See module initialization documentation. Default 2.
scale_grad_by_freq (boolean, optional) – See module initialization documentation. Default False.
mode (string, optional) – See module initialization documentation. Default: "mean"
sparse (bool, optional) – See module initialization documentation. Default: False.

Examples:

>>> # FloatTensor containing pretrained weights
>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([[1, 0]])
>>> embeddingbag(input)
tensor([[ 2.5000,  3.7000,  4.6500]])

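Distance functions

CosineSimilarity

class torch.nn.CosineSimilarity(dim=1, eps=1e-08)

Returns the cosine similarity between $x_1$ and $x_2$, computed along dim.

$$\text{similarity} = \frac{x_1 \cdot x_2}{\max(\lVert x_1 \rVert_2 \cdot \lVert x_2 \rVert_2, \epsilon)}$$

Parameters

dim (int, optional) – Dimension along which cosine similarity is computed. Default: 1
eps (float, optional) – Small value to avoid division by zero. Default: 1e-8

Shape:

Input1: $(\ast_1, D, \ast_2)$ where D is at position dim
Input2: $(\ast_1, D, \ast_2)$, same shape as Input1
Output: $(\ast_1, \ast_2)$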

Examples::
>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> cos = nn.CosineSimilarity(dim=1, eps=1e-6)
>>> output = cos(input1, input2)
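
As a rough check of the definition, the sketch below evaluates the dot product divided by the product of L2 norms by hand and compares it with the module. The small shapes and the names `dot`, `norms` and `manual` are illustrative assumptions, not part of the API.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x1 = torch.randn(4, 8)
x2 = torch.randn(4, 8)
eps = 1e-8

cos = nn.CosineSimilarity(dim=1, eps=eps)
out = cos(x1, x2)  # shape (4,): the dim=1 axis is reduced away

# Manual evaluation along dim=1: dot product over the product of norms.
dot = (x1 * x2).sum(dim=1)
norms = x1.norm(dim=1) * x2.norm(dim=1)
manual = dot / norms.clamp(min=eps)
```

For inputs whose norms are far from zero, as here, eps has no visible effect and the two results should match to within rounding.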

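PairwiseDistance

class torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)

Computes the batchwise pairwise distance between vectors $v_1$, $v_2$ using the p-norm:

$$\lVert x \rVert_p = \left( \sum_{i=1}^{n} \lvert x_i \rvert^p \right)^{1/p}$$

Parameters

p (real) – the norm degree. Default: 2
eps (float, optional) – Small value to avoid division by zero. Default: 1e-6
keepdim (bool, optional) – Determines whether or not to keep the vector dimension. Default: False

Shape:

Input1: $(N, D)$ where D = vector dimension
Input2: $(N, D)$, same shape as Input1
Output: $(N)$. If keepdim is True, then $(N, 1)$.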

Examples::
>>> pdist = nn.PairwiseDistance(p=2)
>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> output = pdist(input1, input2)

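Loss functions

L1Loss

class torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the mean absolute error (MAE) between each element in the input $x$ and target $y$.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = \lvert x_n - y_n \rvert,$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then:

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

$x$ and $y$ are tensors of arbitrary shapes with a total of $n$ elements each.

The sum operation still operates over all the elements, and divides by $n$.

The division by $n$ can be avoided if one sets reduction = 'sum'.

Parameters

size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input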

Examples:

>>> loss = nn.L1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()
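
The three reduction modes can be compared on a small hand-checkable input. This is a sketch with made-up values; the names `per_element`, `mean_loss` and `sum_loss` are illustrative.

```python
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 2.0], [5.0, 1.0]])

per_element = nn.L1Loss(reduction='none')(x, y)  # |x - y|, same shape as x
mean_loss = nn.L1Loss(reduction='mean')(x, y)    # sum over all 4 elements / 4
sum_loss = nn.L1Loss(reduction='sum')(x, y)      # sum without the division
```

Here the element-wise errors are 1, 0, 2 and 3, so the sum is 6 and the mean is 1.5.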

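MSELoss

class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $x$ and target $y$.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = (x_n - y_n)^2,$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then:

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

$x$ and $y$ are tensors of arbitrary shapes with a total of $n$ elements each.

The sum operation still operates over all the elements, and divides by $n$.

The division by $n$ can be avoided if one sets reduction = 'sum'.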

Parameters

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input

Examples:

>>> loss = nn.MSELoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()

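CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

It is useful when training a classification problem with C classes. If provided, the optional argument weight should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input is expected to contain raw, unnormalized scores for each class.

input has to be a Tensor of size either $(minibatch, C)$ or $(minibatch, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$ for the K-dimensional case (described later).

This criterion expects a class index in the range $[0, C-1]$ as the target for each value of a 1D tensor of size minibatch; if ignore_index is specified, this criterion also accepts this class index (this index may not necessarily be in the class range).

The loss can be described as:

$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)$$

or in the case of the weight argument being specified:

$$\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)$$

The losses are averaged across observations for each minibatch.

Can also be used for higher dimension inputs, such as 2D images, by providing an input of size $(minibatch, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$, where $K$ is the number of dimensions, and a target of appropriate shape (see below).

Parameters

weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C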
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets.
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

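Input: $(N, C)$ where C = number of classes, or $(N, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Target: $(N)$ where each value is $0 \leq \text{targets}[i] \leq C-1$, or $(N, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Output: scalar. If reduction is 'none', then the same size as the target: $(N)$, or $(N, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.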

Examples:

>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
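
The statement that this criterion combines LogSoftmax and NLLLoss can be checked numerically. The sketch below, with illustrative names (`two_step`, `picked`, `manual`), compares the single-class form with the two-step form and with a direct evaluation of the per-sample formula.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)        # raw scores, shape (N, C)
target = torch.tensor([1, 0, 4])  # class indices in [0, C-1]

ce = nn.CrossEntropyLoss()(logits, target)
two_step = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# Per-sample formula: -x[class] + log(sum_j exp(x[j])), then the batch mean.
picked = logits[torch.arange(3), target]
manual = (-picked + torch.logsumexp(logits, dim=1)).mean()
```

All three values should agree up to floating-point rounding.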

CTCLoss

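class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

The Connectionist Temporal Classification loss.

Calculates the loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence such that it must be $\leq$ the input length.

Parameters

blank (int, optional) – blank label. Default: $0$.
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken. Default: 'mean'
zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.

Shape:

Log_probs: Tensor of size $(T, N, C)$, where $T$ = input length, $N$ = batch size, and $C$ = number of classes (including blank). The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).
Targets: Tensor of size $(N, S)$ or $(\operatorname{sum}(\text{target\_lengths}))$, where $N$ = batch size and $S$ = max target length, if shape is $(N, S)$. It represents the target sequences. Each element in the target sequence is a class index, and the target index cannot be blank (default = 0). In the $(N, S)$ form, targets are padded to the length of the longest sequence and stacked. In the $(\operatorname{sum}(\text{target\_lengths}))$ form, the targets are assumed to be un-padded and concatenated within 1 dimension.
Input_lengths: Tuple or tensor of size $(N)$, where $N$ = batch size. It represents the lengths of the inputs (each must be $\leq T$). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
Target_lengths: Tuple or tensor of size $(N)$, where $N$ = batch size. It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. If the target shape is $(N, S)$, target_lengths are effectively the stop index $s_n$ for each target sequence, such that target_n = targets[n,0:s_n] for each target in a batch. Each length must be $\leq S$. If the targets are given as a 1D tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor.
Output: scalar. If reduction is 'none', then $(N)$, where $N$ = batch size.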

Example:

>>> T = 50      # Input sequence length
>>> C = 20      # Number of classes (including blank)
>>> N = 16      # Batch size
>>> S = 30      # Target sequence length of longest target in batch
>>> S_min = 10  # Minimum target length, for demonstration purposes
>>>
>>> # Initialize random batch of input vectors, for *size = (T,N,C)
>>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
>>>
>>> # Initialize random batch of targets (0 = blank, 1:C = classes)
>>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
>>>
>>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
>>> target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
>>> ctc_loss = nn.CTCLoss()
>>> loss = ctc_loss(input, target, input_lengths, target_lengths)
>>> loss.backward()

Reference:

A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf

Note

In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, $blank = 0$, target_lengths $\leq 256$, and the integer arguments must be of dtype torch.int32.

The regular implementation uses the (more common in PyTorch) torch.long dtype.

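NLLLoss

class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

The negative log likelihood loss. It is useful to train a classification problem with C classes.

If provided, the optional argument weight should be a 1D Tensor assigning a weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either $(minibatch, C)$ or $(minibatch, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$ for the K-dimensional case (described later).

Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer as the last layer of your network. You may use CrossEntropyLoss instead if you prefer not to add an extra layer.

The target that this loss expects should be a class index in the range $[0, C-1]$ where C = number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range).

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_{y_n} x_{n, y_n}, \quad w_c = \text{weight}[c] \cdot \mathbb{1}\{c \neq \text{ignore\_index}\},$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then

$$\ell(x, y) = \begin{cases} \sum_{n=1}^{N} \frac{1}{\sum_{n=1}^{N} w_{y_n}} l_n, & \text{if reduction} = \text{'mean'}; \\ \sum_{n=1}^{N} l_n, & \text{if reduction} = \text{'sum'}. \end{cases}$$

Can also be used for higher dimension inputs, such as 2D images, by providing an input of size $(minibatch, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$, where $K$ is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes the NLL loss per pixel.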

Parameters

weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets.
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

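Input: $(N, C)$ where C = number of classes, or $(N, C, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Target: $(N)$ where each value is $0 \leq \text{targets}[i] \leq C-1$, or $(N, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.
Output: scalar. If reduction is 'none', then the same size as the target: $(N)$, or $(N, d_1, d_2, \dots, d_K)$ with $K \geq 1$ in the case of K-dimensional loss.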

Examples:

>>> m = nn.LogSoftmax(dim=1)
>>> loss = nn.NLLLoss()
>>> # input is of size N x C = 3 x 5
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(m(input), target)
>>> output.backward()
>>>
>>>
>>> # 2D loss example (used, for example, with image inputs)
>>> N, C = 5, 4
>>> loss = nn.NLLLoss()
>>> # input is of size N x C x height x width
>>> data = torch.randn(N, 16, 10, 10)
>>> conv = nn.Conv2d(16, C, (3, 3))
>>> m = nn.LogSoftmax(dim=1)
>>> # each element in target has to have 0 <= value < C
>>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
>>> output = loss(m(conv(data)), target)
>>> output.backward()
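
When a weight tensor is supplied, the 'mean' reduction divides by the sum of the weights of the selected classes rather than by the batch size. The sketch below illustrates this under made-up weights; the names `w`, `l_n` and `manual` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)  # shape (N, C)
target = torch.tensor([0, 2, 1, 2])
w = torch.tensor([1.0, 2.0, 0.5])                        # one weight per class

loss = nn.NLLLoss(weight=w)(log_probs, target)

# l_n = -w[y_n] * x[n, y_n]; 'mean' normalises by the sum of w[y_n].
l_n = -w[target] * log_probs[torch.arange(4), target]
manual = l_n.sum() / w[target].sum()
```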

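PoissonNLLLoss

class torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

Negative log likelihood loss with Poisson distribution of target.

The loss can be described as:

$$\text{target} \sim \mathrm{Poisson}(\text{input})$$

$$\text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} \cdot \log(\text{input}) + \log(\text{target}!)$$

The last term can be omitted or approximated with the Stirling formula. The approximation is used for target values greater than 1. For targets less than or equal to 1, zeros are added to the loss.

Parameters

log_input (bool, optional) – if True the loss is computed as $\exp(\text{input}) - \text{target} \cdot \text{input}$, if False the loss is $\text{input} - \text{target} \cdot \log(\text{input} + \text{eps})$.

full (bool, optional) – whether to compute the full loss, i.e. to add the Stirling approximation term $\text{target} \cdot \log(\text{target}) - \text{target} + 0.5 \cdot \log(2\pi\,\text{target})$.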

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True

eps (float, optional) – Small value to avoid evaluation of $\log(0)$ when log_input = False. Default: 1e-8

reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True

reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Examples:

>>> loss = nn.PoissonNLLLoss()
>>> log_input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> output = loss(log_input, target)
>>> output.backward()

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar by default. If reduction is 'none', then $(N, *)$, the same shape as the input
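
The two settings of log_input describe the same loss in different parameterisations. A minimal sketch, with illustrative values and names (`log_rate`, `rate`, `manual_log`, `manual_raw`), compares each setting with its closed-form expression while leaving full=False:

```python
import torch
import torch.nn as nn

log_rate = torch.tensor([0.0, 1.0, -0.5])  # interpreted as log(rate)
target = torch.tensor([1.0, 3.0, 0.0])
eps = 1e-8

# log_input=True: loss = exp(input) - target * input
loss_log = nn.PoissonNLLLoss(log_input=True, reduction='none')(log_rate, target)
manual_log = torch.exp(log_rate) - target * log_rate

# log_input=False: loss = input - target * log(input + eps)
rate = torch.exp(log_rate)
loss_raw = nn.PoissonNLLLoss(log_input=False, eps=eps, reduction='none')(rate, target)
manual_raw = rate - target * torch.log(rate + eps)
```

Because `rate` is exactly `exp(log_rate)`, both settings should give nearly the same per-element losses here.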

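KLDivLoss

class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')

The Kullback-Leibler divergence loss.

KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.

As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).

This criterion expects a target Tensor of the same size as the input Tensor.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$l(x, y) = L = \{l_1, \dots, l_N\}, \quad l_n = y_n \cdot (\log y_n - x_n)$$

where the index $N$ spans all dimensions of input and $L$ has the same shape as input. If reduction is not 'none' (default 'mean'), then:

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

In the default reduction mode 'mean', the losses are averaged for each minibatch over observations as well as over dimensions. 'batchmean' mode gives the correct KL divergence, where losses are averaged over the batch dimension only. The behavior of 'mean' mode will be changed to be the same as 'batchmean' in the next major release.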

Parameters

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
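reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'sum' | 'mean'. 'none': no reduction will be applied. 'batchmean': the sum of the output will be divided by batchsize. 'sum': the output will be summed. 'mean': the output will be divided by the number of elements in the output. Default: 'mean'

Note

size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction.

Note

reduction = 'mean' doesn't return the true KL divergence value; please use reduction = 'batchmean', which aligns with the mathematical definition of KL divergence. In the next major release, 'mean' will be changed to be the same as 'batchmean'.

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar by default. If reduction is 'none', then $(N, *)$, the same shape as the input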
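
The following sketch evaluates the pointwise formula by hand and relates it to the 'sum' and 'batchmean' reductions. The shapes and the names `log_q`, `p` and `manual` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 4, 5
log_q = torch.log_softmax(torch.randn(N, C), dim=1)  # input: log-probabilities
p = torch.softmax(torch.randn(N, C), dim=1)          # target: probabilities

pointwise = nn.KLDivLoss(reduction='none')(log_q, p)
total = nn.KLDivLoss(reduction='sum')(log_q, p)
batchmean = nn.KLDivLoss(reduction='batchmean')(log_q, p)

# l_n = y_n * (log y_n - x_n), evaluated element-wise.
manual = p * (p.log() - log_q)
```

'batchmean' equals the summed loss divided by the batch size N, which is the per-sample KL divergence averaged over the batch.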

BCELoss

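class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the binary cross entropy between the target and the output:

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n) \right],$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets $y$ should be numbers between 0 and 1.

Parameters

weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch.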
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input.

Examples:

>>> m = nn.Sigmoid()
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(m(input), target)
>>> output.backward()

BCEWithLogitsLoss

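class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss because, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log(1 - \sigma(x_n)) \right],$$

where $N$ is the batch size. If reduction is not 'none' (default 'mean'), then

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets t[i] should be numbers between 0 and 1.

It is possible to trade off recall and precision by adding weights to positive examples. In the case of multi-label classification the loss can be described as:

$$\ell_c(x, y) = L_c = \{l_{1,c}, \dots, l_{N,c}\}^\top, \quad l_{n,c} = -w_{n,c} \left[ p_c\, y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log(1 - \sigma(x_{n,c})) \right],$$

where $c$ is the class number ($c > 1$ for multi-label binary classification, $c = 1$ for single-label binary classification), $n$ is the number of the sample in the batch and $p_c$ is the weight of the positive answer for the class $c$.

$p_c > 1$ increases the recall, $p_c < 1$ increases the precision.

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to $\frac{300}{100} = 3$. The loss would act as if the dataset contained $3 \times 100 = 300$ positive examples.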

Examples:

>>> target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
>>> output = torch.full([10, 64], 0.999)  # A prediction (logit)
>>> pos_weight = torch.ones([64])  # All weights are equal to 1
>>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
>>> criterion(output, target)  # -log(sigmoid(0.999))
tensor(0.3135)
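
The 100-positive / 300-negative case described above can be sketched as follows. The tiny batch and the names `n_pos`, `n_neg`, `p` and `manual` are illustrative assumptions; the point is only that pos_weight scales the positive term of the loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(6, 1)  # one class, batch of 6
target = torch.tensor([[1.0], [0.0], [0.0], [1.0], [0.0], [0.0]])

n_pos, n_neg = 100, 300                     # class counts from the text
pos_weight = torch.tensor([n_neg / n_pos])  # 3.0

loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight, reduction='none')(logits, target)

# Manual form: the positive term is multiplied by p_c, the negative term is not.
p = torch.sigmoid(logits)
manual = -(pos_weight * target * torch.log(p) + (1 - target) * torch.log(1 - p))
```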

Parameters

weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size <cite>nbatch</cite>.
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
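pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes.

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input.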

Examples:

>>> loss = nn.BCEWithLogitsLoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(input, target)
>>> output.backward()

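MarginRankingLoss

class torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the loss given inputs $x1$, $x2$, two 1D mini-batch Tensors, and a label 1D mini-batch tensor $y$ (containing 1 or -1).

If $y = 1$ then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice versa for $y = -1$.

The loss function for each sample in the mini-batch is:

$$\text{loss}(x, y) = \max(0, -y \cdot (x1 - x2) + \text{margin})$$

Parameters

margin (float, optional) – Has a default value of $0$.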
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, D)$ where N is the batch size and D is the size of a sample.
Target: $(N)$
Output: scalar. If reduction is 'none', then $(N)$.
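
A small worked example of the ranking loss, with made-up scores; the names `x1`, `x2`, `y` and `manual` are illustrative. Each sample contributes zero loss when the preferred input already leads by at least the margin.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2, 0.5])
x2 = torch.tensor([0.3, 0.6, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])  # 1: x1 should rank higher; -1: x2 should

loss = nn.MarginRankingLoss(margin=0.1, reduction='none')(x1, x2, y)

# max(0, -y * (x1 - x2) + margin), evaluated element-wise.
manual = torch.clamp(-y * (x1 - x2) + 0.1, min=0)
```

The first pair is ranked correctly with room to spare (loss 0), the second is ranked the wrong way round (loss 0.5), and the tied third pair incurs exactly the margin (loss 0.1).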

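HingeEmbeddingLoss

class torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

Measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as $x$, and is typically used for learning nonlinear embeddings or semi-supervised learning.

The loss function for the $n$-th sample in the mini-batch is

$$l_n = \begin{cases} x_n, & \text{if } y_n = 1, \\ \max\{0, \Delta - x_n\}, & \text{if } y_n = -1, \end{cases}$$

and the total loss function is

$$\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'}; \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'}. \end{cases}$$

where $L = \{l_1, \dots, l_N\}^\top$ and $\Delta$ is the margin.

Parameters

margin (float, optional) – Has a default value of 1.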
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

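Input: $(*)$ where $*$ means any number of dimensions. The sum operation operates over all the elements.
Target: $(*)$, same shape as the input
Output: scalar. If reduction is 'none', then same shape as the input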
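
The piecewise definition can be sketched with a few hand-picked distances. The values and the name `manual` are illustrative; `x` stands in for a precomputed pairwise distance.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.3, 0.3, 1.5])    # e.g. distances between pairs of inputs
y = torch.tensor([1.0, -1.0, -1.0])  # 1: similar pair, -1: dissimilar pair

loss = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')(x, y)

# x_n for similar pairs, max(0, margin - x_n) for dissimilar pairs.
manual = torch.where(y == 1, x, torch.clamp(1.0 - x, min=0))
```

A dissimilar pair that is already further apart than the margin (the third entry) contributes no loss.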

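MultiLabelMarginLoss

class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 2D Tensor of target class indices). For each sample in the mini-batch:

$$\text{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\text{size}(0)}$$

where $i \in \{0, \dots, x.\text{size}(0) - 1\}$, $j \in \{0, \dots, y.\text{size}(0) - 1\}$, $0 \leq y[j] \leq x.\text{size}(0) - 1$, and $i \neq y[j]$ for all $i$ and $j$.

$y$ and $x$ must have the same size.

The criterion only considers a contiguous block of non-negative targets that starts at the front.

This allows for different samples to have variable amounts of target classes.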

Parameters

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(C)$ or $(N, C)$ where N is the batch size and C is the number of classes.
Target: $(C)$ or $(N, C)$, label targets padded by -1 ensuring the same shape as the input.
Output: scalar. If reduction is 'none', then $(N)$.

Examples:

>>> loss = nn.MultiLabelMarginLoss()
>>> x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]])
>>> # for target y, only consider labels 3 and 0, not after label -1
>>> y = torch.LongTensor([[3, 0, -1, 1]])
>>> loss(x, y)
>>> # 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
tensor(0.8500)
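
The arithmetic in the comment above can be reproduced without PyTorch. The helper `multilabel_margin` below is a hypothetical plain-Python sketch of the per-sample formula, not the library implementation; it stops reading targets at the first negative label, mirroring the "contiguous block of non-negative targets" rule.

```python
def multilabel_margin(x, y):
    """Per-sample loss for a list of scores x and a -1-terminated target list y."""
    targets = []
    for label in y:  # only the leading block of non-negative labels counts
        if label < 0:
            break
        targets.append(label)

    total = 0.0
    for j in targets:
        for i in range(len(x)):
            if i not in targets:  # i ranges over the non-target classes
                total += max(0.0, 1.0 - (x[j] - x[i]))
    return total / len(x)


value = multilabel_margin([0.1, 0.2, 0.4, 0.8], [3, 0, -1, 1])
```

With targets 3 and 0 and non-targets 1 and 2, the four hinge terms are 0.4, 0.6, 1.1 and 1.3; their sum divided by 4 matches the tensor(0.8500) shown above.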

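SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see the Fast R-CNN paper by Ross Girshick). Also known as the Huber loss:

$$\text{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}$$

where $z_{i}$ is given by:

$$z_{i} = \begin{cases} 0.5 (x_i - y_i)^2, & \text{if } \lvert x_i - y_i \rvert < 1 \\ \lvert x_i - y_i \rvert - 0.5, & \text{otherwise} \end{cases}$$

$x$ and $y$ have arbitrary shapes with a total of $n$ elements each; the sum operation still operates over all the elements, and divides by $n$.

The division by $n$ can be avoided if one sets reduction = 'sum'.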

Parameters

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, *)$ where $*$ means any number of additional dimensions
Target: $(N, *)$, same shape as the input
Output: scalar. If reduction is 'none', then $(N, *)$, same shape as the input
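
The switch between the squared and the L1 branch can be seen on three hand-picked errors. This is a sketch with illustrative names (`diff`, `z`, `manual`), assuming the default settings of the module.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.0, 0.5, 3.0])
y = torch.tensor([0.2, 0.0, 0.0])

diff = (x - y).abs()  # 0.2, 0.5, 3.0

# Squared branch below 1, L1 branch (shifted by 0.5) otherwise.
z = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
manual = z.mean()

loss = nn.SmoothL1Loss()(x, y)
```

The per-element values are 0.02, 0.125 and 2.5: the large error of 3.0 is penalised linearly rather than quadratically.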

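SoftMarginLoss

class torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

Creates a criterion that optimizes a two-class classification logistic loss between input tensor $x$ and target tensor $y$ (containing 1 or -1).

$$\text{loss}(x, y) = \sum_i \frac{\log(1 + \exp(-y[i] \cdot x[i]))}{x.\text{nelement}()}$$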

Parameters

size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(*)$ where $*$ means any number of additional dimensions
Target: $(*)$, same shape as the input
Output: scalar. If reduction is 'none', then same shape as the input

MultiLabelSoftMarginLoss

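class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')

Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input $x$ and target $y$ of size $(N, C)$. For each sample in the mini-batch:

$$\text{loss}(x, y) = -\frac{1}{C} \sum_i \left[ y[i] \cdot \log\left((1 + \exp(-x[i]))^{-1}\right) + (1 - y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right) \right]$$

where $i \in \{0, \dots, x.\text{nElement}() - 1\}$ and $y[i] \in \{0, 1\}$.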

Parameters

weight (*Tensor*, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size <cite>C</cite>. Otherwise, it is treated as if having all ones.
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: $(N, C)$ where N is the batch size and C is the number of classes.
Target: $(N, C)$, label targets padded by -1 ensuring the same shape as the input.
Output: scalar. If reduction is 'none', then $(N)$.

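CosineEmbeddingLoss

class torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the loss given input tensors $x_1$, $x_2$ and a Tensor label $y$ with values 1 or -1. This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning.

The loss function for each sample is:

$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases}$$

Parameters

margin (float, optional) – Should be a number from $-1$ to $1$; $0$ to $0.5$ is suggested. If margin is missing, the default value is $0$.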
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
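
A minimal sketch of the two branches of the cosine embedding loss, compared with a manual evaluation built on torch.nn.functional.cosine_similarity. The shapes, the margin of 0.2 and the names `cos` and `manual` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x1 = torch.randn(3, 4)
x2 = torch.randn(3, 4)
y = torch.tensor([1.0, -1.0, -1.0])  # 1: similar pair, -1: dissimilar pair
margin = 0.2

loss = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(x1, x2, y)

# 1 - cos for similar pairs, max(0, cos - margin) for dissimilar pairs.
cos = F.cosine_similarity(x1, x2, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))
```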

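MultiMarginLoss

class torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input $x$ (a 2D mini-batch Tensor) and output $y$ (which is a 1D tensor of target class indices, $0 \leq y \leq x.\text{size}(1) - 1$):

For each mini-batch sample, the loss in terms of the 1D input $x$ and scalar output $y$ is:

$$\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)}$$

where $i \in \{0, \dots, x.\text{size}(0) - 1\}$ and $i \neq y$.

Optionally, you can give non-equal weighting on the classes by passing a 1D weight tensor into the constructor.

The loss function then becomes:

$$\text{loss}(x, y) = \frac{\sum_i \max(0, w[y] \cdot (\text{margin} - x[y] + x[i]))^p}{x.\text{size}(0)}$$

Parameters

p (int, optional) – Has a default value of $1$. $1$ and $2$ are the only supported values.
margin (float, optional) – Has a default value of $1$.
weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones.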
size_average (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool__, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string__, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'
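
A small worked example may help with MultiMarginLoss. The sketch assumes a single sample with four hand-picked class scores and target class 3, and recomputes the hinge terms max(0, margin − x[y] + x[i]) for every i ≠ y, divided by the number of classes.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])  # one sample, four class scores
y = torch.tensor([3])                      # target class index

loss = nn.MultiMarginLoss(p=1, margin=1.0)(x, y)

# Hinge terms for every class i != y, averaged over the number of classes:
# (0.3 + 0.4 + 0.6) / 4
terms = [max(0.0, 1.0 - x[0, 3].item() + x[0, i].item()) for i in range(4) if i != 3]
manual = sum(terms) / x.size(1)
```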

TripletMarginLoss

class torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

Creates a criterion that measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than 0. This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n (i.e., anchor, positive example and negative example, respectively). The shapes of all input tensors should be (N, D).

The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.

The loss function for each sample in the mini-batch is:

$$L(a, p, n) = \max\{d(a_i, p_i) - d(a_i, n_i) + \text{margin}, 0\}$$

where

$$d(x_i, y_i) = \lVert x_i - y_i \rVert_p$$

Parameters

margin (float, optional) – Default: 1.
p (int, optional) – The norm degree for pairwise distance. Default: 2.
swap (bool, optional) – The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False.
size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
reduce (bool, optional) – Deprecated (see reduction). By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

Shape:

Input: (N, D) where D is the vector dimension.
Output: scalar. If reduction is 'none', then (N).

>>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
>>> anchor = torch.randn(100, 128, requires_grad=True)
>>> positive = torch.randn(100, 128, requires_grad=True)
>>> negative = torch.randn(100, 128, requires_grad=True)
>>> output = triplet_loss(anchor, positive, negative)
>>> output.backward()
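
To see what the criterion computes per triplet, the following sketch (sizes and seed are arbitrary) rebuilds the loss from pairwise p-norm distances, max(d(a, p) − d(a, n) + margin, 0), and compares it with the module's reduction='none' output. It assumes the default swap=False.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
anchor = torch.randn(5, 16)
positive = torch.randn(5, 16)
negative = torch.randn(5, 16)

loss = nn.TripletMarginLoss(margin=1.0, p=2, reduction='none')(anchor, positive, negative)

# Distances between anchor/positive and anchor/negative, one value per row.
d_ap = F.pairwise_distance(anchor, positive, p=2)
d_an = F.pairwise_distance(anchor, negative, p=2)
manual = torch.clamp(d_ap - d_an + 1.0, min=0)
```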

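Vision layers

PixelShuffle

class torch.nn.PixelShuffle(upscale_factor)

Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$.

This is useful for implementing efficient sub-pixel convolution with a stride of $1/r$.

See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Shi et al. (2016) for more details.

Parameters

upscale_factor (int) – factor to increase spatial resolution by

Shape:

Input: $(N, L, H_{in}, W_{in})$, where $L = C \times \text{upscale\_factor}^2$
Output: $(N, C, H_{out}, W_{out})$, where $H_{out} = H_{in} \times \text{upscale\_factor}$ and $W_{out} = W_{in} \times \text{upscale\_factor}$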

Examples:

>>> pixel_shuffle = nn.PixelShuffle(3)
>>> input = torch.randn(1, 9, 4, 4)
>>> output = pixel_shuffle(input)
>>> print(output.size())
torch.Size([1, 1, 12, 12])
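
The rearrangement can also be written with plain reshapes, which may make the channel-to-space mapping easier to follow. This is only a sketch of an equivalent formulation (the names r, c and manual are illustrative), not a description of how the layer is implemented internally.

```python
import torch
import torch.nn as nn

r = 3
x = torch.arange(1 * 9 * 4 * 4, dtype=torch.float32).view(1, 9, 4, 4)
out = nn.PixelShuffle(r)(x)

# Same rearrangement written out: split the channel axis into (C, r, r),
# then interleave the two r axes with height and width.
n, c_in, h, w = x.shape
c = c_in // (r * r)
manual = (x.view(n, c, r, r, h, w)
           .permute(0, 1, 4, 2, 5, 3)
           .reshape(n, c, h * r, w * r))
```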

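Upsample

class torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)

Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.

The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.

The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively.

One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous.)

Parameters

size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. Has to match input size if it is a tuple.
mode (str, optional) – the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'
align_corners (bool, optional) – if True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has effect when mode is 'linear', 'bilinear', or 'trilinear'. Default: False

Shape:

Input: $(N, C, W_{in})$, $(N, C, H_{in}, W_{in})$ or $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, W_{out})$, $(N, C, H_{out}, W_{out})$ or $(N, C, D_{out}, H_{out}, W_{out})$, where $D_{out} = \lfloor D_{in} \times \text{scale\_factor} \rfloor$, $H_{out} = \lfloor H_{in} \times \text{scale\_factor} \rfloor$ and $W_{out} = \lfloor W_{in} \times \text{scale\_factor} \rfloor$

Warning

With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic, and trilinear) don't proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is align_corners = False. See below for concrete examples on how this affects the outputs.

Note

If you want downsampling/general resizing, you should use interpolate().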

Examples:

>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[ 1.,  2.],
          [ 3.,  4.]]]])
>>> m = nn.Upsample(scale_factor=2, mode='nearest')
>>> m(input)
tensor([[[[ 1.,  1.,  2.,  2.],
          [ 1.,  1.,  2.,  2.],
          [ 3.,  3.,  4.,  4.],
          [ 3.,  3.,  4.,  4.]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear')  # align_corners=False
>>> m(input)
tensor([[[[ 1.0000,  1.2500,  1.7500,  2.0000],
          [ 1.5000,  1.7500,  2.2500,  2.5000],
          [ 2.5000,  2.7500,  3.2500,  3.5000],
          [ 3.0000,  3.2500,  3.7500,  4.0000]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
>>> m(input)
tensor([[[[ 1.0000,  1.3333,  1.6667,  2.0000],
          [ 1.6667,  2.0000,  2.3333,  2.6667],
          [ 2.3333,  2.6667,  3.0000,  3.3333],
          [ 3.0000,  3.3333,  3.6667,  4.0000]]]])
>>> # Try scaling the same data in a larger tensor
>>>
>>> input_3x3 = torch.zeros(3, 3).view(1, 1, 3, 3)
>>> input_3x3[:, :, :2, :2].copy_(input)
tensor([[[[ 1.,  2.],
          [ 3.,  4.]]]])
>>> input_3x3
tensor([[[[ 1.,  2.,  0.],
          [ 3.,  4.,  0.],
          [ 0.,  0.,  0.]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear')  # align_corners=False
>>> # Notice that values in top left corner are the same with the small input (except at boundary)
>>> m(input_3x3)
tensor([[[[ 1.0000,  1.2500,  1.7500,  1.5000,  0.5000,  0.0000],
          [ 1.5000,  1.7500,  2.2500,  1.8750,  0.6250,  0.0000],
          [ 2.5000,  2.7500,  3.2500,  2.6250,  0.8750,  0.0000],
          [ 2.2500,  2.4375,  2.8125,  2.2500,  0.7500,  0.0000],
          [ 0.7500,  0.8125,  0.9375,  0.7500,  0.2500,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
>>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
>>> # Notice that values in top left corner are now changed
>>> m(input_3x3)
tensor([[[[ 1.0000,  1.4000,  1.8000,  1.6000,  0.8000,  0.0000],
          [ 1.8000,  2.2000,  2.6000,  2.2400,  1.1200,  0.0000],
          [ 2.6000,  3.0000,  3.4000,  2.8800,  1.4400,  0.0000],
          [ 2.4000,  2.7200,  3.0400,  2.5600,  1.2800,  0.0000],
          [ 1.2000,  1.3600,  1.5200,  1.2800,  0.6400,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
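
For downsampling or resizing to an arbitrary shape, torch.nn.functional.interpolate() is the function to reach for. A minimal sketch, with an arbitrary 4x4 input chosen for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# Nearest-neighbour downsampling to half the spatial size.
down = F.interpolate(x, scale_factor=0.5, mode='nearest')

# Resizing to an explicit output size instead of a scale factor.
resized = F.interpolate(x, size=(2, 6), mode='bilinear', align_corners=False)
```

Either size or scale_factor is passed, never both, mirroring the constructor arguments of Upsample.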

UpsamplingNearest2d

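class torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)

Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels.

To specify the scale, it takes either the size or the scale_factor as its constructor argument.

When size is given, it is the output size of the image (h, w).

Parameters

size (int or Tuple[int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float, float], optional) – multiplier for spatial size.

Warning

This class is deprecated in favor of interpolate().

Shape:

Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$ where $H_{out} = \lfloor H_{in} \times \text{scale\_factor} \rfloor$ and $W_{out} = \lfloor W_{in} \times \text{scale\_factor} \rfloor$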

Examples:

>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[ 1.,  2.],
          [ 3.,  4.]]]])
>>> m = nn.UpsamplingNearest2d(scale_factor=2)
>>> m(input)
tensor([[[[ 1.,  1.,  2.,  2.],
          [ 1.,  1.,  2.,  2.],
          [ 3.,  3.,  4.,  4.],
          [ 3.,  3.,  4.,  4.]]]])

UpsamplingBilinear2d

class torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)

Applies a 2D bilinear upsampling to an input signal composed of several input channels.

To specify the scale, it takes either the size or the scale_factor as its constructor argument.

When size is given, it is the output size of the image (h, w).

Parameters

size (int or Tuple[int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float, float], optional) – multiplier for spatial size.

Warning

This class is deprecated in favor of interpolate(). It is equivalent to nn.functional.interpolate(..., mode='bilinear', align_corners=True).

Shape:

Input: $(N, C, H_{in}, W_{in})$
Output: $(N, C, H_{out}, W_{out})$ where $H_{out} = \lfloor H_{in} \times \text{scale\_factor} \rfloor$ and $W_{out} = \lfloor W_{in} \times \text{scale\_factor} \rfloor$

Examples:

>>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)
>>> input
tensor([[[[ 1.,  2.],
          [ 3.,  4.]]]])
>>> m = nn.UpsamplingBilinear2d(scale_factor=2)
>>> m(input)
tensor([[[[ 1.0000,  1.3333,  1.6667,  2.0000],
          [ 1.6667,  2.0000,  2.3333,  2.6667],
          [ 2.3333,  2.6667,  3.0000,  3.3333],
          [ 3.0000,  3.3333,  3.6667,  4.0000]]]])

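DataParallel layers (multi-GPU, distributed)

DataParallel

class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

Implements data parallelism at the module level.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension (other objects will be copied once per device). In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. During the backwards pass, gradients from each replica are summed into the original module.

The batch size should be larger than the number of GPUs used.

See also: Use nn.DataParallel instead of multiprocessing

Arbitrary positional and keyword inputs are allowed to be passed into DataParallel, but some types are specially handled. Tensors will be scattered on the dim specified (default 0). tuple, list and dict types will be shallow copied. The other types will be shared among different threads and can be corrupted if written to in the model's forward pass.

The parallelized module must have its parameters and buffers on device_ids[0] before running this DataParallel module.

Warning

In each forward, module is replicated on each device, so any updates to the running module in forward will be lost. For example, if module has a counter attribute that is incremented in each forward, it will always stay at the initial value because the update is done on the replicas, which are destroyed after forward. However, DataParallel guarantees that the replica on device[0] will have its parameters and buffers sharing storage with the base parallelized module. So in-place updates to the parameters or buffers on device[0] will be recorded. E.g., BatchNorm2d and spectral_norm() rely on this behavior to update the buffers.

Warning

Forward and backward hooks defined on module and its submodules will be invoked len(device_ids) times, each with inputs located on a particular device. In particular, the hooks are only guaranteed to be executed in correct order with respect to operations on corresponding devices. For example, it is not guaranteed that hooks set via register_forward_pre_hook() are executed before all len(device_ids) forward() calls, only that each such hook is executed before the corresponding forward() call of that device.

Warning

When module returns a scalar (i.e., 0-dimensional tensor) in forward(), this wrapper will return a vector of length equal to the number of devices used in data parallelism, containing the result from each device.

Note

There is a subtlety in using the pack sequence -> recurrent network -> unpack sequence pattern in a Module wrapped in DataParallel. See the section My recurrent network doesn't work with data parallelism in the FAQ for details.

Parameters

module (Module) – module to be parallelized
device_ids (list of int or torch.device) – CUDA devices (default: all devices)
output_device (int or torch.device) – device location of output (default: device_ids[0])

Variables

DataParallel.module (Module) – the module to be parallelized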

Example:

>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)  # input_var can be on any device, including CPU
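
A slightly fuller sketch of the same pattern follows. It is written so that it also runs on a machine without GPUs, in which case the wrapper simply calls the module directly. The layer sizes and batch size are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
if torch.cuda.is_available():
    model = model.cuda()  # parameters must live on device_ids[0] before wrapping

net = nn.DataParallel(model)  # all visible GPUs by default

device = next(model.parameters()).device
batch = torch.randn(8, 10, device=device)
output = net(batch)

# The wrapped module stays reachable as `net.module`, e.g. for checkpointing
# without the "module." prefix that the wrapper adds to state_dict keys.
state = net.module.state_dict()
```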

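DistributedDataParallel

class torch.nn.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False)

Implements distributed data parallelism that is based on the torch.distributed package at the module level.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged.

The batch size should be larger than the number of GPUs used locally.

See also: Basics and Use nn.DataParallel instead of multiprocessing. The same constraints on input as in torch.nn.DataParallel apply.

Creation of this class requires that torch.distributed is already initialized, by calling torch.distributed.init_process_group().

DistributedDataParallel can be used in the following two ways:

Single-Process Multi-GPU

In this case, a single process will be spawned on each host/node and each process will operate on all the GPUs of the node where it's running. To use DistributedDataParallel in this way, you can simply construct the model as follows: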

>>> torch.distributed.init_process_group(backend="nccl")
>>> model = DistributedDataParallel(model) # device_ids will include all GPU devices by default

Multi-Process Single-GPU

This is the highly recommended way to use DistributedDataParallel: multiple processes, each of which operates on a single GPU. This is currently the fastest approach to do data parallel training using PyTorch and applies to both single-node (multi-GPU) and multi-node data parallel training. It is proven to be significantly faster than torch.nn.DataParallel for single-node multi-GPU data parallel training.

Here is how to use it: on each host with N GPUs, you should spawn N processes, while ensuring that each process individually works on a single GPU from 0 to N-1. Therefore, it is your job to ensure that your training script operates on a single given GPU by calling:

>>> torch.cuda.set_device(i)

where i is from 0 to N-1. In each process, you should refer to the following to construct this module:

>>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
>>> model = DistributedDataParallel(model, device_ids=[i], output_device=i)

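In order to spawn multiple processes per node, you can use either torch.distributed.launch or torch.multiprocessing.spawn.

Note

The nccl backend is currently the fastest and highly recommended backend to be used with Multi-Process Single-GPU distributed training. This applies to both single-node and multi-node distributed training.

Note

This module also supports mixed-precision distributed training. This means that your model can have different types of parameters, such as a mix of fp16 and fp32, and the gradient reduction on these mixed types of parameters will just work fine. Also note that the nccl backend is currently the fastest and highly recommended backend for fp16/fp32 mixed-precision training.

Note

If you use torch.save on one process to checkpoint the module, and torch.load on some other processes to recover it, make sure that map_location is configured properly for every process. Without map_location, torch.load would recover the module to the devices where the module was saved from.

Warning

This module works only with the gloo and nccl backends.

Warning

The constructor, the forward method, and differentiation of the output (or a function of the output of this module) are distributed synchronization points. Take that into account in case different processes might be executing different code.

Warning

This module assumes all parameters are registered in the model by the time it is created. No parameters should be added or removed later. The same applies to buffers.

Warning

This module assumes that the parameters registered in the model of each distributed process are in the same order. The module itself will conduct gradient all-reduction following the reverse order of the registered parameters of the model. In other words, it is the user's responsibility to ensure that each distributed process has the exact same model and thus the exact same parameter registration order.

Warning

This module assumes all buffers and gradients are dense.

Warning

This module doesn't work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in the .grad attributes of parameters).

Warning

If you plan on using this module with a nccl backend or a gloo backend (that uses Infiniband), together with a DataLoader that uses multiple workers, please change the multiprocessing start method to forkserver (Python 3 only) or spawn. Unfortunately Gloo (that uses Infiniband) and NCCL2 are not fork safe, and you will likely experience deadlocks if you don't change this setting.

Warning

Forward and backward hooks defined on module and its submodules won't be invoked anymore, unless the hooks are initialized in the forward() method.

Warning

You should never try to change your model's parameters after wrapping your model with DistributedDataParallel. When wrapping a model, the constructor of DistributedDataParallel registers additional gradient reduction functions on all the parameters of the model at the time of construction. Changing the model's parameters after the DistributedDataParallel construction is not supported, and unexpected behaviors can happen, since some parameters' gradient reduction functions might not get called.

Note

Parameters are never broadcast between processes. The module performs an all-reduce step on gradients and assumes that they will be modified by the optimizer in all processes in the same way. Buffers (e.g. BatchNorm stats) are broadcast from the module in the process of rank 0 to all other replicas in the system in every iteration.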

Parameters

module (Module) – module to be parallelized
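device_ids (list of int or torch.device) – CUDA devices. This should only be provided when the input module resides on a single CUDA device. For single-device modules, the i-th module replica is placed on device_ids[i]. For multi-device modules and CPU modules, device_ids must be None or an empty list, and input data for the forward pass must be placed on the correct device. (default: all devices for single-device modules)
output_device (int or torch.device) – device location of output for single-device CUDA modules. For multi-device modules and CPU modules, it must be None, and the module itself dictates the output location. (default: device_ids[0] for single-device modules)
broadcast_buffers (bool) – flag that enables syncing (broadcasting) buffers of the module at the beginning of the forward function. (default: True)
process_group – the process group to be used for distributed data all-reduction. If None, the default process group, which is created by torch.distributed.init_process_group, will be used. (default: None)
bucket_cap_mb – DistributedDataParallel will bucket parameters into multiple buckets so that gradient reduction of each bucket can potentially overlap with backward computation. bucket_cap_mb controls the bucket size in megabytes (MB). (default: 25)
find_unused_parameters (bool) – Traverse the autograd graph of all tensors contained in the return value of the wrapped module's forward function. Parameters that don't receive gradients as part of this graph are preemptively marked as being ready to be reduced. Note that all forward outputs that are derived from module parameters must participate in calculating the loss and later the gradient computation. If they don't, this wrapper will hang waiting for autograd to produce gradients for those parameters. Any outputs derived from module parameters that are otherwise unused can be detached from the autograd graph using torch.Tensor.detach. (default: False)
check_reduction – when set to True, it enables DistributedDataParallel to automatically check whether the previous iteration's backward reductions were successfully issued at the beginning of every iteration's forward function. You normally don't need this option enabled unless you are observing weird behaviors such as different ranks getting different gradients, which should not happen if DistributedDataParallel is correctly used. (default: False)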

Variables

DistributedDataParallel.module (Module) – the module to be parallelized

Example:

>>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
>>> net = torch.nn.parallel.DistributedDataParallel(model)  # uses the default process group created above

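no_sync()

A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass exiting the context.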

Example:

>>> ddp = torch.nn.parallel.DistributedDataParallel(model)
>>> with ddp.no_sync():
...   for input in inputs:
...     ddp(input).backward()  # no synchronization, accumulate grads
>>> ddp(another_input).backward()  # synchronize grads
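
The accumulation pattern can be tried out without any GPU. The sketch below is an illustration under several assumptions: a single-process group with the gloo backend, a file-based init_method in a temporary directory, and a tiny CPU model whose output is summed to obtain a scalar loss. A real job would launch one process per GPU instead.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Single-process "gloo" group on CPU, purely so the sketch runs anywhere.
init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
dist.init_process_group(
    backend="gloo", init_method="file://" + init_file, rank=0, world_size=1
)

model = nn.Linear(4, 2)
ddp = DistributedDataParallel(model)  # CPU module: device_ids stays None

micro_batches = [torch.randn(8, 4) for _ in range(3)]

# Accumulate gradients locally for all but the last micro-batch.
with ddp.no_sync():
    for x in micro_batches[:-1]:
        ddp(x).sum().backward()

# The first forward-backward pass outside the context synchronizes gradients.
ddp(micro_batches[-1]).sum().backward()

grad = model.weight.grad.clone()
dist.destroy_process_group()
```

An optimizer step would normally follow the final backward call, so that one update covers all accumulated micro-batches.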

Utilities

clip_grad_norm_

torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2)

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place.

Parameters

parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized
max_norm (float or int) – max norm of the gradients
norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.

Returns

Total norm of the parameters (viewed as a single vector).
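
A short sketch of clip_grad_norm_ with gradients filled in by hand, so the numbers are easy to check. The layer and the values are arbitrary; float() is used on the return value because it may be a Python float or a tensor depending on the PyTorch version.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
# Fill the gradients by hand so the numbers are easy to follow.
model.weight.grad = torch.tensor([[3.0, 0.0, 0.0]])
model.bias.grad = torch.tensor([4.0])

# Norm over all gradients as one vector: sqrt(3^2 + 4^2) = 5.
total_norm = float(nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2))

# After clipping, the combined norm is (approximately) max_norm.
clipped_norm = torch.cat([p.grad.flatten() for p in model.parameters()]).norm()
```

Note that the returned value is the norm before clipping, which is handy for logging.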

clip_grad_value_

torch.nn.utils.clip_grad_value_(parameters, clip_value)

Clips gradient of an iterable of parameters at the specified value.

Gradients are modified in-place.

Parameters

parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized
clip_value (float or int) – maximum allowed value of the gradients. The gradients are clipped in the range [-clip_value, clip_value]
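
For comparison with norm clipping, here is a minimal sketch of element-wise value clipping on a single hand-made parameter (the values are arbitrary):

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.zeros(4))
w.grad = torch.tensor([-2.0, -0.3, 0.1, 5.0])

# Each gradient entry is clamped to [-0.5, 0.5] independently.
nn.utils.clip_grad_value_([w], clip_value=0.5)
```

Entries already inside the range are left untouched; only -2.0 and 5.0 change.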

parameters_to_vector

torch.nn.utils.parameters_to_vector(parameters)

Convert parameters to one vector.

Parameters

parameters (Iterable[Tensor]) – an iterator of Tensors that are the parameters of a model.

Returns

The parameters represented by a single vector

vector_to_parameters

torch.nn.utils.vector_to_parameters(vec, parameters)

Convert one vector to the parameters.

Parameters

vec (Tensor) – a single vector that represents the parameters of a model.
parameters (Iterable[Tensor]) – an iterator of Tensors that are the parameters of a model.
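
The two helpers are inverses of each other. A minimal round-trip sketch, using an arbitrary small layer:

```python
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

model = nn.Linear(3, 2)  # 3*2 weights + 2 biases = 8 values

# Flatten all parameters into a single 1D tensor.
vec = parameters_to_vector(model.parameters())

# Write a modified flat vector back into the parameters, in place.
vector_to_parameters(torch.zeros_like(vec), model.parameters())
```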

BasePruningMethod

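class torch.nn.utils.prune.BasePruningMethod

Abstract base class for creation of new pruning techniques.

Provides a skeleton for customization requiring the overriding of methods such as compute_mask() and apply().

classmethod apply(module, name, *args, **kwargs)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
args – arguments passed on to a subclass of BasePruningMethod
kwargs – keyword arguments passed on to a subclass of a BasePruningMethod

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

abstract compute_mask(t, default_mask)

Computes and returns a mask for the input tensor t. Starting from a base default_mask (which should be a mask of ones if the tensor has not been pruned yet), generate a random mask to apply on top of the default_mask according to the specific pruning method recipe.

Parameters

t (torch.Tensor) – tensor representing the parameter to prune
default_mask (torch.Tensor) – Base mask from previous pruning iterations, that needs to be respected after the new mask is applied. Same dims as t.

Returns

mask to apply to t, of same dims as t

Return type

mask (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (torch.Tensor, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Removes the pruning reparameterization from a module. The pruned parameter named name remains permanently pruned, and the parameter named name+'_orig' is removed from the parameter list. Similarly, the buffer named name+'_mask' is removed from the buffers.

Note

Pruning itself is NOT undone or reversed!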
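
As a sketch of how a new technique might be built on BasePruningMethod, the hypothetical class below (EveryOtherPruning is an invented name, not part of PyTorch) overrides compute_mask() to zero every other entry of the flattened tensor and is attached to a layer through apply().

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


class EveryOtherPruning(prune.BasePruningMethod):
    """Hypothetical method: prune every other entry of the flattened tensor."""

    PRUNING_TYPE = 'unstructured'

    def compute_mask(self, t, default_mask):
        # Start from the existing mask so earlier pruning is respected.
        mask = default_mask.clone()
        mask.view(-1)[::2] = 0
        return mask


layer = nn.Linear(4, 2)
EveryOtherPruning.apply(layer, 'bias')

param_names = [name for name, _ in layer.named_parameters()]
```

After apply(), the layer holds the original values in bias_orig, the mask in the buffer bias_mask, and bias itself is recomputed as their product.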

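PruningContainer

class torch.nn.utils.prune.PruningContainer(*args)

Container holding a sequence of pruning methods for iterative pruning. Keeps track of the order in which pruning methods are applied and handles combining successive pruning calls.

Accepts as argument an instance of a BasePruningMethod or an iterable of them.

add_pruning_method(method)

Adds a child pruning method to the container.

Parameters

method (subclass of BasePruningMethod) – child pruning method to be added to the container.

classmethod apply(module, name, *args, **kwargs)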

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
args – arguments passed on to a subclass of BasePruningMethod
kwargs – keyword arguments passed on to a subclass of a BasePruningMethod

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

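compute_mask(t, default_mask)

Applies the latest method by computing the new partial mask and returning its combination with the default_mask. The new partial mask should be computed on the entries or channels that were not zeroed out by the default_mask. Which portions of the tensor t the new mask will be calculated from depends on the PRUNING_TYPE (handled by the type handler):

- for 'unstructured', the mask will be computed from the raveled list of nonmasked entries;
- for 'structured', the mask will be computed from the nonmasked channels in the tensor;
- for 'global', the mask will be computed across all entries.

Parameters

t (torch.Tensor) – tensor representing the parameter to prune (of same dimensions as default_mask).
default_mask (torch.Tensor) – mask from previous pruning iteration.

Returns

new mask that combines the effects of the default_mask and the new mask from the current pruning method (of same dimensions as default_mask and t).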

Return type

mask (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Removes the pruning reparameterization from a module. The pruned parameter named name remains permanently pruned, and the parameter named name+'_orig' is removed from the parameter list. Similarly, the buffer named name+'_mask' is removed from the buffers.

Note

Pruning itself is NOT undone or reversed!
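
Iterative pruning can be illustrated by pruning the same parameter twice. In this sketch the second call works only on the entries that survived the first one, and the hook attached to the layer becomes a PruningContainer. Note that _forward_pre_hooks and _tensor_name are private attributes, inspected here only for illustration; the layer size and amounts are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4, 3)  # weight has 12 entries

prune.random_unstructured(layer, name='weight', amount=4)  # 4 entries masked
prune.l1_unstructured(layer, name='weight', amount=3)      # 3 more, among the survivors

# The forward pre-hook for 'weight' is now a container holding both methods.
hook = next(h for h in layer._forward_pre_hooks.values()
            if getattr(h, '_tensor_name', None) == 'weight')

remaining = int(layer.weight_mask.sum().item())
```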

Identity

class torch.nn.utils.prune.Identity

Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones.

classmethod apply(module, name)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!

RandomUnstructured

class torch.nn.utils.prune.RandomUnstructured(amount)

Prune (currently unpruned) units in a tensor at random.

Parameters

name (str) – parameter name within module on which pruning will act.
amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

classmethod apply(module, name, amount)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (python:int or python:float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!
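
The two ways of specifying amount can be compared side by side. The sketch uses the functional wrapper prune.random_unstructured, which applies RandomUnstructured to a module parameter; the layer size is arbitrary.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(5, 4)  # weight has 20 entries, bias has 4

# Fraction: 0.25 * 20 = 5 entries are masked out.
prune.random_unstructured(layer, name='weight', amount=0.25)
pruned_by_fraction = int((layer.weight_mask == 0).sum().item())

# Absolute count: exactly 2 of the 4 bias entries are masked out.
prune.random_unstructured(layer, name='bias', amount=2)
pruned_by_count = int((layer.bias_mask == 0).sum().item())
```

Which entries are selected differs from run to run; only the count is fixed.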

L1Unstructured

class torch.nn.utils.prune.L1Unstructured(amount)

Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm.

Parameters

amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

classmethod apply(module, name, amount)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (python:int or python:float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!
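
With hand-picked weights it is easy to see which units magnitude-based pruning removes. A minimal sketch via the functional wrapper prune.l1_unstructured (the weight values are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4, 1, bias=False)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[0.5, -0.1, 2.0, -3.0]]))

# Zero the two entries with the smallest absolute value: -0.1 and 0.5.
prune.l1_unstructured(layer, name='weight', amount=2)
```

The sign of a weight plays no role; only its absolute value decides whether it is kept.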

RandomStructured

class torch.nn.utils.prune.RandomStructured(amount, dim=-1)

Prune entire (currently unpruned) channels in a tensor at random.

Parameters

amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
dim (int, optional) – index of the dim along which we define channels to prune. Default: -1.

classmethod apply(module, name, amount, dim=-1)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (python:int or python:float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
dim (int, optional) – index of the dim along which we define channels to prune. Default: -1.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

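compute_mask(t, default_mask)

Computes and returns a mask for the input tensor t. Starting from a base default_mask (which should be a mask of ones if the tensor has not been pruned yet), generate a random mask to apply on top of the default_mask by randomly zeroing out channels along the specified dim of the tensor.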

Parameters

t (torch.Tensor) – tensor representing the parameter to prune
default_mask (torch.Tensor) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as t.

Returns

mask to apply to t, of same dims as t

Return type

mask (torch.Tensor)

Raises

IndexError – if self.dim >= len(t.shape)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!
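
Structured pruning removes whole channels rather than single entries. The sketch below uses the functional wrapper prune.random_structured on an arbitrary linear layer and checks, row by row, which output channels were zeroed.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(6, 4)  # weight shape: (4, 6)

# Prune 2 of the 4 output channels (rows, dim=0), chosen at random.
prune.random_structured(layer, name='weight', amount=2, dim=0)

row_sums = layer.weight_mask.sum(dim=1)  # 0 for a pruned row, 6 for a kept row
pruned_rows = int((row_sums == 0).sum().item())
```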

LnStructured

class torch.nn.utils.prune.LnStructured(amount, n, dim=-1)

Prune entire (currently unpruned) channels in a tensor based on their Ln-norm.

Parameters

amount (int or float) – quantity of channels to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc') – See documentation of valid entries for argument p in torch.norm().
dim (int, optional) – index of the dim along which we define channels to prune. Default: -1.

classmethod apply(module, name, amount, n, dim)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (python:int or python:float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc') – See documentation of valid entries for argument p in torch.norm().
dim (int) – index of the dim along which we define channels to prune.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

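compute_mask(t, default_mask)

Computes and returns a mask for the input tensor t. Starting from a base default_mask (which should be a mask of ones if the tensor has not been pruned yet), generate a mask to apply on top of the default_mask by zeroing out the channels along the specified dim with the lowest Ln-norm.

Parameters

t (torch.Tensor) – tensor representing the parameter to prune
default_mask (torch.Tensor) – Base mask from previous pruning iterations, that needs to be respected after the new mask is applied. Same dims as t.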

Returns

mask to apply to t, of same dims as t

Return type

mask (torch.Tensor)

Raises

IndexError – if self.dim >= len(t.shape)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!
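
A worked example with hand-picked rows shows which channel norm-based structured pruning drops. The sketch uses the functional wrapper prune.ln_structured with n=2 and dim=0; the weight values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(2, 3, bias=False)  # weight shape: (3, 2)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[3.0, 4.0],    # L2 norm 5
                                     [0.1, 0.0],    # L2 norm 0.1 (smallest)
                                     [1.0, 1.0]]))  # L2 norm ~1.41

# Remove the single row (dim=0) with the lowest L2 norm.
prune.ln_structured(layer, name='weight', amount=1, n=2, dim=0)
```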

CustomFromMask

class torch.nn.utils.prune.CustomFromMask(mask)

classmethod apply(module, name, mask)

Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask.

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.

apply_mask(module)

Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor.

Parameters

module (nn.Module) – module containing the tensor to prune

Returns

pruned version of the input tensor

Return type

pruned_tensor (torch.Tensor)

prune(t, default_mask=None)

Computes and returns a pruned version of input tensor t according to the pruning rule specified in compute_mask().

Parameters

t (torch.Tensor) – tensor to prune (of same dimensions as default_mask).
default_mask (*torch.Tensor*, optional) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones.

Returns

pruned version of tensor t.

remove(module)

Note

Pruning itself is NOT undone or reversed!

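identity

torch.nn.utils.prune.identity(module, name)

Applies pruning reparametrization to the tensor corresponding to the parameter called name in module without actually pruning any units. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.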

Note

The mask is a tensor of ones.

Parameters

module (nn.Module) – module containing the tensor to prune.
name (str) – parameter name within module on which pruning will act.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.identity(nn.Linear(2, 3), 'bias')
>>> print(m.bias_mask)
tensor([1., 1., 1.])
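To see the reparametrization itself, one can list the module's parameters and buffers after the call. A small sketch; the variable names param_names and buffer_names are illustrative only.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

m = prune.identity(nn.Linear(2, 3), 'bias')

param_names = sorted(name for name, _ in m.named_parameters())
buffer_names = sorted(name for name, _ in m.named_buffers())
print(param_names)   # 'bias' has been replaced by 'bias_orig'
print(buffer_names)  # the mask is stored as a buffer

# With a mask of ones, the pruned tensor equals the original one.
print(torch.equal(m.bias, m.bias_orig))
```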

random_unstructured

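torch.nn.utils.prune.random_unstructured(module, name, amount)

Prunes the tensor corresponding to the parameter called name in module by removing the specified amount of (currently unpruned) units selected at random. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.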

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.random_unstructured(nn.Linear(2, 3), 'weight', amount=1)
>>> torch.sum(m.weight_mask == 0)
tensor(1)
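The two interpretations of amount can be compared side by side. A minimal sketch with made-up layer sizes: a float is treated as a fraction of the entries, an int as an absolute count.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

# amount as a fraction: half of the 16 weights
m_frac = prune.random_unstructured(nn.Linear(4, 4), 'weight', amount=0.5)
# amount as an absolute count: exactly 3 weights
m_abs = prune.random_unstructured(nn.Linear(4, 4), 'weight', amount=3)

zeros_frac = int((m_frac.weight_mask == 0).sum())
zeros_abs = int((m_abs.weight_mask == 0).sum())
print(zeros_frac, zeros_abs)
```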

l1_unstructured

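torch.nn.utils.prune.l1_unstructured(module, name, amount)

Prunes the tensor corresponding to the parameter called name in module by removing the specified amount of (currently unpruned) units with the lowest L1-norm. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.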

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.l1_unstructured(nn.Linear(2, 3), 'weight', amount=0.2)
>>> m.state_dict().keys()
odict_keys(['bias', 'weight_orig', 'weight_mask'])
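With hand-picked weights it is easy to check which entries are removed. This is a sketch only; the weight values are made up so that the two entries of smallest magnitude are unambiguous.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

m = nn.Linear(2, 3)
with torch.no_grad():
    # Overwrite the random initialization with known values.
    m.weight.copy_(torch.tensor([[1.0, -2.0],
                                 [3.0, -4.0],
                                 [5.0, -6.0]]))

prune.l1_unstructured(m, 'weight', amount=2)
print(m.weight)       # the entries 1.0 and -2.0 are now zero
print(m.weight_orig)  # the unpruned values are still stored
```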

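random_structured

torch.nn.utils.prune.random_structured(module, name, amount, dim)

Prunes the tensor corresponding to the parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim selected at random. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.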

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
dim (int) – index of the dim along which we define channels to prune.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.random_structured(
        nn.Linear(5, 3), 'weight', amount=3, dim=1
    )
>>> columns_pruned = int(sum(torch.sum(m.weight, dim=0) == 0))
>>> print(columns_pruned)
3

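ln_structured

torch.nn.utils.prune.ln_structured(module, name, amount, n, dim)

Prunes the tensor corresponding to the parameter called name in module by removing the specified amount of (currently unpruned) channels along the specified dim with the lowest Ln-norm. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.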

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
amount (int or float) – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
n (int, float, inf, -inf, 'fro', 'nuc') – See documentation of valid entries for argument p in torch.norm().
dim (int) – index of the dim along which we define channels to prune.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.ln_structured(
       nn.Conv2d(5, 3, 2), 'weight', amount=0.3, dim=1, n=float('-inf')
    )
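To confirm that whole channels are removed, one can count the slices along dim that are entirely zero. A minimal sketch with an integer amount and the L2 norm; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

conv = nn.Conv2d(4, 3, 2)  # weight shape: (out=3, in=4, 2, 2)
prune.ln_structured(conv, 'weight', amount=2, n=2, dim=1)

# A channel along dim=1 is pruned when every entry in that slice is zero.
channel_is_zero = conv.weight.abs().sum(dim=(0, 2, 3)) == 0
print(channel_is_zero)
```

Here dim=1 selects input channels of the convolution; dim=0 would prune output channels instead.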

global_unstructured

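torch.nn.utils.prune.global_unstructured(parameters, pruning_method, **kwargs)

Globally prunes the tensors corresponding to all parameters in parameters by applying the specified pruning_method. Modifies the modules in place by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.

Parameters

parameters (iterable of (module, name) tuples) – parameters of the model to prune in a global fashion, i.e. by aggregating all weights prior to deciding which ones to prune. module must be of type nn.Module, and name must be a string.
pruning_method (function) – a valid pruning function from this module, or a custom one implemented by the user that satisfies the implementation guidelines and has PRUNING_TYPE='unstructured'.
kwargs – other keyword arguments such as: amount (int or float): quantity of parameters to prune across the specified parameters. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

Raises

TypeError – if PRUNING_TYPE != 'unstructured'

Note

Since global structured pruning doesn't make much sense unless the norm is normalized by the size of the parameter, the scope of global pruning is currently limited to unstructured methods.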

Examples

>>> net = nn.Sequential(OrderedDict([
        ('first', nn.Linear(10, 4)),
        ('second', nn.Linear(4, 1)),
    ]))
>>> parameters_to_prune = (
        (net.first, 'weight'),
        (net.second, 'weight'),
    )
>>> prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=10,
    )
>>> print(sum(torch.nn.utils.parameters_to_vector(net.buffers()) == 0))
tensor(10, dtype=torch.uint8)
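Because the weights are pooled before ranking, the pruned entries need not be spread evenly across layers; only the total is fixed. A small sketch with a fractional amount, assuming the same two-layer setup as above but built without OrderedDict.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

net = nn.Sequential(nn.Linear(10, 4), nn.Linear(4, 1))
parameters_to_prune = [(net[0], 'weight'), (net[1], 'weight')]

# 25% of the 44 pooled weights, i.e. 11 entries in total.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.25,
)

zeros_per_layer = [int((m.weight_mask == 0).sum()) for m, _ in parameters_to_prune]
print(zeros_per_layer, sum(zeros_per_layer))
```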

custom_from_mask

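torch.nn.utils.prune.custom_from_mask(module, name, mask)

Prunes the tensor corresponding to the parameter called name in module by applying the pre-computed mask in mask. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called name+'_mask' corresponding to the binary mask applied to the parameter name by the pruning method. 2) replacing the parameter name by its pruned version, while the original (unpruned) parameter is stored in a new parameter named name+'_orig'.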

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.
mask (Tensor) – binary mask to be applied to the parameter.

Returns

modified (i.e. pruned) version of the input module

Return type

module (nn.Module)

Examples

>>> m = prune.custom_from_mask(
        nn.Linear(5, 3), name='bias', mask=torch.Tensor([0, 1, 0])
    )
>>> print(m.bias_mask)
tensor([0., 1., 0.])

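remove

torch.nn.utils.prune.remove(module, name)

Removes the pruning reparameterization from a module and the pruning method from the forward hook. The pruned parameter named name remains permanently pruned, and the parameter named name+'_orig' is removed from the parameter list. Similarly, the buffer named name+'_mask' is removed from the buffers.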

Note

Pruning itself is NOT undone or reversed!

Parameters

module (nn.Module) – module containing the tensor to prune
name (str) – parameter name within module on which pruning will act.

Examples

>>> m = prune.random_unstructured(nn.Linear(5, 7), name='weight', amount=0.2)
>>> m = prune.remove(m, name='weight')
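The effect on the module's bookkeeping can be checked before and after the call. A minimal sketch using prune.random_unstructured as the pruning step; the helper variable names are illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

m = prune.random_unstructured(nn.Linear(5, 7), name='weight', amount=0.2)
before = sorted(name for name, _ in m.named_parameters())

prune.remove(m, 'weight')
after = sorted(name for name, _ in m.named_parameters())
buffers_after = [name for name, _ in m.named_buffers()]
zeros_after = int((m.weight == 0).sum())

print(before)         # ['bias', 'weight_orig']
print(after)          # ['bias', 'weight']
print(buffers_after)  # []
print(zeros_after)    # the pruned entries stay at zero
```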

is_pruned

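torch.nn.utils.prune.is_pruned(module)

Checks whether module is pruned by looking for forward_pre_hooks in its modules that inherit from BasePruningMethod.

Parameters

module (nn.Module) – object that is either pruned or unpruned

Returns

binary answer to whether module is pruned.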

Examples

>>> m = nn.Linear(5, 7)
>>> print(prune.is_pruned(m))
False
>>> m = prune.random_unstructured(m, name='weight', amount=0.2)
>>> print(prune.is_pruned(m))
True

weight_norm

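torch.nn.utils.weight_norm(module, name='weight', dim=0)

Applies weight normalization to a parameter in the given module.

Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by name (e.g. 'weight') with two parameters: one specifying the magnitude (e.g. 'weight_g') and one specifying the direction (e.g. 'weight_v'). Weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every forward() call.

By default, with dim=0, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use dim=None.

See https://arxiv.org/abs/1602.07868

Parameters

module (Module) – containing module
name (str, optional) – name of weight parameter
dim (int, optional) – dimension over which to compute the norm

Returns

The original module with the weight norm hook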

Example:

>>> m = weight_norm(nn.Linear(20, 40), name='weight')
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_g.size()
torch.Size([40, 1])
>>> m.weight_v.size()
torch.Size([40, 20])
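The relation between the weight and its two new parameters can be checked numerically. This sketch assumes the usual weight normalization formula w = g * v / ||v||, with the norm taken per output row because dim=0; the name rebuilt is illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

m = weight_norm(nn.Linear(20, 40), name='weight')  # dim=0 by default

# Per-row norm of the direction tensor, kept as shape (40, 1) for broadcasting.
v_norm = m.weight_v.norm(dim=1, keepdim=True)
rebuilt = m.weight_g * m.weight_v / v_norm

print(torch.allclose(m.weight, rebuilt, atol=1e-6))
```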

remove_weight_norm

torch.nn.utils.remove_weight_norm(module, name='weight')

Removes the weight normalization reparameterization from a module.

Parameters

module (Module) – containing module
name (str, optional) – name of weight parameter

Example

>>> m = weight_norm(nn.Linear(20, 40))
>>> remove_weight_norm(m)

spectral_norm

torch.nn.utils.spectral_norm(module, name='weight', n_power_iterations=1, eps=1e-12, dim=None)

Applies spectral normalization to a parameter in the given module.

Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with the spectral norm of the weight matrix, calculated using the power iteration method. If the weight tensor has more than 2 dimensions, it is reshaped to 2D in the power iteration method to get the spectral norm. This is implemented via a hook that calculates the spectral norm and rescales the weight before every forward() call.

See Spectral Normalization for Generative Adversarial Networks.

Parameters

module (nn.Module) – containing module
name (str, optional) – name of weight parameter
n_power_iterations (int, optional) – number of power iterations to calculate spectral norm
eps (float, optional) – epsilon for numerical stability in calculating norms
dim (int, optional) – dimension corresponding to number of outputs; the default is 0, except for modules that are instances of ConvTranspose{1,2,3}d, when it is 1

Returns

The original module with the spectral norm hook

Example:

>>> m = spectral_norm(nn.Linear(20, 40))
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_u.size()
torch.Size([40])
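The rescaling performed by the hook can be reproduced by hand after one forward call. This is a sketch under assumptions: it relies on the attribute names weight_orig and weight_v, which are implementation details of this function rather than something documented above, and it treats sigma = u^T W v as the power-iteration estimate of the spectral norm.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

m = spectral_norm(nn.Linear(20, 40))
_ = m(torch.randn(3, 20))  # the pre-forward hook rescales the weight here

with torch.no_grad():
    # Power-iteration estimate of the largest singular value: sigma = u^T W v
    sigma = torch.dot(m.weight_u, torch.mv(m.weight_orig, m.weight_v))
    rebuilt = m.weight_orig / sigma

print(torch.allclose(m.weight, rebuilt, atol=1e-6))
```

With the default n_power_iterations=1 the estimate is only approximate at first and tends to improve over successive forward calls in training mode.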

remove_spectral_norm

torch.nn.utils.remove_spectral_norm(module, name='weight')

Removes the spectral normalization reparameterization from a module.

Parameters

module (Module) – containing module
name (str, optional) – name of weight parameter

Example

>>> m = spectral_norm(nn.Linear(40, 10))
>>> remove_spectral_norm(m)

PackedSequence

torch.nn.utils.rnn.PackedSequence(data, batch_sizes=None, sorted_indices=None, unsorted_indices=None)

Holds the data and list of batch_sizes of a packed sequence.

All RNN modules accept packed sequences as inputs.

Note

Instances of this class should never be created manually. They are meant to be instantiated by functions like pack_padded_sequence().

Batch sizes represent the number of elements at each sequence step in the batch, not the varying sequence lengths passed to pack_padded_sequence(). For instance, given data abc and x, the PackedSequence would contain data axbc with batch_sizes=[2,1,1].
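The abc / x case can be reproduced with small integer tensors. A minimal sketch in which the numbers 1, 2, 3 stand in for a, b, c and 9 stands in for x.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

abc = torch.tensor([1, 2, 3])  # stands in for the sequence "abc"
x = torch.tensor([9])          # stands in for the sequence "x"

packed = pack_sequence([abc, x])
print(packed.data)         # tensor([1, 9, 2, 3]), i.e. "a x b c"
print(packed.batch_sizes)  # tensor([2, 1, 1])
```

At the first step both sequences contribute an element, so the batch size is 2; only the longer sequence remains for the last two steps.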

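Variables

PackedSequence.data (Tensor) – Tensor containing the packed sequence
PackedSequence.batch_sizes (Tensor) – Tensor of integers holding information about the batch size at each sequence step
PackedSequence.sorted_indices (Tensor, optional) – Tensor of integers holding how this PackedSequence is constructed from sequences.
PackedSequence.unsorted_indices (Tensor, optional) – Tensor of integers holding how to recover the original sequences in the correct order.

Note

data can be on an arbitrary device and of an arbitrary dtype. sorted_indices and unsorted_indices must be torch.int64 tensors on the same device as data.

However, batch_sizes should always be a CPU torch.int64 tensor.

This invariant is maintained throughout the PackedSequence class, and by all functions that construct a PackedSequence in PyTorch (i.e., they only pass in tensors conforming to this constraint).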

pack_padded_sequence

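torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)

Packs a Tensor containing padded sequences of variable length.

input can be of size T x B x *, where T is the length of the longest sequence (equal to lengths[0]), B is the batch size, and * is any number of dimensions (including 0). If batch_first is True, B x T x * input is expected.

For unsorted sequences, use enforce_sorted = False. If enforce_sorted is True, the sequences should be sorted by length in decreasing order, i.e. input[:,0] should be the longest sequence and input[:,B-1] the shortest one. enforce_sorted = True is only necessary for ONNX export.

Note

This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a PackedSequence object by accessing its .data attribute.

Parameters

input (Tensor) – padded batch of variable length sequences.
lengths (Tensor) – list of sequence lengths of each batch element.
batch_first (bool, optional) – if True, the input is expected in B x T x * format.
enforce_sorted (bool, optional) – if True, the input is expected to contain sequences sorted by length in decreasing order. If False, this condition is not checked. Default: True.

Returns

a PackedSequence object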
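For a batch whose rows are not sorted by length, enforce_sorted=False lets the function sort internally and record the permutation. A small sketch with made-up integer data in batch-first layout.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Batch-first padded batch; rows are NOT sorted by length.
padded = torch.tensor([[4, 5, 0],
                       [1, 2, 3],
                       [6, 0, 0]])
lengths = torch.tensor([2, 3, 1])

packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
print(packed.data)            # tensor([1, 4, 6, 2, 5, 3])
print(packed.batch_sizes)     # tensor([3, 2, 1])
print(packed.sorted_indices)  # tensor([1, 0, 2])
```

The padding zeros do not appear in packed.data; only the first lengths[i] entries of each row are kept.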

pad_packed_sequence

torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)

Pads a packed batch of variable length sequences.

It is the inverse operation of pack_padded_sequence().

The returned Tensor's data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into B x T x * format.

Batch elements will be ordered decreasingly by their length.

Note

total_length is useful to implement the pack sequence -> recurrent network -> unpack sequence pattern in a Module wrapped in DataParallel. See this FAQ section for details.

Parameters

sequence (PackedSequence) – batch to pad
batch_first (bool, optional) – if True, the output will be in B x T x * format.
padding_value (float, optional) – values for padded elements.
total_length (int, optional) – if not None, the output will be padded to have length total_length. This method will throw ValueError if total_length is less than the max sequence length in sequence.

Returns

Tuple of a Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch.
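A round trip through pack_sequence and back shows the two returned values and the effect of total_length. A minimal sketch with made-up float sequences that are already sorted by decreasing length.

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

packed = pack_sequence([torch.tensor([1., 2., 3.]),
                        torch.tensor([4., 5.]),
                        torch.tensor([6.])])

# Pad every sequence out to length 5 instead of the longest length (3).
padded, lengths = pad_packed_sequence(packed, batch_first=True, total_length=5)
print(padded.shape)  # torch.Size([3, 5])
print(padded[1])     # tensor([4., 5., 0., 0., 0.])
print(lengths)       # tensor([3, 2, 1])
```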

pad_sequence

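torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0)

Pads a list of variable length Tensors with padding_value.

pad_sequence stacks a list of Tensors along a new dimension and pads them to equal length. For example, if the input is a list of sequences each of size L x *, the output is of size T x B x * if batch_first is False, and B x T x * otherwise.

B is the batch size. It is equal to the number of elements in sequences. T is the length of the longest sequence. L is the length of a sequence. * is any number of trailing dimensions, including none.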

Example

>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])

Note

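This function returns a Tensor of size T x B x * or B x T x *, where T is the length of the longest sequence. It assumes that the trailing dimensions and type of all the Tensors in sequences are the same.

Parameters

sequences (list[Tensor]) – list of variable length sequences.
batch_first (bool, optional) – if True, the output will be in B x T x * format, T x B x * otherwise
padding_value (float, optional) – value for padded elements. Default: 0

Returns

Tensor of size T x B x * if batch_first is False. Tensor of size B x T x * otherwise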
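A short sketch of the batch-first layout with a non-default padding value; the sequences and the value -1 are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

# One row per sequence (B x T); shorter rows are filled with -1.
batch = pad_sequence(seqs, batch_first=True, padding_value=-1)
print(batch)
```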

pack_sequence

torch.nn.utils.rnn.pack_sequence(sequences, enforce_sorted=True)

Packs a list of variable length Tensors.

sequences should be a list of Tensors of size L x *, where L is the length of a sequence and * is any number of trailing dimensions, including zero.

For unsorted sequences, use enforce_sorted = False. If enforce_sorted is True, the sequences should be sorted in the order of decreasing length. enforce_sorted = True is only necessary for ONNX export.

Example

>>> from torch.nn.utils.rnn import pack_sequence
>>> a = torch.tensor([1,2,3])
>>> b = torch.tensor([4,5])
>>> c = torch.tensor([6])
>>> pack_sequence([a, b, c])
PackedSequence(data=tensor([ 1,  4,  6,  2,  5,  3]), batch_sizes=tensor([ 3,  2,  1]))

Parameters

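sequences (list[Tensor]) – a list of sequences of decreasing length.
enforce_sorted (bool, optional) – if True, checks that the input contains sequences sorted by length in decreasing order. If False, this condition is not checked. Default: True.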

Returns

a PackedSequence object
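When the list is not sorted by length, enforce_sorted=False packs it anyway and records how to undo the sort. A minimal sketch reusing the tensors a, b, c from the example above in a shuffled order.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5])
c = torch.tensor([6])

# The shortest sequence comes first, so the input is not sorted by length.
packed = pack_sequence([c, a, b], enforce_sorted=False)
print(packed.data)              # tensor([1, 4, 6, 2, 5, 3])
print(packed.sorted_indices)    # tensor([1, 2, 0])
print(packed.unsorted_indices)  # tensor([2, 0, 1])
```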

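Flatten

class torch.nn.Flatten(start_dim=1, end_dim=-1)

Flattens a contiguous range of dims into a tensor. For use with Sequential.

Parameters

start_dim – first dim to flatten (default = 1).
end_dim – last dim to flatten (default = -1).

Shape:

Input: (N, *dims)
Output: (N, ∏ *dims) (for the default case).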
Examples::
>>> m = nn.Sequential(
>>>     nn.Conv2d(1, 32, 5, 1, 1),
>>>     nn.Flatten()
>>> )
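Running the model above on a batch makes the default flattening visible. A small sketch, assuming a 28 x 28 single-channel input, which is not specified in the example itself.

```python
import torch
import torch.nn as nn

m = nn.Sequential(
    nn.Conv2d(1, 32, 5, 1, 1),  # 28x28 input -> 26x26 output per channel
    nn.Flatten(),
)
out = m(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 21632]), i.e. 32 * 26 * 26 features
```

The batch dimension is preserved because start_dim defaults to 1.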

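Quantized Functions

Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per tensor and per channel asymmetric linear quantization. To learn more about how to use quantized functions in PyTorch, please refer to the Quantization documentation.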
