MindSpore反向传播配置关键字参数

DechinPhy

发布于 2024-05-10 07:56:40

810

发布于 2024-05-10 07:56:40

技术背景

在MindSpore深度学习框架中，我们可以向construct函数传输必备参数或者关键字参数，这跟普通的Python函数没有什么区别。但是对于MindSpore中的自定义反向传播bprop函数，因为标准化格式决定了最后的两位函数输入必须是必备参数out和dout用于接收函数值和导数值。那么对于一个自定义的反向传播函数而言，我们有可能要传入多个参数。例如这样的一个案例：

import mindspore as ms
from mindspore import nn, Tensor, value_and_grad
from mindspore import numpy as msnp

class Net(nn.Cell):
    def bprop(self, x, y=1, out, dout):
        return msnp.cos(x) + y
    def construct(self, x, y=1):
        return msnp.sin(x) + y

x = Tensor([3.14], ms.float32)
net = Net()
print (net(x, y=1), value_and_grad(net)(x, y=0))

但是因为在Python的函数传参规则下，必备参数必须放在关键字参数之前，也就是out和dout这两个参数要放在前面，否则就会出现这样的报错：

  File "test_rand.py", line 53
    def bprop(self, x, y=1, out, dout):
             ^
SyntaxError: non-default argument follows default argument

按照普通Python函数的传参规则，我们可以把y这个关键字参数的放到最后面去：

import mindspore as ms
from mindspore import nn, Tensor, value_and_grad
from mindspore import numpy as msnp

class Net(nn.Cell):
    def bprop(self, x, out, dout, y=1):
        return msnp.cos(x) + y
    def construct(self, x, y=1):
        return msnp.sin(x) + y

x = Tensor([3.14], ms.float32)
net = Net()
print (net(x, y=1), value_and_grad(net)(x, y=0))

经过这一番调整之后，我们发现没有报错了，可以正常输出结果，但是这个结果似乎不太正常：

[1.0015925] (Tensor(shape=[1], dtype=Float32, value= [ 1.59254798e-03]), Tensor(shape=[1], dtype=Float32, value= [ 1.25169754e-06]))

因为这里x传入了一个近似的

\pi

，所以在construct函数计算函数值时，得到的结果应该是

\sin(\pi)+y

，那么这里面

取

和

所得到的结果都是对的。但是关键问题在反向传播函数的计算，原本应该是

\cos(\pi)+y=y-1

，但是在这里输入的

y=0

，而导数的计算结果却是

而不是正确结果

-1

。这就说明，在MindSpore的自定义反向传播函数中，并不支持传入关键字参数。

解决方案

刚好前面写了一篇关于PyTorch的文章，这篇文章中提到的两个Issue就针对此类问题。受到这两个Issue的启发，我们在MindSpore中如果需要自定义反向传播函数，可以这么写：

import mindspore as ms
from mindspore import nn, Tensor, value_and_grad
from mindspore import numpy as msnp

class Net(nn.Cell):
    def bprop(self, x, y, out, dout):
        return msnp.cos(x) + y if y is not None else msnp.cos(x)
    def construct(self, x, y=1):
        return msnp.sin(x) + y

x = Tensor([3.14], ms.float32)
net = Net()
print (net(x, y=1), value_and_grad(net)(x, y=0))

简单来说就是，把原本要传给bprop的关键字参数，转换成必备参数的方式进行传入，然后做一个条件判断：当给定了该输入的时候，执行计算一，如果不给定参数值，或者给一个None，执行计算二。上述代码的执行结果如下所示：

[1.0015925] (Tensor(shape=[1], dtype=Float32, value= [ 1.59254798e-03]), Tensor(shape=[1], dtype=Float32, value= [-9.99998748e-01]))

这里输出的结果都是正确的。

当然，这里因为我们其实是强行把关键字参数按照顺序变成了必备参数进行输入，所以在顺序上一定要严格遵守bprop所定义的必备参数的顺序，否则计算结果也会出错：

import mindspore as ms
from mindspore import nn, Tensor, value_and_grad
from mindspore import numpy as msnp

class Net(nn.Cell):
    def bprop(self, x, w, y, out, dout):
        return w*msnp.cos(x) + y if y is not None else msnp.cos(x)
    def construct(self, x, w=1, y=1):
        return msnp.sin(x) + y

x = Tensor([3.14], ms.float32)
net = Net()
print (net(x, y=1), value_and_grad(net)(x, y=0, w=2))

输出的结果为：

[1.0015925] (Tensor(shape=[1], dtype=Float32, value= [ 1.59254798e-03]), Tensor(shape=[1], dtype=Float32, value= [ 2.00000000e+00]))

那么很显然，这个结果就是因为在执行函数时给定的关键字参数跟必备参数顺序不一致，所以才出错的。

另外还有一个缺陷是，如果我们在传参给bprop的时候传递了一个None参数，那么不会使用construct函数中的缺省值，这需要我们自己手动设定了：

import mindspore as ms
from mindspore import nn, Tensor, value_and_grad
from mindspore import numpy as msnp

class Net(nn.Cell):
    def bprop(self, x, w, y, out, dout):
        return w*msnp.cos(x) + y if y is not None else w*msnp.cos(x)
    def construct(self, x, w=1, y=1):
        return msnp.sin(x) + y if y is not None else msnp.sin(x)

x = Tensor([3.14], ms.float32)
net = Net()
print (net(x, y=1), value_and_grad(net)(x, w=2, y=None))

输出结果为：

[1.0015925] (Tensor(shape=[1], dtype=Float32, value= [ 1.59254798e-03]), Tensor(shape=[1], dtype=Float32, value= [-1.99999750e+00]))

这个结果意味着，在bprop函数中执行时

参数的值是None而不是construct函数中的缺省值

。但是就目前来说，只有这一个方法可以允许我们向bprop函数传递关键字参数。

总结概要

继上一篇文章从Torch的两个Issue中找到一些类似的问题之后，可以发现深度学习框架对于自定义反向传播函数中的传参还是比较依赖于必备参数，而不是关键字参数，MindSpore深度学习框架也是如此。但是我们可以使用一些临时的解决方案，对此问题进行一定程度上的规避，只要能够自定义的传参顺序传入关键字参数即可。