Last updated: 2 years ago

Resources

[[深度学习框架] PyTorch 常用代码段总结 _ 极市高质量视觉算法开发者社区 (2020-05-04 ).html](contents[深度学习框架] PyTorch 常用代码段总结 _ 极市高质量视觉算法开发者社区 (2020-05-04 ).html)
PyTorch: Introduction and Practice (七月在线 / July Online, instructor Chu)
- two_layer_neural_net.html
- 第五课CNN-Image-Classification.html
Deep Learning and PyTorch Hands-on Tutorial (AI 101 Academy, Long Liangqu)
[PyTorch 模型训练实用教程.pdf](contents\PyTorch 模型训练实用教程.pdf) (very comprehensive)

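Basics

Creating tensors
- torch.rand(5,3)
  
  torch.zeros(5,3,dtype=torch.long) (the dtype is a torch dtype, not quite the same as in NumPy)
  or torch.zeros(5,3).long()
- Build a tensor from existing data: x = torch.tensor([5.5,3]). The calls below reuse properties of the existing tensor (such as dtype and device) unless they are overridden
  
  x.new_ones(5,3)
  
  torch.randn_like(x,dtype=torch.float)
x.shape and x.size() both return the size of x
==In-place operations== (names ending in _, e.g. x.add_(1)) modify the tensor directly
Reshape a tensor: x.view()
Convert a one-element tensor to a Python number: x.item()
Converting between tensors and NumPy arrays

b = a.numpy() <-> a = torch.from_numpy(b); here a and b share memory (for CPU tensors)
Addition: a = a + 1 and np.add(a, 1, out=a) behave differently. The former allocates new memory for the result, the latter modifies a in place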
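
The memory-sharing behavior above can be checked with a small sketch. It assumes CPU tensors; the variable names are illustrative only.

```python
import numpy as np
import torch

a = torch.ones(3)
b = a.numpy()                 # b is a NumPy view of the same CPU memory
a.add_(1)                     # in-place op: the change is visible through b
seen_through_b = b.tolist()

a = a + 1                     # out-of-place: the name a is rebound to a new tensor
new_a = a.tolist()
b_after_rebind = b.tolist()   # b still points at the old memory

c = np.zeros(2)
t = torch.from_numpy(c)       # t shares memory with c
np.add(c, 1, out=c)           # in-place on the NumPy side
seen_through_t = t.tolist()

print(seen_through_b, new_a, b_after_rebind, seen_through_t)
```

After `a = a + 1` the array `b` no longer follows `a`, which is the practical difference between the two forms of addition.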

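CUDA

if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # move an existing tensor to the GPU
    z = x + y
    print(z.to("cpu", torch.double))       # .to() can change device and dtype together

A tensor on the GPU cannot be converted to NumPy directly; move it to the CPU first: y.to("cpu").data.numpy() or y.cpu().data.numpy() (y.detach().cpu().numpy() is the more current spelling)
- .cuda() can also be used to move a tensor to the GPU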

Data augmentation

[PyTorch 学习笔记（三）：transforms的二十二个方法_人工智能_TensorSense的博客-CSDN博客 (2020-05-06 ).html](contents\PyTorch 学习笔记（三）：transforms的二十二个方法_人工智能_TensorSense的博客-CSDN博客 (2020-05-06 ).html)

Loading a custom dataset

[PyTorch 学习笔记（一）：让PyTorch读取你的数据集 - 知乎 (2020-05-06 ).html](contents\PyTorch 学习笔记（一）：让PyTorch读取你的数据集 - 知乎 (2020-05-06 ).html)

[PyTorch 中自定义数据集的读取方法小结-PyTorch 中文网 (2020-05-06 ).html](contents\PyTorch 中自定义数据集的读取方法小结-PyTorch 中文网 (2020-05-06 ).html)

Splitting into training and test sets
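
The notes leave this section empty. One common approach, shown here only as a sketch, is `torch.utils.data.random_split`; the toy dataset, the 80/20 ratio and the seed are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset standing in for a real one: 100 samples, 3 features, binary labels
full = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

n_train = int(0.8 * len(full))
train_set, test_set = random_split(
    full,
    [n_train, len(full) - n_train],
    generator=torch.Generator().manual_seed(0),  # fixed seed for a reproducible split
)
print(len(train_set), len(test_set))
```

Both results are `Subset` objects and can be passed to a `DataLoader` like any other dataset.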

Custom Module classes

[[深度学习框架] PyTorch 常用代码段总结 _ 极市高质量视觉算法开发者社区 (2020-05-04 ).html](contents[深度学习框架] PyTorch 常用代码段总结 _ 极市高质量视觉算法开发者社区 (2020-05-04 ).html)

two_layer_neural_net.html

The forward method of a Module class

Saving and loading models

[Pytorch 保存模型与加载模型 - 知乎 (2020-05-06 ).html](contents\Pytorch 保存模型与加载模型 - 知乎 (2020-05-06 ).html)

Difference between parameters() and state_dict()
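
A minimal sketch of the difference, using a tiny illustrative network: `parameters()` yields only the learnable tensors, while `state_dict()` also contains buffers such as the BatchNorm running statistics.

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 2), nn.BatchNorm1d(2))

# Learnable tensors only (what an optimizer receives)
param_names = [name for name, _ in net.named_parameters()]

# Everything needed to restore the model: parameters plus buffers
state_keys = list(net.state_dict().keys())

buffers_only = [k for k in state_keys if k not in param_names]
print(param_names)
print(buffers_only)
```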

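Loading a model in PyTorch

Load only the overlapping part
The pretrained model has modules that the current model does not have
The current model has a few more layers than the pretrained model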
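
A sketch of loading only the overlapping weights. The two small networks are stand-ins for a real checkpoint and a real model, and filtering by key name and tensor shape is one possible criterion rather than the only one.

```python
import torch
import torch.nn as nn

# Stand-ins: "pretrained" plays the role of a loaded checkpoint
pretrained = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))
current = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 5), nn.Linear(5, 2))

pretrained_dict = pretrained.state_dict()
model_dict = current.state_dict()

# Keep only entries whose name and shape both match the current model
overlap = {k: v for k, v in pretrained_dict.items()
           if k in model_dict and v.shape == model_dict[k].shape}

model_dict.update(overlap)           # overwrite the matching entries
current.load_state_dict(model_dict)  # the remaining layers keep their own init
loaded_keys = sorted(overlap)
print(loaded_keys)
```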

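Weight initialization:

PyTorch parameter layers (Linear, Conv2d, BatchNorm, etc.) can be used right after being defined in __init__ without manual initialization, because PyTorch applies a default initialization to each of them. (==To apply a custom scheme, call self._initialize_weights() in __init__, so that the method runs when the class is instantiated==)

Parameter initialization code (for Conv2d this is He/Kaiming-style normal initialization with std = sqrt(2/n), although it is often labeled Xavier):

def _initialize_weights(self):
    # requires: import math; import torch.nn as nn
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            # n is the fan-out: kernel height * kernel width * output channels
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
            if m.bias is not None:
                m.bias.data.zero_()
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()
        elif isinstance(m, nn.Linear):
            m.weight.data.normal_(0, 0.01)
            m.bias.data.zero_()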

Choosing an optimizer

[pytorch中使用torch.optim优化神经网络以及优化器的选择 - pytorch中文网 (2020-05-06 ).html](contents\pytorch中使用torch.optim优化神经网络以及优化器的选择 - pytorch中文网 (2020-05-06 ).html)

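momentum: the momentum factor. Each update combines the current gradient with the direction of the previous updates, which acts like inertia and can help the optimizer get past some local minima
The Adam optimizer maintains its own momentum estimate, so it has no momentum argument (the corresponding decay rate is the first value of betas)
Regularization
- It reduces the effective complexity of the network. It should not be added when there is no overfitting, since it lowers the expressive capacity of the network
- L2 norm: just set the weight_decay argument of the torch optimizer
- L1 norm: the optimizers have no argument for it; the penalty has to be added to the loss by hand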
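
A minimal sketch of both kinds of regularization. The model, the data and the coefficient `l1_lambda` are hypothetical; L2 goes through `weight_decay`, L1 is added to the loss manually.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(10, 1)
# L2 regularization: handled by the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
l1_lambda = 1e-3  # hypothetical L1 coefficient

x = torch.randn(32, 10)
y = torch.randn(32, 1)

data_loss = F.mse_loss(model(x), y)
# L1 regularization: sum of absolute values of all parameters
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = data_loss + l1_lambda * l1_penalty

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(data_loss.item(), loss.item())
```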

Learning-rate scheduling

[PyTorch 学习笔记（八）：PyTorch的六个学习率调整方法 - 知乎 (2020-05-06 ).html](contents\PyTorch 学习笔记（八）：PyTorch的六个学习率调整方法 - 知乎 (2020-05-06 ).html)

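The article above describes six learning-rate schedulers provided by PyTorch, in three groups:

ordered adjustment;
adaptive adjustment;
custom adjustment.

The first group adjusts the learning rate according to a fixed rule and is the most commonly used: equal-interval decay (StepLR), decay at chosen milestones (MultiStepLR), exponential decay (ExponentialLR) and CosineAnnealingLR. With all four, the timing of the adjustment is under manual control, and they are the ones most often used in training.

The second group adjusts according to how training is going: ReduceLROnPlateau. It monitors a metric and lowers the learning rate once that metric has stopped improving, so it counts as adaptive.

The third group is custom adjustment: LambdaLR. Its strategy is very flexible. Different parameter groups (for example different layers) can be given different schedules, which is very useful for fine-tuning, where we may want not only different learning rates for different layers but also different adjustment policies.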
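
A small sketch comparing an ordered scheduler (StepLR) with a custom one (LambdaLR). The helper `lr_trace`, the base learning rate and the decay factors are made up for illustration; the training step itself is omitted.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR, LambdaLR

def lr_trace(make_scheduler, epochs=4):
    """Record the learning rate used in each epoch (hypothetical helper)."""
    model = nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = make_scheduler(optimizer)
    trace = []
    for _ in range(epochs):
        trace.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # a real loop would compute a loss and call backward() first
        scheduler.step()   # advance the schedule once per epoch
    return trace

# Ordered: multiply the learning rate by 0.1 every 2 epochs
step_trace = lr_trace(lambda opt: StepLR(opt, step_size=2, gamma=0.1))
# Custom: learning rate = base_lr * 0.5 ** epoch
lambda_trace = lr_trace(lambda opt: LambdaLR(opt, lr_lambda=lambda epoch: 0.5 ** epoch))
print(step_trace, lambda_trace)
```

With LambdaLR, passing a list of functions (one per parameter group) gives each group its own policy.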

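Using BN layers

How BN works
In practice, what a BN layer learns through backpropagation are $\beta$ and $\gamma$, not the mean and variance (those are statistics computed from the data)

BN layers in torch
- running_mean and running_var are updated inside forward(); to stop updating them, call net.eval()
- The last step is also called the affine transform. It gives the network a way to restore the representation it had before normalization if training favors that (with $\gamma=1,\beta=0$ the affine step itself is the identity), so that adding BN should at least not hurt model performance. This is a common pattern in deep network design
- With affine=False, $\gamma=1,\beta=0$ are fixed and are not learned. affine=True is the usual setting
- track_running_stats=True tracks the batch statistics over the whole training process to obtain running estimates of the mean and variance, instead of relying only on the statistics of the current input batch. Conversely, with track_running_stats=False only the mean and variance of the current batch are computed. At inference time with track_running_stats=False, a small batch_size gives statistics that can deviate a lot from the global ones, which may lead to poor results.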

torch.nn.BatchNorm1d(num_features,
                     eps=1e-05,
                     momentum=0.1,
                     affine=True,
                     track_running_stats=True)
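
The running-statistics behavior can be checked with a short sketch. The input data is random and shifted by 5 only to make the effect visible; with momentum=0.1 the update is running = 0.9 * running + 0.1 * batch_statistic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=3, momentum=0.1)
x = torch.randn(16, 3) + 5.0

bn.train()
bn(x)                                   # forward in train mode updates the running stats
mean_after_train = bn.running_mean.clone()
expected = 0.9 * torch.zeros(3) + 0.1 * x.mean(dim=0)   # running_mean starts at 0

bn.eval()
bn(x)                                   # forward in eval mode leaves them untouched
mean_after_eval = bn.running_mean.clone()

print(mean_after_train, mean_after_eval)
print(bn.weight.data, bn.bias.data)     # gamma starts at 1, beta at 0
```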

Fixing the random seed

https://www.jianshu.com/p/1b9e18146045

import random
import numpy as np
import torch

seed = 0  # any fixed integer
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True

Converting a PyTorch model to ONNX

https://blog.csdn.net/zxgmlcj/article/details/103279846

https://microsoft.github.io/onnxruntime/python/tutorial.html

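Visualization

TensorBoard (to be continued)

visdom

Installation:
- pip install visdom
- or install from source, which reportedly avoids some odd errors (enter the visdom-master directory and run pip install -e .)
Usage:
1. Run python -m visdom.server on the command line
2. Open the page as prompted: http://localhost:8097
3. Add the code below
  - a single curve
  - multiple curves
  - images and text

visdom code block

from visdom import Visdom

# Note: without type=..., any value passed on the command line is a non-empty string and therefore truthy
parser.add_argument('--visualize', default=True, help='whether need visdom')

# Visualization: create the window once, before the training loop
if args.visualize:
    viz = Visdom()
    viz.line([[0.0, 0.0]], [0.], win='train',
             opts=dict(title='loc&conf loss', legend=['loc_loss', 'conf_loss']))
# Visualization: append one point per iteration, inside the training loop
if args.visualize:
    viz.line([[loc_loss, conf_loss]], [iteration], win='train', update='append')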

Parameter statistics

Counting parameters with torchsummary

from torchsummary import summary
summary(net, (3, 300, 300))

Computing FLOPs with pytorch-opcounter

from thop import profile
input = torch.randn(1, 3, 224, 224)
macs, params = profile(model, inputs=(input, ))
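
If only the parameter count is needed, it can also be obtained without extra packages. A minimal sketch; the helper name `count_parameters` and the example network are illustrative.

```python
import torch.nn as nn

def count_parameters(model, trainable_only=True):
    """Sum the element counts of the model's parameters (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters()
               if p.requires_grad or not trainable_only)

net = nn.Sequential(nn.Linear(1000, 100), nn.ReLU(), nn.Linear(100, 10))
total = count_parameters(net)
# (1000*100 + 100) + (100*10 + 10) parameters
print(total)
```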

Hands-on: a two-layer perceptron from scratch

two_layer_neural_net.html

NumPy implementation

import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create some training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

torch implementation (see the html)

torch + optimizer implementation (see the html)

Custom nn.Module implementation

import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Hands-on: a simple CNN (with a validation set)

第五课CNN-Image-Classification.html

Define the training function

def train(model, device, train_loader, optimizer, epoch, log_interval=100):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print("Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}".format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

Define the validation function

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

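Training + validation

import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

torch.manual_seed(53113)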

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
batch_size = test_batch_size = 32
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./mnist_data', train=False, transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=test_batch_size, shuffle=True, **kwargs)


lr = 0.01
momentum = 0.5
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

epochs = 2
for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

save_model = True
if (save_model):
    torch.save(model.state_dict(),"mnist_cnn.pt")
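
The script above uses `Net`, which is not defined in these notes. The following is only a sketch of a small MNIST CNN that fits the train/test functions (it returns log-probabilities, as `F.nll_loss` expects); the layer sizes are assumptions, not necessarily those of the course notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Assumed architecture: two conv layers followed by two linear layers."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 24x24 -> 12x12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 12x12 -> 8x8 -> 4x4
        x = x.view(x.size(0), -1)                   # flatten to (batch, 4*4*50)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)    # log-probabilities for nll_loss

# Shape check on a random batch of two 28x28 grayscale images
out = Net()(torch.randn(2, 1, 28, 28))
print(out.shape)
```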

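==Computing metrics with voc0712.py:==

1. Prepare the two files voc0712.py and voc_eval.py

2. Arrange the test data in the VOC dataset layout

- VOC2007 (the year in this folder name must match the year used in step 3)
  - Annotations (all annotation files)
  - ImageSets
    - Main
      - test.txt (each line is a file name without extension plus a difficult-example flag, 0 or 1, e.g. base_0001 1)
      - train.txt
  - JPEGImages (all image files)

3. Load the data

# load data
testset = VOCDetection(VOCroot, [('2007', 'test')], None, AnnotationTransform())

Change VOCroot to your own path, e.g. VOCroot = '/home/usr/VOC/VOCdevkit'; this directory must contain the VOC2007 folder from step 2. Replace the year 2007 and the txt name test with your own folder year and the name of the txt file in Main. (The folder may also be named VOC2010 and so on, but it must be VOC + digits; otherwise other code has to be changed as well.) AnnotationTransform is a class in voc0712.py that reads VOC-format xml annotation files.
VOCDetection is a class, and testset is an instance of it.
==Make sure VOCroot/annotations_cache/annots.pkl is correct. If in doubt, delete it; otherwise voc_eval raises an error when reading it.==

4. Run the evaluation

testset.evaluate_detections(all_boxes, save_folder)

evaluate_detections calls two helper methods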

def evaluate_detections(self, all_boxes, output_dir=None):
       """
       all_boxes is a list of length number-of-classes.
       Each list element is a list of length number-of-images.
       Each of those list elements is either an empty list []
       or a numpy array of detection.

       all_boxes[class][image] = [] or np.array of shape #dets x 5
       """
       self._write_voc_results_file(all_boxes)
       self._do_python_eval(output_dir)

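First, the _write_voc_results_file method
- VOC_CLASSES is defined at the top of voc0712.py and lists the classes to evaluate
  
  ```
  VOC_CLASSES = ('__background__',  # always index 0
                 'person')
  ```
- _get_voc_results_file_template defines filename, the path of the txt file that ==records the results==; the {:s} in filename is filled with the class name. self.root is the VOCroot from step 3, and filedir shows why the folder must be named VOC + digits
- self.ids is set when the class is initialized; its content is [(path of the VOC2007 folder, test image name), ...]. image_sets is the [('2007', 'test')] passed in step 3. test.txt is read here, which also shows that each line of test.txt should be image name + space + difficult flag. rootpath again shows why the folder must be named VOC + digits.
- index = index[1] is the image name (index[0] is the VOC2007 folder path), and im_ind is the position of the test image. This also matches the docstring of evaluate_detections in step 4: all_boxes has the form all_boxes[class][image] = [] or np.array of shape #dets x 5. It is produced by the detection code, and its order must match the image order. Each detection is finally written to the ==results== txt as: image name, confidence, xmin ymin xmax ymax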

def _write_voc_results_file(self, all_boxes):
    for cls_ind, cls in enumerate(VOC_CLASSES):
        if cls == '__background__':
            continue
        print('Writing {} VOC results file'.format(cls))
        filename = self._get_voc_results_file_template().format(cls)
        with open(filename, 'wt') as f:
            for im_ind, index in enumerate(self.ids):
                index = index[1]  # image name
                dets = all_boxes[cls_ind][im_ind]
                # len() works for both [] and an empty np.array
                if len(dets) == 0:
                    continue
                # one line per detection: name score xmin ymin xmax ymax
                # (+1 converts to VOC's 1-based pixel coordinates)
                for k in range(dets.shape[0]):
                    f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
                            format(index, dets[k, -1],
                                   dets[k, 0] + 1, dets[k, 1] + 1,
                                   dets[k, 2] + 1, dets[k, 3] + 1))

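Next, the _do_python_eval method
- rootpath here again shows why the folder must be named VOC + digits
- self.image_set is the second argument from step 3, [('2007', 'test')]
- Two different AP computations are used, depending on whether the folder year is before 2010 or not
- filename is the txt file that ==records the results==
- voc_eval computes rec, prec and ap; the per-class results are saved as pkl files under output_dir, which is an input argument
- The APs are collected in a list, and the mAP is computed and printed
- An annotations_cache directory is defined and passed to voc_eval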

def _do_python_eval(self, output_dir='output'):
       rootpath = os.path.join(self.root, 'VOC' + self._year)
       name = self.image_set[0][1]
       annopath = os.path.join(
           rootpath,
           'Annotations',
           '{:s}.xml')
       imagesetfile = os.path.join(
           rootpath,
           'ImageSets',
           'Main',
           name + '.txt')
       cachedir = os.path.join(self.root, 'annotations_cache')
       aps = []
       # The PASCAL VOC metric changed in 2010
       use_07_metric = True if int(self._year) < 2010 else False
       print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
       if output_dir is not None and not os.path.isdir(output_dir):
           os.mkdir(output_dir)
       for i, cls in enumerate(VOC_CLASSES):

           if cls == '__background__':
               continue

           filename = self._get_voc_results_file_template().format(cls)
           rec, prec, ap = voc_eval(
               filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
               use_07_metric=use_07_metric)
           aps += [ap]
           print('AP for {} = {:.4f}'.format(cls, ap))
           if output_dir is not None:
               with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f:
                   pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
       print('Mean AP = {:.4f}'.format(np.mean(aps)))
       print('~~~~~~~~')
       print('Results:')
       for ap in aps:
           print('{:.3f}'.format(ap))
       print('{:.3f}'.format(np.mean(aps)))
       print('~~~~~~~~')

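5. Other implementation details

The voc_eval function (from voc_eval.py)

Input parameters
- detpath: the txt that ==stores the results==. Each class has its own txt, and voc_eval is called once per class
- annopath = os.path.join(rootpath, 'Annotations', '{:s}.xml'): the xml path with the file name left as a placeholder
- imagesetfile: test.txt in Main
The annots.pkl file in cachedir = os.path.join(self.root, 'annotations_cache') is loaded. If it does not exist, the xml files are read in the order of the images in test.txt; parse_rec parses one xml file and returns a list of dicts, one per object. The results are stored in the recs dict, keyed by the test image names from test.txt.

Following the image order in test.txt, the object list of each image is read from recs, and the dicts whose 'name' equals the current class are collected in R. The 'bbox' values of all dicts in R go into bbox, and the 'difficult' values go into difficult. npos counts the non-difficult objects over all images. det is a boolean list of length len(R), initialized to False. bbox, difficult and det are then stored in the class_recs dict, keyed by image name.

In other words, the result of reading annots.pkl (or of parsing the xml files directly) is stored in recs, and the entries for the current class are then extracted into class_recs; both are organized image by image.
The txt that ==stores the results== is read next: image_ids holds the image names, confidence the scores and BB the boxes, in the same order as the txt. confidence and BB are np.arrays and image_ids is a list; BB and image_ids are then reordered by descending confidence.
The predicted boxes are read in sorted order. R takes the class_recs entry of the corresponding image, and all its ground-truth boxes 'bbox' are stored in BBGT. The IoU with every ground-truth box of that image is computed to decide between TP and FP; the boolean list 'det' marks whether the ground-truth box at each index has already been matched. fp and tp are set to 1 at the index of the corresponding prediction, and np.cumsum() then gives the cumulative fp and tp in order of confidence.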
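
The overlap step described above can be tried in isolation. This sketch follows the same formula as voc_eval (including the +1 pixel convention); the function name `iou_with_gt` and the boxes are made up for illustration.

```python
import numpy as np

def iou_with_gt(bb, BBGT):
    """IoU of one predicted box bb against every row of BBGT (xmin, ymin, xmax, ymax)."""
    # intersection rectangle
    ixmin = np.maximum(BBGT[:, 0], bb[0])
    iymin = np.maximum(BBGT[:, 1], bb[1])
    ixmax = np.minimum(BBGT[:, 2], bb[2])
    iymax = np.minimum(BBGT[:, 3], bb[3])
    iw = np.maximum(ixmax - ixmin + 1., 0.)
    ih = np.maximum(iymax - iymin + 1., 0.)
    inters = iw * ih
    # union = area(bb) + area(gt) - intersection
    uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
           (BBGT[:, 2] - BBGT[:, 0] + 1.) *
           (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
    return inters / uni

bb = np.array([0., 0., 9., 9.])                 # a 10x10 prediction
BBGT = np.array([[0., 0., 9., 9.],              # identical box
                 [5., 5., 14., 14.],            # 5x5 overlap
                 [20., 20., 29., 29.]])         # no overlap
overlaps = iou_with_gt(bb, BBGT)
jmax = int(np.argmax(overlaps))                 # index of the best-matching ground truth
print(overlaps, jmax)
```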

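np.finfo(np.float64).eps is the machine epsilon of float64 (the gap between 1.0 and the next representable number, about 2.2e-16), not the smallest positive value. It is used here as a tiny lower bound on the denominator to avoid a division-by-zero error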

def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])

    Top level function that does the PASCAL VOC evaluation.

    detpath: Path to detections
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    #imagenames = [x.strip() for x in lines]
    imagenames = []
    for line in lines:
        img_id, value = line.split()
        if value != '1':
            continue
        imagenames.append(img_id)

    if not os.path.isfile(cachefile):
        # load annots
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(
                    i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            recs = pickle.load(f)

    # extract gt objects for this class
    class_recs = {}
    npos = 0
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        bbox = np.array([x['bbox'] for x in R])
        # np.bool was removed in recent NumPy versions; the builtin bool is equivalent
        difficult = np.array([x['difficult'] for x in R]).astype(bool)
        det = [False] * len(R)
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]
    confidence = np.array([float(x[1]) for x in splitlines])
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])

        # sort by confidence
    sorted_ind = np.argsort(-confidence)
    sorted_scores = np.sort(-confidence)
    BB = BB[sorted_ind, :]
    image_ids = [image_ids[x] for x in sorted_ind]

        # go down dets and mark TPs and FPs
    nd = len(image_ids)
    tp = np.zeros(nd)
    fp = np.zeros(nd)
    for d in range(nd):
        R = class_recs[image_ids[d]]
        bb = BB[d, :].astype(float)
        ovmax = -np.inf
        BBGT = R['bbox'].astype(float)

        if BBGT.size > 0:
            # compute overlaps
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            ih = np.maximum(iymax - iymin + 1., 0.)
            inters = iw * ih

                # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

            overlaps = inters / uni
            ovmax = np.max(overlaps)
            jmax = np.argmax(overlaps)

        if ovmax > ovthresh:
            if not R['difficult'][jmax]:
                if not R['det'][jmax]:
                    tp[d] = 1.
                    R['det'][jmax] = 1
                else:
                    fp[d] = 1.
        else:
            fp[d] = 1.

        # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    rec = tp / float(npos)
        # avoid divide by zero in case the first detection matches a difficult
        # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap

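The voc_ap function (from voc_eval.py)

The inputs are rec and prec accumulated in order of confidence. Looking only at the method used after VOC07, it is an integration-like computation: find the positions where rec changes (these are the recall steps), multiply each step by the precision at that point, and sum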

# correct AP calculation
# first append sentinel values at both ends
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))

# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
    mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]

# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
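
A worked toy example of the area-based AP, wrapping the snippet above in a function. The name `voc_ap_area` and the detection outcomes are illustrative: 2 ground-truth boxes and 3 detections that, sorted by confidence, are TP, FP, TP.

```python
import numpy as np

def voc_ap_area(rec, prec):
    """Area under the precision envelope (the non-VOC07 branch)."""
    mrec = np.concatenate(([0.], rec, [1.]))
    mpre = np.concatenate(([0.], prec, [0.]))
    # make precision monotonically non-increasing from right to left
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
    # indices where recall changes value
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

tp = np.cumsum([1., 0., 1.])    # cumulative true positives
fp = np.cumsum([0., 1., 0.])    # cumulative false positives
npos = 2                        # number of non-difficult ground-truth boxes
rec = tp / npos                                             # [0.5, 0.5, 1.0]
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)   # [1.0, 0.5, 2/3]
ap = voc_ap_area(rec, prec)
# recall step 0 -> 0.5 at precision 1.0, step 0.5 -> 1.0 at precision 2/3
print(ap)
```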

Format of all_boxes

all_boxes[class][image] = [] or np.array of shape #dets x 5

num_images = len(testset)
num_classes = 2
all_boxes = [[[] for _ in range(num_images)]
                 for _ in range(num_classes)]
for i in tqdm(range(num_images)):
    img = testset.pull_image(i)
    # detection step omitted...

    for j in range(1, num_classes):
        inds = np.where(scores[:, j] > thresh)[0]
        if len(inds) == 0:
            all_boxes[j][i] = np.empty([0, 5], dtype=np.float32)
            continue
        # NMS and reshaping omitted...
        all_boxes[j][i] = c_dets

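==Pitfalls==

BN layer parameter issue

Since torch 0.4.1 a BN layer has three buffers: running_mean, running_var and num_batches_tracked

If the saved weights (an OrderedDict, which is easy to edit) lack the num_batches_tracked entries, add them; if they have extra ones, delete them. The lazy approach is to set the strict argument of load_state_dict to False, as shown below:

load_state_dict(torch.load(weight_path), strict=False)

Difference between .modules() and .children() in torch

Both children() and modules() return the components of a network, but children() returns only the outermost (direct) components, while modules() returns all of them, including sub-components at every level.

modules() returns a repeated module only once; this applies to module objects (the same instance), not to the basic layer types in torch.nn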
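
A small sketch of the difference, with an illustrative network in which the same Linear instance is used twice inside an inner block.

```python
import torch.nn as nn

shared = nn.Linear(2, 2)
inner = nn.Sequential(shared, nn.ReLU(), shared)   # the same instance appears twice
net = nn.Sequential(inner, nn.Linear(2, 1))

# children(): only the direct sub-modules of net
n_children = len(list(net.children()))
# modules(): net itself plus every nested module, each instance listed once
module_types = [type(m).__name__ for m in net.modules()]
n_modules = len(module_types)
print(n_children, module_types)
```

Here `modules()` yields net, inner, the shared Linear (once), ReLU and the final Linear.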

torchvision.transforms.ToTensor():

''' Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]. '''

onehot

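With the cross-entropy loss in PyTorch, the target is given as class indices and the loss takes care of the rest internally, so no manual one-hot conversion is needed; with MSE the labels have to be converted to one-hot encoding by hand.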
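
A minimal sketch with made-up logits: cross-entropy accepts integer class labels directly and matches the value computed by hand from one-hot labels, while MSE needs the one-hot tensor explicitly.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
target = torch.tensor([0, 2])            # class indices, not one-hot

# Cross-entropy takes the indices as they are
ce = F.cross_entropy(logits, target)

# The same value computed by hand from one-hot labels
one_hot = F.one_hot(target, num_classes=3).float()
manual_ce = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# MSE compares tensors of the same shape, so it needs the one-hot labels
mse = F.mse_loss(torch.softmax(logits, dim=1), one_hot)
print(ce.item(), manual_ce.item(), mse.item())
```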

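nn classes vs. F functions

nn.CrossEntropyLoss()(output, target): a class has to be instantiated with () first and then called with the arguments

F.nll_loss(output, target): a function is called directly

DataLoader length

len(Dataloader.dataset) ==total number of images==, len(Dataloader) ==number of batches==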
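
A quick check of the two lengths with a toy dataset of 10 samples and an assumed batch size of 4.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))
loader = DataLoader(dataset, batch_size=4)

n_samples = len(loader.dataset)   # total number of samples
n_batches = len(loader)           # ceil(10 / 4) batches, the last one is smaller

# With drop_last=True the incomplete final batch is discarded
n_batches_drop = len(DataLoader(dataset, batch_size=4, drop_last=True))
print(n_samples, n_batches, n_batches_drop)
```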

Installing only torchvision (without its dependencies)

pip install --no-deps torchvision==0.4.0
