Convert Pretrained Models

This document briefly explains how to convert pretrained models into the detrex format.

Convert TorchVision Pretrained ResNet Models

To use the pretrained weights provided by detectron2, please refer to ImageNet Pretrained Models. Note that detectron2 only provides a converted torchvision ResNet-50 model. For other pretrained models, such as ResNet-{101, 152}, you can use the conversion script provided by detectron2 to convert the torchvision pretrained weights into a format that detrex can use. The rest of this section explains how to use the conversion script.

Download Pretrained Weights

TorchVision 0.11.0 was released with improved pretrained weights for many models, including ResNet. More details can be found in How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives. The download commands for the TorchVision ResNet models are collected below.

Name                          Download                  Pretrain   Acc@1    Acc@5
ResNet-50 (ImageNet1k-V1)     wget -O r50_v1.pth        IN1k       76.130   92.862
ResNet-50 (ImageNet1k-V2)     wget -O r50_v2.pth        IN1k       80.858   95.434
ResNet-101 (ImageNet1k-V1)    wget -O r101_v1.pth       IN1k       77.374   93.546
ResNet-101 (ImageNet1k-V2)    wget -O r101_v2.pth       IN1k       81.886   95.780
ResNet-152 (ImageNet1k-V1)    wget -O r152_v1.pth       IN1k       78.312   94.046
ResNet-152 (ImageNet1k-V2)    wget -O r152_v2.pth       IN1k       82.284   96.002

Note: ImageNet1k-V1 refers to the original pretrained weights, and ImageNet1k-V2 refers to the newer weights with improved baseline results.
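Once several files have been downloaded and renamed, it can help to check which architecture a given state dict holds before converting it. The sketch below is not part of detrex or detectron2: `infer_resnet_depth` is a hypothetical helper that assumes the standard torchvision key layout (`layer1` to `layer4`, with `conv3` present only in bottleneck blocks) and counts the residual blocks per stage.

```python
import re

# Blocks per stage (layer1..layer4) for the standard torchvision ResNets.
# The boolean says whether the network uses bottleneck blocks.
_DEPTH_BY_LAYOUT = {
    ((2, 2, 2, 2), False): 18,
    ((3, 4, 6, 3), False): 34,
    ((3, 4, 6, 3), True): 50,
    ((3, 4, 23, 3), True): 101,
    ((3, 8, 36, 3), True): 152,
}

_BLOCK_KEY = re.compile(r"^layer([1-4])\.(\d+)\.")


def infer_resnet_depth(keys):
    """Guess the ResNet depth from torchvision state-dict key names.

    Hypothetical helper: returns None when the layout is not recognized.
    """
    blocks = {1: set(), 2: set(), 3: set(), 4: set()}
    bottleneck = False
    for key in keys:
        match = _BLOCK_KEY.match(key)
        if match is None:
            continue  # stem or fc parameter, not part of a residual stage
        blocks[int(match.group(1))].add(int(match.group(2)))
        if ".conv3." in key:
            bottleneck = True  # only bottleneck blocks have a third conv
    layout = tuple(len(blocks[i]) for i in (1, 2, 3, 4))
    return _DEPTH_BY_LAYOUT.get((layout, bottleneck))
```

With a real file, the keys would come from `torch.load("r101_v1.pth", map_location="cpu").keys()`, and the returned depth should match the depth later set in the training config.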

Run the Conversion

convert-torchvision-to-d2 (borrowed from detectron2)
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates.

import pickle as pkl
import sys
import torch

"""
Usage:
  # download one of the ResNet{18,34,50,101,152} models from torchvision:
  wget -O r50.pth
  # run the conversion
  ./ r50.pth r50.pkl
  # Then, use r50.pkl with the following changes in config:

MODEL:
  WEIGHTS: "/path/to/r50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False

  These models typically produce slightly worse results than the
  pre-trained ResNets we use in official configs, which are the
  original ResNet models released by MSRA.
"""

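if __name__ == "__main__":
    # sys.argv[1]: torchvision checkpoint (.pth); sys.argv[2]: output file (.pkl)
    input_path = sys.argv[1]

    # Load the torchvision state dict on CPU.
    obj = torch.load(input_path, map_location="cpu")

    newmodel = {}
    for k in list(obj.keys()):
        old_k = k
        # Keys outside layer1..layer4 (first conv/bn, fc) are placed under "stem.".
        if "layer" not in k:
            k = "stem." + k
        # torchvision layer1..layer4 correspond to detectron2 res2..res5.
        for t in [1, 2, 3, 4]:
            k = k.replace("layer{}".format(t), "res{}".format(t + 1))
        # detectron2 stores each norm layer as an attribute of its conv: bnN -> convN.norm.
        for t in [1, 2, 3]:
            k = k.replace("bn{}".format(t), "conv{}.norm".format(t))
        # The downsample branch (conv, bn) becomes shortcut and shortcut.norm.
        k = k.replace("downsample.0", "shortcut")
        k = k.replace("downsample.1", "shortcut.norm")
        print(old_k, "->", k)
        newmodel[k] = obj.pop(old_k).detach().numpy()

    # "matching_heuristics" asks the detectron2 checkpoint loader to match names heuristically.
    res = {"model": newmodel, "__author__": "torchvision", "matching_heuristics": True}

    with open(sys.argv[2], "wb") as f:
        pkl.dump(res, f)
    # Every key is popped in the loop, so anything left in obj was not converted.
    if obj:
        print("Unconverted keys:", obj.keys())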
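The renaming rules in the loop above can be easier to follow on a few concrete keys. The sketch below repeats the same string replacements in a standalone function so that the mapping can be tried without loading any weights. `rename_key` is a name introduced here for illustration; it is not part of detectron2.

```python
def rename_key(key):
    """Apply the same renaming rules as the conversion script to one key."""
    if "layer" not in key:
        key = "stem." + key
    for t in [1, 2, 3, 4]:
        key = key.replace("layer{}".format(t), "res{}".format(t + 1))
    for t in [1, 2, 3]:
        key = key.replace("bn{}".format(t), "conv{}.norm".format(t))
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


if __name__ == "__main__":
    examples = [
        "conv1.weight",
        "bn1.running_mean",
        "layer1.0.bn2.weight",
        "layer3.22.conv3.weight",
        "layer2.0.downsample.1.bias",
    ]
    for old in examples:
        print(old, "->", rename_key(old))
```

For instance, `layer1.0.bn2.weight` becomes `res2.0.conv2.norm.weight`, and `layer2.0.downsample.1.bias` becomes `res3.0.shortcut.norm.bias`.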

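First, save the script above to a file (convert-torchvision-to-d2.py is used here as an example name), then run:

# first argument: path to the downloaded pretrained weights
# second argument: where to save the converted weights
python convert-torchvision-to-d2.py \
    /path/to/r101_v1.pth \
    ./r101_v1.pkl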
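After the conversion, a quick look at the output file can confirm that it has the expected structure. The following sketch assumes only what the script above writes: a pickled dict with a `model` entry that maps names to arrays, plus the `__author__` and `matching_heuristics` fields. `summarize_converted` is a hypothetical helper name. Unpickling a real converted file also requires NumPy, since the weights are stored as NumPy arrays.

```python
import pickle
from collections import Counter


def summarize_converted(path):
    """Return basic facts about a converted .pkl checkpoint (hypothetical helper)."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)  # only unpickle files you created yourself
    names = list(ckpt["model"].keys())
    # Group parameter names by their first component: stem, res2, ..., res5.
    per_stage = Counter(name.split(".", 1)[0] for name in names)
    return {
        "author": ckpt.get("__author__"),
        "matching_heuristics": ckpt.get("matching_heuristics", False),
        "num_entries": len(names),
        "per_stage": dict(per_stage),
    }
```

For a ResNet file converted as above, the summary should report `torchvision` as the author and entries grouped under `stem` and `res2` to `res5`.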

Then update the training config:

# your own config file

train.init_checkpoint = "path/to/r101_v1.pkl"

# make sure that the model config is consistent 
# with the following settings
model.backbone.stages.depth = 101
model.pixel_mean = [123.675, 116.280, 103.530]
model.pixel_std = [58.395, 57.120, 57.375]
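The pixel statistics above are the usual ImageNet per-channel mean and standard deviation, rescaled from the [0, 1] range to the [0, 255] range. A small check of that arithmetic, assuming the normalization values commonly used with torchvision models, (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225):

```python
import math

# ImageNet statistics for inputs scaled to [0, 1], in RGB order
# (assumed here: the values commonly used with torchvision models).
MEAN_01 = [0.485, 0.456, 0.406]
STD_01 = [0.229, 0.224, 0.225]

# The same statistics for inputs in [0, 255], as written in the config.
pixel_mean = [m * 255 for m in MEAN_01]
pixel_std = [s * 255 for s in STD_01]

expected_mean = [123.675, 116.280, 103.530]
expected_std = [58.395, 57.120, 57.375]

# Compare with a tolerance because the products are floating-point values.
mean_ok = all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(pixel_mean, expected_mean))
std_ok = all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(pixel_std, expected_std))
print("pixel_mean matches:", mean_ok)
print("pixel_std matches:", std_ok)
```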

Convert DETRs Pretrained Models

We also provide converters for some of the projects in detrex. These converters are adapted from the detr-d2 conversion script and convert models trained with the original repositories into the detrex model format.

All of these converters can be run as follows:

python --source_model /path/to/pretrained_weight.pth --output_model converted_model.pth
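The individual converters are not reproduced in this document. As a rough illustration of what such a command-line tool involves, the sketch below parses the two flags shown above and renames checkpoint keys by prefix. Everything apart from the `--source_model` and `--output_model` flags is an assumption: the `RENAME_RULES` table is a made-up placeholder, and the layout of the saved file (`{"model": ...}`) is a guess, not the format any detrex converter is known to write.

```python
import argparse

# Hypothetical (old prefix, new prefix) pairs; real converters define their own rules.
RENAME_RULES = [
    ("old_prefix.", "new_prefix."),
]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a checkpoint converter")
    parser.add_argument("--source_model", required=True,
                        help="checkpoint trained with the original repo")
    parser.add_argument("--output_model", required=True,
                        help="where to write the converted checkpoint")
    return parser.parse_args(argv)


def rename_keys(state_dict, rules=RENAME_RULES):
    """Return a new dict with the first matching prefix rule applied to each key."""
    converted = {}
    for key, value in state_dict.items():
        for old, new in rules:
            if key.startswith(old):
                key = new + key[len(old):]
                break
        converted[key] = value
    return converted


def main(argv=None):
    import torch  # imported here so the helpers above can be used without PyTorch

    args = parse_args(argv)
    checkpoint = torch.load(args.source_model, map_location="cpu")
    # Assumption: the weights sit under "model" when that key exists.
    state_dict = checkpoint.get("model", checkpoint)
    torch.save({"model": rename_keys(state_dict)}, args.output_model)
```

Calling `main()` at the end of the file would turn the sketch into a script that accepts the same two flags as the command above.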