Download Pretrained Backbone Weights

This page collects download links for the pretrained weights of the built-in backbones so that they are easy to find. It will be kept up to date. Most of the included models come from their original sources; many thanks to the authors for their excellent work on backbones.

ResNet

A tutorial on using the torchvision pretrained ResNet models is already available here: Download TorchVision ResNet Models.

Swin-Transformer

The download links below are borrowed from the official implementation of Swin-Transformer.

Swin-Tiny

Name       Pretrain      Resolution  Acc@1  Acc@5  22K Model  1K Model
Swin-Tiny  ImageNet-1K   224x224     81.2   95.5   -          download
Swin-Tiny  ImageNet-22K  224x224     80.9   96.0   download   download

Using Swin-Tiny Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
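
When connecting the backbone to a neck, it helps to know the channel width of each stage selected by out_indices. The following is a minimal sketch, assuming the standard Swin layout in which the channel width doubles at every stage; `swin_out_channels` is a hypothetical helper written for illustration and is not part of detectron2.

```python
def swin_out_channels(embed_dim, out_indices):
    """Return the channel width of each selected Swin stage.

    Assumption: stage i outputs embed_dim * 2**i channels, as in the
    standard Swin design (channel width doubles at each stage).
    """
    return tuple(embed_dim * 2 ** i for i in out_indices)


# Swin-Tiny values taken from the config above
print(swin_out_channels(96, (1, 2, 3)))  # (192, 384, 768)
```

The same helper applies to the other variants on this page by swapping in their embed_dim.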

Swin-Small

Name        Pretrain      Resolution  Acc@1  Acc@5  22K Model  1K Model
Swin-Small  ImageNet-1K   224x224     83.2   96.2   -          download
Swin-Small  ImageNet-22K  224x224     83.2   97.0   download   download

Using Swin-Small Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"

Swin-Base

Name       Pretrain      Resolution  Acc@1  Acc@5  22K Model  1K Model
Swin-Base  ImageNet-1K   224x224     83.5   96.5   -          download
Swin-Base  ImageNet-1K   384x384     84.5   97.0   -          download
Swin-Base  ImageNet-22K  224x224     85.2   97.5   download   download
Swin-Base  ImageNet-22K  384x384     86.4   98.0   download   download

Using Swin-Base-224 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"

Using Swin-Base-384 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"

Swin-Large

Name        Pretrain      Resolution  Acc@1  Acc@5  22K Model  1K Model
Swin-Large  ImageNet-22K  224x224     86.3   97.9   download   download
Swin-Large  ImageNet-22K  384x384     87.3   98.2   download   download

Using Swin-Large-224 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"

Using Swin-Large-384 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
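
The Swin configs above differ only in a handful of hyperparameters, so they can be gathered into one lookup table. The sketch below copies the values from this section; the names `SWIN_VARIANTS` and `swin_kwargs` are hypothetical and exist only for this example. drop_path_rate is listed only for the variants whose config above sets it.

```python
# Hyperparameters copied from the Swin configs in this section.
SWIN_VARIANTS = {
    "tiny-224": dict(pretrain_img_size=224, embed_dim=96,
                     depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24),
                     window_size=7, drop_path_rate=0.1),
    "small-224": dict(pretrain_img_size=224, embed_dim=96,
                      depths=(2, 2, 18, 2), num_heads=(3, 6, 12, 24),
                      window_size=7, drop_path_rate=0.2),
    "base-224": dict(pretrain_img_size=224, embed_dim=128,
                     depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32),
                     window_size=7),
    "base-384": dict(pretrain_img_size=384, embed_dim=128,
                     depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32),
                     window_size=12),
    "large-224": dict(pretrain_img_size=224, embed_dim=192,
                      depths=(2, 2, 18, 2), num_heads=(6, 12, 24, 48),
                      window_size=7),
    "large-384": dict(pretrain_img_size=384, embed_dim=192,
                      depths=(2, 2, 18, 2), num_heads=(6, 12, 24, 48),
                      window_size=12),
}


def swin_kwargs(variant, out_indices=(1, 2, 3)):
    """Return a fresh dict of backbone keyword arguments for one variant."""
    kwargs = dict(SWIN_VARIANTS[variant])  # copy so the table stays unchanged
    kwargs["out_indices"] = out_indices
    return kwargs


print(swin_kwargs("base-384")["window_size"])  # 12
```

In a config file the result could then be unpacked into the lazy call, for example `L(SwinTransformer)(**swin_kwargs("base-384"))`, with the matching checkpoint path set separately.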

ViTDet

The download links below are borrowed from the official implementation of MAE.

                       ViT-Base  ViT-Large  ViT-Huge
Pretrained Checkpoint  download  download   download

Using ViTDet Backbone in Config

from functools import partial  # needed for norm_layer below
import torch.nn as nn
from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model


# ViT Base Hyper-params
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# Creates Simple Feature Pyramid from ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # Single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        window_block_indexes=[
            # blocks 2, 5, 8, 11 use global attention
            0,
            1,
            3,
            4,
            6,
            7,
            9,
            10,
        ],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
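
The window_block_indexes list in the config above is simply every block index that is not reserved for global attention. As a small sketch, it can be derived instead of typed by hand; the helper name `window_block_indexes_for` is hypothetical and not part of detectron2.

```python
def window_block_indexes_for(depth, global_indexes):
    """List the blocks that use windowed attention.

    Every block in range(depth) that is not in global_indexes is
    treated as a windowed-attention block.
    """
    global_set = set(global_indexes)
    return [i for i in range(depth) if i not in global_set]


# ViT-Base from the config above: depth 12, global attention at 2, 5, 8, 11
print(window_block_indexes_for(12, (2, 5, 8, 11)))
# [0, 1, 3, 4, 6, 7, 9, 10]
```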

Please refer to the DINO project for more details on using the ViT backbone.
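
In the ViTDet config above, scale_factors determines which feature-map strides the simple feature pyramid produces from the single stride-16 ViT output. The sketch below works through that arithmetic, assuming each level has stride patch_size / scale and that the top block (LastLevelMaxPool) appends one extra level with double the last stride. The `pyramid_strides` helper and the p3/p4/... level names are illustrative assumptions that follow the usual FPN naming.

```python
import math


def pyramid_strides(patch_size, scale_factors, extra_top_levels=1):
    """Map pyramid level names to strides.

    Assumptions: each scale factor s yields stride patch_size / s, and
    every extra top level doubles the stride of the level before it.
    """
    strides = [int(patch_size / s) for s in scale_factors]
    for _ in range(extra_top_levels):
        strides.append(strides[-1] * 2)
    # Name each level after log2 of its stride, e.g. stride 8 -> "p3"
    return {f"p{int(math.log2(s))}": s for s in strides}


# Values from the config above: patch_size=16, scale_factors=(2.0, 1.0, 0.5)
print(pyramid_strides(16, (2.0, 1.0, 0.5)))
# {'p3': 8, 'p4': 16, 'p5': 32, 'p6': 64}
```

With the original ViTDet setting (4.0, 2.0, 1.0, 0.5), the same arithmetic adds a stride-4 level at the front.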

FocalNet

The download links below are borrowed from the official implementation of FocalNet.

Model        Depth          Dim  Kernels       #Params (M)  Download
FocalNet-L   [2, 2, 18, 2]  192  [5, 7, 9]     207          download
FocalNet-L   [2, 2, 18, 2]  192  [3, 5, 7, 9]  207          download
FocalNet-XL  [2, 2, 18, 2]  256  [5, 7, 9]     366          download
FocalNet-XL  [2, 2, 18, 2]  256  [3, 5, 7, 9]  366          download
FocalNet-H   [2, 2, 18, 2]  352  [5, 7, 9]     687          download
FocalNet-H   [2, 2, 18, 2]  352  [3, 5, 7, 9]  687          download

Using FocalNet Backbone in Config

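from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet  # import path assumed; adjust to your codebase

# focalnet-large-4scale baseline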
model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
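
The Kernels column in the table relates to the focal_levels and focal_windows arguments in the config. A minimal sketch of that relation, assuming the kernel size grows by a fixed step of 2 per focal level, starting from the focal window; `focal_kernel_sizes` is a hypothetical helper for illustration only.

```python
def focal_kernel_sizes(focal_window, focal_level, focal_factor=2):
    """Kernel sizes used across the focal levels of one stage.

    Assumption: level k uses a kernel of size
    focal_window + focal_factor * k, with focal_factor = 2.
    """
    return [focal_window + focal_factor * k for k in range(focal_level)]


print(focal_kernel_sizes(5, 3))  # [5, 7, 9], the setting in the config above
print(focal_kernel_sizes(3, 4))  # [3, 5, 7, 9]
```

Under this assumption, the [3, 5, 7, 9] checkpoints in the table would correspond to focal_levels=(4, 4, 4, 4) and focal_windows=(3, 3, 3, 3).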