Download Pretrained Backbone Weights

This page collects the download links for the pretrained weights of the built-in backbones so that users can find them in one place. It will be kept up to date. Most of the models listed here come from their original sources; many thanks to the authors for their excellent work on backbones.

ResNet

A tutorial on using the torchvision pretrained ResNet models is available here: Download TorchVision ResNet Models.

Swin-Transformer

The download links below are taken from the official implementation of Swin-Transformer.

Swin-Tiny

Name	Pretrain	Resolution	Acc@1	Acc@5	22K Model	1K Model
Swin-Tiny	ImageNet-1K	224x224	81.2	95.5	-	download
Swin-Tiny	ImageNet-22K	224x224	80.9	96.0	download	download

Using Swin-Tiny Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.1,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_tiny_patch4_window7_224_22kto1k_finetune.pth"
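
The `out_indices=(1, 2, 3)` setting selects the last three of the four Swin stages. As a rough aid for wiring the backbone to a neck, the sketch below computes the channel width and stride of each selected stage. It assumes the standard Swin layout, in which the channel width doubles at every stage, and a patch size of 4 (as the `patch4` in the checkpoint names suggests). `swin_stage_shapes` is a hypothetical helper, not part of detectron2 or detrex.

```python
def swin_stage_shapes(embed_dim, depths, out_indices, patch_size=4):
    """Return {stage_index: (channels, stride)} for the selected Swin stages.

    Assumes the standard Swin layout: stage i has embed_dim * 2**i channels
    and a total stride of patch_size * 2**i.
    """
    shapes = {}
    for i in range(len(depths)):
        if i in out_indices:
            shapes[i] = (embed_dim * 2 ** i, patch_size * 2 ** i)
    return shapes


# Swin-Tiny values from the config above
tiny = swin_stage_shapes(embed_dim=96, depths=(2, 2, 6, 2), out_indices=(1, 2, 3))
print(tiny)  # {1: (192, 8), 2: (384, 16), 3: (768, 32)}
```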

Swin-Small

Name	Pretrain	Resolution	Acc@1	Acc@5	22K Model	1K Model
Swin-Small	ImageNet-1K	224x224	83.2	96.2	-	download
Swin-Small	ImageNet-22K	224x224	83.2	97.0	download	download

Using Swin-Small Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=96,
    depths=(2, 2, 18, 2),
    num_heads=(3, 6, 12, 24),
    drop_path_rate=0.2,
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224_22kto1k_finetune.pth"

Swin-Base

Name	Pretrain	Resolution	Acc@1	Acc@5	22K Model	1K Model
Swin-Base	ImageNet-1K	224x224	83.5	96.5	-	download
Swin-Base	ImageNet-1K	384x384	84.5	97.0	-	download
Swin-Base	ImageNet-22K	224x224	85.2	97.5	download	download
Swin-Base	ImageNet-22K	384x384	86.4	98.0	download	download

Using Swin-Base-224 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window7_224.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window7_224_22kto1k.pth"

Using Swin-Base-384 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=128,
    depths=(2, 2, 18, 2),
    num_heads=(4, 8, 16, 32),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
# train.init_checkpoint = "/path/to/swin_base_patch4_window12_384.pth"
train.init_checkpoint = "/path/to/swin_base_patch4_window12_384_22kto1k.pth"
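
The 224 and 384 variants differ in `window_size` (7 versus 12) as well as in `pretrain_img_size`. One way to see why these values are paired: with a patch size of 4 and three 2x downsampling steps, the last stage's feature map is `pretrain_img_size / 32` on a side, which equals the window size in both cases. The check below is a sketch under that assumption; `last_stage_resolution` is a hypothetical helper.

```python
def last_stage_resolution(pretrain_img_size, patch_size=4, num_stages=4):
    """Side length of the final-stage feature map at the pretraining resolution."""
    total_stride = patch_size * 2 ** (num_stages - 1)  # 4 * 8 = 32
    if pretrain_img_size % total_stride != 0:
        raise ValueError("image size must be divisible by the total stride")
    return pretrain_img_size // total_stride


# (pretrain_img_size, window_size) pairs used by the configs in this section
for img_size, window_size in [(224, 7), (384, 12)]:
    assert last_stage_resolution(img_size) == window_size
    print(img_size, "->", window_size)
```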

Swin-Large

Name	Pretrain	Resolution	Acc@1	Acc@5	22K Model	1K Model
Swin-Large	ImageNet-22K	224x224	86.3	97.9	download	download
Swin-Large	ImageNet-22K	384x384	87.3	98.2	download	download

Using Swin-Large-224 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=224,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=7,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window7_224_22kto1k.pth"

Using Swin-Large-384 Backbone in Config

from detectron2.config import LazyCall as L
from detectron2.modeling.backbone import SwinTransformer

# modify backbone config
model.backbone = L(SwinTransformer)(
    pretrain_img_size=384,
    embed_dim=192,
    depths=(2, 2, 18, 2),
    num_heads=(6, 12, 24, 48),
    window_size=12,
    out_indices=(1, 2, 3),
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/swin_large_patch4_window12_384_22kto1k.pth"
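
The Swin configs above differ only in a handful of hyper-parameters. For scripts that switch between variants, one option is to collect them in a lookup table. The sketch below copies the values from the configs in this section; `SWIN_VARIANTS`, `WINDOW_FOR_RESOLUTION`, and `swin_kwargs` are hypothetical names, and `drop_path_rate` is included only where the configs above set it.

```python
# Hyper-parameters copied from the Swin configs in this section.
SWIN_VARIANTS = {
    "tiny": dict(embed_dim=96, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), drop_path_rate=0.1),
    "small": dict(embed_dim=96, depths=(2, 2, 18, 2), num_heads=(3, 6, 12, 24), drop_path_rate=0.2),
    "base": dict(embed_dim=128, depths=(2, 2, 18, 2), num_heads=(4, 8, 16, 32)),
    "large": dict(embed_dim=192, depths=(2, 2, 18, 2), num_heads=(6, 12, 24, 48)),
}

# Window size used with each pretraining resolution in the configs above.
WINDOW_FOR_RESOLUTION = {224: 7, 384: 12}


def swin_kwargs(variant, pretrain_img_size=224):
    """Build keyword arguments for the SwinTransformer config of one variant."""
    if variant in ("tiny", "small") and pretrain_img_size != 224:
        raise ValueError("only 224x224 checkpoints are listed for " + variant)
    kwargs = dict(SWIN_VARIANTS[variant])  # copy so the table is not mutated
    kwargs.update(
        pretrain_img_size=pretrain_img_size,
        window_size=WINDOW_FOR_RESOLUTION[pretrain_img_size],
        out_indices=(1, 2, 3),
    )
    return kwargs


print(swin_kwargs("large", 384))
```

With such a table, a config could build the backbone as `L(SwinTransformer)(**swin_kwargs("base", 384))`.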

ViTDet

The download links below are taken from the official implementation of MAE.

	ViT-Base	ViT-Large	ViT-Huge
Pretrained Checkpoint	download	download	download

Using ViTDet Backbone in Config

from functools import partial

import torch.nn as nn
from detectron2.config import LazyCall as L
from detectron2.layers import ShapeSpec
from detectron2.modeling import ViT, SimpleFeaturePyramid
from detectron2.modeling.backbone.fpn import LastLevelMaxPool

from .dino_r50 import model


# ViT Base Hyper-params
embed_dim, depth, num_heads, dp = 768, 12, 12, 0.1

# Creates Simple Feature Pyramid from ViT backbone
model.backbone = L(SimpleFeaturePyramid)(
    net=L(ViT)(  # Single-scale ViT backbone
        img_size=1024,
        patch_size=16,
        embed_dim=embed_dim,
        depth=depth,
        num_heads=num_heads,
        drop_path_rate=dp,
        window_size=14,
        mlp_ratio=4,
        qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6),
        window_block_indexes=[
            # blocks 2, 5, 8, 11 use global attention
            0,
            1,
            3,
            4,
            6,
            7,
            9,
            10,
        ],
        residual_block_indexes=[],
        use_rel_pos=True,
        out_feature="last_feat",
    ),
    in_feature="${.net.out_feature}",
    out_channels=256,
    scale_factors=(2.0, 1.0, 0.5),  # (4.0, 2.0, 1.0, 0.5) in ViTDet
    top_block=L(LastLevelMaxPool)(),
    norm="LN",
    square_pad=1024,
)

# setup init checkpoint path
train.init_checkpoint = "/path/to/mae_pretrain_vit_base.pth"
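
The `window_block_indexes` list in the config above is every block index except the global-attention blocks 2, 5, 8, and 11. Rather than writing the list by hand, it can be derived. The sketch below assumes the blocks are split into four equal groups and the last block of each group uses global attention, which matches the ViT-Base values here. The helper function and its `num_global` parameter are hypothetical.

```python
def window_block_indexes(depth, num_global=4):
    """Indexes of the blocks that use windowed attention.

    Assumes depth is split into num_global equal groups and the last
    block of each group uses global attention.
    """
    if depth % num_global != 0:
        raise ValueError("depth must be divisible by num_global")
    group = depth // num_global
    global_blocks = {group * (k + 1) - 1 for k in range(num_global)}
    return [i for i in range(depth) if i not in global_blocks]


# ViT-Base (depth=12): global attention at blocks 2, 5, 8, 11
print(window_block_indexes(12))  # [0, 1, 3, 4, 6, 7, 9, 10]
```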

Please refer to the DINO project for more details on using the ViT backbone.
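
The `scale_factors` in the ViTDet config above determine the strides of the pyramid levels built from the single-scale ViT feature map. The sketch below works through the arithmetic. It assumes the ViT output has a stride equal to `patch_size` (16), that each level's stride is that value divided by its scale factor, and that `LastLevelMaxPool` adds one further level at twice the last stride. `pyramid_strides` is a hypothetical helper.

```python
def pyramid_strides(patch_size, scale_factors, extra_top_levels=1):
    """Strides of the feature pyramid levels, finest first."""
    strides = [int(patch_size / s) for s in scale_factors]
    for _ in range(extra_top_levels):
        strides.append(strides[-1] * 2)  # each max-pool level halves the resolution
    return strides


print(pyramid_strides(16, (2.0, 1.0, 0.5)))       # [8, 16, 32, 64]
print(pyramid_strides(16, (4.0, 2.0, 1.0, 0.5)))  # [4, 8, 16, 32, 64]
```

Under these assumptions, the three scale factors used here yield four levels, while the four scale factors noted in the comment for ViTDet yield five.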

FocalNet

The download links below are taken from the official implementation of FocalNet.

Model	Depth	Dim	Kernels	#Params. (M)	Download
FocalNet-L	[2, 2, 18, 2]	192	[5, 7, 9]	207	download
FocalNet-L	[2, 2, 18, 2]	192	[3, 5, 7, 9]	207	download
FocalNet-XL	[2, 2, 18, 2]	256	[5, 7, 9]	366	download
FocalNet-XL	[2, 2, 18, 2]	256	[3, 5, 7, 9]	366	download
FocalNet-H	[2, 2, 18, 2]	352	[5, 7, 9]	687	download
FocalNet-H	[2, 2, 18, 2]	352	[3, 5, 7, 9]	687	download

Using FocalNet Backbone in Config

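from detectron2.config import LazyCall as L
from detrex.modeling.backbone import FocalNet  # assumed import path

# focalnet-large-4scale baseline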
model.backbone = L(FocalNet)(
    embed_dim=192,
    depths=(2, 2, 18, 2),
    focal_levels=(3, 3, 3, 3),
    focal_windows=(5, 5, 5, 5),
    use_conv_embed=True,
    use_postln=True,
    use_postln_in_modulation=False,
    use_layerscale=True,
    normalize_modulator=False,
    out_indices=(1, 2, 3),
)
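
The Kernels column in the table corresponds to the `focal_levels` and `focal_windows` settings in the config. The sketch below assumes the kernel size starts at the focal window size and grows by 2 at each focal level, which reproduces both kernel lists in the table. `focal_kernels` is a hypothetical helper.

```python
def focal_kernels(focal_window, focal_level, focal_factor=2):
    """Kernel sizes used at each focal level of one stage."""
    return [focal_window + focal_factor * k for k in range(focal_level)]


print(focal_kernels(focal_window=5, focal_level=3))  # [5, 7, 9]
print(focal_kernels(focal_window=3, focal_level=4))  # [3, 5, 7, 9]
```

Under this assumption, the config above (three levels, window 5) matches the `[5, 7, 9]` rows, and the `[3, 5, 7, 9]` rows would correspond to four levels with a window of 3.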