Model Zoo

Common Settings

  • All COCO models were trained on coco_2017_train and evaluated on coco_2017_val.

  • All models were trained using distributed training.

  • Most models were trained with the 50-epoch setting (~51 COCO epochs) and a multi-step LR scheduler, which is the common setting for DETR-like methods.
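
The "~51 COCO epochs" remark suggests that schedules are specified in iterations, so the listed epoch counts are approximate. The following is a minimal sketch of that arithmetic, not code from detrex: the helper names are hypothetical, and the dataset size is the standard unfiltered coco_2017_train image count (118,287). A data loader that filters out images without annotations would see slightly fewer images. The example values (batch size 8, 180000 iterations) are taken from the DETA-R50-5scale entry listed under 12 epochs.

```python
import math

# Assumption: unfiltered image count of coco_2017_train.
COCO_2017_TRAIN_IMAGES = 118_287


def iterations_to_epochs(iterations, batch_size,
                         dataset_size=COCO_2017_TRAIN_IMAGES):
    """Return how many passes over the dataset a schedule makes."""
    if iterations < 0 or batch_size <= 0 or dataset_size <= 0:
        raise ValueError("iterations must be >= 0; sizes must be > 0")
    return iterations * batch_size / dataset_size


def epochs_to_iterations(epochs, batch_size,
                         dataset_size=COCO_2017_TRAIN_IMAGES):
    """Return the smallest iteration count covering `epochs` full passes."""
    if epochs < 0 or batch_size <= 0 or dataset_size <= 0:
        raise ValueError("epochs must be >= 0; sizes must be > 0")
    return math.ceil(epochs * dataset_size / batch_size)


if __name__ == "__main__":
    # Batch size 8 for 180000 iterations, as in the DETA-R50-5scale entry.
    print(f"{iterations_to_epochs(180_000, 8):.2f} epochs")
    # Iterations needed for exactly 12 passes at batch size 8.
    print(epochs_to_iterations(12, 8), "iterations")
```

Under these assumptions, 180000 iterations at batch size 8 come to slightly more than 12 passes over the training set, which is consistent with that entry being listed as a 12-epoch model.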

COCO Object Detection Baselines

Here we provide our pretrained baselines for detrex. More pretrained weights will be released in future versions. We also provide converted pretrained weights, which are marked as (converted).

DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| DETR-R50 (converted) | R-50 | IN1k | 500 | 42.0 | model |
| DETR-R50-DC5 (converted) | R-50 | IN1k | 500 | 43.4 | model |
| DETR-R101 (converted) | R-101 | IN1k | 500 | 43.5 | model |
| DETR-R101-DC5 (converted) | R-101 | IN1k | 500 | 44.9 | model |

Deformable-DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| Deformable-DETR + Box Refinement | R-50 | IN1k | 50 | 47.0 | model |
| Deformable-DETR + Box Refinement + Two Stage | R-50 | IN1k | 50 | 48.2 | model |

Anchor-DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| Anchor-DETR-R50 | R-50 | IN1k | 50 | 41.9 | model |
| Anchor-DETR-R50 (converted) | R-50 | IN1k | 50 | 42.2 | model |
| Anchor-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.2 | model |
| Anchor-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.5 | model |
| Anchor-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.1 | model |

Conditional-DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| Conditional-DETR-R50 | R-50 | IN1k | 50 | 41.6 | model |
| Conditional-DETR-R50-DC5 (converted) | R-50-DC5 | IN1k | 50 | 43.8 | model |
| Conditional-DETR-R101 (converted) | R-101 | IN1k | 50 | 43.0 | model |
| Conditional-DETR-R101-DC5 (converted) | R-101-DC5 | IN1k | 50 | 45.1 | model |

DAB-DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| DAB-DETR-R50 | R-50 | IN1k | 50 | 43.3 | model |
| DAB-DETR-R50-3patterns (converted) | R-50 | IN1k | 50 | 42.8 | model |
| DAB-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 44.6 | model |
| DAB-DETR-R50-DC5-3patterns (converted) | R-50 | IN1k | 50 | 45.7 | model |
| DAB-DETR-R101 | R-101 | IN1k | 50 | 44.0 | model |
| DAB-DETR-R101-DC5 (converted) | R-101 | IN1k | 50 | 45.7 | model |
| DAB-DETR-Swin-T | Swin-Tiny-224 | IN1k | 50 | 45.2 | model |
| DAB-Deformable-DETR-R50 | R-50 | IN1k | 50 | 49.0 | model |
| DAB-Deformable-DETR-R50-Two-Stage | R-50 | IN1k | 50 | 49.7 | model |

DN-DETR

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| DN-DETR-R50 | R-50 | IN1k | 50 | 44.7 | model |
| DN-DETR-R50-DC5 (converted) | R-50 | IN1k | 50 | 46.3 | model |

DINO

Pretrained DINO with ResNet Backbone

| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| DINO-R50-4scale | R-50 | IN1k | 12 | 100 | 49.2 | model |
| DINO-R50-4scale (hacked trainer) | R-50 | IN1k | 12 | 100 | 49.4 | model |
| DINO-R50-4scale with EMA | R-50 | IN1k | 12 | 100 | 49.4 | model |
| DINO-R50-5scale | R-50 | IN1k | 12 | 100 | 49.6 | model |
| DINO-R50-4scale | R-50 | IN1k | 12 | 300 | 49.5 | model |
| DINO-R50-4scale | R-50 | IN1k | 24 | 100 | 50.6 | model |
| DINO-R101-4scale | R-101 | IN1k | 12 | 100 | 50.0 | model |

Pretrained DINO with Swin-Transformer Backbone

| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 51.3 | model |
| DINO-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 12 | 100 | 52.5 | model |
| DINO-Swin-S-224-4scale | Swin-Small-224 | IN1k | 12 | 100 | 53.0 | model |
| DINO-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 12 | 100 | 55.8 | model |
| DINO-Swin-L-224-4scale | Swin-Large-224 | IN22k to IN1k | 12 | 100 | 56.9 | model |
| DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 56.9 | model |
| DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 12 | 100 | 57.5 | model |
| DINO-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.1 | model |
| DINO-Swin-L-384-5scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 58.5 | model |

Pretrained DINO with FocalNet Backbone

| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| DINO-FocalNet-Large-4scale | FocalNet-384-LRF-3Level | IN22k | 12 | 100 | 57.5 | model |
| DINO-FocalNet-Large-4scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.0 | model |
| DINO-FocalNet-Large-5scale | FocalNet-384-LRF-4Level | IN22k | 12 | 100 | 58.5 | model |

Pretrained DINO with ViTDet Backbone

| Name | Backbone | Pretrained | Epochs | Denoising Queries | box AP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 12 | 100 | 50.2 | model |
| DINO-ViTDet-Base-4scale | ViT | IN1k, MAE | 50 | 100 | 55.0 | model |
| DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 12 | 100 | 52.9 | model |
| DINO-ViTDet-Large-4scale | ViT | IN1k, MAE | 50 | 100 | 57.5 | model |

H-Deformable-DETR

| Name | Backbone | Pretrained | Queries | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- | --- |
| H-Deformable-DETR-R50 + tricks (detrex) | R-50 | IN1k | 300 | 12 | 49.1 | model |
| H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 12 | 48.9 | model |
| H-Deformable-DETR-R50 + tricks (converted) | R-50 | IN1k | 300 | 36 | 50.3 | model |
| H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 12 | 50.6 | model |
| H-Deformable-DETR-Swin-T + tricks (converted) | Swin-Tiny | IN1k | 300 | 36 | 53.5 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 12 | 56.2 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 900 | 12 | 56.4 | model |
| H-Deformable-DETR-Swin-L + tricks (converted) | Swin-Large | IN22k | 300 | 36 | 57.5 | model |

DETA

| Name | Backbone | Pretrained | Epochs | box AP | Download |
| --- | --- | --- | --- | --- | --- |
| Improved-Deformable-DETR-R50 (converted) | R-50 | IN1k | 50 | 49.8 | model |
| DETA-R50-5scale (bs=8, 180000 iterations) | R-50 | IN1k | 12 | 50.0 | model |
| DETA-R50-5scale (with hacked train engine) | R-50 | IN1k | 12 | 49.9 | model |
| DETA-R50-5scale-12ep (no frozen backbone) | R-50 | IN1k | 12 | 50.2 | model |
| DETA-R50-5scale (converted) | R-50 | IN1k | 12 | 50.1 | model |
| DETA-Swin-Large-finetune (converted) | Swin-Large-384 | Objects365 | 24 | 62.9 | model |