torchvision

The torchvision library provides popular datasets, model architectures, and image transformations for computer vision. It includes:

Training recipes for object detection, image classification, instance segmentation, video classification and semantic segmentation.
60+ pretrained models to use for fine-tuning (or training afresh).
Dataset loaders for popular vision datasets such as ImageNet, COCO, Cityscapes and more!

Tasks

The tables below list the models available for each task:

Viewing Models for Image Classification:

MODEL	TRAINED ON	TOP 1 ACCURACY	~FLOPS	YEAR
MobileNet V3	ImageNet	74.042%	225 Million	2019
MNASNet 1.0	ImageNet	73.51%	325 Million	2018
ShuffleNet V2	ImageNet	69.36%	149 Million	2018
MobileNet V2	ImageNet	71.88%	314 Million	2018
ResNeXt	ImageNet	79.31%	16 Billion	2016
DenseNet	ImageNet	77.65%	8 Billion	2016
Wide ResNet	ImageNet	78.84%	23 Billion	2016
SqueezeNet	ImageNet	58.19%	352 Million	2016
ResNet	ImageNet	78.31%	12 Billion	2015
Inception v3	ImageNet	77.45%	6 Billion	2015
GoogLeNet	ImageNet	69.78%	2 Billion	2014
VGG	ImageNet	74.24%	20 Billion	2014
AlexNet	ImageNet	56.55%	715 Million	2014

Viewing Models for Object Detection:

MODEL	TRAINED ON	BOX AP	~FLOPS	YEAR
RetinaNet	COCO	36.4	527 Billion	2017
Mask R-CNN	COCO	37.9	447 Billion	2017
Faster R-CNN	COCO	37.0	447 Billion	2015

Viewing Models for Action Classification:

MODEL	TRAINED ON	CLIP ACC@1	~FLOPS	YEAR
ResNet 3D	Kinetics-400	57.5	41 Billion	2017

Viewing Models for Instance Segmentation:

MODEL	TRAINED ON	MASK AP	~FLOPS	YEAR
Mask R-CNN	COCO	34.6	447 Billion	2017