The torchvision library provides popular datasets, model architectures, and common image transformations for computer vision.
The models available in torchvision are grouped by task below.
Models for Image Classification:
MODEL | TOP-1 ACCURACY | ~FLOPS
---|---|---
MobileNet V3 | 74.042% | 225 Million
MNASNet 1.0 | 73.51% | 325 Million
ShuffleNet V2 | 69.36% | 149 Million
MobileNet V2 | 71.88% | 314 Million
ResNeXt | 79.31% | 16 Billion
DenseNet | 77.65% | 8 Billion
Wide ResNet | 78.84% | 23 Billion
SqueezeNet | 58.19% | 352 Million
ResNet | 78.31% | 12 Billion
Inception v3 | 77.45% | 6 Billion
GoogLeNet | 69.78% | 2 Billion
VGG | 74.24% | 20 Billion
AlexNet | 56.55% | 715 Million
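The classification table spans roughly two orders of magnitude in compute, so one way to read it is to keep only the models that no other entry beats on both accuracy and FLOPs (the Pareto front). The plain-Python sketch below uses the numbers from the table above; the `MODELS` dictionary and `pareto_front` function are illustrative names, and the ~FLOPS values are the table's rough figures rather than fresh measurements.

```python
# Top-1 accuracy (%) and approximate FLOPs, copied from the table above.
MODELS = {
    "MobileNet V3": (74.042, 225e6),
    "MNASNet 1.0": (73.51, 325e6),
    "ShuffleNet V2": (69.36, 149e6),
    "MobileNet V2": (71.88, 314e6),
    "ResNeXt": (79.31, 16e9),
    "DenseNet": (77.65, 8e9),
    "Wide ResNet": (78.84, 23e9),
    "SqueezeNet": (58.19, 352e6),
    "ResNet": (78.31, 12e9),
    "Inception v3": (77.45, 6e9),
    "GoogLeNet": (69.78, 2e9),
    "VGG": (74.24, 20e9),
    "AlexNet": (56.55, 715e6),
}


def pareto_front(models):
    """Return names of models that no cheaper model matches or beats in accuracy."""
    # Sort by compute ascending; at equal FLOPs, consider the more accurate model first.
    ordered = sorted(models.items(), key=lambda kv: (kv[1][1], -kv[1][0]))
    front, best_acc = [], float("-inf")
    for name, (acc, _flops) in ordered:
        # Keep a model only if it is more accurate than everything cheaper.
        if acc > best_acc:
            front.append(name)
            best_acc = acc
    return front


for name in pareto_front(MODELS):
    acc, flops = MODELS[name]
    print(f"{name:14s} {acc:6.2f}%  {flops / 1e9:6.3f} GFLOPs")
```

With these numbers the front is ShuffleNet V2, MobileNet V3, Inception v3, DenseNet, ResNet and ResNeXt. VGG and Wide ResNet drop out, for example, because ResNeXt is listed as both cheaper and more accurate than either.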
Models for Object Detection:
MODEL | BOX AP | ~FLOPS
---|---|---
RetinaNet | 36.4 | 527 Billion
Mask R-CNN | 37.9 | 447 Billion
Faster R-CNN | 37.0 | 447 Billion
Models for Action Classification:
MODEL | CLIP ACC@1 | ~FLOPS
---|---|---
ResNet 3D | 57.5 | 41 Billion
Models for Instance Segmentation:
MODEL | MASK AP | ~FLOPS
---|---|---
Mask R-CNN | 34.6 | 447 Billion