- Collections:
- - Name: CLIP
- Metadata:
- Architecture:
- - Attention Dropout
- - Convolution
- - Dense Connections
- - Dropout
- - GELU
- - Layer Normalization
- - Multi-Head Attention
- - Scaled Dot-Product Attention
- - Tanh Activation
- Paper:
- Title: Learning Transferable Visual Models From Natural Language Supervision
- URL: https://arxiv.org/abs/2103.00020
- README: configs/clip/README.md
- Code:
- URL: https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/models/backbones/vision_transformer.py
- Version: v1.0.0
- Models:
- - Name: vit-base-p32_clip-openai-pre_3rdparty_in1k
- Metadata:
- FLOPs: 4364335104
- Parameters: 88225000
- Training Data:
- - OpenAI
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 81.77
- Top 5 Accuracy: 95.89
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-pre_3rdparty_in1k_20221220-a0182ba9.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k
- - Name: vit-base-p32_clip-laion2b-pre_3rdparty_in1k
- Metadata:
- FLOPs: 4364335104
- Parameters: 88225000
- Training Data:
- - LAION-2B
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 82.46
- Top 5 Accuracy: 96.12
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-pre_3rdparty_in1k_20221220-194df57f.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k
- - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k
- Metadata:
- FLOPs: 4364335104
- Parameters: 88225000
- Training Data:
- - LAION-2B
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 83.06
- Top 5 Accuracy: 96.49
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k_20221220-b384e830.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k
- - Name: vit-base-p32_clip-openai-in12k-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 12661054464
- Parameters: 88225000
- Training Data:
- - OpenAI
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.13
- Top 5 Accuracy: 97.42
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-in12k-pre_3rdparty_in1k-384px_20221220-dc2e49ea.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k
- - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 12661054464
- Parameters: 88225000
- Training Data:
- - LAION-2B
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.39
- Top 5 Accuracy: 97.67
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-c7757552.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k
- - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k
- Metadata:
- FLOPs: 16855600128
- Parameters: 86568424
- Training Data:
- - OpenAI
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.3
- Top 5 Accuracy: 97.5
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k_20221220-c7d9c899.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k
- - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k
- Metadata:
- FLOPs: 16855600128
- Parameters: 86568424
- Training Data:
- - LAION-2B
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.49
- Top 5 Accuracy: 97.59
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k_20221220-5e24ff58.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k
- - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k
- Metadata:
- FLOPs: 16855600128
- Parameters: 86568424
- Training Data:
- - OpenAI
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.99
- Top 5 Accuracy: 97.72
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k_20221220-90d930a8.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k
- - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k
- Metadata:
- FLOPs: 16855600128
- Parameters: 86568424
- Training Data:
- - LAION-2B
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 86.02
- Top 5 Accuracy: 97.76
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k_20221220-a5e31f8c.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k
- - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-448px
- Metadata:
- FLOPs: 17202416640
- Parameters: 88225000
- Training Data:
- - LAION-2B
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 85.76
- Top 5 Accuracy: 97.63
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-448px_20221220-ca404a7d.pth
- Config: configs/clip/vit-base-p32_pt-64xb64_in1k-448px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k
- - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 49370078208
- Parameters: 86568424
- Training Data:
- - OpenAI
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 86.25
- Top 5 Accuracy: 97.9
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k-384px_20221220-eb012e87.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k
- - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 49370078208
- Parameters: 86568424
- Training Data:
- - LAION-2B
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 86.52
- Top 5 Accuracy: 97.97
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k-384px_20221220-558ed826.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k
- - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 49370078208
- Parameters: 86568424
- Training Data:
- - OpenAI
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 86.87
- Top 5 Accuracy: 98.05
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k-384px_20221220-8df86b74.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k
- - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k-384px
- Metadata:
- FLOPs: 49370078208
- Parameters: 86568424
- Training Data:
- - LAION-2B
- - ImageNet-12k
- - ImageNet-1k
- In Collection: CLIP
- Results:
- - Dataset: ImageNet-1k
- Metrics:
- Top 1 Accuracy: 87.17
- Top 5 Accuracy: 98.02
- Task: Image Classification
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-84ed0cc0.pth
- Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
- Converted From:
- Code: https://github.com/rwightman/pytorch-image-models
- Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k
- - Name: vit-large-p14_clip-openai-pre_3rdparty
- Metadata:
- FLOPs: 59696580608
- Parameters: 303302656
- Training Data:
- - OpenAI
- In Collection: CLIP
- Weights: https://download.openmmlab.com/mmclassification/v0/clip/vit-large-p14_clip-openai-pre_3rdparty_20230517-95e2af0b.pth
- Config: configs/clip/vit-large-p14_headless.py
- Converted From:
- Code: https://github.com/mlfoundations/open_clip
- Weights: https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt