Models in this subsection are trained from scratch with random or manual initialization. The hyper-parameters are inherited from Swin, except for drop_path_rate and EMA. All models are trained with EMA except for Vanilla-VMamba-T.
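For readers unfamiliar with EMA (exponential moving average of weights), the update can be sketched as below. This is only an illustration of the general technique, not the training code used here: the function name `ema_update`, the plain-dict parameter layout, and the decay value `0.999` are all assumptions made for the example.

```python
def ema_update(ema_params, params, decay=0.999):
    """Blend the current weights into the running EMA copy, in place.

    `decay` is a hypothetical value chosen for illustration; the actual
    decay used for these models is not stated in this section.
    """
    for name, value in params.items():
        # new_ema = decay * old_ema + (1 - decay) * current_weight
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Usage: call once after every optimizer step, then evaluate with ema_params.
ema = {"w": 1.0}
ema_update(ema, {"w": 0.0}, decay=0.9)
```

With a decay close to 1, the EMA copy changes slowly and smooths out step-to-step noise in the trained weights.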
TP. (Throughput) and Train TP. (Train Throughput) are measured on an A100 GPU paired with an AMD EPYC 7542 CPU, with batch size 128. Train TP. is tested with mix-resolution and excludes the time consumed by the optimizer.
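A throughput number of this kind is samples processed per second over a timed loop. The sketch below shows one generic way to compute it; it is not the benchmarking script used for the reported numbers. The helper name `measure_throughput`, the warm-up count, and the iteration count are assumptions for illustration.

```python
import time


def measure_throughput(step, batch_size=128, warmup=5, iters=20):
    """Return samples per second for `step`, a callable that processes one batch.

    On a GPU, `step` should block until the work is finished (for example by
    calling torch.cuda.synchronize() at its end); otherwise only the kernel
    launch time is measured.
    """
    # Warm-up iterations are run but not timed.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    # Guard against a zero reading on coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return batch_size * iters / elapsed
```

For a training-throughput figure that excludes the optimizer, `step` would run only the forward and backward passes and skip the optimizer update.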
FLOPs and parameters are now counted with the head included (previous versions excluded the head, so the numbers have risen slightly).
We calculate FLOPs with the algorithm @albertgu provides, which gives larger values than the previous calculation (which was based on the selective_scan_ref function and ignored the hardware-aware algorithm).
Models in this subsection are initialized from the models trained for classification.
We now calculate FLOPs with the algorithm @albertgu provides, which gives larger values than the previous calculation (which was based on the selective_scan_ref function and ignored the hardware-aware algorithm).
Models in this subsection are initialized from the models trained for classification.
We now calculate FLOPs with the algorithm @albertgu provides, which gives larger values than the previous calculation (which was based on the selective_scan_ref function and ignored the hardware-aware algorithm).