# VMamba

### Installation

**Step 1: Clone the VMamba repository:**

To get started, first clone the VMamba repository and navigate to the project directory:

```bash
git clone "this repo"
cd VMamba
```

**Step 2: Environment Setup:**

VMamba recommends setting up a conda environment and installing dependencies via pip. We recommend PyTorch >= 2.0 and CUDA >= 11.8, although lower versions of PyTorch and CUDA are also supported. Use the following commands to set up your environment:

***Create and activate a new conda environment***

```bash
conda create -n vmamba
conda activate vmamba
```

***Install Dependencies***

```bash
pip install -r requirements.txt
cd kernels/selective_scan && pip install .
```

***Dependencies for `Detection` and `Segmentation` (optional)***

```bash
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
```

### Data preparation

We use the standard ImageNet dataset, which you can download from http://image-net.org/. We provide the following two ways to load data:

- For the standard folder dataset, move the validation images to labeled sub-folders. The file structure should look like:

  ```bash
  $ tree data
  imagenet
  ├── train
  │   ├── class1
  │   │   ├── img1.jpeg
  │   │   ├── img2.jpeg
  │   │   └── ...
  │   ├── class2
  │   │   ├── img3.jpeg
  │   │   └── ...
  │   └── ...
  └── val
      ├── class1
      │   ├── img4.jpeg
      │   ├── img5.jpeg
      │   └── ...
      ├── class2
      │   ├── img6.jpeg
      │   └── ...
      └── ...
  ```

- Reading images from a massive number of small files is slow, so we also support zipped ImageNet, which consists of four files:
  - `train.zip`, `val.zip`: the zipped folders for the train and validation splits.
  - `train_map.txt`, `val_map.txt`: the relative path of each image inside the corresponding zip file, together with its ground truth label.

  Make sure the data folder looks like this:

  ```bash
  $ tree data
  data
  └── ImageNet-Zip
      ├── train_map.txt
      ├── train.zip
      ├── val_map.txt
      └── val.zip

  $ head -n 5 data/ImageNet-Zip/val_map.txt
  ILSVRC2012_val_00000001.JPEG 65
  ILSVRC2012_val_00000002.JPEG 970
  ILSVRC2012_val_00000003.JPEG 230
  ILSVRC2012_val_00000004.JPEG 809
  ILSVRC2012_val_00000005.JPEG 516

  $ head -n 5 data/ImageNet-Zip/train_map.txt
  n01440764/n01440764_10026.JPEG 0
  n01440764/n01440764_10027.JPEG 0
  n01440764/n01440764_10029.JPEG 0
  n01440764/n01440764_10040.JPEG 0
  n01440764/n01440764_10042.JPEG 0
  ```

- For the ImageNet-22K dataset, make a folder named `fall11_whole` and move all images to labeled sub-folders in this folder. Then download the train-val split files ([ILSVRC2011fall_whole_map_train.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_train.txt) & [ILSVRC2011fall_whole_map_val.txt](https://github.com/SwinTransformer/storage/releases/download/v2.0.1/ILSVRC2011fall_whole_map_val.txt)) and put them in the parent directory of `fall11_whole`. The file structure should look like:

  ```bash
  $ tree imagenet22k/
  imagenet22k/
  ├── ILSVRC2011fall_whole_map_train.txt
  ├── ILSVRC2011fall_whole_map_val.txt
  └── fall11_whole
      ├── n00004475
      ├── n00005787
      ├── n00006024
      ├── n00006484
      └── ...
  ```

### Model Training and Inference

**Classification**

To train VMamba models for classification on ImageNet, use the following command, supplying the model config file after `--cfg` and the dataset path after `--data-path`:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg --batch-size 128 --data-path --output /tmp
```

If you only want to test the performance (together with params and FLOPs), also supply the checkpoint path after `--pretrained`:

```bash
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg --batch-size 128 --data-path --output /tmp --pretrained
```

***Please refer to [modelcard](./modelcard.sh) for more details.***

**Detection and Segmentation**

To evaluate with `mmdetection` or `mmsegmentation`:

```bash
bash ./tools/dist_test.sh 1
```

*Use `--tta` to get the `mIoU(ms)` in segmentation.*

To train with `mmdetection` or `mmsegmentation`:

```bash
bash ./tools/dist_train.sh 8
```

For more information about detection and segmentation tasks, please refer to the manuals of [`mmdetection`](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html) and [`mmsegmentation`](https://mmsegmentation.readthedocs.io/en/latest/user_guides/4_train_test.html). Remember to use the appropriate backbone configurations in the `configs` directory.

### Analysis Tools

VMamba includes tools for visualizing Mamba "attention" and the effective receptive field, and for analysing throughput and training throughput. Use the following commands to perform analysis:

```bash
# Visualize Mamba "Attention"
CUDA_VISIBLE_DEVICES=0 python analyze/attnmap.py

# Analyze the effective receptive field
CUDA_VISIBLE_DEVICES=0 python analyze/erf.py

# Analyze the throughput and train throughput
CUDA_VISIBLE_DEVICES=0 python analyze/tp.py
```

***We also included other analysis tools that we may use in this project. Thanks to all who have contributed to these tools.***
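
For readers unfamiliar with the throughput metric, the following is a minimal, framework-free sketch of how images per second is commonly measured: a few untimed warm-up batches, then a timed loop. It is not the implementation in `analyze/tp.py`, which is not reproduced here. The names `measure_throughput` and `fake_forward` are hypothetical, and the sleep-based stand-in only imitates a model forward pass.

```python
import time
from typing import Callable


def measure_throughput(
    step: Callable[[], None],
    batch_size: int,
    warmup_iters: int = 3,
    timed_iters: int = 10,
) -> float:
    """Return samples processed per second, where one `step()` call is one batch."""
    if batch_size <= 0 or timed_iters <= 0:
        raise ValueError("batch_size and timed_iters must be positive")

    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup_iters):
        step()

    # Assumption: `step` is synchronous. With asynchronous GPU kernels, a
    # device synchronisation (e.g. torch.cuda.synchronize()) would be needed
    # before reading each timestamp.
    start = time.perf_counter()
    for _ in range(timed_iters):
        step()
    elapsed = time.perf_counter() - start

    return batch_size * timed_iters / elapsed


def fake_forward() -> None:
    """Hypothetical stand-in for a model forward pass on one batch."""
    time.sleep(0.005)


if __name__ == "__main__":
    images_per_second = measure_throughput(fake_forward, batch_size=128)
    print(f"throughput: {images_per_second:.1f} images/s")
```

Replacing `fake_forward` with a closure that runs a real model on a fixed batch gives an inference number; including the backward pass and optimizer step in the closure gives a training-throughput number under the same timing scheme.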