VMamba

Installation

Step 1: Clone the VMamba repository:

To get started, first clone the VMamba repository and navigate to the project directory:

git clone "this repo"
cd VMamba

Step 2: Environment Setup:

VMamba recommends setting up a conda environment and installing dependencies via pip. Use the following commands to set up your environment: Also, We recommend using the pytorch>=2.0, cuda>=11.8. But lower version of pytorch and CUDA are also supported.

Create and activate a new conda environment

conda create -n vmamba
conda activate vmamba

Install Dependencies

pip install -r requirements.txt
cd kernels/selective_scan && pip install .

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0

Data preparation

We use standard ImageNet dataset, you can download it from http://image-net.org/. We provide the following two ways to load data:

For standard folder dataset, move validation images to labeled sub-folders. The file structure should look like:

$ tree data
imagenet
├── train
│   ├── class1
│   │   ├── img1.jpeg
│   │   ├── img2.jpeg
│   │   └── ...
│   ├── class2
│   │   ├── img3.jpeg
│   │   └── ...
│   └── ...
└── val
  ├── class1
  │   ├── img4.jpeg
  │   ├── img5.jpeg
  │   └── ...
  ├── class2
  │   ├── img6.jpeg
  │   └── ...
  └── ...

To boost the slow speed when reading images from massive small files, we also support zipped ImageNet, which includes four files:

train.zip, val.zip: which store the zipped folder for train and validate splits.

train_map.txt, val_map.txt: which store the relative path in the corresponding zip file and ground truth label. Make sure the data folder looks like this:

$ tree data
data
└── ImageNet-Zip
├── train_map.txt
├── train.zip
├── val_map.txt
└── val.zip
  
$ head -n 5 data/ImageNet-Zip/val_map.txt
ILSVRC2012_val_00000001.JPEG	65
ILSVRC2012_val_00000002.JPEG	970
ILSVRC2012_val_00000003.JPEG	230
ILSVRC2012_val_00000004.JPEG	809
ILSVRC2012_val_00000005.JPEG	516
  
$ head -n 5 data/ImageNet-Zip/train_map.txt
n01440764/n01440764_10026.JPEG	0
n01440764/n01440764_10027.JPEG	0
n01440764/n01440764_10029.JPEG	0
n01440764/n01440764_10040.JPEG	0
n01440764/n01440764_10042.JPEG	0

For ImageNet-22K dataset, make a folder named fall11_whole and move all images to labeled sub-folders in this folder. Then download the train-val split file (ILSVRC2011fall_whole_map_train.txt & ILSVRC2011fall_whole_map_val.txt) , and put them in the parent directory of fall11_whole. The file structure should look like:
```
$ tree imagenet22k/
imagenet22k/
├── ILSVRC2011fall_whole_map_train.txt
├── ILSVRC2011fall_whole_map_val.txt
└── fall11_whole
    ├── n00004475
    ├── n00005787
    ├── n00006024
    ├── n00006484
    └── ...
```

Model Training and Inference

Classification

To train VMamba models for classification on ImageNet, use the following commands for different configurations:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp

If you only want to test the performance (together with params and flops):

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --pretrained </path/of/checkpoint>

please refer to modelcard for more details.

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

use --tta to get the mIoU(ms) in segmentation

To train with mmdetection or mmsegmentation:

bash ./tools/dist_train.sh </path/to/config> 8

For more information about detection and segmentation tasks, please refer to the manual of mmdetection and mmsegmentation. Remember to use the appropriate backbone configurations in the configs directory.

Analysis Tools

VMamba includes tools for visualizing mamba "attention" and effective receptive field, analysing throughput and train-throughput. Use the following commands to perform analysis:

# Visualize Mamba "Attention"
CUDA_VISIBLE_DEVICES=0 python analyze/attnmap.py

# Analyze the effective receptive field
CUDA_VISIBLE_DEVICES=0 python analyze/erf.py

# Analyze the throughput and train throughput
CUDA_VISIBLE_DEVICES=0 python analyze/tp.py

We also included other analysing tools that we may use in this project. Thanks to all who have contributes to these tools.

get_started.md 6.2 KB Riwayat Mentahan

VMamba

Installation

Data preparation

Model Training and Inference

Analysis Tools

get_started.md 6.2 KB

Riwayat Mentahan