# XNet

XNet: A Staged Dual-Frequency Synergistic Framework via Wavelet-FFT for Medical Image Segmentation of Small Objects and Weak Boundaries

## Overview

XNet is a medical image segmentation framework designed for accurate segmentation of small objects and weak boundaries. It combines wavelet transform and FFT enhancement with Swin-UNETR architecture to achieve superior performance in challenging scenarios.

## Key Features

- **Dual-Frequency Enhancement**: Combines wavelet transform and FFT for multiscale feature extraction
- **Swin-UNETR Backbone**: Leverages transformer-based architecture for robust segmentation
- **Enhanced Data Augmentation**: Comprehensive augmentation pipeline for better generalization
- **Multi-Metric Evaluation**: Dice, IoU, and Hausdorff Distance metrics

## Installation

### Prerequisites

```bash
pip install torch monai swanlab opencv-python numpy
```

## Quick Start

### 1. Prepare Your Dataset

Organize your dataset in the following structure:

```
data/
└── Polyp-Detection-Dataset/
    └── YourDatasetName/
        ├── images/
        │   ├── 1.png
        │   ├── 2.png
        │   └── ...
        ├── masks/
        │   ├── 1.png
        │   ├── 2.png
        │   └── ...
        ├── train.txt
        └── val.txt
```

### 2. Training

Train the model on your dataset:

```bash
python train.py \
    --dataset_name YourDatasetName \
    --data_root ./data/Polyp-Detection-Dataset \
    --batch_size 4 \
    --max_epochs 1000 \
    --learning_rate 1e-4 \
    --device cuda
```

#### Key Training Parameters

| Parameter                   | Default    | Description                |
|-----------------------------|------------|----------------------------|
| `--dataset_name`            | Required   | Name of your dataset       |
| `--batch_size`              | 4          | Batch size for training    |
| `--max_epochs`              | 1000       | Maximum training epochs    |
| `--learning_rate`           | 1e-4       | Initial learning rate      |
| `--feature_size`            | 48         | Network feature dimension  |
| `--target_spatial_size`     | (512, 512) | Input image size           |
| `--early_stopping_patience` | 100        | Early stopping patience    |
| `--use_wavelet`             | True       | Enable wavelet enhancement |
| `--use_fft`                 | True       | Enable FFT enhancement     |

### 3. Evaluation

Evaluate trained models:

```bash
python eval.py \
    --dataset_name YourDatasetName \
    --data_root ./data/Polyp-Detection-Dataset \
    --outputs_dir ./outputs \
    --device cuda
```

#### Key Evaluation Parameters

| Parameter              | Default          | Description                                 |
|------------------------|------------------|---------------------------------------------|
| `--dataset_name`       | Required         | Name of your dataset                        |
| `--outputs_dir`        | ./outputs_minute | Directory containing trained models         |
| `--batch_size`         | 1                | Batch size for evaluation                   |
| `--save_visualization` | True             | Save visualization results                  |
| `--vis_num_samples`    | 1000             | Number of samples to visualize              |
| `--best_metric`        | False            | Use best overall model (default: best Dice) |

## Model Architecture

XNet integrates three key components:

1. **Wavelet Enhancement Module**: Captures multi-scale frequency features
2. **FFT Enhancement Module**: Enhances global frequency domain information
3. **Swin-UNETR v2**: Transformer-based backbone for robust feature extraction

## Output Structure

After training, outputs are organized as:

```
outputs_minute/
├── best_dice_model_YourDatasetName.pt
├── best_iou_model_YourDatasetName.pt
├── best_metric_model_YourDatasetName.pt
└── checkpoints_YourDatasetName/
    └── checkpoint_epoch=X.pt
```

## Monitoring with SwanLab

Training progress is automatically logged to SwanLab.

Metrics tracked:

- Training/validation loss
- Dice coefficient
- IoU (Intersection over Union)
- Hausdorff Distance
- Learning rate schedule

## Advanced Usage

### Resume Training

Training automatically resumes from the latest checkpoint:

```bash
python train.py --dataset_name YourDatasetName
```

### Disable Components (Ablation Study)

```bash
# Disable wavelet enhancement
python train.py --dataset_name YourDatasetName --no_wavelet

# Disable FFT enhancement
python train.py --dataset_name YourDatasetName --no_fft
```

### Custom Loss Weights

```bash
python train.py \
    --dataset_name YourDatasetName \
    --dice_weight 1.0 \
    --ce_weight 1.0 \
    --iou_weight 1.0
```

## Citation

If you find this work useful, please cite our paper.

## License

This project is licensed under the Apache License.
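
## Example: Dice and IoU on Binary Masks

The evaluation section lists Dice and IoU among the reported metrics. As a rough illustration of what those two numbers measure, the sketch below computes them for a pair of binary masks with NumPy. The function name `dice_and_iou` and the smoothing term `eps` are assumptions made for this example; this is not the code in `eval.py`, which may compute the metrics differently.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (dice, iou) for two binary masks of the same shape.

    Hypothetical helper for illustration only; `eps` avoids division by
    zero when both masks are empty (both scores then evaluate to 1.0).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    pred_sum = pred.sum()
    target_sum = target.sum()
    union = pred_sum + target_sum - intersection

    dice = (2.0 * intersection + eps) / (pred_sum + target_sum + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


# Toy masks: the prediction covers 4 of the 6 foreground pixels.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])

dice, iou = dice_and_iou(pred, target)
print(f"Dice={dice:.3f}, IoU={iou:.3f}")  # Dice=0.800, IoU=0.667
```

Dice weights the overlap twice relative to the mask sizes, so it is always at least as large as IoU for the same pair of masks. On small objects, a few misclassified pixels shift both scores noticeably.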
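
## Example: Wavelet and FFT Decomposition Sketch

The model architecture section describes a wavelet module for multi-scale frequency features and an FFT module for global frequency information. The sketch below shows the general idea of each operation on a single 2D array using only NumPy. It is not XNet's implementation: the function names `haar_dwt2` and `fft_high_boost`, the choice of a Haar wavelet, and the `cutoff` and `gain` parameters are all assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2D Haar transform (illustrative, not XNet's module).

    Returns four half-resolution sub-bands: one local average and three
    detail bands built from differences inside each 2x2 block.
    """
    x = np.asarray(image, dtype=np.float64)
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]  # crop to even height and width

    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # local average (low frequency)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row
    hl = (a - b + c - d) / 2.0  # left column minus right column
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def fft_high_boost(image, cutoff=0.1, gain=1.5):
    """Scale frequencies above `cutoff` (cycles per pixel) by `gain`.

    Hypothetical global enhancement: every output pixel depends on the
    whole image because the filter is applied in the Fourier domain.
    """
    x = np.asarray(image, dtype=np.float64)
    spectrum = np.fft.fft2(x)
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    scale = np.where(radius > cutoff, gain, 1.0)
    # The mask is symmetric in frequency, so the result is real up to
    # rounding error; np.real drops the negligible imaginary part.
    return np.real(np.fft.ifft2(spectrum * scale))


rng = np.random.default_rng(0)
image = rng.random((64, 64))

ll, lh, hl, hh = haar_dwt2(image)
boosted = fft_high_boost(image)
print(ll.shape, boosted.shape)
```

The wavelet bands keep spatial locality, which is useful for localized detail such as thin boundaries, while the Fourier filter acts on the whole image at once. How XNet combines the two is defined by its own modules, not by this sketch.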