# addse
Code for "Absorbing Discrete Diffusion for Speech Enhancement".
## Installation

- Install [uv](https://docs.astral.sh/uv/).
- Clone the repository and install dependencies:

  ```shell
  git clone [email protected]:philgzl/addse.git && cd addse && uv sync
  ```
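If `uv sync` fails, first check that `uv` is actually on `PATH` (the fallback message below is illustrative only):

```shell
# Verify that uv is installed and on PATH before running `uv sync`.
command -v uv >/dev/null 2>&1 && uv --version || echo "uv not found: see docs.astral.sh/uv"
```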
## Data preparation

### Training data

The following datasets are used:

- Speech: EARS, LibriSpeech, VCTK, DNS5, MLS_URGENT_2025_track1
- Noise: WHAM_48kHz, DEMAND, FSD50K, DNS, FMA_medium
Place each dataset under `data/external/`. Then run the following scripts:

These convert the data to a litdata-optimized format and write it to `data/chunks/`.

Alternatively, edit the two shell scripts to point at your own speech and noise data.
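As a sketch, the expected layout can be created like this (one directory per dataset, names assumed to match the list above; the preparation scripts may expect slightly different names):

```shell
# Hypothetical layout: one directory per dataset under data/external/.
for d in EARS LibriSpeech VCTK DNS5 MLS_URGENT_2025_track1 \
         WHAM_48kHz DEMAND FSD50K DNS FMA_medium; do
  mkdir -p "data/external/$d"
done
ls data/external
```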
### Validation data

Validation data is streamed directly from Hugging Face, so no preparation is needed.

Alternatively, update the configuration files in `configs/` to point at your own litdata-optimized validation data.
### Evaluation data

Download the Clarity speech dataset to `data/external/Clarity/`. Then run:

The remaining evaluation data is streamed directly from Hugging Face.

Alternatively, update the configuration files in `configs/` to point at your own litdata-optimized evaluation data.
## Training

To train a model:

Checkpoints and metrics are written to `logs/<model_name>/`.

Use the `--wandb` option to log metrics to W&B, and the `--log_model` option to additionally upload checkpoints to W&B. Both require a `.env` file with your credentials.
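As an illustration, a minimal `.env` for W&B logging might contain the following (`WANDB_API_KEY` is the W&B client's standard credential variable; whether addse also reads `WANDB_ENTITY` and `WANDB_PROJECT` is an assumption based on the client's defaults):

```shell
# .env - hypothetical example; only WANDB_API_KEY is known to be read
# by the W&B client itself.
WANDB_API_KEY=your-api-key-here
WANDB_ENTITY=your-username
WANDB_PROJECT=addse
```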
## Evaluation

To evaluate a trained model:

```shell
uv run addse eval configs/<model_name>.yaml logs/<model_name>/checkpoints/last.ckpt --num-consumers 4
```

Results are written to `eval.db` by default.
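The `.db` extension suggests `eval.db` is a SQLite file (its schema is not documented here); if so, it can be inspected with Python's standard library, no extra dependencies needed:

```shell
# Assumes eval.db is SQLite; lists its tables via Python's stdlib.
# Note: sqlite3.connect creates an empty file if eval.db does not exist.
python3 - <<'EOF'
import sqlite3
con = sqlite3.connect("eval.db")
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
con.close()
EOF
```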
## Trained checkpoints

Will be released soon.