ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

Third-Party Licenses

This directory contains the licenses for the third-party tools and libraries used in this project. Below is a list of the tools along with their licenses.

Tools and Licenses

| Tool Name | Usage | Repository Page | License |
|---|---|---|---|
| Spleeter | Source separation (background music removal) | GitHub | MIT |
| Parsi.io | Number extraction and number-to-text conversion | GitHub | Apache-2.0 |
| Hazm | Text normalization | GitHub | MIT |
| Pydub | Silence detection and removal | GitHub | MIT |
| Perpos | Part-of-speech tagging for sentence tokenization | GitHub | MIT |
| Vosk | Forced alignment | GitHub | Apache-2.0 |
| Whisper-fa | Forced alignment | HuggingFace | Apache-2.0 |
| Wav2vec2-v3 | Forced alignment | HuggingFace | - |
| Wav2vec2-fa | Forced alignment | GitHub | Apache-2.0 |
| Hezar | Forced alignment | GitHub | Apache-2.0 |
| JiWER | CER calculation | GitHub | Apache-2.0 |

License Files

This directory also contains the actual license file for each tool. Please refer to these files for the full text of each license.