| @@ -1,14 +1,20 @@ | |||
| # ManaTTS-Persian-Speech-Dataset | |||
| ManaTTS is the largest publicly accessible single-speaker Persian corpus, comprising approximately 86 hours of audio sampled at 44.1 kHz. It is released under the open CC-0 license, which permits both educational and commercial use. The corpus was collected from the [Nasl-e-Mana](https://naslemana.com/) magazine and covers a wide range of topics and domains, making it suitable for training high-quality text-to-speech models. It is accompanied by a fully transparent, open-source pipeline for data collection and processing, including tools for audio segmentation and forced alignment. | |||
| ManaTTS is the largest publicly accessible single-speaker Persian corpus, comprising over 114 hours of audio sampled at 44.1 kHz. It is released under the open CC-0 license, which permits both educational and commercial use. The corpus was collected from the [Nasl-e-Mana](https://naslemana.com/) magazine and covers a wide range of topics and domains, making it suitable for training high-quality text-to-speech models. It is accompanied by a fully transparent, open-source pipeline for data collection and processing, including tools for audio segmentation and forced alignment. | |||
| ## Dataset | |||
| The ManaTTS dataset can be downloaded from [this link](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS). You can access a smaller, random sample of this dataset in the [sampled data directory](sample_data). These samples were selected to reflect the same distribution of match qualities as the complete dataset. For more details on match qualities, please refer to the paper (link to be updated). | |||
| [](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS) | |||
| The ManaTTS dataset can be downloaded from [this link](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS). You can access a smaller, random sample of this dataset in the [sampled data directory](sample_data). These samples were selected to reflect the same distribution of match qualities as the complete dataset. For more details on match qualities, please refer to [the paper](https://aclanthology.org/2025.naacl-long.464/). | |||
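The idea of drawing a sample that keeps the same distribution of match qualities can be illustrated with a stratified sample. The following is only a minimal sketch under stated assumptions, not the authors' actual sampling script: the `match_quality` field name, its `HIGH`/`MIDDLE`/`LOW` labels, and the `stratified_sample` helper are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each group defined by `key`."""
    rng = random.Random(seed)

    # Group the records by their label (here: the match quality).
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    sample = []
    for label in sorted(groups):
        members = groups[label]
        # Keep at least one item per group so rare labels stay represented.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy metadata: the field name and labels are hypothetical placeholders.
records = (
    [{"id": i, "match_quality": "HIGH"} for i in range(60)]
    + [{"id": i, "match_quality": "MIDDLE"} for i in range(60, 90)]
    + [{"id": i, "match_quality": "LOW"} for i in range(90, 100)]
)

subset = stratified_sample(records, key="match_quality", fraction=0.1)
```

Sampling within each group, rather than from the pooled records, is what keeps the proportions of the subset close to those of the full collection.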
| ## Raw Data Crawling | |||
| [](https://colab.research.google.com/drive/1_E5KYAwuCr9B8k6EPYjVErsx-7rrr8Vl?usp=sharing) | |||
| The raw data for this dataset was crawled from the Nasl-e-Mana magazine website. The crawling script is provided in this repository and as a [Google Colab notebook](https://colab.research.google.com/drive/1_E5KYAwuCr9B8k6EPYjVErsx-7rrr8Vl?usp=sharing). | |||
| ## Processing Pipeline | |||
| [](https://colab.research.google.com/drive/1fWTy4IH2tSuOLrLSD8E8LMaUlI_Gnf-e?usp=sharing) | |||
| The following figure illustrates the overall processing pipeline used to create the ManaTTS dataset, including the steps for preproces | |||
| <p align="center"> | |||
| @@ -24,7 +30,9 @@ To run the pipeline, follow these steps: | |||
| 3. Execute the cells in the notebook sequentially | |||
| ## Trained TTS Model | |||
| A text-to-speech (TTS) model has been trained on the ManaTTS dataset. The training code and some output samples are available in [this repository](https://github.com/MahtaFetrat/Persian-MultiSpeaker-Tacotron2). | |||
| [](https://huggingface.co/MahtaFetrat/Persian-Tacotron2-on-ManaTTS) | |||
| A text-to-speech (TTS) model has been trained on the ManaTTS dataset. The training code and some output samples are available in [this GitHub repository](https://github.com/MahtaFetrat/Persian-MultiSpeaker-Tacotron2). The model weights and inference instructions can be found in [this Hugging Face repository](https://huggingface.co/MahtaFetrat/Persian-Tacotron2-on-ManaTTS). | |||
| ## Contributing | |||
| Contributions to this project are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request. | |||
| @@ -38,20 +46,38 @@ The ManaTTS dataset is provided exclusively for research and development purpose | |||
| By accessing and using the ManaTTS dataset, you are obligated to uphold the highest standards of integrity and respect for user privacy. Any violation of these principles may have severe legal and ethical consequences. | |||
| For any inquiries or clarifications regarding the use of this dataset, please reach out to us at [contact info to be updated]. Your cooperation in ensuring responsible use of this dataset is greatly appreciated. | |||
| For any inquiries or clarifications regarding the use of this dataset, please reach out to us. Your cooperation in ensuring responsible use of this dataset is greatly appreciated. | |||
| ## Acknowledgment | |||
| We would like to express our sincere gratitude to [Nasl-e-Mana](https://naslemana.com/), the monthly magazine of the blind community of Iran, for their generosity. Their commitment to openness and collaboration has been instrumental in advancing research and development in speech synthesis. We are especially thankful for their choice to release the data under the Creative Commons CC-0 license, allowing for unrestricted use and distribution. | |||
| ## Collaboration and Community Impact | |||
| We encourage researchers, developers, and the broader community to utilize the resources provided in this project, particularly in the development of high-quality screen readers and other assistive technologies to support the Iranian blind community. By fostering open-source collaboration, we aim to drive innovation and improve accessibility for all. | |||
| ## Citation | |||
| If you use this dataset or the processing pipeline in your work, please cite the following paper: | |||
| ```bibtex | |||
| @article{fetrat2024manatts, | |||
| title={ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages}, | |||
| author={Mahta Fetrat Qharabagh and Zahra Dehghanian and Hamid R. Rabiee}, | |||
| journal={arXiv preprint arXiv:2409.07259}, | |||
| year={2024}, | |||
| @inproceedings{qharabagh-etal-2025-manatts, | |||
| title = "{M}ana{TTS} {P}ersian: a recipe for creating {TTS} datasets for lower resource languages", | |||
| author = "Qharabagh, Mahta Fetrat and Dehghanian, Zahra and Rabiee, Hamid R.", | |||
| booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)", | |||
| month = apr, | |||
| year = "2025", | |||
| address = "Albuquerque, New Mexico", | |||
| publisher = "Association for Computational Linguistics", | |||
| url = "https://aclanthology.org/2025.naacl-long.464/", | |||
| pages = "9177--9206", | |||
| } | |||
| ``` | |||
| --- | |||
| ## Additional Links | |||
| - [ManaTTS Hugging Face Repository](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS) | |||
| - [ManaTTS Paper](https://aclanthology.org/2025.naacl-long.464/) | |||
| - [Nasl-e-Mana Magazine](https://naslemana.com/) | |||
| - Tacotron2 Trained on ManaTTS: [Hugging Face](https://huggingface.co/MahtaFetrat/Persian-Tacotron2-on-ManaTTS) | [GitHub](https://github.com/MahtaFetrat/ManaTTS-Persian-Tacotron2-Model) | |||