| @@ -9,6 +9,8 @@ The text for this dataset was generated using GPT4o, with prompts covering a wid | |||
| These generated texts were then recorded in a quiet environment. Next, the audio and text files were force-aligned using [aeneas](https://github.com/readbeyond/aeneas) and segmented into the shorter audio-text pairs that make up this dataset. | |||
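The alignment-and-chunking step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each recording is a PCM WAV file and that aeneas exported its alignment as a JSON sync map: a `fragments` list whose entries hold `begin`/`end` times in seconds (as strings) and the aligned text in `lines`. The names `Chunk`, `chunks_from_syncmap`, and `cut_wav` are hypothetical helpers introduced only for this example.

```python
import json
import wave
from dataclasses import dataclass


# Hypothetical helper types/functions: not part of aeneas or of this dataset's tooling.
@dataclass
class Chunk:
    begin: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcript aligned to this span


def chunks_from_syncmap(syncmap_json: str) -> list[Chunk]:
    """Parse an aeneas JSON sync map (assumed format) into (begin, end, text) chunks."""
    data = json.loads(syncmap_json)
    chunks = []
    for fragment in data["fragments"]:
        # Times are stored as strings; the text is stored as a list of lines.
        begin = float(fragment["begin"])
        end = float(fragment["end"])
        text = " ".join(fragment["lines"]).strip()
        # Drop empty or zero-length fragments (e.g. leading/trailing silence).
        if text and end > begin:
            chunks.append(Chunk(begin, end, text))
    return chunks


def cut_wav(src_path: str, chunk: Chunk, dst_path: str) -> None:
    """Copy the [begin, end) span of a PCM WAV file into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        n_frames = src.getnframes()
        # Convert seconds to frame indices, clamped to the file length.
        start = min(int(chunk.begin * rate), n_frames)
        stop = min(int(chunk.end * rate), n_frames)
        src.setpos(start)
        frames = src.readframes(max(stop - start, 0))
        with wave.open(dst_path, "wb") as dst:
            # Reuse channels, sample width and rate; the frame count is patched on write.
            dst.setparams(src.getparams())
            dst.writeframes(frames)
```

As a usage example, looping over `enumerate(chunks_from_syncmap(...))` and calling `cut_wav("recording.wav", chunk, f"chunk_{i:05d}.wav")` would write one WAV file per aligned fragment. The file names are placeholders. Only the standard library is used, so the sketch consumes aeneas output without requiring aeneas itself.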
| ## Download | |||
| You can download the dataset from its [Hugging Face repository](https://huggingface.co/datasets/MahtaFetrat/GPTInformal-Persian). | |||
| ### Data Columns | |||
| @@ -28,11 +30,16 @@ Each Parquet file contains the following columns: | |||
| If you use GPTInformal-Persian in your research or projects, please cite the following paper: | |||
| ```bibtex | |||
| @article{fetrat2024manatts, | |||
| title={ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages}, | |||
| author={Mahta Fetrat Qharabagh and Zahra Dehghanian and Hamid R. Rabiee}, | |||
| journal={arXiv preprint arXiv:2409.07259}, | |||
| year={2024}, | |||
| } | |||
| @inproceedings{qharabagh-etal-2025-manatts, | |||
| title = "{M}ana{TTS} {P}ersian: a recipe for creating {TTS} datasets for lower resource languages", | |||
| author = "Qharabagh, Mahta Fetrat and Dehghanian, Zahra and Rabiee, Hamid R.", | |||
| booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)", | |||
| month = apr, | |||
| year = "2025", | |||
| address = "Albuquerque, New Mexico", | |||
| publisher = "Association for Computational Linguistics", | |||
| url = "https://aclanthology.org/2025.naacl-long.464/", | |||
| pages = "9177--9206", | |||
| } | |||
| ``` | |||