Update 'README.md'

main
Mahta Fetrat 2 months ago
parent
commit
3356ba5e77
1 changed file with 12 additions and 5 deletions
README.md

@@ -9,6 +9,8 @@ The text for this dataset was generated using GPT4o, with prompts covering a wid
These generated texts were then recorded in a quiet environment. The audio and text files underwent forced alignment using [aeneas](https://github.com/readbeyond/aeneas), resulting in smaller chunks of audio-text pairs as presented in this dataset.

## Download
+[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-dataset-orange)](https://huggingface.co/datasets/MahtaFetrat/GPTInformal-Persian)
+
You can download the dataset from [this repository](https://huggingface.co/datasets/MahtaFetrat/GPTInformal-Persian).

### Data Columns
@@ -28,11 +30,16 @@ Each Parquet file contains the following columns:
If you use GPTInformal-Persian in your research or projects, please cite the following paper:

```bash
-@article{fetrat2024manatts,
-title={ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages},
-author={Mahta Fetrat Qharabagh and Zahra Dehghanian and Hamid R. Rabiee},
-journal={arXiv preprint arXiv:2409.07259},
-year={2024},
+@inproceedings{qharabagh-etal-2025-manatts,
+title = "{M}ana{TTS} {P}ersian: a recipe for creating {TTS} datasets for lower resource languages",
+author = "Qharabagh, Mahta Fetrat and Dehghanian, Zahra and Rabiee, Hamid R.",
+booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+month = apr,
+year = "2025",
+address = "Albuquerque, New Mexico",
+publisher = "Association for Computational Linguistics",
+url = "https://aclanthology.org/2025.naacl-long.464/",
+pages = "9177--9206",
}
```

