The dataset can be used to train Persian ASR and TTS models tailored to informal Persian speech. It can also be used to evaluate ASR models by Character Error Rate (CER); see [this repository](https://github.com/MahtaFetrat/Open-Persian-ASRs) for an example ASR evaluation setup.

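As a rough illustration of the CER metric mentioned above, the sketch below computes it as the character-level Levenshtein distance divided by the number of reference characters. The function names `edit_distance` and `cer` are hypothetical and are not taken from the linked repository, which may normalize text or compute the score differently.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into "" takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Toy example (assumed data): one character is missing from the hypothesis.
    print(cer(["سلام"], ["سلم"]))
```

This sketch applies no text normalization. In practice, Persian transcripts are often normalized before scoring (for example, unifying Arabic and Persian letter variants), and that choice changes the reported CER.
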
## Dataset

The dataset can be accessed through the following links:

- [Raw Data](link to be updated)
- [Processed Data](https://dml.ir/virgool-informal)

A smaller, randomly selected subset of the processed data is available in [this directory](sample_data).

## License

The code in this project is released under the MIT License, and the data is released under the CC0 license.

Contributions are welcome! Please feel free to submit a pull request.

---

Enjoy working with the VirgoolInformal-Speech-Dataset!