This repository contains a dataset of informal Persian audio and text chunks suitable for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) tasks. The dataset was created by crawling informal Persian text from virgool.io using the crawling scripts from this repository, recording the spoken form of that text, and processing the raw audio and text files into smaller, matching chunks.
The dataset includes:
The raw data consists of:
The processed data consists of:
The processing of the raw data is documented in a Jupyter Notebook, which includes the following steps:
To run the processing notebook, place the raw data files into a folder named `raw-data` in the root directory. The processed audio and text files will be output to a directory named `processed-data`, and the forced alignment results will be written to `forced-aligned-data`.
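The directory layout described above can be checked before launching the notebook. The following is a minimal sketch, not part of the repository: the helper name `prepare_layout` is hypothetical, and creating the two output directories ahead of time is an assumption, since the notebook may well create them itself.

```python
from pathlib import Path

# Directory names taken from the README; everything else here is illustrative.
RAW_DIR = "raw-data"
OUTPUT_DIRS = ("processed-data", "forced-aligned-data")


def prepare_layout(root="."):
    """Verify that raw-data exists under `root` and ensure the output folders exist.

    Hypothetical helper: returns a dict mapping each directory name to its Path.
    """
    root_path = Path(root)
    raw = root_path / RAW_DIR
    if not raw.is_dir():
        # The notebook expects the raw files here, so fail early with a clear message.
        raise FileNotFoundError(f"Expected raw data in {raw}")

    paths = {RAW_DIR: raw}
    for name in OUTPUT_DIRS:
        out = root_path / name
        # Assumption: pre-creating the output folders is harmless.
        out.mkdir(exist_ok=True)
        paths[name] = out
    return paths
```

Calling `prepare_layout(".")` from the repository root raises `FileNotFoundError` if `raw-data` is missing, and otherwise returns the three paths.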
For detailed instructions on environment setup, please refer to the processing notebook.
You can view and run the processing notebook in Google Colab.
The dataset can be used to evaluate Persian ASR models in terms of Character Error Rate (CER). For example, see this repository (link to be updated) for an ASR evaluation setup.
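For reference, CER is commonly defined as the character-level edit distance between the reference transcript and the model output, divided by the number of characters in the reference. The sketch below illustrates that definition in plain Python. It is not the evaluation code of the linked setup: the function names `edit_distance` and `cer` are hypothetical, and any text normalization (for example of Persian character variants or whitespace) is left out and would need to match whatever the actual evaluation uses.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

With this definition, a hypothesis that drops one character from a four-character reference scores a CER of 0.25, and the value can exceed 1.0 when the hypothesis is much longer than the reference.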
The code in this project is licensed under the MIT License, and the data is released under the CC-0 License.
Contributions are welcome! Please feel free to submit a Pull Request.
Enjoy working with the VirgoolInformal-Speech-Dataset!