A dataset of informal Persian audio and text chunks, along with a fully open processing pipeline, suitable for ASR and TTS tasks. Created from crawled content on virgool.io.

Updated 2 months ago

Code and Resources for "LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study", introducing methods to leverage LLMs for G2P tasks without additional training, featuring Sentence-Bench and Kaamel-Dict.

Updated 2 months ago

A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject

Updated 2 months ago

Python package for detecting informal Persian text using regular expressions and rule-based methods

Updated 2 months ago

ManaTTS is the largest open Persian speech dataset, with 86+ hours of transcribed audio. Includes the data collection pipeline and tools. Suitable for training Persian text-to-speech models.

Updated 2 months ago

Code and documentation for my BSc project in cancer genomics

Updated 1 year ago