Homo-GE2PE is a Persian grapheme-to-phoneme (G2P) model specialized in homograph disambiguation: handling words that share a spelling but are pronounced differently depending on context (e.g., مرد, pronounced *mard* “man” or *mord* “died”). Introduced in “Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models”, the model extends GE2PE by fine-tuning it on the HomoRich dataset, which was explicitly designed for such pronunciation challenges.
The repository is organized as follows:

- `model-weights/`
  - `homo-ge2pe.zip`: Homo-GE2PE model checkpoint
  - `homo-t5.zip`: Homo-T5 model checkpoint (T5-based G2P model)
- `training-scripts/`
  - `finetune-ge2pe.py`: fine-tuning script for GE2PE
  - `finetune-t5.py`: fine-tuning script for T5
- `testing-scripts/`
  - `test.ipynb`: benchmarks the models on the SentenceBench Persian G2P benchmark
- `assets/`: files required for inference (e.g., Parsivar, `GE2PE.py`)
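The checkpoints in `model-weights/` ship as ZIP archives, so they presumably need to be unpacked before the notebook can load them. The sketch below uses only Python's standard library and assumes the archives are ordinary ZIP files; the helper name `extract_checkpoint`, the `extracted/` destination directory, and the layout of the unpacked files are all hypothetical, not documented by this repository.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_checkpoint(archive: str | Path, dest_dir: str | Path) -> Path:
    """Unpack a checkpoint archive into dest_dir/<archive name without .zip>.

    Hypothetical helper: assumes a standard ZIP file; how the extracted
    files are loaded afterwards is up to the inference notebook.
    """
    archive = Path(archive)
    dest = Path(dest_dir) / archive.stem  # e.g. extracted/homo-ge2pe
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # extractall strips absolute paths and ".." components
    return dest


if __name__ == "__main__":
    # Unpack whichever of the two shipped checkpoints are present.
    for name in ("homo-ge2pe.zip", "homo-t5.zip"):
        path = Path("model-weights") / name
        if path.exists():
            print("extracted to", extract_checkpoint(path, "extracted"))
```

Keeping each checkpoint in its own subdirectory avoids file-name collisions if both archives happen to contain files with the same names.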
Below are the performance metrics for each model variant on the SentenceBench dataset (PER = phoneme error rate; lower is better):

| Model | PER (%) | Homograph Acc. (%) | Avg. Inference Time (s) |
|---|---|---|---|
| GE2PE (Base) | 4.81 | 47.17 | 0.4464 |
| Homo-T5 | 4.12 | 76.32 | 0.4141 |
| Homo-GE2PE | 3.98 | 76.89 | 0.4473 |
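For readers unfamiliar with the PER column: phoneme error rate is conventionally the Levenshtein (edit) distance between predicted and reference phoneme sequences, divided by the number of reference phonemes. The sketch below computes a corpus-level PER under the assumption that every character of a phonemic transcription is one phoneme; the benchmark notebook may tokenize or aggregate differently, so treat this as an illustration rather than the exact evaluation code.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if r == h)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level PER in percent: total edits / total reference phonemes.

    Assumption: one character of a transcription == one phoneme.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * edits / total


# Mispronouncing the homograph مرد ("mard" read as "mord") costs one substitution.
print(phoneme_error_rate(["mard"], ["mord"]))  # 25.0
```

Because PER counts every phoneme equally, a model can score well on PER while still getting most homographs wrong, which is why the table also reports homograph accuracy separately.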
For inference, use the provided `inference.ipynb` notebook or the Colab link. The notebook demonstrates how to load the checkpoints and perform grapheme-to-phoneme conversion with Homo-GE2PE and Homo-T5.
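The table does not state how "Avg. Inference Time" was measured. One plausible reading is mean wall-clock latency per sentence; the sketch below measures that for any G2P function. `g2p` is a hypothetical stand-in for whatever conversion function the notebook exposes, and the warm-up step (excluding the first calls from timing) is a design choice of this sketch, not something documented here.

```python
import time
from typing import Callable, Sequence


def average_inference_time(
    g2p: Callable[[str], str], sentences: Sequence[str], warmup: int = 1
) -> float:
    """Mean wall-clock seconds per sentence for a G2P callable.

    `g2p` is hypothetical: e.g. a wrapper around Homo-GE2PE or Homo-T5.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    for s in sentences[:warmup]:
        g2p(s)  # warm-up calls are not timed (lazy loading, caches, etc.)
    start = time.perf_counter()
    for s in sentences:
        g2p(s)
    return (time.perf_counter() - start) / len(sentences)
```

Usage would look like `average_inference_time(my_g2p, benchmark_sentences)`, where both names are placeholders for the model wrapper and the SentenceBench inputs.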
The models in this repository were fine-tuned on HomoRich, the first large-scale public Persian homograph dataset for grapheme-to-phoneme (G2P) tasks, designed to resolve pronunciation and meaning ambiguities in identically spelled words. Introduced in “Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models”, the dataset is available here (TODO: Update link).
If you use this project in your work, please cite the corresponding paper:
TODO
Contributions and pull requests are welcome. Please open an issue to discuss the changes you intend to make.