
Update 'README.md'

main
Mahta Fetrat 1 month ago
commit 4cdfbc45a8
1 changed file with 16 additions and 42 deletions

README.md (+16, -42)

@@ -1,6 +1,6 @@
# Persian G2P Tools Benchmark

This repository contains benchmarking notebooks for various Persian grapheme-to-phoneme (G2P) models, including both baseline models and the proposed Homo-GE2PE and Homo-T5 models in the *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](link)* study. The benchmarks are conducted using the [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench).
This repository contains benchmarking notebooks for various Persian grapheme-to-phoneme (G2P) models, including both baseline models and the proposed Homo-GE2PE and Homo-T5 models in the *[Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models](https://arxiv.org/abs/2505.12973)* study. The benchmarks are conducted using the [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench).

---

@@ -29,55 +29,29 @@ Each notebook benchmarks a specific model using the SentenceBench dataset. The r

The table below presents the performance of each model, averaged across 5 runs:

| Model | PER (%) | Homograph Acc. (%) | Avg. Inf. Time (s) |
| ---------------------------------------------- | ------- | ------------------ | ------------------ |
| PersianG2P (AzamRabiee) | 35.23 | 21.23 | 11.1374 |
| PasaOpasen\_PersianG2P | 15.04 | 37.74 | 2.1686 |
| persian\_phonemizer (de\_mh) | 25.27 | 29.25 | 0.1803 |
| Epitran (dmort27) | 45.12 | 0.00 | 0.0003 |
| G2P (mohamad\_hasan\_sohan\_ajini) | 19.63 | 29.91 | 28.0039 |
| Persian\_Grapheme\_To\_Phoneme (sajadalipour7) | 12.85 | 40.00 | 0.9685 |
| eSpeak NG | 6.92 | 43.87 | 0.0169 |
| GE2PE | 4.81 | 47.17 | 0.4464 |
| HomoFast eSpeak | 6.33 | 74.53 | 0.0084 |
| Homo-T5 | 4.12 | 76.32 | 0.4141 |
| Homo-GE2PE | 3.98 | 76.89 | 0.4473 |
| Model | PER (%) | Homograph Acc. (%) | Avg. Inf. Time (s) |
| ---------------------------------------------- | ------- | ------------------ | ------------------ |
| [Persian\_G2P](https://github.com/AzamRabiee/Persian_G2P) | 35.23 | 21.23 | 11.1374 |
| [PersianG2P](https://github.com/PasaOpasen/PersianG2P) | 15.04 | 37.74 | 2.1686 |
| [persian\_phonemizer](https://github.com/de-mh/persian_phonemizer) | 25.27 | 29.25 | 0.1803 |
| [Epitran](https://github.com/dmort27/epitran) | 45.12 | 0.00 | 0.0003 |
| [Persian G2P](https://github.com/mohamad-hasan-sohan-ajini/G2P) | 19.63 | 29.91 | 28.0039 |
| [Persian\_Grapheme\_To\_Phoneme](https://github.com/sajadalipour7/Persian-Grapheme-To-Phoneme-With-Transformer) | 12.85 | 40.00 | 0.9685 |
| [eSpeak NG](https://espeak.sourceforge.net/) | 6.92 | 43.87 | 0.0169 |
| [GE2PE](https://github.com/Sharif-SLPL/GE2PE) | 4.81 | 47.17 | 0.4464 |
| [HomoFast eSpeak](https://github.com/MahtaFetrat/HomoFast-eSpeak-Persian) | 6.33 | 74.53 | 0.0084 |
| [Homo-T5](https://github.com/MahtaFetrat/Homo-GE2PE-Persian) | 4.12 | 76.32 | 0.4141 |
| [Homo-GE2PE](https://github.com/MahtaFetrat/Homo-GE2PE-Persian) | 3.98 | 76.89 | 0.4473 |
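
For readers unfamiliar with the PER column, the sketch below shows one common way to compute a phoneme error rate: the total edit distance between predicted and reference phoneme sequences, divided by the total reference length. This is an illustration only. The function names are hypothetical, and the exact normalization and tokenization used in the benchmark notebooks are not stated here, so the scores it produces may differ from those in the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER in percent (assumed definition: total edits / total reference length)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_len = sum(len(r) for r in references)
    if total_len == 0:
        raise ValueError("references must contain at least one phoneme")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_len
```

Each sequence can be a string (one character per phoneme) or a list of phoneme tokens; the choice should match however the model outputs are tokenized.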

---

## Contributions

Contributions and pull requests are welcome. Please open an issue to discuss the changes you intend to make.

---

## License

This repository is licensed under the MIT License.

---


## Citation

If you use this project in your work, please cite the corresponding paper:

> TODO
Contributions and pull requests are welcome. Please open an issue to discuss the changes you intend to make or the models/tools you want to add to the benchmark.

---

### Additional Links

* [Link to Paper](https://anonymous.4open.science/r/HomoRich-G2P-Persian/)
* [Link to Paper](https://arxiv.org/abs/2505.12973)
* [SentenceBench Persian G2P Benchmark](https://huggingface.co/datasets/MahtaFetrat/SentenceBench)
* [Base GE2PE Model](https://github.com/Sharif-SLPL/GE2PE)
* [AzamRabiee Persian G2P](https://github.com/AzamRabiee/Persian_G2P)
* [PasaOpasen PersianG2P](https://github.com/PasaOpasen/PersianG2P)
* [de\_mh persian\_phonemizer](https://github.com/de-mh/persian_phonemizer)
* [dmort27 Epitran](https://github.com/dmort27/epitran)
* [eSpeak NG](https://espeak.sourceforge.net/)
* [mohamad\_hasan\_sohan\_ajini G2P](https://github.com/mohamad-hasan-sohan-ajini/G2P)
* [sajadalipour7 Persian Grapheme To Phoneme](https://github.com/sajadalipour7/Persian-Grapheme-To-Phoneme-With-Transformer)
* [Homo-GE2PE](https://anonymous.4open.science/r/Homo-GE2PE-Persian/) (TODO: To be updated)
* [Homo-T5](https://anonymous.4open.science/r/Homo-GE2PE-Persian/) (TODO: To be updated)
* [HomoFast eSpeak](https://anonymous.4open.science/r/HomoFast-eSpeak-Persian-FF31/) (TODO: To be updated)
