Python package for detecting informal Persian text using regular expressions and rule-based methods
# Persian-Informal-Text-Detector
Persian Informal Text Detector is a rule-based detector built on regular expressions. It identifies informal Persian text by looking for indicators such as informal words and colloquial verb forms.
## Source of Informal Text Indicators
Some of the informal text indicators, such as informal words and verb forms, are derived from [this Wikipedia page](https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%D8%A7%D8%B4%D8%AA%D8%A8%D8%A7%D9%87%E2%80%8C%DB%8C%D8%A7%D8%A8/%D9%81%D9%87%D8%B1%D8%B3%D8%AA/%D8%BA%DB%8C%D8%B1%D8%B1%D8%B3%D9%85%DB%8C).
## Installation
You can install Persian Informal Text Detector using pip:
```bash
pip install informal_detector
```
## Example Usage
```python
from informal_detector import is_informal
# Returns True since the text contains at least one informal indicator
result1 = is_informal("دلم میخواد برم خونه", threshold=1)
print(result1) # Output: True
# Returns False since the text does not contain enough informal indicators
result2 = is_informal("نباید به خانه بروم", threshold=1)
print(result2) # Output: False
```
## The `threshold` Argument
The `threshold` keyword argument controls how strict the detector is: it sets the minimum number of informal Persian indicators, such as informal words and verbs, that a text must contain to be classified as informal.
A lower threshold suits short texts, while a higher threshold suits longer texts that may include some formal sentences but should still be marked as informal if they contain a significant number of informal indicators. A threshold of 1 means that a text is considered informal if it contains at least one informal word or verb.
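
To make the role of `threshold` concrete, here is a minimal sketch of how a threshold-based regex detector could work. It is not the package's actual implementation: the pattern list, `count_informal_indicators`, and `is_informal_sketch` are hypothetical names chosen for illustration, and the real indicator list is far larger.

```python
import re

# Hypothetical indicator list, for illustration only. The package's real
# patterns are not reproduced here.
INFORMAL_PATTERNS = [
    "میخواد",  # colloquial form of "می‌خواهد"
    "برم",     # colloquial form of "بروم"
    "خونه",    # colloquial form of "خانه"
]

# One compiled alternation; \b keeps a pattern from matching inside a longer word.
_INDICATOR_RE = re.compile(r"\b(?:" + "|".join(INFORMAL_PATTERNS) + r")\b")


def count_informal_indicators(text: str) -> int:
    """Count how many informal indicators occur in the text."""
    return len(_INDICATOR_RE.findall(text))


def is_informal_sketch(text: str, threshold: int = 1) -> bool:
    """Classify the text as informal if it has at least `threshold` indicators."""
    return count_informal_indicators(text) >= threshold


informal = "دلم میخواد برم خونه"
formal = "نباید به خانه بروم"

print(count_informal_indicators(informal))        # 3
print(is_informal_sketch(informal, threshold=1))  # True
print(is_informal_sketch(informal, threshold=4))  # False
print(is_informal_sketch(formal, threshold=1))    # False
```

With this toy list the informal sentence contains three indicators, so it passes any threshold up to 3 and fails at 4, while the formal sentence contains none and fails even at a threshold of 1.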
## Contribution
If you come across any issues or have ideas for improvements, please don't hesitate to let us know by opening an issue or sending a pull request. Thank you for using Persian Informal Text Detector!