Python package for detecting informal Persian text using regular expressions and rule-based methods
MahtaFetrat 4f591d8121 setup for pypi 10 months ago


Persian-Informal-Text-Detector

Persian Informal Text Detector is a rule-based detector built on regular expressions. It identifies informal Persian text by matching indicators such as informal words and informal verb forms.

Source of Informal Text Indicators

Some of the informal text indicators, such as informal words and verb formats, are derived from this Wikipedia page.

Installation

You can install Persian Informal Text Detector using pip:

pip install informal_detector

Example Usage

from informal_detector import is_informal

# Returns True since the text contains at least one informal indicator
result1 = is_informal("دلم میخواد برم خونه", threshold=1)
print(result1)  # Output: True

# Returns False since the text does not contain enough informal indicators
result2 = is_informal("نباید به خانه بروم", threshold=1)
print(result2)  # Output: False

The threshold Argument

The threshold keyword argument sets how strict the detector is. It is the minimum number of informal Persian indicators, such as informal words and verbs, that a text must contain to be classified as informal.

A lower threshold suits short texts, while a higher threshold suits longer texts that may include some formal sentences but should still be marked as informal when they contain a significant number of informal indicators. A threshold of 1 means that a text is considered informal if it contains at least one informal word or verb.
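
To make the role of the threshold concrete, here is a minimal sketch of threshold-based counting. It does not reproduce the package's internals: the three patterns, the names TOY_INDICATORS, count_indicators and toy_is_informal, and the choice to count every match (rather than distinct indicators) are all assumptions made for illustration only.

```python
import re

# Hypothetical indicator patterns, chosen only for illustration.
# The package's real rule set is not reproduced here.
TOY_INDICATORS = [
    re.compile("میخواد"),  # colloquial verb form
    re.compile("خونه"),    # colloquial form of "خانه"
    re.compile("برم"),     # colloquial form of "بروم"
]


def count_indicators(text):
    """Count every match of every toy pattern in the text."""
    return sum(len(pattern.findall(text)) for pattern in TOY_INDICATORS)


def toy_is_informal(text, threshold=1):
    """Assumed rule: informal when the match count reaches the threshold."""
    return count_indicators(text) >= threshold


informal_text = "دلم میخواد برم خونه"
formal_text = "نباید به خانه بروم"

print(toy_is_informal(informal_text, threshold=1))  # three toy indicators match
print(toy_is_informal(formal_text, threshold=1))    # no toy indicator matches
print(toy_is_informal(informal_text, threshold=4))  # stricter than the match count
```

Under these assumptions, raising the threshold only changes the final comparison, which is why the same short sentence can be flagged at threshold=1 but not at threshold=4. The toy patterns are plain substrings without word boundaries, so they would over-match in real text.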