Seperno is a customizable Persian text normalizer. It provides tools to clean, preprocess, and normalize Persian text by handling spaces, URLs, punctuation, and more. This project is part of *Snapp Incubator* and aims to simplify text processing in Persian applications.
- Convert Half-Space to Space: Converts Persian half-spaces (`\u200c`) into regular spaces.
- Remove URLs: Cleans text by removing URLs.
- Combine Multiple Spaces: Reduces multiple consecutive spaces into a single space.
- Remove Outer Spaces: Trims unnecessary spaces from the start and end of the text.
- Remove End-of-Line Characters: Removes specific characters like `.` or `؟` at the end of a sentence.
- Normalize Punctuation: Replaces punctuation marks with spaces or their normalized equivalents.
- Customizable: Use modular options to tailor the normalization process.
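To make the features above concrete, here is a minimal standard-library sketch of what a few of these steps amount to. It is not Seperno's implementation: the helper names (`halfSpaceToSpace`, `removeURLs`, `combineSpaces`) and the simple URL pattern are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// urlPattern is a deliberately simple assumption: anything that starts with
// http:// or https:// and runs up to the next whitespace counts as a URL.
var urlPattern = regexp.MustCompile(`https?://\S+`)

// multiSpace matches runs of two or more ASCII spaces.
var multiSpace = regexp.MustCompile(` {2,}`)

// halfSpaceToSpace replaces the zero-width non-joiner (U+200C) with a space.
func halfSpaceToSpace(s string) string {
	return strings.ReplaceAll(s, "\u200c", " ")
}

// removeURLs deletes every match of urlPattern.
func removeURLs(s string) string {
	return urlPattern.ReplaceAllString(s, "")
}

// combineSpaces collapses each run of spaces into a single space.
func combineSpaces(s string) string {
	return multiSpace.ReplaceAllString(s, " ")
}

func main() {
	text := "اسنپ\u200cکپ   تست https://example.com"
	// Apply the steps in order, then trim the outer spaces.
	out := strings.TrimSpace(combineSpaces(removeURLs(halfSpaceToSpace(text))))
	fmt.Printf("%q\n", out)
}
```

The order of the steps matters in this sketch: removing the URL leaves a trailing space behind, so trimming happens last.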
Install the package using `go get`:
go get github.com/snapp-incubator/seperno
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Combine several options into a single normalizer.
	normalizer := seperno.NewNormalize(
		seperno.WithConvertHalfSpaceToSpace(),
		seperno.WithURLRemover(),
		seperno.WithSpaceCombiner(),
	)
	// \u200c is the half-space (zero-width non-joiner), written as an escape so it stays visible.
	text := "اسنپ\u200cکپ تست https://example.com"
	normalized := normalizer.BasicNormalizer(text)
	fmt.Println(normalized) // Output: "اسنپ کپ تست"
}
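The `NewNormalize(WithX(), WithY())` call shape follows what Go programmers usually call the functional options pattern. The sketch below shows how such an API can be built in general. Every name in it (`normalizer`, `option`, `withTrim`, `withHalfSpaceToSpace`, `newNormalizer`) is hypothetical, and it makes no claim about how Seperno is structured internally.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer holds an ordered list of text transformations.
// Hypothetical type for this sketch only.
type normalizer struct {
	steps []func(string) string
}

// option configures a normalizer by appending a step to it.
type option func(*normalizer)

// withTrim adds a step that strips leading and trailing whitespace.
func withTrim() option {
	return func(n *normalizer) {
		n.steps = append(n.steps, strings.TrimSpace)
	}
}

// withHalfSpaceToSpace adds a step that turns U+200C into a regular space.
func withHalfSpaceToSpace() option {
	return func(n *normalizer) {
		n.steps = append(n.steps, func(s string) string {
			return strings.ReplaceAll(s, "\u200c", " ")
		})
	}
}

// newNormalizer applies each option in the order given.
func newNormalizer(opts ...option) *normalizer {
	n := &normalizer{}
	for _, opt := range opts {
		opt(n)
	}
	return n
}

// normalize runs every configured step over the input in order.
func (n *normalizer) normalize(s string) string {
	for _, step := range n.steps {
		s = step(s)
	}
	return s
}

func main() {
	n := newNormalizer(withHalfSpaceToSpace(), withTrim())
	fmt.Printf("%q\n", n.normalize("  a\u200cb  ")) // prints "a b"
}
```

A normalizer built with no options leaves the text untouched, which is what makes this style convenient for opt-in features.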
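package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Convert half-spaces to regular spaces.
	normalizer := seperno.NewNormalize(seperno.WithConvertHalfSpaceToSpace())
	// \u200c is the half-space (zero-width non-joiner) between the two words.
	text := "آسمان\u200cآبی"
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "اسمان ابی"
}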
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Remove URLs from the text.
	normalizer := seperno.NewNormalize(seperno.WithURLRemover())
	text := "تست https://example.com"
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "تست "
}
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Collapse runs of consecutive spaces into a single space.
	normalizer := seperno.NewNormalize(seperno.WithSpaceCombiner())
	text := "تست    تست"
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "تست تست"
}
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Trim spaces from the start and end of the text.
	normalizer := seperno.NewNormalize(seperno.WithOuterSpaceRemover())
	text := " تست "
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "تست"
}
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Remove an end-of-line character such as "." or "؟" from the end of the sentence.
	normalizer := seperno.NewNormalize(seperno.WithEndsWithEndOfLineChar())
	text := "تست."
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "تست"
}
package main

import (
	"fmt"

	"github.com/snapp-incubator/seperno"
)

func main() {
	// Normalize punctuation, then trim the outer spaces.
	normalizer := seperno.NewNormalize(
		seperno.WithNormalizePunctuations(),
		seperno.WithOuterSpaceRemover(),
	)
	text := "سلام,خوبی؟چه خبرا."
	fmt.Println(normalizer.BasicNormalizer(text)) // Output: "سلام خوبی چه خبرا"
}
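For readers curious what replacing punctuation with spaces looks like without the library, the following is a rough standard-library approximation. The function name `punctuationToSpaces` is hypothetical, and relying on `unicode.IsPunct` is an assumption of this sketch; Seperno's own punctuation table may treat some characters differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// punctuationToSpaces replaces every Unicode punctuation rune with a space,
// then collapses whitespace runs and trims both ends.
func punctuationToSpaces(s string) string {
	spaced := strings.Map(func(r rune) rune {
		if unicode.IsPunct(r) {
			return ' '
		}
		return r
	}, s)
	// strings.Fields splits on any whitespace and drops empty pieces,
	// so joining with a single space also removes outer spaces.
	return strings.Join(strings.Fields(spaced), " ")
}

func main() {
	fmt.Printf("%q\n", punctuationToSpaces("سلام,خوبی؟چه خبرا."))
}
```

Because `unicode.IsPunct` covers the Persian question mark (`؟`) and comma (`،`) as well as ASCII punctuation, no explicit character list is needed here.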
To validate functionality, run the included test suite:
go test ./...