Skip to content

"Detect Language" is a program that can detect which language a given document is written in, implemented in C using the Levenshtein Distance algorithm. It also includes verbose output and unit testing for the algorithm.

Notifications You must be signed in to change notification settings

GameboyColor32/detect_language

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Detect language

This program detects the language of a given document using the Levenshtein Distance algorithm. It is implemented in C for performance purposes and can be further optimized using GCC macros and shortcuts. The program excludes non-ASCII characters for now, which affects its precision.

Usage

To run the program, simply run the executable and provide the path to the file you want to analyze. For a more verbose output, use the make debug or make hard_d commands.

Implementation Details

The program generates a frequency table for each language it supports, based on the relative frequencies of letters in that language. Then, it calculates the Levenshtein Distance between the document and the frequency tables, using the lowest distance as the language detection result.

Unit tests were conducted on the Levenshtein Distance algorithm to ensure its accuracy and can be generated with the make test command.

Resources

About

"Detect Language" is a program that can detect which language a given document is written in, implemented in C using the Levenshtein Distance algorithm. It also includes verbose output and unit testing for the algorithm.

Topics

Resources

Stars

Watchers

Forks