Detect any text file charset encoding using Mozilla Charset Detector (UDE.CSharp).
FileEncoding supports almost all charset encodings (utf-8, utf-7, utf-32, ISO-8859-1, ...). It first checks whether the file has a BOM header; if not, FileEncoding loads and analyzes the file bytes and tries to determine the charset encoding.
- Byte order mark (BOM) detection
- Analyse file content
- Comprehensive charset encoding detection
- Large files support
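To illustrate the byte order mark check listed above, here is a minimal sketch of how a BOM can be mapped to an encoding. It is not the code used inside FileEncoding: `BomSniffer` and `DetectBom` are hypothetical names introduced only for this example, and the sketch covers only the UTF-8, UTF-16 and UTF-32 marks.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; it is not part of FileEncoding.
public static class BomSniffer
{
    // Returns the encoding implied by a BOM at the start of the buffer,
    // or null when no known BOM is present.
    public static Encoding DetectBom (byte[] buffer)
    {
        if (buffer == null) return null;

        // UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE),
        // because the shorter mark is a prefix of the longer one.
        if (StartsWith (buffer, 0xFF, 0xFE, 0x00, 0x00)) return new UTF32Encoding (false, true);
        if (StartsWith (buffer, 0x00, 0x00, 0xFE, 0xFF)) return new UTF32Encoding (true, true);
        if (StartsWith (buffer, 0xEF, 0xBB, 0xBF)) return new UTF8Encoding (true);
        if (StartsWith (buffer, 0xFF, 0xFE)) return new UnicodeEncoding (false, true);
        if (StartsWith (buffer, 0xFE, 0xFF)) return new UnicodeEncoding (true, true);

        return null;
    }

    // True when the buffer begins with exactly the given bytes.
    private static bool StartsWith (byte[] buffer, params byte[] prefix)
    {
        if (buffer.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (buffer[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Only the first four bytes of a file are needed for this check, for example `BomSniffer.DetectBom (firstBytes)`, where `firstBytes` is a buffer you have read yourself. When the result is null, the content itself has to be analysed, which is the part FileEncoding delegates to UDE.CSharp.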
You can install it using NuGet; see SimpleHelpers.FileEncoding at NuGet.org
PM> Install-Package SimpleHelpers.FileEncoding
The NuGet package contains C# source code.
The source code will be installed in your project with the following file system structure:
|-- <project root>
    |-- SimpleHelpers
        |-- FileEncoding.cs
If you prefer, you can also download the source code: FileEncoding.cs
Compiled version of "C# port of Mozilla Universal Charset Detector"
This useful library can detect the charset encoding by analysing a byte array.
Tries to detect the file encoding by checking for a byte order mark (BOM); if none is found, it loads part of the file and tries to detect the charset using UDE.CSharp.
var encoding = FileEncoding.DetectFileEncoding ("./my_text_file.txt");
Tries to load file content with the correct encoding.
This is a shortcut that uses System.IO.File.ReadAllText
to load the file content, but first it detects the correct encoding.
If the file doesn't exist or it couldn't be loaded, the provided defaultValue
(second parameter) will be returned.
var content = FileEncoding.TryLoadFile ("./my_text_file.txt", "");
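The behaviour described above (detect first, then read, and fall back to a default on failure) can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: `SafeTextLoader`, `LoadOrDefault` and the `detectEncoding` delegate are hypothetical names, and the UTF-8 fallback when detection yields nothing is a choice made for this sketch.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; it is not part of FileEncoding.
public static class SafeTextLoader
{
    // Reads the file using the encoding returned by detectEncoding.
    // Returns defaultValue when the file is missing or cannot be read.
    public static string LoadOrDefault (string path, string defaultValue, Func<string, Encoding> detectEncoding)
    {
        try
        {
            if (!File.Exists (path)) return defaultValue;

            // Assumption of this sketch: fall back to UTF-8 when detection fails.
            Encoding encoding = detectEncoding (path) ?? Encoding.UTF8;
            return File.ReadAllText (path, encoding);
        }
        catch (IOException)
        {
            return defaultValue;
        }
        catch (UnauthorizedAccessException)
        {
            return defaultValue;
        }
    }
}
```

Assuming `FileEncoding.DetectFileEncoding` returns a `System.Text.Encoding`, it could be passed in as the detector: `SafeTextLoader.LoadOrDefault ("./my_text_file.txt", "", p => FileEncoding.DetectFileEncoding (p))`. Taking the detector as a delegate keeps the sketch compilable without the package installed.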
Detects the encoding of the textual data in the specified input stream.
var det = new FileEncoding ();
using (var stream = new System.IO.FileStream (inputFilename, System.IO.FileMode.Open))
{
    // pass the stream opened above to the detector
    det.Detect (stream);
}
// Finalize the detection phase and get the detected encoding name
var encoding = det.Complete ();
// check results
Console.WriteLine ("IsText = {0}", det.IsText);
Console.WriteLine ("HasByteOrderMark = {0}", det.HasByteOrderMark);
Console.WriteLine ("EncodingName = {0}", det.EncodingName);