PowerShell script to demonstrate encodings and translation, with sample input files.

vlad-nestorov/encodings_demo

encodings_demo

This PowerShell script shows how to correctly translate text from one encoding to another. It also demonstrates how interpreting the same bytes with different encodings changes the result.
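
As a language-neutral illustration of what translating between encodings involves, here is a minimal Python sketch. It is not taken from the repository's PowerShell script, and the function name `transcode` is hypothetical. The idea is to decode the bytes with the encoding they were written in, then encode the resulting text with the target encoding.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes with the encoding they were written in, then re-encode."""
    text = data.decode(source_encoding)
    return text.encode(target_encoding)


# The single byte C1 is "Á" (U+00C1) in Windows-1252 (Python codec name: cp1252).
windows_1252_bytes = bytes([0xC1])
utf8_bytes = transcode(windows_1252_bytes, "cp1252", "utf-8")
print(utf8_bytes.hex("-").upper())  # C3-81
```

Translating by copying the bytes unchanged would not work here, because the byte C1 on its own is not valid UTF-8.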

Resources

Usage

Generate output

  1. Place .txt files in the Samples folder
  2. Run EncodingDemo.ps1

Reading the output

  • file: The sample file name.
  • encoding: The encoding used to interpret the file.
  • value: The file's contents as interpreted with that encoding.
  • == Operator: The other file(encoding) combinations whose value compares equal to this one using the == operator.
  • String.Equals: The other file(encoding) combinations whose value compares equal to this one using the String.Equals function.

Notes:

  • To see the exact contents of a file, look at its Bytes row, which shows the raw bytes in hexadecimal.
  • A single Unicode code point can be stored in multiple bytes. To remove ambiguity between little-endian and big-endian byte order, the Unicode standard defines a byte order mark (BOM) character, U+FEFF. In UTF-8 it is encoded as the bytes EF BB BF.
  • A UTF-8 file with a BOM, decoded into a .NET string as in this demo, keeps the BOM character at the beginning of the string. The comparison operator ignores the BOM, whereas String.Equals performs an ordinal comparison of the characters, so the leading BOM makes the strings unequal.
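
The BOM behaviour in these notes can be reproduced outside .NET. The following Python sketch is an independent illustration, not part of the repository. It assumes Python's `utf-8` codec (which keeps a leading BOM as U+FEFF) and `utf-8-sig` codec (which strips it). Python's `==` on strings compares code points, so here it plays the role of the ordinal comparison, not of the BOM-ignoring operator.

```python
import codecs

A_ACUTE = "\u00c1"  # Á

# Contents of a UTF-8 file with BOM: EF BB BF followed by C3 81.
bom_file_bytes = codecs.BOM_UTF8 + A_ACUTE.encode("utf-8")
print(bom_file_bytes.hex("-").upper())  # EF-BB-BF-C3-81

kept = bom_file_bytes.decode("utf-8")          # BOM survives as U+FEFF
stripped = bom_file_bytes.decode("utf-8-sig")  # BOM is removed while decoding

print(len(kept), len(stripped))                # 2 1
print(kept == A_ACUTE, stripped == A_ACUTE)    # False True

# Removing the BOM explicitly makes the code-point comparison succeed.
print(kept.lstrip("\ufeff") == A_ACUTE)        # True
```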

Included samples

All sample files contain a single character: Á (U+00C1). For code points up to U+007F, Windows-1252 and UTF-8 produce the same bytes. U+00C1 is above U+007F, so UTF-8 encodes it in two bytes (C3 81), while Windows-1252 encodes it in a single byte (C1).
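
To make the byte counts concrete, this short Python sketch (an illustration, not code from the repository) encodes a few characters both ways. The helper name `encoded_forms` is hypothetical, and `cp1252` is Python's codec name for Windows-1252.

```python
def encoded_forms(char: str) -> tuple[bytes, bytes]:
    """Return the (Windows-1252, UTF-8) byte sequences for one character."""
    return char.encode("cp1252"), char.encode("utf-8")


for char in ("A", "\u007f", "\u00c1"):
    cp1252_bytes, utf8_bytes = encoded_forms(char)
    print(
        f"U+{ord(char):04X}",
        cp1252_bytes.hex("-").upper(),
        utf8_bytes.hex("-").upper(),
        cp1252_bytes == utf8_bytes,
    )
# U+0041 41 41 True
# U+007F 7F 7F True
# U+00C1 C1 C3-81 False
```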

| file | encoding |
| --- | --- |
| ASCII.txt | Windows-1252 |
| UTF8.txt | utf-8 |
| UTF8-BOM.txt | utf-8 |

Sample output

| file | encoding | value | == Operator | String.Equals |
| --- | --- | --- | --- | --- |
| ASCII.txt | Bytes | C1 | | |
| ASCII.txt | us-ascii | ? | | |
| ASCII.txt | utf-8 | | | |
| ASCII.txt | Windows-1252 | Á | UTF8-BOM.txt(utf-8), UTF8.txt(utf-8) | UTF8.txt(utf-8) |
| UTF8.txt | Bytes | C3-81 | | |
| UTF8.txt | us-ascii | ?? | | |
| UTF8.txt | utf-8 | Á | ASCII.txt(Windows-1252), UTF8-BOM.txt(utf-8) | ASCII.txt(Windows-1252) |
| UTF8.txt | Windows-1252 | � | | |
| UTF8-BOM.txt | Bytes | EF-BB-BF-C3-81 | | |
| UTF8-BOM.txt | us-ascii | ????? | | |
| UTF8-BOM.txt | utf-8 | Á | ASCII.txt(Windows-1252), UTF8.txt(utf-8) | |
| UTF8-BOM.txt | Windows-1252 | � | | |
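
The same grid of interpretations can be approximated in Python. This sketch is an illustration only and does not reproduce the .NET output exactly. It assumes `errors="replace"`, so Python marks undecodable bytes with U+FFFD where the output above shows `?`, and Python's `cp1252` codec treats byte 0x81 as undefined. The byte values are copied from the Bytes rows above; the names `SAMPLES` and `interpret` are hypothetical.

```python
SAMPLES = {
    "ASCII.txt": bytes([0xC1]),
    "UTF8.txt": bytes([0xC3, 0x81]),
    "UTF8-BOM.txt": bytes([0xEF, 0xBB, 0xBF, 0xC3, 0x81]),
}
ENCODINGS = ("ascii", "utf-8", "cp1252")


def interpret(data: bytes, encoding: str) -> str:
    """Decode bytes, substituting U+FFFD for anything the encoding rejects."""
    return data.decode(encoding, errors="replace")


for name, data in SAMPLES.items():
    print(name, "Bytes", data.hex("-").upper())
    for encoding in ENCODINGS:
        # repr() makes invisible characters such as U+FEFF visible.
        print(name, encoding, repr(interpret(data, encoding)))
```

Only two combinations yield exactly "Á": ASCII.txt read as Windows-1252 and UTF8.txt read as UTF-8. UTF8-BOM.txt read as UTF-8 yields "Á" preceded by the BOM character.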
