Sunday, 7 September 2025

The Jungle of Text Encoding

Dealing with textual data on the Internet is like navigating a jungle. 

Without some normalisation, you need to get adept at handling multiple encodings.

In .NET, the System.Text namespace is your partner here.

It holds the Encoding class, whose static properties return ready-made encodings:

  • Encoding.ASCII (7-bit ASCII)
  • Encoding.Default (default encoding for the current .NET implementation: UTF-8 on .NET Core and .NET 5+, the system's ANSI code page on .NET Framework)
  • Encoding.Latin1 (Latin-1 character set, ISO-8859-1; available from .NET 5)
  • Encoding.Unicode (UTF-16 in little-endian byte order)
  • Encoding.UTF32 (UTF-32 in little-endian byte order)
  • Encoding.UTF7 (obsolete)
  • Encoding.UTF8
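
To make the differences between these encodings concrete, here is a minimal sketch that encodes the same string with several of them and reports how many bytes each produces. The class name EncodingSizes, the helper ByteCount and the sample text "café" are made up for illustration, and Encoding.Latin1 assumes .NET 5 or later.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Hypothetical helper: number of bytes `text` occupies in the given encoding.
    public static int ByteCount(Encoding enc, string text) => enc.GetBytes(text).Length;

    static void Main()
    {
        // 'é' is outside ASCII but inside Latin-1, so the encodings disagree on it.
        string text = "café";

        Encoding[] encodings =
        {
            Encoding.ASCII, Encoding.Latin1, Encoding.UTF8,
            Encoding.Unicode, Encoding.UTF32
        };

        foreach (Encoding enc in encodings)
        {
            // Encode, then decode again to see whether the text survives the trip.
            byte[] bytes = enc.GetBytes(text);
            string roundTrip = enc.GetString(bytes);
            Console.WriteLine($"{enc.WebName}: {bytes.Length} bytes, round-trip \"{roundTrip}\"");
        }
    }
}
```

With the default settings, ASCII cannot represent 'é' and substitutes '?', so it is the only encoding in the loop that loses information. UTF-8 needs two bytes for that one character, while UTF-16 and UTF-32 use a fixed two and four bytes per character for this string.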

Recall that Windows is little-endian by default, since it runs primarily on x86 and x86-64, both of which are little-endian. Even Windows on ARM runs in little-endian mode (ARM is bi-endian, meaning it can operate in either little- or big-endian mode).
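
Byte order is easiest to see by dumping the encoded bytes. The following is a small sketch, with a made-up class ByteOrderDemo and helper Hex, that compares Encoding.Unicode with its big-endian counterpart Encoding.BigEndianUnicode (not in the list above, but part of the same class). BitConverter.IsLittleEndian reports the byte order of the machine the code runs on.

```csharp
using System;
using System.Text;

static class ByteOrderDemo
{
    // Hypothetical helper: the encoded bytes of `text` as hyphen-separated hex.
    public static string Hex(Encoding enc, string text) =>
        BitConverter.ToString(enc.GetBytes(text));

    static void Main()
    {
        // Byte order of the host CPU (true on x86, x86-64 and Windows on ARM).
        Console.WriteLine($"Host is little-endian: {BitConverter.IsLittleEndian}");

        // 'A' is U+0041; the low byte comes first in little-endian order.
        Console.WriteLine(Hex(Encoding.Unicode, "A"));          // 41-00
        Console.WriteLine(Hex(Encoding.BigEndianUnicode, "A")); // 00-41
        Console.WriteLine(Hex(Encoding.UTF32, "A"));            // 41-00-00-00
    }
}
```

Note that Encoding.Unicode and Encoding.UTF32 always produce little-endian bytes, whatever the host's byte order. The match with Windows is a convenient default, not something the encoding detects at run time.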
