CCleaner Community Forums

Dup file output is in "Little-endian UTF-16 Unicode text" format - why?



Hello all...

I saved the "duplicate file" output to a file, and was trying to use Cygwin text-processing tools on it, but it wasn't working.

Looking into it, CCleaner appears to write the output as "Little-endian UTF-16 Unicode text" rather than plain ASCII or UTF-8, which is what most apps I know use, even in Windows environments.

Can someone tell me why CCleaner uses this text flavor? And how to change it to use UTF-8 instead?

- Tim

  • Moderators

Regular old ASCII only supports unaccented Latin characters, and half the world uses some other script. I'm no expert, but to quote Wikipedia: 'UTF-16 is used for text in the OS API in Microsoft Windows 2000 onwards', and 'UTF-16 is the native internal representation of text in the Microsoft Windows NT', which is the same thing I guess. So UTF-16 is what Windows uses.

The duplicate file txt output starts with a byte order mark of FF FE, indicating that it is little-endian. Wikipedia again: '... the application is expected to figure out what encoding to use when reading text data.'
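For anyone who wants to check the byte order mark themselves, here is a minimal Python sketch that looks at the first bytes of a file's contents. The function name `sniff_bom` is made up for illustration, and the sketch only recognises the standard Unicode BOMs; a file with no BOM simply returns `None`.

```python
import codecs
from typing import Optional

def sniff_bom(data: bytes) -> Optional[str]:
    """Return an encoding name if data starts with a known BOM, else None.

    sniff_bom is a hypothetical helper, not part of CCleaner or Cygwin.
    """
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None
```

Passing it the first few bytes of the saved duplicate-file list (read in binary mode) should report `utf-16-le` if the file starts with FF FE as described above.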

The duplicate file txt output opens in Notepad, which reads the byte order mark and interprets the file accordingly. I don't know why Cygwin can't do the same. I think that the bug lies with an application with a name beginning with one C.
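I don't know of a CCleaner setting that changes the output encoding, so as a workaround the saved file can be re-encoded before handing it to the text tools. The following is only a sketch: the function name `utf16_to_utf8` and the file names in the usage note are placeholders, and it assumes the file really does begin with a byte order mark as described above.

```python
def utf16_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a BOM-prefixed UTF-16 text file as UTF-8 with LF line endings.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    # newline="" keeps the original line endings so they can be normalised here.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            # Replace Windows CRLF (or a bare CR) with a single LF.
            dst.write(line.rstrip("\r\n") + "\n")
```

For example, `utf16_to_utf8("duplicates.txt", "duplicates-utf8.txt")` would write a UTF-8 copy that line-oriented tools can read. Converting the line endings as well is a choice made here because Unix-style tools generally expect LF; drop the `rstrip` step if the CRLF endings should be kept.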
