Dup file output is in "Little-endian UTF-16 Unicode text" format - why?

tbessie · 19 February 2018 05:29

Hello all...

I saved the "duplicate file" output to a file, and was trying to use Cygwin text-processing tools on it, but it wasn't working.

Looking into it, it seems CCleaner outputs "Little-endian UTF-16 Unicode text" format text, and not regular old ASCII or UTF-8, which is what most apps I know use, even in Windows environments.

Can someone tell me why CCleaner uses this text flavor? And how to change it to use UTF-8 instead?

- Tim

mta · 19 February 2018 07:47

as to why, no idea.

as to converting, Notepad++, under Encoding, has a Convert to UTF-8 option.

tbessie · 19 February 2018 19:04

Yeah, I used the Cygwin utility 'iconv' to convert it. But it would be nice if I didn't have to do that.
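For anyone who wants to script the same conversion, here is a minimal Python sketch. It assumes the report was saved as UTF-16 with a byte order mark, as described in this thread; the function name `utf16_to_utf8` and the file names are hypothetical. It works on raw bytes, so Windows CRLF line endings pass through unchanged, much like `iconv -f UTF-16 -t UTF-8 in.txt > out.txt` would.

```python
from pathlib import Path


def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 text file (e.g. a saved duplicate-file report) as UTF-8.

    Hypothetical helper: assumes the source starts with a BOM (FF FE for little-endian).
    """
    raw = Path(src).read_bytes()
    # The generic "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = raw.decode("utf-16")
    # Write UTF-8 without a BOM; working in bytes avoids newline translation,
    # so CRLF line endings are preserved exactly.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Usage: `utf16_to_utf8("dupes.txt", "dupes-utf8.txt")`, after which grep, sed, awk and friends see plain UTF-8.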

Who would I ask about why, if not here? File a support ticket?

- Tim

mta · 19 February 2018 21:03

you can only try and see if they respond.

Augeas · 19 February 2018 23:20

Regular old ASCII only supports basic Latin characters, and half the world uses some other script. I'm no expert, but to quote Wikipedia: 'UTF-16 is used for text in the OS API in Microsoft Windows 2000 onwards' and 'UTF-16 is the native internal representation of text in the Microsoft Windows NT', which is the same thing, I guess. So UTF-16 is what Windows uses.

The duplicate file txt output has a byte order mark (BOM) of FF FE, indicating that it is little-endian. Wikipedia again: '... the application is expected to figure out what encoding to use when reading text data.'

The duplicate file txt output can be opened in Notepad, which reads the byte order mark and interprets the file accordingly. I don't know why Cygwin can't do the same. I think that the bug lies with an application with a name beginning with one C.
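To illustrate what "the application is expected to figure out" the encoding means in practice, here is a rough Python sketch of BOM sniffing. It shows the general technique only, not how Notepad, Cygwin or CCleaner are actually implemented; `sniff_bom` and `read_text_by_bom` are hypothetical names, and files without a BOM are assumed to be UTF-8.

```python
import codecs
from typing import Optional, Tuple

# Checked longest-first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if no known BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def read_text_by_bom(path: str, fallback: str = "utf-8") -> str:
    """Decode a file using its BOM if it has one, else the (assumed) fallback encoding."""
    with open(path, "rb") as f:
        data = f.read()
    encoding, bom_len = sniff_bom(data)
    # Skip the BOM bytes so they don't show up as a stray U+FEFF character.
    return data[bom_len:].decode(encoding or fallback)
```

A tool that skips this check and assumes ASCII/UTF-8 sees the FF FE prefix and a NUL byte after every ASCII character, which is why the report looked broken to the Cygwin text tools.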
