Character encoding of redirected output
Anton Shepelev
2022-08-26 12:32:03 UTC
Hello, all.

Is there a way to affect the encoding of the log file
with the output of the `net' command:

net start d >> log.txt 2>&1

Invoking `chcp' has no effect, so that

chcp 1251
net start d >> log.txt 2>&1

still produces `log.txt' with the default Cyrillic en-
coding, 866.
--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]
JJ
2022-08-27 04:06:17 UTC
Post by Anton Shepelev
Hello, all.
Is there a way to affect the encoding of the log file
net start d >> log.txt 2>&1
Invoking `chcp' has no effect, so that
chcp 1251
net start d >> log.txt 2>&1
still produces `log.txt' with the default Cyrillic en-
coding, 866.
Not possible using a batch file alone.

You'll need additional tools to convert the text encoding.

e.g. GNU's recode which can be downloaded as part of a Unix utilities
bundle.

https://sourceforge.net/projects/unxutils/
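
As a rough illustration of such a conversion step, here is a minimal Python sketch. Python itself is an assumption (the thread only names recode and iconv), and `recode_file` and its default code pages are hypothetical names chosen to match the question (866 in, 1251 out).

```python
from pathlib import Path


def recode_file(src, dst, src_encoding="cp866", dst_encoding="cp1251"):
    """Re-encode a text file and return the number of characters converted."""
    # Work on raw bytes and decode explicitly, so that Python's
    # locale-dependent default encoding never comes into play.
    text = Path(src).read_bytes().decode(src_encoding)
    # Characters that do not exist in dst_encoding are written as '?'.
    Path(dst).write_bytes(text.encode(dst_encoding, errors="replace"))
    return len(text)
```

Calling `recode_file("log.txt", "log-1251.txt")` after the `net` command has run would leave the original log untouched and write a 1251-encoded copy next to it (the second file name is only an example).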
Anton Shepelev
2022-08-27 14:26:05 UTC
Post by JJ
Post by Anton Shepelev
Invoking `chcp' has no effect, so that
chcp 1251
net start d >> log.txt 2>&1
still produces `log.txt' with the default Cyrillic
encoding, 866.
Not possible using batch file alone.
Do you mean that codepage 866 is built into the pro-
gram, and the terminal emulator does not apply any
conversion?
Post by JJ
You'll need additional tools to convert the text en-
coding. e.g. GNU's recode which can be downloaded
as part of a Unix utilities bundle.
I have never heard of `recode' but am familiar with
`iconv' and can use it. Thank you.
--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]
JJ
2022-08-27 16:56:33 UTC
Post by Anton Shepelev
Post by JJ
Post by Anton Shepelev
Invoking `chcp' has no effect, so that
chcp 1251
net start d >> log.txt 2>&1
still produces `log.txt' with the default Cyrillic
encoding, 866.
Not possible using batch file alone.
Do you mean that codepage 866 is built into the pro-
gram, and the terminal emulator does not apply any
conversion?
Oh, wait. That code _should_ work. Even if NET's output contains a character
which is not within code page 1251's character set, the character will be
converted to `?`. Everything else should be converted properly.
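
What such a lossy conversion looks like in general can be sketched in Python. This shows Python's own codec behaviour, not anything CMD or NET is confirmed to do; `can_encode` and `convert_lossy` are hypothetical helper names.

```python
def can_encode(ch, encoding):
    """Return True if the single character ch exists in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def convert_lossy(data, src="cp866", dst="cp1251"):
    """Convert bytes from src to dst; return (new_bytes, lost_characters)."""
    text = data.decode(src)
    lost = sorted({ch for ch in text if not can_encode(ch, dst)})
    # errors="replace" makes the encoder emit '?' for each character
    # that the destination code page cannot represent.
    return text.encode(dst, errors="replace"), lost
```

Code page 866 contains box-drawing characters that 1251 lacks, so those are the typical candidates for ending up as `?`, while the Cyrillic letters survive.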

Can you post the actual text output (not screenshot) of that NET command?
Post by Anton Shepelev
I have never heard of `recode' but am familiar with
`iconv' and can use it. Thank you.
Anything that can convert between character sets can be used. It'll be
required for manual conversion or as a fallback when something doesn't work.
Anton Shepelev
2022-08-27 20:12:43 UTC
AS: Invoking `chcp' has no effect, so that

chcp 1251
net start d >> log.txt 2>&1

still produces `log.txt' with the default Cyrillic
encoding, 866.

JJ: Not possible using batch file alone.

AS: Do you mean that codepage 866 is built into the
program, and the terminal emulator does not apply
any conversion?

JJ: You'll need additional tools to convert the text
encoding. e.g. GNU's recode which can be down-
loaded as part of a Unix utilities bundle.

AS: I have never heard of `recode' but am familiar
with `iconv' and can use it. Thank you.

JJ: Oh, wait. That code _should_ work. Even if NET's
output contains a character which is not within
code page 1251's character set, the character will
be converted to `?`. Everything else should be
converted properly.

I never said it didn't work; all I said was that chcp
had no effect on the actual encoding of the resulting
file.
Post by JJ
Can you post the actual text output (not screenshot)
of that NET command?
As I said, the actual text output is encoded in CP866.
How shall I post it here and what meaning will it
have? The best thing I can do is to attach the actual
file. I have no idea how or why attachments work in
Usenet, but they do... Also for your consideration,
my test script is shown below:

8<-------------------- test.bat ----------------------
@echo off

chcp 866 > NUL
net start d 2> 866.txt

chcp 1251 > NUL
net start d 2> 1251.txt

ECHO N|comp 866.txt 1251.txt > NUL 2>&1
IF NOT ERRORLEVEL 1 (ECHO 866.txt == 1251.txt) ^
ELSE (ECHO 866.txt != 1251.txt)
>8-------------------- test.bat ----------------------
8<------------------ .bat output ---------------------
866.txt == 1251.txt
>8------------------ .bat output ---------------------
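
The script above only shows that the two files are identical, not which code page they are in. One rough way to check that is sketched below in Python. It is a heuristic under stated assumptions, not a reliable detector: it assumes the text is mostly Russian, and `russian_letter_count` and `guess_code_page` are hypothetical names.

```python
def russian_letter_count(data, encoding):
    """Count basic Russian letters (U+0410..U+044F) after decoding data."""
    text = data.decode(encoding, errors="replace")
    return sum(1 for ch in text if "\u0410" <= ch <= "\u044f")


def guess_code_page(data, candidates=("cp866", "cp1251")):
    """Pick the candidate under which the bytes look most like Russian text."""
    # On a tie, max() returns the first candidate in the tuple.
    return max(candidates, key=lambda enc: russian_letter_count(data, enc))
```

Reading `866.txt` in binary mode and passing its contents to `guess_code_page` would give a quick hint, which can then be confirmed by viewing the file under that code page.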
JJ
2022-08-29 04:47:19 UTC
Post by Anton Shepelev
AS: Invoking `chcp' has no effect, so that
[snip]

I've tested it in a VM using a Russian language pack and a Russian system
code page. CHCP is not effective in that case. A manual conversion is still
needed.
Anton Shepelev
2022-08-29 07:25:23 UTC
Post by Anton Shepelev
chcp 1251
net start d >> log.txt 2>&1
I've tested it in a VM using Russian language pack
and Russian system code page. The CHCP is not ef-
fective in that case. A manual conversion is still
needed.
Huge thanks for taking the trouble to confirm it, JJ.
Have you an idea why CHCP may not work? What is it
supposed to do? Shall console programs query the ef-
fective encoding (as set by CHCP) and recode their
output themselves?
--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]
JJ
2022-08-30 06:14:51 UTC
Post by Anton Shepelev
Huge thanks for taking the trouble to confirm it, JJ.
Have you an idea why CHCP may not work? What is it
supposed to do? Shall console programs query the ef-
fective encoding (as set by CHCP) and recode their
output themselves?
I think the main reason is that all programs are run using the system
code page, which is the global code page setting. IOW, the system code page
is the default code page for all programs.

There's also the active code page, which is a per-process code page setting.
CMD's `CHCP` command simply changes the active code page. But that setting is
not inherited by child processes, unlike the working directory and
environment variables, which are (by default) inherited by child processes.

In your case, the system code page is 866. Even though CMD's code page has
been changed to 1251 using `CHCP`, the `NET` program is run using code page
866. The `NET` program cannot know the destination code page (which is
1251), because the standard output handle is basically a temporary data
storage which is not aware of code page or data format (i.e. it's just a
dumb binary data storage), and the system treats that storage as non-Unicode
text storage with an unknown code page. Thus the system converts the `NET`
program's Unicode text to non-Unicode text using the program's own code page,
866. CMD doesn't and can't know which code page the received data is supposed
to be treated as. CMD only knows one code page: 1251. So, even if a
conversion were applied, the source and destination code pages would be the
same and the resulting data would be unchanged.
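
The last point can be illustrated with a small Python sketch. The `convert` helper is hypothetical and only models the idea of a code page conversion; it makes no claim about what CMD does internally.

```python
def convert(data, src, dst):
    """Decode bytes as src, then encode the text as dst."""
    return data.decode(src).encode(dst, errors="replace")


# Bytes as NET wrote them, per the thread: Russian text in code page 866.
raw = "Служба".encode("cp866")

# Same source and destination code page: the bytes come back untouched.
assert convert(raw, "cp1251", "cp1251") == raw

# Correct source code page: a real conversion takes place.
assert convert(raw, "cp866", "cp1251") == "Служба".encode("cp1251")

# Reading the bytes under the wrong assumption gives mojibake, not the text.
assert raw.decode("cp1251") != "Служба"
```

The bytes carry no label saying which code page produced them, so the conversion is only meaningful when the caller supplies the right source code page.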

Unfortunately, Windows doesn't provide a built-in feature to specify which
code page a program should be run with. Microsoft *did* provide the feature
as a separate downloadable tool called Microsoft AppLocale back in the
Windows XP era, but it is now discontinued and no longer supported. While
it's still usable in newer Windows versions, it requires installation. That's
less convenient than using e.g. `iconv` or `recode`.
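
For completeness, the capture-and-convert approach can also be done in one step from a script. The Python sketch below assumes, as observed in this thread, that the child program writes code page 866 when its output is redirected; `run_and_recode` is a hypothetical name.

```python
import subprocess


def run_and_recode(command, produced="cp866", wanted="cp1251"):
    """Run command, capture stdout and stderr together, return re-encoded bytes."""
    # stderr=subprocess.STDOUT merges both streams, like 2>&1 in CMD.
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    # The pipe delivers plain bytes; the source code page must be supplied
    # by the caller because nothing in the data identifies it.
    text = result.stdout.decode(produced, errors="replace")
    return text.encode(wanted, errors="replace")
```

On Windows, appending `run_and_recode(["net", "start", "d"])` to a file opened in binary append mode (`"ab"`) would roughly mirror `net start d >> log.txt 2>&1`, but with the log in code page 1251.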
Anton Shepelev
2022-08-30 23:05:34 UTC
Have you an idea why CHCP may not work? What is it
supposed to do? Shall console programs query the
effective encoding (as set by CHCP) and recode
their output themselves?
I think the main reason is because all programs are
run using the system code page, which is the global
code page setting. IOW, the system code page is the
default code page for all programs.
There's active code page which is a per-process code
page setting. CMD's `CHCP` command simply changes
the active code page. But that code page setting is
not inherited to child processes. Unlike something
like the working directory and environment variables
which are (by default) inherited to child processes.
[...]
Many thanks for the detailed and humane explanation,
JJ. Very rarely these days one is honoured with a co-
herent answer of several fluent paragraphs. The mod-
ern norm is a sloppy oneliner in a stinking chat or
social network. Usenet must be the last resort of
truly educated people.
--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]