Commit 610d5221 authored by Russ Cox's avatar Russ Cox

os: fix handling of Windows Unicode console input and ^Z

Go 1.5 worked with Unicode console input but not ^Z.
Go 1.6 did not work with Unicode console input but did handle one ^Z case.
Go 1.7 did not work with Unicode console input but did handle one ^Z case.

The intent of this CL is for Go 1.8 to work with Unicode console input
and also handle all ^Z cases.

Here's a simple test program for reading from the console.
It prints a "> " prompt, calls read, prints what it gets, and repeats.

	package main

	import (
	    "fmt"
	    "os"
	)

	func main() {
	    p := make([]byte, 100)
	    fmt.Printf("> ")
	    for {
	        n, err := os.Stdin.Read(p)
	        fmt.Printf("[%d %q %v]\n> ", n, p[:n], err)
	    }
	}

On Unix, typing a ^D produces a break in the input stream.
If the ^D is at the beginning of a line, then the 0 bytes returned
appear as an io.EOF:

	$ go run /tmp/x.go
	> hello
	[6 "hello\n" <nil>]
	> hello^D[5 "hello" <nil>]
	> ^D[0 "" EOF]
	> ^D[0 "" EOF]
	> hello^Dworld
	[5 "hello" <nil>]
	> [6 "world\n" <nil>]
	>

On Windows, the EOF character is ^Z, not ^D, and there has
been a long-standing problem that in Go programs, ^Z on Windows
does not behave in the expected way, namely like ^D on Unix.
Instead, the ^Z come through as literal ^Z characters:

	C:\>c:\go1.5.4\bin\go run x.go
	> ^Z
	[3 "\x1a\r\n" <nil>]
	> hello^Zworld
	[13 "hello\x1aworld\r\n" <nil>]
	>

CL 4310 attempted to fix this bug, then known as #6303,
by changing the use of ReadConsole to ReadFile.
This CL was released as part of Go 1.6 and did fix the case
of a ^Z by itself, but not as part of a larger input:

	C:\>c:\go1.6.3\bin\go run x.go
	> ^Z
	[0 "" EOF]
	> hello^Zworld
	[13 "hello\x1aworld\r\n" <nil>]
	>

So the fix was incomplete.
Worse, the fix broke Unicode console input.

ReadFile does not handle Unicode console input correctly.
To handle Unicode correctly, programs must use ReadConsole.
Early versions of Go used ReadFile to read the console,
leading to incorrect Unicode handling, which was filed as #4760
and fixed in CL 7312053, which switched to ReadConsole
and was released as part of Go 1.1 and still worked as of Go 1.5:

	C:\>c:\go1.5.4\bin\go run x.go
	> hello
	[7 "hello\r\n" <nil>]
	> hello world
	[16 "hello world\r\n" <nil>]
	>

But in Go 1.6:

	C:\>c:\go1.6.3\bin\go run x.go
	> hello
	[7 "hello\r\n" <nil>]
	> hello world
	[0 "" EOF]
	>

That is, changing back to ReadFile in Go 1.6 reintroduced #4760,
which has been refiled as #17097. (We have no automated test
for this because we don't know how to simulate console input
in a test: it appears that one must actually type at a keyboard
to use the real APIs. This CL at least adds a comment warning
not to reintroduce ReadFile again.)

CL 29493 attempted to fix #17097, but it was not a complete fix:
the hello world example above still fails, as does Shift-JIS input,
which was filed as #17939.

CL 29493 also broke ^Z handling, which was filed as #17427.

This CL attempts the never before successfully performed trick
of simultaneously fixing Unicode console input and ^Z handling.
It changes the console input to use ReadConsole again,
as in Go 1.5, which seemed to work for all known Unicode input.
Then it adds explicit handling of ^Z in the input stream.
(In the case where standard input is a redirected file, ^Z processing
should not happen, and it does not, because this code path is only
invoked when standard input is the console.)

With this CL:

	C:\>go run x.go
	> hello
	[7 "hello\r\n" <nil>]
	> hello world
	[16 "hello world\r\n" <nil>]
	> ^Z
	[0 "" EOF]
	> [2 "\r\n" <nil>]
	> hello^Zworld
	[5 "hello" <nil>]
	> [0 "" EOF]
	> [7 "world\r\n" <nil>]

This almost matches Unix:

	$ go run /tmp/x.go
	> hello
	[6 "hello\n" <nil>]
	> hello world
	[15 "hello world\n" <nil>]
	> ^D
	[0 "" EOF]
	> [1 "\n" <nil>]
	> hello^Dworld
	[5 "hello" <nil>]
	> [6 "world\n" <nil>]
	>

The difference is in the handling of hello^Dworld / hello^Zworld.
On Unix, hello^Dworld terminates the read of hello but does not
result in a zero-length read between reading hello and world.
This is dictated by the tty driver, not any special Go code.

On Windows, in this CL, hello^Zworld inserts a zero length read
result between hello and world, which is treated as an interior EOF.
This is implemented by the Go code in this CL, but it matches the
handling of ^Z on the console in other programs:

	C:\>copy con x.txt
	hello^Zworld
	        1 file(s) copied.

	C:\>type x.txt
	hello
	C:\>

A natural question is how to test all this. As noted above, we don't
know how to write automated tests using the actual Windows console.
CL 29493 introduced the idea of substituting a different syscall.ReadFile
implementation for testing; this CL continues that idea but substituting
for syscall.ReadConsole instead. To avoid the regression of putting
ReadFile back, this CL adds a comment warning against that.

Fixes #17427.
Fixes #17939.

Change-Id: Ibaabd0ceb2d7af501d44ac66d53f64aba3944142
Reviewed-on: https://go-review.googlesource.com/33451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: default avatarQuentin Smith <quentin@golang.org>
Reviewed-by: default avatarAlex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
parent 8a2c34e4
......@@ -7,9 +7,7 @@ package os
// Export for testing.
var (
NewConsoleFile = newConsoleFile
GetCPP = &getCP
ReadFileP = &readFile
ResetGetConsoleCPAndReadFileFuncs = resetGetConsoleCPAndReadFileFuncs
FixLongPath = fixLongPath
NewConsoleFile = newConsoleFile
ReadConsoleFunc = &readConsole
)
......@@ -5,7 +5,6 @@
package os
import (
"errors"
"internal/syscall/windows"
"io"
"runtime"
......@@ -29,7 +28,9 @@ type file struct {
// only for console io
isConsole bool
lastbits []byte // first few bytes of the last incomplete rune in last write
readbuf []byte // last few bytes of the last read that did not fit in the user buffer
readuint16 []uint16 // buffer to hold uint16s obtained with ReadConsole
readbyte []byte // buffer to hold decoding of readuint16 from utf16 to utf8
readbyteOffset int // readbyte[readOffset:] is yet to be consumed with file.Read
}
// Fd returns the Windows handle referencing the open file.
......@@ -53,7 +54,6 @@ func newFile(h syscall.Handle, name string) *File {
func newConsoleFile(h syscall.Handle, name string) *File {
f := newFile(h, name)
f.isConsole = true
f.readbuf = make([]byte, 0, 4)
return f
}
......@@ -203,101 +203,79 @@ func (file *file) close() error {
return err
}
var (
// These variables are used for testing readConsole.
getCP = windows.GetConsoleCP
readFile = syscall.ReadFile
)
func resetGetConsoleCPAndReadFileFuncs() {
getCP = windows.GetConsoleCP
readFile = syscall.ReadFile
}
var readConsole = syscall.ReadConsole // changed for testing
// copyReadConsoleBuffer copies data stored in f.readbuf into buf.
// It adjusts f.readbuf accordingly and returns number of bytes copied.
func (f *File) copyReadConsoleBuffer(buf []byte) (n int, err error) {
n = copy(buf, f.readbuf)
newsize := copy(f.readbuf, f.readbuf[n:])
f.readbuf = f.readbuf[:newsize]
return n, nil
}
// readOneUTF16FromConsole reads single character from console,
// converts it into utf16 and return it to the caller.
func (f *File) readOneUTF16FromConsole() (uint16, error) {
var buf [1]byte
mbytes := make([]byte, 0, 4)
cp := getCP()
for {
var nmb uint32
err := readFile(f.fd, buf[:], &nmb, nil)
if err != nil {
return 0, err
}
if nmb == 0 {
continue
// readConsole reads utf16 characters from console File,
// encodes them into utf8 and stores them in buffer b.
// It returns the number of utf8 bytes read and an error, if any.
func (f *File) readConsole(b []byte) (n int, err error) {
if len(b) == 0 {
return 0, nil
}
mbytes = append(mbytes, buf[0])
// Convert from 8-bit console encoding to UTF16.
// MultiByteToWideChar defaults to Unicode NFC form, which is the expected one.
nwc, err := windows.MultiByteToWideChar(cp, windows.MB_ERR_INVALID_CHARS, &mbytes[0], int32(len(mbytes)), nil, 0)
if err != nil {
if err == windows.ERROR_NO_UNICODE_TRANSLATION {
continue
}
return 0, err
if f.readuint16 == nil {
// Note: syscall.ReadConsole fails for very large buffers.
// The limit is somewhere around (but not exactly) 16384.
// Stay well below.
f.readuint16 = make([]uint16, 0, 10000)
f.readbyte = make([]byte, 0, 4*cap(f.readuint16))
}
if nwc != 1 {
return 0, errors.New("MultiByteToWideChar returns " + itoa(int(nwc)) + " characters, but only 1 expected")
for f.readbyteOffset >= len(f.readbyte) {
n := cap(f.readuint16) - len(f.readuint16)
if n > len(b) {
n = len(b)
}
var wchars [1]uint16
nwc, err = windows.MultiByteToWideChar(cp, windows.MB_ERR_INVALID_CHARS, &mbytes[0], int32(len(mbytes)), &wchars[0], nwc)
var nw uint32
err := readConsole(f.fd, &f.readuint16[:len(f.readuint16)+1][len(f.readuint16)], uint32(n), &nw, nil)
if err != nil {
return 0, err
}
return wchars[0], nil
uint16s := f.readuint16[:len(f.readuint16)+int(nw)]
f.readuint16 = f.readuint16[:0]
buf := f.readbyte[:0]
for i := 0; i < len(uint16s); i++ {
r := rune(uint16s[i])
if utf16.IsSurrogate(r) {
if i+1 == len(uint16s) {
if nw > 0 {
// Save half surrogate pair for next time.
f.readuint16 = f.readuint16[:1]
f.readuint16[0] = uint16(r)
break
}
r = utf8.RuneError
} else {
r = utf16.DecodeRune(r, rune(uint16s[i+1]))
if r != utf8.RuneError {
i++
}
}
// readConsole reads utf16 characters from console File,
// encodes them into utf8 and stores them in buffer buf.
// It returns the number of utf8 bytes read and an error, if any.
func (f *File) readConsole(buf []byte) (n int, err error) {
if len(buf) == 0 {
return 0, nil
}
if len(f.readbuf) > 0 {
return f.copyReadConsoleBuffer(buf)
}
wchar, err := f.readOneUTF16FromConsole()
if err != nil {
return 0, err
n := utf8.EncodeRune(buf[len(buf):cap(buf)], r)
buf = buf[:len(buf)+n]
}
r := rune(wchar)
if utf16.IsSurrogate(r) {
wchar, err := f.readOneUTF16FromConsole()
if err != nil {
return 0, err
f.readbyte = buf
f.readbyteOffset = 0
if nw == 0 {
break
}
r = utf16.DecodeRune(r, rune(wchar))
}
if nr := utf8.RuneLen(r); nr > len(buf) {
start := len(f.readbuf)
for ; nr > 0; nr-- {
f.readbuf = append(f.readbuf, 0)
src := f.readbyte[f.readbyteOffset:]
var i int
for i = 0; i < len(src) && i < len(b); i++ {
x := src[i]
if x == 0x1A { // Ctrl-Z
if i == 0 {
f.readbyteOffset++
}
utf8.EncodeRune(f.readbuf[start:cap(f.readbuf)], r)
} else {
utf8.EncodeRune(buf, r)
buf = buf[nr:]
n += nr
break
}
if n > 0 {
return n, nil
b[i] = x
}
return f.copyReadConsoleBuffer(buf)
f.readbyteOffset += i
return i, nil
}
// read reads up to len(b) bytes from the File.
......
......@@ -5,19 +5,21 @@
package os_test
import (
"bytes"
"encoding/hex"
"fmt"
"internal/syscall/windows"
"internal/testenv"
"io"
"io/ioutil"
"os"
osexec "os/exec"
"path/filepath"
"reflect"
"runtime"
"sort"
"strings"
"syscall"
"testing"
"unicode/utf16"
"unsafe"
)
......@@ -641,88 +643,70 @@ func TestStatSymlinkLoop(t *testing.T) {
}
func TestReadStdin(t *testing.T) {
defer os.ResetGetConsoleCPAndReadFileFuncs()
old := *os.ReadConsoleFunc
defer func() {
*os.ReadConsoleFunc = old
}()
testConsole := os.NewConsoleFile(syscall.Stdin, "test")
var (
hiraganaA_CP932 = []byte{0x82, 0xa0}
hiraganaA_UTF8 = "\u3042"
var tests = []string{
"abc",
"äöü",
"\u3042",
"“hi”™",
"hello\x1aworld",
"\U0001F648\U0001F649\U0001F64A",
}
for _, consoleSize := range []int{1, 2, 3, 10, 16, 100, 1000} {
for _, readSize := range []int{1, 2, 3, 4, 5, 8, 10, 16, 20, 50, 100} {
for _, s := range tests {
t.Run(fmt.Sprintf("c%d/r%d/%s", consoleSize, readSize, s), func(t *testing.T) {
s16 := utf16.Encode([]rune(s))
*os.ReadConsoleFunc = func(h syscall.Handle, buf *uint16, toread uint32, read *uint32, inputControl *byte) error {
if inputControl != nil {
t.Fatalf("inputControl not nil")
}
n := int(toread)
if n > consoleSize {
n = consoleSize
}
n = copy((*[10000]uint16)(unsafe.Pointer(buf))[:n], s16)
s16 = s16[n:]
*read = uint32(n)
t.Logf("read %d -> %d", toread, *read)
return nil
}
tests = []struct {
cp uint32
input []byte
output string // always utf8
}{
{
cp: 437,
input: []byte("abc"),
output: "abc",
},
{
cp: 850,
input: []byte{0x84, 0x94, 0x81},
output: "äöü",
},
{
cp: 932,
input: hiraganaA_CP932,
output: hiraganaA_UTF8,
},
{
cp: 932,
input: bytes.Repeat(hiraganaA_CP932, 2),
output: strings.Repeat(hiraganaA_UTF8, 2),
},
{
cp: 932,
input: append(bytes.Repeat(hiraganaA_CP932, 3), '.'),
output: strings.Repeat(hiraganaA_UTF8, 3) + ".",
},
{
cp: 932,
input: append(append([]byte("hello"), hiraganaA_CP932...), []byte("world")...),
output: "hello" + hiraganaA_UTF8 + "world",
},
{
cp: 932,
input: append(append([]byte("hello"), bytes.Repeat(hiraganaA_CP932, 5)...), []byte("world")...),
output: "hello" + strings.Repeat(hiraganaA_UTF8, 5) + "world",
},
var all []string
var buf []byte
chunk := make([]byte, readSize)
for {
n, err := testConsole.Read(chunk)
buf = append(buf, chunk[:n]...)
if err == io.EOF {
all = append(all, string(buf))
if len(all) >= 5 {
break
}
)
for _, consoleReadBufSize := range []int{1, 2, 3, 4, 5, 8, 10, 16, 20, 50, 100} {
for _, readFileBufSize := range []int{1, 2, 3, 10, 16, 100, 1000} {
nextTest:
for ti, test := range tests {
input := bytes.NewBuffer(test.input)
*os.ReadFileP = func(h syscall.Handle, buf []byte, done *uint32, o *syscall.Overlapped) error {
if len(buf) > readFileBufSize {
buf = buf[:readFileBufSize]
}
n, err := input.Read(buf)
*done = uint32(n)
return err
buf = buf[:0]
} else if err != nil {
t.Fatalf("reading %q: error: %v", s, err)
}
*os.GetCPP = func() uint32 {
return test.cp
if len(buf) >= 2000 {
t.Fatalf("reading %q: stuck in loop: %q", s, buf)
}
var bigbuf []byte
for len(bigbuf) < len([]byte(test.output)) {
buf := make([]byte, consoleReadBufSize)
n, err := testConsole.Read(buf)
if err != nil {
t.Errorf("test=%d bufsizes=%d,%d: read failed: %v", ti, consoleReadBufSize, readFileBufSize, err)
continue nextTest
}
bigbuf = append(bigbuf, buf[:n]...)
want := strings.Split(s, "\x1a")
for len(want) < 5 {
want = append(want, "")
}
have := hex.Dump(bigbuf)
expected := hex.Dump([]byte(test.output))
if have != expected {
t.Errorf("test=%d bufsizes=%d,%d: %q expected, but %q received", ti, consoleReadBufSize, readFileBufSize, expected, have)
continue nextTest
if !reflect.DeepEqual(all, want) {
t.Errorf("reading %q:\nhave %x\nwant %x", s, all, want)
}
})
}
}
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment