From 9a1bd729d2f44943a8217cb23f5f7e031e1535a5 Mon Sep 17 00:00:00 2001
From: oehhar
Date: Fri, 3 Nov 2023 20:19:51 +0000
Subject: Add encoding error chapters to the man apge of gets, read and puts. C library still pending. Review welcome

---
 doc/gets.n |  6 ++++++
 doc/puts.n |  6 ++++++
 doc/read.n | 15 +++++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/doc/gets.n b/doc/gets.n
index 57532c0..0e4239f 100644
--- a/doc/gets.n
+++ b/doc/gets.n
@@ -47,6 +47,12 @@ produce the same results as if there were an input line consisting only
 of the end-of-line character(s).
 The \fBeof\fR and \fBfblocked\fR commands can be used to distinguish
 these three cases.
+.SH "ENCODING ERRORS"
+.PP
+Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
+An encoding error is reported by the POSIX error code \fBEILSEQ\fR.
+The file pointer is unchanged in the error case and an eventual returned
+character count is \fB-1\fR.
 .SH "EXAMPLE"
 This example reads a file one line at a time and prints it out with
 the current line number attached to the start of each line.
diff --git a/doc/puts.n b/doc/puts.n
index 0e23c80..e8df581 100644
--- a/doc/puts.n
+++ b/doc/puts.n
@@ -62,6 +62,12 @@ To avoid wasting memory, nonblocking I/O should normally be used
 in an event-driven fashion with the \fBfileevent\fR command (do not
 invoke \fBputs\fR unless you have recently been notified
 via a file event that the channel is ready for more output data).
+.SH "ENCODING ERRORS"
+.PP
+Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
+\fBputs\fR writes out data until an encoding error occurs and fails with
+POSIX error code \fBEILSEQ\fR on a non encodable data.
+
 .SH EXAMPLES
 .PP
 Write a short message to the console (or wherever \fBstdout\fR is
diff --git a/doc/read.n b/doc/read.n
index 9a9a7e8..2a865da 100644
--- a/doc/read.n
+++ b/doc/read.n
@@ -50,6 +50,21 @@ newline characters according to the \fB\-translation\fR option
 for the channel.
 See the \fBfconfigure\fR manual entry for a discussion on ways in
 which \fBfconfigure\fR will alter input.
+.SH "ENCODING ERRORS"
+.PP
+Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
+An encoding error is reported by the POSIX error code \fBEILSEQ\fR.
+.PP
+In blocking mode, the error is directly thrown, even, if there is a
+leading decodable data portion.
+The file pointer is advanced just before the encoding error.
+An eventual well decoded data chunk before the encoding error is lost.
+It is proposed to return this portion within the additional key \fB-data\fR
+in the error dictionary.
+.PP
+In non blocking mode, first, any data without encoding error is returned
+(without error state).
+In the next call, no data is returned and the \fBEILSEQ\fR error state is set.
 .SH "USE WITH SERIAL PORTS"
 '\" Note: this advice actually applies to many versions of Tcl
 .PP
--
cgit v0.12

From 6e78bdfc14c74c8679b8bd149069d40d7c7a00a4 Mon Sep 17 00:00:00 2001
From: oehhar
Date: Sat, 4 Nov 2023 12:43:29 +0000
Subject: Add read example

---
 doc/read.n | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/doc/read.n b/doc/read.n
index 2a865da..66e6c30 100644
--- a/doc/read.n
+++ b/doc/read.n
@@ -53,6 +53,8 @@ which \fBfconfigure\fR will alter input.
 .SH "ENCODING ERRORS"
 .PP
 Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
+Encoding errors are special, as an eventual introspection or recovery is
+possible by changing to an encoding which accepts the data.
 An encoding error is reported by the POSIX error code \fBEILSEQ\fR.
 .PP
 In blocking mode, the error is directly thrown, even, if there is a
@@ -65,6 +67,28 @@ in the error dictionary.
 In non blocking mode, first, any data without encoding error is returned
 (without error state).
 In the next call, no data is returned and the \fBEILSEQ\fR error state is set.
+.PP
+Here is an example with an encoding error in UTF-8 encoding, which is then
+introspected by a switch to the binary encoding. The test file contains a not
+continued multi-byte sequence at position 1 (\fBA \\xC3 B\fR):
+.CS
+% set f [open test_A_195_B.txt r]
+file35a65a0
+% fconfigure $f -encoding utf-8 -profile strict
+% catch {read $f} e d
+1
+% set d
+-data A -code 1 -level 0
+-errorstack {INNER {invokeStk1 read file35a65a0}}
+-errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
+-errorinfo {...} -errorline 1
+% tell $f
+1
+% fconfigure $f -encoding binary -profile strict
+% read $f
+ÃB
+.CE
+ToDo: -data is TIP 653 and may be removed here or explained.
 .SH "USE WITH SERIAL PORTS"
 '\" Note: this advice actually applies to many versions of Tcl
 .PP
--
cgit v0.12

From eb121d13b5b652da7400bbe6b5b58d06c992709e Mon Sep 17 00:00:00 2001
From: oehhar
Date: Sun, 5 Nov 2023 17:56:27 +0000
Subject: Worked on examples for read and gets

---
 doc/gets.n | 33 +++++++++++++++++++++++++++++++--
 doc/puts.n |  2 +-
 doc/read.n | 29 ++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/doc/gets.n b/doc/gets.n
index 0e4239f..59e00c7 100644
--- a/doc/gets.n
+++ b/doc/gets.n
@@ -50,9 +50,38 @@ these three cases.
 .SH "ENCODING ERRORS"
 .PP
 Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
+Encoding errors are special, as an eventual introspection or recovery is
+possible by changing to an encoding which accepts the data.
 An encoding error is reported by the POSIX error code \fBEILSEQ\fR.
-The file pointer is unchanged in the error case and an eventual returned
-character count is \fB-1\fR.
+The file pointer is unchanged in the error case.
+.PP
+Here is an example with an encoding error in UTF-8 encoding, which is then
+introspected by a switch to the binary encoding. The test file contains a not
+continued multi-byte sequence at position 1 (\fBA \\xC3 B\fR):
+.PP
+File creation for example
+.CS
+% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f
+.CE
+Encoding error example
+.CS
+% set f [open test_A_195_B.txt r]
+file384b6a8
+% fconfigure $f -encoding utf-8 -profile strict
+% catch {gets $f} e d
+1
+% set d
+-data A -code 1 -level 0
+-errorstack {INNER {invokeStk1 gets file384b6a8}}
+-errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
+-errorinfo {...} -errorline 1
+% tell $f
+0
+% fconfigure $f -encoding binary -profile strict
+% gets $f
+AÃB
+.CE
+ToDo: -data is TIP 653 and may be removed here or explained.
 .SH "EXAMPLE"
 This example reads a file one line at a time and prints it out with
 the current line number attached to the start of each line.
diff --git a/doc/puts.n b/doc/puts.n
index e8df581..0943f87 100644
--- a/doc/puts.n
+++ b/doc/puts.n
@@ -66,7 +66,7 @@ via a file event that the channel is ready for more output data).
 .PP
 Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
 \fBputs\fR writes out data until an encoding error occurs and fails with
-POSIX error code \fBEILSEQ\fR on a non encodable data.
+POSIX error code \fBEILSEQ\fR.
 
 .SH EXAMPLES
 .PP
diff --git a/doc/read.n b/doc/read.n
index 66e6c30..4e93d58 100644
--- a/doc/read.n
+++ b/doc/read.n
@@ -71,10 +71,18 @@ In the next call, no data is returned and the \fBEILSEQ\fR error state is set.
 Here is an example with an encoding error in UTF-8 encoding, which is then
 introspected by a switch to the binary encoding. The test file contains a not
 continued multi-byte sequence at position 1 (\fBA \\xC3 B\fR):
+.PP
+File creation for examples
+.
+.CS
+% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f
+.CE
+Blocking example
+.
 .CS
 % set f [open test_A_195_B.txt r]
 file35a65a0
-% fconfigure $f -encoding utf-8 -profile strict
+% fconfigure $f -encoding utf-8 -profile strict -blocking 1
 % catch {read $f} e d
 1
 % set d
@@ -87,6 +95,25 @@ file35a65a0
 % fconfigure $f -encoding binary -profile strict
 % read $f
 ÃB
+% close $f
+.CE
+Non blocking example
+.
+.CS
+% set f [open test_A_195_B.txt r]
+file35a65a0
+% fconfigure $f -encoding utf-8 -profile strict -blocking 1
+% read $f
+A
+% tell $f
+1
+% catch {read $f} e d
+1
+% set d
+-data {} -code 1 -level 0
+-errorstack {INNER {invokeStk1 read file384b228}}
+-errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
+-errorinfo {...} -errorline 1
 .CE
 ToDo: -data is TIP 653 and may be removed here or explained.
 .SH "USE WITH SERIAL PORTS"
--
cgit v0.12

From c6d4c4edeb9215994e40f1037aae1635e0887c70 Mon Sep 17 00:00:00 2001
From: oehhar
Date: Sun, 5 Nov 2023 17:59:09 +0000
Subject: Mixed-up -blocking 0/1

---
 doc/read.n | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/read.n b/doc/read.n
index 4e93d58..c1291bf 100644
--- a/doc/read.n
+++ b/doc/read.n
@@ -102,7 +102,7 @@ Non blocking example
 .CS
 % set f [open test_A_195_B.txt r]
 file35a65a0
-% fconfigure $f -encoding utf-8 -profile strict -blocking 1
+% fconfigure $f -encoding utf-8 -profile strict -blocking 0
 % read $f
 A
 % tell $f
--
cgit v0.12

From b0bcd99f0853484343dec35867a3db0aeafd1bba Mon Sep 17 00:00:00 2001
From: oehhar
Date: Mon, 6 Nov 2023 15:10:18 +0000
Subject: Refine read and gets documentation for encoding error case

---
 doc/gets.n |  8 +++++---
 doc/read.n | 10 +++++-----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/doc/gets.n b/doc/gets.n
index 59e00c7..29355a4 100644
--- a/doc/gets.n
+++ b/doc/gets.n
@@ -61,7 +61,7 @@ continued multi-byte sequence at position 1 (\fBA \\xC3 B\fR):
 .PP
 File creation for example
 .CS
-% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f
+% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\\xC3B; close $f
 .CE
 Encoding error example
 .CS
@@ -71,7 +71,7 @@ file384b6a8
 % catch {gets $f} e d
 1
 % set d
--data A -code 1 -level 0
+-code 1 -level 0
 -errorstack {INNER {invokeStk1 gets file384b6a8}}
 -errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
 -errorinfo {...} -errorline 1
@@ -81,7 +81,9 @@ file384b6a8
 % gets $f
 AÃB
 .CE
-ToDo: -data is TIP 653 and may be removed here or explained.
+Compared to \fBread\fR, any already decoded data is not consumed.
+The file position is still at 0 and the recovery \fBgets\fR returns also the
+already well decoded leading data.
 .SH "EXAMPLE"
 This example reads a file one line at a time and prints it out with
 the current line number attached to the start of each line.
diff --git a/doc/read.n b/doc/read.n
index c1291bf..2add683 100644
--- a/doc/read.n
+++ b/doc/read.n
@@ -54,7 +54,8 @@ which \fBfconfigure\fR will alter input.
 .PP
 Encoding errors may exist, if the encoding profile \fBstrict\fR is used.
 Encoding errors are special, as an eventual introspection or recovery is
-possible by changing to an encoding which accepts the data.
+possible by changing to an encoding (or encoding profile), which accepts
+the data.
 An encoding error is reported by the POSIX error code \fBEILSEQ\fR.
 .PP
 In blocking mode, the error is directly thrown, even, if there is a
@@ -75,7 +76,7 @@ continued multi-byte sequence at position 1 (\fBA \\xC3 B\fR):
 File creation for examples
 .
 .CS
-% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\xC3B; close $f
+% set f [open test_A_195_B.txt wb]; puts -nonewline $f A\\xC3B; close $f
 .CE
 Blocking example
 .
@@ -86,7 +87,7 @@ file35a65a0
 % catch {read $f} e d
 1
 % set d
--data A -code 1 -level 0
+-code 1 -level 0
 -errorstack {INNER {invokeStk1 read file35a65a0}}
 -errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
 -errorinfo {...} -errorline 1
@@ -110,12 +111,11 @@ A
 % catch {read $f} e d
 1
 % set d
--data {} -code 1 -level 0
+-code 1 -level 0
 -errorstack {INNER {invokeStk1 read file384b228}}
 -errorcode {POSIX EILSEQ {invalid or incomplete multibyte or wide character}}
 -errorinfo {...} -errorline 1
 .CE
-ToDo: -data is TIP 653 and may be removed here or explained.
 .SH "USE WITH SERIAL PORTS"
 '\" Note: this advice actually applies to many versions of Tcl
 .PP
--
cgit v0.12