Read a file character by character/UTF8

From Rosetta Code
Read a file character by character/UTF8 is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Read a file one character at a time, as opposed to reading the entire file at once.

The solution may be implemented as a procedure, which returns the next character in the file on each consecutive call (returning EOF when the end of the file is reached).

The procedure should support the reading of files containing UTF8 encoded wide characters, returning whole characters for each consecutive read.

See also

Run BASIC

<lang runbasic>open file.txt" for binary as #f numChars = 1 ' specify number of characters to read a$ = input$(#f,numChars) ' read number of characters specified b$ = input$(#f,1) ' read one character close #f</lang>

Perl 6

Perl 6 has a built in method .getc to get a single character from an open file handle. File handles default to UTF-8, so they will handle multi-byte characters correctly.

To read a single character at a time from the Standard Input terminal; $*IN in Perl 6: <lang perl6>.say while defined $_ = $*IN.getc;</lang>

Or, from a file: <lang perl6>my $filename = 'whatever';

my $in = open( $filename, :r ) or die "$!\n";

print $_ while defined $_ = $in.getc;</lang>

Tcl

To read a single character from a file, use: <lang tcl>set ch [read $channel 1]</lang> This will read multiple bytes sufficient to obtain a Unicode character if a suitable encoding has been configured on the channel. For binary channels, this will always consume exactly one byte. However, the low-level channel buffering logic may consume more than one byte (which only really matters where the channel is being handed on to another process and the channel is over a file descriptor that doesn't support the lseek OS call); the extent of buffering can be controlled via: <lang tcl>fconfigure $channel -buffersize $byteCount</lang> When the channel is only being accessed from Tcl (or via Tcl's C API) it is not normally necessary to adjust this option.