Commit c68b54dc authored by pazsan

Added documentation for xchars

parent 96d4643d
@@ -9202,7 +9202,55 @@ doc-broken-pipe-error
@node Xchars and Unicode, , Pipes, Other I/O
@subsection Xchars and Unicode
This chapter needs completion
ASCII is only appropriate for the English language. Most Western
languages nevertheless fit reasonably well into the Forth frame, since a
byte is sufficient to encode the few special characters each of them
needs (though not always with the same encoding; Latin-1 is the most
widely used). Other languages need different character sets, several of
them variable-width. The most prominent of these is UTF-8. Let's call
these extended characters xchars. The primitive fixed-size characters
stored as bytes are called pchars in this section.
The xchar words add a few data types:

@itemize

@item
@var{xc} is an extended char (xchar) on the stack. It occupies one cell
and is a subset of unsigned cell. Note: UTF-8 cannot store more than
31 bits; on 16-bit systems, only the 16-bit (UCS-2) subset of the
character set can be used.

@item
@var{xc-addr} is the address of an xchar in memory. Alignment
requirements are the same as for @var{c-addr}. The memory representation
of an xchar differs from the stack representation and depends on the
encoding used. An xchar may occupy a variable number of pchars in memory.

@item
@var{xc-addr} @var{u} is a buffer of xchars in memory, starting at
@var{xc-addr}, @var{u} pchars long.

@end itemize
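
The difference between the stack and memory representations described above can be tried out interactively. The following is only a sketch: it assumes the xchar words from this commit are loaded and that UTF-8 is the active encoding, and @code{xbuf} is a hypothetical scratch buffer that is not part of the commit.

```forth
\ Assumption: xchar words are loaded and the encoding is UTF-8.
\ With a fixed-width 8-bit encoding the results differ.
create xbuf 8 allot            \ hypothetical 8-pchar scratch buffer

$41   xc-size .                \ pchars needed for U+0041 (1 under UTF-8)
$20AC xc-size .                \ pchars needed for U+20AC (3 under UTF-8)

$20AC xbuf 8 xc!+?             ( xc-addr2 u2 f ) \ store xc; f is true if it fit
2drop                          ( xc-addr2 )
xbuf - .                       \ number of pchars actually written
xbuf xc@+ nip $20AC = .        \ fetching gives back the same xc (true under UTF-8)
```

On the stack the xchar is a single cell holding the code point, while in memory it occupies as many pchars as the encoding requires.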
There's a new environment query
@node OS command line arguments, Locals, Other I/O, Words
@section OS command line arguments
@@ -23,19 +23,58 @@
\ utf-8 words with an appropriate setting of max-single-byte, but I
\ like to see how an 8bit setting without UTF-8 stuff looks like.
DEFER XEMIT ( xc -- )
DEFER XKEY ( -- xc )
DEFER XCHAR+ ( xc-addr1 -- xc-addr2 )
DEFER XCHAR- ( xc-addr1 -- xc-addr2 )
DEFER +X/STRING ( xc-addr1 u1 -- xc-addr2 u2 )
DEFER X\STRING- ( xc-addr1 u1 -- xc-addr1 u2 )
DEFER XC@ ( xc-addr -- xc )
DEFER XC!+? ( xc xc-addr1 u1 -- xc-addr2 u2 f ) \ f if operation succeeded
DEFER XC@+ ( xc-addr1 -- xc-addr2 xc )
DEFER XC-SIZE ( xc -- u ) \ size in cs
DEFER X-SIZE ( xc-addr u1 -- u2 ) \ size in cs
DEFER X-WIDTH ( addr u -- n ) \ size in fixed chars
DEFER -TRAILING-GARBAGE ( addr u1 -- addr u2 ) \ remove trailing incomplete xc
Defer xemit ( xc -- ) \ xchar-ext
\G Prints an xchar on the terminal.
Defer xkey ( -- xc ) \ xchar-ext
\G Reads an xchar from the terminal. This will discard all input
\G events up to the completion of the xchar.
Defer xchar+ ( xc-addr1 -- xc-addr2 ) \ xchar-ext
\G Adds the size of the xchar stored at @var{xc-addr1} to this address,
\G giving @var{xc-addr2}.
Defer xchar- ( xc-addr1 -- xc-addr2 ) \ xchar-ext
\G Goes backward from @var{xc-addr1} until it finds an xchar so that
\G the size of this xchar added to @var{xc-addr2} gives
\G @var{xc-addr1}.
Defer +x/string ( xc-addr1 u1 -- xc-addr2 u2 ) \ xchar plus-x-slash-string
\G Step forward by one xchar in the buffer defined by address
\G @var{xc-addr1}, size @var{u1} pchars. @var{xc-addr2} is the address
\G and @var{u2} the size in pchars of the remaining buffer after stepping
\G over the first xchar in the buffer.
Defer x\string- ( xc-addr1 u1 -- xc-addr1 u2 ) \ xchar x-back-string-minus
\G Step backward by one xchar in the buffer defined by address
\G @var{xc-addr1} and size @var{u1} in pchars, starting at the end of
\G the buffer. @var{xc-addr1} is the address and @var{u2} the size in
\G pchars of the remaining buffer after stepping backward over the
\G last xchar in the buffer.
Defer xc@ ( xc-addr -- xc ) \ xchar-ext xc-fetch
\G Fetches the xchar @var{xc} at @var{xc-addr}.
Defer xc!+? ( xc xc-addr1 u1 -- xc-addr2 u2 f ) \ xchar-ext xc-store-plus-query
\G Stores the xchar @var{xc} into the buffer starting at address
\G @var{xc-addr1}, @var{u1} pchars large. @var{xc-addr2} points to the
\G first memory location after @var{xc}, @var{u2} is the remaining
\G size of the buffer. If the xchar @var{xc} did fit into the buffer,
\G @var{f} is true, otherwise @var{f} is false, and @var{xc-addr2}
\G @var{u2} equal @var{xc-addr1} @var{u1}. XC!+? is safe against buffer
\G overflows, and therefore preferred over XC!+.
Defer xc@+ ( xc-addr1 -- xc-addr2 xc ) \ xchar-ext xc-fetch-plus
\G Fetches the xchar @var{xc} at @var{xc-addr1}. @var{xc-addr2} points
\G to the first memory location after @var{xc}.
Defer xc-size ( xc -- u ) \ xchar-ext
\G Computes the memory size of the xchar @var{xc} in pchars.
Defer x-size ( xc-addr u1 -- u2 ) \ xchar
\G Computes the memory size of the first xchar stored at @var{xc-addr}
\G in pchars.
Defer x-width ( xc-addr u -- n ) \ xchar-ext
\G @var{n} is the number of monospace ASCII pchars that take the same
\G space to display as the xchar string starting at @var{xc-addr},
\G using @var{u} pchars; assuming a monospaced display font,
\G i.e. xchar width is always an integer multiple of the width of an
\G ASCII pchar.
Defer -trailing-garbage ( xc-addr u1 -- xc-addr u2 ) \ xchar-ext
\G Examine the last xchar in the buffer @var{xc-addr} @var{u1}---if
\G the encoding is correct and it represents a full xchar, @var{u2}
\G equals @var{u1}; otherwise, @var{u2} represents the string without
\G the last (garbled) xchar.
\ derived words, faster implementations are probably possible
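
As an illustration of how the buffer-stepping words documented above fit together, here is a minimal sketch that counts the xchars in a buffer with +X/STRING. The name COUNT-XCHARS is hypothetical and not part of this commit; the sketch assumes the buffer holds well-formed xchars in the active encoding.

```forth
\ Hypothetical helper, not part of the commit: count the xchars in the
\ buffer xc-addr u by stepping over one xchar at a time.
\ Assumes the buffer contains well-formed xchars.
: count-xchars ( xc-addr u -- n )
    0 -rot                     ( n xc-addr u )
    BEGIN  dup 0>  WHILE       \ stop once no pchars remain
        +x/string              \ skip the first xchar of the rest
        rot 1+ -rot            \ bump the counter
    REPEAT
    2drop ;

s" abc" count-xchars .         \ three single-pchar xchars: prints 3
```

For pure ASCII input the result equals the pchar count; for a string containing multi-pchar xchars the result is smaller than @var{u}, which is the point of counting in xchars rather than pchars.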
@@ -319,8 +319,13 @@ here wc-table - Constant #wc-table
IF set-encoding-utf-8 ELSE set-encoding-fixed-width THEN ;
environment-wordlist set-current
: xchar-encoding
max-single-byte $80 = IF s" UTF-8" ELSE s" ISO-LATIN-1" THEN ;
: xchar-encoding ( -- addr u ) \ xchar-ext
\G Returns a printable ASCII string that represents the encoding,
\G and uses the preferred MIME name (if any) or the name in
\G @url{} like
\G ``ISO-LATIN-1'' or ``UTF-8'', with the exception of ``ASCII'', where
\G we prefer the alias ``ASCII''.
max-single-byte $80 = IF s" UTF-8" ELSE s" ISO-LATIN-1" THEN ;
forth definitions
:noname ( -- )
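
A program could use the XCHAR-ENCODING environment query defined above roughly as follows. This is a sketch that assumes the query is reachable through ENVIRONMENT? in the usual way; the messages printed are illustrative only.

```forth
\ Ask the system which xchar encoding is active and print its name.
\ ENVIRONMENT? returns false alone if the query is unknown,
\ otherwise the query's result ( addr u ) followed by true.
s" xchar-encoding" environment? [IF]
    cr ." xchar encoding: " type
[ELSE]
    cr ." xchar-encoding query not available"
[THEN]
```

With the definition in this commit the printed name is either UTF-8 or ISO-LATIN-1, depending on the value of max-single-byte.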