% !TeX root = forth.tex
% !TeX spell check = en_US
\annex{Portability guide} % E.  (informative annex)
\label{annex:port}

\section{Introduction} % E.1
\label{port:intro}

A primary goal of Forth 94 was to enable a programmer to write Forth
programs that work on a wide variety of machines; Forth-\snapshot{}
continues this practice.  This goal is accomplished by allowing some
key Forth terms to be implementation defined (e.g., cell size) and
by providing Forth operators (words) that conceal the implementation.
This allows the implementor to produce the Forth system that most
effectively uses the native hardware. The machine-independent
operators, together with some programmer discipline, support program
portability.

It can be difficult for someone familiar with only one machine
architecture to imagine the problems caused by transporting programs
between dissimilar machines.
This Annex provides guidelines for writing portable Forth programs.
The first section describes ways to make a program hardware independent.

The second section describes assumptions about Forth implementations
that many programmers make but that cannot be relied upon in a portable program.


\section{Hardware peculiarities} % E.2
\label{port:hardware}

\subsection{Data/memory abstraction} % E.2.1

This standard gives definitions for data and memory that
apply to a wide variety of computers. These definitions give us a way
to talk about the common elements of data and memory while ignoring
the details of specific hardware. Similarly, Forth programs that
use data and memory in ways that conform to these definitions can
also ignore hardware details. The following sections discuss the
definitions and describe how to write programs that are independent
of the data and memory peculiarities of different computers.

\subsection{Definitions} % E.2.2

Three terms defined by this standard are address unit, cell, and
character.

The address space of a Forth system is divided into an array of
address units; an address unit is the smallest collection of bits that
can be addressed. In other words, an address unit is the number of
bits spanned by the addresses \emph{addr} and \emph{addr}+1.
The most prevalent machines use 8-bit address units, but other
address unit sizes exist.

In this standard, the size of a cell is an implementation-defined
number of address units. Forth implemented on a 16-bit microprocessor
could use a 16-bit cell and an implementation on a 32-bit machine
could use a 32-bit cell. Machines with less common word sizes (e.g.,
18-bit or 36-bit) could implement Forth systems with their native
cell sizes. In all of these systems, Forth words such as \word{DUP}
and \word{!} do the same things (duplicate the top cell on the stack
and store the second cell into the address given by the first cell,
respectively).

Similarly, the definition of a character has been generalized to be
an implementation-defined number of address units (but at least eight
bits). This removes the need for a Forth implementor to provide 8-bit
characters on processors where it is inappropriate. For example, on
an 18-bit machine with a 9-bit address unit, a 9-bit character would
be most convenient. Since, by definition, you can't address anything
smaller than an address unit, a character must be at least as big as
an address unit. This will result in big characters on machines with
large address units. An example is a 16-bit cell-addressed machine
where a 16-bit character makes the most sense.

\subsection{Addressing memory} % E.2.3

One of the most common portability problems is the addressing of
successive cells in memory. Given the memory address of a cell, how
do you find the address of the next cell?
On a byte-addressed machine
with 32-bit cells the code to find the next cell would be \texttt{4 +}.
The code would be \word{1+} on a cell-addressed processor and
\texttt{16 +} on a bit-addressed processor with 16-bit cells.
This standard provides a
next-cell operator named \word{CELL+} that can be used in all of these cases.
Given an address, \word{CELL+} adjusts the address by the size of a cell
(measured in address units).
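
As a minimal sketch (the name \texttt{PAIR} is hypothetical and is used
only for illustration), two consecutive cells can be laid down with
\texttt{,} and read back without knowing the cell size:

\begin{verbatim}
CREATE PAIR  10 ,  20 ,   \ two consecutive cells in data space
PAIR @                    \ ( -- 10 )  first cell
PAIR CELL+ @              \ ( -- 20 )  second cell, whatever the cell size
\end{verbatim}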

A related problem is that of addressing an array of cells in an
arbitrary order. This standard provides a portable scaling operator named \word{CELLS}.
Given a number \emph{n}, \word{CELLS} returns the number of address
units needed to hold \param{n} cells.   Using \word{CELLS}, we can make
a portable definition of an \texttt{ARRAY} defining word:

\begin{quote}\ttfamily
	\word{:} ARRAY \word{p} u -{}- ) \word{CREATE} ~ \word{CELLS} \word{ALLOT} \\
	\hspace*{2em}\word{DOES} \word{p} u -{}- addr ) \word{SWAP} \word{CELLS} \word{+} \word{;}
\end{quote}
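
A short usage sketch, assuming the \texttt{ARRAY} definition above has
been loaded (the name \texttt{SCORES} is hypothetical):

\begin{verbatim}
10 ARRAY SCORES     \ an array of 10 cells, indexed from 0
42 3 SCORES !       \ store 42 in element 3
3 SCORES @          \ ( -- 42 )
\end{verbatim}

This \texttt{ARRAY} performs no bounds checking; an index outside the
range 0 to 9 addresses memory outside the array.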

There are also portability problems with addressing arrays of
characters.
In a byte-addressed machine, the size of a character equals the
size of an address unit.  Addresses of successive characters
in memory can be found using \word{1+}, and scaling indices into a character
array is a no-op (i.e., \texttt{1 *}).  However, there could be
implementations where a character is larger than an address unit.
The \word{CHAR+} and \word{CHARS} operators, analogous to
\word{CELL+} and \word{CELLS}, are available to allow maximum portability.
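
The following sketch shows both operators applied to a small character
buffer. The names \texttt{CBUF} and \texttt{NTH-CHAR} are hypothetical
and are not part of this standard:

\begin{verbatim}
CREATE CBUF  16 CHARS ALLOT                  \ room for 16 characters
: NTH-CHAR ( u -- c-addr )  CHARS CBUF + ;   \ address of character u
CHAR A  0 NTH-CHAR C!                        \ store A in character 0
CHAR B  CBUF CHAR+ C!                        \ store B in character 1
1 NTH-CHAR C@                                \ fetch character 1 (B)
\end{verbatim}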

This standard generalizes the definition of some Forth words that operate
on regions of memory to use address units. One example is
\word{ALLOT}.  By prefixing \word{ALLOT} with the appropriate scaling operator
(\word{CELLS}, \word{CHARS}, etc.), space for any desired data structure can
be allocated (see the definition of \texttt{ARRAY} above). For example:
\begin{quote}\ttfamily
	\word{CREATE} ABUFFER 5 \word{CHARS} \word{ALLOT}
	\word{p} \textrm{allot 5 character buffer})
\end{quote}

\subsection{Alignment problems} % E.2.4

Some processors have restrictions on the addresses that can be used by
memory access instructions. This standard does not require an
implementor of a Forth to make alignment transparent; on the
contrary, it requires (in Section \xref[3.3.3.1 Address alignment]{usage:aaddr}) that
a standard Forth program assume that character and cell alignment may be
required.
One pitfall caused by alignment restrictions
is in creating tables containing both characters and cells. When
\word{,} (comma) or \word{C,} is used to initialize a table, data
are stored at the data-space pointer. Consequently, the data-space
pointer must be suitably aligned. For example, a non-portable table
definition would be:

\begin{quote}\ttfamily
	\word{CREATE} ATABLE 1 \word{C,} X \word{,} 2 \word{C,} Y \word{,}
\end{quote}

On a machine that restricts memory fetches to aligned addresses,
\word{CREATE} would leave the data space pointer at an aligned address.
However, the first \word{C,} would leave the data space pointer at an
unaligned address,  and the subsequent \word{,} (comma) would violate
the alignment restriction by storing \texttt{X} at an unaligned address.
A portable way to create the table is:

\begin{quote}\ttfamily
	\word{CREATE} ATABLE 1 \word{C,}
		\word{ALIGN} X \word{,} 2 \word{C,} \word{ALIGN} Y \word{,}
\end{quote}

\word{ALIGN} adjusts the data space pointer to the first aligned
address greater than or equal to its current address. An aligned
address is suitable for storing or fetching characters, cells, cell
pairs, or double-cell numbers.
%
After initializing the table, we would also like to read values from
the table. For example, assume we want to fetch the first cell,
\texttt{X}, from the table. \texttt{ATABLE} \word{CHAR+} gives the
address of the first thing after the character. However, this may not
be the address of \texttt{X} since we aligned the data-space pointer
between the \word{C,} and the \word{,}. The portable way to get the
address of \texttt{X} is:
\begin{quote}\ttfamily
	ATABLE \word{CHAR+} \word{ALIGNED}
\end{quote}
\word{ALIGNED} adjusts the address on top of the stack to the first
aligned address greater than or equal to its current value.
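
As a sketch, assuming \texttt{ATABLE} was built with the portable
definition above, both cells can be fetched by retracing the steps that
laid out the table:

\begin{verbatim}
ATABLE CHAR+ ALIGNED @                       \ ( -- x )  fetch X
ATABLE CHAR+ ALIGNED CELL+ CHAR+ ALIGNED @   \ ( -- y )  fetch Y
\end{verbatim}

The second line steps over \texttt{X} with \texttt{CELL+}, over the
following character with \texttt{CHAR+}, and then applies
\texttt{ALIGNED} again, mirroring the second \texttt{ALIGN} in the
table definition.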


\section{Number representation} % E.3

\subsection{Big endian vs. little endian} % E.3.1
\label{port:endian}

The constituent bits of a number in memory are kept in different
orders on different machines. Some machines place the most-significant
part of a number at an address in memory with less-significant parts
following it at higher addresses; this is known as big-endian
ordering. Other machines do the opposite; the
least-significant part is stored at the lowest address (little-endian
ordering).

For example, the following code for a 16-bit little-endian Forth
would produce the answer 1:
\begin{quote}\ttfamily
	\word{VARIABLE} FOO
	\quad 1 FOO \word{!}
	\quad FOO \word{C@}
\end{quote}

The same code on a 16-bit big-endian Forth would produce the
answer 0. A portable program cannot exploit the representation
of a number in memory.
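
One portable alternative, sketched here for the same \texttt{FOO}, is to
fetch the whole cell and isolate the low-order bits arithmetically
instead of reading part of the cell from memory:

\begin{verbatim}
VARIABLE FOO
1 FOO !
FOO @ 255 AND    \ ( -- 1 )  low-order eight bits, on either byte order
\end{verbatim}

The mask is applied to a non-negative value here, so the result does not
depend on how negative numbers are represented.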

A related issue is the representation of cell pairs and double-cell
numbers in memory. When a cell pair is moved from the stack to memory
with \word{2!}, the cell that was on top of the stack is placed at the
lower memory address. It is useful and reasonable to manipulate the
individual cells when they are in memory.
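
A minimal sketch of this layout (the name \texttt{DPAIR} is
hypothetical):

\begin{verbatim}
CREATE DPAIR  2 CELLS ALLOT   \ room for one cell pair
1 2 DPAIR 2!                  \ 2 is on top of the stack
DPAIR @                       \ ( -- 2 )  top cell, at the lower address
DPAIR CELL+ @                 \ ( -- 1 )  second cell, at the next cell
\end{verbatim}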

\subsection{ALU organization} % E.3.2

Different computers use different bit patterns to represent integers.
Possibilities include binary representations (two's complement, one's
complement, sign magnitude, etc.) and decimal representations (BCD,
etc.). Each of these formats creates advantages and disadvantages in
the design of a computer's arithmetic logic unit (ALU). The most
commonly used representation, two's complement, is popular because of
the simplicity of its addition and subtraction operations.

Programmers who have grown up on two's complement machines tend to
become intimate with their representation of numbers and take some
properties of that representation for granted. For example, a trick
to find the remainder of a number divided by a power of two is to mask
off some bits with \word{AND}. A common application of this trick is
to test a number for oddness using 1 \word{AND}. However, this will
not work on a one's complement machine if the number is negative (a
portable technique is 2 \word{MOD}).
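
A sketch of the portable test, wrapped in a word with the hypothetical
name \texttt{ODD?}:

\begin{verbatim}
: ODD? ( n -- flag )  2 MOD 0= 0= ;   \ true if n is not a multiple of 2
 7 ODD?    \ ( -- true )
-7 ODD?    \ ( -- true )
 8 ODD?    \ ( -- false )
\end{verbatim}

The remainder of a negative odd number may be 1 or $-1$ depending on
whether the system uses floored or symmetric division, so the sketch
tests only whether the remainder is non-zero.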

The remainder of this section is a (non-exhaustive) list of things to
watch for when portability between machines with binary representations
other than two's complement is desired.

To convert a single-cell number to a double-cell number, Forth 94
provided the operator \word{StoD}. To convert a double-cell number to
single-cell, Forth programmers have traditionally used \word{DROP}.
However, this trick doesn't work on sign-magnitude machines. For
portability, a \word[double]{DtoS} operator is available. Converting an
unsigned single-cell number to a double-cell number can be done portably
by pushing a zero on the stack.
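
The three conversions can be sketched as follows; the values are
arbitrary examples:

\begin{verbatim}
-5 S>D      \ ( -- d )    double-cell -5
D>S         \ ( d -- n )  single-cell -5 again (Double-Number word set)
7 0         \ ( -- ud )   unsigned 7 as an unsigned double-cell number
\end{verbatim}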


\ifrelease\else
\begin{editor}
This would be a good place to add a discussion
of characters and the extended character word set.
\end{editor}
\fi

\section{Forth system implementation} % E.4

During Forth's history, an amazing variety of implementation techniques
have been developed. The ANS Forth Standard encourages this diversity
and consequently restricts the assumptions a user can make about the
underlying implementation of an ANS Forth system. Users of a particular
Forth implementation frequently become accustomed to aspects of the
implementation and assume they are common to all Forths. This section
points out many of these incorrect assumptions.

\subsection{Definitions} % E.4.1

Traditionally, Forth definitions have consisted of the name of the
Forth word, a dictionary search link, data describing how to execute
the definition, and parameters describing the definition itself. These
components have historically been referred to as the name, link, code,
and parameter fields.
No method for accessing these fields has been found that works
across all of the Forth implementations currently in use. Therefore,
a portable Forth program may not use the name, link, or code field
in any way. Use of the parameter field (renamed to data field for
clarity) is limited to the operations described below.

Only words defined with \word{CREATE} or with other defining words
that call \word{CREATE} have data fields. The other defining words
in the standard (\word{VARIABLE}, \word{CONSTANT}, \word{:}, etc.)
might not be implemented with \word{CREATE}. Consequently, a Standard
Program must assume that words defined by \word{VARIABLE},
\word{CONSTANT}, \word{:}, etc., may have no data fields. There is no
portable way for a Standard Program to modify the value of a constant or to
``patch'' a colon definition at run time.
The \word{DOES} part of a defining word operates on a data field,
so \word{DOES} may only be used on words ultimately defined by \word{CREATE}.

In standard Forth, \word{FIND}, \word{[']} and \word{'} (tick) return an
unspecified entity called an execution token. There are only a
few things that may be done with an execution token. The token may be
passed to \word{EXECUTE} to execute the word ticked or compiled into
the current definition with \word{COMPILE,}. The token can also be
stored in a variable or other data structure and used later.
Finally, if the word ticked was defined via \word{CREATE}, \word{toBODY}
converts the execution token into the word's data-field address.

An execution token cannot be assumed to be an address and may not
be used as one.
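
The permitted uses can be sketched as follows; the names
\texttt{ACTION} and \texttt{COUNTER} are hypothetical:

\begin{verbatim}
VARIABLE ACTION                 \ holds an execution token
' DUP ACTION !                  \ save the token for DUP
5 ACTION @ EXECUTE              \ ( -- 5 5 )  same effect as  5 DUP

CREATE COUNTER  0 ,             \ COUNTER has a data field
' COUNTER >BODY @               \ ( -- 0 )  legal only because of CREATE
\end{verbatim}

Applying \texttt{>BODY} to the token of a word that was not defined via
\texttt{CREATE}, such as \texttt{DUP} here, would not be portable.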


\subsection{Stacks} % E.4.2

In some Forth implementations, it is possible to find the address of
a stack in memory and manipulate the stack as an array of cells. This
technique is not portable. On some systems, especially
Forth-in-hardware systems, the stacks might be in memory
that can't be addressed by the program or might not be in memory at
all. Forth's parameter and return stacks must be treated as stacks.

A Standard Program may use the return stack directly only for
temporarily storing values. Every value examined or removed from the
return stack using \word{R@}, \word{Rfrom}, or \word{2Rfrom} must have been
put on the stack explicitly using \word{toR} or \word{2toR}. Even this
must be done carefully because the system may use the return stack to
hold return addresses and loop-control parameters. Section
\xref[3.2.3.3 Return stack]{usage:returnstack} of the standard has a
list of restrictions.
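
A sketch of a conforming use, in which the value placed on the return
stack is removed again within the same definition and outside any loop
(the name \texttt{THIRD} is hypothetical):

\begin{verbatim}
: THIRD ( x1 x2 x3 -- x1 x2 x3 x1 )
   >R OVER R> SWAP ;       \ x3 is parked on the return stack briefly
1 2 3 THIRD                \ ( -- 1 2 3 1 )
\end{verbatim}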


\section{Summary} % E.6

The Forth Standard does not force anyone to write
a portable program. In situations where performance is paramount,
the programmer is encouraged to use every trick available. On the
other hand, if portability to a wide variety of systems is needed
(or anticipated), this standard provides the tools to accomplish this. There
might be no such thing as a completely portable program. A programmer, using
this guide, should intelligently weigh the tradeoffs of providing
portability to specific machines. For example, machines that use
sign-magnitude numbers are rare and probably don't deserve much
thought. But systems with different cell sizes will certainly be
encountered and should be provided for. In general, making a program
portable clarifies both the programmer's thinking process and the
final program.