% !TeX root = forth.tex
% !TeX spell check = en_US

\annex{Portability guide} % E. (informative annex)
\label{annex:port}

\section{Introduction} % E.1
\label{port:intro}

A primary goal of Forth 94 was to enable a programmer to write Forth
programs that work on a wide variety of machines; Forth-\snapshot{}
continues this practice. This goal is accomplished by allowing some
key Forth terms to be implementation defined (e.g., cell size) and by
providing Forth operators (words) that conceal the implementation.
This allows the implementor to produce the Forth system that most
effectively uses the native hardware. The machine-independent
operators, together with some programmer discipline, support program
portability.

It can be difficult for someone familiar with only one machine
architecture to imagine the problems caused by transporting programs
between dissimilar machines. This Annex provides guidelines for
writing portable Forth programs.
The first section describes ways to make a program hardware independent.

The second section describes assumptions about Forth implementations
that many programmers make but that can't be relied upon in a portable
program.

\section{Hardware peculiarities} % E.2
\label{port:hardware}

\subsection{Data/memory abstraction} % E.2.1

This standard gives definitions for data and memory that
apply to a wide variety of computers. These definitions give us a way
to talk about the common elements of data and memory while ignoring
the details of specific hardware. Similarly, Forth programs that
use data and memory in ways that conform to these definitions can
also ignore hardware details. The following sections discuss the
definitions and describe how to write programs that are independent
of the data and memory peculiarities of different computers.

\subsection{Definitions} % E.2.2

Three terms defined by this standard are address unit, cell, and
character.

The address space of a Forth system is divided into an array of
address units; an address unit is the smallest collection of bits that
can be addressed. In other words, an address unit is the number of
bits spanned by the addresses \emph{addr} and \emph{addr}+1. The
most prevalent machines use 8-bit address units, but other
address unit sizes exist.

In this standard, the size of a cell is an implementation-defined
number of address units. Forth implemented on a 16-bit microprocessor
could use a 16-bit cell and an implementation on a 32-bit machine
could use a 32-bit cell. Machines with less common word sizes (e.g.,
18-bit or 36-bit machines) could implement Forth systems with their
native cell sizes.
In all of these systems, Forth words such as \word{DUP} and
\word{!} do the same things (duplicate the top cell on the stack
and store the second cell into the address given by the first cell,
respectively).

Similarly, the definition of a character has been generalized to be an
implementation-defined number of address units (but at least eight
bits). This removes the need for a Forth implementor to provide 8-bit
characters on processors where it is inappropriate. For example, on an
18-bit machine with a 9-bit address unit, a 9-bit character would be
most convenient. Since, by definition, you can't address anything
smaller than an address unit, a character must be at least as big as
an address unit. This will result in big characters on machines with
large address units. An example is a 16-bit cell-addressed machine
where a 16-bit character makes the most sense.

\subsection{Addressing memory} % E.2.3

One of the most common portability problems is the addressing of
successive cells in memory. Given the memory address of a cell, how
do you find the address of the next cell? On a byte-addressed machine
with 32-bit cells the code to find the next cell would be \texttt{4 +}.
The code would be \word{1+} on a cell-addressed processor and
\texttt{16 +} on a bit-addressed processor with 16-bit cells.
This standard provides a
next-cell operator named \word{CELL+} that can be used in all of these
cases. Given an address, \word{CELL+} adjusts the address by the size
of a cell (measured in address units).
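
As a minimal sketch of the difference, consider a word that fetches the
cell following a given aligned address. \texttt{NEXT@} is a
hypothetical name chosen for this illustration; it is not defined by
this standard. The first definition assumes a byte-addressed machine
with 32-bit cells, while the second makes no such assumption:

\begin{quote}\ttfamily
	\word{:} NEXT@ \word{p} a-addr -{}- x ) 4 \word{+} @ \word{;}
	\word{p} \textrm{non-portable: assumes four address units per cell}) \\
	\word{:} NEXT@ \word{p} a-addr -{}- x ) \word{CELL+} @ \word{;}
	\word{p} \textrm{portable: the system supplies the cell size})
\end{quote}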

A related problem is that of addressing an array of cells in an
arbitrary order. This standard provides a portable scaling operator
named \word{CELLS}.
Given a number \emph{n}, \word{CELLS} returns the number of address
units needed to hold \param{n} cells. Using \word{CELLS}, we can make
a portable definition of an \texttt{ARRAY} defining word:

\begin{quote}\ttfamily
	\word{:} ARRAY \word{p} u -{}- ) \word{CREATE} ~ \word{CELLS} \word{ALLOT} \\
	\hspace*{2em}\word{DOES} \word{p} u -{}- addr ) \word{SWAP} \word{CELLS} \word{+} \word{;}
\end{quote}

There are also portability problems with addressing arrays of
characters.
On a byte-addressed machine, the size of a character equals the
size of an address unit. Addresses of successive characters
in memory can be found using \word{1+}, and scaling indices into a character
array is a no-op (i.e., \texttt{1 *}). However, there could be
implementations where a character is larger than an address unit.
The \word{CHAR+} and \word{CHARS} operators, analogous to
\word{CELL+} and \word{CELLS}, are available to allow maximum
portability.

This standard generalizes the definition of some Forth words that
operate on regions of memory to use address units. One example is
\word{ALLOT}. By prefixing \word{ALLOT} with the appropriate scaling
operator (\word{CELLS}, \word{CHARS}, etc.), space for any desired
data structure can be allocated (see definition of array above).
For example:
\begin{quote}\ttfamily
	\word{CREATE} ABUFFER 5 \word{CHARS} \word{ALLOT}
	\word{p} \textrm{allot a 5-character buffer})
\end{quote}

\subsection{Alignment problems} % E.2.4

Some processors have restrictions on the addresses that can be used
by memory access instructions. This standard does not require an
implementor of a Forth system to make alignment transparent; on the
contrary, it requires (in Section \xref[3.3.3.1 Address alignment]{usage:aaddr}) that
a standard Forth program assume that character and cell alignment may be
required.
One pitfall caused by alignment restrictions
is in creating tables containing both characters and cells. When
\word{,} (comma) or \word{C,} is used to initialize a table, data
are stored at the data-space pointer. Consequently, it must be
suitably aligned. For example, a non-portable table definition would
be:

\begin{quote}\ttfamily
	\word{CREATE} ATABLE 1 \word{C,} X \word{,} 2 \word{C,} Y \word{,}
\end{quote}

On a machine that restricts memory fetches to aligned addresses,
\word{CREATE} would leave the data-space pointer at an aligned
address. However, the first \word{C,} would leave the data-space
pointer at an unaligned address, and the subsequent \word{,} (comma)
would violate the alignment restriction by storing \texttt{X} at an
unaligned address.

A portable way to create the table is:

\begin{quote}\ttfamily
	\word{CREATE} ATABLE 1 \word{C,} \word{ALIGN} X \word{,}
	2 \word{C,} \word{ALIGN} Y \word{,}
\end{quote}

\word{ALIGN} adjusts the data-space pointer to the first aligned
address greater than or equal to its current address. An aligned
address is suitable for storing or fetching characters, cells, cell
pairs, or double-cell numbers.
%
After initializing the table, we would also like to read values from
the table. For example, assume we want to fetch the first cell,
\texttt{X}, from the table. \texttt{ATABLE} \word{CHAR+} gives the
address of the first thing after the character. However, this may not
be the address of \texttt{X} since we aligned the data-space pointer
between the \word{C,} and the \word{,}. The portable way to get the
address of \texttt{X} is:

\begin{quote}\ttfamily
	ATABLE \word{CHAR+} \word{ALIGNED}
\end{quote}

\word{ALIGNED} adjusts the address on top of the stack to the first
aligned address greater than or equal to its current value.

\section{Number representation} % E.3

\subsection{Big endian vs. little endian} % E.3.1
\label{port:endian}

The constituent bits of a number in memory are kept in different
orders on different machines. Some machines place the most-significant
part of a number at an address in memory with less-significant parts
following it at higher addresses; this is known as big-endian ordering.
Other machines do the opposite; the least-significant part is stored
at the lowest address (little-endian ordering).

For example, the following code for a 16-bit little-endian Forth
would produce the answer 1:
\begin{quote}\ttfamily
	\word{VARIABLE} FOO
	\quad 1 FOO \word{!}
	\quad FOO \word{C@}
\end{quote}

The same code on a 16-bit big-endian Forth would produce the
answer 0. A portable program cannot exploit the representation
of a number in memory.

A related issue is the representation of cell pairs and double-cell
numbers in memory. When a cell pair is moved from the stack to memory
with \word{2!}, the cell that was on top of the stack is placed at the
lower memory address. It is useful and reasonable to manipulate the
individual cells when they are in memory.

\subsection{ALU organization} % E.3.2

Different computers use different bit patterns to represent integers.
Possibilities include binary representations (two's complement, one's
complement, sign magnitude, etc.) and decimal representations (BCD,
etc.). Each of these formats creates advantages and disadvantages in
the design of a computer's arithmetic logic unit (ALU). The most
commonly used representation, two's complement, is popular because of
the simplicity of its addition and subtraction operations.

Programmers who have grown up on two's complement machines tend to
become intimate with their representation of numbers and take some
properties of that representation for granted.
For example, a trick to
find the remainder of a number divided by a power of two is to mask
off some bits with \word{AND}. A common application of this trick is
to test a number for oddness using 1 \word{AND}. However, this will
not work on a one's complement machine if the number is negative (a
portable technique is 2 \word{MOD}).

The remainder of this section is a (non-exhaustive) list of things
to watch for when portability between machines with binary
representations other than two's complement is desired.

To convert a single-cell number to a double-cell number, Forth 94
provided the operator \word{StoD}. To convert a double-cell number to
single-cell, Forth programmers have traditionally used \word{DROP}.
However, this trick doesn't work on sign-magnitude machines. For
portability, a \word[double]{DtoS} operator is available. Converting an
unsigned single-cell number to a double-cell number can be done
portably by pushing a zero on the stack.

\ifrelease\else
\begin{editor}
	This would be a good place to add a discussion of characters and
	the extended character word set.
\end{editor}
\fi

\section{Forth system implementation} % E.4

During Forth's history, an amazing variety of implementation techniques
have been developed. The ANS Forth Standard encourages this diversity
and consequently restricts the assumptions a user can make about the
underlying implementation of an ANS Forth system.
Users of a particular
Forth implementation frequently become accustomed to aspects of the
implementation and assume they are common to all Forths. This section
points out many of these incorrect assumptions.

\subsection{Definitions} % E.4.1

Traditionally, Forth definitions have consisted of the name of the
Forth word, a dictionary search link, data describing how to execute
the definition, and parameters describing the definition itself. These
components have historically been referred to as the name, link, code,
and parameter fields.
No method for accessing these fields has been found that works
across all of the Forth implementations currently in use. Therefore,
a portable Forth program may not use the name, link, or code field
in any way. Use of the parameter field (renamed to data field for
clarity) is limited to the operations described below.

Only words defined with \word{CREATE} or with other defining words
that call \word{CREATE} have data fields. The other defining words
in the standard (\word{VARIABLE}, \word{CONSTANT}, \word{:}, etc.)
might not be implemented with \word{CREATE}. Consequently, a Standard
Program must assume that words defined by \word{VARIABLE},
\word{CONSTANT}, \word{:}, etc., may have no data fields. There is no
portable way for a Standard Program to modify the value of a constant
or to ``patch'' a colon definition at run time. The \word{DOES} part
of a defining word operates on a data field, so \word{DOES} may
only be used on words ultimately defined by \word{CREATE}.
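
As a minimal sketch of this restriction, suppose a program needs a
named value that it can change later. \texttt{LIMIT} is a hypothetical
name used only for illustration. Had it been defined with
\word{CONSTANT}, there would be no portable way to alter it; a word
built with \word{CREATE} has a data field that can be stored into:

\begin{quote}\ttfamily
	\word{CREATE} LIMIT 10 \word{,}
	\word{p} \textrm{data field holds one cell, initially 10}) \\
	20 LIMIT \word{!}
	\word{p} \textrm{portable: executing LIMIT returns its data-field address})
\end{quote}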

In standard Forth, \word{FIND}, \word{[']} and \word{'} (tick) return
an unspecified entity called an execution token. There are only a
few things that may be done with an execution token. The token may be
passed to \word{EXECUTE} to execute the word ticked, or it may be
compiled into the current definition with \word{COMPILE,}. The token
can also be stored in a variable or other data structure and used later.
Finally, if the word ticked was defined via \word{CREATE}, \word{toBODY}
converts the execution token into the word's data-field address.

An execution token cannot be assumed to be an address and may not
be used as one.

\subsection{Stacks} % E.4.2

In some Forth implementations, it is possible to find the address of a
stack in memory and manipulate the stack as an array of cells. This
technique is not portable. On some systems, especially
Forth-in-hardware systems, the stacks might be in memory
that can't be addressed by the program or might not be in memory at
all. Forth's parameter and return stacks must be treated as stacks.

A Standard Program may use the return stack directly only for
temporarily storing values. Every value examined or removed from the
return stack using \word{R@}, \word{Rfrom}, or \word{2Rfrom} must have
been put on the stack explicitly using \word{toR} or \word{2toR}. Even this
must be done carefully because the system may use the return stack to
hold return addresses and loop-control parameters.
Section
\xref[3.2.3.3 Return stack]{usage:returnstack} of the standard has a
list of restrictions.

\section{Summary} % E.6

The Forth Standard does not force anyone to write
a portable program. In situations where performance is paramount,
the programmer is encouraged to use every trick available. On the
other hand, if portability to a wide variety of systems is needed
(or anticipated), this standard provides the tools to accomplish this.
There might be no such thing as a completely portable program. A
programmer, using this guide, should intelligently weigh the tradeoffs
of providing portability to specific machines. For example, machines
that use sign-magnitude numbers are rare and probably don't deserve
much thought. However, systems with different cell sizes will certainly
be encountered and should be provided for. In general, making a program
portable clarifies both the programmer's thinking process and the
final program.