TITLE: *n - A Machine-Independent List Processing Language AUTHOR(S): Richard A. Stone DEPARTMENT: 18PR02314 LOCATION: Western Electric, ERC, Princeton LOCAL REPORT NUMBER: CC 5269 CASE: 15006 DATE: May 3, 1973 VERSION: 6.0 ABSTRACT: *n is a highly efficient list processing language which will run on a variety of small and large computers. It can be interfaced with standard programming languages to add smaller, faster list processing subroutines to a program. *n is a version of *1 (decendant of L6). REFERENCES: CC4868 - User's Manual for ERC Version of *1 INDEXING TERMS: Computers, list processing, machine independence, S/360, DECsystem10, PDP-11, compilers, *1 BACKROUND --------- HISTORY ------- *n began with L6 (Low Level Linked List Language) by K. Knowlton at Bell Laboratories on the 7090. Carnegie Mellon University modified it to run as macros on the 360, and called it *1 (pronounced "star one"). The ERC of Westen Electric wrote a general compiler (called *n) that can be modified to produce code for any of several machine. Three examples are the compiler for the PDP11 (STAR11),the IBM 360 (STAR360), and the PDP10 (STAR10). The compiler is written in SNOBOL IV and will run on any machine that supports SNOBOL IV. VERSIONS -------- Different versions of *n use the designation m.i.d., where 'm' is the version of this manual, 'i' is the number of changes in the machine independent version since the last manual change, and 'd' is the number of changes in the particular machine dependent part since the last manual change. Hence, "5.1.2" says that the manual last changed for the 5th time, the machine independent code changed once since then and the machine dependent code twice. USING THE STARn COMPILER ----- --- ----- -------- COMPILING ON THE 360 -------------------- To compile a program: //jobname JOB (..... // EXEC STARn //STEPC.SYSIN DD * -*1 deck- /* To compile a program and punch out assembly source: //jobname JOB (..... // EXEC STARn,PUNCH=1 //STEPC.SYSIN DD * -*1 deck- /* NOTE: For the PDP11, use STAR11. For the 360, use STAR360. For the PDP10, use STAR10. For compile and link for the 360, use STAR360L. COMPILING ON THE PDP10 ---------------------- To compile a program: .R STARn *assembly-file,list-file=source-file/switch:setting .COMPILE assembly-file/LIST NOTE: The ",list-file" may be ommitted. The "source-file" may be files separated by commas, which will be concatenated Files may be devices such as "TTY:" or "LPT:" As many switches as desired may be set. Switches are as described under "Control Cards" NOTE: For the PDP11, use STAR11. For the 360, use STAR360. For the PDP10, use STAR10. NOTE: For the PDP11, use the extension ".P11" for the assembly For the PDP10, use the extension ".MAC" for the assembly DIFFERENCES BETWEEN VERSIONS OF *1 ----------- ------- -------- -- -- CARNEGIE-MELLON *1 FEATURES MISSING FROM *n ------------------------------------------- (1) Fields are restricted to the following sizes: single bits, whole bytes, half words, whole words. (2) In operations of the form (X,op,Y), X and Y shall determine fields of the same length. (NOTE: For convenience, because based fields cannot have a defined size, the size will be taken from the size of the right side with which it is first used) (3) The only form of subroutine call is CALLS. (4) Conversion Operations are not supported. (5) The tests (S) and (not S) are not supported. (6) Continuations are not permitted (7) The only single bit operations are: (B,=,0), (B,=,1), (B,<-,0), (B,<-,1), (B1,<-,B2). (8) Tests for sequences that yield the same location (=o and =/o) are not supported. *n FEATURES NOT SUPPORTED BY STAR11 ----------------------------------- (1) Shift operations of more than 1 bit or a variable number of bits is not supported (except on the 11/40 or above). *n FEATURES NOT SUPPORTED BY STAR360 ------------------------------------ none *n FEATURES NOT SUPPORTED BY STAR10 ----------------------------------- none DEBUGGING --------- When the debugging switch ("+DEBUG" - see CONTROL CARDS) is on, additional code is generated to allow the user to determine where a programme terminated abnormally. DEBUGGING ON THE PDP11 ---------------------- On termination, the STARn statement number may be found in octal in register 0. DEBUGGING ON THE PDP10 ---------------------- On termination, the STARn statement number may be found in register 0, and the routine name in register 2 (in 6 bit ascii). These may be found, following a fault, by typing "E 0" and "E 1", respectively. DEBUGGING ON THE 360 -------------------- On termination, a traceback is given showing the routine name and STARn statement number in parenthesis, followed by the location from which that routine was called, and so on. CONTROL CARDS ------------- GENERAL FORMAT -------------- Control cards are all of the form "*+xxx" or "*-xxx". Where xxx is the switch name, plus turns the switch on, and minus turns it off. Switches may be placed anywhere in a deck and affect all subsequent cards. The "*" is optional and has been included in the syntax to allow control cards to be ignored on the CMU version. CAVEAT PROGRAMETEUR ------------------- Most of the switches have not been fully tested, and care should be taken when they are first used. SWITCHES -------- INITIAL SWITCH SETTING MEANING ------ ------- --------- OPT1 -(off) Register optimization OPT2 -(off) Machine dependent optimization (indirect) DEBUG +(on) Adds inline statement numbers, etc. REG +(on) Allow REG to be in a register REENT -(off) Generate reenterable code RCALL -(off) Calls use a reenterable sequence M40 (PDP11) -(off) Model 40 or above PIC (PDP11) -(off) Position independent code OPTIMIZATION ------------ Optimized code may be generated by adding the cards "+OPT1" (for machine independent register optimization) and "+OPT2" (for machine dependent address optimization). The two options may be turned on (with the "+") or off (with a "-") for different sections of code. It is not obvious what code will be produced, and hence, they should not be used for debugging. In addition, the two types of optimization may conflict -- so that the two together might be produce less optimization than one or the other. Still, the one test performed showed the both to be better than either separately (see Appendix I). DEBUGGING --------- The debugging switch causes one extra instruction to be generated per STARn statement to allow determining the location of abnormal terminations at execution time. The program will be larger and run slower as a result (see DEBUGGING). USING A REGISTER FOR A BASEFIELD -------------------------------- The name "REG" has been reserved to tell the compiler that the basefield is to go into a register. Only one register is available, but any number of basefields may be defined in it (note, though, that they will wipe each other out). Note, also, that this scheme is compatible with *1 for the 360, and will not be effected by the lack of this feature on another compiler. In fact, the feature may be turned off by a "-REG" card. EXAMPLE: BFIELDE P,REG BFIELD R,REG BFIELD Z,REG DO (P,+,5),... REENTERABLE CODE ---------------- The "+REENT" card can be used to cause the compiler to generate reenterable code. NOTE: Routines compiled with the "+REENT" switch are also recursive, and hense, may call themselves or their caller. However, on the 360 this is highly inefficient and should be avoided if possible. In the present implementation, "+REG" cannot be used with "+REENT". M40 (PDP11 only) --- Generates code using the op codes available on the models 11/40 and 11/45. Otherwise, the existance of an EAE is assumed. POSITION INDEPENDENT CODE (PDP11 ONLY) ------------------------- It is sometimes desirable to have routines that can be moved anywhere in core at execution time. This is done with the "+PIC" card, and is only very slightly less efficient. NOTES ON *n IMPLEMENTATION -------------------------- GENERAL NOTES FOR ALL MACHINES ----------------------------- (1) The above rules imply that products and dividends cannot exceed 1 word. (2) FORTRAN may call, or be called from *n on any machine. The calling mechanism is transparent across machines, so that programs should require no changes. (3) The field boundaries are taken modulo the machine word size. For the PDP11 this implies that "(0,15)","(0,31)", and "(16,31)" are all equivalent. (4) The operations (X,<-,0), (X,<-,1), (X,=,0), and (X,=,1) produce efficient code on all machines for X as a single bit and can be used for packing. (5) Code is better optimized when constants are placed on the right. Thus, "(L,=,0)" will produce better code than "(0,=,L)". MACHINE DEPENDENT NOTES ----------------------- (1) The following table gives a rough approximation of the relative efficiency of different operations on the various machines (word operations=1) MACHINE WORD 1/2W BYTE BIT OTHER ------- ---- ---- ---- --- ----- 360 1 1 2 1 PDP11 1 1 1 PDP10 1 2 3 1 3 (2) In most cases, the PDP10 can handle bytes of any size. (3) In most cases, the 360 can handle differing right and left sizes. (4) Some operations are slower on the PDP11. These are, in order of increasing inefficiency: AND, XOR, MULTIPLICATION, DIVISION, GENERALIZED POINTERS. WRITING MACHINE INDEPENDENT CODE -------------------------------- In general, most of the programs in *n will be independent of machine. The following are a few pitfalls to avoid if the code is intended to move from machine to machine: --Never intermix assembly language with the *n. --Make sure values don't exceed the word or byte size (as appropriate) of the smallest machine. --Use either 'Y' format (bytes) or non-'Y' format (words), but don't mix them, as this course leads only to madness. (e.g. byte 8 is the same on the PDP11 and IBM/360, word 2 is the same on both machines, but byte 8 corresponds to word 2 on the 360 and word 4 on the PDP11). Further, bytes are numbered backwards on the PDP11! --Decide which storage locations contain numbers, and which contain addresses! Performing arithmetic on addresses, or pointer operations on numbers, will cause endless grief when going to different computers, or sometimes even going to larger versions of the same computer. APPENDIX I - COMPILER COMPARISONS --------------------------------- SYSTEM COMPILE COMPILE COMPILE COMPILE EXECUTE USED MINUTES MINUTES MINUTES BYTES BYTES (*N) (ASSMB) (TOTAL) ------------- ------- ------- ------- ------- ---------- *1 (CMU) 6.4m .9m 7.3m 170K 6420 *360 (ERC) .75m .9m 1.5m 150K 4188 +OPT1 " " " " 4054(3.1%) *11 (ERC) " .75m " " 2922 +OPT1 " " " " 2802(4.1%) +OPT2 " " " " 2780(4.8%) +OPT1+OPT2 " " " " 2710(7.2%) note: compilation is on the IBM S/360 model 50 Note: PDP10 execution bytes is just slightly less than the 360, assuming 4.5 bytes/word. The following conclusions may be drawn from the above: 1. The ERC compiler runs in 12% of the time of the CMU compiler in roughly the same core. 2. The ERC compiler can produce code that is 60% of the size of the CMU compiler. (Only 3.1% is due to register optimization) 3. The PDP11 architechure allows programs to be produced at 66% of the size of the 360. APPENDIX II - STARn EXTENSIONS ------------------------------ STARn has some minor extensions over the CMU *1. However, for compatability, it is not recommended that these features be used (they are included here only for completeness). 1. Negative constants may be used, as in: (S,<-,-5) 2. Shifting may be specified as an infix operator, as in: (F,<-<-,4) 3. The character "&" may be used in place of "/&/", as in: (F,&,Q) 4. Recursive code (routines which call themselves or the caller) may be used if the "+REENT" switch is set. 5. Character strings (see appropriate appendix). 6. User defined operations (see appropriate appendix). APPENDIX III - CRITERIA FOR A *N MACHINE ---------------------------------------- A *n compiler can be written for any computer. However is should be recognized that some machines will be superior to others. The following list are those features which could appear in specifications for a *n machine. *********************************************************************** [The following are considered manditory] --Words must have at least K bits each (K is application dependent). --The must be at least N words (N is application dependent). --Word size a multiple of 8. --Four registers must be available (excluding those used by hardware, as base registers, for standard linkage conventions, etc.). --It must be possible to do "indexed loads" using at least two of the registers. The instructions should permit fixed offsets from the index. --The following operations should be available: ADD, SUBTRACT, CALL, RETURN, STORE, COMPLIMENT, MULTIPLY, COMPARISON (LEQ, GEQ, GT, LT, NEQ, EQ, UN). *********************************************************************** [The following are considered highly desirable, and are listed in order of decreasing desirability] --An assembler running on the host (not necessarily vendor supplied). --Bytes and half-words (with load/store operations). --Addressability through all of core (no pages, base addressable regions, etc.) --All operations can be performed on half words and bytes. --The following operations should be available: SET-BIT, CLEAR-BIT, TEST-BIT, SHIFT, DIVIDE. --The following operations should be available: AND, OR, EXCLUSIVE-OR. --Byte addressability. --Efficient core to core operations. --Instructions for zeroing, incrementing, decrementing. --Indirect addressing. --An assembler with the ability to handle literals. --Eight bit bytes. *********************************************************************** APPENDIX IV - CHARACTER STRING EXTENSIONS ----------------------------------------- DESCRIPTION ----------- Character strings are considered block sequences and may appear where-ever a block sequence appears. The strings are preceded by " =C' " and followed by " ' ". Any character may be used in place of the quote. Character strings are always padded to a full word with zeros. At least one null (zero) character will be included. Character strings may only be used on the right of an operation, and take their length from the length of the left operand. EXAMPLES -------- IF (PA,=,C'.'),THEN,(P,<-[,=C"Help! I'm trapped in a *n program!") CALLS PRINT,(=C'END OF PROGRAM') NOTES ----- This is an experimental feature, and may change with time. APPENDIX V - USER DEFINED OPERATIONS ------------------------------------ DESCRIPTION ----------- The operation field of the DO, IF, or IFANY statements may contain either the standard *1 operations, or user defined operations. A user defined operation is any name starting with a letter and followed by letters and numbers. This will then become the name of a subroutine that is called with the left and right operands as arguments. I.E., if the operation is (left,op,right), then it becomes the equivalent of: CALLS op,(left,right) EXAMPLES -------- DO (LAA,<-,2),(PRINT,LAA) a printing routine IF (L,EQ,=C'Hi'),THEN,(L,GETS,=C'Bye') string handling IF (A,GT,B),THEN,(A,MU,B) shades of L6 NOTES ----- 1. It is up to the system programmers at an installation to write and put useful routines in libraries. 2. Some of the more useful routines, particularly in the character handling area, might be incorporated into *n at some future date. 3. This is a convenient way of doing "inline" subroutine calls.