Proposed standard for ASCII representation of APL, version 3.1

Sam Sirlin, February 21, 1994
with help from Lee Dickey and John Mitloener

--------------------------------------------------------------------------

Introduction

After seeing what's out there, I decided to come up with my own ASCII
representation, as a standard of some sort should exist by now. In general
I've followed the existing representations (they're surprisingly similar).
I've also added many more symbols, since most of the existing schemes don't
quite accommodate all of the characters that may be needed.

The standard I have suggested could be used for

  1. discussion of APL over ASCII BBSs or networks
  2. editing of APL functions in non-APL environments
  3. interchanging functions or workspaces, in addition to the workspace
     interchange standard.

This is version 3 of my proposed standard. Changes include:

  - some extras from C/Unix/ASCII usage, for control characters
  - provision for overstrikes to produce any other possible characters
    based on the ISO set
  - inclusion of alternate transliterations, usually longer, that are
    easier to read and remember
  - splitting up the whole set of characters into the ASCII set, the ISO
    set, and a set of known ASCII/ISO overstrikes
  - discussion of quotes
  - discussion of PP
  - inclusion of quote escapes

--------------------------------------------------------------------------

I. Symbol Transliteration

This section documents the substitution of individual symbols.

Symbols allowed in the representation:

  ASCII 32-126 (origin 0)   the printable ASCII characters
  ASCII 10                  linefeed
  ASCII 13                  carriage return (newline)

The representation of all other characters follows.

..........................................................................

0. Escape Character

Rather than use a pure numerical substitution, such as 10 for linefeed,
readability demands that some character(s) be used to set off the
transliteration. It is quite evident from past schemes that no two people
(especially APLers) will agree on the same escapes.
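Whatever escape is chosen, a translator first has to find the escape sequences. The following is a minimal Python sketch, not part of the proposal. The helper name `find_escapes` is hypothetical, and the escape is a single character passed as a parameter (default '.'). It applies the rules given just below: a name ends at the next space, case is ignored, and the underscored-letter forms such as .za and .zA keep their letter's case. Two-character quoting escapes such as {} are not handled.

```python
import re


def find_escapes(text, escape="."):
    """Return the escape names found in text (hypothetical helper).

    A name starts right after the escape character and runs up to the next
    blank. Case is folded, except that underscored letters keep the case of
    the letter: .za and .zA stay distinct, while .ZA is the same as .zA.
    An escape followed directly by a blank, as in '. bx', is not a name.
    """
    names = []
    for match in re.finditer(re.escape(escape) + r"(\S+)", text):
        name = match.group(1)
        if len(name) == 2 and name[0] in "zZ" and name[1].isalpha():
            names.append("z" + name[1])   # underscored letter: keep its case
        else:
            names.append(name.lower())    # everything else is case-insensitive
    return names


print(find_escapes(".rho .BX .is 'x'"))       # ['rho', 'bx', 'is']
print(find_escapes(".zA .ZA .za"))            # ['zA', 'zA', 'za']
print(find_escapes("@bx @Quad", escape="@"))  # ['bx', 'quad']
```

Passing the escape as a parameter mirrors the .escape command. A real translator would also need to track a change of escape in mid-text.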
The default escape character is . (period). It can be changed as desired
via, for example,

  .escape @

Escape representations start with the (current) escape character and end
with a space. The case of the symbols in a representation (the characters
that follow the escape character with no intervening space) should be
irrelevant. Thus .bx and .BX are identical, but differ from . bx
I think an exception should be made for underscored letters, allowing both
.za as underscore-lowercase a, and .zA or .ZA as underscore-uppercase a.

Quoting is easily included, for example via .escape {}, after which a box
is written {bx}.

..........................................................................

1. The ASCII characters

Readability is required, but it's often useful to refer to characters by
their numerical position. In general they can be represented this way as

  .ASCIInnn

where nnn is a number from 0 to 127 (origin 0). I've also listed some
transliterations for commonly used control characters (anyone want more?).
I've thought of giving enough representations below that printable ASCII
can easily be represented even if you only have access to an APL type ball
(except for lowercase letters...). This is the reverse problem!

  #     character                   representation
  0     null                        .null
  4     ^D                          .eot
  7     ^G, bell                    .bell
  8     ^H, backspace               .bs
  9     ^I, tab                     .ht
  10    ^J, linefeed                .lf
  12    ^L, formfeed                .ff
  13    ^M, return (newline)        .nl
  16    ^P                          .dle
  17    ^Q                          .dc1
  19    ^S                          .dc3
  27    escape                      .esc
  34    double quote                .dqt
  96    left single quote           .lqt
  123   left brace                  .lb
  124   split stile                 .sst
  125   right brace                 .rb
  127   delete                      .delete

This could also be done for EBCDIC if anyone cared.

..........................................................................

2. The ISO APL characters

The ISO standard APL character set is the basic building block from which
all APL characters are made. It doesn't include many important characters
that are usually made up of overstrikes of these.
It does, however, include two characters that are commonly made using
overstrikes: diamond and dollar sign. (What about not equal, less than or
equal, and greater than or equal? Can or should they be entered as
overstrikes?)

Listing of the character set:

  ISO #   name(s) (origin 0)
  33      dieresis, each
  34      right parenthesis
  35      less than, left caret
  36      less than or equal, not greater
  37      equal
  38      greater than, right caret
  39      right bracket, right square bracket
  40      or, inverted caret, down caret
  41      and, caret, up caret
  42      not equal
  43      divide, reciprocal
  44      comma, ravel, catenate, laminate
  45      plus sign, conjugate
  46      period, dot, full stop
  47      slash, reduction, solidus, stroke
  48-57   digits 0 through 9, in order
  58      left parenthesis
  59      left bracket, left square bracket
  60      semicolon
  61      times, signum, multiply sign
  62      colon
  63      backslash, scan, reverse solidus, strike
  64      negative, high minus, macron, overbar
  65      alpha
  66      decode, base, down tack
  67      intersection, down U, cap, up shoe
  68      floor, minimum, down stile
  69      epsilon, member of, datatype
  70      underscore, stress, underbar
  71      del, nabla
  72      delta
  73      iota
  74      jot, small o
  75      quote, single quote
  76      quad, quad box, box
  77      stile, magnitude, absolute value, vertical line, virgule
  78      encode, represent, up tack
  79      circle, large o
  80      power, exponentiation, star, asterisk
  81      query, roll, random, question mark
  82      rho, shape, reshape
  83      ceiling, maximum, up stile
  84      tilde, not, without
  85      drop, down arrow, split
  86      union, up U, cup, down shoe
  87      omega
  88      disclose, contains (strict), left U, link, pick, right shoe
  89      take, up arrow, mix
  90      enclose, contained in (strict), right U, shoe, left shoe
  91      left arrow, assign, assignment
  92      left tack, dex
  93      branch, right arrow, goto
  94      greater than or equal, not less
  95      minus sign, negate, bar
  96      diamond, {and, or; less than, greater than}
  97-122  letters A through Z, in order
  123     left brace, left curly bracket
  124     right tack, lev
  125     right brace, right curly bracket
  126     dollar sign, {S, stile}

..........................................................................

ASCII Representation of the ISO APL character set

General representation:

  .isoapl{nnn}

where {nnn} is an integer between 33 and 126 (origin 0). For example,
.isoapl122 is Z.

These characters are given standard names that start with the escape
character (. is shown). In addition, many characters are given one or more
alternate names. These are usually longer and perhaps more easily understood.
Uniqueness of representation is nice, but I feel it should be sacrificed
to make the representation easier for people to read and write.

  #       standard  alternates  name
  33      .dd       .dieresis   dieresis
  34      )                     right parenthesis
  35      <                     less than, left caret
  36      .le                   less than or equal, not greater
  37      =                     equal
  38      >                     greater than, right caret
  39      ]                     right bracket, right square bracket
  40      .or                   or, inverted caret
  41      .and      &           and, caret
  42      .ne                   not equal
  43      .div      .divide %   divide
  44      ,                     comma, ravel, catenate, laminate
  45      +                     plus sign, conjugate
  46      .                     period, dot, full stop
  47      /                     slash, reduction, solidus, stroke
  48-57   0-9                   digits 0 through 9 (themselves)
  58      (                     left parenthesis
  59      [                     left bracket, left square bracket
  60      ;                     semicolon
  61      .ti       .times      times, signum, multiply sign
  62      :                     colon
  63      \         .bl         backslash
  64      .ng       _           negative, high minus, macron
  65      .al       .alpha      alpha
  66      .de       .decode     decode, base
  67      .du                   intersection, down U, cap
  68      .fl       .floor      floor, min
  69      .ep       .epsilon    epsilon
  70      _                     underscore, stress, underbar
  71      .dl       .del        del
  72      .ld       .delta      delta
  73      .io       .iota       iota
  74      .so       .jot        jot, small o
  75      '                     quote, single quote
  76      .bx       .quad .box  quad, quad box, box
  77      .ab       .abs        stile, magnitude, absolute value
  78      .en       .encode     encode, represent
  79      .lo       .circle     circle, large o
  80      *                     power, exponentiation, star
  81      ?                     query, roll, random, question mark
  82      .ro       .rho        rho
  83      .ce       .ceiling    ceiling, max
  84      ~                     tilde, not, without
  85      .da       .drop       drop, down arrow
  86      .uu       .union      union, up U, cup
  87      .om       .omega      omega
  88      .lu                   disclose, contains (strict)
  89      .ua       .take       take, up arrow
  90      .ru                   enclose, contained in (strict)
  91      .is                   left arrow, assign, assignment
  92      .lk                   left tack, dex
  93      .go       .goto       branch, right arrow, goto
  94      .ge                   greater than or equal, not less
  95      -                     minus sign, negate, bar
  96      .dm       .diamond    diamond
  97-122  A-Z                   letters A through Z (themselves)
  123     {                     left brace, left curly bracket
  124     .rk                   right tack, lev
  125     }                     right brace, right curly bracket
  126     $                     dollar sign

..........................................................................

3. Additional Known Overstrikes

Here are some more names, mostly from STSC's list. Some are common; for
others I know of no current use. I'm sure there are more that somebody
wants to use.
  #     overstrike                   name
  002   not equal, underscore        inequivalent
  003   epsilon, underscore          epsilon underscore, find
  015   power, circle                log, logarithm
  016   quad, jot                    quad jot
  017   quad, backslash              quad backslash, sandwich
  018   iota, underscore             iota underscore
  019   del, tilde                   del tilde, protected del
  021   encode, decode               ibeam
  022   0, tilde                     zilde
  030   delta, stile                 upgrade, grade up
  031   del, stile                   downgrade, grade down
  033   quote, dot                   shriek, factorial
  151   quad, quote                  quote quad
  152   quad, divide                 domino, divide quad, matrix divide, matrix inverse
  155   enclose, stile               cent sign
  158   comma, minus                 catbar
  159   tilde, dieresis              frown
  166   intersection, jot            lamp, comment
  167   backslash, minus             backslash bar, column backslash, scan first
  169   left bracket, right bracket  squad
  170   encode, dieresis             snout
  171   del, dieresis                frog
  172   power, dieresis              sourpuss
  227   jot, dieresis                hoot, paw, on
  228   circle, dieresis             holler, hoof, upon
  229   or, tilde                    nor
  232   circle, stile                rotate, reverse
  233   circle, minus                rotate bar, column reverse, rotate first
  234   and, tilde                   nand
  235   slash, minus                 slash bar, column slash, column reduce, over first
  237   circle, backslash            transpose
  240   equal, underscore            equivalent, match, depth
  241   delta, underscore            delta underscore, delta stress
  244   encode, jot                  format, thorn
  245   decode, jot                  execute, hydrant, pawn
        enclose, underscore          contained in, subset
        disclose, underscore         contains
        letter, underscore           A underscore through Z underscore
        letter, underscore           a underscore through z underscore

..........................................................................

ASCII Representation of Known Overstrikes

Quad AV positions are given in the order 1. STSC, 2. IP Sharp (ISI?),
3. IAPL.

  STSC  IPS   IAPL  standard  alternates               name
  002         148   .ine                               inequivalent
  003         141   .zep                               epsilon underscore
  015   67    171   .lg       .log                     log, logarithm
  016               .qjt      .quadjot                 quad jot
  017               .qbs      .quadbs                  quad backslash, sandwich
  018         142   .zio      .ziota                   iota underscore
  019   160   128   .pd       .deltilde                del tilde, protected del
  021   63    137   .ib       .ibeam                   ibeam
  022         138   .zil      .zilde                   zilde
  030   71    175   .gu       .gradeup                 upgrade, grade up
  031   72    174   .gd       .gradedown               downgrade, grade down
  033   48    33    .fac      !                        shriek, factorial
  151   66    187   .qq       .quotequad               quote quad
  152   76    186   .dq       .domino                  domino, divide quad, matrix divide
  155   5     149   .cen      .cent                    cent sign
  158         254   .ctb      .catbar                  catbar
  159         144   .frn      .frown                   frown
  166   70    189   .lmp      .lamp                    lamp, comment
  167   75    251   .cb       .cbslash                 backslash bar, column bslash, scan first
  169               .sqd      .squad                   squad
  170         145   .snt      .snout                   snout
  171         146   .frg      .frog                    frog
  172         143   .sop      .sourpuss                sourpuss
  227   164   139   .hoo      .hoot                    hoot, paw, on
  228   165   140   .hol      .holler                  holler, hoof, upon
  229   69    166   .nor                               nor
  232   49    172   .rv       .reverse .rotate         rotate, reverse
  233   73    253   .cr       .crv .creverse .crotate  rotate bar, column reverse, rotate first
  234   68    168   .nand                              nand
  235   74    252   .cs       .cslash                  slash bar, column slash, column reduce, over first
  237   62    173   .tr       .transpose               transpose
  240   163   147   .eqv      .equivalent .match       equivalent, matches
  241   139   192   .zld      .zdelta                  delta underscore, delta stress
  244   77    222   .fm       .format                  format, thorn
  245   78    220   .xq       .execute .do             execute, hydrant, pawn
                    .zru                               contained in, subset
                    .zlu                               contains

Underscored letters: A underscore through Z underscore (numbered 193
through 218) are .zA through .zZ. The letters a underscore through
z underscore are .za through .zz.

..........................................................................

4. Arbitrary Overstrikes

When you want a symbol that hasn't been given a name, or you just want to
show the explicit overstrike, use

  symbol .ov symbol

or

  symbol .overstrike symbol

where symbol represents any of the symbols above. Is there anything left
that is impossible to represent?

..........................................................................

5. Quotes

Special care must be taken with symbols inside quotation marks. For the
purposes of representation, all symbols must be replaced by one of the
above; otherwise the text would not be readable. If an interpreter (or
compiler) uses the ASCII representation, then the interpreter must allow a
mechanism for input and output of non-printable characters.

A decision must be made about examples like

  .rho .bx .is '.rho'

and

  .rho .bx .is '.bx .ov .div'

or

  .rho .bx .is '.quad .ov .div'

Each of these quoted strings should parse as a single character if
executed. Yet their ASCII representations are of length 4, 12 and 14,
respectively. Note that this is a problem for the interpreter only, not
for the representation.
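The following minimal Python sketch makes the length point concrete. It is an illustration under stated assumptions, not part of the proposal:

- The names `to_apl`, `NAMES` and `OVERSTRIKES` are hypothetical.
- Only a handful of names from the tables are included.
- The targets are Unicode APL code points. This is an assumption, since the proposal does not name a target encoding.

It resolves .ov and .overstrike as in section 4. Each quoted string collapses to one character. That includes the 14-character spelling '.quad .ov .div', which uses the alternate name .quad.

```python
# Hypothetical subset of the name tables above. Mapping the names to
# Unicode APL code points is an assumption made for this sketch only.
NAMES = {
    "rho": "\u2374", "ro": "\u2374",                     # rho
    "bx": "\u2395", "quad": "\u2395", "box": "\u2395",   # quad
    "div": "\u00f7", "divide": "\u00f7",                 # divide
    "lo": "\u25cb", "circle": "\u25cb",                  # circle
}
# Overstrike pairs (order does not matter) and the character they produce.
OVERSTRIKES = {
    frozenset("\u2395\u00f7"): "\u2339",   # quad, divide -> domino
    frozenset("\u25cb*"): "\u235f",        # circle, power -> log
}


def to_apl(rep, escape="."):
    """Translate an ASCII representation into APL characters (hypothetical)."""
    out = []
    overstrike_next = False
    i = 0
    while i < len(rep):
        if rep[i] == escape and i + 1 < len(rep) and rep[i + 1] != " ":
            end = rep.find(" ", i)
            if end == -1:
                end = len(rep)
            name = rep[i + 1:end].lower()  # case-insensitive names
            i = end + 1                    # the terminating blank is consumed
            if name in ("ov", "overstrike"):
                overstrike_next = True     # combine previous and next symbol
                continue
            ch = NAMES[name]               # unknown names raise KeyError here
        else:
            ch = rep[i]                    # printable ASCII stands for itself
            i += 1
        if overstrike_next:
            ch = OVERSTRIKES[frozenset((out.pop(), ch))]
            overstrike_next = False
        out.append(ch)
    return "".join(out)


for rep in (".rho", ".bx .ov .div", ".quad .ov .div"):
    print(len(rep), len(to_apl(rep)))      # 4 1, then 12 1, then 14 1
```

Treating each overstrike pair as an unordered set matches the tables, which list the two component symbols without implying an order. In this sketch an unknown name is simply an error. The alternatives are discussed below.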
Suppose the representation is used as a medium for transporting APL code
from one interpreter to another. Then neither interpreter sees the above
until the representation translator has resolved each string into one
character. If the interpreter itself uses the ASCII representation, then
the problem exists. It can easily be resolved, though, by introducing a
system function that transforms between the internal APL representation
and the printable ASCII representation. We propose the monadic function
.bxapl,

  c .is .bxapl s

where
  s  is a character array in ASCII representation
  c  is a character array of the same dimensions as s, converted to the
     APL internal representation.

The near-inverse would be the monadic function .bxascii,

  c .is .bxascii s

where
  s  is a character array in APL internal representation
  c  is a character array in ASCII representation, of the same dimensions
     as s except possibly for the last dimension, which may be larger.

Hence in an interpreter (or compiler) we could see the following:

        .rho '.rho'
  4
        .rho .bxapl '.rho'
  1
        .rho 2 4 .rho '.rho'
  2 4
        .rho a .is .bxapl 2 4 .rho '.rho'
  2 4
        (.bxapl '.rho') = a
  1 0 0 0
  1 0 0 0
        ' ' = a
  0 1 1 1
  0 1 1 1
        .rho .bxascii a .is .bxapl 2 4 .rho '.rho'
  2 7

Note that .bxascii and .bxapl are not exact inverses. .bxascii must
sometimes add space, yet .bxapl can't always remove it, since otherwise
arrays might not be easily converted. What is .bxapl to do with

  a .is 2 8 .rho '.rho ', '.rho .is'

except leave extra blanks where characters are "removed"? In addition,
.bxascii will always work, but .bxapl depends on the system atomic vector
and hence may run into an unknown character.

One last issue is how these functions treat '.' (the current escape
character). My current thinking is that if .bxascii encounters the escape,
it should keep it and add an extra space:

  '. ' = .bxascii '.'

Then, when .bxapl encounters '. ', it can simply replace this by '.'.
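The escape rule for '.' can be sketched for a single row of characters. This is a minimal Python sketch, not the proposal's definition, and it rests on these assumptions:

- `bx_ascii` and `bx_apl` are hypothetical stand-ins for .bxascii and .bxapl.
- The three-entry name table and its Unicode targets are invented for illustration.
- Unknown names follow the second of the options listed just below: the name is read as '. xxx' and a warning is issued.

There is one deliberate simplification. `bx_ascii` always appends a terminating blank after a name. The 2 7 result in the session above implies that .bxascii instead reused the blank already following the rho.

```python
import warnings

ESCAPE = "."
# Hypothetical name table; the Unicode targets are an assumption.
CHAR_OF = {"rho": "\u2374", "ro": "\u2374", "bx": "\u2395", "is": "\u2190"}
NAME_OF = {"\u2374": "rho", "\u2395": "bx", "\u2190": "is"}


def bx_ascii(s, esc=ESCAPE):
    """APL text -> ASCII representation (sketch of .bxascii on one row)."""
    out = []
    for c in s:
        if c == esc:
            out.append(esc + " ")               # the escape itself becomes '. '
        elif c in NAME_OF:
            out.append(esc + NAME_OF[c] + " ")  # always add a terminating blank
        else:
            out.append(c)                       # printable ASCII passes through
    return "".join(out)


def bx_apl(s, esc=ESCAPE):
    """ASCII representation -> APL text (sketch of .bxapl on one row).

    The result is padded with blanks to len(s), so the row keeps its length.
    """
    out = []
    i = 0
    while i < len(s):
        if s[i] != esc:
            out.append(s[i])
            i += 1
            continue
        end = s.find(" ", i)
        if end == -1:
            end = len(s)
        name = s[i + 1:end].lower()
        if name == "":                          # '. ' stands for the escape itself
            out.append(esc)
            i = end + 1
        elif name in CHAR_OF:
            out.append(CHAR_OF[name])
            i = end + 1                         # drop the terminating blank
        else:                                   # unknown: read as '. name', warn
            warnings.warn("unknown name " + esc + name)
            out.append(esc)
            i += 1
    return "".join(out).ljust(len(s))


print(repr(bx_ascii("a+.dl b")))   # 'a+. dl b'
print(repr(bx_apl("a+. dl b")))    # 'a+.dl b '
print(repr(bx_apl(".rho")))        # rho followed by three blanks
```

The blank padding in `bx_apl` is one way to keep rows of a matrix the same length, as in the 2 4 example above.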
With this rule, constructions such as 'a+.dl b' will be preserved,
assuming dl is a user-defined dyadic scalar function. In addition, text
containing '.' will be preserved. An alternative is for .bxascii to
replace '.' with '..' and so on. But which looks better, 'a+. *b' or
'a+..*b'? I prefer the first. TeX and C go with the second (for \).

This leaves the (really) final issue of what .bxapl does with '.xxx',
where xxx is an unknown name. This is bound to come up, since different
systems have different character sets. .bxapl could

  1. yield a domain error and stop
  2. parse it as '. xxx' and continue
  3. ignore the whole word and continue

I currently favor the second, as then the user will see something she can
act on, if desired. As a side effect, it should probably warn the user as
well.

--------------------------------------------------------------------------

II. Representation of Objects

I've really only considered readable representation of individual
characters here. This can be used with APL's standard (del) editor
representation for functions if one wants simplicity. A more modern,
complete representation of objects (including data) can be done with
John Mitloener's PP scheme. (need a reference here...)

--------------------------------------------------------------------------

Some Comments:

STSC actually equates underscored letters with the lowercase alphabet. I
think both can be accommodated.

IP Sharp appears to allow lowercase on some machines through overstriking
with macron.

The TeX font has provision for overstriking, which is how many of the
compound symbols above are defined. This allows easy definition of many
more symbols.

I know various other keyword forms have existed at one time or another.
STSC had one that they seem to have stopped using (see LJD's list).

If you have questions or comments, let me know:

  Sam Sirlin
  sam on the BBS\APL
  sam@kalessin.jpl.nasa.gov

References:

  1. STSC, APL*Plus II Quick Reference Guide, 1988.
  2. IP Sharp, Sharp APL Pocket Reference, 1984.
  3. T. Budd, APLC token list, 1987.
  4. A. Hohti, O. Kanerva, TeX APL font definition apldef.tex, described
     in TUGboat, 1987.
  5. latex APL ref...
  6. PP ref...

..........................................................................