Proposed standard of ASCII representation of APL
version 3.1
Sam Sirlin, February 21, 1994
with help from
Lee Dickey
John Mitloener
--------------------------------------------------------------------------
Introduction
After seeing what's out there, I decided to come up with my own ASCII
representation, as a standard of some sort should exist by now. I've
followed the existing representations (they're surprisingly similar) in
general, but added many more, since most of the existing schemes don't
quite accommodate all the characters that may be needed.
The standard I have suggested could be used for
1. discussion of APL over ASCII bbs or networks
2. editing of APL functions in non-APL environments
3. interchanging functions or workspaces, in addition to the
workspace interchange standard.
This is version 3 of my proposed standard. Changes include:
- some extras from C/Unix/ASCII usage, for control characters...
- provision for overstrikes to produce any other possible characters
based on the ISO set.
- inclusion of alternate transliterations, usually longer, that are
easier to read and remember
- splitting up the whole set of characters into the ASCII set, the ISO
set and a set of known ASCII/ISO overstrikes.
- discussion of quotes
- discussion of PP
- inclusion of quote escapes
--------------------------------------------------------------------------
I. Symbol Transliteration
This section documents substitution of individual symbols.
Symbols allowed in the representation (origin 0):
ASCII 32-126 (the printable ASCII characters)
ASCII 10 linefeed
ASCII 13 carriage return (used here as newline)
Representation of other characters follows:
..........................................................................
0. Escape Character
Rather than use a pure numerical substitution such as 10 for lf etc,
readability demands that some character(s) be used to set off the
transliteration. It is quite evident from past schemes that no two
people (especially APLers) will agree on the same escapes.
Default: .
can be changed as desired via (for example)
.escape @
Escape representations start with the (current) escape character and
end with a space.
Note that case of the symbols in a representation (following the escape
character with no space) should be irrelevant. Thus
.bx
and
.BX
are identical, but different from
. bx
I think an exception should be made for underscored letters, allowing
both
.za
as underscore-lowercase a, and
.zA or .ZA
as underscore-uppercase a.
Quoting is easily included, for example via
.escape {}
which then produces {bx}.
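The rules above can be sketched in a few lines of Python. This is only an
illustration under my reading of the rules, not a reference implementation:
the function names tokenize and normalize are hypothetical, the escape
character is passed as a parameter rather than set by a .escape directive,
and an escape character followed directly by a space is simply kept as
literal text.

```python
def normalize(name):
    """Fold case, except for underscored letters such as zA versus za."""
    if len(name) == 2 and name[0] in "zZ" and name[1].isalpha():
        return "z" + name[1]
    return name.lower()


def tokenize(text, escape="."):
    """Split text into ('esc', name) and ('lit', char) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        follows = text[i + 1] if i + 1 < len(text) else " "
        if text[i] == escape and follows != " ":
            j = text.find(" ", i)
            if j == -1:
                j = len(text)
            tokens.append(("esc", normalize(text[i + 1:j])))
            i = j + 1  # the terminating space belongs to the escape
        else:
            tokens.append(("lit", text[i]))
            i += 1
    return tokens
```

With this sketch .bx and .BX give the same token, .zA and .ZA both give
underscore-uppercase A, and . bx is four literal characters.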
..........................................................................
1. The ASCII characters
While readability is required, it's often useful to refer to ASCII
characters by their numerical position. They can be generally
represented this way as
.ASCIInnn
where nnn is a number from 0-127 (origin 0). I've also listed some
transliterations for commonly used control characters
(anyone want more?).
I've thought of giving enough representations below so that printable
ASCII can easily be represented even if you only have access to an APL
type ball (except for lower case letters...) - the reverse problem!
  0  null                           .null
  4  ^D                             .eot
  7  ^G, bell                       .bell
  8  ^H, backspace                  .bs
  9  ^I, tab                        .ht
 10  ^J, linefeed                   .lf
 12  ^L, formfeed                   .ff
 13  ^M, carriage return (newline)  .nl
 16  ^P                             .dle
 17  ^Q                             .dc1
 19  ^S                             .dc3
 27  escape                         .esc
 34  double quote                   .dqt
 96  left single quote              .lqt
123  left brace                     .lb
124  split stile                    .sst
125  right brace                    .rb
127  delete                         .delete
This could also be done for EBCDIC (sp?) if anyone cared.
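As a rough sketch of how a translator might resolve these names, the
following Python looks up either a listed name or the general .ASCIInnn
form. The function name ascii_code is hypothetical, and I am assuming that
nnn may be written with or without leading zeros, which the text above does
not settle.

```python
# Names taken from the table above (escape character omitted).
NAMED = {
    "null": 0, "eot": 4, "bell": 7, "bs": 8, "ht": 9, "lf": 10,
    "ff": 12, "nl": 13, "dle": 16, "dc1": 17, "dc3": 19, "esc": 27,
    "dqt": 34, "lqt": 96, "lb": 123, "sst": 124, "rb": 125,
    "delete": 127,
}


def ascii_code(name):
    """Return the ASCII code (origin 0) for a name such as 'bell' or 'ASCII7'."""
    key = name.lower()  # case of the name is irrelevant
    if key in NAMED:
        return NAMED[key]
    digits = key[5:]
    if key.startswith("ascii") and digits.isdecimal():
        code = int(digits)
        if 0 <= code <= 127:
            return code
    raise KeyError(name)
```

For example, ascii_code("bell") and ascii_code("ASCII007") both give 7,
while a number outside 0-127 raises KeyError.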
..........................................................................
2. The ISO APL characters
The ISO standard APL character set. This is the basic building block
from which all APL characters are made. It doesn't include many
important characters that are usually made up of overstrikes of these.
It does include two that are commonly made using overstrikes: diamond
and dollar sign (what about not equal, less than or equal, or greater
than or equal, can/should they be entered as overstrikes?).
Listing of the character set
ISO # name(s)
(origin 0)
33 dieresis, each
34 right parenthesis
35 less than, left caret
36 less than or equal, not greater
37 equal
38 greater than, right caret
39 right bracket, right square bracket
40 or, inverted caret, down caret
41 and, caret, up caret
42 not equal
43 divide, reciprocal
44 comma, ravel, catenate, laminate
45 plus sign, conjugate
46 period, dot, full stop
47 slash, reduction, solidus, stroke
48 0
49 1
50 2
51 3
52 4
53 5
54 6
55 7
56 8
57 9
58 left parenthesis
59 left bracket, left square bracket
60 semicolon
61 times, signum, multiply sign
62 colon
63 back slash, scan, reverse solidus, strike
64 negative, high minus, macron, overbar
65 alpha
66 decode, base, down tack
67 intersection, down U, cap, up shoe
68 floor, minimum, down stile
69 epsilon, member of, datatype
70 underscore, stress, underbar
71 del, nabla
72 delta
73 iota
74 jot, small o
75 quote, single quote
76 quad, quad box, box
77 stile, magnitude, absolute value, vertical line, virgule
78 encode, represent, up tack
79 circle, large o
80 power, exponentiation, star, asterisk
81 query, roll, random, question mark
82 rho, shape, reshape
83 ceiling, maximum, up stile
84 tilde, not, without
85 drop, down arrow, split
86 union, up U, cup, down shoe
87 omega
88 disclose, contains (strict), left U, link, pick, right shoe
89 take, up arrow, mix
90 enclose, contained in (strict), right U, shoe, left shoe
91 left arrow, assign, assignment
92 left tack, dex
93 branch, right arrow, goto
94 greater than or equal, not less
95 minus sign, negate, bar
96 diamond, {and, or; less than, greater than}
97 A
98 B
99 C
100 D
101 E
102 F
103 G
104 H
105 I
106 J
107 K
108 L
109 M
110 N
111 O
112 P
113 Q
114 R
115 S
116 T
117 U
118 V
119 W
120 X
121 Y
122 Z
123 left brace, left curly bracket
124 right tack, lev
125 right brace, right curly bracket
126 dollar sign, {S, stile}
..........................................................................
ASCII Representation of the ISO APL character set
General representation:
.isoapl{nnn}
where {nnn} is an integer between 33 and 126 (0 index origin). For
example
.isoapl122
is Z.
These are given standard names that start with the escape character (.
is shown). In addition, many characters are given one or more
alternate names, usually longer, that are perhaps more easily
understood. I feel that while uniqueness of representation is nice, it
should be sacrificed to make it easier for people to read and write
the representation.
name suggested alternates
standard
33 dieresis .dd .dieresis
34 right parenthesis )
35 less than, left caret <
36 less than or equal, not greater .le
37 equal =
38 greater than, right caret >
39 right bracket, right square bracket ]
40 or, inverted caret .or
41 and, caret .and &
42 not equal .ne
43 divide .div .divide %
44 comma, ravel, catenate, laminate ,
45 plus sign, conjugate +
46 period, dot, full stop .
47 slash, reduction, solidus, stroke /
48 0 0
49 1 1
50 2 2
51 3 3
52 4 4
53 5 5
54 6 6
55 7 7
56 8 8
57 9 9
58 left parenthesis (
59 left bracket, left square bracket [
60 semicolon ;
61 times, signum, multiply sign .ti .times
62 colon :
63 backslash \ .bl
64 negative, high minus, macron .ng _
65 alpha .al .alpha
66 decode, base .de .decode
67 intersection, down U, cap .du
68 floor, min .fl .floor
69 epsilon .ep .epsilon
70 underscore, stress, underbar _
71 del .dl .del
72 delta .ld .delta
73 iota .io .iota
74 jot, small o .so .jot
75 quote, single quote '
76 quad, quad box, box .bx .quad .box
77 stile, magnitude, absolute value .ab .abs
78 encode, represent .en .encode
79 circle, large o .lo .circle
80 power, exponentiation, star *
81 query, roll, random, question mark ?
82 rho .ro .rho
83 ceiling, max .ce .ceiling
84 tilde, not, without ~
85 drop, down arrow .da .drop
86 union, up U, cup .uu .union
87 omega .om .omega
88 disclose, contains (strict) .lu
89 take, up arrow .ua .take
90 enclose, contained in (strict) .ru
91 left arrow, assign, assignment .is
92 left tack, dex .lk
93 branch, arrow right, goto .go .goto
94 greater than or equal, not less .ge
95 minus sign, negate, bar -
96 diamond .dm .diamond
97 A A
98 B B
99 C C
100 D D
101 E E
102 F F
103 G G
104 H H
105 I I
106 J J
107 K K
108 L L
109 M M
110 N N
111 O O
112 P P
113 Q Q
114 R R
115 S S
116 T T
117 U U
118 V V
119 W W
120 X X
121 Y Y
122 Z Z
123 left brace, left curly bracket {
124 right tack, lev .rk
125 right brace, right curly bracket }
126 dollar sign $
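Since several names may stand for one character, a translator needs a
many-to-one lookup. A minimal Python sketch follows, using only a handful of
rows from the table above. The names ISO_NAMES, iso_index and standard_name
are hypothetical, and the fallback to the numeric .isoaplnnn form for
characters without a listed name is my assumption.

```python
# A few rows of the table: ISO index -> (standard name, alternates...).
ISO_NAMES = {
    43: ("div", "divide"),
    76: ("bx", "quad", "box"),
    82: ("ro", "rho"),
    93: ("go", "goto"),
}

# Invert the table so every accepted name maps to its ISO index.
LOOKUP = {name: n for n, names in ISO_NAMES.items() for name in names}


def iso_index(name):
    """Return the ISO APL index for a name or for the isoaplnnn form."""
    key = name.lower()
    if key in LOOKUP:
        return LOOKUP[key]
    digits = key[6:]
    if key.startswith("isoapl") and digits.isdecimal():
        n = int(digits)
        if 33 <= n <= 126:
            return n
    raise KeyError(name)


def standard_name(name):
    """Map any accepted name to the standard (first listed) name."""
    n = iso_index(name)
    return ISO_NAMES[n][0] if n in ISO_NAMES else "isoapl%d" % n
```

Here .quad, .box, .BX and .isoapl76 all resolve to index 76, and
standard_name("box") gives "bx".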
..........................................................................
3. Additional Known Overstrikes
Here are some more names, mostly from STSC's list. Some are common,
some I know of no current use for. I'm sure there are more that
somebody wants to use.
#    name                                          overstrike
002  inequivalent                                  not equal, underscore
003  epsilon underscore, find                      epsilon, underscore
015  log, logarithm                                power, circle
016  quad jot                                      quad, jot
017  quad backslash, sandwich                      quad, backslash
018  iota underscore                               iota, underscore
019  del tilde, protected del                      del, tilde
021  ibeam                                         encode, decode
022  zilde                                         0, tilde
030  upgrade, grade up                             delta, stile
031  downgrade, grade down                         del, stile
033  shriek, factorial                             quote, dot
151  quote quad                                    quad, '
152  domino, divide quad,
     matrix divide, matrix inverse                 quad, divide
155  cent sign                                     enclose, stile
158  catbar                                        comma, minus
159  frown                                         tilde, dieresis
166  lamp, comment                                 intersection, jot
167  backslash bar, column backslash,
     scan first                                    backslash, minus
169  squad                                         left bracket, right bracket
170  snout                                         encode, dieresis
171  frog                                          del, dieresis
172  sourpuss                                      power, dieresis
227  hoot, paw, on                                 jot, dieresis
228  holler, hoof, upon                            circle, dieresis
229  nor                                           or, tilde
232  rotate, reverse                               circle, stile
233  rotate bar, column reverse,
     rotate first                                  circle, minus
234  nand                                          and, tilde
235  slash bar, column slash,
     column reduce, over first                     slash, minus
237  transpose                                     circle, backslash
240  equivalent, match, depth                      equal, underscore
241  delta underscore, delta stress                delta, underscore
244  format, thorn                                 encode, jot
245  execute, hydrant, pawn                        decode, jot
     contained in, subset                          enclose, underscore
     contains                                      disclose, underscore
A underscore
B underscore
C underscore
D underscore
E underscore
F underscore
G underscore
H underscore
I underscore
J underscore
K underscore
L underscore
M underscore
N underscore
O underscore
P underscore
Q underscore
R underscore
S underscore
T underscore
U underscore
V underscore
W underscore
X underscore
Y underscore
Z underscore
a underscore
b underscore
c underscore
d underscore
e underscore
f underscore
g underscore
h underscore
i underscore
j underscore
k underscore
l underscore
m underscore
n underscore
o underscore
p underscore
q underscore
r underscore
s underscore
t underscore
u underscore
v underscore
w underscore
x underscore
y underscore
z underscore
..........................................................................
ASCII Representation of Known Overstrikes
Quad AV order:
1. STSC
2. IPSharp (ISI?)
3. IAPL
1 2 3
002 148 inequivalent .ine
003 141 epsilon underscore .zep
015 67 171 log, logarithm .lg .log
016 quad jot .qjt .quadjot
017 quad backslash, sandwich .qbs .quadbs
018 142 iota underscore .zio .ziota
019 160 128 del tilde, protected del .pd .deltilde
021 63 137 ibeam .ib .ibeam
022 138 zilde .zil .zilde
030 71 175 upgrade, grade up .gu .gradeup
031 72 174 downgrade, grade down .gd .gradedown
033 48 33 shriek, factorial .fac !
151 66 187 quote quad .qq .quotequad
152 76 186 domino, divide quad, matrix divide .dq .domino
155 5 149 cent sign .cen .cent
158 254 catbar .ctb .catbar
159 144 frown .frn .frown
166 70 189 lamp, comment .lmp .lamp
167 75 251 backslash bar, column bslash,
scan first .cb .cbslash
169 squad .sqd .squad
170 145 snout .snt .snout
171 146 frog .frg .frog
172 143 sourpuss .sop .sourpuss
227 164 139 hoot, paw, on .hoo .hoot
228 165 140 holler, hoof, upon .hol .holler
229 69 166 nor .nor
232 49 172 rotate, reverse .rv .reverse .rotate
233 73 253 rotate bar, column reverse,
rotate first .cr .crv
.creverse .crotate
234 68 168 nand .nand
235 74 252 slash bar, column slash,
column reduce, over first .cs .cslash
237 62 173 transpose .tr .transpose
240 163 147 equivalent, matches .eqv .equivalent .match
241 139 192 delta underscore, delta stress .zld .zdelta
244 77 222 format, thorn .fm .format
245 78 220 execute, hydrant, pawn .xq .execute .do
contained in, subset .zru
contains .zlu
193 A underscore .zA
194 B underscore .zB
195 C underscore .zC
196 D underscore .zD
197 E underscore .zE
198 F underscore .zF
199 G underscore .zG
200 H underscore .zH
201 I underscore .zI
202 J underscore .zJ
203 K underscore .zK
204 L underscore .zL
205 M underscore .zM
206 N underscore .zN
207 O underscore .zO
208 P underscore .zP
209 Q underscore .zQ
210 R underscore .zR
211 S underscore .zS
212 T underscore .zT
213 U underscore .zU
214 V underscore .zV
215 W underscore .zW
216 X underscore .zX
217 Y underscore .zY
218 Z underscore .zZ
a underscore .za
b underscore .zb
c underscore .zc
d underscore .zd
e underscore .ze
f underscore .zf
g underscore .zg
h underscore .zh
i underscore .zi
j underscore .zj
k underscore .zk
l underscore .zl
m underscore .zm
n underscore .zn
o underscore .zo
p underscore .zp
q underscore .zq
r underscore .zr
s underscore .zs
t underscore .zt
u underscore .zu
v underscore .zv
w underscore .zw
x underscore .zx
y underscore .zy
z underscore .zz
..........................................................................
4. Arbitrary Overstrikes
When there are symbols that you want that haven't been given a name,
or you just want to show the explicit overstrikes, just use
symbol .ov symbol
or
symbol .overstrike symbol
where symbol represents any of the above symbols. Is there anything
left impossible to represent?
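One way a translator might treat explicit overstrikes is to collapse a pair
into a named symbol when the pair is a known one. The Python sketch below
assumes the input has already been split into names (escape character
removed) and that the order of the two symbols does not matter. Both are my
assumptions, as are the names KNOWN and collapse_overstrikes. Only four
pairs from the list of known overstrikes are included, and chains of more
than two symbols are not handled.

```python
# Unordered symbol pairs -> standard name of the resulting character.
KNOWN = {
    frozenset(("bx", "div")): "dq",  # quad, divide  -> domino
    frozenset(("lo", "ab")): "rv",   # circle, stile -> rotate
    frozenset(("ld", "ab")): "gu",   # delta, stile  -> grade up
    frozenset(("dl", "ab")): "gd",   # del, stile    -> grade down
}


def collapse_overstrikes(tokens):
    """Replace each 'x ov y' triple by a known name, else by the pair (x, y)."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] in ("ov", "overstrike"):
            pair = (tokens[i], tokens[i + 2])
            out.append(KNOWN.get(frozenset(pair), pair))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out
```

So .bx .ov .div and .div .overstrike .bx both collapse to .dq, while an
unknown pair is kept as an explicit two-symbol overstrike.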
..........................................................................
5. Quotes
Special care must be taken to deal properly with symbols inside
quotation marks. For the purposes of representation, all symbols must
be replaced by one of the above, otherwise the text would not be
readable. If an interpreter (or compiler) uses the ASCII
representation, then the interpreter must allow a mechanism for input
and output of non-printable characters.
A decision must be made about examples like:
.rho .bx .is '.rho'
and
.rho .bx .is '.bx .ov .div'
or
.rho .bx .is '.quad .ov .div'
Each of these quoted strings should parse as a single character if
executed, yet their ASCII representations are of length 4,
12, and 14, respectively. Note that this is not a problem of the
representation, but only for the interpreter. If the representation is
used as a medium for transporting APL code from one interpreter to the
other, then neither interpreter need see the above until after
resolution into one character by the representation translator. If the
interpreter itself uses the ASCII representation then the problem
exists, but can easily be resolved by the introduction of a system
function that does the transformation between internal APL and
printable ASCII representation. We propose that this function be the
monadic function .bxapl,
c .is .bxapl s
where
s is a character array in ASCII representation
c is a character array of the same dimension as s, converted to
APL internal representation
The near-inverse would be the monadic function .bxascii,
c .is .bxascii s
where
s is a character array in APL internal representation
c is a character array in ASCII representation,
of the same dimension as s, except possibly
for the last dimension, which may be larger.
Hence in an interpreter (or compiler) we could see the following:
.rho '.rho'
4
.rho .bxapl '.rho'
1
.rho 2 4 .rho '.rho'
2 4
.rho a .is .bxapl 2 4 .rho '.rho'
2 4
(.bxapl '.rho') = a
1 0 0 0
1 0 0 0
' ' = a
0 1 1 1
0 1 1 1
.rho .bxascii a .is .bxapl 2 4 .rho '.rho'
2 7
Note that .bxascii and .bxapl are not exact inverses. .bxascii must
sometimes add space, yet .bxapl can't always remove it, since
otherwise arrays might not be easily converted. What is .bxapl to do
with
a .is 2 8 .rho '.rho ', '.rho .is'
except leave extra blanks where characters are "removed"? In addition,
while .bxascii will always work, .bxapl is dependent on the system
atomic vector, and hence may run into an unknown character.
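The shape-preserving behaviour of .bxapl can be modelled in Python, with a
matrix as a list of equal-length strings and APL characters as Unicode code
points. This is a sketch only: the table APL and the functions bxapl_vector
and bxapl_matrix are hypothetical, unknown names simply raise KeyError, and
the treatment of a literal escape character is left out.

```python
# Hypothetical name table; values are Unicode APL symbols.
APL = {"rho": "\u2374", "is": "\u2190", "bx": "\u2395"}


def bxapl_vector(s, escape="."):
    """Convert one ASCII-representation string to internal characters."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == escape and i + 1 < len(s) and s[i + 1] != " ":
            j = s.find(" ", i)
            if j == -1:
                j = len(s)
            out.append(APL[s[i + 1:j].lower()])
            i = j + 1  # consume the terminating space
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def bxapl_matrix(rows, escape="."):
    """Convert each row, padding with blanks to keep the original width."""
    return [bxapl_vector(r, escape).ljust(len(r)) for r in rows]
```

A vector '.rho' becomes one character, while a 2 by 4 matrix of '.rho'
stays 2 by 4, each row being rho followed by three blanks.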
One last issue is how these functions treat '.' (the current escape
character). My current thinking is that if .bxascii encounters the
escape, it should be retained together with an extra space:
'. ' = .bxascii '.'
Then, when .bxapl encounters '. ', it can simply replace this by '.'
This way constructions such as 'a+.dl b' will be preserved,
assuming dl is a user-defined dyadic scalar function. In addition,
text with '.'s will be preserved. An alternative is for .bxascii to
replace '.' with '..' etc, but which looks better, 'a+. *b' or
'a+..*b'? I prefer the first. TeX and C go with the second (for \).
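The first alternative (escape character plus an extra space) can be tried
out with a small Python model on character vectors. It is only a sketch
under assumptions: the two-entry table TO_ASCII is hypothetical, and the
Python functions bxascii and bxapl merely imitate the proposed system
functions on plain strings.

```python
# Hypothetical two-entry table: internal character -> name.
TO_ASCII = {"\u2374": "rho", "\u2207": "dl"}
FROM_ASCII = {name: ch for ch, name in TO_ASCII.items()}


def bxascii(s, escape="."):
    """Internal text -> ASCII representation; a literal escape gets a space."""
    out = []
    for ch in s:
        if ch == escape:
            out.append(escape + " ")
        elif ch in TO_ASCII:
            out.append(escape + TO_ASCII[ch] + " ")
        else:
            out.append(ch)
    return "".join(out)


def bxapl(s, escape="."):
    """ASCII representation -> internal text; '. ' becomes a literal '.'."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != escape:
            out.append(s[i])
            i += 1
            continue
        j = s.find(" ", i)
        if j == -1:
            j = len(s)
        name = s[i + 1:j]
        out.append(escape if name == "" else FROM_ASCII[name.lower()])
        i = j + 1
    return "".join(out)
```

In this model 'a+.dl b' is written out as 'a+. dl b' and read back
unchanged, so the user function dl is not mistaken for del.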
This leaves the (really) final issue of what .bxapl does with '.xxx',
with xxx an unknown name. This is bound to come up since different
systems have different character sets. This could
1. yield a domain error and stop
2. parse it as '. xxx' and continue
3. ignore the whole word and continue
I currently favor the second, as then the user will see something she
can do something with, if desired. As a side effect, it should
probably warn the user as well.
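The second option might look like the following Python sketch. I am reading
"parse it as '. xxx'" to mean that the escape character is kept as a literal
and the unknown name is passed through as ordinary text, which is an
interpretation on my part. The table APL and the function bxapl_lenient are
hypothetical, and the warning goes through Python's standard warnings
module.

```python
import warnings

# Hypothetical one-entry name table.
APL = {"rho": "\u2374"}


def bxapl_lenient(s, escape="."):
    """Convert known names; keep unknown ones as literal text and warn."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != escape:
            out.append(s[i])
            i += 1
            continue
        j = s.find(" ", i)
        if j == -1:
            j = len(s)
        name = s[i + 1:j]
        if name == "":
            out.append(escape)  # '. ' is a literal escape character
            i = j + 1
        elif name.lower() in APL:
            out.append(APL[name.lower()])
            i = j + 1
        else:
            warnings.warn("unknown name: " + escape + name)
            out.append(escape)  # treat '.xxx' as if written '. xxx'
            i += 1              # the name itself is copied as literals
    return "".join(out)
```

With this sketch '.rho .xyz b' comes back as rho followed by '.xyz b', and
one warning is issued for .xyz.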
--------------------------------------------------------------------------
II. Representation of Objects
I've really only considered readable representation of individual
characters here. This can be used with APL's standard (del) editor
representation for functions if one wants simplicity.
A more modern, complete representation of objects (including data),
can be done with John Mitloener's PP scheme. (need a reference
here...)
--------------------------------------------------------------------------
Some Comments:
STSC actually equates underscored letters with the lowercase alphabet. I
think both can be accommodated.
IP Sharp appears to allow lowercase on some machines through
overstriking with macron.
The TeX font has provision for overstriking (which is how many of
the compound symbols above are defined). This allows easy definition
of many more symbols.
I know various other keyword forms have existed at one time or another.
STSC had one that they seem to have stopped using. (see LJD's list).
If you have questions or comments, let me know:
Sam Sirlin
sam on the BBS\APL
sam@kalessin.jpl.nasa.gov
References:
1. STSC, APL*Plus II Quick Reference Guide, 1988
2. IP Sharp, Sharp APL Pocket Reference, 1984
3. T. Budd, APLC token list, 1987.
4. A. Hohti, O. Kanerva, TeX APL font definition
apldef.tex, described in TUGboat, 1987.
5. latex APL ref...
6. PP ref...
..........................................................................