Differences between revisions 1 and 26 (spanning 25 versions)
Revision 1 as of 2008-12-22 14:26:23
Size: 4200
Editor: anonymous
Comment:
Revision 26 as of 2009-01-05 17:49:55
Size: 6763
Editor: anonymous
Comment: APL+WIN atomic vector added

APL to Unicode

Whilst the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode capable.

Currently the APL to Unicode functions write the Unicode to native text files, from which it can be cut and pasted into emails, newsgroups, web pages etc. Similarly, the Unicode to APL function requires the Unicode to be cut and pasted from its source into a native text file prior to conversion.

The Unicode can be copied and pasted to the text files using MS Notepad with the APL385 Unicode font. When saving a file with "Save as", make sure you select UTF-8 as the encoding.

My original aim was to work directly via the clipboard, but the amount of APL code required to manage the Windows clipboard is too large to display here. APL+WIN has built-in user commands (]clipcopy and ]clippaste) to do the job, and I suggest APL+WIN users use those if they want to go directly via the clipboard. Users of other interpreters no doubt have their own equivalents.

AplToUtf8 takes the name of a function and converts its code to UTF-8 encoded Unicode. As it stands, this function deals only with whole functions, but it can easily be generalised to work with any character string input. For a quick and dirty job, just comment out the first two lines of working code and it will work on simple character input.

 ∇  AplToUtf8 f

⍝Get a character representation of the function
f←⎕cr f

⍝Append new line and carriage return characters
f←(f,⎕tcnl),⎕tclf

⍝Convert each character to its unicode binary value
f←∊Utf8 ¨∆avutf8[⎕av⍳,f]

⍝Add the UTF-8 byte order mark (bytes EF BB BF) and convert the bits back to characters
f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f

⍝File the character stream
f FileData 'c:\unicode.txt'
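
The 24 bits prepended in the function above can be checked outside APL. The following is a minimal Python sketch, not part of the original functions; the names header_bits and bits_to_bytes are made up for illustration. It packs the bits eight at a time and compares the result with the UTF-8 byte order mark.

```python
import codecs

# The 24 bits prepended by AplToUtf8, copied from the function above.
header_bits = [1, 1, 1, 0, 1, 1, 1, 1,
               1, 0, 1, 1, 1, 0, 1, 1,
               1, 0, 1, 1, 1, 1, 1, 1]

def bits_to_bytes(bits):
    """Pack a flat list of 0/1 values into bytes, eight bits per byte,
    most significant bit first (hypothetical helper for this sketch)."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = value * 2 + bit
        out.append(value)
    return bytes(out)

header = bits_to_bytes(header_bits)

# True when the header is exactly the UTF-8 byte order mark.
print(header == codecs.BOM_UTF8)
```

The three byte values are 239, 187 and 191, which is where the 617 tested for in Utf8ToApl comes from.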

Utf8, which is called under each (¨) in the above, simply implements the UTF-8 specification to create the Unicode byte structure for each character. Anyone interested in the byte structure can find it in the Description section of http://en.wikipedia.org/wiki/UTF-8.

 ∇ r←Utf8 c

  ⍝Determine the number of bytes required to represent the character in unicode
   r←+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16

  ⍝Convert the character to bytes according to the UTF-8 specification
   :Select r
   :Case 1
       r←⍎⍕0,(7⍴2)⊤c
   :Case 2
       r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
   :Case 3
       r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
   :Case 4
       r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
   :EndSelect
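
For anyone wanting to cross-check the output of Utf8 on a Unicode-capable system, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not a translation of the APL: the name utf8_encode is made up, code point 0 is treated as a one-byte character, and no check is made for values outside the Unicode range.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes (illustrative sketch)."""
    n_bits = code_point.bit_length()
    # Same thresholds as the APL: more than 0, 7, 11 or 16 significant bits.
    # Assumption: code point 0 is forced to one byte here.
    n_bytes = max(1, sum(n_bits > limit for limit in (0, 7, 11, 16)))
    if n_bytes == 1:
        return bytes([code_point])
    out = []
    # Continuation bytes carry six payload bits each, prefixed with 1 0.
    for _ in range(n_bytes - 1):
        out.append(0b10000000 | (code_point & 0b111111))
        code_point >>= 6
    # The lead byte starts with as many 1 bits as there are bytes, then a 0.
    lead_prefix = {2: 0b11000000, 3: 0b11100000, 4: 0b11110000}[n_bytes]
    out.append(lead_prefix | code_point)
    return bytes(reversed(out))

# Compare with Python's own encoder for a few code points,
# including 9079 and 8592 from the translation vector.
for cp in (65, 215, 9079, 8592):
    print(cp, utf8_encode(cp) == chr(cp).encode("utf-8"))
```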

The function FileData is a simple utility function to file the result. I am sure you all have your own versions.

Utf8ToApl is the reverse function. It assumes that the Unicode resides in a native text file.

 ∇  r←Utf8ToApl;v

⍝Tie the native file containing the unicode
'c:\unicode.txt' ⎕ntie ¯1

⍝Read the bits from the file
v←⎕nread ¯1 11,(⎕nsize ¯1),0

⍝Untie the file
⎕nuntie ¯1

⍝Initialise the results vector
r←0⍴0

⍝Convert the bits to integers
v←2⊥⍉((.125×⍴v),8)⍴v

⍝Strip off the byte order mark (bytes 239 187 191, which sum to 617) if present
:if 617=+/3↑v
   v←3↓v
:endif

⍝Decode the unicode bytes back to integers in accordance with the UTF-8 specification
:while 0≠⍴v

⍝Determine how many bytes represent the next character
    :select +/(↑v)>0 127 223 239
    :case 1
        r←r,2⊥1↓(8⍴2)⊤v[1]
        v←1↓v
    :case 2
        r←r,2⊥(3↓(8⍴2)⊤v[1]),2↓(8⍴2)⊤v[2]
        v←2↓v
    :case 3
        r←r,2⊥(4↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),2↓(8⍴2)⊤v[3]
        v←3↓v
    :case 4
        r←r,2⊥(5↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),(2↓(8⍴2)⊤v[3]),2↓(8⍴2)⊤v[4]
        v←4↓v
    :endselect

:endwhile

⍝Convert unicode integers back to ⎕av characters
r←⎕av[(∆avutf8⍳r)∼11]
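
The decoding loop above can be sketched in Python as follows, again only as a cross-check under assumptions. The name utf8_decode is made up; unlike the APL, the sketch compares the first three bytes with the byte order mark exactly rather than testing their sum, and it treats a zero byte as a one-byte character. Like the APL, it does not validate continuation bytes.

```python
def utf8_decode(data):
    """Decode UTF-8 bytes to a list of code points (illustrative sketch)."""
    data = bytes(data)
    # Strip the byte order mark if present (exact comparison, an assumption
    # that differs from the 617 sum test in the APL).
    if data[:3] == b"\xef\xbb\xbf":
        data = data[3:]
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Same thresholds as the APL: the lead byte tells how many bytes follow.
        n_bytes = max(1, sum(lead > limit for limit in (0, 127, 223, 239)))
        # Payload bits kept from the lead byte: 7, 5, 4 or 3.
        payload_bits = {1: 7, 2: 5, 3: 4, 4: 3}[n_bytes]
        value = lead & ((1 << payload_bits) - 1)
        # Each continuation byte contributes its low six bits.
        for cont in data[i + 1:i + n_bytes]:
            value = (value << 6) | (cont & 0b111111)
        code_points.append(value)
        i += n_bytes
    return code_points

sample = "r←⍳5"
print(utf8_decode(sample.encode("utf-8")) == [ord(ch) for ch in sample])
```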

∆avutf8 is a vector used to map the APL+WIN ⎕AV positions to their Unicode code points.

    0    1    2 9079 9674  168 8592    7    8    9   10 8834   12   13 8835 9055
   16   17   18 9067   20   21 9068 9077 8593 8595 8594   27 8867 8866 9035 9042
   32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47
   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63
   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95
   96   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111
  112  113  114  115  116  117  118  119  120  121  122  123  166  125 8764  127
  199  252  233  226  228  224 8800  231  234  235  232  239  238 8968  196 8970
  201 8710  215  244  246 9109  251 9054 9017  214  220  162  163   63 9066 9064
  225  237  243  250  241  209 9053 9024  191 9015  337  248  253  161  171  187
 9109 9109 9109  124  124  124  124   43   43  124  124   43   43   43   43   43
  192  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207
   45  209  210  211  212  213  214   43  216  217  218  219  220  221  124  255
 9082  223 9075 9060  227 9073 8869 8868 9021 8854 9074 9023 8711 9033 8714 9067
 8801 9049 8805 8804 9045 9038  247   34 8728 9675 8744 9076 8745  175  124    0
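
How the vector is used in both directions can be illustrated with a small Python sketch. It uses only the first row of the table above, and the names AVUTF8_ROW1, av_to_code_points and code_points_to_av are made up for illustration. Positions are 1-based, as in the APL.

```python
# First row of the table above: code points for ⎕AV positions 1 to 16.
AVUTF8_ROW1 = [0, 1, 2, 9079, 9674, 168, 8592, 7,
               8, 9, 10, 8834, 12, 13, 8835, 9055]

def av_to_code_points(positions, table):
    """Map 1-based ⎕AV positions to code points, as ∆avutf8[⎕av⍳,f] does."""
    return [table[p - 1] for p in positions]

def code_points_to_av(code_points, table, drop_position=11):
    """Map code points back to 1-based positions, as (∆avutf8⍳r)∼11 does.

    list.index returns the first match, like dyadic ⍳. Unlike ⍳, it raises
    ValueError for a code point that is not in the table.
    """
    positions = [table.index(cp) + 1 for cp in code_points]
    # Position 11 holds code point 10 (line feed), which is dropped.
    return [p for p in positions if p != drop_position]

print(av_to_code_points([4, 7], AVUTF8_ROW1))
print(code_points_to_av([9079, 13, 10, 8592], AVUTF8_ROW1))
```

Because the lookup returns the first match, a code point that occurs more than once in the full table (124, for example) always maps back to the first position at which it appears, so a round trip does not necessarily return the original ⎕AV position.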

I used the functions to convert themselves into Unicode for the wiki. They should readily translate to any interpreter if not usable directly.

Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. To get you started, I have reproduced the APL+WIN atomic vector below. Another excellent resource is Adrian Smith's article in Vector: http://www.vector.org.uk/resource/uniref.pdf.

   ⍷◊¨←    ⊂  ⊃⍟åæì⍫ÙÒ⍬⍵↑↓→ ⊣⊢⍋⍒ !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{¦}∼
Çüéâäà≠çêëèïî⌈Ä⌊É∆×ôö⎕û⍞⌹ÖÜ¢£?⍪⍨áíóúñÑ⍝⍀¿⌷őøý¡«»⎕⎕⎕||||++||+++++
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ-ÑÒÓÔÕÖ+ØÙÚÛÜÝ|ÿ⍺ß⍳⍤ã⍱⊥⊤⌽⊖⍲⌿∇⍉∊∩≡⍙≥≤⍕⍎÷"∘○∨⍴∪¯|

I could not resist the challenge when one reader commented that these functions were not very "APL like", so I created a new set at AplToUnicodeII.

Author: GrahamSteer


CategoryUnicode

AplToUnicode (last edited 2009-01-17 09:47:54 by anonymous)