APL to Unicode
Whilst the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode-capable.
Currently the APL-to-Unicode functions write the Unicode to a native text file, from which it can be cut and pasted into emails, newsgroups, web pages etc. Similarly, the Unicode-to-APL function requires the Unicode text to be cut and pasted from its source into a native text file prior to conversion.
The Unicode text can be copied and pasted to and from the text files using Microsoft Notepad with the APL385 Unicode font. When saving a file, use "Save As" and make sure you select UTF-8 as the encoding.
My original aim was to work directly via the clipboard, but the amount of APL code required to manage the Windows clipboard is too large to display here. APL+WIN has built-in user commands (]clipcopy and ]clippaste) to do the job, and I suggest APL+WIN users use those if they want to go directly via the clipboard. Users of other interpreters no doubt have their own equivalents.
AplToUtf8 takes the name of a function and converts its code to UTF-8-encoded Unicode. As it stands, it deals only with whole functions, but it can easily be generalised to work with any character string input. For a quick and dirty job, just comment out the first two lines of working code (the ⎕cr line and the line that appends the line-ending characters) so that it works on simple character input.
∇ AplToUtf8 f
⍝Get a character representation (matrix) of the function
f←⎕cr f
⍝Append a newline (⎕tcnl) and a line feed (⎕tclf) to each row
f←(f,⎕tcnl),⎕tclf
⍝Map each character to its Unicode code point and convert it to UTF-8 bits
f←∊Utf8¨∆avutf8[⎕av⍳,f]
⍝Prepend the UTF-8 byte order mark (EF BB BF) and convert the bits back to characters
f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f
⍝File the character stream
f FileData 'c:\unicode.txt'
∇
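AplToUtf8 performs four steps: it translates each character through the table, adds line endings, encodes the result as UTF-8 and prepends the byte order mark. These steps can be sketched in Python for readers who want to cross-check the output outside APL.

This is an illustration, not the author's code. The function name apl_rows_to_utf8 and the three-entry demo_table are hypothetical. Python's built-in UTF-8 encoder stands in for Utf8.

```python
# Sketch of the AplToUtf8 pipeline, written in Python for illustration.
# The translation table is HYPOTHETICAL: any mapping from native character
# codes to Unicode code points, playing the role of ∆avutf8.

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark, the header AplToUtf8 prepends


def apl_rows_to_utf8(rows, table):
    """Encode rows of native character codes as UTF-8 text with a BOM.

    rows  -- list of rows, each a list of native codes (like the rows of ⎕cr f)
    table -- mapping from native code to Unicode code point (like ∆avutf8)
    """
    out = bytearray(BOM)
    for row in rows:
        # Translate each native code to its Unicode character
        text = "".join(chr(table[code]) for code in row)
        # End each row with CR LF, assumed here to match ⎕tcnl followed by ⎕tclf
        out += (text + "\r\n").encode("utf-8")
    return bytes(out)


# Usage with a three-entry demo table: 0 -> ⍴ (U+2374), 1 -> A, 2 -> space
demo_table = {0: 0x2374, 1: 0x41, 2: 0x20}
print(apl_rows_to_utf8([[1, 2, 0]], demo_table))  # b'\xef\xbb\xbfA \xe2\x8d\xb4\r\n'
```

The ⍴ character becomes the three bytes E2 8D B4, which is exactly what Utf8 produces for code point 9076.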
Utf8, which is applied to each code point with the each operator (¨) in the function above, implements the UTF-8 specification to build the byte structure for each character. Anyone interested in the byte structure can find it at http://en.wikipedia.org/wiki/UTF-8 by scrolling down to the Description section.
∇ r←Utf8 c
⍝Determine the number of bytes (1 to 4) required to represent code point c in UTF-8
⍝1⌈ ensures that code point 0 still produces a one-byte result
r←1⌈+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16
⍝Convert the code point to bits according to the UTF-8 specification
:Select r
:Case 1
    r←⍎⍕0,(7⍴2)⊤c
:Case 2
    r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
:Case 3
    r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
:Case 4
    r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
:EndSelect
∇
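Here is a minimal Python sketch of the same bit manipulation that Utf8 performs, for readers who want to see the byte structure it builds. The steps are:

- Count the significant bits.
- Choose 1 to 4 bytes using the thresholds 7, 11 and 16.
- Fill the lead byte (110xxxxx, 1110xxxx or 11110xxx) and the 10xxxxxx continuation bytes.

The helper names to_bits, utf8_bits and bits_to_bytes are hypothetical. Like the corrected APL above, the sketch treats code point 0 as a one-byte character. The loop at the end checks the result against Python's own encoder.

```python
# Bit-level UTF-8 encoder mirroring the structure of the APL Utf8 function.
# Python's own str.encode("utf-8") is used only to cross-check the result.


def to_bits(value, width):
    """Fixed-width big-endian binary digits, like (width⍴2)⊤value in APL."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def utf8_bits(cp):
    """Return the UTF-8 encoding of code point cp as a flat list of bits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    # Significant bits, with a minimum of 1 so that code point 0 gets one byte
    nbits = max(cp.bit_length(), 1)
    # Same thresholds as the APL code: more than 0, 7, 11 or 16 bits
    nbytes = sum(nbits > t for t in (0, 7, 11, 16))
    if nbytes == 1:
        return [0] + to_bits(cp, 7)  # 0xxxxxxx
    width = {2: 11, 3: 16, 4: 21}[nbytes]
    payload = to_bits(cp, width)
    lead_len = width - 6 * (nbytes - 1)  # 5, 4 or 3 payload bits in the lead byte
    # Lead byte: nbytes ones, a zero, then the top payload bits (110xxxxx etc.)
    out = [1] * nbytes + [0] + payload[:lead_len]
    # Continuation bytes: 10 followed by the next six payload bits
    for start in range(lead_len, width, 6):
        out += [1, 0] + payload[start:start + 6]
    return out


def bits_to_bytes(bits):
    """Pack a bit list (length a multiple of 8) into bytes, most significant bit first."""
    return bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2) for i in range(0, len(bits), 8)
    )


for ch in "A\u00e9\u2374\U0001D538":  # A, é, ⍴ and a 4-byte character
    assert bits_to_bytes(utf8_bits(ord(ch))) == ch.encode("utf-8")
```

Working in bits, as the APL does, keeps each case a direct transcription of the specification's bit patterns. The price is that every byte has to be packed at the end.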
FileData (not shown) is a simple utility function that writes the result to a native file. I am sure you all have your own versions.
Utf8ToApl is the reverse function. It assumes that the Unicode text resides in the native text file c:\unicode.txt.
∇ r←Utf8ToApl;v
⍝Tie the native file containing the Unicode
'c:\unicode.txt' ⎕ntie ¯1
⍝Read the file contents as bits
v←⎕nread ¯1 11,(⎕nsize ¯1),0
⍝Untie the file
⎕nuntie ¯1
⍝Initialise the result vector
r←0⍴0
⍝Convert the bits to byte values (0-255)
v←2⊥⍉((.125×⍴v),8)⍴v
⍝Strip off the UTF-8 byte order mark if present (239+187+191=617)
:if 617=+/3↑v
    v←3↓v
:endif
⍝Decode the UTF-8 bytes back to code points in accordance with the UTF-8 specification
:while 0≠⍴v
    ⍝Determine from the lead byte how many bytes represent the next character
    ⍝(¯1 rather than 0 so that a zero byte is decoded as a one-byte character instead of stalling the loop)
    :select +/(↑v)>¯1 127 223 239
    :case 1
        r←r,2⊥1↓(8⍴2)⊤v[1]
        v←1↓v
    :case 2
        r←r,2⊥(3↓(8⍴2)⊤v[1]),2↓(8⍴2)⊤v[2]
        v←2↓v
    :case 3
        r←r,2⊥(4↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),2↓(8⍴2)⊤v[3]
        v←3↓v
    :case 4
        r←r,2⊥(5↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),(2↓(8⍴2)⊤v[3]),2↓(8⍴2)⊤v[4]
        v←4↓v
    :endselect
:endwhile
⍝Convert the code points back to ⎕av characters, dropping line feeds (⎕av[11])
r←⎕av[(∆avutf8⍳r)∼11]
∇
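The decoding loop in Utf8ToApl can likewise be sketched in Python:

- The lead byte determines the length: 0-127 one byte, 128-223 two, 224-239 three, 240 and above four.
- The lead byte's marker bits are masked off.
- Each continuation byte contributes its low six bits.

The function name utf8_to_code_points is hypothetical. Like the APL, the sketch assumes well-formed input and does not validate continuation bytes or reject overlong forms.

```python
# UTF-8 decoder following the lead-byte classification used in Utf8ToApl.
# It is a sketch: continuation bytes and overlong forms are not validated.


def utf8_to_code_points(data):
    """Decode UTF-8 bytes (optionally starting with a BOM) to code points."""
    v = list(data)
    # Drop the byte order mark EF BB BF if present
    if v[:3] == [0xEF, 0xBB, 0xBF]:
        v = v[3:]
    out = []
    i = 0
    while i < len(v):
        lead = v[i]
        # 1 byte for 0-127, 2 for up to 223, 3 for up to 239, otherwise 4
        n = sum(lead > t for t in (-1, 127, 223, 239))
        # Keep 7, 5, 4 or 3 payload bits from the lead byte
        cp = lead & (0x7F, 0x1F, 0x0F, 0x07)[n - 1]
        # Each continuation byte contributes its low six bits
        for cont in v[i + 1:i + n]:
            cp = (cp << 6) | (cont & 0x3F)
        out.append(cp)
        i += n
    return out


text = "\ufeffA \u2374\r\n"  # BOM, "A ⍴", CR LF
print(utf8_to_code_points(text.encode("utf-8")))  # [65, 32, 9076, 13, 10]
```

The final APL step, r←⎕av[(∆avutf8⍳r)∼11], would then map each code point back to a native character through the translation table and drop the line feeds.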
∆avutf8 is a 256-element vector that maps each APL+WIN ⎕AV position to its Unicode code point.
0 1 2 9079 8900 168 8592 7 8 9 10 8834 12 13 8835 9055 16 17 18 9067 20 21 9068 9077 8593 8595 8594 27 8867 8866 9035 9042 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 166 125 126 127 199 252 233 226 228 224 8800 231 234 235 232 239 238 8968 196 8970 201 8710 215 244 246 9109 251 9054 9017 214 220 162 163 63 9066 9064 225 237 243 250 241 209 9053 9024 191 9015 337 248 253 161 171 187 9109 9109 9109 124 124 124 124 43 43 124 124 43 43 43 43 43 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 45 209 210 211 212 213 214 43 216 217 218 219 220 221 124 255 9082 223 9075 9060 227 9073 8869 8868 9021 8854 9074 9023 8711 9033 8714 9067 8801 9049 8805 8804 9045 9038 247 34 8728 9675 8744 9076 8745 175 124 0
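The two directions of this mapping can be illustrated with a short Python sketch. It uses the first five entries of ∆avutf8 as listed above.

- Going from APL to Unicode is plain indexing.
- Going back relies on dyadic ⍳, which returns the first matching position.

Some values, such as 124 and 43, occur more than once in ∆avutf8, so only the first of the corresponding ⎕AV positions can be recovered. build_reverse is a hypothetical helper name.

```python
# The first five entries of ∆avutf8 as listed above; Python lists are
# indexed from 0, whereas the APL vector is indexed from 1 (⎕IO=1).
excerpt = [0, 1, 2, 9079, 8900]


def build_reverse(table):
    """Map each code point to the FIRST position holding it, as dyadic ⍳ does."""
    reverse = {}
    for pos, cp in enumerate(table):
        reverse.setdefault(cp, pos)
    return reverse


# Forward direction (what ∆avutf8[⎕av⍳,f] does): position -> code point
print(chr(excerpt[4]))  # ⋄ (U+22C4)

# Reverse direction (what ∆avutf8⍳r does): code point -> position
print(build_reverse(excerpt)[8900])  # 4

# When several positions share a code point, only the first is recovered
print(build_reverse([124, 43, 124])[124])  # 0
```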
I used these functions to convert themselves into Unicode for the wiki. They should translate readily to any interpreter if they are not usable directly.
Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. To get you started, I have reproduced the APL+WIN atomic vector below. Another excellent resource is Adrian Smith's article in Vector: http://www.vector.org.uk/resource/uniref.pdf.
⍷◊¨← ⊂ ⊃⍟åæì⍫ÙÒ⍬⍵↑↓→ ⊣⊢⍋⍒ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{¦}∼ Çüéâäà≠çêëèïî⌈Ä⌊É∆×ôö⎕û⍞⌹ÖÜ¢£?⍪⍨áíóúñÑ⍝⍀¿⌷őøý¡«»⎕⎕⎕||||++||+++++ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ-ÑÒÓÔÕÖ+ØÙÚÛÜÝ|ÿ⍺ß⍳⍤ã⍱⊥⊤⌽⊖⍲⌿∇⍉∊∩≡⍙≥≤⍕⍎÷"∘○∨⍴∪¯|
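Suppose you have a correct Unicode rendering of an interpreter's atomic vector. Building the translation vector then amounts to taking the code point of each character in order, as in this Python sketch.

The function name translation_vector and its length check are illustrative assumptions; 256 is the length of ∆avutf8. The rendering above is not a direct input for this. Control characters are not shown, and some positions appear to be approximated by characters such as ?, | and +. Each entry therefore still needs checking against a reference such as the Vector article.

```python
# Derive a translation vector from a Unicode rendering of an atomic vector.
# ASSUMPTION: av_text holds exactly one character per atomic-vector position,
# including control characters, which the display above does not show.


def translation_vector(av_text, expected_len=256):
    """Return the Unicode code point of each atomic-vector position, in order."""
    if len(av_text) != expected_len:
        raise ValueError(f"expected {expected_len} characters, got {len(av_text)}")
    return [ord(ch) for ch in av_text]


# Usage on a two-character fragment: ⍺ and ⍵
print(translation_vector("\u237a\u2375", expected_len=2))  # [9082, 9077]
```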
When one reader commented that these functions were not very "APL-like", I could not resist the challenge, so I created a new set at AplToUnicodeII.
Author: GrahamSteer