APL to Unicode
Although the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode capable.
Currently the APL-to-Unicode functions write the Unicode to native text files, from which it can be cut and pasted into emails, newsgroups, web pages, etc. Similarly, the Unicode-to-APL function requires the Unicode to be cut and pasted from its source into a native text file prior to conversion.
The Unicode can be copied and pasted to the text files using Microsoft Notepad with the APL385 Unicode font. When saving a file, use "Save As" and make sure you select UTF-8 as the encoding.
My original aim was to work directly via the clipboard, but the amount of APL code required to manage the Windows clipboard is too large to display here. APL+WIN has built-in user commands (]clipcopy and ]clippaste) to do the job, and I suggest APL+WIN users use those if they want to go directly via the clipboard. Users of other interpreters no doubt have their own equivalents.
AplToUtf8 takes the name of a function and converts its code to UTF-8 encoded Unicode. As it stands, the function deals only with whole functions, but it can easily be generalised to work with any character string input. For a quick and dirty job, comment out the first two lines of working code (the ⎕cr call and the line that appends the line-ending characters) and it will work on simple character input.
∇ AplToUtf8 f
  ⍝Get a character representation of the function
  f←⎕cr f
  ⍝Append new line and carriage return characters
  f←(f,⎕tcnl),⎕tclf
  ⍝Convert each character to its unicode binary value
  f←∊Utf8 ¨∆avutf8[⎕av⍳,f]
  ⍝Add the encoding level header and convert back to ascii characters
  f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f
  ⍝File the character stream
  f FileData 'c:\unicode.txt'
∇
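The 24 bits that AplToUtf8 places in front of the data spell out the three-byte UTF-8 signature (byte order mark). As a cross-check outside APL, the following Python sketch packs those bits into bytes and compares them with the standard library's constant. The name `bits_to_bytes` is made up for this illustration; only the bit pattern comes from the function above.

```python
import codecs

# The 24 header bits, copied from the AplToUtf8 listing.
HEADER_BITS = [1, 1, 1, 0, 1, 1, 1, 1,
               1, 0, 1, 1, 1, 0, 1, 1,
               1, 0, 1, 1, 1, 1, 1, 1]


def bits_to_bytes(bits):
    """Pack a flat list of 0/1 values into bytes, 8 bits per byte, most significant bit first."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


header = bits_to_bytes(HEADER_BITS)
print(header.hex())                  # hexadecimal form of the three header bytes
print(header == codecs.BOM_UTF8)     # compares with Python's UTF-8 signature constant
```

The three bytes are EF BB BF, and their sum (239 + 187 + 191 = 617) is the value that Utf8ToApl tests for when deciding whether to strip the header.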
Utf8, which is called under each (¨) in the above, simply implements the UTF-8 specification to create the Unicode byte structure for each character. Anyone interested in the byte structure can find it in the Description section of http://en.wikipedia.org/wiki/UTF-8.
∇ r←Utf8 c
  ⍝Determine the number of bytes required to represent the character in unicode
  r←+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16
  ⍝Convert the character to bytes according to the UTF-8 specification
  :Select r
  :Case 1
      r←⍎⍕0,(7⍴2)⊤c
  :Case 2
      r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
  :Case 3
      r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
  :Case 4
      r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
  :EndSelect
∇
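For readers who want to check the byte layout outside APL, here is a minimal Python sketch of the same idea: count the significant bits of the code point, pick 1 to 4 bytes using the thresholds 7, 11, 16 and 21, then spread the bits over a lead byte and continuation bytes. It is an illustration rather than a port of Utf8; the function name `utf8_encode` is hypothetical, it returns bytes rather than a bit vector, and it does not reject surrogates or other invalid code points.

```python
def utf8_encode(code_point):
    """Encode one code point into UTF-8 bytes using bit-count thresholds 7, 11, 16, 21."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    bits = code_point.bit_length()
    if bits <= 7:
        # 0xxxxxxx
        return bytes([code_point])
    if bits <= 11:
        # 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0x3F)])
    if bits <= 16:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0x3F),
                      0b10000000 | (code_point & 0x3F)])
    if bits <= 21:
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0b11110000 | (code_point >> 18),
                      0b10000000 | ((code_point >> 12) & 0x3F),
                      0b10000000 | ((code_point >> 6) & 0x3F),
                      0b10000000 | (code_point & 0x3F)])
    raise ValueError("code point needs more than 21 bits")


# Compare against Python's built-in encoder for a few sample code points,
# including 8592 (the left arrow used for APL assignment).
for cp in (65, 168, 8592, 9079, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

Comparing against a built-in encoder like this is also a convenient way to validate files produced by AplToUtf8.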
The function FileData is a simple utility function to file the result. I am sure you all have your own versions.
Utf8ToApl is the reverse function. It assumes that the Unicode resides in a native text file.
∇ r←Utf8ToApl;v
  ⍝Tie the native file containing the unicode
  'c:\unicode.txt' ⎕ntie ¯1
  ⍝Read the bits from the file
  v←⎕nread ¯1 11,(⎕nsize ¯1),0
  ⍝Untie the file
  ⎕nuntie ¯1
  ⍝Initialise the results vector
  r←0⍴0
  ⍝Convert the bits to integers
  v←2⊥⍉((.125×⍴v),8)⍴v
  ⍝Strip off the encoding header if present
  :if 617=+/3↑v
      v←3↓v
  :endif
  ⍝Decode the unicode bytes back to integers in accordance with the UTF-8 specification
  :while 0≠⍴v
      ⍝Determine how many bytes represent the next character
      :select +/(↑v)>0 127 223 239
      :case 1
          r←r,2⊥1↓(8⍴2)⊤v[1]
          v←1↓v
      :case 2
          r←r,2⊥(3↓(8⍴2)⊤v[1]),2↓(8⍴2)⊤v[2]
          v←2↓v
      :case 3
          r←r,2⊥(4↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),2↓(8⍴2)⊤v[3]
          v←3↓v
      :case 4
          r←r,2⊥(5↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),(2↓(8⍴2)⊤v[3]),2↓(8⍴2)⊤v[4]
          v←4↓v
      :endselect
  :endwhile
  ⍝Convert unicode integers back to ⎕av characters
  r←⎕av[(∆avutf8⍳r)∼11]
∇
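The decoding loop can be sketched in Python as follows. The sequence length is taken from the lead byte with the same thresholds (127, 223, 239), the marker bits are masked off, and the remaining bits are concatenated. This is an illustrative sketch, not a port: `utf8_decode` is a made-up name, a zero byte is treated as a one-byte sequence, and continuation bytes are not validated. One deliberate difference is that the header is detected by comparing the first three bytes with EF BB BF, because a test on their sum alone would also match other triples that total 617, such as EB BF BF.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of integer code points, skipping a leading signature."""
    if data[:3] == UTF8_BOM:
        data = data[3:]
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # 1 byte for lead <= 127, 2 for <= 223, 3 for <= 239, otherwise 4.
        length = 1 + sum(lead > limit for limit in (127, 223, 239))
        chunk = data[i:i + length]
        if len(chunk) < length:
            raise ValueError("truncated UTF-8 sequence at offset %d" % i)
        if length == 1:
            value = lead & 0x7F
        else:
            # Drop the length marker bits from the lead byte
            # (3 bits for 2-byte, 4 for 3-byte, 5 for 4-byte sequences).
            value = lead & (0xFF >> (length + 1))
            for byte in chunk[1:]:
                # Each continuation byte contributes its low 6 bits.
                value = (value << 6) | (byte & 0x3F)
        code_points.append(value)
        i += length
    return code_points


sample = UTF8_BOM + "A←B".encode("utf-8")
print(utf8_decode(sample))
```

The resulting integers correspond to the vector r in Utf8ToApl just before the final lookup in ∆avutf8.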
∆avutf8 is a vector that maps each APL+WIN ⎕AV position to its Unicode code point.
0 1 2 9079 9674 168 8592 7 8 9 10 8834 12 13 8835 9055 16 17 18 9067 20 21 9068 9077 8593 8595 8594 27 8867 8866 9035 9042 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 166 125 8764 127 199 252 233 226 228 224 8800 231 234 235 232 239 238 8968 196 8970 201 8710 215 244 246 9109 251 9054 9017 214 220 162 163 63 9066 9064 225 237 243 250 241 209 9053 9024 191 9015 337 248 253 161 171 187 9109 9109 9109 124 124 124 124 43 43 124 124 43 43 43 43 43 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 45 209 210 211 212 213 214 43 216 217 218 219 220 221 124 255 9082 223 9075 9060 227 9073 8869 8868 9021 8854 9074 9023 8711 9033 8714 9067 8801 9049 8805 8804 9045 9038 247 34 8728 9675 8744 9076 8745 175 124 0
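To make the role of the translation vector concrete, the Python sketch below performs the two lookups on a short excerpt of it: position to code point (as in ∆avutf8[⎕av⍳,f]) and code point back to position (as in ∆avutf8⍳r). It assumes 1-origin positions, as in default APL indexing, and the function names are hypothetical. Only the first seven entries of the vector are used.

```python
# Excerpt only: the first seven entries of the vector above.
AVUTF8_EXCERPT = [0, 1, 2, 9079, 9674, 168, 8592]


def positions_to_code_points(positions, table):
    """Map 1-origin atomic-vector positions to Unicode code points."""
    return [table[p - 1] for p in positions]


def code_points_to_positions(code_points, table):
    """Map code points to the 1-origin position of their first occurrence.

    Like APL dyadic iota, a value that is not found maps to len(table) + 1.
    """
    result = []
    for cp in code_points:
        try:
            result.append(table.index(cp) + 1)
        except ValueError:
            result.append(len(table) + 1)
    return result


print(positions_to_code_points([4, 7], AVUTF8_EXCERPT))
print(code_points_to_positions([8592, 999], AVUTF8_EXCERPT))
```

Because the reverse lookup returns the first match, values that occur more than once in the full vector (43 and 124, for example) do not round-trip to their original positions.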
I used the functions to convert themselves into Unicode for the wiki. They should readily translate to any interpreter if they are not usable directly.
Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. To get you started, I have reproduced the APL+WIN atomic vector below. Another excellent resource is Adrian Smith's article in Vector: http://www.vector.org.uk/resource/uniref.pdf.
⍷◊¨← ⊂ ⊃⍟åæì⍫ÙÒ⍬⍵↑↓→ ⊣⊢⍋⍒ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{¦}∼ Çüéâäà≠çêëèïî⌈Ä⌊É∆×ôö⎕û⍞⌹ÖÜ¢£?⍪⍨áíóúñÑ⍝⍀¿⌷őøý¡«»⎕⎕⎕||||++||+++++ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ-ÑÒÓÔÕÖ+ØÙÚÛÜÝ|ÿ⍺ß⍳⍤ã⍱⊥⊤⌽⊖⍲⌿∇⍉∊∩≡⍙≥≤⍕⍎÷"∘○∨⍴∪¯|
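If the character set of another interpreter is available as a Unicode string, ordered by atomic-vector position, a first draft of its translation vector is simply the list of code points of that string. The Python sketch below shows this for four characters taken from the start of the display above; the name `translation_vector` is hypothetical. Note that control characters appear as blanks in such a display, so those entries would still have to be filled in by hand.

```python
def translation_vector(characters):
    """Return the Unicode code point of each character, in order."""
    return [ord(ch) for ch in characters]


# Four glyphs from the start of the displayed atomic vector, written as escapes
# to avoid any ambiguity: epsilon-underbar, lozenge, diaeresis, left arrow.
excerpt = "\u2377\u25ca\u00a8\u2190"
print(translation_vector(excerpt))
```

The four values produced match the corresponding entries of ∆avutf8 (9079 9674 168 8592).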
I could not resist the challenge when one reader commented that these functions were not very "APL like", so I created a new set at AplToUnicodeII.
Author: GrahamSteer