APL+WIN to Unicode - Work in Progress
At present (2008-12-22) it isn't possible to copy and paste Unicode text to or from APL+WIN, at least up to version 5.
I am developing a set of functions to enable this facility. The forward functions from APL+WIN to Unicode UTF-8 are below. The reverse functions will follow shortly.
AplToUtf8 takes the name of a function and converts its code to UTF-8-encoded Unicode. At the moment it handles only whole functions, but it can easily be generalised to accept any character string.
∇ AplToUtf8 f
  ⍝Get a character representation of the function
  f←⎕cr f
  ⍝Append newline (⎕tcnl) and line feed (⎕tclf) characters to each line
  f←(f,⎕tcnl),⎕tclf
  ⍝Convert each character to the bits of its UTF-8 encoding
  f←∊Utf8 ¨∆avutf8[⎕av⍳,f]
  ⍝Prepend the UTF-8 byte order mark (EF BB BF) and convert the bits back to characters
  f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f
  ⍝File the character stream
  f FileData 'c:\test.txt'
∇
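For anyone who wants to check the output of AplToUtf8 outside APL+WIN, the same pipeline can be sketched in Python. This is only an illustration, not part of the workspace: the function name `av_to_utf8` and the small `demo_table` are assumptions made up for the example, and positions are counted from 0 rather than from 1.

```python
import codecs

def av_to_utf8(av_indices, table):
    """Translate 0-based atomic-vector positions to UTF-8 bytes with a BOM.

    `table` plays the role of the translation vector: table[i] is the
    Unicode code point for atomic-vector position i.
    """
    text = "".join(chr(table[i]) for i in av_indices)
    # codecs.BOM_UTF8 is the three bytes EF BB BF, the same header that
    # AplToUtf8 prepends as 24 bits.
    return codecs.BOM_UTF8 + text.encode("utf-8")

# Illustrative table only (an assumption, not the real vector):
# identity for 0..127, '?' (63) elsewhere, plus one entry taken from the
# vector on this page (position 6 -> 8592, the left arrow).
demo_table = list(range(128)) + [63] * 128
demo_table[6] = 8592

# 'f', the left arrow, '1', then carriage return and line feed.
encoded = av_to_utf8([102, 6, 49, 13, 10], demo_table)
print(encoded.hex(" "))
```

The left arrow (code point 8592) becomes the three bytes E2 86 90, so the file starts with the header EF BB BF followed by 66 E2 86 90 31 0D 0A.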
Utf8, which AplToUtf8 applies to each code point with the each operator (¨), implements the UTF-8 specification: it returns the bits of the one to four bytes that encode a single code point.
∇ r←Utf8 c
  ⍝Determine the number of bytes required to represent the character in UTF-8
  ⍝(1⌈ keeps code point 0, which has no bits set, at one byte)
  r←1⌈+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16
  ⍝Convert the character to bytes according to the UTF-8 specification
  :Select r
  :Case 1
    r←⍎⍕0,(7⍴2)⊤c
  :Case 2
    r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
  :Case 3
    r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
  :Case 4
    r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
  :EndSelect
∇
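The byte layout that Utf8 builds can be cross-checked against an independent implementation. The Python sketch below is an assumption-laden illustration (the name `utf8_encode_codepoint` is invented here): it uses the same thresholds of 0, 7, 11 and 16 significant bits to choose the byte count, then fills the continuation bytes six bits at a time.

```python
def utf8_encode_codepoint(cp):
    """Encode one code point (0..0x10FFFF) as UTF-8 bytes."""
    # Number of bytes: count how many thresholds the bit length exceeds.
    # max(1, ...) covers code point 0, whose bit length is 0.
    n = max(1, sum(cp.bit_length() > t for t in (0, 7, 11, 16)))
    if n == 1:
        return bytes([cp])                      # 0xxxxxxx
    lead_prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[n]  # 110, 1110, 11110
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))          # 10xxxxxx, low six bits
        cp >>= 6
    out.append(lead_prefix | cp)                # remaining high bits
    return bytes(reversed(out))

# Compare with Python's built-in encoder at the boundaries of each size.
for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0x2190, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_encode_codepoint(cp) == chr(cp).encode("utf-8")
```

Surrogate code points (0xD800 to 0xDFFF) are left out of the comparison because Python's encoder rejects them; none of them occur in the translation vector on this page.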
∆avutf8, below, is a vector that maps the APL+WIN ⎕AV positions to their Unicode code points. The many 63 entries are placeholders (63 is the code point of the question mark); anyone who knows the correct Unicode code points for those positions in the APL+WIN atomic vector should feel free to add them. Please also feel free to correct any errors in the vector that show up when using it with the above functions.
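One way to look for gaps and possible errors in such a vector is to list the placeholder entries and any code point that occurs more than once. The Python sketch below is a hypothetical helper, not part of the workspace. For brevity it runs on the first 32 and last 32 entries of the vector as listed on this page, so the offsets it reports are positions within that 64-entry excerpt, not ⎕AV positions.

```python
from collections import defaultdict

PLACEHOLDER = 63  # code point of '?', used on this page for unknown entries

def audit_table(table, question_mark_pos=None):
    """Return (offsets holding the placeholder, duplicated code points).

    question_mark_pos, if given, is the one offset where 63 is the
    genuine mapping (the real question mark) and should not be reported.
    """
    unmapped = [i for i, cp in enumerate(table)
                if cp == PLACEHOLDER and i != question_mark_pos]
    seen = defaultdict(list)
    for i, cp in enumerate(table):
        if cp != PLACEHOLDER:
            seen[cp].append(i)
    duplicates = {cp: pos for cp, pos in seen.items() if len(pos) > 1}
    return unmapped, duplicates

# First 32 and last 32 entries of the vector as listed on this page.
head = [0, 1, 2, 9079, 9674, 168, 8592, 7, 8, 9, 10, 8834, 12, 13, 8835,
        9055, 16, 17, 18, 9067, 20, 21, 9068, 9077, 8593, 8595, 8594, 27,
        8867, 8866, 9035, 9042]
tail = [9082, 63, 9075, 63, 63, 9073, 8869, 8868, 9021, 8854, 9074, 9023,
        8711, 9033, 8714, 9067, 8801, 9049, 8805, 8804, 9045, 9038, 247,
        63, 8728, 9675, 8744, 9076, 63, 175, 124, 63]

unmapped, duplicates = audit_table(head + tail)
print("placeholder offsets:", unmapped)
print("duplicated code points:", duplicates)
```

On this excerpt the check reports code point 9067 at two offsets (19 and 47). Two positions mapping to the same code point cannot be told apart by a reverse translation, so one of the two entries may deserve a second look; which one, if either, is wrong is not something this sketch can decide.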
I used the functions to convert themselves into Unicode for the wiki. I will shortly be posting the reverse functions. Anyone wishing to create their own versions for another interpreter need only concern themselves with AplToUtf8 and the translation vector from their own atomic vector to Unicode. Utf8 is trivial and should work with any interpreter that supports control structures.
The function FileData is a simple utility function to file the result. I am sure you all have your own versions but for completeness I might list it when the more important stuff is finished.
0 1 2 9079 9674 168 8592 7 8 9 10 8834 12 13 8835 9055 16 17 18 9067 20 21 9068 9077 8593 8595 8594 27 8867 8866 9035 9042 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 8764 127 63 63 63 63 63 63 8800 63 63 63 63 63 63 8968 63 8970 63 8710 215 63 63 9109 63 9054 9017 63 63 63 63 63 9066 63 63 63 63 63 63 63 9053 9024 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 9082 63 9075 63 63 9073 8869 8868 9021 8854 9074 9023 8711 9033 8714 9067 8801 9049 8805 8804 9045 9038 247 63 8728 9675 8744 9076 63 175 124 63