APL+WIN to Unicode - Work in Progress
At present (2008-12-22) it isn't possible to copy and paste Unicode text to or from APL+WIN, at least up to version 5.
I am developing a set of functions to enable this facility. The current prototypes are below. The aim is to cut out the file stage and work directly via the clipboard.
AplToUtf8 takes the name of a function and converts its code to Unicode UTF-8 encoding. At the moment it only handles whole functions, but it can easily be generalised to work with any character string. For a quick and dirty job, comment out the first two lines of working code and it will accept simple character input.
∇ AplToUtf8 f
⍝Get a character representation of the function
f←⎕cr f
⍝Append new line and carriage return characters
f←(f,⎕tcnl),⎕tclf
⍝Convert each character to its unicode binary value
f←∊Utf8 ¨∆avutf8[⎕av⍳,f]
⍝Add the encoding level header and convert back to ascii characters
f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f
⍝File the character stream
f FileData 'c:\test.txt'
∇
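The 24-bit header that AplToUtf8 prepends is the UTF-8 byte order mark, bytes EF BB BF. As a cross-check outside APL, the short Python sketch below packs the same bits into bytes and compares them with the standard library's constant. Python is used purely for illustration, and `bits_to_bytes` is a hypothetical helper name, not part of the APL code.

```python
import codecs

# The 24 header bits exactly as they appear in AplToUtf8.
HEADER_BITS = [1, 1, 1, 0, 1, 1, 1, 1,
               1, 0, 1, 1, 1, 0, 1, 1,
               1, 0, 1, 1, 1, 1, 1, 1]


def bits_to_bytes(bits):
    """Pack a flat list of 0/1 values (most significant bit first) into bytes."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for b in bits[i:i + 8]:
            value = value * 2 + b
        out.append(value)
    return bytes(out)


header = bits_to_bytes(HEADER_BITS)
print(header.hex())                 # efbbbf
print(header == codecs.BOM_UTF8)    # True
```

Packing bits eight at a time, most significant first, is the step that `82 ⎕dr` appears to perform on the whole bit stream in the APL version.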
Utf8, which AplToUtf8 applies to each character with each (¨), simply implements the UTF-8 specification: it builds the UTF-8 bit pattern for a single code point.
∇ r←Utf8 c
⍝Determine the number of bytes required to represent the character in unicode
r←+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16
⍝Convert the character to bytes according to the UTF-8 specification
:Select r
:Case 1
    r←⍎⍕0,(7⍴2)⊤c
:Case 2
    r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
:Case 3
    r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
:Case 4
    r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
:EndSelect
∇
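For anyone porting Utf8 to another interpreter, it can help to have an independent reference to test against. The Python sketch below follows the same rule: the position of the highest set bit (at most 7, 11, 16 or 21 bits) selects a 1, 2, 3 or 4 byte layout. The name `utf8_encode` is hypothetical, and the function returns bytes rather than the bit vector the APL version produces.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    nbits = cp.bit_length()  # position of the highest set bit (0 for cp == 0)
    if nbits <= 7:           # 0xxxxxxx
        return bytes([cp])
    if nbits <= 11:          # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if nbits <= 16:          # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# 8592 is the code point of the APL assignment arrow.
print(utf8_encode(8592).hex())                      # e28690
print(utf8_encode(8592) == "←".encode("utf-8"))     # True
```

Note that this sketch encodes code point 0 as a single zero byte; it is worth checking how a port behaves for that input, since no bit is set and the highest-bit test has nothing to find.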
∆avutf8, below, is a vector that maps APL+WIN ⎕AV positions to their Unicode code points. The many entries of 63 stand for question marks; anyone who knows the correct code points for those places in the APL+WIN atomic vector should feel free to add them. Please also feel free to correct any errors in the vector that show up when using it with the above functions.
I used the functions to convert themselves into Unicode for the wiki. Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. The functions should translate readily if they are not directly usable.
The function FileData is a simple utility that writes the result to a file. I am sure you all have your own versions, but for completeness I might list it when the more important stuff is finished.
0 1 2 9079 9674 168 8592 7 8 9 10 8834 12 13 8835 9055 16 17 18 9067 20 21 9068 9077 8593 8595 8594 27 8867 8866 9035 9042 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 8764 127 63 63 63 63 63 63 8800 63 63 63 63 63 63 8968 63 8970 63 8710 215 63 63 9109 63 9054 9017 63 63 63 63 63 9066 63 63 63 63 63 63 63 9053 9024 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 63 9082 63 9075 63 63 9073 8869 8868 9021 8854 9074 9023 8711 9033 8714 9067 8801 9049 8805 8804 9045 9038 247 63 8728 9675 8744 9076 63 175 124 63
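The way the vector is used in both directions can be sketched in Python. This is an illustration only: it covers just the first 128 entries of the vector above (32 hand-mapped positions, then positions 32 to 125 mapping to themselves, then 8764 and 127), the names `AV_TO_CODEPOINT`, `av_to_text` and `text_to_av` are hypothetical, and Python indexes from 0 where the APL code depends on the index origin in force.

```python
# First 32 entries of the translation vector, copied from the listing above.
AV_HEAD = [0, 1, 2, 9079, 9674, 168, 8592, 7,
           8, 9, 10, 8834, 12, 13, 8835, 9055,
           16, 17, 18, 9067, 20, 21, 9068, 9077,
           8593, 8595, 8594, 27, 8867, 8866, 9035, 9042]

# Entries 32..125 map to themselves; entries 126 and 127 are 8764 and 127.
AV_TO_CODEPOINT = AV_HEAD + list(range(32, 126)) + [8764, 127]


def av_to_text(indices):
    """Forward direction: zero-based atomic vector positions to Unicode text."""
    return "".join(chr(AV_TO_CODEPOINT[i]) for i in indices)


def text_to_av(text):
    """Reverse direction: first matching position, like dyadic iota.

    Raises ValueError for a character that is not in the table.
    """
    return [AV_TO_CODEPOINT.index(ord(ch)) for ch in text]


print(av_to_text([6, 65]))     # ←A
print(text_to_av("←A"))        # [6, 65]
```

Because the reverse lookup takes the first match, a code point of 63 in the full vector resolves to the position of the genuine question mark, so characters still mapped to 63 come back as "?" rather than as their original glyph.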
Here is the first prototype reverse function, Utf8ToApl. It assumes that the Unicode text resides in a native text file.
∇ r←Utf8ToApl;v
⍝Tie the native file containing the unicode
'c:\test.txt' ⎕ntie ¯1
⍝Read the bits from the file
v←⎕nread ¯1 11,(⎕nsize ¯1),0
⍝Untie the file
⎕nuntie ¯1
⍝Initialise the results vector
r←0⍴0
⍝Convert the bits to integers
v←2⊥⍉((.125×⍴v),8)⍴v
⍝Strip off the encoding header if present
:if 617=+/3↑v
    v←3↓v
:endif
⍝Decode the unicode bytes back to integers in accordance with the UTF-8 specification
:while 0≠⍴v
    ⍝Determine how many bytes represent the next character
    :select +/(↑v)>0 127 223 239
    :case 1
        r←r,2⊥1↓(8⍴2)⊤v[1]
        v←1↓v
    :case 2
        r←r,2⊥(3↓(8⍴2)⊤v[1]),2↓(8⍴2)⊤v[2]
        v←2↓v
    :case 3
        r←r,2⊥(4↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),2↓(8⍴2)⊤v[3]
        v←3↓v
    :case 4
        r←r,2⊥(5↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),(2↓(8⍴2)⊤v[3]),2↓(8⍴2)⊤v[4]
        v←4↓v
    :endselect
:endwhile
⍝Convert unicode integers back to ⎕av characters
r←⎕av[(∆avutf8⍳r)∼11]
∇
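The decoding loop in Utf8ToApl can be mirrored in Python as a reference for testing ports. The sketch below uses the same lead-byte thresholds (127, 223, 239) to decide how many bytes make up each character, then strips the marker bits and joins the payload bits. The name `utf8_decode` is hypothetical; unlike the APL version it compares the header bytes directly instead of testing that the first three bytes sum to 617, and it does no validation of malformed input.

```python
BOM = b"\xef\xbb\xbf"  # the three header bytes; 239 + 187 + 191 = 617


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code points (assumes well-formed input)."""
    if data[:3] == BOM:
        data = data[3:]
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 127:        # 0xxxxxxx: one byte, 7 payload bits
            n, cp = 1, lead
        elif lead <= 223:      # 110xxxxx: two bytes, 5 payload bits in the lead
            n, cp = 2, lead & 0x1F
        elif lead <= 239:      # 1110xxxx: three bytes, 4 payload bits in the lead
            n, cp = 3, lead & 0x0F
        else:                  # 11110xxx: four bytes, 3 payload bits in the lead
            n, cp = 4, lead & 0x07
        for b in data[i + 1:i + n]:
            cp = (cp << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
        out.append(cp)
        i += n
    return out


sample = BOM + "r←Utf8 c".encode("utf-8")
print(utf8_decode(sample))  # [114, 8592, 85, 116, 102, 56, 32, 99]
```

A zero byte is treated here as an ordinary one-byte character, which is a case worth testing in any port because the `>0` threshold alone does not classify it.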