APL to Unicode - A more APL-like version?

Whilst the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode capable.

Currently the two functions involved assume that a native text file, c:\unicode.txt, exists.

The AplToUtf8 function converts a character string in the workspace to UTF-8 and writes it to the text file. The Utf8ToApl function reads the text file and converts the UTF-8 back to characters in the workspace.

The Unicode text can be copied from the text file and pasted into e-mails, newsgroup posts, other Unicode-enabled APL interpreters and so on. Conversely, anything received in Unicode via such routes can be copied and pasted into the text file for conversion back for use in the workspace. The easiest way to achieve this is to create the original text file in Microsoft Notepad with the font set to APL385 Unicode and the encoding set to UTF-8.

My original aim was to work directly via the clipboard, but the amount of APL code required to manage the Windows clipboard is too large to display here.

AplToUtf8

 ∇  AplToUtf8 c                                               
                                                          
⍝ Index into the translation nested vector via ⎕av         
⍝ inserting a question mark for any invalid character      
  c←(∆avutf8,⊂63 0 0)[⎕av⍳c]                                   
                                                          
⍝ Remove any padding zeros, prepend the UTF-8 signature (239 187 191) and convert to bits
  c←⍎⍕,⍉(8⍴2)⊤239 187 191,∊(+/¨c>0)↑¨c                      
                                                          
⍝ Clear the text file and append the bit stream    
  'c:\unicode.txt' ⎕ntie ¯1                                 
  0 ⎕nresize ¯1                                             
  c ⎕nappend ¯1                                             
  ⎕nuntie ¯1                                                

To convert a function in an APL+WIN workspace to Unicode, one would execute:

    AplToUtf8 ((⎕cr 'AplToUtf8'),⎕tcnl),⎕tclf
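
For readers porting the idea to a tool other than APL, the encoding step above can be sketched in Python. This is only an illustration under stated assumptions: the names apl_to_utf8 and SAMPLE_TABLE are hypothetical, the table is a three-entry excerpt keyed by character rather than by ⎕AV position, and the bytes are returned rather than written to c:\unicode.txt.

```python
BOM = (239, 187, 191)  # UTF-8 signature placed at the start of the output

# Hypothetical three-entry excerpt of the translation table (assumption:
# keyed by character here, whereas the APL version indexes by ⎕AV position).
SAMPLE_TABLE = {"A": (65, 0, 0), "←": (226, 134, 144), "¨": (194, 168, 0)}
UNKNOWN = (63, 0, 0)  # question mark for any character not in the table


def apl_to_utf8(text, table=SAMPLE_TABLE):
    """Translate text to UTF-8 bytes via zero-padded byte triplets."""
    out = list(BOM)
    for ch in text:
        triplet = table.get(ch, UNKNOWN)
        # Keep as many leading bytes as there are non-zero entries,
        # which removes the padding zeros.
        count = sum(1 for b in triplet if b > 0)
        out.extend(triplet[:count])
    return bytes(out)


print(apl_to_utf8("A←¨"))
```

Note that an all-zero triplet contributes no bytes at all, so a NUL character disappears from the output; the APL expression (+/¨c>0)↑¨c behaves the same way.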

Utf8ToApl

 ∇ r←Utf8ToApl;n                                                      
                                                                   
⍝ Read the unicode text file as a bit stream                 
  'c:\unicode.txt' ⎕ntie ¯1                                          
  r←⎕nread ¯1 11,(n←⎕nsize ¯1),0                                     
  ⎕nuntie ¯1                                                         
                                                                   
⍝ Form a bit matrix with one byte per row                           
  r←(n,8)⍴r                                                          
                                                                   
⍝ Determine the start byte of each character                        
  n←⍎⍕0≠(0=r[;1])+(2=+/r[;⍳2])+3=+/r[;⍳3]                            
                                                                   
⍝ Convert the bytes for each character to three integers            
⍝ padding as necessary with zeros and form a nested vector          
  r←3↑¨n ⎕penclose 2⊥⍉r                                              
                                                                   
⍝ Index into ⎕av via the translation nested vector dropping any file
⍝ header and inserting a question mark for any invalid character    
  r←(⎕av,'?')[∆avutf8⍳r~⊂239 187 191]

To convert a Unicode function from the text file and display it in an APL+WIN workspace, one would execute:

    Utf8ToApl~⎕tclf
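
The decoding steps (find the start byte of each character, group the bytes, pad each group to three integers, then look it up) can be sketched in Python as follows. The names utf8_to_apl and SAMPLE_TABLE are hypothetical, the table is a small excerpt, and the function takes bytes directly instead of reading c:\unicode.txt.

```python
BOM = (239, 187, 191)  # UTF-8 signature that may appear at the start of the file

# Hypothetical excerpt of the reverse mapping: byte triplet -> character.
SAMPLE_TABLE = {(65, 0, 0): "A", (226, 134, 144): "←", (194, 168, 0): "¨"}


def utf8_to_apl(data, table=SAMPLE_TABLE):
    """Translate UTF-8 bytes to text via zero-padded byte triplets."""
    groups = []
    for byte in data:
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if byte & 0xC0 != 0x80 or not groups:
            groups.append([byte])
        else:
            groups[-1].append(byte)
    # Pad every group with zeros (or truncate it) to exactly three integers.
    triplets = [tuple((g + [0, 0, 0])[:3]) for g in groups]
    # Drop the signature and substitute '?' for anything not in the table.
    return "".join(table.get(t, "?") for t in triplets if t != BOM)


print(utf8_to_apl(b"\xef\xbb\xbfA\xe2\x86\x90\xc2\xa8"))
```

Testing for "not a continuation byte" is equivalent to the three bit tests in the APL line that computes n, since every UTF-8 byte begins with 0, 10 or 11.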

Both functions use the nested vector ∆avutf8 to map the APL+WIN ⎕AV positions to their Unicode code points. Each code point is expressed as one element of the nested vector: three integers giving the bytes of its UTF-8 encoding (see http://en.wikipedia.org/wiki/UTF-8). For single-byte and double-byte characters, zeros are appended to make up the triplet.
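
As a cross-check on the byte values, a triplet can be derived from a character with any UTF-8 encoder. The Python sketch below uses a hypothetical helper, utf8_triplet, and assumes the character lies at or below U+FFFF, since anything higher needs four bytes and does not fit the scheme.

```python
def utf8_triplet(ch):
    """Return the UTF-8 bytes of one character, zero-padded to three integers."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    encoded = ch.encode("utf-8")
    if len(encoded) > 3:
        # Code points above U+FFFF take four bytes and cannot be a triplet.
        raise ValueError(f"{ch!r} needs {len(encoded)} bytes")
    padded = encoded + bytes(3 - len(encoded))
    return tuple(padded)


print(utf8_triplet("A"))  # single-byte character
print(utf8_triplet("¨"))  # double-byte character
print(utf8_triplet("⍷"))  # triple-byte character
```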

       0 0 0        1 0 0        2 0 0  226 141 183
 226 139 132    194 168 0  226 134 144        7 0 0
       8 0 0        9 0 0       10 0 0  226 138 130
      12 0 0       13 0 0  226 138 131  226 141 159
   195 165 0    195 166 0    195 172 0  226 141 171
   195 153 0    195 146 0  226 141 172  226 141 181
 226 134 145  226 134 147  226 134 146       27 0 0
 226 138 163  226 138 162  226 141 139  226 141 146
      32 0 0       33 0 0       34 0 0       35 0 0
      36 0 0       37 0 0       38 0 0       39 0 0
      40 0 0       41 0 0       42 0 0       43 0 0
      44 0 0       45 0 0       46 0 0       47 0 0
      48 0 0       49 0 0       50 0 0       51 0 0
      52 0 0       53 0 0       54 0 0       55 0 0
      56 0 0       57 0 0       58 0 0       59 0 0
      60 0 0       61 0 0       62 0 0       63 0 0
      64 0 0       65 0 0       66 0 0       67 0 0
      68 0 0       69 0 0       70 0 0       71 0 0
      72 0 0       73 0 0       74 0 0       75 0 0
      76 0 0       77 0 0       78 0 0       79 0 0
      80 0 0       81 0 0       82 0 0       83 0 0
      84 0 0       85 0 0       86 0 0       87 0 0
      88 0 0       89 0 0       90 0 0       91 0 0
      92 0 0       93 0 0       94 0 0       95 0 0
      96 0 0       97 0 0       98 0 0       99 0 0
     100 0 0      101 0 0      102 0 0      103 0 0
     104 0 0      105 0 0      106 0 0      107 0 0
     108 0 0      109 0 0      110 0 0      111 0 0
     112 0 0      113 0 0      114 0 0      115 0 0
     116 0 0      117 0 0      118 0 0      119 0 0
     120 0 0      121 0 0      122 0 0      123 0 0
   194 166 0      125 0 0      126 0 0      127 0 0
   195 135 0    195 188 0    195 169 0    195 162 0
   195 164 0    195 160 0  226 137 160    195 167 0
   195 170 0    195 171 0    195 168 0    195 175 0
   195 174 0  226 140 136    195 132 0  226 140 138
   195 137 0  226 136 134    195 151 0    195 180 0
   195 182 0  226 142 149    195 187 0  226 141 158
 226 140 185    195 150 0    195 156 0    194 162 0
   194 163 0       63 0 0  226 141 170  226 141 168
   195 161 0    195 173 0    195 179 0    195 186 0
   195 177 0    195 145 0  226 141 157  226 141 128
   194 191 0  226 140 183    197 145 0    195 184 0
   195 189 0    194 161 0    194 171 0    194 187 0
 226 142 149  226 142 149  226 142 149      124 0 0
     124 0 0      124 0 0      124 0 0       43 0 0
      43 0 0      124 0 0      124 0 0       43 0 0
      43 0 0       43 0 0       43 0 0       43 0 0
   195 128 0    195 129 0    195 130 0    195 131 0
   195 132 0    195 133 0    195 134 0    195 135 0
   195 136 0    195 137 0    195 138 0    195 139 0
   195 140 0    195 141 0    195 142 0    195 143 0
      45 0 0    195 145 0    195 146 0    195 147 0
   195 148 0    195 149 0    195 150 0       43 0 0
   195 152 0    195 153 0    195 154 0    195 155 0
   195 156 0    195 157 0      124 0 0    195 191 0
 226 141 186    195 159 0  226 141 179  226 141 164
   195 163 0  226 141 177  226 138 165  226 138 164
 226 140 189  226 138 150  226 141 178  226 140 191
 226 136 135  226 141 137  226 136 138  226 136 169
 226 137 161  226 141 153  226 137 165  226 137 164
 226 141 149  226 141 142    195 183 0       34 0 0
 226 136 152  226 151 139  226 136 168  226 141 180
 226 136 170    194 175 0      124 0 0        0 0 0

I used the functions to convert themselves into Unicode for the wiki. If they are not usable directly, they should translate readily to any other interpreter.

Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. To get you started, I have reproduced below the mapping from the APL+WIN atomic vector to the translation vector. Another excellent resource is Adrian Smith's article in Vector http://www.vector.org.uk/resource/uniref.pdf.
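
When building such a table by hand it is easy to pair a glyph with the bytes of a look-alike character. A possible sanity check, sketched in Python with a hypothetical helper mismatched_entries and made-up sample rows (the third row is deliberately wrong), is to re-encode each glyph and compare the result with the listed triplet:

```python
def mismatched_entries(entries):
    """Return (glyph, listed, actual) for rows whose bytes do not encode the glyph."""
    bad = []
    for glyph, listed in entries:
        raw = glyph.encode("utf-8")
        # Zero-pad to three bytes; a four-byte encoding stays as it is
        # and therefore never matches a triplet.
        actual = tuple(raw.ljust(3, b"\0"))
        if actual != tuple(listed):
            bad.append((glyph, tuple(listed), actual))
    return bad


# Sample rows for illustration only: the lozenge is paired with the bytes
# of the diamond operator, so it should be reported.
SAMPLE_ROWS = [
    ("⍷", (226, 141, 183)),
    ("¨", (194, 168, 0)),
    ("◊", (226, 139, 132)),
]

for glyph, listed, actual in mismatched_entries(SAMPLE_ROWS):
    print(glyph, "listed", listed, "but encodes as", actual)
```

Positions that are intentionally mapped to a substitute character will be reported as well if the glyph column shows the original rather than the substitute, so the output is a list of entries to review rather than a list of errors.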

NUL       0 0 0   SOH       1 0 0   STX       2 0 0    ⍷  226 141 183 
 ⋄  226 139 132    ¨    194 168 0    ←  226 134 144   BEL       7 0 0 
BS        8 0 0   HT        9 0 0   LF       10 0 0    ⊂  226 138 130 
FF       12 0 0   NL       13 0 0    ⊃  226 138 131    ⍟  226 141 159 
 å    195 165 0    æ    195 166 0    ì    195 172 0    ⍫  226 141 171 
 Ù    195 153 0    Ò    195 146 0    ⍬  226 141 172    ⍵  226 141 181 
 ↑  226 134 145    ↓  226 134 147    →  226 134 146   ESC      27 0 0 
 ⊣  226 138 163    ⊢  226 138 162    ⍋  226 141 139    ⍒  226 141 146 
Space    32 0 0    !       33 0 0    "       34 0 0    #       35 0 0 
 $       36 0 0    %       37 0 0    &       38 0 0    '       39 0 0 
 (       40 0 0    )       41 0 0    *       42 0 0    +       43 0 0 
 ,       44 0 0    -       45 0 0    .       46 0 0    /       47 0 0 
 0       48 0 0    1       49 0 0    2       50 0 0    3       51 0 0 
 4       52 0 0    5       53 0 0    6       54 0 0    7       55 0 0 
 8       56 0 0    9       57 0 0    :       58 0 0    ;       59 0 0 
 <       60 0 0    =       61 0 0    >       62 0 0    ?       63 0 0 
 @       64 0 0    A       65 0 0    B       66 0 0    C       67 0 0 
 D       68 0 0    E       69 0 0    F       70 0 0    G       71 0 0 
 H       72 0 0    I       73 0 0    J       74 0 0    K       75 0 0 
 L       76 0 0    M       77 0 0    N       78 0 0    O       79 0 0 
 P       80 0 0    Q       81 0 0    R       82 0 0    S       83 0 0 
 T       84 0 0    U       85 0 0    V       86 0 0    W       87 0 0 
 X       88 0 0    Y       89 0 0    Z       90 0 0    [       91 0 0 
 \       92 0 0    ]       93 0 0    ^       94 0 0    _       95 0 0 
 `       96 0 0    a       97 0 0    b       98 0 0    c       99 0 0 
 d      100 0 0    e      101 0 0    f      102 0 0    g      103 0 0 
 h      104 0 0    i      105 0 0    j      106 0 0    k      107 0 0 
 l      108 0 0    m      109 0 0    n      110 0 0    o      111 0 0 
 p      112 0 0    q      113 0 0    r      114 0 0    s      115 0 0 
 t      116 0 0    u      117 0 0    v      118 0 0    w      119 0 0 
 x      120 0 0    y      121 0 0    z      122 0 0    {      123 0 0 
 ¦    194 166 0    }      125 0 0    ~      126 0 0   DEL     127 0 0 
 Ç    195 135 0    ü    195 188 0    é    195 169 0    â    195 162 0 
 ä    195 164 0    à    195 160 0    ≠  226 137 160    ç    195 167 0 
 ê    195 170 0    ë    195 171 0    è    195 168 0    ï    195 175 0 
 î    195 174 0    ⌈  226 140 136    Ä    195 132 0    ⌊  226 140 138 
 É    195 137 0    ∆  226 136 134    ×    195 151 0    ô    195 180 0 
 ö    195 182 0    ⎕  226 142 149    û    195 187 0    ⍞  226 141 158 
 ⌹  226 140 185    Ö    195 150 0    Ü    195 156 0    ¢    194 162 0 
 £    194 163 0    ?       63 0 0    ⍪  226 141 170    ⍨  226 141 168 
 á    195 161 0    í    195 173 0    ó    195 179 0    ú    195 186 0 
 ñ    195 177 0    Ñ    195 145 0    ⍝  226 141 157    ⍀  226 141 128 
 ¿    194 191 0    ⌷  226 140 183    ő    197 145 0    ø    195 184 0 
 ý    195 189 0    ¡    194 161 0    «    194 171 0    »    194 187 0 
 ⎕  226 142 149    ⎕  226 142 149    ⎕  226 142 149    |      124 0 0 
 |      124 0 0    |      124 0 0    |      124 0 0    +       43 0 0 
 +       43 0 0    |      124 0 0    |      124 0 0    +       43 0 0 
 +       43 0 0    +       43 0 0    +       43 0 0    +       43 0 0 
 À    195 128 0    Á    195 129 0    Â    195 130 0    Ã    195 131 0 
 Ä    195 132 0    Å    195 133 0    Æ    195 134 0    Ç    195 135 0 
 È    195 136 0    É    195 137 0    Ê    195 138 0    Ë    195 139 0 
 Ì    195 140 0    Í    195 141 0    Î    195 142 0    Ï    195 143 0 
 -       45 0 0    Ñ    195 145 0    Ò    195 146 0    Ó    195 147 0 
 Ô    195 148 0    Õ    195 149 0    Ö    195 150 0    +       43 0 0 
 Ø    195 152 0    Ù    195 153 0    Ú    195 154 0    Û    195 155 0 
 Ü    195 156 0    Ý    195 157 0    |      124 0 0    ÿ    195 191 0 
 ⍺  226 141 186    ß    195 159 0    ⍳  226 141 179    ⍤  226 141 164 
 ã    195 163 0    ⍱  226 141 177    ⊥  226 138 165    ⊤  226 138 164 
 ⌽  226 140 189    ⊖  226 138 150    ⍲  226 141 178    ⌿  226 140 191 
 ∇  226 136 135    ⍉  226 141 137    ∊  226 136 138    ∩  226 136 169 
 ≡  226 137 161    ⍙  226 141 153    ≥  226 137 165    ≤  226 137 164 
 ⍕  226 141 149    ⍎  226 141 142    ÷    195 183 0    "       34 0 0 
 ∘  226 136 152    ○  226 151 139    ∨  226 136 168    ⍴  226 141 180 
 ∪  226 136 170    ¯    194 175 0    |      124 0 0   NUL       0 0 0 

Author: GrahamSteer


CategoryUnicode

AplToUnicodeII (last edited 2009-01-17 09:45:34 by anonymous)