APL to Unicode - A more APL like version??!!

Whilst the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode-capable.

Currently, the two functions involved assume that a native text file c:\unicode.txt exists.

The AplToUtf8 function converts a character string in the workspace to Unicode UTF-8 and writes it to the text file; the Utf8ToApl function reads the text file and converts the Unicode to characters in the workspace.

The Unicode can be copied from the text file and pasted into e-mails, newsgroup posts, other Unicode-enabled APL interpreters etc. Conversely, anything received in Unicode via such routes can be copied and pasted into the text file for conversion back for use in the workspace. The easiest way to achieve this is to use MS Notepad to create the original text file, with the font set to APL385 Unicode and the encoding set to UTF-8.

My original aim was to work directly via the clipboard, but the amount of APL code required to manage the Windows clipboard is too large to display here.

AplToUtf8

 ∇  AplToUtf8 c                                               
                                                          
⍝ Index into the translation nested vector via ⎕av         
⍝ inserting a question mark for any invalid character      
  c←(∆avutf8,⊂63 0 0)[⎕av⍳c]                                   
                                                          
⍝ Remove any padding zeros, prepend the UTF-8 byte order mark (239 187 191) and convert to bits
  c←⍎⍕,⍉(8⍴2)⊤239 187 191,∊(+/¨c>0)↑¨c                      
                                                          
⍝ Clear the text file and append the bit stream    
  'c:\unicode.txt' ⎕ntie ¯1                                 
  0 ⎕nresize ¯1                                             
  c ⎕nappend ¯1                                             
  ⎕nuntie ¯1                                                

 ∇

To convert a function in an APL+WIN workspace to Unicode, one would execute:

    AplToUtf8 ((⎕cr 'AplToUtf8'),⎕tcnl),⎕tclf
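For readers without APL+WIN to hand, the translation step of AplToUtf8 can be sketched in Python. This is only an illustration of the idea, not the function above: the three-entry TABLE is a hypothetical stand-in for ∆avutf8, the name apl_to_utf8 is invented for the example, and the bytes are returned rather than written to c:\unicode.txt.

```python
# Hypothetical miniature translation table: character -> zero-padded UTF-8 triplet.
# A real table would hold one triplet per ⎕av position.
TABLE = {
    "A": (65, 0, 0),
    "¨": (194, 168, 0),
    "←": (226, 134, 144),
}
UNKNOWN = (63, 0, 0)      # question mark, used for characters not in the table
BOM = (239, 187, 191)     # UTF-8 byte order mark written at the start of the file


def apl_to_utf8(text):
    """Return the UTF-8 bytes for text, prefixed with the byte order mark."""
    out = list(BOM)
    for ch in text:
        triplet = TABLE.get(ch, UNKNOWN)
        # Keep as many leading bytes as there are non-zero entries,
        # which mirrors the (+/¨c>0)↑¨c step in the APL function.
        count = sum(1 for b in triplet if b > 0)
        out.extend(triplet[:count])
    return bytes(out)
```

Calling apl_to_utf8("A←") yields the byte order mark followed by one byte for A and three bytes for the left arrow. A character missing from the table comes out as a question mark.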

Utf8ToApl

 ∇ r←Utf8ToApl;n                                                      
                                                                   
⍝ Read the unicode text file as a bit stream                 
  'c:\unicode.txt' ⎕ntie ¯1                                          
  r←⎕nread ¯1 11,(n←⎕nsize ¯1),0                                     
  ⎕nuntie ¯1                                                         
                                                                   
⍝ Form a bit matrix with one byte per row                           
  r←(n,8)⍴r                                                          
                                                                   
⍝ Determine the start byte of each character                        
  n←⍎⍕0≠(0=r[;1])+(2=+/r[;⍳2])+3=+/r[;⍳3]                            
                                                                   
⍝ Convert the bytes for each character to three integers            
⍝ padding as necessary with zeros and form a nested vector          
  r←3↑¨n ⎕penclose 2⊥⍉r                                              
                                                                   
⍝ Index into ⎕av via the translation nested vector dropping any file
⍝ header and inserting a question mark for any invalid character    
  r←(⎕av,'?')[∆avutf8⍳r∼⊂239 187 191]                                   

∇

To convert a Unicode function from the text file and display it in an APL+WIN workspace, one would execute:

    Utf8ToApl∼⎕tcnl
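The reverse direction can be sketched in Python in the same spirit. Again this is an illustration under assumptions rather than a port: TABLE is a hypothetical three-entry inverse of ∆avutf8, the names is_start_byte and utf8_to_apl are invented, and the input is a bytes value rather than the contents of c:\unicode.txt. A byte starts a character when its top bit is 0 or its top two bits are 11, which is the same test Utf8ToApl applies to the bit matrix.

```python
# Hypothetical miniature inverse table: zero-padded UTF-8 triplet -> character.
TABLE = {
    (65, 0, 0): "A",
    (194, 168, 0): "¨",
    (226, 134, 144): "←",
}
BOM = (239, 187, 191)     # UTF-8 byte order mark, dropped from the result


def is_start_byte(b):
    """True for a single-byte character or the lead byte of a longer sequence."""
    return (b & 0x80) == 0 or (b & 0xC0) == 0xC0


def utf8_to_apl(data):
    """Translate UTF-8 bytes to text via TABLE, using '?' for unknown triplets."""
    groups = []
    for b in data:
        # A stray continuation byte at the very start opens its own group;
        # that choice belongs to this sketch, not to the APL original.
        if is_start_byte(b) or not groups:
            groups.append([b])
        else:
            groups[-1].append(b)
    # Pad (or truncate) every group to exactly three integers, like 3↑¨
    triplets = [tuple((g + [0, 0])[:3]) for g in groups]
    return "".join(TABLE.get(t, "?") for t in triplets if t != BOM)
```

Any byte sequence whose triplet is not in the table is replaced by a question mark, matching the behaviour of the APL function.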

Both functions use the nested vector ∆avutf8 to map the APL+WIN ⎕AV positions to their Unicode code points. Each code point is one element of the nested vector, made up of three integers that give its byte pattern under the UTF-8 specification (see http://en.wikipedia.org/wiki/UTF-8). For single- and double-byte characters, zeros are added to make up the triplet.

       0 0 0        1 0 0        2 0 0  226 141 183
 226 151 138    194 168 0  226 134 144        7 0 0
       8 0 0        9 0 0       10 0 0  226 138 130
      12 0 0       13 0 0  226 138 131  226 141 159
      16 0 0       17 0 0       18 0 0  226 141 171
      20 0 0       21 0 0  226 141 172  226 141 181
 226 134 145  226 134 147  226 134 146       27 0 0
 226 138 163  226 138 162  226 141 139  226 141 146
      32 0 0       33 0 0       34 0 0       35 0 0
      36 0 0       37 0 0       38 0 0       39 0 0
      40 0 0       41 0 0       42 0 0       43 0 0
      44 0 0       45 0 0       46 0 0       47 0 0
      48 0 0       49 0 0       50 0 0       51 0 0
      52 0 0       53 0 0       54 0 0       55 0 0
      56 0 0       57 0 0       58 0 0       59 0 0
      60 0 0       61 0 0       62 0 0       63 0 0
      64 0 0       65 0 0       66 0 0       67 0 0
      68 0 0       69 0 0       70 0 0       71 0 0
      72 0 0       73 0 0       74 0 0       75 0 0
      76 0 0       77 0 0       78 0 0       79 0 0
      80 0 0       81 0 0       82 0 0       83 0 0
      84 0 0       85 0 0       86 0 0       87 0 0
      88 0 0       89 0 0       90 0 0       91 0 0
      92 0 0       93 0 0       94 0 0       95 0 0
      96 0 0       97 0 0       98 0 0       99 0 0
     100 0 0      101 0 0      102 0 0      103 0 0
     104 0 0      105 0 0      106 0 0      107 0 0
     108 0 0      109 0 0      110 0 0      111 0 0
     112 0 0      113 0 0      114 0 0      115 0 0
     116 0 0      117 0 0      118 0 0      119 0 0
     120 0 0      121 0 0      122 0 0      123 0 0
   194 166 0      125 0 0  226 136 188      127 0 0
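Anyone building a table like this for another interpreter needs the triplet for each code point. One possible way to derive them, sketched here in Python with the invented helper name utf8_triplet, is to let a UTF-8 encoder produce the bytes and then pad with zeros. The sketch assumes that every character of interest fits in at most three UTF-8 bytes, as all the entries above do.

```python
def utf8_triplet(code_point):
    """Return the UTF-8 bytes of code_point as three integers, zero-padded."""
    encoded = chr(code_point).encode("utf-8")
    if len(encoded) > 3:
        raise ValueError("code point needs more than three UTF-8 bytes")
    return tuple(encoded) + (0,) * (3 - len(encoded))
```

For example, utf8_triplet(0x2190) gives (226, 134, 144), the entry for the left arrow, and utf8_triplet(0xA8) gives (194, 168, 0), the entry for the diaeresis.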

I used the functions to convert themselves into Unicode for the wiki. Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. An excellent place to start is Adrian Smith's article in Vector (http://www.vector.org.uk/resource/uniref.pdf). The functions themselves should readily translate to any interpreter if they are not usable directly.

Author: GrahamSteer

CategoryUnicode