## page was renamed from AplPlusWinToUnicode
= APL to Unicode =
Whilst the material described below relates specifically to APL+WIN, it should be readily customisable to work with any APL interpreter that is not already Unicode-capable.

Currently the APL-to-Unicode functions write the Unicode text to native text files. From there it can be cut and pasted into emails, newsgroups, web pages, etc. Similarly, the Unicode-to-APL function requires the Unicode text to be cut and pasted from its source into a native text file before conversion. Unicode text can be pasted into these files using Microsoft Notepad with the APL385 Unicode font. When you save such a file with "Save As", make sure you select UTF-8 as the encoding.

My original aim was to work directly via the clipboard. However, the amount of APL code required to manage the Windows clipboard is too large to display here. APL+WIN has built-in user commands (]clipcopy and ]clippaste) to do the job, and I suggest APL+WIN users use those if they want to go directly via the clipboard. Users of other interpreters no doubt have their own equivalents.

`AplToUtf8` takes the name of a function and converts its code to UTF-8. As it stands, this function deals only with whole functions, but it can easily be generalised to work with any character string input. For a quick and dirty job, just comment out the first two lines of working code (the `⎕cr` call and the line that appends the line-ending characters) so that it works on simple character input.
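The encoding that `AplToUtf8` performs can be cross-checked outside APL. It has three steps:

 * translate each ⎕AV character to a code point with `∆avutf8`;
 * turn each code point into one to four UTF-8 bytes with `Utf8`;
 * prepend the header bytes EF BB BF.

The sketch below is written in Python only because Python ships a reference UTF-8 codec to compare against. It is not part of the APL+WIN code, and it starts from Unicode text rather than from ⎕AV characters. The names `utf8_bytes` and `encode` are hypothetical. It chooses the byte count using the same 0/7/11/16 significant-bit thresholds that `Utf8` uses.

```python
# Illustrative cross-check of the UTF-8 encoding step (not the APL+WIN code).
# Assumption: characters are already Unicode code points, i.e. the ⎕AV -> code
# point translation that ∆avutf8 performs in the APL has been done.

BOM = bytes([0xEF, 0xBB, 0xBF])  # same 24-bit header that AplToUtf8 prepends


def utf8_bytes(cp: int) -> bytes:
    """Encode one code point, picking the byte count from its bit length."""
    # Same thresholds as Utf8: more than 0, 7, 11 or 16 significant bits
    n = sum(cp.bit_length() > t for t in (0, 7, 11, 16))
    if n <= 1:  # 0xxxxxxx (n is 0 only for code point 0)
        return bytes([cp])
    if n == 2:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if n == 3:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                  0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])


def encode(text: str) -> bytes:
    """Prefix the header bytes and encode every character of text."""
    return BOM + b"".join(utf8_bytes(ord(ch)) for ch in text)


sample = "r←⍴⍵ ⍝ comment"
assert encode(sample) == sample.encode("utf-8-sig")  # agrees with Python's codec
print(encode("⍝").hex())  # efbbbfe28d9d
```

Python's `utf-8-sig` codec is a convenient reference because it writes exactly the same three-byte header before the UTF-8 data.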
{{{
∇ AplToUtf8 f
⍝Get a character representation of the function
f←⎕cr f
⍝Append the new-line (⎕tcnl) and line-feed (⎕tclf) characters to each row
f←(f,⎕tcnl),⎕tclf
⍝Map each character to its Unicode code point and encode it as UTF-8 bits
f←∊Utf8 ¨∆avutf8[⎕av⍳,f]
⍝Prepend the UTF-8 byte-order mark (EF BB BF) and convert the bits back to characters
f←82 ⎕dr 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1,f
⍝File the character stream
f FileData 'c:\unicode.txt'
∇
}}}
`Utf8`, which is called under each (`¨`) in the function above, simply implements the UTF-8 specification to build the byte sequence for each character. Anyone interested in the byte structure can find it at http://en.wikipedia.org/wiki/UTF-8 (scroll down to the Description section).
{{{
∇ r←Utf8 c
⍝Determine the number of bytes required to represent the character in UTF-8
r←+/(⌈/((21⍴2)⊤c)/⌽⍳21)>0 7 11 16
⍝Convert the character to bytes according to the UTF-8 specification
:Select r
:Case 1
    r←⍎⍕0,(7⍴2)⊤c
:Case 2
    r←⍎⍕(1 1 0,5↑r),1 0,5↓r←(11⍴2)⊤c
:Case 3
    r←⍎⍕(1 1 1 0,4↑r),(1 0,6↑4↓r),1 0,10↓r←(16⍴2)⊤c
:Case 4
    r←⍎⍕(1 1 1 1 0,3↑r),(1 0,6↑3↓r),(1 0,6↑9↓r),1 0,15↓r←(21⍴2)⊤c
:EndSelect
∇
}}}
The function `FileData` is a simple utility function that writes the result to the named file. I am sure you all have your own versions.

`Utf8ToApl` is the reverse function. It assumes that the UTF-8 text resides in the native text file c:\unicode.txt, the same file that `AplToUtf8` writes.
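The decoding loop in `Utf8ToApl` can be cross-checked in the same way. This Python sketch is again illustrative only, and the function name `decode` is hypothetical. It works as follows:

 * strip an optional EF BB BF header;
 * work out each character's byte count from its lead byte, using the same 127/223/239 thresholds as the APL `:select`;
 * rebuild the code point from the payload bits: 7, 5, 4 or 3 bits from the lead byte, plus 6 bits from each continuation byte.

It assumes well-formed UTF-8 and stops at code points. The final step back to ⎕AV characters through `∆avutf8` is not shown.

```python
# Illustrative cross-check of the UTF-8 decoding step (not the APL+WIN code).
# Assumes well-formed UTF-8 input; stops at code points (no ∆avutf8 lookup).
from typing import List


def decode(data: bytes) -> List[int]:
    """Return the list of code points encoded in data."""
    if data[:3] == b"\xef\xbb\xbf":  # optional byte-order mark
        data = data[3:]
    cps = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Byte count from the lead byte, as in the APL's >0 127 223 239 test,
        # except that a zero byte counts as a one-byte character here
        n = 1 + sum(lead > t for t in (127, 223, 239))
        # Payload bits carried by the lead byte: 7, 5, 4 or 3
        cp = lead & (0x7F, 0x1F, 0x0F, 0x07)[n - 1]
        for b in data[i + 1:i + n]:
            cp = (cp << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
        cps.append(cp)
        i += n
    return cps


raw = "⍳10".encode("utf-8-sig")
print([hex(cp) for cp in decode(raw)])  # ['0x2373', '0x31', '0x30']
```

One difference from the APL is intentional. There, a zero byte matches none of the `:case` clauses, so the `:while` loop never advances; this sketch treats it as an ordinary one-byte character instead.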
{{{
∇ r←Utf8ToApl;v
⍝Tie the native file containing the UTF-8 text
'c:\unicode.txt' ⎕ntie ¯1
⍝Read the bits from the file
v←⎕nread ¯1 11,(⎕nsize ¯1),0
⍝Untie the file
⎕nuntie ¯1
⍝Initialise the result vector
r←0⍴0
⍝Convert the bits to byte values (0-255)
v←2⊥⍉((.125×⍴v),8)⍴v
⍝Strip off the UTF-8 byte-order mark (239 187 191 = EF BB BF) if present
:if 239 187 191≡3↑v
    v←3↓v
:endif
⍝Decode the UTF-8 bytes back to code points in accordance with the UTF-8 specification
:while 0≠⍴v
    ⍝Determine from the lead byte how many bytes represent the next character
    :select +/(↑v)>0 127 223 239
    :case 1
        r←r,2⊥1↓(8⍴2)⊤v[1]
        v←1↓v
    :case 2
        r←r,2⊥(3↓(8⍴2)⊤v[1]),2↓(8⍴2)⊤v[2]
        v←2↓v
    :case 3
        r←r,2⊥(4↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),2↓(8⍴2)⊤v[3]
        v←3↓v
    :case 4
        r←r,2⊥(5↓(8⍴2)⊤v[1]),(2↓(8⍴2)⊤v[2]),(2↓(8⍴2)⊤v[3]),2↓(8⍴2)⊤v[4]
        v←4↓v
    :endselect
:endwhile
⍝Convert the code points back to ⎕av characters, dropping ⎕av index 11 (the line feed, with ⎕io 1)
r←⎕av[(∆avutf8⍳r)∼11]
∇
}}}
`∆avutf8` is a vector that maps each APL+WIN ⎕AV position to its Unicode code point.
{{{}}}
I used these functions to convert themselves into Unicode for this wiki page. They should translate readily to any interpreter if they cannot be used directly. Anyone wishing to create their own versions for another interpreter first needs to create the appropriate translation vector for that interpreter. To get you started, I have reproduced the APL+WIN atomic vector below. Another excellent resource is Adrian Smith's article in Vector, http://www.vector.org.uk/resource/uniref.pdf
{{{
⍷◊¨← ⊂ ⊃⍟åæì⍫ÙÒ⍬⍵↑↓→ ⊣⊢⍋⍒ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{¦}∼ Çüéâäà≠çêëèïî⌈Ä⌊É∆×ôö⎕û⍞⌹ÖÜ¢£?⍪⍨áíóúñÑ⍝⍀¿⌷őøý¡«»⎕⎕⎕||||++||+++++ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ-ÑÒÓÔÕÖ+ØÙÚÛÜÝ|ÿ⍺ß⍳⍤ã⍱⊥⊤⌽⊖⍲⌿∇⍉∊∩≡⍙≥≤⍕⍎÷"∘○∨⍴∪¯|
}}}
One reader commented that these functions were not very "APL-like". I could not resist the challenge, so I created a new set at [[AplToUnicodeII]].

Author: GrahamSteer
----
CategoryUnicode