CSV to APL

CSV stands for comma separated values. Those files are still used to transport tabular data between applications that are not directly connected.

Such files can be edited with any spreadsheet application like Microsoft Excel.

There are some things one need to know about CSV files in order to deal with them:

Excel escapes these values by embedding the field inside a set of double quotes. For example, a single cell with the text apples, carrots, and oranges becomes "apples, carrots, and oranges".

For details and background information see:

Note that the format comes with a nasty built-in-problem: there is no way to recognize a cell as being numeric. Converting cells which only contain a proper number does not help because if you enter a digit with a leading quote, Excel handles this as text but again this cannot be recognized as text in the csv file.

The only solution is therefore to make an informed guess. This informed guess can vary from file to file and person to person, so please look at the functions Csv2MatrixWithDyalog and Csv2Numeric below to see if you want to change them before you start using them.

Reading a CSV file using Dyalog APL

Given an Excel spreadsheet that looks like this:

numeric

char

date

currency

misc

1

1

2015-03-24

1.23

Yes

2

Hello

2015-01-01

¯10

3

1999-12-31

No

4

More

2001-02-01

123456789.1

"Are your sure?"

Less

5

Much more

2014-04-03

0

apples, carrots, and bananas

Saving this into a csv file the file can be read into APL. The variable would look like this:

csvapl.jpg

To convert this into an APL matrix is a two-step-process:

  1. Partition the simple string from file
  2. Extract the data and build up the APL matrix

First step: partition the string being read from file

With the following two functions this variable can be transformed into an APL array where every item represents a record. Data masked by double quotes (") remain unchanged.

The functions can deal with files from Unix, Mac and Windows.

r←PartitionRecordsWithDyalog string;masked;cr;lf;bool
⍝ Takes a string and partitions records.
⍝ Can deal with Mac/Unix/Windows files.
⍝ For that, CR+LF as well as single LFs are converted into CR.
⍝ CR is then used to partition "string".
 ⎕IO←1 ⋄ ⎕ML←3
 (cr lf)←⎕UCS 10 13                         ⍝ <CarriageReturn> and <LineFeed>
 :If 0<+/bool←(cr,lf)⍷string                ⍝ are there any cr+lf in "string"?
     string←(~bool)/string                  ⍝ Let only the cr survive
 :EndIf
 :If 0<+/bool←cr=string                     ⍝ Are there still any cr's?
     (bool/string)←lf                       ⍝ Convert them to lf
 :EndIf
⍝ In the remaining string, there might be lf's inside text, Those
⍝ need to be masked before we decide where records really start.
 masked←~{⍵∨≠\⍵}'"'=string                  ⍝ what is not escaped (between "")
 :If 1∊bool←lf=masked/string                ⍝ are there any unmasked lf in "string"?
     r←(~masked\bool)⊂string
 :Else ⍝ so it's a single record
     r←⊂string
 :EndIf

Second step: extract the real data

 r←{sep}Csv2MatrixWithDyalog csv;bool;⎕IO
⍝ Convert vector-of-text-vectors "csv" that is assumed to
⍝ come from  a *.csv file and which got already partitioned
⍝ into an APL matrix. Takes care of escaped stuff etc.
⍝ "sep" defaults to a comma.
 ⎕IO←1 ⋄ ⎕ML←3
 sep←{2=⎕NC ⍵:⍎⍵ ⋄ ','}'sep'
 r←(⌽∨\0≠⌽↑∘⍴¨csv)/csv          ⍝ remove empty stuff from the end if any
 r←',',¨r                       ⍝ Add starting seperator
 bool←{~{⍵∨≠\⍵}'"'=⍵}¨r         ⍝ prepare booleans useful to mask escaped stuff
 r←r{⎕ML←1 ⋄ ⍺⊂⍨⍵=sep}¨bool{⍺\⍺/⍵}¨r   ⍝ partition fields by unmasked commas
 r←⊃{1↓¨⍵}¨r                    ⍝ Drop command and transform to a matrix
 r←{'"'≠1⍴⍵:⍵ ⋄ ¯1↓1↓⍵}¨r       ⍝ remove leading and trailing "
 r←Csv2Numeric r                ⍝ Convert numeric cells
 r←(~'""'∘⍷¨r)/¨r               ⍝ Reduce double-" to single ones

 r←{ignore}Csv2Numeric r;buffer
⍝ Transform cells that contain digits into numeric values, BUT:
⍝ * Commas are ignored.
⍝ * "$£€¥" are ignored because the left argument "ignore" defaults to those.
⍝ * Blanks are removed
⍝ Example:
⍝ (¯10 3 4 1234.5 12 1000  '1A')←Csv2Numeric '-10' '3' '4' '123,4.5' '£12' '1E3' '1A'
 ignore←{0<⎕NC ⍵:⍎⍵ ⋄ '$£€¥'}'ignore'
 buffer←{0=+/bool←'-'=w←⍵:⍵ ⋄ (bool/w)←'¯' ⋄ w}¨r  ⍝ "buffer" is a copy of r with "¯" for "-"
 r←buffer{(0∊⍴⍵):'' ⋄ ,↑1⊃v←⎕VFI ⍺~' ,',ignore:↑2⊃v ⋄ ⍵}¨r ⍝ make fields with appropriate content numeric

Putting it all together

 r←{sep} DealWithCsv filename;data
⍝ Read "filename" which is assumed to be a *.csv file
⍝ and convert it into a matrix
 sep←{2=⎕NC ⍵:⍎⍵ ⋄ ','}'sep'
 data←FileRead filename
 data←PartitionRecordsWithDyalog data
 r←sep Csv2MatrixWithDyalog data

The resulting variable in APL would look like this:

csvinapl.jpg

Note that the 1 in the second row/second column got converted into the number because the contents of the cell remained of digits only. However, in the original Excel spreadsheet that cell is text; this is indicated by the small green triangle. This information is not contained in the CSV file.

Writing a CSV file using Dyalog APL

Given an APL array like:

   ⎕←2 3⍴'APL' 'is' 'fine, very fine' 1 2.2 ¯3
 APL   is  fine, very fine
   1  2.2    ¯3

The following function takes such an array as right argument and converts it into a string that can be written to a file with the extension ".csv". The left argument defaults to "windows" and can be "unix" or "mac" as well. Note that the left argument is case sensitive. The left argument is used to determine the appropriate record separator.

 r←{os}Array2CsvWithDyalog array;cr;lf;sep;bool;IsChar;dq
 ⎕IO←1 ⋄ ⎕ML←3
 :If 0=⎕NC'os'
     os←'windows'
 :EndIf
 'Invalid left argument; must be one of: windows, unix, mac'⎕SIGNAL 11/⍨~(⊂os)∊'windows' 'unix' 'mac'
 'Right argument must have a depth of 2'⎕SIGNAL 11/⍨2≠≡array
 'Right argument must be either a matrix or a vector'⎕SIGNAL 11/⍨~(⍴⍴array)∊1 2
 (cr lf)←⎕TC[2 3]                                 ⍝ <CarriageReturn> and <LineFeed>
 sep←('windows' 'unix' 'mac'⍳⊂os)⊃(cr,lf)lf cr    ⍝ select proper record separator
 IsChar←{0 2∊⍨10|⎕DR ⍵}                           ⍝ Version 12 compatible
 bool←,~IsChar¨array                              ⍝ locate numbers
 (bool/,array)←{('-',⍵)[('¯',⍵)⍳⍵]}¨⍕¨bool/,array ⍝ make numbers text and convert ¯ to -
 dq←,'"'∊¨array                                   ⍝ Where are double quotes in the text?
 (dq/,array)←{⍵/⍨1+'"'=⍵}¨dq/,array               ⍝ Double the double quotes
 bool←dq∨,(lf∊¨array)∨','∊¨array                  ⍝ where are special chars used?
 (bool/,array)←{'"',⍵,'"'}¨bool/,array            ⍝ escape field with special chars
 array←{⊃{⍺,',',⍵}/⍵}¨↓array                      ⍝ separate fields by comma
 r←⊃,/array,¨⊂sep                                 ⍝ make it simple

⍝ Example:
#.Array2CsvWithDyalog 2 3⍴'APL' 'really "really" is' 'fine, very fine' 1 2.2 ¯3
APL,"really ""really"" is","fine, very fine"
1,2.2,-3

Author: KaiJaeger

Update -- KaiJaeger 2012-08-05 11:06:46 incorporating a couple of findings/suggestions from EllisMorgan.

Update -- KaiJaeger 2015-03-24 11:26:39 bug fix: empty cells were not handled correctly.

Update -- KaiJaeger 2016-02-02 13:28:21 Improvements as suggested by PierreGilbert.


CategoryArticles

CsvToApl (last edited 2017-02-16 19:12:54 by KaiJaeger)