Differences between revisions 3 and 4
Revision 3 as of 2007-11-25 18:38:24
Size: 3610
Editor: KaiJaeger
Comment:
Revision 4 as of 2007-11-25 18:38:42
Size: 3605
Editor: KaiJaeger
Comment:
Deletions are marked like this. Additions are marked like this.
Line 15: Line 15:
Note that the format comes with some a nasty built-in-problem: there is no way to recognize a cell as being numeric. Converting cells which contains appropriate does not help because if you enter a digit with a leading quote, Excel handles this as text but again this cannot be recognized as text in the csv file. The only solution is therefore to make an informed guess. Note that the format comes with a nasty built-in-problem: there is no way to recognize a cell as being numeric. Converting cells which contains appropriate does not help because if you enter a digit with a leading quote, Excel handles this as text but again this cannot be recognized as text in the csv file. The only solution is therefore to make an informed guess.

CSV to APL

CSV stands for comma separated values. Those files are still used to transport tabular data between applications that are not directly connected. The files can be edited with any spreadsheet application like Microsoft Excel.

There are some things one need to know about CSV file in order to deal with them:

  • Fields are separated by commas
  • Records are separated with system end of line characters, CRLF (ASCII 13 Dec or 0D Hex and ASCII 10 Dec or 0A Hex respectively) for Windows, LF for Unix, and CR for Mac
  • If a field contains either a comma or one of the end line chars, either the char(s) or the whole contents needs to be escaped. Excel escapes these values by embedding the field inside a set of double quotes. For example, a single cell with the text apples, carrots, and oranges becomes "apples, carrots, and oranges"

For details and background information see http://www.csvreader.com/csv_format.php

Note that the format comes with a nasty built-in-problem: there is no way to recognize a cell as being numeric. Converting cells which contains appropriate does not help because if you enter a digit with a leading quote, Excel handles this as text but again this cannot be recognized as text in the csv file. The only solution is therefore to make an informed guess.

Given an Excel spreadsheet that looks like this:

attachment:cvsexcel3.jpg

Saving this into a csv file, the file can be read into APL. The variable would look like this:

attachment:csvapl.jpg

With the following two functions this variable can be transformed into an APL matrix.

APL2 Version

 r←Csv2MatrixWithAPL2 csv;crlf;buffer;bool;⎕IO;isNotEmpty
⍝ Convert a simple string "csv" that is assumed to come from a *.csv 
⍝ file into an APL matrix. Takes care of escaped stuff.
 ⎕IO←1
 crlf←⎕AV[4 3]  ⍝ That's what CR LF is in Dyalog APL - tell me what it is in APL2?!
 buffer←crlf,csv
 r←1↓¨buffer⊂⍨~crlf⍷buffer       ⍝ partion records
 r←(⌽∨\0≠⌽↑¨⍴¨r)/r               ⍝ remove empty stuff from the end
 bool←~bool∨¨≠\¨bool←'"'=¨r      ⍝ ignore what's escaped (between " and ")
 r←⊃(','≠¨bool\¨bool/¨r)⊂¨r      ⍝ partition fields by commas
 r←('"'=¨↑¨r)↓¨r                 ⍝ remove leading "
 r←(-'"'=¨↑¨¯1↑¨r)↓¨r            ⍝ remove trailing "
 isNotEmpty←0<↑¨⍴¨r              ⍝ remember empty fields
 bool←,isNotEmpty∧∧/¨r∊¨⊂⎕D,'.'  ⍝ fields which contains digits only, and maybe a .
 (bool/,r)←⍎¨bool/,r             ⍝ Make those numeric

Dyalog Version

 r←Csv2MatrixWithDyalog csv;crlf;bool;⎕IO
⍝ Convert a simple string "csv" that is assumed to come from a *.csv
⍝ file into an APL matrix. Takes care of escaped stuff.
 ⎕IO←1 ⋄ ⎕ML←3
 crlf←⎕AV[4 3]                  ⍝ that's what CR LF is in Dyalog APL
 r←{1↓¨⍵⊂⍨~crlf⍷⍵}crlf,csv      ⍝ partion records
 r/⍨←⌽∨\0≠⌽↑∘⍴¨r                ⍝ remove empty stuff from the end
 bool←{~{⍵∨≠\⍵}'"'=⍵}¨r         ⍝ prepare booleans useful to mask escaped stuff
 r←⊃r{⍺⊂⍨⍵≠','}¨bool{⍺\⍺/⍵}¨r   ⍝ partition fields by unmasked commas
 r←{'"'≠1⍴⍵:⍵ ⋄ ¯1↓1↓⍵}¨r       ⍝ remove leading and trailing "
 r←{↑1⊃v←⎕VFI ⍵:↑2⊃v ⋄ ⍵}¨r     ⍝ make fields whith appropriate content numeric scalars

The resulting vaiable in APL would look like this:

attachment:csvinapl.jpg

Author: KaiJaeger

CsvToApl (last edited 2017-02-16 19:12:54 by KaiJaeger)