netCSV

(Show (Hide table-of-contents) table-of-contents)

Contents

netCSV

Overview

netCSV is a Dyalog namespace that contains useful methods for reading large comma-separated values (CSV) files representing tabular data directly into a DataTable. The DataTable can be used later for binding or querying. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The netCSV namespace is using the assembly LumenWorks.Framework.IO.dll (version 3.9.0) that is the work of Sébastien Lorion and Paul Hatcher and must be installed in the same directory has dyalog.exe.

A CSV file representing tabular data can be compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' of the CSV file. Each record is separated with ⎕UCS 13 10 (carriage return, linefeed) and each field is separated by a 'field separator'.

The CSV format is still popular because it is simple and it can be read by Microsoft Excel. The problem is that not all countries are using the same decimal separator and field separator and those need to be known when parsing a file that was not written on the same computer. To obtain the default decimal separator and field separator on your computer that will be used by Excel you can do:

      ⎕USING←'System.Globalization,mscorlib.dll'
      CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator
      CultureInfo.CurrentCulture.TextInfo.ListSeparator

The first record of the CSV file may or may not contain a 'Header' record that is the name of each columns (fields). Since this namespace is dealing with DataTable (where all columns needs to have a name) it is assume that the first record will contains the name of the fields.

For a small size file or a variables in the workspace you can use the functions explained in CsvToApl. For more Dyalog methods related to a DataTable you explore netDataTable.

SimpleCsvFileToDT

SimpleCsvFileToDT will parse a CSV file to a DataTable where all the columns will be of type String. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (in the following example it is ';'). For example if you have a file named Client.csv like this:

ClientId;FirstName;MiddleName;LastName;Occupation;City
100;Michael;Son;Ray;Actuary;Fort Worth
101;Emmanuel;Nick;Williams;Earth Moving Equipment Mechanic;Ontario
102;Phyllis;Melanie;Fox;Knitter;Fresno
103;Ricardo;Percy;Beasley;Labourer;Columbus
104;Kerri;Jewell;Bennett;Production Manager;Jacksonville
105;Carlos;Luis;Hart;Locksmith;Buffalo
106;Heath;Herbert;Glass;Aeronautical Engineer;Seattle
107;Randall;Hans;Tanner;Air Traffic Controller;Corona
108;Damien;;Washington;Chartered Secretary;Allentown
109;Charlene;Alice;Thompson;Committee Clerk;Salem
110;Ann;Alicia;Potts;Actor;Raleigh
111;Richard;Gerardo;Riddle;Assayer Sampler;Akron
112;Pierre;Adolfo;Mendez;Oceanographer;Birmingham

You can do the following to obtain a DataTable:

      dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv'  ⍝ ';' is the field separator

      (⌷dt.Columns).DataType
 System.String  System.String  System.String  System.String  System.String  System.String

      ShowDT dt

You can use Converters on the .Net side to format differently the character representation of a date or number.

CsvFileToDT

CsvFileToDT will parse a CSV file to a DataTable while preserving the data type of each column. The method will analyse the first record to determine the type (either DateTime, Double or String) of each columns. For the file msft.csv that looks like this:

Date,Open,High,Low,Close,Volume
05-Feb-16,51.94,52,49.56,50.16,62008994
04-Feb-16,52.1,52.81,51.37,52,46803371
03-Feb-16,53.25,53.39,51.26,52.16,57347519
02-Feb-16,54.17,54.26,52.65,53,54453559
01-Feb-16,54.88,55.09,54.5,54.71,43587105
29-Jan-16,54.73,55.09,54,55.09,83137039
28-Jan-16,51.86,52.21,51.25,52.06,58489190
27-Jan-16,52.01,52.2,51.02,51.22,36266102
26-Jan-16,51.79,52.44,51.55,52.17,28699484
25-Jan-16,51.94,52.65,51.65,51.79,34497323
22-Jan-16,51.41,52.33,51.26,52.29,36878402
21-Jan-16,51,51.58,50.3,50.48,40169367
.....

You can do the following to obtain a DataTable where the type of each columns will be preserved:

      dt ← ',.' CsvFileToDT 'd:\msft.csv'  ⍝ ',' is the field separator and '.' is the decimal separator

      (⌷dt.Columns).DataType
 System.DateTime  System.Double  System.Double  System.Double  System.Double  System.Double

      ShowDT dt

Sometimes the automatic detection will not work and you will want to explicitly specify the type of each columns. This can be done by inserting the type (either Boolean, Byte, Decimal, Double, Single, Int64, Int16, String, DateTime) within curly braces in the header record of the CSV file. The following file ZipCode.csv is using that feature:

ZIP Code{String},ZIP Code Type{String},City{String},City Type{String},State{String},State Code{String},Area Code{Int16},Latitude{Double},Longitude{Double}
00501,U,Holtsville,D,New York,NY,631,40.8147,-73.0451
00544,U,Holtsville,D,New York,NY,631,40.8128,-73.0480
00601,S,Adjuntas,D,Puerto Rico,PR,787,18.1650,-66.7240
00601,S,Jard de Adjuntas,N,Puerto Rico,PR,787,18.1650,-66.7240
00601,S,Urb San Joaquin,N,Puerto Rico,PR,787,18.1650,-66.7240
00602,S,Aguada,D,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Alts de Aguada,N,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Comunidad Las Flores,N,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Ext Los Robles,N,Puerto Rico,PR,787,18.3825,-67.1871
...

And you do the following to obtain the DataTable with the specified types:

      dt ← ',.' CsvFileToDT 'd:\ZipCode.csv'  ⍝ ',' is the field separator and '.' is the decimal separator

      (⌷dt.Columns).DataType
 System.String  System.String  System.String  System.String  System.String  System.String  System.Int16  System.Double  System.Double 

      ShowDT dt

DTtoCsvFile

CsvFileNew

If you want to construct your own CSV file the methods CsvFileNew, CsvFileAddRecords and AplToCsvRecords can assist you in doing so. The syntax of CsvFileNew is as follow:

    r ← fileName CsvFileNew header

    ⍝ header   = Columns Names separated by a delimiter like ',' or ';'
    ⍝ fileName = Fully qualified file name with extension.
    ⍝ r        = 1 if successfull, 0 (ERROR) if failure

CsvFileAddRecords

One of the advantage of the CSV file format is that you can add easily an additional record to the file. The method CsvFileAddRecords will a some record(s) to a CSV file. The syntax is as follow:

    r ← fileName CsvFileAddRecords records

    ⍝ records =  Line(s) of text with a field delimiter like ',' or ';'
    ⍝ fileName = Fully qualified file name with extension.
    ⍝ r        = 1 if successfull, 0 (ERROR) if failure

AplToCsvRecords

The method AplToCsvRecords will convert an Apl array to individual CSV records (one record per row of the array). The syntax is as follow:

    records ← {FieldDecSep} AplToCsvRecords aplArray
    ⍝
    ⍝ aplArray    = Apl array
    ⍝ FieldDecSep = Field and Decimal Separator (like ',.' or ';,' )
    ⍝ records     = one record per row of aplArray formatted according to the CSV format

How to install netCSV in your workspace

Download netCSV.v1.0.txt
Do a Select all (Ctrl+A) and a copy (Ctrl+C).
In your workspace execute )ed ⍟ netCSV
Paste (Ctrl+V) the text into the Dyalog editor
Press Escape and ')save' your workspace

Optionally to de-script the namespace you can do:

'netCSV' ⎕SE.SALTUtils.CopyNs netCSV

Download LumenWorks.Framework.IO.dll and install the assembly in the same directory as dyalog.exe

Version Information

February 2016   - Initial version (1.0)

Original author:	Pierre Gilbert
Responsible:	PierreGilbert
Email:	`<apgil AT SPAMFREE videotron DOT ca>`

CategoryDyalog - CategoryDyalogDotNet - CategoryDyalogDotNetUtilities - CategoryDotNet

-  ⇤ ← Revision 3 as of 2016-02-07 13:37:46 → 
  Size: 4952
  Editor: PierreGilbert
  Comment:
+   ← Revision 6 as of 2016-02-07 17:16:00 → ⇥
  Size: 9459
  Editor: PierreGilbert
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 10:
-`netCSV` is a Dyalog namespace that contains useful methods for reading large [[https://en.wikipedia.org/wiki/Comma-separated_values|comma-separated values (CSV)]] file representing tabular data directly into a [[https://msdn.microsoft.com/en-us/library/system.data.datatable%28v=vs.110%29.aspx|DataTable]]. The !DataTable can be used later for [[netDataTable#Binding_a_DataTable|binding]] or [[http://aplwiki.com/netDataTable#To_Query_a_DataTable_.28Contributed_by_Richard_Proctor.29|querying]]. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The `netCSV` namespace is using the assembly `LumenWorks.Framework.IO.dll` (version 3.9.0) that is the work of [[http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader|Sébastien Lorion]] and [[https://www.nuget.org/packages/LumenWorksCsvReader/|Paul Hatcher|]] and must be installed in the same directory has `dyalog.exe`.<<BR>><<BR>>
A CSV file representing tabular data can compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' for the CSV file. Each record is separated with `⎕UCS 13 10` (carriage return, linefeed) and each field is separated by a 'field separator'.<<BR>>
+`netCSV` is a Dyalog namespace that contains useful methods for reading large [[https://en.wikipedia.org/wiki/Comma-separated_values|comma-separated values (CSV)]] files representing tabular data directly into a [[https://msdn.microsoft.com/en-us/library/system.data.datatable%28v=vs.110%29.aspx|DataTable]]. The !DataTable can be used later for [[netDataTable#Binding_a_DataTable|binding]] or [[http://aplwiki.com/netDataTable#To_Query_a_DataTable_.28Contributed_by_Richard_Proctor.29|querying]]. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The `netCSV` namespace is using the assembly `LumenWorks.Framework.IO.dll` (version 3.9.0) that is the work of [[http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader|Sébastien Lorion]] and [[https://www.nuget.org/packages/LumenWorksCsvReader/|Paul Hatcher|]] and must be installed in the same directory has `dyalog.exe`.<<BR>><<BR>>
A CSV file representing tabular data can be compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' of the CSV file. Each record is separated with `⎕UCS 13 10` (carriage return, linefeed) and each field is separated by a 'field separator'.<<BR>><<BR>>
 Line 19:
-For a small size file or a variables in the workspace you can use the functions explained in CsvToApl.
+For a small size file or a variables in the workspace you can use the functions explained in CsvToApl. For more Dyalog methods related to a !DataTable you explore [[netDataTable]].
 Line 22:
-`SimpleCsvFileToDT` will parse a CSV parser to a !DataTable where all the columns will be of type [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (CSVSeparator). For example if you have a file named [[attachment:Client.csv]] like this:
+`SimpleCsvFileToDT` will parse a CSV file to a !DataTable where all the columns will be of type [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (in the following example it is ';'). For example if you have a file named [[attachment:Client.csv]] like this:
 Line 41:
-      dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv'
+      dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv'  ⍝ ';' is the field separator

      (⌷dt.Columns).DataType
 System.String  System.String  System.String  System.String  System.String  System.String
-Line 45:
+Line 49:
-You can use [[wpfXmlBindingDemo#Converters_.Net_Side|Converters]] on the .Net side can convert the character representation of a date or number to actual !DateTime or !Double .Net values.
=== FastCsvFileToDT ===
+You can use [[wpfXmlBindingDemo#Converters_.Net_Side|Converters]] on the .Net side to format differently the character representation of a date or number.
=== CsvFileToDT ===
`CsvFileToDT`  will parse a CSV file to a !DataTable while preserving the data type of each column. The method will analyse the first record to determine the type (either [[https://msdn.microsoft.com/en-us/library/system.datetime%28v=vs.110%29.aspx|DateTime]], [[https://msdn.microsoft.com/en-us/library/system.double%28v=vs.110%29.aspx|Double]] or [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]) of each columns. For the file [[attachment:msft.csv]] that looks like this:
{{{
Date,Open,High,Low,Close,Volume
05-Feb-16,51.94,52,49.56,50.16,62008994
04-Feb-16,52.1,52.81,51.37,52,46803371
03-Feb-16,53.25,53.39,51.26,52.16,57347519
02-Feb-16,54.17,54.26,52.65,53,54453559
01-Feb-16,54.88,55.09,54.5,54.71,43587105
29-Jan-16,54.73,55.09,54,55.09,83137039
28-Jan-16,51.86,52.21,51.25,52.06,58489190
27-Jan-16,52.01,52.2,51.02,51.22,36266102
26-Jan-16,51.79,52.44,51.55,52.17,28699484
25-Jan-16,51.94,52.65,51.65,51.79,34497323
22-Jan-16,51.41,52.33,51.26,52.29,36878402
21-Jan-16,51,51.58,50.3,50.48,40169367
.....
}}}
You can do the following to obtain a !DataTable where the type of each columns will be preserved:
{{{
      dt ← ',.' CsvFileToDT 'd:\msft.csv'  ⍝ ',' is the field separator and '.' is the decimal separator
-Line 48:
+Line 72:
+      (⌷dt.Columns).DataType
 System.DateTime  System.Double  System.Double  System.Double  System.Double  System.Double

      ShowDT dt
}}}
{{attachment:msft.png || width=640}}<<BR>>
Sometimes the automatic detection will not work and you will want to explicitly specify the type of each columns. This can be done by inserting the type (either Boolean, Byte, Decimal, Double, Single, Int64, Int16, String, !DateTime) within curly braces in the header record of the CSV file. The following file [[attachment:ZipCode.csv]] is using that feature:
{{{
ZIP Code{String},ZIP Code Type{String},City{String},City Type{String},State{String},State Code{String},Area Code{Int16},Latitude{Double},Longitude{Double}
00501,U,Holtsville,D,New York,NY,631,40.8147,-73.0451
00544,U,Holtsville,D,New York,NY,631,40.8128,-73.0480
00601,S,Adjuntas,D,Puerto Rico,PR,787,18.1650,-66.7240
00601,S,Jard de Adjuntas,N,Puerto Rico,PR,787,18.1650,-66.7240
00601,S,Urb San Joaquin,N,Puerto Rico,PR,787,18.1650,-66.7240
00602,S,Aguada,D,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Alts de Aguada,N,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Comunidad Las Flores,N,Puerto Rico,PR,787,18.3825,-67.1871
00602,S,Ext Los Robles,N,Puerto Rico,PR,787,18.3825,-67.1871
...
}}}
And you do the following to obtain the !DataTable with the specified types:
{{{
      dt ← ',.' CsvFileToDT 'd:\ZipCode.csv'  ⍝ ',' is the field separator and '.' is the decimal separator

      (⌷dt.Columns).DataType
 System.String  System.String  System.String  System.String  System.String  System.String  System.Int16  System.Double  System.Double 

      ShowDT dt
}}}
{{attachment:ZipCode.png || width=640}}<<BR>>
-Line 51:
+Line 105:
+If you want to construct your own CSV file the methods `CsvFileNew`, `CsvFileAddRecords` and `AplToCsvRecords` can assist you in doing so. The syntax of `CsvFileNew` is as follow:
{{{
    r ← fileName CsvFileNew header
-Line 52:
+Line 109:
+    ⍝ header   = Columns Names separated by a delimiter like ',' or ';'
    ⍝ fileName = Fully qualified file name with extension.
    ⍝ r        = 1 if successfull, 0 (ERROR) if failure
}}}
-Line 53:
+Line 114:
+One of the advantage of the CSV file format is that you can add easily an additional record to the file. The method `CsvFileAddRecords` will a some record(s) to a CSV file. The syntax is as follow:
{{{
    r ← fileName CsvFileAddRecords records

    ⍝ records =  Line(s) of text with a field delimiter like ',' or ';'
    ⍝ fileName = Fully qualified file name with extension.
    ⍝ r        = 1 if successfull, 0 (ERROR) if failure
}}}
-Line 55:
+Line 124:
+The method `AplToCsvRecords` will convert an Apl array to individual CSV records (one record per row of the array). The syntax is as follow:
{{{
    records ← {FieldDecSep} AplToCsvRecords aplArray
    ⍝
    ⍝ aplArray    = Apl array
    ⍝ FieldDecSep = Field and Decimal Separator (like ',.' or ';,' )
    ⍝ records     = one record per row of aplArray formatted according to the CSV format
}}}