Differences between revisions 2 and 3
Revision 2 as of 2016-02-07 00:55:59
Size: 2245
Comment:
Revision 3 as of 2016-02-07 13:37:46
Size: 4952
Comment:
Deletions are marked like this. Additions are marked like this.
Line 10: Line 10:
`netCSV` is a Dyalog namespace that contains useful methods for reading a large [[https://en.wikipedia.org/wiki/Comma-separated_values|comma-separated values (CSV)]] file directly into a [[https://msdn.microsoft.com/en-us/library/system.data.datatable%28v=vs.110%29.aspx|DataTable]]. The !DataTable can be used later for [[netDataTable#Binding_a_DataTable|binding]] or [[http://aplwiki.com/netDataTable#To_Query_a_DataTable_.28Contributed_by_Richard_Proctor.29|querying]]. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The `netCSV` namespace is using an assembly named `LumenWorks.Framework.IO.dll` that is the work of [[http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader|Sébastien Lorion]] and [[https://www.nuget.org/packages/LumenWorksCsvReader/|Paul Hatcher|]] and must be installed in the same directory has `dyalog.exe`.<<BR>>

For a small size file or characters in the workspace you can use the functions as explained in CsvToApl.
`netCSV` is a Dyalog namespace that contains useful methods for reading large [[https://en.wikipedia.org/wiki/Comma-separated_values|comma-separated values (CSV)]] file representing tabular data directly into a [[https://msdn.microsoft.com/en-us/library/system.data.datatable%28v=vs.110%29.aspx|DataTable]]. The !DataTable can be used later for [[netDataTable#Binding_a_DataTable|binding]] or [[http://aplwiki.com/netDataTable#To_Query_a_DataTable_.28Contributed_by_Richard_Proctor.29|querying]]. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The `netCSV` namespace is using the assembly `LumenWorks.Framework.IO.dll` (version 3.9.0) that is the work of [[http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader|Sébastien Lorion]] and [[https://www.nuget.org/packages/LumenWorksCsvReader/|Paul Hatcher|]] and must be installed in the same directory has `dyalog.exe`.<<BR>><<BR>>
A CSV file representing tabular data can compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' for the CSV file. Each record is separated with `⎕UCS 13 10` (carriage return, linefeed) and each field is separated by a 'field separator'.<<BR>>
The CSV format is still popular because it is simple and it can be read by Microsoft Excel. The problem is that not all countries are using the same decimal separator and field separator and those need to be known when parsing a file that was not written on the same computer. To obtain the default decimal separator and field separator on your computer that will be used by Excel you can do:
{{{
      ⎕USING←'System.Globalization,mscorlib.dll'
      CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator
      CultureInfo.CurrentCulture.TextInfo.ListSeparator
}}}
The first record of the CSV file may or may not contain a 'Header' record that is the name of each columns (fields). Since this namespace is dealing with !DataTable (where all columns needs to have a name) it is assume that the first record will contains the name of the fields.<<BR>><<BR>>

For a small size file or a variables in the workspace you can use the functions explained in CsvToApl.
Line 16: Line 22:
`SimpleCsvFileToDT` will parse a CSV parser to a !DataTable where all the columns will be of type [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (CSVSeparator). For example if you have a file named [[attachment:Client.csv]] like this:
{{{
ClientId;FirstName;MiddleName;LastName;Occupation;City
100;Michael;Son;Ray;Actuary;Fort Worth
101;Emmanuel;Nick;Williams;Earth Moving Equipment Mechanic;Ontario
102;Phyllis;Melanie;Fox;Knitter;Fresno
103;Ricardo;Percy;Beasley;Labourer;Columbus
104;Kerri;Jewell;Bennett;Production Manager;Jacksonville
105;Carlos;Luis;Hart;Locksmith;Buffalo
106;Heath;Herbert;Glass;Aeronautical Engineer;Seattle
107;Randall;Hans;Tanner;Air Traffic Controller;Corona
108;Damien;;Washington;Chartered Secretary;Allentown
109;Charlene;Alice;Thompson;Committee Clerk;Salem
110;Ann;Alicia;Potts;Actor;Raleigh
111;Richard;Gerardo;Riddle;Assayer Sampler;Akron
112;Pierre;Adolfo;Mendez;Oceanographer;Birmingham
}}}
You can do the following to obtain a !DataTable:
{{{
      dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv'
      ShowDT dt
}}}
{{attachment:Client.png || width=640}}<<BR>>
You can use [[wpfXmlBindingDemo#Converters_.Net_Side|Converters]] on the .Net side can convert the character representation of a date or number to actual !DateTime or !Double .Net values.

UNDER CONSTRUCTION

netCSV

(Hide table-of-contents)

Overview

netCSV is a Dyalog namespace that contains useful methods for reading large comma-separated values (CSV) file representing tabular data directly into a DataTable. The DataTable can be used later for binding or querying. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The netCSV namespace is using the assembly LumenWorks.Framework.IO.dll (version 3.9.0) that is the work of Sébastien Lorion and Paul Hatcher and must be installed in the same directory has dyalog.exe.

A CSV file representing tabular data can compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' for the CSV file. Each record is separated with ⎕UCS 13 10 (carriage return, linefeed) and each field is separated by a 'field separator'.
The CSV format is still popular because it is simple and it can be read by Microsoft Excel. The problem is that not all countries are using the same decimal separator and field separator and those need to be known when parsing a file that was not written on the same computer. To obtain the default decimal separator and field separator on your computer that will be used by Excel you can do:

      ⎕USING←'System.Globalization,mscorlib.dll'
      CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator
      CultureInfo.CurrentCulture.TextInfo.ListSeparator

The first record of the CSV file may or may not contain a 'Header' record that is the name of each columns (fields). Since this namespace is dealing with DataTable (where all columns needs to have a name) it is assume that the first record will contains the name of the fields.

For a small size file or a variables in the workspace you can use the functions explained in CsvToApl.

SimpleCsvFileToDT

SimpleCsvFileToDT will parse a CSV parser to a DataTable where all the columns will be of type String. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (CSVSeparator). For example if you have a file named Client.csv like this:

ClientId;FirstName;MiddleName;LastName;Occupation;City
100;Michael;Son;Ray;Actuary;Fort Worth
101;Emmanuel;Nick;Williams;Earth Moving Equipment Mechanic;Ontario
102;Phyllis;Melanie;Fox;Knitter;Fresno
103;Ricardo;Percy;Beasley;Labourer;Columbus
104;Kerri;Jewell;Bennett;Production Manager;Jacksonville
105;Carlos;Luis;Hart;Locksmith;Buffalo
106;Heath;Herbert;Glass;Aeronautical Engineer;Seattle
107;Randall;Hans;Tanner;Air Traffic Controller;Corona
108;Damien;;Washington;Chartered Secretary;Allentown
109;Charlene;Alice;Thompson;Committee Clerk;Salem
110;Ann;Alicia;Potts;Actor;Raleigh
111;Richard;Gerardo;Riddle;Assayer Sampler;Akron
112;Pierre;Adolfo;Mendez;Oceanographer;Birmingham  

You can do the following to obtain a DataTable:

      dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv'
      ShowDT dt

Client.png
You can use Converters on the .Net side can convert the character representation of a date or number to actual DateTime or !Double .Net values.

FastCsvFileToDT

DTtoCsvFile

CsvFileNew

CsvFileAddRecords

AplToCsvRecords

How to install netCSV in your workspace

  1. Download netCSV.v1.0.txt

  2. Do a Select all (Ctrl+A) and a copy (Ctrl+C).
  3. In your workspace execute )ed ⍟ netCSV

  4. Paste (Ctrl+V) the text into the Dyalog editor
  5. Press Escape and ')save' your workspace

Optionally to de-script the namespace you can do:

'netCSV' ⎕SE.SALTUtils.CopyNs netCSV

Download LumenWorks.Framework.IO.dll and install the assembly in the same directory as dyalog.exe

Version Information

February 2016   - Initial version (1.0)

Original author:

Pierre Gilbert

Responsible:

PierreGilbert

Email:

<apgil AT SPAMFREE videotron DOT ca>


CategoryDyalog - CategoryDyalogDotNet - CategoryDyalogDotNetUtilities - CategoryDotNet

netCSV (last edited 2016-04-29 13:47:03 by PierreGilbert)