= netCSV = ~-<>-~ <> == Overview == `netCSV` is a Dyalog namespace that contains useful methods for reading large [[https://en.wikipedia.org/wiki/Comma-separated_values|comma-separated values (CSV)]] files representing tabular data directly into a [[https://msdn.microsoft.com/en-us/library/system.data.datatable%28v=vs.110%29.aspx|DataTable]]. The !DataTable can be used later for [[netDataTable#Binding_a_DataTable|binding]] or [[http://aplwiki.com/netDataTable#To_Query_a_DataTable_.28Contributed_by_Richard_Proctor.29|querying]]. The large CSV file will not transit in the APL workspace and may not cause a WORKSPACE FULL ERROR. The `netCSV` namespace is using the assembly `LumenWorks.Framework.IO.dll` (version 3.9.0) that is the work of [[http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader|Sébastien Lorion]] and [[https://www.nuget.org/packages/LumenWorksCsvReader/|Paul Hatcher|]] and must be installed in the same directory has `dyalog.exe` (as shown by `+2 ⎕NQ '.' 'GetEnvironment' 'Dyalog'`).<
><
> A CSV file representing tabular data can be compared to a matrix in APL. Each row of the matrix is a 'record' of the CSV file and each column of the matrix is a 'field' of the CSV file. Each record is separated with `⎕UCS 13 10` (carriage return, linefeed) and each field is separated by a 'field separator' (typically a comma or semicolon).<
><
> The CSV format is still popular because it is simple and it can be read by Microsoft Excel. The problem is that not all countries are using the same decimal separator and field separator and those need to be known when parsing a file that was not written on the same computer. To obtain the default decimal separator and field separator on your computer that will be used by Excel you can do: {{{ ⎕USING←'System.Globalization,mscorlib.dll' CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator CultureInfo.CurrentCulture.TextInfo.ListSeparator }}} The first record of the CSV file may or may not contain a 'Header' record that is the name of each columns (fields). Since this namespace is dealing with !DataTable (where all columns needs to have a name) it is assume that the first record will always contains the name of the fields.<
><
> For a small size file or a variables in the workspace you can use the functions explained in [[CsvToApl#Reading_a_CSV_file_using_Dyalog_APL|CsvToApl]]. For more Dyalog methods related to a !DataTable you can explore [[netDataTable]]. === SimpleCsvFileToDT === `SimpleCsvFileToDT` will parse a CSV file to a !DataTable where all the columns will be of type [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]. There is no conversion to a date or to a number that will be made. The only thing that is required to know is the field separator (in the following example it is '`;`'). For example if you have a file named [[attachment:Client.csv]] like this: {{{ ClientId;FirstName;MiddleName;LastName;Occupation;City 100;Michael;Son;Ray;Actuary;Fort Worth 101;Emmanuel;Nick;Williams;Earth Moving Equipment Mechanic;Ontario 102;Phyllis;Melanie;Fox;Knitter;Fresno 103;Ricardo;Percy;Beasley;Labourer;Columbus 104;Kerri;Jewell;Bennett;Production Manager;Jacksonville 105;Carlos;Luis;Hart;Locksmith;Buffalo 106;Heath;Herbert;Glass;Aeronautical Engineer;Seattle 107;Randall;Hans;Tanner;Air Traffic Controller;Corona 108;Damien;;Washington;Chartered Secretary;Allentown 109;Charlene;Alice;Thompson;Committee Clerk;Salem 110;Ann;Alicia;Potts;Actor;Raleigh 111;Richard;Gerardo;Riddle;Assayer Sampler;Akron 112;Pierre;Adolfo;Mendez;Oceanographer;Birmingham }}} You can do the following to obtain a !DataTable: {{{ dt ← ';' SimpleCsvFileToDT 'd:\Clients.csv' ⍝ ';' is the field separator (⌷dt.Columns).DataType System.String System.String System.String System.String System.String System.String ShowDT dt }}} {{attachment:Client.png || width=640}}<
> You can use [[wpfXmlBindingDemo#Converters_.Net_Side|Converters]] on the .Net side to format differently the character representation of a date or number. === CsvFileToDT === `CsvFileToDT` will parse a CSV file to a !DataTable while preserving the data type of each column. The method will analyse the first record to determine the type (either [[https://msdn.microsoft.com/en-us/library/system.datetime%28v=vs.110%29.aspx|DateTime]], [[https://msdn.microsoft.com/en-us/library/system.double%28v=vs.110%29.aspx|Double]] or [[https://msdn.microsoft.com/en-us/library/system.string%28v=vs.110%29.aspx|String]]) of each columns. For the file [[attachment:msft.csv]] that looks like this: {{{ Date,Open,High,Low,Close,Volume 05-Feb-16,51.94,52,49.56,50.16,62008994 04-Feb-16,52.1,52.81,51.37,52,46803371 03-Feb-16,53.25,53.39,51.26,52.16,57347519 02-Feb-16,54.17,54.26,52.65,53,54453559 01-Feb-16,54.88,55.09,54.5,54.71,43587105 29-Jan-16,54.73,55.09,54,55.09,83137039 28-Jan-16,51.86,52.21,51.25,52.06,58489190 27-Jan-16,52.01,52.2,51.02,51.22,36266102 26-Jan-16,51.79,52.44,51.55,52.17,28699484 25-Jan-16,51.94,52.65,51.65,51.79,34497323 22-Jan-16,51.41,52.33,51.26,52.29,36878402 21-Jan-16,51,51.58,50.3,50.48,40169367 ..... }}} You can do the following to obtain a !DataTable where the type of each columns will be preserved: {{{ dt ← ',.' CsvFileToDT 'd:\msft.csv' ⍝ ',' is the field separator and '.' is the decimal separator (⌷dt.Columns).DataType System.DateTime System.Double System.Double System.Double System.Double System.Double ShowDT dt }}} {{attachment:msft.png || width=640}}<
> Sometimes the automatic detection will not work and you will want to explicitly specify the type of each columns. This can be done by inserting the type (either Boolean, Byte, Decimal, Double, Single, Int64, Int16, String, !DateTime) within curly braces in the header record of the CSV file. The following file [[attachment:ZipCode.csv]] is using that feature: {{{ ZIP Code{String},ZIP Code Type{String},City{String},City Type{String},State{String},State Code{String},Area Code{Int16},Latitude{Double},Longitude{Double} 00501,U,Holtsville,D,New York,NY,631,40.8147,-73.0451 00544,U,Holtsville,D,New York,NY,631,40.8128,-73.0480 00601,S,Adjuntas,D,Puerto Rico,PR,787,18.1650,-66.7240 00601,S,Jard de Adjuntas,N,Puerto Rico,PR,787,18.1650,-66.7240 00601,S,Urb San Joaquin,N,Puerto Rico,PR,787,18.1650,-66.7240 00602,S,Aguada,D,Puerto Rico,PR,787,18.3825,-67.1871 00602,S,Alts de Aguada,N,Puerto Rico,PR,787,18.3825,-67.1871 00602,S,Comunidad Las Flores,N,Puerto Rico,PR,787,18.3825,-67.1871 00602,S,Ext Los Robles,N,Puerto Rico,PR,787,18.3825,-67.1871 ... }}} And you do the following to obtain the !DataTable with the specified types: {{{ dt ← ',.' CsvFileToDT 'd:\ZipCode.csv' ⍝ ',' is the field separator and '.' is the decimal separator (⌷dt.Columns).DataType System.String System.String System.String System.String System.String System.String System.Int16 System.Double System.Double ShowDT dt }}} {{attachment:ZipCode.png || width=640}}<
> This curly brace feature is not standard to the CSV format and will not be supported by Microsoft Excel. === DTtoCsvFile === The method `DTtoCsvFile` will save a !DataTable to a CSV file. It is using the assembly `Syncfusion.XlsIO.Base.dll` from Syncfusion. The field delimiter can be specified in the method but the assembly will use the decimal separator of the local culture. It will return the number of rows written to the CSV file. {{{ noRows ← dt DTtoCsvFile fileName }}} === CsvFileNew === If you want to construct your own CSV file the methods `CsvFileNew`, `CsvFileAddRecords` and `AplToCsvRecords` can assist you in doing so. The syntax of `CsvFileNew` is as follow: {{{ r ← fileName CsvFileNew header ⍝ header = Columns Names separated by a delimiter like ',' or ';' ⍝ fileName = Fully qualified file name with extension. ⍝ r = 1 if successfull, 0 (ERROR) if failure 'd:\Car.csv' CsvFileNew 'Year,Make,Model,Description,Price' 1 }}} === CsvFileAddRecords === One of the advantage of the CSV file format is that you can add easily an additional record to the file. The method `CsvFileAddRecords` will add some record(s) to a CSV file. The syntax is as follow: {{{ r ← fileName CsvFileAddRecords records ⍝ records = Line(s) of text with a field delimiter like ',' or ';' ⍝ fileName = Fully qualified file name with extension. ⍝ r = 1 if successfull, 0 (ERROR) if failure 'd:\Car.csv' CsvFileAddRecords '1996,Jeep,Grand Cherokee,MUST SELL!,4799' 1 }}} === AplToCsvRecords === The method `AplToCsvRecords` will convert an Apl array to individual CSV records (one record per row of the array). The syntax is as follow: {{{ records ← {FieldDecSep} AplToCsvRecords aplArray ⍝ ⍝ aplArray = Apl array ⍝ FieldDecSep = Field and Decimal Separator (like ',.' or ';,' ) ⍝ records = one record per row of aplArray formatted according to the CSV format 'd:\Car.csv' CsvFileAddRecords AplToCsvRecords 1997 'Ford' 'E350' 'ac, abs, moon' 3000 1 }}} And the file `d:\Car.csv` will look like this: {{{ Year,Make,Model,Description,Price 1996,Jeep,Grand Cherokee,MUST SELL!,4799 1997,Ford,E350,"ac, abs, moon",3000 }}} == How to install netCSV in your workspace == 1. Download [[attachment:netCSV.v1.1.txt]] 1. Do a Select all (Ctrl+A) and a copy (Ctrl+C). 1. In your workspace execute `)ed ⍟ netCSV` 1. Paste (Ctrl+V) the text into the Dyalog editor 1. Press Escape and ')save' your workspace Optionally to de-script the namespace you can do: {{{ 'netCSV' ⎕SE.SALTUtils.CopyNs netCSV }}} Download [[attachment:LumenWorks.Framework.IO.dll]] and install the assembly in the same directory as `dyalog.exe` == Version Information == {{{ February 2016 - Initial version (1.0) April 2016 - version (1.1) SimpleCsvFileToDT and CsvFileToDT modified so it will not block the file on disk while doing the conversion to a DataTable }}} ||Original author: ||Pierre Gilbert || ||Responsible: ||PierreGilbert || ||Email: || <> || ---- CategoryDyalog - CategoryDyalogDotNet - CategoryDyalogDotNetUtilities - CategoryDotNet