
Read First Line of Csv File C

This series will explore various aspects of importing a CSV file with comma-separated values (.csv) into a SQL-Server database. CSV files are a common way to share data in plain text format from sources such as database table(s) to another database, e.g. from SQL-Server to an Oracle database.

The accompanying source code and code blocks have been kept very simple so that following along and learning the basics is not overwhelming, as this generally can happen the deeper into the import process the developer goes.

When information is exported from a database to a customer database that has matching database table(s) with matching columns, the process is not always simple; for example, business rules may indicate new incoming data can't overwrite existing data, or incoming data needs to be merged with existing data.

In the wild a simple import is rarely possible, as database data types all have the same basic types but are handled differently from database to database. Couple this with the fact that a flat CSV file may need to be split up into multiple database tables.

Part 1 - A part of the series

The following should always be considered when importing CSV files.

  • All columns are suspect to be missing entirely or missing in one or more rows.
  • Mixed data types: consider a column with dates where some rows may have malformed dates, dates set up for a different culture, or columns that should be numeric where some rows have no value or an unexpected format, etc.
  • Columns which have values that are not valid to your business, e.g. a list of products that need to map to a product table where there are products that you don't handle.
  • Column values out of range, e.g. a numeric column has a range of 1 through 10 but incoming data has values 1 through 100.
  • The file is in use by another process and is locked.
  • The file is extremely large and processing time may take hours; have a plan, such as running a nightly job.
  • Handling rows/columns that don't fit into the database; have a plan to handle them, as several examples will be shown in this series.
  • Offer clients a method(s) to review suspect data, modify or reject the data.
  • Consider an intermediate database table so that processing suspect data can be done over time, especially when there is a large data set that may take hours or days to process.

Consider working with CSV files as a puzzle no matter what the structure should be, and expect that parsing different files usually has its own quirks.

Part one goals

To read a simple CSV file of just over 7,500 records, nine columns with types ranging from integer, float, date time and strings, with malformed data.

To parse data a TextFieldParser will be used to read and parse data. Alternatives to a TextFieldParser are reading data using a Stream (StreamReader) or OleDb when sticking with pure Microsoft classes. Outside of these there are several libraries that can handle reading CSV files, however as stated this series is solely for working with Microsoft classes.

During parsing, validation is performed to ensure data is the proper type, not empty and within valid ranges. Data read in is placed into a list of a class designed to handle the data read in from the CSV file.

The TextFieldParser class does a great job at processing incoming data, which is why this class was selected. As with any class there can be unknowns which become known once you have worked with them and learned them. With the TextFieldParser, when looping through lines in a file, empty lines are skipped. In the code sample nothing is done, but the line count will be off by the amount of empty lines encountered compared to what might be seen when opening the file in Notepad++ or a similar text editor. Using OleDb or a Stream, lines are not ignored, but nothing is truly gained if the record count is correct, e.g. there are 150 lines where 50 lines are empty and you expect 100 lines of valid data. This means you have received the correct amount of data, but there are empty lines to filter out.

Requires

Visual interface

The interface is done using a Windows forms project, as these types of projects are easier to set up than a web project, and a Windows forms project need not be installed on a user's machine but instead may be executed from a shared location.

File selection

In the code samples below a hard-coded file is used; in the wild a file may be selected by a file selection dialog, or by reading one or more files from a directory listing. If the process were run from a directory listing then the results would go directly to an intermediate table for review, while in the code samples provided here they are sent straight to a DataGridView.
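As a minimal sketch of the file selection alternative mentioned above, an OpenFileDialog can replace the hard-coded file name. The filter string and the variable names here are illustrative, not part of the accompanying project.

```csharp
// Hypothetical sketch: let the user pick the CSV file instead of hard-coding it.
using (var dialog = new OpenFileDialog())
{
    dialog.Filter = "CSV files (*.csv)|*.csv|All files (*.*)|*.*";
    dialog.CheckFileExists = true;

    if (dialog.ShowDialog() == DialogResult.OK)
    {
        // Pass the selected file on to whichever parser is in use
        var fileName = dialog.FileName;
    }
}
```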

Parsing data using StreamReader

First check to ensure the file to parse exists. In the following code block mHasException and mLastException are from a base exception class which the class for parsing inherits. The return type is a ValueTuple (installed using NuGet Package Manager).

if (!File.Exists(_inputFileName))
{
    mHasException = true;
    mLastException = new FileNotFoundException($"Missing {_inputFileName}");
    return (mHasException, new List<DataItem>(), new List<DataItemInvalid>());
}

If the file exists, the next step is to set up several variables which will be used for validation purposes, plus the return types which will contain valid and, if present, invalid data read in from the CSV file.

var validRows = new List<DataItem>();
var invalidRows = new List<DataItemInvalid>();
var validateBad = 0;

int index = 0;
int district = 0;
int grid = 0;
int nCode = 0;
float latitude = 0;
float longitude = 0;

The following code block follows the code block above.

A while statement is used to loop through each line in the CSV file. For each line, split the line by comma, which in this case is the most common delimiter. Next validate there are nine elements in the string array. If there are not nine elements in the array then place them into a possible reject container.

Note that the first line contains column names, which is skipped by checking the index/line number stored in the variable index.

Following the check for nine elements in a line, seven elements in the string array are checked to ensure they can be converted to the expected data type, ranging from date to numerics, and are also checked for empty string values.

Passing the type check above, the section under the comment Questionable fields will do several more checks, e.g. does the NICIC field contain data that is not in an expected range. Note that not all data can be checked here, such as the data in parts[3], as this can be subjective to the data in other elements in the array, so this is left to the review process, which provides a grid with a dropdown of valid selections to select from. If there are issues in a record to review, a property is set to flag the data for a manual review process and the record is loaded into a list.

try
{
    using (var readFile = new StreamReader(_inputFileName))
    {
        string line;
        string[] parts;

        while ((line = readFile.ReadLine()) != null)
        {
            parts = line.Split(',');

            if (parts == null)
            {
                break;
            }

            index += 1;
            validateBad = 0;

            if (parts.Length != 9)
            {
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                continue;
            }

            // Skip first row which in this case is a header with column names
            if (index <= 1) continue;

            /*
             * These columns are checked for proper types
             */
            var validRow = DateTime.TryParse(parts[0], out var d) &&
                           float.TryParse(parts[7].Trim(), out latitude) &&
                           float.TryParse(parts[8].Trim(), out longitude) &&
                           int.TryParse(parts[2], out district) &&
                           int.TryParse(parts[4], out grid) &&
                           !string.IsNullOrWhiteSpace(parts[5]) &&
                           int.TryParse(parts[6], out nCode);

            /*
             * Questionable fields
             */
            if (string.IsNullOrWhiteSpace(parts[1]))
            {
                validateBad += 1;
            }

            if (string.IsNullOrWhiteSpace(parts[3]))
            {
                validateBad += 1;
            }

            // NICI code must be 909 or greater
            if (nCode < 909)
            {
                validateBad += 1;
            }

            if (validRow)
            {
                validRows.Add(new DataItem()
                {
                    Id = index,
                    Date = d,
                    Address = parts[1],
                    District = district,
                    Beat = parts[3],
                    Grid = grid,
                    Description = parts[5],
                    NcicCode = nCode,
                    Latitude = latitude,
                    Longitude = longitude,
                    Inspect = validateBad > 0
                });
            }
            else
            {
                // fields to review in specific rows
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
            }
        }
    }
}
catch (Exception ex)
{
    mHasException = true;
    mLastException = ex;
}

Once the above code has completed, the following line of code returns data to the calling form/window as a ValueTuple.

return (IsSuccessFul, validRows, invalidRows);
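For completeness, a minimal sketch of what consuming this ValueTuple might look like on the calling side. The reader class field and method name are assumptions for illustration, not part of the article's project.

```csharp
// Hypothetical caller: deconstruct the ValueTuple returned by the parsing class.
var (success, validRows, invalidRows) = _csvReader.LoadCsvFile();

if (success)
{
    // Bind good rows to the grid; surface rejects for review
    dataGridView1.DataSource = new BindingSource { DataSource = validRows };
    Console.WriteLine($"{invalidRows.Count} row(s) need review");
}
```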


Parsing data using TextFieldParser

This example uses a TextFieldParser to process data. Rather than splitting lines manually as done above, the TextFieldParser.ReadFields method handles the splitting by the delimiter assigned in parser.Delimiters. The remainder of validating data is no different than with StreamReader. One major difference is that empty lines are ignored, unlike with StreamReader.

public (bool Success, List<DataItem>, List<DataItemInvalid>, int EmptyLineCount) LoadCsvFileTextFieldParser()
{
    mHasException = false;

    var validRows = new List<DataItem>();
    var invalidRows = new List<DataItemInvalid>();
    var validateBad = 0;

    int index = 0;
    int district = 0;
    int grid = 0;
    int nCode = 0;
    float latitude = 0;
    float longitude = 0;

    var emptyLineCount = 0;
    var line = "";

    try
    {
        /*
         * If interested in blank line count
         */
        using (var reader = File.OpenText(_inputFileName))
        {
            while ((line = reader.ReadLine()) != null) // EOF
            {
                if (string.IsNullOrWhiteSpace(line))
                {
                    emptyLineCount++;
                }
            }
        }

        using (var parser = new TextFieldParser(_inputFileName))
        {
            parser.Delimiters = new[] { "," };

            while (true)
            {
                string[] parts = parser.ReadFields();
                if (parts == null)
                {
                    break;
                }

                index += 1;
                validateBad = 0;

                if (parts.Length != 9)
                {
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                    continue;
                }

                // Skip first row which in this case is a header with column names
                if (index <= 1) continue;

                /*
                 * These columns are checked for proper types
                 */
                var validRow = DateTime.TryParse(parts[0], out var d) &&
                               float.TryParse(parts[7].Trim(), out latitude) &&
                               float.TryParse(parts[8].Trim(), out longitude) &&
                               int.TryParse(parts[2], out district) &&
                               int.TryParse(parts[4], out grid) &&
                               !string.IsNullOrWhiteSpace(parts[5]) &&
                               int.TryParse(parts[6], out nCode);

                /*
                 * Questionable fields
                 */
                if (string.IsNullOrWhiteSpace(parts[1]))
                {
                    validateBad += 1;
                }

                if (string.IsNullOrWhiteSpace(parts[3]))
                {
                    validateBad += 1;
                }

                // NICI code must be 909 or greater
                if (nCode < 909)
                {
                    validateBad += 1;
                }

                if (validRow)
                {
                    validRows.Add(new DataItem()
                    {
                        Id = index,
                        Date = d,
                        Address = parts[1],
                        District = district,
                        Beat = parts[3],
                        Grid = grid,
                        Description = parts[5],
                        NcicCode = nCode,
                        Latitude = latitude,
                        Longitude = longitude,
                        Inspect = validateBad > 0
                    });
                }
                else
                {
                    // fields to review in specific rows
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                }
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return (IsSuccessFul, validRows, invalidRows, emptyLineCount);
}


Parsing data using OleDb

This method "reads" lines from a CSV file with the disadvantage that fields are not typed and carry more baggage than needed for processing lines from the CSV file, which will make a difference in processing time with larger CSV files.

public DataTable LoadCsvFileOleDb()
{
    var connString = $@"Provider=Microsoft.Jet.OleDb.4.0;.....";
    var dt = new DataTable();

    try
    {
        using (var cn = new OleDbConnection(connString))
        {
            cn.Open();

            var selectStatement = "SELECT * FROM [" + Path.GetFileName(_inputFileName) + "]";

            using (var adapter = new OleDbDataAdapter(selectStatement, cn))
            {
                var ds = new DataSet("Demo");
                adapter.Fill(ds);
                ds.Tables[0].TableName = Path.GetFileNameWithoutExtension(_inputFileName);
                dt = ds.Tables[0];
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return dt;
}


Reviewing

The following window has several buttons at the bottom. The Process button executes reading the CSV file, in this case using StreamReader. The dropdown will contain any line number which needs to be inspected; pressing the Inspect button moves to that line in the grid. This would be for a small amount of lines with issues, or to get a visual on a possible larger problem. The button labeled Review will pop up a child window to permit edits that will update the main window below.

Child window shown when pressing the "Review" button.

The only true validation done on this window is to provide a list of valid values for the beat field using a dropdown from a static list. As this series continues, a database reference table will supersede the static list.

Code for validating through a drop-down in the DataGridView.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using WindowsFormsApp1.Classes;

namespace WindowsFormsApp1
{
    public partial class ReviewForm : Form
    {
        private BindingSource _bs = new BindingSource();
        private List<DataItem> _data;

        /// <summary>
        /// Provide access by the calling form to the data presented
        /// </summary>
        public List<DataItem> Data
        {
            get { return _data; }
        }

        /// <summary>
        /// Acceptable values for beat field. In part 2 these will be read from a database reference table.
        /// </summary>
        private List<string> _beatList = new List<string>()
        {
            "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "3M", "4A",
            "4B", "4C", "5A", "5B", "5C", "6A", "6B", "6C"
        };

        public ReviewForm()
        {
            InitializeComponent();
        }

        public ReviewForm(List<DataItem> pData)
        {
            InitializeComponent();
            _data = pData;
            Shown += ReviewForm_Shown;
        }

        private void ReviewForm_Shown(object sender, EventArgs e)
        {
            dataGridView1.AutoGenerateColumns = false;
            // ReSharper disable once PossibleNullReferenceException
            ((DataGridViewComboBoxColumn)dataGridView1.Columns["beatColumn"]).DataSource = _beatList;
            _bs.DataSource = _data;
            dataGridView1.DataSource = _bs;
            dataGridView1.ExpandColumns();
            dataGridView1.EditingControlShowing += DataGridView1_EditingControlShowing;
        }

        /// <summary>
        /// Setup to provide access to changes to the current row; here we are only interested in the beat field.
        /// Other fields would use similar logic for providing valid selections.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void DataGridView1_EditingControlShowing(object sender, DataGridViewEditingControlShowingEventArgs e)
        {
            if (dataGridView1.CurrentCell.IsComboBoxCell())
            {
                if (dataGridView1.Columns[dataGridView1.CurrentCell.ColumnIndex].Name == "beatColumn")
                {
                    if (e.Control is ComboBox cb)
                    {
                        cb.SelectionChangeCommitted -= _SelectionChangeCommitted;
                        cb.SelectionChangeCommitted += _SelectionChangeCommitted;
                    }
                }
            }
        }

        /// <summary>
        /// Update current row beat field
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void _SelectionChangeCommitted(object sender, EventArgs e)
        {
            if (_bs.Current != null)
            {
                if (!string.IsNullOrWhiteSpace(((DataGridViewComboBoxEditingControl)sender).Text))
                {
                    var currentRow = (DataItem)_bs.Current;
                    currentRow.Beat = ((DataGridViewComboBoxEditingControl)sender).Text;
                    currentRow.Inspect = false;
                }
            }
        }
    }
}

Extension methods used in the above code blocks.

namespace WindowsFormsApp1.Classes
{
    public static class DataGridViewExtensions
    {
        /// <summary>
        /// Expand all columns excluding in this case Orders column
        /// </summary>
        /// <param name="sender"></param>
        public static void ExpandColumns(this DataGridView sender)
        {
            sender.Columns.Cast<DataGridViewColumn>().ToList()
                .ForEach(col => col.AutoSizeMode = DataGridViewAutoSizeColumnMode.AllCells);
        }

        /// <summary>
        /// Used to determine if the current cell type is a ComboBoxCell
        /// </summary>
        /// <param name="sender"></param>
        /// <returns></returns>
        public static bool IsComboBoxCell(this DataGridViewCell sender)
        {
            var result = false;

            if (sender.EditType != null)
            {
                if (sender.EditType == typeof(DataGridViewComboBoxEditingControl))
                {
                    result = true;
                }
            }

            return result;
        }
    }
}

Data classes to contain data read from the CSV file.

Good/questionable data class

namespace WindowsFormsApp1.Classes
{
    public class DataItem
    {
        public int Id { get; set; }
        public DateTime Date { get; set; }
        public string Address { get; set; }
        public int District { get; set; }
        public string Beat { get; set; }
        public int Grid { get; set; }
        public string Description { get; set; }
        public int NcicCode { get; set; }
        public float Latitude { get; set; }
        public float Longitude { get; set; }
        public bool Inspect { get; set; }

        public string Line => $"{Id},{Date},{Address},{District},{Beat}," +
                              $"{Grid},{Description},{NcicCode},{Latitude},{Longitude}";

        public override string ToString()
        {
            return Id.ToString();
        }
    }
}


Invalid data class.

namespace WindowsFormsApp1.Classes
{
    public class DataItemInvalid
    {
        public int Row { get; set; }
        public string Line { get; set; }

        public override string ToString()
        {
            return $"[{Row}] '{Line}'";
        }
    }
}


In this article, thoughts and ideas along with suggestions have been presented for dealing with CSV files, to be considered a building block which continues in part 2 of this series.


Source: https://social.technet.microsoft.com/wiki/contents/articles/52030.c-processing-csv-files-part-1.aspx