C#: Processing CSV files (Part 1)
This series will explore various aspects of importing a CSV file with comma-separated values (.csv) into a SQL-Server database. CSV files are a common way to share data in plain text format from sources such as database table(s) to another database, e.g. from SQL-Server to an Oracle database.
The accompanying source code and code blocks have been kept very simple so that following along and learning the basics is not overwhelming, as this generally can happen the deeper into the import process a developer goes.
When data is exported from a database to a customer database that has matching database table(s) with matching columns, the process is not always simple; for example, business rules may dictate that new incoming data can't overwrite existing data, or that incoming data needs to be merged with existing data.
In the wild, a simple import is rarely possible, as database data types all have the same basic types but are handled differently from database to database. Couple this with the fact that a flat CSV file may need to be split up into multiple database tables.
Part 1 of the series
The following should always be considered when importing CSV files.
- All columns are suspect to be missing entirely or missing in one or more rows.
- Mixed data types: consider a column with dates where some rows may have malformed dates, dates set up for a different culture, or columns that should be numeric where some rows have no value or an unexpected format, etc. (see the sketch after this list).
- Columns which have values that are not valid to your business, e.g. a list of products that need to map to a product table where there are products that you don't handle.
- Column values out of range, e.g. a numeric column has a range of 1 through 10 but incoming data has values 1 through 100.
- The file is in use by another process and is locked.
- The file is extremely large and processing time may take hours; have a plan, such as running a nightly job.
- Handling rows/columns that don't fit into the database; have a plan to handle them, as several examples will be shown in this series.
- Offer clients a method(s) to review suspect data, and to modify or reject the data.
- Consider an intermediate database table so that processing suspect data can be done over time, especially when there is a large data set that may take hours or days to process.
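For the mixed-type and out-of-range items above, here is a minimal sketch of the kind of checks involved; the en-US culture and the 1 through 10 range are illustrative assumptions, not taken from this series' data.

using System;
using System.Globalization;

public static class FieldChecks
{
    // A date such as "31/12/2017" parses for en-GB but is malformed for en-US,
    // so validate against the culture the file is expected to use.
    public static bool TryParseDate(string input, out DateTime value)
    {
        return DateTime.TryParse(input, new CultureInfo("en-US"),
            DateTimeStyles.None, out value);
    }

    // Reject numeric values outside the expected range, e.g. 1 through 10.
    public static bool InRange(string input, int min, int max)
    {
        return int.TryParse(input, out var value) && value >= min && value <= max;
    }
}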
Consider working with CSV files as a puzzle, no matter how simple the structure appears, and expect that parsing different files usually comes with its own quirks.
Part 1 goals
To read a simple CSV file of just over 7,500 records, nine columns with types ranging from integer, float, date time and string, with malformed data.
To parse data, a TextFieldParser will be used to read and parse data. Alternatives to a TextFieldParser are reading data using a Stream (StreamReader) or OleDb when sticking with pure Microsoft classes. Outside of this there are several libraries that can handle reading CSV files; however, as stated, this series is solely for working with Microsoft classes.
During parsing, validation is performed to ensure data is of the proper type, not empty, and within valid ranges. Data read in is placed into a list of a class designed to handle the data read in from the CSV file.
The TextFieldParser class does a great job at processing incoming data, which is why this class was selected. As with any class there can be unknowns which become known once you have worked with them and learned them. With the TextFieldParser, when looping through lines in a file, empty lines are skipped. In the code sample nothing is done, but the line count will be off by the amount of empty lines encountered compared to what might be seen when opening the file in Notepad++ or a similar text editor. Using OleDb or a Stream, lines are not ignored, but nothing is truly gained if the record count is correct; e.g. there are 150 lines where 50 lines are empty and you expect 100 lines of valid data. This means you have received the correct amount of data, but there are empty lines to filter out.
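A minimal sketch of the difference just described, counting raw lines versus records; the file name Data.csv is a placeholder.

using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Raw line count, empty lines included.
var rawLineCount = File.ReadAllLines("Data.csv").Length;

// Record count as TextFieldParser sees it; empty lines are skipped.
var recordCount = 0;
using (var parser = new TextFieldParser("Data.csv"))
{
    parser.Delimiters = new[] { "," };
    while (parser.ReadFields() != null)
    {
        recordCount++;
    }
}

// For an otherwise well-formed file, the difference is the empty line count.
Console.WriteLine(rawLineCount - recordCount);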
Requires
Visual interface
The interface is done using a Windows Forms project, as these types of projects are easy to set up compared to setting up a web project; in addition, a Windows Forms project need not be installed on a user's machine but instead may be executed from a shared location.
File selection
In the code samples below a hard-coded file is used; in the wild a file may be selected by a file selection dialog or by reading one or more files from a directory listing. If the process were run from a directory listing, then the results would go directly to an intermediate table for review, while in the code samples provided here they are sent straight to a DataGridView.
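A minimal sketch of both selection styles; the filter, path, and processing steps are placeholders.

using System.IO;
using System.Windows.Forms;

// Interactive selection via a dialog.
using (var dialog = new OpenFileDialog { Filter = "CSV files (*.csv)|*.csv" })
{
    if (dialog.ShowDialog() == DialogResult.OK)
    {
        // hand dialog.FileName to the parsing class
    }
}

// Unattended selection from a directory listing, e.g. a nightly job.
foreach (var fileName in Directory.EnumerateFiles(@"C:\Imports", "*.csv"))
{
    // process each file, sending results to an intermediate table for review
}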
Parsing data using StreamReader
First check to ensure the file to parse exists. In the following code block, mHasException and mLastException are from a base exception class which the class for parsing inherits. The return type is a ValueTuple (installed using NuGet Package Manager).
if (!File.Exists(_inputFileName))
{
    mHasException = true;
    mLastException = new FileNotFoundException($"Missing {_inputFileName}");
    return (IsSuccessFul, new List<DataItem>(), new List<DataItemInvalid>());
}
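The base exception class itself is not listed in this article. The following is a minimal sketch of what the parsing class might inherit, inferred from the mHasException/mLastException fields and the IsSuccessFul property used in the return statements; everything beyond those names is an assumption.

using System;

namespace WindowsFormsApp1.Classes
{
    public class BaseExceptionProperties
    {
        protected bool mHasException;
        protected Exception mLastException;

        // True when the last operation completed without raising an exception.
        public bool IsSuccessFul => !mHasException;

        public bool HasException => mHasException;
        public Exception LastException => mLastException;
    }
}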
If the file exists, the next step is to set up several variables which will be used for validation purposes, plus the return types which will contain valid and, if present, invalid data read in from the CSV file.
var validRows = new List<DataItem>();
var invalidRows = new List<DataItemInvalid>();
var validateBad = 0;

int index = 0;
int district = 0;
int grid = 0;
int nCode = 0;
float latitude = 0;
float longitude = 0;
The following code block follows the code block above.
A while statement is used to loop through each line in the CSV file. For each line, split the line by comma, in this case the most common delimiter. Next, validate there are nine elements in the string array. If there are not nine elements in the array, then place the line into a possible reject container.
Note that the first line contains column names, which is skipped by checking the index/line number stored in the variable index.
Following the check for nine elements in a line, seven elements in the string array are checked to ensure they can be converted to the expected data type, ranging from date to numerics, along with checks for empty string values.
Passing the type check above, the section under the comment Questionable fields performs several more checks, e.g. does the NICIC field contain data that is not in an expected range. Note that not all data can be checked here, such as the data in parts[3], as this can be subjective relative to the data in other elements in the array, so it is left to the review process, which provides a grid with a dropdown of valid selections to choose from. If there are issues to review in a record, a property is set to flag the data for a manual review process and the record is loaded into a list.
try
{
    using (var readFile = new StreamReader(_inputFileName))
    {
        string line;
        string[] parts;

        while ((line = readFile.ReadLine()) != null)
        {
            parts = line.Split(',');

            if (parts == null)
            {
                break;
            }

            index += 1;
            validateBad = 0;

            if (parts.Length != 9)
            {
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                continue;
            }

            // Skip first row which in this case is a header with column names
            if (index <= 1)
                continue;

            /*
             * These columns are checked for proper types
             */
            var validRow = DateTime.TryParse(parts[0], out var d) &&
                           float.TryParse(parts[7].Trim(), out latitude) &&
                           float.TryParse(parts[8].Trim(), out longitude) &&
                           int.TryParse(parts[2], out district) &&
                           int.TryParse(parts[4], out grid) &&
                           !string.IsNullOrWhiteSpace(parts[5]) &&
                           int.TryParse(parts[6], out nCode);

            /*
             * Questionable fields
             */
            if (string.IsNullOrWhiteSpace(parts[1]))
            {
                validateBad += 1;
            }

            if (string.IsNullOrWhiteSpace(parts[3]))
            {
                validateBad += 1;
            }

            // NICIC code must be 909 or greater
            if (nCode < 909)
            {
                validateBad += 1;
            }

            if (validRow)
            {
                validRows.Add(new DataItem()
                {
                    Id = index,
                    Date = d,
                    Address = parts[1],
                    District = district,
                    Beat = parts[3],
                    Grid = grid,
                    Description = parts[5],
                    NcicCode = nCode,
                    Latitude = latitude,
                    Longitude = longitude,
                    Inspect = validateBad > 0
                });
            }
            else
            {
                // fields to review in specific rows
                invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
            }
        }
    }
}
catch (Exception ex)
{
    mHasException = true;
    mLastException = ex;
}
Once the above code has completed, the following line of code returns data to the calling form/window as a ValueTuple.
return (IsSuccessFul, validRows, invalidRows);
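On the calling form the ValueTuple can be deconstructed directly. This is a minimal sketch, assuming the parsing class is named DataOperations and the StreamReader version is exposed as LoadCsvFileStreamReader; both names are placeholders.

var operations = new DataOperations("Data.csv");
var (success, validRows, invalidRows) = operations.LoadCsvFileStreamReader();

if (success)
{
    dataGridView1.DataSource = validRows;
}
else
{
    // surface operations.LastException to the user or a log
}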
Parsing information using TextFieldParser
This example uses a TextFieldParser to process data. Rather than splitting lines manually as done above, the TextFieldParser.ReadFields method handles the splitting by the delimiter assigned in parser.Delimiters. The remainder of the data validation is no different than with StreamReader. One major difference is that empty lines are ignored, unlike with StreamReader.
public (bool Success, List<DataItem> Rows, List<DataItemInvalid> InvalidRows, int EmptyLineCount) LoadCsvFileTextFieldParser()
{
    mHasException = false;

    var validRows = new List<DataItem>();
    var invalidRows = new List<DataItemInvalid>();
    var validateBad = 0;

    int index = 0;
    int district = 0;
    int grid = 0;
    int nCode = 0;
    float latitude = 0;
    float longitude = 0;

    var emptyLineCount = 0;
    var line = "";

    try
    {
        /*
         * If interested in blank line count
         */
        using (var reader = File.OpenText(_inputFileName))
        {
            while ((line = reader.ReadLine()) != null) // EOF
            {
                if (string.IsNullOrWhiteSpace(line))
                {
                    emptyLineCount++;
                }
            }
        }

        using (var parser = new TextFieldParser(_inputFileName))
        {
            parser.Delimiters = new[] { "," };

            while (true)
            {
                string[] parts = parser.ReadFields();

                if (parts == null)
                {
                    break;
                }

                index += 1;
                validateBad = 0;

                if (parts.Length != 9)
                {
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                    continue;
                }

                // Skip first row which in this case is a header with column names
                if (index <= 1)
                    continue;

                /*
                 * These columns are checked for proper types
                 */
                var validRow = DateTime.TryParse(parts[0], out var d) &&
                               float.TryParse(parts[7].Trim(), out latitude) &&
                               float.TryParse(parts[8].Trim(), out longitude) &&
                               int.TryParse(parts[2], out district) &&
                               int.TryParse(parts[4], out grid) &&
                               !string.IsNullOrWhiteSpace(parts[5]) &&
                               int.TryParse(parts[6], out nCode);

                /*
                 * Questionable fields
                 */
                if (string.IsNullOrWhiteSpace(parts[1]))
                {
                    validateBad += 1;
                }

                if (string.IsNullOrWhiteSpace(parts[3]))
                {
                    validateBad += 1;
                }

                // NICIC code must be 909 or greater
                if (nCode < 909)
                {
                    validateBad += 1;
                }

                if (validRow)
                {
                    validRows.Add(new DataItem()
                    {
                        Id = index,
                        Date = d,
                        Address = parts[1],
                        District = district,
                        Beat = parts[3],
                        Grid = grid,
                        Description = parts[5],
                        NcicCode = nCode,
                        Latitude = latitude,
                        Longitude = longitude,
                        Inspect = validateBad > 0
                    });
                }
                else
                {
                    // fields to review in specific rows
                    invalidRows.Add(new DataItemInvalid() { Row = index, Line = string.Join(",", parts) });
                }
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return (IsSuccessFul, validRows, invalidRows, emptyLineCount);
}
Parsing data using OleDb
This method "reads" lines from a CSV file with the disadvantage that the fields are untyped and carry more baggage than needed for processing lines from the CSV file, which will make a difference in processing time with larger CSV files.
public DataTable LoadCsvFileOleDb()
{
    var connString = $@"Provider=Microsoft.Jet.OleDb.4.0;.....";
    var dt = new DataTable();

    try
    {
        using (var cn = new OleDbConnection(connString))
        {
            cn.Open();

            var selectStatement = "SELECT * FROM [" + Path.GetFileName(_inputFileName) + "]";

            using (var adapter = new OleDbDataAdapter(selectStatement, cn))
            {
                var ds = new DataSet("Demo");
                adapter.Fill(ds);
                ds.Tables[0].TableName = Path.GetFileNameWithoutExtension(_inputFileName);
                dt = ds.Tables[0];
            }
        }
    }
    catch (Exception ex)
    {
        mHasException = true;
        mLastException = ex;
    }

    return dt;
}
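Since the DataTable columns come back as whatever the Jet text driver inferred (often plain strings unless a schema.ini says otherwise), typed values still need converting afterwards. A minimal sketch follows; the column names District and Date are assumptions based on the DataItem class.

using System;
using System.Data;

// dt is the DataTable returned by LoadCsvFileOleDb.
foreach (DataRow row in dt.Rows)
{
    // Values come back as objects; convert defensively since the driver's
    // type inference may not match expectations.
    if (int.TryParse(row["District"].ToString(), out var district) &&
        DateTime.TryParse(row["Date"].ToString(), out var date))
    {
        // use district and date; rows failing conversion would go to review
    }
}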
Reviewing
The following window has several buttons at the bottom. The Process button executes reading the CSV file, in this case using StreamReader. The dropdown will contain any line number which needs to be inspected; pressing the Inspect button moves to that line in the grid. This would be for a small number of lines with issues, or to get a visual on a possibly larger problem. The button labeled Review will pop up a child window to permit edits that will update the main window below.
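The main form code behind these buttons is not listed in this part; here is a minimal sketch of what the Process and Inspect handlers might look like, where the control names plus the DataOperations/LoadCsvFileStreamReader names are placeholders, as above.

private void ProcessButton_Click(object sender, EventArgs e)
{
    var operations = new DataOperations("Data.csv");
    var (success, validRows, invalidRows) = operations.LoadCsvFileStreamReader();

    dataGridView1.DataSource = new BindingSource { DataSource = validRows };

    // Offer only the rows flagged for inspection in the dropdown.
    inspectComboBox.DataSource = validRows
        .Where(item => item.Inspect)
        .Select(item => item.Id)
        .ToList();
}

private void InspectButton_Click(object sender, EventArgs e)
{
    if (inspectComboBox.SelectedItem is int id)
    {
        // Move to the grid row whose Id matches the selection.
        var row = dataGridView1.Rows.Cast<DataGridViewRow>()
            .FirstOrDefault(r => r.DataBoundItem is DataItem item && item.Id == id);

        if (row != null)
        {
            dataGridView1.CurrentCell = row.Cells[0];
        }
    }
}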
Child window shown when pressing the "Review" button.
The only true validation done on this window is to provide a list of valid values for the beat field using a dropdown sourced from a static list. As this series continues, a database reference table will supersede the static list.
Code for validating through a dropdown in the DataGridView.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using WindowsFormsApp1.Classes;

namespace WindowsFormsApp1
{
    public partial class ReviewForm : Form
    {
        private BindingSource _bs = new BindingSource();
        private List<DataItem> _data;

        /// <summary>
        /// Provide access by the calling form to the data presented
        /// </summary>
        public List<DataItem> Data
        {
            get { return _data; }
        }

        /// <summary>
        /// Acceptable values for beat field. In part 2 these will be read from a database reference table.
        /// </summary>
        private List<string> _beatList = new List<string>()
        {
            "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "3M",
            "4A", "4B", "4C", "5A", "5B", "5C", "6A", "6B", "6C"
        };

        public ReviewForm()
        {
            InitializeComponent();
        }

        public ReviewForm(List<DataItem> pData)
        {
            InitializeComponent();
            _data = pData;
            Shown += ReviewForm_Shown;
        }

        private void ReviewForm_Shown(object sender, EventArgs e)
        {
            dataGridView1.AutoGenerateColumns = false;

            // ReSharper disable once PossibleNullReferenceException
            ((DataGridViewComboBoxColumn) dataGridView1.Columns["beatColumn"]).DataSource = _beatList;

            _bs.DataSource = _data;
            dataGridView1.DataSource = _bs;
            dataGridView1.ExpandColumns();
            dataGridView1.EditingControlShowing += DataGridView1_EditingControlShowing;
        }

        /// <summary>
        /// Setup to provide access to changes to the current row; here we are only interested in the beat field.
        /// Other fields would use similar logic for providing valid selections.
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void DataGridView1_EditingControlShowing(object sender, DataGridViewEditingControlShowingEventArgs e)
        {
            if (dataGridView1.CurrentCell.IsComboBoxCell())
            {
                if (dataGridView1.Columns[dataGridView1.CurrentCell.ColumnIndex].Name == "beatColumn")
                {
                    if (e.Control is ComboBox cb)
                    {
                        cb.SelectionChangeCommitted -= _SelectionChangeCommitted;
                        cb.SelectionChangeCommitted += _SelectionChangeCommitted;
                    }
                }
            }
        }

        /// <summary>
        /// Update current row beat field
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void _SelectionChangeCommitted(object sender, EventArgs e)
        {
            if (_bs.Current != null)
            {
                if (!string.IsNullOrWhiteSpace(((DataGridViewComboBoxEditingControl)sender).Text))
                {
                    var currentRow = (DataItem) _bs.Current;
                    currentRow.Beat = ((DataGridViewComboBoxEditingControl) sender).Text;
                    currentRow.Inspect = false;
                }
            }
        }
    }
}
Extension methods used in the above code blocks.
namespace WindowsFormsApp1.Classes
{
    public static class DataGridViewExtensions
    {
        /// <summary>
        /// Expand all columns excluding in this case Orders column
        /// </summary>
        /// <param name="sender"></param>
        public static void ExpandColumns(this DataGridView sender)
        {
            sender.Columns.Cast<DataGridViewColumn>().ToList()
                .ForEach(col => col.AutoSizeMode = DataGridViewAutoSizeColumnMode.AllCells);
        }

        /// <summary>
        /// Used to determine if the current cell type is a ComboBoxCell
        /// </summary>
        /// <param name="sender"></param>
        /// <returns></returns>
        public static bool IsComboBoxCell(this DataGridViewCell sender)
        {
            var result = false;

            if (sender.EditType != null)
            {
                if (sender.EditType == typeof(DataGridViewComboBoxEditingControl))
                {
                    result = true;
                }
            }

            return result;
        }
    }
}
Data classes to contain data read from the CSV file.
Good/questionable data class
using System;

namespace WindowsFormsApp1.Classes
{
    public class DataItem
    {
        public int Id { get; set; }
        public DateTime Date { get; set; }
        public string Address { get; set; }
        public int District { get; set; }
        public string Beat { get; set; }
        public int Grid { get; set; }
        public string Description { get; set; }
        public int NcicCode { get; set; }
        public float Latitude { get; set; }
        public float Longitude { get; set; }
        public bool Inspect { get; set; }

        public string Line => $"{Id},{Date},{Address},{District},{Beat}," +
                              $"{Grid},{Description},{NcicCode},{Latitude},{Longitude}";

        public override string ToString()
        {
            return Id.ToString();
        }
    }
}
Invalid data class.
namespace WindowsFormsApp1.Classes
{
    public class DataItemInvalid
    {
        public int Row { get; set; }
        public string Line { get; set; }

        public override string ToString()
        {
            return $"[{Row}] '{Line}'";
        }
    }
}
In this article, thoughts, ideas and suggestions have been presented for dealing with CSV files, to be considered a building block which continues in part 2 of this series.
Source: https://social.technet.microsoft.com/wiki/contents/articles/52030.c-processing-csv-files-part-1.aspx