Reading Text files into Access with ASP.NET

The Jet 4.0 OLEDB driver is a remarkable beast. Not only will it allow connections to MS Access .mdb files and MS Excel spreadhseets, but it will also allow you to connect to and query a variety of text file formats. Here are some examples that illustrate this capability when applied to importing text based data into Access.

All examples make use of Connection and Command objects created from classes in the System.Data.OleDb namespace, so the examples assume that this is made available through a using or Imports statement. Text files come in a variety of formats, with a variety of data separators. These include commas (comma separated values or csv), spaces (space delimited), tabs (tab delimited) or even custom delimiters such as the pipe ( | ). These delimiters have a bearing on how to approach the task, as does the presence or otherwise of a header row. For these examples, I am using a simple Access database called Contacts.mdb with one table, Persons. This contains two columns: FirstName and SecondName. The database will be placed in the App_Data folder, so that |DataDirectory| can be used in the connection string.

Comma Separated Values (CSV)

First, a look at a standard comma-separated file with a header row. The contents of such a file would appear something like this:

FirstName, SecondName	
Joe,Bloggs
Fred,Bassett 
Archie,Falls
Doris,Knight
Gladys,Day

A connection to the database is required, along with a string holding the physical path to the text file (which is also in App_Data):

[C#]
string connect = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=|DataDirectory|Contacts.mdb";
OleDbConnection conn = new OleDbConnection(connect);
string path = Server.MapPath("App_Data");
[VB]
Dim connect As String 
connect = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=|DataDirectory|Contacts.mdb"
Dim conn As New OleDbConnection(connect)
Dim path As String = Server.MapPath("App_Data")

Now we need to create a query that will insert the FirstName and SecondName values into the database table. The first part of the query is exactly as you would expect:

Insert Into Persons (FirstName, SecondName) 

Here's the interesting part. You can perform a fast insert from one table to another in Access with something like this: "Insert Into table1 (field1, field2) Select field1, field2 From table2". The syntax for working with text files is slightly different, in that instead of the second table name, you need to provide a string that contains the type (Text), path and name of the file:

INSERT INTO Persons (FirstName, SecondName) SELECT FirstName, SecondName FROM 
	[Text;DATABASE=" + path + ";].[test.txt]

Note: If you are reading in all the fields, you can also use Select *.

Now, add this to the rest of the code, and a call to ExecuteNonQuery, and we get the following:

[C#]
string connect = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=|DataDirectory|Contacts.mdb";
OleDbConnection conn = new OleDbConnection(connect);
string path = Server.MapPath("App_Data");
string query = "INSERT INTO Persons (FirstName, SecondName) SELECT FirstName, SecondName FROM 
		[Text;DATABASE=" + path + ";].[test.txt]";
OleDbCommand cmd = new OleDbCommand(query, conn);
conn.Open();
cmd.ExecuteNonQuery();
conn.Close();
[VB]
Dim connect As String 
connect = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=|DataDirectory|Contacts.mdb"
Dim conn As New OleDbConnection(connect)
Dim path As String = Server.MapPath("App_Data")
Dim query As String = "INSERT INTO Persons (FirstName, SecondName) " & _
	SELECT FirstName, SecondName FROM "[Text;DATABASE=" & path & ";].[test.txt]"
Dim cmd As OleDbCommand = New OleDbCommand(query, conn)
conn.Open()
cmd.ExecuteNonQuery()
conn.Close()

Since the core code remains the same, I won't reproduce it for further samples. The only thing that changes is the text of the query.

By default, Jet assumes that text files will be comma-delimited and will have a header row, so this has been pretty straightforward so far. If you have no header, you need to add an Extended Property value to the query string: HDR=NO;. You will also have to provide the system default F1, F2, F3 etc for field names:

SELECT F1, F2 FROM [Text;HDR=NO;DATABASE=" + path + ";].[test.txt]

You can also provide aliases as follows:

SELECT F1 As FirstName, F2 As SecondName From [Text;HDR=NO;DATABASE=" + path + ";].[test.txt]

Note: Select * will NOT work for files that do not have a header row. The system default field names F1, F2 etc will be applied as a result of the directive HDR=NO;, and since they are not mentioned in the Sql, an exception will be thrown: "The INSERT INTO statement contains the following unknown field name: 'F1'". There is a way round this, which will be covered in the next section.

Alternative Delimiters

Alternative delimiters fall essentially into 2 categories, which I guess I will call Standard and Custom. Standard alternatives are Tab delimited and Fixed Length. Custom delimiters are any other character, such as a colon, space, pipe etc, but not a double quote. With anything other than a comma separated file, you will need to create a special text file called Schema.ini which provides the driver with information on the file format. This file needs to be placed in the same directory as the file being read.

Schema.ini

A Schema.ini file contains up to 5 sections of information:

  1. The text file name
  2. The file format
  3. The field names, widths and types
  4. The character set
  5. Any special datatype conversions

A full discussion of these settings can be found on MSDN, but for the purposes of this article, I'll only use the first 3.

The first entry, the text file name appears at the top of the Scheme.ini file, and is surrounded by square brackets:

[test.txt]

Following that, the file format is declared. If the format is one of the two "standard" alternatives, the entry will be one of the following:

Format=TabDelimited
Format=FixedLength

For all Custom alternatives the entry is Format=Delimited(), and the separator is places in the parentheses. So for a space delimiter, the entry is as follows:

Format=Delimited( )

And for a pipe delimiter:

Format=Delimited(|)  

The third section contains the field names. If there are no field names in the file itself, you must specify ColNameHeader=false and provide some names. You must also ensure that these match up with the SQL:

ColNameHeader=False
Col1=FirstName
Col2=SecondName
[SQL]
SELECT FirstName, SecondName FROM [Text;DATABASE=" + path + ";].[test.txt]

You may also notice that HDR=No; is omitted from the above SQL. With files that are other than comma delimited, any instructions like this that are put in the connection part of the SQL are ignored, so all details must be covered in the Schema.ini file. For example, if you add HDR=No, and leave ColNameHeader=False out of the Schema.ini, the fille will be imported with a blank row to begin with, because the system default assumes a header.

With Fixed Length files, the field names in the Schema.ini must also be accompanied by the datatype and length. So the entry for a for such a file would look like this:

Format=FixedLength 

ColNameHeader=False
Col1=FirstName TEXT width 50
Col2=SecondName TEXT width 50