Loading CSV Data Into MySQL With Date/Time Transformation

Introduction

This is a quick note to document my ‘experiences’ on loading data from a CSV file into a table in MySQL from the command prompt. It’s pretty straightforward for the MySQL master; its aimed more at those of you who are Oracle RDBMS experts and are wondering how to get started.

Setting The Scene

The source data for my table is a CSV file (comma delimited to be extra precise) containing fields some of which map directly to my database table but others which do not and require some transformation. My date and time are in separate fields in my source file but I want to put them into one single DATETIME field in MySQL.

Initiation

As in Oracle you can create a script to execute the loading function.  Here’s the first part of my script:

load data
infile ‘/tmp/dft_roadsafety_accidents.csv’
into table dft_roadsafety_accidents
fields terminated by ‘,’
optionally enclosed by ‘”‘
lines terminated by ‘\n’
ignore 1 lines

This section tells MySQL the name of the file, target table, how fields in the file are delimited, end-of-line character and the fact that the first line should be ignored. It’s pretty safe to use this general structure in your loading function. Just remember to enclose your fields with a double quote (“) when you create the source data file (good practice).

Source File Fields

The next section of my script creates ‘variables’ for each of the field in the source file. Here it is:

( @Accident_Index, @Location_Easting_OSGR, @Location_Northing_OSGR, @Longitude, @Latitude,  @Date, @Time)

The ‘@’ indicates that it is a variable. If the ‘@’ was missing then it would be interpreted as the field in the target table. You could specify the target table fields at this stage if there is no transformation required. However I like to use this approach since it will always work (though it may be more costly in terms of performance; I’ve only used this approach for tables with up to 2M records).

Transformation

Here is the section of the script describing how I did my transformation (the final part of the picture):

set
accident_index = nulllif(@accident_index,”),
location_easting_osgr = nullif(@location_easting_osgr,”),
location_northing_osgr = nullif(@location_northing_osgr,”),
longitude = nullif(@longitude,”),
latitude = nullif(@latitude,”),
tstamp = str_to_date(concat(@date,’ ‘,@time),’%d/%m/%Y %H:%i’);

This is where the magic occurs! The set command assigns a variable from the input file to one in the target table. For the most part above there is a 1-1 mapping. I have added a ‘nullif’ function to accommodate those records in the source file that have a blank value for field in question. MySQL complains if this occurs. The approach above should be ok as long as your data model and constraints allow the field in question to contain null values.

The bit of this I’m particularly chuffed about is the tstamp part (last line). This transformation command takes the @date and @time field as inputs.

  1. Using the ‘concat’ commands the @date and @time are combined into one string.
  2. The ‘str_to_date’ command takes the output of ‘concat’ and with the date time mask ‘%d/%m/%Y %H:%i’ a value for the database field tstmp is created.

For more details on the date and time mask values to use for your own particular str_to_date transformation take a look at the document here.

Executing The Script

Executing the script you’ve created couldn’t be easier. From the command line I executed it as follows:

mysql -u train01_adm -D train01  -p < load_train01.dft_roadsafety_accidents.cfg

In the script above the administrator for the schema ‘train01′ is ‘train01_adm’. You will be asked for the password and once provided the loading will start (if password correct). You will not see any messages unless there is an error during the load.

Next Steps

  • My script does not have any of the commit functions that SQL*Loader has in Oracle. If I do use this in anger in production I’ll research and update the blog.
  • The script is interactive; ok for what I’m doing but for batch operations user credentials (username, password) will need to be managed carefully. Again something I’ll look into if this approach will be used in our operational environments.
About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s