I am currently in the process of migrating some data -about 8 million rows- from a legacy environment based on Microsoft SQL Server as RDBMS, to a new production environment using MySQL instead.
8 million rows aren’t exactly what I mean by “large dataset”, sure (I have experienced the pain with much larger datasets), nevertheless the process could take some time, especially if both systems are in use at the time the migration occurs and the two environments do not share the same local network, which also means you will have to transfer the data via an Internet connection or some other means.
There are various ways of transferring data between these two RDBMS systems; Microsoft SQL Server makes it pretty trivial to export data in a visual, user friendly way with the “Export Data” task, for instance. However, I prefer a command-line approach on both systems as I find it to be more flexible -especially if you need to repeat the process a number of times with some trial and error because of data formats, encoding, and whatnot, that require changes to the SQL statement more than once. Most people use the SQL Server Management Studio for most tasks with SQL Server, but there’s also the bundled utility SQLCMD, that functions as a command line interpreter for Transact SQL statements. It is particular handy for our purpose as it can process a text file containing any SQL statement, so if we need to change that statement all we have to do is update the text file and reprocess it with SQLCMD.
So, this is in a few simple steps what I am doing right now -while I write- to migrate data from SQL Server to MySQL. As an intermediate format, with are going to use the plain old CSV as it is the easiest and can be imported pretty quickly in MySQL, as we’ll see.
Once you know which data you need to export, create a text file -let’s call it “query.sql”, and paste in it the SQL statement you’ll need to execute to select that data. Note: to ensure MySQL can correctly import all the data, regardless of the content of string columns, it is recommended to enclose each field value with a special character to use as delimiter; we’ll use the double quotes as the character delimiting the beginning and the end of each field value, by using the QUOTENAME function.
For example, the query I had to run to select the data that I need to migrate is as follows:
SELECT SrId, LeadId, QUOTENAME(Email) As Email, SrDate, ClientID, CampaignId, AffiliateId FROM T_ClientLeads WITH NOLOCK WHERE CampaignId IN ( ... )
Note: I suggest you also tell SQL server to ignore locks with NOLOCK if you have a big source dataset and the server is in use. This will speed up the process quite a bit. I’d also recommend to run a test first with a limited dataset (for example, selecting the top N rows), just in case you’ll need to tweak formats, encoding or the data itself.
Next, we’ll process this query file with SQLCMD, asking it to generate a CSV file for us:
SQLCMD -S localhost -h-1 -E -W -d c_data2fr -i leads.sql -o "D:\leads.csv" -s"|"
The meaning of the options I am using here is as follows (“SQLCMD /?” shows all the available ones):
-s => host to connect to (in my case, the same host running the SQL Server instance -h-1 => switches CSV column headers on -E => makes use of a trusted connection (see I am not specifying credentials here) -W => remove trailing spaces / blanks from each string field value (CSV looks cleaner and will be a lot smaller in size too) -d => the database we want to export data from -i => use a text file containing a SQL statement (as SQLCMD can also be used to process SQL statements inline with the "-q" argument) -o => the destination filename -s => column separator
If all goes well, your destination file will be filled with your test data (I am assuming you are saving yourself some time and running a test first, right?). However, you will notice that at the end of the file SQLCMD adds an annoying
X rows affected.
(where X stands for the number of rows returned by your query). This will later make the import process either fail or give you warnings or import some extra rows with wrong data. To fix that, change slightly your query to remove that information with NOCOUNT:
SET NOCOUNT ON SELECT SrId, LeadId, QUOTENAME(Email) As Email, SrDate, ClientID, CampaignId, AffiliateId FROM T_ClientLeads NOLOCK WHERE campaignid IN ( ... )
By simply prepending “SET NO COUNT” it will have effect right away without having to issue a “GO” statement/command first.
Your data should now be ready to be imported into a MySQL database, provided you have already transferred it to the destination host, and have already created a MySQL table with a schema compatible with the columns stored in your CSV file (the order of the columns is also important). A pretty fast way of importing CSV data into MySQL is with the “LOAD DATA” command. It’s very simple, and very, very fast. Just open a session with your MySQL database using your favourite client, and enter this statement (obviously, change the file name as well as the table name):
LOAD DATA LOCAL INFILE 'leads.csv' INTO TABLE T_ClientLeads FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
Here we are simply telling MySQL to import the data contained in our CSV file, using the same delimiters we have used when exporting from SQL Server with SQLCMD. If all goes well -depending on the nature of your data, you may have to apply some transformations directly in the export query- you should see an output similar to the following once the import is complete:
Query OK, 7991285 rows affected, 14 warnings (12 min 22.15 sec) Records: 7991285 Deleted: 0 Skipped: 0 Warnings: 14
You may see some warnings, but often these can be safely ignored depending on the case (truncated values, for example, if you know why you are limiting the length of a field in the destination table). Check the imported data though, as these warnings (if any) may have a different meaning depending on the data. The time taken obviously depends on the amount of data to migrate, both the origin and destination hosts, and the load on the systems while you are exporting and importing. However the process is reasonably speedy, especially if your destination table can use the MyISAM storage engine (like in my case), but this will also depend on the applications that use that database (transactions, relations, and all that goodness). As you can see, importing 8 million rows on a 4GB Linode VPS took a good 12 minutes (although the VPS has a very, very light load at the moment).
Know a better/easier/faster way of migrating data between these two RDBMS systems? I am all ears!