
Feeding a SQL server very fast

I was wondering how fast I could insert data into a SQL Server from a single SQL script. I thought this would be easy to find out, but I ended up trying a bunch of approaches, and some of them gave unexpected results. So I decided to write a short article comparing everything I tried.

The scripts used can be found here.


Disclaimer

Environment information

Target example

For this experiment, I will be using the AdventureWorks 2019 sample database provided by Microsoft; it can be found here.

We could just test a lot of inserts into one table, but to make this a more realistic scenario, I'm going to add some constraints to the problem.

We have the following 2 tables:

Production.ProductModel

Production.ProductModel table design

Production.Product

Production.Product table design

We need to insert records with the following structure:
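
Each record carries the fields below (a summary; the types match the target columns used by the scripts in the rest of the article):

-- One incoming record:
-- ProductModelName   nvarchar(50)   -> Production.ProductModel.Name
-- CatalogDescription xml            -> Production.ProductModel.CatalogDescription
-- Instructions       xml            -> Production.ProductModel.Instructions
-- ProductName        nvarchar(50)   -> Production.Product.Name
-- ProductNumber      nvarchar(25)   -> Production.Product.ProductNumber
-- SafetyStockLevel   smallint
-- ReorderPoint       smallint
-- StandardCost       money
-- ListPrice          money
-- DaysToManufacture  int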

As you can see, Production.Product holds a foreign key to Production.ProductModel, so we have to insert the ProductModel’s data first, retrieve the ID and then insert the Product’s data with the corresponding foreign key.

The data that we are trying to insert could already be in the database, so we have to update the rows if Production.ProductModel.Name or Production.Product.ProductNumber already exists.

Note: Production.Product.Name also has a unique key constraint, but we are not going to handle that case; I'm going to assume that Name and ProductNumber always go together and the pair never changes.
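
If you want to double-check which unique keys are involved, a query along these lines lists them (just a quick sketch; it prints index and column names only):

-- Unique indexes/keys defined on Production.Product (the same query works for Production.ProductModel).
SELECT i.name AS index_name, c.name AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('Production.Product') AND i.is_unique = 1;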

Summarizing, the steps for each record are:

  1. Check whether a ProductModel with Production.ProductModel.Name = ProductModelName exists.
  2. If it exists, update the row data; if not, insert a new row with the data.
  3. Get the ProductModelID from the recently updated/inserted row.
  4. Check whether a Product with Production.Product.ProductNumber = ProductNumber exists.
  5. If it exists, update the row data; if not, insert a new row with the data.

With all this, let's try some things!

Just run a stored procedure per row

This is the simplest way to get the job done: build a stored procedure that receives the data of one record as parameters.

We append this at the start of our SQL script:

USE AdventureWorks2019;

DROP PROCEDURE IF EXISTS #InsertProductsAndModels

GO

CREATE PROCEDURE #InsertProductsAndModels   
    @ProductModelName nvarchar(50),   
    @CatalogDescription xml,
	@Instructions xml,
	@ProductName nvarchar(50),
	@ProductNumber nvarchar(25),
	@SafetyStockLevel smallint,
	@ReorderPoint smallint,
	@StandardCost money,
	@ListPrice money,
	@DaysToManufacture int
AS

SET NOCOUNT ON;

IF (SELECT 1 FROM Production.ProductModel WHERE Name = @ProductModelName) = 1
	UPDATE Production.ProductModel SET CatalogDescription=@CatalogDescription,Instructions=@Instructions WHERE Name=@ProductModelName;
ELSE
	INSERT INTO Production.ProductModel (Name, CatalogDescription, Instructions) VALUES(@ProductModelName, @CatalogDescription, @Instructions);

DECLARE @MID int;
SELECT @MID = ProductModelID FROM Production.ProductModel WHERE Name = @ProductModelName;

IF (SELECT 1 FROM Production.Product WHERE ProductNumber = @ProductNumber) = 1
	UPDATE Production.Product SET
		Name=@ProductName,
		SafetyStockLevel=@SafetyStockLevel,
		ReorderPoint = @ReorderPoint,
		StandardCost = @StandardCost,
		ListPrice = @ListPrice,
		DaysToManufacture = @DaysToManufacture,
		SellStartDate = GETDATE(),
		ProductModelID = @MID
		WHERE ProductNumber = @ProductNumber;
ELSE
	INSERT INTO Production.Product (Name, ProductNumber, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice, DaysToManufacture, SellStartDate, ProductModelID)  
		VALUES(@ProductName, @ProductNumber, @SafetyStockLevel, @ReorderPoint, @StandardCost, @ListPrice, @DaysToManufacture, GETDATE(), @MID);



GO

And then, for each row that we want to insert:

EXEC #InsertProductsAndModels N'PMODEL0', null, null, N'P0', N'PCODE0', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL1', null, null, N'P1', N'PCODE1', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL2', null, null, N'P2', N'PCODE2', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL3', null, null, N'P3', N'PCODE3', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL4', null, null, N'P4', N'PCODE4', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL5', null, null, N'P5', N'PCODE5', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL6', null, null, N'P6', N'PCODE6', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL7', null, null, N'P7', N'PCODE7', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL8', null, null, N'P8', N'PCODE8', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL9', null, null, N'P9', N'PCODE9', 1000, 100, 0, 0, 1;
EXEC #InsertProductsAndModels N'PMODEL10', null, null, N'P10', N'PCODE10', 1000, 100, 0, 0, 1;
-- More...

For 50K rows it took 283.575 seconds for inserts and 288.593 seconds for updates.

This method can be good enough if you are inserting a few rows from time to time, but when you are inserting a lot of rows at once it is very slow. Databases like to work with whole tables of rows and are optimized for that; using a stored procedure that works on one row at a time is counterproductive.

So, let's find a better way.

Merge

We have a lot of rows that we have to insert/update in a very constrained way. But what if we create a very unconstrained temporary table first, insert all the data into it, and deal with our constrained insert/update afterwards?

First, we declare a table variable and insert all the data with just very simple inserts:

USE AdventureWorks2019;

SET NOCOUNT ON;

DECLARE @tmp AS TABLE (
    ProductModelName nvarchar(50),
    CatalogDescription xml,
	Instructions xml,
	ProductName nvarchar(50),
	ProductNumber nvarchar(25),
	SafetyStockLevel smallint,
	ReorderPoint smallint,
	StandardCost money,
	ListPrice money,
	DaysToManufacture int)

INSERT INTO @tmp VALUES (N'PMODEL0', null, null, N'P0', N'PCODE0', 1000, 100, 0, 0, 1)
INSERT INTO @tmp VALUES (N'PMODEL1', null, null, N'P1', N'PCODE1', 1000, 100, 0, 0, 1)
INSERT INTO @tmp VALUES (N'PMODEL2', null, null, N'P2', N'PCODE2', 1000, 100, 0, 0, 1)
INSERT INTO @tmp VALUES (N'PMODEL3', null, null, N'P3', N'PCODE3', 1000, 100, 0, 0, 1)
INSERT INTO @tmp VALUES (N'PMODEL4', null, null, N'P4', N'PCODE4', 1000, 100, 0, 0, 1)
INSERT INTO @tmp VALUES (N'PMODEL5', null, null, N'P5', N'PCODE5', 1000, 100, 0, 0, 1)
-- More...

Now we use MERGE. This statement lets us modify a target table in different ways depending on each record from a source, so we can insert or update depending on whether the key already exists. MERGE is the usual way to do this kind of insert/update (an upsert) when you have a lot of rows.

We run one MERGE for the Production.ProductModel table and another for the Production.Product table. (The Production.Product rows still need the correct ProductModel IDs.)

-- Insert or update all product models at once
MERGE Production.ProductModel AS destination
USING @tmp AS source  
ON (destination.Name = source.ProductModelName)  
WHEN MATCHED THEN
	UPDATE SET CatalogDescription=source.CatalogDescription, Instructions=source.Instructions 
WHEN NOT MATCHED THEN  
    INSERT (Name, CatalogDescription, Instructions) VALUES(source.ProductModelName, source.CatalogDescription, source.Instructions);

-- Insert or update all products at once
MERGE Production.Product AS destination
-- The using table has to contain a join with ProductModel in order to get the ProductModelId
USING (SELECT tmp.*, ppm.ProductModelId FROM @tmp AS tmp JOIN Production.ProductModel AS ppm ON tmp.ProductModelName = ppm.Name) AS source
ON (destination.ProductNumber = source.ProductNumber)
WHEN MATCHED THEN
	UPDATE SET
		Name=source.ProductName,
		SafetyStockLevel=source.SafetyStockLevel,
		ReorderPoint = source.ReorderPoint,
		StandardCost = source.StandardCost,
		ListPrice = source.ListPrice,
		DaysToManufacture = source.DaysToManufacture,
		SellStartDate = GETDATE(),
		ProductModelID = source.ProductModelId
WHEN NOT MATCHED THEN
	INSERT (Name, ProductNumber, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice, DaysToManufacture, SellStartDate, ProductModelID) VALUES(
		source.ProductName,
		source.ProductNumber,
		source.SafetyStockLevel,
		source.ReorderPoint,
		source.StandardCost,
		source.ListPrice,
		source.DaysToManufacture,
		GETDATE(),
		source.ProductModelId);

SET NOCOUNT OFF;

For 50K rows it took 47.820 seconds for inserts and 46.159 seconds for updates.

Well, this is about 6 times faster than the previous method, but still a lot of time for just 50K rows. We are doing a lot of individual inserts; let's do better.

Merge with insert values

SQL Server allows up to 1,000 value rows in a single INSERT, so let's do that:

USE AdventureWorks2019;

SET NOCOUNT ON;

DECLARE @tmp AS TABLE (
    ProductModelName nvarchar(50),
    CatalogDescription xml,
	Instructions xml,
	ProductName nvarchar(50),
	ProductNumber nvarchar(25),
	SafetyStockLevel smallint,
	ReorderPoint smallint,
	StandardCost money,
	ListPrice money,
	DaysToManufacture int)


INSERT INTO @tmp VALUES 
(N'PMODEL0', null, null, N'P0', N'PCODE0', 1000, 100, 0, 0, 1),
(N'PMODEL1', null, null, N'P1', N'PCODE1', 1000, 100, 0, 0, 1),
(N'PMODEL2', null, null, N'P2', N'PCODE2', 1000, 100, 0, 0, 1),
(N'PMODEL3', null, null, N'P3', N'PCODE3', 1000, 100, 0, 0, 1),
-- Up to 1000

INSERT INTO @tmp VALUES 
(N'PMODEL1000', null, null, N'P1000', N'PCODE1000', 1000, 100, 0, 0, 1),
(N'PMODEL1001', null, null, N'P1001', N'PCODE1001', 1000, 100, 0, 0, 1),
(N'PMODEL1002', null, null, N'P1002', N'PCODE1002', 1000, 100, 0, 0, 1),
(N'PMODEL1003', null, null, N'P1003', N'PCODE1003', 1000, 100, 0, 0, 1),
-- Up to 1000

-- More...

The MERGE part doesn’t change.

For 50K rows it took 25.658 seconds for inserts and 25.894 seconds for updates.

That’s an improvement!

You may think: well, this INSERT VALUES thing seems very direct; surely it is the fastest way to insert something into a table. RIGHT? RIGHT??? Let me show you something…

If we execute the same script again, it takes 8.212 seconds. WHY? When you execute anything in SQL Server, it gets compiled and cached, so if you execute exactly the same thing again, the compilation step is skipped. This is very useful for complex selects or procedures, but the cache is useless for this kind of insert, because you usually insert different data each time.

The worrying part is: why did it take almost 17 seconds to compile? This is a pretty simple statement.
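
If you want to see the split yourself, SET STATISTICS TIME ON reports the parse and compile time separately from the execution time; a minimal sketch:

USE AdventureWorks2019;

-- Clears the plan cache so the statement has to be compiled again.
-- Don't run this on a production server.
DBCC FREEPROCCACHE;

SET STATISTICS TIME ON;
-- Run the big INSERT ... VALUES batch here; the messages output will show
-- "SQL Server parse and compile time" separately from the elapsed execution time.
SET STATISTICS TIME OFF;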

I recommend taking a look at point 16 of this article, which explains the problem. It turns out that the compilation time grows non-linearly with the number of rows for this kind of statement, which doesn't make any sense to me; Microsoft should fix this.

Knowing that the compilation of INSERT VALUES is very expensive, we can find a way to avoid it.

Merge with insert exec

The previously mentioned article shows a hacky way to insert a lot of values without the compilation time hit.

The trick is to use INSERT EXEC, where the EXEC part receives a string with one SELECT per row that we want to insert.

USE AdventureWorks2019;

SET NOCOUNT ON;

DECLARE @tmp AS TABLE (
    ProductModelName nvarchar(50),
    CatalogDescription xml,
	Instructions xml,
	ProductName nvarchar(50),
	ProductNumber nvarchar(25),
	SafetyStockLevel smallint,
	ReorderPoint smallint,
	StandardCost money,
	ListPrice money,
	DaysToManufacture int)



INSERT @tmp 
EXEC('
SELECT N''PMODEL0'', null, null, N''P0'', N''PCODE0'', 1000, 100, 0, 0, 1
SELECT N''PMODEL1'', null, null, N''P1'', N''PCODE1'', 1000, 100, 0, 0, 1
SELECT N''PMODEL2'', null, null, N''P2'', N''PCODE2'', 1000, 100, 0, 0, 1
SELECT N''PMODEL3'', null, null, N''P3'', N''PCODE3'', 1000, 100, 0, 0, 1
SELECT N''PMODEL4'', null, null, N''P4'', N''PCODE4'', 1000, 100, 0, 0, 1
SELECT N''PMODEL5'', null, null, N''P5'', N''PCODE5'', 1000, 100, 0, 0, 1
-- More...
SELECT N''PMODEL49999'', null, null, N''P49999'', N''PCODE49999'', 1000, 100, 0, 0, 1');

The MERGE part doesn’t change.

For 50K rows it took 11.397 seconds for inserts and 10.341 seconds for updates.

This method was 2.2 times faster than the previous one and 25 times faster than the first one. It is the fastest way that I found (with the restriction of having to generate a self-contained SQL script). If you don't have that restriction, there are better, commonly recommended ways; I'm going to show two of them.

NOTE: I also tried generating an XML string with all the data, extracting the rows with a SELECT, and then using the MERGE. I didn't bother to include more about it here: it was too cumbersome and only slightly better than the INSERT VALUES method. INSERT EXEC is still better.
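
For reference, the general idea of the XML variant is roughly this (a minimal sketch showing only a few of the columns):

DECLARE @data xml = N'
<rows>
  <r><ProductModelName>PMODEL0</ProductModelName><ProductName>P0</ProductName><ProductNumber>PCODE0</ProductNumber></r>
  <r><ProductModelName>PMODEL1</ProductModelName><ProductName>P1</ProductName><ProductNumber>PCODE1</ProductNumber></r>
</rows>';

DECLARE @tmp2 AS TABLE (
    ProductModelName nvarchar(50),
    ProductName nvarchar(50),
    ProductNumber nvarchar(25));

-- Shred the XML into a table variable, then run the same MERGEs as before.
INSERT INTO @tmp2
SELECT r.value('(ProductModelName/text())[1]', 'nvarchar(50)'),
       r.value('(ProductName/text())[1]', 'nvarchar(50)'),
       r.value('(ProductNumber/text())[1]', 'nvarchar(25)')
FROM @data.nodes('/rows/r') AS x(r);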

I'm kind of disappointed that this is the best I've got; in the next section you will see why.

BULK INSERT

BULK INSERT is one of the recommended ways to insert a lot of data into a database. This method requires a file (for example, a .csv) with all the data, and that file must be reachable from the server, which usually means manually copying the file to the server before executing anything.

The SQL code is mostly the same as before but uses BULK INSERT to put all the data into a temporary table:

USE AdventureWorks2019;

SET NOCOUNT ON;

DROP TABLE IF EXISTS #tmp

CREATE TABLE #tmp ( 
    ProductModelName nvarchar(50),   
    CatalogDescription xml,
	Instructions xml,
	ProductName nvarchar(50),
	ProductNumber nvarchar(25),
	SafetyStockLevel smallint,
	ReorderPoint smallint,
	StandardCost money,
	ListPrice money,
	DaysToManufacture int)

-- Bulk insert all the data into a temporary table
BULK INSERT #tmp FROM '/bulk_insert.csv' WITH (FORMAT = 'CSV');

-- Insert or update all product models at once
MERGE Production.ProductModel AS destination
USING #tmp AS source  
ON (destination.Name = source.ProductModelName)  
WHEN MATCHED THEN
	UPDATE SET CatalogDescription=source.CatalogDescription, Instructions=source.Instructions 
WHEN NOT MATCHED THEN  
    INSERT (Name, CatalogDescription, Instructions) VALUES(source.ProductModelName, source.CatalogDescription, source.Instructions);

-- Insert or update all products at once
MERGE Production.Product AS destination
-- The using table has to contain a join with ProductModel in order to get the ProductModelId
USING (SELECT tmp.*, ppm.ProductModelId FROM #tmp AS tmp JOIN Production.ProductModel AS ppm ON tmp.ProductModelName = ppm.Name) AS source
ON (destination.ProductNumber = source.ProductNumber)
WHEN MATCHED THEN
	UPDATE SET
		Name=source.ProductName,
		SafetyStockLevel=source.SafetyStockLevel,
		ReorderPoint = source.ReorderPoint, 
		StandardCost = source.StandardCost,
		ListPrice = source.ListPrice,
		DaysToManufacture = source.DaysToManufacture,
		SellStartDate = GETDATE(),
		ProductModelID = source.ProductModelId
WHEN NOT MATCHED THEN
	INSERT (Name, ProductNumber, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice, DaysToManufacture, SellStartDate, ProductModelID) VALUES(
		source.ProductName,
		source.ProductNumber,
		source.SafetyStockLevel,
		source.ReorderPoint,
		source.StandardCost,
		source.ListPrice,
		source.DaysToManufacture,
		GETDATE(),
		source.ProductModelId);

SET NOCOUNT OFF;

And /bulk_insert.csv looks like:

"PMODEL0",,,"P0","PCODE0",1000,100,0,0,1
"PMODEL1",,,"P1","PCODE1",1000,100,0,0,1
"PMODEL2",,,"P2","PCODE2",1000,100,0,0,1
"PMODEL3",,,"P3","PCODE3",1000,100,0,0,1
"PMODEL4",,,"P4","PCODE4",1000,100,0,0,1
More...

For 50K rows it took 3.013 seconds for inserts and 3.52 seconds for updates.

This method is more than 3 times faster than INSERT EXEC and if we take a look at the actual execution plan:

Bulk insert execution plan image

Most of the time is spent in the actual MERGEs; the BULK INSERT is practically free. This means that all the other methods were struggling to do something that SQL Server can do almost instantly.

Using Table Valued Parameters

The previous strategy lets you execute the SQL script very fast, with the caveat that you first have to copy a file to the server.

The method that I'm going to show now doesn't need a file to be copied beforehand, with the caveat that it doesn't let you execute a SQL file at all. It requires writing a program in C# or any other language with a library that supports Table-Valued Parameters.

You can learn about table valued parameters here and here.

This is the small example that I wrote:

using System;
using System.Data;
using System.Data.SqlTypes;
using System.Data.SqlClient;
using System.Collections.Generic;
using Microsoft.SqlServer.Server;


class MainTVP {


private static void Main() {

   string drop_table_type_cmd = @"
DROP TYPE IF EXISTS InsertProductsAndModelsType
";

   string setup_table_type_cmd = @"
CREATE TYPE InsertProductsAndModelsType 
   AS TABLE ( 
    ProductModelName nvarchar(50),   
    CatalogDescription xml,
	Instructions xml,
	ProductName nvarchar(50),
	ProductNumber nvarchar(25),
	SafetyStockLevel smallint,
	ReorderPoint smallint,
	StandardCost money,
	ListPrice money,
	DaysToManufacture int)
";

   string merge_data_cmd = @"
SET NOCOUNT ON;

-- Insert or update all product models at once
MERGE Production.ProductModel AS destination
USING @tvp AS source  
ON (destination.Name = source.ProductModelName)  
WHEN MATCHED THEN
	UPDATE SET CatalogDescription=source.CatalogDescription, Instructions=source.Instructions 
WHEN NOT MATCHED THEN  
    INSERT (Name, CatalogDescription, Instructions) VALUES(source.ProductModelName, source.CatalogDescription, source.Instructions);

-- Insert or update all products at once
MERGE Production.Product AS destination
-- The using table has to contain a join with ProductModel in order to get the ProductModelId
USING (SELECT tvp.*, ppm.ProductModelId FROM @tvp AS tvp JOIN Production.ProductModel AS ppm ON tvp.ProductModelName = ppm.Name) AS source
ON (destination.ProductNumber = source.ProductNumber)
WHEN MATCHED THEN
	UPDATE SET
		Name=source.ProductName,
		SafetyStockLevel=source.SafetyStockLevel,
		ReorderPoint = source.ReorderPoint,
		StandardCost = source.StandardCost,
		ListPrice = source.ListPrice,
		DaysToManufacture = source.DaysToManufacture,
		SellStartDate = GETDATE(),
		ProductModelID = source.ProductModelId
WHEN NOT MATCHED THEN
	INSERT (Name, ProductNumber, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice, DaysToManufacture, SellStartDate, ProductModelID) VALUES(
		source.ProductName,
		source.ProductNumber,
		source.SafetyStockLevel,
		source.ReorderPoint,
		source.StandardCost,
		source.ListPrice,
		source.DaysToManufacture,
		GETDATE(),
		source.ProductModelId);
";

    
    // Table structure in C#
    SqlMetaData[] InsertProductsAndModelsType =
    {
        new SqlMetaData("ProductModelName", SqlDbType.NVarChar, 50),
        new SqlMetaData("CatalogDescription", SqlDbType.Xml),
        new SqlMetaData("Instructions", SqlDbType.Xml),
        new SqlMetaData("ProductName", SqlDbType.NVarChar, 50),
        new SqlMetaData("ProductNumber", SqlDbType.NVarChar, 25),
        new SqlMetaData("SafetyStockLevel", SqlDbType.SmallInt),
        new SqlMetaData("ReorderPoint", SqlDbType.SmallInt),
        new SqlMetaData("StandardCost", SqlDbType.Money),
        new SqlMetaData("ListPrice", SqlDbType.Money),
        new SqlMetaData("DaysToManufacture", SqlDbType.Int)
    };
    // Populate all the data
    var productsAndModelsData = new List<SqlDataRecord>();
    for (int i = 0; i < 50000; i+=1) {
        var currentRow = new SqlDataRecord(InsertProductsAndModelsType);
        var productModelName = System.String.Format("PMODEL{0}", i);
        var productName = System.String.Format("P{0}", i);
        var productNumber = System.String.Format("PCODE{0}", i);
        currentRow.SetValues(productModelName, new SqlXml(), new SqlXml(), productName, productNumber, (Int16)(1000), (Int16)(100), new SqlMoney(0), new SqlMoney(0), (Int32)(1));
        productsAndModelsData.Add(currentRow);
    }

    // Start the connection
    SqlConnection cn = new SqlConnection("server=tcp:127.0.0.1,1433;Initial Catalog=AdventureWorks2019;Integrated Security=false; User=sa; Password=1234ABC?");
    cn.Open();

    var sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    
    // Setup the type in the database
    SqlCommand setup_cmd = new SqlCommand(drop_table_type_cmd, cn);
    setup_cmd.ExecuteNonQuery();
    setup_cmd = new SqlCommand(setup_table_type_cmd, cn);
    setup_cmd.ExecuteNonQuery();

    // Insert all the data
    SqlCommand insertCommand = new SqlCommand(merge_data_cmd, cn);  
    SqlParameter tvpParam = insertCommand.Parameters.AddWithValue("@tvp", productsAndModelsData);  
    tvpParam.SqlDbType = SqlDbType.Structured;  
    tvpParam.TypeName = "InsertProductsAndModelsType";  
    insertCommand.ExecuteNonQuery();

    cn.Close();

    sw.Stop();

    System.Console.WriteLine("Elapsed={0}",sw.Elapsed); 
}


}

I'm not going to give more details about how it works; you can take a look at this great article.
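
If you prefer to keep the SQL on the server instead of sending the whole MERGE text from C#, the same table type is typically passed to a stored procedure through a READONLY parameter. A minimal sketch, assuming the InsertProductsAndModelsType type from the example above already exists (the procedure name is just an illustration):

CREATE OR ALTER PROCEDURE dbo.UpsertProductsAndModels
    @rows InsertProductsAndModelsType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Same upsert as before, but reading from the table-valued parameter.
    MERGE Production.ProductModel AS destination
    USING @rows AS source
    ON (destination.Name = source.ProductModelName)
    WHEN MATCHED THEN
        UPDATE SET CatalogDescription = source.CatalogDescription,
                   Instructions = source.Instructions
    WHEN NOT MATCHED THEN
        INSERT (Name, CatalogDescription, Instructions)
        VALUES (source.ProductModelName, source.CatalogDescription, source.Instructions);

    -- ...followed by the second MERGE into Production.Product, exactly as in the scripts above.
END

On the C# side, the command then uses CommandType.StoredProcedure with the same Structured parameter instead of the inline MERGE text.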

For 50K rows it took 3.303 seconds for inserts and 3.733 seconds for updates.

This takes about the same time as the BULK INSERT approach, but with a different set of trade-offs.

Summary

Method                       50,000 inserts (s)   50,000 updates (s)
Stored procedure per row     283.575              288.593
Merge                        47.820               46.159
Merge with insert values     25.658               25.894
Merge with insert exec       11.397               10.341
BULK INSERT                  3.013                3.52
Table Valued Parameters      3.303                3.733

And that's it! If I got something wrong or you know better ways to do this, send me an email at contact@tano.xyz and I will happily update the article.

