I'm using Dapper to insert data from a realtime feed into SQL Server, so I care about performance. Recently, I noticed something strange.
Out of the box, if you give Dapper a collection and an insert query, it fires a separate INSERT statement for each element. My tests show I can insert roughly 1800 objects with 12 fields each in 1 second this way (counting only connection.Execute(...) running time).
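In other words, something like this (illustrative names; when Dapper's Execute gets an IEnumerable parameter, it runs the statement once per element):

// Dapper executes this INSERT once per element of rows
// rows : IEnumerable of objects with properties matching @A, @B, ...
connection.Execute(
    "INSERT INTO MyTable (ColA, ColB) VALUES (@A, @B)",
    rows);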
Now, I didn't find batch-insert functionality in Dapper, so I implemented my own (constructing the parameter list and the SQL query). With that, I found I can insert only one batch, capped at 1000 rows (SQL Server's limit on a single VALUES clause), in about 3 seconds (again, counting only connection.Execute(...) calls).
So my batching is almost 6 times slower than sending each row in a separate query. Can someone explain that to me? I thought batch operations were supposed to speed up the process.
I would like the insert time to be 1 second at most.
I'm using SQL Server 2012 Standard on a local network. The table I'm inserting into has only a clustered index on the primary key (a bigint field); there are no non-clustered indexes and no triggers.
I can post the code, but there's really nothing special about it.
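For illustration, here's a rough sketch of the approach (table and column names are placeholders, not my real schema):

// using System.Text; using Dapper;
// batch is a List of up to 1000 feed objects
var sql = new StringBuilder("INSERT INTO MyTable (ColA, ColB) VALUES ");
var parameters = new DynamicParameters();
for (var i = 0; i < batch.Count; i++)
{
    sql.Append(i == 0 ? "" : ", ").Append($"(@a{i}, @b{i})");
    parameters.Add($"a{i}", batch[i].A);
    parameters.Add($"b{i}", batch[i].B);
}
connection.Execute(sql.ToString(), parameters);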
I'm not sure why you are using the Dapper Execute extension method if you want the best performance available.

The best free way to insert with maximum performance is to use the SqlBulkCopy class directly.
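For example, a minimal sketch (the table, column names, and rows collection are placeholders):

// using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("ColA", typeof(long));
table.Columns.Add("ColB", typeof(string));
foreach (var row in rows)
    table.Rows.Add(row.A, row.B);

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.MyTable";
    bulkCopy.ColumnMappings.Add("ColA", "ColA");
    bulkCopy.ColumnMappings.Add("ColB", "ColB");
    bulkCopy.WriteToServer(table);
}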
Disclaimer: I'm the owner of the project Dapper Plus.

This project provides easy support for bulk operations: BulkInsert, BulkUpdate, BulkDelete, and BulkMerge.
Example:
// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
                 .Table("Orders")
                 .Identity(x => x.ID);

// CHAIN & SAVE entity
connection.BulkInsert(orders)
          .AlsoInsert(order => order.Items)
          .Include(x => x.ThenMerge(order => order.Invoice)
                         .AlsoMerge(invoice => invoice.Items))
          .AlsoMerge(x => x.ShippingAddress);
The ideal batch size varies from server to server; the key factor is logging, which determines how resource-intensive the insert is and how large you can make your batches before you see a performance drop-off.
The key to fast inserts/updates is ensuring you meet the requirements for minimal logging; see this Microsoft white paper. Don't panic - you don't need to read all of it - just look at the table describing the conditions for 'minimal logging' (you will need to talk to your DBA).
Once you've got logging as low as possible, look at your execution plans for spools: if a batch starts to spool to tempdb, you will hit a dramatic slowdown. The key is to keep each batch small enough to remain in the buffer (RAM), but how much buffer space is available will depend on other processes.
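With SqlBulkCopy, for instance, both levers are exposed directly; a sketch (the batch size is only a starting point to tune, not a recommendation):

// TableLock takes a bulk-update (TABLOCK) lock, one of the minimal-logging conditions;
// BatchSize caps how many rows go to the server per batch.
using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    bulkCopy.DestinationTableName = "dbo.MyTable";
    bulkCopy.BatchSize = 5000; // tune against your server's available buffer (RAM)
    bulkCopy.WriteToServer(table);
}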
Note: TABLOCK is not the same as TABLOCKX.