I have a table in SQL Server that is storing files in binary format. Each row is on average ~3MB and there are tens of thousands of rows. What I'd like to do (since I must keep these tables around), is query each row, then run some compression on the binary data, and then re-insert the data (by updating each row).
My current naive implementation simply does something similar to this (using Dapper):
var files = await con.QueryAsync<MyClass>("SELECT ID, Content FROM Files");
foreach (var file in files)
{
    // ... compress file.Content here
    await con.ExecuteAsync("UPDATE Files SET Content = @NewContent WHERE ID = @ID",
        new { NewContent = file.Content, ID = file.ID });
}
Obviously this is very inefficient because it first loads all of the files into memory, etc. I was hoping I could somehow do the query/update in "batches", and IDEALLY I'd like to be able to run each batch asynchronously (if that's even possible).
Any suggestions would be appreciated (using SQL Server BTW).
The entire operation can be done on the database instance, without moving the data over the network to the application and back, by using the built-in COMPRESS function:
This function compresses the input expression, using the GZIP algorithm. The function returns a byte array of type varbinary(max).
UPDATE Files
SET Content = COMPRESS(Content)
WHERE ID BETWEEN @FromId AND @ToId; -- for example 1,000 rows per batch
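If you want to drive those batches from the application (so each one can be awaited), a minimal sketch with Dapper might look like the following. The 1,000-row batch size, the assumption that ID is a reasonably contiguous integer key, and the MAX(ID) lookup are illustrative choices, not part of the answer above; the binary content itself still never crosses the network, since COMPRESS runs inside the UPDATE.

// Sketch only: batches the server-side COMPRESS update from C#/Dapper.
// Assumes ID is an int key; 1,000 rows per batch is an arbitrary example.
const int batchSize = 1000;
var maxId = await con.ExecuteScalarAsync<int>("SELECT MAX(ID) FROM Files");

for (var fromId = 1; fromId <= maxId; fromId += batchSize)
{
    await con.ExecuteAsync(
        @"UPDATE Files
          SET Content = COMPRESS(Content)
          WHERE ID BETWEEN @FromId AND @ToId",
        new { FromId = fromId, ToId = fromId + batchSize - 1 });
}

Note that anything compressed this way has to be read back with DECOMPRESS(Content), and re-running the loop would gzip already-compressed rows a second time, so keep track of how far you have got.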
If you are using a SQL Server version lower than 2016, or you need a "custom" compression algorithm, you could use a user-defined CLR function instead.
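As a rough sketch of that route (the class and function names here are made up, and the assembly still has to be deployed with CREATE ASSEMBLY and exposed with CREATE FUNCTION before T-SQL can call it), a gzip-based CLR scalar function could look something like this:

using System.Data.SqlTypes;
using System.IO;
using System.IO.Compression;
using Microsoft.SqlServer.Server;

public static class CompressionFunctions
{
    // Hypothetical CLR scalar UDF; stands in for COMPRESS() on pre-2016 instances
    // or lets you swap in a different algorithm.
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlBytes GzipCompress(SqlBytes input)
    {
        if (input == null || input.IsNull)
            return SqlBytes.Null;

        var data = input.Value;
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // MemoryStream contents remain readable after the GZipStream closes it.
            return new SqlBytes(output.ToArray());
        }
    }
}

Once registered, the batched UPDATE stays the same except that COMPRESS(Content) becomes dbo.GzipCompress(Content).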