One of the things that I to do over and over on Dynamics Crm projects is to load existing data in. Using the api’s is usually ok for this but one project had a large number of records in the region of 30m which needed importing and the api’s just weren’t cutting the mustard. I did come up with a supported approach to load lots of records in quickly which I’ll detail here.
The headline number for speed was I increased performance from about 50 records / s to about 300 records per second on an all in one environment running in a virtual machine on my laptop (quad core i7, 8GB Ram, SSD disk). I had left the project before they ran it in anger, so I never found out how fast it really went!
I apologies to all the Crm Dev’s out there that like to just copy and paste code or download a tool to solve their problems as I don’t have that for you. All I have is my experience of the problem and memories of how I solved it. I will set this out in this post and will leave it up-to you to implement your own solution.
*Update on 7th April – I wrote this post in response to a tweet by Mike Read (http:/twitter.com/xRMMike ) asking for ideas on bulk loading records. Mike just let me know his implementation worked at about 450 records/s on a small dev rig. Great work Mike, I would love to know what it does in a production environment!
<shamelessPlug>Of course I am a freelance developer so you could hire me for a few days to do it for you!!! http://www.seethelink.co.uk </shamelessPlug>
The Main Stumbling Block – IIS
There were a number of problems I encountered when dealing with this sized data set but the main one was:
Dynamics Crm api’s are hosted in IIS / WCF. This causes technical problems as WCF and IIS are setup to stop DOS attacks so when you start hitting it really hard it just turns its self off before starting again.
When using the api’s IIS was also doing all the processing and was not very easy to scale out, I had some limited success creating a web garden but it just wasn’t fast enough.
I tried all sorts of things, multi treaded loader apps, async web requests, a number of clients machines firing in requests. I just couldn’t get past the bottleneck of IIS.
In a (fairly big) nutshell the solution is this:
- Create a tool to prep the data. I chose to do a winforms app, but it could be a console app I suppose.
- Chunk the records you want to insert into batches. From memory I did 1000 records at a time.
- Create a representation of the data in some kind of text based format. I used an XML schema for this. If I was to do it again I would probably use JSON or JSV format as it is less verbose and would allow more data to be represented and thus increase performance..
- Serialise your batches of 1000 records into your text based format.
- Zip up the text based format with some kind of compression tool (lots of zip libraries out there)
- Turn the zip into a Base64 text representation
- Create a ‘EntityBulkLoadReciever’ entity (or something like that)
- Create the biggest text field you can on the ‘EntityBulkLoadReciever’ entity. I think I called this field ‘Payload’.
- Create other text fields on it to records various metrics you are interested in.
- Create an OnCreate plugin for the ‘EntityBulkLoadReciever’ entity.
- In this plugin, take the text out of the PayLoad field. De-code it from Base64. Un-compress it. Loop through each of the records in the batch, turn them into the appropriate Dynamics Crm Entity and insert/update/delete them using the IOrganization Service in the PluginContext.
- Clear out the ‘Payload’ field on the TargetEntity otherwise you will bloat your database
- Put any metrics you are interested in recording for that batch into the extra fields you created on the ‘EntityBulkLoadReciver’ entity. (I think i stored excpetions, successful creation count, failure count on mine)
- Register the plugin as ‘Async’ (this is the really really important bit)
- Loop through every batch of 1000 records you have, Put your compressed, Base64 text representation of them into the ‘Payload’ field or a new ‘EntityBulkLoadReciever’ entity.
- Fire the new ‘EntityBulkLoadReciever’ entity with payload of 1000 records at the standard Dynamics Crm api
- Watch the Async service consume all processing power and RAM on your server
- Watch 100’s or records / second get created in the database.
Why this works
- By batching and compressing your data you can deliver 1000 records per request, rather than doing 1000 web requests. This gets past IIS/WCF trying to be helpful and stopping this as it thinks it is a DOS attack.
- By using the IOrganization service inside of the on create Plugin Context means you are already inside the Plugin Pipeline, I’m not 100% technically how it works, but it appears that the organisation service turns the new inserts into sql calls rather than calling the webservice again. This makes it uber fast, probably just sql statements, to insert each payload.
- By putting the plugin on the Async service takes the load of the IIS thread. The Async service is already configured to use all processors on a machine, hell you can even scale it out and add a few extra Async servers if you have a big 1 off load.
I hope this is useful to those of you who face the same challenges as I did. The recently released Dynamics SDK does provide a new message to batch records together to improve bulk load performance which might be worth looking at. Unfortunately I haven’t had the time to do that yet.
If you have any questions please contact me through twitter: http://twitter.com/davehawes