Which database is best to work with graphs and tree structured data?

dapper database graph hierarchy tree


I am planning to work with Dapper.NET for a family site.

A lot of tree like data will be present in the structure. Which database provides the best queries to work with cyclic/acyclic tree relations?

I want to know the easiness & performance comparison of hierarchical queries. ie. like CTE in SQL Server, Connect By/Start with in Oracle etc..

Is dapper be the best choice as a Micro ORM for these kind of tree structured data?

I need opinion in choosing the right database and right Micro ORM for this.

Sorry for my bad English.

Popular Answer

My question still stands: How much data do you expect?

But apart from that it's not just the type of database you're choosing for your data it's also table structure. Hierarchy trees can be stored in various different ways depending on your needs.

Table structure

Particular structures may be very fast on traversal reads but slow on inserts/updates (i.e. nested sets), others (adjacency lists) the other way around. For a 99:1 read:write ratio (vast majority of today's applications read much more than write) I would likely choose a modified nested set structure that has left, right, depth and parent. This gives you best possibility for read scenarios.

Database type

Unless you're aiming at huge amounts of data I suggest you go with any of the SQL databases that you know best (MSSQL, MySQL, Oracle). But if your database will contain enormous number of hierarchy nodes then flirting with a specialised graph-oriented database may be a better option.

80 million nodes

If you'd be opting for a modified nested set solution (also using negative values, so number of updates on insert/update halves) you'd have hierarchy table having left. right, ID and ParentID columns that would result in approx 1.2 GB table. But that's your top estimation after at least two years of usage.

My suggestion

Go quick & go light - Don't overengineer by using best possible database to store your hierarchy if it turns out it's not needed after all. Therefore I'd suggest you use relational DB initially so you can get on the market quickly even though solution will start to struggle after some millions of records. But before your database starts to struggle (we're talking years here) you'll gain two things:

  1. You'll see whether your product will take off in the first place (there's many genealogy services already) so you won't invest in learning new technology; Because you'd be using proven and supported technology would get you on the market quickly
  2. If your product does succeed (and I genuinely hope it does) it will still give you enough time to learn a different storage solution and implement it; with proper code layers it shouldn't be to hard to switch storage later on when required

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why