Relational database design - surrogate keys vs natural keys in context of query speed, ORM and app development

dapper database-design orm relational-database sql

Question

Suppose I have a set of 'model' entities, and a set of difficulty levels. Each model has a certain percentage success rate for a given day on a given difficulty level.

A model entity has a name, which is both unique and immutable under any circumstances, so it makes a natural primary key. A difficulty level is described by its name only (easy, normal, etc). Difficulty levels are very, very unlikely to change, though it's possible a new one could be added. A success rate record is uniquely identified by the model it pertains to, the difficulty level, and the date.

Here's the most trivial db design for this scenario: db design with natural keys

In this design, 'name' is the primary key for the table 'models' and is represented by a VARCHAR(20) field. Likewise, a VARCHAR(20) field 'name' is the primary key for the table 'difficulty_levels' (a lookup table). In the 'success_rates' table, 'model_name' is a foreign key referencing the 'name' field in the 'model' table, and 'difficulty_level' is a foreign key referencing the 'name' field in the 'difficulty_levels' table. The fields 'model_name', 'difficulty_level' and 'date' make up a composite primary key for the 'success_rates' table.

The most used queries would be:

  • getting all success rates for a certain model, difficulty level, and date period

  • getting the most/least successful models for a certain period and difficulty level.

Now, my question is - do I need to add surrogate primary keys to the 'models' and 'difficulty_levels' tables? I guess storing int values as opposed to varchar values in the foreign key fields of 'success_rates' takes up less space, and maybe the queries will be faster (just my wild guess, not sure about that)?

The problem I see with surrogate keys is that they have zero relevance to the business logic itself. I'm planning on using a mini-ORM (most likely Dapper), and without the surrogate keys I can operate on classes that very cleanly represent the entities I'm working with. Now, if I add surrogate keys, I'll have to add 'Id' properties to my classes, and I'm really against adding a database storage implementation like that to a class that can be used anywhere in the app, not even in connection with a database storage. I could add proxy storage classes with an Id property, but that adds another level of complexity. Plus the fact that the 'Id' property won't be readonly (so the ORM can set the ids after saving the entity to the database) means that it would be possible to accidentally set it to a random/invalid value.

I'm not very familiar with ORM's and I have zero knowledge of Dapper, so correct me if I was wrong in any of these points.

What would be the best approach here?

Popular Answer

The problem I see with surrogate keys is that they have zero relevance to the business logic itself.

That is actually the benefit, not the problem. The key's value is the immutable identifier of the entity, regardless of whatever else changes. I really doubt that there is any widely used ORM that cannot easily make use of it, as the alternative (cascading changes to a key value to every child record) is so much more difficult to implement.

I'd also suggest that you add a value to your difficulty levels that allows the hierarchy of increasing difficulty to be represented in the data model, otherwise "harder than" or "easier than" cannot be robustly represented.



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why