There are many questions about this, but not enough definitive answers, especially using SQL Server which I am in this case.
I have 3 tables to represent blog posts and tags associated with it. I want to run a query that will get all the posts and each posts tags with it - I need both the tag id and name. The result needs to be easily serializable into my C# classes. I am also trying to use Dapper.net but that is not the most important part.
Obviously many websites do this, I want to know the best way and how it should be done in the real world? I could get all the posts, then run multiple queries for each one afterwards that would return the tags for each post. But surely there has to be a better way?
If I was just attempting to get one post then I would just write multiple selects. One for the post and one for the tags. But I want all of the posts. Is there a way of doing this without replicating the post information for each tag row that is returned?
If the tags for each post are serialized into a column that has them comma separated for example, how would one be able to get both the id and the name? Encode the string?
SELECT * FROM dbo.Posts
SELECT * FROM dbo.Tags
SELECT * FROM dbo.PostTags
Posts
Id Title Content
===============================
1 First Post First Content
3 Second Second Content
Tags
Id Name
============
1 C#
2 SQL
3 IIS
4 Steam
5 OpenID
PostTags
PostId TagId
=============
1 1
1 2
3 3
3 4
Simply joining the tables with the following query:
SELECT p.*, t.Name
FROM dbo.Posts p
LEFT JOIN dbo.PostTags pt ON p.id = pt.PostId
LEFT JOIN dbo.Tags t ON t.id = pt.TagId
Gives much redundant data by repeating the post content for each tag that is associated with it:
Id Title Content Name
======================================
1 First Post First Post C#
1 First Post First Post SQL
3 Second Post Second Content IIS
3 Second Post Second Content Steam
This being purely an exercise, let me preface this by saying that most likely the amount of data being duplicated isn't a big deal. Although if the posts are very large in size and there are lots of them, it does start to make more sense to avoid duplication.
Further, using C# Linq-to-Sql or Entity Framework, the object relationships will be worked out for you and your Post
entity will have a List<Tag>
property that you can access.
However if you want to roll your own type of thing, one option that involves just one DB round trip and no duplication of data is to write a stored proc that gets you back 2 recordsets (2 separate select statements) - one with Post content, and one with Tag content.
It would then be pretty simple to create a C# class that represents a Post
and just has a List<Tag>
and pull it from the stored proc results.
Create Procedure GetPostTags
As
-- We will use the GotTags column here to loop through and get tabs later
Declare @Posts Table (
PostID varchar(50),
PostTitle varchar(50),
PostContent varchar(50),
GotTags bit default 0
)
/* Assuming you care about the ID's, this will get you all of
the tags without duplicating any post content */
Declare @PostTags Table (
PostID int,
TagID int,
TagName varchar(50)
)
-- Populate posts from the main table
Insert Into @Posts (PostID, PostTitle, PostContent)
Select * From Posts
-- Now loop through and get the tags for each post.
Declare @CurrentPostID int
Set @CurrentPostID = (Select Top 1 PostID From @Posts Where GotTags = 0)
While @CurrentPostID Is Not Null
Begin
Insert Into @PostTags (PostId, TagID, TagName)
Select pt.postid, pt.tagid, t.name
From Tags t
Join PostTags pt
On t.id = pt.tagid
Where pt.postid = @CurrentPostID
-- Set next loop
Update @Posts Set GotTags = 1 Where PostID = @CurrentPostID
Set @CurrentPostID = (Select Top 1 PostID From @Posts Where GotTags = 0)
End
-- Return 2 recordsets, which are related by the PostID column found in both sets
Select * from @Posts
Select * From @PostTags
I prefer this type of solution over concatenating strings into one string and then splitting them later; it makes it easier to work with the data this way, allows to be more object oriented in C#, and lets you keep track of Tag ID's easier in case tags need to be removed or added to/from a post, you don't need to find a tag or match by name since you already have the ID.