Page 1 of 1

Database structure problem with tags, categories

Posted: Fri Nov 11, 2011 11:49 am UTC
by BotoBoto
Ok, I'll keep it short. I have a website with a problem regarding my database. It's mostly about the structure in my database.
I have pictures and videos on my website I want them to be easily categorized and searchable. The pictures and videos are structured as such:
  • Media (all information that isnt specific to the kind of media)
    • pictures (anything correspodning to pictures, width, type of image, height etcetera.)
    • videos ( same as pictures=. )
At the moment the 'media' table also has a 'tags' field, this is where my problem lies. I've read some material on this subject and a lot of them say that for performance and keeping the database more 'sensible' I should use seperate tags tables. So there would be a 'individual_tags' table, and a table connecting them to the media themselves. For easier searching.

I am going to add categories like funny and bikini and fail. What best can I do?

Re: Database structure problem with tags, categories

Posted: Fri Nov 11, 2011 3:03 pm UTC
by Robert'); DROP TABLE *;
If I was doing this, this is what I'd do, though I'm not sure it's the "best" way.

Set up three tables, like so:
  • Your media table, with some sort of unique ID column
  • A 'tags' table, for information about the tags themselves, like their name. This also has some sort of ID.
  • A third table mapping the IDs of the others. i.e. an entry that says media #98 goes with tag #23

To search for everything matching a specific tag, you look for the records in the third table that match a specific tagID, and so forth. This will AFAIK be reasonably fast. If your database allows insertion triggers, you could use them to keep a "running count" of how many posts there are under a given tag very easily, which would let you build the classic "tag cloud."

Someone will probably come along with a better method, but AFAIK that'll work, even if it's not the most elegant.

Re: Database structure problem with tags, categories

Posted: Fri Nov 11, 2011 7:45 pm UTC
by Pepve
Or go with a document database, and documents like this:

Code: Select all

{
  type: "picture",
  title: "Crazy beach photo",
  size: {
    width: 1000,
    height: 800
  },
  tags: ["funny", "bikini", "fail"]
}

Re: Database structure problem with tags, categories

Posted: Sat Nov 12, 2011 12:04 am UTC
by Token
Yeah, having a "tags" column in the "media" table is going to make querying a bitch, not to mention that it's poor relational design. Your best option is to have a separate table with a "media_id" column FKed to the "media" table, and a "type" (or "text" or similar) column for storing the actual text of the tag.

Having the tags in a separate table and relating them to the media items in a junction table is probably unnecessary, as the tags are so simple - you'll be forced to make extra joins that you really don't need. I think it's also conceptually a bit misleading to view it as a many-to-many relationship - it's not that the "fail" tag is a single entity that has many pictures, but rather that each of those pictures has its own "fail" tag. That is, I think it's more likely that your code would look something like mediaRepository.getMediaWithTag("fail") than tagRepository.getTagByText("fail").getMedia().

Re: Database structure problem with tags, categories

Posted: Sat Nov 12, 2011 11:08 am UTC
by BotoBoto
Token wrote:Yeah, having a "tags" column in the "media" table is going to make querying a bitch, not to mention that it's poor relational design. Your best option is to have a separate table with a "media_id" column FKed to the "media" table, and a "type" (or "text" or similar) column for storing the actual text of the tag.

Having the tags in a separate table and relating them to the media items in a junction table is probably unnecessary, as the tags are so simple - you'll be forced to make extra joins that you really don't need. I think it's also conceptually a bit misleading to view it as a many-to-many relationship - it's not that the "fail" tag is a single entity that has many pictures, but rather that each of those pictures has its own "fail" tag. That is, I think it's more likely that your code would look something like mediaRepository.getMediaWithTag("fail") than tagRepository.getTagByText("fail").getMedia().


So for tags I would make a table seperate linking all tags fields to the media.

table tags
media_id "number"
tags "fail, pleasure"

or for every tag a new row?

Re: Database structure problem with tags, categories

Posted: Sun Nov 13, 2011 7:39 am UTC
by Zabaron
Each tag should be a separate row. That way if you want to get all of the pictures tagged with 'funny', you can just use the query "SELECT media_id FROM tags WHERE tag = 'funny';". If you were to try to put all the tags for a given item in the same row, that query would be way uglier.

--Zabaron

Re: Database structure problem with tags, categories

Posted: Sun Nov 13, 2011 10:27 am UTC
by BotoBoto
i search through them with a fulltext search...

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 5:09 am UTC
by Steax
When I was deciding on a tagging system for an old web app, I stumbled across this really useful article which gave an overview over the various schemas. It's old, but still relevant.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 6:22 am UTC
by BotoBoto
Steax wrote:When I was deciding on a tagging system for an old web app, I stumbled across this really useful article which gave an overview over the various schemas. It's old, but still relevant.


I've actually read that one Steax, but I was confused as to what I was going to use, the three table approach, which gives me a complete oversight and can be used with statistical analysis, or the two table approach which gives me less joins but more rows...

And how should I handle categories? Should they be handled the same way? It's hard to decide as it is hard to switch after deciding.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 6:55 am UTC
by Steax
My rule of thumb is take whatever you, the developer, finds logical. You can always defeat speed/processing power by throwing more oomph at it, but you must avoid making decisions that will slow things down for you in the future.

I've always used the three-table schema. It makes most sense to me, since the whole point of tags is to give users a way to view more items by association. It also provides an easier way to apply metadata to a particular tag, and apply manipulation to them.

In the end, it depends on what you find comfortable and means the least amount of maintenance. I'm a fan of absolute control, hence my schema - it'd be very annoying trying to rename a tag with a TEXT-field schema, and a bit more difficult to extend a two-table schema. I'm also a fan of normalization. Having a logical list of all tags I have gives me a really useful overview over my site structure, and it's easy to deal with them (rather than a long list of separate tags).

I handle categories the same way.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 1:56 pm UTC
by RoadieRich
Steax wrote:My rule of thumb is take whatever you, the developer, finds logical. You can always defeat speed/processing power by throwing more oomph at it, but you must avoid making decisions that will slow things down for you in the future.

I've always used the three-table schema. It makes most sense to me, since the whole point of tags is to give users a way to view more items by association. It also provides an easier way to apply metadata to a particular tag, and apply manipulation to them.

In the end, it depends on what you find comfortable and means the least amount of maintenance. I'm a fan of absolute control, hence my schema - it'd be very annoying trying to rename a tag with a TEXT-field schema, and a bit more difficult to extend a two-table schema. I'm also a fan of normalization. Having a logical list of all tags I have gives me a really useful overview over my site structure, and it's easy to deal with them (rather than a long list of separate tags).

I handle categories the same way.

It's not overly difficult to migrate from a two table schema to a three table, assuming you name the tables to account that you might: Just keep the tag name as your tag ID, and then the migration is simply something like:

create table Tags (tagname char(32) primary key, <whatever metadata you want>);
insert into tags (tagname) select tagname from imagetags;
<alter table statements to add referential contraints if needed, make types match etc>

Primary keys should be data that already exists in the table unless there's a very good reason: hiding your file structure is an ok excuse for using a numeric media id, there's no excuse for having a numeric tag id.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 2:41 pm UTC
by Steax
Yes, switching from 2 to 3 is relatively easy (it's 1 to 2/3 that's horrid). I tend to keep my options open and not constrain myself (even if removing the constraint is trivial), so I usually go for 3. It offers a level of abstraction which, while probably won't be necessary, gives me peace of mind. I simply see no elegance in repeating a string hundreds of times for every tag.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 6:11 pm UTC
by Xanthir
RoadieRich wrote:Primary keys should be data that already exists in the table unless there's a very good reason: hiding your file structure is an ok excuse for using a numeric media id, there's no excuse for having a numeric tag id.

I disagree. Every table that references your table will use and store your primary key. Numeric keys are small and easy, while strings are much larger, and compound primary keys can be bigger still. I'm not certain, but it may also be more efficient to search/join tables based on numeric keys. Finally, using a numeric synthetic key consistently makes it so that you don't have to worry about what set of columns make up a "unique" row, which is particularly helpful when, in the future, you change the table structure to allow rows that are "duplicates" based on the old columns but differentiated by the new columns. Rebuilding all your indexes that depend on the changed primary key is not fun.

Re: Database structure problem with tags, categories

Posted: Mon Nov 14, 2011 10:04 pm UTC
by RoadieRich
Compare:
SELECT image_id FROM Images JOIN ImageTags ON Images.image_id = ImageTags.image_id WHERE ImageTag.tag_name = %1;
SELECT image_id FROM Images JOIN ImageTags ON Images.image_id = ImageTags.image_id WHERE ImageTag.tag_id = (SELECT tag_id FROM Tags where tag_name = %1;
(You could do it with an extra join, but I doubt you'd gain anything)

Using a synthetic key adds in a whole extra query. Yes, there are cases when your existing data is too complex to form a key, but I'd say that counts as a "very good reason", and should be the minority of cases. In this situation, it definitely is not: you've got a tag, which is easily restricted to a few characters. It also means that if something does break your database, you don't need to cross-reference values as you try to figure out what went wrong. String comparisons are slower than using integers, as you're comparing more bytes, but deciding based on that is premature optimisation that will likely cost you in the long run.

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 12:16 am UTC
by BotoBoto
Alright, as a progress report: I've finished the categories via the three table method. For a many-to-many link :) And the tags are still one table. Maybe later I'll try and make that many-to-many.

As for what the result is: http://www.funzors.com/categories/

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 12:29 am UTC
by Robert'); DROP TABLE *;
Your counts seem to be off, but I'm not entirely sure how. For instance, "WTF?" returns 8 links, but tells me, "1 video, 4 pictures"

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 12:31 am UTC
by BotoBoto
Robert'); DROP TABLE *; wrote:Your counts seem to be off, but I'm not entirely sure how. For instance, "WTF?" returns 8 links, but tells me, "1 video, 4 pictures"


Yes that's because it isnt updated automaticlly, it's done via a CRON every day. I'd figure that's more efficient than doing counts every time something gets uploaded/deleted? Or not?

Edit I ran the cron script again :) Counts should be good now. Though the ones that don't have atleast 1 video and 1 picture always show "1 video and 1 picture"... I'll work out that bug tomorrow ;)

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 12:50 am UTC
by Robert'); DROP TABLE *;
Well, is there a lot of processing going on on upload/delete? That would seem to be the ideal place to do it.

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 8:50 am UTC
by Steax
Or you could just have a table with rows for each category, and increment each on upload. There could be a few corner cases I suppose - if this becomes a problem, you can just run the cron script once every, say, month.

Re: Database structure problem with tags, categories

Posted: Sun Dec 11, 2011 7:18 pm UTC
by BotoBoto
Robert'); DROP TABLE *; wrote:Well, is there a lot of processing going on on upload/delete? That would seem to be the ideal place to do it.


I'll probably be doing that, with a weekly CRON update. :) Oh my, my spaghetti code is turning against me :(