Database structure problem with tags, categories

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Database structure problem with tags, categories

Postby BotoBoto » Fri Nov 11, 2011 11:49 am UTC

Ok, I'll keep it short. I have a website with a problem regarding my database. It's mostly about the structure in my database.
I have pictures and videos on my website I want them to be easily categorized and searchable. The pictures and videos are structured as such:
  • Media (all information that isnt specific to the kind of media)
    • pictures (anything correspodning to pictures, width, type of image, height etcetera.)
    • videos ( same as pictures=. )
At the moment the 'media' table also has a 'tags' field, this is where my problem lies. I've read some material on this subject and a lot of them say that for performance and keeping the database more 'sensible' I should use seperate tags tables. So there would be a 'individual_tags' table, and a table connecting them to the media themselves. For easier searching.

I am going to add categories like funny and bikini and fail. What best can I do?

User avatar
Robert'); DROP TABLE *;
Posts: 730
Joined: Mon Sep 08, 2008 6:46 pm UTC
Location: in ur fieldz

Re: Database structure problem with tags, categories

Postby Robert'); DROP TABLE *; » Fri Nov 11, 2011 3:03 pm UTC

If I was doing this, this is what I'd do, though I'm not sure it's the "best" way.

Set up three tables, like so:
  • Your media table, with some sort of unique ID column
  • A 'tags' table, for information about the tags themselves, like their name. This also has some sort of ID.
  • A third table mapping the IDs of the others. i.e. an entry that says media #98 goes with tag #23

To search for everything matching a specific tag, you look for the records in the third table that match a specific tagID, and so forth. This will AFAIK be reasonably fast. If your database allows insertion triggers, you could use them to keep a "running count" of how many posts there are under a given tag very easily, which would let you build the classic "tag cloud."

Someone will probably come along with a better method, but AFAIK that'll work, even if it's not the most elegant.
...And that is how we know the Earth to be banana-shaped.

Pepve
Posts: 57
Joined: Wed Jul 28, 2010 9:47 am UTC

Re: Database structure problem with tags, categories

Postby Pepve » Fri Nov 11, 2011 7:45 pm UTC

Or go with a document database, and documents like this:

Code: Select all

{
  type: "picture",
  title: "Crazy beach photo",
  size: {
    width: 1000,
    height: 800
  },
  tags: ["funny", "bikini", "fail"]
}

Token
Posts: 1481
Joined: Fri Dec 01, 2006 5:07 pm UTC
Location: London

Re: Database structure problem with tags, categories

Postby Token » Sat Nov 12, 2011 12:04 am UTC

Yeah, having a "tags" column in the "media" table is going to make querying a bitch, not to mention that it's poor relational design. Your best option is to have a separate table with a "media_id" column FKed to the "media" table, and a "type" (or "text" or similar) column for storing the actual text of the tag.

Having the tags in a separate table and relating them to the media items in a junction table is probably unnecessary, as the tags are so simple - you'll be forced to make extra joins that you really don't need. I think it's also conceptually a bit misleading to view it as a many-to-many relationship - it's not that the "fail" tag is a single entity that has many pictures, but rather that each of those pictures has its own "fail" tag. That is, I think it's more likely that your code would look something like mediaRepository.getMediaWithTag("fail") than tagRepository.getTagByText("fail").getMedia().
All posts are works in progress. If I posted something within the last hour, chances are I'm still editing it.

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Sat Nov 12, 2011 11:08 am UTC

Token wrote:Yeah, having a "tags" column in the "media" table is going to make querying a bitch, not to mention that it's poor relational design. Your best option is to have a separate table with a "media_id" column FKed to the "media" table, and a "type" (or "text" or similar) column for storing the actual text of the tag.

Having the tags in a separate table and relating them to the media items in a junction table is probably unnecessary, as the tags are so simple - you'll be forced to make extra joins that you really don't need. I think it's also conceptually a bit misleading to view it as a many-to-many relationship - it's not that the "fail" tag is a single entity that has many pictures, but rather that each of those pictures has its own "fail" tag. That is, I think it's more likely that your code would look something like mediaRepository.getMediaWithTag("fail") than tagRepository.getTagByText("fail").getMedia().


So for tags I would make a table seperate linking all tags fields to the media.

table tags
media_id "number"
tags "fail, pleasure"

or for every tag a new row?

User avatar
Zabaron
Posts: 113
Joined: Wed Jun 04, 2008 3:33 am UTC
Location: Georgia

Re: Database structure problem with tags, categories

Postby Zabaron » Sun Nov 13, 2011 7:39 am UTC

Each tag should be a separate row. That way if you want to get all of the pictures tagged with 'funny', you can just use the query "SELECT media_id FROM tags WHERE tag = 'funny';". If you were to try to put all the tags for a given item in the same row, that query would be way uglier.

--Zabaron
I love deadlines. I love that wooshing sound they make as they fly by. -Douglas Adams

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Sun Nov 13, 2011 10:27 am UTC

i search through them with a fulltext search...

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Database structure problem with tags, categories

Postby Steax » Mon Nov 14, 2011 5:09 am UTC

When I was deciding on a tagging system for an old web app, I stumbled across this really useful article which gave an overview over the various schemas. It's old, but still relevant.
In Minecraft, I use the username Rirez.

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Mon Nov 14, 2011 6:22 am UTC

Steax wrote:When I was deciding on a tagging system for an old web app, I stumbled across this really useful article which gave an overview over the various schemas. It's old, but still relevant.


I've actually read that one Steax, but I was confused as to what I was going to use, the three table approach, which gives me a complete oversight and can be used with statistical analysis, or the two table approach which gives me less joins but more rows...

And how should I handle categories? Should they be handled the same way? It's hard to decide as it is hard to switch after deciding.

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Database structure problem with tags, categories

Postby Steax » Mon Nov 14, 2011 6:55 am UTC

My rule of thumb is take whatever you, the developer, finds logical. You can always defeat speed/processing power by throwing more oomph at it, but you must avoid making decisions that will slow things down for you in the future.

I've always used the three-table schema. It makes most sense to me, since the whole point of tags is to give users a way to view more items by association. It also provides an easier way to apply metadata to a particular tag, and apply manipulation to them.

In the end, it depends on what you find comfortable and means the least amount of maintenance. I'm a fan of absolute control, hence my schema - it'd be very annoying trying to rename a tag with a TEXT-field schema, and a bit more difficult to extend a two-table schema. I'm also a fan of normalization. Having a logical list of all tags I have gives me a really useful overview over my site structure, and it's easy to deal with them (rather than a long list of separate tags).

I handle categories the same way.
In Minecraft, I use the username Rirez.

User avatar
RoadieRich
The Black Hand
Posts: 1037
Joined: Tue Feb 12, 2008 11:40 am UTC
Location: Behind you

Re: Database structure problem with tags, categories

Postby RoadieRich » Mon Nov 14, 2011 1:56 pm UTC

Steax wrote:My rule of thumb is take whatever you, the developer, finds logical. You can always defeat speed/processing power by throwing more oomph at it, but you must avoid making decisions that will slow things down for you in the future.

I've always used the three-table schema. It makes most sense to me, since the whole point of tags is to give users a way to view more items by association. It also provides an easier way to apply metadata to a particular tag, and apply manipulation to them.

In the end, it depends on what you find comfortable and means the least amount of maintenance. I'm a fan of absolute control, hence my schema - it'd be very annoying trying to rename a tag with a TEXT-field schema, and a bit more difficult to extend a two-table schema. I'm also a fan of normalization. Having a logical list of all tags I have gives me a really useful overview over my site structure, and it's easy to deal with them (rather than a long list of separate tags).

I handle categories the same way.

It's not overly difficult to migrate from a two table schema to a three table, assuming you name the tables to account that you might: Just keep the tag name as your tag ID, and then the migration is simply something like:

create table Tags (tagname char(32) primary key, <whatever metadata you want>);
insert into tags (tagname) select tagname from imagetags;
<alter table statements to add referential contraints if needed, make types match etc>

Primary keys should be data that already exists in the table unless there's a very good reason: hiding your file structure is an ok excuse for using a numeric media id, there's no excuse for having a numeric tag id.
73, de KE8BSL loc EN26.

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Database structure problem with tags, categories

Postby Steax » Mon Nov 14, 2011 2:41 pm UTC

Yes, switching from 2 to 3 is relatively easy (it's 1 to 2/3 that's horrid). I tend to keep my options open and not constrain myself (even if removing the constraint is trivial), so I usually go for 3. It offers a level of abstraction which, while probably won't be necessary, gives me peace of mind. I simply see no elegance in repeating a string hundreds of times for every tag.
In Minecraft, I use the username Rirez.

User avatar
Xanthir
My HERO!!!
Posts: 5423
Joined: Tue Feb 20, 2007 12:49 am UTC
Location: The Googleplex
Contact:

Re: Database structure problem with tags, categories

Postby Xanthir » Mon Nov 14, 2011 6:11 pm UTC

RoadieRich wrote:Primary keys should be data that already exists in the table unless there's a very good reason: hiding your file structure is an ok excuse for using a numeric media id, there's no excuse for having a numeric tag id.

I disagree. Every table that references your table will use and store your primary key. Numeric keys are small and easy, while strings are much larger, and compound primary keys can be bigger still. I'm not certain, but it may also be more efficient to search/join tables based on numeric keys. Finally, using a numeric synthetic key consistently makes it so that you don't have to worry about what set of columns make up a "unique" row, which is particularly helpful when, in the future, you change the table structure to allow rows that are "duplicates" based on the old columns but differentiated by the new columns. Rebuilding all your indexes that depend on the changed primary key is not fun.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

User avatar
RoadieRich
The Black Hand
Posts: 1037
Joined: Tue Feb 12, 2008 11:40 am UTC
Location: Behind you

Re: Database structure problem with tags, categories

Postby RoadieRich » Mon Nov 14, 2011 10:04 pm UTC

Compare:
SELECT image_id FROM Images JOIN ImageTags ON Images.image_id = ImageTags.image_id WHERE ImageTag.tag_name = %1;
SELECT image_id FROM Images JOIN ImageTags ON Images.image_id = ImageTags.image_id WHERE ImageTag.tag_id = (SELECT tag_id FROM Tags where tag_name = %1;
(You could do it with an extra join, but I doubt you'd gain anything)

Using a synthetic key adds in a whole extra query. Yes, there are cases when your existing data is too complex to form a key, but I'd say that counts as a "very good reason", and should be the minority of cases. In this situation, it definitely is not: you've got a tag, which is easily restricted to a few characters. It also means that if something does break your database, you don't need to cross-reference values as you try to figure out what went wrong. String comparisons are slower than using integers, as you're comparing more bytes, but deciding based on that is premature optimisation that will likely cost you in the long run.
73, de KE8BSL loc EN26.

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Sun Dec 11, 2011 12:16 am UTC

Alright, as a progress report: I've finished the categories via the three table method. For a many-to-many link :) And the tags are still one table. Maybe later I'll try and make that many-to-many.

As for what the result is: http://www.funzors.com/categories/

User avatar
Robert'); DROP TABLE *;
Posts: 730
Joined: Mon Sep 08, 2008 6:46 pm UTC
Location: in ur fieldz

Re: Database structure problem with tags, categories

Postby Robert'); DROP TABLE *; » Sun Dec 11, 2011 12:29 am UTC

Your counts seem to be off, but I'm not entirely sure how. For instance, "WTF?" returns 8 links, but tells me, "1 video, 4 pictures"
...And that is how we know the Earth to be banana-shaped.

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Sun Dec 11, 2011 12:31 am UTC

Robert'); DROP TABLE *; wrote:Your counts seem to be off, but I'm not entirely sure how. For instance, "WTF?" returns 8 links, but tells me, "1 video, 4 pictures"


Yes that's because it isnt updated automaticlly, it's done via a CRON every day. I'd figure that's more efficient than doing counts every time something gets uploaded/deleted? Or not?

Edit I ran the cron script again :) Counts should be good now. Though the ones that don't have atleast 1 video and 1 picture always show "1 video and 1 picture"... I'll work out that bug tomorrow ;)

User avatar
Robert'); DROP TABLE *;
Posts: 730
Joined: Mon Sep 08, 2008 6:46 pm UTC
Location: in ur fieldz

Re: Database structure problem with tags, categories

Postby Robert'); DROP TABLE *; » Sun Dec 11, 2011 12:50 am UTC

Well, is there a lot of processing going on on upload/delete? That would seem to be the ideal place to do it.
...And that is how we know the Earth to be banana-shaped.

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Database structure problem with tags, categories

Postby Steax » Sun Dec 11, 2011 8:50 am UTC

Or you could just have a table with rows for each category, and increment each on upload. There could be a few corner cases I suppose - if this becomes a problem, you can just run the cron script once every, say, month.
In Minecraft, I use the username Rirez.

User avatar
BotoBoto
Posts: 191
Joined: Mon Mar 09, 2009 9:31 pm UTC
Contact:

Re: Database structure problem with tags, categories

Postby BotoBoto » Sun Dec 11, 2011 7:18 pm UTC

Robert'); DROP TABLE *; wrote:Well, is there a lot of processing going on on upload/delete? That would seem to be the ideal place to do it.


I'll probably be doing that, with a weekly CRON update. :) Oh my, my spaghetti code is turning against me :(


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 7 guests