10 Comments
- bitwiseplatypus, on 10/12/2007, -3/+8 This isn't blogspam. This is an original, interesting, well-written article that happens to be on the blog of the submitter. There's nothing wrong with that.
- senzafine, on 10/12/2007, -0/+1 Good article.
FotoFlix implements this by adding a layer on top of tags that lets you create a hierarchy of tag combinations. They're called QuickSets, but what they really are is just dynamic tag groupings in a hierarchical layout. An example would be:
Family (tags:family)
+Kim (tags:family,kim)
+Barbi (tags:family,barbi)
Vacations (tags:vacations)
+Hawaii (tags:vacations,hawaii)
example usage: http://www.fotoflix.com/users/jmathai/fotos/
- crsdigger, on 10/12/2007, -2/+3 Definitely not blogspam. How else would you be able to publish something such as a well-written college paper online? I have had problems with having so many links on del.icio.us. This article addresses an issue that is only going to become more prevalent as more websites implement tagging. All you have to do is use a Firefox extension like foxylicious to find out how disorganized things can feel without structured tags. I DIGG IT!!!
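The QuickSets layout in senzafine's comment above can be sketched roughly as follows. This is a minimal illustration, not FotoFlix's actual implementation: the photo names, the `PHOTOS` and `QUICKSETS` tables, and the `photos_in` helper are all made up for the example. The idea shown is only that each set is a saved tag combination, and a child set requires its parent's tags plus its own.

```python
# Hypothetical data: photo name -> set of tags on that photo.
PHOTOS = {
    "beach.jpg": {"vacations", "hawaii"},
    "kim_birthday.jpg": {"family", "kim"},
    "barbi_and_kim.jpg": {"family", "kim", "barbi"},
    "ski_trip.jpg": {"vacations"},
}

# Hypothetical QuickSets: path in the hierarchy -> tags a photo must carry.
# A child set lists its parent's tags plus its own.
QUICKSETS = {
    "Family": {"family"},
    "Family/Kim": {"family", "kim"},
    "Family/Barbi": {"family", "barbi"},
    "Vacations": {"vacations"},
    "Vacations/Hawaii": {"vacations", "hawaii"},
}


def photos_in(quickset, photos=PHOTOS, quicksets=QUICKSETS):
    """Return the photos that carry every tag required by the given set."""
    required = quicksets[quickset]
    # `required <= tags` is a subset test: all required tags are present.
    return sorted(name for name, tags in photos.items() if required <= tags)


if __name__ == "__main__":
    for name in QUICKSETS:
        print(name, "->", photos_in(name))
```

Because the groupings are computed from tags at lookup time, retagging a photo moves it between sets without any folder bookkeeping.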
- zooie, on 10/12/2007, -0/+0 I received an insightful comment on this blog post - I've included it here (and my response) just to keep everything in one place.
Klyith Says:
March 28th, 2006 at 5:19 pm
I've seen other proposals for "standardized", "structured" or "hierarchical" tagging methods before... Whether for other web sites, music, general metadata, whatever. Honestly, you're just reinventing the wheel, and in a really inferior way too. Librarians have been doing this stuff for decades. It's called LC Subject Classification (http://www.loc.gov/catdir/cpso/lcco/lcco.html) and it covers pretty much everything that anyone has ever written about.
And it's being destroyed by keyword and natural language searching. Nobody wants to have to memorize a giant and difficult hierarchy of subjects, when they can just type in whatever they use to describe what they want to find. You won't get results from people who use different words and descriptions than you do, and you might not get the best possible results, but you'll get results anyways.
The lesson is people are lazy.
2. zooie Says:
March 28th, 2006 at 9:31 pm
Thanks for the comment Klyith.
I agree - I'm not saying my design is novel, nor do I think normal users will do hierarchical multi-labeling on their content (as I mentioned in my post). But, I do think content providers or power users can use this scheme to better organize their data.
Thanks for the link to the LC subject classification hierarchy. My scheme is pretty much the same except I enable content providers to devise their own hierarchy specific to their data. For example, consider if Google News provided me these tags for a science/tech article:
News.Computer.Hardware.Ultra-Portable.Origami, Ideology.Geeky.Anti-Microsoft, People.BillG, Source.Online.Blog
This gives me tremendous value - gives me context AND the ability to search for articles in any of the labels (and any of the hierarchies within a label).
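As a rough sketch of what searching "any of the hierarchies within a label" could look like, the snippet below matches dotted tags by prefix, one segment at a time. The article names, the `ARTICLES` table, and the `matches` and `search` helpers are assumptions made for illustration; the tags are shortened versions of the ones in the example above.

```python
# Hypothetical index: article id -> list of dotted hierarchical tags.
ARTICLES = {
    "origami-review": ["News.Computer.Hardware", "People.BillG", "Source.Online.Blog"],
    "linux-kernel": ["News.Computer.Software", "Source.Online.Blog"],
    "election-recap": ["News.Political", "Source.Print.Newspaper"],
}


def matches(tag, query):
    """True if `tag` equals `query` or sits below it in the hierarchy."""
    tag_parts = tag.split(".")
    query_parts = query.split(".")
    # Compare whole segments so that "News.Comp" does not match "News.Computer".
    return tag_parts[:len(query_parts)] == query_parts


def search(articles, query):
    """Return ids of articles with at least one tag at or below `query`."""
    return sorted(
        article
        for article, tags in articles.items()
        if any(matches(tag, query) for tag in tags)
    )


if __name__ == "__main__":
    print(search(ARTICLES, "News"))
    print(search(ARTICLES, "News.Computer"))
    print(search(ARTICLES, "Source.Online"))
```

Querying a shallow node such as `News` returns everything beneath it, while a deeper query such as `News.Computer.Hardware` narrows the result, which is the context-plus-search benefit described above.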
Now I'm pretty sure LC hierarchy doesn't have tags this specific - that's why I think content providers should be able to customize their own labels. Additionally, data/info/news change all the time, so labels should evolve with the data (which content providers can do here since they control the structure of the tags).
But none of this is new, it's simply a tree of tags. Tagging many labels to a document is also not new. I'm just describing some of their benefits compared to the tags I've seen that look like simple keywords.
However, one point I don't quite understand is why this is a "really inferior way". It sounds like the same thing as the LC classification system. The crux of this post is to show how this mere structure could lend itself to a clean machine learning algorithm for doing all of this tagging stuff for us automatically.
- zooie, on 10/12/2007, -0/+0 And more comments from the blog ...
# Matt Says:
March 29th, 2006 at 8:32 am
The whole point of tagging is the insight that the world is not hierarchical.
If the world were structured hierarchically, ontologies such as LC would work and machine learning would be a lot simpler.
But it isn't.
So we use tags. And we use machine learning to find some (not necessarily hierarchical) structure among the tags. If user A uses tag X and user B uses tag Y to describe the same concept it's not too difficult to find that, statistically, X and Y are equivalent. And if X is a subconcept of Y we can deduce this, given enough data.
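One simple way to make the statistical claim above concrete is to compare the sets of documents each tag was applied to. The sketch below uses Jaccard overlap for equivalence and a containment ratio for the subconcept case. This is a swapped-in heuristic chosen for illustration, not necessarily what Matt had in mind, and the thresholds, tag names, and document ids are all assumptions.

```python
def jaccard(docs_x, docs_y):
    """Share of documents carrying both tags among those carrying either."""
    union = docs_x | docs_y
    return len(docs_x & docs_y) / len(union) if union else 0.0


def relation(docs_x, docs_y, equiv=0.8, subsume=0.8):
    """Guess how tags X and Y relate from the documents they were applied to.

    The 0.8 thresholds are arbitrary illustrative values.
    """
    if jaccard(docs_x, docs_y) >= equiv:
        return "equivalent"
    overlap = len(docs_x & docs_y)
    # If nearly every document tagged X is also tagged Y (but not the
    # reverse), X looks like a narrower concept inside Y.
    if docs_x and overlap / len(docs_x) >= subsume:
        return "x is a subconcept of y"
    if docs_y and overlap / len(docs_y) >= subsume:
        return "y is a subconcept of x"
    return "unrelated"


if __name__ == "__main__":
    # Hypothetical document ids per tag.
    nyc = {1, 2, 3, 4, 5}
    newyork = {1, 2, 3, 4, 5, 6}
    laptop = {1, 2, 3}
    hardware = {1, 2, 3, 4, 5, 6, 7, 8}
    print(relation(nyc, newyork))
    print(relation(laptop, hardware))
```

With only a handful of documents per tag these ratios are noisy, which is why the "given enough data" caveat matters.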
So machine learning plays a role (or should play in the near future). But not in the sense that we structure our descriptions such that they are easy to process for a machine learning algorithm. We structure them so that they make sense to us.
# zooie Says:
March 29th, 2006 at 11:28 am
Hi Matt - Thanks for the comment. I agree with you, simple tags with no concern for hierarchy are definitely easier, and there are ways to still do machine learning given enough data. However, I'm not sure if I agree that "tagging is the insight that the world isn't hierarchical", considering the number of Delicious sites I've seen which attempt to organize their tags in groupings using the slashing technique, or the fact that many people organize their emails and files in hierarchies (i.e. file/directory).
As I've mentioned in my post, I don't think many people will use the technique I've outlined above - especially when the alternative of just laundry listing common keywords is so much easier. In these cases we'll need to come up with better algorithms to machine learn the tags.
However, I looked at this problem with a "What if I wanted to machine learn the tags corresponding to my data right now, what would be the best way to organize my tags to maximize prediction accuracy?" mindset. This is in a more ideal but controlled world where I'm willing to put in more effort than I really should to give my learner an easier time. To do this, I organize the tags in a dependency graph (which incidentally gives me hierarchies for free).
Tag trees tell us a ton about how labels are related - the learner I describe uses them to understand the overlap among tags for error correction. Now currently in machine learning literature we have pretty well understood algorithms for doing binary classification (YES/NO) and ok stuff for doing multiclass (Select one of the following: Conservative, Liberal, Moderate). However, there isn't much literature/standard techniques when it comes to multi-labeling (probably because people got their hands full with the previous problem).
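To illustrate what a tag tree exposes about overlap, here is a small sketch that expands a dotted tag into its ancestors and measures how many levels two tags share. It is not the learner from the blog post. The helper names (`ancestors`, `expand`, `shared_depth`) and the `Ideology.Liberal.Green` tag are hypothetical, and the assumption is simply that tags are dot-separated paths.

```python
def ancestors(tag):
    """Return every prefix of a dotted tag, from the root down to the tag itself."""
    parts = tag.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def expand(tags):
    """Turn a set of hierarchical tags into the full set of nodes they imply."""
    nodes = set()
    for tag in tags:
        nodes.update(ancestors(tag))
    return nodes


def shared_depth(tag_a, tag_b):
    """Count how many leading segments two tags have in common."""
    depth = 0
    for a, b in zip(tag_a.split("."), tag_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    labels = {"News.Political", "Ideology.Liberal.Marxist"}
    print(sorted(expand(labels)))
    # A prediction that misses the leaf can still agree on the upper levels.
    print(shared_depth("Ideology.Liberal.Marxist", "Ideology.Liberal.Green"))
```

With flat keywords, a wrong guess is simply wrong. With a tree, a guess that shares two of three levels with the true tag is a near miss, which is one way a learner could use the structure to correct errors.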
Say this is one of our training examples:
foo.html -> {News.Political, Ideology.Liberal.Marxist, etc.}
It's difficult for our learners to do this because they need to find features in foo.html that correspond to each tag and implicitly find relations among tag groupings (since they aren't disjoint anymore like in multiclass). Basically what I'm trying to say is multi-labeling is a very difficult problem, and at the point where multiclass is already hard enough, expecting our learners to statistically learn how to do unstructured multi-tagging might be asking for a lot. My quest in this post is to examine how structuring tags could help learners better predict, and I try to make a case for that and the possibility that it might even perform better than multiclass learners.
- s1men, on 10/12/2007, -0/+0 If you have a relatively unknown site, chances are _nobody_ will ever find out about it, and therefore it will not be dugg. If you've got something interesting on your blog, you should be allowed to post it. Most people who submit their articles to digg seem to have quite the opposite: ugly, boring blogs with no quality whatsoever. That's not the case for _everyone_ though.
- zooie, on 10/12/2007, -0/+0 I've updated the article today (3/29) with a motivation section.
- dmoffitt, on 10/12/2007, -2/+1 good, thx
- inactive, on 10/12/2007, -8/+4 Yay. Blogspam.
- inactive, on 10/12/2007, -6/+2 If you have something worth being dugg, chances are SOMEONE WILL SUBMIT/DIGG IT.
Otherwise, you are just spamming digg. You're just throwing everything you have at digg and using it as an advertising service for your site.

