Returning to my theme of more formal folksonomies. It’s over a year since I wrote that and folksonomies are still very informal.
This makes it hard to find things using automated tools. For example, my depictr toy looks for flickr photos to match keywords in a song. Even when the keyword analysis is good, the pictures returned are of mixed relevance.
RDF would clearly give us the formality we are after. But people aren’t going to enter RDF, or anything like it, in the few seconds they’re prepared to spend tagging.
We do want RDF’s elements, though: subject, predicate and value (RDF calls it the object), from which RDF could be generated. In other words:
- the subject: what is being described (a flickr photo, a link in delicious), which we already know
- the predicate: which dimension of the thing is being described (eg: its creator, date of creation, colour), which isn’t specified in any folksonomy I’ve used
- the value in that dimension (eg: Mike Harper, 9/6/2006, blue), which is usually the tag itself
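To make the three elements concrete, here is a minimal sketch of a tag held as a subject/predicate/value triple and rendered as a single N-Triples line. The class name, the example photo URL and the choice of the Dublin Core `creator` property are illustrative assumptions, not anything flickr or delicious actually provides. The point it shows is that without a predicate no RDF can be generated at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Triple:
    """One tag restated as subject, predicate and value (hypothetical class)."""
    subject: str              # what is being described, e.g. a photo URL
    predicate: Optional[str]  # the dimension; None when the tag doesn't say
    value: str                # the value in that dimension, usually the tag

    def to_ntriples(self) -> Optional[str]:
        """Render as one N-Triples line with a literal value, or None."""
        if self.predicate is None:
            return None  # can't generate RDF without knowing the dimension
        # Escape characters that N-Triples literals can't contain raw
        literal = (self.value.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
        return f'<{self.subject}> <{self.predicate}> "{literal}" .'


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"  # Dublin Core creator
photo = "http://example.org/photos/123"  # hypothetical photo URL

print(Triple(photo, DC_CREATOR, "Mike Harper").to_ntriples())
print(Triple(photo, None, "football").to_ntriples())  # prints None
```

The second call is what today’s folksonomies give us: a subject and a value, but no predicate.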
What we get is a word or phrase that has to stand for both dimension and value. For example:
- football, meaning topic = football
- mike harper, meaning author = mike harper
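One crude way to recover the missing dimension is to guess it from the shape of the tag. The sketch below assumes hypothetical hint lists (a set of known people, a few colour words, a date pattern) and falls back to treating a tag as its topic; none of this comes from an existing service, and a real system would need far better evidence.

```python
import re

# Hypothetical hint lists; a real system would need far richer sources.
KNOWN_PEOPLE = {"mike harper"}
COLOURS = {"red", "green", "blue", "black", "white"}
# Matches d/m/yyyy or m/d/yyyy; the order can't be told apart from the tag alone
DATE_PATTERN = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")


def guess_dimension(tag: str) -> str:
    """Guess which dimension a bare tag's value belongs to."""
    t = " ".join(tag.lower().split())  # normalise case and whitespace
    if t in KNOWN_PEOPLE:
        return "author"
    if DATE_PATTERN.match(t):
        return "date"
    if t in COLOURS:
        return "colour"
    return "topic"  # assumed default: most bare tags name what the item is about


for tag in ["football", "Mike Harper", "9/6/2006", "blue"]:
    print(tag, "->", guess_dimension(tag))
```

Even this toy shows the problem: “blue” might describe a mood rather than a colour, and the heuristic has no way to tell.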
How to make this more formal? A pragmatic place to start, I think, would be to assume no-one is going to provide more information than they do now when they tag things. What can be done with that?
I’ve been experimenting briefly with Wordnet, a database of words and their semantic relationships. Since words are all we get in this scenario, it seems a useful resource. If it recognises a word, it can tell us how many senses (distinct meanings) that word has. If a word has only one known sense, the tag is unlikely to be ambiguous. If, on the other hand, it has many senses, its usefulness is limited without further information.
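The sense-counting idea can be sketched as a small ambiguity check. To keep the example self-contained it uses a tiny hand-made lexicon whose entries are illustrative, not real Wordnet data. With NLTK and its WordNet corpus installed, something like `lambda w: wn.synsets(w.replace(" ", "_"))` (from `nltk.corpus import wordnet as wn`) could be passed as the lookup instead; treat that substitution as an assumption to check.

```python
from typing import Callable, Dict, List

# Toy stand-in for Wordnet: word -> list of sense glosses (illustrative only).
TOY_SENSES: Dict[str, List[str]] = {
    "football": ["a ball game", "the ball used in that game"],
    "bank": ["sloping land beside water", "a financial institution",
             "a supply held in reserve", "a row of similar objects"],
    "hypernym": ["a word that is more general than a given word"],
}


def toy_lookup(word: str) -> List[str]:
    """Return the known senses of a word, or an empty list if unrecognised."""
    return TOY_SENSES.get(word.lower(), [])


def ambiguity(tag: str, lookup: Callable[[str], List] = toy_lookup) -> str:
    """Classify a tag by how many senses the lexicon knows for it."""
    n = len(lookup(tag))
    if n == 0:
        return "unknown"      # not recognised: no evidence either way
    if n == 1:
        return "unambiguous"  # one sense: the tag's meaning is fairly clear
    return f"ambiguous ({n} senses)"


for tag in ["hypernym", "football", "bank", "depictr"]:
    print(tag, "->", ambiguity(tag))
```

Passing the lookup in as a parameter keeps the classification logic separate from whichever lexicon is used.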
Wordnet also provides hypernyms: words that mean something broader than the word you specify. For example, “colour” has the hypernym “visual property”, which in turn has the hypernym “property”. This might allow some kind of reasoning about the relationship between words, such as whether they represent the same concept.
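A minimal sketch of that reasoning: follow each word’s chain of hypernyms upward and find the most specific word the two chains share. The colour chain follows the example above; the other entries are illustrative. Real Wordnet attaches hypernyms to individual senses and a sense can have several, so this one-parent-per-word map is a deliberate simplification. NLTK’s `Synset.hypernym_paths()` and `Synset.lowest_common_hypernyms()` work on the real data.

```python
from typing import Dict, List, Optional

# Toy word -> hypernym map; only the colour chain comes from the text above.
TOY_HYPERNYMS: Dict[str, str] = {
    "colour": "visual property",
    "visual property": "property",
    "size": "magnitude",          # illustrative
    "magnitude": "property",      # illustrative
    "football": "field game",     # illustrative
    "field game": "outdoor game", # illustrative
}


def hypernym_chain(word: str) -> List[str]:
    """Return the word followed by successively broader hypernyms."""
    chain = [word]
    seen = {word}
    while chain[-1] in TOY_HYPERNYMS:
        parent = TOY_HYPERNYMS[chain[-1]]
        if parent in seen:  # guard against cycles in hand-built data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def lowest_common_hypernym(a: str, b: str) -> Optional[str]:
    """Most specific word on both chains, or None if they never meet."""
    ancestors_b = set(hypernym_chain(b))
    for w in hypernym_chain(a):  # walk from most specific to most general
        if w in ancestors_b:
            return w
    return None


print(hypernym_chain("colour"))
print(lowest_common_hypernym("colour", "size"))
print(lowest_common_hypernym("colour", "football"))
```

Here “colour” and “size” meet at “property”, suggesting both are attributes, while “colour” and “football” share nothing in this toy data.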
Perhaps some automated work can be done using resources like Wordnet to try to reassemble some of the meaning intended by the person doing the tagging. I’m going to play with this.