Some guidelines on how to choose metadata for your ECM

So, you want to implement an ECM solution. Great idea! You want to store all your documents and structured information digitally so you can search it, create mash-ups, archive it, manage it and so on.
The best way to add extra value to these items is to add metadata. How do you start with such a task? How do you pick the right metadata tags for your information?
Some guidelines to think about:

Categories of metadata
Descriptive: describes the content so it can be found in search (ISBN number, tags, title,…)
Structural: where does this item belong and how is it organised? (project number, number of pages,…)
Administrative: used to manage the item (who has access, whether it is included in a search scope, who created the item and when,…)
Technical: technical information about the item (file extension,…)
Use: information on how the item is used (number of views, who changed it last and when,…)

Type of metadata
Every piece of metadata has to have a specific format. Do you want users to add free tags? Or a date? Or a number? Or a specific code of 9 digits followed by 3 letters? For every metadata item, think about which type suits it best.
Some examples:
– single line (with or without a maximum number of characters)
– multiple lines
– numeric (integer value like 1; 2; 3 or decimal value like 2,4; 3,6)
– date or time
– set list (like a list of countries; is it a fixed list, or can users add their own values?)
– yes/no
– rating (which could be numerical)
– person
– logical value (calculation or result of an “and/or/not” function)
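
To make the idea of "a type per metadata item" concrete, here is a minimal sketch of how a few of the types above could be validated. The function names, and the choice to accept dates in ISO format, are assumptions made for illustration; they do not come from any particular ECM product.

```python
import re
from datetime import date

# Assumed reading of "a code of 9 digits followed by 3 letters".
CODE_PATTERN = re.compile(r"\d{9}[A-Za-z]{3}")


def validate_single_line(value, max_length=None):
    """Single line of text, optionally limited to max_length characters."""
    if not isinstance(value, str) or "\n" in value:
        return False
    return max_length is None or len(value) <= max_length


def validate_code(value):
    """Exactly 9 digits followed by exactly 3 letters."""
    return isinstance(value, str) and CODE_PATTERN.fullmatch(value) is not None


def validate_choice(value, allowed, allow_new=False):
    """Set list: fixed when allow_new is False, open when it is True."""
    return allow_new or value in allowed


def validate_date(value):
    """Date given as an ISO string such as 2024-01-31 (an assumed format)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False
```

The `allow_new` flag is one way to capture the question of whether a list is fixed or whether users can add their own values.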

Categories: Taxonomy vs folksonomy
Categorisation is a huge topic in itself. I will only cover the basics here, because it comes down to choosing a strategy for tagging information:
– folksonomy: let users decide what info they want to give with tags (like you have on blogs)
– taxonomy: the organisation defines the metadata centrally, based on its own view of how the information should be classified

Most companies go for the taxonomy system, because of the control it gives them over the information.

Where to define metadata?
So, where do you want to define your metadata?
Of course as high up the chain as possible, to cover as many items as possible with one solution. But do you declare a certain type of item (a content type, like a contract, telephone number, task,…) together with its metadata?

Or do you want to create a repository that carries specific metadata? Think about this before you implement it. Most of the time you will end up with a combination of the two.
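
As a rough sketch of what such a combination could look like: the content type brings its own fields, the repository adds its fields on top, and an item gets both. All content types, repository names and field names below are hypothetical examples, not taken from a real system.

```python
# Hypothetical metadata fields declared per content type.
CONTENT_TYPE_FIELDS = {
    "contract": ["title", "client_name", "signing_date"],
    "task": ["title", "assigned_to", "due_date"],
}

# Hypothetical metadata fields declared per repository.
REPOSITORY_FIELDS = {
    "project-archive": ["project_number"],
}


def effective_fields(content_type, repository):
    """Combine content-type fields with repository fields, without duplicates."""
    fields = list(CONTENT_TYPE_FIELDS.get(content_type, []))
    for name in REPOSITORY_FIELDS.get(repository, []):
        if name not in fields:
            fields.append(name)
    return fields


print(effective_fields("contract", "project-archive"))
```

A contract stored in the "project-archive" repository would then ask for the contract fields plus the project number.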

All that extra work for my end users!
If users have to add extra information every time they add something, they will become frustrated and the quality of the metadata will go down. This leads to a mess: users can no longer find their items through search, which results in a downward spiral of adding even less metadata and finding even fewer documents.
Cory Doctorow even argued in his 2001 essay ‘Metacrap: Putting the torch to seven straw men of the meta-utopia’ that you can’t rely on metadata because:
1) People lie
2) People are lazy
3) People are stupid
4) Mission Impossible: know thyself
5) Schemas aren’t neutral
6) Metrics influence results
7) There’s more than one way to describe something

Some tips:
– limit the number of metadata fields to a maximum of 7.
– make as many values as possible automatic. This may mean you have to build some logic behind it. (Example: if I am adding a document for customer “XYZ” in this specific repository, I don’t want to add clientname = “XYZ” again! But when I search for “XYZ”, the document should come up in the results.)
– stress the importance of the quality of the metadata. Show users how easy it is to find their items again.
– create a search / results page that uses the metadata. A great example I like is faceted search for SharePoint, which uses the metadata to filter your search results.
– add a description for each metadata item with the validation rules for the item. This will help users fill it in the correct way.
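
The tips on automatic values and metadata-driven search can be sketched in a few lines. This is only an illustration under assumptions: the repository name, the `client_name` field and all function names are made up, and a real ECM would do this through its own configuration rather than custom code.

```python
from collections import Counter

# Hypothetical defaults: every document in this repository belongs to customer XYZ.
REPOSITORY_DEFAULTS = {
    "customers/XYZ": {"client_name": "XYZ"},
}


def add_document(repository, user_metadata):
    """Start from the repository defaults, then apply what the user typed in."""
    metadata = dict(REPOSITORY_DEFAULTS.get(repository, {}))
    metadata.update(user_metadata)
    return metadata


def search(documents, field, value):
    """Return the documents whose metadata field equals the given value."""
    return [doc for doc in documents if doc.get(field) == value]


def facet_counts(documents, field):
    """Count documents per value of a field, as a faceted filter would show."""
    return Counter(doc[field] for doc in documents if field in doc)


# The user only enters a title; the client name is filled in automatically.
offer = add_document("customers/XYZ", {"title": "Offer 2024"})
print(search([offer], "client_name", "XYZ"))
```

The user never typed the client name, yet a search on “XYZ” still returns the document, and the same field can feed the filters on a results page.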


About: Marijn

Marijn Somers (MVP) has over 14 years of experience in the SharePoint world, starting out with SP2007. Over the years his focus has grown to Office 365, particularly collaboration and document management. He is a business consultant at Balestra and Principal Content Provider for "Mijn 365 Coach", which offers Dutch employee video training. His main work tracks are user adoption, training and coaching, and governance. He is also not afraid to dig deeper into the technicalities with PowerShell, adaptive cards or custom formatting in lists and libraries. You can listen to him on the biweekly "Office 365 Distilled" podcast.