Vocabulary Repository

Terminology resources, such as thesauruses, are primarily a means of communication: they describe collections and collection objects in a uniform way. Several CATCHPlus subprojects use terminology resources. To connect collections to each other, these resources need to be as open and accessible as possible, free of financial, technical, or legal barriers.

To help lower these barriers, CATCHPlus is building a Vocabulary and Alignment Repository. The repository aims to meet the following goals:

Standardisation of format to SKOS

SKOS is a W3C standard for representing terminology resources. It builds on the ISO thesaurus standards (ISO 2788 and ISO 5964:1985) but, unlike those standards, it is organised around concepts rather than terms that refer to each other. Each concept has a unique identifier, a URI, that other data can refer to, and several alternative terms can be attached to a concept as text labels.
Thesaurus relations (broader, narrower, related) therefore hold between concepts in SKOS, not between terms. Benefits include easier extension and better maintainability.
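
The concept-based model described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than an RDF toolkit; the `Concept` class, the `to_triples` helper and the `example.org` URIs are hypothetical and chosen for illustration, while the `skos:` and `rdf:` property URIs are the standard ones.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Concept:
    """A concept identified by a URI; terms are only labels attached to it."""
    uri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # URIs of broader concepts


def to_triples(concept: Concept) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) triples describing the concept."""
    yield (concept.uri, RDF_TYPE, SKOS + "Concept")
    yield (concept.uri, SKOS + "prefLabel", concept.pref_label)
    for label in concept.alt_labels:
        yield (concept.uri, SKOS + "altLabel", label)
    for parent_uri in concept.broader:
        # The relation links two concept URIs, not two terms.
        yield (concept.uri, SKOS + "broader", parent_uri)


# Hypothetical example data.
animals = Concept("http://example.org/vocab/animals", "animals")
cats = Concept(
    "http://example.org/vocab/cats",
    "cats",
    alt_labels=["felines", "house cats"],
    broader=[animals.uri],
)

for triple in to_triples(cats):
    print(triple)
```

Because the broader relation refers to the URI of `animals`, labels can be added or renamed later without touching any relation.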

Technically, converting existing terminology resources to SKOS is usually straightforward, although concessions sometimes have to be made because of the limitations of the ‘building blocks’ available in SKOS.
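
As a rough sketch of what such a conversion involves, the following turns term-based records into concept-based ones. The record layout (`USE` for a non-preferred term, `BT` for broader terms), the `example.org` namespace and the function names are assumptions made for illustration; they do not describe the conversion tooling used in CATCHPlus.

```python
from urllib.parse import quote

BASE = "http://example.org/vocab/"  # hypothetical namespace

# Hypothetical term-based input: USE points a non-preferred term to its
# preferred term, BT lists broader terms.
records = [
    {"term": "animals"},
    {"term": "cats", "BT": ["animals"]},
    {"term": "felines", "USE": "cats"},
]


def term_uri(term):
    """Mint a URI for a preferred term (spaces become underscores)."""
    return BASE + quote(term.replace(" ", "_"))


def convert(records):
    """Return a dict of concepts keyed by preferred term."""
    concepts = {}
    # First pass: every preferred term becomes a concept with its own URI.
    for rec in records:
        if "USE" not in rec:
            concepts[rec["term"]] = {
                "uri": term_uri(rec["term"]),
                "prefLabel": rec["term"],
                "altLabel": [],
                "broader": [],
            }
    # Second pass: non-preferred terms become alternative labels, and
    # term-to-term BT links become concept-to-concept broader links.
    for rec in records:
        if "USE" in rec:
            concepts[rec["USE"]]["altLabel"].append(rec["term"])
        else:
            for broader_term in rec.get("BT", []):
                concepts[rec["term"]]["broader"].append(
                    concepts[broader_term]["uri"]
                )
    return concepts
```

In this sketch the non-preferred term "felines" no longer exists as an entry of its own; it survives only as a label on the concept for "cats". Information attached to individual terms in the source is the kind of detail for which a concession may be needed.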

Publication as a dataset through REST web service
When a terminology is available in SKOS, it can easily be imported into the CATCHPlus vocabulary repository, which then allows the resource to be published on the web.
A REST web service API supports retrieval and search of the data using standard web (HTTP) methods. The API is mainly intended for programmers.
This publication method lets users search directly for concepts and relations that meet given search criteria.
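
To give an impression of how a programmer might address such a service, the sketch below only builds a search URL. The base URL, the `/concepts` path and the parameter names `q`, `limit` and `scheme` are all hypothetical; the actual interface of the CATCHPlus service is not described here and may differ.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; not the real address of the repository.
BASE_URL = "http://example.org/vocabulary/api"


def concept_search_url(query, scheme=None, limit=10):
    """Build a GET URL that searches concepts by label text.

    The parameter names are assumptions for illustration only.
    """
    params = {"q": query, "limit": limit}
    if scheme is not None:
        # Restrict the search to one vocabulary (concept scheme).
        params["scheme"] = scheme
    return BASE_URL + "/concepts?" + urlencode(params)


print(concept_search_url("cat", limit=5))
```

A client would send an HTTP GET request to the resulting URL and parse the response; both steps are left out because the response format is not specified in this text.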

Publication as Linked Data
Linked Open Data is a fast-growing web of data collections on the World Wide Web that refer to one another. These references also take the form of URIs; in this case, each URI points to a web page presenting a limited data set.
The vocabulary repository offers every concept as such a limited data set on the web. This makes it possible to point to concepts in the repository from anywhere on the web in a standard way. It also works the other way around: concept descriptions in the repository can be enriched with references to external Linked Data.
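
A common Linked Data pattern is to request a concept URI with an `Accept` header that asks for a machine-readable format. The sketch below prepares such a request without sending it. The concept URI is hypothetical, and whether the repository supports this kind of content negotiation, or the Turtle format, is an assumption rather than something stated here.

```python
from urllib.request import Request


def linked_data_request(concept_uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET request for a concept description.

    The media type is an assumption; a server may offer other formats.
    """
    return Request(concept_uri, headers={"Accept": media_type})


# Hypothetical concept URI.
req = linked_data_request("http://example.org/vocab/cats")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup.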

Promotion of semantic interoperability
The use of different terminology resources makes it difficult to search collections simultaneously and to compare search results. To alleviate this, several projects and organisations are linking concepts across different thesauruses. The vocabulary and alignment repository makes it possible to store and search these links (“alignments”).
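
The idea of storing and searching alignments can be sketched with a small in-memory index. The `AlignmentStore` class and the example URIs are hypothetical, and the use of SKOS mapping properties such as `skos:exactMatch` is an assumption about how links might be typed. The repository itself is built on an RDF store, so this shows the concept rather than its implementation.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"


class AlignmentStore:
    """Keep links between concepts from different vocabularies."""

    def __init__(self):
        # Each link is indexed under both of its concept URIs.
        self._by_concept = defaultdict(set)

    def add(self, source_uri, relation, target_uri):
        link = (source_uri, relation, target_uri)
        self._by_concept[source_uri].add(link)
        self._by_concept[target_uri].add(link)

    def links_for(self, concept_uri):
        """Return all links that involve the concept, in sorted order."""
        return sorted(self._by_concept.get(concept_uri, ()))


# Hypothetical concepts from two different thesauruses.
store = AlignmentStore()
store.add(
    "http://example.org/thesaurus-a/cats",
    SKOS + "exactMatch",
    "http://example.org/thesaurus-b/felis-catus",
)

for link in store.links_for("http://example.org/thesaurus-b/felis-catus"):
    print(link)
```

Indexing each link under both concepts means a search can start from either thesaurus, which is what makes cross-collection search possible.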

Decoupling terminology supply from use

Currently, organisations that offer thesauruses are often tied to specific collection management software. Conversely, using a given thesaurus often requires support for it in the collection management software.

Both the organisations offering thesauruses and the companies offering collection management software have an interest in decoupling the two: thesaurus providers become less dependent on specific management software, and tool builders do not need to invest extra work in supporting each thesaurus separately.

Solving problems concerning licenses
Many terminology resources are available only after payment and under license. CATCHPlus aims to publish as many resources as possible under open licenses, such as the Open Database License. Where this isn’t possible, the vocabulary repository makes it possible to arrange licenses per user community rather than per organisation.

The repository is currently implemented on top of an RDF store. A first version of the REST web service was developed in CATCHPlus and is available online. The source code is also available under an open source license. CATCHPlus also developed a web-based browse and search tool to make the repository and its content accessible to end users.

As part of the collaboration between the Institute for Sound and Vision and the National Archive, the GTAA thesaurus is currently being made available within the collection management system of the National Archive through the REST service.
The first beta version of the vocabulary repository is available online for those involved. The link is available on request from the project office.