Referred by: Ecosystem -- ASKOSI jar

Data Sources for Concept Definitions#

ASKOSI takes the Concept definitions from where they are maintained and publishes them in SKOS (demo), as a Java API accessible by surrounding Tomcat applications, and as a Web application (demo).

ConceptSchemes configuration files#

Each ConceptScheme is defined either:

  1. using a ConceptScheme.cfg file (i.e. a file with the extension "cfg" and the same name as the ConceptScheme you want to define)
  2. using a ConceptScheme.xml file where the XML follows the ASKOSI SKOS Schema (http://www.askosi.org/ConceptScheme.xsd)
  3. using a ConceptScheme.skos file where the RDF follows the SKOS/RDF rules and can be transformed into XML using the XSLT rules within skos2skosXml.xsl

If you use a .cfg file, we recommend starting it with an empty line (this ensures that a Unicode BOM does not interfere with Java properties parsing) and editing it with a text editor that supports UTF-8.
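The effect of the leading empty line can be checked with plain java.util.Properties. The sketch below is not ASKOSI code: the class name CfgBomDemo is hypothetical, and it only assumes that the .cfg text is parsed as standard Java properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of ASKOSI: parses .cfg text as Java properties.
final class CfgBomDemo {
    static Properties parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String bom = "\uFEFF"; // the character a UTF-8 BOM decodes to
        // No empty first line: the BOM becomes part of the first key.
        System.out.println(parse(bom + "type=TAB\n").getProperty("type"));
        // Empty first line: the BOM sits alone on line 1, the key stays clean.
        System.out.println(parse(bom + "\ntype=TAB\n").getProperty("type"));
    }
}
```

In the first case the key is stored as BOM + "type", so looking up "type" finds nothing. In the second case the BOM ends up as a harmless junk key of its own.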

Those concept definitions can come from:

  1. flows of hierarchized containers like XML,
  2. networks of statements represented in RDF,
  3. rectangular / tabular views like Excel files (or TXT files), SQL databases and SPARQL servers.

Configuration Parameters common to every type of sources:#

  • prefix=prefix to remove : Very often, external sources place a common prefix in front of all the identifiers. For instance "http://www.windmusic.org/dspace/handle/68502/" is placed in front of all the handles returned by Windmusic: this prefix must be removed to keep identifiers short and compatible with the ASKOSI identifier rules.
  • namespace=namespace URL : Schemes are often published originally by other servers. This declaration lets you control the URL prefix used for publishing in SKOS/RDF.
  • uri=ConceptScheme URI : Schemes must often be published with a specific URI (one that cannot be deduced from the namespace alone). This declaration lets you control the ConceptScheme URI used for publishing in SKOS/RDF.
  • title-xx=title in the language identified by "xx" : ASKOSI stores labels only for the languages that have a ConceptScheme title. You must therefore give your scheme a title in each language to be included in the ConceptScheme.
  • description-xx=general description of the ConceptScheme, written in the language identified by the code "xx" (optional).
  • rights-xx=Copyright and other intellectual property declarations, written in the language identified by the code "xx" (optional).
  • creator99=lastname, firstname of the original authors. If they are ordered, a rank may be specified just after the word "creator".
  • contributor99=lastname, firstname of the contributors. If they are ordered, a rank may be specified just after the word "contributor".
  • internalNote=unilingual note to the co-workers (not for end users)
  • icon=Complete URL of an icon representing the ConceptScheme (16x16 is the ideal size)
  • display=Complete URL to display the ConceptScheme in the original application from which it comes. [about] represents the code of the current Concept.
  • create=Complete URL to create a new Concept in the ConceptScheme
  • help=Complete URL of a help text about the ConceptScheme
  • translations=en,es,nl,fr : indicates that additional translation files are available in Excel (Tabular TXT file documented further below) for the indicated languages.
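As an illustration of how three of these parameters could be interpreted, here is a minimal Java sketch. It is not ASKOSI's implementation: the class CommonParamsSketch and its method names are hypothetical, the handle "1234" and the example.org URL are invented, and the placeholder is assumed to be written [about] as in the DBpedia example further down this page.

```java
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of ASKOSI.
final class CommonParamsSketch {
    private static final String TITLE = "title-";

    // The scheme's languages are the "xx" suffixes of its title-xx keys.
    static Set<String> languages(Properties cfg) {
        Set<String> langs = new TreeSet<>();
        for (String key : cfg.stringPropertyNames()) {
            if (key.startsWith(TITLE) && key.length() > TITLE.length()) {
                langs.add(key.substring(TITLE.length()));
            }
        }
        return langs;
    }

    // Removes the configured prefix from a source identifier, if present.
    static String stripPrefix(Properties cfg, String sourceId) {
        String prefix = cfg.getProperty("prefix", "");
        return sourceId.startsWith(prefix) ? sourceId.substring(prefix.length()) : sourceId;
    }

    // Expands the display URL for one concept code; null if no display is configured.
    static String displayUrl(Properties cfg, String conceptCode) {
        String template = cfg.getProperty("display");
        return template == null ? null : template.replace("[about]", conceptCode);
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("title-en", "Handles");
        cfg.setProperty("title-fr", "Identifiants");
        cfg.setProperty("prefix", "http://www.windmusic.org/dspace/handle/68502/");
        cfg.setProperty("display", "http://example.org/show/[about]"); // invented URL
        System.out.println(languages(cfg));
        String code = stripPrefix(cfg, "http://www.windmusic.org/dspace/handle/68502/1234");
        System.out.println(code + " -> " + displayUrl(cfg, code));
    }
}
```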

XML, RDF/XML and SKOS/RDF/XML:#

Files can be local or downloaded automatically from remote URLs.

If they are not in the ASKOSI SKOS Schema (http://www.askosi.org/ConceptScheme.xsd), they are converted using an XSLT program. For SKOS/RDF, it can be skos2skosXml.xsl.

For any other schema, you can specify the XSLT program that best suits your needs using the configuration parameter "xslt" (see below).

The configuration file ConceptScheme.cfg can have the following parameters:

  • type=XML, RDF or SKOS : the file extension (.xml, .rdf, .skos) or the configuration parameter "type" specifies whether this is an XML file, an RDF/XML file or a SKOS/RDF file.
  • file=local file name if the data comes from a local disk. N.B. .cfg files are Java Properties files: backslashes must be doubled (or replaced by forward slashes) in Windows file names.
  • url=source URL if the data comes from a remote server.
  • xslt=filename.xsl specifies the XSLT program that transforms the incoming data into the ASKOSI SKOS Schema.
    By default:
    • No XSLT for XML files
    • rdf2skosXml.xsl for generic RDF files
    • skos2skosXml.xsl for SKOS/RDF files
  • chunk=namespace:tag specifies an XML (or RDF) tag in the input file within which parsing is done. The input data flow is therefore processed in "chunks", each of which begins and ends with the specified tag. Very large files can be handled using this option.
  • cache=filename.xml : by default ConceptScheme-cache.xml is used to keep the result of the XSLT transformation.
  • refresh=delay (hours) : if the local file is more recent than the cache, or if the refresh delay of a remote source has elapsed, the cache is recalculated.
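The cache rule described for "refresh" can be sketched as two small predicates. This is only an illustration under stated assumptions: the class and method names are hypothetical, and treating a delay that has exactly elapsed as stale is a choice made here, not something this page specifies.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of ASKOSI.
final class CacheRefreshSketch {
    // Local source: the cache is stale when the source file is newer than the cache.
    static boolean localIsStale(Instant sourceModified, Instant cacheModified) {
        return sourceModified.isAfter(cacheModified);
    }

    // Remote source: the cache is stale once refreshHours have passed since it was written.
    static boolean remoteIsStale(Instant cacheModified, long refreshHours, Instant now) {
        Duration age = Duration.between(cacheModified, now);
        return age.compareTo(Duration.ofHours(refreshHours)) >= 0;
    }

    public static void main(String[] args) {
        Instant cacheWritten = Instant.parse("2011-12-03T08:00:00Z");
        Instant now = Instant.parse("2011-12-04T09:00:00Z"); // 25 hours later
        System.out.println(remoteIsStale(cacheWritten, 24, now));
        System.out.println(localIsStale(cacheWritten.minusSeconds(60), cacheWritten));
    }
}
```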

For SKOS/RDF files, we use XSLT to transform them into XML (ASKOSI SKOS Schema). This solution is not perfect because it does not use the full power of OWL tools to reason automatically over RDF properties. However, using "chunks", the file can be bigger than what fits in memory. Sometimes the XSLT program skos2skosXml.xsl has to be adjusted to cope with different representations of SKOS data in RDF.
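For readers unfamiliar with running an XSLT program from Java, the following sketch applies a stylesheet with the standard javax.xml.transform API. It is not ASKOSI code and does not use skos2skosXml.xsl: the class XsltSketch, the toy stylesheet and the element names items/item are all invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of ASKOSI.
final class XsltSketch {
    // Applies the given stylesheet text to the given XML text and returns the result.
    static String transform(String xslt, String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Toy stylesheet (invented): lists the code attribute of every item as text.
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/items'><xsl:for-each select='item'>"
          + "<xsl:value-of select='@code'/>;</xsl:for-each></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(xslt, "<items><item code='a'/><item code='b'/></items>"));
    }
}
```

A real configuration would load the stylesheet named by the "xslt" parameter from a file instead of an inline string.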

The ASKOSI Web application can also create SKOS/RDF files (demo).

Rectangular / tabular views:#

A SQL database#

TXT files:#

For a ConceptScheme named "classif", the classif.cfg file looks like this:

type=TAB
title-en=Classification System...
description-en=The Classification Schema ...
title-es=Sistema de clasificación ...
title-fr=Système de classement ...
title-el=Το ταξινομικό σύστημα ...
description-es=El esquema de clasificación ...
description-fr=Le système de classement ...
description-el=Το ταξινομικό σύστημα ...
creator1=Tintin
creator2=Milou
contributor01=Blo, Joe
contributor02=Smith, Sam...


You can also create files using Excel (or any other program able to create CSV files with tabulations as delimiters and UTF-8 WITHOUT BOM as character encoding).

The languages accessed are those for which a title has been defined (here: en, es, fr, el).

SPARQL Server:#

A configuration file is used to define the server itself. For instance dbpedia-pool.cfg could contain:

validation=SELECT ?s WHERE {?s ?p ?o} LIMIT 1
url=http://dbpedia.org/sparql

This server can return JSON or XML results. Then, for a given scheme (for instance country.cfg), you can parameterize the SPARQL queries that adapt the RDF data source semantically so that it can be used as a SKOS vocabulary:

type=SPARQL
pool=dbpedia
title-fr=Pays DBPedia
title-en=Country DBPedia
title-es=País DBPedia
title-de=Land DBPedia
title-nl=Land DBPedia
prefix=http://dbpedia.org/resource/
display=http://dbpedia.org/resource/[about]
## about, label-fr, label-en...,
labels=PREFIX yago:  PREFIX rdf:  PREFIX dbprop:  PREFIX dbpedia-owl:  PREFIX rdfs:  \
SELECT DISTINCT ?about ?label WHERE { {?about rdf:type dbpedia-owl:Country; rdfs:label ?label} UNION {?about rdf:type yago:GeoclassCapitalOfAPoliticalEntity; rdfs:label ?label} OPTIONAL {?about dbprop:yearEnd ?yearEnd} FILTER (!bound(?yearEnd)) }
broaders=PREFIX yago:  PREFIX rdf:  PREFIX dbprop:  PREFIX dbpedia-owl:  PREFIX rdfs:  \
SELECT DISTINCT ?about ?broader WHERE { ?about rdf:type yago:GeoclassCapitalOfAPoliticalEntity; dbpedia-owl:country ?broader}

The languages accessed are those for which a title has been defined (here: fr, en, es, de, nl).
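In the standard SPARQL protocol, a query can be sent to the endpoint as a URL-encoded "query" parameter of an HTTP GET request. The sketch below builds such a request URL from the "url" and "validation" values of the pool configuration. How ASKOSI actually issues its requests is not described on this page, so the class SparqlRequestSketch is purely illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ASKOSI.
final class SparqlRequestSketch {
    // Builds a SPARQL protocol GET URL: endpoint?query=<URL-encoded query>.
    static String requestUrl(String endpoint, String query) {
        return endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "http://dbpedia.org/sparql";                  // "url" in dbpedia-pool.cfg
        String validation = "SELECT ?s WHERE {?s ?p ?o} LIMIT 1";  // "validation" in dbpedia-pool.cfg
        System.out.println(requestUrl(url, validation));
    }
}
```

The same helper would apply to the "labels" and "broaders" queries of a scheme, once the pool named by "pool" has been resolved to its endpoint URL.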

Files from a Directory and its subdirectories:#

For instance, to create a scheme with all files in directory D:\dspace\config\EXPORT and with the extension .CSV, you can create a scheme configuration file containing:

type=FILES
title-fr=Formats d'exportation
title-en=Export formats
directory=D:\\dspace\\config\\EXPORT
extension=csv
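A scheme of this type amounts to walking a directory tree and keeping the files with the requested extension. The following sketch shows one way to do this with java.nio.file. It is not ASKOSI's implementation: the class name is hypothetical, and matching the extension case-insensitively (so that "csv" also selects ".CSV" files) is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of ASKOSI.
final class DirectorySchemeSketch {
    // Returns the sorted names of all regular files under directory
    // (subdirectories included) whose name ends with "." + extension.
    static List<String> fileNames(Path directory, String extension) {
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        try (Stream<Path> paths = Files.walk(directory)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Pass a directory as first argument; defaults to the current directory.
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        fileNames(directory, "csv").forEach(System.out::println);
    }
}
```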



Java Multilingual Bundles:#

Create a scheme with all the messages in a resource bundle (each key is a concept code and each message is its prefLabel). The name of the bundle is specified using the configuration parameter "resource".

type=BUNDLE
title-fr=Messages
title-en=Messages
title-es=Mensajes
title-de=Meldungen
title-nl=Berichten
resource=Messages

The resource bundle name is the one expected by the java.util.ResourceBundle.getBundle method. The languages accessed are those for which a title has been defined (here: fr, en, es, de, nl).
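The mapping from bundle entries to concepts can be pictured with the sketch below, which turns one ResourceBundle into a map from concept code to prefLabel. The class BundleSchemeSketch is hypothetical and the two messages in main are invented. ASKOSI itself would presumably load one bundle per language with ResourceBundle.getBundle, which is an assumption here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.TreeMap;

// Hypothetical helper, not part of ASKOSI.
final class BundleSchemeSketch {
    // Each bundle key is a concept code, each message its prefLabel.
    static Map<String, String> prefLabels(ResourceBundle bundle) {
        Map<String, String> labels = new TreeMap<>();
        for (String key : bundle.keySet()) {
            labels.put(key, bundle.getString(key));
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a Messages_fr.properties file (invented content).
        ResourceBundle french = new PropertyResourceBundle(
                new StringReader("ok=Valider\ncancel=Annuler\n"));
        System.out.println(prefLabels(french));
        // With a real bundle on the classpath, one would call for example:
        // prefLabels(ResourceBundle.getBundle("Messages", java.util.Locale.FRENCH));
    }
}
```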

Java ENUM Classes:#

Create a scheme with all possible values of an enumerated type Java class.

type=ENUM
title-fr=Statuts
title-en=Status
class=com.domain.package.myJavaClass
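The values of an enumerated type can be listed by reflection from the class name alone, which is presumably what the "class" parameter is used for. The sketch below is an assumption about the mechanism, not ASKOSI code: the class EnumSchemeSketch is hypothetical, and java.time.DayOfWeek merely stands in for an application enum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ASKOSI.
final class EnumSchemeSketch {
    // Returns the constant names of the enum class with the given fully qualified name.
    static List<String> conceptCodes(String className) {
        Class<?> type;
        try {
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
        Object[] constants = type.getEnumConstants(); // null when the class is not an enum
        if (constants == null) {
            throw new IllegalArgumentException(className + " is not an enum");
        }
        List<String> codes = new ArrayList<>();
        for (Object constant : constants) {
            codes.add(((Enum<?>) constant).name());
        }
        return codes;
    }

    public static void main(String[] args) {
        // java.time.DayOfWeek is used only because it ships with the JDK.
        System.out.println(conceptCodes("java.time.DayOfWeek"));
    }
}
```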

Configuration Parameters for other situations:#

Notations:#

A concept has a main identifier ( schemeAbout_conceptAbout ) and it may have multiple notations (alternate identification code systems). For instance, a chemical substance can have a CAS, an EINECS, etc.

In ASKOSI, the alternate notations are named like schemes. For instance, one may recognize "heavy water" in CAS_7789_20_0 : CAS is a notation scheme which provides an alternate identification access to a scheme made of substances.

CAS.cfg will contain only one line:

notationOf substances

This is a kind of "redirection" toward the scheme configuration file that must provide detailed configuration of the scheme but also of all its notations.
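Both conventions can be illustrated in a few lines of Java. The sketch is hypothetical (the class NotationSketch is not part of ASKOSI) and assumes that a scheme name contains no underscore, so the identifier can be split at the first one. It also relies on the fact that Java properties accept whitespace as a key/value separator, which is why the line "notationOf substances" parses like "notationOf=substances".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of ASKOSI.
final class NotationSketch {
    // Splits "scheme_code" at the first underscore into { scheme, code }.
    static String[] splitIdentifier(String identifier) {
        int separator = identifier.indexOf('_');
        if (separator <= 0) {
            throw new IllegalArgumentException("No scheme part in: " + identifier);
        }
        return new String[] { identifier.substring(0, separator),
                              identifier.substring(separator + 1) };
    }

    // Reads the target scheme from the text of a one-line notation .cfg file.
    static String notationOf(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("notationOf");
    }

    public static void main(String[] args) {
        String[] parts = splitIdentifier("CAS_7789_20_0");
        System.out.println(parts[0] + " / " + parts[1]);
        System.out.println(notationOf("notationOf substances\n"));
    }
}
```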

Collections:#

A collection is a set of concepts. For example, the collection of the languages supported by an application (userLanguage.cfg):

type=SELECTION
title-en=Users' languages
title-fr=Langues des utilisateurs
title-de=Sprache der Benutzer
title-es=Lengua de los usuarios
default=language
selection=fr,en,de,es

  • default=scheme : indicates the scheme to which the concepts in the collection belong.
  • selection=conceptAbout, ... ,conceptAbout : codes of the concepts in the collection (French, English, German and Spanish in the above example).

 

 

This page (revision 30) was last modified on 03-Dec-2011 17:17 by Christophe Dupriez