Close

September 3, 2019

The Sheet2RDF VocBench tool

Sheet2RDF is also available as embedded VocBench 3 tool.

Installation

On the VocBench site, we provided detailed instructions for the installation of VocBench.

Sheet2RDF on VocBench

Once the installation is complete, you can start playing with VocBench. The user interface of Sheet2RDF consists of a tab, which can be opened by clicking on Sheetrdf under Toolsnavigation bar menu item.

Open

The UI is composed of three panels:

  • the left panel (1) provides a preview of the datasheet content and allows to customize the transformation process;
  • the right panel (2) allows to managing the PEARL document;
  • the bottom panel (3) shows the rdf triples generated by the system.
User Interface

Now, let’s see how Sheet2RDF works.

The spreadsheet management

The input file (the spreadsheet to be processed) can be chosen and loaded through the Browse button. Once the input file is loaded, a preview of the datasheet is shown into the left panel. In the screenshot below, it is shown how is adopted a color code for the column headers status.

Datasheet preview

The spreadsheet columns need to be configured in order to guide the triplification process. This configuration will guide the automatic generation of the PEARL code which can be triggered by clicking the “play” button. Note that beside this button, two further buttons are available. These allows to export the current configuration so that it could be imported, edited and reused at a later time.

In order to edit the header configuration, a dedicated dialog can be opened by clicking on the small icon near the column name.

The header editor

In order to better understand the usage of the header editor, it’s important to clarify that the PEARL code (which is automatically produced according the configuration of the headers) is the real “brain” of the triplification process. The header editor is just a tool that simplifies the writing of the PEARL code, or even completely relieves the user from the task. In order to completely understand the PEARL language it is recommended to read its documentation, but for the use made of it in Sheet2RDF it is sufficient to know that the PEARL that it is going to be generated is composed essentially by two blocks: nodes and graph. Summarizing briefely:

  • The nodes block defines a set of RDF nodes using CODA converters which, taking as input UIMA features (in our case they represent the content of the spreadsheet cells), precess and transform their content producing RDF values.
  • The graph block uses the nodes defined in the previous and “projects” them, with a SPARQL-like syntax, in an RDF graph according the subject-predicate-object paradigma

So, practically, the header editor allows to define a set of nodes and their application in the graph section (named graph applications).
It is possible to completely exclude an header from the process of generation of the PEARL, simply by checking the ignore header checkbox.

As can be seen in the figure above, the header editor has to immutable fields:

  • Header: tells the name of the editing header
  • PEARL feature: tells the id with which is referred the UIMA features (related to the cells under the current header) in the PEARL code

Despite the PEARL feature identifiers cannot be edited, it is possible to customize the naming strategy. The settings dialog (accessed by clicking the button with the cog icon, next to the spreadsheet file picker) allows to choose among three choices:

  • Column alphabetic index: it names the feature concatenating the prefix col_ with a letter according the alphabetic order: col_Acol_Bcol_C,…;
  • Column numeric index: this is the default choice, it names the feature with the same logic of the alphabetic strategy, but using the column indexes (0-based) instead (e.g. col_0col_1col_2,…);
  • Normalized header name: normalizes the header name by replacing any non-alphanumeric character with a single _. To prevent that multiple header with the same name collide on the same feature name, even this strategy adds a prefix following the Column numeric index one. So, for example, the header preferred label@en, in third column, would generate the feature col_2_preferred_label_en.

Let’s go back to the description of the header editor. Besides the two fields just described, there are two panels: one for the nodes and one for the graph section.
The nodes panel allows only to delete the nodes or to change minor details (like the language of literal converters, or the configuration of uri converters), it is not possible to create nodes directly from here, so let’s focus to the Graph applications panel.

There are two kind of graph application:

  • Simple: allows to mapping directly the header with a property, to generate a node compliant with the range of the chosen property and then to define a mapping with a simple subject-predicate-object statement from the column content.
  • Advanced: as suggested by the name, it is a more complex mechanism than the simple one. It gives to the user more freedom in order to define even complex graph patterns that go beyond a simple S-P-O triple.

Simple graph application

In most cases, the triplification of the content of a column doesn’t require to define a complex graph pattern, the desired output could stick to the simple subject-predicate-object pattern. In this cases the creation of a simple graph application is enough.

The above form guides the user in the creation of a simple graph application. The primary information is the property to be bound to the column, which have to be looked up on the underlying ontology. Depending on the chosen property, the range field is automatically filled (this is read only, it is just useful to inform the user) as well as the range typecombobox which the available value are resourceplain literal and typed literal. The selected value here affects the nature of a further field, indeed in case of resource, a typefield optionally allows to assert a type for the column content (it will determine an additional triple like cell_content rdf:type chosen_type), in case of plain literal, a langfield allows to specify the language and finally in case of typed literal a datatype field allows to specify the datatype of the content.
The last field to fill is node. From the dropdown combobox it can be selected one of the nodes already defined in the current header, or click on the + button for creating a new one. Note that in the example shown in the figure, there is already a node defined. This is defined automatically by the system since, according the heuristics, the @en suffix in the preferred label@en header implicity associates a language tagged value (a literal node) to the comlun content.

If the user chooses to create a new node, a form like the following is prompted.

Here the user needs to provide:

  • an ID of the node
  • a CODA converter with the associated configuration for the generation of the same node. The list of available converters are filtered according the choices made in the previous form. See the tutorial page, for hints on how to choose a suitable converter. As can be seen, in the case taken in example, since the chosen range type was plain literal, there are available only the converters capable to generate a language tagged literal node. Note also that some parameter might be forced as well, like the language forced to english, as chosen previously

The simple graph application determines, in the PEARL nodes section, the serialization of triples like the following in the nodes and graph sections:


nodes = {
...
nodeId node_conversion UIMA_feature/value
...
}
graph = {
...
$subject chosen_predicate $nodeId
...
}
What if the user wants to define a more complex pattern in the graph section? For example, what if from the content under the preferred label@en column it is wanted to generate a reified label (like in SKOSXL)? The simple graph application doesn’t support this intent, so it is necessary to create an advanced graph application.

Advanced graph application

The dialog for creating an advanced graph application consists in two panels: Nodes and Graph. The first basically allows to manage the nodes that are used in the graph pattern. The Graph panel provides a textare for writing the graph pattern. This pattern will be written as it is in the resultant PEARL code.

The following figure shows an advanced graph application that fits the configuration of an header for a SKOSXL lexicalization, indeed it defines two nodes, one for the skosxl:LabelIRI and one for the literal form, and then refers to them in the graph pattern.

In the graph panel there is also a sub-panel that lists the prefix-namespace mappings defined in the VB project and eventually allows to define further mappings useful in the graph pattern.

The creation of this kind of graph applications could be quite complex, so Sheet2RDF provides the possibility to store and load advanced graph applications in an internal storage in order to reuse them. By default, the storage contains already a factory-provided configuration for the mapping of skosxl:Labels. This storage can be accessed (for loading or storing) by clicking on the “floppy-disk” buttons at the top-right corner of the dialog.

Mapping the subject

Beside the configuration of the columns, in the process of PEARL generation, Sheet2RDF needs to know how to mapping the subject. Clicking on the Subject mapping button, the following dialog pops up.

These are the fields foreseen in the form:

  • Header: allows to choose, through a combobox, the column that seeds the generation of the subject node;
  • PEARL feature: this is a read-only field, it shows the id with which is referred the UIMA features in the PEARL code. It depends from the header selected above;
  • Node ID: this is a read-only field too, the subject node ID must be subject that is a reserved ID. In fact, in the editor of the spreadsheet headers, the creation of a node with this name is always denied;
  • Type: optionally it is possible to assert a class membership for the subject;

At the bottom of this form there is a panel for the configuration of the CODA converter that will generate the subject node. Note that in this case the only available converters are those capable to generate an IRI value.

The PEARL editor

Sheet2RDF triplifies the data by following the transformations indicated in the PEARL document. A basic PEARL document can be generated automatically by the system through the generate pearl button (‘play’ icon in the Spreadsheet preview panel). In most of the cases, if all the headers have been configured, it should not be necessary to edit the code, otherwise it may be necessary to eventually replace placeholders (%pls_provide_...%, that is, part of the PEARL which necessarily need to be filled by the user) with a ground value, or just for customizing the transformation differently from the default choices suggested by the system.
A PEARL document can be loaded from the filesystem or exported in order to reuse it later (2).

A side note: in the graph section, every patterns, whether deriving from simple or advanced graph application, excluding those reguarding the subject, are written in an OPTIONALblock. This is necessary in order to avoid the failure of the entire graph projection in case a feature is empty (a feature contains the value of the cell under the related column, so a feature is empty if the referenced cell is empty)

When the PEARL code is valid and completed, it can be executed simply by clicking on the “play” button (1). In case of errors in the code, a warning icon (with details shown in a tooltip) appears to the left of this button.

Triples generation

The bottom panel shows the RDF triples that the system generates. In this panel, the triples will be shown with an alternation of background color. The triples with the same color are generated from the same row of the datasheet. Clicking on a specific row of the triples panel, the source datasheet row will be highlighted as in the figure below.
Finally, after clicking on the add button (1), all the triples are added to the project.
The generated triples can be also exported in different RDF formats (2).

Triples preview