INTER-LAYER - Data and Semantics to Data and Semantics (DS2DS): Representing alignment in IPSM-AF

(Getting Ready) Scenario description

Let's assume that two semantically annotated data models are used by the source and target communication parties. The correspondences between them have been analyzed according to Recipe 4 - Mapping data models via alignments. This recipe describes how to express the alignment in the IPSM-AF format, which is consumed by the IPSM tool and can be used directly to perform semantic translation.

Note that here we concentrate on the alignment format and the patterns for translation. Recipe 6 - Using IPSM-AFE for alignment edition and validation presents a CASE tool for working with alignments in the IPSM-AF format.

Recipe ingredients

  • Text editor or IPSM-AFE.

Prerequisites

(How to Do it)

IPSM-AF alignment format documentation is available here. Sample alignments are available in the repository.

A common ingredient in the preparation of an IPSM-AF alignment are patterns: organized snippet templates that come from experience, appear often, and help in thinking about and designing alignments. Here we present a list of patterns used in INTER-IoT. The patterns are presented using the following template:

ENTITY1
[input RDF pattern]
ENTITY2
[output RDF pattern]
(optional) [TRANSFORMATIONS; TYPINGS; FILTERS]

Each pattern lists the input RDF pattern first and the output RDF pattern second; they correspond to ENTITY1 and ENTITY2 (following the naming convention used in IPSM-AF syntax). If any filters, transformations, or typings are used, they are placed below the RDF patterns. The best way to read this notation is to follow the flow of data: start from ENTITY1, pass through the optional cell elements FILTERS, TRANSFORMATIONS, and TYPINGS (if any), and finish with ENTITY2.

Basic patterns

Basic patterns are the simple building blocks of any IPSM-AF alignment and are often combined to form more advanced patterns. Anyone new to IPSM-AF should first familiarise themselves with basic patterns, before moving on.

Simple triple rewriting

ENTITY1
VAR:X a ex:type1 .
ENTITY2
VAR:X a ex:type2 .

This pattern is a very simple rewrite, where any RDF triple that matches the pattern is substituted with a (possibly) different one. Here we can see the most basic element of IPSM-AF, and a very common one: a variable. Any named node prefixed with VAR (see the prefixes appendix) is treated by IPSM like a SPARQL variable, and is used to connect entity1 with entity2. Variables may be used in any position in a triple (subject, predicate, or object), and may even change position when moving between the input and output patterns.

Using a variable is not a requirement, and any valid Turtle code is a valid pattern. The cell in the example above represents a rewrite of rdf:type: any node of type ex:type1 is annotated with ex:type2 instead.

Simple triple addition

ENTITY1
VAR:X a ex:type1 .
ENTITY2
VAR:X a ex:type1, ex:type2 .

It is important to remember that IPSM by default does two things: it ignores what it does not recognize (i.e. any cells that do not match with the data), and removes whatever is matched, by replacing it with the output pattern. Therefore, if there is a need to match some pattern, but also preserve it in the output, it must be present in the input pattern, and repeated in the output pattern.

This simple example adds ex:type2 to any node that is of type ex:type1. In contrast with the Simple triple rewriting pattern the information about ex:type1 type is not removed, but copied to the output.

Simple triple removal

ENTITY1
VAR:X a ex:type1 .
ENTITY2
(empty)

A simple consequence of IPSM's "match-and-remove" behavior is that any triple from the input pattern that is not repeated in the output pattern is removed. Note that only the explicitly declared triples are removed: in the example, those are any triples asserting the ex:type1 type.

In order to remove any triples associated with a variable, given that it satisfies some condition (like being of a given type), the following removal pattern may be used:

ENTITY1
    VAR:X a ex:type1 ;
    VAR:Y VAR:Z .
    VAR:A VAR:X VAR:C .
    VAR:W VAR:V VAR:X .
ENTITY2
(empty)

Running this example will result in complete removal of any mention (i.e. any triple that uses it in subject, predicate, or object) of variable X, under the condition that X is of type ex:type1.

Note that none of the variables defined in entity1 are used in entity2. The formal requirement states that any variable in entity2 must be previously declared (i.e. either in entity1, or under transformations), but none of the variables need to be repeated in entity2.

Simple triple preservation

ENTITY1
VAR:X a ex:type1 .
ENTITY2
VAR:X a ex:type1 .

This trivial pattern on its own has no effect on the alignment. It is a simple repetition of a part, or the whole, of the input pattern in the output pattern. However, following the 'simple triple removal' pattern, it is apparent that if a triple needs to be matched as part of the input pattern, but not removed from the output, it needs to be repeated.

ENTITY1
VAR:X a ex:type1, ex:type2 .
ENTITY2
VAR:X a ex:type1 .

The example above is a combination of the removal and preservation patterns, where the ex:type2 annotation is removed only from those nodes that are also of type ex:type1. At the same time, ex:type1 is preserved.
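
The match-and-replace behavior behind the rewriting, addition, removal, and preservation patterns can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions, not the way IPSM is implemented: triples are plain tuples of strings, names starting with "?" stand in for VAR: variables, and the helpers match and apply_cell are hypothetical names introduced for this illustration.

```python
# Toy model of "match-and-replace" (an assumption-laden sketch, not IPSM itself).
# Triples are 3-tuples of strings; terms starting with "?" act like VAR: variables.

def match(pattern, graph, binding=None):
    """Yield each variable binding under which every pattern triple occurs in graph."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, graph, candidate)


def apply_cell(graph, entity1, entity2):
    """Remove every match of entity1 and add entity2 filled in with the same bindings."""
    result = set(graph)
    for binding in list(match(entity1, graph)):
        def fill(triple):
            return tuple(binding.get(term, term) for term in triple)
        result -= {fill(t) for t in entity1}
        result |= {fill(t) for t in entity2}
    return result


graph = {
    ("ex:a", "a", "ex:type1"),
    ("ex:b", "a", "ex:type1"),
    ("ex:b", "a", "ex:type2"),
    ("ex:c", "a", "ex:type2"),
}
type1 = ("?X", "a", "ex:type1")
type2 = ("?X", "a", "ex:type2")

rewritten = apply_cell(graph, [type1], [type2])         # simple triple rewriting
added = apply_cell(graph, [type1], [type1, type2])      # simple triple addition
removed = apply_cell(graph, [type1], [])                # simple triple removal
unchanged = apply_cell(graph, [type1], [type1])         # simple triple preservation
# Removal combined with preservation: drop ex:type2 only where ex:type1 also holds.
combined = apply_cell(graph, [type1, type2], [type1])
```

In this model, only ex:b matches the combined cell, because it is the only node that carries both types; ex:c keeps its ex:type2 annotation.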

Simple functional rewrite

ENTITY1
ex:node ex:property VAR:X .
ENTITY2
ex:node ex:property VAR:Y .
TRANSFORMATIONS
<sripas:function sripas:about="[FUNCTION NAME]">
    <sripas:param sripas:order="1" sripas:about="&VAR;X"/>
    <sripas:param sripas:order="2" sripas:val="[some literal value]"/>
    <sripas:return sripas:about="&VAR;Y"/>
</sripas:function>
    

When transforming data matched with the input pattern, into one that conforms with the output pattern, functional transformations may be applied. The example above presents a transformation of variable X into variable Y via the application of a function. This hypothetical function takes two parameters (which need to be numbered with sripas:order), one of which is bound to variable X, and the other to some literal value, defined directly in this transformation. The result of the function is bound to variable Y, and used in the output pattern.

Because many functions operate on, or output, literals, they are usually applied to variables in the object position of a triple. However, any function that accepts or returns URIs may be applied to (or produce) variables bound to subjects or predicates.

Function composition

ENTITY1
ex:node ex:property VAR:X .
ENTITY2
ex:node ex:property VAR:Z .
TRANSFORMATIONS
<sripas:function sripas:about="F">
    <sripas:param sripas:order="1" sripas:about="&VAR;X"/>
    <sripas:return sripas:about="&VAR;Y"/>
</sripas:function>
<sripas:function sripas:about="G">
    <sripas:param sripas:order="1" sripas:about="&VAR;Y"/>
    <sripas:return sripas:about="&VAR;Z"/>
</sripas:function>
    

Functional transformations in IPSM-AF may be composed, i.e. the output of one function may be passed to another before being output. This is particularly useful if a function requires some preprocessing of its arguments, e.g. it only accepts arguments of a given type.

The example presents a simple case, where the value of variable Z is equal to G(F(X)), or (G∘F)(X). Variable Y is used as the intermediate output of F(X), and passed as input to G. As with the simple functional rewrite, any number of additional parameters may be passed to either function, provided they accept them.
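
The chaining of F and G through the intermediate variable Y can be sketched in Python as follows. This is an illustrative model only, not IPSM code: the functions parse_number and celsius_to_fahrenheit are hypothetical stand-ins for F and G, and the run helper and the list-of-tuples layout are assumptions made for this example.

```python
# Toy model of composed transformations (a sketch, not the IPSM implementation).

def parse_number(text):
    """Hypothetical F: preprocessing step that turns text into a number."""
    return float(text)


def celsius_to_fahrenheit(value):
    """Hypothetical G: expects a number, which is why F has to run first."""
    return value * 9 / 5 + 32


# Each entry: (function, names of the parameter variables, name of the return variable).
transformations = [
    (parse_number, ["X"], "Y"),
    (celsius_to_fahrenheit, ["Y"], "Z"),
]


def run(transformations, bindings):
    """Apply the functions in order, storing each result under its return variable."""
    bindings = dict(bindings)
    for function, params, target in transformations:
        args = [bindings[name] for name in params]
        bindings[target] = function(*args)
    return bindings


result = run(transformations, {"X": "20"})
print(result["Z"])  # Z = G(F(X)); prints 68.0
```

Only Z would appear in the output pattern; Y exists solely to carry the result of F into G.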

Datatype rewriting

ENTITY1
ex:node ex:dataProperty VAR:X .
ENTITY2
ex:node ex:dataProperty VAR:X .
FILTERS
      <sripas:filter sripas:about="&VAR;X" sripas:datatype="&xsd;int"/>
    
TYPINGS
      <sripas:typing sripas:about="&VAR;X" sripas:datatype="&xsd;float"/>
    

Datatypes for values of datatype properties can be put in the RDF patterns, just like any other valid RDF code, e.g. "1.0"^^xsd:float. For datatypes of variables, however, there are two dedicated tags outside of the RDF patterns: filters and typings. Filters are applied to the input pattern, and restrict the matched input to values that carry the given explicit datatype. In the example above, only triples that have a literal value of type "xsd:int" as the object will be matched. In the output, variable X will be explicitly tagged with the datatype "xsd:float", as specified in the typings. Since the example uses both filters and typings on the same variable, the effect is a rewrite of the datatype of variable X from "xsd:int" to "xsd:float" (assuming that the input matches the cell).
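
The combined effect of a filter and a typing on the same variable can be sketched in Python. This is a simplified model rather than IPSM behavior in full: literals are represented as (lexical form, datatype) pairs, the retype helper is a hypothetical name, the subject is not checked, and the lexical form is assumed to stay untouched, since the recipe only states that the datatype tag is rewritten.

```python
# Toy model of FILTERS + TYPINGS on one variable (a sketch, not the IPSM implementation).
XSD = "http://www.w3.org/2001/XMLSchema#"


def retype(triples, prop, filter_datatype, typing_datatype):
    """Retag objects of `prop` whose datatype passes the filter; keep everything else."""
    result = []
    for subject, predicate, (lexical, datatype) in triples:
        if predicate == prop and datatype == filter_datatype:
            datatype = typing_datatype  # typing: tag the output value
        result.append((subject, predicate, (lexical, datatype)))
    return result


data = [
    ("ex:node", "ex:dataProperty", ("5", XSD + "int")),
    ("ex:node", "ex:dataProperty", ("abc", XSD + "string")),  # fails the filter
]
retyped = retype(data, "ex:dataProperty", XSD + "int", XSD + "float")
```

The second triple is left as it was, which mirrors the rule that input not matching a cell is ignored.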

Multi-cell patterns

Simple multi-step rewriting

ENTITY1
VAR:X ex:property ex:node .
VAR:Y ex:property2 ex:node2 .
ENTITY2
VAR:X ex:property ex:other_node .
VAR:Y ex:property ex:other_node .

Any IPSM-AF alignment is applied by IPSM according to the definition of steps, which declare a sequence of (named) alignment cells that are to be applied in order, exactly as many times as declared. A very practical pattern that takes advantage of this is the simple multi-step rewriting pattern.

In order to simplify the understanding of the alignment, any cell with triples that are independent of one another (i.e. use no common values or URIs) may be split into multiple cells. The cell presented above is one such example, that may be trivially split into two cells, that look like this:

Cell 1
ENTITY1
VAR:X ex:property ex:node .
ENTITY2
VAR:X ex:property ex:other_node .

Cell 2
ENTITY1
VAR:Y ex:property2 ex:node2 .
ENTITY2
VAR:Y ex:property ex:other_node .

Multi-step rewriting

Cell 1
ENTITY1
VAR:X ex:property ex:node .
ENTITY2
VAR:X ex:property ex:other_node .

Cell 2
ENTITY1
VAR:Y ex:property2 ex:node2 .
ENTITY2
VAR:Y ex:property ex:other_node .

Cell 3
ENTITY1
VAR:Z ex:property ex:other_node .
ENTITY2
VAR:Z ex:other_property ex:other_node .

Another consequence of the sequential application of cells is that the order of cell application may matter. This is always the case if the output pattern of one cell has commonalities (same URIs or literal values) with the input pattern of another cell.

In the example above, the first two cells are independent, and may be applied in any order. The result will always be a "reduction" of the respective input patterns into an 'ex:property ex:other_node' assertion for the relevant variables.

The third cell, when applied after that, will further change that assertion into a triple with an 'ex:other_property' assertion.

Here, two things are important to understand. First, if cell 3 were applied before cells 1 and 2, it would not affect the results of these cells. Second, because this alignment pattern is split into multiple cells, cell 3 may also match data present in the original input, even if that data was not produced by cells 1 or 2.
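
The effect of cell order can be checked with a small Python model of the three cells above. It is a sketch under simplifying assumptions, not IPSM itself: every cell is reduced to a single "predicate, object" rewrite that applies to any subject, and apply_cell and apply_steps are hypothetical helper names.

```python
# Toy model of sequential cell application (a sketch, not the IPSM implementation).
# Each cell rewrites "?s p_in o_in" into "?s p_out o_out" for every subject ?s.
CELL_1 = (("ex:property", "ex:node"), ("ex:property", "ex:other_node"))
CELL_2 = (("ex:property2", "ex:node2"), ("ex:property", "ex:other_node"))
CELL_3 = (("ex:property", "ex:other_node"), ("ex:other_property", "ex:other_node"))


def apply_cell(graph, cell):
    """Replace every triple matching the cell's input with the cell's output."""
    (p_in, o_in), (p_out, o_out) = cell
    return {
        (s, p_out, o_out) if (p, o) == (p_in, o_in) else (s, p, o)
        for s, p, o in graph
    }


def apply_steps(graph, cells):
    """Apply the cells one after another, in the declared order."""
    for cell in cells:
        graph = apply_cell(graph, cell)
    return graph


data = {
    ("ex:a", "ex:property", "ex:node"),
    ("ex:b", "ex:property2", "ex:node2"),
    ("ex:c", "ex:property", "ex:other_node"),  # already present in the input
}
in_order = apply_steps(data, [CELL_1, CELL_2, CELL_3])
cell_3_first = apply_steps(data, [CELL_3, CELL_1, CELL_2])
```

With the order 1, 2, 3 all three subjects end up with ex:other_property. With cell 3 first, only ex:c does, because it was the only match in the original input, while ex:a and ex:b stop at the intermediate 'ex:property ex:other_node' form.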

Consider the following example:

ENTITY1
VAR:X ex:property ex:node .
VAR:Y ex:property2 ex:node2 .
VAR:Z ex:property ex:other_node .
ENTITY2
VAR:X ex:other_property ex:other_node .
VAR:Y ex:other_property ex:other_node .
VAR:Z ex:other_property ex:other_node .

The important difference between this single cell and the multi-cell pattern is that the single cell requires all of the variables X, Y and Z to be bound, i.e. the whole input pattern must match. Splitting it into multiple cells ensures that the alignment will work even if some of the patterns with variables X, Y or Z are not present in the input data.

Tagging

Cell 1
ENTITY1
VAR:X ex:property ex:node .
ENTITY2
VAR:X ex:property ex:node ;
      a tag:TAG .

Cell 2
ENTITY1
VAR:Y ex:property2 ex:node2 .
ENTITY2
VAR:Y ex:property2 ex:node2 ;
      a tag:TAG .

Cell 3
ENTITY1
VAR:Z ex:property3 ex:node3 .
ENTITY2
VAR:Z ex:property3 ex:node3 ;
      a tag:TAG .

Cell 4
ENTITY1
VAR:TAG a tag:TAG .
ENTITY2
VAR:TAG rdfs:comment "tagged node" ;
        a ex:SpecialNode .

The tagging pattern is a simple instantiation of multi-step rewriting, in which nodes are "tagged" with a type or property assertion across many cells. The actual tag (tag:TAG in the example above) can be any valid RDF construct, although simple tags, such as a simple type assertion, are recommended. Additionally, the tag should be something that does not appear in the input data and is used only in the alignment, although there is no technical requirement for that.

In this pattern, a cell placed after all tagging cells replaces the tag with the data that is supposed to appear in the output. This data needs to appear only in the "concluding" cell, rather than in every tagging cell, which would be the case if this pattern were not applied.

Alternative

Cell 1
ENTITY1
    VAR:X ex:property VAR:Y ;
          ex:property2 VAR:ALT .
    VAR:Y ex:property2 "false" .
ENTITY2
    VAR:X tag:altProperty VAR:ALT .
    VAR:X ex:property VAR:Y .
    VAR:Y ex:property2 "false" .

Cell 2
ENTITY1
    VAR:X tag:altProperty "Left" .
ENTITY2
    VAR:X ex:propertyLeft "true" .

Cell 3
ENTITY1
    VAR:X tag:altProperty "Right" .
ENTITY2
    VAR:X ex:propertyRight "true" .
    

This pattern is a simple two-step pattern that allows a part of the data to be abstracted, in order to avoid repeating code in the input pattern. The first step is composed of at least one cell, in which a pattern is rewritten into an intermediate form, in a similar fashion to the tagging pattern. The second step involves cells that match only against the tagged "alternative" pattern, and put the final value in the output, one "alternative" per cell. There may be more than two "alternatives", i.e. options. URIs used for tagging alternatives should follow the guidelines described in the tagging pattern, so that they don't clash with any that may appear in the input data.

The example above is a simple decision "fork" between "ex:propertyLeft" and "ex:propertyRight", decided based on the value of variable ALT. The same effect could be achieved in two cells; however, the input pattern from the first cell in this example would have to be repeated twice, like so:

Cell 1
ENTITY1
    VAR:X ex:property VAR:Y ;
          ex:property2 "Left" .
    VAR:Y ex:property2 "false" .
ENTITY2
    VAR:X ex:property VAR:Y ;
          ex:propertyLeft "true" .
    VAR:Y ex:property2 "false" .

Cell 2
ENTITY1
    VAR:X ex:property VAR:Y ;
          ex:property2 "Right" .
    VAR:Y ex:property2 "false" .
ENTITY2
    VAR:X ex:property VAR:Y ;
          ex:propertyRight "true" .
    VAR:Y ex:property2 "false" .
    

Notice that the input and output patterns are very similar; in other words, they contain a lot of repetition. This can be solved by applying the alternative pattern. The more complicated the input and output patterns, the more clarity this pattern brings, and the more repetition it prevents.

(How it Works)

Let's consider the data models introduced in Recipe 4 - Mapping data models via alignments.

Sample cells in IPSM-AF may look as follows:

<align:Cell rdf:about="http://www.inter-iot.eu/sripas#1_basic_data">
  <align:entity1 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    var:entity a fw:Entity ;
    fw:hasId var:id ;
    fw:hasName var:name ;
    fw:hasType "DeviceModel" .
  </align:entity1>
  <align:entity2 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    var:entity a fw:Entity ;
    iiot:hasDescription var:name ;
    iiotex:hasLocalId var:id .
  </align:entity2>
  <relation>=</relation>
</align:Cell>

The cell 1_basic_data matches the source RDF graph against ENTITY1 and transforms the matched part of the graph according to ENTITY2. Here, we match an instance of fw:Entity that has three properties: fw:hasId, fw:hasName, and fw:hasType. The value of the last property is fixed, but the values of the first two are stored in variables to be reused in ENTITY2. This cell does not change the type of the instance; it translates properties between the two data models.

Another cell example is as follows:

<align:Cell rdf:about="http://www.inter-iot.eu/sripas#5_type_sen">
  <align:entity1 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    var:entity a fw:Entity ;
    fw:hasAttribute [
      a fw:Attribute ;
      fw:hasName "category" ;
      fw:hasValue  [
        a fw:Array ;
        fw:hasElement  [
          a fw:ArrayElement;
          fw:hasNumber  var:number ;
          fw:hasValue   "sensor"
        ]
      ]
    ] .
  </align:entity1>
  <align:entity2 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    var:entity a sosa:Sensor .
  </align:entity2>
  <align:relation>=</align:relation>
</align:Cell>

In this case, ENTITY1 checks the value of the category attribute, which is nested according to a specified pattern. Based on this value we assign a type to var:entity: if the value is "sensor", the type is sosa:Sensor. The whole nested structure is removed from the output RDF graph.

The last example is the following cell:

<align:Cell rdf:about="http://www.inter-iot.eu/sripas#6_property_sen">
    <align:entity1 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
        var:CTX a sosa:Sensor ;
            fw:hasAttribute [ a fw:Attribute ;
                               fw:hasName "controlledProperty" ;
                               fw:hasValue [ a fw:Array ;
                                              fw:hasElement  [ a fw:ArrayElement ;
                                                               fw:hasNumber  var:nr ;
                                                               fw:hasValue   var:value
                                                             ]
                                            ]
                             ] .
    </align:entity1>
    <align:entity2 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
        var:CTX a sosa:Sensor ;
            sosa:observes  var:CTP .
    </align:entity2>
    <relation>=</relation>
    <sripas:transformation rdf:parseType="Literal">
        <function about="CONCAT">
            <param order="1" val="http://inter-iot.eu/GOIoTPex#"/>
            <param order="2" about="http://www.inter-iot.eu/sripas#node_value"/>
            <return about="http://www.inter-iot.eu/sripas#node_uri"/>
        </function>
        <function about="IRI">
            <param order="1" about="http://www.inter-iot.eu/sripas#node_uri"/>
            <return about="http://www.inter-iot.eu/sripas#node_CTP"/>
        </function>
    </sripas:transformation>
</align:Cell>

In cell 6_property_sen, we look for the value of an fw:Attribute instance that has "controlledProperty" as the value of its fw:hasName property. We match the graph pattern to extract the value of this attribute and store it in the var:value variable. The attribute value is text, e.g. temperature or fillingLevel. We want to use this text to generate an entity with the URI http://inter-iot.eu/GOIoTPex#temperature or http://inter-iot.eu/GOIoTPex#fillingLevel, respectively. We assume that such instances are defined in the GOIoTPex ontology. To generate a new entity to be added to the output RDF graph, the SPARQL IRI function can be used. First, the prefix is concatenated with the value of the attribute; then a new entity is generated with the resulting text as its IRI.

Wrapping things up

This recipe should help in preparing an alignment between two data models in the IPSM-AF format. The alignment should reflect the correspondences identified between the two data models. Correspondences usually follow the patterns listed in this recipe, which can be applied as templates when writing your own alignment cells.