Data veracity assurance BB – Design Document – Prometheus-X Components & Services

[!TIP] When in doubt regarding the intended meaning of a certain term, refer to the Glossary.

The Data Veracity Assurance building block (DVA from now on) allows data exchange participants to agree on and later prove/verify quality requirements or properties of the exchanged data.

For example, if a data producer (abbreviated P from now on) provides simple sensor data to a data consumer (C from now on), DVA enables P to prove (or at least claim), and C to verify, that the provided data is credible (e.g., that temperature values fall within a certain range, say the interval (-100 °C, +50 °C)).

DVA requires a veracity level agreement (VLA) between the exchange participants. This agreement is part of the contract and targets a specific data exchange unit (instance). The VLA defines a number of veracity objectives that each describe a data quality aspect (e.g., completeness or accuracy) and an evaluation scheme (e.g., value is within a numerical range). The VLA also defines how the evaluation is to be performed (e.g., with a certain algorithm or software library). When the data exchange occurs, in the simplest model, P attaches an attestation (or even a proof) regarding the exchanged data’s quality that C can verify and trust.
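
The relationship between a VLA, its veracity objectives, and their evaluation can be sketched as follows. This is only an illustration under assumptions: the class names, field names, and the `evaluate` helper are hypothetical and are not part of the DVA specification; the interval reuses the temperature example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class VeracityObjective:
    """One requirement of a VLA (hypothetical structure)."""
    name: str
    aspect: str                       # quality aspect, e.g. "accuracy"
    criterion_type: str               # e.g. "in_range" or "valid_invalid"
    method: Callable[[float], bool]   # evaluation method applied to one value


@dataclass(frozen=True)
class VeracityLevelAgreement:
    """A VLA targets one data exchange and lists its objectives."""
    exchange_id: str
    objectives: List[VeracityObjective]


def in_open_range(low: float, high: float) -> Callable[[float], bool]:
    """Build an 'in range' check for the open interval (low, high)."""
    return lambda value: low < value < high


def evaluate(vla: VeracityLevelAgreement, values: Sequence[float]) -> Dict[str, bool]:
    """Evaluate every objective; an objective holds only if all values pass."""
    return {o.name: all(o.method(v) for v in values) for o in vla.objectives}


vla = VeracityLevelAgreement(
    exchange_id="example-exchange",  # placeholder identifier
    objectives=[
        VeracityObjective(
            name="plausible_temperature",
            aspect="accuracy",
            criterion_type="in_range",
            method=in_open_range(-100.0, 50.0),
        )
    ],
)

print(evaluate(vla, [21.5, -3.0, 49.9]))  # {'plausible_temperature': True}
print(evaluate(vla, [21.5, 50.0]))        # {'plausible_temperature': False}
```

The second call fails because the interval is open, so the boundary value 50.0 is not accepted.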

The high-level concepts of the DVA BB have been summarized in the knowledge graph below. The second graph visualizes a concrete example of using DVA in a use case where xAPI training data is exchanged.

---
title: High-Level Data Veracity Concepts (Knowledge Graph / Metamodel)
---

graph TD
  xchg(["Data\n Exchange"]):::External

  va(["Veracity\n Assurance"]):::Assurance
  aov(["Attestation\n of Veracity"]):::Assurance
  pov(["Proof\n of Veracity"]):::Assurance
  voe(["Veracity Objective Evaluation"]):::Assurance
  eval(["Evaluation"]):::Assurance

  vla(["Veracity\n Level\n Agreement"]):::Agreement
  vo(["Veracity\n Objective"]):::Agreement
  qa(["Quality\n Aspect"]):::Agreement
  es(["Evaluation\n Scheme"]):::Agreement
  crit(["Criterion\n Type"]):::Agreement
  method(["Evaluation\n Method"]):::Agreement

  syntax(["Syntax\n (ISO 8000)"]):::Aspect
  timeliness(["Timeliness\n (ISO 25000)"]):::Aspect
  accuracy(["Accuracy\n (ISO 25000)"]):::Aspect
  completeness(["Completeness\n (ISO 25000)"]):::Aspect
  consistency(["Consistency\n (ISO 25000)"]):::Aspect

  validinvalid(["Valid/\n Invalid"]):::Agreement
  inrange(["In\n Range"]):::Agreement
  greaterless(["Greater Than\n Less Than"]):::Agreement

  vla-- targets exchange -->xchg
  vla-- has objective -->vo
  vo-- targets aspect -->qa
  vo-- can be evaluated using -->es
  es-- has type -->crit
  es-- has method -->method

  syntax & timeliness & accuracy & completeness & consistency-- is a -->qa
  validinvalid & inrange & greaterless-- is a -->crit

  va-- for agreement -->vla
  aov & pov-- is a -->va

  va-- has evaluation -->voe
  voe-- targets objective -->vo
  voe-- has evaluation -->eval

  classDef Agreement fill:#fcdc00,stroke:#000,color:#000
  classDef Aspect fill:#fb4b00,stroke:#000,color:#000
  classDef External fill:#73d8ff,color:#000
  classDef Assurance fill:#a4dd00,stroke:#000,color:#000
  linkStyle default stroke-width:4px
---
title: Data Veracity Concepts Example (xAPI Learning Traces)
---

graph LR
  xchg(["xAPI Learning\n Traces Exchange"]):::External

  aov(["Attestation\n of Veracity"]):::Assurance
  voe_syn(["Syntax\n Evaluation"]):::Assurance
  voe_rec(["Recency\n Evaluation"]):::Assurance
  eval_syn(["Valid"]):::Assurance
  eval_rec(["3 Days\n Old"]):::Assurance

  vla(["xAPI Learning Trace\n Veracity Level Agreement"]):::Agreement
  vo_syn(["Valid\n Syntax"]):::Agreement
  vo_rec(["Recency"]):::Agreement
  qa_syn(["Syntax"]):::Aspect
  qa_rec(["Timeliness"]):::Aspect
  es_syn(["Syntax\n Checking"]):::Agreement
  es_rec(["Timeliness\n Checking"]):::Agreement
  crit_syn(["Valid/\n Invalid"]):::Agreement
  crit_rec(["Greater Than\nLess Than"]):::Agreement
  method_syn(["Syntax\n Checker"]):::Agreement
  method_rec(["Value\n Comparison"]):::Agreement

  vla-- targets exchange -->xchg

  vla-- has objective -->vo_syn & vo_rec
  
  vo_syn-- targets aspect -->qa_syn
  vo_rec-- targets aspect -->qa_rec
  vo_syn-- can be evaluated using -->es_syn
  vo_rec-- can be evaluated using -->es_rec

  es_syn-- has type -->crit_syn
  es_rec-- has type -->crit_rec
  es_syn-- has method -->method_syn
  es_rec-- has method -->method_rec

  aov-- for agreement -->vla

  aov-- has evaluation -->voe_syn & voe_rec
  voe_syn-- has evaluation -->eval_syn
  voe_rec-- has evaluation -->eval_rec
  voe_syn-- targets objective --->vo_syn
  voe_rec-- targets objective --->vo_rec

  classDef Agreement fill:#fcdc00,stroke:#000,color:#000
  classDef Aspect fill:#fb4b00,stroke:#000,color:#000
  classDef External fill:#73d8ff,color:#000
  classDef Assurance fill:#a4dd00,stroke:#000,color:#000
  linkStyle default stroke-width:4px

Technical Usage Scenarios & Features

Features/Main Functionalities

Key functionalities:

  1. Manage data veracity level agreements (VLAs)

  2. Provide means to…

    the veracity of exchanged data

  3. Log veracity verification results

Optional functionalities:

Technical Usage Scenarios

The technical usage scenarios have been summarized in the following UML use case diagram.

Use Case Diagram

Templates for Veracity Level Agreements (VLAs)

VLAs describe exactly what data quality P ‘promises’ and/or C expects. The format and exact contents of VLAs are detailed later in this document.

While VLAs are struck and primarily managed by the Contract Manager, DVA supports the process by managing VLA templates. The dataspace orchestrator is authorized to select the set of templates whose usage is allowed in the dataspace.

DVA of course also provides the means for P to prove (or attest) and C to verify that the exchanged data fulfils the requirements set by the VLA; see below.

Proving, Attestation, and Verification of Veracity

We approach veracity compliance assurance as a challenge at the intersection of technology and trust.

There are chiefly two ways P can offer veracity assurance regarding the exchanged data:

  1. By presenting an Attestation of Veracity (AoV)
  2. By presenting a Proof of Veracity (PoV)

In some cases, VLAs do not need to be supported by an explicit AoV/PoV at all: the VLA serves as a kind of ‘data contract’ where C takes on the responsibility of checking compliance on receiving the data.

The primary deficiency of this ‘trust, but verify’ model is that C may not be willing, or even able, to (fully) check compliance with a VLA. Attestations of veracity provide a trust-based solution to establish compliance without consumer-side checking.

We distinguish two major categories of attestations:

  1. Third-party attestations of veracity follow the normal trust-based claim attestation pattern; the usual concern of whether C can trust the third party applies here. Verification of these attestations will typically not go further than establishing that the claim is a valid and non-revoked verifiable credential (VC).
  2. We also allow for ‘self-attestation’ by P. Trust-wise, the additional assurance carried by self-attestations (note that a VLA is already a commitment by P) is that P can communicate partial or full results of its own veracity evaluation in such attestations. In general, this can be valuable for ‘hard to compute, easy to verify’ evaluations (e.g., NP-complete decision problems on the data); in practice, however, we expect this mechanism to increase C’s confidence by showing compliance for a sample of the data.
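
The sample-based self-attestation idea could look roughly like the sketch below. It is an assumption-laden illustration, not the DVA mechanism: the function name `evaluate_sample`, the result fields, and the use of a seeded random sample are all hypothetical choices.

```python
import random
from typing import Any, Callable, Dict, Sequence


def evaluate_sample(
    records: Sequence[Any],
    check: Callable[[Any], bool],
    sample_size: int,
    seed: int,
) -> Dict[str, Any]:
    """Evaluate `check` on a reproducible random sample of the records.

    P could embed the returned summary in a self-attestation so that C can
    re-run the same check on the same indices after receiving the data.
    """
    rng = random.Random(seed)               # seeded, so the sample is reproducible
    k = min(sample_size, len(records))
    indices = sorted(rng.sample(range(len(records)), k))
    results = [{"index": i, "compliant": bool(check(records[i]))} for i in indices]
    return {
        "seed": seed,
        "population_size": len(records),
        "sample_size": k,
        "compliant_count": sum(r["compliant"] for r in results),
        "results": results,
    }


temperatures = [20.0, 21.5, 75.0, 19.0, 18.5]
summary = evaluate_sample(
    temperatures, lambda t: -100.0 < t < 50.0, sample_size=5, seed=42
)
print(summary["compliant_count"], "of", summary["sample_size"], "sampled values comply")
```

Publishing the seed and the sampled indices lets C repeat the evaluation on exactly the same records, which is cheaper than checking the full data set.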

Proofs of veracity (PoVs), on the other hand, establish compliance through cryptographic rather than trust-based approaches, when this is required and feasible. Such proofs are sound, meaning that a cheating P cannot forge a PoV for a piece of data that does not adhere to the VLA’s requirements. (Mathematically and succinctly) verifiable zero-knowledge and non-zero-knowledge proofs on data have been an emerging field of mathematics over the last two decades, with increasingly rapid development in the last few years. However, as algorithms, standards, software frameworks, and use cases are still evolving, the DVA building block will provide a highly extensible framework for PoVs, driven by the use cases of the project.

DVA defines what proofs and attestations are (see later in this document) and provides means to generate PoVs and AoVs and to verify veracity.

Logging of Results

DVA also keeps track of veracity verification results for traceability and auditing purposes.
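
A verification log entry might be recorded along the following lines. This is a minimal sketch: the JSON Lines layout and the field names (`exchange_id`, `objective`, `outcome`) are assumptions for illustration, not the format DVA will use; the exchange identifier is taken from the mockup VLA later in this document.

```python
import io
import json
from datetime import datetime, timezone
from typing import Optional, TextIO


def log_verification(
    stream: TextIO,
    exchange_id: str,
    objective: str,
    outcome: str,
    when: Optional[datetime] = None,
) -> dict:
    """Append one verification result to an append-only JSON Lines stream."""
    entry = {
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "exchange_id": exchange_id,
        "objective": objective,
        "outcome": outcome,
    }
    stream.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


log = io.StringIO()  # stands in for a file or another durable sink
log_verification(
    log,
    exchange_id="cdef77c9-4016-45bb-868d-6f014e17ed2d",
    objective="xapi_syntax",
    outcome="valid",
    when=datetime(2025, 1, 12, 12, 31, 33, tzinfo=timezone.utc),
)
print(log.getvalue(), end="")
```

One entry per line keeps the log easy to append to and to audit later with standard tools.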

Requirements

---
title: DVA Requirements
---

requirementDiagram
  requirement BB_08__01 {
    id: BB_08__01
    text: "DVA MUST define schemata for VLAs"
    risk: medium
    verifymethod: demonstration
  }
  requirement BB_08__02 {
    id: BB_08__02
    text: "DVA MUST provide VLA templates"
    risk: medium
    verifymethod: demonstration
  }
  requirement BB_08__03 {
    id: BB_08__03
    text: "DVA SHOULD support editing available VLA templates"
    risk: low
    verifymethod: demonstration
  }
  functionalRequirement BB_08__04 {
    id: BB_08__04
    text: "DVA MUST support striking VLAs"
    risk: medium
    verifymethod: test
  }
  functionalRequirement BB_08__05 {
    id: BB_08__05
    text: "DVA MUST provide multiple veracity assurance methods"
    risk: low
    verifymethod: demonstration
  }
  functionalRequirement BB_08__06 {
    id: BB_08__06
    text: "DVA MUST support veracity attestation"
    risk: low
    verifymethod: demonstration
  }
  functionalRequirement BB_08__07 {
    id: BB_08__07
    text: "DVA SHOULD support veracity self-attestation"
    risk: low
    verifymethod: demonstration
  }
  functionalRequirement BB_08__08 {
    id: BB_08__08
    text: "DVA SHOULD support third-party veracity attestation"
    risk: low
    verifymethod: demonstration
  }
  functionalRequirement BB_08__09 {
    id: BB_08__09
    text: "DVA SHOULD support provider-proven veracity"
    risk: medium
    verifymethod: demonstration
  }
  functionalRequirement BB_08__10 {
    id: BB_08__10
    text: "DVA SHOULD support consumer-verified veracity"
    risk: medium
    verifymethod: demonstration
  }
  interfaceRequirement BB_08__11 {
    id: BB_08__11
    text: "DVA MUST interface with the Contract Manager service"
    risk: medium
    verifymethod: test
  }
  interfaceRequirement BB_08__12 {
    id: BB_08__12
    text: "DVA MUST interface with the Dataspace Connector"
    risk: medium
    verifymethod: test
  }
  functionalRequirement BB_08__13 {
    id: BB_08__13
    text: "DVA MUST log verification result"
    risk: medium
    verifymethod: test
  }

  BB_08__02 - refines -> BB_08__01
  BB_08__03 - refines -> BB_08__01
  BB_08__06 - refines -> BB_08__05
  BB_08__07 - refines -> BB_08__06
  BB_08__08 - refines -> BB_08__06
  BB_08__09 - refines -> BB_08__05
  BB_08__10 - refines -> BB_08__05

Integrations

Direct Integrations with Other BBs

No direct integrations.

Integrations via the Dataspace Connector

Relevant Standards

Data Format Standards

Other Standards

There are ISO standards that define data-quality-related concepts:

Other possibly relevant standards and specifications:

PoVs and AoVs are planned to be manifested as W3C verifiable credentials (VCs):

Mapping to Data Space Reference Architecture Models

DSSC: see the Value-Added Services building block.

IDS RAM: see 4.3.6 Data Quality in the Governance Perspective.

Input / Output Data

Data Veracity Level Agreements (VLAs)

[!NOTE] The precise language of VLAs is still being worked out. This should not be a concern to other components such as the Contract Manager at this point, as VLAs are expected to be embedded into the contracts. Take, for example, the Bilateral Contract example: the mockup VLA could be added to this contract under an additional vla key (with some minor modifications and after converting the YAML to JSON of course).

Initial mockup VLAs based on data contracts:

---
id: urn:vla:example:vrtraces
meta:
  title: VR Learning Traces VLA Example
  version: 0.1.0
  description: |
    A simple Veracity Level Agreement (VLA) example based on the
    VR Learning Traces building block.
  exchange: cdef77c9-4016-45bb-868d-6f014e17ed2d


models:
  trace:
    description: A VR learning trace
    type: xapi
    xapi_extensions:
      - http://example.com/exercises/b9e16535-4fc9-4c66-ac87-3ad7ce515f5c/sensors/score


objectives:

  - name: xapi_syntax
    description: Data is a valid xAPI JSON file
    aspect: syntax
    evaluation:
      method:
        id: syntax_check
        args:
          checker: xapi
      type: valid_invalid

  - name: 1w_freshness
    description: Learning trace is not too old
    aspect: timeliness
    evaluation:
      method:
        id: timestamp_comparison
        args:
          timestamp: xapi_timestamp
          within: 1w
      type: in_range

  - name: new_user
    description: No data has been supplied about this actor in the past
    aspect: uniqueness
    evaluation:
      method:
        id: uniqueness_check
        args:
          target: actor.id
      type: valid_invalid
---
id: urn:vla:example:moodle
meta:
  title: Moodle Learning Traces VLA Example
  version: 0.1.0
  description: |
    A simple Veracity Level Agreement (VLA) example for Moodle-like xAPI
    data
  exchange: bb54352d-3da4-4b6d-a4db-3639003f5f99


models:
  trace:
    description: xAPI trace
    type: xapi


objectives:

  - name: is_dases
    description: Trace is within the subset defined by Gaia-X DaSES
    aspect: schema
    evaluation:
      method:
        id: xapi_schema_dases
      type: valid_invalid
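
To illustrate how an objective such as `1w_freshness` from the first mockup might be evaluated, the sketch below works on the objective after conversion to a JSON-like dictionary. Several points are assumptions: that `xapi_timestamp` refers to the `timestamp` property of an xAPI statement, that `within` values consist of an integer followed by a one-letter unit, and the helper names `parse_within` and `is_fresh`.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for the `within` argument (hypothetical).
UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_within(spec: str) -> timedelta:
    """Turn a value such as '1w' into a timedelta."""
    amount, unit = int(spec[:-1]), spec[-1]
    return timedelta(**{UNITS[unit]: amount})


def is_fresh(statement: dict, args: dict, now: datetime) -> bool:
    """Check that the statement timestamp is at most `within` old."""
    recorded = datetime.fromisoformat(statement["timestamp"])
    age = now - recorded
    # Timestamps in the future are treated as non-compliant.
    return timedelta(0) <= age <= parse_within(args["within"])


objective = {
    "name": "1w_freshness",
    "aspect": "timeliness",
    "evaluation": {
        "method": {
            "id": "timestamp_comparison",
            "args": {"timestamp": "xapi_timestamp", "within": "1w"},
        },
        "type": "in_range",
    },
}

args = objective["evaluation"]["method"]["args"]
now = datetime(2025, 1, 12, 12, 0, tzinfo=timezone.utc)
recent = {"timestamp": "2025-01-09T12:00:00+00:00"}  # 3 days old
stale = {"timestamp": "2024-12-01T12:00:00+00:00"}   # 6 weeks old

print(is_fresh(recent, args, now))  # True
print(is_fresh(stale, args, now))   # False
```

Passing `now` explicitly keeps the check deterministic, which also makes it straightforward for C to repeat the evaluation.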

Attestations of Veracity (AoVs)

AoVs (and PoVs) will manifest as verifiable credentials. The information graph that summarizes the contents of these credentials can be seen below.

---
title: Information Graph of an AoV Verifiable Credential
---

graph TD
  vc(["(AoV) Credential Instance"]):::Main
  id[Credential ID #123456789]:::Optional
  type([Attestation of Veracity]):::Required
  validFrom[2025-01-12T12:31:33Z]:::Optional
  subj([Data Exchange Instance]):::Required
  issuer([Example Org]):::Required

  subjId[Data Exchange ID #ABCD1234]:::Optional
  subjContract[Contract ID #98765]:::Custom
  subjEval1[Evaluation of Objective ID #AAA]:::Custom
  subjEval2[Evaluation of Objective ID #AAB]:::Custom

  vc-- id -->id
  vc-- type -->type
  vc-- validFrom -->validFrom
  vc-- credentialSubject -->subj
  vc-- issuer -->issuer

  subj-- id -->subjId
  subj-- contractId -->subjContract
  subj-- evaluations -->subjEval1 & subjEval2

  classDef Main fill:#fff,stroke:#000,color:#000
  classDef Required fill:#0fa,stroke:#000,color:#000
  classDef Optional fill:#7d7,stroke:#000,color:#000
  classDef Custom fill:#ffa,stroke:#000,color:#000
  linkStyle default stroke-width:4px
---
title: Information Graph of a PoV Verifiable Credential
---

graph TD
  vc(["(PoV) Credential Instance"]):::Main
  id[Credential ID #123456789]:::Optional
  type([Proof of Veracity]):::Required
  validFrom[2025-01-12T12:31:33Z]:::Optional
  subj([Data Exchange Instance]):::Required
  issuer([Example Org]):::Required

  subjId[Data Exchange ID #ABCD1234]:::Optional
  subjContract[Contract ID #98765]:::Custom
  proof[Proof]:::Custom

  vc-- id -->id
  vc-- type -->type
  vc-- validFrom -->validFrom
  vc-- credentialSubject -->subj
  vc-- issuer -->issuer

  subj-- id -->subjId
  subj-- contractId -->subjContract
  subj-- proof -->proof

  classDef Main fill:#fff,stroke:#000,color:#000
  classDef Required fill:#0fa,stroke:#000,color:#000
  classDef Optional fill:#7d7,stroke:#000,color:#000
  classDef Custom fill:#ffa,stroke:#000,color:#000
  linkStyle default stroke-width:4px

For AoVs, specifying concrete evaluation results is optional. The important elements of an attestation are its issuer and the identifiers of the relevant data exchange (and contract).

For PoVs, the proof is a crucial element of the credential.
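
As a rough illustration of the AoV information graph, an unsecured credential body could be assembled as shown below. This is a sketch under assumptions: the `AttestationOfVeracity` type name and the `contractId` and `evaluations` properties mirror the graph but are not a published vocabulary, the identifiers are placeholders, and a real credential would additionally need a JSON-LD context defining these custom terms and a securing mechanism (signature).

```python
import json
from typing import List, Optional


def build_aov(
    issuer: str,
    exchange_id: str,
    contract_id: str,
    evaluations: Optional[List[dict]] = None,
    valid_from: Optional[str] = None,
    credential_id: Optional[str] = None,
) -> dict:
    """Assemble the (unsigned) body of an AoV credential."""
    subject = {"id": exchange_id, "contractId": contract_id}
    if evaluations:  # concrete evaluation results are optional for AoVs
        subject["evaluations"] = evaluations
    credential = {
        "@context": ["https://www.w3.org/ns/credentials/v2"],
        "type": ["VerifiableCredential", "AttestationOfVeracity"],
        "issuer": issuer,
        "credentialSubject": subject,
    }
    if credential_id:
        credential["id"] = credential_id
    if valid_from:
        credential["validFrom"] = valid_from
    return credential


aov = build_aov(
    issuer="https://example.org",                      # placeholder issuer
    exchange_id="urn:uuid:cdef77c9-4016-45bb-868d-6f014e17ed2d",
    contract_id="urn:example:contract:98765",          # placeholder contract ID
    evaluations=[{"objective": "xapi_syntax", "result": "valid"}],
    valid_from="2025-01-12T12:31:33Z",
)
print(json.dumps(aov, indent=2))
```

A PoV credential would follow the same shape, with a `proof` property on the subject in place of the optional evaluations.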

Architecture

High-Level Architecture

High Level Architecture

Internal Software Architecture

---
title: Data Veracity Assurance High-Level Architecture
---

graph LR
  apip>"fa:fa-plug\n Data Provider API"]:::API
  apic>"fa:fa-plug\n Data Consumer API"]:::API
  apio>"fa:fa-plug\n Orchestrator API"]:::API
  apim>"fa:fa-plug\n Contract API"]:::API
  att["fa:fa-stamp\n Attestation Component"]:::Component
  attloc["Local Attestation"]:::Misc
  attext["External Attestation"]:::Misc
  vla["fa:fa-file\n VLA Component"]:::Component
  prov["fa:fa-file-circle-check\n Proving Component"]:::Component
  verif["fa:fa-check-double\n Verification Component"]:::Component
  gen["Built-in Proof Generator"]:::Misc
  gen_ext["External Proof Generator"]:::Misc
  ver["Proof Verifier"]:::Misc

  apio -- manage templates -->vla
  apim -- get templates -->vla
  apip -- create AoV --> att
  apip -- create PoV --> prov
  apic -- verify AoV --> att
  apic -- verify PoV --> prov
  apic -- check data compliance --> verif
  att --> attloc & attext
  prov --> gen & gen_ext & ver

  classDef default color:#000
  classDef API fill:lightgreen
  classDef Controller fill:cyan
  classDef Component fill:orange
  classDef Misc fill:green

Dynamic Behaviour

The sequence diagrams below describe possible DVA additions to the basic Connector flows.

---
title: Data Exchange with Attestation or Proof of Veracity (AoV/PoV)
---

sequenceDiagram
    participant c as Consumer PDC

    box rgba(50, 100, 20, .5) Data Provider
      participant p as Provider PDC
      participant dva as Provider DVA
    end
    
    participant ctr as Contract Manager

    box rgba(150, 50, 50, .5) 3rd Party DVA Organization
      participant pdc3 as PDC X 
      participant dva3 as 3rd Party DVA
    end
    
    box rgba(100, 100, 130, .5) Organization/Individual A
      participant pdca as PDC A
      participant dvaa as DVA A
      participant svca as Service A
    end
    
    box rgba(100, 100, 130, .5) Organization/Individual B
      participant pdcb as PDC B
      participant dvab as DVA B
      participant svcb as Service B
    end

    c ->> ctr : Request data processing chain
    ctr --) c: Return processing sequence

    c -) p: Initiate data transaction

    alt self-attestation or self-generated proof
        p ->> dva: Create self-AoV or Generate PoV
        dva --) p: Return AoV/PoV
    else third-party attestation or proving
        p ->> pdc3: Request AoV/PoV
        pdc3 ->> dva3: Create AoV/PoV
        dva3 --) pdc3: Return AoV/PoV
        pdc3 --) p: Return AoV/PoV
    end

    p -) pdca: Send raw data (+ AoV/PoV) for processing
    pdca -) dvaa: Verify AoV/PoV
    pdca ->> svca: Process data
    svca --) pdca: Return processed data
    pdca --) c: Notify progress

    pdca -) pdcb: Send data for next processing
    pdcb -) dvab: Verify AoV/PoV
    pdcb ->> svcb: Process data
    svcb --) pdcb: Return processed data
    pdcb -) c: Notify progress

    pdcb --) c: Send final processed data

Configuration & Deployment Settings

The dataspace orchestrator may configure some basic aspects of DVA, such as…

Error Scenarios

The main potential error scenarios of DVA are caused by not being able to access the data for which AoVs or PoVs should be generated or verified and by possible limitations in resources required to generate AoVs and PoVs.

Unavailable Data

To be able to generate an AoV or a PoV, DVA needs access to the data under assessment. Incomplete or corrupted data may also be impossible to analyze properly. To be valid and reliable, AoVs and PoVs must ‘prove’ that they have been created based on the right data (this can be most simply accomplished by ‘committing’ them to a checksum of the data).
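
Committing an AoV or PoV to a checksum could be done roughly as follows. This is a sketch assuming SHA-256 as the digest; the `dataDigest` field name and the helper functions are hypothetical, not part of the DVA credential schema.

```python
import hashlib
import hmac


def data_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the exchanged data."""
    return hashlib.sha256(data).hexdigest()


def commit(assurance: dict, data: bytes) -> dict:
    """Return a copy of the AoV/PoV body bound to the given data."""
    return {
        **assurance,
        "dataDigest": {"algorithm": "sha256", "value": data_digest(data)},
    }


def matches(assurance: dict, data: bytes) -> bool:
    """Check that the AoV/PoV was created for exactly this data."""
    digest = assurance.get("dataDigest", {})
    return digest.get("algorithm") == "sha256" and hmac.compare_digest(
        digest.get("value", ""), data_digest(data)
    )


payload = b'{"actor": "example", "verb": "completed"}'
bound = commit({"type": "AttestationOfVeracity"}, payload)

print(matches(bound, payload))              # True
print(matches(bound, payload + b" "))       # False: the data was altered
```

The digest only ties the assurance to the data; it does not by itself authenticate the issuer, which is left to the credential's signature.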

Access to the data may be necessary not only for generation by P but also for verification by C. This is only relevant in the case of PoVs, which are verifiable proofs that a given piece of data fulfils the VLA – the proof can only be checked if the original data is available.

Resource Limitations

Some DVA operations may require surprisingly high computational power. This is especially true for PoVs, which are inherently more complex than AoVs.

Furthermore, DVA will not be prepared to handle extreme workloads and will likely start thrashing above a certain limit of request frequency.

Third-Party Components & Licenses

For verifiable-credentials-related operations, DVA will rely on:

For performing veracity checks, DVA will use:

Other potential, less important libraries planned to be used by the implementation:

Implementation Details

The core functionality of DVA will be implemented over the JVM in Java/Kotlin. Some verifiable-credential-related functionality will be implemented in TypeScript.

OpenAPI Specification

The current specification can be found in spec/openapi.yaml.

Test Specification

Test Plan

The primary objective of testing will be to validate the correct handling of exchanged data that is compliant or non-compliant with the quality aspects established in the VLA. Several data examples (including correct and incorrect samples) will be used for these tests. Various data quality aspects will be targeted, and case studies will be conducted using the different data types of the main project use cases, such as VR traces (xAPI), Moodle learning traces (xAPI), and skills (ontology/terminology).

The integration with the Dataspace Connector component will be tested thoroughly to verify that the necessary interactions are indeed possible and that error cases are handled properly (e.g., when no data is received during a data exchange or data is received but without a PoV/AoV even though it would be required).
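
The two error cases mentioned above could be exercised by a check of the following kind. This is purely illustrative: the function name `check_delivery` and the status strings are hypothetical and do not reflect the Connector or DVA interfaces.

```python
from typing import Optional


def check_delivery(
    data: Optional[bytes],
    assurance: Optional[dict],
    assurance_required: bool,
) -> str:
    """Classify an incoming delivery before any veracity verification."""
    if not data:
        return "error:no_data"              # nothing was received
    if assurance_required and assurance is None:
        return "error:missing_assurance"    # the VLA requires an AoV/PoV
    return "ok"


print(check_delivery(None, None, assurance_required=True))            # error:no_data
print(check_delivery(b"{}", None, assurance_required=True))           # error:missing_assurance
print(check_delivery(b"{}", {"type": "AoV"}, assurance_required=True))  # ok
```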

While DVA will not directly integrate with the Contract Manager component, it should be tested that DVA can recognize VLA fragments defined in the contracts and that it is possible to extend existing contracts with VLA fragments. In the end, this functionality will be provided by (or at least via) the Catalogue, not this BB.

Furthermore, interactions with other components, such as the Data Value Chain Tracker (DVCT), will be validated through testing, as these potentially involve new interactions, protocols, and interfaces.

The DVA BB test acceptance criteria are, informally and without striving for completeness:

Partners & Roles

BME (the BB leader) shall design and implement DVA.

Usage in the Data Space

DVA may be involved in various service chains and use cases. So far, the following usages have been identified.

Learning Records BB: Sharing LMS/Moodle Data for Visualization

Sharing LMS/Moodle Data for Visualization

Skill Scenarios

Skills Scenario: Single source data flow goes through PDC communication from Org A to Org B

Abbreviations Used

| Abbreviation | Expansion |
| --- | --- |
| DVA | data veracity assurance building block |
| VLA | veracity level agreement |
| P | data producer |
| C | data consumer |
| PoV | proof of veracity |
| AoV | attestation of veracity |

Glossary

attestation
the issue of a statement, based on a decision, that fulfillment of specified requirements has been demonstrated (ISO/IEC 17000:2020)
In DVA, attestations of veracity (AoVs) are statements made by either P or a third party regarding the fulfilment of the VLA. While PoVs are meant to be verified, AoVs are based on trust.
attestation of veracity
refer to Proving, Attestation, and Verification of Veracity
data consumer
a transaction participant to whom data is, or is to be technically supplied by a data provider in the context of a specific data transaction (DSSC Glossary v2.0 2023-09 2. Core Concepts: Data Recipient)
data producer
a transaction participant that, in the context of a specific data transaction, technically provides data to the data recipients that have a right or duty to access and/or receive that data (DSSC Glossary v2.0 2023-09 2. Core Concepts: Data Provider)
data veracity
completeness and/or accuracy of data (ISO/IEC 20546:2019 3.1.16)
orchestrator
A data space participant that represents and is accountable for a specific use case in the context of the governance framework. The orchestrator establishes and enforces business rules and other conditions to be followed by the use case participants. (DSSC Glossary v2.0 2023-09 3. Data space use cases and business model: Use case orchestrator)
proof
a fact or piece of information that shows that something exists or is true (Cambridge Dictionary)
In DVA, proofs of veracity (PoVs) are special data that demonstrate the fulfilment of the VLA and can be verified by C. While AoVs require trust, PoVs can be directly verified.
proof of veracity
refer to Proving, Attestation, and Verification of Veracity
template
templates are possible elements to use in VLAs
verifiable credential
a verifiable credential is a tamper-evident credential that has authorship that can be cryptographically verified (W3C Verifiable Credentials Data Model 2.0)
credential: A set of one or more claims made by an issuer. The claims in a credential can be about different subjects. The definition of credential used in this specification differs from NIST's definitions of credential.
claim: An assertion made about a subject.
veracity level agreement
an agreement regarding the data veracity requirements of a specific data exchange
veracity objective
a single requirement in a veracity level agreement