This section describes the specifications and extensions needed to enable a dataspace-driven data exchange ecosystem in which data flows through multiple dataspace-enabling services. The term currently used here for such a flow is Data Processing Chain.
graph LR
A[Data Provider] --> B[Service 1]
B --> C[Service n]
C -.-> D[Data Consumer]
B --> D
Because a data processing service acts as both a data consumer and a data provider, it can be defined in a dedicated Service Offering tagged as an “Infrastructure Service”, providing a Service Resource (for input) and a Data Resource (for output). Such an offering is referred to as an Infrastructure Service-Tagged Service Offering.
An instance of an infrastructure service needs to reference a Participant, so that the system knows which participant provides the service and which dataspace connector this processing node will use whenever it is called by the data processing chain.
The data processing chain represents the sequence of services the data should go through during the data transaction. It is an ordered set of nodes, each represented as an Infrastructure Service-Tagged Service Offering.
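To make this model concrete, here is a minimal TypeScript sketch of how a chain and its nodes could be represented. All type and field names (`DataProcessingChain`, `ProcessingNode`, and so on) are illustrative assumptions, not the actual catalogue or contract schema.

```typescript
// Illustrative sketch only: names and fields are assumptions, not the real schema.

// One node of the chain: an instance of an Infrastructure Service-Tagged
// Service Offering, tied to the participant providing it and the dataspace
// connector that will be called during the transaction.
interface ProcessingNode {
  serviceOfferingUri: string; // Infrastructure Service-Tagged Service Offering
  participantUri: string;     // Participant providing the service
  connectorEndpoint: string;  // Dataspace connector used by this node
  params?: Record<string, unknown>; // Optional JSON payload defined by the provider
}

// The chain itself: an ordered sequence of nodes, stored in the contract.
interface DataProcessingChain {
  contractUri: string;
  nodes: ProcessingNode[]; // Executed in order, from provider to consumer
}
```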
Interaction with other building blocks is essential to the functionality of the Data Processing Chain. The chain must be integrated into the contract, as this is where it is stored, and with the catalogue, so that participants can create and select chains.
The execution of the data processing chain primarily occurs through the connector, which plays a central role by orchestrating the data flow between the nodes in the chain: it initiates each processing step, transmits data from one node to the next, and communicates with the Consumer Connector to report the progress of the process.
N.B.: To notify the consumer of the progress of the data transaction, each node should ping the Consumer PDC with its position in the chain. This lets the consumer know whether an issue occurred in the chain.
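As a rough sketch of this orchestration, reusing the types above and assuming hypothetical HTTP endpoints (`/process` on each node's connector, `/progress` on the Consumer PDC), the flow could look like this:

```typescript
// Hypothetical sketch: endpoint paths and payload shapes are assumptions.
async function executeChain(
  chain: DataProcessingChain,
  rawData: unknown,
  consumerPdcUrl: string
): Promise<unknown> {
  let data = rawData;
  for (const [index, node] of chain.nodes.entries()) {
    // Each node processes the data and returns the transformed result.
    const response = await fetch(`${node.connectorEndpoint}/process`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ data, params: node.params }),
    });
    if (!response.ok) {
      throw new Error(`Node ${index} (${node.serviceOfferingUri}) failed: ${response.status}`);
    }
    data = await response.json();

    // Ping the Consumer PDC so the consumer knows where the transaction
    // currently is in the chain.
    await fetch(`${consumerPdcUrl}/progress`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contract: chain.contractUri,
        completedNode: index,
        totalNodes: chain.nodes.length,
      }),
    });
  }
  return data; // Final data to be sent to the consumer
}
```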
The main input of a data processing chain is the raw data coming from the data provider. The chain transforms the data and transmits it to the consumer. Additionally, each node accepts an arbitrary JSON payload, whose structure and attributes must be defined by the infrastructure service provider, to pass specific configuration to an infrastructure service when necessary.
Here's a potential example where the node expects date format specifications for input and output:
{
  "dateFormat": {
    "inputFormat": "yyyy-MM-dd'T'HH:mm:ss'Z'",
    "outputFormat": "dd/MM/yyyy",
    "fields": [
      "createdAt",
      "updatedAt"
    ]
  }
}
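For illustration, a node receiving this payload might apply it along these lines. The transformation logic is a sketch, since the actual behaviour is defined by the infrastructure service provider; only the `dd/MM/yyyy` output format is handled here.

```typescript
// Sketch of how a node could honour the dateFormat payload shown above.
interface DateFormatConfig {
  inputFormat: string;  // e.g. "yyyy-MM-dd'T'HH:mm:ss'Z'" (ISO 8601)
  outputFormat: string; // e.g. "dd/MM/yyyy"
  fields: string[];     // Record fields to reformat
}

function applyDateFormat(record: Record<string, unknown>, cfg: DateFormatConfig) {
  const out = { ...record };
  for (const field of cfg.fields) {
    const value = out[field];
    if (typeof value !== "string") continue;
    const d = new Date(value); // Assumes ISO 8601 input, as in the example
    const dd = String(d.getUTCDate()).padStart(2, "0");
    const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
    out[field] = `${dd}/${mm}/${d.getUTCFullYear()}`; // "dd/MM/yyyy" only
  }
  return out;
}

// applyDateFormat({ createdAt: "2024-03-01T10:15:00Z" }, cfg)
// -> { createdAt: "01/03/2024" }
```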
---
title: Data Processing Chain Architecture
---
graph TD
    O[Orchestrator]
    subgraph Contract
        Chain[Processing Chain]
        CO1[Connector Object 1]
        COn[Connector Object n]
    end
    P([Provider Connector])
    ConC([Consumer Connector])
    O -->|Defines| Contract
    Chain -->|Consists of| CO1
    Chain -->|Consists of| COn
    CO1 -->|Contains| a1[Connector Self-Description 1]
    CO1 -->|Contains| a2[Infrastructure Service URI 1]
    COn -->|Contains| a3[Connector Self-Description n]
    COn -->|Contains| a4[Infrastructure Service URI n]
    P -->|Sends data| Con1([Connector 1])
    Con1 -->|Processes data| Con1
    Con1 -.->|Reads| Contract
    Con1 -->|Sends processed data| Conn([Connector n])
    Conn -->|Processes data| Conn
    Conn -.->|Reads| Contract
    Conn -->|Sends final data| ConC
    subgraph Connector1 [Connector 1]
        Con1
        BB1Config[Building Block 1]
        BBnConfig[Building Block n]
        Con1 -->|Uses| BB1Config
        Con1 -->|Uses| BBnConfig
        BB1Config -->|Next| BBnConfig
        BBnConfig -.->|Next could be| Con1
    end
    subgraph Connectorn [Connector n]
        Conn
        BBnConfig2[Infrastructure Service n]
    end
    Con1 -.->|Notifies progress| ConC
    Conn -.->|Notifies progress| ConC
    classDef orchestratorClass fill:#009030,stroke:#004010,stroke-width:2px,color:#002000;
    classDef contractClass fill:#A0A0C0,stroke:#606080,stroke-width:2px,color:#404060;
    classDef bbClass fill:#e0a0a0,stroke:#a06060,stroke-width:2px,color:#804040;
    classDef connectorClass fill:#8093f0,stroke:#4053B0,stroke-width:2px,color:#203390;
    classDef attributeClass fill:#F0F0F0,stroke:#A0A0A0,stroke-width:1px,color:#606060;
    classDef providerConsumerClass fill:#90EE90,stroke:#2E8B57,stroke-width:2px,color:#006400;
    classDef subgraphClass fill:#FFE4B5,stroke:#DEB887,stroke-width:2px,color:#8B4513;
    class O orchestratorClass;
    class Contract,Chain,CO1,COn contractClass;
    class BB1Config,BBnConfig,BBnConfig2 bbClass;
    class Con1,Conn connectorClass;
    class a1,a2,a3,a4 attributeClass;
    class P,ConC providerConsumerClass;
    class Connector1,Connectorn subgraphClass;
To implement the Data Processing Chain functionality, several components of the dataspace will need to be modified or extended. Here's an overview of the required developments:
- Contract Building Block Improvement
- Service Offering Model Update
- Connector Modifications
- Catalogue Improvements
- Catalogue UI Updates
These developments will enable the creation, management, and execution of data processing chains within the dataspace, improving its capabilities for data transformations and data flow control.
---
title: Processing Chain within the Connector
---
sequenceDiagram
    participant CC as Consumer Connector
    participant PC as Provider Connector
    participant C as Contract
    box rgba(100, 105, 130, 0.5) BB1 Connector
        participant BB1C as Connector
        participant BB1 as Infrastructure Service 1
    end
    box rgba(100, 105, 130, 0.5) BBn Connector
        participant BBnC as Connector
        participant BBn as Infrastructure Service n
    end
    CC->>C: Request data processing chain
    C-->>CC: Return processing sequence
    CC->>PC: Initiate data transaction
    PC->>BB1C: Send raw data for processing
    BB1C->>BB1: Process data
    BB1-->>BB1C: Return processed data
    BB1C->>CC: Notify progress
    BB1C->>BBnC: Send data for next processing
    BBnC->>BBn: Process data
    BBn-->>BBnC: Return processed data
    BBnC->>CC: Notify progress
    BBnC-->>CC: Send final processed data
The Data Processing Chain logs the successful execution of data processing through the chain. If a component in the chain takes too long or fails to respond, a warning is logged.
Additionally, as mentioned earlier, the nodes accept a JSON input providing supplementary information about the data transformations. This JSON can include specific parameters to configure the processing, such as target date formats, repositories to use, or other metadata relevant to the transformation.
The testing strategy will focus on ensuring the correctness, reliability, and performance of the Data Processing Chain's functionalities. We will employ a combination of unit tests, integration tests, and UI tests where relevant. The testing environment will replicate production-like conditions to validate the system's behavior accurately. Acceptance criteria will be defined based on user stories, functional requirements, and performance benchmarks.
For unit testing, we will use the Mocha testing framework along with Chai for assertions. The test cases will cover individual components such as contract profile management, catalogue integration, the communication interface, data exchange proposals, and the connector. Mocking frameworks like Sinon may be used to isolate dependencies and simulate external interactions.
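As an illustration of this approach, a Mocha/Chai test could look like the following. The `createChain` factory, the injectable `notifyConsumer` callback, and the module path are hypothetical, introduced only to show the testing style:

```typescript
import { expect } from "chai";
import sinon from "sinon";
import { createChain, executeChain } from "../src/processing-chain"; // hypothetical module

describe("Data Processing Chain", () => {
  it("keeps nodes in the order they were added", () => {
    const chain = createChain(["urn:uuid:anonymize", "urn:uuid:aggregate"]);
    expect(chain.nodes).to.have.lengthOf(2);
    expect(chain.nodes[0].serviceOfferingUri).to.equal("urn:uuid:anonymize");
  });

  it("notifies the consumer once per node", async () => {
    const notifyConsumer = sinon.stub().resolves(); // isolate the Consumer PDC
    const chain = createChain(["urn:uuid:anonymize", "urn:uuid:aggregate"]);
    await executeChain(chain, { sample: true }, { notifyConsumer });
    expect(notifyConsumer.callCount).to.equal(chain.nodes.length);
  });
});
```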
We will use tools such as Postman for API testing to verify communication interfaces and data exchange protocols. Integration tests will additionally verify that the chain works seamlessly within the connector, and Continuous Integration (CI) pipelines will be set up to automate their execution.
Testing frameworks like Cypress will be used to automate UI interactions and validate the usability and functionality of the user interface. UI tests will cover scenarios such as creating and updating a chain.
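A Cypress test for chain creation could follow this pattern; the route and `data-cy` selectors are assumptions about the catalogue UI, not its actual markup:

```typescript
// Hypothetical route and selectors: adjust to the actual catalogue UI.
describe("Processing chain UI", () => {
  it("creates a chain from the catalogue", () => {
    cy.visit("/catalogue/chains/new");
    cy.get("[data-cy=chain-name]").type("Anonymize then aggregate");
    cy.get("[data-cy=add-node]").click();
    cy.get("[data-cy=service-select]").select("Anonymization Service");
    cy.get("[data-cy=save-chain]").click();
    cy.contains("Chain created").should("be.visible");
  });
});
```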