The purpose of the DAAV building block is to set up data manipulation pipelines that create new datasets by:

The project is divided into two modules:
Blocks are divided into three groups:
The first main objective of this building block is to reduce entry barriers for data providers and AI service clients by simplifying the handling of heterogeneous data and its conversion to standard or service-specific formats.
The second main objective is to simplify and optimize data management and analysis in complex environments.
Create a connector from any input data model to any output model. Examples:
Create an aggregator from different data sources:
Receive pushed data from PDC.
Tools to describe and explore datasets.
Automatic data alignment using an expert system and AI.
The BB MUST communicate with the catalog API to retrieve contracts.
The BB MUST communicate with the PDC to trigger data exchanges.
The BB MUST communicate with the PDC to get data from the Contract and Consent BBs.
The BB CAN receive data pushed by the PDC.
The BB CAN connect to other BBs.
The BB MUST expose endpoints to communicate with other BBs.
The BB SHOULD be able to process any type of data as input.
Expected request times:

| Request type | Expected time |
|---|---|
| Simple request | < 100 ms |
| Medium request | < 3000 ms |
| Large request | < 10000 ms |
No other building block interacting with this building block requires specific integration.
JSON - CSV - NoSQL (MongoDB, Elasticsearch) - SQL - xAPI - Parquet - Archive (tar, zip, 7z, rar).
DSSC:
*IDS RAM* 4.3.3 Data as an Economic Good
 

The Project interface holds all the information required for a DAAV project. The front-end can import and export JSON content that follows this structure.
The back-end can execute the workflow described in this structure. It contains the Data Connectors required to connect to a data source.
A workflow is represented by nodes (Node) with inputs and outputs (NodePort) that can be connected. All nodes are divided into three groups:

This may be a complete run, or simply the test of a single node in the chain; in both cases the hierarchical dependencies between connected nodes are respected.
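As an illustration, a minimal Python sketch of the kind of structure such an imported/exported project could carry; the field and class names are assumptions, not the actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NodePort:
    name: str                                    # e.g. "output"
    socket: str                                  # socket type, e.g. "dataset"

@dataclass
class Node:
    id: str
    type: str                                    # e.g. "input.mysql", "transform.merge", "output.csv"
    inputs: List[NodePort] = field(default_factory=list)
    outputs: List[NodePort] = field(default_factory=list)
    params: dict = field(default_factory=dict)   # attributes set by the user in the UI

@dataclass
class Project:
    name: str
    connectors: list = field(default_factory=list)   # Data Connectors for each data source
    nodes: List[Node] = field(default_factory=list)
    # Connections as (source node id, output port, target node id, input port)
    connections: list = field(default_factory=list)
```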

This diagram describes the basic architecture of the front-end, whose purpose is to provide the user with a set of tools to build a processing chain.
This is based on Rete.js, a framework for creating processing-oriented node-based editors.
A workflow is a group of nodes connected by ports (input/output). Each port has a socket type that defines the data format it accepts and therefore the connection rules between nodes.
Example of a node-based editor with nodes exposing inputs and/or outputs and a colored indicator to visualize their status.

The back-end class reconstructs a representation of the defined workflow and executes the processing chain, taking dependencies into account.
For each node, we know its type, and therefore its associated processing, as well as its inputs and outputs, and the internal attributes defined by the user via the interface.
The sequence diagram shows how the component communicates with other components.
---
title: Sequence Diagram Example (Connector Data Exchange)
---
sequenceDiagram
    participant i1 as Input Data Block (Data Source)
    participant ddvcon as PDC
    participant con as Contract Service
    participant cons as Consent Service
    participant dpcon as Data Provider Connector
    participant dp as Participant (Data Provider)
    participant i2 as Transformer Block
    participant i3 as Merge Block
    participant i4 as Output Data Block
    participant enduser as End User
    i1 -) ddvcon: Trigger consent-driven data exchange<br>BY USING CONSENT
    ddvcon -) cons: Verify consent validity
    cons -) con: Verify contract signature & status
    con --) cons: Contract verified
    cons -) ddvcon: Consent verified
    ddvcon -) con: Verify contract & policies
    con --) ddvcon: Verified contract
    ddvcon -) dpcon: Data request + contract + consent
    dpcon -) dp: GET data
    dp --) dpcon: Data
    dpcon --) ddvcon: Data
    ddvcon --) i1: Data
    i1 -) i2: Provide data connection or data
    Note over i2 : setup of transformation
    i2 -) i3: Provide data
    Note over i3 : setup merge with another data source
    i3 -) i4: Provide data
    Note over i4 : new data is available
    enduser -) i4: Read file directly from local filesystem
    enduser -) i4: Read file through SFTP protocol
    enduser -) i4: Read data through REST API 
    enduser -) i4: Read data through database connector 
---
title: Node status - on create or update
---
stateDiagram-v2
   classDef Incomplete fill:yellow
   classDef Complete fill:orange
   classDef Valid fill:green
   classDef Error fill:red
    
    [*] --> Incomplete
    Incomplete --> Complete: parameters are valid
    state fork_state <<choice>>
    Complete --> fork_state : Backend execution
    fork_state --> Valid  
    fork_state --> Error
    %% state fork_state2 <<choice>>
    %% Error --> fork_state2 : User modifies connection/parameter
    %% fork_state2 --> Complete
    %% fork_state2 --> Incomplete
    
    class Incomplete Incomplete
    class Complete Complete
    class Valid Valid
    class Error Error
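As a sketch, these statuses could be encoded on the back-end as a small enum; the status names come from the diagram above, while the enum and the helper function are assumptions:

```python
from enum import Enum

class NodeStatus(Enum):
    """Node statuses shown in the state diagram above."""
    INCOMPLETE = "incomplete"   # parameters missing or invalid
    COMPLETE = "complete"       # parameters valid, not executed yet
    VALID = "valid"             # back-end execution succeeded
    ERROR = "error"             # back-end execution failed

def status_on_create_or_update(parameters_valid: bool) -> NodeStatus:
    # On create or update a node is at most COMPLETE; only a back-end
    # execution can move it to VALID or ERROR.
    return NodeStatus.COMPLETE if parameters_valid else NodeStatus.INCOMPLETE
```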
Backend node execution: the Node parent class provides the "execute" function, and each child class implements its own "Process" function with its specific treatment.
Inside a workflow, a recursive pattern propagates the execution up through the parent nodes.
---
title: Backend - Node class function "Execute"
---
stateDiagram-v2
   classDef Incomplete fill:yellow
   classDef Complete fill:orange
   classDef Valid fill:green
   classDef Error fill:red
    state EachInput {
        [*] --> ParentNodeStatus
        ParentNodeStatus --> ParentNodeValid
        ParentNodeStatus --> ParentNodeComplete
        ParentNodeStatus --> ParentNodeIncomplete
        ParentNodeIncomplete -->  [*]
        ParentNodeValid -->  [*]
        ParentNodeComplete --> ParentNodeStatus : Parent Node function "Execute"
        ParentNodeStatus --> ParentNodeError
        ParentNodeError --> [*]
    }
    [*] --> NodeStatus
    NodeStatus --> Complete 
   NodeStatus --> Incomplete
    Incomplete --> [*] : Abort
    Complete --> EachInput
    state if_state3 <<choice>>
    EachInput --> if_state3 : Aggregate Result   
    if_state3 --> SomeInputIncomplete
    if_state3 --> AllInputValid
    if_state3 --> SomeInputError    
    SomeInputIncomplete --> [*] : Abort
    AllInputValid --> ProcessNode: function "Process"   
    SomeInputError --> [*] : Abort
    ProcessNode --> Error
    ProcessNode --> Valid
        
    Valid --> [*] :Success
    Error --> [*] :Error
    
    class Incomplete Incomplete
    class SomeInputIncomplete Incomplete
    class ParentNodeIncomplete Incomplete
    class ParentNodeComplete Complete
    class Complete Complete
    class Valid Valid
    class AllInputValid Valid
    class ParentNodeValid Valid
    class Error Error
    class SomeInputError Error
    class ParentNodeError Error
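A minimal Python sketch of this recursive "execute" pattern, reusing the NodeStatus enum sketched earlier; the class and method names are assumptions based on the diagram, not the actual implementation:

```python
class NodeBlock:
    """Minimal sketch of the parent node class; child classes override process()."""

    def __init__(self, parents=None, params=None):
        self.parents = parents or []        # nodes connected to this node's inputs
        self.params = params or {}          # attributes set by the user in the UI
        self.status = NodeStatus.INCOMPLETE # becomes COMPLETE once parameters are valid
        self.result = None

    def process(self):
        # Specific treatment implemented by each child class ("Process" in the diagram).
        raise NotImplementedError

    def execute(self):
        if self.status == NodeStatus.INCOMPLETE:
            return self.status              # abort: parameters are not valid yet

        # Recursively execute parent nodes that are COMPLETE but not yet run.
        for parent in self.parents:
            if parent.status == NodeStatus.COMPLETE:
                parent.execute()

        parent_statuses = [p.status for p in self.parents]
        if NodeStatus.INCOMPLETE in parent_statuses or NodeStatus.ERROR in parent_statuses:
            return self.status              # abort: some input cannot provide data

        try:
            self.result = self.process()    # run the node's own business logic
            self.status = NodeStatus.VALID
        except Exception:
            self.status = NodeStatus.ERROR
        return self.status
```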
Various configuration examples:
**.env file:**
MONGO_URI = ''
SQL_HOST = ''
LOG_FILE = "logs/log.txt"
...
**secrets folder (OpenSSL-generated passwords):**
- secrets/db_root_password 
- secrets/elasticsearch_admin
**Angular environments:**
- production: true / false
- version: X.X
**Python FastAPI back-end config.ini:**
[dev]
DEBUG=True/False
[edge]
master:True/False
cluster=014
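As an illustration, a minimal sketch of how the FastAPI back-end could load such a config.ini with Python's standard configparser; the section and key names are those shown above, while the loading code itself is an assumption:

```python
import configparser
import os

def load_settings(path: str = "config.ini", env: str = "dev") -> dict:
    """Read the config.ini shown above and return the selected environment section."""
    parser = configparser.ConfigParser()
    parser.read(path)
    settings = dict(parser[env]) if parser.has_section(env) else {}
    # Connection strings and secrets come from the .env file / secrets folder,
    # not from config.ini (e.g. MONGO_URI, SQL_HOST, secrets/db_root_password).
    settings["mongo_uri"] = os.getenv("MONGO_URI", "")
    return settings
```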
What are the limits in terms of usage (e.g. number of requests, size of dataset, etc.)?
Work in progress.
All nodes have a front-end implementation where the user can set up their configuration, and a back-end implementation to execute the process.
All nodes inherit from the abstract class Nodeblock.
We identify shared functionalities such as:
Input nodes:
All input nodes inherit from InputDataBlock:
We identify shared functionalities such as:
We need one input Node class for each data-source format (local file, MySQL, MongoDB, API, ...).
On the front-end, a factory deduces the correct input instance from the source connector passed as a parameter.
Each Node instance exposes one output, whose socket type defines its format:
Connection rules between nodes
Each block input defines a socket, i.e. what kind of data it accepts (float, string, array of strings, etc.).
On the back-end counterpart class:
The "Process" function implements the business logic. In this case, it retrieves data-source information and populates the output with the data schema and the data content (see the sketch below).
For large data content, we can use the Parquet format to store intermediate results physically, likely as a shared selector attribute of InputDataBlock.
Each input Node may have rules executed at the request level and a widget in the front-end to set them up.
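A hedged sketch of what an input node and its factory could look like on the back-end, reusing the NodeBlock sketch above; apart from InputDataBlock, the class names, connector fields and method signatures are assumptions:

```python
import csv

class InputDataBlock(NodeBlock):
    """Base class for input nodes: one subclass per data-source format."""

    def __init__(self, connector, **kwargs):
        super().__init__(**kwargs)
        self.connector = connector          # data-source connection settings

class CsvInputBlock(InputDataBlock):
    """Illustrative local-file input; the class name is an assumption."""

    def process(self):
        with open(self.connector["path"], newline="") as f:
            rows = list(csv.DictReader(f))
        # Populate the output with the data schema and the data content.
        return {"schema": list(rows[0].keys()) if rows else [], "data": rows}

# Hypothetical factory deducing the input class from the connector description.
INPUT_CLASSES = {"csv": CsvInputBlock}

def input_block_factory(connector):
    return INPUT_CLASSES[connector["type"]](connector)
```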
Output nodes:
All output nodes inherit from OutputDataBlock.
We need one output Node class for each format handled by the BB.
Each output block specifies the inputs it can process.
The block's widgets allow setting up the output and visualizing its status.
On the back-end counterpart class:
One method executes the node and either exports a file or launches transactions to fill the destination (see the sketch below).
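Similarly, a hedged sketch of an output node whose "Process" exports a file, reusing the NodeBlock sketch above; apart from OutputDataBlock, the names and the JSON format choice are illustrative assumptions:

```python
import json

class OutputDataBlock(NodeBlock):
    """Base class for output nodes: one subclass per format handled by the BB."""

    accepted_sockets = ("dataset",)         # inputs this output block can process

class JsonOutputBlock(OutputDataBlock):
    """Illustrative file export; class name and parameters are assumptions."""

    def process(self):
        # Export the results of the parent nodes to a JSON file on disk.
        payload = [parent.result for parent in self.parents]
        with open(self.params.get("path", "output.json"), "w") as f:
            json.dump(payload, f)
        return {"exported": len(payload)}
```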
Transform nodes:
All nodes called upon to manipulate data inherit from the transformation node.
Transform nodes can dynamically add inputs or outputs based on the data they manipulate, and use widgets to expose parameters to the user.
For complex cases, we can imagine an additional modal window to visualize a sample of the data and provide a convenient interface for describing the desired relationships between data.
Considerations on revision identifiers
The workflow can be likened to a tree, where each node can carry a unique revision ID representing its contents and its children.
This gives us a convenient pattern to compare and identify modifications between two calls to the back-end API.
If we store intermediate results physically, for example as Parquet files, we can skip processing for nodes that keep the same revision ID between two calls and rebuild only the parts where the user has made changes.
By keeping track of the last revision of the root on both the front-end and back-end sides of a call, it is also possible to detect desynchronization when several instances of a project are open, and thus make the user aware of the risk of overwriting.
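A minimal sketch of such a revision identifier, computed as a hash of a node's own parameters together with its children's revisions; the hashing scheme and truncation are assumptions:

```python
import hashlib
import json

def revision_id(node_params: dict, child_revisions: list) -> str:
    """Hash a node's own parameters together with its children's revision IDs."""
    payload = json.dumps(
        {"params": node_params, "children": sorted(child_revisions)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# The ID changes only if the node itself or one of its children changed,
# so unchanged sub-trees can reuse their stored intermediate results (e.g. Parquet).
rev_leaf = revision_id({"path": "data.csv"}, [])
rev_root = revision_id({"separator": ","}, [rev_leaf])
```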
The back-end will expose a Swagger/OpenAPI description.
For all the entities described, we will follow REST API conventions (Create / Update / Delete).
API for output data blocks (GET).
And, around the project, all business actions:
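As an illustration of these endpoints and business actions, a hedged FastAPI sketch; the paths, payloads and handlers are assumptions, and the authoritative contract remains the exposed Swagger/OpenAPI description:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="DAAV Backend API")

class ProjectPayload(BaseModel):
    name: str
    nodes: list = []
    connections: list = []

# Hypothetical project CRUD endpoint.
@app.post("/projects")
def create_project(project: ProjectPayload):
    return {"id": "generated-id", "name": project.name}

# Hypothetical read of the data produced by an output data block (GET).
@app.get("/projects/{project_id}/outputs/{node_id}")
def read_output_data(project_id: str, node_id: str):
    return {"project": project_id, "node": node_id, "data": []}

# Hypothetical business action: run the processing chain of a project.
@app.post("/projects/{project_id}/execute")
def execute_workflow(project_id: str):
    return {"project": project_id, "status": "valid"}
```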
Front-end:
Unit tests for all core class functions:
Back-end:
Unit tests for all endpoints and core class functions.
Core classes:
API endpoints:
Load testing with K6 to evaluate API performance.
Front-end: Selenium to test custom node deployment, usage, and interaction with custom parameters.
Manual tests to prove individual functionalities.
Objective: Ensure the DAAV backend service is running and responding correctly.
Precondition: DAAV backend service is deployed and accessible.
Steps:
Expected Result:
{
  "status": "healthy",
  "app_name": "DAAV Backend API",
  "environment": "production",
  "version": "2.0.0"
}
Result: Validated.
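A minimal pytest sketch of this health check; the base URL and the /health path are placeholders for the deployed back-end, not confirmed endpoints:

```python
import requests

BASE_URL = "http://localhost:8000"   # placeholder for the deployed DAAV back-end

def test_backend_health():
    response = requests.get(f"{BASE_URL}/health")   # hypothetical health endpoint
    assert response.status_code == 200
    body = response.json()
    assert body["status"] == "healthy"
    assert body["app_name"] == "DAAV Backend API"
```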
Objective: Ensure datasets can be created with different input types.
Precondition: DAAV interface is accessible and functional.
Steps:
Expected Result: Dataset is created and appears in the list.
Result: Validated.
Objective: Ensure a workflow can be created and saved.
Precondition: At least one dataset exists.
Steps:
Expected Result: Workflow is saved with a unique ID.
Result: Validated.
Objective: Ensure a workflow can be executed.
Precondition: A valid workflow exists.
Steps:
Expected Result: Workflow executes without error and produces expected result.
Result: Validated.
Enumerate all partner organizations and specify their roles in the development and planned operation of this component. Do this at a level which a) can be made public, b) supports the understanding of the concrete technical contributions (instead of "participates in development", specify the functionality, added value, etc.).
Profenpoche (BB leader):
Inokufu:
BME, Cabrilog and Ikigaï are also partners available for beta-testing.
Specify the Dataspace Enabling Service Chain in which the BB will be used. This assumes that during development the block (lead) follows the service chain, contributes to this detailed design and implements the block to meet the integration requirements of the chain.
The DAAV building block can be used as a data transformer to build new datasets from local data or from the Prometheus dataspace.
The output can also be shared on the Prometheus dataspace.
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
