The purpose of the DAAV building block is to set up data manipulation pipelines that create new datasets by:

The project is divided into two modules:
Blocks are divided into three groups:
The first main objective of this building block is to reduce entry barriers for data providers and AI service clients by simplifying the handling of heterogeneous data and its conversion to standard or service-specific formats.
The second main objective is to simplify and optimize data management and analysis in complex environments.
Create a connector from any input data model to any output model. Examples :
Create an aggregator from different data sources:
Receive pushed data from PDC.
Tools to describe and explore datasets.
Expert-system and AI-based automatic data alignment.
The BB MUST communicate with the catalog API to retrieve contracts.
The BB MUST communicate with the PDC to trigger data exchanges.
The BB MUST communicate with the PDC to get data governed by the contract and consent BBs.
The BB CAN receive data pushed by the PDC.
The BB CAN connect to other BBs.
The BB MUST expose endpoints to communicate with other BBs.
The BB SHOULD be able to process any type of data as input.
Expected request times:
| Type | Response time |
|---|---|
| Simple request | < 100ms |
| Medium request | < 3000ms |
| Large request | < 10000ms |
No other building block interacting with this building block requires specific integration.
JSON - CSV - NoSQL (mongo, elasticsearch) - SQL - xAPI - Parquet - Archive (tar, zip, 7z, rar).
DSSC :
*IDS RAM* 4.3.3 Data as an Economic Good

The Project interface holds all the required information for a DAAV project. The front-end can import and export JSON content that follows this structure.
The back-end can execute the workflow described in this structure. Inside, we have the Data Connectors required to connect to a data source.
A workflow is represented by nodes (Node) with inputs and outputs (NodePort) that can be connected. All nodes are divided into three groups:

This may be for a complete run, or simply to test a single node in the chain, ensuring the hierarchical dependency between connected nodes is respected.
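As an illustration, a project export could look like the sketch below. The field names used here (`dataConnectors`, `nodes`, `connections`) are assumptions chosen for illustration, not the actual DAAV schema.

```python
import json

# Hypothetical project structure the front-end might import/export.
# Field names are illustrative assumptions, not the real DAAV schema.
project = {
    "name": "example-project",
    "revision": "1.0",
    "dataConnectors": [
        {"id": "src-1", "type": "csv", "path": "data/input.csv"},
    ],
    "workflow": {
        "nodes": [
            {"id": "n1", "type": "DataFileBlock", "outputs": ["out"]},
            {"id": "n2", "type": "FileOutput", "inputs": ["in"]},
        ],
        "connections": [
            {"from": ["n1", "out"], "to": ["n2", "in"]},
        ],
    },
}

serialized = json.dumps(project, indent=2)  # what the front-end would export
restored = json.loads(serialized)           # what the back-end would execute
```

The round-trip through `json.dumps`/`json.loads` is the contract between front-end and back-end: the same structure must be executable on one side and editable on the other.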

This diagram describes the basic architecture of the front-end, whose purpose is to provide the user with a set of modeling tools for building a processing chain.
This is based on Rete.js, a framework for creating processing-oriented node-based editors.
A workflow is a group of nodes connected by ports (input/output). Each port has a socket type that defines the data format allowed through it, and thus the connection rules between nodes.
Example of a node-based editor with nodes, their inputs and/or outputs, and a colored indicator to visualize each node's status.

The back-end class, which reconstructs a representation of the defined workflow and executes the processing chain, taking dependencies into account.
For each node, we know its type, and therefore its associated processing, as well as its inputs and outputs, and the internal attributes defined by the user via the interface.
The sequence diagram shows how the component communicates with other components.
---
title: Sequence Diagram Example (Connector Data Exchange)
---
sequenceDiagram
participant i1 as Input Data Block (Data Source)
participant ddvcon as PDC
participant con as Contract Service
participant cons as Consent Service
participant dpcon as Data Provider Connector
participant dp as Participant (Data Provider)
participant i2 as Transformer Block
participant i3 as Merge Block
participant i4 as Output Data Block
participant enduser as End User
i1 -) ddvcon: Trigger consent-driven data exchange<br>BY USING CONSENT
ddvcon -) cons: Verify consent validity
cons -) con: Verify contract signature & status
con --) cons: Contract verified
cons -) ddvcon: Consent verified
ddvcon -) con: Verify contract & policies
con --) ddvcon: Verified contract
ddvcon -) dpcon: Data request + contract + consent
dpcon -) dp: GET data
dp --) dpcon: Data
dpcon --) ddvcon: Data
ddvcon --) i1: Data
i1 -) i2: Provide data connection or data
Note over i2 : setup of transformation
i2 -) i3: Provide data
Note over i3 : setup merge with another data source
i3 -) i4: Provide data
Note over i4 : new data is available
enduser -) i4: Read file directly from local filesystem
enduser -) i4: Read file through SFTP protocol
enduser -) i4: Read data through REST API
enduser -) i4: Read data through database connector
---
title: Node status - on create or update
---
stateDiagram-v2
classDef Incomplete fill:yellow
classDef Complete fill:orange
classDef Valid fill:green
classDef Error fill:red
[*] --> Incomplete
Incomplete --> Complete: parameters are valid
state fork_state <<choice>>
Complete --> fork_state : Backend execution
fork_state --> Valid
fork_state --> Error
%% state fork_state2 <<choice>>
%% Error --> fork_state2 : User modify connection/parameter
%% fork_state2 --> Complete
%% fork_state2 --> Incomplete
class Incomplete Incomplete
class Complete Complete
class Valid Valid
class Error Error
Backend node execution: the Node mother class provides the "Execute" function; each child class implements its own "Process" function with node-specific treatment.
Inside a workflow, a recursive pattern propagates execution up through parent nodes.
---
title: Backend - Node class function "Execute"
---
stateDiagram-v2
classDef Incomplete fill:yellow
classDef Complete fill:orange
classDef Valid fill:green
classDef Error fill:red
state EachInput {
[*] --> ParentNodeStatus
ParentNodeStatus --> ParentNodeValid
ParentNodeStatus --> ParentNodeComplete
ParentNodeStatus --> ParentNodeIncomplete
ParentNodeIncomplete --> [*]
ParentNodeValid --> [*]
ParentNodeComplete --> ParentNodeStatus : Parent Node function "Execute"
ParentNodeStatus --> ParentNodeError
ParentNodeError --> [*]
}
[*] --> NodeStatus
NodeStatus --> Complete
NodeStatus --> Incomplete
Incomplete --> [*] : Abort
Complete --> EachInput
state if_state3 <<choice>>
EachInput --> if_state3 : Aggregate Result
if_state3 --> SomeInputIncomplete
if_state3 --> AllInputValid
if_state3 --> SomeInputError
SomeInputIncomplete --> [*] : Abort
AllInputValid --> ProcessNode: function "Process"
SomeInputError --> [*] : Abort
ProcessNode --> Error
ProcessNode --> Valid
Valid --> [*] :Success
Error --> [*] :Error
class Incomplete Incomplete
class SomeInputIncomplete Incomplete
class ParentNodeIncomplete Incomplete
class ParentNodeComplete Complete
class Complete Complete
class Valid Valid
class AllInputValid Valid
class ParentNodeValid Valid
class Error Error
class SomeInputError Error
class ParentNodeError Error
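The recursive "Execute" pattern above can be sketched in Python as follows. Class and status names are simplified assumptions for illustration; the real back-end classes differ.

```python
from enum import Enum

class Status(Enum):
    INCOMPLETE = "incomplete"  # parameters missing or invalid
    COMPLETE = "complete"      # configured but not yet executed
    VALID = "valid"            # executed successfully
    ERROR = "error"            # execution failed

class NodeBlock:
    """Minimal sketch of the Node mother class (names are illustrative)."""

    def __init__(self, parents=None):
        self.parents = parents or []
        self.status = Status.COMPLETE  # assume parameters are valid

    def execute(self):
        # Abort immediately if this node is not configured.
        if self.status == Status.INCOMPLETE:
            return self.status
        # Recursively execute parents that are ready but not yet run.
        for parent in self.parents:
            if parent.status == Status.COMPLETE:
                parent.execute()
        statuses = [p.status for p in self.parents]
        # Abort (without changing our status) if any input is unusable.
        if Status.INCOMPLETE in statuses or Status.ERROR in statuses:
            return self.status
        try:
            self.process()              # child-specific treatment
            self.status = Status.VALID
        except Exception:
            self.status = Status.ERROR
        return self.status

    def process(self):
        pass  # overridden by each concrete node type
```

With this shape, executing any node first walks up to its parents, so triggering the last node of a chain runs the whole chain in dependency order.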
Various configuration examples:
**.env file :**
MONGO_URI = ''
SQL_HOST = ''
LOG_FILE = "logs/log.txt"
...
**secrets folder (openssl-generated passwords):**
- secrets/db_root_password
- secrets/elasticsearch_admin
**angular environments :**
- production: true / false
- version : X.X
**python fast api backend config.ini :**
[dev]
DEBUG=True/False
[edge]
master = True/False
cluster=014
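As a sketch, the config.ini shown above can be read with Python's standard `configparser`, using the section and key names from the example (note that configparser accepts both `=` and `:` as key/value delimiters):

```python
import configparser

# Inline copy of the config.ini sketched above, with concrete values.
SAMPLE = """
[dev]
DEBUG = True

[edge]
master = True
cluster = 014
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

debug = config.getboolean("dev", "DEBUG")   # typed accessor for booleans
cluster = config.get("edge", "cluster")     # raw string value, e.g. "014"
```

Values are always read as strings unless a typed accessor (`getboolean`, `getint`, ...) is used, which matters for flags like `DEBUG` and `master`.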
What are the limits in terms of usage (e.g. number of requests, size of dataset, etc.)?
# work in progress
All nodes have a front-end implementation, where the user can set up their configuration, and a back-end implementation to execute the process.
All nodes inherit from the abstract class Nodeblock.
We identify shared functionalities such as:
Input Nodes:
All input nodes inherit from InputDataBlock:
We identify shared functionalities such as:
We need one input Node class for each data-source format (local file, MySQL, Mongo, API, ...).
For the front-end, a factory deduces the correct input instance according to the source connector passed as parameter.
A Node instance exposes one output, and its socket type defines its format:
Connection rules between nodes
Each block input defines a socket, i.e. what kind of data it accepts (float, string, array of strings, etc.).
On the back-end counterpart class :
The "Process" function implements the business logic. In this case, it retrieves data-source information and populates the output with data schemas and data content.
For huge data content, we can use the Parquet format to store intermediate results physically, likely behind a selector shared attribute of InputDataBlock.
Each input Node may have rules executed at request level, and a widget in the front-end to set them up.
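The input factory described above can be sketched as follows. `InputDataBlock` and `DataFileBlock` are class names that appear in this document; `MySQLInputBlock` and `MongoInputBlock` are hypothetical names used for illustration.

```python
class InputDataBlock:
    """Base class for input nodes (sketch; the real class lives in the back-end)."""

    def __init__(self, connector):
        self.connector = connector

class DataFileBlock(InputDataBlock):
    pass

class MySQLInputBlock(InputDataBlock):  # illustrative name
    pass

class MongoInputBlock(InputDataBlock):  # illustrative name
    pass

# One input class per data-source format, keyed by the connector type.
INPUT_REGISTRY = {
    "file": DataFileBlock,
    "mysql": MySQLInputBlock,
    "mongo": MongoInputBlock,
}

def input_factory(connector: dict) -> InputDataBlock:
    """Deduce the correct input instance from the given source connector."""
    try:
        cls = INPUT_REGISTRY[connector["type"]]
    except KeyError:
        raise ValueError(f"Unsupported connector type: {connector.get('type')}")
    return cls(connector)
```

A registry keyed by connector type keeps the factory open for extension: supporting a new data-source format is one new subclass plus one registry entry.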
Output Nodes:
All output nodes inherit from OutputDataBlock.
We need one output Node class for each format handled by the BB.
Each output block specifies the inputs it can process.
The block's widgets let the user set up the output and visualize its status.
On the back-end counterpart class :
One method to execute the node and export a file or launch transactions to fill the destination.
Transform Nodes:
All nodes called upon to manipulate data inherit from the transformation node.
A transform node can dynamically add inputs or outputs based on the data manipulated, and uses widgets to expose parameters to the user.
For complex cases, we can imagine an additional modal window to visualize a sample of the data and provide a convenient interface for describing the desired relationships between data.
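A minimal transform-node sketch, assuming rows are represented as plain dictionaries. `FilterTransform` echoes the filter node exercised in the tests below, but the parameter shape (`column`/`operator`/`value`) is an assumption for illustration.

```python
class TransformBlock:
    """Sketch of a transform node; names and signatures are illustrative."""

    def __init__(self, params):
        self.params = params

    def process(self, rows):
        raise NotImplementedError  # implemented by each concrete transform

class FilterTransform(TransformBlock):
    """Keeps only the rows matching a simple column/operator/value rule."""

    OPS = {
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
        "==": lambda a, b: a == b,
    }

    def process(self, rows):
        col = self.params["column"]
        op = self.OPS[self.params["operator"]]
        value = self.params["value"]
        # Rows missing the column (null values) are dropped rather than
        # compared, so heterogeneous data does not raise type errors.
        return [row for row in rows
                if row.get(col) is not None and op(row.get(col), value)]
```

For example, a filter configured as `age > 18` would keep `{"age": 25}` and drop `{"age": 17}` or a row without an `age` field.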
Considerations on revision identifiers
The workflow representation can be likened to a tree representation, where each node can have a unique revision ID representing its contents and children.
With this we can have a convenient pattern to compare and identify modifications between two calls on back-end API.
If we use intermediate results with physical records such as parquet files, for example, we can avoid certain processing when a node retains the same revision ID between two calls, and reconstruct only those parts where the user has made changes.
By keeping track of the last revision of the root across front-end and back-end calls, it is also possible to detect desynchronization when several instances of a project are open, and thus make the user aware of the risk of overwriting.
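A Merkle-style revision ID as described above can be sketched like this; the tree shape (`content`/`children`) is an illustrative assumption, not the actual workflow representation.

```python
import hashlib
import json

def revision_id(tree: dict) -> str:
    """Compute a revision ID for a node from its own content plus the
    revision IDs of its children, so any change propagates to the root.

    tree = {"content": {...}, "children": [subtree, ...]} (illustrative shape).
    """
    child_ids = sorted(revision_id(child) for child in tree.get("children", []))
    payload = json.dumps({"content": tree["content"], "children": child_ids},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two calls on an unchanged subtree yield the same ID, so its intermediate result (e.g. a Parquet file) can be reused; editing any descendant changes the root ID, which is what makes desynchronization between open instances detectable.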
The back-end will expose a Swagger OpenAPI description.
For every entity described, we will follow REST API conventions (Create / Update / Delete).
API output data blocks (GET)
And, around the project, all business actions:
The DAAV testing strategy ensures that:
To reproduce (manual) tests, you need to deploy DAAV using Docker or a local installation.
Before testing DAAV, ensure you have:
git clone https://github.com/Prometheus-X-association/daav.git
cd daav
docker-compose up -d
Wait for services to be ready (30-60 seconds)
Access the application:
Log in with the default credentials: username `admin`, password `Admin123!`. The default configuration should work out-of-the-box. For customization:
- Backend: `backendApi/.env.example`
- Frontend: `frontendApp/src/environments/`
For detailed deployment instructions, see README.md and DOCKER_DEPLOYMENT.md.
The DAAV testing strategy will focus on ensuring the accuracy, reliability, and performance of its functionality. We will use a combination of unit testing, integration testing, component-level testing, and user interface testing. The test environment will reproduce conditions similar to those in production in order to accurately validate BB behavior. Acceptance criteria will be defined based on user stories, functional requirements, and performance criteria.
Summary of tests:
Tests to validate functional requirements and mitigate potential risks identified in the design phase.
| Test ID | Description | Prerequisites | Test | Status | Test Case |
|---|---|---|---|---|---|
| System Health & Infrastructure | |||||
| SYS-001 | Backend API health check | DAAV backend deployed and accessible | GET /health - Verify HTTP 200, JSON response with status, app_name, version | ✅ Manual test | Test Case 0 |
| SYS-002 | API documentation accessibility | Backend running | Navigate to /docs - Verify Swagger UI loads with all endpoints | ✅ Manual test | Test Case 1 |
| Authentication & User Management | |||||
| AUTH-001 | User registration with valid credentials | Application is running. No existing user with same username/email | Username: testuser, Email: testuser@example.com, Password: TestPass123!, Full Name: Test User - Verify account created, can login | ✅ Unit tests | test_auth_service.py, test_user_service.py |
| AUTH-002 | User login with valid credentials | User account exists (username admin, password Admin123!) | Login with valid credentials - Verify access_token and refresh_token returned | ✅ Both (manual + unit) | Test Case 2, test_auth_service.py |
| AUTH-003 | User login with invalid credentials | Application is running | Login with username admin, password wrongpassword - Verify HTTP 401, error message | ✅ Unit tests | test_auth_service.py |
| AUTH-004 | Password reset flow | User exists with email, email service configured | Request password reset for testuser@example.com - Verify reset token generated | ⚠️ Partial (unit only) | test_auth_service.py, test_email_service.py |
| AUTH-005 | Change password for authenticated user | User is logged in with valid access token | Change from Admin123! to NewAdmin123! - Verify new password works, old doesn't | ✅ Unit tests | test_auth_service.py |
| AUTH-006 | Token refresh using valid refresh token | User has valid refresh token | Use refresh token - Verify new access and refresh tokens generated | ✅ Unit tests | test_auth_service.py |
| AUTH-007 | Access protected resource without authentication | Application is running | Request to /auth/me without Authorization header - Verify HTTP 401 | ✅ Unit tests | test_auth_service.py |
| AUTH-008 | Admin creates new user | Admin user is logged in | Create user: username newuser, email newuser@example.com, role user - Verify user created | ✅ Unit tests | test_user_service.py |
| AUTH-009 | Admin deactivates user account | Admin logged in, target user exists and is active | Deactivate user by ID - Verify user cannot login, data preserved | ✅ Unit tests | test_user_service.py |
| AUTH-010 | Admin activates deactivated user | Admin logged in, target user exists and is deactivated | Activate user by ID - Verify user can login again | ✅ Unit tests | test_user_service.py |
| AUTH-011 | Non-admin attempts admin operation | Regular user is logged in (not admin) | Attempt to create user as non-admin - Verify HTTP 403, operation denied | ✅ Unit tests | test_auth_service.py |
| Dataset Management | |||||
| DATA-001 | Create file dataset (CSV/JSON) | User authenticated, file exists | Name: test-dataset, Type: file, File: sample.csv or data.json - Verify dataset created, appears in list, structure preserved | ✅ Both (manual + unit) | Test Case 3, test_dataset_service.py |
| DATA-003 | Create file dataset (Parquet) | User authenticated, Parquet file exists | Name: parquet-dataset, Type: file, File: data.parquet - Verify schema readable | ✅ Unit tests | test_dataset_service.py |
| DATA-004 | Create MySQL dataset connection | User authenticated, MySQL database accessible | Name: mysql-dataset, Host: db.example.com, Port: 3306, Database: testdb, Table: users - Verify connection created, credentials encrypted | ✅ Unit tests | test_dataset_service.py |
| DATA-005 | Create MongoDB dataset connection | User authenticated, MongoDB accessible | Name: mongo-dataset, URI: mongodb://localhost:27017, Database: testdb, Collection: users - Verify connection successful | ✅ Unit tests | test_dataset_service.py |
| DATA-006 | Create API dataset connection | User authenticated, External API accessible | Name: api-dataset, URL: https://api.example.com/data, Method: GET, Headers with auth - Verify API callable | ✅ Unit tests | test_dataset_service.py |
| DATA-007 | Create Elasticsearch dataset | User authenticated, Elasticsearch accessible | Name: elastic-dataset, URL: http://localhost:9200, Index: test-index - Verify index queryable | ✅ Unit tests | test_dataset_service.py |
| DATA-008 | Create PTX dataset connection | User authenticated, PDC accessible | Name: ptx-dataset, Type: ptx, PDC URL, Bearer token - Verify catalog retrievable | ✅ Unit tests | test_dataset_service.py, test_pdc_service.py |
| DATA-009 | Upload file through API | User authenticated, file size within limits | Upload upload.csv (< 100MB), Folder: datasets - Verify file uploaded, path returned, user isolated | ✅ Unit tests | test_dataset_service.py |
| DATA-010 | Upload file exceeding size limit | User authenticated | Upload file > 100MB - Verify HTTP 413, appropriate error message | ✅ Unit tests | test_dataset_service.py |
| DATA-011 | Retrieve dataset content | User authenticated, dataset exists, user has access | Dataset ID, Pagination: page 1, limit 100 - Verify content returned, pagination works | ✅ Both (manual + unit) | Test Case 4, test_dataset_service.py |
| DATA-012 | Edit dataset configuration | User authenticated, user owns dataset | Dataset ID, New name: updated-dataset - Verify changes persisted | ✅ Unit tests | test_dataset_service.py |
| DATA-013 | Delete dataset | User authenticated, user owns dataset | Dataset ID - Verify deleted, no longer in list, files cleaned up | ✅ Unit tests | test_dataset_service.py |
| DATA-014 | Access dataset without permission | User authenticated, dataset belongs to another user | Attempt to access other user's dataset - Verify HTTP 403, access denied | ✅ Unit tests | test_dataset_service.py |
| DATA-015 | Share dataset with another user | User owns dataset, target user exists | Dataset ID, Target User ID, Permission: read - Verify share successful, target can access | ✅ Unit tests | test_dataset_service.py |
| DATA-016 | Unshare dataset | Dataset is shared, user is owner | Dataset ID, Target User ID - Verify share removed, target loses access | ✅ Unit tests | test_dataset_service.py |
| Workflow Creation & Management | |||||
| WF-001 | Create empty workflow | User authenticated | Name: test-workflow, Revision: 1.0, Schema: empty - Verify workflow ID returned, appears in list | ✅ Both (manual + unit) | Test Case 6, test_workflow_service.py |
| WF-002 | Create workflow with single input node | User authenticated, dataset exists | Name: simple-workflow, Nodes: DataFileBlock, Dataset ID - Verify node configured correctly | ✅ Both (manual + unit) | Test Case 5, 6, test_workflow_service.py |
| WF-003 | Create workflow with input-transform-output chain | User authenticated, dataset exists | Nodes: DataFileBlock → FilterTransform → FileOutput - Verify all connected, executable | ✅ Unit tests | test_workflow_service.py |
| WF-004 | Save workflow with complex transformations | User authenticated, multiple datasets exist | Workflow with 10+ nodes (Filter, Flatten, Merge, Join) - Verify all configs preserved | ✅ Unit tests | test_workflow_service.py |
| WF-005 | Update existing workflow | User authenticated, workflow exists, user owns it | Workflow ID, updated schema with new nodes - Verify updated, revision tracked | ✅ Unit tests | test_workflow_service.py |
| WF-006 | Delete workflow | User authenticated, user owns workflow | Workflow ID - Verify deleted, no longer in list, outputs cleaned | ✅ Unit tests | test_workflow_service.py |
| WF-007 | Retrieve workflow by ID | User authenticated, user has access | Workflow ID - Verify data returned, schema complete | ✅ Unit tests | test_workflow_service.py |
| WF-008 | List all user workflows | User authenticated, user has workflows | User access token - Verify only accessible workflows shown | ✅ Unit tests | test_workflow_service.py |
| WF-009 | Export workflow as JSON | Workflow exists, user has access | Workflow ID - Verify complete JSON returned, all nodes and connections | ✅ Unit tests | test_workflow_service.py |
| WF-010 | Import workflow from JSON | User authenticated, valid JSON provided | Valid workflow JSON - Verify imported, new ID assigned, nodes restored | ✅ Unit tests | test_workflow_service.py |
| Workflow Execution | |||||
| EXEC-001 | Execute simple workflow (CSV → File) | Valid workflow with CSV input and file output | Workflow ID - Verify executes, output file created, transformation correct | ✅ Both (manual + unit) | Test Case 7, test_workflow.py, test_node.py |
| EXEC-002 | Execute workflow with filter transformation | Workflow with filter node, input data available | Workflow ID, Filter: age > 18 - Verify only filtered records in output | ✅ Unit tests | test_filter_transform.py |
| EXEC-003 | Execute workflow with flatten transformation | Workflow with nested JSON input, flatten node | Workflow ID, nested JSON data - Verify nested structures flattened | ✅ Unit tests | test_flatten_transform.py |
| EXEC-004 | Execute workflow with merge transformation | Workflow merges two datasets, both available | Workflow ID, Dataset A + B, Merge key: id - Verify merged correctly | ✅ Unit tests | test_merge_transform.py |
| EXEC-005 | Execute workflow with join transformation | Workflow joins two datasets, join configured | Workflow ID, Left join on user_id - Verify join correct, null handling ok | ✅ Unit tests | test_workflow.py |
| EXEC-006 | Execute workflow with multiple transformations | Complex workflow with 5+ transform nodes | Workflow ID - Verify all execute in order, data flows correctly | ✅ Unit tests | test_workflow.py, test_execution_context.py |
| EXEC-007 | Execute workflow with MySQL output | Workflow outputs to MySQL table, DB accessible | Workflow ID, Target table: results - Verify data written, table created if needed | ✅ Unit tests | test_node.py |
| EXEC-008 | Execute workflow with MongoDB output | Workflow outputs to MongoDB, DB accessible | Workflow ID, Collection: results - Verify data written, documents structured | ✅ Unit tests | test_node.py |
| EXEC-009 | Execute workflow with API output | Workflow configured with custom API endpoint | Workflow ID, API URL: /api/custom-path - Verify endpoint exposed, returns results | ✅ Unit tests | test_workflow.py |
| EXEC-010 | Execute incomplete workflow | Workflow has disconnected nodes, incomplete config | Incomplete workflow ID - Verify fails gracefully, error indicates missing connections | ✅ Unit tests | test_workflow.py |
| EXEC-011 | Execute workflow with invalid data | Workflow expects specific schema, input doesn't match | Workflow ID, invalid input - Verify validation error, schema mismatch indicated | ✅ Unit tests | test_node.py |
| EXEC-012 | Execute workflow recursively (dependencies) | Workflow with dependent nodes, parents incomplete | Execute specific node - Verify parents execute first, dependencies resolved | ✅ Unit tests | test_workflow.py, test_execution_context.py |
| API Endpoints & Data Exposure | |||||
| API-001 | Access custom API endpoint | Workflow with ApiOutput, custom path configured | GET /api/custom-path - Verify data returned, workflow executed if needed | ✅ Unit tests | test_workflow.py |
| API-002 | Access API endpoint with authentication token | ApiOutput configured with token auth | GET /api/secure-path, Header: X-API-Key: <token> - Verify auth works, invalid rejected | ✅ Unit tests | test_workflow.py |
| API-003 | Access API endpoint with query filters | ApiOutput endpoint exposed, data available | GET /api/data?filter[age][operator]=>&filter[age][value]=18 - Verify filtered data returned | ✅ Unit tests | test_workflow.py |
| API-004 | Access workflow output by workflow ID | Workflow executed, output generated | GET /api/workflow/<id>, with auth - Verify latest execution results returned | ✅ Unit tests | test_workflow.py |
| API-005 | Access non-existent API endpoint | No workflow configured for path | GET /api/non-existent - Verify HTTP 404, appropriate error | ✅ Unit tests | test_workflow.py |
| API-006 | Execute and retrieve workflow via API | Valid workflow exists | POST /workflows/execute/<id> - Verify executes, results available | ✅ Unit tests | test_workflow_service.py |
| Prometheus-X Integration (PTX) | |||||
| PTX-001 | Create PTX dataset connection | User authenticated, PDC connector accessible | Name: ptx-connector, PDC URL, Token - Verify connection created, token secured | ✅ Unit tests | test_pdc_service.py |
| PTX-002 | Retrieve catalog from PTX connection | PTX dataset configured, PDC accessible | Connection ID - Verify catalog retrieved, offerings/resources listed | ✅ Unit tests | test_pdc_service.py |
| PTX-003 | Get data resources from PTX | PTX connection exists, catalog has resources | Connection ID - Verify resources listed, metadata complete | ✅ Unit tests | test_pdc_service.py |
| PTX-004 | Update data resource URL in PTX | PTX connection exists, resource exists in catalog | Connection ID, Resource ID, New URL - Verify URL updated in PDC | ✅ Unit tests | test_pdc_service.py |
| PTX-005 | Execute service chain workflow | Service chain configured, workflow assigned | POST /ptx/executeChainService, Headers with chain ID, Payload - Verify executed in chain context | ✅ Unit tests | test_pdc_service.py |
| PTX-006 | Trigger data exchange via PTX | PTX connection configured, exchange endpoint exists | Connection ID, Data exchange config - Verify exchange triggered, response handled | ✅ Unit tests | test_pdc_service.py |
| PTX-007 | Store and retrieve service chain data | Service chain ID exists, data stored temporarily | Service Chain ID, Store payload - Verify data stored, retrievable, cleaned after | ✅ Unit tests | test_pdc_service.py |
| PTX-008 | Access PDC output endpoint | Workflow with PdcOutput, custom URL configured | GET /output/<custom_path>, Headers: M2M credentials - Verify workflow executes, output returned | ✅ Unit tests | test_pdc_service.py |
| PTX-009 | Receive pushed data from PDC | PDC configured to push data, endpoint listening | POST /ptx/pdcInput, Data payload from PDC - Verify data received, processed, acknowledged | ✅ Unit tests | test_pdc_service.py |
| Security & Access Control | |||||
| SEC-001 | Path traversal prevention in file upload | User authenticated | Upload file with name: ../../etc/passwd - Verify rejected/sanitized, no traversal | ✅ Unit tests | test_path_security.py |
| SEC-002 | File extension validation | User authenticated | Upload file: malicious.exe - Verify rejected, only allowed extensions accepted | ✅ Unit tests | test_path_security.py |
| SEC-003 | File size limit enforcement | User authenticated | Upload 150MB file (limit 100MB) - Verify HTTP 413, limit enforced before full upload | ✅ Unit tests | test_security_middleware.py |
| SEC-004 | SQL injection prevention | MySQL dataset configured | Query with injection: '; DROP TABLE users; -- - Verify prevented, parameterized, no damage | ✅ Unit tests | test_dataset_service.py |
| SEC-005 | XSS prevention in user inputs | User creates dataset with name | Name: <script>alert('XSS')</script> - Verify sanitized, script escaped/removed | ✅ Unit tests | test_security_middleware.py |
| SEC-006 | Rate limiting on API endpoints | Public API endpoint configured | 100 requests in 10 seconds to same endpoint - Verify HTTP 429, limit triggered | ⚠️ Partial (unit only) | test_security_middleware.py |
| SEC-007 | CORS policy enforcement | Frontend on different origin | API request from http://evil.com - Verify blocked if not allowed, CORS enforced | ✅ Unit tests | test_security_middleware.py |
| SEC-008 | Sensitive data encryption | User creates dataset with password | Password: MySecretPass123 - Verify encrypted in DB, not in API responses | ✅ Unit tests | test_dataset_service.py |
| SEC-009 | User data isolation | Two users with same filename | User A uploads data.csv, User B uploads data.csv - Verify stored separately, isolation maintained | ✅ Unit tests | test_path_security.py |
| SEC-010 | Token expiration enforcement | Access token expired (> 60 min old) | Request with expired token - Verify HTTP 401, must refresh/re-login | ✅ Unit tests | test_auth_service.py |
| Data Processing & Transformations | |||||
| PROC-001 | Process CSV file with headers | CSV file with headers uploaded | File with columns: name,age,city - Verify parsed, headers detected, names preserved | ✅ Unit tests | test_dataset_service.py |
| PROC-002 | Process CSV file without headers | CSV file without headers | Headerless CSV data - Verify parsed, auto-generated column names | ✅ Unit tests | test_dataset_service.py |
| PROC-003 | Process JSON array | JSON file with array of objects | [{id: 1, name: "A"}, {id: 2, name: "B"}] - Verify array parsed, objects accessible | ✅ Unit tests | test_dataset_service.py |
| PROC-004 | Process nested JSON | JSON with nested structures | {user: {profile: {age: 25}}} - Verify nested preserved, dot notation accessible | ✅ Unit tests | test_dataset_service.py, test_flatten_transform.py |
| PROC-005 | Process Parquet file | Parquet file uploaded | Parquet with schema and data - Verify schema read, data queryable, types preserved | ✅ Unit tests | test_dataset_service.py |
| PROC-006 | Process Excel file (.xlsx) | Excel file uploaded with multiple sheets | Excel with 3 sheets - Verify all detected, sheet selection available | ✅ Unit tests | test_dataset_service.py |
| PROC-007 | Handle missing values in data | Dataset with null/empty values | Data with nulls and empty strings - Verify handled gracefully, distinguished | ✅ Unit tests | test_node.py, test_filter_transform.py |
| PROC-008 | Handle large dataset (1M+ rows) | Large CSV file (> 1 million rows) | 1M row CSV - Verify processed efficiently, pagination works, no timeout | ✅ Unit tests | test_dataset_service.py |
| PROC-009 | Convert between data formats | Input CSV, output Parquet | CSV → Parquet conversion - Verify successful, data integrity maintained | ✅ Unit tests | test_node.py |
| PROC-010 | Aggregate data (groupBy) | Dataset with groupable data, transform with aggregate | Group by category, aggregate sum(amount) - Verify grouping correct, aggregation accurate | ✅ Unit tests | test_workflow.py |
| PROC-011 | Sort data | Dataset with unsorted data, sort transformation | Sort by age DESC, name ASC - Verify multi-column sort works, order correct | ✅ Unit tests | test_workflow.py |
| PROC-012 | Remove duplicates | Dataset with duplicate rows | Deduplicate on columns: id - Verify duplicates removed, unique rows remain | ✅ Unit tests | test_workflow.py |
| Error Handling & Edge Cases | |||||
| ERR-001 | Handle malformed JSON input | User uploads JSON file | Invalid JSON: {broken: syntax - Verify error caught, helpful message, line number if possible | ✅ Unit tests | test_dataset_service.py |
| ERR-002 | Handle corrupt file upload | User uploads corrupted file | Corrupted CSV file - Verify error detected, upload fails with reason | ✅ Unit tests | test_dataset_service.py |
| ERR-003 | Handle database connection failure | MySQL dataset configured, database is down | Execute workflow with MySQL input - Verify connection error caught, clear error | ✅ Unit tests | test_dataset_service.py |
| ERR-004 | Handle external API timeout | API dataset configured, API slow/unresponsive | Workflow with API call (30s+ response) - Verify timeout handled, configurable, clear error | ✅ Unit tests | test_dataset_service.py |
| ERR-005 | Handle missing required fields | Create dataset without required field | Dataset without name field - Verify HTTP 422, required fields listed | ✅ Unit tests | test_dataset_service.py, test_workflow_service.py |
| ERR-006 | Handle concurrent workflow edits | Two users edit same workflow simultaneously | User A and B save workflow at same time - Verify conflict detected, no corruption | ⚠️ Partial (unit only) | test_workflow_service.py |
| ERR-007 | Handle workflow circular dependencies | User creates workflow with circular connections | Node A → Node B → Node A - Verify detected, validation prevents save, error explains | ✅ Unit tests | test_workflow.py |
| ERR-008 | Handle insufficient disk space | Server disk space full, file upload attempted | Upload large file when disk full - Verify HTTP 507, graceful degradation | ⚠️ Not implemented | test_dataset_service.py |
| ERR-009 | Handle memory exhaustion on large data | Very large dataset processing | Process 10GB+ file - Verify memory limit respected, streaming/chunking, OOM error if needed | ✅ Unit tests | test_dataset_service.py, test_workflow.py |
| ERR-010 | Handle invalid credentials in dataset | MySQL dataset with wrong password | Credentials: user admin, wrong password - Verify auth error caught, clear message, no leak | ✅ Unit tests | test_dataset_service.py |
Small manual tests to prove individual features work correctly.
Objective: Ensure the DAAV backend service is running and responding correctly.
Precondition: DAAV backend service is deployed and accessible.
Steps:
1. Send a GET request to http://localhost:8081/health.
2. Verify the response returns 200 OK.
Expected Result:
```json
{
  "status": "healthy",
  "app_name": "DAAV Backend API",
  "environment": "production",
  "version": "2.0.0"
}
```
Result: Validated
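This manual check can also be automated. As a minimal sketch (the field names come from the expected payload above; `check_health` only validates the payload shape, it does not call the live service):

```python
import json

def check_health(payload_text: str) -> bool:
    """Validate the shape of the /health payload described above."""
    payload = json.loads(payload_text)
    return (
        payload.get("status") == "healthy"
        and "app_name" in payload
        and "version" in payload
    )

# Sample payload matching the expected result above.
sample = ('{"status": "healthy", "app_name": "DAAV Backend API", '
          '"environment": "production", "version": "2.0.0"}')
print(check_health(sample))  # True
```

A monitoring probe would fetch the body from http://localhost:8081/health and feed it to this check.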
Objective: Ensure the Swagger API documentation is accessible.
Precondition: Backend is running.
Steps:
1. Navigate to http://localhost:8081/docs.
Expected Result: Swagger documentation page displays with all available endpoints.
Result: Validated
Objective: Ensure users can log in with valid credentials.
Precondition: Application is running.
Steps:
1. Enter username admin and password Admin123!.
2. Submit the login form.
Expected Result: User is logged in and redirected to the main dashboard.
Result: Validated
Objective: Ensure datasets can be created with file upload (JSON format).
Precondition:
Steps:
1. Navigate to http://localhost:8080.
2. Create a new dataset, uploading a JSON file.
Expected Result: Dataset is created and appears in the list.
Result: Validated
Objective: Ensure dataset content can be retrieved and displayed.
Precondition:
Steps:
Expected Result: Dataset content is displayed correctly with proper formatting.
Result: Validated
Objective: Ensure nodes can be connected in the visual workflow editor.
Precondition: Workflow editor is open with at least two compatible nodes.
Steps:
Expected Result: Visual connection line appears between nodes and connection is saved.
Result: Validated
Objective: Ensure a workflow can be created and saved.
Precondition: At least one dataset exists.
Steps:
Expected Result: Workflow is saved with a unique ID.
Result: Validated
Objective: Ensure a workflow can be executed successfully.
Precondition: A valid workflow exists.
Steps:
Expected Result: Workflow executes without error and produces expected result.
Result: Validated
In conclusion, all the core functional tests are validated.
Using real-world personas and use cases, we established several detailed test scenarios that demonstrate end-to-end functionality.
Profile:
Test Scenario 1: Paul creates a data aggregation workflow
Context: Paul needs to combine CSV files from different departments to create a unified dashboard.
Prerequisites:
- Account: paul / DataAnalyst123! (role: user)
Steps:
1. Login: navigate to http://localhost:8080 and sign in as paul / DataAnalyst123!.
2. Upload CSV files: upload students_scores.csv and name it student-scores; upload students_attendance.csv and name it student-attendance.
3. Create workflow: name it student-performance-report.
4. Build data pipeline: add the student-scores dataset and the student-attendance dataset as inputs, join them on the student_id column, and output to student_report.parquet.
5. Execute workflow.
6. Verify results: download and inspect student_report.parquet.
Expected Result:
Validation: ✅ Test Scenario 1 Validated
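The pipeline Paul builds visually is equivalent, in plain Python, to an inner join of the two CSV files on student_id. A stdlib-only sketch (the column names and sample rows are illustrative; DAAV would export the joined result to student_report.parquet rather than printing it):

```python
import csv
import io

# Illustrative stand-ins for students_scores.csv and students_attendance.csv.
scores_csv = "student_id,score\n1,85\n2,92\n"
attendance_csv = "student_id,attendance\n1,0.95\n2,0.88\n"

def join_on(key: str, left_text: str, right_text: str) -> list[dict]:
    """Inner-join two CSV documents on a shared key column."""
    left = list(csv.DictReader(io.StringIO(left_text)))
    right = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    return [{**row, **right[row[key]]} for row in left if row[key] in right]

report = join_on("student_id", scores_csv, attendance_csv)
print(report[0])  # {'student_id': '1', 'score': '85', 'attendance': '0.95'}
```

The workflow editor expresses the same join as two input nodes feeding a merge node keyed on student_id.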
Profile:
Test Scenario 2: Christophe prepares ML training data
Context: Christophe needs to combine data from a public REST API and a local JSON file, apply transformations, and export to Parquet format for ML training.
Prerequisites:
- Account: christophe / MLEngineer123! (role: user)
Steps:
1. Login: navigate to http://localhost:8080 and sign in as christophe / MLEngineer123!.
2. Create API dataset: name it api-users, with URL https://jsonplaceholder.typicode.com/users.
3. Upload JSON file: upload ml_features.json and name it ml-features.
4. Build ML data pipeline: create ml-training-pipeline, add the api-users dataset and the ml-features dataset as inputs, join them on the id column, and output in training_data.parquet format.
5. Configure and execute the workflow.
6. Verify results: download and inspect training_data.parquet.
Expected Result: the merged records are present, joined on the id column.
Validation: ✅ Test Scenario 2 Validated
Profile:
Test Scenario 3: Fatim configures PDC data exchange
Context: Fatim needs to create a workflow that receives data from a PDC (Prometheus-X Dataspace Connector), processes it, and exposes it via a Prometheus-X service chain.
Prerequisites:
- Account: fatim / SysAdmin123! (role: admin)
Steps:
1. Login: navigate to http://localhost:8080 and sign in as fatim / SysAdmin123!.
2. Setup PDC connection: name it pdc-learning-records, endpoint https://pdc.example.com, authenticated with <valid_token>.
3. Verify catalog access.
4. Create workflow with PDC: name it pdc-data-processing, exposing /api/processed-learning-data.
5. Configure service chain: register the processed-learning-data output.
6. Test data exchange.
7. Verify endpoint access: call http://localhost:8081/output/processed-learning-data with header Authorization: Bearer <m2m_token>.
Expected Result:
Validation: ✅ Test Scenario 3 Validated
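The authenticated endpoint check in the last step can be sketched with the standard library. The URL and the <m2m_token> placeholder come from the steps above; the request is only constructed here, not sent, and obtaining a real machine-to-machine token is outside this sketch:

```python
import urllib.request

url = "http://localhost:8081/output/processed-learning-data"
token = "<m2m_token>"  # placeholder: substitute a real M2M token

req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
print(req.get_header("Authorization"))  # Bearer <m2m_token>

# To actually exercise the endpoint (requires the backend to be running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read())
```

Without the Authorization header, the output endpoint is expected to reject the call.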
Objective: Verify graceful error handling when dataset connection fails.
Context: User configures a MySQL dataset with incorrect credentials.
Steps:
Expected Result:
Validation: ✅ Test Scenario 4 Validated
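Scenario 4's requirement, a clear authentication error that never echoes the password, can be sketched as an error-mapping helper. The `AuthenticationError` type and message wording are illustrative assumptions, not DAAV's actual connector code:

```python
class AuthenticationError(Exception):
    """Raised when a datasource rejects the supplied credentials."""

def user_facing_error(exc: Exception, user: str) -> str:
    """Map connector exceptions to safe, clear messages (never echo the password)."""
    if isinstance(exc, AuthenticationError):
        return f"Authentication failed for user '{user}': check the dataset credentials."
    return "Dataset connection failed: see server logs for details."

msg = user_facing_error(AuthenticationError("access denied"), "admin")
print(msg)
```

The point of the indirection is that raw driver errors (which may embed connection strings) stay in the logs, while the UI only ever shows the mapped message.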
DAAV includes comprehensive automated test suites to ensure code quality and functionality.
Backend (Python/Pytest):
Tests are located in backendApi/tests/. Commands to run backend tests:
```shell
cd backendApi
python -m pytest                               # Run all tests
python -m pytest tests/services/               # Run service tests
python -m pytest tests/nodes/                  # Run node tests
python -m pytest --cov=app --cov-report=html   # With coverage
```
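A minimal test in the style of those suites, mirroring the ERR-005 case above (the `validate_dataset` helper is a hypothetical stand-in for the real dataset service, which returns HTTP 422 for missing required fields):

```python
# Hypothetical stand-in for the dataset service's required-field validation (ERR-005).
def validate_dataset(payload: dict) -> None:
    if "name" not in payload:
        raise ValueError("missing required field: name")  # surfaced as HTTP 422 by the API

# Pytest-style test function: collected and run by `python -m pytest`.
def test_dataset_requires_name():
    try:
        validate_dataset({"type": "file"})
    except ValueError as exc:
        assert "name" in str(exc)
    else:
        raise AssertionError("dataset without a name was accepted")

test_dataset_requires_name()  # also runnable as a plain script
```

Real tests in backendApi/tests/ exercise the service layer itself rather than a stand-in, but follow the same arrange/act/assert shape.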
Frontend (Angular/Karma/Jasmine):
Tests are located in frontendApp/src/**/*.spec.ts. Commands to run frontend tests:
```shell
cd frontendApp
npm test                  # Run tests
npm run test:headless     # Headless mode
ng test --code-coverage   # With coverage
```
Profenpoche (BB leader):
Inokufu:
BME, Cabrilog, and Ikigaï are also available as partners for beta-testing.
The DAAV building block can be used as a data transformer to build new datasets from local data or from the Prometheus-X dataspace.
The output can also be shared on the Prometheus-X dataspace.
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
