Prometheus-X Components & Services

Distributed data visualization BB – Design Document

Distributed Data Visualisation is an on-device building block that eases building end-user UIs for presenting data in an informative way. It allows AI providers to display the results of their analytics, predictions and recommendations.

The building block is a reusable, portable, multi-platform HTML5 / JavaScript & D3.js based component that a UI provider can embed and display from any source.

This concept is not only about presenting data in an accessible format, but also about revolutionizing the way data interacts across different platforms, ensuring data privacy and improving user control. With the Prometheus-X project at the forefront, a ground-breaking approach to handling education and career development data will redefine industry standards.

Distributed data visualization is a sophisticated framework designed to seamlessly manage and display data across multiple platforms. It enables AI providers to process datasets from different data providers and gain valuable insights such as recommendations, analysis and predictions. What makes this approach special is the ability to easily integrate these results into different applications and user interfaces (UI), making it a versatile tool for educational and professional development.

Technical usage scenarios & Features

Distributed data visualisation builds the story / user journey for the given goal and the given data sets, applying other building blocks to modify/anonymize/analyse the data. It enables running analyses by applying other building blocks and Data Space Services from the Catalog (under contract).

Features/main functionalities

User journey case 1:

Use case description, value and goal: The use case will take as its basis Pino, a product from Solideos used by Korean citizens to store their diplomas and learning credentials. Information from Korean users' diplomas and transcripts will be used to match job opportunities in the EU (France). Skills data extracted from the documents will be used by service providers in the EU to recommend potential employment opportunities.

Use case functionalities:

User journey case 2:

Use case description, value and goal: LAB aims to prepare individuals for the job market by offering personalized training recommendations worldwide, leveraging skills data and partnerships with service providers like Headai, Inokufu, Rejustify, and Edunao. The goal is to equip learners with relevant skills for career success, enhancing employability and lifelong learning.

Use case functionalities:

Generic user journey:

  1. Matilda has data about herself in several LMS/LXP
  2. She’s using a new learning provider and wants to receive personalised learning recommendations based on any/all her data
  3. She can share her data from one service provider to another, and the results are displayed inside one of those
  4. The service provider only has to integrate the data viz BB and the PTX Connector into the LMS/LXP

Technical usage scenarios

Distributed Data Visualisation is an on-device building block meant to ease building end-user UIs for showing data in an informative way with a relevant story/journey. The Visualiser requires a host system that runs the UI, and that host system must configure the components (consent, contract) that are needed to operate this building block.

Because the nature of this building block differs slightly from the others, the reader of this document should not assume that all server-side building block behaviours exist in this BB.

Technical usage scenario 1:

The user gives her/his consent, which triggers "API consumption": the data is sent as a payload in an API call to the AI service, and the result is returned via the protocol. In this flow, the connector on the edtech side sends the result of the AI service back to the edtech platform so that it can be shown to the user.
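As a rough, non-normative sketch of this flow in plain JavaScript (the AI-service endpoint and the payload shape are hypothetical placeholders, not defined by this document), a consent event could trigger the API call and hand the returned result back to the edtech UI along these lines:

// Hypothetical sketch of technical usage scenario 1.
// The endpoint URL and payload fields are placeholders, not part of this specification.
async function onConsentGranted(consentId, userData) {
	// "API consumption": the data is sent as a payload to the AI service
	const response = await fetch("https://ai-provider.example.com/analyse", {
		method: "POST",
		headers: { "Content-Type": "application/json" },
		body: JSON.stringify({ consent: consentId, data: userData })
	});
	if (!response.ok) {
		throw new Error("AI service returned " + response.status);
	}
	// The connector on the edtech side would return this result to the edtech UI
	return await response.json();
}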

Technical usage scenario 2:

The user gives her/his consent to datasets, and data provider(s) provide contracts for datasets that do not contain personal data. All consents/contracts are sent to the Visualiser together with additional metadata (data endpoint definitions), and the Visualiser fetches all the data accordingly. Scenario 2 simplifies the Visualiser configuration.

Tech usage scenario in short

With Distributed data visualisation, Data providers and Data consumers can easily bring their services to end users without developing all the middleware components on their own.

For example, an LMS company using Moodle can integrate the Distributed data visualisation directly into Moodle and so focus on its LMS provider role, while this building block handles all the Data Space technical operations.

Requirements

Functional requirements:

Performance requirements:

Security requirements:

Dependability requirements:

In case of an incomplete rules.json or incomplete data, the DDV terminates the process and returns an error message.
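A minimal sketch of such a guard is shown below (JavaScript; the required field names follow the rules.json example in the Input / Output Data section, and the returned error shape is only an assumption):

// Sketch: reject an incomplete rules.json before any data exchange is triggered.
// The set of required fields mirrors the rules.json example later in this document;
// "consent" is only required when personal data is involved.
function validateRules(rules) {
	const required = ["contract", "services", "visualisation_types"];
	const missing = required.filter((key) => !rules || !rules[key]);
	if (missing.length > 0) {
		// DDV closes the process and returns an error message
		return { ok: false, error: "Incomplete rules.json, missing: " + missing.join(", ") };
	}
	return { ok: true };
}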

Operational requirements:

Integrations

Direct Integrations with Other BBs

There are no direct integrations with other BBs except the Connector. All of the following integrations are optional.

Integrations via Connector

Relevant Standards

The standard data formats are applied in a way that the Building Block specific data model is convertible to any other model with standard tools. This could mean, for example, that 1) graph data in JSON-LD format can be converted to a CSV table containing only the nodes and their values, so this node CSV can be used directly in Excel for drawing bar charts, or 2) JSON-LD can be mapped in Power BI and used in any Power BI report as such. The fundamental idea is to provide a universal skills data visualization building block while at the same time supporting any data visualization tool.
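As a sketch of conversion 1), the snippet below (plain JavaScript; the node structure with label and value fields is an illustrative assumption, not a normative schema) flattens graph nodes into a CSV that can be opened in Excel for a bar chart:

// Sketch: flatten JSON-LD-style graph nodes into a two-column CSV (label, value).
// The "nodes", "label" and "value" property names are illustrative assumptions.
function nodesToCsv(graph) {
	const header = "label,value";
	const rows = (graph.nodes || []).map(
		(node) => `"${String(node.label).replace(/"/g, '""')}",${node.value}`
	);
	return [header, ...rows].join("\n");
}

// Example: nodesToCsv({ nodes: [{ label: "data science", value: 12 }] })
// -> 'label,value\n"data science",12'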

Data Format Standards

Mapping to DSSC Data Space Reference Architecture Model

The Data Interoperability pillar:

The Data Sovereignty and Trust pillar:

The Data Value Creation pillar:

See full DSSC

GDPR Mapping

AI Act Mapping

Input / Output Data

Input data format:

Host system sends the building block the following information in rules.json:

{
	"contract": "CONTRACT_ID",
	"consent": "CONSENT_ID",
	"services": [
		"service 1 - consent 1 definition",
		"service 2 - consent 1 definition",
		"service 1 - contract 1 definition"
	],
	"visualisation_types": [
		"consent 1:hexagon (see OpenAPI section)",
		"contract 1: barchart - consent 1 definition",
		"service 1: list"
	],
	"visualizer_params": [
		"colors=C5192D,CCCCCC,F6E8C3",
		"iframe=true",
		"color_scale=log",
		"show_number=value"
	]
}
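How rules.json reaches the building block depends on the host system. As a non-normative sketch, a host page might assemble the object and embed the DDV roughly as follows (the DDV base URL and container element id are placeholders; the contract/consent query parameters follow the sequence diagrams later in this document):

// Hypothetical host-side sketch: build rules.json and embed the DDV.
const rules = {
	contract: "CONTRACT_ID",
	consent: "CONSENT_ID", // only needed when personal data is involved
	services: ["service 1 - consent 1 definition"],
	visualisation_types: ["consent 1:hexagon"],
	visualizer_params: ["colors=C5192D,CCCCCC,F6E8C3", "iframe=true"]
};

// One possible wiring: point an iframe at the DDV with the contract and consent IDs,
// while the full rules.json stays in the host system's own configuration.
const frame = document.createElement("iframe");
frame.src = "https://ddv.example.com/?contract=" +
	encodeURIComponent(rules.contract) + "&consent=" + encodeURIComponent(rules.consent);
document.getElementById("ddv-container").appendChild(frame); // "ddv-container" is a placeholder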

Output data format:

By including the contract (and the consent when personal data is involved), the BB is able to fetch the detailed contract and consent information and provide its connector with the details of where to fetch the data from.

Depending on the rules.json, the BB applies the appropriate endpoints to retrieve training or job recommendations. In addition to displaying recommendations, it can also display analytics or metadata of different datasets.

Use case 1 (Job recommendations)

The BB displays the following JSON in an interactive, user-friendly frame through the JavaScript web component:

{
	"url": "Link to the job post.",
	"author": "Name of the job portal the job is posted to.",
	"language": "Language of the job post.",
	"title": "Title of the job post.",
	"description": "Full description of the job post.",
	"city": "City of job post.",
	"time": "Date and time the job is posted.",
	"score": "Scoring index of the job recommendation.",
	"reasoning": ["Matching skill 1", "Matching skill 2", "Matching skill 3"],
	"missing_skills": ["Missing skill 1", "Missing skill 2", "Missing skill 3"]
}

The above example shows a single recommendation. The component displays several such recommendations, depending on the configuration set by the host system or the end user.
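A minimal rendering sketch is shown below (D3.js, assuming an array of the recommendation objects above; the container element and the presentation are placeholders, and the actual component is richer and interactive):

// Minimal D3.js sketch: render an array of job recommendation objects as a list.
// "#job-recommendations" is a placeholder container id.
import * as d3 from "d3";

function renderJobRecommendations(recommendations) {
	const items = d3.select("#job-recommendations")
		.selectAll("div.job")
		.data(recommendations)
		.join("div")
		.attr("class", "job");

	items.append("a")
		.attr("href", (d) => d.url)
		.text((d) => `${d.title} (${d.city}, score ${d.score})`);

	items.append("p")
		.text((d) => "Matching skills: " + d.reasoning.join(", ") +
			" | Missing skills: " + d.missing_skills.join(", "));
}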

Use case 2 (Training recommendations)

The BB displays the following JSON in an interactive, user-friendly frame through the JavaScript web component:

{
	"code": "Course code.",
	"url": "Link to the course.",
	"title": "Course title.",
	"short_description": "Short description of the course.",
	"explanation": "Skills you have: x,y,z. Skills you will get: w, z, t.",
	"new_skills": ["New skill 1", "New skill 2", "New skill 3"],
	"existing_skills": [
		"Existing skill 1",
		"Exsiting skill 2",
		"Existing skill 3"
	],
	"interests": ["Interest 1", "Interest 2"],
	"quality_index": "Quality index of the training recommendation.",
	"scoring_index": "Scoring index of the tarining recommendation."
}

Architecture

---
title: Distributed Data Visualisation
---

classDiagram
    class Api1["API (optional services for building and working with data)"]
    class Api2["API (optional services for running analysis)"]
    Data --> Api1
    Api1 --|> Visualisation
    Api1 --|> Api2
    Api2 --|> Visualisation
    Data : +Consent
    Data : +Contract
    Data : +Snowflake, Data Exchange()
    Data : +National data sources like Sisu, Peppi, etc.()
    Data : +Free text()
    Data : +Consent services like Visions, Vastuu, Koski()
    Api1 : +Data normalisers like Build Knowledge Graph
    Api1 : +Data structure builders like Text To Knowledge Graph
    Api1 : +Personal Data Storages like Inokufu, Cozy Cloud, Digital Twin Storage, etc()
    Visualisation: +Bar chart
    Visualisation: +Line chart
    Visualisation: +Hexagon
    Visualisation: +Square
    Visualisation: +Table
    Visualisation: +List
    Api2 : +Recommendation services like Headai Compass or MindMatcher recommendation
    Api2 : +Gap analysis services like Headai Score

Dynamic Behaviour

The sequence diagrams below show how the component communicates with other components.

Non personal data

For scenarios where personal data is not involved, consent is not required; only contracts between the participants and the DDV are needed.

---
title: Distributed Visualisation Sequence Diagram (Non Personal Data)
---

sequenceDiagram
    participant host as Host System (UI Provider)
    participant ddv as DDV
    participant ddvcon as DDV Connector
    participant con as Contract Service
    participant dpcon as Data Provider Connector
    participant dp as Participant (Data Provider)
    participant dc as Participant (Data Consumer)

    dc -) host: visualization request (incl. contract) <br> https://example.com?contract=CONTRACT_ID
    host -) ddv: HTTP request (incl. contract)
    Note over ddv: rules.json
    ddv -) con: Get Contract Information (provide contract ID)
    con --) ddv: Contract
    ddv -) ddvcon: Trigger data exchange<br>BY USING CONTRACT
    ddvcon -) con: Verify contract & policies
    con --) ddvcon: Verified contract

    loop For n amount of Providers that are in the contract
    ddvcon -) dpcon: Data request + contract
    dpcon -) dp: GET data
    dp --) dpcon: Data
    dpcon --) ddvcon: Data
    ddvcon --) ddv: Data
    end

    ddv -)  host: JavaScript Component
    host -)  dc: Visualization

Personal Data

For scenarios where personal data is involved, the consent is required and must be verified on top of the existing contracts between the participants.

---
title: Distributed Visualisation Sequence Diagram (With Personal Data)
---

sequenceDiagram
    participant host as Host System (UI Provider)
    participant ddv as DDV
    participant ddvcon as DDV Connector
    participant con as Contract Service
    participant cons as Consent Service
    participant dpcon as Data Provider Connector
    participant dp as Participant (Data Provider)
    participant dc as Participant (Data Consumer)

    dc -) host: visualization request (incl. Contract & Consent) <br>https://example.com?contract=CONTRACT_ID&consent=CONSENT_ID
    host -) ddv: HTTP request
    Note over ddv: rules.json (incl. Contract & Consent)
    ddv -) ddvcon: Trigger consent-driven data exchange<br>BY USING CONSENT
    ddvcon -) cons: Verify consent validity
    cons -) con: Verify contract signature & status
    con --) cons: Contract verified
    cons -) ddvcon: Consent verified
    ddvcon -) con: Verify contract & policies
    con --) ddvcon: Verified contract

    loop For n amount of Providers that are in the consent
    ddvcon -) dpcon: Data request + contract + consent
    dpcon -) dp: GET data
    dp --) dpcon: Data
    dpcon --) ddvcon: Data
    ddvcon --) ddv: Data
    end

    ddv -)  host: JavaScript Component
    host -)  dc: Visualization

Configuration and deployment settings

Configuration of the BB07 can be done in two places.

The given parameters follow the REST API style of GET parameters, so only values that may be publicly visible, or strongly encrypted values, are allowed.

Third Party Components & Licenses

Background Component: D3.js Available at D3.js Git https://github.com/d3/d3

D3.js is licensed under the ISC License. https://github.com/d3/d3/blob/main/LICENSE

In order to maximise cyber security, we have isolated the d3.js online dependencies in the current development version, which may cause small differences in how the code behaves when developed further. This decision may change during development.

OpenAPI Specification

Full URL Example, development version

Data behind encrypted contract ‘xwiaHk1n3p1672366166478’

URL Parameters

Each parameter is listed below with its name, description, and accepted values or example value (the default is given in parentheses).

Fundamental Parameters

json_url (mandatory): URL with the JSON data. If mode=customize or mode=clean, this parameter can be empty. Example: https://test.headai.com/a.json

plot_type: Type of visualization to show from the given URL. If empty, the Visualizer will try to detect the structure of the JSON automatically. Accepted values: top20, linechart, horizontalbarchart, verticalbarchart, hexagon, square, barchart

mode: customize shows a form to fill all the editable parameters (plus export mode); clean enables interaction to remove clicked concepts and shows buttons to store the modified data and to copy the generated URL to the clipboard; export shows buttons to store the visualization as SVG, PNG, or JSON. NOTE: this parameter is ignored if iframe mode is enabled. Accepted values: customize, clean, export, <empty>

Visual Customization

iframe: Enable or disable full-screen mode for IFrame embedding. If mode=customize, the value of this parameter is ignored. Accepted values: true, false

width: Width of the component. If IFrame mode is enabled, this value will be used to preserve the aspect ratio in bigger or smaller windows. Accepted values: any positive integer (1400)

height: Height of the component. If IFrame mode is enabled, this value will be used to preserve the aspect ratio in bigger or smaller windows. Accepted values: any positive integer (800)

font_family: Font family of all the text labels in the visualization. Accepted values: serif, sans-serif, cursive, monospace

font_size: Font size of all the text labels in the visualization. Accepted values: any positive integer (14)

hide_legend: Boolean that enables hiding the legend of certain visualizations. Accepted values: true, false

background_color: Background color of the visualization. Accepted values: hexadecimal color (#F9F9F9)

Parameters under Development

legends: List of comma-separated strings with the customized legends. Example: "Demand, Offer, Intersection"

URL Parameters when plot_type is hexagon or square

Visual Customization

fig_size: Size of the figures in the MindMap. This is equivalent to HexagonRadius or SquareSize in the wordplot library. Accepted values: any positive integer (70)

center_node: Label of the concept that will be in the center. This means that all the concepts will be reorganized around the specified term. If empty, the central concept will be the one with the largest value. Example: artificial intelligence or artificial_intelligence (space and underscore both work)

center_camera_around: Centers the initial position of the visualization on a specific concept (this only affects the position of the camera, not the content of the map). If empty, the camera will focus on the center_node. This parameter won't work if you don't specify a valid value for initial_zoom. Example: climate_change

initial_zoom: Defines the initial zoom of the camera. If empty, the camera will try to show all the concepts of the visualization on the screen, automatically calculating the optimal position. If this parameter is empty or incorrect, center_camera_around will be ignored. Example: 0.8

click_action: remove deletes clicked concepts; showValues shows historical data as a line chart (only for signals maps); showDetails shows the attributes of the clicked concept; highlight shows the neighborhood of the clicked concept; source shows the list of sources if info is available; recenter reorganizes the map around the clicked concept. Accepted values (case insensitive): remove (default when mode=clean), showValues (for signals), showDetails, highlight, source, recenter

colors: Hexadecimal codes of the colors, separated by a comma and without the '#' character. Example: "A0A000,F000F0"

color_scale: Accepted values: sqrt, log, linear, pow, flat

stroke_color

show_number: Defines which numerical value of the nodes will be displayed inside the hexagon along with the label of the concepts. Example: value, normalized_value, id, weight, group

show_action_buttons: Accepted values: true, false

Special Modes

sdg_map: The JSON must have a specific format, containing scores and indicators. Accepted values: true, false

relevancy_mode: Boolean that defines whether the visualization will display the 5 weights (how meaningful each concept is). This mode shows the 5 different weights with different colors and enables the legend for them. Accepted values: true, false

only_nearest_neighbours: This mode places nodes so that a hexagon is next to another ONLY if it is related. This generates a map without any strokes. Accepted values: true, false

Data Manipulation

max_nodes: Desired number of nodes for the map reduction. The internal algorithm will pick the set of values for MinWeight and MinValue that reduces the map as close to that number as possible. This feature only works if nodes have weights, otherwise it will be ignored.

filter_min_weight: Accepted values: 1, 2, 3, 4, 5

filter_min_value: Accepted values: any positive integer

word_type: Accepted values: only_compounds, <empty>

hide_nodes: List of labels separated by commas. These labels represent the nodes you want to hide from the MindMap in the visualization. For compound words you can use underscores or spaces. E.g.:
  • data_science, data_analytics
  • data science, data analytics
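Putting some of the parameters above together, a caller could assemble a visualiser URL roughly as follows (the base URL is a placeholder for the development URL referenced above, and the parameter values are only examples):

// Sketch: build a visualiser URL from the GET parameters described above.
// The base URL is a placeholder; parameter names follow the tables in this section.
const params = new URLSearchParams({
	json_url: "https://test.headai.com/a.json",
	plot_type: "hexagon",
	iframe: "true",
	colors: "A0A000,F000F0",
	color_scale: "log",
	show_number: "value"
});
const visualiserUrl = "https://ddv.example.com/?" + params.toString();
// e.g. https://ddv.example.com/?json_url=https%3A%2F%2Ftest.headai.com%2Fa.json&plot_type=hexagon&...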

Click to view latest version -> Visualiser Document

Test specification

This document outlines the test plan for the Distributed Data Visualization, subject to the specific attributes as follows:

  1. No part of Headai's existing testing system shall be released or transferred as part of this building block.
  2. All implementation work of Headai's existing testing system is the intellectual property of Headai and is proprietary.
  3. No source code of Headai's existing testing system is to be released under any circumstances.

Test plan

The objective of testing the “Distributed Data Visualization” function is two-fold:

  1. To verify that it accurately builds a knowledge graph based on the given parameters, ensuring that the output is correct, reliable, and efficient over varying conditions.

  2. To confirm that the Distributed Data Visualization accurately represents the data provided by the JSON URLs. This involves verifying that all nodes, connections, and groups in the visualization correctly correspond to the data structure and content specified in the JSON file.

Scope of Functional Tests includes:

As a result, users can effectively interact with and derive insights from the visualized data, which reflects accurate and meaningful information as intended by the data source.

Technical Description of Test Plan

This test plan outlines a comprehensive approach combining black box testing methodologies with automated testing routines to ensure functional accuracy, performance under various conditions, optimal response times, and resilience against anomalies in the system. The strategy leverages industry-standard tools and methodologies to achieve a high level of software quality and reliability.

The current approach combines the best methodologies of black box testing, implemented using the homegrown Headai Quality Assurance Framework for AI. Using this approach it is possible to achieve the following objectives:

Methodologies

Black Box Testing: This approach focuses on testing software functionality without knowledge of the internal workings. Test cases will be derived from functional specifications to verify the correctness of output based on varied inputs. This method effectively simulates user interactions and scenarios.

Automated Testing Routines: Automating the execution of repetitive but essential test cases ensures comprehensive coverage, consistency in test execution, and efficient use of resources. Automated tests will be scheduled to run at regular intervals, ensuring continuous validation of the application’s functionality and performance.

Introduction of the tools used

Headai Quality Assurance Framework for AI: 100% proprietary testing infrastructure for Natural Language Processing development. This framework facilitates the creation of repeatable automated tests in the Java environment. In particular, the attention is on backend testing, service-level testing, and integration testing, offering features for assertions, test grouping, and test lifecycle management. This framework has dashboards and reporting tools integrated with the testing tool to monitor test executions, outcomes, and performance trends over time.

Selenium: For web-based applications, Selenium automates browsers, enabling the testing of web applications across various browsers and platforms. It’s instrumental in performing end-to-end functional testing and verifying the correctness of web elements and response times.

Postman: For RESTful APIs, Postman allows the execution of API requests to validate responses, status codes, and response times. It supports automated testing through scripting and collection runners, making it ideal for testing API endpoints.
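As an illustration, a Postman test script for a request against an endpoint such as the visualiser URL could assert status, response time, and payload along the following lines (the 2000 ms threshold is an example, not an agreed acceptance value):

// Example Postman test script (Tests tab) for a visualiser/API request.
pm.test("Status code is 200", function () {
	pm.response.to.have.status(200);
});

pm.test("Response time is acceptable", function () {
	// Illustrative threshold only
	pm.expect(pm.response.responseTime).to.be.below(2000);
});

pm.test("Response contains visualisation payload", function () {
	pm.expect(pm.response.text()).to.not.be.empty;
});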

By combining black box testing with automated routines, this test plan will fully meet the requirements of the Distributed Data Visualization (“DDV”) block including but not restricted to its:

A comprehensive evaluation of the Distributed Data Visualization’s functionality, performance, resilience, and operational readiness will enhance its robustness in managing anomalies. The use of these specific tools and methodologies enhances the effectiveness of testing efforts, leading to a robust, reliable, and high-performing application ready for production deployment.

Unit tests

Test Cases

Test Case ID TC001
Description Validate successful visualization rendering from a valid JSON URL.
Inputs json_url=<valid_url>, iframe=false
Expected Result Visualization is correctly rendered based on the JSON data. Pass if the visualization matches the JSON data structure; fail otherwise.
Actual Outcome  
Status  
Comments  
Test Case ID TC002
Description Test Full-Screen Mode functionality for IFrame embedding.
Inputs json_url=<valid_url>, iframe=true
Expected Result Visualization is rendered in full-screen mode within an IFrame. Pass if the visualization occupies the full screen of the IFrame; fail if it does not.
Actual Outcome  
Status  
Comments  
Test Case ID TC003
Description Verify the functionality of initial zoom and camera focus.
Inputs json_url=<valid_url>, initial_zoom=1.0, center_camera_around=<node_id>
Expected Result Camera is zoomed to “human readable” size focusing on the specified node. Pass if the initial view focuses and zooms as expected; fail otherwise.
Actual Outcome  
Status  
Comments  
Test Case ID TC004
Description Check color coding functionality with custom colors for groups.
Inputs json_url=<valid_url>, colors=A0A000,F000F0
Expected Result Visualization uses the specified colors to differentiate between two groups. Pass if groups are correctly colored; fail if default or incorrect colors are used.
Actual Outcome  
Status  
Comments  
Test Case ID TC005
Description Test filtering nodes by minimum weight.
Inputs json_url=<valid_url>, filter_min_weight=3
Expected Result Only nodes with weight >= 3 are displayed. Pass if visualization correctly filters nodes; fail if nodes with weight < 3 are displayed.
Actual Outcome  
Status  
Comments  
Test Case ID TC006
Description Ensure click actions show the neighborhood of a clicked node.
Inputs json_url=<valid_url>, click_action=highlight
Expected Result Clicking a node highlights its neighborhood. Pass if the neighborhood is highlighted upon clicking; fail if no action occurs or the behavior is incorrect.
Actual Outcome  
Status  
Comments  
Test Case ID TC007
Description Verify error handling for unsupported word_type input.
Inputs word_type=abc.
Expected Result An appropriate error message indicating the unsupported word_type. Pass if the error is reported; fail if the input is silently accepted.
Actual Outcome  
Status  
Comments  
Test Case ID TC008
Description Test performance under high load by concurrently executing multiple requests.
Inputs Multiple requests using valid parameters.
Expected Result The function maintains performance and accuracy across all requests. Pass if rendering is done correctly and within a reasonable time frame; fail otherwise.
Actual Outcome  
Status  
Comments  
Test Case ID TC009
Description Check for the functionality with all parameters filled, including optional ones.
Inputs All parameters specified, including optional ones with valid data.
Expected Result A detailed knowledge graph is rendered that matches all specified criteria. Pass if so; fail otherwise.
Actual Outcome  
Status  
Comments  

Acceptance Criteria

Component-level testing

All the unit tests are done in order to make sure the Distributed Data Visualisation can be integrated via HTTPS requests and via REST API requests.

Such tests should also be done when integrating the DDV into a host system. All these tests can be done with the same tools introduced in the Unit tests section (e.g. Postman).

UI test

All the unit tests are done in order to make sure the Distributed Data Visualisation UI works under the described conditions and environments (e.g. latest Mozilla Firefox).

Such tests should also be done when integrating the DDV into a host system. All these tests can be done with the same tools introduced in the Unit tests section (e.g. Selenium).

Partnerships & Roles

Headai

Visions

EDUNAO

IMT

Usage in the dataspace


Data Route

1 : Data from the Learning Management System (LMS) is tracked in the Learning Record Store (LRS)

2 : The LRS transmits data to the Learning Record Converter (LRC) in a format other than xAPI

3 : The LRC converts the data into xAPI format and sends it to the Prometheus-X Dataspace Connector (PDC)

4 : The PDC requests validation for transferring data to individual X, which includes their identity, catalogue, contract, and consent

5 : The data intermediary sends the terms of the contract, identity, catalogue, and consent of individual X

6 : The PDC of organization A sends a data set in xAPI format to the PDC of individual X

7 : The PDC of individual X transfers data in xAPI format to its Personal Learning Record Store (PLRS)

4 : The PDC requests validation to transfer data to organization B. This involves confirming the organization’s identity, catalogue, contract, and consent

5 : The data intermediary sends the terms of the contract, identity, catalogue, and consent of organization B

8 : PDC of organization A sends a data set in xAPI format to the PDC of organization B

9 : The PDC of individual X requests validation to send data to organization B, which involves identity, catalogue, contract, and consent

10 : The data intermediary sends the terms of the contract, identity, catalogue, and consent of organization B

11 : The PDC of individual X sends a data set in xAPI format to the PDC of organization B

12 : The PDC sends data to the Data Value Chain Tracker (DVCT) in xAPI format and applies the commercial terms of the data-sharing contract

13 : The PDC sends data to the Data Veracity Assurance (DVA) in xAPI format, ensuring the accuracy of specific data exchanges in the database

14 : The PDC sends data to the Distributed Data Visualization (DDV) in xAPI format

15 : The DDV visualizes the received traces from both the organization and the individual
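The steps above repeatedly refer to the xAPI format. For readers unfamiliar with it, a minimal xAPI statement (with hypothetical actor, verb and activity values, not taken from any real exchange) looks roughly like this:

{
	"actor": { "mbox": "mailto:individual.x@example.com", "name": "Individual X" },
	"verb": { "id": "http://adlnet.gov/expapi/verbs/completed", "display": { "en-US": "completed" } },
	"object": {
		"id": "http://example.com/courses/intro-to-data-science",
		"definition": { "name": { "en-US": "Intro to Data Science" } }
	},
	"timestamp": "2024-05-22T15:53:01Z"
}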