Prometheus-X Components & Services

Edge translators BB – Design Document

The baseline of this building block is: “On Institutional Edges for AI Assisted Onto-Terminology Translators”.

The AI Translator is a tool to help us achieve **Frictionless Interoperability**.

In the data world, as in the world of spoken language, translation requires two abilities:

To explain the spirit of this Building Block, we will divide it into three sections:

As a central part of the “common language”, the Ontology Editor will provide a community-sourced and community-updated pivotal onto-terminology for all partners of the project.

Technical usage scenarios & Features

The Edge Translator is the core component for translating the format and values of input data into a standard output. It works in conjunction with other components to provide the best possible translations. These four side-car components are the following:

Within the PTX environment the Edge Translator can be schematised as:

[Figure: the Edge Translator within the PTX environment]

The most straightforward use case for this Building Block is:

The rest of this document will focus on the Edge Translator component.

Features/main functionalities

Technical usage scenarios

Requirements

Integrations

See 01_BB Connections spreadsheet

Direct Integrations with Other BBs

BB 9b LOMCT & BB 7 Distributed data visualization may have a direct connection to the PDC.

Integrations into the PDC Flow

The translation service has a specific position inside the actual flow of the PDC: it acts as an intermediary in an exchange between a data provider and a data consumer. To take this situation into account, we will closely follow the evolution of PTX’s “Protocol” component, as it may be a solution to orchestrate the Translator’s required flow.

Relevant Standards

Data Format Standards

Mapping to Data Space Reference Architecture Models

Input / Output Data

There are 3 categories of inputs for this building block:

As output, 3 main categories of data will be produced:

Architecture

At a high level, the internals of the Edge Translator consist of these components:

classDiagram
  class InputConnector {
    listen()
    parseParameters()
    startTransform()
  }
  class OntologyTransform {
    ontologyMatch: OntologyMatching
    terminologyTransf: TerminologyTransform
    transform(inputData, ontologyMappingRules)
  }
  class OntologyMatching {
    getCandidatesForProperty(sourceProperty, context) candidates[]
  }
  class TerminologyTransform {
    terminologyMatch: TerminologyMatching
    terminologyLangTransform: TerminologyLangTransform
    transform(property, sourceValue)
  }

  class TerminologyMatching {
    getCandidatesForValue(value, lang)
  }

  class TerminologyLangTransform {
    getInLang(value, lang)
  }

  class OutputConnector{
    sendToConnector(outputData)
  }


    InputConnector -- OntologyTransform
    OntologyTransform -- OntologyMatching
    OntologyTransform -- TerminologyTransform
    TerminologyTransform -- TerminologyMatching
    TerminologyTransform -- TerminologyLangTransform
    OutputConnector -- OntologyTransform

These components then use the API and data provided by Headai & Rejustify to provide their features.
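As a rough illustration of how the components in the diagram could fit together, here is a minimal Python sketch. The class and method names mirror the diagram (in snake_case); everything else is an assumption made for the example: the `MappingRule` type, the dictionary-based lookups that stand in for the matching services, and the sample vocabulary. `TerminologyLangTransform` and the connectors are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MappingRule:
    """Hypothetical rule: target property, and whether the value needs terminology matching."""
    target: str
    use_terminology: bool = False


@dataclass
class OntologyMatching:
    # Stub: source property -> candidate target properties (assumed data shape)
    known: dict = field(default_factory=dict)

    def get_candidates_for_property(self, source_property: str, context: Optional[str] = None) -> list:
        return self.known.get(source_property, [])


@dataclass
class TerminologyMatching:
    # Stub: lower-cased source value -> candidate reference terms (assumed data shape)
    vocabulary: dict = field(default_factory=dict)

    def get_candidates_for_value(self, value: str, lang: str) -> list:
        return self.vocabulary.get(value.lower(), [])


@dataclass
class TerminologyTransform:
    terminology_match: TerminologyMatching

    def transform(self, prop: str, source_value: str, lang: str = "en") -> dict:
        candidates = self.terminology_match.get_candidates_for_value(source_value, lang)
        return {
            "sourceValue": source_value,
            "mapping": candidates[0] if candidates else None,
            "suggestions": candidates,
        }


@dataclass
class OntologyTransform:
    ontology_match: OntologyMatching
    terminology_transf: TerminologyTransform

    def transform(self, input_data: dict, ontology_mapping_rules: dict) -> dict:
        output = {}
        for source_property, value in input_data.items():
            rule = ontology_mapping_rules.get(source_property)
            if rule is None:
                # No explicit rule: fall back to the first matching candidate, if any
                candidates = self.ontology_match.get_candidates_for_property(source_property)
                if not candidates:
                    continue  # property is left out of the output in this sketch
                rule = MappingRule(candidates[0])
            if rule.use_terminology:
                output[rule.target] = self.terminology_transf.transform(rule.target, value)
            else:
                output[rule.target] = value
        return output


matcher = TerminologyMatching({"problem-solving": ["esco:adc6dc11-3376-467b-96c5-9b0a21edc869"]})
pipeline = OntologyTransform(OntologyMatching({"Date": ["dateFrom"]}), TerminologyTransform(matcher))
rules = {
    "Experience Name": MappingRule("prefLabel"),
    "Associated Soft Skill Block": MappingRule("skill", use_terminology=True),
}
record = {
    "Experience Name": "Problem-Solving Puzzle",
    "Date": "2023-06-28",
    "Associated Soft Skill Block": "Problem-Solving",
    "Results": "Validated",
}
result = pipeline.transform(record, rules)
```

In this sketch, explicit mapping rules take precedence and `OntologyMatching` is only consulted for properties without a rule; the real component may order these steps differently.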

The detailed sequence diagram for these components is: [Figure: sequence diagram of the Edge Translator components]

Dynamic Behaviour

The following diagrams show 2 concrete examples for the Terminology (or Framework) transformation process.

The first case details the process when a national framework is available and can be mapped before the connection of the data-source.

[Figure: transformation process when a national framework is mapped before the data source is connected]

The second case details the process when an internal, not broadly available framework is used to describe the source data. The live mappings are then conducted and are available for later inspection.

[Figure: transformation process with live mappings for an internal framework]

Configuration and deployment settings

Deployment: The translator will be deployed through Docker containers. Although the translator can run on CPU, having a GPU available is preferable. As a dependency, Elasticsearch will have to be deployed independently (on-premise or cloud deployment available). The Translator’s companion apps can be deployed to any infrastructure through Docker containers.

Logging and Operations: The Translator will log operations, errors, and warnings to standard output and/or a cloud logging system. Logging includes details such as incoming and outgoing requests, calls to main components, calls to external APIs, and ontology and terminology transformation traces. Error scenarios, such as a failed input request, an error during structure or value transformation, or failed queries to dependent components, are logged with appropriate error codes and descriptions to aid in troubleshooting and debugging.
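The logging behaviour described above could be approximated with Python's standard `logging` module, as in the sketch below. The JSON line format, the logger names, and the error code `TR-DEP-001` are assumptions for illustration; the document does not define a log schema or an error-code list.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # error_code and trace are optional extras attached by the caller
        for key in ("error_code", "trace"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def build_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers when called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log = build_logger("translator.terminology")
# A transformation trace (hypothetical field layout)
log.info("value transformed", extra={"trace": {"source": "Problem-Solving", "target": "solve problems"}})
# A failed query to a dependent component, with a hypothetical error code
log.error("query to dependent component failed", extra={"error_code": "TR-DEP-001"})
```

Writing one JSON object per line keeps the output usable both on standard output and in a cloud logging system that ingests structured logs.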

Third Party Components & Licenses

The main Third Parties that will be used are:

See detailed documentation here.

OpenAPI Specification

```json
{
  "openapi": "3.1.0",
  "info": { "title": "FastAPI", "version": "0.1.0" },
  "paths": {
    "/ontologies/get_mapping_rules": {
      "get": {
        "summary": "Ontologies.Get Mapping Rules",
        "operationId": "ontologies_get_mapping_rules_ontologies_get_mapping_rules_get",
        "parameters": [
          { "description": "Name of the data provider", "required": true, "schema": { "type": "string", "title": "Provider Name", "description": "Name of the data provider" }, "name": "provider_name", "in": "query" },
          { "description": "the document type ", "required": false, "schema": { "type": "string", "title": "Document Type", "description": "the document type " }, "name": "document_type", "in": "query" },
          { "description": "Version of the rules", "required": false, "schema": { "type": "string", "title": "Version", "description": "Version of the rules" }, "name": "version", "in": "query" }
        ],
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "items": { "type": "object" }, "type": "array", "title": "Response Ontologies Get Mapping Rules Ontologies Get Mapping Rules Get" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/ontologies/get_jsonld_from_mapping_rules": {
      "post": {
        "summary": "Ontologies.Get Jsonld From Mapping Rules",
        "operationId": "ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post",
        "parameters": [
          { "required": false, "schema": { "type": "string", "title": "Version" }, "name": "version", "in": "query" }
        ],
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Ontologies Get Jsonld From Mapping Rules Ontologies Get Jsonld From Mapping Rules Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/ontologies/get_jsonld_from_provider": {
      "post": {
        "summary": "Ontologies.Get Jsonld From Provider",
        "operationId": "ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post",
        "parameters": [
          { "description": "Name of the data provider", "required": true, "schema": { "type": "string", "title": "Provider Name", "description": "Name of the data provider" }, "name": "provider_name", "in": "query" },
          { "description": "Version of the rules", "required": false, "schema": { "type": "string", "title": "Version", "description": "Version of the rules" }, "name": "version", "in": "query" }
        ],
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Ontologies Get Jsonld From Provider Ontologies Get Jsonld From Provider Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/ontologies/helloworld": {
      "get": {
        "summary": "Ontologies.Get Hello World",
        "operationId": "ontologies_get_hello_world_ontologies_helloworld_get",
        "parameters": [
          { "required": true, "schema": { "type": "string", "title": "Name" }, "name": "name", "in": "query" }
        ],
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "string", "title": "Response Ontologies Get Hello World Ontologies Helloworld Get" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/machine_learning/get_embedding_vectors_from_sentences_from_flask": {
      "post": {
        "summary": "Embeddings.Get Embedding Vector From Sentences From Flask",
        "operationId": "embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post",
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Embeddings Get Embedding Vector From Sentences From Flask Machine Learning Get Embedding Vectors From Sentences From Flask Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/machine_learning/get_embedding_vectors_from_sentences": {
      "post": {
        "summary": "Embeddings.Get Embedding Vector From Sentences",
        "operationId": "embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post",
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Embeddings Get Embedding Vector From Sentences Machine Learning Get Embedding Vectors From Sentences Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/machine_learning/get_knn_from_elasticsearch_for_embedding": {
      "post": {
        "summary": "Embeddings.Get Knn From Elasticsearch For Embedding",
        "operationId": "embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post",
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Embeddings Get Knn From Elasticsearch For Embedding Machine Learning Get Knn From Elasticsearch For Embedding Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    },
    "/machine_learning/get_knn_from_elasticsearch_for_vector": {
      "post": {
        "summary": "Embeddings.Get Knn From Elasticsearch For Vector",
        "operationId": "embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post",
        "requestBody": { "content": { "application/json": { "schema": { "$ref": "#/components/schemas/Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post" } } }, "required": true },
        "responses": {
          "200": { "description": "Successful Response", "content": { "application/json": { "schema": { "type": "object", "title": "Response Embeddings Get Knn From Elasticsearch For Vector Machine Learning Get Knn From Elasticsearch For Vector Post" } } } },
          "422": { "description": "Validation Error", "content": { "application/json": { "schema": { "$ref": "#/components/schemas/HTTPValidationError" } } } }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post": {
        "properties": { "embedding": { "allOf": [ { "$ref": "#/components/schemas/EmbeddingPayload" } ], "title": "Embedding", "description": "the text to transform" } },
        "type": "object",
        "required": [ "embedding" ],
        "title": "Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post"
      },
      "Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post": {
        "properties": { "embedding": { "allOf": [ { "$ref": "#/components/schemas/EmbeddingPayload" } ], "title": "Embedding", "description": "the text to transform" } },
        "type": "object",
        "required": [ "embedding" ],
        "title": "Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post"
      },
      "Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post": {
        "properties": { "embedding": { "allOf": [ { "$ref": "#/components/schemas/EmbeddingPayload" } ], "title": "Embedding", "description": "the text to match" } },
        "type": "object",
        "required": [ "embedding" ],
        "title": "Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post"
      },
      "Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post": {
        "properties": { "embedding_vector": { "type": "object", "title": "Embedding Vector", "description": "the vector to match" } },
        "type": "object",
        "required": [ "embedding_vector" ],
        "title": "Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post"
      },
      "Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post": {
        "properties": {
          "mapping_rules": { "type": "object", "title": "Mapping Rules", "description": "the mapping rules" },
          "document": { "anyOf": [ { "items": { "type": "object" }, "type": "array" }, { "type": "object" } ], "title": "Document", "description": "the document" }
        },
        "type": "object",
        "required": [ "mapping_rules", "document" ],
        "title": "Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post"
      },
      "Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post": {
        "properties": {
          "document": { "anyOf": [ { "items": { "type": "object" }, "type": "array" }, { "type": "object" } ], "title": "Document", "description": "the document" }
        },
        "type": "object",
        "required": [ "document" ],
        "title": "Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post"
      },
      "EmbeddingPayload": {
        "properties": {
          "sentences": { "items": { "type": "string" }, "type": "array", "title": "Sentences", "description": "List of texts to embed", "example": [ "Bonjour, comment ça va?" ] }
        },
        "type": "object",
        "required": [ "sentences" ],
        "title": "EmbeddingPayload",
        "description": "Intended for use as a base class for externally-facing models.\n\nAny models that inherit from this class will:\n* accept fields using snake_case or camelCase keys\n* use camelCase keys in the generated OpenAPI spec\n* have orm_mode on by default\n * Because of this, FastAPI will automatically attempt to parse returned orm instances into the model"
      },
      "HTTPValidationError": {
        "properties": { "detail": { "items": { "$ref": "#/components/schemas/ValidationError" }, "type": "array", "title": "Detail" } },
        "type": "object",
        "title": "HTTPValidationError"
      },
      "ValidationError": {
        "properties": {
          "loc": { "items": { "anyOf": [ { "type": "string" }, { "type": "integer" } ] }, "type": "array", "title": "Location" },
          "msg": { "type": "string", "title": "Message" },
          "type": { "type": "string", "title": "Error Type" }
        },
        "type": "object",
        "required": [ "loc", "msg", "type" ],
        "title": "ValidationError"
      }
    }
  }
}
```

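To make the specification above more concrete, the following sketch builds requests for two of its endpoints with Python's standard library. The paths, query parameters, and body fields come from the spec; the base URL, the provider name `acme`, and the sample mapping rules are placeholders, since the document does not state where the service is hosted or what the rules look like.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: placeholder host, not a documented address


def mapping_rules_request(
    provider_name: str,
    document_type: Optional[str] = None,
    version: Optional[str] = None,
) -> Request:
    """Build GET /ontologies/get_mapping_rules; only provider_name is required."""
    params = {"provider_name": provider_name, "document_type": document_type, "version": version}
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return Request(f"{BASE_URL}/ontologies/get_mapping_rules?{query}", method="GET")


def jsonld_from_rules_request(mapping_rules: dict, document, version: Optional[str] = None) -> Request:
    """Build POST /ontologies/get_jsonld_from_mapping_rules with its JSON body."""
    url = f"{BASE_URL}/ontologies/get_jsonld_from_mapping_rules"
    if version is not None:
        url += "?" + urlencode({"version": version})
    body = json.dumps({"mapping_rules": mapping_rules, "document": document}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


get_req = mapping_rules_request("acme", document_type="experience")
post_req = jsonld_from_rules_request(
    {"Experience Name": "prefLabel"},  # hypothetical rule format
    [{"Experience Name": "Problem-Solving Puzzle"}],
)
```

Either request could then be sent with `urllib.request.urlopen(get_req)` against a running instance; the sketch stops at building the requests so that it runs without a server.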
Test specification

Test plan

See the specific document on the test plan. The main test scenarios will cover:

Unit tests

Unit tests will be set up on a range of sample data to guard against regressions during development. Each unit test takes a sample of a data provider's data as input and checks the output of the transformation against a static output file.

These tests will be executed with Pytest. Extensive examples of data input and output are available here.

Example of input data file (in JSON)
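A comparison against static output files could look like the sketch below. The file-naming convention (`<name>.input.json` paired with `<name>.expected.jsonld`), the `samples` directory, and the `translate` entry point mentioned in the closing comment are all assumptions; the helper itself only uses the standard library, so it works inside any Pytest test function.

```python
import json
from pathlib import Path


def load_json(path: Path):
    """Read a JSON or JSON-LD file."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def find_cases(samples_dir: Path) -> list:
    """Pair every `<name>.input.json` with its `<name>.expected.jsonld` (assumed naming)."""
    cases = []
    for input_path in sorted(samples_dir.glob("*.input.json")):
        expected_name = input_path.name.replace(".input.json", ".expected.jsonld")
        cases.append((input_path, input_path.with_name(expected_name)))
    return cases


def check_against_golden(transform, samples_dir: Path) -> int:
    """Run `transform` on each sample and compare with the stored output.

    Returns the number of cases checked; raises AssertionError on the first mismatch.
    """
    cases = find_cases(samples_dir)
    for input_path, expected_path in cases:
        actual = transform(load_json(input_path))
        expected = load_json(expected_path)
        assert actual == expected, f"regression in {input_path.name}"
    return len(cases)


# In a Pytest module this could be wired up as follows (names are placeholders):
#
#     def test_samples_do_not_regress():
#         assert check_against_golden(translate, Path("tests/samples")) > 0
```

Comparing parsed JSON rather than raw text keeps the check insensitive to whitespace and key order in the stored files.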

[
  {
    "Experience Name": "Problem-Solving Puzzle",
    "User ID": "xx.yy@gmail.com",
    "Date": "2023-06-28",
    "Associated Soft Skill Block": "Problem-Solving",
    "Results": "Validated"
  }
]

Example of output data file (in JSON-LD)

{
  "@context": {
    "id": "@id",
    "graph": {
      "@id": "@graph",
      "@container": "@set"
    },
    "type": {
      "@id": "@type",
      "@container": "@set"
    },
    "tr": "https://competencies.be/mindmatcher/translator",
    "mms": "https://competencies.be/mindmatcher/schema",
    "date": {
      "@id": "mms:date",
      "@type": "xsd:dateTime"
    },
    "keywords": {
      "@id": "mms:keywords",
      "@type": "xsd:string"
    },
    "picture": {
      "@id": "mms:picture",
      "@type": "xsd:string"
    },
    "title": {
      "@id": "mms:title",
      "@type": "xsd:string"
    },
    "url": {
      "@id": "mms:url",
      "@type": "xsd:string"
    }
  },
  "graph": [
    {
      "id": "tr:__generated-id-1__",
      "type": "soo:Experience",
      "prefLabel": {"@value": "Problem-Solving Puzzle", "@language": "en"},
      "profile": "tr:__profile-id-1__",
      "dateFrom": "2023-06-28",
      "skill": "tr:__skill-id-1__",
      "result": "Validated"
    },
    {
      "id": "tr:__profile-id-1__",
      "type": "soo:Profile",
      "email": "xx.yy@gmail.com",
      "experience": "tr:__generated-id-1__"
    },
    {
      "id": "tr:__skill-id-1__",
      "type": "soo:Skill",
      "experience": "tr:__generated-id-1__",
      "sourceValue": "tr:__source-value-1",
      "mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869",
      "suggestions": ["tr:sugg/1", "tr:sugg/2"],
      "skillLevelValue": "Validated"
    },
    {
      "id": "tr:__source-value-1",
      "label": "Problem-Solving"
    },
    {
      "id": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869",
      "type": "esco:Skill",
      "prefLabel": {"@value": "solve problems", "@language": "en"}
    },
    {
      "id": "tr:sugg/1",
      "type": "mms:Suggestion",
      "source": "mm-search",
      "score": 35,
      "mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869"
    },
    {
      "id": "tr:sugg/2",
      "type": "mms:Suggestion",
      "source": "mm-search",
      "score": 10,
      "mapping": "esco:1234"
    }
  ]
}
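A consumer of this output might want to inspect the suggestions attached to a skill. The sketch below indexes the graph by node id and picks the highest-scoring suggestion. It assumes that a higher `score` means a better match, which fits the example (the retained `mapping` is the one with score 35) but is not stated in the document; the inline `document` is a shortened excerpt of the example output.

```python
from typing import Optional


def index_graph(document: dict) -> dict:
    """Map each node id to its node for quick lookups."""
    return {node["id"]: node for node in document.get("graph", [])}


def best_suggestion(document: dict, skill_id: str) -> Optional[dict]:
    """Return the suggestion with the highest score for a skill node, if any."""
    nodes = index_graph(document)
    skill = nodes[skill_id]
    suggestions = [nodes[ref] for ref in skill.get("suggestions", []) if ref in nodes]
    # Assumption: a higher score means a better match
    return max(suggestions, key=lambda suggestion: suggestion["score"], default=None)


# Shortened excerpt of the example output
document = {
    "graph": [
        {
            "id": "tr:__skill-id-1__",
            "type": "soo:Skill",
            "mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869",
            "suggestions": ["tr:sugg/1", "tr:sugg/2"],
        },
        {"id": "tr:sugg/1", "type": "mms:Suggestion", "score": 35,
         "mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869"},
        {"id": "tr:sugg/2", "type": "mms:Suggestion", "score": 10, "mapping": "esco:1234"},
    ]
}

best = best_suggestion(document, "tr:__skill-id-1__")
skill = index_graph(document)["tr:__skill-id-1__"]
mapping_is_top_suggestion = best is not None and best["mapping"] == skill["mapping"]
```

Keeping the lower-ranked suggestions in the graph is what makes the live mappings available for later inspection: a reviewer can compare the retained `mapping` with the alternatives.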

Integration tests

Integration tests will be conducted by setting up a “fake application” and automating the call of an exchange. This possibility will depend on the ability of the PDC to provide a way to automate such exchanges (not possible as of today). These integration tests will be run by a Python or JavaScript script. They may also be run with Postman.
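Until exchanges can be automated through the PDC, a script of this kind can be rehearsed against a local stand-in. The sketch below starts a fake HTTP service on a free local port and posts a document to it, mimicking the shape of `/ontologies/get_jsonld_from_provider` from the OpenAPI specification. The fake handler, the provider name `fake-provider`, and the echoed response are assumptions for illustration and do not reproduce the real translation.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeTranslatorHandler(BaseHTTPRequestHandler):
    """Stand-in service: wraps the posted document in a `graph` envelope."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"graph": payload["document"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def run_exchange(document: list) -> dict:
    """Start the fake service, post one document to it, and return the parsed reply."""
    server = HTTPServer(("127.0.0.1", 0), FakeTranslatorHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request(
            "POST",
            "/ontologies/get_jsonld_from_provider?provider_name=fake-provider",
            body=json.dumps({"document": document}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        result = json.loads(response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


reply = run_exchange([{"Experience Name": "Problem-Solving Puzzle"}])
```

Pointing the client half of `run_exchange` at a deployed translator instead of the fake handler would turn the same script into a real integration check.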

Partners & roles

MindMatcher :

Headai :

Rejustify :

Usage in the dataspace

The Translator can be used when aggregating different data sources, in order to obtain a uniform data format and facilitate the processing of data from those sources.

[Figure: data processing chain including the Edge Translator]

In the given scenario, the Use Case Orchestrator also has the role of the Data Provider and Data Consumer.

  1. As Use Case Orchestrator, Org A selects the required services from the Catalog.
  2. As Use Case Orchestrator, Org A signs the Contract.
  3. The Data Processing Chain Protocol (DPCP) is defined together with Org A.
  4. After signing the contract, Org A has a PDC and, acting as Data Provider, sends the raw data to the first service provider’s PDC (Org C’s PDC).
  5. Org C’s PDC communicates the raw data to the service provider.
  6. Org C’s PDC receives the anonymized data back.
  7. As defined by the Data Processing Chain Protocol, the first service provider (Org C), after processing the data, checks the next recipient and pushes the processed data to the next service provider’s PDC (Org D, Data Veracity Assurance, DVA).
  8. Org D’s PDC requests the Attestation of Veracity (AoV).
  9. The Proof of Veracity (PoV) is provided.
  10. As defined by the Data Processing Chain Protocol, the data flow is double-checked between the Service Provider and the DPCP, and the data is pushed to the next service provider (Org E, Edge Translator).
  11. Org E’s PDC forwards the raw, anonymized data to the Edge Translator BB.
  12. Output data is generated (JSON-LD) and forwarded to the PDC of the Edge Translator BB.
  13. Translated data (JSON-LD in the Pivotal Ontology) is forwarded back to the DVA’s PDC for double verification.
  14. Data is pushed to the DVA, requesting an AoV.
  15. The PoV is provided.
  16. The DVA’s PDC transfers the data to the Visualization BB’s PDC for processing.
  17. The Visualization BB’s PDC pushes the data into the Visualization BB.
  18. The processed data returns to the PDC, ready to be distributed.
  19. The service provider’s PDC (Org F) forwards the final data to the Data Consumer’s device, ready to be displayed.
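The flow above amounts to an ordered chain of services, each receiving data from the previous one. The following sketch models that ordering in Python. The `ChainStep` and `ProcessingChain` types and the stub processing functions are illustrative assumptions, as is the attribution of the visualization step to Org F; the real chain is driven by the PDCs and the Data Processing Chain Protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChainStep:
    org: str
    service: str
    process: Callable[[dict], dict]


@dataclass
class ProcessingChain:
    steps: List[ChainStep]
    trace: List[str] = field(default_factory=list)

    def run(self, data: dict) -> dict:
        """Pass the data through each step in order, recording who handled it."""
        for step in self.steps:
            data = step.process(data)
            self.trace.append(f"{step.org}: {step.service}")
        return data


# Stub services; each returns a new dict instead of mutating its input
def anonymise(data: dict) -> dict:
    records = [{k: v for k, v in r.items() if k != "User ID"} for r in data["records"]]
    return {**data, "records": records}


def attest(data: dict) -> dict:
    return {**data, "attestations": data.get("attestations", 0) + 1}


def translate(data: dict) -> dict:
    return {**data, "format": "json-ld"}


def visualise(data: dict) -> dict:
    return {**data, "ready_for_display": True}


chain = ProcessingChain([
    ChainStep("Org C", "anonymisation", anonymise),
    ChainStep("Org D", "veracity assurance", attest),
    ChainStep("Org E", "edge translator", translate),
    ChainStep("Org D", "veracity assurance", attest),  # second verification, after translation
    ChainStep("Org F", "visualisation", visualise),    # assumption: Org F hosts the visualisation
])

raw = {"records": [{"User ID": "xx.yy@gmail.com", "Results": "Validated"}]}
final = chain.run(raw)
```

The veracity step appears twice because the translated data is verified again before it reaches the visualization service.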