The baseline of this building block is: “On Institutional Edges for AI Assisted Onto-Terminology Translators.”
The AI Translator is a tool to help us achieve **Frictionless Interoperability**.
In the data world, as in the spoken-language world, translation requires 2 abilities:
To explain the spirit of this Building Block, we will divide it into 3 sections:
As a central part of the “common language”, the Ontology Editor will provide a community-sourced and community-updated pivotal onto-terminology for all partners of the project.
The Edge Translator is the core component for translating the format and values of input data into a standard output. It works in conjunction with other components to provide the best translations. These 4 side-car components are the following:
Within the PTX environment the Edge Translator can be schematised as:
The most straightforward use case for this Building Block is:
The rest of this document will focus on the Edge Translator component.
See 01_BB Connections spreadsheet
BB 9b LOMCT and BB 7 Distributed data visualization may have a direct connection to the PDC.
The translation service has a specific position inside the current flow of the PDC: it acts as an intermediary in an exchange between a data provider and a data consumer. To take this situation into account, we will closely follow the evolution of PTX’s “Protocol” component, as it may be a solution to orchestrate the Translator’s required flow.
There are 3 categories of inputs for this building block:
As output, 3 main categories of data will be produced:
At a high descriptive level, the internals of the Edge Translator consist of these components:
classDiagram
class InputConnector {
listen()
parseParameters()
startTransform()
}
class OntologyTransform {
ontologyMatch: OntologyMatching
terminologyTransf: TerminologyTransform
transform(inputData, ontologyMappingRules)
}
class OntologyMatching {
getCandidatesForProperty(sourceProperty, context) candidates[]
}
class TerminologyTransform {
terminologyMatch: TerminologyMatching
terminologyLangTransform: TerminologyLangTransform
transform(property, sourceValue)
}
class TerminologyMatching {
getCandidatesForValue(value, lang)
}
class TerminologyLangTransform {
getInLang(value, lang)
}
class OutputConnector{
sendToConnector(outputData)
}
InputConnector -- OntologyTransform
OntologyTransform -- OntologyMatching
OntologyTransform -- TerminologyTransform
TerminologyTransform -- TerminologyMatching
TerminologyTransform -- TerminologyLangTransform
OutputConnector -- OntologyTransform
These components then use the API and data provided by Headai & Rejustify to provide their features.
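As a rough illustration of how the components in the class diagram could fit together, the following Python sketch mirrors their names and methods. The matching logic is a placeholder (plain dictionary lookups), the rule format (`target`, `terminology`) is an assumption made for this example, and the calls to Headai and Rejustify are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyMatching:
    # Placeholder knowledge base: source property -> candidate target properties.
    known_properties: dict = field(default_factory=dict)

    def get_candidates_for_property(self, source_property, context):
        # A real matcher would use the context to rank candidates.
        return self.known_properties.get(source_property, [])


@dataclass
class TerminologyMatching:
    # Placeholder knowledge base: lower-cased value -> candidate concept ids.
    known_values: dict = field(default_factory=dict)

    def get_candidates_for_value(self, value, lang):
        return self.known_values.get(value.lower(), [])


class TerminologyLangTransform:
    def get_in_lang(self, value, lang):
        # Placeholder: returns the value unchanged, no translation is done.
        return value


@dataclass
class TerminologyTransform:
    terminology_match: TerminologyMatching
    terminology_lang_transform: TerminologyLangTransform

    def transform(self, property_name, source_value, lang="en"):
        value = self.terminology_lang_transform.get_in_lang(source_value, lang)
        candidates = self.terminology_match.get_candidates_for_value(value, lang)
        return {
            "sourceValue": source_value,
            "mapping": candidates[0] if candidates else None,
            "suggestions": candidates,
        }


@dataclass
class OntologyTransform:
    ontology_match: OntologyMatching
    terminology_transf: TerminologyTransform

    def transform(self, input_data, ontology_mapping_rules):
        output = {}
        for source_property, value in input_data.items():
            rule = ontology_mapping_rules.get(source_property)
            if rule is None:
                # No explicit rule: fall back to the first matching candidate.
                candidates = self.ontology_match.get_candidates_for_property(
                    source_property, input_data
                )
                rule = {"target": candidates[0] if candidates else source_property}
            if rule.get("terminology"):
                # Hypothetical flag: the value itself must be mapped to a concept.
                value = self.terminology_transf.transform(rule["target"], value)
            output[rule["target"]] = value
        return output


translator = OntologyTransform(
    OntologyMatching({"Date": ["dateFrom"]}),
    TerminologyTransform(
        TerminologyMatching(
            {"problem-solving": ["esco:adc6dc11-3376-467b-96c5-9b0a21edc869"]}
        ),
        TerminologyLangTransform(),
    ),
)
rules = {"Associated Soft Skill Block": {"target": "skill", "terminology": True}}
record = {"Associated Soft Skill Block": "Problem-Solving", "Date": "2023-06-28"}
result = translator.transform(record, rules)
```

In this sketch the `InputConnector` and `OutputConnector` are left out; they would wrap the call to `translator.transform` on the way in and out.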
The detailed sequence diagram for these components is:
The following diagrams show 2 concrete examples for the Terminology (or Framework) transformation process.
The first case details the process when a national framework is available and can be mapped before the connection of the data-source.
The second case details the process when an internal, not broadly available framework is used to describe the source data. The live mappings are then conducted and are available for later inspection.
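The difference between the two cases can be sketched in Python as follows. This is only an illustration: `FrameworkMapper` and its fields are hypothetical names, and `difflib` string similarity stands in for the actual matching, which the sketch does not attempt to reproduce.

```python
from difflib import get_close_matches


class FrameworkMapper:
    """Hypothetical helper contrasting pre-computed and live mappings."""

    def __init__(self, premapped, target_labels):
        # Case 1: mappings prepared before the data source is connected.
        self.premapped = dict(premapped)
        # Labels of the target framework, used for live matching (case 2).
        self.target_labels = list(target_labels)
        # Live mappings are recorded so they can be inspected later.
        self.live_mappings = []

    def map_term(self, term):
        if term in self.premapped:
            return self.premapped[term]
        # Case 2: no prepared mapping, so match at translation time.
        # cutoff=0.0 keeps the best candidate even when similarity is low.
        matches = get_close_matches(
            term.lower(), self.target_labels, n=1, cutoff=0.0
        )
        target = matches[0] if matches else None
        self.live_mappings.append({"source": term, "target": target})
        return target


mapper = FrameworkMapper(
    premapped={"Problem-Solving": "solve problems"},
    target_labels=["solve problems", "manage time"],
)
prepared = mapper.map_term("Problem-Solving")  # served from the prepared mapping
live = mapper.map_term("time management")      # matched live and recorded
```

After the second call, `mapper.live_mappings` holds the live decision so that it can be reviewed later, which is the point of the second case.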
Deployment: The translator will be deployed through Docker containers. Although the translator can run on CPU, a GPU is preferable for deployment. As a dependency, Elasticsearch will have to be deployed independently (on-premise or cloud deployment available). The Translator’s companion apps can be deployed to any infrastructure through Docker containers.
Logging and Operations: The Translator will log operations, errors, and warnings to standard output and/or a cloud logging system. Logging includes details such as incoming and outgoing requests, calls to main components, calls to external APIs, and ontology and terminology transformation traces. Error scenarios, such as a failed input request, an error during structure or value transformation, or failed queries to dependent components, are logged with appropriate error codes and descriptions to aid in troubleshooting and debugging.
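A minimal sketch of such logging with Python's standard `logging` module is shown below. The logger name, the message layout and the error codes (`TR-001` and so on) are assumptions made for the example; the document does not define them.

```python
import logging
import sys

# Hypothetical error codes, one per error scenario named above.
ERROR_CODES = {
    "INPUT_REQUEST_FAILED": "TR-001",
    "TRANSFORM_FAILED": "TR-002",
    "DEPENDENCY_QUERY_FAILED": "TR-003",
}


def build_logger(name="edge_translator", stream=None):
    """Return a logger that writes to standard output by default."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_error(logger, code_key, description, **details):
    """Log an error with its code, a description and free-form details."""
    logger.error(
        "code=%s description=%s details=%s",
        ERROR_CODES[code_key],
        description,
        details,
    )


logger = build_logger()
logger.info("incoming request provider_name=%s", "example-provider")
log_error(logger, "TRANSFORM_FAILED", "value transformation failed", field="skill")
```

A cloud logging system could be attached by adding a second handler to the same logger, leaving the call sites unchanged.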
The main Third Parties that will be used are:
See detailed documentation here.
{
"openapi": "3.1.0",
"info": {
"title": "FastAPI",
"version": "0.1.0"
},
"paths": {
"/ontologies/get_mapping_rules": {
"get": {
"summary": "Ontologies.Get Mapping Rules",
"operationId": "ontologies_get_mapping_rules_ontologies_get_mapping_rules_get",
"parameters": [
{
"description": "Name of the data provider",
"required": true,
"schema": {
"type": "string",
"title": "Provider Name",
"description": "Name of the data provider"
},
"name": "provider_name",
"in": "query"
},
{
"description": "the document type ",
"required": false,
"schema": {
"type": "string",
"title": "Document Type",
"description": "the document type "
},
"name": "document_type",
"in": "query"
},
{
"description": "Version of the rules",
"required": false,
"schema": {
"type": "string",
"title": "Version",
"description": "Version of the rules"
},
"name": "version",
"in": "query"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"items": {
"type": "object"
},
"type": "array",
"title": "Response Ontologies Get Mapping Rules Ontologies Get Mapping Rules Get"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/ontologies/get_jsonld_from_mapping_rules": {
"post": {
"summary": "Ontologies.Get Jsonld From Mapping Rules",
"operationId": "ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post",
"parameters": [
{
"required": false,
"schema": {
"type": "string",
"title": "Version"
},
"name": "version",
"in": "query"
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Ontologies Get Jsonld From Mapping Rules Ontologies Get Jsonld From Mapping Rules Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/ontologies/get_jsonld_from_provider": {
"post": {
"summary": "Ontologies.Get Jsonld From Provider",
"operationId": "ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post",
"parameters": [
{
"description": "Name of the data provider",
"required": true,
"schema": {
"type": "string",
"title": "Provider Name",
"description": "Name of the data provider"
},
"name": "provider_name",
"in": "query"
},
{
"description": "Version of the rules",
"required": false,
"schema": {
"type": "string",
"title": "Version",
"description": "Version of the rules"
},
"name": "version",
"in": "query"
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Ontologies Get Jsonld From Provider Ontologies Get Jsonld From Provider Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/ontologies/helloworld": {
"get": {
"summary": "Ontologies.Get Hello World",
"operationId": "ontologies_get_hello_world_ontologies_helloworld_get",
"parameters": [
{
"required": true,
"schema": {
"type": "string",
"title": "Name"
},
"name": "name",
"in": "query"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "string",
"title": "Response Ontologies Get Hello World Ontologies Helloworld Get"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/machine_learning/get_embedding_vectors_from_sentences_from_flask": {
"post": {
"summary": "Embeddings.Get Embedding Vector From Sentences From Flask",
"operationId": "embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Embeddings Get Embedding Vector From Sentences From Flask Machine Learning Get Embedding Vectors From Sentences From Flask Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/machine_learning/get_embedding_vectors_from_sentences": {
"post": {
"summary": "Embeddings.Get Embedding Vector From Sentences",
"operationId": "embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Embeddings Get Embedding Vector From Sentences Machine Learning Get Embedding Vectors From Sentences Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/machine_learning/get_knn_from_elasticsearch_for_embedding": {
"post": {
"summary": "Embeddings.Get Knn From Elasticsearch For Embedding",
"operationId": "embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Embeddings Get Knn From Elasticsearch For Embedding Machine Learning Get Knn From Elasticsearch For Embedding Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/machine_learning/get_knn_from_elasticsearch_for_vector": {
"post": {
"summary": "Embeddings.Get Knn From Elasticsearch For Vector",
"operationId": "embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"type": "object",
"title": "Response Embeddings Get Knn From Elasticsearch For Vector Machine Learning Get Knn From Elasticsearch For Vector Post"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
}
},
"components": {
"schemas": {
"Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post": {
"properties": {
"embedding": {
"allOf": [
{
"$ref": "#/components/schemas/EmbeddingPayload"
}
],
"title": "Embedding",
"description": "the text to transform"
}
},
"type": "object",
"required": [
"embedding"
],
"title": "Body_embeddings_get_embedding_vector_from_sentences_from_flask_machine_learning_get_embedding_vectors_from_sentences_from_flask_post"
},
"Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post": {
"properties": {
"embedding": {
"allOf": [
{
"$ref": "#/components/schemas/EmbeddingPayload"
}
],
"title": "Embedding",
"description": "the text to transform"
}
},
"type": "object",
"required": [
"embedding"
],
"title": "Body_embeddings_get_embedding_vector_from_sentences_machine_learning_get_embedding_vectors_from_sentences_post"
},
"Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post": {
"properties": {
"embedding": {
"allOf": [
{
"$ref": "#/components/schemas/EmbeddingPayload"
}
],
"title": "Embedding",
"description": "the text to match"
}
},
"type": "object",
"required": [
"embedding"
],
"title": "Body_embeddings_get_knn_from_elasticsearch_for_embedding_machine_learning_get_knn_from_elasticsearch_for_embedding_post"
},
"Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post": {
"properties": {
"embedding_vector": {
"type": "object",
"title": "Embedding Vector",
"description": "the vector to match"
}
},
"type": "object",
"required": [
"embedding_vector"
],
"title": "Body_embeddings_get_knn_from_elasticsearch_for_vector_machine_learning_get_knn_from_elasticsearch_for_vector_post"
},
"Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post": {
"properties": {
"mapping_rules": {
"type": "object",
"title": "Mapping Rules",
"description": "the mapping rules"
},
"document": {
"anyOf": [
{
"items": {
"type": "object"
},
"type": "array"
},
{
"type": "object"
}
],
"title": "Document",
"description": "the document"
}
},
"type": "object",
"required": [
"mapping_rules",
"document"
],
"title": "Body_ontologies_get_jsonld_from_mapping_rules_ontologies_get_jsonld_from_mapping_rules_post"
},
"Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post": {
"properties": {
"document": {
"anyOf": [
{
"items": {
"type": "object"
},
"type": "array"
},
{
"type": "object"
}
],
"title": "Document",
"description": "the document"
}
},
"type": "object",
"required": [
"document"
],
"title": "Body_ontologies_get_jsonld_from_provider_ontologies_get_jsonld_from_provider_post"
},
"EmbeddingPayload": {
"properties": {
"sentences": {
"items": {
"type": "string"
},
"type": "array",
"title": "Sentences",
"description": "List of texts to embed",
"example": [
"Bonjour, comment ça va?"
]
}
},
"type": "object",
"required": [
"sentences"
],
"title": "EmbeddingPayload",
"description": "Intended for use as a base class for externally-facing models.\n\nAny models that inherit from this class will:\n* accept fields using snake_case or camelCase keys\n* use camelCase keys in the generated OpenAPI spec\n* have orm_mode on by default\n * Because of this, FastAPI will automatically attempt to parse returned orm instances into the model"
},
"HTTPValidationError": {
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
},
"ValidationError": {
"properties": {
"loc": {
"items": {
"anyOf": [
{
"type": "string"
},
{
"type": "integer"
}
]
},
"type": "array",
"title": "Location"
},
"msg": {
"type": "string",
"title": "Message"
},
"type": {
"type": "string",
"title": "Error Type"
}
},
"type": "object",
"required": [
"loc",
"msg",
"type"
],
"title": "ValidationError"
}
}
}
}
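As an illustration of how a client might call two of the endpoints described in the specification above, the following sketch builds the requests with Python's standard library. The base URL and the provider name `example-provider` are assumptions; the paths, query parameters and body shape are taken from the specification.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local deployment of the API


def _query(provider_name, **optional):
    """Build the query string, skipping optional parameters left as None."""
    params = {"provider_name": provider_name}
    params.update({k: v for k, v in optional.items() if v is not None})
    return urlencode(params)


def build_mapping_rules_request(provider_name, document_type=None, version=None):
    """GET /ontologies/get_mapping_rules"""
    query = _query(provider_name, document_type=document_type, version=version)
    return Request(f"{BASE_URL}/ontologies/get_mapping_rules?{query}", method="GET")


def build_jsonld_from_provider_request(provider_name, document, version=None):
    """POST /ontologies/get_jsonld_from_provider with the document as JSON body."""
    query = _query(provider_name, version=version)
    body = json.dumps({"document": document}).encode("utf-8")
    return Request(
        f"{BASE_URL}/ontologies/get_jsonld_from_provider?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


rules_request = build_mapping_rules_request("example-provider", version="1")
jsonld_request = build_jsonld_from_provider_request(
    "example-provider",
    [{"Experience Name": "Problem-Solving Puzzle", "Results": "Validated"}],
)
```

The requests are only constructed here; passing one to `urllib.request.urlopen` would send it to a running instance.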
See the specific document on the test plan. The main test scenarios will cover:
Unit tests will be set up on a range of sample data to ensure non-regression during development. These unit tests take an example from a data provider as input, and the output of the transformation is checked against a static output file.
These tests will be executed with Pytest. Extensive examples of data input and output are available here.
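A non-regression test of this kind could look like the sketch below. The `transform` function is a hypothetical stand-in for the translator's entry point, and the file names are assumptions; only the comparison against a static output file reflects the approach described above.

```python
import json
from pathlib import Path


def transform(records):
    """Hypothetical stand-in for the translator's transformation entry point."""
    return {
        "graph": [
            {"type": "soo:Experience", "prefLabel": record["Experience Name"]}
            for record in records
        ]
    }


def matches_expected(input_path, expected_path):
    """Transform the input file and compare it with the static output file."""
    records = json.loads(Path(input_path).read_text(encoding="utf-8"))
    expected = json.loads(Path(expected_path).read_text(encoding="utf-8"))
    return transform(records) == expected


def test_sample_provider(tmp_path):
    # tmp_path is pytest's built-in temporary directory fixture.
    input_file = tmp_path / "input.json"
    expected_file = tmp_path / "expected.json"
    input_file.write_text(
        json.dumps([{"Experience Name": "Problem-Solving Puzzle"}]),
        encoding="utf-8",
    )
    expected_file.write_text(
        json.dumps(
            {"graph": [{"type": "soo:Experience", "prefLabel": "Problem-Solving Puzzle"}]}
        ),
        encoding="utf-8",
    )
    assert matches_expected(input_file, expected_file)
```

In a real suite the input and expected files would be stored alongside the tests, one pair per data provider, rather than written by the test itself.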
Example of input data file (in JSON)
[
{
"Experience Name": "Problem-Solving Puzzle",
"User ID": "xx.yy@gmail.com",
"Date": "2023-06-28",
"Associated Soft Skill Block": "Problem-Solving",
"Results": "Validated"
}
]
Example of output data file (in JSON-LD)
{
"@context": {
"id": "@id",
"graph": {
"@id": "@graph",
"@container": "@set"
},
"type": {
"@id": "@type",
"@container": "@set"
},
"tr": "https://competencies.be/mindmatcher/translator",
"mms": "https://competencies.be/mindmatcher/schema",
"date": {
"@id": "mms:date",
"@type": "xsd:dateTime"
},
"keywords": {
"@id": "mms:keywords",
"@type": "xsd:string"
},
"picture": {
"@id": "mms:picture",
"@type": "xsd:string"
},
"title": {
"@id": "mms:title",
"@type": "xsd:string"
},
"url": {
"@id": "mms:url",
"@type": "xsd:string"
}
},
"graph": [
{
"id": "tr:__generated-id-1__",
"type": "soo:Experience",
"prefLabel": {"@value": "Problem-Solving Puzzle", "@language": "en"},
"profile": "tr:__profile-id-1__",
"dateFrom": "2023-06-28",
"skill": "tr:__skill-id-1__",
"result": "Validated"
},
{
"id":"tr:__profile-id-1__",
"type": "soo:Profile",
"email":"xx.yy@gmail.com",
"experience": "tr:__generated-id-1__"
},
{
"id": "tr:__skill-id-1__",
"type": "soo:Skill",
"experience": "tr:__generated-id-1__",
"sourceValue": "tr:__source-value-1",
"mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869",
"suggestions": [ "tr:sugg/1", "tr:sugg/2"],
"skillLevelValue": "Validated"
},
{
"id": "tr:__source-value-1",
"label": "Problem-Solving"
},
{
"id": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869",
"type": "esco:Skill",
"prefLabel": {"@value": "solve problems", "@language": "en"}
},
{
"id":"tr:sugg/1",
"type": "mms:Suggestion",
"source": "mm-search",
"score": 35,
"mapping": "esco:adc6dc11-3376-467b-96c5-9b0a21edc869"
},
{
"id":"tr:sugg/2",
"type": "mms:Suggestion",
"source": "mm-search",
"score": 10,
"mapping": "esco:1234"
}
]
}
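Because the output links nodes by identifier (an experience points to its profile and skill, a skill to its suggestions), a small consistency check can catch mistyped identifiers. The sketch below is one possible check, not part of the translator itself; it assumes the `id` and `graph` aliases from the example context and only follows references that start with the `tr:` prefix.

```python
def find_dangling_references(document, prefix="tr:"):
    """Return the references with the given prefix that match no node id."""
    nodes = document.get("graph", [])
    known_ids = {node["id"] for node in nodes if "id" in node}
    dangling = set()
    for node in nodes:
        for key, value in node.items():
            if key == "id":
                continue
            # Treat single values and lists of values the same way.
            values = value if isinstance(value, list) else [value]
            for item in values:
                if (
                    isinstance(item, str)
                    and item.startswith(prefix)
                    and item not in known_ids
                ):
                    dangling.add(item)
    return sorted(dangling)


sample = {
    "graph": [
        {"id": "tr:exp-1", "type": "soo:Experience", "skill": "tr:skill-1"},
        {"id": "tr:skill-1", "type": "soo:Skill", "suggestions": ["tr:sugg/1", "tr:sugg/2"]},
        {"id": "tr:sugg/1", "type": "mms:Suggestion", "score": 35},
    ]
}
dangling = find_dangling_references(sample)
```

Here `tr:sugg/2` is referenced but never defined, so it is the only entry reported.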
Integration tests will be conducted by setting up a “fake application” and automating the call of an exchange. This possibility depends on the ability of the PDC to provide a way to automate such exchanges (not possible as of today). These integration tests will be run by a Python or JavaScript script. They may also be run with Postman.
MindMatcher:
Headai:
Rejustify:
The Translator can be used when aggregating different data sources, in order to obtain a uniform data format and facilitate the processing of data from those sources.
In the given scenario, the Use Case Orchestrator also has the role of Data Provider and Data Consumer.