
Welcome to the documentation for the Dataset Recommendation Service.

FastAPI 0.1.0

DataGEMS Recommendation Service


POST /dataset-recsys/recommend

Get recommendations

Description

Retrieve the top-N recommendations for a given dataset.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The dataset identifier. |
| n | query | integer | 10 | No | Number of similar items to return. |
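
As a rough illustration of how these parameters fit together, the sketch below builds (but does not send) a request for this endpoint using only the Python standard library. The base URL and the token value are placeholders and not part of this documentation; the `Bearer` prefix is the convention FastAPI's `OAuth2PasswordBearer` scheme expects in the `Authorization` header.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; replace it with the address of the real deployment.
BASE_URL = "https://example.org"


def build_recommend_request(entity_id: str, token: str, n: int = 10) -> Request:
    """Build a POST request for /dataset-recsys/recommend without sending it."""
    # entity_id and n travel in the query string, not in a request body.
    query = urlencode({"entity_id": entity_id, "n": n})
    return Request(
        f"{BASE_URL}/dataset-recsys/recommend?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call.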

Responses

{
    "entity_id": "07382b91-5bc5-42f9-8391-33adc2460c19",
    "recommendations": [
        {
            "entity_id": "67f22d91-2b1a-4e8c-8f92-52dbc38c130f"
        },
        {
            "entity_id": "climate-data-v2"
        },
        {
            "entity_id": "european-soil-maps-2023"
        }
    ]
}
{
    "entity_id": "non-existent-or-unrelated.pdf",
    "recommendations": []
}
Schema of the response body
{
    "properties": {
        "entity_id": {
            "type": "string",
            "title": "Entity Id",
            "description": "The ID of the entity for which we want recommendations"
        },
        "recommendations": {
            "items": {
                "$ref": "#/components/schemas/Recommendation"
            },
            "type": "array",
            "title": "Recommendations",
            "description": "List of recommendations"
        }
    },
    "type": "object",
    "required": [
        "entity_id",
        "recommendations"
    ],
    "title": "RecsResponse"
}
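
A minimal sketch of reading a `RecsResponse` body on the client side. The function name `recommended_ids` is made up for this example; it relies only on the two fields the schema marks as required.

```python
import json
from typing import List


def recommended_ids(body: str) -> List[str]:
    """Return the recommended entity IDs from a RecsResponse JSON string."""
    payload = json.loads(body)
    # "recommendations" is required by the schema; it may be an empty list.
    return [item["entity_id"] for item in payload["recommendations"]]
```

An empty `recommendations` array, as in the second example above, simply yields an empty list.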
{
    "code": 102,
    "error": "Validation Error",
    "message": [
        {
            "Key": "query.entity_id",
            "Value": [
                "field required"
            ]
        }
    ]
}
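
The validation error above carries its details as a list of `Key`/`Value` entries. One possible way to turn that list into readable lines is sketched below; `validation_messages` is an illustrative helper, not part of the service.

```python
from typing import Any, Dict, List


def validation_messages(payload: Dict[str, Any]) -> List[str]:
    """Flatten the Key/Value entries of a validation error into text lines."""
    lines: List[str] = []
    for entry in payload.get("message", []):
        # Each entry names the offending field and lists one or more reasons.
        for reason in entry.get("Value", []):
            lines.append(f"{entry.get('Key', '?')}: {reason}")
    return lines
```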

{
    "code": 401,
    "error": "Could not validate credentials"
}

{
    "code": 403,
    "error": "Forbidden",
    "message": "Insufficient permissions for the requested entity_id."
}

{
    "code": 500,
    "error": "Internal server error"
}


POST /dataset-recsys/recommend/ap

Get recommendations via Analytical Pattern

Description

Processes an Analytical Pattern (AP) request by extracting the seed dataset and returning its top-N recommendations as part of an enriched graph.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |

Request body

{
    "nodes": [
        {
            "id": "user-session-123",
            "labels": [
                "User"
            ]
        },
        {
            "id": "task-rec-001",
            "labels": [
                "Task"
            ],
            "properties": {
                "name": "Get recommendations",
                "description": "Retrieve the top-N recommendations for a given dataset."
            }
        },
        {
            "id": "pattern-ds-recs",
            "labels": [
                "Analytical_Pattern"
            ],
            "properties": {
                "name": "Dataset-to-Dataset Recommendations AP",
                "description": "Pattern for discovering related datasets.",
                "process": "recommend",
                "publishedDate": "2026-01-21",
                "startTime": "10:00:00"
            }
        },
        {
            "id": "operator-recommender",
            "labels": [
                "DatasetRecommender_Operator"
            ],
            "properties": {
                "name": "GetRecommendations Operator",
                "command": "get_recommendations",
                "n": 10
            }
        },
        {
            "id": "07382b91-5bc5-42f9-8391-33adc2460c19",
            "labels": [
                "sc:Dataset"
            ],
            "properties": {
                "description": "The seed dataset used as the basis for recommendations."
            }
        }
    ],
    "edges": [
        {
            "from": "user-session-123",
            "to": "task-rec-001",
            "labels": [
                "request"
            ]
        },
        {
            "from": "task-rec-001",
            "to": "pattern-ds-recs",
            "labels": [
                "is_accomplished"
            ]
        },
        {
            "from": "pattern-ds-recs",
            "to": "operator-recommender",
            "labels": [
                "consist_of"
            ]
        },
        {
            "from": "operator-recommender",
            "to": "07382b91-5bc5-42f9-8391-33adc2460c19",
            "labels": [
                "input"
            ]
        }
    ]
}
Schema of the request body
{
    "additionalProperties": true,
    "type": "object",
    "title": "Analytical Pattern",
    "description": "The Analytical Pattern graph in JSON format"
}
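
Because the request schema accepts any JSON object, it can help to generate the graph programmatically. The sketch below mirrors the node and edge layout of the request example for an arbitrary seed dataset. The function name `build_ap_graph` is hypothetical, and it omits the descriptive properties (names, descriptions, dates); whether the service requires any of those is not stated here, so treat their omission as an assumption.

```python
from typing import Any, Dict, List

OPERATOR_LABEL = "DatasetRecommender_Operator"


def build_ap_graph(seed_id: str, n: int = 10) -> Dict[str, Any]:
    """Build an AP graph shaped like the request example for one seed dataset."""
    nodes: List[Dict[str, Any]] = [
        {"id": "user-session-123", "labels": ["User"]},
        {"id": "task-rec-001", "labels": ["Task"]},
        {"id": "pattern-ds-recs", "labels": ["Analytical_Pattern"],
         "properties": {"process": "recommend"}},
        {"id": "operator-recommender", "labels": [OPERATOR_LABEL],
         "properties": {"command": "get_recommendations", "n": n}},
        # The seed dataset node uses the dataset identifier as its node id.
        {"id": seed_id, "labels": ["sc:Dataset"]},
    ]
    # (from, to, label) triples, ending with Operator --input--> seed dataset.
    chain = [
        ("user-session-123", "task-rec-001", "request"),
        ("task-rec-001", "pattern-ds-recs", "is_accomplished"),
        ("pattern-ds-recs", "operator-recommender", "consist_of"),
        ("operator-recommender", seed_id, "input"),
    ]
    edges = [{"from": src, "to": dst, "labels": [label]} for src, dst, label in chain]
    return {"nodes": nodes, "edges": edges}
```

Serializing the result with `json.dumps` gives a body that can be posted to this endpoint.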

Responses

{
    "nodes": [
        {
            "id": "user-session-123",
            "labels": [
                "User"
            ]
        },
        {
            "id": "task-rec-001",
            "labels": [
                "Task"
            ],
            "properties": {
                "name": "Get recommendations",
                "description": "Retrieve the top-N recommendations for a given dataset."
            }
        },
        {
            "id": "pattern-ds-recs",
            "labels": [
                "Analytical_Pattern"
            ],
            "properties": {
                "name": "Dataset-to-Dataset Recommendations AP",
                "description": "Retrieve top-N recommendations for a given dataset.",
                "process": "recommend",
                "publishedDate": "2026-01-21",
                "startTime": "10:00:00"
            }
        },
        {
            "id": "operator-recommender",
            "labels": [
                "DatasetRecommender_Operator"
            ],
            "properties": {
                "name": "GetRecommendations Operator",
                "description": "Dataset-to-Dataset recommender operator.",
                "command": "get_recommendations",
                "n": 2
            }
        },
        {
            "id": "seed-dataset-001",
            "labels": [
                "sc:Dataset"
            ],
            "properties": {
                "name": "Seed Dataset"
            }
        },
        {
            "id": "recommended-entity-1",
            "labels": [
                "sc:Dataset"
            ],
            "properties": {
                "name": "Recommended Entity 1"
            }
        },
        {
            "id": "recommended-entity-2",
            "labels": [
                "sc:Dataset"
            ],
            "properties": {
                "name": "Recommended Entity 2"
            }
        }
    ],
    "edges": [
        {
            "from": "user-session-123",
            "to": "task-rec-001",
            "labels": [
                "request"
            ]
        },
        {
            "from": "task-rec-001",
            "to": "pattern-ds-recs",
            "labels": [
                "is_accomplished"
            ]
        },
        {
            "from": "pattern-ds-recs",
            "to": "operator-recommender",
            "labels": [
                "consist_of"
            ]
        },
        {
            "from": "operator-recommender",
            "to": "seed-dataset-001",
            "labels": [
                "input"
            ]
        },
        {
            "from": "operator-recommender",
            "to": "recommended-entity-1",
            "labels": [
                "output"
            ],
            "properties": {
                "rank": 1
            }
        },
        {
            "from": "operator-recommender",
            "to": "recommended-entity-2",
            "labels": [
                "output"
            ],
            "properties": {
                "rank": 2
            }
        }
    ]
}
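
In the enriched graph, each recommendation appears as an `output` edge carrying a `rank` property. A small sketch of recovering the ordered list from such a response follows; the helper name is illustrative, and edges without a rank are placed last as a defensive assumption.

```python
from typing import Any, Dict, List


def ranked_recommendations(graph: Dict[str, Any]) -> List[str]:
    """Return the target node IDs of all 'output' edges, ordered by rank."""
    outputs = [
        edge for edge in graph.get("edges", [])
        if "output" in edge.get("labels", [])
    ]
    # Edges that lack a rank sort after all ranked ones.
    outputs.sort(key=lambda e: e.get("properties", {}).get("rank", float("inf")))
    return [edge["to"] for edge in outputs]
```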

{
    "code": 403,
    "error": "Forbidden",
    "message": "Insufficient permissions for the requested entity_id."
}

{
    "code": 422,
    "error": "Malformed AP Graph: No input dataset linked to the Recommender Operator. Expected edge direction: Operator --input--> seed dataset."
}
{
    "code": 422,
    "error": "Malformed AP Graph: Operator 'DatasetRecommender_Operator' not found."
}
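
Both 422 errors describe structural problems that a client could catch before sending the graph. The check below is a sketch of that idea, written against the two error messages shown above; it is not the service's own validation logic, and `check_ap_graph` is a made-up name.

```python
from typing import Any, Dict, Optional

OPERATOR_LABEL = "DatasetRecommender_Operator"


def check_ap_graph(graph: Dict[str, Any]) -> Optional[str]:
    """Return a problem description, or None if the graph looks well-formed."""
    operator_ids = {
        node["id"] for node in graph.get("nodes", [])
        if OPERATOR_LABEL in node.get("labels", [])
    }
    if not operator_ids:
        return f"Operator '{OPERATOR_LABEL}' not found."
    # Expected edge direction: Operator --input--> seed dataset.
    has_input = any(
        edge.get("from") in operator_ids and "input" in edge.get("labels", [])
        for edge in graph.get("edges", [])
    )
    if not has_input:
        return "No input dataset linked to the Recommender Operator."
    return None
```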

{
    "code": 500,
    "error": "Internal server error"
}

MathE Recommendation Service


POST /dataset-recsys/mathe/recommend

Get material recommendations for a math question

Description

Given a MathE question ID, return a list of recommended PDF materials.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |

Request body

{
    "question": "\\int x^2 dx",
    "question_id": "6",
    "n": 10
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "properties": {
        "question": {
            "type": "string",
            "title": "Question",
            "description": "The math question in LaTeX format",
            "examples": [
                "\\int x^2 dx"
            ]
        },
        "question_id": {
            "type": "string",
            "title": "Question Id",
            "description": "The ID of the question to use as a seed",
            "examples": [
                "6"
            ]
        },
        "n": {
            "type": "integer",
            "exclusiveMinimum": 0.0,
            "title": "N",
            "description": "Number of materials to return",
            "default": 10
        }
    },
    "type": "object",
    "required": [
        "question",
        "question_id"
    ],
    "title": "MatheRecsRequest"
}
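
The schema requires `question` and `question_id` and constrains `n` to be strictly greater than 0 (`exclusiveMinimum`), with a default of 10. A sketch of building a body that respects those constraints is shown below; the function name is an assumption made for this example.

```python
import json


def mathe_request_body(question: str, question_id: str, n: int = 10) -> str:
    """Serialize a MatheRecsRequest, rejecting values the schema disallows."""
    if n <= 0:
        # exclusiveMinimum: 0 means that 0 itself is not a valid value.
        raise ValueError("n must be greater than 0")
    return json.dumps({"question": question, "question_id": question_id, "n": n})
```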

Responses

{
    "question_id": "string",
    "recommendations": [
        {
            "material_id": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "question_id": {
            "type": "string",
            "title": "Question Id",
            "description": "The MathE question ID used to generate the recommendations"
        },
        "recommendations": {
            "items": {
                "$ref": "#/components/schemas/MatheRecommendation"
            },
            "type": "array",
            "title": "Recommendations",
            "description": "List of recommended MathE materials"
        }
    },
    "type": "object",
    "required": [
        "question_id",
        "recommendations"
    ],
    "title": "MatheRecsResponse"
}

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

POST /dataset-recsys/mathe/sync

Sync Data

Description

Triggers the sync, OCR, and recommendation refresh process. The work runs in FastAPI BackgroundTasks, so the API returns immediately without waiting for the process to finish.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |

Responses

Schema of the response body


GET /dataset-recsys/mathe/status

Get Status

Description

Returns the current MathE sync status and metadata about the materials being processed.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |

Responses

Schema of the response body
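
Since the sync endpoint returns before the work is done, a client that needs to know when processing has finished would typically poll this status endpoint. The status payload is not documented here, so the sketch below takes both the fetch function and the completion test as caller-supplied callables; every name in it is hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_sync(
    fetch_status: Callable[[], Dict[str, Any]],
    is_finished: Callable[[Dict[str, Any]], bool],
    interval: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status until is_finished accepts the payload, then return it."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_finished(status):
            return status
        # No need to wait after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("sync did not finish within the allowed attempts")
```

Here `fetch_status` would wrap a GET to `/dataset-recsys/mathe/status`, and `is_finished` would inspect whichever field the deployed service uses to report completion.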

DataGEMS Dataset Management


POST /dataset-recsys/dataset/add

Add a dataset

Description

Adds a dataset to the DataGEMS recommender.

The service retrieves the dataset metadata, builds its embedding, stores it in the vector database, and updates the recommendation index. If the dataset already exists, the request is ignored to avoid duplicate entries.
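
The duplicate-safe behavior described above can be pictured with an in-memory store. This is only a conceptual sketch: the dictionary stands in for the vector database, `fetch_metadata` and `embed` are hypothetical callables, and the index update step is left out.

```python
from typing import Any, Callable, Dict, List


def add_dataset(
    entity_id: str,
    embeddings: Dict[str, List[float]],
    fetch_metadata: Callable[[str], Dict[str, Any]],
    embed: Callable[[Dict[str, Any]], List[float]],
) -> bool:
    """Store an embedding for entity_id; return False if it was already present."""
    if entity_id in embeddings:
        # Existing datasets are left untouched, so repeated calls are harmless.
        return False
    metadata = fetch_metadata(entity_id)
    embeddings[entity_id] = embed(metadata)
    return True
```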

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The dataset identifier to add. |

Responses

{
    "status": "success",
    "message": "Dataset ds_123 successfully added and recommendations updated."
}

{
    "code": 401,
    "error": "Could not validate credentials"
}

{
    "code": 403,
    "error": "Forbidden",
    "message": "Insufficient permissions for the requested entity_id."
}

{
    "code": 102,
    "error": "Validation Error",
    "message": [
        {
            "Key": "query.entity_id",
            "Value": [
                "field required"
            ]
        }
    ]
}

{
    "detail": "Dataset not found in external repository."
}

{
    "code": 500,
    "error": "Internal server error"
}
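
The error examples for this endpoint use two different shapes: one with `code`/`error`/`message` keys and one with a single `detail` key. A client may want to reduce both to one human-readable string. The helper below is a sketch of that normalization; its name and its fallback text are assumptions.

```python
from typing import Any, Dict


def error_summary(payload: Dict[str, Any]) -> str:
    """Reduce either documented error shape to a single message string."""
    if "detail" in payload:
        detail = payload["detail"]
        if isinstance(detail, str):
            return detail
        # HTTPValidationError: a list of objects, each with a "msg" field.
        return "; ".join(
            item["msg"] if isinstance(item, dict) and "msg" in item else str(item)
            for item in detail
        )
    error = str(payload.get("error", "Unknown error"))
    message = payload.get("message")
    # "message" is optional and is only appended when it is plain text.
    return f"{error}: {message}" if isinstance(message, str) else error
```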


POST /dataset-recsys/dataset/remove

Remove a dataset

Description

Removes a dataset from the DataGEMS recommender.

The service deletes the dataset embedding, removes its recommendation list, removes the dataset from the recommendation index, and removes references to it from other datasets' recommendation lists.
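
To make the cleanup steps concrete, here is a conceptual sketch that uses plain dictionaries in place of the vector database and the recommendation index. It is not the service's implementation: `embeddings` and `recommendations` are hypothetical stand-ins, and the keys of `recommendations` play the role of the index.

```python
from typing import Dict, List


def remove_dataset(
    entity_id: str,
    embeddings: Dict[str, List[float]],
    recommendations: Dict[str, List[str]],
) -> bool:
    """Remove entity_id everywhere; return False if it was not registered."""
    if entity_id not in embeddings:
        return False
    # Drop the embedding and the dataset's own recommendation list.
    del embeddings[entity_id]
    recommendations.pop(entity_id, None)
    # Remove the dataset from every other dataset's recommendation list.
    for seed in list(recommendations):
        recommendations[seed] = [r for r in recommendations[seed] if r != entity_id]
    return True
```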

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The unique identifier of the dataset to be removed. |

Responses

{
    "status": "success",
    "message": "Dataset ds_123 removed."
}

{
    "detail": "Dataset not found in the recommendation engine."
}

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "type": "array",
            "title": "Detail"
        }
    },
    "type": "object",
    "title": "HTTPValidationError"
}

POST /dataset-recsys/dataset/exist

Check dataset existence

Description

Checks whether one or more datasets are currently registered in the DataGEMS recommendation index.

The endpoint receives a list of dataset IDs and returns a mapping from each ID to a boolean value indicating whether it exists in the recommender.

Input parameters

| Parameter | In | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- | --- |
| OAuth2PasswordBearer | header | string | N/A | No | |

Request body

[
    "ds_123",
    "ds_456"
]
Schema of the request body
{
    "items": {
        "type": "string"
    },
    "type": "array",
    "title": "Entity Ids",
    "description": "A list of dataset IDs to verify."
}

Responses

{
    "ds_123": true,
    "ds_456": false
}
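
A short sketch of working with this endpoint from a client: one helper serializes the ID list and refuses an empty one (matching the "cannot be empty" error documented for this endpoint), and another picks out the IDs reported as missing. Both function names are illustrative.

```python
import json
from typing import Dict, List, Sequence


def exist_request_body(entity_ids: Sequence[str]) -> str:
    """Serialize the list of dataset IDs, rejecting an empty list up front."""
    if not entity_ids:
        raise ValueError("List of entity_ids cannot be empty.")
    return json.dumps(list(entity_ids))


def missing_ids(response: Dict[str, bool]) -> List[str]:
    """Return the IDs that the service reports as not registered."""
    return [entity_id for entity_id, present in response.items() if not present]
```

The result of `missing_ids` could then be fed, one ID at a time, to `/dataset-recsys/dataset/add`.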

{
    "detail": "Not authenticated"
}

{
    "detail": "Access denied"
}

{
    "detail": "List of entity_ids cannot be empty."
}

{
    "detail": "Internal system failure"
}

Service Health


GET /dataset-recsys/health

Health check

Description

Check if the API, Redis, and vector database are responsive.

Responses

Schema of the response body


GET /dataset-recsys/

Root endpoint

Description

Root endpoint to verify that the service is running.

Responses

Schema of the response body


GET /dataset-recsys/debug/schema

Get database schema

Description

Retrieve the database schema for the embedding storage.

Responses

Schema of the response body


Schemas

HTTPValidationError

| Name | Type | Description |
| --- | --- | --- |
| detail | Array<ValidationError> | |

MatheRecommendation

| Name | Type | Description |
| --- | --- | --- |
| material_id | string | The recommended MathE material ID |

MatheRecsRequest

| Name | Type | Description |
| --- | --- | --- |
| n | integer | Number of materials to return |
| question | string | The math question in LaTeX format |
| question_id | string | The ID of the question to use as a seed |

MatheRecsResponse

| Name | Type | Description |
| --- | --- | --- |
| question_id | string | The MathE question ID used to generate the recommendations |
| recommendations | Array<MatheRecommendation> | List of recommended MathE materials |

Recommendation

| Name | Type | Description |
| --- | --- | --- |
| entity_id | string | The recommended entity ID |

RecsResponse

| Name | Type | Description |
| --- | --- | --- |
| entity_id | string | The ID of the entity for which we want recommendations |
| recommendations | Array<Recommendation> | List of recommendations |

ValidationError

| Name | Type | Description |
| --- | --- | --- |
| loc | Array<> | |
| msg | string | |
| type | string | |

Security schemes

| Name | Type | Scheme | Description |
| --- | --- | --- | --- |
| OAuth2PasswordBearer | oauth2 | | |