Welcome to the documentation for the Dataset Recommendation Service.
FastAPI 0.1.0¶
DataGEMS Recommendation Service¶
POST /dataset-recsys/recommend¶
Get recommendations
Description
Retrieve the top-N recommendations for a given dataset.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The dataset identifier. |
| n | query | integer | 10 | No | Number of similar items to return. |
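The following is a minimal sketch of how a client might assemble this call, using only the Python standard library. The host `https://example.org` and the helper name `build_recommend_request` are assumptions made for illustration; only the path, the `entity_id` and `n` query parameters, and the bearer token header come from this page. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder host; replace with the real deployment URL.
BASE_URL = "https://example.org"


def build_recommend_request(entity_id: str, token: str, n: int = 10) -> Request:
    """Build (but do not send) a POST /dataset-recsys/recommend request."""
    if not entity_id:
        raise ValueError("entity_id is required")
    # entity_id and n travel in the query string, not in a request body.
    query = urlencode({"entity_id": entity_id, "n": n})
    return Request(
        f"{BASE_URL}/dataset-recsys/recommend?{query}",
        method="POST",
        # OAuth2PasswordBearer expects the token in the Authorization header.
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_recommend_request(
    "07382b91-5bc5-42f9-8391-33adc2460c19", "my-token", n=3
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; any HTTP client that supports query parameters and custom headers would work equally well.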
Responses
{
"entity_id": "07382b91-5bc5-42f9-8391-33adc2460c19",
"recommendations": [
{
"entity_id": "67f22d91-2b1a-4e8c-8f92-52dbc38c130f"
},
{
"entity_id": "climate-data-v2"
},
{
"entity_id": "european-soil-maps-2023"
}
]
}
{
"entity_id": "non-existent-or-unrelated.pdf",
"recommendations": []
}
Schema of the response body
{
"properties": {
"entity_id": {
"type": "string",
"title": "Entity Id",
"description": "The ID of the entity for which we want recommendations"
},
"recommendations": {
"items": {
"$ref": "#/components/schemas/Recommendation"
},
"type": "array",
"title": "Recommendations",
"description": "List of recommendations"
}
},
"type": "object",
"required": [
"entity_id",
"recommendations"
],
"title": "RecsResponse"
}
{
"code": 102,
"error": "Validation Error",
"message": [
{
"Key": "query.entity_id",
"Value": [
"field required"
]
}
]
}
Schema of the response body
{
"code": 401,
"error": "Could not validate credentials"
}
Schema of the response body
{
"code": 403,
"error": "Forbidden",
"message": "Insufficient permissions for the requested entity_id."
}
Schema of the response body
{
"code": 500,
"error": "Internal server error"
}
Schema of the response body
POST /dataset-recsys/recommend/ap¶
Get recommendations via Analytical Pattern
Description
Processes an Analytical Pattern (AP) request by extracting the seed dataset and returning its top-N recommendations as part of an enriched graph.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
Request body
{
"nodes": [
{
"id": "user-session-123",
"labels": [
"User"
]
},
{
"id": "task-rec-001",
"labels": [
"Task"
],
"properties": {
"name": "Get recommendations",
"description": "Retrieve the top-N recommendations for a given dataset."
}
},
{
"id": "pattern-ds-recs",
"labels": [
"Analytical_Pattern"
],
"properties": {
"name": "Dataset-to-Dataset Recommendations AP",
"description": "Pattern for discovering related datasets.",
"process": "recommend",
"publishedDate": "2026-01-21",
"startTime": "10:00:00"
}
},
{
"id": "operator-recommender",
"labels": [
"DatasetRecommender_Operator"
],
"properties": {
"name": "GetRecommendations Operator",
"command": "get_recommendations",
"n": 10
}
},
{
"id": "07382b91-5bc5-42f9-8391-33adc2460c19",
"labels": [
"sc:Dataset"
],
"properties": {
"description": "The seed dataset used as the basis for recommendations."
}
}
],
"edges": [
{
"from": "user-session-123",
"to": "task-rec-001",
"labels": [
"request"
]
},
{
"from": "task-rec-001",
"to": "pattern-ds-recs",
"labels": [
"is_accomplished"
]
},
{
"from": "pattern-ds-recs",
"to": "operator-recommender",
"labels": [
"consist_of"
]
},
{
"from": "operator-recommender",
"to": "07382b91-5bc5-42f9-8391-33adc2460c19",
"labels": [
"input"
]
}
]
}
Schema of the request body
{
"additionalProperties": true,
"type": "object",
"title": "Analytical Pattern",
"description": "The Analytical Pattern graph in JSON format"
}
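To make the graph convention concrete, here is a hedged sketch of how the seed dataset could be located in an AP graph: find the node labelled `DatasetRecommender_Operator`, then follow its `input` edge. This is an illustration based on the example request and the documented error messages, not the service's actual implementation; the function name `find_seed_dataset` is hypothetical.

```python
OPERATOR_LABEL = "DatasetRecommender_Operator"


def find_seed_dataset(graph: dict) -> str:
    """Return the id of the dataset the operator points to via an 'input' edge."""
    operators = [
        node["id"]
        for node in graph.get("nodes", [])
        if OPERATOR_LABEL in node.get("labels", [])
    ]
    if not operators:
        raise ValueError(f"Operator '{OPERATOR_LABEL}' not found.")
    # Expected edge direction: Operator --input--> seed dataset.
    for edge in graph.get("edges", []):
        if edge.get("from") in operators and "input" in edge.get("labels", []):
            return edge["to"]
    raise ValueError("No input dataset linked to the Recommender Operator.")


graph = {
    "nodes": [
        {"id": "operator-recommender", "labels": ["DatasetRecommender_Operator"]},
        {"id": "07382b91-5bc5-42f9-8391-33adc2460c19", "labels": ["sc:Dataset"]},
    ],
    "edges": [
        {
            "from": "operator-recommender",
            "to": "07382b91-5bc5-42f9-8391-33adc2460c19",
            "labels": ["input"],
        }
    ],
}
seed_id = find_seed_dataset(graph)
print(seed_id)
```

Running the same check on the client side before submitting a graph is one way to avoid the malformed-graph errors listed under the responses.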
Responses
{
"nodes": [
{
"id": "user-session-123",
"labels": [
"User"
]
},
{
"id": "task-rec-001",
"labels": [
"Task"
],
"properties": {
"name": "Get recommendations",
"description": "Retrieve the top-N recommendations for a given dataset."
}
},
{
"id": "pattern-ds-recs",
"labels": [
"Analytical_Pattern"
],
"properties": {
"name": "Dataset-to-Dataset Recommendations AP",
"description": "Retrieve top-N recommendations for a given dataset.",
"process": "recommend",
"publishedDate": "2026-01-21",
"startTime": "10:00:00"
}
},
{
"id": "operator-recommender",
"labels": [
"DatasetRecommender_Operator"
],
"properties": {
"name": "GetRecommendations Operator",
"description": "Dataset-to-Dataset recommender operator.",
"command": "get_recommendations",
"n": 2
}
},
{
"id": "seed-dataset-001",
"labels": [
"sc:Dataset"
],
"properties": {
"name": "Seed Dataset"
}
},
{
"id": "recommended-entity-1",
"labels": [
"sc:Dataset"
],
"properties": {
"name": "Recommended Entity 1"
}
},
{
"id": "recommended-entity-2",
"labels": [
"sc:Dataset"
],
"properties": {
"name": "Recommended Entity 2"
}
}
],
"edges": [
{
"from": "user-session-123",
"to": "task-rec-001",
"labels": [
"request"
]
},
{
"from": "task-rec-001",
"to": "pattern-ds-recs",
"labels": [
"is_accomplished"
]
},
{
"from": "pattern-ds-recs",
"to": "operator-recommender",
"labels": [
"consist_of"
]
},
{
"from": "operator-recommender",
"to": "seed-dataset-001",
"labels": [
"input"
]
},
{
"from": "operator-recommender",
"to": "recommended-entity-1",
"labels": [
"output"
],
"properties": {
"rank": 1
}
},
{
"from": "operator-recommender",
"to": "recommended-entity-2",
"labels": [
"output"
],
"properties": {
"rank": 2
}
}
]
}
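A client usually only needs the ordered list of recommended dataset IDs from the enriched graph. The sketch below assumes, based on the example above, that each recommendation is an edge labelled `output` whose `properties.rank` gives its position; the helper name `ranked_recommendations` is hypothetical.

```python
def ranked_recommendations(graph: dict) -> list[str]:
    """Return the targets of 'output' edges, ordered by their rank property."""
    outputs = [
        edge
        for edge in graph.get("edges", [])
        if "output" in edge.get("labels", [])
    ]
    # Edges without a rank sort last instead of raising an error.
    outputs.sort(key=lambda e: e.get("properties", {}).get("rank", float("inf")))
    return [edge["to"] for edge in outputs]


# A trimmed version of the response above, with the edges deliberately shuffled.
response_graph = {
    "edges": [
        {"from": "operator-recommender", "to": "recommended-entity-2",
         "labels": ["output"], "properties": {"rank": 2}},
        {"from": "operator-recommender", "to": "seed-dataset-001",
         "labels": ["input"]},
        {"from": "operator-recommender", "to": "recommended-entity-1",
         "labels": ["output"], "properties": {"rank": 1}},
    ]
}
recommended = ranked_recommendations(response_graph)
print(recommended)
```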
Schema of the response body
{
"code": 403,
"error": "Forbidden",
"message": "Insufficient permissions for the requested entity_id."
}
Schema of the response body
{
"code": 422,
"error": "Malformed AP Graph: No input dataset linked to the Recommender Operator. Expected edge direction: Operator --input--> seed dataset."
}
{
"code": 422,
"error": "Malformed AP Graph: Operator 'DatasetRecommender_Operator' not found."
}
Schema of the response body
{
"code": 500,
"error": "Internal server error"
}
Schema of the response body
MathE Recommendation Service¶
POST /dataset-recsys/mathe/recommend¶
Get material recommendations for a math question
Description
Given a MathE question ID, return a list of recommended PDF materials.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
Request body
{
"question": "\\int x^2 dx",
"question_id": "6",
"n": 10
}
Schema of the request body
{
"properties": {
"question": {
"type": "string",
"title": "Question",
"description": "The math question in LaTeX format",
"examples": [
"\\int x^2 dx"
]
},
"question_id": {
"type": "string",
"title": "Question Id",
"description": "The ID of the question to use as a seed",
"examples": [
"6"
]
},
"n": {
"type": "integer",
"exclusiveMinimum": 0.0,
"title": "N",
"description": "Number of materials to return",
"default": 10
}
},
"type": "object",
"required": [
"question",
"question_id"
],
"title": "MatheRecsRequest"
}
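The schema constraints can be mirrored on the client so that invalid payloads are rejected before they are sent. The sketch below is an assumption-level illustration using a standard-library dataclass: it reuses the field names, the default of 10, and the `exclusiveMinimum` of 0 from the schema above, but it is not the service's own model class.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MatheRecsRequest:
    """Client-side mirror of the MatheRecsRequest schema."""

    question: str      # the math question in LaTeX format (required)
    question_id: str   # the ID of the seed question (required)
    n: int = 10        # number of materials to return

    def __post_init__(self) -> None:
        # exclusiveMinimum: 0 means n must be strictly positive.
        if self.n <= 0:
            raise ValueError("n must be greater than 0")


request_model = MatheRecsRequest(question=r"\int x^2 dx", question_id="6")
payload = json.dumps(asdict(request_model))
print(payload)
```

The resulting `payload` string is what a client would send as the JSON request body.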
Responses
{
"question_id": "string",
"recommendations": [
{
"material_id": "string"
}
]
}
Schema of the response body
{
"properties": {
"question_id": {
"type": "string",
"title": "Question Id",
"description": "The MathE question ID used to generate the recommendations"
},
"recommendations": {
"items": {
"$ref": "#/components/schemas/MatheRecommendation"
},
"type": "array",
"title": "Recommendations",
"description": "List of recommended MathE materials"
}
},
"type": "object",
"required": [
"question_id",
"recommendations"
],
"title": "MatheRecsResponse"
}
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
}
POST /dataset-recsys/mathe/sync¶
Sync Data
Description
Triggers the sync, OCR, and recommendation refresh process. The work runs as a FastAPI background task (BackgroundTasks), so the API returns immediately without waiting for it to finish.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
Responses
Schema of the response body
GET /dataset-recsys/mathe/status¶
Get Status
Description
Returns the current MathE sync status and metadata about the materials being processed.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
Responses
Schema of the response body
DataGEMS Dataset Management¶
POST /dataset-recsys/dataset/add¶
Add a dataset
Description
Adds a dataset to the DataGEMS recommender.
The service retrieves the dataset metadata, builds its embedding, stores it in the vector database, and updates the recommendation index. If the dataset already exists, the request is ignored to avoid duplicate entries.
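The add flow described above can be sketched with in-memory stand-ins. Everything here is hypothetical and for illustration only: the class name, the `fetch_metadata` and `embed` callables, and the use of a dict and a set in place of the real vector database and recommendation index. The point is the idempotent behaviour, where a second add for the same ID is ignored.

```python
from typing import Callable, Dict, List, Set


class InMemoryRecommender:
    """Toy stand-in for the recommender's storage; not the real service."""

    def __init__(
        self,
        fetch_metadata: Callable[[str], dict],
        embed: Callable[[dict], List[float]],
    ) -> None:
        self.fetch_metadata = fetch_metadata      # hypothetical metadata lookup
        self.embed = embed                        # hypothetical embedding function
        self.vectors: Dict[str, List[float]] = {}  # stands in for the vector database
        self.index: Set[str] = set()               # stands in for the recommendation index

    def add(self, entity_id: str) -> bool:
        """Add a dataset; return False if it was already registered."""
        if entity_id in self.index:
            return False  # duplicate request: ignore it
        metadata = self.fetch_metadata(entity_id)
        self.vectors[entity_id] = self.embed(metadata)
        self.index.add(entity_id)
        return True


recommender = InMemoryRecommender(
    fetch_metadata=lambda entity_id: {"title": f"Dataset {entity_id}"},
    embed=lambda metadata: [float(len(metadata["title"]))],
)
first = recommender.add("ds_123")
second = recommender.add("ds_123")
print(first, second)
```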
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The dataset identifier to add. |
Responses
{
"status": "success",
"message": "Dataset ds_123 successfully added and recommendations updated."
}
Schema of the response body
{
"code": 401,
"error": "Could not validate credentials"
}
Schema of the response body
{
"code": 403,
"error": "Forbidden",
"message": "Insufficient permissions for the requested entity_id."
}
Schema of the response body
{
"code": 102,
"error": "Validation Error",
"message": [
{
"Key": "query.entity_id",
"Value": [
"field required"
]
}
]
}
Schema of the response body
{
"detail": "Dataset not found in external repository."
}
Schema of the response body
{
"code": 500,
"error": "Internal server error"
}
Schema of the response body
POST /dataset-recsys/dataset/remove¶
Remove a dataset
Description
Removes a dataset from the DataGEMS recommender.
The service deletes the dataset embedding, removes its recommendation list, removes the dataset from the recommendation index, and cleans references to it from other recommendation lists.
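The four cleanup steps can be illustrated with plain Python containers. This is a sketch under assumptions, not the service's code: the function name `remove_dataset` and the dict and set data structures are hypothetical stand-ins for the vector database, the stored recommendation lists, and the recommendation index.

```python
from typing import Dict, List, Set


def remove_dataset(
    entity_id: str,
    vectors: Dict[str, List[float]],
    rec_lists: Dict[str, List[str]],
    index: Set[str],
) -> None:
    """Remove a dataset and every reference to it from the toy stores."""
    if entity_id not in index:
        raise KeyError("Dataset not found in the recommendation engine.")
    vectors.pop(entity_id, None)     # 1. delete the dataset embedding
    rec_lists.pop(entity_id, None)   # 2. remove its own recommendation list
    index.discard(entity_id)         # 3. remove it from the recommendation index
    for other in list(rec_lists):    # 4. clean references in other lists
        rec_lists[other] = [r for r in rec_lists[other] if r != entity_id]


vectors = {"ds_123": [0.1], "ds_456": [0.2]}
rec_lists = {"ds_123": ["ds_456"], "ds_456": ["ds_123"]}
index = {"ds_123", "ds_456"}
remove_dataset("ds_123", vectors, rec_lists, index)
print(vectors, rec_lists, index)
```

The last step matters because a removed dataset could otherwise still appear as a recommendation for other datasets.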
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
| entity_id | query | string | | No | The unique identifier of the dataset to be removed. |
Responses
{
"status": "success",
"message": "Dataset ds_123 removed."
}
Schema of the response body
{
"detail": "Dataset not found in the recommendation engine."
}
Schema of the response body
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
}
POST /dataset-recsys/dataset/exist¶
Check dataset existence
Description
Checks whether one or more datasets are currently registered in the DataGEMS recommendation index.
The endpoint receives a list of dataset IDs and returns a mapping from each ID to a boolean value indicating whether it exists in the recommender.
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| OAuth2PasswordBearer | header | string | N/A | No | |
Request body
[
"ds_123",
"ds_456"
]
Schema of the request body
{
"items": {
"type": "string"
},
"type": "array",
"title": "Entity Ids",
"description": "A list of dataset IDs to verify."
}
Responses
{
"ds_123": true,
"ds_456": false
}
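As a rough illustration of using this endpoint, the sketch below builds the request with the standard library and picks out the IDs that are not registered. The host `https://example.org` and both helper names are assumptions; the JSON list body and the ID-to-boolean response shape follow the examples above. The request is built but not sent.

```python
import json
from typing import Dict, List
from urllib.request import Request

# Assumption: placeholder host; replace with the real deployment URL.
BASE_URL = "https://example.org"


def build_exist_request(entity_ids: List[str], token: str) -> Request:
    """Build (but do not send) a POST /dataset-recsys/dataset/exist request."""
    if not entity_ids:
        # Mirrors the server-side rule that the list cannot be empty.
        raise ValueError("List of entity_ids cannot be empty.")
    return Request(
        f"{BASE_URL}/dataset-recsys/dataset/exist",
        data=json.dumps(list(entity_ids)).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def missing_ids(existence: Dict[str, bool]) -> List[str]:
    """Return the IDs the recommender reported as not registered."""
    return sorted(key for key, present in existence.items() if not present)


exist_req = build_exist_request(["ds_123", "ds_456"], "my-token")
not_registered = missing_ids({"ds_123": True, "ds_456": False})
print(not_registered)
```

A typical follow-up would be to call the add endpoint for each ID returned by `missing_ids`.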
Schema of the response body
{
"detail": "Not authenticated"
}
Schema of the response body
{
"detail": "Access denied"
}
Schema of the response body
{
"detail": "List of entity_ids cannot be empty."
}
Schema of the response body
{
"detail": "Internal system failure"
}
Schema of the response body
Service Health¶
GET /dataset-recsys/health¶
Health check
Description
Check if the API, Redis, and vector database are responsive.
Responses
Schema of the response body
GET /dataset-recsys/¶
Root endpoint
Description
Root endpoint to verify that the service is running.
Responses
Schema of the response body
GET /dataset-recsys/debug/schema¶
Get database schema
Description
Retrieve the database schema for the embedding storage.
Responses
Schema of the response body
Schemas¶
HTTPValidationError¶
| Name | Type | Description |
|---|---|---|
| detail | Array<ValidationError> | |
MatheRecommendation¶
| Name | Type | Description |
|---|---|---|
| material_id | string | The recommended MathE material ID |
MatheRecsRequest¶
| Name | Type | Description |
|---|---|---|
| n | integer | Number of materials to return |
| question | string | The math question in LaTeX format |
| question_id | string | The ID of the question to use as a seed |
MatheRecsResponse¶
| Name | Type | Description |
|---|---|---|
| question_id | string | The MathE question ID used to generate the recommendations |
| recommendations | Array<MatheRecommendation> | List of recommended MathE materials |
Recommendation¶
| Name | Type | Description |
|---|---|---|
| entity_id | string | The recommended entity ID |
RecsResponse¶
| Name | Type | Description |
|---|---|---|
| entity_id | string | The ID of the entity for which we want recommendations |
| recommendations | Array<Recommendation> | List of recommendations |
ValidationError¶
| Name | Type | Description |
|---|---|---|
| loc | Array<> | |
| msg | string | |
| type | string | |
Security schemes¶
| Name | Type | Scheme | Description |
|---|---|---|---|
| OAuth2PasswordBearer | oauth2 | | |