
FastAPI 0.0.1

Endpoints


GET /

Read Root

Response 200 OK

Schema of the response body

Health


GET /monitoring/health-check

Check Health

Description

Check the health status of the Dataset Profiler service and its dependencies.

This endpoint verifies the connectivity and operational status of critical service dependencies:

  • Redis: Used for job status tracking and caching
  • Ray: Used for distributed computing tasks

Returns

A status report containing health information for each dependency:

  • redis: Status and connection details
  • ray: Status and cluster information

Raises

  • HTTPException: 503 Service Unavailable if any dependency is unhealthy

Example

{
    "redis": {
        "status": "healthy",
        "message": "Connected to Redis at localhost:6379"
    },
    "ray": {
        "status": "healthy",
        "message": "Ray cluster reachable with 3 alive node(s)"
    }
}
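A client can call this endpoint and treat any non-200 response as a failed check. The following is a minimal standard-library sketch. The base URL `http://localhost:8000` is a placeholder assumption, and `check_health` and `unhealthy_dependencies` are hypothetical helper names that are not part of the service. The body returned alongside the 503 is not documented here, so the sketch parses it defensively.

```python
import json
import urllib.error
import urllib.request

# Hypothetical base URL; replace with wherever the Dataset Profiler is served.
BASE_URL = "http://localhost:8000"


def unhealthy_dependencies(report: dict) -> list[str]:
    """Return the names of dependencies whose status is anything other than "healthy"."""
    return sorted(
        name
        for name, info in report.items()
        if not isinstance(info, dict) or info.get("status") != "healthy"
    )


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> tuple[int, dict]:
    """Call GET /monitoring/health-check and return (HTTP status code, parsed body)."""
    url = f"{base_url}/monitoring/health-check"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # A 503 means at least one dependency is unhealthy; the error body
        # shape is not documented, so fall back to an empty dict.
        try:
            body = json.load(err)
        except json.JSONDecodeError:
            body = {}
        return err.code, body
```

With a 200 response, `unhealthy_dependencies(body)` on the example report above returns an empty list.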

Response 200 OK

Schema of the response body
{
    "additionalProperties": true,
    "title": "Response Check Health Monitoring Health Check Get",
    "type": "object"
}

Profiler


POST /profiler/clean_up

Clean Up Job

Description

Clean up resources associated with a completed profiling job.

This endpoint releases resources and temporary storage used during the profiling process. It should be called after retrieving and storing the profile data to free up system resources.

Parameters

  • clean_up_req (CleanUpRequest): Request containing:
      • profile_job_id: The unique identifier of the profiling job to clean up

Returns

  • dict: A response indicating the success of the cleanup operation

Example

{
  "detail": "SUCCESS"
}

Note

This endpoint is currently a placeholder and cleanup functionality is not yet implemented.
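Because the request body carries a single field, a client call is short. A minimal standard-library sketch, assuming the service is reachable at a caller-supplied base URL; `build_cleanup_request` and `clean_up` are hypothetical helper names. Since the endpoint is currently a placeholder, a `{"detail": "SUCCESS"}` response does not yet mean that resources were released.

```python
import json
import urllib.request


def build_cleanup_request(base_url: str, profile_job_id: str) -> urllib.request.Request:
    """Build POST /profiler/clean_up with a CleanUpRequest JSON body."""
    body = json.dumps({"profile_job_id": profile_job_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/profiler/clean_up",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean_up(base_url: str, profile_job_id: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON body, e.g. {"detail": "SUCCESS"}."""
    req = build_cleanup_request(base_url, profile_job_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```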

Request body

{
    "profile_job_id": "string"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "description": "Request model for cleaning up resources associated with a profiling job.\n\n## Attributes\n* **profile_job_id** (str): The unique identifier of the profiling job to clean up\n    \n## Example\n```json\n{\n  \"profile_job_id\": \"550e8400-e29b-41d4-a716-446655440000\"\n}\n```",
    "properties": {
        "profile_job_id": {
            "title": "Profile Job Id",
            "type": "string"
        }
    },
    "required": [
        "profile_job_id"
    ],
    "title": "CleanUpRequest",
    "type": "object"
}

Response 200 OK

Schema of the response body
{
    "additionalProperties": true,
    "title": "Response Clean Up Job Profiler Clean Up Post",
    "type": "object"
}

Response 422 Unprocessable Content

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}
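Every 422 response in this reference uses this `HTTPValidationError` shape, so one helper can turn it into readable messages. A minimal sketch; `format_validation_errors` is a hypothetical name, and the sample error below is illustrative rather than captured from the service.

```python
def format_validation_errors(body: dict) -> list[str]:
    """Flatten an HTTPValidationError body into "location: message" strings."""
    messages = []
    for error in body.get("detail", []):
        # loc is a path such as ["body", "profile_job_id"]; join it with dots.
        location = ".".join(str(part) for part in error.get("loc", []))
        messages.append(f"{location}: {error.get('msg', '')}")
    return messages


# Illustrative body for a POST /profiler/clean_up call that omits profile_job_id.
example = {
    "detail": [
        {"loc": ["body", "profile_job_id"], "msg": "Field required", "type": "missing"}
    ]
}
print(format_validation_errors(example))  # ['body.profile_job_id: Field required']
```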

GET /profiler/job_status/{profile_job_id}

Get Job Status

Description

Check the detailed status of a profiling job.

This endpoint retrieves the current status of a profiling job from the job store. It reports the job's progress through the profiling pipeline in more detail than the Ray task status returned by GET /profiler/runner_status/{profile_job_id}.

Parameters

  • profile_job_id (str): The unique identifier of the profiling job

Returns

  • JobStatus: The current status of the profiling job, returned as one of the following lowercase string values:
      • submitting: The job is being submitted to the processing queue
      • starting: The job has been accepted and is starting
      • light_profile_ready: The light profile (basic metadata) is ready
      • heavy_profile_ready: The heavy profile (including record sets) is ready
      • cleaned_up: Resources associated with the job have been cleaned up

Example

"light_profile_ready"

Input parameters

Parameter       In    Type    Default  Nullable  Description
profile_job_id  path  string  (none)   No        The unique identifier of the profiling job

Response 200 OK

"submitting"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "Enumeration of possible profiling job statuses.\n\n## Values\n* **SUBMITTING**: The job is being submitted to the processing queue\n* **STARTING**: The job has been accepted and is starting\n* **LIGHT_PROFILE_READY**: The light profile (basic metadata) is ready\n* **HEAVY_PROFILES_READY**: The heavy profile (including record sets) is ready\n* **CLEANED_UP**: Resources associated with the job have been cleaned up",
    "enum": [
        "submitting",
        "starting",
        "light_profile_ready",
        "heavy_profile_ready",
        "cleaned_up"
    ],
    "title": "JobStatus",
    "type": "string"
}
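Because profiling runs asynchronously, a client typically polls GET /profiler/job_status/{profile_job_id} until the job reaches the state it needs. A minimal sketch, assuming the lowercase values from the `JobStatus` schema above and a caller-chosen interval and attempt limit; `fetch_job_status` and `wait_for_status` are hypothetical helpers. The schema defines no failure value, so the loop relies on `max_attempts` to stop; the runner status endpoint is one way to detect a failed Ray task.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def fetch_job_status(base_url: str, profile_job_id: str, timeout: float = 10.0) -> str:
    """GET /profiler/job_status/{profile_job_id}; the body is a JSON string."""
    job_id = urllib.parse.quote(profile_job_id, safe="")
    url = f"{base_url}/profiler/job_status/{job_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_for_status(
    fetch: Callable[[], str],
    targets: set[str],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it returns a value in targets; raise TimeoutError if it never does."""
    for attempt in range(max_attempts):
        status = fetch()
        if status in targets:
            return status
        if attempt < max_attempts - 1:
            sleep(interval)  # no sleep after the final attempt
    raise TimeoutError(f"no status in {sorted(targets)} after {max_attempts} attempts")
```

For a full profile, a call such as `wait_for_status(lambda: fetch_job_status(base_url, job_id), {"heavy_profile_ready"})` would block until the heavy profile is ready. Injecting `fetch` and `sleep` keeps the loop testable without a running service.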

Response 422 Unprocessable Content

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

GET /profiler/profile/{profile_job_id}

Get Profile

Description

Retrieve the generated profile for a completed profiling job.

This endpoint returns the profile data generated for a dataset, including both light and heavy profiles if available. The profile contains metadata about the dataset structure, content, and characteristics.

Parameters

  • profile_job_id (str): The unique identifier of the profiling job

Returns

  • ProfilesResponse: The generated profiles, containing:
      • moma_profile_light: Basic metadata about the dataset and its distributions
      • moma_profile_heavy: Detailed information about record sets and fields
      • cdd_profile: Profile used by the Cross-Dataset Discovery service

Raises

  • HTTPException: 404 Not Found if no profile exists for the given job ID or if profiling is still in progress

Example

{
  "moma_profile_light": {
    "@context": {...},
    "@type": "sc:Dataset",
    "name": "Mathematics Learning Assessment",
    "description": "...",
    "distribution": [...]
  },
  "moma_profile_heavy": {
    "@context": {...},
    "@type": "sc:Dataset",
    "recordSet": [...]
  },
  "cdd_profile": {}
}
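This endpoint answers 404 while profiling is still in progress, so a client can treat 404 as "not ready yet" and then read the record sets from the heavy profile. A minimal sketch with hypothetical helper names (`get_profile`, `record_set_fields`); it assumes the `recordSet` and `field` layout shown in the `ProfilesResponse` schema example below.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def get_profile(base_url: str, profile_job_id: str, timeout: float = 30.0) -> Optional[dict]:
    """Return the parsed ProfilesResponse, or None while no profile exists yet (404)."""
    job_id = urllib.parse.quote(profile_job_id, safe="")
    url = f"{base_url}/profiler/profile/{job_id}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unknown job ID or profiling still in progress
        raise


def record_set_fields(profiles: dict) -> dict[str, list[str]]:
    """Map each record set in moma_profile_heavy to the names of its fields."""
    heavy = profiles.get("moma_profile_heavy") or {}
    return {
        record_set.get("name", ""): [field.get("name", "") for field in record_set.get("field", [])]
        for record_set in heavy.get("recordSet", [])
    }
```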

Input parameters

Parameter       In    Type    Default  Nullable  Description
profile_job_id  path  string  (none)   No        The unique identifier of the profiling job

Response 200 OK

{
    "cdd_profile": {},
    "moma_profile_heavy": {},
    "moma_profile_light": {}
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "Response model containing the generated profiles for a dataset.\n\n## Attributes\n* **moma_profile_light** (dict): Basic metadata about the dataset and its distributions\n* **moma_profile_heavy** (dict): Detailed information about record sets and fields\n* **cdd_profile** (dict): Profile used by the Cross-Dataset Discovery service\n    \n## Example\n```json\n{\n  \"moma_profile_light\": {\n    \"@context\": {\n      \"@language\": \"en\",\n      \"@vocab\": \"https://schema.org/\",\n      \"cr\": \"http://mlcommons.org/croissant/\"\n    },\n    \"@type\": \"sc:Dataset\",\n    \"name\": \"Mathematics Learning Assessment\",\n    \"description\": \"This dataset was extracted from the MathE platform...\",\n    \"distribution\": [\n      {\n        \"@type\": \"cr:FileObject\",\n        \"name\": \"mathe_assessment_dataset.csv\",\n        \"contentSize\": \"1057461 B\",\n        \"encodingFormat\": \"text/csv\"\n      }\n    ]\n  },\n  \"moma_profile_heavy\": {\n    \"@context\": {\n      \"@language\": \"en\",\n      \"@vocab\": \"https://schema.org/\",\n      \"cr\": \"http://mlcommons.org/croissant/\"\n    },\n    \"@type\": \"sc:Dataset\",\n    \"recordSet\": [\n      {\n        \"@type\": \"cr:RecordSet\",\n        \"name\": \"mathe_assessment_dataset\",\n        \"field\": [\n          {\n            \"@type\": \"cr:Field\",\n            \"name\": \"Student ID\",\n            \"dataType\": \"sc:Integer\"\n          }\n        ]\n      }\n    ]\n  },\n  \"cdd_profile\": {}\n}\n```",
    "properties": {
        "cdd_profile": {
            "additionalProperties": true,
            "title": "Cdd Profile",
            "type": "object"
        },
        "moma_profile_heavy": {
            "additionalProperties": true,
            "title": "Moma Profile Heavy",
            "type": "object"
        },
        "moma_profile_light": {
            "additionalProperties": true,
            "title": "Moma Profile Light",
            "type": "object"
        }
    },
    "required": [
        "moma_profile_light",
        "moma_profile_heavy",
        "cdd_profile"
    ],
    "title": "ProfilesResponse",
    "type": "object"
}

Response 422 Unprocessable Content

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

GET /profiler/runner_status/{profile_job_id}

Get Runner Status

Description

Check the status of the Ray task for a given profiling job.

This endpoint queries the Ray cluster to determine the current execution status of a profiling task. It provides information about whether the task is pending, in progress, completed, failed, or unknown.

Parameters

  • profile_job_id (str): The unique identifier of the profiling job

Returns

  • RunnerStatus: The current status of the Ray task, one of:
      • pending: The job is waiting to be processed
      • in_progress: The job is currently being processed
      • completed: The job has completed successfully
      • failed: The job has failed
      • unknown: The job ID is not recognized

Example

"in_progress"

Input parameters

Parameter       In    Type    Default  Nullable  Description
profile_job_id  path  string  (none)   No        The unique identifier of the profiling job

Response 200 OK

"pending"
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "Enumeration of possible Ray task statuses.\n\n## Values\n* **PENDING**: The job is waiting to be processed\n* **IN_PROGRESS**: The job is currently being processed\n* **COMPLETED**: The job has completed successfully\n* **FAILED**: The job has failed\n* **UNKNOWN**: The job ID is not recognized",
    "enum": [
        "pending",
        "in_progress",
        "completed",
        "failed",
        "unknown"
    ],
    "title": "RunnerStatus",
    "type": "string"
}
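On the client side, the `RunnerStatus` values in the schema above can be mirrored in a `str`-based `Enum`, so the JSON string returned by GET /profiler/runner_status/{profile_job_id} converts directly. A minimal sketch; the `is_final` property, and the choice to stop polling on `unknown`, are assumptions for illustration rather than behavior defined by the service.

```python
import json
from enum import Enum


class RunnerStatus(str, Enum):
    """Client-side mirror of the RunnerStatus schema values."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    UNKNOWN = "unknown"

    @property
    def is_final(self) -> bool:
        # Assumption: polling stops on completed, failed, or an unrecognized job ID.
        return self in (RunnerStatus.COMPLETED, RunnerStatus.FAILED, RunnerStatus.UNKNOWN)


# The endpoint body is a JSON string such as "in_progress".
status = RunnerStatus(json.loads('"in_progress"'))
print(status.value, status.is_final)  # in_progress False
```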

Response 422 Unprocessable Content

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

POST /profiler/trigger_profile

Trigger Dataset Profiling

Description

Submit a new dataset profiling job.

This endpoint accepts dataset specifications and initiates a profiling job. The profiling process analyzes the dataset structure and content, generating metadata that describes its characteristics.

Parameters

  • profile_req (ProfilingRequest): The profiling request containing:
      • profile_specification: Metadata about the dataset to be profiled
      • only_light_profile: Flag to generate only basic metadata (default: False)

Returns

  • IngestionTriggerResponse: A response containing:
      • job_id: Unique identifier for tracking the profiling job
      • status: Confirmation message that the job was submitted

Example

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "Job submitted"
}
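A minimal sketch of submitting a job from Python, assuming a caller-supplied base URL. `build_trigger_request` and `trigger_profile` are hypothetical helpers, and the specification below is a trimmed copy of the `ProfilingRequest` schema example, so check `ProfileSpecificationEndpoint` for the fields your deployment actually requires.

```python
import json
import urllib.request


def build_trigger_request(
    base_url: str, profile_specification: dict, only_light_profile: bool = False
) -> urllib.request.Request:
    """Build POST /profiler/trigger_profile with a ProfilingRequest JSON body."""
    payload = {
        "profile_specification": profile_specification,
        # False (the schema default) requests both the light and heavy profiles.
        "only_light_profile": only_light_profile,
    }
    return urllib.request.Request(
        f"{base_url}/profiler/trigger_profile",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_profile(
    base_url: str, profile_specification: dict, only_light_profile: bool = False, timeout: float = 30.0
) -> str:
    """Submit the job and return job_id from the IngestionTriggerResponse."""
    req = build_trigger_request(base_url, profile_specification, only_light_profile)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["job_id"]


# Trimmed from the ProfilingRequest schema example.
spec = {
    "id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01",
    "name": "Mathematics Learning Assessment",
    "data_connectors": [
        {"type": "RawDataPath", "dataset_id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01"}
    ],
}
request = build_trigger_request("http://localhost:8000", spec)
```

The returned `job_id` is the value to pass as `profile_job_id` to the status, profile, and clean-up endpoints.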

Request body

{
    "only_light_profile": true,
    "profile_specification": {
        "cite_as": null,
        "country": "string",
        "data_connectors": [
            null
        ],
        "database_name": null,
        "date_published": "string",
        "description": "string",
        "doi": null,
        "fields_of_science": [
            "string"
        ],
        "headline": "string",
        "id": "d5077463-3d43-43fd-a4db-d816dbc8777b",
        "keywords": [
            "string"
        ],
        "languages": [
            "string"
        ],
        "license": "string",
        "name": "string",
        "published_url": null,
        "uploaded_by": "string"
    }
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body
{
    "description": "Request model for triggering a dataset profiling job.\n\n## Attributes\n* **profile_specification** (ProfileSpecificationEndpoint): Metadata about the dataset to be profiled\n* **only_light_profile** (bool): Flag to generate only basic metadata (default: False)\n  * If True, only the light profile (basic metadata and distributions) is generated\n  * If False, both light and heavy profiles (including record sets) are generated\n        \n## Example\n```json\n{\n  \"profile_specification\": {\n    \"id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\",\n    \"name\": \"Mathematics Learning Assessment\",\n    \"description\": \"This dataset was extracted from the MathE platform...\",\n    \"headline\": \"Dataset for Assessing Mathematics Learning in Higher Education.\",\n    \"fields_of_science\": [\"MATHEMATICS\"],\n    \"languages\": [\"en\"],\n    \"keywords\": [\"math\", \"student\", \"higher education\"],\n    \"country\": \"PT\",\n    \"published_url\": \"https://dados.ipb.pt//dataset.xhtml?persistentId=doi:10.34620/dadosipb/PW3OWY\",\n    \"date_published\": \"24-05-2025\",\n    \"license\": \"CC0 1.0\",\n    \"uploaded_by\": \"ADMIN\",\n    \"data_connectors\": [\n      {\n        \"type\": \"RawDataPath\",\n        \"dataset_id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\"\n      }\n    ]\n  },\n  \"only_light_profile\": false\n}\n```",
    "properties": {
        "only_light_profile": {
            "default": false,
            "title": "Only Light Profile",
            "type": "boolean"
        },
        "profile_specification": {
            "$ref": "#/components/schemas/ProfileSpecificationEndpoint"
        }
    },
    "required": [
        "profile_specification"
    ],
    "title": "ProfilingRequest",
    "type": "object"
}

Response 200 OK

{
    "job_id": "string",
    "status": "string"
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "description": "Response model for a submitted profiling job.\n\n## Attributes\n* **job_id**: Unique identifier for tracking the profiling job\n* **status**: Confirmation message that the job was submitted\n    \n## Example\n```json\n{\n  \"job_id\": \"550e8400-e29b-41d4-a716-446655440000\",\n  \"status\": \"Job submitted\"\n}\n```",
    "properties": {
        "job_id": {
            "title": "Job Id",
            "type": "string"
        },
        "status": {
            "title": "Status",
            "type": "string"
        }
    },
    "required": [
        "job_id",
        "status"
    ],
    "title": "IngestionTriggerResponse",
    "type": "object"
}

Response 422 Unprocessable Content

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}
⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body
{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}
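Taken together, the profiler endpoints form a trigger, poll, retrieve, clean-up cycle. The sketch below expresses that order with injected callables so it stays independent of any HTTP client; every parameter name is hypothetical, and it assumes the job reports `light_profile_ready` or `heavy_profile_ready` (per the `JobStatus` schema) once the requested profile can be fetched. Clean-up runs only after the profile has been stored, as the clean-up endpoint description recommends.

```python
import time
from typing import Callable


def run_profiling_workflow(
    trigger: Callable[[bool], str],
    job_status: Callable[[str], str],
    get_profile: Callable[[str], dict],
    store: Callable[[str, dict], None],
    clean_up: Callable[[str], dict],
    only_light_profile: bool = False,
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one profiling job end to end and return its job ID."""
    # Light-only jobs are done at light_profile_ready; full jobs need the heavy profile.
    target = "light_profile_ready" if only_light_profile else "heavy_profile_ready"
    job_id = trigger(only_light_profile)
    for _ in range(max_attempts):
        if job_status(job_id) == target:
            break
        sleep(interval)
    else:
        raise TimeoutError(f"job {job_id} did not reach {target}")
    store(job_id, get_profile(job_id))  # persist before releasing resources
    clean_up(job_id)
    return job_id
```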

Schemas

CleanUpRequest

Name Type
profile_job_id string

DatabaseConnection

Name Type
database_name string
type string

HTTPValidationError

Name Type
detail Array<ValidationError>

IngestionTriggerResponse

Name Type
job_id string
status string

JobStatus

Type: string

ProfileSpecificationEndpoint

Name Type
cite_as
country string
data_connectors Array<>
database_name
date_published string
description string
doi
fields_of_science Array<string>
headline string
id string(uuid)
keywords Array<string>
languages Array<string>
license string
name string
published_url
uploaded_by string

ProfilesResponse

Name Type
cdd_profile object
moma_profile_heavy object
moma_profile_light object

ProfilingRequest

Name Type
only_light_profile boolean
profile_specification ProfileSpecificationEndpoint

RawDataPath

Name Type
dataset_id string
type string

RunnerStatus

Type: string

ValidationError

Name Type
loc Array<string | integer>
msg string
type string