FastAPI 0.0.1¶

Endpoints¶

GET /¶

Read Root

Response 200 OK

application/json

Schema of the response body

Health¶

GET /monitoring/health-check¶

Check Health

Description

Check the health status of the Dataset Profiler service and its dependencies.

This endpoint verifies the connectivity and operational status of critical service dependencies:

Redis: Used for job status tracking and caching
Ray: Used for distributed computing tasks

Returns¶

A status report containing health information for each dependency:

redis: Status and connection details
ray: Status and cluster information

Raises¶

HTTPException: 503 Service Unavailable if any dependency is unhealthy

Example¶

{
    "redis": {
        "status": "healthy",
        "message": "Connected to Redis at localhost:6379"
    },
    "ray": {
        "status": "healthy",
        "message": "Ray cluster reachable with 3 alive node(s)"
    }
}
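
As a rough illustration of consuming this report, the sketch below lists the dependencies that are not reported as healthy. The helper name `unhealthy_dependencies` is hypothetical, and the "unhealthy" status value and message in the sample are assumptions: this page only shows the "healthy" case and does not document the body returned alongside a 503.

```python
from typing import Any


def unhealthy_dependencies(report: dict[str, Any]) -> list[str]:
    """Return the names of dependencies whose status is not "healthy".

    Anything other than the literal "healthy" (including a malformed
    entry) is treated as a failure; that strictness is an assumption.
    """
    return [
        name
        for name, info in report.items()
        if not isinstance(info, dict) or info.get("status") != "healthy"
    ]


# Sample report: the "ray" entry is invented for illustration.
report = {
    "redis": {"status": "healthy", "message": "Connected to Redis at localhost:6379"},
    "ray": {"status": "unhealthy", "message": "Ray cluster unreachable"},
}
print(unhealthy_dependencies(report))
```

A monitoring script could call such a helper on the parsed JSON body and alert when the returned list is non-empty.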

Response 200 OK

application/json

Schema of the response body

{
    "additionalProperties": true,
    "title": "Response Check Health Monitoring Health Check Get",
    "type": "object"
}

Profiler¶

POST /profiler/clean_up¶

Clean Up Job

Description

Clean up resources associated with a completed profiling job.

This endpoint releases resources and temporary storage used during the profiling process. It should be called after retrieving and storing the profile data to free up system resources.

Parameters¶

clean_up_req (CleanUpRequest): Request containing:
profile_job_id: The unique identifier of the profiling job to clean up

Returns¶

dict: A response indicating the success of the cleanup operation

Example¶

{
  "detail": "SUCCESS"
}

Note¶

This endpoint is currently a placeholder; cleanup functionality is not yet implemented.
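
A minimal sketch of how a client might build this call with Python's standard library. The base URL `http://localhost:8000` and the helper name `build_clean_up_request` are assumptions, not part of the service; only the path, the method, and the `profile_job_id` field come from this page.

```python
import json
import urllib.request

# Assumption: the service address is not stated in this reference.
BASE_URL = "http://localhost:8000"


def build_clean_up_request(
    profile_job_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /profiler/clean_up request."""
    body = json.dumps({"profile_job_id": profile_job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/profiler/clean_up",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_clean_up_request("550e8400-e29b-41d4-a716-446655440000")
print(req.get_method(), req.full_url)
# Sending it requires a running service:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it keeps the example runnable without a live service.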

Request body

application/json

{
    "profile_job_id": "string"
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body

{
    "description": "Request model for cleaning up resources associated with a profiling job.\n\n## Attributes\n* **profile_job_id** (str): The unique identifier of the profiling job to clean up\n    \n## Example\n```json\n{\n  \"profile_job_id\": \"550e8400-e29b-41d4-a716-446655440000\"\n}\n```",
    "properties": {
        "profile_job_id": {
            "title": "Profile Job Id",
            "type": "string"
        }
    },
    "required": [
        "profile_job_id"
    ],
    "title": "CleanUpRequest",
    "type": "object"
}

Response 200 OK

application/json

Schema of the response body

{
    "additionalProperties": true,
    "title": "Response Clean Up Job Profiler Clean Up Post",
    "type": "object"
}

Response 422 Unprocessable Content

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

GET /profiler/job_status/{profile_job_id}¶

Get Job Status

Description

Check the detailed status of a profiling job.

This endpoint retrieves the current status of a profiling job from the job store. It provides more detailed information about the job's progress than the runner status.

Parameters¶

profile_job_id (str): The unique identifier of the profiling job

Returns¶

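JobStatus: The current status of the profiling job, returned as the lowercase string value listed in the response schema, one of:
SUBMITTING: The job is being submitted to the processing queue
STARTING: The job has been accepted and is starting
LIGHT_PROFILE_READY: The light profile (basic metadata) is ready
HEAVY_PROFILES_READY: The heavy profile (including record sets) is ready (serialized as heavy_profile_ready)
CLEANED_UP: Resources associated with the job have been cleaned up
FAILED: The job has failed (not listed in the response schema enum)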

Example¶

"light_profile_ready"

Input parameters

Parameter	In	Type	Default	Nullable	Description
`profile_job_id`	path	string		No

Response 200 OK

application/json

"submitting"

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "description": "Enumeration of possible profiling job statuses.\n\n## Values\n* **SUBMITTING**: The job is being submitted to the processing queue\n* **STARTING**: The job has been accepted and is starting\n* **LIGHT_PROFILE_READY**: The light profile (basic metadata) is ready\n* **HEAVY_PROFILES_READY**: The heavy profile (including record sets) is ready\n* **CLEANED_UP**: Resources associated with the job have been cleaned up",
    "enum": [
        "submitting",
        "starting",
        "light_profile_ready",
        "heavy_profile_ready",
        "cleaned_up"
    ],
    "title": "JobStatus",
    "type": "string"
}
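
The enum values above could be mirrored on the client side roughly as follows. This is a sketch, not the service's own model: the helper names are hypothetical, and the rule in `profile_is_complete` (a light-only job is done at `light_profile_ready`, any other job at `heavy_profile_ready`) is an assumption about the job lifecycle that this page does not state.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Client-side mirror of the JobStatus enum in the response schema."""

    SUBMITTING = "submitting"
    STARTING = "starting"
    LIGHT_PROFILE_READY = "light_profile_ready"
    # The description says HEAVY_PROFILES_READY; the wire value is singular.
    HEAVY_PROFILES_READY = "heavy_profile_ready"
    CLEANED_UP = "cleaned_up"


def parse_job_status(raw: str) -> JobStatus:
    """Convert a response string to JobStatus; raises ValueError if unknown."""
    return JobStatus(raw)


def profile_is_complete(status: JobStatus, only_light_profile: bool) -> bool:
    """Assumed completion rule; adjust once the real lifecycle is confirmed."""
    if status is JobStatus.HEAVY_PROFILES_READY:
        return True
    return only_light_profile and status is JobStatus.LIGHT_PROFILE_READY


status = parse_job_status("light_profile_ready")
print(status.name, profile_is_complete(status, only_light_profile=True))
```

Parsing into an enum makes an unexpected value fail loudly with `ValueError` instead of being compared silently as a plain string.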

Response 422 Unprocessable Content

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

GET /profiler/profile/{profile_job_id}¶

Get Profile

Description

Retrieve the generated profile for a completed profiling job.

This endpoint returns the profile data generated for a dataset, including both light and heavy profiles if available. The profile contains metadata about the dataset structure, content, and characteristics.

Parameters¶

profile_job_id (str): The unique identifier of the profiling job

Returns¶

ProfilesResponse: The generated profiles, containing:
moma_profile_light: Basic metadata about the dataset and its distributions
moma_profile_heavy: Detailed information about record sets and fields
cdd_profile: Profile used by the Cross-Dataset Discovery service

Raises¶

HTTPException: 404 Not Found if no profile exists for the given job ID or if profiling is still in progress

Example¶

{
  "moma_profile_light": {
    "@context": {...},
    "@type": "sc:Dataset",
    "name": "Mathematics Learning Assessment",
    "description": "...",
    "distribution": [...]
  },
  "moma_profile_heavy": {
    "@context": {...},
    "@type": "sc:Dataset",
    "recordSet": [...]
  },
  "cdd_profile": {}
}
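
To show how the heavy profile might be read, the sketch below collects field names and data types per record set. The helper name `list_fields` is hypothetical, and the sample is a trimmed copy of the example embedded in the ProfilesResponse schema description; real profiles may carry more keys or a different nesting.

```python
from typing import Any


def list_fields(profiles: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Map each record set name to its (field name, dataType) pairs."""
    heavy = profiles.get("moma_profile_heavy") or {}
    result: dict[str, list[tuple[str, str]]] = {}
    for record_set in heavy.get("recordSet", []):
        fields = [
            (field.get("name", ""), field.get("dataType", ""))
            for field in record_set.get("field", [])
        ]
        result[record_set.get("name", "")] = fields
    return result


# Trimmed from the schema example; not a complete profile.
sample = {
    "moma_profile_light": {"@type": "sc:Dataset", "name": "Mathematics Learning Assessment"},
    "moma_profile_heavy": {
        "@type": "sc:Dataset",
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "mathe_assessment_dataset",
                "field": [
                    {"@type": "cr:Field", "name": "Student ID", "dataType": "sc:Integer"}
                ],
            }
        ],
    },
    "cdd_profile": {},
}
print(list_fields(sample))
```

Because the helper falls back to empty containers, it returns an empty mapping when only the light profile was generated.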

Input parameters

Parameter	In	Type	Default	Nullable	Description
`profile_job_id`	path	string		No

Response 200 OK

application/json

{
    "cdd_profile": {},
    "moma_profile_heavy": {},
    "moma_profile_light": {}
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "description": "Response model containing the generated profiles for a dataset.\n\n## Attributes\n* **moma_profile_light** (dict): Basic metadata about the dataset and its distributions\n* **moma_profile_heavy** (dict): Detailed information about record sets and fields\n* **cdd_profile** (dict): Profile used by the Cross-Dataset Discovery service\n    \n## Example\n```json\n{\n  \"moma_profile_light\": {\n    \"@context\": {\n      \"@language\": \"en\",\n      \"@vocab\": \"https://schema.org/\",\n      \"cr\": \"http://mlcommons.org/croissant/\"\n    },\n    \"@type\": \"sc:Dataset\",\n    \"name\": \"Mathematics Learning Assessment\",\n    \"description\": \"This dataset was extracted from the MathE platform...\",\n    \"distribution\": [\n      {\n        \"@type\": \"cr:FileObject\",\n        \"name\": \"mathe_assessment_dataset.csv\",\n        \"contentSize\": \"1057461 B\",\n        \"encodingFormat\": \"text/csv\"\n      }\n    ]\n  },\n  \"moma_profile_heavy\": {\n    \"@context\": {\n      \"@language\": \"en\",\n      \"@vocab\": \"https://schema.org/\",\n      \"cr\": \"http://mlcommons.org/croissant/\"\n    },\n    \"@type\": \"sc:Dataset\",\n    \"recordSet\": [\n      {\n        \"@type\": \"cr:RecordSet\",\n        \"name\": \"mathe_assessment_dataset\",\n        \"field\": [\n          {\n            \"@type\": \"cr:Field\",\n            \"name\": \"Student ID\",\n            \"dataType\": \"sc:Integer\"\n          }\n        ]\n      }\n    ]\n  },\n  \"cdd_profile\": {}\n}\n```",
    "properties": {
        "cdd_profile": {
            "additionalProperties": true,
            "title": "Cdd Profile",
            "type": "object"
        },
        "moma_profile_heavy": {
            "additionalProperties": true,
            "title": "Moma Profile Heavy",
            "type": "object"
        },
        "moma_profile_light": {
            "additionalProperties": true,
            "title": "Moma Profile Light",
            "type": "object"
        }
    },
    "required": [
        "moma_profile_light",
        "moma_profile_heavy",
        "cdd_profile"
    ],
    "title": "ProfilesResponse",
    "type": "object"
}

Response 422 Unprocessable Content

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

GET /profiler/runner_status/{profile_job_id}¶

Get Runner Status

Description

Check the status of the Ray task for a given profiling job.

This endpoint queries the Ray cluster to determine the current execution status of a profiling task. It provides information about whether the task is pending, in progress, completed, failed, or unknown.

Parameters¶

profile_job_id (str): The unique identifier of the profiling job

Returns¶

RunnerStatus: The current status of the Ray task, one of:
pending: The job is waiting to be processed
in_progress: The job is currently being processed
completed: The job has completed successfully
failed: The job has failed
unknown: The job ID is not recognized

Example¶

"in_progress"

Input parameters

Parameter	In	Type	Default	Nullable	Description
`profile_job_id`	path	string		No

Response 200 OK

application/json

"pending"

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "description": "Enumeration of possible Ray task statuses.\n\n## Values\n* **PENDING**: The job is waiting to be processed\n* **IN_PROGRESS**: The job is currently being processed\n* **COMPLETED**: The job has completed successfully\n* **FAILED**: The job has failed\n* **UNKNOWN**: The job ID is not recognized",
    "enum": [
        "pending",
        "in_progress",
        "completed",
        "failed",
        "unknown"
    ],
    "title": "RunnerStatus",
    "type": "string"
}
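
One way a client might poll this endpoint is sketched below. The function `wait_for_runner` and its parameters are hypothetical, the status fetcher is injected so the example runs without a live service, and treating `unknown` as a terminal status is an assumption.

```python
import time
from typing import Callable

# Assumption: these statuses will not change on further polling.
TERMINAL_STATUSES = {"completed", "failed", "unknown"}


def wait_for_runner(
    fetch_status: Callable[[], str],
    poll_interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until a terminal status appears or polls run out."""
    status = ""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"runner still {status!r} after {max_polls} polls")


# Stand-in for GET /profiler/runner_status/{profile_job_id}.
responses = iter(["pending", "in_progress", "completed"])
final = wait_for_runner(lambda: next(responses), sleep=lambda seconds: None)
print(final)
```

In real use, `fetch_status` would perform the HTTP GET and return the decoded JSON string.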

Response 422 Unprocessable Content

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

POST /profiler/trigger_profile¶

Trigger Dataset Profiling

Description

Submit a new dataset profiling job.

This endpoint accepts dataset specifications and initiates a profiling job. The profiling process analyzes the dataset structure and content, generating metadata that describes its characteristics.

Parameters¶

profile_req (ProfilingRequest): The profiling request containing:
profile_specification: Metadata about the dataset to be profiled
only_light_profile: Flag to generate only basic metadata (default: False)

Returns¶

IngestionTriggerResponse: A response containing:
job_id: Unique identifier for tracking the profiling job
status: Confirmation message that the job was submitted

Example¶

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "Job submitted"
}
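
The request body for this endpoint could be assembled as in the sketch below. The helper name `build_profiling_request` is hypothetical, and the specification is a trimmed copy of the example in the ProfilingRequest schema description; which fields are required is not stated on this page, so a real request may need the remaining ones.

```python
import json
import uuid
from typing import Any


def build_profiling_request(
    profile_specification: dict[str, Any], only_light_profile: bool = False
) -> dict[str, Any]:
    """Return a ProfilingRequest-shaped payload after checking the dataset id."""
    spec_id = profile_specification.get("id")
    if spec_id is not None:
        # The schema types `id` as string(uuid); raises ValueError if malformed.
        uuid.UUID(str(spec_id))
    return {
        "profile_specification": profile_specification,
        "only_light_profile": only_light_profile,
    }


# Trimmed from the schema example; not a complete specification.
spec = {
    "id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01",
    "name": "Mathematics Learning Assessment",
    "fields_of_science": ["MATHEMATICS"],
    "languages": ["en"],
    "country": "PT",
    "license": "CC0 1.0",
    "uploaded_by": "ADMIN",
    "data_connectors": [
        {"type": "RawDataPath", "dataset_id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01"}
    ],
}
payload = build_profiling_request(spec)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the POST request.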

Request body

application/json

{
    "only_light_profile": true,
    "profile_specification": {
        "cite_as": null,
        "country": "string",
        "data_connectors": [
            null
        ],
        "database_name": null,
        "date_published": "string",
        "description": "string",
        "doi": null,
        "fields_of_science": [
            "string"
        ],
        "headline": "string",
        "id": "d5077463-3d43-43fd-a4db-d816dbc8777b",
        "keywords": [
            "string"
        ],
        "languages": [
            "string"
        ],
        "license": "string",
        "name": "string",
        "published_url": null,
        "uploaded_by": "string"
    }
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the request body

{
    "description": "Request model for triggering a dataset profiling job.\n\n## Attributes\n* **profile_specification** (ProfileSpecificationEndpoint): Metadata about the dataset to be profiled\n* **only_light_profile** (bool): Flag to generate only basic metadata (default: False)\n  * If True, only the light profile (basic metadata and distributions) is generated\n  * If False, both light and heavy profiles (including record sets) are generated\n        \n## Example\n```json\n{\n  \"profile_specification\": {\n    \"id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\",\n    \"name\": \"Mathematics Learning Assessment\",\n    \"description\": \"This dataset was extracted from the MathE platform...\",\n    \"headline\": \"Dataset for Assessing Mathematics Learning in Higher Education.\",\n    \"fields_of_science\": [\"MATHEMATICS\"],\n    \"languages\": [\"en\"],\n    \"keywords\": [\"math\", \"student\", \"higher education\"],\n    \"country\": \"PT\",\n    \"published_url\": \"https://dados.ipb.pt//dataset.xhtml?persistentId=doi:10.34620/dadosipb/PW3OWY\",\n    \"date_published\": \"24-05-2025\",\n    \"license\": \"CC0 1.0\",\n    \"uploaded_by\": \"ADMIN\",\n    \"data_connectors\": [\n      {\n        \"type\": \"RawDataPath\",\n        \"dataset_id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\"\n      }\n    ]\n  },\n  \"only_light_profile\": false\n}\n```",
    "properties": {
        "only_light_profile": {
            "default": false,
            "title": "Only Light Profile",
            "type": "boolean"
        },
        "profile_specification": {
            "$ref": "#/components/schemas/ProfileSpecificationEndpoint"
        }
    },
    "required": [
        "profile_specification"
    ],
    "title": "ProfilingRequest",
    "type": "object"
}

Response 200 OK

application/json

{
    "job_id": "string",
    "status": "string"
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "description": "Response model for a submitted profiling job.\n\n## Attributes\n* **job_id**: Unique identifier for tracking the profiling job\n* **status**: Confirmation message that the job was submitted\n    \n## Example\n```json\n{\n  \"job_id\": \"550e8400-e29b-41d4-a716-446655440000\",\n  \"status\": \"Job submitted\"\n}\n```",
    "properties": {
        "job_id": {
            "title": "Job Id",
            "type": "string"
        },
        "status": {
            "title": "Status",
            "type": "string"
        }
    },
    "required": [
        "job_id",
        "status"
    ],
    "title": "IngestionTriggerResponse",
    "type": "object"
}

Response 422 Unprocessable Content

application/json

{
    "detail": [
        {
            "loc": [
                null
            ],
            "msg": "string",
            "type": "string"
        }
    ]
}

⚠️ This example has been generated automatically from the schema and it is not accurate. Refer to the schema for more information.

Schema of the response body

{
    "properties": {
        "detail": {
            "items": {
                "$ref": "#/components/schemas/ValidationError"
            },
            "title": "Detail",
            "type": "array"
        }
    },
    "title": "HTTPValidationError",
    "type": "object"
}

Schemas¶

CleanUpRequest¶

Name	Type
`profile_job_id`	string

DatabaseConnection¶

Name	Type
`database_name`	string
`type`	string

HTTPValidationError¶

Name	Type
`detail`	Array<ValidationError>

IngestionTriggerResponse¶

Name	Type
`job_id`	string
`status`	string

JobStatus¶

Type: string

ProfileSpecificationEndpoint¶

Name	Type
`cite_as`
`country`	string
`data_connectors`	Array<>
`database_name`
`date_published`	string
`description`	string
`doi`
`fields_of_science`	Array<string>
`headline`	string
`id`	string(uuid)
`keywords`	Array<string>
`languages`	Array<string>
`license`	string
`name`	string
`published_url`
`uploaded_by`	string

ProfilesResponse¶

Name	Type
`cdd_profile`
`moma_profile_heavy`
`moma_profile_light`

ProfilingRequest¶

Name	Type
`only_light_profile`	boolean
`profile_specification`	ProfileSpecificationEndpoint

RawDataPath¶

Name	Type
`dataset_id`	string
`type`	string

RunnerStatus¶

Type: string

ValidationError¶

Name	Type
`loc`	Array<>
`msg`	string
`type`	string
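
A 422 response body could be turned into readable messages roughly as follows. The helper name `format_validation_errors` and the sample error are illustrative assumptions; only the `detail`, `loc`, `msg`, and `type` keys come from the schemas above.

```python
from typing import Any


def format_validation_errors(body: dict[str, Any]) -> list[str]:
    """Render each ValidationError as 'location: message (type)'."""
    lines = []
    for err in body.get("detail", []):
        # `loc` is a path of keys and indexes; join it with dots.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')} ({err.get('type', '')})")
    return lines


# Illustrative body; actual messages depend on the service.
error_body = {
    "detail": [
        {"loc": ["body", "profile_job_id"], "msg": "Field required", "type": "missing"}
    ]
}
print(format_validation_errors(error_body))
```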