FastAPI 0.0.1
Endpoints
GET /
Read Root
Response 200 OK
Schema of the response body
Health
GET /monitoring/health-check
Check Health
Description
Check the health status of the Dataset Profiler service and its dependencies.
This endpoint verifies the connectivity and operational status of critical service dependencies:
- Redis: Used for job status tracking and caching
- Ray: Used for distributed computing tasks
Returns
A status report containing health information for each dependency:
- redis: Status and connection details
- ray: Status and cluster information
Raises
- HTTPException: 503 Service Unavailable if any dependency is unhealthy
Example
{
"redis": {
"status": "healthy",
"message": "Connected to Redis at localhost:6379"
},
"ray": {
"status": "healthy",
"message": "Ray cluster reachable with 3 alive node(s)"
}
}
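A client may want to reduce the health report to the list of failing dependencies. The following is a minimal sketch, assuming every dependency entry carries a `status` key as in the example above; the `"unhealthy"` value and the failure message in the sample are illustrative assumptions, since the reference only shows the healthy case, and `unhealthy_dependencies` is a hypothetical helper name.

```python
from typing import Any


def unhealthy_dependencies(report: dict[str, dict[str, Any]]) -> list[str]:
    """Return the sorted names of dependencies whose status is not "healthy"."""
    return sorted(
        name for name, info in report.items() if info.get("status") != "healthy"
    )


# Sample payload shaped like the documented example.
# Assumption: the "unhealthy" value and its message are made up for illustration.
report = {
    "redis": {"status": "healthy", "message": "Connected to Redis at localhost:6379"},
    "ray": {"status": "unhealthy", "message": "Ray cluster unreachable"},
}

print(unhealthy_dependencies(report))
```

Comparing against `"healthy"` rather than matching a specific failure value keeps the check valid even if the service reports failures with a different status string.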
Response 200 OK
Schema of the response body
{
"additionalProperties": true,
"title": "Response Check Health Monitoring Health Check Get",
"type": "object"
}
Profiler
POST /profiler/clean_up
Clean Up Job
Description
Clean up resources associated with a completed profiling job.
This endpoint releases resources and temporary storage used during the profiling process. It should be called after retrieving and storing the profile data to free up system resources.
Parameters
- clean_up_req (CleanUpRequest): Request containing:
  - profile_job_id: The unique identifier of the profiling job to clean up
Returns
- dict: A response indicating the success of the cleanup operation
Example
{
"detail": "SUCCESS"
}
Note
This endpoint is currently a placeholder and cleanup functionality is not yet implemented.
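The request for this endpoint can be sketched with the Python standard library. This is a minimal sketch, not the service's own client: `BASE_URL` and `build_clean_up_request` are hypothetical names, and the host and port are assumptions because the reference does not state where the service is deployed. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: the service address is not given in the reference.
BASE_URL = "http://localhost:8000"


def build_clean_up_request(
    profile_job_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the CleanUpRequest schema."""
    body = json.dumps({"profile_job_id": profile_job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/profiler/clean_up",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_clean_up_request("550e8400-e29b-41d4-a716-446655440000")
# Passing `req` to urllib.request.urlopen would send it to a running service.
print(req.get_method(), req.full_url)
```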
Request body
{
"profile_job_id": "string"
}
Schema of the request body
{
"description": "Request model for cleaning up resources associated with a profiling job.\n\n## Attributes\n* **profile_job_id** (str): The unique identifier of the profiling job to clean up\n \n## Example\n```json\n{\n \"profile_job_id\": \"550e8400-e29b-41d4-a716-446655440000\"\n}\n```",
"properties": {
"profile_job_id": {
"title": "Profile Job Id",
"type": "string"
}
},
"required": [
"profile_job_id"
],
"title": "CleanUpRequest",
"type": "object"
}
Response 200 OK
Schema of the response body
{
"additionalProperties": true,
"title": "Response Clean Up Job Profiler Clean Up Post",
"type": "object"
}
Response 422 Unprocessable Content
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"title": "Detail",
"type": "array"
}
},
"title": "HTTPValidationError",
"type": "object"
}
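The 422 body is easier to read once each entry in `detail` is flattened into one line. Below is a minimal sketch that relies only on the `loc`, `msg`, and `type` keys shown in the schema; `format_validation_errors` is a hypothetical helper, and the sample error values are illustrative rather than captured from the service.

```python
from typing import Any


def format_validation_errors(body: dict[str, Any]) -> list[str]:
    """Turn an HTTPValidationError body into one readable line per error."""
    lines = []
    for err in body.get("detail", []):
        # `loc` is a path into the request, e.g. ["body", "profile_job_id"].
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')} ({err.get('type', '')})")
    return lines


# Illustrative payload; the msg and type strings are assumptions.
sample = {
    "detail": [
        {"loc": ["body", "profile_job_id"], "msg": "Field required", "type": "missing"}
    ]
}

print(format_validation_errors(sample))
```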
GET /profiler/job_status/{profile_job_id}
Get Job Status
Description
Check the detailed status of a profiling job.
This endpoint retrieves the current status of a profiling job from the job store. It provides more detailed information about the job's progress than the runner status.
Parameters
- profile_job_id (str): The unique identifier of the profiling job
Returns
- JobStatus: The current status of the profiling job, one of:
  - SUBMITTING: The job is being submitted to the processing queue
  - STARTING: The job has been accepted and is starting
  - LIGHT_PROFILE_READY: The light profile (basic metadata) is ready
  - HEAVY_PROFILES_READY: The heavy profile (including record sets) is ready
  - CLEANED_UP: Resources associated with the job have been cleaned up
  - FAILED: The job has failed
Example
"light_profile_ready"
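A caller typically polls this endpoint until the profile it needs is ready. The loop below is a minimal sketch under stated assumptions: `wait_for_job` is a hypothetical helper, the status source is injected as a callable so no HTTP client is assumed, and the target values are the lowercase strings from the JobStatus response schema. Treating `"failed"` as terminal is also an assumption, since that value is named in the description but is not in the schema enum.

```python
import time
from typing import Callable, Iterable

# Serialized values as listed in the JobStatus schema enum.
DEFAULT_TARGETS = ("light_profile_ready", "heavy_profile_ready")


def wait_for_job(
    fetch_status: Callable[[], str],
    targets: Iterable[str] = DEFAULT_TARGETS,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until a target status or "failed" is seen."""
    wanted = {t.lower() for t in targets}
    for _ in range(max_polls):
        status = fetch_status().lower()
        # Assumption: "failed" ends the job even though the enum omits it.
        if status in wanted or status == "failed":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job did not reach {sorted(wanted)} after {max_polls} polls")


# Usage with a stubbed status source instead of a real HTTP call.
_statuses = iter(["submitting", "starting", "light_profile_ready"])
final_status = wait_for_job(lambda: next(_statuses), sleep=lambda _s: None)
print(final_status)  # light_profile_ready
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running service; in real use `fetch_status` would issue the GET request and return the decoded JSON string.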
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| profile_job_id | path | string | | No | |
Response 200 OK
"submitting"
Schema of the response body
{
"description": "Enumeration of possible profiling job statuses.\n\n## Values\n* **SUBMITTING**: The job is being submitted to the processing queue\n* **STARTING**: The job has been accepted and is starting\n* **LIGHT_PROFILE_READY**: The light profile (basic metadata) is ready\n* **HEAVY_PROFILES_READY**: The heavy profile (including record sets) is ready\n* **CLEANED_UP**: Resources associated with the job have been cleaned up",
"enum": [
"submitting",
"starting",
"light_profile_ready",
"heavy_profile_ready",
"cleaned_up"
],
"title": "JobStatus",
"type": "string"
}
Response 422 Unprocessable Content
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"title": "Detail",
"type": "array"
}
},
"title": "HTTPValidationError",
"type": "object"
}
GET /profiler/profile/{profile_job_id}
Get Profile
Description
Retrieve the generated profile for a completed profiling job.
This endpoint returns the profile data generated for a dataset, including both light and heavy profiles if available. The profile contains metadata about the dataset structure, content, and characteristics.
Parameters
- profile_job_id (str): The unique identifier of the profiling job
Returns
- ProfilesResponse: The generated profiles, containing:
  - moma_profile_light: Basic metadata about the dataset and its distributions
  - moma_profile_heavy: Detailed information about record sets and fields
  - cdd_profile: Profile used by the Cross-Dataset Discovery service
Raises
- HTTPException: 404 Not Found if no profile exists for the given job ID or if profiling is still in progress
Example
{
"moma_profile_light": {
"@context": {...},
"@type": "sc:Dataset",
"name": "Mathematics Learning Assessment",
"description": "...",
"distribution": [...]
},
"moma_profile_heavy": {
"@context": {...},
"@type": "sc:Dataset",
"recordSet": [...]
},
"cdd_profile": {}
}
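Once a ProfilesResponse has been retrieved, the heavy profile can be walked to list the fields of each record set. This is a minimal sketch, assuming the `recordSet`, `field`, and `name` keys appear as in the example embedded in the ProfilesResponse schema description; `list_record_set_fields` is a hypothetical helper, and the sample below is a trimmed copy of that example.

```python
from typing import Any


def list_record_set_fields(profiles: dict[str, Any]) -> dict[str, list[str]]:
    """Map each record set name in the heavy profile to its field names."""
    heavy = profiles.get("moma_profile_heavy") or {}
    result: dict[str, list[str]] = {}
    for record_set in heavy.get("recordSet", []):
        fields = [field.get("name", "") for field in record_set.get("field", [])]
        result[record_set.get("name", "")] = fields
    return result


# Trimmed from the example in the ProfilesResponse schema description.
profiles = {
    "moma_profile_light": {"@type": "sc:Dataset", "name": "Mathematics Learning Assessment"},
    "moma_profile_heavy": {
        "@type": "sc:Dataset",
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "mathe_assessment_dataset",
                "field": [
                    {"@type": "cr:Field", "name": "Student ID", "dataType": "sc:Integer"}
                ],
            }
        ],
    },
    "cdd_profile": {},
}

print(list_record_set_fields(profiles))
```

An empty `moma_profile_heavy` object yields an empty mapping, so the helper does not raise on a response that carries no record sets.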
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| profile_job_id | path | string | | No | |
Response 200 OK
{
"cdd_profile": {},
"moma_profile_heavy": {},
"moma_profile_light": {}
}
Schema of the response body
{
"description": "Response model containing the generated profiles for a dataset.\n\n## Attributes\n* **moma_profile_light** (dict): Basic metadata about the dataset and its distributions\n* **moma_profile_heavy** (dict): Detailed information about record sets and fields\n* **cdd_profile** (dict): Profile used by the Cross-Dataset Discovery service\n \n## Example\n```json\n{\n \"moma_profile_light\": {\n \"@context\": {\n \"@language\": \"en\",\n \"@vocab\": \"https://schema.org/\",\n \"cr\": \"http://mlcommons.org/croissant/\"\n },\n \"@type\": \"sc:Dataset\",\n \"name\": \"Mathematics Learning Assessment\",\n \"description\": \"This dataset was extracted from the MathE platform...\",\n \"distribution\": [\n {\n \"@type\": \"cr:FileObject\",\n \"name\": \"mathe_assessment_dataset.csv\",\n \"contentSize\": \"1057461 B\",\n \"encodingFormat\": \"text/csv\"\n }\n ]\n },\n \"moma_profile_heavy\": {\n \"@context\": {\n \"@language\": \"en\",\n \"@vocab\": \"https://schema.org/\",\n \"cr\": \"http://mlcommons.org/croissant/\"\n },\n \"@type\": \"sc:Dataset\",\n \"recordSet\": [\n {\n \"@type\": \"cr:RecordSet\",\n \"name\": \"mathe_assessment_dataset\",\n \"field\": [\n {\n \"@type\": \"cr:Field\",\n \"name\": \"Student ID\",\n \"dataType\": \"sc:Integer\"\n }\n ]\n }\n ]\n },\n \"cdd_profile\": {}\n}\n```",
"properties": {
"cdd_profile": {
"additionalProperties": true,
"title": "Cdd Profile",
"type": "object"
},
"moma_profile_heavy": {
"additionalProperties": true,
"title": "Moma Profile Heavy",
"type": "object"
},
"moma_profile_light": {
"additionalProperties": true,
"title": "Moma Profile Light",
"type": "object"
}
},
"required": [
"moma_profile_light",
"moma_profile_heavy",
"cdd_profile"
],
"title": "ProfilesResponse",
"type": "object"
}
Response 422 Unprocessable Content
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"title": "Detail",
"type": "array"
}
},
"title": "HTTPValidationError",
"type": "object"
}
GET /profiler/runner_status/{profile_job_id}
Get Runner Status
Description
Check the status of the Ray task for a given profiling job.
This endpoint queries the Ray cluster to determine the current execution status of a profiling task. It provides information about whether the task is pending, in progress, completed, failed, or unknown.
Parameters
- profile_job_id (str): The unique identifier of the profiling job
Returns
- RunnerStatus: The current status of the Ray task, one of:
  - pending: The job is waiting to be processed
  - in_progress: The job is currently being processed
  - completed: The job has completed successfully
  - failed: The job has failed
  - unknown: The job ID is not recognized
Example
"in_progress"
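On the client side, the five runner states can be mirrored in an enum so that unexpected values fail loudly. This is a minimal sketch: the enum values come from the RunnerStatus schema, while `should_keep_polling` is a hypothetical helper and its decision to stop polling on `unknown` is an assumption about how a caller would want to react.

```python
from enum import Enum


class RunnerStatus(str, Enum):
    """Client-side mirror of the RunnerStatus schema enum."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    UNKNOWN = "unknown"


def should_keep_polling(raw_status: str) -> bool:
    """Return True while the Ray task may still change state."""
    status = RunnerStatus(raw_status)  # raises ValueError for unlisted values
    return status in (RunnerStatus.PENDING, RunnerStatus.IN_PROGRESS)


print(should_keep_polling("in_progress"))  # True
print(should_keep_polling("unknown"))      # False
```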
Input parameters
| Parameter | In | Type | Default | Nullable | Description |
|---|---|---|---|---|---|
| profile_job_id | path | string | | No | |
Response 200 OK
"pending"
Schema of the response body
{
"description": "Enumeration of possible Ray task statuses.\n\n## Values\n* **PENDING**: The job is waiting to be processed\n* **IN_PROGRESS**: The job is currently being processed\n* **COMPLETED**: The job has completed successfully\n* **FAILED**: The job has failed\n* **UNKNOWN**: The job ID is not recognized",
"enum": [
"pending",
"in_progress",
"completed",
"failed",
"unknown"
],
"title": "RunnerStatus",
"type": "string"
}
Response 422 Unprocessable Content
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"title": "Detail",
"type": "array"
}
},
"title": "HTTPValidationError",
"type": "object"
}
POST /profiler/trigger_profile
Trigger Dataset Profiling
Description
Submit a new dataset profiling job.
This endpoint accepts dataset specifications and initiates a profiling job. The profiling process analyzes the dataset structure and content, generating metadata that describes its characteristics.
Parameters
- profile_req (ProfilingRequest): The profiling request containing:
  - profile_specification: Metadata about the dataset to be profiled
  - only_light_profile: Flag to generate only basic metadata (default: False)
Returns
- IngestionTriggerResponse: A response containing:
  - job_id: Unique identifier for tracking the profiling job
  - status: Confirmation message that the job was submitted
Example
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "Job submitted"
}
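The request body for this endpoint can be assembled and checked locally before it is sent. The sketch below is a minimal illustration, not the service's validation logic: `build_profiling_request` is a hypothetical helper, the `id` check is based only on the `string(uuid)` type listed for ProfileSpecificationEndpoint, and the sample specification is a trimmed copy of the example in the ProfilingRequest schema description, so the service may require more fields than are shown.

```python
import json
import uuid
from typing import Any


def build_profiling_request(
    profile_specification: dict[str, Any], only_light_profile: bool = False
) -> bytes:
    """Serialize a ProfilingRequest body, checking that `id` parses as a UUID."""
    spec_id = profile_specification.get("id")
    if spec_id is not None:
        uuid.UUID(str(spec_id))  # raises ValueError if the id is malformed
    payload = {
        "profile_specification": profile_specification,
        "only_light_profile": only_light_profile,
    }
    return json.dumps(payload).encode("utf-8")


# Trimmed from the documented example; other fields may be required.
spec = {
    "id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01",
    "name": "Mathematics Learning Assessment",
    "fields_of_science": ["MATHEMATICS"],
    "languages": ["en"],
    "country": "PT",
    "license": "CC0 1.0",
    "uploaded_by": "ADMIN",
    "data_connectors": [
        {"type": "RawDataPath", "dataset_id": "8930240b-a0e8-46e7-ace8-aab2b42fcc01"}
    ],
}

body = build_profiling_request(spec)
print(json.loads(body)["only_light_profile"])  # False
```

Leaving `only_light_profile` at its default of `False` requests both the light and heavy profiles, matching the default stated in the schema.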
Request body
{
"only_light_profile": true,
"profile_specification": {
"cite_as": null,
"country": "string",
"data_connectors": [
null
],
"database_name": null,
"date_published": "string",
"description": "string",
"doi": null,
"fields_of_science": [
"string"
],
"headline": "string",
"id": "d5077463-3d43-43fd-a4db-d816dbc8777b",
"keywords": [
"string"
],
"languages": [
"string"
],
"license": "string",
"name": "string",
"published_url": null,
"uploaded_by": "string"
}
}
Schema of the request body
{
"description": "Request model for triggering a dataset profiling job.\n\n## Attributes\n* **profile_specification** (ProfileSpecificationEndpoint): Metadata about the dataset to be profiled\n* **only_light_profile** (bool): Flag to generate only basic metadata (default: False)\n * If True, only the light profile (basic metadata and distributions) is generated\n * If False, both light and heavy profiles (including record sets) are generated\n \n## Example\n```json\n{\n \"profile_specification\": {\n \"id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\",\n \"name\": \"Mathematics Learning Assessment\",\n \"description\": \"This dataset was extracted from the MathE platform...\",\n \"headline\": \"Dataset for Assessing Mathematics Learning in Higher Education.\",\n \"fields_of_science\": [\"MATHEMATICS\"],\n \"languages\": [\"en\"],\n \"keywords\": [\"math\", \"student\", \"higher education\"],\n \"country\": \"PT\",\n \"published_url\": \"https://dados.ipb.pt//dataset.xhtml?persistentId=doi:10.34620/dadosipb/PW3OWY\",\n \"date_published\": \"24-05-2025\",\n \"license\": \"CC0 1.0\",\n \"uploaded_by\": \"ADMIN\",\n \"data_connectors\": [\n {\n \"type\": \"RawDataPath\",\n \"dataset_id\": \"8930240b-a0e8-46e7-ace8-aab2b42fcc01\"\n }\n ]\n },\n \"only_light_profile\": false\n}\n```",
"properties": {
"only_light_profile": {
"default": false,
"title": "Only Light Profile",
"type": "boolean"
},
"profile_specification": {
"$ref": "#/components/schemas/ProfileSpecificationEndpoint"
}
},
"required": [
"profile_specification"
],
"title": "ProfilingRequest",
"type": "object"
}
Response 200 OK
{
"job_id": "string",
"status": "string"
}
Schema of the response body
{
"description": "Response model for a submitted profiling job.\n\n## Attributes\n* **job_id**: Unique identifier for tracking the profiling job\n* **status**: Confirmation message that the job was submitted\n \n## Example\n```json\n{\n \"job_id\": \"550e8400-e29b-41d4-a716-446655440000\",\n \"status\": \"Job submitted\"\n}\n```",
"properties": {
"job_id": {
"title": "Job Id",
"type": "string"
},
"status": {
"title": "Status",
"type": "string"
}
},
"required": [
"job_id",
"status"
],
"title": "IngestionTriggerResponse",
"type": "object"
}
Response 422 Unprocessable Content
{
"detail": [
{
"loc": [
null
],
"msg": "string",
"type": "string"
}
]
}
Schema of the response body
{
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"title": "Detail",
"type": "array"
}
},
"title": "HTTPValidationError",
"type": "object"
}
Schemas
CleanUpRequest
| Name | Type |
|---|---|
| profile_job_id | string |
DatabaseConnection
| Name | Type |
|---|---|
| database_name | string |
| type | string |
HTTPValidationError
| Name | Type |
|---|---|
| detail | Array<ValidationError> |
IngestionTriggerResponse
| Name | Type |
|---|---|
| job_id | string |
| status | string |
JobStatus
Type: string
ProfileSpecificationEndpoint
| Name | Type |
|---|---|
| cite_as | |
| country | string |
| data_connectors | Array<> |
| database_name | |
| date_published | string |
| description | string |
| doi | |
| fields_of_science | Array<string> |
| headline | string |
| id | string(uuid) |
| keywords | Array<string> |
| languages | Array<string> |
| license | string |
| name | string |
| published_url | |
| uploaded_by | string |
ProfilesResponse
| Name | Type |
|---|---|
| cdd_profile | object |
| moma_profile_heavy | object |
| moma_profile_light | object |
ProfilingRequest
| Name | Type |
|---|---|
| only_light_profile | boolean |
| profile_specification | ProfileSpecificationEndpoint |
RawDataPath
| Name | Type |
|---|---|
| dataset_id | string |
| type | string |
RunnerStatus
Type: string
ValidationError
| Name | Type |
|---|---|
| loc | Array<> |
| msg | string |
| type | string |