Workflows

Production Workflows

In production, the system components interact through three core operational workflows: full batch rebuild, incremental dataset update, and recommendation serving.

1. Full Batch Rebuild Workflow

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[Scheduler / Manual Trigger] --> B[Fetch Full Catalog]
    B --> C[Preprocess Metadata]
    C --> X[Optional LLM Enrichment]
    X --> D[Generate Embeddings]
    D --> F[Update Vector DB]
    F --> G[Compute Top-k Recos + Scores]
    G --> H[Write Recos to Redis]

    classDef trigger fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef prep fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef compute fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;

    class A trigger;
    class B,C,X prep;
    class D,G compute;
    class F,H storage;

This workflow is used when the full catalog must be recomputed, for example after an embedding-model change, a major metadata update, or an initial deployment.

Depending on configuration, an LLM can enrich each dataset's preprocessed metadata before embeddings are generated.
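The rebuild steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the system's implementation: `full_rebuild`, the `embed` and `enrich` callables, the whitespace/lowercase preprocessing, and the `recos:<dataset_id>` key format are all hypothetical. Plain dicts stand in for the vector DB and Redis, and an exhaustive cosine-similarity search replaces whatever nearest-neighbor search the vector DB actually provides.

```python
import json
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def full_rebuild(
    catalog: Dict[str, str],                        # dataset_id -> raw metadata text
    embed: Callable[[str], Vector],                 # hypothetical embedding function
    k: int,
    enrich: Optional[Callable[[str], str]] = None,  # hypothetical LLM enrichment hook
) -> Tuple[Dict[str, Vector], Dict[str, str]]:
    # Preprocess: placeholder normalization (lowercase, collapse whitespace).
    texts = {ds: " ".join(meta.lower().split()) for ds, meta in catalog.items()}
    # Optional LLM enrichment, applied only when a hook is configured.
    if enrich is not None:
        texts = {ds: enrich(text) for ds, text in texts.items()}
    # Embed every dataset; this dict stands in for the vector DB.
    vectors = {ds: embed(text) for ds, text in texts.items()}
    # Exhaustive top-k per dataset (excluding itself), serialized as JSON
    # under a hypothetical "recos:<id>" key, as it might be stored in Redis.
    cache: Dict[str, str] = {}
    for ds, vec in vectors.items():
        scored = [(other, cosine(vec, v)) for other, v in vectors.items() if other != ds]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        cache[f"recos:{ds}"] = json.dumps([[other, round(s, 4)] for other, s in scored[:k]])
    return vectors, cache


def toy_embed(text: str) -> Vector:
    # Toy stand-in for a real model: counts of the letters 'a', 'b', 'c'.
    return [float(text.count(ch)) for ch in "abc"]


vectors, cache = full_rebuild({"d1": "AAA", "d2": "aab", "d3": "ccc"}, toy_embed, k=1)
print(cache["recos:d1"])  # [["d2", 0.8944]]
```

With redis-py, each `cache[key] = value` assignment would correspond to a `Redis.set(key, value)` call; the exhaustive loop is quadratic in catalog size, which is one reason a production rebuild would query the vector DB instead.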

2. Incremental Dataset Update Workflow

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[New / Updated Dataset] --> B[Fetch Dataset Metadata]
    B --> C[Preprocess Metadata]
    C --> X[Optional LLM Enrichment]
    X --> D[Generate / Update Embedding]
    D --> F[Update Vector DB]
    F --> G[Compute Top-k Recos + Scores for Changed Dataset]
    G --> H[Write Recos to Redis]

    classDef trigger fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef prep fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef compute fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;

    class A trigger;
    class B,C,X prep;
    class D,G compute;
    class F,H storage;

This workflow processes only the changed dataset: it creates or updates that dataset's embedding, computes its recommendation list, and writes the new top-k output to Redis.

As in the full batch workflow, an optional LLM-based enrichment step may be applied before embedding generation, depending on system configuration.
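A minimal Python sketch of the single-dataset update follows, assuming the same hypothetical conventions as a batch sketch would use: dicts stand in for the vector DB and Redis, `incremental_update`, `embed`, and `enrich` are illustrative names, and the `recos:<dataset_id>` key format is an assumption. The embedding write is an upsert, so the same function handles both new and updated datasets.

```python
import json
import math
from typing import Callable, Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def incremental_update(
    ds_id: str,
    metadata: str,
    vectors: Dict[str, Vector],                     # stand-in for the vector DB
    cache: Dict[str, str],                          # stand-in for Redis
    embed: Callable[[str], Vector],                 # hypothetical embedding function
    k: int,
    enrich: Optional[Callable[[str], str]] = None,  # hypothetical LLM enrichment hook
) -> None:
    text = " ".join(metadata.lower().split())  # placeholder preprocessing
    if enrich is not None:                     # optional enrichment step
        text = enrich(text)
    vectors[ds_id] = embed(text)               # upsert: insert new or overwrite old
    # Recompute the top-k list for the changed dataset only.
    scored = [(o, cosine(vectors[ds_id], v)) for o, v in vectors.items() if o != ds_id]
    scored.sort(key=lambda p: p[1], reverse=True)
    cache[f"recos:{ds_id}"] = json.dumps([[o, round(s, 4)] for o, s in scored[:k]])


def toy_embed(text: str) -> Vector:
    # Toy stand-in for a real model: counts of the letters 'a', 'b', 'c'.
    return [float(text.count(ch)) for ch in "abc"]


vectors = {"d1": [3.0, 0.0, 0.0], "d3": [0.0, 0.0, 3.0]}
cache: Dict[str, str] = {}
incremental_update("d2", "aab", vectors, cache, toy_embed, k=1)
print(cache["recos:d2"])    # [["d1", 0.8944]]
print("recos:d1" in cache)  # False: existing lists are left untouched
```

The last line illustrates the limitation that the optional neighbor refresh addresses: other datasets' cached lists do not yet know about `d2`.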

Optionally, the system can also perform a selective neighbor refresh:

- retrieve the M existing datasets nearest to the changed dataset (its top-M neighbors)
- recompute their recommendation lists
- update their corresponding Redis entries

This lets a new or updated dataset appear in relevant existing recommendation lists without a full catalog rebuild. A broader rebuild can still be run later if needed.
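The three refresh steps can be sketched as follows. This is an illustrative sketch, not the system's code: `refresh_neighbors`, `top_k`, the dict stand-ins for the vector DB and Redis, and the `recos:<dataset_id>` key format are assumptions, and exhaustive cosine search replaces the vector DB's own nearest-neighbor query.

```python
import json
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(ds_id: str, vectors: Dict[str, Vector], k: int) -> List[list]:
    """Exhaustive nearest-neighbor search, excluding ds_id itself."""
    scored = [(o, cosine(vectors[ds_id], v)) for o, v in vectors.items() if o != ds_id]
    scored.sort(key=lambda p: p[1], reverse=True)
    return [[o, round(s, 4)] for o, s in scored[:k]]


def refresh_neighbors(
    changed_id: str,
    vectors: Dict[str, Vector],  # stand-in for the vector DB
    cache: Dict[str, str],       # stand-in for Redis
    k: int,                      # length of each recommendation list
    m: int,                      # number of neighbors to refresh (M)
) -> List[str]:
    # Step 1: retrieve the M existing datasets nearest to the changed one.
    neighbors = [ds for ds, _score in top_k(changed_id, vectors, m)]
    # Steps 2-3: recompute each neighbor's list and overwrite its cache entry.
    for ds in neighbors:
        cache[f"recos:{ds}"] = json.dumps(top_k(ds, vectors, k))
    return neighbors


vectors = {"d1": [3.0, 0.0, 0.0], "d2": [2.0, 1.0, 0.0], "d3": [0.0, 0.0, 3.0]}
cache = {"recos:d1": json.dumps([["d3", 0.0]])}  # stale list from before d2 existed
print(refresh_neighbors("d2", vectors, cache, k=1, m=1))  # ['d1']
print(cache["recos:d1"])                                   # [["d2", 0.8944]]
```

Because cosine similarity is symmetric, the datasets closest to the changed one are natural candidates to gain it in their own lists, although a neighbor's refreshed top-k is not guaranteed to include it.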

3. Recommendation Serving with Access Control

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[Client / Platform Request] --> B[Authenticate User]
    B --> C[Read Recos from Redis]
    C --> D[Authorization Filter]
    D --> E[Return Visible Ranked Dataset List]

    classDef request fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef access fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;
    classDef result fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;

    class A request;
    class B,D access;
    class C storage;
    class E result;

This is the request-time path. Recommendation lists are computed ahead of time by the batch and incremental workflows, so at request time the API only authenticates the requester, reads the cached list from Redis, and filters out datasets the user is not authorized to access.
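A minimal Python sketch of this request path is shown below. It assumes hypothetical `authenticate` (token to user id) and `can_read` (user, dataset to bool) callables, a dict standing in for Redis, and a `recos:<dataset_id>` key holding a JSON list of `[dataset_id, score]` pairs; none of these names come from the actual API.

```python
import json
from typing import Callable, Dict, List, Optional


def serve_recommendations(
    token: str,
    dataset_id: str,
    cache: Dict[str, str],                         # stand-in for Redis
    authenticate: Callable[[str], Optional[str]],  # hypothetical: token -> user id or None
    can_read: Callable[[str, str], bool],          # hypothetical access-control check
    limit: int = 10,
) -> List[str]:
    # 1. Authenticate the requester; reject unknown tokens.
    user = authenticate(token)
    if user is None:
        raise PermissionError("unauthenticated request")
    # 2. Read the precomputed ranked list; no similarity is computed here.
    raw = cache.get(f"recos:{dataset_id}")
    if raw is None:
        return []
    ranked = json.loads(raw)  # [[dataset_id, score], ...], best first
    # 3. Authorization filter: drop datasets this user may not access,
    #    keeping the precomputed ranking order.
    visible = [ds for ds, _score in ranked if can_read(user, ds)]
    return visible[:limit]


# Toy data for illustration only.
cache = {"recos:d1": json.dumps([["d2", 0.91], ["d5", 0.87], ["d3", 0.80]])}
tokens = {"t-alice": "alice"}
acl = {("alice", "d2"), ("alice", "d3")}
result = serve_recommendations("t-alice", "d1", cache, tokens.get, lambda u, ds: (u, ds) in acl, limit=2)
print(result)  # ['d2', 'd3']
```

Since filtering happens after the cache read, a user can receive fewer than `limit` results; storing more candidates than are typically displayed leaves headroom for datasets removed by the filter.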