Workflows

Production Workflows

In production, the system components interact through three core operational workflows: full batch rebuild, incremental dataset update, and recommendation serving.

1. Full Batch Rebuild Workflow

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[Scheduler / Manual Trigger] --> B[Fetch Full Catalog]
    B --> C[Preprocess Metadata]
    C --> X[Optional LLM Enrichment]
    X --> D[Generate Embeddings]
    D --> F[Update Vector DB]
    F --> G[Compute Top-k Recos + Scores]
    G --> H[Write Recos to Redis]

    classDef trigger fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef prep fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef compute fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;

    class A trigger;
    class B,C,X prep;
    class D,G compute;
    class F,H storage;

This workflow is used when the full catalog must be recomputed, for example after an embedding-model change, a major metadata update, or an initial deployment.

Depending on configuration, the workflow can enrich each dataset's representation with an LLM before generating embeddings.
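The rebuild steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the names `full_rebuild`, `top_k`, `embed`, and `enrich` are hypothetical, plain dictionaries stand in for the vector DB and Redis, preprocessing is reduced to whitespace normalization, and top-k is computed by brute-force cosine similarity.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Recos = List[Tuple[str, float]]  # (dataset_id, score), best match first


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_id: str, index: Dict[str, Vector], k: int) -> Recos:
    """Brute-force top-k neighbours of query_id, excluding itself."""
    query = index[query_id]
    scored = [(other, cosine(query, vec))
              for other, vec in index.items() if other != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def full_rebuild(
    catalog: Dict[str, str],                        # dataset_id -> raw metadata text
    embed: Callable[[str], Vector],                 # hypothetical embedding function
    k: int,
    enrich: Optional[Callable[[str], str]] = None,  # hypothetical LLM enrichment hook
) -> Tuple[Dict[str, Vector], Dict[str, Recos]]:
    """Recompute every embedding and every recommendation list from scratch."""
    index: Dict[str, Vector] = {}                   # stand-in for the vector DB
    for dataset_id, metadata in catalog.items():
        text = " ".join(metadata.split())           # preprocess: normalize whitespace
        if enrich is not None:                      # optional enrichment step
            text = enrich(text)
        index[dataset_id] = embed(text)
    # Stand-in for Redis: one precomputed ranked list per dataset.
    reco_store = {dataset_id: top_k(dataset_id, index, k) for dataset_id in index}
    return index, reco_store
```

Keeping `enrich` as an optional callable mirrors the configuration switch described above: passing `None` skips the step without changing the rest of the pipeline.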

2. Incremental Dataset Update Workflow

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[New / Updated Dataset] --> B[Fetch Dataset Metadata]
    B --> C[Preprocess Metadata]
    C --> X[Optional LLM Enrichment]
    X --> D[Generate / Update Embedding]
    D --> F[Update Vector DB]
    F --> G[Compute Top-k Recos + Scores for Changed Dataset]
    G --> H[Write Recos to Redis]

    classDef trigger fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef prep fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef compute fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;

    class A trigger;
    class B,C,X prep;
    class D,G compute;
    class F,H storage;

This workflow processes only the new or changed dataset: it regenerates the dataset's embedding, updates the vector DB, recomputes the dataset's top-k recommendation list, and writes that list to Redis.

As in the full batch workflow, an optional LLM-based enrichment step may be applied before embedding generation, depending on system configuration.

Optionally, the system can also perform a selective neighbor refresh:

  • retrieve the top-M nearest existing datasets to the changed dataset
  • recompute their recommendation lists
  • update their corresponding Redis entries

This allows the new or updated dataset to appear in relevant existing recommendation lists without requiring a full catalog rebuild. A broader rebuild can still be done later, if needed.
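The incremental update and the optional neighbor refresh can be sketched as follows. This is an illustration only: `incremental_update` and `nearest` are hypothetical names, dictionaries stand in for the vector DB and Redis, and the embedding is assumed to have been generated already. The parameter `m` corresponds to the top-M refresh described above, with `m=0` disabling it.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Recos = List[Tuple[str, float]]  # (dataset_id, score), best match first


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query_id: str, index: Dict[str, Vector], n: int) -> Recos:
    """Brute-force n nearest neighbours of query_id, excluding itself."""
    query = index[query_id]
    scored = [(other, cosine(query, vec))
              for other, vec in index.items() if other != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def incremental_update(
    dataset_id: str,
    embedding: Vector,
    index: Dict[str, Vector],      # stand-in for the vector DB
    reco_store: Dict[str, Recos],  # stand-in for Redis
    k: int,
    m: int = 0,                    # number of neighbours to refresh; 0 disables it
) -> List[str]:
    """Upsert one dataset and return the ids whose entries were rewritten."""
    index[dataset_id] = embedding
    reco_store[dataset_id] = nearest(dataset_id, index, k)
    rewritten = [dataset_id]
    # Selective neighbour refresh: only the top-m closest datasets are recomputed.
    for neighbour_id, _score in nearest(dataset_id, index, m):
        reco_store[neighbour_id] = nearest(neighbour_id, index, k)
        rewritten.append(neighbour_id)
    return rewritten
```

Datasets outside the top-M neighborhood keep their previous lists in this sketch, which is why a later broader rebuild remains useful.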

3. Recommendation Serving with Access Control

%%{init: {'flowchart': {'nodeSpacing': 22, 'rankSpacing': 22, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px', 'lineColor': '#9E9E9E', 'edgeLabelBackground':'#ffffff', 'primaryBorderColor':'#BDBDBD', 'clusterBorder':'#E0E0E0', 'lineWidth':'0.9px'}}}%%
flowchart TB
    A[Client / Platform Request] --> B[Authenticate User]
    B --> C[Read Recos from Redis]
    C --> D[Authorization Filter]
    D --> E[Return Visible Ranked Dataset List]

    classDef request fill:#E8F5E9,stroke:#43A047,stroke-width:2px,color:#1B5E20;
    classDef access fill:#E3F2FD,stroke:#1E88E5,stroke-width:2px,color:#0D47A1;
    classDef storage fill:#F3E5F5,stroke:#8E24AA,stroke-width:2px,color:#4A148C;
    classDef result fill:#FFF3E0,stroke:#FB8C00,stroke-width:2px,color:#E65100;

    class A request;
    class B,D access;
    class C storage;
    class E result;

This is the request-time path. Recommendations are computed ahead of time and read from Redis, so at request time the API only authenticates the requester, filters out datasets the user is not authorized to access, and returns the remaining datasets in ranked order.
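A minimal sketch of this path, assuming hypothetical `authenticate` and `can_access` callables supplied by the platform and a dictionary standing in for Redis, might look like this:

```python
from typing import Callable, Dict, List, Optional, Tuple

Recos = List[Tuple[str, float]]  # (dataset_id, score), best match first


def serve_recommendations(
    token: str,
    dataset_id: str,
    reco_store: Dict[str, Recos],                 # stand-in for Redis
    authenticate: Callable[[str], Optional[str]],  # hypothetical: token -> user id or None
    can_access: Callable[[str, str], bool],        # hypothetical: (user, dataset_id) -> bool
    limit: int,
) -> Recos:
    """Return the precomputed recommendations the requester is allowed to see."""
    user = authenticate(token)
    if user is None:
        raise PermissionError("authentication failed")
    ranked = reco_store.get(dataset_id, [])        # precomputed list, no similarity search here
    # Filtering a sorted list keeps the stored ranking order.
    visible = [(ds, score) for ds, score in ranked if can_access(user, ds)]
    return visible[:limit]
```

Because filtering happens after the read, a user with narrow permissions may receive fewer than `limit` results. One possible mitigation, not stated in the source, is to store more candidates per dataset than the API returns.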