Repository Table

Repository Table Overview

The Repository Table object in the Data Workshop loads data from a repository table into your workflow. It is the main entry point for structured data and makes that data available to the other components in the same workflow.

Once registered, the Repository Table acts as a referenceable data source. It can be used to build datasets, train models, enrich features, or support any downstream ML process within the same workflow.


Key Capabilities

  • Direct data access: Load an existing table from your source repository without writing custom queries or scripts.
  • Reusable reference: Once added, the table is available to all other objects that accept tabular input.
  • Centralized updates: Any change to the source (schema or content) is reflected automatically across all dependent objects.
  • Workflow consistency: Ensures that every component in the ML pipeline uses the same source of truth.
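
The "reusable reference" and "centralized updates" capabilities can be pictured with a small in-memory model. This is only a sketch of the idea, not the Data Workshop API: the class names `RepositoryTable` and `DownstreamObject`, their fields, and the sample table are all hypothetical. The key assumption is that dependent objects keep a reference to the registered table rather than a copy of it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryTable:
    """Hypothetical stand-in for a registered table (not the product API)."""
    name: str
    columns: List[str]
    rows: List[Dict] = field(default_factory=list)


@dataclass
class DownstreamObject:
    """Any object that accepts tabular input; it stores a reference, not a copy."""
    label: str
    source: RepositoryTable

    def visible_columns(self) -> List[str]:
        # Read the schema from the shared source at call time.
        return list(self.source.columns)


# Made-up sample table shared by two dependent objects.
customers = RepositoryTable("customers", ["id", "age"], [{"id": 1, "age": 34}])
dataset = DownstreamObject("Dataset", customers)
builder = DownstreamObject("Feature Builder", customers)

# A schema change on the source is seen by every dependent object.
customers.columns.append("country")
print(dataset.visible_columns())
print(builder.visible_columns())
```

Because both objects read the schema from the same source at call time, neither needs to be updated when the table changes.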

Typical Usage Flow

  1. Add Repository Table
    Select a table from the repository and register it via the Repository Table object.

  2. Connect it to downstream objects
    Use the registered table as input for other components, such as:

    • Dataset object: to prepare training and test sets
    • Train Model object: to feed training data
    • Feature Builder object: to apply transformations or enrichments

  3. Leverage it across the workflow
    Any step that requires structured data can now reference the Repository Table without redefining its schema or logic.
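
The three steps above could be modeled roughly as follows. This is a minimal sketch under stated assumptions: `Workflow`, `add_repository_table`, `connect`, and `schema_for` are illustrative names invented for this example, and the `transactions` table and its columns are made up. It is meant to show the register-once, reference-everywhere pattern, not how the Data Workshop implements it.

```python
class Workflow:
    """Hypothetical in-memory workflow; all names are illustrative only."""

    def __init__(self):
        self.tables = {}   # registered Repository Tables, keyed by name
        self.inputs = {}   # downstream object name -> name of the table it reads

    def add_repository_table(self, name, columns):
        # Step 1: register the table once, together with its schema.
        self.tables[name] = tuple(columns)

    def connect(self, object_name, table_name):
        # Step 2: a downstream object may only reference a registered table.
        if table_name not in self.tables:
            raise KeyError(f"table {table_name!r} is not registered")
        self.inputs[object_name] = table_name

    def schema_for(self, object_name):
        # Step 3: the schema is looked up from the registration, never redefined.
        return self.tables[self.inputs[object_name]]


wf = Workflow()
wf.add_repository_table("transactions", ["tx_id", "amount", "label"])
for obj in ("Dataset", "Train Model", "Feature Builder"):
    wf.connect(obj, "transactions")

print(wf.schema_for("Feature Builder"))
```

In this sketch, connecting an object to a table that was never registered raises an error, which mirrors the requirement that registration comes before any downstream use.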

Example Workflow

  • Basic: Repository Table → Dataset → Train Model.
  • With feature engineering: Repository Table → Feature Builder → Dataset → Train Model.
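
To make the two paths concrete, the sketch below chains plain Python functions in the same order as the workflow objects. Everything here is an assumption made for illustration: the function names only echo the object names, the sample rows are invented, and the "model" is a toy threshold rather than anything the Train Model object actually does.

```python
def repository_table():
    # Stand-in for loading a registered table; rows are made-up sample data.
    return [
        {"amount": 10.0, "label": 0},
        {"amount": 250.0, "label": 1},
        {"amount": 15.0, "label": 0},
        {"amount": 300.0, "label": 1},
    ]


def feature_builder(rows):
    # Enrichment step: add a derived column without modifying the source rows.
    return [{**row, "is_large": row["amount"] > 100} for row in rows]


def dataset(rows, test_fraction=0.25):
    # Split into training and test sets (the last rows become the test set).
    cut = len(rows) - max(1, int(len(rows) * test_fraction))
    return rows[:cut], rows[cut:]


def train_model(train_rows):
    # Toy "model": a threshold halfway between the class means of `amount`.
    pos = [r["amount"] for r in train_rows if r["label"] == 1]
    neg = [r["amount"] for r in train_rows if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


# Path 1: Repository Table -> Dataset -> Train Model
train, test = dataset(repository_table())
threshold = train_model(train)

# Path 2: Repository Table -> Feature Builder -> Dataset -> Train Model
train_fb, test_fb = dataset(feature_builder(repository_table()))
threshold_fb = train_model(train_fb)

print(threshold, threshold_fb)
```

Both paths start from the same table, so the only difference between them is the extra column added by the enrichment step.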

Best Practices

  • Use Repository Tables as the starting point for any data-driven object.
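  • Avoid raw queries elsewhere in the workflow when the table is already registered as a Repository Table. Reference the registered object instead, so that every component keeps reading from the same source of truth.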