Repository Table

Repository Table Overview

The foundational data object in ARPIA's Reasoning Flows — connecting structured data storage, semantic search, and generative AI capabilities.


Purpose

The Repository Table object in Reasoning Flows allows you to load structured data directly from your organization's repository into a workflow. It acts as the primary data entry point within ARPIA, making validated, reusable data instantly available across all components of a flow — from transformation and enrichment to training models and powering AI-driven queries.

Once registered, a Repository Table becomes a referenceable, version-aware data source that ensures every downstream object (datasets, models, analytics) uses a consistent and authoritative view of the data.

[Figure: Repository Table interface in Reasoning Flows]

How Repository Tables Fit in Reasoning Flows

In the Reasoning Flows architecture:

  1. Extract & Load populates the data repository from external sources.
  2. Repository Tables register these datasets for structured access within workflows.
  3. Transform & Prepare refines them for analytics and AI.
  4. AI & ML Objects (including Generative and Semantic tools) consume them for intelligent insights.

Repository Tables bridge the gap between raw data storage and active workflow consumption.


Key Capabilities

Direct Data Access
Load an existing table from your repository without writing SQL queries or scripts. ARPIA handles schema registration and version mapping automatically.

Reusable Reference
Once added, the table is available to all other objects that accept tabular input — such as Dataset, Feature Builder, Train Model, and Transform.

Centralized Updates
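Any structural or data changes made to the source table are automatically reflected across all dependent objects, so downstream components never drift out of sync with the source. Combined with the version mapping ARPIA maintains at registration, this supports reproducibility and version control within projects.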

Workflow Consistency
Repository Tables guarantee that every process — from ETL to ML — operates on the same "single source of truth."
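Reusable Reference and Centralized Updates can be pictured as consumers holding a reference to one registered table instead of private copies. The Python sketch below is purely illustrative: `RepositoryTable`, `DownstreamObject`, and the integer `version` counter are hypothetical names, not ARPIA's API, and it does not describe how ARPIA actually tracks versions.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryTable:
    """Hypothetical stand-in for a registered Repository Table."""
    name: str
    layer: str
    rows: list = field(default_factory=list)
    version: int = 1

    def update(self, new_rows):
        # A change to the source replaces the rows and bumps the version.
        self.rows = list(new_rows)
        self.version += 1


@dataclass
class DownstreamObject:
    """Hypothetical consumer (e.g. a Transform) that holds a reference, not a copy."""
    name: str
    source: RepositoryTable

    def read(self):
        # Reads always go through the shared reference, so every consumer
        # sees the same rows and the same version.
        return self.source.version, self.source.rows


sales = RepositoryTable("sales_gold", "GOLD", rows=[{"region": "North", "revenue": 100}])
transform = DownstreamObject("Transform", sales)
dataset = DownstreamObject("Dataset", sales)

sales.update([{"region": "North", "revenue": 120}])
print(transform.read())  # (2, [{'region': 'North', 'revenue': 120}])
```

Because neither consumer stores its own copy, a single update to the source is visible to both without any re-registration step.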


Typical Usage Flow

Step 1: Add a Repository Table

Select a table from your organization's data repository and register it in the flow using the Repository Table object.

Step 2: Connect to Downstream Objects

Use the registered table as input for other workflow components:

Object            Purpose
Dataset           Prepare training and test splits
Train Model       Feed model training processes
Feature Builder   Apply feature transformations or enrichments
Transform         Clean, filter, or restructure data
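To illustrate the Dataset row above: one common way to prepare reproducible training and test splits from a registered table is a hash-based split keyed on a stable column. This is a generic technique shown under assumptions (a unique `order_id` key column, a 20% test fraction, and the hypothetical helpers `assign_split` and `split_rows`); the Dataset object's actual splitting method is not specified here.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a row key to 'train' or 'test' (hypothetical helper)."""
    # Hash the key so the same row always lands in the same split,
    # regardless of row order or when the split is recomputed.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_rows(rows, key_column, test_fraction=0.2):
    """Split rows into (train, test) lists using a stable key column."""
    train, test = [], []
    for row in rows:
        split = assign_split(str(row[key_column]), test_fraction)
        (test if split == "test" else train).append(row)
    return train, test


rows = [{"order_id": i, "revenue": i * 10} for i in range(1000)]
train, test = split_rows(rows, "order_id")
print(len(train), len(test))
```

Keying the split on a column rather than on random sampling means re-running the flow against the same Repository Table yields the same partition.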

Step 3: Leverage Across the Flow

Any object that consumes tabular data can reference a Repository Table, removing the need to redefine logic or queries repeatedly.


AI Integration Capabilities

Repository Tables in ARPIA are deeply integrated with the Generative AI and Semantic Search layers of Reasoning Flows.

Generative LLM Integration

Repository Tables can serve as knowledge sources for Generative LLMs within Reasoning Flows. When paired with ARPIA's Reasoning Atlas, these tables provide contextual grounding for:

  • Text-to-query generation
  • Conversational data retrieval
  • Natural language explanations of structured data

Example Use Case:

An LLM can reference a sales_gold Repository Table to answer questions like:

"What were the top-performing categories by region in Q3?"

The model generates an SQL query automatically; the query is executed against the Repository Table, and the results are returned as summarized insights.
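The generate, execute, and summarize sequence can be sketched with Python's built-in `sqlite3` module. This is not ARPIA's implementation: `generate_sql` is a hypothetical stub standing in for the LLM that returns a fixed query, and the `sales_gold` columns and rows are made-up sample values.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step; returns a fixed query for illustration."""
    return (
        "SELECT region, category, SUM(revenue) AS total "
        "FROM sales_gold WHERE quarter = 'Q3' "
        "GROUP BY region, category ORDER BY region, total DESC"
    )


def run_read_only(conn: sqlite3.Connection, sql: str):
    """Execute generated SQL only if it is a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()


def top_category_per_region(rows):
    # Rows arrive ordered by region, then total descending,
    # so the first row seen for each region is its top category.
    top = {}
    for region, category, total in rows:
        top.setdefault(region, (category, total))
    return top


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_gold (region TEXT, category TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales_gold VALUES (?, ?, ?, ?)",
    [
        ("North", "Electronics", "Q3", 500.0),
        ("North", "Apparel", "Q3", 300.0),
        ("South", "Apparel", "Q3", 450.0),
        ("South", "Electronics", "Q2", 900.0),
    ],
)

sql = generate_sql("What were the top-performing categories by region in Q3?")
print(top_category_per_region(run_read_only(conn, sql)))
# {'North': ('Electronics', 500.0), 'South': ('Apparel', 450.0)}
```

The prefix check is only a minimal guard against generated statements that modify data; running generated SQL under a read-only database role is the more robust choice.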

Semantic Search of Repository Tables

Repository Tables are indexed using vector embeddings (numerical representations that capture meaning), which enables semantic search and contextual discovery of both table metadata and content. Indexing applies only to eligible data layers; see Integration with Data Layers.

This allows users to:

  • Search for tables semantically (e.g., "monthly performance data")
  • Query fields or metrics by meaning, not just by name
  • Integrate natural-language reasoning into ETL and ML workflows

Example:

A search for "profitability by category" would retrieve Repository Tables containing relevant business metrics even if their columns use different names, such as margin_rate or net_profit.
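Ranking by meaning usually comes down to comparing vectors. The sketch below ranks tables by cosine similarity between a query vector and each table's embedding. The three-dimensional vectors, table names, and query vector are invented for illustration; ARPIA's embedding model, vector dimensions, and similarity metric are not specified in this document.

```python
import math

# Hypothetical, hand-written 3-dimensional "embeddings" for illustration only.
# A real system would obtain these vectors from an embedding model.
TABLE_EMBEDDINGS = {
    "finance_margins_gold": [0.9, 0.1, 0.0],  # columns: margin_rate, net_profit
    "web_sessions_gold": [0.0, 0.2, 0.9],     # columns: session_id, page_views
    "sales_gold": [0.6, 0.7, 0.1],            # columns: category, revenue
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vector, index, top_k=2):
    """Return the names of the top_k tables most similar to the query."""
    scored = sorted(
        ((cosine(query_vector, vec), name) for name, vec in index.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Assumed embedding of the query "profitability by category".
query = [0.85, 0.3, 0.0]
print(semantic_search(query, TABLE_EMBEDDINGS))  # ['finance_margins_gold', 'sales_gold']
```

Note that no column name in `finance_margins_gold` contains the word "profitability"; the match comes entirely from vector proximity.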


Integration with Data Layers

Repository Tables can be registered from any ARPIA data layer:

Layer       Use Case
RAW         Exploration and validation
CLEAN       Standardized, ready-to-transform data
GOLD        Analytics-ready datasets, single sources of truth
OPTIMIZED   AI and application-level consumption, indexed for search and semantic reasoning

Important: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog for reasoning and LLM-powered exploration.


Example Workflows

Repository Tables serve as the starting point for various workflow patterns:

Workflow Type   Object Chain
ML Training     Repository Table → Feature Builder → Dataset → Train Model
AutoML          Repository Table → Transform → Dataset → AutoML Engine
Generative AI   Repository Table → Semantic Query → Generative AI Workflow
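Because every chain in the table begins at a Repository Table, the patterns can be viewed as one dependency graph rooted at a shared table. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive a valid execution order. Treating the flows as a single graph and the "workflow: object" node naming are illustrative assumptions, not a description of how Reasoning Flows schedules objects.

```python
from graphlib import TopologicalSorter

# Object chains from the table above, minus the shared source. Nodes other
# than the Repository Table are qualified by workflow so the chains stay separate.
CHAINS = {
    "ML Training": ["Feature Builder", "Dataset", "Train Model"],
    "AutoML": ["Transform", "Dataset", "AutoML Engine"],
    "Generative AI": ["Semantic Query", "Generative AI Workflow"],
}
SOURCE = "Repository Table"


def build_graph(chains, source):
    """Map each node to the set of nodes it depends on (its predecessors)."""
    graph = {source: set()}
    for workflow, steps in chains.items():
        previous = source
        for step in steps:
            node = f"{workflow}: {step}"
            graph[node] = {previous}
            previous = node
    return graph


order = list(TopologicalSorter(build_graph(CHAINS, SOURCE)).static_order())
print(order[0])  # Repository Table
```

The Repository Table is the only node with no predecessors, so any valid order loads it before every downstream object.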

Best Practices

Use Repository Tables as the starting point for any data-driven flow rather than embedding raw queries in downstream objects.

Apply data layer tags to every Repository Table (RAW, CLEAN, GOLD, or OPTIMIZED) according to the ARPIA Data Layer Framework.

Keep metadata and descriptions updated — this improves discoverability in Semantic Search and interpretability for Generative AI tools.

Prefer GOLD or OPTIMIZED tables for production workflows to ensure data quality and governance compliance.
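The practices above lend themselves to an automated check before a flow is promoted. This sketch assumes a hypothetical `TableMetadata` record and `check_table` helper; ARPIA's own metadata fields and any built-in validation are not described in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Layer names from the ARPIA Data Layer Framework.
VALID_LAYERS = {"RAW", "CLEAN", "GOLD", "OPTIMIZED"}
# Layers recommended for production (and indexed in the Knowledge Catalog).
PRODUCTION_LAYERS = {"GOLD", "OPTIMIZED"}


@dataclass
class TableMetadata:
    """Hypothetical metadata record for a registered Repository Table."""
    name: str
    layer: Optional[str]
    description: str = ""


def check_table(table: TableMetadata, production: bool = False) -> list:
    """Return best-practice warnings for one table (empty list if none)."""
    warnings = []
    if table.layer not in VALID_LAYERS:
        warnings.append(f"{table.name}: missing or unknown data layer tag")
    if not table.description.strip():
        warnings.append(f"{table.name}: no description; hurts Semantic Search discoverability")
    if production and table.layer not in PRODUCTION_LAYERS:
        warnings.append(f"{table.name}: production flows should use GOLD or OPTIMIZED tables")
    return warnings


tables = [
    TableMetadata("sales_gold", "GOLD", "Quarterly sales by region and category"),
    TableMetadata("orders_tmp", None),
]
for t in tables:
    for warning in check_table(t, production=True):
        print(warning)
```

The `production` flag keeps RAW and CLEAN tables usable for exploration while flagging them only when a flow is headed to production.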


Related Documentation

  • Extract & Load
    Understand how data is ingested into ARPIA's repository for further processing.

  • ARPIA Data Layer Framework
    Learn the standard definitions and governance rules behind RAW, CLEAN, GOLD, and OPTIMIZED layers.

  • Transform & Prepare
    Continue to the next stage to refine, standardize, and enrich Repository Tables for analysis and modeling.

  • Reasoning Atlas Overview
    Explore how Repository Tables integrate with ARPIA's semantic reasoning and LLM frameworks for generative AI.