Repository Table Overview

The foundational data object in ARPIA's Reasoning Flows — connecting structured data storage, semantic search, and generative AI capabilities.

Purpose

The Repository Table object in Reasoning Flows lets you load structured data directly from your organization's repository into a workflow. It acts as the primary data entry point within ARPIA, making validated, reusable data available to every component of a flow, from transformation and enrichment to model training and AI-driven queries.

Once registered, a Repository Table becomes a referenceable, version-aware data source that ensures every downstream object (datasets, models, analytics) uses a consistent and authoritative view of the data.

How Repository Tables Fit in Reasoning Flows

In the Reasoning Flows architecture:

Extract & Load populates the data repository from external sources.
Repository Tables register these datasets for structured access within workflows.
Transform & Prepare refines them for analytics and AI.
AI & ML Objects (including Generative and Semantic tools) consume them for intelligent insights.

Repository Tables bridge the gap between raw data storage and active workflow consumption.

Key Capabilities

Direct Data Access
Load an existing table from your repository without writing SQL queries or scripts. ARPIA handles schema registration and version mapping automatically.

Reusable Reference
Once added, the table is available to all other objects that accept tabular input — such as Dataset, Feature Builder, Train Model, and Transform.

Centralized Updates
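Any structural or data change made to the source table is automatically reflected in every dependent object. Because dependent objects reference the table rather than redefine it, they stay consistent with one another, which supports reproducibility and version control within projects.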

Workflow Consistency
Repository Tables guarantee that every process — from ETL to ML — operates on the same "single source of truth."
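
As a rough mental model of the Reusable Reference and Centralized Updates capabilities, the sketch below treats a Repository Table as a named reference that is resolved at read time. The `Repository`, `RepositoryTable`, and `Consumer` classes are hypothetical illustrations written for this page, not ARPIA APIs, and the sample rows are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical stand-in for the data repository: table name -> rows."""
    tables: dict = field(default_factory=dict)


@dataclass
class RepositoryTable:
    """Hypothetical reference to a repository table; it stores no data itself."""
    repo: Repository
    name: str

    def rows(self):
        # Resolve the table at read time so callers always see current data.
        return self.repo.tables[self.name]


@dataclass
class Consumer:
    """Hypothetical downstream object (for example a Dataset or Transform)."""
    label: str
    source: RepositoryTable

    def row_count(self):
        return len(self.source.rows())


# Made-up sample data for illustration only.
repo = Repository({"sales_gold": [{"region": "North", "revenue": 120.0}]})
table = RepositoryTable(repo, "sales_gold")
dataset = Consumer("Dataset", table)
transform = Consumer("Transform", table)

before = (dataset.row_count(), transform.row_count())

# A change to the source table is visible to every consumer of the reference.
repo.tables["sales_gold"].append({"region": "South", "revenue": 95.0})
after = (dataset.row_count(), transform.row_count())

print(before, after)
```

Both consumers report the new row count without being updated themselves, which is the behavior the capabilities above describe. How ARPIA implements this internally is not covered by this page.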

Typical Usage Flow

Step 1: Add a Repository Table

Select a table from your registered data repository and register it using the Repository Table object.

Step 2: Connect to Downstream Objects

Use the registered table as input for other workflow components:

Object	Purpose
`Dataset`	Prepare training and test splits
`Train Model`	Feed model training processes
`Feature Builder`	Apply feature transformations or enrichments
`Transform`	Clean, filter, or restructure data

Step 3: Leverage Across the Flow

Any object that consumes tabular data can reference a Repository Table, removing the need to redefine logic or queries repeatedly.

AI Integration Capabilities

Repository Tables in ARPIA are deeply integrated with the Generative AI and Semantic Search layers of Reasoning Flows.

Generative LLM Integration

Repository Tables can serve as knowledge sources for Generative LLMs within Reasoning Flows. When paired with ARPIA's Reasoning Atlas, these tables provide contextual grounding for:

Text-to-query generation
Conversational data retrieval
Natural language explanations of structured data

Example Use Case:

An LLM can reference a `sales_gold` Repository Table to answer questions like:

"What were the top-performing categories by region in Q3?"

The model generates an SQL query automatically, executes it against the Repository Table, and returns summarized insights.
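
To make the text-to-query step concrete, here is a minimal sketch of the kind of SQL such a question might translate to, run against an in-memory SQLite table. The column names (`region`, `category`, `order_date`, `revenue`), the choice of revenue as the measure of "top-performing", the year 2024, and all sample rows are assumptions made for illustration. The actual schema of `sales_gold` and the SQL that ARPIA generates may differ.

```python
import sqlite3

# Hypothetical schema and made-up sample rows for a sales_gold table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_gold "
    "(region TEXT, category TEXT, order_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales_gold VALUES (?, ?, ?, ?)",
    [
        ("North", "Electronics", "2024-07-15", 500.0),
        ("North", "Apparel", "2024-08-02", 300.0),
        ("North", "Electronics", "2024-09-20", 250.0),
        ("South", "Apparel", "2024-07-30", 400.0),
        ("South", "Electronics", "2024-08-11", 150.0),
        ("South", "Apparel", "2024-11-05", 900.0),  # Q4, excluded by the filter
    ],
)

# The kind of query a text-to-query step might produce for the question above.
GENERATED_SQL = """
SELECT region, category, SUM(revenue) AS total_revenue
FROM sales_gold
WHERE order_date >= '2024-07-01' AND order_date < '2024-10-01'
GROUP BY region, category
ORDER BY region, total_revenue DESC
"""

rows = conn.execute(GENERATED_SQL).fetchall()

# Keep the first (highest-revenue) category seen for each region.
top_by_region = {}
for region, category, total in rows:
    top_by_region.setdefault(region, (category, total))

print(top_by_region)
```

With these sample rows the top Q3 category is Electronics in the North and Apparel in the South. The large South Apparel order in November is ignored because it falls outside the Q3 date filter.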

Semantic Search of Repository Tables

Each indexed Repository Table is represented by vector embeddings (numerical representations that capture meaning), which enables semantic search and contextual discovery of both table metadata and content.

This allows users to:

Search for tables semantically (e.g., "monthly performance data")
Query fields or metrics by meaning, not just by name
Integrate natural-language reasoning into ETL and ML workflows

Example:

A search for "profitability by category" would retrieve Repository Tables containing relevant business metrics even if the columns are labeled differently (for example, `margin_rate` or `net_profit`).
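
The ranking idea behind this example can be sketched with cosine similarity over embedding vectors. The vectors below are hand-picked toy values rather than the output of a real embedding model, and the `hr_headcount` and `web_traffic_daily` tables are hypothetical. The embedding model and similarity measure ARPIA actually uses are not specified on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors; dimensions loosely mean (profit, category, time, staffing).
# A real system would obtain these from an embedding model.
TABLE_VECTORS = {
    "sales_gold (margin_rate, net_profit, product_category)": [0.9, 0.8, 0.1, 0.0],
    "hr_headcount (employee_id, department, hire_date)": [0.0, 0.1, 0.3, 0.9],
    "web_traffic_daily (page_views, session_date)": [0.1, 0.0, 0.9, 0.0],
}

# Stands in for the embedding of the query "profitability by category".
QUERY_VECTOR = [0.8, 0.7, 0.0, 0.0]

# Rank tables from most to least similar to the query.
ranked = sorted(
    TABLE_VECTORS,
    key=lambda name: cosine(QUERY_VECTOR, TABLE_VECTORS[name]),
    reverse=True,
)
print(ranked[0])
```

The sales table ranks first even though none of its column names contains the word "profitability", because its vector points in nearly the same direction as the query vector.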

Integration with Data Layers

Repository Tables can be registered from any ARPIA data layer:

Layer	Use Case
RAW	Exploration and validation
CLEAN	Standardized, ready-to-transform data
GOLD	Analytics-ready datasets, single sources of truth
OPTIMIZED	AI and application-level consumption, indexed for search and semantic reasoning

Important: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog for reasoning and LLM-powered exploration. RAW and CLEAN tables can still be registered and used as workflow inputs.
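
The indexing rule above is simple enough to express as a small check. The `DataLayer` enum and `is_indexed` helper are hypothetical names used only for this sketch, not part of ARPIA.

```python
from enum import Enum


class DataLayer(Enum):
    """The four data layers listed in the table above."""
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"


# Per the note above, only these layers are indexed in the Knowledge Catalog.
INDEXED_LAYERS = {DataLayer.GOLD, DataLayer.OPTIMIZED}


def is_indexed(layer: DataLayer) -> bool:
    """Return True if tables in this layer are indexed for reasoning."""
    return layer in INDEXED_LAYERS


indexed = [layer.value for layer in DataLayer if is_indexed(layer)]
print(indexed)
```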

Example Workflows

Repository Tables serve as the starting point for various workflow patterns:

Workflow Type	Object Chain
ML Training	`Repository Table` → `Feature Builder` → `Dataset` → `Train Model`
AutoML	`Repository Table` → `Transform` → `Dataset` → `AutoML Engine`
Generative AI	`Repository Table` → `Semantic Query` → `Generative AI Workflow`
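
For readers who think in code, the chains in the table above can be written as plain lists. This is only an illustrative data structure, not an ARPIA workflow definition format. The `starts_with_repository_table` and `render` helpers are hypothetical.

```python
# Object chains copied from the table above.
WORKFLOWS = {
    "ML Training": ["Repository Table", "Feature Builder", "Dataset", "Train Model"],
    "AutoML": ["Repository Table", "Transform", "Dataset", "AutoML Engine"],
    "Generative AI": ["Repository Table", "Semantic Query", "Generative AI Workflow"],
}


def starts_with_repository_table(chain):
    """Check that a chain begins with a Repository Table."""
    return bool(chain) and chain[0] == "Repository Table"


def render(chain):
    """Format a chain the way the table above writes it."""
    return " → ".join(chain)


for name, chain in WORKFLOWS.items():
    # Every pattern on this page starts from a Repository Table.
    assert starts_with_repository_table(chain), name
    print(f"{name}: {render(chain)}")
```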

Best Practices

Use Repository Tables as the starting point for any data-driven flow rather than embedding raw queries in downstream objects.

Apply data layer tags to every Repository Table (RAW, CLEAN, GOLD, or OPTIMIZED) according to the ARPIA Data Layer Framework.

Keep metadata and descriptions updated — this improves discoverability in Semantic Search and interpretability for Generative AI tools.

Prefer GOLD or OPTIMIZED tables for production workflows to ensure data quality and governance compliance.
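
Some of these practices can be checked mechanically. The sketch below reviews a table's metadata against the layer-tag, description, and production-layer recommendations. The metadata dictionary shape (`name`, `layer`, `description`), the `review_table` function, and the `orders_raw` table are assumptions for illustration. ARPIA's actual metadata model may look different.

```python
VALID_LAYERS = {"RAW", "CLEAN", "GOLD", "OPTIMIZED"}
PRODUCTION_LAYERS = {"GOLD", "OPTIMIZED"}


def review_table(meta, production=False):
    """Return a list of best-practice warnings for one table's metadata."""
    warnings = []
    layer = meta.get("layer")
    if layer not in VALID_LAYERS:
        warnings.append("missing or unknown data layer tag")
    elif production and layer not in PRODUCTION_LAYERS:
        warnings.append("production flows should prefer GOLD or OPTIMIZED")
    if not (meta.get("description") or "").strip():
        warnings.append("description is empty")
    return warnings


# Hypothetical metadata records.
good = {"name": "sales_gold", "layer": "GOLD", "description": "Curated sales facts"}
risky = {"name": "orders_raw", "layer": "RAW", "description": ""}

print(review_table(good, production=True))
print(review_table(risky, production=True))
```

The first record passes with no warnings. The second is flagged twice: once for using a RAW table in a production flow and once for the empty description.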