Repository Table
Repository Table Overview
The foundational data object in ARPIA's Reasoning Flows — connecting structured data storage, semantic search, and generative AI capabilities.
Purpose
The Repository Table object in Reasoning Flows allows you to load structured data directly from your organization's repository into a workflow. It acts as the primary data entry point within ARPIA, making validated, reusable data instantly available across all components of a flow — from transformation and enrichment to training models and powering AI-driven queries.
Once registered, a Repository Table becomes a referenceable, version-aware data source that ensures every downstream object (datasets, models, analytics) uses a consistent and authoritative view of the data.
How Repository Tables Fit in Reasoning Flows
In the Reasoning Flows architecture:
- Extract & Load populates the data repository from external sources.
- Repository Tables register these datasets for structured access within workflows.
- Transform & Prepare refines them for analytics and AI.
- AI & ML Objects (including Generative and Semantic tools) consume them for intelligent insights.
Repository Tables bridge the gap between raw data storage and active workflow consumption.
Key Capabilities
Direct Data Access
Load an existing table from your repository without writing SQL queries or scripts. ARPIA handles schema registration and version mapping automatically.
Reusable Reference
Once added, the table is available to all other objects that accept tabular input — such as Dataset, Feature Builder, Train Model, and Transform.
Centralized Updates
Any structural or data changes made to the source table are automatically reflected across all dependent objects. This ensures reproducibility and version control within projects.
Workflow Consistency
Repository Tables guarantee that every process — from ETL to ML — operates on the same "single source of truth."
Typical Usage Flow
Step 1: Add a Repository Table
Select a table from your registered data repository and register it using the Repository Table object.
Step 2: Connect to Downstream Objects
Use the registered table as input for other workflow components:
| Object | Purpose |
|---|---|
| Dataset | Prepare training and test splits |
| Train Model | Feed model training processes |
| Feature Builder | Apply feature transformations or enrichments |
| Transform | Clean, filter, or restructure data |
Step 3: Leverage Across the Flow
Any object that consumes tabular data can reference a Repository Table, removing the need to redefine logic or queries repeatedly.
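The three steps above can be sketched in a few lines of Python. This is only an illustration of the register-once, reference-by-name pattern. The `RepositoryTable` and `Flow` classes and their methods are hypothetical names invented for this sketch and are not part of ARPIA's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepositoryTable:
    """Hypothetical stand-in for a registered Repository Table."""
    name: str
    layer: str  # RAW, CLEAN, GOLD, or OPTIMIZED


@dataclass
class Flow:
    """Toy workflow: one table registry plus the objects that consume it."""
    tables: dict = field(default_factory=dict)
    inputs: dict = field(default_factory=dict)

    def register(self, table: RepositoryTable) -> None:
        # Step 1: register the table once.
        self.tables[table.name] = table

    def connect(self, object_name: str, table_name: str) -> None:
        # Step 2: downstream objects hold a reference by name, not a copy or a query.
        if table_name not in self.tables:
            raise KeyError(f"table {table_name!r} is not registered")
        self.inputs[object_name] = table_name

    def consumers(self, table_name: str) -> list:
        # Step 3: list every object that reads the same registered table.
        return sorted(o for o, t in self.inputs.items() if t == table_name)


flow = Flow()
flow.register(RepositoryTable(name="sales_gold", layer="GOLD"))
for obj in ("Dataset", "Feature Builder", "Transform"):
    flow.connect(obj, "sales_gold")
print(flow.consumers("sales_gold"))
```

Because each object stores only the table name, none of them needs to redefine the query that produced the data.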
AI Integration Capabilities
Repository Tables in ARPIA are deeply integrated with the Generative AI and Semantic Search layers of Reasoning Flows.
Generative LLM Integration
Repository Tables can serve as knowledge sources for Generative LLMs within Reasoning Flows. When paired with ARPIA's Reasoning Atlas, these tables provide contextual grounding for:
- Text-to-query generation
- Conversational data retrieval
- Natural language explanations of structured data
Example Use Case:
An LLM can reference a sales_gold Repository Table to answer questions like:
"What were the top-performing categories by region in Q3?"
The model generates an SQL query automatically, executes it against the Repository Table, and returns summarized insights.
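A minimal sketch of this generate-then-execute loop, using Python's built-in `sqlite3` module as a stand-in for the repository. The column names, the sample rows, and the `generate_sql` function are assumptions made for illustration. In particular, `generate_sql` is a stub that returns a fixed query where a real flow would call an LLM.

```python
import sqlite3

# Hypothetical rows standing in for a sales_gold Repository Table.
rows = [
    ("North", "Bikes", "Q3", 120.0),
    ("North", "Helmets", "Q3", 80.0),
    ("South", "Bikes", "Q3", 50.0),
    ("South", "Helmets", "Q3", 90.0),
    ("North", "Bikes", "Q2", 300.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_gold (region TEXT, category TEXT, quarter TEXT, revenue REAL)"
)
conn.executemany("INSERT INTO sales_gold VALUES (?, ?, ?, ?)", rows)


def generate_sql(question: str) -> str:
    """Stub for the LLM step: returns a fixed query for the example question."""
    return (
        "SELECT region, category, SUM(revenue) AS total "
        "FROM sales_gold WHERE quarter = 'Q3' "
        "GROUP BY region, category "
        "ORDER BY region, total DESC"
    )


def top_category_by_region(question: str) -> dict:
    top = {}
    for region, category, total in conn.execute(generate_sql(question)):
        # Rows are sorted by total within each region, so keep only the first.
        top.setdefault(region, (category, total))
    return top


result = top_category_by_region(
    "What were the top-performing categories by region in Q3?"
)
print(result)
```

With the sample rows above, the Q2 row is filtered out and each region maps to its highest-revenue Q3 category.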
Semantic Search of Repository Tables
Each Repository Table is indexed using vector embeddings (numerical representations that capture meaning), enabling semantic search and contextual discovery of both table metadata and content.
This allows users to:
- Search for tables semantically (e.g., "monthly performance data")
- Query fields or metrics by meaning, not just by name
- Integrate natural-language reasoning into ETL and ML workflows
Example:
A search for "profitability by category" would retrieve Repository Tables containing relevant business metrics even if the columns are labeled differently (margin_rate, net_profit).
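The idea behind this kind of lookup can be illustrated with cosine similarity over embedding vectors. The sketch below is a toy under stated assumptions: the three-dimensional vectors are written by hand rather than produced by an embedding model, the table names are invented, and nothing here reflects how ARPIA actually builds or stores its index.

```python
import math

# Hand-written toy vectors; a real system would obtain these from an embedding model.
# Assumed dimensions, for illustration only: [profit, time, customer]
table_embeddings = {
    "finance_margins (margin_rate, net_profit)": [0.9, 0.1, 0.0],
    "monthly_kpis (month, revenue)": [0.4, 0.9, 0.1],
    "crm_contacts (email, phone)": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vector, k=2):
    """Return the k table names whose vectors are closest to the query."""
    ranked = sorted(
        table_embeddings,
        key=lambda name: cosine(query_vector, table_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]


# Assumed embedding of the query "profitability by category".
query = [1.0, 0.2, 0.0]
print(search(query))
```

The query never mentions `margin_rate` or `net_profit`, yet the table containing them ranks first because its vector points in nearly the same direction as the query vector.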
Integration with Data Layers
Repository Tables can be registered from any ARPIA data layer:
| Layer | Use Case |
|---|---|
| RAW | Exploration and validation |
| CLEAN | Standardized, ready-to-transform data |
| GOLD | Analytics-ready datasets, single sources of truth |
| OPTIMIZED | AI and application-level consumption, indexed for search and semantic reasoning |
Important: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog for reasoning and LLM-powered exploration.
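The indexing rule above can be expressed as a simple filter. The sketch below assumes nothing about ARPIA's internals. The `Layer` enum, the `is_indexed` helper, and the table names are hypothetical and serve only to make the rule concrete.

```python
from enum import Enum


class Layer(Enum):
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"


# Per the note above, only these layers are indexed in the Knowledge Catalog.
INDEXED_LAYERS = {Layer.GOLD, Layer.OPTIMIZED}


def is_indexed(layer: Layer) -> bool:
    """True if tables in this layer are available for reasoning and LLM exploration."""
    return layer in INDEXED_LAYERS


# Hypothetical tables tagged with their data layer.
tables = {
    "events_raw": Layer.RAW,
    "events_clean": Layer.CLEAN,
    "sales_gold": Layer.GOLD,
    "sales_search": Layer.OPTIMIZED,
}

indexed = sorted(name for name, layer in tables.items() if is_indexed(layer))
print(indexed)
```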
Example Workflows
Repository Tables serve as the starting point for various workflow patterns:
| Workflow Type | Object Chain |
|---|---|
| ML Training | Repository Table → Feature Builder → Dataset → Train Model |
| AutoML | Repository Table → Transform → Dataset → AutoML Engine |
| Generative AI | Repository Table → Semantic Query → Generative AI Workflow |
Best Practices
- Use Repository Tables as the starting point for any data-driven flow rather than embedding raw queries in downstream objects.
- Apply data layer tags to every Repository Table (RAW, CLEAN, GOLD, or OPTIMIZED) according to the ARPIA Data Layer Framework.
- Keep metadata and descriptions updated; this improves discoverability in Semantic Search and interpretability for Generative AI tools.
- Prefer GOLD or OPTIMIZED tables for production workflows to ensure data quality and governance compliance.
Related Documentation
- Extract & Load: Understand how data is ingested into ARPIA's repository for further processing.
- ARPIA Data Layer Framework: Learn the standard definitions and governance rules behind RAW, CLEAN, GOLD, and OPTIMIZED layers.
- Transform & Prepare: Continue to the next stage to refine, standardize, and enrich Repository Tables for analysis and modeling.
- Reasoning Atlas Overview: Explore how Repository Tables integrate with ARPIA's semantic reasoning and LLM frameworks for generative AI.
