Repository Table
Repository Table Overview
The foundational data object in Arpia’s Reasoning Flows — connecting structured data storage, semantic search, and generative AI capabilities.
🧭 Purpose
The Repository Table object in Reasoning Flows allows you to load structured data directly from your organization’s repository into a workflow.
It acts as the primary data entry point within Arpia, making validated, reusable data instantly available across all components of a flow — from transformation and enrichment to training models and powering AI-driven queries.
Once registered, a Repository Table becomes a referenceable, version-aware data source that ensures every downstream object (datasets, models, analytics) uses a consistent and authoritative view of the data.
⚙️ Key Capabilities
- Direct Data Access: Load an existing table from your repository without writing SQL queries or scripts. Arpia handles schema registration and version mapping automatically.
- Reusable Reference: Once added, the table is available to all other objects that accept tabular input, such as Dataset, Feature Builder, Train Model, and Transform.
- Centralized Updates: Any structural or data changes made to the source table are automatically reflected across all dependent objects. This ensures reproducibility and version control within projects.
- Workflow Consistency: Repository Tables guarantee that every process, from ETL to ML, operates on the same “single source of truth.”
🧩 Typical Usage Flow
- Add a Repository Table: Select a table from your registered data repository and register it using the Repository Table object.
- Connect to Downstream Objects: Use the registered table as input for other workflow components:
  - Dataset object → prepare training and test splits
  - Train Model object → feed model training processes
  - Feature Builder object → apply feature transformations or enrichments
- Leverage Across the Flow: Any object that consumes tabular data can reference a Repository Table, removing the need to redefine logic or queries repeatedly.
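The register-once, reference-everywhere pattern described above can be pictured with a small sketch. This is not Arpia's API: the `RepositoryTable`, `FlowObject`, and `Flow` classes and their methods are hypothetical names used only to illustrate how several downstream objects can point at a single registered table instead of each redefining its own query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepositoryTable:
    """Hypothetical stand-in for a registered Repository Table."""
    name: str
    layer: str  # data layer tag, e.g. "GOLD"


@dataclass
class FlowObject:
    """Hypothetical downstream object (Dataset, Train Model, Feature Builder)."""
    kind: str
    source: RepositoryTable  # a reference to the registered table, not a copy


@dataclass
class Flow:
    """Hypothetical container that keeps one registry of tables per flow."""
    tables: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)

    def register(self, name: str, layer: str) -> RepositoryTable:
        table = RepositoryTable(name, layer)
        self.tables[name] = table
        return table

    def connect(self, kind: str, table_name: str) -> FlowObject:
        # Look the table up by name so every object shares the same instance.
        obj = FlowObject(kind, self.tables[table_name])
        self.objects.append(obj)
        return obj


flow = Flow()
flow.register("sales_gold", layer="GOLD")
for kind in ("Feature Builder", "Dataset", "Train Model"):
    flow.connect(kind, "sales_gold")

# Every downstream object resolves to the one registered table.
for obj in flow.objects:
    print(obj.kind, "->", obj.source.name)
```

Because each object holds a reference rather than its own query, a change to the registered table is visible to all three consumers at once, which is the behavior the steps above describe.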
🧠 AI Integration Capabilities
Arpia’s Repository Tables are deeply integrated with the Generative AI and Semantic Search layers of Reasoning Flows.
🔹 Generative LLM Integration
Repository Tables can serve as knowledge sources for Generative LLMs within Reasoning Flows.
When paired with Arpia’s Knowledge Atlas, these tables provide contextual grounding for:
- Text-to-query generation
- Conversational data retrieval
- Natural language explanations of structured data
Example Use Case:
An LLM can reference a sales_gold Repository Table to answer questions like:
“What were the top-performing categories by region in Q3?”
The model generates an SQL query automatically, executes it against the Repository Table, and returns summarized insights.
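To make the text-to-query step concrete, here is a minimal sketch of the kind of SQL a model might produce for that question. It runs against an in-memory SQLite table rather than Arpia itself. The column names (`region`, `category`, `sale_date`, `revenue`), the sample rows, and the year 2024 are all assumptions made for illustration; the actual schema of a `sales_gold` table and the query Arpia generates may differ.

```python
import sqlite3

# In-memory stand-in for a sales_gold Repository Table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_gold "
    "(region TEXT, category TEXT, sale_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales_gold VALUES (?, ?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("North", "Shoes", "2024-07-15", 1200.0),
        ("North", "Hats", "2024-08-02", 300.0),
        ("North", "Shoes", "2024-05-20", 9000.0),  # Q2 row, filtered out below
        ("South", "Hats", "2024-09-10", 800.0),
        ("South", "Shoes", "2024-09-12", 500.0),
    ],
)

# A query an LLM could plausibly generate for
# "What were the top-performing categories by region in Q3?"
generated_sql = """
SELECT region, category, SUM(revenue) AS total_revenue
FROM sales_gold
WHERE sale_date >= '2024-07-01' AND sale_date < '2024-10-01'
GROUP BY region, category
ORDER BY region, total_revenue DESC
"""

rows = conn.execute(generated_sql).fetchall()
for region, category, total in rows:
    print(f"{region}: {category} = {total}")
```

The result set is what the model would then summarize in natural language; the first row for each region is that region's top category for the quarter.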
🔹 Semantic Search of Repository Tables
Each Repository Table is indexed using vector embeddings, enabling semantic search and contextual discovery of both table metadata and content.
This allows users to:
- Search for tables semantically (e.g., “monthly performance data”)
- Query fields or metrics by meaning, not just by name
- Integrate natural-language reasoning into ETL and ML workflows
Example:
A search for “profitability by category” would retrieve Repository Tables containing relevant business metrics even if the columns are labeled differently (margin_rate, net_profit).
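The idea of matching by meaning rather than by column name can be sketched with cosine similarity over embedding vectors. The vectors below are hand-written, three-dimensional toy values, not output from a real embedding model, and the table names are hypothetical; the sketch only shows the ranking step, not how Arpia computes or stores its embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings"; the axes loosely mean (profit, category, HR).
# A real system would obtain these from an embedding model.
table_vectors = {
    "category_margins (margin_rate, net_profit)": [0.9, 0.8, 0.0],
    "employee_roster (hire_date, department)": [0.0, 0.1, 0.9],
    "product_catalog (category, sku)": [0.1, 0.9, 0.0],
}

# Stands in for the embedding of the query "profitability by category".
query_vector = [0.8, 0.6, 0.0]

ranked = sorted(
    table_vectors,
    key=lambda name: cosine(query_vector, table_vectors[name]),
    reverse=True,
)
for name in ranked:
    print(f"{cosine(query_vector, table_vectors[name]):.3f}  {name}")
```

In this toy setup the table holding `margin_rate` and `net_profit` ranks first even though neither column contains the word "profitability", which mirrors the behavior described in the example.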
🧭 Best Practices
- Use Repository Tables as the starting point for any data-driven flow.
- Always prefer referencing registered tables over executing raw queries within downstream objects.
- Tag Repository Tables with their data layer classification (RAW, CLEAN, GOLD, or OPTIMIZED) according to the Arpia Data Layer Framework.
- Keep metadata and descriptions updated — this improves discoverability in Semantic Search and interpretability for Generative AI tools.
🔗 Example Workflow
- Repository Table → Feature Builder → Dataset → Train Model
- Repository Table → Transform → Dataset → AutoML Engine
- Repository Table → Semantic Query → Generative AI Workflow
🔄 Integration with Data Layers
Repository Tables can be registered from any Arpia data layer:
- RAW: For exploration and validation.
- CLEAN: For standardized, ready-to-transform data.
- GOLD: For analytics-ready datasets, serving as single sources of truth.
- OPTIMIZED: For AI and application-level consumption (indexed for search and semantic reasoning).
Tip: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Atlas for reasoning and LLM-powered exploration.
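The layer rule in the tip can be expressed as a simple check. The sketch below is illustrative only: the `DataLayer` enum, the `is_atlas_indexed` helper, and the table names are hypothetical and do not correspond to a documented Arpia API. It just encodes the four layers and the statement that only GOLD and OPTIMIZED tables are indexed in the Knowledge Atlas.

```python
from enum import Enum


class DataLayer(Enum):
    """The four layers named in the Arpia Data Layer Framework."""
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"


# Per the tip above, only these layers are indexed in the Knowledge Atlas.
ATLAS_INDEXED_LAYERS = {DataLayer.GOLD, DataLayer.OPTIMIZED}


def is_atlas_indexed(layer: DataLayer) -> bool:
    """Return True if tables in this layer are available for LLM-powered exploration."""
    return layer in ATLAS_INDEXED_LAYERS


# Hypothetical tables tagged with their layer classification.
tables = {
    "sales_raw": DataLayer.RAW,
    "sales_clean": DataLayer.CLEAN,
    "sales_gold": DataLayer.GOLD,
    "sales_optimized": DataLayer.OPTIMIZED,
}

indexed = sorted(name for name, layer in tables.items() if is_atlas_indexed(layer))
print(indexed)  # ['sales_gold', 'sales_optimized']
```

A check like this is a convenient place to catch a table that is expected to appear in semantic search or generative queries but is still tagged RAW or CLEAN.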
🧩 How Repository Tables Fit in Reasoning Flows
In the Reasoning Flows architecture:
- Extract & Load populates the data repository.
- Repository Tables register these datasets for structured access.
- Transform & Prepare refines them for analytics and AI.
- AI & ML Objects (including Generative and Semantic tools) consume them for intelligent insights.
🧭 Related Documentation
- Extract & Load in Arpia — Knowledge Base Overview: Understand how data is ingested into Arpia’s repository for further processing.
- Arpia Data Layer Framework: Learn the standard definitions and governance rules behind RAW, CLEAN, GOLD, and OPTIMIZED layers.
- Transform & Prepare Overview: Continue to the next stage to refine, standardize, and enrich Repository Tables for analysis and modeling.
- Knowledge Atlas Overview: Explore how Repository Tables integrate with Arpia’s semantic reasoning and LLM frameworks for generative AI.
