Repository Table

Repository Table Overview

The foundational data object in Arpia’s Reasoning Flows — connecting structured data storage, semantic search, and generative AI capabilities.


🧭 Purpose

The Repository Table object in Reasoning Flows allows you to load structured data directly from your organization’s repository into a workflow.
It acts as the primary data entry point within Arpia, making validated, reusable data instantly available across all components of a flow — from transformation and enrichment to training models and powering AI-driven queries.

Once registered, a Repository Table becomes a referenceable, version-aware data source that ensures every downstream object (datasets, models, analytics) uses a consistent and authoritative view of the data.


⚙️ Key Capabilities

  • Direct Data Access
    Load an existing table from your repository without writing SQL queries or scripts. Arpia handles schema registration and version mapping automatically.

  • Reusable Reference
    Once added, the table is available to all other objects that accept tabular input — such as Dataset, Feature Builder, Train Model, and Transform.

  • Centralized Updates
    Any structural or data changes made to the source table are automatically reflected across all dependent objects. This ensures reproducibility and version control within projects.

  • Workflow Consistency
    Repository Tables guarantee that every process — from ETL to ML — operates on the same “single source of truth.”


🧩 Typical Usage Flow

  1. Add a Repository Table
    Select a table from your registered data repository and register it using the Repository Table object.

  2. Connect to Downstream Objects
    Use the registered table as input for other workflow components:

    • Dataset object → prepare training and test splits
    • Train Model object → feed model training processes
    • Feature Builder object → apply feature transformations or enrichments

  3. Leverage Across the Flow
    Any object that consumes tabular data can reference a Repository Table, removing the need to redefine logic or queries repeatedly.
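
The three steps above can be pictured as a small registry that downstream objects point into. The sketch below is only an illustration in plain Python: the class names `RepositoryTable`, `FlowObject`, and `Flow`, along with the `sales_gold` columns, are hypothetical and are not part of any Arpia API. It shows the idea of registering a table once and letting several objects share that single definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepositoryTable:
    """Hypothetical stand-in for a registered table (not an Arpia class)."""
    name: str
    columns: tuple[str, ...]


@dataclass
class FlowObject:
    """Hypothetical downstream object that accepts tabular input."""
    kind: str                # e.g. "Dataset", "Feature Builder", "Train Model"
    source: RepositoryTable  # a reference to the registered table, not a copy


@dataclass
class Flow:
    registry: dict[str, RepositoryTable] = field(default_factory=dict)
    objects: list[FlowObject] = field(default_factory=list)

    def register(self, table: RepositoryTable) -> RepositoryTable:
        # Step 1: register the table once under its name.
        self.registry[table.name] = table
        return table

    def connect(self, kind: str, table_name: str) -> FlowObject:
        # Step 2: a downstream object looks the table up by name.
        obj = FlowObject(kind=kind, source=self.registry[table_name])
        self.objects.append(obj)
        return obj


flow = Flow()
flow.register(RepositoryTable("sales_gold", ("region", "category", "revenue")))
for kind in ("Feature Builder", "Dataset", "Train Model"):
    flow.connect(kind, "sales_gold")

# Step 3: every object shares the same registered definition.
shared = {id(obj.source) for obj in flow.objects}
print(len(shared))  # 1
```

Because each object holds a reference rather than its own query or copy, there is only one place where the table definition lives in this model.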


🧠 AI Integration Capabilities

Arpia’s Repository Tables are deeply integrated with the Generative AI and Semantic Search layers of Reasoning Flows.

🔹 Generative LLM Integration

Repository Tables can serve as knowledge sources for Generative LLMs within Reasoning Flows.
When paired with Arpia’s Knowledge Atlas, these tables provide contextual grounding for:

  • Text-to-query generation
  • Conversational data retrieval
  • Natural language explanations of structured data

Example Use Case:
An LLM can reference a sales_gold Repository Table to answer questions like:

“What were the top-performing categories by region in Q3?”

The model generates an SQL query automatically, executes it against the Repository Table, and returns summarized insights.
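
To make the text-to-query step concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the repository. Everything specific in it is an assumption made for illustration: the `sales_gold` columns (`region`, `category`, `sale_date`, `revenue`), the sample rows, the choice of 2024 for Q3, and the reading of "top-performing" as highest total revenue. The SQL string is written by hand here to show the kind of query a model might produce; it is not actual output from Arpia.

```python
import sqlite3

# In-memory database standing in for the repository (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_gold ("
    "region TEXT, category TEXT, sale_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales_gold VALUES (?, ?, ?, ?)",
    [
        ("North", "Bikes", "2024-07-15", 1200.0),
        ("North", "Helmets", "2024-08-02", 300.0),
        ("North", "Bikes", "2024-09-20", 800.0),
        ("South", "Helmets", "2024-08-11", 950.0),
        ("South", "Bikes", "2024-09-05", 400.0),
        ("South", "Bikes", "2024-10-01", 5000.0),  # Q4 row, filtered out below
    ],
)

# The kind of query a model might generate for the question above.
GENERATED_SQL = """
SELECT region, category, SUM(revenue) AS total_revenue
FROM sales_gold
WHERE sale_date >= '2024-07-01' AND sale_date < '2024-10-01'
GROUP BY region, category
ORDER BY region, total_revenue DESC
"""

rows = conn.execute(GENERATED_SQL).fetchall()

# Rows are sorted by revenue within each region, so the first row
# seen for a region is that region's top category.
top_by_region = {}
for region, category, total in rows:
    top_by_region.setdefault(region, (category, total))

print(top_by_region)
# {'North': ('Bikes', 2000.0), 'South': ('Helmets', 950.0)}
```

A summarization step would then turn a result like this into a natural-language answer.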


🔹 Semantic Search of Repository Tables

Each Repository Table is indexed using vector embeddings, enabling semantic search and contextual discovery of both table metadata and content.
This allows users to:

  • Search for tables semantically (e.g., “monthly performance data”)
  • Query fields or metrics by meaning, not just by name
  • Integrate natural-language reasoning into ETL and ML workflows

Example:
A search for “profitability by category” would retrieve Repository Tables containing relevant business metrics even if the columns are labeled differently (margin_rate, net_profit).
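
The matching-by-meaning behavior can be sketched with cosine similarity over embedding vectors. The example below is a toy under stated assumptions: the table names, the three-dimensional vectors, and the `search` helper are all hypothetical, and the vectors are hand-written rather than produced by an embedding model. It only illustrates how a query vector is ranked against table vectors, not how Arpia builds or stores its index.

```python
import math

# Hypothetical 3-dimensional embeddings. The axes loosely stand for
# (profitability, time period, customer); real embeddings come from a model.
TABLE_EMBEDDINGS = {
    "sales_gold (margin_rate, net_profit, category)": [0.9, 0.2, 0.1],
    "monthly_kpis (month, sessions, signups)": [0.2, 0.9, 0.3],
    "customer_profiles (customer_id, segment)": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, top_k=2):
    """Return the names of the top_k tables most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, vec), name)
        for name, vec in TABLE_EMBEDDINGS.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Pretend this vector is the embedding of "profitability by category".
query = [0.8, 0.1, 0.2]
print(search(query, top_k=1))
# ['sales_gold (margin_rate, net_profit, category)']
```

Note that the query text never mentions `margin_rate` or `net_profit`; the match comes from the vectors being close, which is the point of searching by meaning rather than by column name.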


🧭 Best Practices

  • Use Repository Tables as the starting point for any data-driven flow.
  • Always prefer referencing registered tables over executing raw queries within downstream objects.
  • Tag Repository Tables with their data layer classification (RAW, CLEAN, GOLD, or OPTIMIZED) according to the Arpia Data Layer Framework.
  • Keep metadata and descriptions updated — this improves discoverability in Semantic Search and interpretability for Generative AI tools.

🔗 Example Workflow

  • Repository Table → Feature Builder → Dataset → Train Model
  • Repository Table → Transform → Dataset → AutoML Engine
  • Repository Table → Semantic Query → Generative AI Workflow

🔄 Integration with Data Layers

Repository Tables can be registered from any Arpia data layer:

  • RAW: For exploration and validation.
  • CLEAN: For standardized, ready-to-transform data.
  • GOLD: For analytics-ready datasets, serving as single sources of truth.
  • OPTIMIZED: For AI and application-level consumption (indexed for search and semantic reasoning).

Tip: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Atlas for reasoning and LLM-powered exploration.
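
The layer rule in the tip can be written as a simple membership check. This is a minimal sketch, assuming a hypothetical `DataLayer` enum and made-up table names; it models only the rule stated above (GOLD and OPTIMIZED are indexed, RAW and CLEAN are not) and says nothing about how Arpia stores layer tags internally.

```python
from enum import Enum


class DataLayer(Enum):
    """The four layers named in the Arpia Data Layer Framework."""
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"


# Per the tip above, only these layers are indexed in the Knowledge Atlas.
INDEXED_LAYERS = {DataLayer.GOLD, DataLayer.OPTIMIZED}


def is_indexed(layer: DataLayer) -> bool:
    return layer in INDEXED_LAYERS


# Hypothetical table names tagged with their layer.
tables = {
    "events_raw": DataLayer.RAW,
    "events_clean": DataLayer.CLEAN,
    "sales_gold": DataLayer.GOLD,
    "sales_search": DataLayer.OPTIMIZED,
}

indexed = sorted(name for name, layer in tables.items() if is_indexed(layer))
print(indexed)  # ['sales_gold', 'sales_search']
```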


🧩 How Repository Tables Fit in Reasoning Flows

In the Reasoning Flows architecture:

  1. Extract & Load populates the data repository.
  2. Repository Tables register these datasets for structured access.
  3. Transform & Prepare refines them for analytics and AI.
  4. AI & ML Objects (including Generative and Semantic tools) consume them for intelligent insights.
