ARPIA Data Layer Framework
Standard definitions, color coding, and governance rules for Raw, Clean, Gold, and Optimized data layers in Arpia.
🎯 Purpose
To establish a common language and a clear structure for classifying, governing, and consuming data in Arpia.
This terminology will be used across the organization, both for technical documentation and business communication, ensuring consistency and shared understanding among teams.
🔹 Data Layers in Arpia
1. Raw (Crude)
- Definition: Data as it arrives from sources (ERP, POS, IoT, external integrations, files, APIs).
- Characteristics:
  - No transformation or cleaning applied.
  - May contain duplicates, nulls, or format errors.
  - Represents the exact “snapshot” of the source.
- Examples:
  - ventas_raw, with dates as strings and inconsistent fields.
  - maestro_sociedades_raw, exported directly from SAP.
- Color/Tag: Grey ⚪ with tag RAW.
2. Clean
- Definition: Data processed by workshops or cleaning/normalization objects, validated and standardized.
- Characteristics:
  - Correct type casting.
  - Null values handled, duplicates removed.
  - Includes clean master data and normalized tables.
  - May or may not be included in the Knowledge Grid, but must always be tagged and color-coded.
- Examples:
  - ventas_silver, with typed dates and normalized costs.
  - dim_articulo_clean, with unified hierarchies.
- Color/Tag: Blue 🔵 with tag CLEAN.
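The Raw → Clean step described above (type casting, null handling, duplicate removal) can be sketched in a few lines of Python. This is a minimal illustration, not Arpia's actual cleaning object: the column names (tienda, fecha, costo), the date format, and the policy of dropping rows with missing values are all assumptions made for the example.

```python
from datetime import date, datetime


def clean_sales(raw_rows):
    """Cast types, skip rows with missing values, and remove duplicates."""
    seen = set()
    clean = []
    for row in raw_rows:
        # Null handling (assumed policy): skip rows missing a date or a cost.
        if not row.get("fecha") or row.get("costo") in (None, ""):
            continue
        # Type casting: string date -> date, string cost -> float.
        fecha = datetime.strptime(row["fecha"].strip(), "%Y-%m-%d").date()
        costo = round(float(str(row["costo"]).replace(",", ".")), 2)
        # Duplicate removal on the normalized values.
        key = (row["tienda"], fecha, costo)
        if key in seen:
            continue
        seen.add(key)
        clean.append({"tienda": row["tienda"], "fecha": fecha, "costo": costo})
    return clean


# Hypothetical sample standing in for ventas_raw.
ventas_raw = [
    {"tienda": "T01", "fecha": "2024-01-05", "costo": "10,50"},
    {"tienda": "T01", "fecha": "2024-01-05", "costo": "10.50"},  # duplicate once normalized
    {"tienda": "T02", "fecha": "", "costo": "7.00"},             # missing date
]
ventas_clean = clean_sales(ventas_raw)
```

Only one row survives in this sample: the second row is a duplicate of the first after normalization, and the third is dropped for its missing date. A real pipeline might impute or quarantine such rows instead of dropping them.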
3. Gold (Refined)
- Definition: Data ready for business consumption and for inclusion in the Knowledge Grid.
- Characteristics:
  - Aggregated or transformed into KPIs and business metrics.
  - Ambiguity-free: defines the “single source of truth” for sales, budgets, utilities, etc.
  - Serves as the foundation for both traditional dashboards and AI assistants.
- Examples:
  - agg_mes_tienda_categoria, with Net Sales, Budget, Compliance, and Profitability.
  - agg_dia_regional_categoria, for operational analysis.
- Color/Tag: Gold 🟡 with tag GOLD.
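As an illustration of how a Gold table such as agg_mes_tienda_categoria might be derived from Clean data, the sketch below aggregates rows by month, store, and category. The field names and the definition of Compliance as net sales divided by budget are assumptions for the example. The source does not define the KPI formulas.

```python
from collections import defaultdict
from datetime import date


def aggregate_month_store_category(clean_rows):
    """Aggregate Clean rows into one Gold row per (month, store, category)."""
    totals = defaultdict(lambda: {"net_sales": 0.0, "budget": 0.0})
    for r in clean_rows:
        key = (r["fecha"].strftime("%Y-%m"), r["tienda"], r["categoria"])
        totals[key]["net_sales"] += r["net_sales"]
        totals[key]["budget"] += r["budget"]

    gold = []
    for (mes, tienda, categoria), t in sorted(totals.items()):
        # Assumed KPI definition: compliance = net sales / budget.
        compliance = t["net_sales"] / t["budget"] if t["budget"] else None
        gold.append({
            "mes": mes,
            "tienda": tienda,
            "categoria": categoria,
            "net_sales": t["net_sales"],
            "budget": t["budget"],
            "compliance": compliance,
        })
    return gold


# Hypothetical Clean rows.
clean_rows = [
    {"fecha": date(2024, 1, 5), "tienda": "T01", "categoria": "A", "net_sales": 100.0, "budget": 120.0},
    {"fecha": date(2024, 1, 20), "tienda": "T01", "categoria": "A", "net_sales": 50.0, "budget": 80.0},
    {"fecha": date(2024, 2, 3), "tienda": "T01", "categoria": "A", "net_sales": 30.0, "budget": 0.0},
]
gold_rows = aggregate_month_store_category(clean_rows)
```

Fixing each KPI formula in one place like this is what makes the Gold table a single source of truth: every dashboard or assistant reads the same computed value instead of recomputing it.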
4. Optimized
- Definition: A subset of Clean or Gold data, tailored and optimized for a specific process, application, or consumption pattern.
- Characteristics:
  - Always documented with its associated Knowledge Node.
  - May include indexes, special partitions, or wide structures to accelerate a use case.
  - Maintains traceability back to its Clean/Gold origin.
- Examples:
  - agg_mes_forecast_ventas, optimized for AI predictive models.
  - agg_tienda_dia_dashboard, optimized for fast executive dashboards.
- Color/Tag: Emerald Green 🟢 with tag OPTIMIZED.
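The two requirements on Optimized datasets (an associated Knowledge Node and traceability back to a Clean or Gold origin) could be enforced with a small metadata record. The sketch below is hypothetical: the OptimizedDataset class, its field names, and the sample origin and node names are invented for illustration and are not part of Arpia's API.

```python
from dataclasses import dataclass

# Per the definition above, an Optimized dataset derives from Clean or Gold.
ALLOWED_ORIGIN_LAYERS = {"CLEAN", "GOLD"}


@dataclass(frozen=True)
class OptimizedDataset:
    """Hypothetical metadata record for an Optimized dataset."""
    name: str
    origin_dataset: str
    origin_layer: str
    knowledge_node: str

    def __post_init__(self):
        if self.origin_layer not in ALLOWED_ORIGIN_LAYERS:
            raise ValueError(
                f"{self.name}: origin layer must be CLEAN or GOLD, got {self.origin_layer}"
            )
        if not self.knowledge_node:
            raise ValueError(f"{self.name}: an associated Knowledge Node is required")


# Assumed origin and node names, for illustration only.
dashboard_ds = OptimizedDataset(
    name="agg_tienda_dia_dashboard",
    origin_dataset="agg_dia_regional_categoria",
    origin_layer="GOLD",
    knowledge_node="kn_executive_dashboard",
)
```

Rejecting a record at construction time means an Optimized dataset without a documented origin or node never reaches the catalog in the first place.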
🔹 General Rules
- All datasets must be tagged with their layer (RAW, CLEAN, GOLD, OPTIMIZED).
- Official colors must be used in documentation, diagrams, and dashboards to visually reinforce the layer.
- Responsibilities:
  - Raw → Clean: Data Engineering.
  - Clean → Gold: Shared between Data Engineering and Data Analysts (business rules).
  - Optimized: Process/application owner, documented in the Knowledge Grid.
- Knowledge Grid:
  - Only Gold and Optimized datasets enter the Knowledge Grid.
  - Each node must define metrics, hierarchies, joins, and usage policies.
- Governance:
  - No dashboard or AI assistant may consume Raw data directly.
  - Clean is for exploratory analysis and validation.
  - Gold and Optimized are the only reference datasets in production.
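The tagging and governance rules above can be expressed as a small lookup plus two checks. This is a sketch under stated assumptions: the Layer enum and the helper functions may_enter_knowledge_grid and may_consume are hypothetical names, not Arpia features. Only the tags, colors, and rules themselves come from this page.

```python
from enum import Enum


class Layer(Enum):
    """Data layers with their official tag and color."""
    RAW = ("RAW", "Grey")
    CLEAN = ("CLEAN", "Blue")
    GOLD = ("GOLD", "Gold")
    OPTIMIZED = ("OPTIMIZED", "Emerald Green")

    def __init__(self, tag, color):
        self.tag = tag
        self.color = color


REFERENCE_LAYERS = {Layer.GOLD, Layer.OPTIMIZED}


def may_enter_knowledge_grid(layer):
    """Only Gold and Optimized enter the Knowledge Grid."""
    return layer in REFERENCE_LAYERS


def may_consume(layer, production):
    """Check whether a dashboard or AI consumer may read from a layer.

    Production consumers may read only Gold/Optimized. Exploratory analysis
    and validation may also read Clean. Raw is never consumed directly.
    """
    if production:
        return layer in REFERENCE_LAYERS
    return layer in REFERENCE_LAYERS | {Layer.CLEAN}
```

For example, may_consume(Layer.CLEAN, production=False) is allowed, while may_consume(Layer.RAW, production=False) and may_consume(Layer.CLEAN, production=True) are both rejected.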
🔹 Visual Flow Example
🔗 Related Documentation
- Extract & Load in Arpia — Knowledge Base Overview
  Learn how data is ingested from sources into the Arpia platform.
- Repository Tables
  Explore how loaded data is organized and managed within internal repositories.
- Transform & Prepare Overview
  Continue to the next stage of the ETL lifecycle: cleaning, enriching, and structuring data for business use.
