ARPIA Data Layer Framework

Standard definitions, color coding, and governance rules for Raw, Clean, Gold, and Optimized data layers in ARPIA.

Purpose

This framework establishes a common language and clear structure for classifying, governing, and consuming data in ARPIA. This terminology is used across the organization for both technical documentation and business communication, ensuring consistency and shared understanding among teams.


Data Layers Overview

ARPIA uses a four-layer data architecture to organize data by quality, transformation state, and consumption readiness. Each layer has a defined purpose, color code, and governance rules.

Layer | Tag | Color | Purpose
Raw | RAW | Grey ⚪ | Source data as-is, no transformation
Clean | CLEAN | Blue 🔵 | Validated and standardized data
Gold | GOLD | Gold 🟡 | Business-ready, single source of truth
Optimized | OPTIMIZED | Green 🟢 | Tailored for specific use cases

Layer Definitions

1. Raw

Definition: Data as it arrives from sources (ERP, POS, IoT, external integrations, files, APIs).

Characteristics:

  • No transformation or cleaning applied.
  • May contain duplicates, nulls, or format errors.
  • Represents the exact "snapshot" of the source.

Examples:

  • ventas_raw — sales data with dates as strings and inconsistent fields.
  • maestro_sociedades_raw — master data exported directly from SAP.

Color/Tag: Grey ⚪ with tag RAW


2. Clean

Definition: Data processed by transformation objects (such as AP Prepared Table or AP SQL Code Execution), validated and standardized.

Note: This layer is equivalent to "Silver" in medallion architecture terminology. You may encounter _silver suffixes in legacy tables — these follow the same standards as _clean tables.

Characteristics:

  • Correct data type casting applied.
  • Null values handled, duplicates removed.
  • Includes clean master data and normalized tables.
  • May be included in the Knowledge Grid on a limited basis (exploratory analysis and validation only), but must always be tagged and color-coded.

Examples:

  • ventas_clean — sales data with typed dates and normalized costs.
  • dim_articulo_clean — article dimension with unified hierarchies.

Color/Tag: Blue 🔵 with tag CLEAN
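
The cleaning steps listed above can be sketched in plain Python. This is only an illustration of the idea, not an ARPIA transformation object: the column names (fecha, tienda, costo), the date format, and the policy of skipping rows with missing values are all assumptions made for the example.

```python
from datetime import datetime


def clean_sales_rows(raw_rows):
    """Cast types, skip rows with missing values, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        # Null handling (assumed policy): skip rows missing a required field.
        if not row.get("fecha") or not row.get("tienda") or row.get("costo") in (None, ""):
            continue
        record = (
            datetime.strptime(row["fecha"].strip(), "%Y-%m-%d").date(),  # string -> date
            row["tienda"].strip().upper(),                               # normalized key
            round(float(row["costo"]), 2),                               # string -> float
        )
        if record in seen:  # duplicate once values are normalized
            continue
        seen.add(record)
        cleaned.append({"fecha": record[0], "tienda": record[1], "costo": record[2]})
    return cleaned


# Hypothetical raw rows, shaped like a ventas_raw extract.
ventas_raw = [
    {"fecha": "2024-01-05", "tienda": "t01 ", "costo": "10.50"},
    {"fecha": "2024-01-05", "tienda": "T01", "costo": "10.5"},  # same row after cleaning
    {"fecha": None, "tienda": "T02", "costo": "3.00"},          # null date
]
ventas_clean = clean_sales_rows(ventas_raw)
```

In practice the same logic would live in an AP Prepared Table or AP SQL Code Execution object; the sketch only shows what "typed, deduplicated, null-handled" means for a single table.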


3. Gold

Definition: Data ready for business consumption and inclusion in the Knowledge Grid.

Characteristics:

  • Aggregated or transformed into KPIs and business metrics.
  • Ambiguity-free: defines the "single source of truth" for sales, budgets, margins, etc.
  • Serves as the foundation for both traditional dashboards and AI assistants.

Examples:

  • agg_ventas_mes_tienda_gold — monthly sales by store with Net Sales, Budget, Compliance, and Profitability.
  • agg_ventas_dia_regional_gold — daily regional sales for operational analysis.

Color/Tag: Gold 🟡 with tag GOLD
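
As a rough illustration of how a Gold aggregate such as agg_ventas_mes_tienda_gold could be derived from Clean rows, the sketch below sums net sales per month and store and compares them with a budget. The field names (venta_neta, presupuesto, cumplimiento) and the definition of compliance as net sales divided by budget are assumptions, not ARPIA definitions.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly_sales(clean_rows, budgets):
    """Sum net sales per (month, store) and compare against a budget."""
    totals = defaultdict(float)
    for row in clean_rows:
        month = row["fecha"].strftime("%Y-%m")
        totals[(month, row["tienda"])] += row["venta_neta"]

    gold = []
    for (month, store), net_sales in sorted(totals.items()):
        budget = budgets.get((month, store))
        gold.append({
            "mes": month,
            "tienda": store,
            "venta_neta": net_sales,
            "presupuesto": budget,
            # Assumed definition: compliance = net sales / budget.
            "cumplimiento": net_sales / budget if budget else None,
        })
    return gold


# Hypothetical Clean rows and budgets.
ventas_clean = [
    {"fecha": date(2024, 1, 5), "tienda": "T01", "venta_neta": 100.0},
    {"fecha": date(2024, 1, 20), "tienda": "T01", "venta_neta": 50.0},
    {"fecha": date(2024, 2, 1), "tienda": "T01", "venta_neta": 80.0},
]
budgets = {("2024-01", "T01"): 200.0}
agg_ventas_mes_tienda_gold = aggregate_monthly_sales(ventas_clean, budgets)
```

Fixing the metric definitions in one place like this is what lets the Gold table act as a single source of truth for downstream dashboards.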


4. Optimized

Definition: Subset of Clean or Gold data, tailored and optimized for a specific process, application, or consumption pattern.

Characteristics:

  • Always documented with its associated Knowledge Node.
  • May include indexes, special partitions, or wide structures to accelerate a use case.
  • Maintains traceability back to its Clean or Gold origin.

Examples:

  • agg_forecast_ventas_optimized — optimized for AI predictive models.
  • agg_tienda_dia_dashboard_optimized — optimized for fast executive dashboards.

Color/Tag: Green 🟢 with tag OPTIMIZED


Table Naming Conventions

To ensure tables are easily identifiable by layer, follow these naming patterns:

Layer | Suffix | Prefix Pattern | Example
Raw | _raw | Source name | ventas_raw, clientes_raw
Clean | _clean | dim_ for dimensions, source name for facts | dim_articulo_clean, ventas_clean
Gold | _gold | agg_ for aggregates, fact_ for facts | agg_ventas_mes_gold, fact_orders_gold
Optimized | _optimized | Based on use case | agg_forecast_ventas_optimized

Prefix Definitions:

  • dim_ — Dimension tables (master data, reference data)
  • fact_ — Fact tables (transactional data)
  • agg_ — Aggregated tables (pre-calculated summaries)
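
The naming patterns above lend themselves to a simple automated check. The following sketch infers a table's layer and kind from its name; the function name and return shape are hypothetical, and the legacy _silver suffix is mapped to CLEAN on the assumption that legacy tables should be classified with the Clean layer.

```python
SUFFIX_TO_LAYER = {
    "_raw": "RAW",
    "_clean": "CLEAN",
    "_silver": "CLEAN",  # legacy suffix, treated as Clean (assumption)
    "_gold": "GOLD",
    "_optimized": "OPTIMIZED",
}
PREFIX_TO_KIND = {"dim_": "dimension", "fact_": "fact", "agg_": "aggregate"}


def classify_table(name):
    """Return (layer, kind); either value is None when no pattern matches."""
    layer = next(
        (lyr for suffix, lyr in SUFFIX_TO_LAYER.items() if name.endswith(suffix)),
        None,
    )
    kind = next(
        (knd for prefix, knd in PREFIX_TO_KIND.items() if name.startswith(prefix)),
        None,
    )
    return layer, kind


print(classify_table("dim_articulo_clean"))
print(classify_table("agg_ventas_mes_gold"))
print(classify_table("ventas"))
```

A table that returns None for the layer has no recognized suffix and would need to be renamed before it can be tagged consistently.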

How Layers Connect

[Figure: ARPIA Data Layer Flow, Raw to Clean to Gold to Optimized]

Transition | Responsible Team | ARPIA Objects Used
Source → Raw | Data Engineering | AP DataPipe Engine, Python 3.12 DataPipe
Raw → Clean | Data Engineering | AP Prepared Table, AP SQL Code Execution, AP Transform objects
Clean → Gold | Data Engineering + Data Analysts | AP SQL Code Execution, AP Prepared Table
Gold → Optimized | Process/Application Owner | Custom objects based on use case

Applying Tags and Colors

Tags

Tags are applied as metadata in the Repository Table settings:

  1. Open the table in Repository Tables.
  2. Navigate to Settings or Properties.
  3. In the Layer Tag field, select the appropriate value: RAW, CLEAN, GOLD, or OPTIMIZED.

Colors

Colors are applied automatically based on the layer tag in:

  • Repository table listings
  • Data lineage diagrams
  • Knowledge Grid visualizations
  • Documentation exports

When creating diagrams or documentation manually, use these color codes:

Layer | Hex Code | RGB
Raw (Grey) | #9E9E9E | 158, 158, 158
Clean (Blue) | #2196F3 | 33, 150, 243
Gold (Gold) | #FFC107 | 255, 193, 7
Optimized (Green) | #4CAF50 | 76, 175, 80
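
For manually built diagrams, the color codes can be kept in one lookup so hex and RGB values never drift apart. The sketch below stores the hex codes from the table and derives RGB from them; the dictionary and function names are illustrative only.

```python
LAYER_COLORS = {
    "RAW": "#9E9E9E",
    "CLEAN": "#2196F3",
    "GOLD": "#FFC107",
    "OPTIMIZED": "#4CAF50",
}


def hex_to_rgb(hex_code):
    """Convert '#RRGGBB' to an (R, G, B) tuple of integers."""
    value = hex_code.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


for layer, hex_code in LAYER_COLORS.items():
    print(layer, hex_code, hex_to_rgb(hex_code))
```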

Knowledge Grid Integration

The Knowledge Grid is ARPIA's semantic layer that defines how data is exposed to dashboards, reports, and AI assistants. It contains metadata about metrics, hierarchies, joins, and usage policies.

Layer Eligibility:

Layer | Knowledge Grid Eligible | Notes
Raw | ❌ No | Never exposed to end users
Clean | ⚠️ Limited | For exploratory analysis and validation only
Gold | ✅ Yes | Primary source for business consumption
Optimized | ✅ Yes | Must have associated Knowledge Node documentation

Knowledge Node: A Knowledge Node is the documentation record for a dataset in the Knowledge Grid. It defines the table's metrics, dimensions, relationships, refresh schedule, and usage policies.
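
The eligibility rules can be expressed as a small lookup. This is a sketch under stated assumptions: the status strings and the has_knowledge_node flag are hypothetical, and treating an Optimized table without a Knowledge Node as blocked is one possible way to enforce the documentation requirement.

```python
def knowledge_grid_status(layer, has_knowledge_node=False):
    """Map a layer tag to its Knowledge Grid eligibility."""
    if layer == "RAW":
        return "not eligible"
    if layer == "CLEAN":
        return "limited"  # exploratory analysis and validation only
    if layer == "GOLD":
        return "eligible"
    if layer == "OPTIMIZED":
        # Optimized tables need an associated Knowledge Node first.
        return "eligible" if has_knowledge_node else "blocked: missing Knowledge Node"
    raise ValueError(f"unknown layer tag: {layer!r}")


print(knowledge_grid_status("GOLD"))
print(knowledge_grid_status("OPTIMIZED", has_knowledge_node=False))
```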


Governance Rules

Tagging Requirements

  1. All datasets must be tagged with their layer (RAW, CLEAN, GOLD, OPTIMIZED).
  2. Official colors must be used in documentation, diagrams, and dashboards to visually reinforce the layer.
  3. Table names must include the appropriate suffix (_raw, _clean, _gold, _optimized).
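
A minimal sketch of how the tagging and suffix requirements might be checked for a single table. The function name and message wording are hypothetical, and the check covers only rules 1 and 3 (tag present and valid, suffix matching the tag).

```python
LAYER_SUFFIX = {
    "RAW": "_raw",
    "CLEAN": "_clean",
    "GOLD": "_gold",
    "OPTIMIZED": "_optimized",
}


def tagging_violations(table_name, layer_tag):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if layer_tag not in LAYER_SUFFIX:
        problems.append(f"unknown or missing layer tag: {layer_tag!r}")
    elif not table_name.endswith(LAYER_SUFFIX[layer_tag]):
        problems.append(
            f"{table_name!r} should end with {LAYER_SUFFIX[layer_tag]!r}"
        )
    return problems


print(tagging_violations("ventas_clean", "CLEAN"))
print(tagging_violations("ventas_raw", "GOLD"))
```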

Consumption Rules

Layer | Allowed Consumers
Raw | Data Engineering pipelines only
Clean | Data teams for exploration and validation
Gold | Dashboards, reports, AI assistants, business users
Optimized | Specific application or process as documented

Prohibited:

  • No dashboard or AI assistant may consume Raw data directly.
  • No production reports may reference Clean data without explicit approval.
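
The consumption rules and prohibitions above could be enforced with a check along these lines. The consumer labels (pipeline, data_team, dashboard, report, ai_assistant, business_user) and the function signature are assumptions made for illustration; ARPIA does not necessarily model consumers this way.

```python
ALLOWED_CONSUMERS = {
    "RAW": {"pipeline"},
    "CLEAN": {"data_team"},
    "GOLD": {"dashboard", "report", "ai_assistant", "business_user"},
}


def can_consume(consumer, layer, approved=False, documented_consumers=()):
    """Return True if the consumer may read from the given layer."""
    if layer == "OPTIMIZED":
        # Only the application or process documented for this table.
        return consumer in documented_consumers
    if layer == "CLEAN" and consumer == "report":
        # Production reports need explicit approval to reference Clean data.
        return approved
    return consumer in ALLOWED_CONSUMERS.get(layer, set())


print(can_consume("dashboard", "RAW"))
print(can_consume("report", "CLEAN", approved=True))
print(can_consume("forecast_model", "OPTIMIZED",
                  documented_consumers={"forecast_model"}))
```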

Responsibilities

Transition | Owner
Raw → Clean | Data Engineering
Clean → Gold | Shared: Data Engineering (technical) + Data Analysts (business rules)
Gold → Optimized | Process or application owner; must document in Knowledge Grid

Visual Flow

[Figure: ARPIA Data Layer Flow, Raw to Clean to Gold to Optimized]



Related Documentation

  • Extract & Load
    Learn how data is ingested from sources into the ARPIA platform (Source → Raw).

  • Repository Tables
    Explore how loaded data is organized and managed within internal repositories.

  • Transform & Prepare
    Continue to the next stage of the ETL lifecycle — cleaning, enriching, and structuring data (Raw → Clean → Gold).

  • Knowledge Grid
    Understand how Gold and Optimized data is exposed to dashboards and AI assistants.