Purpose

This framework establishes a common language and clear structure for classifying, governing, and consuming data in ARPIA. The terminology applies across the organization, in both technical documentation and business communication, so that teams share a consistent understanding.

Data Layers Overview

ARPIA uses a four-layer data architecture to organize data by quality, transformation state, and consumption readiness. Each layer has a defined purpose, color code, and governance rules.

Layer	Tag	Color	Purpose
Raw	`RAW`	Grey ⚪	Source data as-is, no transformation
Clean	`CLEAN`	Blue 🔵	Validated and standardized data
Gold	`GOLD`	Gold 🟡	Business-ready, single source of truth
Optimized	`OPTIMIZED`	Green 🟢	Tailored for specific use cases

Layer Definitions

1. Raw

Definition: Data as it arrives from sources (ERP, POS, IoT, external integrations, files, APIs).

Characteristics:

No transformation or cleaning applied.
May contain duplicates, nulls, or format errors.
Represents the exact "snapshot" of the source.

Examples:

ventas_raw — sales data with dates as strings and inconsistent fields.
maestro_sociedades_raw — master data exported directly from SAP.

Color/Tag: Grey ⚪ with tag RAW

2. Clean

Definition: Data processed by transformation objects (such as AP Prepared Table or AP SQL Code Execution), validated and standardized.

Note: This layer is equivalent to "Silver" in medallion architecture terminology. You may encounter _silver suffixes in legacy tables — these follow the same standards as _clean tables.

Characteristics:

Correct data type casting applied.
Null values handled, duplicates removed.
Includes clean master data and normalized tables.
May be included in the Knowledge Grid for exploratory analysis and validation only, but must always be tagged and color-coded.
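
The characteristics above can be illustrated with a minimal Python sketch. In ARPIA this step would normally run inside a transformation object such as AP Prepared Table or AP SQL Code Execution; the rows, the column names (`id`, `fecha`, `costo`), and the `to_clean` helper below are hypothetical and only show the kind of casting, null handling, and deduplication involved.

```python
from datetime import date, datetime

# Hypothetical rows from ventas_raw: every value arrives as a string,
# with one duplicate row and one missing cost.
raw_rows = [
    {"id": "1", "fecha": "2024-01-15", "costo": "10.50"},
    {"id": "1", "fecha": "2024-01-15", "costo": "10.50"},  # duplicate
    {"id": "2", "fecha": "2024-01-16", "costo": ""},       # missing cost
]


def to_clean(rows):
    """Cast types, handle nulls, and drop duplicate ids (first row wins)."""
    seen = set()
    clean = []
    for row in rows:
        key = int(row["id"])
        if key in seen:
            continue
        seen.add(key)
        clean.append({
            "id": key,
            "fecha": datetime.strptime(row["fecha"], "%Y-%m-%d").date(),
            # Assumption: a missing cost becomes None rather than 0.
            "costo": float(row["costo"]) if row["costo"] else None,
        })
    return clean


ventas_clean = to_clean(raw_rows)
```

How nulls and duplicates should actually be resolved is a business decision per table; the "first row wins" and "missing becomes None" choices here are placeholders.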

Examples:

ventas_clean — sales data with typed dates and normalized costs.
dim_articulo_clean — article dimension with unified hierarchies.

Color/Tag: Blue 🔵 with tag CLEAN

3. Gold

Definition: Data ready for business consumption and inclusion in the Knowledge Grid.

Characteristics:

Aggregated or transformed into KPIs and business metrics.
Ambiguity-free: defines the "single source of truth" for sales, budgets, margins, etc.
Serves as the foundation for both traditional dashboards and AI assistants.

Examples:

agg_ventas_mes_tienda_gold — monthly sales by store with Net Sales, Budget, Compliance, and Profitability.
agg_ventas_dia_regional_gold — daily regional sales for operational analysis.
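
As a rough illustration of what a table like `agg_ventas_mes_tienda_gold` contains, the sketch below aggregates Clean sales rows by store and month. The column names, the sample figures, and the compliance formula (net sales divided by budget) are assumptions for illustration, not ARPIA's actual definitions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ventas_clean rows; column names are assumptions.
ventas_clean = [
    {"tienda": "T01", "fecha": date(2024, 1, 5), "venta_neta": 100.0},
    {"tienda": "T01", "fecha": date(2024, 1, 20), "venta_neta": 150.0},
    {"tienda": "T02", "fecha": date(2024, 1, 7), "venta_neta": 80.0},
]

# Hypothetical monthly budget keyed by (store, year, month).
presupuesto = {("T01", 2024, 1): 200.0, ("T02", 2024, 1): 100.0}


def aggregate_month_store(rows, budget):
    """Sum net sales per (store, year, month) and attach budget and compliance."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["tienda"], row["fecha"].year, row["fecha"].month)
        totals[key] += row["venta_neta"]

    result = []
    for key in sorted(totals):
        tienda, year, month = key
        plan = budget.get(key)
        result.append({
            "tienda": tienda,
            "anio": year,
            "mes": month,
            "venta_neta": totals[key],
            "presupuesto": plan,
            # Assumption: compliance = net sales / budget.
            "cumplimiento": totals[key] / plan if plan else None,
        })
    return result


agg_ventas_mes_tienda_gold = aggregate_month_store(ventas_clean, presupuesto)
```

Fixing each metric's formula in one Gold table is what lets dashboards and AI assistants report the same number for the same question.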

Color/Tag: Gold 🟡 with tag GOLD

4. Optimized

Definition: Subset of Clean or Gold data, tailored and optimized for a specific process, application, or consumption pattern.

Characteristics:

Always documented with its associated Knowledge Node.
May include indexes, special partitions, or wide structures to accelerate a use case.
Maintains traceability back to its Clean or Gold origin.

Examples:

agg_forecast_ventas_optimized — optimized for AI predictive models.
agg_tienda_dia_dashboard_optimized — optimized for fast executive dashboards.

Color/Tag: Green 🟢 with tag OPTIMIZED

Table Naming Conventions

To ensure tables are easily identifiable by layer, follow these naming patterns:

Layer	Suffix	Prefix Pattern	Example
Raw	`_raw`	Source name	`ventas_raw`, `clientes_raw`
Clean	`_clean`	`dim_` for dimensions, source name for facts	`dim_articulo_clean`, `ventas_clean`
Gold	`_gold`	`agg_` for aggregates, `fact_` for facts	`agg_ventas_mes_gold`, `fact_orders_gold`
Optimized	`_optimized`	Based on use case	`agg_forecast_ventas_optimized`

Prefix Definitions:

dim_ — Dimension tables (master data, reference data)
fact_ — Fact tables (transactional data)
agg_ — Aggregated tables (pre-calculated summaries)
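
Because the layer and table type are encoded in the name, they can be recovered mechanically. The following is a minimal sketch; the `classify_table` helper and its return shape are hypothetical, and treating the legacy `_silver` suffix as Clean follows the note in the Clean layer definition.

```python
SUFFIX_TO_LAYER = {
    "_raw": "RAW",
    "_clean": "CLEAN",
    "_silver": "CLEAN",  # legacy suffix, held to the same standards as _clean
    "_gold": "GOLD",
    "_optimized": "OPTIMIZED",
}

PREFIX_TO_KIND = {
    "dim_": "dimension",
    "fact_": "fact",
    "agg_": "aggregate",
}


def classify_table(name):
    """Return (layer, kind) inferred from a table name.

    Either element is None when the name carries no recognized
    suffix or prefix.
    """
    layer = next(
        (tag for suffix, tag in SUFFIX_TO_LAYER.items() if name.endswith(suffix)),
        None,
    )
    kind = next(
        (label for prefix, label in PREFIX_TO_KIND.items() if name.startswith(prefix)),
        None,
    )
    return layer, kind
```

For example, `classify_table("dim_articulo_clean")` yields `("CLEAN", "dimension")`, while a source-named table such as `ventas_raw` yields `("RAW", None)` because it carries no type prefix.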

How Layers Connect

Transition	Responsible Team	ARPIA Objects Used
Source → Raw	Data Engineering	AP DataPipe Engine, Python 3.12 DataPipe
Raw → Clean	Data Engineering	AP Prepared Table, AP SQL Code Execution, AP Transform objects
Clean → Gold	Data Engineering + Data Analysts	AP SQL Code Execution, AP Prepared Table
Gold → Optimized	Process/Application Owner	Custom objects based on use case

Applying Tags and Colors

Colors

Colors are applied automatically based on the layer tag in:

Repository table listings
Data lineage diagrams
Knowledge Grid visualizations
Documentation exports

When creating diagrams or documentation manually, use these color codes:

Layer	Hex Code	RGB
Raw (Grey)	`#9E9E9E`	158, 158, 158
Clean (Blue)	`#2196F3`	33, 150, 243
Gold (Gold)	`#FFC107`	255, 193, 7
Optimized (Green)	`#4CAF50`	76, 175, 80
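
For manual diagrams generated from scripts, the hex codes above can be kept in one mapping and converted to RGB on demand. This is a small sketch; the `LAYER_HEX` and `hex_to_rgb` names are illustrative and not part of ARPIA.

```python
# Hex codes copied from the table above, keyed by layer tag.
LAYER_HEX = {
    "RAW": "#9E9E9E",
    "CLEAN": "#2196F3",
    "GOLD": "#FFC107",
    "OPTIMIZED": "#4CAF50",
}


def hex_to_rgb(hex_code):
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of integers."""
    value = hex_code.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


LAYER_RGB = {tag: hex_to_rgb(code) for tag, code in LAYER_HEX.items()}
```

Deriving the RGB values from the hex codes, rather than storing both, keeps the two columns from drifting apart.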

Knowledge Grid Integration

The Knowledge Grid is ARPIA's semantic layer that defines how data is exposed to dashboards, reports, and AI assistants. It contains metadata about metrics, hierarchies, joins, and usage policies.

Layer Eligibility:

Layer	Knowledge Grid Eligible	Notes
Raw	❌ No	Never exposed to end users
Clean	⚠️ Limited	For exploratory analysis and validation only
Gold	✅ Yes	Primary source for business consumption
Optimized	✅ Yes	Must have associated Knowledge Node documentation

Knowledge Node: A Knowledge Node is the documentation record for a dataset in the Knowledge Grid. It defines the table's metrics, dimensions, relationships, refresh schedule, and usage policies.
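
The eligibility table can be read as a simple decision rule. The sketch below is one possible encoding; the `Dataset` class, its `knowledge_node` field, and the returned labels are hypothetical, and treating an Optimized table without a Knowledge Node as not eligible is an assumption drawn from the "must have" wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    name: str
    layer: str  # "RAW", "CLEAN", "GOLD", or "OPTIMIZED"
    knowledge_node: Optional[str] = None  # hypothetical reference to its Knowledge Node


def grid_eligibility(dataset):
    """Return 'no', 'limited', or 'yes' following the eligibility table."""
    if dataset.layer == "RAW":
        return "no"
    if dataset.layer == "CLEAN":
        return "limited"  # exploratory analysis and validation only
    if dataset.layer == "GOLD":
        return "yes"
    if dataset.layer == "OPTIMIZED":
        # Assumption: without Knowledge Node documentation the table is not eligible.
        return "yes" if dataset.knowledge_node else "no"
    raise ValueError(f"Unknown layer tag: {dataset.layer!r}")
```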

Governance Rules

Tagging Requirements

All datasets must be tagged with their layer (RAW, CLEAN, GOLD, OPTIMIZED).
Official colors must be used in documentation, diagrams, and dashboards to visually reinforce the layer.
Table names must include the appropriate suffix (_raw, _clean, _gold, _optimized).

Consumption Rules

Layer	Allowed Consumers
Raw	Data Engineering pipelines only
Clean	Data teams for exploration and validation
Gold	Dashboards, reports, AI assistants, business users
Optimized	Specific application or process as documented

Prohibited:

No dashboard or AI assistant may consume Raw data directly.
No production report may reference Clean data without explicit approval.
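
These rules could be enforced with a check along the following lines. It is a sketch that encodes only the consumers listed in the table; the consumer labels, the `approved` flag, and the `documented_consumers` parameter are hypothetical names, not ARPIA settings.

```python
# Consumers allowed per layer, as listed in the Consumption Rules table.
ALLOWED_CONSUMERS = {
    "RAW": {"data_engineering_pipeline"},
    "CLEAN": {"data_team"},
    "GOLD": {"dashboard", "report", "ai_assistant", "business_user"},
}


def may_consume(consumer, layer, *, approved=False, documented_consumers=()):
    """Return True if `consumer` may read from a dataset in `layer`.

    approved: explicit approval for a report to reference Clean data.
    documented_consumers: applications or processes documented for an
    Optimized dataset.
    """
    if layer == "OPTIMIZED":
        return consumer in documented_consumers
    if layer == "CLEAN" and consumer == "report":
        return approved
    return consumer in ALLOWED_CONSUMERS[layer]
```

With this encoding, `may_consume("dashboard", "RAW")` is `False`, and a report can read Clean data only when `approved=True` is passed.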

Responsibilities

Transition	Owner
Raw → Clean	Data Engineering
Clean → Gold	Shared — Data Engineering (technical) + Data Analysts (business rules)
Gold → Optimized	Process or application owner; must document in Knowledge Grid

Visual Flow

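Data moves through the layers in sequence: Source → Raw → Clean → Gold → Optimized. An Optimized table may also be built directly from Clean data, provided it remains traceable to its origin.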
