Transform & Prepare

Transform & Prepare Overview

The Reasoning Flows layer dedicated to data refinement, enrichment, and transformation.


🧭 Purpose

The Transform & Prepare layer groups the object types used for data processing within Reasoning Flows.
These tools allow teams to clean, structure, and transform data, preparing it for advanced analysis, AI modeling, and operational integration.

This includes filtering out irrelevant information, standardizing formats, handling missing or inconsistent values, and merging data from multiple sources.
By refining the data at this stage, teams ensure that it is accurate, consistent, and analysis-ready — forming the foundation for reliable insights, ML performance, and AI reasoning.
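The four operations named above (filtering, standardizing formats, handling missing or inconsistent values, and merging sources) can be illustrated outside the platform with a small Python sketch. It assumes two in-memory tables represented as lists of dicts. The field names and sample rows are hypothetical and do not reflect any Arpia schema. In Reasoning Flows the same steps are configured through the Transform & Prepare objects described below, not with this code.

```python
from typing import Dict, List, Optional

# Hypothetical sample rows standing in for two source tables.
RAW_ORDERS = [
    {"customer_id": "1", "country": " us ", "amount": "120.50"},
    {"customer_id": "2", "country": "MX", "amount": ""},      # missing value
    {"customer_id": "3", "country": "Gt", "amount": "n/a"},   # inconsistent value
    {"customer_id": "", "country": "US", "amount": "15"},     # no key: irrelevant row
]
CUSTOMERS = [
    {"customer_id": "1", "segment": "enterprise"},
    {"customer_id": "2", "segment": "smb"},
]


def parse_amount(value: str) -> Optional[float]:
    """Return the amount as a float, or None if it is empty or malformed."""
    try:
        return float(value)
    except ValueError:
        return None


def clean_orders(rows: List[Dict[str, str]]) -> List[dict]:
    cleaned = []
    for row in rows:
        if not row["customer_id"]:  # filter: drop rows that cannot be joined
            continue
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "country": row["country"].strip().upper(),  # standardize format
            "amount": parse_amount(row["amount"]),       # None marks missing/invalid
        })
    return cleaned


def merge_segments(orders: List[dict], customers: List[Dict[str, str]]) -> List[dict]:
    """Left-join orders to customers on customer_id; unmatched rows get 'unknown'."""
    segment_by_id = {int(c["customer_id"]): c["segment"] for c in customers}
    return [
        {**order, "segment": segment_by_id.get(order["customer_id"], "unknown")}
        for order in orders
    ]


prepared = merge_segments(clean_orders(RAW_ORDERS), CUSTOMERS)
for row in prepared:
    print(row)
```

Keeping invalid amounts as None (a NULL in the destination table) rather than substituting 0 is a deliberate choice in this sketch: missing data stays visible, and any imputation can be decided per analysis.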


🔹 Where It Fits in Reasoning Flows

In the Reasoning Flows architecture:

  1. Extract & Load brings raw or external data into the repository.
  2. Repository Tables register structured datasets for reuse across workflows.
  3. Transform & Prepare refines and cleans this data for analysis or model training.
  4. Knowledge Atlas connects refined data to the Generative AI and Semantic reasoning layers.

Goal: Transform & Prepare bridges data ingestion and data intelligence; it is where raw data becomes business-ready information.


🧩 Development Environments

AutoML

This object type provides access to the machine learning and AI tools available within the Arpia platform.
These tools support a wide range of functions, from generating text embeddings to training custom machine learning models tailored to specific needs.

  • Singular-AI Text Embeddings
    Allows you to generate semantic embeddings from raw text data for use in AI pipelines.
    Ideal for NLP-based tasks such as similarity search, classification, or clustering.
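Once text has been converted into embeddings, similarity search commonly ranks stored vectors by cosine similarity to a query vector. The sketch below assumes the embeddings already exist as lists of floats; it does not call any Singular-AI API, and the toy 3-dimensional vectors and document IDs are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query: List[float], corpus: Dict[str, List[float]],
                 top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored embeddings by cosine similarity to the query embedding."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors; real text embeddings have many more dimensions.
corpus = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "return_window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(most_similar(query, corpus))
```

The same ranking underlies clustering and nearest-neighbor classification; at scale, a vector index replaces the linear scan shown here.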

Extract

These objects automate the extraction process from MySQL-compatible databases registered as Data Sources within Reasoning Flows.
Extractions can either perform direct table-to-table transfers or use custom SQL queries to precisely define the data retrieved from the source.

  • AP DataPipe Engine - MySQL
    GUI-based tool for moving data directly from a MySQL source into a destination table.

  • Python 3.12 DataPipe Engine
    Script-based version of DataPipe that supports Python logic for dynamic extraction and transformation workflows.
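The difference between a direct table-to-table transfer and a custom-query extraction can be sketched in Python. This is not the DataPipe Engine's code: in-memory SQLite databases stand in for the registered MySQL Data Source and the destination repository so the example runs anywhere, and the `extract` helper, table names, and batch size are hypothetical. With a real MySQL source, a DB-API driver would supply the connections, and its placeholder style may differ (for example `%s` instead of `?`).

```python
import sqlite3
from typing import Sequence


def extract(source: sqlite3.Connection, destination: sqlite3.Connection,
            query: str, dest_table: str, columns: Sequence[str],
            batch_size: int = 500) -> int:
    """Run `query` on the source and append the result to `dest_table` in batches.

    `dest_table` and `columns` are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = f"INSERT INTO {dest_table} ({', '.join(columns)}) VALUES ({placeholders})"
    cursor = source.execute(query)
    moved = 0
    while True:
        batch = cursor.fetchmany(batch_size)  # stream rows instead of loading all at once
        if not batch:
            break
        destination.executemany(insert_sql, batch)
        moved += len(batch)
    destination.commit()
    return moved


# Source database with a hypothetical orders table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "paid", 10.0), (2, "void", 5.0), (3, "paid", 7.5)])

destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")
destination.execute("CREATE TABLE raw_paid_orders (id INTEGER, amount REAL)")

# Table-to-table transfer: copy every row as-is.
extract(source, destination, "SELECT id, status, amount FROM orders",
        "raw_orders", ["id", "status", "amount"])
# Custom query: define exactly which rows and columns are retrieved.
extract(source, destination, "SELECT id, amount FROM orders WHERE status = 'paid'",
        "raw_paid_orders", ["id", "amount"])
```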


High Performance Computing

These objects provide open development environments for writing and executing custom code — ideal for advanced data processing, ML model training, or complex business logic.

  • PHP 7.4 Application
    Full code environment for procedural logic, integrations, or custom backends.

  • Python 3.8 Advanced ML Application
    Python environment for advanced processing, modeling, and data manipulation.

  • Python 3.8 Advanced ML & Plotly
    Same as above, but preconfigured to support rich data visualizations using Plotly.


Notification Engine

This object type provides access to Arpia's Notification Engine, enabling configuration of automated email notifications and alerts.
Requires a Mailgun API key for operation.

  • AP Notification Engine
    GUI-based setup for defining email templates, triggers, and delivery rules across your workflows.
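To show what the Mailgun API key is used for, here is a hedged sketch of a threshold-based alert delivered through Mailgun's HTTP messages endpoint. It assumes the Notification Engine sends through Mailgun's standard `/v3/{domain}/messages` API, which this page does not confirm. The template, trigger rule, domain, and addresses are hypothetical. Accounts in Mailgun's EU region use `api.eu.mailgun.net` as the host instead.

```python
import base64
import urllib.parse
import urllib.request
from string import Template
from typing import List

# Hypothetical alert template and trigger rule.
ALERT_TEMPLATE = Template("Table $table has $count rows with missing values.")


def should_alert(missing_count: int, threshold: int = 0) -> bool:
    """Trigger rule: alert only when the count exceeds the threshold."""
    return missing_count > threshold


def build_mailgun_request(api_key: str, domain: str, sender: str,
                          recipients: List[str], subject: str,
                          text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Mailgun's messages endpoint."""
    url = f"https://api.mailgun.net/v3/{domain}/messages"
    form = urllib.parse.urlencode({
        "from": sender,
        "to": ",".join(recipients),  # Mailgun accepts comma-separated recipients
        "subject": subject,
        "text": text,
    }).encode("utf-8")
    # HTTP Basic auth with the literal user "api" and the API key as password.
    token = base64.b64encode(f"api:{api_key}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=form, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


missing = 3
if should_alert(missing):
    req = build_mailgun_request(
        api_key="key-placeholder",
        domain="mg.example.com",
        sender="Reasoning Flows <alerts@mg.example.com>",
        recipients=["data-team@example.com"],
        subject="Data quality alert",
        text=ALERT_TEMPLATE.substitute(table="clean_orders", count=missing),
    )
    # urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```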

Prepare & Transform Tools

These objects enable flexible data transformation and preparation tailored to project needs.
Capabilities include adding indexes, converting data types, transforming date formats, executing SQL logic, and segmenting text for downstream AI applications.

  • AP Prepared Table
    GUI-based tool that converts an existing table into a modifiable dataset.
    Supports field-by-field data cleaning, retyping, and transformation.

  • AP Transform String to Binary
    Converts text fields into binary encodings, which is useful for classification flags.

  • AP Transform String to Numeric
    Converts categorical or string-based values into numeric form for aggregation or modeling.

  • AP Transform Dates to Numeric
    Converts date fields into numeric representations (e.g., timestamps, day-of-week).

  • AP SQL Code Execution
    Code block object for executing custom SQL logic.
    Well suited to logic-heavy preparation, complex joins, or multi-step transformations.

  • SingularAI Text Splitter
    Splits long text entries into smaller segments — useful for tokenization, summarization, or RAG-based workflows.
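The column-level conversions and the text splitting in the list above can be approximated in plain Python. This sketch is an assumption about typical behavior, not the AP or SingularAI implementations: the positive-label set, first-seen integer codes, the chosen date features, and splitting into fixed character windows with overlap are all illustrative choices, and the SingularAI Text Splitter may split by tokens or sentences instead.

```python
from datetime import date, datetime, timezone
from typing import Dict, Iterable, List


def string_to_binary(value: str, positive=("yes", "true", "y", "1")) -> int:
    """String to Binary: 1 if the normalized text is a positive label, else 0."""
    return 1 if value.strip().lower() in positive else 0


def string_codes(values: Iterable[str]) -> Dict[str, int]:
    """String to Numeric: give each distinct category an integer, in first-seen order."""
    codes: Dict[str, int] = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return codes


def date_features(d: date) -> Dict[str, int]:
    """Dates to Numeric: Unix timestamp at UTC midnight, ISO weekday (Mon=1), month."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return {"timestamp": int(midnight.timestamp()), "weekday": d.isoweekday(), "month": d.month}


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Text splitting: fixed-size character windows that overlap by `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


tiers = ["gold", "silver", "gold", "bronze"]
codes = string_codes(tiers)
print([codes[t] for t in tiers])                          # [0, 1, 0, 2]
print(string_to_binary(" Yes "))                          # 1
print(date_features(date(2024, 1, 1)))                    # {'timestamp': 1704067200, 'weekday': 1, 'month': 1}
print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

In this sketch `string_to_binary` maps any unrecognized value to 0, and first-seen codes depend on row order, so in practice the category-to-code mapping would be stored and reused rather than rebuilt on each run. The overlap in `split_text` keeps context that straddles a boundary available to both neighboring chunks, which helps retrieval in RAG workflows.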


Web-Hook Sender

This object enables external integration through webhooks, allowing Reasoning Flows to send event-based payloads to external systems in real time.

  • AP Web-Hook Sender
    GUI-based tool for defining webhook payloads and mapping data to third-party systems or APIs.
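A webhook payload is typically a JSON document built by mapping table fields to the keys the receiving API expects. The sketch below assumes a JSON-over-HTTP POST wrapped in an `event` envelope; the field map, event name, and URL are hypothetical, and the exact payload format produced by the AP Web-Hook Sender is not specified on this page.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical mapping: source column -> key expected by the receiving system.
FIELD_MAP = {"order_id": "id", "amount": "total", "status": "state"}


def map_row(row: Dict[str, Any], field_map: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped columns; columns not in the map are not sent."""
    return {target: row[source] for source, target in field_map.items() if source in row}


def build_webhook_request(url: str, event: str, row: Dict[str, Any],
                          field_map: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying one event."""
    body = json.dumps({"event": event, "data": map_row(row, field_map)}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    return request


req = build_webhook_request(
    "https://example.com/hooks/orders",
    "order.updated",
    {"order_id": 42, "amount": 99.5, "status": "paid", "internal_note": "do not share"},
    FIELD_MAP,
)
print(req.data.decode("utf-8"))
```

Mapping through an explicit allow-list means internal columns (such as `internal_note` here) never leave the platform unless they are deliberately added to the map.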

🧠 Best Practices

  • Use Transform & Prepare objects as the standard layer between ingestion and modeling.
  • Align all transformed tables with the Arpia Data Layer Framework (RAW → CLEAN → GOLD → OPTIMIZED).
  • Use AP SQL Code Execution or the Python 3.12 DataPipe Engine for complex transformation logic.
  • Document transformation rules and results in the Knowledge Atlas for governance and reusability.
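As an illustration of moving a table through the RAW → CLEAN → GOLD stages with SQL, the sketch below runs each step as a single statement against SQLite, used only as a runnable stand-in for the MySQL-compatible repository. The table names, the rules assigned to each layer, and the index are assumptions rather than the Arpia Data Layer Framework's own definitions, and the OPTIMIZED stage is not shown. `CREATE TABLE ... AS SELECT` and `CREATE INDEX` are also available in MySQL.

```python
import sqlite3

# Stand-in RAW layer: values exactly as landed, with messy text and a missing amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (region TEXT, sold_on TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [(" north ", "2024-03-01", "100"), ("NORTH", "2024-03-02", "50"),
     ("south", "2024-03-01", ""), ("South", "2024-03-05", "30")],
)

# Each step is one statement, as it might be in an SQL Code Execution block.
STEPS = [
    # CLEAN: normalize text, cast types, drop rows with no amount.
    """CREATE TABLE clean_sales AS
       SELECT UPPER(TRIM(region)) AS region,
              sold_on,
              CAST(amount AS REAL) AS amount
       FROM raw_sales
       WHERE TRIM(amount) <> ''""",
    # Index the column that downstream queries filter on.
    "CREATE INDEX idx_clean_sales_region ON clean_sales (region)",
    # GOLD: business-level aggregate ready for reporting.
    """CREATE TABLE gold_sales_by_region AS
       SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
       FROM clean_sales
       GROUP BY region""",
]

for statement in STEPS:
    conn.execute(statement)
conn.commit()

print(conn.execute("SELECT * FROM gold_sales_by_region ORDER BY region").fetchall())
# [('NORTH', 2, 150.0), ('SOUTH', 1, 30.0)]
```

Keeping each layer as its own table, rather than overwriting RAW in place, preserves the original data for auditing and makes each transformation rule easy to document in the Knowledge Atlas.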

🔗 Related Documentation