Extract & Load

Extract & Load Overview

The Extract & Load tools in the Data Workshop simplify moving data from sources such as databases, APIs, or files into destinations within the ARPIA platform. They automate both the extraction and the loading steps, which reduces the manual work involved in complex data workflows. Because data movement is centralized in one place, it is easier to keep data consistent and accurate across systems, so teams spend less time preparing data and more time analyzing it.

Development Environment

The Extract & Load tools are divided into three main categories within the development environment:

Extract

These tools automate data extraction, primarily from MySQL-compatible databases that have been registered as Data Sources in the ARPIA workspace (the File engine works with files instead). Extractions from a database can be configured to pull data directly from tables or to use custom SQL queries for more precise control.

  • AP DataPipe Engine - MySQL
  • AP DataPipe Engine - File
  • Python 3.12 DataPipe Engine
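A minimal sketch of the two extraction modes, table-based and query-based, is shown below. It is an illustration only, not the DataPipe engines' code: Python's built-in `sqlite3` module stands in for a MySQL-compatible Data Source, and the `orders` table, its columns, and the helper names `extract_table` and `extract_query` are hypothetical.

```python
import sqlite3


def extract_table(conn, table):
    """Full-table extraction: return the column names and every row."""
    # Table names cannot be bound as query parameters, so quote the identifier
    # (doubling any embedded double quotes).
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.execute(f"SELECT * FROM {quoted}")
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def extract_query(conn, sql, params=()):
    """Custom-query extraction: the SQL decides which rows and columns come back."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Hypothetical sample data standing in for a registered Data Source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "acme", 120.0, "paid"), (2, "globex", 80.5, "open"), (3, "acme", 42.0, "paid")],
)

cols, rows = extract_table(conn, "orders")
print(cols, len(rows))  # ['id', 'customer', 'amount', 'status'] 3

cols, rows = extract_query(
    conn,
    "SELECT customer, amount FROM orders WHERE status = ? ORDER BY id",
    ("paid",),
)
print(cols, rows)  # ['customer', 'amount'] [('acme', 120.0), ('acme', 42.0)]
```

Passing filter values as query parameters (the `?` placeholders) rather than formatting them into the SQL string keeps custom queries safe from injection; Python MySQL drivers typically use `%s` placeholders instead.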

High Performance Computing

These environments are designed for complex, custom processing. They allow users to write and execute code to solve advanced problems, run computations, or create highly tailored workflows.

  • PHP 7.4 Application
  • Python 3.8 Advanced ML Application
  • PHP 8.2 Application
  • Python 3 FastAPI

Notebooks

Arpia Notebooks offer a flexible, interactive environment for data exploration and rapid prototyping.

  • Arpia Notebook

Object Capabilities and Differences

Each object type serves a different purpose in the data workflow process. Below is an overview of what each object can do and how they differ.

AP DataPipe Engine (MySQL / File)

These are form-based tools designed for connecting to a data source and transferring data into a destination table inside ARPIA.


  • Purpose: Automate the loading of structured data from MySQL-compatible databases or files into destination tables.
  • Interface: Configured through a form in the UI; no code required.
  • Use Cases:
    • Copying entire tables or selected columns from a source database.
    • Performing simple filtering or field mapping during the transfer.
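The AP DataPipe engines are configured through a form rather than code, but the settings described above (which columns to copy, an optional filter, and how source fields map to destination fields) can be pictured as the small copy routine below. This is a hypothetical sketch, not the engine's implementation: `sqlite3` stands in for both the source database and the ARPIA destination, and `copy_table`, `customers`, and `dim_customer` are illustrative names.

```python
import sqlite3


def copy_table(src, dst, source_table, dest_table, field_map, where=None, params=()):
    """Copy selected columns from source_table into dest_table, renaming them.

    field_map maps source column -> destination column; only mapped columns
    are copied. `where` is an optional SQL filter evaluated at the source.
    Table and column names are assumed to be trusted configuration values.
    Returns the number of rows loaded.
    """
    src_cols = list(field_map)
    dst_cols = [field_map[c] for c in src_cols]

    select = f"SELECT {', '.join(src_cols)} FROM {source_table}"
    if where:
        select += f" WHERE {where}"
    rows = src.execute(select, params).fetchall()

    # Create the destination table if needed, then insert the mapped rows.
    dst.execute(f"CREATE TABLE IF NOT EXISTS {dest_table} ({', '.join(dst_cols)})")
    placeholders = ", ".join("?" for _ in dst_cols)
    dst.executemany(
        f"INSERT INTO {dest_table} ({', '.join(dst_cols)}) VALUES ({placeholders})",
        rows,
    )
    dst.commit()
    return len(rows)


# Hypothetical source table; internal_note is deliberately left unmapped.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT, internal_note TEXT)")
src.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Acme", "GT", "x"), (2, "Globex", "MX", "y"), (3, "Initech", "GT", "z")],
)
dst = sqlite3.connect(":memory:")

loaded = copy_table(
    src, dst, "customers", "dim_customer",
    {"id": "customer_id", "name": "customer_name"},
    where="country = ?", params=("GT",),
)
print(loaded)  # 2
print(dst.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
# [(1, 'Acme'), (3, 'Initech')]
```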

Python 3.12 DataPipe Engine

This tool extends the capabilities of the form-based DataPipe by adding support for Python scripting.


  • Purpose: Enable more complex data manipulation and transformation during the extraction and loading process.
  • Interface: Uses Python code to define the behavior of the data pipeline.
  • Use Cases:
    • Applying conditional logic or data cleaning before insertion.
    • Converting data formats, computing aggregations, or merging multiple sources.
    • Calling external APIs or services before loading data into a table.
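The kind of logic a Python DataPipe script might contain can be sketched with the standard library alone. This illustrates the transformation step only; how the Python 3.12 DataPipe Engine passes data into and out of a script is not covered here, and the functions `clean`, `aggregate_by_customer`, and `merge_with_regions`, as well as the sample records, are hypothetical.

```python
from collections import defaultdict


def clean(rows):
    """Drop records without a customer, normalize names, and coerce amounts."""
    cleaned = []
    for row in rows:
        customer = (row.get("customer") or "").strip().lower()
        if not customer:
            continue  # conditional logic: skip incomplete records
        try:
            amount = float(row.get("amount", 0))
        except (TypeError, ValueError):
            continue  # skip amounts that cannot be parsed as numbers
        cleaned.append({"customer": customer, "amount": amount})
    return cleaned


def aggregate_by_customer(rows):
    """Sum amounts per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def merge_with_regions(totals, regions):
    """Left-join the totals with a second source keyed by customer."""
    return [
        {"customer": c, "total": t, "region": regions.get(c, "unknown")}
        for c, t in sorted(totals.items())
    ]


# Hypothetical inputs: raw order records and a second lookup source.
orders = [
    {"customer": " Acme ", "amount": "120.50"},
    {"customer": "acme", "amount": 30},
    {"customer": "Globex", "amount": "n/a"},
    {"customer": "", "amount": 10},
    {"customer": "Initech", "amount": "15"},
]
regions = {"acme": "north"}

result = merge_with_regions(aggregate_by_customer(clean(orders)), regions)
print(result)
# [{'customer': 'acme', 'total': 150.5, 'region': 'north'},
#  {'customer': 'initech', 'total': 15.0, 'region': 'unknown'}]
```

Keeping each step in its own function makes it straightforward to test the cleaning, aggregation, and merge rules separately before wiring them into a pipeline.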

High Performance Computing Applications

These objects provide a fully open development environment for building custom applications or logic-heavy processes.

  • Purpose: Support advanced use cases that go beyond structured extraction and loading.
  • Interface: Full development access via scripting languages (PHP or Python).
  • Use Cases:
    • Building custom APIs, webhooks, or external integrations.
    • Running heavy processing workloads or nested workflows.
    • Training models or executing ML pipelines (in Python-based environments).
    • Creating advanced automation scenarios with branching logic.
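For the webhook use case, a minimal receiver built only on Python's standard library might look like the sketch below. It is generic and hypothetical rather than an ARPIA template: in the Python 3 FastAPI environment the same routing would normally be written with FastAPI, and the `/webhook` path, the `event` field, and the event names are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload):
    """Branching logic: decide what to do based on the (hypothetical) event type."""
    event = payload.get("event")
    if event == "file_uploaded":
        return {"action": "schedule_load", "file": payload.get("name")}
    if event == "row_deleted":
        return {"action": "archive", "id": payload.get("id")}
    return {"action": "ignore"}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            self.send_error(400, "Invalid JSON")
            return
        if not isinstance(payload, dict):
            self.send_error(400, "Expected a JSON object")
            return

        body = json.dumps(handle_event(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging
```

To try it locally, start the server with `HTTPServer(("127.0.0.1", 8000), WebhookHandler).serve_forever()` and POST a JSON body such as `{"event": "row_deleted", "id": 7}` to `http://127.0.0.1:8000/webhook`. Keeping the branching in `handle_event`, separate from the HTTP plumbing, makes the logic easy to test on its own.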

Arpia Notebook

Notebooks provide a rich, browser-based environment for interactive scripting, visual exploration, and step-by-step data processing.

  • Purpose: Rapid prototyping, data analysis, or exploratory development without deploying full applications.
  • Interface: Interactive code cells with support for Python and visualization libraries.
  • Use Cases:
    • Exploratory data analysis.
    • Testing SQL queries or API calls with immediate feedback.
    • Creating custom visualizations and plots.
    • Documenting data processes or insights alongside code using Markdown.
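As an example of exploratory analysis, a first notebook cell often profiles a sample of the data. The sketch below sticks to Python's standard library so it runs anywhere; the sample rows and the `profile` helper are hypothetical, and in practice the rows would come from an earlier query cell.

```python
import statistics
from collections import Counter

# Hypothetical sample, e.g. rows returned by a query in an earlier cell.
rows = [
    {"status": "paid", "amount": 120.0},
    {"status": "open", "amount": 80.5},
    {"status": "paid", "amount": 42.0},
    {"status": "refunded", "amount": 15.0},
]


def profile(rows, numeric_field, category_field):
    """Descriptive statistics for one numeric and one categorical field.

    Assumes at least one non-null numeric value is present.
    """
    values = [r[numeric_field] for r in rows if r.get(numeric_field) is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "by_category": Counter(r[category_field] for r in rows),
    }


summary = profile(rows, "amount", "status")
for key, value in summary.items():
    print(f"{key:>12}: {value}")
```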

Summary of Differences

Tool                     | Coding Required  | Best Suited For                              | Configuration Style
-------------------------|------------------|----------------------------------------------|-----------------------------
AP DataPipe - MySQL/File | No               | Direct data transfer from source to table    | Form-based UI
Python 3.12 DataPipe     | Yes (Python)     | Data transformation and custom logic         | Python script with configs
HPC Applications         | Yes (PHP/Python) | Full flexibility for custom processes & APIs | Full development environment
Arpia Notebook           | Yes (Python)     | Interactive exploration and prototyping      | Notebook-style interface