
AI & Machine Learning Overview

The Reasoning Flows layer dedicated to intelligent systems, model training, and generative AI.


Purpose

The AI & Machine Learning objects in Reasoning Flows let users build, deploy, and use intelligent models directly within their workflows. They cover use cases ranging from generating embeddings and training predictive models to real-time inference, time-series forecasting, and generative AI applications.

Because both no-code (GUI-based) and GPU-accelerated engines are available, teams can go from data preparation to advanced analytics and generative reasoning within the same workflows.


Where It Fits in Reasoning Flows

In the Reasoning Flows architecture:

  1. Extract & Load brings data into the platform.
  2. Repository Tables register datasets for reuse.
  3. Transform & Prepare cleans and structures data for modeling.
  4. AI & Machine Learning builds, trains, and deploys intelligent models.
  5. Reasoning Atlas integrates model outputs into ARPIA's generative AI and semantic reasoning systems.

Goal: This stage turns structured and prepared data into intelligent models, predictions, and AI-driven applications.


Available Tools

AutoML

GUI-based machine learning tools for building models without writing code.

AP AutoML Engine

No-code tool for training ML models on structured (tabular) datasets.

Capabilities:

  • Classification (predict categories)
  • Regression (predict numeric values)
  • Basic time-series forecasting

Best for:

  • Rapid prototyping
  • Business users without ML expertise
  • Standard prediction tasks on tabular data
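Whether a task is classification or regression depends on the target column: categories call for classification, continuous numbers for regression. The Python sketch below shows one simple heuristic for making that call before configuring a model. The function name and the 20-class cutoff are illustrative assumptions, not how AP AutoML Engine itself selects a task type.

```python
from numbers import Real

def infer_task_type(target_values, max_classes=20):
    """Suggest 'classification' or 'regression' for a target column.

    Hypothetical heuristic for illustration; AP AutoML Engine may decide differently.
    """
    values = [v for v in target_values if v is not None]
    if not values:
        raise ValueError("target column has no non-null values")
    # Text or boolean labels always describe categories.
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "classification"
    # A handful of distinct whole numbers usually encodes classes (e.g. 0/1 churn flags).
    if len(set(values)) <= max_classes and all(float(v).is_integer() for v in values):
        return "classification"
    # Otherwise treat the target as a continuous numeric value.
    return "regression"

print(infer_task_type(["churn", "stay", "churn"]))  # -> classification
print(infer_task_type([1250.5, 980.0, 1430.25]))    # -> regression
```

A numeric target such as net_sales in Example 1 below falls into the regression branch.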

SingularAI Text Embeddings

Generates semantic vector embeddings from raw text, enabling semantic search, classification, and clustering.

Capabilities:

  • Convert text to numerical vectors that capture meaning
  • Enable similarity-based search
  • Support text classification and clustering

Best for:

  • Natural Language Processing (NLP) tasks
  • Retrieval-Augmented Generation (RAG) workflows
  • Semantic search implementations
  • Text similarity analysis

Example use case:
Convert product descriptions into embeddings, then find similar products based on meaning rather than exact keyword matches.
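Similarity search over embeddings typically ranks stored vectors by cosine similarity to a query vector. The sketch below assumes that metric and uses made-up three-dimensional vectors in place of real SingularAI output (real embeddings have many more dimensions); it illustrates the technique, not the product's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query_vec, catalog, top_k=3):
    """Return the top_k (product_id, score) pairs, highest similarity first."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy vectors standing in for product-description embeddings.
catalog = {
    "trail-running-shoe": [0.9, 0.1, 0.0],
    "hiking-boot": [0.8, 0.3, 0.1],
    "espresso-machine": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]
print([pid for pid, _ in most_similar(query, catalog, top_k=2)])
# -> ['trail-running-shoe', 'hiking-boot']
```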


ARPIA Sequential GenAI Worker

Framework for building sequential generative AI workflows where steps execute in order.

Capabilities:

  • Chain multiple LLM calls in sequence
  • Pass outputs from one step as inputs to the next
  • Build multi-step reasoning pipelines

Best for:

  • Step-by-step document processing
  • Sequential reasoning tasks
  • Pipelines where each step depends on the previous

ARPIA WorkFlow GenAI Worker

Framework for building complex generative AI workflows with branching and parallel execution.

Capabilities:

  • Orchestrate multiple LLM components
  • Support conditional branching
  • Enable parallel processing paths

Best for:

  • Complex AI applications with multiple paths
  • Chat assistants with routing logic
  • Content generation pipelines
  • Summarization and analysis workflows
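Conditional branching and parallel paths can be illustrated in plain Python. In this sketch the component functions are hypothetical stand-ins for LLM calls, routing uses a simple keyword check, and parallelism uses `concurrent.futures` threads. It shows the pattern only, not the ARPIA WorkFlow GenAI Worker interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for LLM components; a real workflow would call a model here.
def classify_intent(message):
    return "billing" if "invoice" in message.lower() else "general"

def billing_agent(message):
    return f"[billing] handling: {message}"

def general_agent(message):
    return f"[general] handling: {message}"

ROUTES = {"billing": billing_agent, "general": general_agent}

def route(message):
    """Conditional branching: send the message to one component based on intent."""
    return ROUTES[classify_intent(message)](message)

def summarize(text):
    return text[:40]  # placeholder for an LLM summary

def extract_keywords(text):
    return sorted({w.strip(".,").lower() for w in text.split() if len(w) > 6})

def analyze_in_parallel(text):
    """Parallel paths: run independent components concurrently, then merge results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        summary_future = pool.submit(summarize, text)
        keywords_future = pool.submit(extract_keywords, text)
        return {"summary": summary_future.result(), "keywords": keywords_future.result()}

print(route("Where is my invoice?"))
print(analyze_in_parallel("Quarterly revenue increased significantly."))
```

Threads suit this pattern because real LLM calls spend most of their time waiting on the network.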

AutoML GPU

GPU-accelerated machine learning for compute-intensive tasks.

AP AutoML GPU Engine

GPU-accelerated version of the standard AutoML Engine for larger datasets and more complex models.

Capabilities:

  • Same as AP AutoML Engine but with GPU acceleration
  • Faster training on large datasets
  • Support for more complex model architectures

Best for:

  • Datasets exceeding 100,000 rows
  • Deep learning models
  • When standard AutoML training time is too slow

AutoML TimeSeries GPU

Specialized GPU-accelerated tools for time-series forecasting.

AP AutoML TimeSeries GPU Engine

GPU-accelerated engine specifically optimized for time-series prediction tasks.

Capabilities:

  • Forecast future values based on historical patterns
  • Handle seasonality and trends
  • Process multiple time series simultaneously

Best for:

  • Sales forecasting
  • Demand prediction
  • Financial time-series analysis
  • Any prediction task where historical sequence matters

Image Analyzer

Computer vision tools for image classification and analysis.

AP AutoML Vision Predictor

GUI-driven object that classifies images based on their visual content. Images are referenced by URLs stored in Repository Tables.

Capabilities:

  • Image classification
  • Content tagging
  • Visual pattern recognition

Best for:

  • Product image categorization
  • Quality control validation
  • Content moderation
  • Any task requiring automated image analysis

Choosing the Right Tool

If you need to...                                      Use this tool
Train a classification or regression model (no code)   AP AutoML Engine
Train on large datasets or complex models              AP AutoML GPU Engine
Forecast time-series data                              AP AutoML TimeSeries GPU Engine
Generate text embeddings for NLP                       SingularAI Text Embeddings
Build a step-by-step GenAI pipeline                    ARPIA Sequential GenAI Worker
Build complex GenAI workflows with branching           ARPIA WorkFlow GenAI Worker
Classify or analyze images                             AP AutoML Vision Predictor

When to Use GPU

GPU-accelerated engines train faster on large or complex workloads but consume more compute resources. Use the following guide:

Scenario                                             Recommendation
Dataset < 100,000 rows                               AP AutoML Engine (standard)
Dataset > 100,000 rows                               AP AutoML GPU Engine
Training takes > 30 minutes on the standard engine   Switch to AP AutoML GPU Engine
Deep learning or neural networks                     AP AutoML GPU Engine
Time-series with many series or long history         AP AutoML TimeSeries GPU Engine
Quick prototyping or testing                         Start with the standard engine
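The GPU guide can be expressed as a rule-of-thumb function. The order in which rules are checked (prototyping first, then large time-series, deep learning, row count, and training time) is an assumption made for this sketch, since the guide does not rank overlapping scenarios. The function and parameter names are hypothetical.

```python
def recommend_engine(rows, standard_training_minutes=None, deep_learning=False,
                     large_time_series=False, prototyping=False):
    """Rule-of-thumb engine choice based on the 'When to Use GPU' guide.

    Rule order is an assumption of this sketch, not documented platform behavior.
    """
    if prototyping:
        return "AP AutoML Engine"                 # start with the standard engine
    if large_time_series:
        return "AP AutoML TimeSeries GPU Engine"  # many series or long history
    if deep_learning:
        return "AP AutoML GPU Engine"
    if rows > 100_000:
        return "AP AutoML GPU Engine"
    if standard_training_minutes is not None and standard_training_minutes > 30:
        return "AP AutoML GPU Engine"             # standard engine is too slow
    return "AP AutoML Engine"

print(recommend_engine(250_000))  # -> AP AutoML GPU Engine
```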

Typical Workflow Examples

Example 1: Sales Prediction Model

sales_gold (GOLD)
    │
    ▼
AP AutoML Engine
    • Target: net_sales
    • Features: category, region, month
    • Type: Regression
    │
    ▼
sales_model (trained model)
    │
    ▼
sales_predictions_optimized (OPTIMIZED)

Example 2: Semantic Search with RAG

documents_clean (CLEAN)
    │
    ▼
SingularAI Text Embeddings
    • Generate vectors for each document
    │
    ▼
documents_embeddings_optimized (OPTIMIZED)
    │
    ▼
Reasoning Atlas
    • Enable semantic search
    • Power LLM-based Q&A
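In a RAG workflow, the documents closest to a question (found through embedding similarity) are placed into the LLM prompt as context. The sketch below shows only that prompt-assembly step. The prompt wording, the `{"text": ...}` chunk format, and the character budget are assumptions; Reasoning Atlas may assemble its prompts differently.

```python
def build_rag_prompt(question, retrieved_chunks, max_chars=2000):
    """Assemble an LLM prompt from chunks already ranked by similarity (best first).

    Chunks are added in order until the next one would exceed max_chars.
    """
    context_parts = []
    used = 0
    for i, chunk in enumerate(retrieved_chunks, start=1):
        entry = f"[{i}] {chunk['text']}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Hypothetical retrieved chunks from documents_embeddings_optimized.
chunks = [
    {"text": "Refunds are issued to the original payment method."},
    {"text": "Support is available on weekdays."},
]
print(build_rag_prompt("How are refunds issued?", chunks))
```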

Example 3: Time-Series Forecasting

historical_sales_gold (GOLD)
    │
    ▼
AP AutoML TimeSeries GPU Engine
    • Time column: date
    • Target: units_sold
    • Forecast horizon: 30 days
    │
    ▼
sales_forecast_optimized (OPTIMIZED)
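A simple baseline helps judge whether an engine's forecast adds value. The sketch below uses a seasonal naive forecast, a standard baseline technique and not the method used by AP AutoML TimeSeries GPU Engine: each of the 30 future days repeats the value observed one season earlier. It assumes one row per day and weekly seasonality; the date and units_sold columns come from the example above.

```python
from datetime import date, timedelta

def seasonal_naive_forecast(history, horizon=30, season_length=7):
    """Baseline forecast: each future day repeats the value from one season earlier.

    history: list of (date, units_sold) pairs sorted by date, one row per day.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_date = history[-1][0]
    last_season = [units for _, units in history[-season_length:]]
    forecast = []
    for step in range(1, horizon + 1):
        future_date = last_date + timedelta(days=step)
        # Day last_date + step falls on the same weekday as last_season[(step - 1) % 7].
        forecast.append((future_date, last_season[(step - 1) % season_length]))
    return forecast

# Hypothetical history: four weeks of daily units_sold with a weekly pattern.
history = [(date(2024, 1, 1) + timedelta(days=i), (i % 7) * 10) for i in range(28)]
baseline = seasonal_naive_forecast(history, horizon=30)
```

If the engine's forecast is not more accurate than this baseline on held-out data, the model is probably not capturing more than the weekly pattern.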

Example 4: Generative AI Document Processing

documents_raw (RAW)
    │
    ▼
ARPIA Sequential GenAI Worker
    • Step 1: Extract key information
    • Step 2: Summarize content
    • Step 3: Generate recommendations
    │
    ▼
processed_documents_gold (GOLD)
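The three steps in this example can be sketched as a sequential chain in which each step reads the accumulated context and adds its own output. The step functions are hypothetical placeholders that use simple string operations where a real pipeline would call an LLM; this is not the ARPIA Sequential GenAI Worker API.

```python
from typing import Callable, Dict, List

# Each step receives the running context and returns new fields to add to it.
Step = Callable[[Dict[str, str]], Dict[str, str]]

def run_sequential(document: str, steps: List[Step]) -> Dict[str, str]:
    """Run steps strictly in order, passing each step's output to the next."""
    context = {"document": document}
    for step in steps:
        context.update(step(context))
    return context

# Hypothetical placeholder steps; a real pipeline would prompt an LLM in each one.
def extract_key_information(ctx):
    return {"key_info": ctx["document"].split(".")[0].strip()}

def summarize_content(ctx):
    return {"summary": f"Summary: {ctx['key_info']}"}

def generate_recommendations(ctx):
    return {"recommendations": f"Review follow-ups for: {ctx['summary']}"}

result = run_sequential(
    "Contract renewal is due in March. Pricing terms changed.",
    [extract_key_information, summarize_content, generate_recommendations],
)
print(result["recommendations"])
```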

Integration with Data Layers

AI & ML objects consume and produce data at specific layers:

Layer       AI/ML role
RAW         Not recommended as direct input; clean it first
CLEAN       Acceptable for exploration and testing
GOLD        Primary input for production models
OPTIMIZED   Output destination for predictions and embeddings

Important: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog for reasoning and LLM-powered exploration.
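The layer guidance can be enforced as a pre-flight check before training. This sketch assumes a table's layer tag is available as a string; the `Layer` enum and the function names are hypothetical, not part of Reasoning Flows.

```python
from enum import Enum

class Layer(Enum):
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"

def check_training_input(layer_tag, production):
    """Raise ValueError if a table's layer is unsuitable as model input."""
    layer = Layer(layer_tag)  # also raises ValueError for unknown tags
    if layer is Layer.RAW:
        raise ValueError("RAW tables should be cleaned before modeling")
    if production and layer is not Layer.GOLD:
        raise ValueError("production models should train on GOLD tables")

def is_indexed_in_catalog(layer_tag):
    """Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog."""
    return Layer(layer_tag) in (Layer.GOLD, Layer.OPTIMIZED)

check_training_input("GOLD", production=True)    # passes silently
check_training_input("CLEAN", production=False)  # acceptable for exploration
```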


Best Practices

Prepare data before training. Use Transform & Prepare objects to clean and structure data. Models generally perform better on high-quality GOLD-layer data.

Start with standard AutoML. Use GPU engines only when standard engines are too slow or dataset size requires it.

Tag outputs appropriately. Model predictions and embeddings should be tagged as OPTIMIZED and documented in the Knowledge Catalog.

Version your models. Keep track of which data and parameters produced each model for reproducibility.
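One lightweight way to tie a model to the data and parameters that produced it is to hash both into a version ID. The sketch below works under stated assumptions: training rows are available as dictionaries in a stable order, and the 12-character ID length and record fields are arbitrary choices for illustration.

```python
import hashlib
import json

def model_version_id(training_rows, params):
    """Return a short, reproducible ID for a (data, parameters) combination.

    Rows must be supplied in a stable order: reordering them changes the ID.
    """
    digest = hashlib.sha256()
    for row in training_rows:
        # sort_keys makes each row's hash independent of dict key order;
        # default=str serializes values such as dates that json can't encode natively
        digest.update(json.dumps(row, sort_keys=True, default=str).encode("utf-8"))
        digest.update(b"\n")
    digest.update(json.dumps(params, sort_keys=True, default=str).encode("utf-8"))
    return digest.hexdigest()[:12]

# Hypothetical training slice and parameters for the sales_model example.
rows = [{"category": "tools", "region": "north", "month": 1, "net_sales": 1200.0}]
params = {"target": "net_sales", "type": "regression"}
version = model_version_id(rows, params)
record = {"model": "sales_model", "source_table": "sales_gold",
          "version": version, "params": params}
print(record)
```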

Monitor model performance. Regularly validate that models continue to perform well as new data arrives.
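Validation against new data can be as simple as comparing recent error with the error measured at training time. This sketch uses mean absolute error for a regression model and a 20% tolerance; both the metric and the threshold are assumptions to adapt per use case.

```python
def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    if len(actuals) != len(predictions) or not actuals:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)

def has_degraded(baseline_mae, actuals, predictions, tolerance=0.2):
    """True if recent MAE exceeds the baseline MAE by more than `tolerance` (20% default)."""
    recent = mean_absolute_error(actuals, predictions)
    return recent > baseline_mae * (1 + tolerance)

# Hypothetical check: baseline MAE of 1.0 recorded when the model was trained.
print(has_degraded(1.0, actuals=[10, 20], predictions=[12, 22]))  # -> True
```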

Document in the Knowledge Catalog. Register model outputs as Nodes so they're discoverable and governed.


Related Documentation