AI & Machine Learning Overview
The Reasoning Flows layer dedicated to intelligent systems, model training, and generative AI.
Purpose
The AI & Machine Learning objects in Reasoning Flows let users build, train, and deploy intelligent systems directly within their workflows. They cover a wide range of use cases, from generating embeddings and training predictive models to real-time inference, time-series forecasting, and generative AI applications.
Because the toolset includes both no-code GUI objects and GPU-accelerated engines, teams can move from data preparation to advanced analytics and generative reasoning within the same workflow.
Where It Fits in Reasoning Flows
In the Reasoning Flows architecture:
- Extract & Load brings data into the platform.
- Repository Tables register datasets for reuse.
- Transform & Prepare cleans and structures data for modeling.
- AI & Machine Learning builds, trains, and deploys intelligent models.
- Reasoning Atlas integrates model outputs into ARPIA's Generative AI and Semantic Reasoning systems.
Goal: This stage turns structured and prepared data into intelligent models, predictions, and AI-driven applications.
Available Tools
AutoML
GUI-based machine learning tools for building models without writing code.
AP AutoML Engine
No-code tool for training ML models on structured (tabular) datasets.
Capabilities:
- Classification (predict categories)
- Regression (predict numeric values)
- Basic time-series forecasting
Best for:
- Rapid prototyping
- Business users without ML expertise
- Standard prediction tasks on tabular data
SingularAI Text Embeddings
Generates semantic vector embeddings from raw text, enabling semantic search, classification, and clustering.
Capabilities:
- Convert text to numerical vectors that capture meaning
- Enable similarity-based search
- Support text classification and clustering
Best for:
- Natural Language Processing (NLP) tasks
- Retrieval-Augmented Generation (RAG) workflows
- Semantic search implementations
- Text similarity analysis
Example use case:
Convert product descriptions into embeddings, then find similar products based on meaning rather than exact keyword matches.
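Similarity search over embeddings commonly means ranking stored vectors by cosine similarity to a query vector. The sketch below shows that ranking step in plain Python. It does not call SingularAI: the three-dimensional vectors are made-up stand-ins for real embeddings (which have far more dimensions), and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], catalog: dict[str, list[float]],
                 k: int = 2) -> list[tuple[str, float]]:
    """Rank catalog items by cosine similarity to the query and keep the top k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real ones would come from the embedding model.
product_embeddings = {
    "trail running shoes": [0.9, 0.1, 0.0],
    "hiking boots": [0.7, 0.5, 0.1],
    "espresso machine": [0.0, 0.2, 0.9],
}
# Stand-in for the embedding of a query like "lightweight footwear for trails".
query_vector = [0.85, 0.2, 0.05]

for name, score in most_similar(query_vector, product_embeddings):
    print(f"{name}: {score:.3f}")
```

The footwear items rank above the espresso machine because their vectors point in a similar direction, even though no keywords are compared.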
ARPIA Sequential GenAI Worker
Framework for building sequential generative AI workflows where steps execute in order.
Capabilities:
- Chain multiple LLM calls in sequence
- Pass outputs from one step as inputs to the next
- Build multi-step reasoning pipelines
Best for:
- Step-by-step document processing
- Sequential reasoning tasks
- Pipelines where each step depends on the previous
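The chaining pattern described above, where each step's output becomes the next step's input, can be sketched as a simple loop. This is an illustration, not the ARPIA Sequential GenAI Worker's implementation: `fake_llm` is a hypothetical stand-in for a model call that just wraps its input in a label so the execution order is visible, and the step names mirror a document-processing flow (extract, summarize, recommend).

```python
from typing import Callable

Step = Callable[[str], str]


def run_sequential(steps: list[tuple[str, Step]], document: str) -> tuple[str, list[str]]:
    """Execute steps in order, feeding each output into the next step.

    Returns the final output plus the step names in execution order.
    """
    current = document
    trace = []
    for name, step in steps:
        current = step(current)
        trace.append(name)
    return current, trace


def fake_llm(instruction: str) -> Step:
    """Hypothetical LLM stand-in: wraps the input text in the instruction name."""
    def step(text: str) -> str:
        return f"{instruction}({text})"
    return step


pipeline = [
    ("extract", fake_llm("extract_key_info")),
    ("summarize", fake_llm("summarize")),
    ("recommend", fake_llm("recommend")),
]

result, trace = run_sequential(pipeline, "contract.txt contents")
print(trace)   # ['extract', 'summarize', 'recommend']
print(result)  # recommend(summarize(extract_key_info(contract.txt contents)))
```

The nested output shows the dependency: the recommendation step only ever sees the summary, which only ever sees the extracted information.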
ARPIA WorkFlow GenAI Worker
Framework for building complex generative AI workflows with branching and parallel execution.
Capabilities:
- Orchestrate multiple LLM components
- Support conditional branching
- Enable parallel processing paths
Best for:
- Complex AI applications with multiple paths
- Chat assistants with routing logic
- Content generation pipelines
- Summarization and analysis workflows
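Conditional branching and parallel paths can be illustrated with standard-library Python. In this hypothetical sketch, `route` picks one handler based on a keyword (a real router might use an LLM classifier instead), and `analyze_in_parallel` runs two independent components concurrently with `concurrent.futures.ThreadPoolExecutor` before merging their results. All handler functions are placeholders for LLM components, not ARPIA APIs.

```python
from concurrent.futures import ThreadPoolExecutor


# Placeholder branch handlers; in a real workflow each would call an LLM component.
def answer_billing(question: str) -> str:
    return f"billing-agent: {question}"


def answer_technical(question: str) -> str:
    return f"tech-agent: {question}"


def route(question: str) -> str:
    """Conditional branching: choose one handler with a simple keyword rule."""
    handler = answer_billing if "invoice" in question.lower() else answer_technical
    return handler(question)


# Placeholder analysis components that do not depend on each other.
def summarize(text: str) -> str:
    return f"summary({len(text.split())} words)"


def classify_sentiment(text: str) -> str:
    return "negative" if "broken" in text.lower() else "neutral"


def analyze_in_parallel(text: str) -> dict[str, str]:
    """Parallel paths: run independent components concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        summary_future = pool.submit(summarize, text)
        sentiment_future = pool.submit(classify_sentiment, text)
        return {
            "summary": summary_future.result(),
            "sentiment": sentiment_future.result(),
        }


print(route("Where is my invoice for March?"))
# billing-agent: Where is my invoice for March?
print(analyze_in_parallel("The export button is broken again"))
# {'summary': 'summary(6 words)', 'sentiment': 'negative'}
```

Running the two analysis components in parallel only pays off when they are independent; steps that need each other's output belong in a sequential chain.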
AutoML GPU
GPU-accelerated machine learning for compute-intensive tasks.
AP AutoML GPU Engine
GPU-accelerated version of the standard AutoML Engine for larger datasets and more complex models.
Capabilities:
- Same as AP AutoML Engine but with GPU acceleration
- Faster training on large datasets
- Support for more complex model architectures
Best for:
- Datasets exceeding 100,000 rows
- Deep learning models
- When training on the standard AutoML Engine is too slow
AutoML TimeSeries GPU
Specialized GPU-accelerated tools for time-series forecasting.
AP AutoML TimeSeries GPU Engine
GPU-accelerated engine specifically optimized for time-series prediction tasks.
Capabilities:
- Forecast future values based on historical patterns
- Handle seasonality and trends
- Process multiple time series simultaneously
Best for:
- Sales forecasting
- Demand prediction
- Financial time-series analysis
- Any prediction task where historical sequence matters
Image Analyzer
Computer vision tools for image classification and analysis.
AP AutoML Vision Predictor
GUI-driven object that classifies images based on their visual content. Input images are referenced by URLs stored in Repository Tables.
Capabilities:
- Image classification
- Content tagging
- Visual pattern recognition
Best for:
- Product image categorization
- Quality control validation
- Content moderation
- Any task requiring automated image analysis
Choosing the Right Tool
| If you need to... | Use this tool |
|---|---|
| Train a classification or regression model (no code) | AP AutoML Engine |
| Train on large datasets or complex models | AP AutoML GPU Engine |
| Forecast time-series data | AP AutoML TimeSeries GPU Engine |
| Generate text embeddings for NLP | SingularAI Text Embeddings |
| Build a step-by-step GenAI pipeline | ARPIA Sequential GenAI Worker |
| Build complex GenAI workflows with branching | ARPIA WorkFlow GenAI Worker |
| Classify or analyze images | AP AutoML Vision Predictor |
When to Use GPU
GPU-accelerated engines provide significant performance benefits but consume more resources. Use this guide:
| Scenario | Recommendation |
|---|---|
| Dataset of 100,000 rows or fewer | Standard AutoML Engine |
| Dataset of more than 100,000 rows | GPU Engine |
| Training takes more than 30 minutes on the standard engine | Switch to GPU Engine |
| Deep learning or neural networks | GPU Engine |
| Time-series with many series or long history | TimeSeries GPU Engine |
| Quick prototyping or testing | Standard Engine first |
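The table can also be read as a rule of thumb in code. This is a hypothetical helper, not part of ARPIA: the 100,000-row and 30-minute thresholds come from the table, while the order in which the rules are checked (time-series first, then prototyping) is an assumption of this sketch.

```python
from typing import Optional


def recommend_engine(
    rows: int,
    standard_training_minutes: Optional[float] = None,
    deep_learning: bool = False,
    large_time_series: bool = False,
    prototyping: bool = False,
) -> str:
    """Hypothetical encoding of the 'When to Use GPU' table.

    Rule order is an assumption; thresholds follow the table.
    """
    if large_time_series:  # many series or long history
        return "AP AutoML TimeSeries GPU Engine"
    if prototyping:  # start on the standard engine first
        return "AP AutoML Engine"
    if deep_learning or rows > 100_000:
        return "AP AutoML GPU Engine"
    if standard_training_minutes is not None and standard_training_minutes > 30:
        return "AP AutoML GPU Engine"
    return "AP AutoML Engine"


print(recommend_engine(rows=50_000))                                # AP AutoML Engine
print(recommend_engine(rows=250_000))                               # AP AutoML GPU Engine
print(recommend_engine(rows=50_000, standard_training_minutes=45))  # AP AutoML GPU Engine
```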
Typical Workflow Examples
Example 1: Sales Prediction Model
sales_gold (GOLD)
│
▼
AP AutoML Engine
• Target: net_sales
• Features: category, region, month
• Type: Regression
│
▼
sales_model (trained model)
│
▼
sales_predictions_optimized (OPTIMIZED)
Example 2: Semantic Search with RAG
documents_clean (CLEAN)
│
▼
SingularAI Text Embeddings
• Generate vectors for each document
│
▼
documents_embeddings_optimized (OPTIMIZED)
│
▼
Reasoning Atlas
• Enable semantic search
• Power LLM-based Q&A
Example 3: Time-Series Forecasting
historical_sales_gold (GOLD)
│
▼
AP AutoML TimeSeries GPU Engine
• Time column: date
• Target: units_sold
• Forecast horizon: 30 days
│
▼
sales_forecast_optimized (OPTIMIZED)
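To make "forecast horizon" concrete, the sketch below produces a 30-day forecast with a seasonal-naive baseline, which repeats the value observed one season (here, one week) earlier. This is not the model the TimeSeries GPU Engine uses; it is a common benchmark that a trained forecaster should beat. The `units_sold` series and the dates are invented for illustration.

```python
from datetime import date, timedelta


def seasonal_naive_forecast(history: list[float], season_length: int,
                            horizon: int) -> list[float]:
    """Forecast each future step by repeating the value from one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical daily units_sold with a weekly (7-day) pattern, four weeks long.
units_sold = [120, 95, 90, 100, 130, 180, 200] * 4
last_date = date(2024, 3, 31)

forecast = seasonal_naive_forecast(units_sold, season_length=7, horizon=30)
# Pair each forecast value with its date, like rows of a forecast output table.
forecast_rows = [(last_date + timedelta(days=i + 1), value)
                 for i, value in enumerate(forecast)]
print(len(forecast_rows))  # 30
print(forecast_rows[0])    # (datetime.date(2024, 4, 1), 120)
```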
Example 4: Generative AI Document Processing
documents_raw (RAW)
│
▼
ARPIA Sequential GenAI Worker
• Step 1: Extract key information
• Step 2: Summarize content
• Step 3: Generate recommendations
│
▼
processed_documents_gold (GOLD)
Integration with Data Layers
AI & ML objects consume and produce data at specific layers:
| Layer | AI/ML Role |
|---|---|
| RAW | Not recommended as direct input — clean first |
| CLEAN | Acceptable for exploration and testing |
| GOLD | Primary input for production models |
| OPTIMIZED | Output destination for predictions and embeddings |
Important: Only GOLD and OPTIMIZED tables are indexed in the Knowledge Catalog for reasoning and LLM-powered exploration.
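A pre-flight check on model inputs can encode this table. The sketch below assumes, hypothetically, that the layer can be read from a table-name suffix, following the naming used in the examples on this page (`sales_gold`, `documents_clean`, `documents_embeddings_optimized`); in practice the layer is whatever the table is tagged with in the platform. `check_training_input` and `is_catalog_indexed` are illustrative names, not ARPIA functions.

```python
from enum import Enum


class Layer(Enum):
    RAW = "RAW"
    CLEAN = "CLEAN"
    GOLD = "GOLD"
    OPTIMIZED = "OPTIMIZED"


# Only these layers are indexed in the Knowledge Catalog (per the note above).
CATALOG_INDEXED = {Layer.GOLD, Layer.OPTIMIZED}


def layer_of(table_name: str) -> Layer:
    """Hypothetical: read the layer from a name suffix such as sales_gold."""
    return Layer(table_name.rsplit("_", 1)[-1].upper())


def check_training_input(table_name: str, production: bool) -> str:
    """Apply the layer table: RAW is rejected, production models expect GOLD."""
    layer = layer_of(table_name)
    if layer is Layer.RAW:
        return "reject: clean RAW data before training"
    if production and layer is not Layer.GOLD:
        return f"warn: {layer.value} input; production models should train on GOLD"
    return "ok"


def is_catalog_indexed(table_name: str) -> bool:
    return layer_of(table_name) in CATALOG_INDEXED


print(check_training_input("historical_sales_gold", production=True))  # ok
print(check_training_input("documents_clean", production=True))
# warn: CLEAN input; production models should train on GOLD
print(is_catalog_indexed("sales_forecast_optimized"))  # True
```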
Best Practices
Prepare data before training. Use Transform & Prepare objects to clean and structure data. Models perform better on high-quality GOLD layer data.
Start with standard AutoML. Use GPU engines only when standard engines are too slow or dataset size requires it.
Tag outputs appropriately. Model predictions and embeddings should be tagged as OPTIMIZED and documented in the Knowledge Catalog.
Version your models. Keep track of which data and parameters produced each model for reproducibility.
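One lightweight way to track which data and parameters produced a model is to derive a version ID from a hash of both. The sketch below is hypothetical (ARPIA may record model versions differently): `make_version_record` hashes the training rows and the parameter set with SHA-256, so identical inputs always yield the same `version_id` and any change to data or parameters yields a new one. The sample rows and parameters are invented and echo the sales prediction example.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(obj) -> str:
    """Deterministic short SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def make_version_record(model_name: str, source_table: str,
                        rows: list[dict], params: dict) -> dict:
    """Build a reproducibility record linking a model to its data and parameters."""
    record = {
        "model_name": model_name,
        "source_table": source_table,
        "data_fingerprint": fingerprint(rows),  # changes if any training row changes
        "params": params,
    }
    # The ID covers data and params only, so identical inputs reproduce it.
    record["version_id"] = fingerprint(record)
    record["created_at"] = datetime.now(timezone.utc).isoformat()
    return record


# Hypothetical training rows and parameters.
rows = [
    {"category": "Shoes", "region": "North", "month": 1, "net_sales": 1200.0},
    {"category": "Bags", "region": "South", "month": 1, "net_sales": 850.5},
]
params = {"target": "net_sales", "features": ["category", "region", "month"],
          "type": "regression"}

v1 = make_version_record("sales_model", "sales_gold", rows, params)
print(v1["version_id"], v1["data_fingerprint"])
```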
Monitor model performance. Periodically compare predictions against actual outcomes on newly arrived data, and retrain when accuracy degrades.
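One concrete form of this check is to compare the error on recently arrived, labeled data against the error measured when the model was validated. In this hypothetical sketch the metric is mean absolute error (suitable for a regression model such as the sales example), and the 20% tolerance is an arbitrary assumption to tune per use case.

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def needs_retraining(baseline_mae: float, actual: list[float], predicted: list[float],
                     tolerance: float = 0.2) -> tuple[bool, float]:
    """Flag the model when recent MAE exceeds the baseline by more than `tolerance`."""
    current_mae = mean_absolute_error(actual, predicted)
    return current_mae > baseline_mae * (1 + tolerance), current_mae


baseline_mae = 10.0  # hypothetical MAE measured at validation time
recent_actual = [100, 120, 90, 110]
recent_predicted = [95, 140, 70, 105]

degraded, current = needs_retraining(baseline_mae, recent_actual, recent_predicted)
print(degraded, current)  # True 12.5
```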
Document in the Knowledge Catalog. Register model outputs as Nodes so they're discoverable and governed.
Related Documentation
- Transform & Prepare: Learn how data is refined before model training.
- Repository Tables: Understand how structured data is managed and shared across workflows.
- ARPIA Data Layer Framework: Review the governance standards for RAW, CLEAN, GOLD, and OPTIMIZED datasets.
- Reasoning Atlas Overview: Discover how AI and ML outputs integrate into ARPIA's Semantic Reasoning and Generative AI ecosystem.
