Arpia Tools SDK Documentation

Single-File SDK for Arpia Platform



Overview

Arpia Tools is a unified, single-file Python SDK that provides access to Arpia Platform capabilities. It currently includes:

  • inference_client: LLM inference via the Arpia AI Chat Proxy (OpenAI, Anthropic, Cerebras, Google, Lambda, Mistral, Perplexity, xAI, and OpenRouter)

Key Features:

  • ✅ OpenAI-compatible API interface
  • ✅ Support for multiple LLM providers
  • ✅ Minimal external dependencies (only requests)
  • ✅ Comprehensive error handling
  • ✅ Type hints and dataclasses
  • ✅ Single file - easy to deploy

Installation

Option 1: Direct File Inclusion

# arpia_tools.py is automatically available in your Arpia execution environment
from arpia_tools import inference_client

Option 2: Manual Installation

# Download arpia_tools.py to your project
# Then import it
from arpia_tools import inference_client

Requirements:

Python >= 3.7
requests >= 2.25.0
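
To confirm your environment meets these requirements, you can run a quick check (a minimal sketch; the version numbers come from the list above):

import sys
import requests

# Verify the interpreter meets the SDK's minimum Python version
assert sys.version_info >= (3, 7), "Python >= 3.7 is required"

# Print the installed requests version; it should be >= 2.25.0
print("requests version:", requests.__version__)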

Quick Start

from arpia_tools import inference_client

# Create client with your session token
client = inference_client.create_client(
    token="your-session-token",
    base_url="http://localhost:9191"  # Optional, defaults to localhost
)

# Generate a completion
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ],
    max_tokens=150,
    temperature=0.7
)

# Access the response
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Inference Client

Initialization

Method 1: Using create_client() (Recommended)

from arpia_tools import inference_client

client = inference_client.create_client(
    token="your-session-token",
    base_url="http://localhost:9191",  # Optional
    timeout=300  # Optional, seconds
)

Method 2: Direct Instantiation

from arpia_tools import inference_client

client = inference_client.ArpiaAI(
    token="your-session-token",
    base_url="http://localhost:9191",
    timeout=300
)

Using Constants (Best Practice)

from arpia_tools import inference_client

# Store configuration as constants
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"

client = inference_client.create_client(
    token=API_TOKEN,
    base_url=API_BASE_URL
)

Basic Usage

Simple Completion

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)

print(response.choices[0].message.content)

With System Prompt

response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "Explain list comprehensions."}
    ],
    max_tokens=200,
    temperature=0.7
)

Multi-Turn Conversation

# Initialize conversation history
messages = [
    {"role": "system", "content": "You are a coding assistant."}
]

# First turn
messages.append({"role": "user", "content": "How do I create a dict in Python?"})
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=messages,
    max_tokens=200
)

# Add assistant response to history
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})

# Second turn with context
messages.append({"role": "user", "content": "Show me an example."})
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=messages,
    max_tokens=200
)

Model Formats

Core Models (No Prefix Required)

Core models from the supported providers. DO NOT add a "core/" prefix; the API handles this routing automatically.

# OpenAI Models
"openai/gpt-3.5-turbo"
"openai/gpt-4"
"openai/gpt-4-turbo"
"openai/gpt-4.1"
"openai/gpt-4.1-mini"
"openai/gpt-4.1-nano"
"openai/gpt-4.5-preview"
"openai/gpt-4o"
"openai/gpt-4o-mini"
"openai/gpt-4o-search-preview"
"openai/gpt-5"
"openai/gpt-5-mini"
"openai/gpt-5-nano"
"openai/o1"
"openai/o1-mini"
"openai/o3-mini"

# Anthropic Models
"anthropic/claude-3-5-haiku-latest"
"anthropic/claude-3-5-sonnet-latest"
"anthropic/claude-3-7-sonnet-latest"
"anthropic/claude-4-5-sonnet-latest"
"anthropic/claude-opus-4-0"
"anthropic/claude-opus-4-1"
"anthropic/claude-sonnet-4-0"

# Cerebras Models
"cerebras/gpt-oss-120b"
"cerebras/llama-3.3-70b"
"cerebras/llama-4-maverick-17b-128e-instruct"
"cerebras/llama-4-scout-17b-16e-instruct"
"cerebras/llama3.1-8b"
"cerebras/qwen-3-32b"
"cerebras/qwen-3-coder-480b"

# Google Models
"google/gemini-2.0-flash-001"
"google/gemini-2.5-flash-preview-05-20"
"google/gemini-2.5-pro-preview"

# Lambda Models
"lambda/deepseek-llama3.3-70b"
"lambda/deepseek-r1-0528"
"lambda/deepseek-r1-671b"
"lambda/hermes3-405b"
"lambda/llama-4-maverick-17b-128e-instruct-fp8"
"lambda/llama-4-scout-17b-16e-instruct"
"lambda/llama3.2-11b-vision-instruct"
"lambda/llama3.2-3b-instruct"
"lambda/llama3.3-70b-instruct-fp8"
"lambda/qwen25-coder-32b-instruct"

# Mistral Models
"mistralai/codestral-2501"
"mistralai/mistral-nemo"
"mistralai/pixtral-large-2411"

# Perplexity Models
"perplexity/sonar"
"perplexity/sonar-deep-research"
"perplexity/sonar-pro"

# xAI Models
"xai/grok-2-latest"
"xai/grok-2-vision-latest"

# OpenRouter Models (nested format)
"openrouter/anthropic/claude-3-5-sonnet"
"openrouter/openai/gpt-4o"

Workarea Models (Requires Prefix)

Custom models specific to your workarea MUST include the "workarea/" prefix.

"workarea/my-custom-model"
"workarea/fine-tuned-gpt4"
"workarea/company-llm-v2"

❌ Common Mistakes

# WRONG - Do not use "core/" prefix
"core/openai/gpt-4o"  # ❌

# CORRECT
"openai/gpt-4o"  # ✅

# WRONG - Missing "workarea/" for custom models
"my-custom-model"  # ❌

# CORRECT
"workarea/my-custom-model"  # ✅

Parameters Reference

create() Method Parameters

Parameter        Type           Default    Description
model            str            Required   Model identifier (see Model Formats)
messages         List[Dict]     Required   List of message dicts with role and content keys
max_tokens       int            100        Maximum tokens to generate
temperature      float          1.0        Sampling temperature (0.0-2.0); lower = more deterministic
stream           bool           False      Enable streaming responses (not yet implemented)
json_mode        bool | "plain" False      Response format control (see below)
format           str            "text"     Response format type
response_format  dict           {}         Additional format specifications
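
Putting these together, a request that sets the most common options explicitly might look like the following (the model and prompt are placeholders):

response = client.chat.completions.create(
    model="openai/gpt-4o",               # required: provider/model identifier
    messages=[                           # required: conversation messages
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Summarize the benefits of type hints."}
    ],
    max_tokens=256,                      # cap on generated tokens (default: 100)
    temperature=0.3,                     # lower = more deterministic (default: 1.0)
    json_mode=True                       # include token usage and timing metadata
)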

JSON Mode Values

The json_mode parameter controls the response structure:

Value     Behavior                          Use Case
False     Returns plain text only           Simple text responses, minimal overhead
True      Returns full JSON with metadata   When you need token counts and timing
"plain"   Same as True                      Alternative syntax for the JSON response

Examples:

# Plain text response (default)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    json_mode=False  # Default
)
# With json_mode=False, response.usage fields may be zero

# Full JSON response with metadata
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    json_mode=True
)
print(f"Tokens: {response.usage.total_tokens}")
print(f"Time: {response.time_stats}s")

Response Structure

CompletionResponse Object

@dataclass
class CompletionResponse:
    id: str                    # Response ID
    choices: List[Choice]      # List of completion choices
    usage: Usage              # Token usage statistics
    model: str                # Model used for completion
    time_stats: float         # Processing time in seconds

Choice Object

@dataclass
class Choice:
    index: int                # Choice index (always 0 for now)
    message: Message          # The message content
    finish_reason: str        # Reason for completion stop

Message Object

@dataclass
class Message:
    role: str                 # "assistant" for responses
    content: str              # The actual text content

Usage Object

@dataclass
class Usage:
    prompt_tokens: int        # Tokens in the prompt
    completion_tokens: int    # Tokens in the completion
    total_tokens: int         # Total tokens used

Accessing Response Data

response = client.chat.completions.create(...)

# Get the text content
text = response.choices[0].message.content

# Get token usage
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens

# Get timing information
processing_time = response.time_stats

# Get model used
model_name = response.model

Advanced Examples

Example 1: Temperature Control

from arpia_tools import inference_client

client = inference_client.create_client(token=API_TOKEN)

prompt = "Write a creative tagline for a coffee shop"

# Low temperature (0.0-0.3) - More deterministic, focused
response_focused = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,
    max_tokens=50
)
print("Focused:", response_focused.choices[0].message.content)

# Medium temperature (0.7-1.0) - Balanced creativity
response_balanced = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
    max_tokens=50
)
print("Balanced:", response_balanced.choices[0].message.content)

# High temperature (1.5-2.0) - Maximum creativity
response_creative = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.8,
    max_tokens=50
)
print("Creative:", response_creative.choices[0].message.content)

Example 2: Using Different Providers

from arpia_tools import inference_client

client = inference_client.create_client(token=API_TOKEN)

question = "Explain quantum computing in simple terms"

# OpenAI
response_openai = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": question}],
    max_tokens=200
)

# Anthropic
response_anthropic = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": question}],
    max_tokens=200
)

# Cerebras (fast inference)
response_cerebras = client.chat.completions.create(
    model="cerebras/llama3.1-8b",
    messages=[{"role": "user", "content": question}],
    max_tokens=200
)

# Compare responses
print("OpenAI:", response_openai.choices[0].message.content)
print("\nAnthropic:", response_anthropic.choices[0].message.content)
print("\nCerebras:", response_cerebras.choices[0].message.content)

Example 3: Batch Processing

from arpia_tools import inference_client

client = inference_client.create_client(token=API_TOKEN)

# Multiple questions to process
questions = [
    "What is machine learning?",
    "What is deep learning?",
    "What is neural network?",
    "What is natural language processing?"
]

results = []
total_tokens = 0

for question in questions:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": question}],
        max_tokens=100,
        json_mode=True  # Get token usage
    )
    
    results.append({
        "question": question,
        "answer": response.choices[0].message.content,
        "tokens": response.usage.total_tokens
    })
    
    total_tokens += response.usage.total_tokens

# Display results
for i, result in enumerate(results, 1):
    print(f"\n{i}. {result['question']}")
    print(f"   Answer: {result['answer']}")
    print(f"   Tokens: {result['tokens']}")

print(f"\nTotal tokens used: {total_tokens}")

Example 4: Dynamic Client Configuration

from arpia_tools import inference_client

# Create client with initial configuration
client = inference_client.create_client(
    token="initial-token",
    base_url="http://dev-api.example.com"
)

# Use client
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=50
)

# Update token (e.g., after re-authentication)
client.set_token("new-session-token")

# Switch to production environment
client.set_base_url("https://prod-api.example.com")

# Continue using with new configuration
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello again"}],
    max_tokens=50
)

Example 5: Helper Function Pattern

from arpia_tools import inference_client
from typing import Optional

# Configuration
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"

def ask_llm(
    prompt: str,
    model: str = "openai/gpt-4o",
    system_prompt: Optional[str] = None,
    max_tokens: int = 500,
    temperature: float = 0.7
) -> str:
    """
    Convenience function for single-turn LLM queries
    
    Args:
        prompt: User prompt
        model: Model to use
        system_prompt: Optional system prompt
        max_tokens: Maximum tokens to generate
        temperature: Sampling temperature
    
    Returns:
        LLM response text
    """
    # Creates a client per call for simplicity; for repeated calls,
    # reuse a module-level client instead (see Best Practices)
    client = inference_client.create_client(
        token=API_TOKEN,
        base_url=API_BASE_URL
    )
    
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature
    )
    
    return response.choices[0].message.content

# Usage
answer = ask_llm("What is Python?")
print(answer)

# With system prompt
answer = ask_llm(
    prompt="Explain recursion",
    system_prompt="You are a computer science teacher. Use simple language.",
    max_tokens=300
)
print(answer)

Example 6: Workarea Custom Models

from arpia_tools import inference_client

client = inference_client.create_client(token=API_TOKEN)

# Use your workarea's custom fine-tuned model
response = client.chat.completions.create(
    model="workarea/my-company-gpt4",  # Must have "workarea/" prefix
    messages=[
        {"role": "system", "content": "You are our company's AI assistant."},
        {"role": "user", "content": "What are our Q4 priorities?"}
    ],
    max_tokens=300,
    temperature=0.7
)

print(response.choices[0].message.content)

Error Handling

Exception Hierarchy

ArpiaToolsError               # Base exception
└── ArpiaInferenceError      # Inference client errors
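
Because ArpiaInferenceError inherits from ArpiaToolsError, catching the base class covers all SDK errors. A minimal sketch, assuming ArpiaToolsError is exported at the package top level like ArpiaInferenceError:

from arpia_tools import inference_client, ArpiaToolsError

client = inference_client.create_client(token=API_TOKEN)

try:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )
except ArpiaToolsError as e:
    # Catches ArpiaInferenceError and any future SDK exception types
    print(f"SDK error: {e}")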

Basic Error Handling

from arpia_tools import inference_client, ArpiaInferenceError

client = inference_client.create_client(token=API_TOKEN)

try:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=100
    )
    print(response.choices[0].message.content)
    
except ArpiaInferenceError as e:
    print(f"Inference error: {e}")
    # Handle API-specific errors (invalid model, auth issues, etc.)
    
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors

Comprehensive Error Handling

from arpia_tools import inference_client, ArpiaInferenceError
from typing import Optional
import requests

def safe_llm_call(prompt: str, model: str = "openai/gpt-4o") -> Optional[str]:
    """
    Make an LLM call with comprehensive error handling
    """
    # timeout is a client-level setting, so pass it to create_client()
    client = inference_client.create_client(token=API_TOKEN, timeout=60)
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200
        )
        return response.choices[0].message.content
        
    except ArpiaInferenceError as e:
        if "not found" in str(e).lower():
            print(f"Model '{model}' not found. Check model availability.")
        elif "invalid" in str(e).lower():
            print(f"Invalid request parameters.")
        else:
            print(f"API error: {e}")
        return None
        
    except requests.exceptions.Timeout:
        print("Request timed out. Try again or increase timeout.")
        return None
        
    except requests.exceptions.ConnectionError:
        print("Connection error. Check network and API URL.")
        return None
        
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Usage
result = safe_llm_call("What is AI?")
if result:
    print(result)
else:
    print("Failed to get response")

Common Error Scenarios

Error                  Cause                        Solution
invalid session token  Token expired or incorrect   Refresh the authentication token
model not found        Invalid model identifier     Check the model name format
timeout                Request took too long        Increase timeout or reduce max_tokens
connection error       Network/API unavailable      Check base_url and network connectivity
invalid model format   Wrong model syntax           Use the correct format (see Model Formats)
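
For the first scenario, the documented set_token() method lets you recover without rebuilding the client. A sketch of that pattern, where refresh_session_token() is a hypothetical helper standing in for your application's own re-authentication flow:

from arpia_tools import inference_client, ArpiaInferenceError

client = inference_client.create_client(token=API_TOKEN)

def create_with_token_refresh(**kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except ArpiaInferenceError as e:
        if "token" in str(e).lower():
            # refresh_session_token() is hypothetical; replace with your auth flow
            client.set_token(refresh_session_token())
            return client.chat.completions.create(**kwargs)
        raise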

Best Practices

1. Use Constants for Configuration

# ✅ Good - Easy to manage and update
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"
DEFAULT_MODEL = "openai/gpt-4o"

client = inference_client.create_client(
    token=API_TOKEN,
    base_url=API_BASE_URL
)

2. Implement Retry Logic

import time
from arpia_tools import inference_client, ArpiaInferenceError

def llm_call_with_retry(prompt: str, max_retries: int = 3):
    client = inference_client.create_client(token=API_TOKEN)
    
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200
            )
            return response.choices[0].message.content
            
        except ArpiaInferenceError as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Attempt {attempt + 1} failed. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise

3. Monitor Token Usage

from arpia_tools import inference_client

client = inference_client.create_client(token=API_TOKEN)

# Example prompts to process
prompts = ["What is AI?", "What is ML?", "What is NLP?"]

# Track token usage across multiple calls
total_tokens = 0
responses = []

for prompt in prompts:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        json_mode=True  # Get token counts
    )
    
    total_tokens += response.usage.total_tokens
    responses.append(response.choices[0].message.content)
    
    # Alert if approaching limit
    if total_tokens > 100000:
        print(f"Warning: Used {total_tokens} tokens")
        break

print(f"Total tokens used: {total_tokens}")

4. Use Appropriate Temperature

# Factual/analytical tasks - Low temperature (0.0-0.3)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    temperature=0.0  # Deterministic
)

# Creative tasks - Medium to high temperature (0.7-1.5)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=1.2  # More creative
)

# Code generation - Low to medium temperature (0.2-0.7)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write Python function"}],
    temperature=0.5  # Balanced
)

5. Validate Model Format

def validate_model_format(model: str) -> bool:
    """Validate model identifier format"""
    
    # Workarea models: "workarea/<model>"
    if model.startswith("workarea/"):
        return len(model.split("/")) == 2
    
    # OpenRouter models use a nested format: "openrouter/<provider>/<model>"
    if model.startswith("openrouter/"):
        return len(model.split("/")) == 3
    
    # Core models: "<provider>/<model>" with exactly one slash
    if model.count("/") == 1:
        provider = model.split("/")[0]
        valid_providers = [
            "openai", "anthropic", "cerebras", "google",
            "lambda", "mistralai", "perplexity", "xai"
        ]
        return provider in valid_providers
    
    return False

# Usage
model = "openai/gpt-4o"
if validate_model_format(model):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}]
    )
else:
    print(f"Invalid model format: {model}")

6. Reuse Client Instances

# ✅ Good - Reuse client
client = inference_client.create_client(token=API_TOKEN)

for prompt in prompts:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    process(response)

# ❌ Bad - Creating new client every time (unnecessary overhead)
for prompt in prompts:
    client = inference_client.create_client(token=API_TOKEN)
    response = client.chat.completions.create(...)

API Reference

inference_client Module

inference_client.create_client(token, base_url, timeout)

Create an Arpia AI inference client instance.

Parameters:

  • token (str): Session token for authentication
  • base_url (str, optional): API base URL. Default: "http://localhost:9191"
  • timeout (int, optional): Request timeout in seconds. Default: 300

Returns: ArpiaAI instance

Example:

client = inference_client.create_client(
    token="your-token",
    base_url="http://api.example.com",
    timeout=60
)

ArpiaAI.chat.completions.create(...)

Create a chat completion request.

Parameters:

  • model (str, required): Model identifier
  • messages (List[Dict], required): Conversation messages
  • max_tokens (int): Maximum tokens to generate. Default: 100
  • temperature (float): Sampling temperature 0-2. Default: 1.0
  • stream (bool): Enable streaming. Default: False
  • json_mode (bool|"plain"): Response format. Default: False
  • format (str): Format type. Default: "text"
  • response_format (dict): Additional format specs. Default: {}

Returns: CompletionResponse object

Raises: ArpiaInferenceError on API errors

Example:

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=150,
    temperature=0.7,
    json_mode=True
)

ArpiaAI.set_token(token)

Update the session token.

Parameters:

  • token (str): New session token

Example:

client.set_token("new-session-token")

ArpiaAI.set_base_url(base_url)

Update the API base URL.

Parameters:

  • base_url (str): New base URL

Example:

client.set_base_url("https://prod-api.example.com")

Utility Functions

get_version()

Get the current SDK version.

Returns: Version string

Example:

from arpia_tools import get_version
print(f"SDK Version: {get_version()}")  # "1.0.0"

list_modules()

List all available modules.

Returns: List of module names

Example:

from arpia_tools import list_modules
print(list_modules())  # ["inference_client"]

Future Modules

The following modules are planned for future releases:

data_client (Planned)

Data operations and management.

# Future usage
from arpia_tools import data_client

data = data_client.create_client(token=API_TOKEN)
df = data.read_table("customers")
data.write_table("processed_data", df)

storage_client (Planned)

File and object storage operations.

# Future usage
from arpia_tools import storage_client

storage = storage_client.create_client(token=API_TOKEN)
storage.upload_file("report.pdf", "/reports/")
storage.download_file("/reports/report.pdf", "local_report.pdf")

workflow_client (Planned)

Workflow automation and orchestration.

# Future usage
from arpia_tools import workflow_client

workflow = workflow_client.create_client(token=API_TOKEN)
workflow.trigger("data_pipeline", params={"date": "2025-01-01"})
status = workflow.get_status("job-12345")

Support & Feedback

For questions, issues, or feature requests:

  • Documentation: Check this guide first
  • Platform Support: Contact Arpia Platform support team
  • API Status: Monitor API health dashboard
  • Updates: SDK updates are automatically available in your execution environment