Arpia Tools SDK Documentation
Single-File SDK for Arpia Platform
Overview
Arpia Tools is a unified, single-file Python SDK that provides access to all Arpia Platform capabilities. Currently includes:
- inference_client: LLM inference via Arpia AI Chat Proxy (OpenAI, Anthropic, Cerebras, Google, Lambda, Mistral, Perplexity, xAI, OpenRouter)
Key Features:
- ✅ OpenAI-compatible API interface
- ✅ Support for multiple LLM providers
- ✅ Minimal dependencies (only requests)
- ✅ Comprehensive error handling
- ✅ Type hints and dataclasses
- ✅ Single file - easy to deploy
Installation
Option 1: Direct File Inclusion
# arpia_tools.py is automatically available in your Arpia execution environment
from arpia_tools import inference_client
Option 2: Manual Installation
# Download arpia_tools.py to your project
# Then import it
from arpia_tools import inference_client
Requirements:
Python >= 3.7
requests >= 2.25.0
Quick Start
from arpia_tools import inference_client
# Create client with your session token
client = inference_client.create_client(
token="your-session-token",
base_url="http://localhost:9191" # Optional, defaults to localhost
)
# Generate a completion
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Python?"}
],
max_tokens=150,
temperature=0.7
)
# Access the response
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
Inference Client
Initialization
Method 1: Using create_client() (Recommended)
from arpia_tools import inference_client
client = inference_client.create_client(
token="your-session-token",
base_url="http://localhost:9191", # Optional
timeout=300 # Optional, seconds
)
Method 2: Direct Instantiation
from arpia_tools import inference_client
client = inference_client.ArpiaAI(
token="your-session-token",
base_url="http://localhost:9191",
timeout=300
)
Using Constants (Best Practice)
from arpia_tools import inference_client
# Store configuration as constants
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"
client = inference_client.create_client(
token=API_TOKEN,
base_url=API_BASE_URL
)
Basic Usage
Simple Completion
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello!"}],
max_tokens=100
)
print(response.choices[0].message.content)
With System Prompt
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "system", "content": "You are a Python expert."},
{"role": "user", "content": "Explain list comprehensions."}
],
max_tokens=200,
temperature=0.7
)
Multi-Turn Conversation
# Initialize conversation history
messages = [
{"role": "system", "content": "You are a coding assistant."}
]
# First turn
messages.append({"role": "user", "content": "How do I create a dict in Python?"})
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=messages,
max_tokens=200
)
# Add assistant response to history
messages.append({
"role": "assistant",
"content": response.choices[0].message.content
})
# Second turn with context
messages.append({"role": "user", "content": "Show me an example."})
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=messages,
max_tokens=200
)
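For longer sessions you can wrap this history management in a small helper. The sketch below is illustrative only (the Conversation class is not part of the SDK) and uses nothing beyond the calls shown above:

from arpia_tools import inference_client

class Conversation:
    """Minimal conversation wrapper (illustrative, not part of the SDK)."""

    def __init__(self, client, model="openai/gpt-4o", system_prompt=None):
        self.client = client
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, prompt: str, max_tokens: int = 200) -> str:
        # Record the user turn, call the API, then record the assistant turn
        self.messages.append({"role": "user", "content": prompt})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            max_tokens=max_tokens
        )
        answer = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": answer})
        return answer

# Usage
client = inference_client.create_client(token=API_TOKEN)
chat = Conversation(client, system_prompt="You are a coding assistant.")
print(chat.ask("How do I create a dict in Python?"))
print(chat.ask("Show me an example."))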
Model Formats
Core Models (No Prefix Required)
Core models from supported providers. DO NOT add "core/" prefix — the API handles this automatically.
# OpenAI Models
"openai/gpt-3.5-turbo"
"openai/gpt-4"
"openai/gpt-4-turbo"
"openai/gpt-4.1"
"openai/gpt-4.1-mini"
"openai/gpt-4.1-nano"
"openai/gpt-4.5-preview"
"openai/gpt-4o"
"openai/gpt-4o-mini"
"openai/gpt-4o-search-preview"
"openai/gpt-5"
"openai/gpt-5-mini"
"openai/gpt-5-nano"
"openai/o1"
"openai/o1-mini"
"openai/o3-mini"
# Anthropic Models
"anthropic/claude-3-5-haiku-latest"
"anthropic/claude-3-5-sonnet-latest"
"anthropic/claude-3-7-sonnet-latest"
"anthropic/claude-4-5-sonnet-latest"
"anthropic/claude-opus-4-0"
"anthropic/claude-opus-4-1"
"anthropic/claude-sonnet-4-0"
# Cerebras Models
"cerebras/gpt-oss-120b"
"cerebras/llama-3.3-70b"
"cerebras/llama-4-maverick-17b-128e-instruct"
"cerebras/llama-4-scout-17b-16e-instruct"
"cerebras/llama3.1-8b"
"cerebras/qwen-3-32b"
"cerebras/qwen-3-coder-480b"
# Google Models
"google/gemini-2.0-flash-001"
"google/gemini-2.5-flash-preview-05-20"
"google/gemini-2.5-pro-preview"
# Lambda Models
"lambda/deepseek-llama3.3-70b"
"lambda/deepseek-r1-0528"
"lambda/deepseek-r1-671b"
"lambda/hermes3-405b"
"lambda/llama-4-maverick-17b-128e-instruct-fp8"
"lambda/llama-4-scout-17b-16e-instruct"
"lambda/llama3.2-11b-vision-instruct"
"lambda/llama3.2-3b-instruct"
"lambda/llama3.3-70b-instruct-fp8"
"lambda/qwen25-coder-32b-instruct"
# Mistral Models
"mistralai/codestral-2501"
"mistralai/mistral-nemo"
"mistralai/pixtral-large-2411"
# Perplexity Models
"perplexity/sonar"
"perplexity/sonar-deep-research"
"perplexity/sonar-pro"
# xAI Models
"xai/grok-2-latest"
"xai/grok-2-vision-latest"
# OpenRouter Models (nested format)
"openrouter/anthropic/claude-3-5-sonnet"
"openrouter/openai/gpt-4o"
Workarea Models (Requires Prefix)
Custom models specific to your workarea. MUST include "workarea/" prefix.
"workarea/my-custom-model"
"workarea/fine-tuned-gpt4"
"workarea/company-llm-v2"
❌ Common Mistakes
# WRONG - Do not use "core/" prefix
"core/openai/gpt-4o" # ❌
# CORRECT
"openai/gpt-4o" # ✅
# WRONG - Missing "workarea/" for custom models
"my-custom-model" # ❌
# CORRECT
"workarea/my-custom-model" # ✅
Parameters Reference
create() Method Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | Required | Model identifier (see Model Formats) |
| messages | List[Dict] | Required | List of message dicts with role and content keys |
| max_tokens | int | 100 | Maximum tokens to generate |
| temperature | float | 1.0 | Sampling temperature (0.0-2.0). Lower = more deterministic |
| stream | bool | False | Enable streaming responses (not yet implemented) |
| json_mode | bool or "plain" | False | Response format control (see below) |
| format | str | "text" | Response format type |
| response_format | dict | {} | Additional format specifications |
JSON Mode Values
The json_mode parameter controls the response structure:

| Value | Behavior | Use Case |
|---|---|---|
| False | Returns plain text only | Simple text responses, minimal overhead |
| True | Returns full JSON with metadata | When you need token counts and timing |
| "plain" | Same as True | Alternative syntax for JSON response |
Examples:
# Plain text response (default)
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
json_mode=False # Default
)
# response.usage may have zero values
# Full JSON response with metadata
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
json_mode=True
)
print(f"Tokens: {response.usage.total_tokens}")
print(f"Time: {response.time_stats}s")
Response Structure
CompletionResponse Object
@dataclass
class CompletionResponse:
id: str # Response ID
choices: List[Choice] # List of completion choices
usage: Usage # Token usage statistics
model: str # Model used for completion
time_stats: float # Processing time in seconds
Choice Object
@dataclass
class Choice:
index: int # Choice index (always 0 for now)
message: Message # The message content
finish_reason: str # Reason for completion stop
Message Object
@dataclass
class Message:
role: str # "assistant" for responses
content: str # The actual text content
Usage Object
@dataclass
class Usage:
prompt_tokens: int # Tokens in the prompt
completion_tokens: int # Tokens in the completion
total_tokens: int # Total tokens used
Accessing Response Data
response = client.chat.completions.create(...)
# Get the text content
text = response.choices[0].message.content
# Get token usage
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.completion_tokens
total_tokens = response.usage.total_tokens
# Get timing information
processing_time = response.time_stats
# Get model used
model_name = response.model
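If you want to log or persist a result, the documented fields map directly onto a plain dict. The helper below is a minimal sketch built only from the attributes listed above; the name response_to_dict is illustrative, not part of the SDK.

def response_to_dict(response) -> dict:
    """Flatten the documented CompletionResponse fields for logging (illustrative helper)."""
    return {
        "id": response.id,
        "model": response.model,
        "content": response.choices[0].message.content,
        "finish_reason": response.choices[0].finish_reason,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
        "time_stats": response.time_stats,
    }

# Usage
print(response_to_dict(response))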
Advanced Examples
Example 1: Temperature Control
from arpia_tools import inference_client
client = inference_client.create_client(token=API_TOKEN)
prompt = "Write a creative tagline for a coffee shop"
# Low temperature (0.0-0.3) - More deterministic, focused
response_focused = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": prompt}],
temperature=0.2,
max_tokens=50
)
print("Focused:", response_focused.choices[0].message.content)
# Medium temperature (0.7-1.0) - Balanced creativity
response_balanced = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": prompt}],
temperature=0.7,
max_tokens=50
)
print("Balanced:", response_balanced.choices[0].message.content)
# High temperature (1.5-2.0) - Maximum creativity
response_creative = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": prompt}],
temperature=1.8,
max_tokens=50
)
print("Creative:", response_creative.choices[0].message.content)
Example 2: Using Different Providers
from arpia_tools import inference_client
client = inference_client.create_client(token=API_TOKEN)
question = "Explain quantum computing in simple terms"
# OpenAI
response_openai = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": question}],
max_tokens=200
)
# Anthropic
response_anthropic = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[{"role": "user", "content": question}],
max_tokens=200
)
# Cerebras (fast inference)
response_cerebras = client.chat.completions.create(
model="cerebras/llama3.1-8b",
messages=[{"role": "user", "content": question}],
max_tokens=200
)
# Compare responses
print("OpenAI:", response_openai.choices[0].message.content)
print("\nAnthropic:", response_anthropic.choices[0].message.content)
print("\nCerebras:", response_cerebras.choices[0].message.content)
Example 3: Batch Processing
from arpia_tools import inference_client
client = inference_client.create_client(token=API_TOKEN)
# Multiple questions to process
questions = [
"What is machine learning?",
"What is deep learning?",
"What is neural network?",
"What is natural language processing?"
]
results = []
total_tokens = 0
for question in questions:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": question}],
        max_tokens=100,
        json_mode=True  # Get token usage
    )
    results.append({
        "question": question,
        "answer": response.choices[0].message.content,
        "tokens": response.usage.total_tokens
    })
    total_tokens += response.usage.total_tokens
# Display results
for i, result in enumerate(results, 1):
    print(f"\n{i}. {result['question']}")
    print(f"   Answer: {result['answer']}")
    print(f"   Tokens: {result['tokens']}")

print(f"\nTotal tokens used: {total_tokens}")
Example 4: Dynamic Client Configuration
from arpia_tools import inference_client
# Create client with initial configuration
client = inference_client.create_client(
token="initial-token",
base_url="http://dev-api.example.com"
)
# Use client
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=50
)
# Update token (e.g., after re-authentication)
client.set_token("new-session-token")
# Switch to production environment
client.set_base_url("https://prod-api.example.com")
# Continue using with new configuration
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello again"}],
max_tokens=50
)
Example 5: Helper Function Pattern
from arpia_tools import inference_client
from typing import Optional
# Configuration
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"
def ask_llm(
    prompt: str,
    model: str = "openai/gpt-4o",
    system_prompt: Optional[str] = None,
    max_tokens: int = 500,
    temperature: float = 0.7
) -> str:
    """
    Convenience function for single-turn LLM queries

    Args:
        prompt: User prompt
        model: Model to use
        system_prompt: Optional system prompt
        max_tokens: Maximum tokens to generate
        temperature: Sampling temperature

    Returns:
        LLM response text
    """
    client = inference_client.create_client(
        token=API_TOKEN,
        base_url=API_BASE_URL
    )
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature
    )
    return response.choices[0].message.content
# Usage
answer = ask_llm("What is Python?")
print(answer)
# With system prompt
answer = ask_llm(
prompt="Explain recursion",
system_prompt="You are a computer science teacher. Use simple language.",
max_tokens=300
)
print(answer)
Example 6: Workarea Custom Models
from arpia_tools import inference_client
client = inference_client.create_client(token=API_TOKEN)
# Use your workarea's custom fine-tuned model
response = client.chat.completions.create(
model="workarea/my-company-gpt4", # Must have "workarea/" prefix
messages=[
{"role": "system", "content": "You are our company's AI assistant."},
{"role": "user", "content": "What are our Q4 priorities?"}
],
max_tokens=300,
temperature=0.7
)
print(response.choices[0].message.content)
Error Handling
Exception Hierarchy
ArpiaToolsError # Base exception
└── ArpiaInferenceError # Inference client errors
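Because ArpiaInferenceError inherits from ArpiaToolsError, catching the base class covers inference failures as well. A minimal sketch, assuming ArpiaToolsError is importable from arpia_tools in the same way ArpiaInferenceError is in the examples below:

from arpia_tools import inference_client, ArpiaToolsError

client = inference_client.create_client(token=API_TOKEN)
try:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=50
    )
except ArpiaToolsError as e:
    # Catches ArpiaInferenceError and any other SDK-specific error
    print(f"SDK error: {e}")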
Basic Error Handling
from arpia_tools import inference_client, ArpiaInferenceError
client = inference_client.create_client(token=API_TOKEN)
try:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=100
    )
    print(response.choices[0].message.content)
except ArpiaInferenceError as e:
    print(f"Inference error: {e}")
    # Handle API-specific errors (invalid model, auth issues, etc.)
except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors
Comprehensive Error Handling
from typing import Optional

import requests

from arpia_tools import inference_client, ArpiaInferenceError

def safe_llm_call(prompt: str, model: str = "openai/gpt-4o") -> Optional[str]:
    """
    Make an LLM call with comprehensive error handling
    """
    # Set the request timeout on the client; create() does not take a timeout parameter
    client = inference_client.create_client(token=API_TOKEN, timeout=60)
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200
        )
        return response.choices[0].message.content
    except ArpiaInferenceError as e:
        if "not found" in str(e).lower():
            print(f"Model '{model}' not found. Check model availability.")
        elif "invalid" in str(e).lower():
            print("Invalid request parameters.")
        else:
            print(f"API error: {e}")
        return None
    except requests.exceptions.Timeout:
        print("Request timed out. Try again or increase timeout.")
        return None
    except requests.exceptions.ConnectionError:
        print("Connection error. Check network and API URL.")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None
# Usage
result = safe_llm_call("What is AI?")
if result:
    print(result)
else:
    print("Failed to get response")
Common Error Scenarios
| Error | Cause | Solution |
|---|---|---|
| invalid session token | Token expired or incorrect | Refresh authentication token |
| model not found | Invalid model identifier | Check model name format |
| timeout | Request took too long | Increase timeout or reduce max_tokens |
| connection error | Network/API unavailable | Check base_url and network |
| invalid model format | Wrong model syntax | Use correct format (see Model Formats) |
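These scenarios translate directly into recovery code. The sketch below is a hedged example: it assumes the error text matches the phrases in the table, and refresh_session_token() is a hypothetical helper you implement to obtain a fresh token.

from arpia_tools import inference_client, ArpiaInferenceError

def call_with_recovery(client, prompt: str, retried: bool = False) -> str:
    try:
        response = client.chat.completions.create(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200
        )
        return response.choices[0].message.content
    except ArpiaInferenceError as e:
        if "invalid session token" in str(e).lower() and not retried:
            # refresh_session_token() is a hypothetical helper you provide
            client.set_token(refresh_session_token())
            return call_with_recovery(client, prompt, retried=True)
        raise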
Best Practices
1. Use Constants for Configuration
# ✅ Good - Easy to manage and update
API_TOKEN = "your-session-token"
API_BASE_URL = "http://localhost:9191"
DEFAULT_MODEL = "openai/gpt-4o"
client = inference_client.create_client(
token=API_TOKEN,
base_url=API_BASE_URL
)
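If you would rather not hard-code a token, the same pattern works with environment variables. A minimal sketch, assuming ARPIA_TOKEN and ARPIA_BASE_URL are set in your execution environment (these variable names are illustrative, not defined by the platform):

import os
from arpia_tools import inference_client

API_TOKEN = os.environ["ARPIA_TOKEN"]  # fail fast if the token is missing
API_BASE_URL = os.environ.get("ARPIA_BASE_URL", "http://localhost:9191")

client = inference_client.create_client(
    token=API_TOKEN,
    base_url=API_BASE_URL
)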
2. Implement Retry Logic
import time
from arpia_tools import inference_client, ArpiaInferenceError
def llm_call_with_retry(prompt: str, max_retries: int = 3):
    client = inference_client.create_client(token=API_TOKEN)
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="openai/gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                max_tokens=200
            )
            return response.choices[0].message.content
        except ArpiaInferenceError as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Attempt {attempt + 1} failed. Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                raise
3. Monitor Token Usage
from arpia_tools import inference_client
client = inference_client.create_client(token=API_TOKEN)
# Track token usage across multiple calls
prompts = ["What is AI?", "What is ML?"]  # Example prompts to process
total_tokens = 0
responses = []

for prompt in prompts:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        json_mode=True  # Get token counts
    )
    total_tokens += response.usage.total_tokens
    responses.append(response.choices[0].message.content)

    # Alert if approaching limit
    if total_tokens > 100000:
        print(f"Warning: Used {total_tokens} tokens")
        break

print(f"Total tokens used: {total_tokens}")
4. Use Appropriate Temperature
# Factual/analytical tasks - Low temperature (0.0-0.3)
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "What is 2+2?"}],
temperature=0.0 # Deterministic
)
# Creative tasks - Medium to high temperature (0.7-1.5)
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Write a poem"}],
temperature=1.2 # More creative
)
# Code generation - Low to medium temperature (0.2-0.7)
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Write Python function"}],
temperature=0.5 # Balanced
)
5. Validate Model Format
def validate_model_format(model: str) -> bool:
    """Validate model identifier format"""
    # Check for workarea models
    if model.startswith("workarea/"):
        return len(model.split("/")) == 2
    # OpenRouter models use a nested provider/model format
    if model.startswith("openrouter/"):
        return len(model.split("/")) == 3
    # Check for core models (should have exactly one slash)
    if model.count("/") == 1:
        provider = model.split("/")[0]
        valid_providers = [
            "openai", "anthropic", "cerebras", "google",
            "lambda", "mistralai", "perplexity", "xai"
        ]
        return provider in valid_providers
    return False
# Usage
model = "openai/gpt-4o"
if validate_model_format(model):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}]
    )
else:
    print(f"Invalid model format: {model}")
6. Reuse Client Instances
# ✅ Good - Reuse client
client = inference_client.create_client(token=API_TOKEN)
for prompt in prompts:
    response = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    process(response)

# ❌ Bad - Creating new client every time (unnecessary overhead)
for prompt in prompts:
    client = inference_client.create_client(token=API_TOKEN)
    response = client.chat.completions.create(...)
API Reference
inference_client Module
inference_client.create_client(token, base_url, timeout)
Create an Arpia AI inference client instance.
Parameters:
- token (str): Session token for authentication
- base_url (str, optional): API base URL. Default: "http://localhost:9191"
- timeout (int, optional): Request timeout in seconds. Default: 300
Returns: ArpiaAI instance
Example:
client = inference_client.create_client(
token="your-token",
base_url="http://api.example.com",
timeout=60
)
ArpiaAI.chat.completions.create(...)
Create a chat completion request.
Parameters:
- model (str, required): Model identifier
- messages (List[Dict], required): Conversation messages
- max_tokens (int): Maximum tokens to generate. Default: 100
- temperature (float): Sampling temperature 0-2. Default: 1.0
- stream (bool): Enable streaming. Default: False
- json_mode (bool or "plain"): Response format. Default: False
- format (str): Format type. Default: "text"
- response_format (dict): Additional format specs. Default: {}
Returns: CompletionResponse object
Raises: ArpiaInferenceError on API errors
Example:
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
max_tokens=150,
temperature=0.7,
json_mode=True
)
ArpiaAI.set_token(token)
Update the session token.
Parameters:
- token (str): New session token
Example:
client.set_token("new-session-token")
ArpiaAI.set_base_url(base_url)
Update the API base URL.
Parameters:
- base_url (str): New base URL
Example:
client.set_base_url("https://prod-api.example.com")
Utility Functions
get_version()
Get the current SDK version.
Returns: Version string
Example:
from arpia_tools import get_version
print(f"SDK Version: {get_version()}") # "1.0.0"
list_modules()
List all available modules.
Returns: List of module names
Example:
from arpia_tools import list_modules
print(list_modules()) # ["inference_client"]
Future Modules
The following modules are planned for future releases:
data_client (Planned)
Data operations and management.
# Future usage
from arpia_tools import data_client
data = data_client.create_client(token=API_TOKEN)
df = data.read_table("customers")
data.write_table("processed_data", df)
storage_client (Planned)
File and object storage operations.
# Future usage
from arpia_tools import storage_client
storage = storage_client.create_client(token=API_TOKEN)
storage.upload_file("report.pdf", "/reports/")
storage.download_file("/reports/report.pdf", "local_report.pdf")
workflow_client (Planned)
Workflow automation and orchestration.
# Future usage
from arpia_tools import workflow_client
workflow = workflow_client.create_client(token=API_TOKEN)
workflow.trigger("data_pipeline", params={"date": "2025-01-01"})
status = workflow.get_status("job-12345")
Support & Feedback
For questions, issues, or feature requests:
- Documentation: Check this guide first
- Platform Support: Contact Arpia Platform support team
- API Status: Monitor API health dashboard
- Updates: SDK updates are automatically available in your execution environment