Mobile Edge Inference SDKs EDGE BETA

The future of Sovereign AI is untethered on iOS and Android devices.

Privane is actively engineering native, highly optimized mobile SDKs that compile our core local inference engine directly to target mobile systems. This ensures mobile applications can execute 2B-4B parameter models with a privacy-first architecture, full offline support, and on-device local inference.

Model Compatibility Matrix

To maintain high technical standards, we strictly benchmark model runtime compatibility across different platforms and hardware acceleration targets:

Model	Parameter Size	Web GPU	iOS (Metal/MPS)	Android (Vulkan/ExecuTorch)
Gemma 2B	2.5 Billion	`Stable`	`Stable`	`Stable`
Llama 3 8B	8.0 Billion	`Stable`	`Beta`	`Beta`
Mistral 7B	7.2 Billion	`Stable`	`Beta`	`Beta`
Phi-3 3.8B	3.8 Billion	`Planned`	`Beta`	`Stable`

iOS SDK (Metal & CoreML)

The @privane/ios edge SDK heavily integrates with Apple’s CoreML and Metal Performance Shaders (MPS) to harness the Apple Neural Engine (ANE). This maximizes tokens-per-second while keeping battery usage and thermal profiles extremely low.

iOS Integration Example (Swift)

import PrivaneEngine
 
// Initialize the on-device AI runtime
let engine = PrivaneEngine()
 
// Load a quantized Gemma 2B model directly from the application bundle
try await engine.load(
    modelURL: Bundle.main.url(forResource: "gemma-2b-q4", withExtension: "gguf")!
)
 
// Stream local generation natively on Metal Performance Shaders
for try await chunk in try await engine.generate(prompt: "Explain edge computing:") {
    print(chunk, terminator: "")
}

Android SDK (Vulkan & ExecuTorch)

The @privane/android edge SDK uses Vulkan hardware acceleration APIs alongside PyTorch’s mobile-focused ExecuTorch runtime, allowing native Android developers to dispatch models seamlessly on Qualcomm Snapdragon, Samsung Exynos, or MediaTek processors.

Android Integration Example (Kotlin)

import dev.privane.engine.PrivaneEngine
 
// Initialize the local Android engine
val engine = PrivaneEngine(applicationContext)
 
// Load a quantized weights index from the local file system asset storage
engine.load("file:///android_asset/gemma-2b-q4.gguf")
 
// Coroutine-based asynchronous local stream collection
lifecycleScope.launch {
    engine.generate("Explain edge computing:").collect { chunk ->
        print(chunk)
    }
}

Cross-Platform Compilation

Because both edge SDKs utilize the exact same underlying .gguf weight files and compiler backends as the central Privane WebGPU Engine, prompt templates and custom-tuned models are completely interoperable across Web, Desktop, iOS, and Android systems without modification.

Agent Framework Integrations LiteLLM Proxy