Ecosystem
Software
Everything you need to run agents, serve models, and build on open-source AI.
AI Agent Platforms
Personal software for running agentic AI services locally.
Run AI agents locally with tool use, memory, and multi-channel messaging. CLI-first, self-hosted.
Multi-agent orchestration framework. Role-based agents with task delegation.
Microsoft's multi-agent conversation framework. Agent-to-agent workflows.
Stateful multi-agent workflows from LangChain. Graph-based agent orchestration.
End-to-end NLP/AI framework by deepset. Pipelines, RAG, agents.
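The frameworks above differ in orchestration style, but they share one core mechanic: a loop in which the model either calls a tool or answers. A minimal sketch of that loop, with all names illustrative rather than any specific framework's API:

```python
# Minimal sketch of the tool-use loop agent frameworks share: the
# model picks a tool, the runtime executes it, the result is fed
# back, and the loop ends when the model answers directly.

def calculator(expression: str) -> str:
    """A toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(messages):
    """Stand-in for an LLM: request the calculator once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "args": "6 * 7"}
    return {"answer": f"The result is {messages[-1]['content']}."}

def run_agent(task: str) -> str:
    messages = [{"role": "user", "content": task}]
    while True:
        step = fake_model(messages)
        if "answer" in step:                        # model is done
            return step["answer"]
        result = TOOLS[step["tool"]](step["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What is 6 * 7?"))  # → The result is 42.
```

Memory, delegation, and multi-agent routing are layers on top of this same loop.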
Distributed & Cluster Inference
Pool multiple devices to run models too large for one machine.
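The common idea behind these tools is pipeline-style partitioning: split a model's layers across devices and pass activations stage to stage. A toy sketch with plain Python lists standing in for devices; real systems add networking, batching, and memory management:

```python
# Pipeline-parallel sketch: layers are partitioned across "devices"
# (here just two Python lists), and each stage forwards its
# activations to the next. All names are illustrative.

def make_layer(weight):
    return lambda x: [v * weight for v in x]

layers = [make_layer(w) for w in (1.0, 2.0, 0.5, 3.0)]

# Partition the layer list across two hypothetical devices.
device_0, device_1 = layers[:2], layers[2:]

def run_stage(stage, activations):
    for layer in stage:
        activations = layer(activations)
    return activations

x = [1.0, 2.0]
x = run_stage(device_0, x)   # runs on the first device
x = run_stage(device_1, x)   # activations shipped to the second
print(x)  # → [3.0, 6.0]
```

Each device only needs memory for its own slice of layers, which is what lets pooled small machines host a model none could hold alone.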
Inference Engines
Run models locally on your hardware.
One-command model runner. Pull and run any GGUF model.
C/C++ inference engine. The foundation most local tools build on.
High-throughput serving with PagedAttention. Production-grade.
Apple Silicon native. Best performance on Mac hardware.
Optimized GPTQ/EXL2 inference for NVIDIA GPUs.
Fast serving framework with RadixAttention. Structured generation.
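All of these engines implement the same autoregressive loop: feed the tokens so far, pick the next token, append, repeat. They differ in how fast they do it (KV caching, batching, paged attention), not in the loop itself. A sketch with a toy bigram table standing in for a real network:

```python
# Greedy decoding loop. The BIGRAMS table is a stand-in for a
# neural network's next-token distribution.

BIGRAMS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.8, "ran": 0.2},
    "sat": {"</s>": 1.0},
}

def greedy_decode(start="<s>", max_tokens=10):
    tokens = [start]
    while len(tokens) < max_tokens:
        probs = BIGRAMS.get(tokens[-1], {"</s>": 1.0})
        next_tok = max(probs, key=probs.get)   # greedy: take the argmax
        if next_tok == "</s>":                 # end-of-sequence token
            break
        tokens.append(next_tok)
    return tokens[1:]

print(greedy_decode())  # → ['the', 'cat', 'sat']
```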
Interfaces & Frontends
UIs for interacting with local models.
ChatGPT-style interface for Ollama and OpenAI-compatible APIs.
Desktop app for discovering and running local models. GUI-first.
Offline-first desktop client. Clean UI, local-only by default.
All-in-one RAG application. Documents → local LLM → answers.
Cross-platform desktop client from Nomic. Simple and reliable.
Native Mac/Windows app for running local models. Clean, fast.
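One UI can drive many backends because most local servers expose the same OpenAI-style chat-completions schema. A sketch of building such a request with the standard library; the base URL (Ollama's default port is assumed here) and model name are placeholders:

```python
# Build an OpenAI-compatible chat-completions request for a local
# server. The URL and model name are assumptions for illustration;
# the actual network call is shown but commented out.
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str):
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("http://localhost:11434", "llama3", "Hello!")
print(req.full_url)  # → http://localhost:11434/v1/chat/completions
# urllib.request.urlopen(req) would send it to a running local server.
```

Point any of the frontends above at the same endpoint and they speak this wire format for you.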
Fine-Tuning
Train and adapt models on your own data.
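Most local fine-tuning is adapter-based (e.g. LoRA): instead of updating a full d × d weight matrix, you train two thin rank-r matrices and add their product as a delta. A pure-Python sketch with toy sizes showing the parameter savings; real training libraries handle this for you:

```python
# LoRA-style low-rank update: W' = W + B @ A, where B is d x r and
# A is r x d. Only B and A are trained. Toy sizes for illustration.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

d, r = 4, 1                       # hidden size 4, adapter rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
B = [[0.5], [0.0], [0.0], [0.0]]  # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]        # r x d, trainable

delta = matmul(B, A)              # d x d low-rank update
W_adapted = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

full_params = d * d               # parameters full fine-tuning updates
lora_params = d * r + r * d       # parameters LoRA actually trains
print(full_params, lora_params)   # → 16 8
```

At realistic sizes (d in the thousands, r of 8 to 64) the savings are large enough to fine-tune on consumer GPUs.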
Quantization
Compress models to fit smaller hardware.
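What quantization does, stripped to the core: map float weights to small integers plus a scale factor, shrinking memory at the cost of rounding error. A minimal symmetric int8 sketch; real formats (GGUF, GPTQ, AWQ) use per-block scales and smarter rounding:

```python
# Symmetric int8 quantization: one scale per tensor, weights stored
# as integers in [-127, 127] (1 byte each vs 4 for float32).

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127  # map range onto int8
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 1.27]
q, scale = quantize(w)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(w, restored))
print(q)      # integer codes, one byte each
print(error)  # bounded by half the scale
```

The rounding error per weight is at most half the scale, which is why models with more weights per scale group trade accuracy for size.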