Stop configuring from scratch. These are five production-tested Hermes Agent configs. Copy the one that matches your situation, fill in your tokens, and you’re done.
- Just me, personal use → Config 1
- Small team (2–10 people) → Config 2
- Team in production → Config 3
- Enterprise / Kubernetes → Config 4
- Cost-conscious with quality fallback → Config 5
Config 1: Personal Setup (CLI + Ollama, Fully Local)
Use when: Solo developer, privacy-first, zero API costs.
Requirements: 8GB RAM, Ollama installed, 4GB disk for models.
Monthly cost: $0
# ~/.hermes/config.yml
llm:
  provider: "ollama"
  endpoint: "http://localhost:11434"
  model: "mistral"
  streaming: true
  context_window: 4096
  temperature: 0.7

platforms:
  cli:
    enabled: true

memory:
  skill_generation: true
  skill_auto_save: true
  conversation_retention: 365  # days

performance:
  max_concurrent_tasks: 1
  cache_embeddings: true

Setup steps:
# 1. Install Ollama and pull model
brew install ollama && ollama serve
ollama pull mistral
# 2. Install Hermes
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# 3. Save config above to ~/.hermes/config.yml
# 4. Start
hermes

Config 2: Small Team (Discord + Ollama, Local Server)
Use when: 2–10 person team, single server, want free inference.
Requirements: 16GB RAM server, NVIDIA GPU (optional), Ollama, Discord bot token.
Monthly cost: ~$30–50 (server electricity/hosting), $0 for AI
# ~/.hermes/config.yml
llm:
  provider: "ollama"
  endpoint: "http://localhost:11434"
  model: "mistral:q4"
  streaming: true
  context_window: 8192
  temperature: 0.7
  num_gpu: 1  # Remove if no GPU

platforms:
  discord:
    token: ${DISCORD_BOT_TOKEN}
    learn_from_conversations: true
    allowed_servers:
      - ${DISCORD_SERVER_ID}
  telegram:
    token: ${TELEGRAM_BOT_TOKEN}
    learn_from_chats: true

memory:
  skill_generation: true
  skill_auto_save: true
  skill_versioning: true
  conversation_retention: 180

performance:
  max_concurrent_tasks: 4
  cache_embeddings: true

security:
  log_sensitive: false

Environment file:
# ~/.hermes/.env
DISCORD_BOT_TOKEN=your_discord_token_here
DISCORD_SERVER_ID=your_server_id_here
TELEGRAM_BOT_TOKEN=your_telegram_token_here

Always use environment variables. Add .env to .gitignore if you're version-controlling your config.
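A minimal sketch of locking that down, assuming the default ~/.hermes layout and a git repo in the current directory:

```shell
# Make the env file readable only by your user
mkdir -p ~/.hermes
touch ~/.hermes/.env
chmod 600 ~/.hermes/.env

# Ignore it in version control (no-op if the entry already exists)
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```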
Config 3: Production Team (Multi-Platform + Cloud LLM)
Use when: 10–100 person team, cloud infrastructure, need highest quality responses.
Requirements: VPS/cloud server (4 vCPU, 8GB RAM min), cloud LLM API key.
Monthly cost: $50–200 (server) + $20–100 (LLM API usage)
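One thing worth noticing before you copy the cost limits in this config: the daily and monthly caps interact, and with these values the monthly cap is the one that actually binds. A quick sketch (cap values taken from the config below):

```shell
# If the $50/day cap were hit every day, a 30-day month would reach $1,500,
# so the $1,000 monthly cap is the effective ceiling here.
awk 'BEGIN { daily = 50; monthly = 1000; printf "daily cap x 30 days = $%d vs monthly cap = $%d\n", daily * 30, monthly }'
```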
# ~/.hermes/config.yml
llm:
  provider: "openai"
  api_key: ${OPENAI_API_KEY}
  model: "gpt-4"
  fallback_model: "gpt-3.5-turbo"  # Cost fallback
  streaming: true
  context_window: 16000
  temperature: 0.7

  # Cost controls
  cost_limit:
    daily_usd: 50
    monthly_usd: 1000
  fallback_trigger:
    on_cost_exceed: true

platforms:
  discord:
    token: ${DISCORD_BOT_TOKEN}
    learn_from_conversations: true
  slack:
    token: ${SLACK_BOT_TOKEN}
    channels:
      - "dev-team"
      - "ops"
      - "general"
    learn_from_messages: true
  telegram:
    token: ${TELEGRAM_BOT_TOKEN}
    learn_from_chats: true

memory:
  skill_generation: true
  skill_versioning: true
  skill_auto_refinement: true
  prune_old_skills: true
  prune_threshold_days: 90
  conversation_retention: 180
  summarize_old_conversations: true

performance:
  max_concurrent_tasks: 8
  queue_strategy: "priority"
  cache_embeddings: true
  cache_ttl: 86400

monitoring:
  log_level: "info"
  log_to_file: true
  log_path: "/var/log/hermes/hermes.log"

security:
  log_sensitive: false
  token_rotation_days: 90

Config 4: Enterprise (Kubernetes + Monitoring)
Use when: 100+ users, high availability, full observability needed.
Requirements: Kubernetes cluster, PostgreSQL for shared memory, Prometheus/Grafana.
Monthly cost: $200–1000+ (depends on scale)
# ~/.hermes/config.yml (deployed via ConfigMap)
llm:
  provider: "openai"
  api_key: ${OPENAI_API_KEY}
  model: "gpt-4"
  fallback_model: "gpt-3.5-turbo"
  context_window: 16000
  streaming: true

platforms:
  slack:
    token: ${SLACK_BOT_TOKEN}
    workspaces:
      - ${SLACK_WORKSPACE_1}
      - ${SLACK_WORKSPACE_2}
    learn_from_messages: true
  discord:
    token: ${DISCORD_BOT_TOKEN}
    learn_from_conversations: true

memory:
  backend: "postgresql"
  connection: ${DATABASE_URL}
  replication: true
  skill_versioning: true
  prune_threshold_days: 90

performance:
  max_concurrent_tasks: 16
  queue_strategy: "priority"
  cache_backend: "redis"
  cache_connection: ${REDIS_URL}
  cache_ttl: 86400

monitoring:
  prometheus:
    enabled: true
    port: 9090
  log_level: "info"
  log_format: "json"
  track_latency: true
  track_skill_usage: true

security:
  log_sensitive: false
  audit_log: true
  audit_log_path: "/var/log/hermes/audit.log"

Kubernetes deployment snippet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes-agent
  namespace: ai-tools
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hermes-agent
  template:
    metadata:
      labels:
        app: hermes-agent  # Must match the selector above
    spec:
      containers:
        - name: hermes
          image: ghcr.io/nousresearch/hermes-agent:latest
          envFrom:
            - secretRef:
                name: hermes-secrets
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
          volumeMounts:
            - name: config
              mountPath: /root/.hermes/config.yml
              subPath: config.yml
      volumes:
        - name: config
          configMap:
            name: hermes-config

Config 5: Hybrid (Local Ollama Primary, Cloud Fallback)
Use when: Cost-sensitive but need quality headroom. Best of both worlds.
Requirements: Machine with 16GB+ RAM, GPU preferred, OpenAI API key for fallback.
Monthly cost: $5–20 (minimal cloud fallback usage)
# ~/.hermes/config.yml
llm:
  # Primary: free local inference
  provider: "ollama"
  endpoint: "http://localhost:11434"
  model: "mistral:q4"
  streaming: true
  context_window: 8192
  timeout: 30  # seconds before fallback

  # Fallback: cloud for complex tasks
  fallback:
    provider: "openai"
    api_key: ${OPENAI_API_KEY}
    model: "gpt-4"
    trigger_on_timeout: true
    trigger_on_complexity: "high"
    cost_limit:
      daily_usd: 5
      monthly_usd: 20

platforms:
  discord:
    token: ${DISCORD_BOT_TOKEN}
    learn_from_conversations: true
  cli:
    enabled: true

memory:
  skill_generation: true
  skill_versioning: true
  prune_threshold_days: 90

performance:
  max_concurrent_tasks: 4
  cache_embeddings: true
  preload_models:
    - "mistral:q4"  # Keep loaded in memory

Typical usage: 90% of requests handled by Ollama ($0), 10% by OpenAI (~$2–5/month). You get near-GPT-4 quality at near-zero cost.
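That split is easy to sanity-check with back-of-envelope numbers (the request volume and per-request fallback cost below are illustrative assumptions, not measured figures):

```shell
# 3000 requests/month, 10% routed to the cloud fallback, ~$0.01 per fallback call
awk 'BEGIN { printf "estimated fallback spend: $%.2f/month\n", 3000 * 0.10 * 0.01 }'
```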
Quick Token Setup Reference
Every config that uses platforms needs these tokens. Here’s where to get each:
| Platform | Where to get token | Config key |
|---|---|---|
| Discord | discord.com/developers → Bot tab | DISCORD_BOT_TOKEN |
| Slack | api.slack.com/apps → OAuth tokens | SLACK_BOT_TOKEN |
| Telegram | @BotFather on Telegram → /newbot | TELEGRAM_BOT_TOKEN |
| OpenAI | platform.openai.com → API keys | OPENAI_API_KEY |
| Anthropic | console.anthropic.com → API keys | ANTHROPIC_API_KEY |
Store all of these in environment variables, never in the config file itself.
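One common pattern is to keep the tokens in the .env file and export them all before starting Hermes; `set -a` marks every variable assigned while sourcing for export. A sketch with placeholder values (never commit real tokens):

```shell
# Write placeholder tokens to the env file (substitute your real values)
mkdir -p ~/.hermes
cat > ~/.hermes/.env <<'EOF'
DISCORD_BOT_TOKEN=placeholder_discord_token
OPENAI_API_KEY=placeholder_openai_key
EOF

set -a            # auto-export every assignment made while sourcing
. ~/.hermes/.env
set +a

# Confirm the variable is set without printing the secret itself
echo "DISCORD_BOT_TOKEN is ${#DISCORD_BOT_TOKEN} characters long"
```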
What to Read Next
- Security Guide — Protect your tokens and deployment
- Platform Connections — Set up Discord, Slack, Telegram
- Advanced Config — Tune every parameter
- Setup Checklist — Verify your deployment is correct