A dedicated AI inference node in the mesh network — running local language models, hosting persona bots, processing WTM gaps, gating feedback quality, and anchoring models on IPFS. Your data never leaves the network to get intelligent.
Two llama.cpp instances run side by side, matched to workload complexity. The inference cascade routes each function to the right model automatically.
Gemma 3 1B · Google · Q4_K_M · 769 MB · port 8080 · 8 threads · 2 slots · ~40 tok/s
Functions served locally:
content-summarize · content-rewrite · content-translate · sentiment-analyze · language-detect · quality-score · chat-respond · feedback-generate · explanation-generate
Gemma 3 4B · Google · Q4_K_M · 2.4 GB · port 8081 · 12 threads · 2 slots · ~18 tok/s
Functions served locally:
opinion-objective · opinion-subjective · opinion-insight · opinion-generate · topic-classify · argument-for · argument-against · question-generate
This node runs two Google Gemma 3 models covering 22 built-in functions. Gemma 3 1B handles fast text tasks (~40 tok/s); Gemma 3 4B handles opinion matrices and complex reasoning (~18 tok/s). Both stream responses token-by-token and fall back to api.mutual.ai when all local slots are occupied.
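The two-tier routing described above can be sketched as a simple lookup. The function names and ports come from the cards; the router itself is an illustrative assumption, not the node's actual code.

```python
# Hypothetical sketch of the function-to-tier cascade routing.
# Fast tasks go to the 1B server, reasoning tasks to the 4B server.
FAST = {"content-summarize", "content-rewrite", "content-translate",
        "sentiment-analyze", "language-detect", "quality-score",
        "chat-respond", "feedback-generate", "explanation-generate"}
QUALITY = {"opinion-objective", "opinion-subjective", "opinion-insight",
           "opinion-generate", "topic-classify", "argument-for",
           "argument-against", "question-generate"}

def route(function_name: str) -> int:
    """Return the local llama.cpp port serving a function."""
    if function_name in FAST:
        return 8080   # Gemma 3 1B, ~40 tok/s
    if function_name in QUALITY:
        return 8081   # Gemma 3 4B, ~18 tok/s
    raise KeyError(f"unknown function: {function_name}")
```

Because the mapping lives in one place, adding a function or retiring a model tier is a one-line change.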
Beyond on-demand function calls, this node runs four continuous workloads that keep the mesh ecosystem intelligent and self-improving.
Always-on conversational personas (mesh-guide, topic-scout) use a three-tier escalation: fast RiveScript pattern matching first, then local inference via chat-respond on Gemma 3 1B (~40 tok/s, 8 s timeout), and cloud fallback only if local slots are full. Zero cost for most conversations.
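The three-tier escalation might look like the following sketch. The helper callables (`pattern_reply`, `local_chat_respond`, `cloud_chat_respond`) are hypothetical stand-ins for the persona runtime, not a published API.

```python
# Hedged sketch of persona escalation: patterns -> local model -> cloud.
LOCAL_TIMEOUT_S = 8  # matches the 8 s local-inference timeout above

def persona_reply(message, pattern_reply, local_chat_respond, cloud_chat_respond):
    # Tier 1: fast RiveScript-style pattern match (no inference cost).
    reply = pattern_reply(message)
    if reply is not None:
        return reply
    # Tier 2: chat-respond on Gemma 3 1B, bounded by a timeout.
    try:
        return local_chat_respond(message, timeout=LOCAL_TIMEOUT_S)
    except TimeoutError:
        # Tier 3: cloud fallback only when local slots are busy or slow.
        return cloud_chat_respond(message)
```

Most traffic terminates at tier 1 or 2, which is why most conversations cost nothing.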
The World Topic Map has thousands of topics with missing descriptions, aliases, training phrases, and reference frames. A WtmInferenceRouter maps each gap type to the right function and model tier — fast model for descriptions, quality model for opinion templates — and processes them in batches without touching cloud APIs.
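A gap-type router of this shape could back the WtmInferenceRouter. The specific gap-to-function pairings below are assumptions drawn from the prose, not the node's real routing table.

```python
# Hypothetical WTM gap routing: each gap type maps to a function and
# a model tier ("fast" = Gemma 3 1B, "quality" = Gemma 3 4B).
GAP_ROUTES = {
    "description":      ("content-summarize", "fast"),
    "alias":            ("content-rewrite",   "fast"),
    "training-phrase":  ("question-generate", "quality"),
    "opinion-template": ("opinion-generate",  "quality"),
}

def route_gap(gap_type: str):
    """Map a WTM gap type to (function, model tier)."""
    return GAP_ROUTES[gap_type]

def process_batch(gaps, run):
    """Fill a batch of gaps entirely on local models; `run` is the
    inference callable (function, tier, topic) -> generated text."""
    return [run(*route_gap(g["type"]), g["topic"]) for g in gaps]
```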
Every opinion submitted to the mesh passes through a local coherence check using the quality-score function. The gate evaluates accuracy, clarity, and completeness — low scores dampen the effort weight but never block the submission. No cloud dependency, no added latency for high-quality content.
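The dampen-but-never-block rule can be made concrete. This sketch assumes quality scores in [0, 1] and a linear dampening with a floor; the actual weighting curve is not published.

```python
# Sketch of the coherence gate: low scores reduce influence weight,
# but a floor guarantees the submission is never blocked outright.
def gated_effort_weight(base_weight: float, quality_score: float,
                        floor: float = 0.25) -> float:
    """Scale effort weight by the quality-score result, never below
    `floor` times the base weight."""
    score = max(0.0, min(1.0, quality_score))
    return base_weight * max(floor, score)
```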
Model weights (Gemma 3 1B: 769 MB, Gemma 3 4B: 2.4 GB) and application bundles are pinned to IPFS via the local Kubo daemon. A pin manifest tracks every CID by logical name so other nodes can fetch models peer-to-peer — making 4 TB of RAID-1 storage available as a content-addressed availability layer.
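A pin manifest keyed by logical name might look like this. The CIDs are placeholders, not the node's real content hashes; in the real system pinning goes through the local Kubo daemon (its HTTP RPC exposes `/api/v0/pin/add`).

```python
# Hedged sketch of a pin manifest: logical name -> pinned CID.
import json

PIN_MANIFEST = {
    "gemma-3-1b-q4_k_m": {"cid": "<CID>", "size_mb": 769},
    "gemma-3-4b-q4_k_m": {"cid": "<CID>", "size_mb": 2400},
}

def cid_for(name: str) -> str:
    """Resolve a logical model name to its CID so peer nodes can
    fetch the weights over IPFS instead of a centralized mirror."""
    return PIN_MANIFEST[name]["cid"]

def manifest_json() -> str:
    """Serialize the manifest for publication to other nodes."""
    return json.dumps(PIN_MANIFEST, indent=2)
```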
When mutual.app needs AI, it works through six tiers — always trying local and trusted sources first, escalating toward the cloud only when necessary. This node sits at Tier 5.
Running AI on mesh infrastructure keeps your data inside the network — no queries sent to third-party cloud APIs, no training on your content.
Requests to this node are routed through encrypted peer-to-peer connections. No content is logged, stored, or forwarded outside the mesh.
Intel Xeon Silver 4123 (8c/16t), 96 GB ECC RAM, 8 TB of storage mirrored in RAID-1 (4 TB usable). Two Gemma 3 instances run in parallel — 1B for speed, 4B for depth — using only ~3.2 GB of the 96 GB available. Models are memory-locked (mlock) so their weights never page to disk.
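The two-server layout reduces to two launch commands, built here as argument lists rather than executed. The flags are standard llama.cpp `llama-server` options; the model file paths are placeholders.

```python
# Hedged sketch of the two llama.cpp server launches.
def server_cmd(model_path, port, threads, slots):
    return ["llama-server", "-m", model_path,
            "--port", str(port),
            "--threads", str(threads),
            "--parallel", str(slots),  # concurrent request slots
            "--mlock"]                 # lock weights in RAM, no paging

fast_cmd = server_cmd("gemma-3-1b-q4_k_m.gguf", 8080, 8, 2)
quality_cmd = server_cmd("gemma-3-4b-q4_k_m.gguf", 8081, 12, 2)
```

With `--mlock` set, both models stay resident even under memory pressure, which keeps the ~40 and ~18 tok/s figures stable.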
You call a function by name — summarize, classify, generate opinion, embed. The cascade picks the model tier automatically. No model expertise required, no endpoint to manage.
The cascade picks the best node based on reputation, capacity, and latency — not just availability. Quality inference earns the node more requests.
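A node-selection score over reputation, capacity, and latency could be sketched as below. The weights and the latency decay are illustrative assumptions; the cascade's real formula is not published.

```python
# Hypothetical node scoring: favor reputable, uncongested, nearby nodes.
def node_score(reputation: float, free_slots: int, latency_ms: float) -> float:
    """Higher is better. Reputation dominates, matching the idea
    that quality inference earns a node more requests."""
    capacity = min(free_slots, 2) / 2            # saturates at 2 slots
    closeness = 1.0 / (1.0 + latency_ms / 100)   # decays with latency
    return 0.5 * reputation + 0.3 * capacity + 0.2 * closeness

def pick_node(nodes):
    """nodes: iterable of (name, reputation, free_slots, latency_ms)."""
    return max(nodes, key=lambda n: node_score(*n[1:]))[0]
```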
Conversational personas escalate from RiveScript patterns to local Gemma 3 inference before ever reaching the cloud. Most AI-powered bot replies cost zero and stay on-node.
Model weights and bundles are pinned to IPFS from 4 TB RAID-1 storage. Other nodes fetch models peer-to-peer instead of from centralized mirrors.
Opinion submissions pass through a local coherence check — accuracy, clarity, completeness scored by Gemma 3 1B. Low quality dampens influence weight without blocking participation.
Prometheus metrics at /metrics. Every function's throughput, latency, and error rate is observable. The cascade trusts nodes that are transparent.
Relay and inference nodes form complementary layers. Relay nodes move data; inference nodes process it. Together they make the mesh self-sufficient.
Open mutual.app — your AI requests will cascade through the network, using this node when it's the best available option.
Open mesh.app