Thunderbolt is Mozilla’s new self-hostable AI client designed for organizations that want to run AI on their own infrastructure and keep their data in-house.
It sounds like a step beyond open-webui: an enterprise-grade client-server model providing access to agents, workflows, and centralized knowledge repositories for RAG.
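The RAG piece is conceptually simple regardless of which knowledge store they actually use: embed the org's documents, retrieve the closest ones to a query, and prepend them to the prompt. A purely illustrative sketch of that idea (numpy only, toy data, not Thunderbolt's API):

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query (cosine similarity)."""
    sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Prepend the retrieved context so the model answers from the org's own documents."""
    return "Answer using only this context:\n\n" + "\n---\n".join(context) + f"\n\nQuestion: {question}"

# Toy usage with stand-in embeddings; a real deployment would use an embedding model.
docs = ["retry policy doc", "onboarding guide", "incident runbook"]
doc_vecs = np.random.rand(3, 8)
print(build_prompt("What is our retry policy?", retrieve(np.random.rand(8), doc_vecs, docs, k=2)))
```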
In addition to a local chatbot for executive/admin use, I can see this being the backend for developers running Cursor or another AI-enhanced IDE, with local knowledge stores holding proprietary documents and requests running against local large models.
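If the server exposes an OpenAI-compatible endpoint (an assumption on my part, not something the docs promise), pointing an IDE or script at the self-hosted gateway instead of a public API would be roughly a one-line change. Hypothetical URL and model name:

```python
from openai import OpenAI

# Hypothetical self-hosted gateway URL and model name; swap in whatever the deployment exposes.
client = OpenAI(base_url="https://ai.internal.example.com/v1", api_key="internal-token")

resp = client.chat.completions.create(
    model="internal-coder",
    messages=[{"role": "user", "content": "Summarize our retry policy from the architecture docs."}],
)
print(resp.choices[0].message.content)
```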
I am also curious about time-sharing and prioritization of resources; I assume it would queue simultaneous requests. Presumably this lets you pool local compute more effectively, rather than giving each developer an A100 GPU that sits unused whenever they're not working.
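The pooling idea is basically a shared queue in front of a fixed number of workers. A toy sketch of what I'm imagining (hypothetical names, definitely not Thunderbolt's scheduler):

```python
import asyncio

async def worker(name: str, queue: asyncio.PriorityQueue) -> None:
    # Each worker represents one shared GPU/inference slot.
    while True:
        priority, prompt = await queue.get()
        print(f"{name} serving priority={priority}: {prompt}")
        await asyncio.sleep(1)  # stand-in for actual inference
        queue.task_done()

async def main() -> None:
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    # Two pooled workers instead of one GPU per developer.
    workers = [asyncio.create_task(worker(f"gpu-{i}", queue)) for i in range(2)]
    # Lower number = higher priority; simultaneous requests simply wait their turn.
    for prio, prompt in [(1, "prod incident summary"), (5, "refactor suggestion"), (5, "unit test stub")]:
        queue.put_nowait((prio, prompt))
    await queue.join()
    for w in workers:
        w.cancel()

asyncio.run(main())
```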
Edit: Somewhat surprisingly, this whole stack does not even include a local inference provider; it does everything except local models right now, and requests are forwarded to cloud inference providers (Anthropic, OpenAI, etc.). It does have the beginnings of a backend for rate limiting and queuing, though, and true “fully offline/local” operation is on the roadmap; it's just not there yet.
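The rate-limiting part in front of the forwarded provider calls is presumably something like a token bucket. A minimal sketch of that pattern (the general technique, not their code):

```python
import time

class TokenBucket:
    """Simple token-bucket limiter: refill at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2, capacity=5)  # e.g. 2 forwarded requests/sec, bursts of 5
for i in range(8):
    print(i, "forward to provider" if bucket.allow() else "queue / reject")
```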
After reading through the GitHub docs, I'd say the most impressive part is that they open-sourced their Thunderbolt coding agent for Claude Code. There are quite a few skills available for implementation planning, dependency/build environment setup, coding, linting/cleanup, QA, and managing agent pull requests. They're pretty good examples if you're looking to build your own Claude Code skills.
I thought gstack already gave developers God Mode?