Announcing: New Rust-based Cluster Agent


tl;dr We migrated our cluster agent from Go to Rust, and now it’s smaller and uses less memory. To use the new cluster agent, just upgrade to the latest release (cli/v0.8.2, helm/v0.15.2). You can also try it out live here.


Recently, we decided to migrate our cluster agent from Go to Rust. Now I’m happy to say the rewrite is complete and the result is a cluster agent image that is 57% smaller (10MB) and uses 70% less memory (~3MB) while still using minimal CPU (~0.1%).

The first version of Kubetail was designed to run inside the cluster and expose logs to users through a web browser. In that version, the backend’s primary responsibility was to make requests to the Kubernetes API and relay responses to the frontend in real time. After looking at a few options, including Python and JavaScript, I decided to write it in Go because the Kubernetes API is well supported in Go, the language has great multi-threading support, and it produces fast executables and small Docker images.

The next version of Kubetail added the kubetail CLI tool, which could run the web dashboard locally. To implement the CLI tool I chose Go again because the language has good CLI libraries (thanks, spf13!), great cross-platform support and, most importantly, because it let me re-use the Go-based web app that powers the in-cluster dashboard.

Until then, Kubetail fetched logs exclusively through the Kubernetes API. But when I wanted to add new features such as log file sizes and last-event timestamps — data not exposed by the Kubernetes API — I realized we needed an agent with direct access to raw log files on each node. Although I could have used a different language, I chose Go again because it was the language I knew best and it had served us well so far. Luckily, it also had great support for gRPC, which was a natural choice for the agent’s interface.

Given the app’s feature set at that point, I was very happy with my original choice of Go because it had served us well on the desktop as well as in the cluster. Then I started looking into how to implement our number one most requested feature: log search.

When I started thinking about log search, I knew I wanted to use grep-style matching instead of a full-text index because it’s sufficient for most use cases and I didn’t want our users to incur the resource penalty of maintaining an index. At the same time, I’d been using rg personally to grep logs for a while and was impressed with its speed, so when I started looking for a grep solution I was curious whether I could use it somehow. That’s when I realized it was available as a library but with one catch — it was written in Rust.

Before writing any custom code, I explored the idea of running rg as an external executable via exec.Command and interacting with it over stdin/stdout. This worked for basic use cases, but it got unwieldy as I added custom features like time filters, ANSI escape sequence handling and support for JSON-formatted lines. So I decided to dive in and write a custom log file grepper. I briefly explored using Go, but for performance and robustness I ultimately wanted to build on the library that powers ripgrep (rg), which meant the code had to be written in Rust.

At the time, I didn’t want to rewrite the entire cluster agent in Rust so instead I looked into ways to call Rust from Go (e.g. rustgo) and settled on keeping the custom Rust code as a separate executable and calling it from Go using exec.Command. To make the code as simple as possible I used a shared protocol buffers schema with serialization/deserialization implemented at the stdin/stdout interface.
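The stdin/stdout bridge described above can be sketched in Go roughly as follows. This is a minimal sketch, not the actual agent code: it uses `cat` as a stand-in for the Rust executable so the example is runnable, and in the real setup the request and response bytes would be serialized protocol buffers messages.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runViaStdio launches an external executable, writes a serialized request
// to its stdin and returns whatever the process writes to stdout. In the
// real agent the binary would be the Rust grepper and the bytes would be a
// protocol buffers message; here `cat` simply echoes the request back.
func runViaStdio(binary string, request []byte) ([]byte, error) {
	cmd := exec.Command(binary)
	cmd.Stdin = bytes.NewReader(request)
	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	resp, err := runViaStdio("cat", []byte("serialized-request"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", resp)
}
```

Keeping serialization at the process boundary like this means both sides only need to agree on the shared schema, not on any language-level ABI.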

After launching the search feature, our community started to grow and I met a couple of hackers with a lot more Rust experience than I had: Christopher Valerio (freexploit) and Giannis Karagiannis (gikaragia). They started by making improvements to the Rust code, and as they got comfortable with the codebase we began talking about how to eliminate the impedance mismatch between Go and Rust in the cluster agent. Search aside, the cluster agent runs on every node in a cluster, so it’s important for it to be as performant and lightweight as possible, which is exactly the use case Rust excels at. With these ideas floating in the air, we had a community meeting where we discussed migrating the entire agent to Rust. They said they were excited to work on it, so I said: let’s do it!

Once we made the decision, Christopher and Giannis got to work. Christopher defined the initial high-level architecture for the project and created some initial issues in GitHub. Then Giannis stepped in and started implementing the feature set, writing tests and creating more issues so we could get help from other contributors. Giannis was able to get to feature parity with the Go-based cluster agent in just a few weeks and after about another week of testing we decided the code was ready to merge into main.

I only started learning Rust recently, so Claude Code and Codex CLI were invaluable in helping me review Giannis’s pull requests. He was also using the chatbots on his side, so it was a true human-bot partnership mediated by GitHub pull requests. A key benefit was that, because the agent uses a well-defined gRPC interface, we were able to re-use the protocol buffers schema and then just flip the switch once the Rust-based agent reached feature parity with the Go-based version. To build the Rust-based gRPC server we used tonic, which was straightforward and had only minor differences from the Go-based gRPC server.

The end result is a cluster agent image that is 57% smaller (10MB) and uses 70% less memory (~3MB) while still using minimal CPU (~0.1%). Plus the code is much easier to work with now because it is all in the same language.

Our mission is to give users access to powerful logging tools in a simple and lightweight package but the Kubernetes API has limited logging capabilities so unlocking more advanced features requires direct access to raw log files on every node. That’s where the cluster agent comes in — it’s the foundation for everything we want to build next.

Of course users are understandably cautious about installing agents in their clusters. In addition to being useful, agents must also be small, fast and secure. The Rust migration is our answer to those requirements. By cutting image size by more than half and reducing memory use by 70%, we’ve made the Kubetail agent small enough that it can be deployed even in the most resource-constrained environments.

But this is just the beginning. Rust will let us push the limits of what can be done inside the cluster in real-time, directly with files on disk, while using as little CPU and memory as possible. Right now, our focus is on logs, but the same approach applies to metrics, notifications and other types of observability data.

We’re excited about what’s next and we’d love for you to be part of it. If you like what we’re doing and you want to contribute code or share feedback as a user, join us on Discord.

Thank You, OCV

If we succeed in our mission to build a new logging layer for Kubernetes that runs in every cluster, it will be in no small part thanks to Open Core Ventures (OCV) and their Catalyst program, led by Alex Smith.

OCV is a venture firm started by Sid Sijbrandij (co-founder of GitLab) that funds early-stage open source companies founded on open core principles. As part of their open source outreach efforts, they started a program called Catalyst that gives a small stipend and a lot of mentorship to open source maintainers who want to grow their projects. Over the course of 12 weeks, they teach you how to build an open source community and how to market your product effectively so it can grow and find traction. Kubetail recently participated in the program, and for us it was a game-changer.

Prior to working on Kubetail, I co-founded a startup called Octopart that was a part of Y Combinator’s W07 batch. We didn’t do badly as far as startups go, so when I began working on Kubetail I followed a similar approach: I focused on building an MVP and as soon as it was ready, I posted it to Hacker News (HN). Thankfully, the post reached the front page for a few hours and we ended up with a couple of hundred GitHub stars and a small number of real users (~10).

Then Kubetail entered the Trough of Sorrow. This is the part of the startup curve after your initial launch when the buzz dies down and you’re left with a handful of users, no external validation, and only your own internal optimism to keep you going. I was no stranger to the trough, so I did what I had done before: I put my head down and kept coding.

During this period, I focused on making our MVP (the Kubetail Dashboard) as easy to use as possible. In response to feedback from a few early users, I changed the architecture so it could run on a user’s desktop in addition to inside the cluster. I also made it easier for users to find and download the app via Homebrew and other package repositories. And in the background, I worked on our number one requested feature: search.

For over a year, I worked on my own while growth on the project remained stagnant. Then I received a cold outreach email from OCV that led to our acceptance into the Catalyst sponsorship program, and that changed everything.

As part of Catalyst, I received hands-on mentorship from Alex and the OCV team that proved invaluable for me as someone with technical skills but no experience in community building or managing an open source project. With Catalyst’s help, I shifted my routine from pure coding to balancing development with community engagement and contributor support.

Before doing Catalyst, Kubetail had zero community. We had a Discord server but I was the only one in it, just sitting there working alone every day. Then Alex guided me week by week suggesting things to focus on and new things to try. With his help, Kubetail grew from around 300 stars to over 1,300 in the span of 12 weeks. And even more significantly, the community took off. Before Catalyst, we had 3 contributors and no users in Discord. Now we have 35 contributors and a vibrant Discord community with 61 members.

During Catalyst, everything came together and we were finally ready to launch our log search feature, except this time with a community behind us and OCV’s mentorship to help us market the feature to new users. When we announced it, Kubetail stayed on the front page of HN for over a day and was seen by tens of thousands of users on Reddit and Twitter. Monthly downloads jumped from fewer than 100 to over 400, and Kubetail was transformed from a small passion project into an ambitious community-driven endeavor. The highlight of Catalyst for me came around this time, when I got to share our 1,000-star GitHub milestone with a new Kubetail maintainer (rxinui) and the rest of our community.

[Image: Discord celebration]

I’m under no illusions about how difficult the road ahead is. We’re working on a hard technical problem and operating in a space with many well-funded enterprises, such as Datadog, Grafana, New Relic and ClickHouse, that already have the attention of most of our potential users. In addition, users already expect a lot from observability tools, so we will need many talented engineers to get the job done, and for that we need resources we haven’t yet figured out how to secure.

However, I’ve never been more optimistic about our chances of success. Every time I learn something new from one of our experienced contributors or see how excited our younger contributors get when one of their pull requests gets merged, it energizes me. Every time I review a pull request from a user solving their own problem or get into a conversation with someone about a new feature, it makes me even more confident that I picked the best way to build a product - together as part of an open source community.

To me, an open source project is like a cooking pot that can produce high-quality products that users love and that are healthy for them too. But of course the magic ingredient behind every product is community, and when it comes to Kubetail’s community I have to give a big thank you to Alex and the rest of the OCV team.

Announcing: Real-time log search for Kubernetes

Ever since Kubetail launched last year, the single most-requested feature has been log search. Now I’m happy to say we finally have log search available in our latest official release (cli/v0.4.3, helm/v0.10.1). You can check it out in action here:

https://www.kubetail.com/demo

Implementing search took time because the Kubernetes API doesn’t support it natively, so we had to build the feature from scratch. We considered a quick client-side grepping implementation, but each search could potentially trigger a full download of many log files, which would have been slow and bandwidth-heavy. There are ways around this, but they would have required extra user input, which isn’t a great experience either.

Instead, we implemented search by creating a custom Rust-powered executable that wraps ripgrep. Why Rust? Because it’s fucking fast. Most of the Kubetail backend is written in Go, but we wanted this low-level component that reads log files on disk to be as fast as possible. The result: a full scan of a 1GB file takes ~250 ms. On every query, the executable scans only the relevant container log files on each node and streams just the matching lines back to your browser. Most queries can stop early, which means they can return before even completing a full scan. You can think of Kubetail search as “remote grep” for your Kubernetes logs: now you don’t need to download an entire log file just to grep it locally.

To enable search, you need to install the Kubetail “Cluster Resources” in your cluster. This can be done easily by clicking “Install” in the GUI or by running kubetail cluster install from the CLI. This deploys a Kubetail Cluster Agent to each node as well as an instance of the Kubetail Cluster API. When the Cluster API is available, the dashboard uses it to power Kubetail’s custom features, like search, on the nodes. Otherwise, the dashboard disables those features in the GUI and falls back to the Kubernetes API.

We’re just getting started with log search, and there’s plenty of room to make it even better. If you’re a Rust, Go, or React hacker - or a UI designer who loves logs - come help us build the most user-friendly, open-source logging platform for Kubernetes. Join our community on Discord!

Andres

Announcing Kubetail CLI "logs" command

To make it easier to monitor and debug multi-container workloads on Kubernetes, we’ve added a new logs command to the Kubetail CLI tool. With it, you can now grep your Kubernetes workload logs in real time from your terminal. You can also filter by time as well as by source properties such as node and zone.

To install the Kubetail CLI tool, you can download it from the release page or use Homebrew:

brew install kubetail

Here are some examples of what you can do with the new logs command:

# Tail 'web' deployment in the 'default' namespace
kubetail logs deployments/web
# Tail 'web' deployment in the 'frontend' namespace
kubetail logs frontend:deployments/web
# Return last 100 records
kubetail logs deployments/web --tail=100
# Return first 100 records
kubetail logs deployments/web --head=100
# Stream new records
kubetail logs deployments/web --follow
# Return all records
kubetail logs deployments/web --all
# Return first 10 records starting from 30 minutes ago
kubetail logs deployments/web --since PT30M
# Return last 10 records leading up to 30 minutes ago
kubetail logs deployments/web --until PT30M
# Return first 10 records between two exact timestamps
kubetail logs deployments/web --since 2006-01-02T15:04:05Z07:00 --until 2007-01-02T15:04:05Z07:00
# Return last 10 records that match "GET /about"
kubetail logs deployments/web --grep "GET /about" --force
# Return first 10 records that match "GET /about"
kubetail logs deployments/web --grep "GET /about" --head --force
# Return last 10 records that match "GET /about" or "GET /contact"
kubetail logs deployments/web --grep "GET /(about|contact)" --force
# Stream new records that match "GET /about"
kubetail logs deployments/web --grep "GET /about" --follow --force

The logs command uses your local kube config file to authenticate with your cluster, so to switch clusters just change your kube config context. You can also use the --kube-context flag:

kubetail logs --kube-context minikube deployments/web

One thing you’ll notice is that to use --grep you also have to pass --force. This is because filtering is done client-side, which means the tool will keep downloading logs from your cluster until the desired number of matches is found. That could result in unexpectedly large downloads, so we added a confirmation flag. We’re working on a new feature to get around this issue.
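To see why client-side filtering can trigger large downloads, here is a minimal sketch of such a loop in Go. The helper names (`fetch`, `collectMatches`) are hypothetical, not the actual CLI code; the point is that the amount downloaded is unbounded until enough matches turn up.

```go
package main

import "fmt"

// collectMatches keeps fetching batches of log records and applying the
// grep predicate client-side until `want` matches are found or the log is
// exhausted. fetch returns the next batch plus whether more data remains.
// Note that every fetched record is downloaded, matching or not.
func collectMatches(fetch func() ([]string, bool), match func(string) bool, want int) []string {
	var out []string
	for {
		batch, more := fetch()
		for _, rec := range batch {
			if match(rec) {
				out = append(out, rec)
				if len(out) == want {
					return out // enough matches: stop downloading
				}
			}
		}
		if !more {
			return out // log exhausted before `want` matches were found
		}
	}
}

func main() {
	batches := [][]string{{"GET /about", "POST /login"}, {"GET /about?x=1"}}
	i := 0
	fetch := func() ([]string, bool) { b := batches[i]; i++; return b, i < len(batches) }
	isMatch := func(s string) bool { return len(s) >= 10 && s[:10] == "GET /about" }
	fmt.Println(collectMatches(fetch, isMatch, 2))
}
```

A rare pattern means many batches are fetched before the loop terminates, which is exactly the download risk the --force flag asks you to acknowledge.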

Please check out the new logs command and let us know what you think!