# X-Terminate: Hide Politics From Your Twitter Feed
A sneak peek into Wafer's AI stack - 2025-05-04
---
I'm excited to announce X-Terminate: a Chrome extension that hides politics from your X feed.
Here's a demo of the algorithm:
![[x-terminate-demo.mp4]]
Political tweets are highlighted in red for the demo, but in practice the extension hides political tweets before you see them.
[Source code and installation instructions](https://github.com/wafer-inc/x-terminate)
## Is this a serious product?
Not really. You can install it and it'll work, but I created X-Terminate because I wanted an open-source demonstration of the AI tooling I've been working on.
## How does it work?
At [Wafer](https://wafer.systems/), we're building a mobile OS that knows everything you know. To make it work, we often need to label the data on your phone, and LLMs are great for that. But running an LLM against every piece of data in your phone isn't practical.
So we looked away from LLMs and towards embedding models. Embedding models with great performance can easily run on a phone. The problem is that embeddings aren't easy to interpret. (You basically need another AI to do it for you.) This led us to a staple of classical ML: decision trees. Decision trees are incredibly cheap to train and run, and in our experience, they're extremely good at interpreting embeddings.
We're all-in on Rust, so I built Rust libraries for [data labelling](https://github.com/not-pizza/tysm) and [decision tree inference](https://github.com/wafer-inc/catboost). The new workflow is:
1. Collect data.
2. Use an LLM to label the data.
3. Create embeddings based on the data.
4. Train a decision tree that can interpret those embeddings.
Essentially, we use LLM-generated labels to bootstrap the decision tree. Once the decision tree is trained, the LLM is no longer needed.
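To make step 2 concrete, here's a minimal sketch of the labelling step. The real pipeline uses our Rust data-labelling library; this Python version calls OpenAI directly, and the prompt and sample tweets are purely illustrative:

```python
# Illustrative only: the actual pipeline uses our Rust data-labelling library.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def label_tweet(text: str) -> bool:
    """Ask an LLM whether a tweet is political."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word: political or apolitical."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower() == "political"

tweets = ["The senate vote is tomorrow.", "My cat knocked over my coffee."]
labels = [label_tweet(t) for t in tweets]  # e.g. [True, False]
```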
To train the decision trees, I use CatBoost[^1]. In my experience, models trained with CatBoost generalize incredibly well from only a few examples. I think this workflow is awesome, and I wanted to create a demo project to share it with the world. Thus, the idea for X-Terminate was born.
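The training step itself is only a few lines. Here's a minimal sketch, assuming `embeddings` and `labels` come from steps 2-3 (the placeholder vectors below just stand in for real data, and the hyperparameters are illustrative):

```python
from catboost import CatBoostClassifier

# Stand-ins for the real outputs of the labelling and embedding steps:
# one embedding vector per tweet, one boolean label per tweet.
embeddings = [[0.1] * 1536, [0.9] * 1536]
labels = [True, False]

model = CatBoostClassifier(
    iterations=200,   # a small ensemble is plenty for a small dataset
    depth=6,          # CatBoost grows symmetric (oblivious) trees
    verbose=False,
)
model.fit(embeddings, labels)
model.save_model("political.cbm")  # CatBoost's native model format
```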
(NB: I am not the AI expert at Wafer, so I've been learning as I go. I didn't realize how easy it was to train models this way, so I'm excited to share it. However, it is entirely possible that I'm the last person to find out about this and everyone else is already using it.)
The first step was to write a Chrome extension that could scrape tweets as they loaded. Once I had that, I set up a dummy X account[^2], followed a bunch of random people, and scraped their tweets. I then used our data-labelling library to ask ChatGPT to label all of them. Then I created embeddings for each tweet using `text-embedding-3-small`. A Python script then trained a decision tree on the embeddings, and the result was a model that could label tweets as political or not using only the embedding. All that remained was to integrate that model into the Chrome extension, which I did by [compiling our CatBoost-inference library to WASM and putting it on NPM](https://www.npmjs.com/search?q=catboost).
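Putting it together, the per-tweet classification logic looks roughly like this. In the extension it runs in the browser via the WASM build of our CatBoost library; this Python sketch is equivalent in spirit, and the model path and 0.5 threshold are illustrative:

```python
from catboost import CatBoostClassifier
from openai import OpenAI

client = OpenAI()
model = CatBoostClassifier()
model.load_model("political.cbm")

def is_political(tweet: str) -> bool:
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=tweet,
    ).data[0].embedding                      # a 1536-dimensional vector
    # predict_proba returns [P(not political), P(political)]
    return model.predict_proba([embedding])[0][1] > 0.5

print(is_political("Congress just passed the new budget bill."))
```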
The reliance on OpenAI's `text-embedding-3-small` means that the Chrome extension needs your OpenAI API key to run. Unfortunately, we didn't have time to experiment with embedding models that can run in the browser for this project. However, it makes for a great comparison against the naive approach of running an LLM on every tweet.
`gpt-4o` costs about 100x more per token than `text-embedding-3-small`. Assuming the average tweet is 100 tokens and a Twitter user can view 2000 tweets per hour, one hour of browsing would cost you $0.004 with our approach (interpreting embeddings). Running `gpt-4o` on every tweet instead would cost $0.50, with higher latency on top of that.
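For reference, the arithmetic behind those numbers, using OpenAI's published per-token prices at the time of writing:

```python
tokens_per_hour = 2_000 * 100       # 2000 tweets x ~100 tokens each

embedding_rate = 0.02 / 1_000_000   # $/token, text-embedding-3-small
gpt4o_rate = 2.50 / 1_000_000       # $/input token, gpt-4o

print(tokens_per_hour * embedding_rate)  # $0.004
print(tokens_per_hour * gpt4o_rate)      # $0.50
```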
I've gotten good enough at executing this workflow that the whole project took me less than one work day, plus a little extra to play with the CSS.
The other thing I like about this project is that it epitomizes our philosophy at Wafer. I believe the data on your computer is yours, to view and use as you see fit. Companies are incentivized to lock you into their platforms and to limit what you can do on your own computer, but that's a battle I want users to win. I also believe that AI can make your computer more human, at least in the sense of organizing your data in a human-like way. That's why the X-Terminate repo has instructions for training your own filter - maybe you want to block all tweets that mention AI or Rust; I won't get mad.
[^1]: I prefer CatBoost over XGBoost and LightGBM because, in my limited experience, it is less likely to overfit. I also like that it creates symmetric/oblivious trees, which I find more elegant.
[^2]: I needed a dummy account because I wanted to avoid the scenario where tweets from someone with their profile set to "private" inadvertently get included in the dataset.