A Tool for Me
If you're reading this blog, you're probably one of the very few people who actually care about it. And I've got to come clean... I have not been writing these blog posts. Technically, AI has been writing them, but not in the way you think.
This journey started about three or four months ago when I switched to Linux. Originally, I wanted to find good speech-to-text software. My goal was to have something very similar to the Windows dictation feature: when you press the Windows key and H, it pulls up a small GUI with a microphone button to start dictation, along with some other Cortana junk. But that's beside the point.
I searched for about three or four days trying to find something that filled this specific gap in my software ecosystem, and I just couldn't. There was nothing like that. All I wanted was two buttons: one to pick the microphone and one to start dictation. I didn’t need anything fancy. I didn’t need a voice visualizer that makes bars fly around when I speak because that's too complicated for me. All I wanted was to press a button, start speaking into the mic, and have it transcribe what I was saying.
This is a tool I really needed, not for accessibility reasons, but because I’m lazy. I don’t want to write five-page papers when I could just grab a sticky note, jot down some bullet points, and then talk through my thoughts for half an hour until I have some kind of coherent essay. Maybe AI could clean up the transcript for me, but that rarely happens. I usually go back and clean things up by hand.
Wait a minute... I'm a developer!
As a developer, if something you need doesn’t exist, you go ahead and make it. It doesn’t matter how much software is already out there: if nothing fits my exact needs, I build my own. So I decided to create my own little program for converting my voice into text, which I’m using to write this very blog post. I ended up writing it in about a day.
I call this piece of software Yapper because, well, I’m yapping into the mic. For the tech, I chose GTK because I like how it looks on Linux. I know some people prefer Qt, but to me, GTK just looks very polished, especially with GTK4. This seemed like the right time to explore the GTK ecosystem and see how easy it was to build an app.
Not too shabby, if I do say so myself.
The Tech Behind It
If you came to this blog post expecting some magical technology, like me training a custom AI model to recognize my voice and convert it to text, then you might be disappointed. This entire project consists of about 250 lines of completely AI-generated code. I wanted to see how good the new Claude 3.5 Sonnet was at generating code, and this seemed like a small enough project to test it on.
It’s built on whisper.cpp, which I installed from the NixOS repositories with CUDA support. Speed was pretty much the only requirement: it had to be fast, and the only way to achieve that was by running it on the GPU. So far, I’m pretty happy with the results. It takes about three to four seconds to transcribe a full sentence, which means it doesn’t lag too far behind my thoughts.
To condense this entire paragraph into a single sentence: it uses Python with GTK and whisper.cpp. And that’s it.
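For the curious, here's roughly what the "Python talks to whisper.cpp" part can look like. This is a minimal sketch and not Yapper's actual code. I'm assuming the whisper.cpp command-line binary is on your PATH as `whisper-cli` (older builds call it `main`), that the model path is a placeholder you'd swap for your own, and that the audio has already been recorded to a 16 kHz WAV file. The helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Assumptions for this sketch: the binary name and model path below are
# placeholders, not values taken from Yapper itself.
WHISPER_BIN = "whisper-cli"
MODEL_PATH = Path("models/ggml-base.en.bin")


def build_command(wav_path, model_path=MODEL_PATH, binary=WHISPER_BIN):
    """Build the argument list for one whisper.cpp transcription run."""
    return [
        binary,
        "-m", str(model_path),  # model file to load
        "-f", str(wav_path),    # input audio file (16 kHz WAV)
        "-nt",                  # print text without timestamps
    ]


def clean_transcript(raw):
    """Join the segment lines printed by whisper.cpp into one line of text."""
    return " ".join(line.strip() for line in raw.splitlines() if line.strip())


def transcribe(wav_path):
    """Run whisper.cpp on a WAV file and return the cleaned-up text."""
    if shutil.which(WHISPER_BIN) is None:
        raise FileNotFoundError(f"{WHISPER_BIN} was not found on PATH")
    result = subprocess.run(
        build_command(wav_path),
        capture_output=True,  # the transcript arrives on stdout
        text=True,
        check=True,           # raise if whisper.cpp exits with an error
    )
    return clean_transcript(result.stdout)
```

In a GTK app, something like `transcribe("recording.wav")` would run when you stop recording, and the returned string would be dropped into a text box or the clipboard. You'd want to run it off the main thread so the window doesn't freeze for those three to four seconds.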
Conclusion
This isn’t a tool I expect anyone else to really use; it’s mainly just for me. But if you want to try it, there’s a Nix flake available on my GitHub. If you have an NVIDIA GPU and run Wayland, you might be able to use this. I’ve written some pretty good docs on GitHub, which I’ll link here.
https://github.com/Shlok-Bhakta/yapper
And with that, I’ll wrap up this blog post. See y’all next time!