I Had the Idea Two Years Ago. Now I'm Building It Properly.

Week 1 of DeskSpirit – a real AI assistant built locally in Python over 30 days.

DeskSpirit combines voice input, local AI models (via Ollama), and a simple memory system into a practical daily productivity companion. Here's what worked, what failed, and how you can build your own.

What this is: A month-long build of a local AI productivity assistant

Built with: Python, Ollama (local AI models), local speech-to-text and text-to-speech

Goal: A real daily assistant – not a demo

Constraint: Built step-by-step, one working layer at a time

What this is not

This is not a waifu chatbot.
This is not a novelty assistant.
This is not a weekend toy.

It is a practical companion for doing real work.

The idea (again)

Two years ago, I tried to build a personal AI companion that could talk with me, remember what I was working on, and help me stay productive.

The idea was good. The tools were not ready.

Now I am trying again – with better tools and a different approach.

What I tried before (2023)

Back in 2023, I built a basic "Jarvis"-style assistant in Python. It used SpeechRecognition, PyAudio, and pyttsx3.

It could listen through the microphone, convert speech to text, detect simple commands, and respond using text-to-speech.

It worked. But only as a demo.

It didn't remember anything. It didn't improve. It didn't help with real work.

At the time, I thought I needed better code. Looking back, the real problem was structure – and the tools.

Week 1 – Voice Loop

Goal: Can I talk to it and hear it answer back?

The "voice loop" is the basic cycle: I speak → it understands → it replies → it speaks back.
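As a rough illustration of that cycle, here is a minimal sketch that treats each stage as a plain function passed in from outside. The names `run_loop`, `listen`, `think`, and `speak`, and the "quit" stop word, are hypothetical and not taken from the DeskSpirit code; the point is only that the loop itself can stay tiny while the stages are swapped in and out.

```python
from typing import Callable


def run_loop(
    listen: Callable[[], str],
    think: Callable[[str], str],
    speak: Callable[[str], None],
    stop_word: str = "quit",  # hypothetical convention for ending the session
) -> list[tuple[str, str]]:
    """Repeat: I speak -> it understands -> it replies -> it speaks back."""
    history: list[tuple[str, str]] = []
    while True:
        heard = listen()  # microphone recording + speech-to-text
        if heard.strip().lower() == stop_word:
            break
        reply = think(heard)  # local model response
        speak(reply)  # text-to-speech
        history.append((heard, reply))
    return history
```

Because the stages are injected, the loop can be tried with typed input and `print` long before the microphone or the model is wired in, for example `run_loop(input, lambda text: f"You said: {text}", print)`.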

Current progress

The local Python app now completes the full loop:

microphone recording
speech-to-text
local AI response (via Ollama)
text-to-speech
conversation saving

Every interaction is stored in a simple append-only JSONL file.
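An append-only JSONL log can be sketched in a few lines: one JSON object per line, always opened in append mode. The field names (`timestamp`, `user`, `assistant`) and the helper names are assumptions for illustration; the actual file layout used in the build is not shown here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_turn(log_path: Path, user_text: str, reply_text: str) -> None:
    """Append one interaction as a single JSON line; existing lines are never rewritten."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_text,
        "assistant": reply_text,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(log_path: Path) -> list[dict]:
    """Load every stored interaction, skipping blank lines."""
    with log_path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The appeal of this format is that a crash mid-session loses at most the last line, and the file stays readable in any text editor.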

It works end-to-end. No UI. No memory system. No productivity layer. Just the core loop.

What I cut (Week 1)

I deliberately did not build:

a UI
a memory system
multi-step task execution
automation features

All of these were part of the original idea. They were removed to make the core loop work first.

What worked

A simple loop: input → process → output.

No long-term memory. No complexity. Just something that works immediately.

What failed

The first version failed immediately. Too many moving parts: voice input lag, unclear command structure, no defined core loop.

The biggest issue: it was trying to do everything. That's the same mistake I made two years ago.

Setup (Week 1)

If you want to try this yourself, here is the setup.

Requirements

Python 3.12 (or similar)
VS Code or any editor
Ollama installed locally
A working microphone
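Before installing anything, a quick check like the one below can confirm the basics are in place. It is only a sketch: the function name `check_requirements` and the minimum version `(3, 10)` are assumptions (the list above only says "3.12 or similar"), and it does not test the microphone.

```python
import shutil
import sys


def check_requirements(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report whether Python and the Ollama CLI look usable from this shell."""
    return {
        # The interpreter running this script is new enough (threshold is an assumption)
        "python_version_ok": sys.version_info[:2] >= min_python,
        # A python executable can be found on PATH
        "python_on_path": any(shutil.which(name) for name in ("python", "python3")),
        # The ollama command can be found on PATH
        "ollama_on_path": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```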

Install Python libraries

These libraries handle microphone input, speech-to-text, and voice output. The second command downloads the local model through Ollama.

pip install sounddevice faster-whisper pyttsx3
ollama pull gemma3:4b

Notes

sounddevice – microphone recording
faster-whisper – local speech-to-text
pyttsx3 – offline text-to-speech
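To show how the first two libraries fit together, here is a minimal sketch that records a fixed-length clip and transcribes it. The 16 kHz sample rate, the 5-second duration, the "base" model size, and the `int8` compute type are all assumptions chosen for a CPU-only machine; the article does not say which settings the build uses. The imports sit inside the functions so the file can be loaded on a machine without audio hardware.

```python
def seconds_to_frames(seconds: float, sample_rate: int = 16_000) -> int:
    """Convert a recording length in seconds to a number of audio frames."""
    return int(seconds * sample_rate)


def record_audio(seconds: float = 5.0, sample_rate: int = 16_000):
    """Record mono audio from the default microphone as a 1-D float32 array."""
    import sounddevice as sd

    frames = seconds_to_frames(seconds, sample_rate)
    audio = sd.rec(frames, samplerate=sample_rate, channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    return audio.flatten()  # (frames, 1) -> (frames,)


def transcribe(audio, model_size: str = "base") -> str:
    """Transcribe 16 kHz mono audio locally with faster-whisper."""
    from faster_whisper import WhisperModel

    # Loading the model is slow; a real loop would load it once and reuse it.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio)
    return " ".join(segment.text.strip() for segment in segments)
```

A fixed recording length is the simplest option and probably explains some of the "microphone timing" pain: speak too late or too long and the clip cuts off.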

On Windows, pyttsx3 uses system voices like Microsoft David or Zira.
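Choosing one of those system voices could look roughly like this. The helper names and the default preference for "Zira" are illustrative assumptions; which voices exist depends on the machine, so the sketch falls back to the engine default when no match is found.

```python
def pick_voice(voices, preferred: str):
    """Return the id of the first voice whose name contains `preferred`, else None."""
    for voice in voices:
        if preferred.lower() in voice.name.lower():
            return voice.id
    return None


def speak(text: str, preferred_voice: str = "Zira") -> None:
    """Speak text aloud offline, using the preferred system voice if it is installed."""
    import pyttsx3  # imported here so pick_voice can be used without a speech engine

    engine = pyttsx3.init()
    voice_id = pick_voice(engine.getProperty("voices"), preferred_voice)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.say(text)
    engine.runAndWait()  # block until speech has finished
```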

Build note – what actually slowed things down

The first blocker wasn't AI. It was setup.

Python was installed, but not available on PATH. After fixing that, the real issues were microphone timing, imperfect transcription, and choosing a usable local model.
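One way to compare candidate models is to send each the same prompt and time the reply. The sketch below talks to Ollama's local HTTP endpoint using only the standard library; whether the build itself uses this endpoint or the `ollama` Python package is not stated, so treat this as an assumption. It expects the Ollama server to be running on its default port, and the function names are hypothetical.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "gemma3:4b") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def ask_model(prompt: str, model: str = "gemma3:4b", timeout: float = 120.0) -> tuple[str, float]:
    """Return the model's reply and how many seconds it took."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    started = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"], time.perf_counter() - started
```

For a voice assistant, the elapsed time matters as much as answer quality: a reply that takes ten seconds to arrive breaks the conversational feel, so a smaller model can be the more usable choice.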

The system works – but it's still rough. That's intentional.

Why this matters

This is not impressive. But it is the first version that actually works.

Many AI assistant projects stall because they try to build everything at once. This one doesn't. It proves the smallest useful loop can exist first.

Why I am structuring this differently

The biggest change from my first attempt is structure. Before, I tried to build everything in one go. This time, I am breaking it down from the start.

I'm using a folder and markdown system so tools like Claude Code and Codex stay focused. The idea is simple: one main map file, separate folders for each part of the build, small context files, and clear rules about what not to build yet.

This keeps the project from drifting.

Where Week 1 is now

The foundation is in place. The project now has:

a dedicated voice-loop workspace
config and path helpers
local audio storage
local conversation storage
a working loop
clear build rules
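For readers wondering what "config and path helpers" might look like, here is one possible shape. Everything in it is hypothetical: the class name, the folder names, and the log filename are made up for illustration and are not the project's actual layout.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VoiceLoopPaths:
    """Central place for where the voice loop keeps its local files (hypothetical layout)."""

    root: Path

    @property
    def audio_dir(self) -> Path:
        return self.root / "audio"

    @property
    def conversation_log(self) -> Path:
        return self.root / "conversations" / "log.jsonl"

    def ensure(self) -> None:
        """Create the storage folders if they do not exist yet."""
        self.audio_dir.mkdir(parents=True, exist_ok=True)
        self.conversation_log.parent.mkdir(parents=True, exist_ok=True)
```

Keeping paths in one small object means the rest of the code never builds file paths by hand, which makes it easier to move the workspace later.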

It is not impressive yet. That's not the goal. The goal is to make the loop work.

Used in this build: Ollama local AI model
