"I'll handle that." "We should follow up on this." "Remind me to check in with Alex." "Don't let me forget about the design review."
These are all todos. None of them look like a traditional task entry.
The classification problem
When we started building Maatix's todo detection, we thought it would be straightforward. People say "I need to do X" — extract X, store it as a task. Done.
We were very wrong.
In practice, the problem is far harder. People express intent to act in dozens of ways, along a spectrum from explicit to deeply implicit. They use hedged language ("we should probably..."), impersonal constructions ("it would be good if..."), temporal hints ("before the call next week..."), and social commitments ("I told Sarah I'd...").
A naive classifier catches maybe 40% of real todos. Getting to 90%+ requires understanding context, the speaker, and the relationships between statements.
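To show why a naive approach plateaus, here is a minimal sketch of one kind of naive detector: a handful of surface patterns applied to one sentence at a time. The pattern list and `naive_is_todo` are hypothetical illustrations, not Maatix code. The detector flags the explicit phrasings from the opening examples, misses the narrative and impersonal ones, and flags a lunch pleasantry as a task.

```python
import re

# Hypothetical surface patterns for explicit intent. A naive detector
# looks only at the wording of a single sentence, never at context.
EXPLICIT_PATTERNS = [
    r"\bI need to\b",
    r"\bI'll\b",
    r"\bremind me to\b",
    r"\bdon't let me forget\b",
    r"\bwe should\b",
]


def naive_is_todo(sentence: str) -> bool:
    """Return True if any explicit-intent pattern appears in the sentence."""
    return any(re.search(p, sentence, re.IGNORECASE) for p in EXPLICIT_PATTERNS)


examples = [
    "I'll handle that.",                                        # caught
    "Remind me to check in with Alex.",                         # caught
    "I told Marcus I'd send the deck.",                         # narrative commitment: missed
    "It would be good if the doc were updated before Friday.",  # impersonal: missed
    "We should get lunch sometime.",                            # pleasantry: false positive
]

for s in examples:
    print(naive_is_todo(s), s, sep="\t")
```

A sentence-level pattern list cannot know who is speaking or what was said earlier, which is exactly the context the higher-recall approach depends on.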
How we approached it
We trained a custom classifier on thousands of annotated examples from real conversations (with permission). For each example we labeled not just "is this a todo?" but also who is responsible, what the implicit deadline is, how confident we are, and what signals its priority.
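One way to picture a single labeled example is as a record carrying all of those labels. This is a hypothetical schema for illustration only: the field names, the three-level `Priority` enum, treating confidence as the labeler's confidence in [0, 1], and storing the deadline as the spoken phrase are all assumptions made for this sketch, not Maatix's actual format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Priority(Enum):
    # Hypothetical three-level scale; the post only says "priority signal".
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


@dataclass(frozen=True)
class TodoAnnotation:
    """One labeled utterance. Field names are illustrative assumptions."""
    text: str
    is_todo: bool
    owner: Optional[str] = None              # who is responsible
    implicit_deadline: Optional[str] = None  # deadline as phrased, if any
    confidence: float = 1.0                  # labeler confidence in [0, 1]
    priority: Priority = Priority.NORMAL     # priority signal

    def __post_init__(self) -> None:
        # Reject values outside the schema's assumed ranges.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not self.is_todo and self.owner is not None:
            raise ValueError("only todos can have an owner")


# A narrative social commitment, labeled with every field.
deck = TodoAnnotation(
    text="I told Marcus I'd send the deck before the call next week.",
    is_todo=True,
    owner="speaker",
    implicit_deadline="before the call next week",
    confidence=0.9,
    priority=Priority.HIGH,
)

# A pleasantry that looks like intent but is not a task.
lunch = TodoAnnotation(text="We should get lunch sometime.", is_todo=False, confidence=0.7)
```

Keeping ownership, deadline, and priority as separate labels lets a model learn each signal on its own, rather than folding everything into a single yes/no.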
The hardest cases are implicit social commitments — "I told Marcus I'd send the deck" — where the todo is buried in a narrative statement rather than expressed as intent.
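Consider what a hand-written rule for just one phrasing family would have to do. It must recognize reported speech, assign ownership to the speaker rather than the person named, and pull the action out of the embedded clause. The sketch below is illustrative only and is not how Maatix works, since the post describes a trained classifier. `REPORTED`, `ReportedCommitment`, and `extract_reported_commitment` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ReportedCommitment(NamedTuple):
    owner: str      # the speaker is the one who made the promise
    recipient: str  # the person the promise was made to
    action: str     # the promised action, taken from the embedded clause


# Covers one narrow family only:
# "I told/promised <Name> (that) I'd / I would / I'll / I will <action>"
REPORTED = re.compile(
    r"\bI (?:told|promised) (?P<recipient>[A-Z][a-z]+)"
    r"(?: that)? I(?:'d| would|'ll| will) (?P<action>[^.!?]+)"
)


def extract_reported_commitment(sentence: str) -> Optional[ReportedCommitment]:
    """Return the buried commitment, or None if the sentence doesn't match."""
    m = REPORTED.search(sentence)
    if m is None:
        return None
    return ReportedCommitment(owner="speaker", recipient=m["recipient"], action=m["action"].strip())


print(extract_reported_commitment("I told Marcus I'd send the deck."))
# ReportedCommitment(owner='speaker', recipient='Marcus', action='send the deck')
```

A rule like this breaks as soon as the phrasing shifts ("Marcus is expecting the deck from me"). That fragility is one reason to learn these cases from annotated data.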
What we still get wrong
False positives: "We should get lunch sometime" gets classified as a todo more often than it should. We're still tuning this.
False negatives: highly indirect commitments in long conversations still get missed. Each model update improves our recall on these.
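A standard way to reason about this trade-off is to apply a confidence threshold to a classifier's score and measure precision (how many flagged items are real todos) and recall (how many real todos get flagged) at each setting. This is a generic sketch, not a description of Maatix's pipeline. The scores and labels are invented for illustration, and returning 1.0 when a denominator is zero is a convention chosen for this sketch.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is emitted as a todo."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)      # real todos caught
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)  # non-todos flagged
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)       # real todos missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Toy, made-up scores and ground-truth labels (not Maatix data).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30]
labels = [True, True, False, True, False, True, False]

for t in (0.5, 0.75):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

In this toy data, raising the threshold from 0.5 to 0.75 lifts precision from 0.60 to 0.67 and drops recall from 0.75 to 0.50. That is the same direction as favoring todos you can trust over perfect recall.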
The goal isn't perfect recall. It's building a system where the todos Maatix captures are the ones that matter — and where you trust the output enough to act on it.