Teaching a Machine to Read 3D Handwriting
This project started with a simple observation: I already have an app for drawing in 3D space on Apple Vision Pro. When you pinch, you draw. When you release, the stroke ends. Each pinch-to-release gives you a precise sequence of 3D coordinates — a trajectory through space that represents whatever the user intended to draw.
The question is: can a machine learn to read it?
May 2026
The idea
The goal is to recognise hand-drawn letters in 3D space and substitute them with proper 3D geometry — so you draw an M in the air, and an M appears. It sounds simple. It isn't.
Most handwriting recognition research deals with 2D input on screens, or with camera-based systems that have to first figure out where the hand is. The Vision Pro removes those problems entirely. The hand tracking is precise, the pinch mechanic provides a clean intent signal — the system knows exactly when a stroke starts and ends — and the 3D coordinates are delivered directly, no computer vision required.
What remains is the classification problem: given a sequence of 3D points, what letter is it?
The pipeline
I wanted to build the entire thing myself. As far as I could find, no pretrained model exists for this specific problem: classifying a letter from the 3D trajectory a fingertip traces through the air. Gesture recognition models work on continuous joint movement over time, not on the accumulated shape of a stroke. Training from scratch wasn't just the more interesting choice; it was the only practical one.
The pipeline I landed on has four stages.
Synthetic data in Houdini. Rather than collecting thousands of real hand-drawn strokes, I generate training data procedurally. Each letter starts as a curve in Houdini. A repeat block runs 1000 iterations, resampling each stroke to a fixed 64 points and applying randomised noise to simulate the variation of real handwriting. The result is 1000 plausible versions of each letter, each slightly different, all correctly labelled. I have full control over the variation, and generating data for a new letter takes minutes rather than hours.
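For readers without Houdini, the resample-and-perturb step can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the noise here is independent Gaussian jitter per point, and the noise actually applied inside the Houdini network may well be different (and smoother).

```python
import math
import random

def resample(points, n=64):
    """Resample a 3D polyline to n points evenly spaced along its arc length."""
    # Cumulative arc length at each input point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    if total == 0.0:
        return [tuple(points[0])] * n  # degenerate stroke: all points coincide

    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        a, b = points[seg], points[seg + 1]
        out.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return out

def jitter(points, sigma, rng):
    """Assumed noise model: independent Gaussian offset on every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

def make_variants(curve, count=1000, n=64, sigma=0.02, seed=0):
    """Produce `count` noisy copies of one letter curve, each with n points."""
    rng = random.Random(seed)
    base = resample(curve, n)
    return [jitter(base, sigma, rng) for _ in range(count)]
```

Calling `make_variants(curve)` on a single hand-authored curve returns 1000 strokes of 64 points each, all sharing the label of the source letter. The `sigma` value is a placeholder, not a tuned setting.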
LSTM classifier in PyTorch. Houdini 21 ships with built-in ML nodes, and I did explore using them. The ML Train Regression node supports custom PyTorch models and loss functions, so in theory it could be bent toward classification — but it's designed around regression tasks like muscle deformers and mesh deformation, where both input and output are continuous geometry. Getting classification labels through that system required fighting the tooling rather than working with it. I also wanted full control over the training loop. So the decision was straightforward: use Houdini for what it's genuinely excellent at — generating and exporting the training data — and handle training in a dedicated Python environment where PyTorch does exactly what you ask of it.
Each stroke is a sequence of 64 xyz coordinates, 192 floats in total. An LSTM is a natural fit for sequential data like this: it processes the stroke point by point, building up a representation of the shape, then passes its final hidden state to a linear layer that outputs one score per letter. A softmax turns those 26 scores into probabilities. The whole model is small. Training on three letters took seconds.
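A model of this shape might look roughly like the following in PyTorch. It is a minimal sketch rather than the actual network: the class name, the hidden size of 64 and the single LSTM layer are assumptions. The linear layer returns raw scores (logits), and a softmax converts them to probabilities.

```python
import torch
from torch import nn

class StrokeClassifier(nn.Module):
    """LSTM over a stroke, followed by a linear layer on the final hidden state."""

    def __init__(self, hidden_size=64, num_classes=26):  # sizes are assumptions
        super().__init__()
        # Each time step is one xyz point, so input_size is 3.
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, strokes):
        # strokes: (batch, 64, 3)
        _, (h_n, _) = self.lstm(strokes)
        # h_n: (num_layers, batch, hidden); take the last layer's final state.
        return self.head(h_n[-1])  # logits: (batch, num_classes)

model = StrokeClassifier()
batch = torch.randn(8, 64, 3)             # 8 random stand-in strokes
labels = torch.randint(0, 26, (8,))       # stand-in class indices

logits = model(batch)
probs = torch.softmax(logits, dim=1)      # probabilities across the 26 letters
loss = nn.CrossEntropyLoss()(logits, labels)  # expects logits, not probabilities
loss.backward()
```

`nn.CrossEntropyLoss` applies the softmax internally, which is why the model returns logits and the explicit softmax is only needed at inference time.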
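ONNX export. PyTorch models export cleanly to ONNX, a format that can be read by inference runtimes outside the Python ecosystem, including Houdini's own ONNX Inference node. Core ML is a slightly different route: current versions of Apple's coremltools convert PyTorch models directly rather than going through ONNX.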
Inference back in Houdini. Before taking anything to the Vision Pro, I wanted to verify the full loop inside Houdini. Feed a raw stroke in, get a predicted letter out, pipe it into a Font SOP that renders the letter as 3D geometry. It works. That said, there is a lot of work still needed before this is ready for the Vision Pro app. For now, the plan is to keep the iteration loop tight — Houdini for data generation, PyTorch for training and evaluation — and get the model to a point where it handles real hand-drawn input reliably before thinking about deployment.
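The step between the model output and the Font SOP is turning 26 scores into one character. A small sketch of that decoding in plain Python follows. It assumes class index 0 maps to A, 1 to B and so on, which the text above does not specify, and `decode` is a hypothetical helper name.

```python
import math
import string

LETTERS = string.ascii_uppercase  # assumed label order: index 0 -> "A"

def decode(logits):
    """Return the most likely letter and its softmax probability."""
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LETTERS[best], probs[best]
```

Returning the probability alongside the letter leaves room to skip the substitution when the model is unsure, instead of always replacing the stroke.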
What day 1 looks like
The model currently recognises A, B and C with reasonable accuracy on strokes generated by the same pipeline it was trained on. Novel curves drawn directly in Houdini are less reliable. That is expected, and it points to a likely cause: the model may be learning world-space position rather than the actual shape of the letter. That's exactly what local-space normalisation is meant to fix.
The two most important improvements I already know I need to make: normalising each stroke into local space before training (so the model learns shape, not position or scale), and generating significantly more variation in the training data to better cover the messiness of real input.
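One common way to do this kind of normalisation is to move the stroke's centroid to the origin and scale it uniformly so the largest coordinate has magnitude 1. The sketch below assumes that definition of "local space"; it does not correct for rotation, and `normalise` is a hypothetical name.

```python
def normalise(points):
    """Centre a stroke on its centroid and scale it uniformly into [-1, 1]."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centred = [tuple(c - m for c, m in zip(p, centroid)) for p in points]
    # Uniform scale keeps the letter's proportions intact.
    scale = max(abs(c) for p in centred for c in p)
    if scale == 0.0:
        return centred  # degenerate stroke: every point is identical
    return [tuple(c / scale for c in p) for p in centred]
```

With this in place, the same letter drawn at a different position or size produces the same input to the model, so position and scale can no longer act as shortcuts during training.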
Both are tractable. Neither is particularly exotic. Day 1 was about proving the pipeline works end to end — and it does.
What comes next
The full alphabet. Local space normalisation. Better noise models in Houdini. TensorBoard and proper hyperparameter tracking to make training less of a guessing game and more of a systematic process. Eventually: a Core ML model running on-device in the Vision Pro app, so you draw a letter in the air and it appears in 3D space in front of you.
That last part is the demo. Everything between here and there is the work.