What we need

The bar this feature has to clear

The ways to do it

Every path we found

See it in action

Real demos: the good, and the limit

Short clips of this feature working in the real world, including where it hits its limit.

How the options compare

Side by side

Legend: excellent · yes · partial / test first · × no

The catch

What to watch out for

What we recommend

The pick

Build effort

How hard is this to build?

Translation is the easiest of the three features to build, and you may not need to build it at all. The pipeline has four steps: microphone, speech-to-text, translation, caption. Each processing step is available as a commodity cloud API (Gemini, Google Cloud, or Azure Speech) that already covers Japanese, Thai, and English. A working two-way version is a few days of wiring. The real work is the factory noise, not the AI, so plan for a good noise-cancelling mic, low latency, and a Thai-capable engine. Remember that Gemini Live on the phone and the RayNeo on the lens already do this for free, so a custom build only makes sense for on-glass captions on a glass that lacks them, or for offline use.
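To make "a few days of wiring" concrete, here is a minimal sketch of the microphone, speech-to-text, translate, caption chain. It is an illustration under stated assumptions, not a real integration: `run_pipeline`, `Caption`, `fake_transcribe`, and `fake_translate` are hypothetical names, and the two fake stages stand in for whichever cloud engine is chosen. No vendor API is called here. The sketch also times each chunk, since latency is one of the things to plan for.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. A real build would wrap a cloud
# speech-to-text call and a translation call behind these.
Transcribe = Callable[[bytes, str], str]      # (audio, source_lang) -> text
Translate = Callable[[str, str, str], str]    # (text, source, target) -> text


@dataclass
class Caption:
    source_lang: str
    target_lang: str
    source_text: str
    translated_text: str
    latency_ms: float


def run_pipeline(audio_chunk: bytes, source_lang: str, target_lang: str,
                 transcribe: Transcribe, translate: Translate) -> Caption:
    """Run one audio chunk through speech-to-text, then translation."""
    start = time.perf_counter()
    text = transcribe(audio_chunk, source_lang)
    translated = translate(text, source_lang, target_lang)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return Caption(source_lang, target_lang, text, translated, latency_ms)


# Stand-in stages (assumptions for this sketch only).
def fake_transcribe(audio: bytes, lang: str) -> str:
    # Pretend the audio bytes are already the spoken text.
    return audio.decode("utf-8")


_PHRASES = {
    ("ja", "en", "止めてください"): "Please stop",
    ("th", "en", "หยุด"): "Stop",
}


def fake_translate(text: str, source: str, target: str) -> str:
    # Look up a canned phrase; fall back to the untranslated text.
    return _PHRASES.get((source, target, text), text)


if __name__ == "__main__":
    chunk = "止めてください".encode("utf-8")
    cap = run_pipeline(chunk, "ja", "en", fake_transcribe, fake_translate)
    print(f"[{cap.source_lang}->{cap.target_lang}] {cap.translated_text}")
```

A two-way version would call `run_pipeline` once per speaker with the language pair swapped. Keeping the stages as plain callables means a noisy-floor test can swap engines without touching the rest of the chain.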

Translate JP / TH / EN: Low effort
See-what-I-see: Medium effort
3D measurement, ±1-2 cm: High effort

The decision checklist

What to confirm before we commit

The other two features