Short clips of this feature working in the real world, including where it hits its limit.
Translation is the easiest of the three to build, and you may not need to build it at all. The pipeline is just microphone, speech-to-text, translation, then captions, and the speech-to-text and translation steps are commodity cloud APIs (Gemini, Google Cloud, or Azure Speech) that already cover Japanese, Thai, and English. A working two-way version is a few days of wiring. The hard part is the factory noise, not the AI, so plan for a good noise-cancelling mic, low end-to-end latency, and an engine that handles Thai well. Also remember that Gemini Live (on the phone) and the RayNeo (on the lens) already do this for free, so a custom build only makes sense for on-glass captions on glasses that lack them, or for offline use.
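The microphone, speech-to-text, translation, caption pipeline can be sketched in Python as follows. This is a minimal, hypothetical sketch: `SpeechToText`, `Translator`, `CaptionSink`, and `TranslationPipeline` are illustrative names, not the Gemini, Google Cloud, or Azure Speech SDK APIs. The stub engines (`FakeSTT`, `PhrasebookTranslator`, `ListSink`) stand in for real ones so the sketch runs offline. It assumes one utterance at a time rather than streaming audio, and it times each utterance end to end, because latency is one of the things to plan for.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface; a real build would wrap a cloud STT SDK here."""

    def transcribe(self, audio: bytes, language: str) -> str: ...


class Translator(Protocol):
    """Hypothetical interface; a real build would wrap a cloud translation SDK."""

    def translate(self, text: str, source: str, target: str) -> str: ...


class CaptionSink(Protocol):
    """Hypothetical interface for whatever draws text on the glasses."""

    def show(self, caption: str) -> None: ...


@dataclass
class Caption:
    text: str
    source: str
    target: str
    latency_ms: float  # wall-clock time from audio in to caption shown


class TranslationPipeline:
    """Mic audio -> speech-to-text -> translate -> caption, one utterance at a time."""

    def __init__(self, stt: SpeechToText, translator: Translator, sink: CaptionSink) -> None:
        self.stt = stt
        self.translator = translator
        self.sink = sink

    def handle(self, audio: bytes, source: str, target: str) -> Optional[Caption]:
        start = time.perf_counter()
        transcript = self.stt.transcribe(audio, source)
        if not transcript.strip():
            # Nothing recognised (e.g. only machine noise): show no caption.
            return None
        translated = self.translator.translate(transcript, source, target)
        self.sink.show(translated)
        latency_ms = (time.perf_counter() - start) * 1000.0
        return Caption(translated, source, target, latency_ms)


# --- Hypothetical stand-in engines so the sketch runs without any cloud account ---

class FakeSTT:
    """Returns a canned transcript per audio clip; unknown clips transcribe to ''."""

    def __init__(self, transcripts: dict[bytes, str]) -> None:
        self.transcripts = transcripts

    def transcribe(self, audio: bytes, language: str) -> str:
        return self.transcripts.get(audio, "")


class PhrasebookTranslator:
    """Looks up (text, source, target) in a fixed table."""

    def __init__(self, table: dict[tuple[str, str, str], str]) -> None:
        self.table = table

    def translate(self, text: str, source: str, target: str) -> str:
        # Fall back to the original text if the phrase is not in the table.
        return self.table.get((text, source, target), text)


class ListSink:
    """Records captions instead of drawing them on a display."""

    def __init__(self) -> None:
        self.shown: list[str] = []

    def show(self, caption: str) -> None:
        self.shown.append(caption)
```

Two-way operation is then two calls with source and target swapped, for example `pipeline.handle(audio, "ja", "th")` for the Japanese speaker and `pipeline.handle(audio, "th", "ja")` for the Thai operator. Keeping each stage behind a small interface means swapping one vendor for another, or a cloud engine for an offline one, touches a single class.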