As I mentioned in my last post, I had planned to switch the hand tracking code to MediaPipe before porting the code to the Nvidia Jetson AGX Xavier. After spending weeks fighting the build system that MediaPipe uses and porting MediaPipe’s Python hand tracking solution API to C++, I successfully implemented MediaPipe as a hand tracking option near the end of August. Unfortunately, the result was not as high quality as I had hoped.
So, having experienced MediaPipe’s hand tracking as a method of input, I have decided to continue using Leap Motion for the current proof of concept.
This blog post will answer the following questions:
- Why doesn’t MediaPipe work as well as Leap Motion?
- How am I going to continue using Leap Motion without tethering to a Windows machine?
- Can an open source solution like MediaPipe be made to work in the future?
MediaPipe, as currently implemented
The MediaPipe hand tracking solution takes a single image and outputs a nominally 3D point for each of 21 hand landmarks, one per joint plus the fingertips. The X and Y values together locate that point on the image, normalized to a 0 to 1 range, but the Z value is the depth relative to the wrist of that hand, on a poorly documented relative scale.
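To make that output format concrete, here is a tiny sketch of how those normalized values relate to the image. The helper name is my own illustration, not MediaPipe API or Kros code:

```python
def landmark_to_pixels(x_norm, y_norm, width, height):
    # MediaPipe landmark x/y are normalized to the [0, 1] range of the
    # image, so pixel coordinates are just a scale by the frame size.
    # The z value has no comparable conversion: it is depth relative
    # to the wrist of the same hand, on a model-specific scale.
    return x_norm * width, y_norm * height

landmark_to_pixels(0.5, 0.25, 640, 480)  # (320.0, 120.0)
```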
So, by itself, MediaPipe can’t give an absolute position in space; in fact, without knowing the size of the hand, that’s impossible. However, by running the hand tracking on the image from each eye of the stereo camera, it is possible to triangulate the absolute positions in space if the camera’s optical properties are known.
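To show the geometry involved, here is a minimal sketch of that triangulation for a rectified stereo pair. The function and its parameters are my own illustration, not Kros’s actual code, and it assumes the calibration values (focal lengths, principal point, baseline) are already known:

```python
def triangulate(x_left, x_right, y, fx, fy, cx, cy, baseline):
    """Triangulate one hand joint seen by both eyes of a rectified
    stereo camera.

    x_left, x_right, y: pixel coordinates of the same joint in the
    left and right images (after rectification the y row is shared).
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels;
    baseline: distance between the two camera centers in meters.
    Returns (X, Y, Z) in meters relative to the left camera.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        return None  # matching failure, or point effectively at infinity
    z = fx * baseline / disparity
    x = (x_left - cx) * z / fx
    y3d = (y - cy) * z / fy
    return x, y3d, z

# Example: a 20-pixel disparity with a 6 cm baseline and fx = 500 px
# puts the joint 1.5 m in front of the camera.
triangulate(400.0, 380.0, 240.0, 500.0, 500.0, 320.0, 240.0, 0.06)
```

The guard on non-positive disparity is exactly where the failure modes below bite: if either eye’s landmark is missing or wrong, there is no valid disparity and the 3D point is lost.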
This is how Kros currently uses MediaPipe. Unfortunately, this means that if there is a tracking failure in either camera, the product of the stereo calculation is ruined. Potential tracking failures include:
- Failing to recognize a hand at all.
- Misidentifying an object as a hand.
- Correctly locating the joints of a hand, but identifying the hand as the wrong hand facing the wrong way (e.g. the back of a right hand can get mislabeled as the palm of a left hand).
The rate at which these failures occur, combined with the fact that stereo triangulation needs both cameras to succeed on every frame, prevents Kros’s MediaPipe implementation from being usable. To be clear, MediaPipe’s hand tracking is impressive for an open source piece of software, but I need a solution that’s more reliable.
I must give credit to the Leap Motion team. The Leap Motion Orion hand tracking works spectacularly, and comparing it to MediaPipe’s hand tracking makes that clear. However, the issues with using Leap Motion have not changed.
The Leap Motion driver can’t be run on the Nvidia Jetson directly, but the final proof-of-concept for the crowdfunding can’t be tethered to a giant Windows machine if I want backers to see what a Kros computer can really do. What’s the solution? Have the final proof-of-concept for the crowdfunding be tethered to a tiny Windows machine. There appear to be mini-PCs roughly the same size as the Nvidia Jetson that have the specifications needed to serve as the hand tracking server. At least, that’s the plan.
For now, I’ll continue tethering to a separate Windows laptop and resume work on the other issues that need fixing. Perhaps during that time, the situation may change.
Long term, Leap Motion still has two big problems:
- Tethering to a Windows machine of any size is not an acceptable requirement for a commercially sold wearable computer.
- The license puts restrictions on commercial use that would hamper Kros as a released product without a direct agreement with Ultraleap (the company that owns Leap Motion).
The situation could change by the time the crowdfunding starts, but it’s still good to talk about future solutions that don’t depend on closed source software or vendor-specific camera hardware. With talented software developers employed, potential fixes to Kros’s MediaPipe implementation could include:
- Improving MediaPipe itself.
- Filtering the results using an algorithm like a Kalman filter.
- Doing a one-time calculation of hand sizes using stereo triangulation, so that absolute space positions can be calculated from just one camera eye image if that’s all that’s available.
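As a sketch of the second idea, here is a minimal one-dimensional Kalman filter (random-walk model) that could smooth a single landmark coordinate between frames. The class name and noise parameters are hypothetical placeholders, not Kros code, and a real implementation would likely want a constant-velocity model tuned to the camera’s frame rate:

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter for smoothing one joint coordinate
    across frames. q is the process noise (how fast the hand is
    assumed to move), r is the measurement noise (how jittery the
    tracker is); both are illustrative values to tune."""

    def __init__(self, q=1e-3, r=1e-2):
        self.q, self.r = q, r
        self.x = None  # state estimate
        self.p = 1.0   # estimate variance

    def update(self, z):
        if self.x is None:
            self.x = z  # initialize from the first measurement
            return self.x
        self.p += self.q                  # predict: variance grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward measurement
        self.p *= (1.0 - k)
        return self.x
```

Feeding each frame’s raw landmark value through `update` damps single-frame spikes, which is exactly the kind of transient misdetection listed above.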
While I’m disappointed by the failure to resolve the long-term issues with the hand tracking, I can now move forward and hopefully make progress on other fronts. The next post will probably relate to getting VR mode working on the Nvidia Jetson AGX Xavier.
Thanks for reading, and if you haven’t already, subscribe to the blog or the Twitter feed @KrosDev.