As I mentioned in my last post, I had planned to switch the hand tracking code to MediaPipe before porting the code to the Nvidia Jetson AGX Xavier. After spending weeks fighting the build system that MediaPipe uses and porting MediaPipe’s Python hand tracking solution API to C++, I successfully implemented MediaPipe as a hand tracking option near the end of August. Unfortunately, the result was not as high quality as I had hoped.
So, having experienced MediaPipe’s hand tracking as a method of input, I have decided to continue using Leap Motion for the current proof of concept.
This blog post will answer the following questions:
- Why doesn’t MediaPipe work as well as Leap Motion?
- How am I going to continue using Leap Motion without tethering to a Windows machine?
- Can an open source solution like MediaPipe be made to work in the future?
MediaPipe, as currently implemented
The MediaPipe hand tracking solution takes a single image and outputs a nominally 3D point for each joint, including the fingertips. The X and Y values give the point's position in the image, normalized to a scale from 0 to 1 by the image width and height, while the Z value is the depth relative to that hand's wrist, on a poorly documented relative scale.
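As a small illustration of that coordinate convention, here is a minimal Python sketch that turns one normalized point into pixel coordinates. It is not MediaPipe's API: the `Landmark` class and `to_pixels` function are hypothetical names. The only assumption carried over from MediaPipe is that X and Y are normalized by the image width and height. Z is left alone because it has no absolute scale.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Landmark:
    """Hypothetical stand-in for one hand tracking output point."""
    x: float  # 0..1 across the image width
    y: float  # 0..1 down the image height
    z: float  # depth relative to the wrist, no absolute scale


def to_pixels(lm: Landmark, width: int, height: int) -> Tuple[float, float]:
    """Convert normalized image coordinates to pixel coordinates.

    Z is deliberately dropped: it is relative to the wrist, so it cannot
    be turned into a real distance from this one image.
    """
    return lm.x * width, lm.y * height


fingertip = Landmark(x=0.5, y=0.25, z=-0.03)
print(to_pixels(fingertip, 1280, 720))  # (640.0, 180.0)
```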
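So, by itself, MediaPipe can’t give an absolute position in space. In fact, from a single image it is impossible without knowing the size of the hand, because a small hand close to the camera and a large hand farther away can produce the same image. However, by running hand tracking on the image from each eye of the stereo camera, it is possible to triangulate absolute positions in space, provided the camera’s optical properties (such as the focal length and the distance between the eyes) are known.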
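For a rectified stereo camera, where both eyes share the same focal length and image rows line up, triangulation reduces to the classic disparity formula Z = f·B/d (focal length times baseline over disparity). The sketch below assumes that setup and is not necessarily how Kros computes it. `StereoRig` and `triangulate` are hypothetical names, the calibration values would have to come from calibrating the actual camera, and the inputs are pixel positions, so MediaPipe's normalized X and Y would first be multiplied by the image width and height.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StereoRig:
    """Hypothetical calibration of a rectified stereo camera."""
    focal_px: float    # focal length in pixels, shared by both rectified eyes
    cx: float          # principal point x (pixels)
    cy: float          # principal point y (pixels)
    baseline_m: float  # distance between the two eyes' optical centres (metres)


def triangulate(rig: StereoRig,
                left_px: Tuple[float, float],
                right_px: Tuple[float, float]) -> Tuple[float, float, float]:
    """Return (X, Y, Z) in metres, in the left eye's camera frame.

    left_px and right_px are the (u, v) pixel positions of the same joint in
    the left and right images. Rectification makes disparity purely horizontal.
    """
    disparity = left_px[0] - right_px[0]
    if disparity <= 0:
        raise ValueError("non-positive disparity: point cannot be triangulated")
    z = rig.focal_px * rig.baseline_m / disparity
    x = (left_px[0] - rig.cx) * z / rig.focal_px
    y = (left_px[1] - rig.cy) * z / rig.focal_px
    return x, y, z


rig = StereoRig(focal_px=700.0, cx=640.0, cy=360.0, baseline_m=0.064)
# A joint with 70 px of disparity lands roughly 0.64 m from the camera.
print(triangulate(rig, (710.0, 360.0), (640.0, 360.0)))
```

The `ValueError` matters in practice: a bad detection in either eye can easily produce zero or negative disparity, which would otherwise become a division by zero or a point behind the camera.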
This is how Kros currently uses MediaPipe. Unfortunately, it means that a tracking failure in either camera ruins the result of the stereo calculation. Potential tracking failures include:
- Failing to recognize a hand at all.
- Misidentifying an object as a hand.
- Correctly locating the joints of a hand, but identifying the hand as the wrong hand facing the wrong way (e.g. the back of a right hand can get mislabeled as the palm of a left hand).
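One cheap defense against these failures is to check that the two eyes agree before triangulating. The sketch below is a hypothetical filter (`HandDetection`, `stereo_pair_is_consistent`, and the 4-pixel tolerance are illustrative choices, not part of Kros or MediaPipe) and assumes rectified images. It can catch a hand missing from one eye, a handedness disagreement, or two eyes locked onto different objects. It cannot catch a mistake both eyes make identically, such as both mislabeling the back of a right hand as the palm of a left hand.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class HandDetection:
    """Hypothetical per-eye tracking result for one hand."""
    handedness: str                           # e.g. "Left" or "Right"
    joints_px: Sequence[Tuple[float, float]]  # (u, v) pixels, same joint order in both eyes


def stereo_pair_is_consistent(left: Optional[HandDetection],
                              right: Optional[HandDetection],
                              max_row_error_px: float = 4.0) -> bool:
    """Return False for stereo pairs that would corrupt triangulation.

    Assumes rectified images: a real 3D point in front of the camera lands on
    (nearly) the same pixel row in both eyes, with positive disparity.
    """
    # A hand was found in only one eye (or in neither).
    if left is None or right is None:
        return False
    # The eyes disagree about which hand this is.
    if left.handedness != right.handedness:
        return False
    if len(left.joints_px) != len(right.joints_px):
        return False
    for (u_left, v_left), (u_right, v_right) in zip(left.joints_px, right.joints_px):
        # Mismatched rows suggest the eyes locked onto different objects.
        if abs(v_left - v_right) > max_row_error_px:
            return False
        # Zero or negative disparity cannot come from a point in front of the rig.
        if u_left - u_right <= 0:
            return False
    return True
```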
The rate at which these failures occur, combined with stereo triangulation needing tracking to succeed in both images at once (roughly doubling the chance of a failure), prevents Kros’s MediaPipe implementation from being usable. To be clear, for an open source piece of software, MediaPipe’s hand tracking is impressive, but I need a solution that’s more reliable.
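To see why stereo is so sensitive, suppose, as a simplifying assumption, that tracking in each eye succeeds independently with probability p. Triangulation needs both eyes to succeed:

$$
P_{\text{both}} = p^2, \qquad 1 - p^2 = (1 - p)(1 + p) \approx 2(1 - p) \quad \text{for } p \text{ close to } 1
$$

For example, a per-eye success rate of 95% gives 0.95² = 0.9025, so the failure rate rises from 5% to 9.75%, nearly double. Real failures in the two eyes may not be independent, so treat this as a rough guide.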
Leap Motion
I must give credit to the Leap Motion team. The Leap Motion Orion hand tracking works spectacularly, and comparing it to MediaPipe’s hand tracking makes that clear. However, the issues with using Leap Motion have not changed.
The Leap Motion driver can’t be run on the Nvidia Jetson directly, but the final proof of concept for the crowdfunding can’t be tethered to a giant Windows machine if I want backers to see what a Kros computer can really do. What’s the solution? Tether the final proof of concept to a tiny Windows machine instead. There appear to be mini-PCs, roughly the same size as the Nvidia Jetson, with the specifications needed to serve as the hand tracking server. At least, that’s the plan.
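On the Jetson side, tethering to a tracking server mostly comes down to receiving joint data over the network. Here is a minimal sketch under assumed details: the UDP transport, the JSON message layout, and the names `encode_frame`, `FrameFilter`, and `receive_frames` are all hypothetical, and reading joints from the Leap Motion service on the Windows side is not shown. The sequence number lets the receiver discard frames that arrive late or out of order, on the reasoning that a stale hand pose is worse than simply waiting for the next one.

```python
import json
import socket
from typing import Callable, Optional

# Hypothetical wire format: one UDP datagram per tracking frame, JSON-encoded:
# {"seq": 42, "hands": [{"handedness": "Left", "joints": [[x, y, z], ...]}]}


def encode_frame(seq: int, hands: list) -> bytes:
    """Server side: serialize one frame of hand data."""
    return json.dumps({"seq": seq, "hands": hands}).encode("utf-8")


class FrameFilter:
    """Client side: decode frames and discard any that arrive out of order."""

    def __init__(self) -> None:
        self.last_seq = -1

    def accept(self, payload: bytes) -> Optional[list]:
        frame = json.loads(payload.decode("utf-8"))
        if frame["seq"] <= self.last_seq:
            return None  # UDP may reorder datagrams; an older pose is already stale
        self.last_seq = frame["seq"]
        return frame["hands"]


def receive_frames(port: int, on_hands: Callable[[list], None],
                   host: str = "0.0.0.0") -> None:
    """Blocking loop: pass every fresh frame from the tracking server to on_hands."""
    frame_filter = FrameFilter()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, _sender = sock.recvfrom(65535)
            hands = frame_filter.accept(payload)
            if hands is not None:
                on_hands(hands)
```

The sending side would be the mirror image: a UDP socket calling `sendto(encode_frame(seq, hands), (jetson_address, port))` once per frame, where `jetson_address` and `port` are placeholders.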
For now, I’ll continue tethering to a separate Windows laptop and resume work on the other issues that need fixing. Perhaps the situation will change in the meantime.
Future Solutions
Long term, Leap Motion still has two big problems:
- Tethering to a Windows machine of any size is not an acceptable requirement for a commercially sold wearable computer.
- The license puts restrictions on commercial use that would hamper Kros as a released product without a direct agreement with Ultraleap (the company that owns Leap Motion).
The situation could change by the time the crowdfunding starts, but it’s still good to talk about future solutions that don’t depend on closed source software or vendor-specific camera hardware. With talented software developers on the team, potential fixes to Kros’s MediaPipe implementation could include:
- Improving MediaPipe itself.
- Filtering the results using an algorithm like a Kalman filter.
- Doing a one-time calculation of hand sizes using stereo triangulation, so that absolute positions in space can be calculated from a single camera eye’s image when that’s all that’s available.
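The last two ideas can be sketched briefly. Below, `ConstantVelocityKalman1D` is a textbook constant-velocity Kalman filter for one coordinate of one joint (one instance would run per coordinate per joint), and it coasts on its prediction when a frame has no measurement, for example after a stereo pair is rejected. `calibrate_bone_length` and `monocular_depth` illustrate the hand size idea with the pinhole relation Z ≈ f·L/l: once a bone's metric length L has been measured by stereo triangulation, its length l in pixels in a single eye gives the depth. All names and noise values are assumptions for illustration. The depth estimate also assumes the bone lies roughly parallel to the image plane; a bone pointing toward the camera looks shorter and would be placed too far away.

```python
import math
from typing import Optional, Sequence


class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate of one joint (constant-velocity model)."""

    def __init__(self, accel_noise: float, meas_noise: float):
        self.q = accel_noise               # process noise: how sharply the hand may accelerate
        self.r = meas_noise                # variance of the tracker's measurement noise
        self.pos: Optional[float] = None   # state: position
        self.vel = 0.0                     # state: velocity
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance

    def _predict(self, dt: float) -> None:
        # x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]]
        (p00, p01), (p10, p11) = self.p
        q = self.q
        self.pos += self.vel * dt
        self.p = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt**4 / 4,
             p01 + dt * p11 + q * dt**3 / 2],
            [p10 + dt * p11 + q * dt**3 / 2,
             p11 + q * dt**2],
        ]

    def _update(self, z: float) -> None:
        # Measurement model H = [1, 0]: the tracker observes position only.
        (p00, p01), (p10, p11) = self.p
        s = p00 + self.r           # innovation variance
        k0, k1 = p00 / s, p10 / s  # Kalman gain
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def step(self, z: Optional[float], dt: float) -> Optional[float]:
        """Advance by dt seconds; pass z=None when tracking failed this frame."""
        if self.pos is None:
            self.pos = z  # initialize from the first measurement (stays None if absent)
            return self.pos
        self._predict(dt)
        if z is not None:
            self._update(z)
        return self.pos


def calibrate_bone_length(a_xyz: Sequence[float], b_xyz: Sequence[float]) -> float:
    """One-time step: metric distance between two stereo-triangulated joints."""
    return math.dist(a_xyz, b_xyz)


def monocular_depth(bone_length_m: float, a_px: Sequence[float],
                    b_px: Sequence[float], focal_px: float) -> float:
    """Pinhole depth estimate Z ≈ f * L / l for a bone of known metric length L."""
    pixel_length = math.dist(a_px, b_px)
    return focal_px * bone_length_m / pixel_length
```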
While I’m disappointed by the failure to resolve the long-term issues with the hand tracking, I can now move forward and hopefully make progress on other fronts. The next post will probably relate to getting VR mode working on the Nvidia Jetson AGX Xavier.
Thanks for reading, and if you haven’t already, subscribe to the blog.
Incredible stuff! I just stumbled across your site while doing preliminary research for a class project. I have a Leap Motion (Stereo IR170) and am not satisfied with its positional accuracy and occlusion problems, so I’m investigating a wrist-mounted IR camera and Vive tracker combo. I’m sure the future is untethered tracking like you are doing and Lighthouses will be obsolete soon enough, but I want to see if I can get extreme precision to enable tasks like modelling in Blender.
Anyway, I just wanted to say this whole project is really inspiring; it’s always exciting to see more productivity applications in the XR space!
Thanks for the wonderful comment! If you’re working with a camera that is fixed relative to the wrist, MediaPipe might still be a great solution for you. I wonder whether, even without Lighthouses, one would still want wrist cameras for finger tracking, separate from a main camera or sensor that only tracks the wrists.
Oh, sorry for double posting; I just remembered that I did have questions: 1) Did you get MediaPipe running on the Jetson, or did you drop it before that stage? 2) Was there any difference in tracking latency vs. Leap?
To answer your questions:
1) I didn’t spend much time getting MediaPipe running on the Jetson since my recent work on the Jetson was after I dropped MediaPipe.
2) I don’t remember a difference in latency; both seemed acceptable for my purposes. But I’d be careful about using my experience to compare the two. Kros’s Leap Motion implementation offloads the work to a separate machine, so there are factors like network latency that have nothing to do with Leap Motion or MediaPipe.
Thanks for the responses! I’m going to go ahead and try it as my term project.