Computer Vision @ Diamond Kinetics

Part 1: Detecting Swings

Diamond Kinetics Dev Team
Feb 5, 2024

This is the first of a multi-part series on how Diamond Kinetics is using Computer Vision in our iOS app to provide exciting and unique experiences to our users.

Diamond Kinetics SmartMotion Swing Logo
Diamond Kinetics SmartMotion, our CV-based swing detection

The core tenet driving us at Diamond Kinetics is that technology can provide novel ways for baseball and softball players to improve their skills while having fun doing so. Since the company’s inception, our primary focus has been building the best physical sensors: devices that generate motion data to which we apply rigid body dynamics to calculate meaningful metrics for our users. Using our sensors has its challenges, however. Beyond purchasing one, a player needs the space to swing a bat and make contact with a ball to generate the data those sensors measure. That usually means setting up a net and a tee, friction that makes it harder for a user to use our app.

To reduce that friction, and lower the barrier to entry for new users of our app, we wanted to meet them where they are and use the technology they already have. We considered using the Apple Watch as a stand-in for our sensor, but it limited the addressable market (not all kids with an iOS device also have a watchOS device), and its sensors lacked the resolution to be as accurate as our hardware. One piece of hardware every user does have is the camera built into the phone, specifically the front-facing camera, which lets the user see the app UI while also capturing themselves in the frame.

With the advancements in on-device machine learning (ML) and computer vision (CV), we were able to leverage Apple’s Vision framework, along with our already enormous catalog of baseball and softball players performing swinging motions, to train a CV model that can tell us whether a user of the app took a swing. With that detection in hand, we can count swings and use them to drive experiences in our application.

Generating a Swing Detection Model

Before we could implement any CV features in the Diamond Kinetics app, we needed to train a model for the machine learning algorithms to use. The first step was to determine whether a video contained a human being performing the act of swinging a baseball or softball bat. Apple’s APIs are already fine-tuned to detect a human in the frame. Beyond that, we were able to leverage Apple’s Core ML tools to train a model of what a baseball or softball swing looks like, drawing on the wealth of MLB open-side videos available since 2022 and the deep library of user videos from our legacy apps.
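One way to train this kind of swing model on Apple platforms is Create ML’s action classifier, which learns from directories of labeled video clips. The sketch below is illustrative only, not our actual pipeline: the paths, label layout (one subfolder per label, e.g. `swing` and `other`), and parameter values are all assumptions.

```swift
import CreateML
import Foundation

// Train an action classifier from labeled clip directories (macOS tool, not app code).
// Paths and parameter values here are hypothetical examples.
let trainingDir = URL(fileURLWithPath: "/path/to/labeled-clips")

let parameters = MLActionClassifier.ModelParameters(
    maximumIterations: 80,
    predictionWindowSize: 120   // e.g. 2 seconds of poses at 60 fps
)

let classifier = try MLActionClassifier(
    trainingData: .labeledDirectories(at: trainingDir),
    parameters: parameters
)

// Export the trained model for bundling into the iOS app.
try classifier.write(to: URL(fileURLWithPath: "/path/to/SwingClassifier.mlmodel"))
```

The exported `.mlmodel` can then be compiled into the app and driven with pose keypoints extracted by the Vision framework at runtime.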

Setting Up

Controlling the Camera

The first step in leveraging CV is making the necessary calls to control the iOS device’s camera. Because the player needs to see a user interface (UI) laid over the captured video stream, we use the front-facing camera. The camera, app UI, and physical device orientation must all be known and controlled so that we can provide either a portrait or landscape experience, depending on the feature. Without getting too far into the weeds (perhaps we’ll explore it in a future post), the key takeaway was that the camera always captures in landscape, independent of the device orientation. The system accounts for this when previewing the camera output on the screen, but when capturing the pixel buffers and distilling them down to joints to compare against our ML model, we needed to be aware of the orientation in which they were captured. Once we accounted for that correctly, independent of what we drew over the video in the preview, we got much better results!
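The orientation bookkeeping above usually boils down to mapping the device’s current orientation to the orientation tag Vision expects for each buffer. A minimal sketch, assuming an unrotated front-camera connection with mirrored output; the exact mapping depends on how the capture session and preview are configured, so treat these case values as assumptions to verify against your own setup:

```swift
import UIKit
import ImageIO

// The front camera delivers landscape pixel buffers no matter how the phone is
// held, so tell Vision how each buffer was captured relative to the device.
// Mapping values are illustrative; verify them for your session configuration.
func visionOrientation(for device: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch device {
    case .portrait:           return .leftMirrored
    case .portraitUpsideDown: return .rightMirrored
    case .landscapeLeft:      return .downMirrored
    case .landscapeRight:     return .upMirrored
    default:                  return .leftMirrored   // fall back to portrait
    }
}
```

Passing this value alongside each pixel buffer keeps the detected joints consistent regardless of what is drawn over the preview.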

Detecting the Player

Once we’ve opened the camera, we start capturing the camera frames, also known as pixel buffers, at 60 frames per second. From there we hand the buffers to the CV model to determine, within a level of confidence, whether a human is in the frame. Apple’s APIs can detect up to 19 unique body points, but for a baseball or softball swing we are only concerned with a subset of those around the trunk, arms, and legs. We draw over the live video capture to inform the user that we have the joints, giving them a visual signal that enough of their body is in frame to start a swing.
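In Vision terms, this step is a body-pose request run on each pixel buffer, filtered down to the joints that matter for a swing. A minimal sketch; the joint subset and confidence cutoff are illustrative choices, not our exact values:

```swift
import Vision

// Run Vision's body-pose detection on one captured pixel buffer.
// `orientation` must describe how the buffer was captured.
func detectPose(in pixelBuffer: CVPixelBuffer,
                orientation: CGImagePropertyOrientation) throws -> VNHumanBodyPoseObservation? {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: [:])
    try handler.perform([request])
    return request.results?.first
}

// The subset of joints relevant to a swing: trunk, arms, and legs.
let swingJoints: [VNHumanBodyPoseObservation.JointName] = [
    .neck, .root,
    .leftShoulder, .rightShoulder, .leftElbow, .rightElbow,
    .leftWrist, .rightWrist,
    .leftHip, .rightHip, .leftKnee, .rightKnee, .leftAnkle, .rightAnkle
]

// A player is "ready" when enough swing joints are confidently detected
// (the 0.3 confidence and 12-joint minimum are example thresholds).
func isPlayerReady(_ pose: VNHumanBodyPoseObservation) -> Bool {
    let visible = swingJoints.compactMap { try? pose.recognizedPoint($0) }
        .filter { $0.confidence > 0.3 }
    return visible.count >= 12
}
```

The recognized points can also drive the on-screen joint overlay that signals the player to start swinging.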

Player getting set up to take swings
Detecting the person based on multiple joints

Counting Swings

Analyzing the Joints

After the pose is detected, our swing detection CV model processes the video buffer and returns a confidence level indicating whether the previous 120 frames (2 seconds at 60 frames per second) contain a swing. If that confidence exceeds a threshold we deem significant, we count it as a swing and alert the rest of the app so it can progress to the next step of swing analysis.
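The 120-frame window described above can be sketched as a sliding buffer of pose observations that is classified once full. `SwingClassifier`, the 0.9 threshold, and the multi-array conversion details are assumptions standing in for the real model and tuning:

```swift
import CoreML
import Vision

// Sliding-window swing detection: 120 frames = 2 s at 60 fps.
// `classifySwing` is a placeholder for the bundled Core ML action classifier.
final class SwingDetector {
    private var poseWindow: [VNHumanBodyPoseObservation] = []
    private let windowSize = 120
    private let threshold: Double = 0.9   // example significance threshold

    /// Feed one pose per frame; returns true when a swing is counted.
    func add(_ pose: VNHumanBodyPoseObservation) -> Bool {
        poseWindow.append(pose)
        guard poseWindow.count >= windowSize else { return false }
        poseWindow.removeFirst(poseWindow.count - windowSize)

        if classifySwing(in: poseWindow) > threshold {
            poseWindow.removeAll()   // reset so one swing is counted once
            return true
        }
        return false
    }

    private func classifySwing(in window: [VNHumanBodyPoseObservation]) -> Double {
        // Placeholder: convert each observation via keypointsMultiArray(),
        // stack the frames into the model's input, and return the "swing"
        // class probability from the prediction.
        return 0
    }
}
```

When `add(_:)` returns true, the detector can notify the rest of the app that a swing occurred.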

A player going through a Computer Vision-based Guided Session in the Diamond Kinetics app
Getting ready to process swings once a player is detected

Our initial goal of implementing CV swing detection was to leverage it in our Guided Sessions feature. Guided Sessions are recorded training videos, provided by our DK coaches, that guide the user through a series of drills and exercises. The coaches can instruct players to take a series of swings which the computer vision can detect to advance the player through the session.

A swing detected in a Computer Vision Guided Session in the Diamond Kinetics app
Counting the swings!

Next Steps

Now that we can detect swings and use that detection to drive interaction within our app, we know we can grow beyond our CV Guided Sessions as they currently exist. We want to explore calculating metrics as useful and fun as what our sensor can tell us, to inform users about their performance. For example, our sensor measures the time from when the player triggers their swing to the moment of impact. It seems highly likely that we could calculate that same duration via CV and coach the user on how to improve it. We hope to explain more in a follow-up post.

In Part II of this series, however, we will explain how we leverage our partnership with Major League Baseball (MLB) to compare our players’ swings against those of MLB players. Stay tuned for that post soon!


Diamond Kinetics Dev Team

The engineering team at Diamond Kinetics, the Trusted Youth Training Platform of Major League Baseball