Using a cell phone, some old dice I had lying around, and a table in my garage, I was able to capture some fairly consistent images of both individual dice to use as templates and groups of dice to use as example rolls. I followed the approach laid out in the previous post and got some decent results with minimal work in OpenCV, successfully identifying the location and rank of all 5 dice in all of my test images (including some with partial occlusions) and the value of all 5 in most of the test images (only the 8- and 12-sided dice had issues, and I used a very primitive template match scoring system). With some changes to the lighting, templating, and matching portions of the problem I am very confident that this solution will work!
The Python software used in this post is available on GitHub.
Image Capture
To capture my test images I clamped my cell phone (Pixel 6 Pro) on the end of an engine hoist suspended over a work table. The work table has bright fluorescent lights mounted under a cabinet, which provided some decent all-around lighting for the images. Because I wanted the camera to be perfectly “top-down”, the lighting did produce a slight shadow just below all of the dice. I tried adding some other angles of light with a camping headlamp, but its focused beam mostly just produced extra glare on the shiny dice.
My phone has a 4x optical zoom (“periscope” lens, about 100mm equivalent, compared to 25mm standard) which was very useful for minimizing distortion, although it required the camera to be positioned farther from the dice. One other thing I learned was that while posterboard made an OK white background, the slightly shinier (more plasticky? more reflective?) flexible white sheet from an old camera booth kit gave a much brighter, more consistent background color.
Taking inspiration from the research I found that referenced the online data repository “Kaggle”, I uploaded all of these images to that site as well: https://www.kaggle.com/datasets/burksbuilds/color-coded-rpg-dice-set-d6d8d10d12d20-v1/data
Template Images
To capture images to use as templates for die face matching I manually positioned each face of each die roughly near the center of the camera’s field of view then manually captured a photo. In hindsight, capturing a template aligned to the lighting in a consistent way (or multiple templates at specific rotations) might have made some shadow rejection easier.
I did not capture the d4 (black triangle) or d10% (white d10, counting by 10s) because I wanted to focus on color filtering and I thought that black and white dice (which is what this specific set had) would not be good fits for a hue-based filter. In hindsight, capturing the black triangle d4 just to try to match a die where the result is not on a face normal to the camera would have been a good experiment.
Example Rolls
I wanted to try a few different rolls with different combinations of dice. Some dice, like the d10, are particularly good at occluding (blocking the view of) other adjacent dice. For each grouping of dice I chose, I took 10 pictures of random rolls, then 5 pictures of “staged” rolls, where I clustered dice next to each other in different ways (without necessarily changing the apparent faces of the dice between pictures).
I considered jumping into “labeling” the example rolls with one of the many popular image-labeling tools for machine learning training sets, but then decided to put it off until after I had some basic software written and actually needed the labels.
Template Image Processing
The goal for each template image I took was to automatically find the one die in the image and crop it in a fairly repeatable and centered way. This cropped image can be saved somewhere else to make future processing easier, although on my desktop I never bothered because the code ran pretty fast. When loading a template image (from a cropped version or the original large version) I also precalculate the keypoints and descriptors using OpenCV image matching functions.
Once all the template images (one per face) from a die were cropped out, a few characteristics of the die as a whole were calculated:
- The color of the die (as a band of hues that encompasses 95% of the die's face pixels)
- The saturation cutoff that distinguishes the colored die from the background
- The radius of the circle that circumscribes the die
This section roughly follows the Python notebook here.
Image Cropping
The HSV color space (Hue, Saturation, Value) was very useful for this project because of my goal to segregate dice by color, but it also ended up being helpful for background separation and shadow rejection. The first step in the image processing was to convert the RGB (Red, Green, Blue) image to HSV.
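For reference, a minimal sketch of that conversion in OpenCV (the filename is just a placeholder, not a file from the repository):

```python
import cv2

bgr = cv2.imread("template.jpg")            # OpenCV loads images in BGR order
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)  # convert to Hue/Saturation/Value
hue, sat, val = cv2.split(hsv)              # separate single-channel images
```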
Notice how noisy the hue is on the plain white background, but how consistent it is in the center where the die is. Once the pixels that contain the die have been identified it will be easy to pick out the color of the die, but for simply cropping the die the hue should be ignored. Similarly, by ignoring the brightness (“Value”) channel, the dark shadow at the bottom of the die has less impact on the die border identification step.
Saturation is the scale from white to “vibrant” of a particular color, while ignoring what color it happens to be and how “dark” the image is. This made saturation an excellent way to pick out the die in the middle of the image no matter what color it is (although some of the shadow still comes through on this channel, probably from a diffuse reflection of the side of the die on the plain white background).
The saturation channel by itself is basically a grayscale image, but by doing an adaptive threshold detection a new binary “mask” image can be obtained. Small imperfections on the background can sometimes generate small blobs that might be confused for dice or thin strands that make the edge of the die seem rougher than it really is. By doing an “opening” morphology (an erosion pass to shrink all of the contours followed by a dilation pass to grow them back to their original size, if they still exist) those false positives can be filtered out.
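A rough sketch of that threshold-and-open step, continuing from the snippet above (the block size and offset are illustrative guesses, not tuned values):

```python
import numpy as np

# Adaptive threshold on the saturation channel: keep pixels that are clearly
# more saturated than their local neighborhood
mask = cv2.adaptiveThreshold(sat, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY, 51, -5)

# "Opening" = erosion followed by dilation; removes speckles and thin strands
kernel = np.ones((5, 5), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```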
After the filtering pass it is fairly straightforward to find all of the external contours in the black and white image. While there may be some small off-center contours for various reasons, the large blob in the middle is definitely the die! A new black and white mask image was created that contains just a filled-in version of the contour of the die. By finding the smallest circle that contains the die a rotation-invariant boundary for cropping the image can be determined.
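Roughly, with OpenCV's contour utilities (continuing from the mask above):

```python
# Find external contours and keep the largest blob as the die
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
die_contour = max(contours, key=cv2.contourArea)

# Filled mask containing only the die
die_mask = np.zeros_like(mask)
cv2.drawContours(die_mask, [die_contour], -1, 255, thickness=cv2.FILLED)

# Smallest enclosing circle: a rotation-invariant crop boundary
(cx, cy), radius = cv2.minEnclosingCircle(die_contour)
```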
The contour mask can also be cropped down to size and overlaid with the RGB image as an Alpha channel (transparency). This allows the contour information, which separates the die from the background, to be retained without losing the full RGB image. Finally, an AKAZE keypoint detector was used (because it was in an example, I do not know the pros and cons of different algorithms yet but am excited to explore) and the resulting keypoints and descriptors were cached along with the cropped image for future use identifying dice rolls!
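Continuing the sketch, and assuming the die sits well away from the image edges (the crop bounds are not clamped here):

```python
# Crop the image and mask down to the enclosing circle's bounding box
x0, y0 = int(cx - radius), int(cy - radius)
x1, y1 = int(cx + radius), int(cy + radius)
crop_bgr = bgr[y0:y1, x0:x1]
crop_mask = die_mask[y0:y1, x0:x1]

# Attach the mask as a fourth (alpha) channel so the background stays separable
crop_bgra = cv2.merge((*cv2.split(crop_bgr), crop_mask))

# AKAZE keypoints and descriptors, restricted to the die mask, cached for later
akaze = cv2.AKAZE_create()
template_kp, template_desc = akaze.detectAndCompute(crop_bgr, crop_mask)
```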
Overall this method worked pretty well. There were some issues with shadows and brightness (the d6 mask included a shadow and the d20 mask excluded two darker faces). Additionally, glare at some of the face boundaries (specifically at the d10 and d12) was creating a very prominent feature that would not be rotation independent. I think both of these problems are better solved with good lighting than more complex programming, and I can circle back on the keypoint detection algorithm, including what source image to use (black and white? some weighted combination of HSV?), once I have more quality images.
Die Characteristic Calculations
Similar to how the dice were originally located in the larger uncropped image by using the saturation channel, this process starts by looking just at the saturation channel of an HSV version of the cropped image and running Otsu's thresholding algorithm. This algorithm is basically just trying to split the saturation into two distinct peaks – in this case the larger values correspond to the colored die face (excluding the numerals) and the smaller values are background and numerals. As a comparison, the bright orange D8 has a much more distinct high-saturation peak than the dull green D10, but the algorithm is still able to sort them both out.
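In OpenCV terms, something like the following (using the cropped BGR image from the previous section):

```python
crop_hsv = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2HSV)
crop_sat = crop_hsv[:, :, 1]

# Otsu's method picks the cutoff that best separates the two saturation peaks;
# pixels above it are treated as colored die face, the rest as background/numerals
sat_cutoff, face_mask = cv2.threshold(crop_sat, 0, 255,
                                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```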
Once the image is divided by saturation into two distinct groups, the hue of just the pixels corresponding to the die face (without background or numerals) can be extracted. A histogram of the hue of all of the faces on the die reveals a distinct peak corresponding to the color of the die (the range is 0-180, representing 0-360 degrees of a color wheel).
For each die, the smallest range of hues that contains 95% of all pixels is determined. This range will be used later to extract just dice of this rank from images of a group of rolled dice.
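A brute-force way to find that band, assuming the `face_mask` and `crop_hsv` from the sketch above (this is an illustrative search, not necessarily the notebook's exact implementation, and it ignores hue wrap-around at red):

```python
import numpy as np

face_hues = crop_hsv[:, :, 0][face_mask > 0]        # hues of die-face pixels only
hist, _ = np.histogram(face_hues, bins=180, range=(0, 180))
target = 0.95 * face_hues.size

# Smallest contiguous hue window whose pixel count reaches 95% of the face pixels
best_lo, best_hi = 0, 179
for lo in range(180):
    cum = np.cumsum(hist[lo:])
    hi = int(np.searchsorted(cum, target))           # first bin reaching the target
    if hi < cum.size and hi < best_hi - best_lo:
        best_lo, best_hi = lo, lo + hi
hue_low, hue_high = best_lo, best_hi
```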
Similarly, an average radius (in pixels) for the smallest circle that can completely circumscribe a die is stored. Other geometric information about each die (largest inscribed perimeter circle, smallest circumscribed face circle, and largest inscribed face circle) was precalculated and stored in a lookup table for each die shape, as sketched below. Combined, this information can be used to focus on keypoints closer to the center face of a die, or mask out areas of a group of dice after a die has been detected there.
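A sketch of what that lookup table might look like; the d6 ratios follow from simple square geometry when the die is viewed top-down, but the other numbers here are placeholders rather than measured values:

```python
# Per-shape geometry ratios, expressed as fractions of the circumscribed
# perimeter circle radius measured from the templates.
DIE_GEOMETRY = {
    # rank: (inscribed perimeter circle, circumscribed face circle, inscribed face circle)
    "d6":  (0.71, 1.00, 0.71),   # top face is the whole silhouette when flat
    "d8":  (0.80, 0.60, 0.50),   # placeholder values
    "d20": (0.85, 0.45, 0.40),   # placeholder values
}
```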
Roll Image Processing
Before the dice in a roll can be matched to their templates they need to be located within the roll image, matched to the correct die rank, and cropped into a similar-size sample image. This section roughly follows the Python notebook here.
Segmentation
The color-based dice segmentation algorithm worked really well. The color and saturation filtering only produced a few tiny false positives that were completely drowned out by the true dice. It was also interesting to see that the d20 filter missed the two shadowed faces in the roll, just like how the templating algorithm failed to include the two shadowed faces.
The distance transform calculates a number that is comparable to the inscribed radius of a die at every possible pixel. By normalizing the distance transform output by the expected inscribed radius of a particular die, a confidence score can be calculated (a value of 1 indicates the die is probably located there). By comparing the confidence scores of the “best match” for each expected die in a roll, the high-level algorithm can start stripping out the detected dice one by one to make future matches even easier to determine.
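A sketch of that confidence calculation for one die's color/saturation mask; `color_mask` and `expected_inscribed_radius` are assumed inputs (the binary filter output for one die rank, and the inscribed-radius estimate from that die's templates):

```python
# Distance transform: each pixel's distance to the nearest background pixel,
# which approximates the inscribed radius of whatever blob it sits in
dist = cv2.distanceTransform(color_mask, cv2.DIST_L2, 5)

# Normalize by the expected inscribed radius; ~1.0 means "a die is probably here"
confidence = dist / expected_inscribed_radius
_, best_score, _, best_center = cv2.minMaxLoc(confidence)
```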
Because I only have one die of each color available I was not able to test rolls with multiple dice of the same color. In cases where those dice are adjacent, especially for the six-sided die where solid clumps completely block the background, I am worried that the distance transform method will suggest the center of the cluster instead of multiple edges. However, until I have different dice to test on I won’t bother adjusting the algorithm.
The cropping was done by assuming that the local maximum from the distance transform was the center of the die, and the standard padding (25px) was added around the average circumscribed circle size calculated from the templates. The color and saturation filters were used to create an alpha channel, and keypoints were calculated and cached for the sample.
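Continuing the sketch, assuming `roll_bgr` is the full roll image, `best_center` comes from the distance transform above, and `template_radius` is the average circumscribed radius from the templates (edge clamping omitted):

```python
PAD = 25                                   # standard padding around the die
cx, cy = best_center
r = int(round(template_radius)) + PAD

sample_bgr = roll_bgr[cy - r:cy + r, cx - r:cx + r]
sample_mask = color_mask[cy - r:cy + r, cx - r:cx + r]
sample_bgra = cv2.merge((*cv2.split(sample_bgr), sample_mask))   # alpha channel

# Keypoints for the sample, masked the same way as the templates
sample_kp, sample_desc = akaze.detectAndCompute(sample_bgr, sample_mask)
```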
The resulting image was set up very nicely to be compared to all of the templates for the die, sharing a common mask, padding, and pixel density. The biggest difference would be the orientation of glares and shadows due to poor lighting.
Die Face Identification
Identification is done by attempting to match the keypoints from the cropped sample image to keypoints on each of the template face images. Each image contains a ton of keypoints: for this example the ‘5’ on the d8 template has 187 keypoints while the sample image has 156 keypoints. Keypoints are only generated inside the “inscribed circle” of the perimeter to try to reject false keypoints from neighboring or occluding dice, while still allowing neighboring faces to contribute to the match. This is probably way too many non-meaningful keypoints and I will definitely need to adjust the keypoint detection algorithm and parameters in the future to find a smaller, more meaningful set of keypoints per die.
The keypoints between images are compared using a brute-force matcher (kNN-style in this case, based on a tutorial I found). For “correct” comparisons it is expected that there will be a ton of keypoints that match, while “incorrect” comparisons will match very few keypoints. This is the first metric I take into account when I score how well a die face matches.
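A sketch of that matcher, using the descriptors from the earlier snippets (AKAZE descriptors are binary, so Hamming distance is the appropriate norm; the 0.75 cutoff corresponds to the distance-ratio test mentioned below):

```python
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
knn_matches = matcher.knnMatch(sample_desc, template_desc, k=2)

# Lowe-style ratio test: keep a match only if it is clearly better than the runner-up
good_matches = [pair[0] for pair in knn_matches
                if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
```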
The second important aspect of keypoint matching is the ability to infer a 3D transformation (“homography”) from the sample die image to the template die image. For a “correct” match, the homography would solve using many of the matched keypoints and result in a mostly rotational (around the Z axis) transformation. For an “incorrect” match, the homography solution would likely need to discard many points and would result in a transformation with a lot of skew or rotations around the X and Y axes.
While I was unable to reliably determine if the transformation was mostly Z-axis rotation based on the homography values themselves, judging the quality of the output homography based on how many keypoints were used ended up being a decent metric. The ratio of good matches (those passing the 75% distance-ratio test) or used matches (those kept by the homography) to total keypoints available was used as a scoring metric in the final matching algorithm. The ratios were normalized so that they could score from zero to one.
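Put together, the homography fit and scoring could look roughly like this (the equal weighting of the two ratios is an assumption for illustration, not necessarily what the notebook does):

```python
import numpy as np

src_pts = np.float32([sample_kp[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
dst_pts = np.float32([template_kp[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)

# A homography needs at least four point pairs; RANSAC marks the inliers it kept
if len(good_matches) >= 4:
    H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    used = int(inlier_mask.sum()) if inlier_mask is not None else 0
else:
    H, used = None, 0

total_kp = max(len(sample_kp), 1)
good_ratio = len(good_matches) / total_kp    # matches passing the ratio test
used_ratio = used / total_kp                 # matches used by the homography
score = 0.5 * good_ratio + 0.5 * used_ratio  # assumed equal weighting
```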
There are some obvious problems with this approach (see example above where a 2 was misidentified as a 1 because of the 3 on the side looking like an 8). The normalization does not do a good job of indicating when the algorithm thinks there are no good results because very few keypoints were matched (which would be my preference compared to just returning a “best” match that is not “good”). Without doing a better job generating only meaningful keypoints (a combination of better lighting and better algorithm selection) I don’t think I can assign a clear threshold to these metrics to avoid the normalization.
There are also complexities I could add to the scoring algorithm that would improve its performance, most of which have already been suggested in the previous research. Scoring matches near the center of the die higher than those near the sides of the die would help solve the above problem of over-matching the edge faces. Checking for skew in the image would have helped reject some of the false matches as well. There is also a built-in template matching function in OpenCV that I have not explored which might be more appropriate for this use case.
I will be moving forward with a more robust die-rolling mechanism with improved lighting so that I can continue to refine the algorithm and approach the kind of reliability I need to justify a more complex die-rolling machine.