There are plenty of examples online, especially in academic content, of people trying to detect the presence and values of game dice. Classic six-sided dice (with “pips” or “dots” instead of Arabic numerals) are the most common target because they are easier to evaluate (just count the dots!) and are more familiar to the broader community. Detecting the presence, position, and rank of RPG dice is not uncommon, with people employing both classic computer vision algorithms and machine learning algorithms to achieve their goals. Some people have even built physical machines to generate the image data while other people use existing datasets.
![](https://burksbuilds.com/wp-content/uploads/2024/06/opencv-vs-neural-networks.png)
Dice Rolling Machines
There are already some integrated machines that can do both dice rolling and identification, although none that meet my full specifications. Regardless, there is plenty to learn from the people who have worked on similar problems.
Shoot To Win Craps
My first encounter with automated dice rolling and summing in the wild was on a trip to Vegas almost 10 years ago. The only money I won on the trip was when I played Shoot to Win Craps, an automated craps “table” that lets you and seven friends bet on a touch screen while large physical dice are rolled in the center of the machine. The “shooter” for the round gets to launch the dice extra high by pressing a ~big red blinking button~ at their station, which mimics the tactile interaction requirement of the dice roller I want to make quite well!
![](https://burksbuilds.com/wp-content/uploads/2024/06/shoot-to-win-craps-853x1024.png)
While I don’t know for sure if this machine uses vision to determine the value of the dice rolls (or perhaps even forces the result of the dice like a slot machine) the players do all get to see a top-down camera view of the dice result. Obviously the system is reliable enough to pass the strict regulations of gaming commissions, which gives me hope that I can get a reliable enough outcome from my own project.
Dice-o-Matic
Even though the vision system on the Dice-o-Matic revolves around counting dots on six-sided dice, its mind-blowing scale (over 1 million dice rolls per day!) makes it worth a mention. The unique thing about this machine is how it combines the dice reading and dice recirculation systems, allowing a camera to capture an image of the dice as they cling to the dice elevator.
Dice Roller Cup
The most straightforward system I have seen for rolling and reading the value of a 20 sided die is the Dice Roller by Mark Ficker. This simple system consists of a cup on a servo that is rotated sideways to reset, then flicked upright to roll. A camera and light activate to capture an image of the single d20 in the cup.
Mark used this machine to do some very thorough testing of the true randomness of a variety of dice brands, and the volume of data he collected speaks well for the reliability of his machine and algorithm. Mark is kind enough to include details on his approach to the machine vision:
- Find the die by subtracting the reference image of an empty cup from the current image of a rolled die in a cup
- Crop out the giant blob representing the die found using the above process
- Use OpenCV feature matching to compare the cropped images to each other
- Cluster the images by feature similarity into a reasonable number of groups each with (presumably) matching face values
This clustering approach works well when all of the images are processed as a batch after the fact, but it is easy to imagine how a group of pre-clustered template images for each die face would be better than using just a single template image for evaluation.
To top it all off, Mark has put his code on github for anyone to use as a reference! Hats off to Mark for creating and documenting such a successful project.
Dice Identification Systems
Computer Vision
Besides the work done by Mark Ficker on the Dice Roller cup, Romain D. has a similar OpenCV feature/template matching approach in his Autodice project on github. While Romain only uses one template image per die face, the novel aspect of his approach is to weight the value of a match higher if the keypoints have a similar radius and are located closer to the center of the die. This technique does a good job at rejecting false matches between two dice that are one face away from matching (for example, matching an image with a 1 facing up against an image with a 19 facing up and a 1 facing nearly up on an adjacent face).
![](https://burksbuilds.com/wp-content/uploads/2024/06/image-8-1024x626.png)
This contrasts with Mark’s feature matching approach from the Dice Roller Cup, which tried to solve the adjacent-face problem by putting a limit on the amount of distortion it would take to transform the test image to match the reference image. Mark’s assumption was that a “true” match would involve a homography that was primarily rotation and translation, with very minimal scaling or distortion, while a “false” (adjacent face) match would involve some significant, detectable distortion. The Dice Roller Cup matching algorithm simply set a threshold on the homography distortion in order to consider the two images a match. Mark also put a threshold on the ratio of keypoints found between the two images, which appears to help avoid false matches to a subset of the characters on a face (i.e., a “1” matching a “12”).
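One way to implement that kind of distortion threshold is to check how far the homography's upper-left 2×2 block deviates from a pure rotation, via its singular values. This is a sketch of the general idea rather than Mark's exact test; the function name and `max_scale_dev` tolerance are my own placeholders:

```python
import numpy as np

def is_rigid_match(H, max_scale_dev=0.15):
    """Judge whether a 3x3 homography is close to a rigid transform
    (rotation + translation) by checking the singular values of its
    upper-left 2x2 block; both should be near 1 for a 'true' match."""
    A = np.asarray(H, dtype=float)[:2, :2]
    s = np.linalg.svd(A, compute_uv=False)
    return abs(s[0] - 1) < max_scale_dev and abs(s[1] - 1) < max_scale_dev
```

A pure rotation plus translation passes this test; a match that requires significant scaling or shear fails it.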
DC Mouser has an excellent whitepaper on the end-to-end process needed for automated dice identification. Section 4 repeats many of the challenges and proposed solutions seen in other projects (such as scoring features in the center of the die higher than the edges and rejecting scale transformations when matching images), but one new insight is that template or captured images can be modified to try and remove any contours on the primary die face that are not a part of the numeric symbol (such as glare, texture, edges, etc). This might make the comparisons between dice more resilient to rotation or location specific noise related to cameras and lighting. While they published plenty of code to github to support the whitepaper, it is unclear what state the code is in and it appears to deviate from the whitepaper process in some areas.
For my use case, an additional feature that I need in an identification system is a way to segregate multiple dice that are visible in the image together. While I could not find anyone doing this with traditional machine vision and RPG dice, there are a few examples that use six sided dice with pips.
![](https://burksbuilds.com/wp-content/uploads/2024/06/diceomatic-image-processing.gif)
The Dice-o-Matic solved this problem by using dice where the pip color corresponded with the die face, so that only the pips needed to be found and isolated within the image, and the pips could simply be counted by color. For example, because all of the “twos” die faces had a pair of yellow pips, if six yellow pips were found the machine knew that three dice were rolled with the value of two (6 / 2 = 3). The color trick is pretty handy, but I imagine it would not work well on a 20-sided die, even if a custom die were made with 20 distinct colors. However, the color trick would probably work well for isolating the different ranks of dice from each other.
Quentin Golsteyn wrote an article on how he used vision to segregate, identify, and sum the values of multiple six-sided pip dice. In his project he also started by finding all of the pips, but then used density based clustering (DBSCAN) of the pips to group them into distinct dice. While the pips in his example have a distinct color from the dice and background, he finds them with a simple blob detector based on an inertia (aspect ratio) filter that is good at isolating circles. This seemed to work well for him, but may not be directly transferable to dice with Arabic numerals on them instead of pips, especially when the markings on adjacent faces are visible in the image too.
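Quentin's pip-grouping step can be sketched with scikit-learn's DBSCAN. This is an illustration of the clustering idea, not his code; the `eps` value is a made-up pixel distance that would need tuning so it exceeds the pip spacing on one die but not the gap between dice:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def group_pips(pip_centers, eps=40):
    """Group detected pip centers into dice via density-based clustering,
    then read each die's value as its pip count."""
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(np.asarray(pip_centers))
    dice = {}
    for label, center in zip(labels, pip_centers):
        dice.setdefault(label, []).append(center)
    # For pip dice, the value of each cluster is simply its pip count
    return [len(pips) for pips in dice.values()]
```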
One machine vision approach that I did not see used to segment multiple dice in an image is distance transform and watershed processing. The distance transform helps highlight the distinct centers of multiple overlapping roughly convex binary shapes, especially when they are of similar size. The watershed algorithm essentially flood fills away from the markers (distinct sources), simultaneously, at a speed inversely proportional to the gradient of the image. This results in the boundary of two adjacent markers naturally occurring at areas of high contrast.
![](https://burksbuilds.com/wp-content/uploads/2024/06/image-9-1024x420.png)
An example with playing cards does a good job highlighting how even non-spherical shapes with contained symbols can be used with this algorithm, although there were some obvious imperfections related to heavily obscured cards that were not caught by the thresholding. Further processing on the distance transform result (potentially contour finding?) would likely improve the marker identification success. Processing the original image (potentially removing internal contours?) may have improved the watershed’s ability to successfully pick out the card boundaries as region boundaries.
![](https://burksbuilds.com/wp-content/uploads/2024/06/image-11.png)
Machine Learning
Machine learning seems like magic, and it is hard for me to let go of the (illusion of) “control” that a deterministic classical computer vision algorithm can give you. However, there is a large body of work already done by smart people to identify dice with machine learning models that seem to work very well, though perhaps not perfectly.
Ignacio Pascual has a simple proof of concept that combines a Haar Cascade Classifier to segregate the dice with a Convolutional Neural Network to identify the value of each die. I can’t tell for sure, but I think the training set for the cascade classifier was manually cropped, or built from datasets of single dice. It also appears that a distinct CNN model was created for each die rank.
![](https://burksbuilds.com/wp-content/uploads/2024/06/ignacio-image-detection.png)
A group of researchers from Western Kentucky University took this in a different direction and made one large CNN model to identify both the rank and value of a die simultaneously. Their paper shows that they achieved accuracy of around 95%, which is definitely not sufficient for a single D&D session. It also only worked on pre-cropped images of dice.
However, this paper led me to a few online datasets of RPG dice which have other people’s software linked to them! There are definitely plenty of people working to train models to detect dice rank on the Kaggle dataset, as well as the Roboflow dataset, but because these datasets come with pre-cropped images they only work on images of dice that have already been segregated from a more complex roll.
Nell Bylen has posted to his github a very thoroughly documented approach to using TensorFlow to train a model for detecting die rolls. He does a good job outlining the difference between image classification (determining rank and value from a cropped picture of a single die) and image detection/localization (finding and segregating each die in a roll). If I ever need to go down this rabbit hole his writeup will be invaluable in setting up the training infrastructure!
![](https://burksbuilds.com/wp-content/uploads/2024/06/nell-image-recognition.gif)
Favored Approach
I think that my problem is best broken down into two parts: segregating and cropping individual dice from a roll, then evaluating the rank and value of the cropped die image.
Dice Segregation
The simplest form of this problem is automatically cropping template images of a single known die (rank and value) on a plain background. The largest blob in view is very likely to be the single die in the frame. Many projects in my research used this approach successfully.
The harder form of this problem is cropping images of potentially occluded identical dice from a larger roll of multiple dice. While the machine learning models I found were imperfect at determining the rank and value of dice, I found no reason to believe that they could not determine the presence or location of dice in an image. Because I have complete control over the dice I am using, “overtraining” an algorithm to work on my particular dice may be beneficial.
However, the deterministic nature of the distance transform and watershed algorithm approach draws me in, especially because I haven’t seen anyone attempt this method on dice so far despite its apparent applicability. The watershed component may not be necessary because I really only need to locate the centroid of each die in order to crop and prepare it for identification. The examples online make this seem much easier to set up than a custom training workflow, so it is the obvious place to start.
Dice Identification
Using colors to help filter some aspect of a vision target appears to be a very useful trick, and I think distinguishing the rank of a die based on color is the easiest place to implement it. This circumvents the need to use an imperfect machine learning model to distinguish rank. Also, RPG dice are readily available in a wide range of colors, including sets where each rank of die is a different color, so customizing the equipment to make use of the color trick won’t be necessary.
Once a cropped image has been assigned a rank, I think the most reliable way to determine the value is to use a template feature matching approach similar to many others who have already done this. Because each candidate die image is compared and scored against all templates, it will be easier to adjust the thresholds of certainty and avoid cases where an overconfident machine calculates an incorrect result (in my use case it is far preferable to admit that human intervention is required than to generate an incorrect result).
There are plenty of great “tricks” already discovered to help make the matching more successful:
- Preparing more than one dissimilar template per die face to increase the odds of a high scoring match
- Penalizing matches that involve a large amount of distortion or scaling while preferring matches that only require rotation and some slight translation and scaling
- Weighting feature matches more heavily if they occur towards the center of the die
- Preferring matches where both images have a similar number of total features to avoid matching subsets of two-character numbers
- Removing some small features from the templates that are indicative of glare or shadowing that might be orientation specific
Potential Pitfalls
Segregating adjacent, overlapping dice of the same rank will be the biggest challenge of the project. The distance transform and watershed method ignores all of the context related to visible faces and numeric features that help humans visually segment dice from each other. Falling back on some kind of machine learning algorithm, if only for dice segmentation, seems like a reasonable backup plan.
All of the classical machine vision processing, especially a ton of template matching, will take a lot of compute power during operation. To meet the requirements of the system, the entire vision workflow likely needs to occur in under a second. I may be able to meet this requirement on my giant engineering desktop computer, but doing the same processing quickly on a mini PC or embedded computer could be impossible. As a fallback option, I can use my desktop to generate a very large auto-labeled dataset of my specific dice in my specific environment, then (over?) train a machine learning algorithm to determine rank and value of dice more reliably than the generic versions that used more variant datasets.