Lucas Beyer, Vision Transformer researcher, details his 2013 experiments using a von-Mises loss to predict continuous angles from discrete labels.
The approach reduced angular regression error to 29.4 degrees.

The first big insight is that, as we should all know, linear regression is doing MLE with a Gaussian. I learned that there exists such a thing as a Gaussian defined on the circle: the von-Mises distribution. So I turned it into a loss, and yay, massive improvements!
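A minimal sketch of what such a loss can look like, assuming angles in degrees and a fixed concentration `kappa` (both are assumptions for illustration, and the function names are hypothetical). The bounded form `1 - exp(kappa * (cos(d) - 1))` is one way to turn the von-Mises density into a loss; the paper's exact form may differ in detail. The point is that the loss only sees the angular difference through a cosine, so 359° and 1° count as close, which plain squared error gets badly wrong.

```python
import math

def von_mises_loss(pred_deg, target_deg, kappa=1.0):
    """Bounded von-Mises-style loss in [0, 1).

    kappa is an assumed, fixed concentration parameter (hypothetical default).
    """
    diff = math.radians(pred_deg - target_deg)
    return 1.0 - math.exp(kappa * (math.cos(diff) - 1.0))

def squared_error(pred_deg, target_deg):
    """Plain regression loss on the raw angle, for comparison."""
    return (pred_deg - target_deg) ** 2

# 359 and 1 degrees are only 2 degrees apart on the circle.
print("von Mises:", von_mises_loss(359.0, 1.0))
print("squared  :", squared_error(359.0, 1.0))
```

The squared error treats the wrap-around as a 358° mistake, while the circular loss gives the same value as for any other 2° error.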

The second big insight is that the "output space" for the model, a single number, is awkward. From my time doing 3D graphics, I know that Quaternions (4-number vectors) are so much better for representing angles in 3D space. So I made a "2D quaternion", which I naturally call a "Biternion", and used that for the output parametrization. Oh wow, another huge gain from giving the model a more natural output space!
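A minimal sketch of the idea, assuming a Biternion is the unit vector (cos θ, sin θ) and that the raw 2-number network output is L2-normalized onto the unit circle. The helper names are hypothetical, and the cosine loss shown is just one natural choice for this parametrization.

```python
import math

def angle_to_biternion(deg):
    """Encode an angle in degrees as a point on the unit circle."""
    rad = math.radians(deg)
    return (math.cos(rad), math.sin(rad))

def normalize(raw):
    """Project a raw 2-number output onto the unit circle (raw must be non-zero)."""
    x, y = raw
    norm = math.hypot(x, y)
    return (x / norm, y / norm)

def biternion_to_angle(b):
    """Decode a biternion back to an angle in [0, 360) degrees."""
    return math.degrees(math.atan2(b[1], b[0])) % 360.0

def cosine_loss(pred, target):
    """1 - dot product; 0 when the two unit vectors coincide, 2 when opposite."""
    return 1.0 - (pred[0] * target[0] + pred[1] * target[1])

pred = normalize((3.0, 4.0))          # stand-in for a raw network output
target = angle_to_biternion(350.0)
print(biternion_to_angle(pred), cosine_loss(pred, target))
```

With this encoding there is no discontinuity at 0°/360°: nearby angles always map to nearby output vectors.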
However, because of the robot's first-person perspective, this dataset/model was useless, and it was the only existing dataset. Luckily, this was a large collaboration project across many universities, so each time we got together, I grabbed each of my colleagues and made them walk around in circles in front of the robot to record data. I then played around with different (self-made) UIs to annotate the data (myself). But everything I could make was tedious, slow, and imprecise. I am and have always been a lazy bum. No way I'd spend half my PhD annotating this, just to throw it all in the bin when we decide to put the robot's camera elsewhere. I had to come up with something scalable.

Now it turns out that training with these coarse labels as targets allows the model to predict continuous angles! First, a qualitative example in the pic below: a video of me (a held-out person; the model has never seen me) turning in circles, generating a ~smooth two-circle prediction. That this works (continuous prediction from discrete-label training) is not a coincidence. It's due to four factors coming together:
- The smoothness of CNNs
- The smoothness of the Biternion output space
- The smoothness of the von-Mises loss
- The data being (forcibly/naturally) noisy at the "borders" of the discrete classes.
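The last factor can be illustrated with a toy calculation. This is only a sketch under assumptions: the bin centers 22.5° and 67.5° are hypothetical, the plain cosine loss `1 - cos(d)` stands in for the actual training loss, and a grid search stands in for gradient descent. If near-identical poses at a class border get labelled sometimes as one bin and sometimes as the other, the prediction that minimizes the average loss lies between the two bin centers instead of snapping to either.

```python
import math

def mean_cosine_loss(pred_deg, label_degs):
    """Average of 1 - cos(pred - label) over all (noisy) labels."""
    total = sum(1.0 - math.cos(math.radians(pred_deg - t)) for t in label_degs)
    return total / len(label_degs)

def best_prediction(label_degs, step=0.5):
    """Grid-search the angle in [0, 360) that minimizes the average loss."""
    candidates = [i * step for i in range(int(360 / step))]
    return min(candidates, key=lambda p: mean_cosine_loss(p, label_degs))

# Exactly on the border: half the labels say one bin, half the other.
print(best_prediction([22.5, 67.5]))
# Closer to the first bin: three of four labels agree on it.
print(best_prediction([22.5, 22.5, 22.5, 67.5]))
```

The more the label noise leans toward one bin, the closer the optimum moves to that bin's center, which is how discrete labels can still carry a continuous signal.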

After some thinking and prototyping, the only way I'd tolerate annotating that data myself was to dump individual frames/head-crops, then range-select them and classify them into quadrants. This is fast but coarse. Since the images are dumps from videos, there's continuity, and I could average something like 5 img/sec. I did it twice: once for front/left/right/back, and once again for front-left/front-right/back-left/back-right. Then we can merge these to get an 8-bin orientation. Turns out @PINTO03091, probably the most GOATED labeller in history, converged to the same schema, which is what reminded me of my work yesterday.
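The merge of the two labelling passes can be sketched as follows. The angle convention is an assumption made for illustration (front = 0°, left = 90°, back = 180°, right = 270°), as are the names `PASS1`, `PASS2`, and `merge_labels`. Each pass defines 90° sectors, the second pass is rotated by 45° against the first, and the overlap of one sector from each pass is a 45° bin.

```python
# Assumed sector centers in degrees (hypothetical convention).
PASS1 = {"front": 0.0, "left": 90.0, "back": 180.0, "right": 270.0}
PASS2 = {"front-left": 45.0, "back-left": 135.0,
         "back-right": 225.0, "front-right": 315.0}

def merge_labels(coarse, diagonal):
    """Return the center (degrees) of the 45-degree bin where both labels overlap."""
    c1, c2 = PASS1[coarse], PASS2[diagonal]
    # Signed circular difference between the two sector centers, in [-180, 180).
    diff = (c2 - c1 + 180.0) % 360.0 - 180.0
    if abs(diff) != 45.0:
        raise ValueError(f"inconsistent labels: {coarse!r} and {diagonal!r}")
    # The overlap is centered halfway between the two sector centers.
    return (c1 + diff / 2.0) % 360.0

print(merge_labels("front", "front-left"))
print(merge_labels("front", "front-right"))
```

Label pairs whose sectors do not overlap (for example "front" with "back-left") are rejected, which doubles as a cheap consistency check on the annotation.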
I never put the paper on arXiv because my plots were too big and I hit arXiv's size limit, and I have since lost the source.
Paper: https://lucasb.eyer.be/academic/biternions/biternions_gcpr15.pdf
Video: https://www.youtube.com/watch?v=5Kbsx7CWxIA
Code: http://github.com/lucasb-eyer/BiternionNet
And I decided to make the slides public today: https://docs.google.com/presentation/d/15US8duAtU1dfWh0YMMaIemir33xYKdQECQgq9DoGLS8/edit?usp=sharing
Thanks a ton to @PINTO03091 for reminding me of this, and appreciating my old work, it warms my heart :)
PS: Sadly, all the data I collected and labeled was never published and was destroyed at the end of the project, as per the regulations of the project grants, because it's (consenting) people's faces.
A more "quantitative" evaluation and ablation showing that this actually works. This is an "angle heatmap" of predictions. The "simple" but non-smooth thing to do with such discrete data would be softmax classification, then using the probabilities to interpolate into a continuous output. As you can see, that doesn't really become smooth, largely because softmax+DL is notoriously over-confident.

Compare that to what you get with the exact same data/labels but using the smooth outputs (Biternion+vonMises): it's night and day. Making the model output/loss naturally fit your problem space is a game-changer.
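A rough sketch of the softmax-interpolation baseline, under assumptions: the thread does not say how the probabilities are interpolated, so a probability-weighted circular mean over assumed 8-bin centers is used here, and all names are hypothetical. It also shows why over-confidence hurts: once one class dominates, the interpolated angle collapses onto that bin's center.

```python
import math

# Assumed centers of eight 45-degree bins (hypothetical convention).
BIN_CENTERS = [22.5 + 45.0 * i for i in range(8)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate_angle(probs, centers=BIN_CENTERS):
    """Probability-weighted circular mean of the bin centers, in [0, 360)."""
    x = sum(p * math.cos(math.radians(c)) for p, c in zip(probs, centers))
    y = sum(p * math.sin(math.radians(c)) for p, c in zip(probs, centers))
    return math.degrees(math.atan2(y, x)) % 360.0

# Well-calibrated 50/50 split between the first two bins: lands between them.
print(interpolate_angle([0.5, 0.5, 0, 0, 0, 0, 0, 0]))
# Over-confident logits: the output sticks to the first bin's center.
print(interpolate_angle(softmax([20.0, 10.0, 0, 0, 0, 0, 0, 0])))
```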
This is about this:
If you're doing AI research at all, I recommend doing the "ETH Zurich" route: train models that use a single GPU. Make sure that it takes less than a minute to train models. Pufferlib is a great example. The more models you train, the more you learn.
magic





