Lucas Beyer, Vision Transformer researcher, details his 2013 experiments using von-Mises loss to predict continuous angles from discrete labels · Digg

Posts from X

Lucas Beyer (bl16)@giffmana

Now it turns out that training with these coarse labels as targets allows the model to predict a continuous angle! First, a qualitative example in the pic below: a video of me (a held-out person; it's never seen me) turning in circles, generating a ~smooth two-circle prediction.

That this works (continuous prediction from discrete label training) is not a coincidence. It's due to four factors coming together:
- The smoothness of CNNs
- The smoothness of Biternion output space
- The smoothness of von-Mises loss
- The data being (forcibly/naturally) noisy at the "borders" of the discrete classes.
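The last factor can be illustrated with a small numeric sketch. It assumes a cosine-based cost of the form -cos(prediction - label), which is the angle-dependent part of a von Mises negative log-likelihood; the bin centers and label fractions below are made up for illustration. When frames near a class border are labeled inconsistently, the angle that minimizes the average cost lies between the two bin centers, so the best prediction moves smoothly as the label mix changes.

```python
import numpy as np

def best_angle_deg(label_angles_deg, weights):
    """Angle minimizing the weighted sum of -cos(theta - t_i) over labels t_i.

    The minimizer is the direction of the weighted sum of label unit vectors.
    """
    t = np.deg2rad(np.asarray(label_angles_deg, dtype=float))
    w = np.asarray(weights, dtype=float)
    x = np.sum(w * np.cos(t))
    y = np.sum(w * np.sin(t))
    return float(np.rad2deg(np.arctan2(y, x)) % 360.0)

# Hypothetical centers of two adjacent bins in an 8-bin labeling.
bin_a, bin_b = 22.5, 67.5

# frac_b: fraction of near-border frames that ended up labeled as bin_b.
for frac_b in (0.0, 0.25, 0.5, 0.75, 1.0):
    angle = best_angle_deg([bin_a, bin_b], [1.0 - frac_b, frac_b])
    print(f"frac_b={frac_b:.2f} -> best angle {angle:.1f} deg")
```

With an even split the optimum sits exactly on the border between the two bins, which is the behavior one would want from a continuous predictor.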

Lucas Beyer (bl16)@giffmana

After some thinking and prototyping, the only way I'd tolerate annotating that data myself is to dump individual frames/head-crops, then range-select them and classify them into quadrants.

This is fast but coarse. Since images are dumps from videos, there's continuity and I could average out like 5 img/sec or so.

I did it twice: once for front/left/right/back. And once again for front-left/front-right/back-left/back-right. Then we can merge this to get 8-bin orientation.

Turns out @PINTO03091, probably the most GOATED labeller in history, converged to the same schema, which is what reminded me of my work yesterday.
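The merge of the two quadrant labelings into 8 bins can be sketched as follows. The angle convention (0 degrees = front, angles increasing toward the left) and all names here are assumptions for illustration, not taken from the original code. Each labeling covers 90 degrees per class, the second one is offset by 45 degrees, and a consistent pair of labels overlaps in exactly one 45-degree bin.

```python
import math

# Assumed convention (not from the source): 0 deg = front, angles increase
# counter-clockwise, so "left" sits at 90 deg.
AXIS_CENTERS = {"front": 0.0, "left": 90.0, "back": 180.0, "right": 270.0}
DIAG_CENTERS = {"front-left": 45.0, "back-left": 135.0,
                "back-right": 225.0, "front-right": 315.0}

def merge_labels(axis_label, diag_label):
    """Return the center (in degrees) of the 45-degree bin shared by both labels."""
    a = math.radians(AXIS_CENTERS[axis_label])
    d = math.radians(DIAG_CENTERS[diag_label])
    # Signed difference wrapped to (-180, 180]; consistent pairs are 45 deg apart.
    diff = math.degrees(math.atan2(math.sin(a - d), math.cos(a - d)))
    if abs(abs(diff) - 45.0) > 1e-6:
        raise ValueError(f"inconsistent labels: {axis_label!r} and {diag_label!r}")
    # The circular mean of the two quadrant centers is the center of the overlap.
    mean = math.atan2(math.sin(a) + math.sin(d), math.cos(a) + math.cos(d))
    return math.degrees(mean) % 360.0

print(merge_labels("front", "front-left"))
print(merge_labels("front", "front-right"))
```

Raising on inconsistent pairs (for example "front" with "back-left") is one possible design choice; such frames could just as well be dropped or re-labeled.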

Lucas Beyer (bl16)@giffmana

I never put the paper on arXiv because my plots were too big, I hit arXiv's limit, and I've since lost the source.

Paper: https://lucasb.eyer.be/academic/biternions/biternions_gcpr15.pdf
Video: https://www.youtube.com/watch?v=5Kbsx7CWxIA
Code: http://github.com/lucasb-eyer/BiternionNet

And I decided to make the slides public today: https://docs.google.com/presentation/d/15US8duAtU1dfWh0YMMaIemir33xYKdQECQgq9DoGLS8/edit?usp=sharing

Thanks a ton to @PINTO03091 for reminding me of this, and appreciating my old work, it warms my heart :)

PS: All the data I collected and labeled was never published, and was destroyed at the end of the project, sadly, as per regulations of the project grants, because it's (consenting) people's faces.

Lucas Beyer (bl16)@giffmana

Compare that to what you get with the exact same data/labels, but using the smooth outputs (Biternion + von Mises): it's night and day.

Making the model output/loss naturally fit your problem-space is a game-changer.

Lucas Beyer (bl16)@giffmana

This is about this:

kache@yacineMTB

if you're doing AI research at all; I recommend doing the "ETH zurich" route

Train models that use a single GPU. Make sure that it takes less than a minute to train models. Pufferlib is a great example.

The more models you train the more you learn

Lucas Beyer (bl16)@giffmana

The second big insight is that the "output space" for the model, a single number, is awkward.

From my time doing 3D graphics, I know that Quaternions (4-number vectors) are so much better for representing angles in 3D space. So I made a "2D quaternion" which I naturally call a "Biternion", and use that for the output parametrization.

Oh wow, another huge gain by giving the model a more natural output space!
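A minimal sketch of such an output parametrization, assuming a Biternion is the unit 2-vector (cos θ, sin θ); the function names are hypothetical and not taken from the linked repository. The point of the representation is that 359° and 1° map to nearby points on the circle, whereas as single numbers they look far apart.

```python
import numpy as np

def angle_to_biternion(theta_rad):
    """Encode an angle (radians) as the unit 2-vector (cos, sin)."""
    theta = np.asarray(theta_rad, dtype=float)
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)

def normalize(raw, eps=1e-12):
    """Project raw 2-vectors (e.g. the output of a linear layer) onto the unit circle."""
    raw = np.asarray(raw, dtype=float)
    norm = np.linalg.norm(raw, axis=-1, keepdims=True)
    return raw / np.maximum(norm, eps)

def biternion_to_angle(b):
    """Decode a 2-vector back into an angle in (-pi, pi]."""
    b = np.asarray(b, dtype=float)
    return np.arctan2(b[..., 1], b[..., 0])

def cos_angle_diff(b_pred, b_true):
    """cos(pred - true) as a plain dot product; no wrap-around handling needed."""
    return np.sum(np.asarray(b_pred) * np.asarray(b_true), axis=-1)

a = angle_to_biternion(np.deg2rad(359.0))
b = angle_to_biternion(np.deg2rad(1.0))
print(cos_angle_diff(a, b))  # close to 1, since the two angles are only 2 degrees apart
```

Because the cosine of the angular difference is just a dot product of two unit vectors, a cosine-based loss can be computed directly on the network outputs without ever converting back to an angle.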

Lucas Beyer (bl16)@giffmana

The first big insight is that, as we should all know, linear regression is doing MLE with a Gaussian. I learned that there exists such a thing as a Gaussian defined on the circle: the von Mises distribution. So I turn it into a loss, and yay, massive improvements!
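As a rough sketch of what "turning it into a loss" can look like: the von Mises density is exp(κ cos(θ − μ)) / (2π I₀(κ)), so its negative log-likelihood is −κ cos(θ − μ) plus a constant. The code below implements that textbook form with a fixed concentration κ; the exact cost used in the paper may differ, so treat the function name and the choice of κ as assumptions.

```python
import numpy as np

def von_mises_nll(theta_pred, theta_true, kappa=1.0):
    """Negative log-likelihood of theta_true under a von Mises centered at theta_pred.

    Angles are in radians; kappa is the concentration (larger = narrower peak).
    """
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))  # np.i0: modified Bessel function, order 0
    return -kappa * np.cos(theta_pred - theta_true) + log_norm

# The loss depends only on the angular difference, so it is periodic:
# predicting 359 deg or 3 deg for a 1 deg target costs the same.
target = np.deg2rad(1.0)
print(von_mises_nll(np.deg2rad(359.0), target))
print(von_mises_nll(np.deg2rad(3.0), target))

# For small errors d, the loss grows like kappa * d**2 / 2, i.e. like the
# squared error that a Gaussian likelihood gives in linear regression.
d = 0.01
print(von_mises_nll(d, 0.0) - von_mises_nll(0.0, 0.0), 0.5 * d**2)
```

A plain squared error on the raw angle would instead treat 359° versus 1° as a 358° mistake, which is the failure mode the circular distribution avoids.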

Lucas Beyer (bl16)@giffmana

However, because of the robot's first-person perspective, this dataset/model was useless, and it was the only existing dataset.

Luckily this was a large collaboration project across many unis, so each time we got together, I grabbed each of my colleagues and made them walk around in circles in front of the robot to record data.

I then played around with different (self-made) UIs to annotate the data (myself). But everything I could make was tedious, slow, and imprecise. I am and have always been a lazy bum. No way I spend half my PhD annotating this, just to throw it all in the bin when we decide to put the robot's camera elsewhere. I had to come up with something scalable.

Lucas Beyer (bl16)@giffmana

A more "quantitative" evaluation and ablation showing that this actually works.

This is an "angle heatmap" of predictions. The "simple" but non-smooth thing to do with such discrete data would be to do softmax classification, and then use the probabilities to interpolate into a continuous output. As you can see, that doesn't really become smooth, largely because softmax + DL is notoriously over-confident.
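The softmax baseline described above can be sketched like this. The exact interpolation scheme is not specified in the thread, so the probability-weighted circular mean used here is an assumption, as are the bin layout and the made-up logits. The sketch shows why over-confidence hurts: once nearly all probability mass sits on one class, the interpolated angle snaps to that bin's center.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpolate_angle_deg(probs, bin_centers_deg):
    """Probability-weighted circular mean of the bin centers, in [0, 360)."""
    c = np.deg2rad(np.asarray(bin_centers_deg, dtype=float))
    x = np.sum(probs * np.cos(c), axis=-1)
    y = np.sum(probs * np.sin(c), axis=-1)
    return np.rad2deg(np.arctan2(y, x)) % 360.0

# Hypothetical 8-bin layout with centers at 22.5, 67.5, ..., 337.5 degrees.
BIN_CENTERS = np.arange(8) * 45.0 + 22.5

# Made-up logits: one prediction split evenly between bins 0 and 1, and one
# over-confident prediction with almost all mass on bin 0.
hedged = softmax([2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
confident = softmax([12.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

print(interpolate_angle_deg(hedged, BIN_CENTERS))     # lands between the two bins
print(interpolate_angle_deg(confident, BIN_CENTERS))  # stays very close to bin 0's center
```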

Jake@Jakebgo

@willdepue is this for steering video gen?
