Samurai Swords: A Bayesian Perspective

A classic Japanese Katana, with a blade well under an inch thick, has over 2,000 hand-folded layers of steel. To put this into context, if you fold a sheet of paper 15 times, it will reach a height of about 3 meters, or, in other words, Shaq with about 3 burritos on his head. The swords were so powerful that foreigners would often find their blades shattered within seconds of a fight. So I guess the question on your mind is: what the hell does any of this have to do with Bayesian Statistics???

The type of steel used to make a sword defines the sword itself. A number of factors influence the properties of the steel, but the most important is the steel's concentration of Carbon. Steel with a high Carbon concentration is extremely hard, but breaks easily. Steel with a low Carbon concentration is much tougher (it bends rather than shatters), but also much softer. Ancient swordsmiths discovered this long ago, and throughout the world, swords were made with a low-Carbon base and high-Carbon edges.

Low vs High Carbon Concentration

However, Carbon steel doesn't occur frequently in nature, and until the 1800s, forges were not hot enough to heat Iron to the point where it could fuse with Carbon. Fortunately (or unfortunately, depending on your perspective), humans have always been extremely innovative when it comes to building better weapons, so they didn't let a little thing like the laws of physics stop them. Nah. Most of the world found various ways of combining iron with other high-Carbon materials, essentially creating proxies for Carbon Steel.


But the country that would go on to give us Sushi, Pokemon, and Teriyaki Sauce was not happy with that weak solution. And, as they would go on to do with Sushi, Pokemon, and Teriyaki Sauce, the Japanese managed to create something everyone else thought impossible: they created Carbon Steel.

They began like many other countries: by creating a proxy for Carbon Steel. In their case, they infused the Iron with Carbon by heating it with Charcoal. And, like most easy solutions, this alone doesn't work: the forge isn't hot enough for the Carbon to truly fuse with the Iron, so the result is still just a cheap proxy for Carbon Steel.

Which brings us to the folding. At this point, Japanese swords are no better than any other cheap knockoff (i.e. every other sword in the world). But this is where grit comes in. It's not possible to truly fuse the Carbon and Iron, so the Japanese do the next best thing: they literally fold the steel, over and over again. Each time they fold the steel, the Carbon and Iron atoms essentially undergo a random perturbation. Even though, in theory, the steel only becomes true Carbon Steel as the number of folds approaches infinity, after fold upon fold (and the thousands of layers those folds produce), the atomic structure of the steel is effectively the same as that of Carbon Steel, a material capable of literally slicing through and shattering enemy swords.

 

So back to your first question: what the hell does Japanese Swordsmithing have to do with Bayesian Statistics??

I'm glad you asked :). Well, while I was reading about Japanese sword-crafting methods, I realized that the difference between Japanese Swordsmiths and the rest of the world perfectly demonstrates the difference between Bayesian Statistics and Frequentist Statistics.

 

The Frequentist Machine Learning Approach:

“Yo. Here's an easy, contrived loss function. Now let's optimize this contrived loss function! Woah- found the best point! Booyakasha.”

And that's how the majority of the world built their swords. They essentially developed methods with contrived assumptions that gave decent results. Thus, the atomic arrangements of their steel were the equivalent of local maxima: if you changed them a little, the quality would decrease, but at some other set of atomic positions, reached by some other method, the quality would be much better.

A dangerous attitude when dealing with optimization :/

 

The Bayesian Approach:

The Japanese did something different. Instead of assuming they knew what the optimal combination of elements was, they simply realized that the more times they hammered and folded the steel, the closer the steel got to the optimal, or posterior, distribution of atomic configurations. They effectively ran their steel atoms through a Markov Chain Monte Carlo process (check out this post for more on MCMC), running random perturbations on the atoms that probabilistically approach the optimal distribution. And the swords took much longer to make. But instead of optimizing some contrived convex loss function, the Japanese folded the sword so many times, and ran so many probabilistic perturbations, that the final sword's atomic structure ends up reflecting the entire probabilistic landscape that defines Carbon Steel.
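To make the analogy concrete, here's a minimal sketch of the Metropolis flavor of MCMC, with a toy one-dimensional "landscape" standing in for the steel's atomic configuration. The target distribution, step size, and step count are all illustrative choices of mine, not anything from actual metallurgy:

```python
import numpy as np

# Toy target distribution p(x): an unnormalized Gaussian mixture standing in
# for "how good this atomic configuration is". Purely illustrative.
def target(x):
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * ((x + 2) / 0.7) ** 2)

def metropolis(n_steps=10_000, step_size=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0  # arbitrary starting "configuration"
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.normal(0, step_size)  # a random perturbation (a "fold")
        # Accept the move with probability min(1, p(proposal) / p(x)).
        # Repeat enough times and the samples approach the target distribution.
        if rng.random() < target(proposal) / target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis()
print(f"mean: {samples.mean():.2f}, std: {samples.std():.2f}")
```

Each individual perturbation is random, but the accept/reject rule means the chain as a whole spends its time where the target distribution says it should. That's the folding.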

 

Ensemble Methods:

And today, the top machine learning models are essentially taking tips from Japanese steel. While ML is still a mostly frequentist domain for computational reasons, the unpredictability of Deep Learning models has led to a rise in the popularity of Bayesian Ensemble models. The effect of these ensemble models is that they give a (rough) sample of the optimization space, allowing for a much more probabilistic (kinda) prediction rather than a single point estimate.

 

Demo:

To see this in action, I'll just use a popular ensemble model: a Random Forest, which is essentially a series of Decision Tree models, with each Tree only leveraging a random subset of the features. (Check out this post for more on Random Forests)
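To make that structure concrete, here's a bare-bones sketch of the idea: each tree is trained on a bootstrap sample of the rows and considers only a random subset of the features at each split. In practice you'd just use sklearn's RandomForestClassifier (which is what I do below); this TinyForest class is purely illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class TinyForest:
    """Illustration only: a bare-bones random forest from sklearn decision trees."""

    def __init__(self, n_trees=10, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        # Assumes X, y are NumPy arrays and every bootstrap sample sees all classes.
        n = len(X)
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, n, size=n)  # bootstrap sample of the rows
            tree = DecisionTreeClassifier(max_features="sqrt")  # random feature subset per split
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict_proba(self, X):
        # Average the trees' votes: each tree is one (rough) sample of the model space.
        return np.mean([t.predict_proba(X) for t in self.trees], axis=0)
```

The averaging at the end is the Bayesian-ish part: many rough models, pooled into one distribution over predictions instead of a single estimate.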

I build an artificial dataset of overlapping Gaussian Blobs, just to have a quick and clean dataset that simulates (theoretically) real-world data:
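The exact parameters aren't shown here, but with scikit-learn's make_blobs it might look something like this (the sample count, number of centers, and spread are my guesses, tuned so the blobs overlap):

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Overlapping Gaussian blobs: cluster_std is deliberately large so the
# classes bleed into each other, like (idealized) real-world data.
X, y = make_blobs(n_samples=2000, centers=3, cluster_std=3.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```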

From there, I train a series of Random Forest models using various ensemble sizes:
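Continuing from the dataset above, the sweep might look like this (the exact ensemble sizes aren't shown, so this grid is illustrative):

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

ensemble_sizes = [1, 2, 5, 10, 20, 50, 100, 200]
scores = []
for n in ensemble_sizes:
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    forest.fit(X_train, y_train)
    scores.append(forest.score(X_test, y_test))  # test accuracy per ensemble size

plt.plot(ensemble_sizes, scores, marker="o")
plt.xscale("log")
plt.xlabel("Number of trees in the forest")
plt.ylabel("Test accuracy")
plt.show()
```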

And let’s check out the results 🙂

This plot demonstrates the difference between a Probabilistic Samurai sword and everyone else :). The more accurately you reflect the probabilistic landscape of an optimization space, the more accurate your model will be.

Just thought this was a super cool instance of Bayesian Statistics arising naturally in history and showing, once again, that even though it is a lot of effort, the results speak for themselves.

Check out this link for cool stuff on Samurai Swords. Hope y'all enjoyed.