TLDR: In areas like classical physics where degrees of freedom are low, a comparatively small amount of data can enable you to accurately predict how a system will behave at the macro level. E.g. I don’t need to know the quantum state of every single subatomic particle in a baseball in order to calculate how long it takes to reach the floor if I drop it. I merely need two variables (the downward acceleration caused by Earth’s gravitational field and the height from the ball to the floor), resulting in a simple yet accurate model. In highly complex systems such as business, things like butterfly effects can cause massive distortions, rendering our models flawed. Pretending, as we (correctly) do in physics, that there’s a near-perfect bijection between our simple model and reality often causes major mistakes. We must caution against an over-reliance on numbers and the easily measurable, because for us, the valuable and the (easily) measurable aren’t always synonymous. Until we have better models or are able to collect enough of the right data points (big and minute), it’s often preferable, as well as faster, to simply try things out on a small scale in the real world and use a feedback loop to iterate.
The specific anticipative understanding of the conditions of the uncertain future defies any rules and systematization. It can be neither taught nor learned. If it were different, everybody could embark upon entrepreneurship with the same prospect of success. What distinguishes the successful entrepreneur and promoter from other people is precisely the fact that he does not let himself be guided by what was and is, but arranges his affairs on the ground of his opinion about the future. He sees the past and the present as other people do; but he judges the future in a different way. In his actions he is directed by an opinion about the future which deviates from that held by the crowd.
– Ludwig von Mises (1949).
It used to be a very contrarian thing to say that business plans have virtually no value.
That’s not true anymore.
Most people now agree that business plans are mostly useless.
But if we think about it… that seems extremely counterintuitive, doesn’t it?
We have more ways to capture data, visualize data, and use data to guide decisions.
How on earth is it possible that, with such an abundance of data, we don’t see an extremely high correlation between being highly data-driven (doing years of market research, using complex mathematics to make projections) and a successful business outcome? If anything, why do we see an inverse relationship?
It just doesn’t seem to add up…
I think the answer can be found in long- and medium-range forecasting in meteorology.
…a forecast for 5 days out is typically less reliable than a forecast for the next day. This occurs since small changes and small size phenomena are more likely to influence observed weather events as time advances (Butterfly Effect). It is more difficult to analyze phenomena as the size gets smaller, thus it is difficult to know how the extreme multitude of tiny phenomena will impact observed weather as time moves forward. (Haby, 2019)
In order to perfectly predict the weather, you’d need incredible amounts of data across an overwhelming number of variables, and then you’d somehow have to synthesize all of that into an accurate prediction.
But when you’re dealing with business plans which involve competitors, the pace of innovation, markets (which are made up of many individuals each with their own mind), and are often projecting many, many years into the future… we’re probably dealing with an order of magnitude more difficulty.
It quickly becomes an exercise of the imagination of the author as it starts to resemble a fantasy novel behind a pseudo-scientific facade.
It’s like trying to predict the exact place on the floor a crumpled up piece of paper will land if it’s thrown…
If that piece of paper is a hundred dollar bill…
And it’s placed in the middle of Times Square…
The day before NY’s Eve…
With elementary Newtonian physics, we can just calculate how far a person will throw it and then we’ll have our answer.
Who’s gonna throw a hundred dollar bill away?
Someone will find it and might use it to buy some fireworks off a sketchy dude who only sells snakes and sparklers.
I always laugh at this scene because we, as entrepreneurs, have a tendency to almost fight the market about what they want to buy, because we’re so invested in what we want to sell.
Now our fireworks fella has the $100 bill, gets into a cab, leaves NYC and drops the bill while trying to hand it over.
If we somehow had all the information in the universe, we might be able to calculate the exact place that bill will touch the floor*1.
But we don’t.
And neither does Big Data, but more on that later.
There are just too many degrees of freedom.
Think of degrees of freedom as ‘moving parts’. In statistics, it’s the number of variables that are allowed to vary.
If you have a set of five numbers that must average out to a certain value, then four of those numbers are free to vary but the last one is not, because it’s needed to produce the right average. So you have n-1 = 4 degrees of freedom in this example.
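That n-1 bookkeeping can be sketched in a few lines (the numbers here are hypothetical):

```python
# With the mean fixed, only n-1 of the n values are free to vary.
free_values = [3, 7, 2, 8]   # four values we may choose freely
target_mean = 5              # the average the full set of five must hit
n = len(free_values) + 1

# The last value is forced by the constraint: the total must be n * mean.
last = target_mean * n - sum(free_values)
print(last)                            # 5
print(sum(free_values + [last]) / n)   # 5.0, the required average
```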
In physics (and biomechanics), it’s the minimum number of variables required to completely describe how something can move.
Your arm (excluding your hand) has seven degrees of freedom. An iPhone that’s wirelessly charging has two (x and y axes).
The way I see it, there are two ways to view or use complicated mathematical models on reality, be it in economics or in our entrepreneurial world.
Option 1: Employ those select few people who truly and therefore deeply understand the possibilities and limitations of mathematical tools.
Think of people with Ph.D.’s or more generally, anyone who has a deep understanding from first principles.
Option 2: Have a childlike perspective and ask why a lot. Eventually, you might realize that the people who don’t belong in the first category don’t really understand the foundation.
Therefore, any mistake in the foundation renders everything that’s built on top of it useless.
They have enough understanding to make it seem like they’re knowledgeable but too little to actually understand what they’re doing, resulting in overconfidence in their models.*2
One such mistake can be seen in standard economic theory.
I’ll borrow an example given by Ole Peters (of the London Mathematical Laboratory) which he gives in his lecture, Time for a Change: Introducing irreversible time in economics (Peters, 2012).
Suppose you have a coin and $100 and you’re going to play a game of heads or tails.
If it lands on heads you’ll win 50%, if it lands on tails you’ll lose 40%.
The chance that it lands on heads or tails is P(heads) = P(tails)=0.5, so this seems like a good bet.
Some quick back-of-the-envelope mathematics suggests that the
Expected Value = (win 50% × 0.5 probability) − (lose 40% × 0.5 probability)
= (50% × 0.5) − (40% × 0.5) = 25% − 20% = +5%
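As a quick sanity check, that arithmetic in code:

```python
p_heads = 0.5              # fair coin
gain, loss = 0.50, 0.40    # win 50% of the stake, lose 40% of the stake

expected_value = p_heads * gain - (1 - p_heads) * loss
print(expected_value)  # ~0.05, i.e. +5% per flip in expectation
```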
We’ll likely end up with more money than we started with. Sweet! But before we start picking out brand new cars with our imaginary winnings… let’s double-check to see if the math works out.
If we play enough times, the number of heads will start to roughly equal the number of tails.
Let’s see what happens when we play four times and we get HHTT:
$100*1.5*1.5*0.6*0.6 = $81
Hmm… That’s weird.
If we play 100 times we get $100 × 1.5⁵⁰ × 0.6⁵⁰ ≈ $0.52
In fact, the more we play, the closer we get to zero.
If we play N times, we should get roughly half of that as heads and half as tails: $100 × 1.5^(N/2) × 0.6^(N/2) = $100 × 0.9^(N/2), which shrinks toward zero as N grows.
This implies that if we play long enough we’ll eventually lose all of our money.
This isn’t completely accurate because I make it seem like order doesn’t matter. In pure mathematics it doesn’t, because our capital can keep shrinking without ever hitting zero. In the real world, the smallest unit of money is one cent, so once you lose that last cent there’s no bouncing back. In my example, 50 losses followed by 50 wins gets you to the same outcome as the reverse order, while in reality, starting with $100, you’d drop below one cent after losing 19 times in a row (N > log₀.₆($0.01/$100) = 18.03).
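A minimal simulation of a single player (the function name and parameters are my own) makes the time-perspective decay concrete:

```python
import random

def play_alone(capital=100.0, flips=1000, seed=0):
    """One player repeatedly bets everything: +50% on heads, -40% on tails.
    Stops early once capital drops below one cent (effectively broke)."""
    rng = random.Random(seed)
    for _ in range(flips):
        capital *= 1.5 if rng.random() < 0.5 else 0.6
        if capital < 0.01:
            return 0.0
    return capital

# With zero flips nothing happens; the deterministic HHTT path matches the text.
print(play_alone(flips=0))            # 100.0
print(100 * 1.5 * 1.5 * 0.6 * 0.6)    # ~81
```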
Turns out that if we lose 40% we have $100 × 0.6 = $60. In order to break even, we then need to win at least 66.67%: $60 × 1.666… ≈ $100.
And our imaginary cars?!
Suppose we had a group of 10,000 people, each with $100, playing this game once.
The group then has $1,000,000 collectively.
With the probability of both heads and tails being 50%, suppose 5,000 people win and 5,000 people lose.
That means 5,000 people now have $150 and 5,000 people have $60.
This gives us 5,000 × $150 = $750,000
And 5,000 × $60 = $300,000
Giving us a total of $1,050,000.
And tada… we’ve got our +5% Expected Value back. *3
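The ensemble version can be simulated the same way (names are mine):

```python
import random

def ensemble_average(players=10_000, seed=1):
    """Every player starts with $100 and plays the 50%/40% game once;
    return the group's average capital afterwards."""
    rng = random.Random(seed)
    total = sum(100 * (1.5 if rng.random() < 0.5 else 0.6)
                for _ in range(players))
    return total / players

print(ensemble_average())  # close to 105: the +5% expected value shows up
```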
As it turns out, it’s not just the odds of the bet that matter but also the bet size. Imagine a scenario where if you win, you win 300%, but if you lose, you lose 100%. Betting all your capital is not a good strategy because while you’ll grow fast when you win, all it takes is one loss to lose everything. On the other extreme is betting the smallest amount possible: one cent. Now you won’t lose much, but it’ll take forever to make some serious money. And as a wise woman once said: “Ain’t nobody got time for that!”
So the ideal bet size is somewhere in that $0.01 to $100 range. To determine the percentage of your capital to bet in order to grow as quickly as possible, yet not so aggressively that you lose it all, there’s something called the Kelly Criterion. For this bet it suggests: Kelly % = ((50/100) − ((100−50)/300)) × 100% ≈ 33% of our capital. If you’re interested you can read more about it here.
Quick side note: in our original bet (lose 40%, win 50%), the Kelly Criterion comes out to 25%. Betting all our capital (100%) far exceeds that, which is why we go broke in the time perspective.
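Both numbers drop out of the general Kelly formula for a bet that wins fraction `gain` of the stake with probability p and loses fraction `loss` otherwise (the function name is mine):

```python
def kelly_fraction(p, gain, loss):
    """Optimal fraction of capital to bet when you win `gain` (as a fraction
    of the stake) with probability p and lose `loss` otherwise:
    f* = p/loss - (1 - p)/gain."""
    return p / loss - (1 - p) / gain

print(kelly_fraction(0.5, 3.0, 1.0))   # ~0.33: the +300%/-100% bet
print(kelly_fraction(0.5, 0.5, 0.4))   # 0.25: our original +50%/-40% bet
```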
This is the essence of our little conundrum: winning or losing depends on how you analyze it.
Time: In one situation we increase the number of times one person plays the game to get rid of noise. As we play longer and longer we see that we’re losing money and that this is a bad bet.
Ensemble: In another situation, we increase the number of people playing the game to get rid of noise. And when we take their average, it’s a good game in the collective sense.
The first situation looks like our game. The second situation looks like insurance.
If we were somehow able to travel through the branches of the universe, then it wouldn’t matter if RJ in this universe lost, because RJ in universe 1243 won.
Since that’s obviously not possible (or at least, I don’t have enough in my savings account to afford such a trip), it makes no sense to analyze this game with an ensemble perspective.
So we see that there’s a difference between time averaging and ensemble averaging. This is called non-ergodicity, and it’s obviously very important to us, because most of us don’t care that our country gets wealthier on average if we personally go dead broke.
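The gap between the two perspectives doesn’t even need a simulation; the per-flip growth factors can be computed directly:

```python
# Ensemble (arithmetic) average growth factor per flip: what the group sees.
ensemble_factor = 0.5 * 1.5 + 0.5 * 0.6    # 1.05 -> +5% per flip

# Time (geometric) average growth factor per flip: what a single player
# experiences in the long run, with one head and one tail per two flips.
time_factor = (1.5 * 0.6) ** 0.5           # ~0.949 -> about -5% per flip

print(ensemble_factor, time_factor)
```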
If we borrow an extra 200 bucks and increase our leverage to 300% then in the ensemble perspective, we’ll grow our expected value and make even more money.
In the time perspective, however, we’ll be wiped out almost instantly.
So it seems that in this case ‘Go Hard Or Go Home’, will result in you going home. (Probably to a severely pissed off spouse who’s mad that you didn’t factor in non-ergodicity. Youngling & Feynman, saving marriages again.)
Processes where there’s no difference between the two are called ergodic. So the mistake that one can easily make in the above example is unknowingly assuming that this game is ergodic. If that assumption turns out to be wrong, it can have very bad consequences for an individual.
To add further confusion, the way the term ergodicity is used in economics differs from physics. In physics, something is ergodic when there’s no difference between time- and ensemble averaging (this is the definition we’ve been using). In economics, something is ergodic if the laws don’t change over time.
Is business (or economics) a science? If we define science in such a way that only mathematics fits that narrative, then of course it isn’t.
But as we move from mathematics to physics to chemistry to biology to economics to psychology to sociology, it gets harder and harder in some sense.
You see, in math, you control the game.
You get to set the axioms.
Then you create the definitions and you use those to reason from the axioms.
If you have a hypothesis, you call it a conjecture and once someone proves it, it becomes a theorem.
It’s also possible to disprove things, of course. You can find a counterexample to show a conjecture is false. Another way is proof by contradiction: you assume the opposite of the statement you want to prove, show that this assumption leads to a contradiction, and conclude that the original statement must be true.
Once you get into the realm of physics it already gets harder because, unlike math, it’s no longer enough to have a beautiful theory.
If it doesn’t apply to nature, it’s wrong. For example, it’s possible that string theory is a beautiful theory but just doesn’t apply to the universe we live in.
Newton, for example, didn’t ‘understand’ how gravity worked.*4
He just found the mathematical framework that he could map onto reality and use it to accurately make predictions.
It goes without saying that this was an incredible achievement.
As we later found out, there are situations where those frameworks stop being accurate, and we needed different ones for those situations.
That’s why we have things like special relativity, general relativity, and quantum mechanics.
These are mathematical models that allow us to work in situations where we’re dealing with high speeds (near the speed of light), strong gravity (e.g. black holes), and when we’re dealing with sub-microscopic systems (e.g. atoms), respectively.
You don’t have many degrees of freedom when you hold a ball in your hand and let it drop to the floor. That consistency and predictability allows us to create a model that describes that one-dimensional motion.
t = (2h/g)^(0.5), where t is the time, h is the height, and g is the gravitational acceleration, 9.81 m/s².
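That formula in code (a toy example; the function name is mine):

```python
def drop_time(height_m, g=9.81):
    """Seconds for an object dropped from rest to fall height_m metres,
    ignoring air resistance: t = (2h/g)^0.5."""
    return (2 * height_m / g) ** 0.5

print(drop_time(1.0))  # ~0.45 s to fall one metre
```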
In business, there are many more degrees of freedom involved. And unfortunately not just known unknowns, but also many unknown unknowns.
A change in mood, for example, will likely have a huge impact on buying behavior.
In Why Spend Less When You Can Spend More? Final Part, I gave the example of a counterintuitive choice architecture:
You could buy an online subscription to a magazine for $59, a print subscription for $125, or a combination of both for $125.
The inclusion of the seemingly useless middle option actually drastically changed the number of people opting for the combination package.
Big data can have a lot of uses but we need to be careful with just randomly collecting a ton of data in the hopes of some machine learning algorithm magically finding signal in all that noise.
In the TEDx talk below, Tricia Wang (a technological ethnographer) tells the story of how, while doing ethnographic research for Nokia, she discovered people in third world countries were willing to do virtually anything to get their hands on a smartphone.
(Ethnography is the systematic study of people and cultures.)
She didn’t have many data points, but because she was living among the locals, her data was very deep.
Nokia refused to take her recommendation seriously and chose to ignore the smartphone because they had millions of data points which suggested people were only willing to pay a certain amount for a phone.
She had a hundred data points which indicated that the rules of the game change for people when you go from a normal phone to a smartphone.
The problem with all data is that it comes from the same place, the past.
– Rory Sutherland (Vice Chairman, Ogilvy UK)
It doesn’t matter how many data points you have if they’re wrong, and it doesn’t matter how few data points you have if they’re right. So we need to caution against an over-reliance on quantitative data.
I think it’s easy to forget that the purpose of data is just to help you make a decision. It was never intended to be a be-all-end-all in and of itself.
This is very similar to statistics. It’s a tool we use to cope with pragmatic limitations. If we wanted to calculate the average human height and we had perfect, instant data on every human’s height at this moment, we could simply calculate it. But of course, we don’t. So we use statistics to turn a comparatively small data set into a useful derivative of reality. The danger is in forgetting that the purpose of using statistics was to get a useful approximation of the real-world problem we were trying to solve.
In ‘Highlight negative results to improve science’, Devang Mehta writes the following:
The pressure to publish a positive story can also lead scientists to spin their results in a better light, and, in extreme instances, to commit fraud and manipulate data . . . The problem is worsened by funding agencies that reward only those researchers who publish positive results, when, in my view, it’s the scientists who report negative results who are more likely to move a field forward . . . Simply put, we need more honesty in science (Mehta, 2019).
It’s easy for the takeaway to be: human behavior is difficult, business is complex, all data is meaningless, we can’t apply a scientific process at all, so let’s wing it.
And while there are people who lean heavily on intuition (Gary Vaynerchuk comes to mind) I believe if that’s the takeaway, the pendulum has swung too far out of whack.
The idea, in my opinion, is simply that we need to find the right balance.
We should stop pretending that this is physics, where a neat, simple model gives us perfect output for whatever we put in. In physics, you increase the force on lever A and thereby increase what happens at point B in an exactly-as-expected, quantifiable way.
But in our world, it’s messy.
I wrote about the inversely proportionate relationship between efficiency and effectiveness in marketing in Marketing Is Sex, Not Manufacturing. And I wrote about the strange nature of marketing declining in effectiveness when you use a paint-by-numbers approach in Marketing Is Comedy, Not Engineering.
The way to do that: validate your hypotheses as quickly as possible with as little money as possible, and then try to create experiments with asymmetrical risk.
Because the world is so complex, you’ll often have an answer faster by just building some prototype of the hypothesis you want to test.
If it catches on, you can always scale later.
It’s essentially hedging against false positives. You minimize the chance that you spend a ton of time and money working on something which turns out to be a dud. It is possible that you kill things too early (false negatives). The opposite (and most common) approach is to hedge against false negatives: you believe in your idea and maximize the time and money spent in order to be 100% sure before concluding that it won’t work. The problem is that this isn’t academia. We’re often bootstrapping, at least initially, so it’s better to quickly move on from something that doesn’t seem to connect with the marketplace, aka hedging against false positives.
So instead of doing years of research and burning tons of capital in order to figure out where you’ll build your next store for people to buy prescription glasses and sunglasses… why not take a page out of Warby Parker’s book and deck out a big, yellow school bus in order to create a mobile pop up store and gauge demand?
Asymmetrical risk means creating experiments where you’re risking maybe 10% of your capital, but if it works, it’ll double the business.
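A back-of-the-envelope sketch of why such bets are attractive (all numbers and names here are hypothetical):

```python
def experiment_ev(p_success, upside=1.0, downside=0.10):
    """Expected fractional change in capital for an experiment that
    doubles the business (+100%) on success and costs 10% on failure."""
    return p_success * upside - (1 - p_success) * downside

# Break-even success rate: p * 1.0 = (1 - p) * 0.1  ->  p ~ 9.1%.
# Even a 10% success rate already gives a positive expectation.
print(experiment_ev(0.10))
```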
So as we often do at the end of these essays… today’s TLDR boiled down to a single sentence is:
The point is not to ‘not fail’, the point is to fail quickly and fail often in non-lethal ways so you have pragmatic, real-world knowledge to help you guide your decisions.
*1 This will probably depend on your interpretation of quantum mechanics because it’s not clear, even if we had all the possible information in the universe, that we could predict the future.
From my limited understanding, as soon as a particle interacts with some other particle the wave function of the universe branches. When we look at something, we don’t see things as a superposition of different possibilities described by the wave function. Instead, we always see them in a particular position.
So we might be able to predict all possible situations that would occur, but we still wouldn’t know which branch of the wave function we’re on. (Meaning which of those events would actually occur in our branch. The bill ending up in place X or Y, etc.) I’ve framed this from an Everettian perspective. There are other interpretations of quantum mechanics, such as the Copenhagen interpretation, which, in a nutshell, says that the wave function evolves according to the Schrödinger equation until it’s observed, at which point it collapses. What ‘collapse’ means and how you define ‘observed’ are left in the dark in this interpretation.
*2 This is partly the fault of our educational system. I’ve always found it strange for example that we allow psychologists to perform their own statistics. You don’t run a company with just the CEO. You have a mix of specializations. The job of a psychologist shouldn’t be to use their 3 courses of undergrad statistics to find a signal in the noise. Their job should be to come up with interesting and testable hypotheses. Once they’ve run experiments and collected the data, they should have highly qualified statisticians analyze it.
This would also remove biases and things like p-hacking. But unfortunately, it seems that it often isn’t about discovering the truth, but rather about ego and status. I quoted this essay before in The Melanie Principle, but it’s a transcript of Richard Feynman’s commencement address at Caltech. He tells the story of a scientist who tried to learn something about the behavior of rats. Instead, he discovered all the things you need to do to remove the noise.
[Other scientists] paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats (Feynman, 1974).
*3 What’ll actually happen in our previous example, where we played multiple times, is that most people will go bankrupt. But a few people will make such an incredible amount of money that they’ll pull up our average. This same problem occurs in the startup world, where most go broke and a few start a unicorn. We then forget about everyone going broke and focus only on the unicorns and you have yourself a wonderful case of survivorship bias.
*4 Those in the Youngling & Feynman fam know I enjoy working on mathematics, so this is in no way meant as a knock against hard science. But rather to illustrate that it’s simply too naive to frame psychology as a pseudo-science because, as you increase the degrees of freedom and include butterfly effects, it gets exponentially harder to create models that make accurate predictions.
Subjects like math and physics are difficult. But they benefit from the fact that there aren’t as many degrees of freedom. I think this is one of the main reasons why even before modern physics, we had so many models that could accurately predict phenomena in the world. From one-dimensional motion all the way to thermodynamics.
In fact, before Niels Bohr, Max Planck and Albert Einstein came along with quantum mechanics, there were multiple physicists who believed the entire field of physics was almost done. Here’s a quote from Planck about his advisor.
When I began my physical studies [in Munich in 1874] and sought advice from my venerable teacher Philipp von Jolly…he portrayed to me physics as a highly developed, almost fully matured science…Possibly in one or another nook there would perhaps be a dust particle or a small bubble to be examined and classified, but the system as a whole stood there fairly secured, and theoretical physics approached visibly that degree of perfection which, for example, geometry has had already for centuries.
You could, in some sense, say that the environment mathematicians and other hard scientists work in is less complex, which can make things like replicability easier. It’s important to note that that’s not because of the genius of physicists but simply because of luck. It just so happens that human behavior is harder to model than things like motion and forces. We could easily imagine an alternate universe where behavior is extremely straightforward but where the laws describing the universe are much more complex. It reminds me of a joke about a mathematician who was asked if he thought he could ever be an economist, to which he replied: “No, I could never be an economist, it’s too hard!”
Feynman, R. (1974) Cargo Cult Science. Retrieved 11 October 2019, from http://calteches.library.caltech.edu/51/2/CargoCult.htm
Haby, J. (2019). Hard to forecast. Retrieved 8 October 2019, from http://www.theweatherprediction.com/hardtoforecast/
Mehta, D. (2019). Highlight Negative Results To Improve Science. Retrieved 10 October 2019, from https://www.nature.com/articles/d41586-019-02960-3
von Mises, L. (1949). Human Action: A Treatise on Economics, scholar’s edition. Auburn, Ala: Ludwig von Mises Institute.
Peters, O. (2012). Time For A Change: Introducing irreversible time in economics. Retrieved 8 October 2019, from https://youtu.be/f1vXAHGIpfc
Youngling, R. (2019). Marketing Is Comedy, Not Engineering. Retrieved 10 October 2019, from https://www.younglingfeynman.com/essays/comedy
Youngling, R. (2019). Marketing Is Sex, Not Manufacturing. Retrieved 11 October 2019, from https://www.younglingfeynman.com/essays/sex
Youngling, R. (2019). The Melanie Principle. Retrieved 11 October 2019, from https://www.younglingfeynman.com/essays/melanieprinciple