Please write the following four paragraphs using the notes provided below: Description of overfitting, Causes of overfitting, Cross-validation, and Penalising complexity. There is no need for a title page.
The notes are taken from the section “Between blindness and hallucination” (pages 52 to 54) in The Master Algorithm, and from the sections “The Case Against Complexity”, “The Idolatry of Data”, “Detecting Overfitting: Cross-Validation” and “How to Combat Overfitting: Penalizing Complexity” (pages 112 to 120) in Algorithms to Live By.
Description of overfitting
A model that’s too simple can fail to capture the essential pattern in the data. On the other hand, a model that’s too complicated becomes oversensitive to the particular data points we happened to observe.
The most complex models can fit any pattern that appears in the data, but this means they will do so even when the pattern is just random noise. Whenever a learner finds a pattern in the data that does not hold in the real world, we say that it has overfitted the data.
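To make this concrete, here is a minimal sketch in Python with NumPy (the toy data and the choice of models are invented for illustration): a straight line and a ninth-degree polynomial are both fitted to ten noisy points drawn from a simple linear trend. Under these assumptions the flexible model typically matches its training points almost perfectly, yet does worse on fresh data from the same source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy observations of a simple linear trend.
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + rng.normal(scale=0.2, size=x.size)

# Fresh data from the same process, never shown to the models.
x_new = np.linspace(0.0, 1.0, 100)
y_new = 2.0 * x_new + rng.normal(scale=0.2, size=x_new.size)

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)                             # fit a polynomial of this degree
    fit_err = np.mean((np.polyval(coeffs, x) - y) ** 2)           # error on the points it was fitted to
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)   # error on unseen points
    print(f"degree {degree}: training error {fit_err:.4f}, error on new data {new_err:.4f}")
```

The ninth-degree polynomial can pass through all ten training points, so its training error is essentially zero; it has fitted the noise, and the fresh data reveal it.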
John von Neumann, one of the founding fathers of computer science, famously said that “with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” Today we routinely learn models with millions of parameters, enough to give each elephant in the world his distinctive wiggle.
Learning algorithms are particularly prone to overfitting because they have an almost unlimited capacity to find patterns in data.
The Bible Code, a 1998 bestseller, claimed that the Bible contains predictions of future events that you can find by skipping letters at regular intervals and assembling words from the letters you land on. Unfortunately, there are so many ways to do this that you’re guaranteed to find “predictions” in any sufficiently long text.
Causes of overfitting
Overfitting can be caused by noise or mismeasurement: there can be errors in how the data were collected or in how they were reported, and sometimes the phenomena being investigated are hard even to define, let alone measure. In that sense, overfitting is a consequence of focusing on what we have been able to measure rather than on what actually matters.
Nor is it always better to use a more complex model. Precisely because such a model is tuned so finely to one specific data set, the solutions it produces are highly variable; simpler models that weigh fewer factors are more stable and better reflect the general truths, while models that weigh too many factors end up too tightly fitted to the data at hand.
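As an illustrative sketch of this instability (again in Python, with invented toy data), the experiment below refits a simple and a complex model to many noisy samples of the same underlying line and compares how widely their predictions at a single point swing from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 10)

# Refit each model to 100 fresh noisy samples of the same underlying line,
# then compare how much the fitted curves disagree at one fixed point.
predictions = {1: [], 9: []}
for _ in range(100):
    y = 2.0 * x + rng.normal(scale=0.2, size=x.size)
    for degree in (1, 9):
        coeffs = np.polyfit(x, y, degree)
        predictions[degree].append(np.polyval(coeffs, 0.95))

for degree in (1, 9):
    spread = np.std(predictions[degree])
    print(f"degree {degree}: spread of predictions at x = 0.95 is {spread:.3f}")
```

Under these assumptions the straight line lands in roughly the same place every time, while the ninth-degree polynomial swings wildly depending on which noisy sample it happened to see.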
Cross-validation

Because overfitting first presents itself as a theory that fits the available data perfectly, it can be insidiously hard to detect.
How, then, can we tell the difference between a genuinely good model and one that is overfitting? In an educational setting, for example, how do we distinguish a class of students excelling at the subject matter from a class merely being “taught to the test”? Standardised tests offer a number of benefits, including a distinct economy of scale: they can be graded cheaply and rapidly by the thousands.
Alongside such tests, however, schools could randomly assess some small fraction of the students (one per class, say, or one in a hundred) using a different evaluation method, perhaps something like an essay or an oral exam.
Research in machine learning has yielded several concrete strategies for detecting overfitting, and one of the most important is known as cross-validation. Simply put, cross-validation means assessing not only how well a model fits the data it is given, but also how well it generalises to data it has not seen: some of the available data points are held back from the fitting process and used purely to test the model’s predictions.
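The sketch below implements this idea as a simple k-fold cross-validation loop in Python with NumPy (the data and the polynomial models are invented for illustration): the data are split into k folds, the model is fitted k times on all but one fold, and its error on the held-out fold is averaged.

```python
import numpy as np

def k_fold_cv_error(x, y, degree, k=5, seed=0):
    """Average held-out squared error of a polynomial fit, via k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))      # shuffle once, then split into k folds
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test = folds[i]                    # this fold is held out ...
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # ... the rest fit the model
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errors))

# The overfitted high-degree model looks good on its own data
# but is exposed by its error on the held-out folds.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + rng.normal(scale=0.2, size=x.size)
for degree in (1, 9):
    print(f"degree {degree}: cross-validation error {k_fold_cv_error(x, y, degree):.4f}")
```

Under these assumptions, the high-degree model’s near-perfect fit to its training folds does not carry over to the held-out points, which is exactly the signature of overfitting.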
Penalising complexity

One way to avoid overfitting is to follow the principle of Occam’s razor, which suggests that, all other things being equal, the simplest possible hypothesis is probably the correct one. If we introduce a complexity penalty, then more complex models need to do not merely a better job but a significantly better job of explaining the data in order to justify their greater complexity. The Russian mathematician Andrey Tikhonov proposed one answer: introduce an additional term to your calculations that penalises more complex solutions. Under such a penalty, only the factors that have a big impact on the results remain in the equation, potentially transforming, say, an overfitted nine-factor model into a simpler, more robust formula with just a couple of the most critical factors.
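Here is a minimal sketch of Tikhonov’s idea, now commonly called ridge regression (the data and the penalty strengths below are invented for illustration): a term lam * ||w||^2 is added to the squared-error objective, so that large, complicated weight vectors must earn their keep. With lam = 0 the fit is ordinary least squares; raising lam shrinks the weights on the irrelevant factors towards zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Tikhonov-regularised least squares: minimise ||Xw - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # Closed-form solution: w = (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Toy data: nine candidate factors, but only the first two actually matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 9))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=40)

for lam in (0.0, 10.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda = {lam}: weights {np.round(w, 2)}")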
We must balance our desire to find a good fit against the complexity of the models.