From Newton to gradient descent: how artificial intelligence learned to learn

The Quiet Idea Behind Modern AI: A Story of Gradient Descent

Modern artificial intelligence looks complex. Deep networks. Millions, even billions, of parameters. Massive datasets.

Yet at its core lies a surprisingly simple idea:

If you make an error, take a small step in the direction that reduces it.

This idea feels intuitive. But intuition alone is not enough for mathematics.

Turning this thought into a precise, repeatable, and general method took nearly three centuries. This is the story of gradient descent – not as a formula, but as a quiet idea connecting Newton to modern AI.

1. It Started with Newton (But Not Gradient Descent)

In the 17th century, Newton devised an iterative method for solving equations. Applied to optimization, it rests on a powerful concept:

To find an optimum,
don't just look at the slope,
look at the curvature.

Newton's method is fast because it accounts for the local shape of a function, not only its slope.

But it has a cost:

second derivatives are expensive: a function of n variables has n² of them,
and computing or storing them all is often impractical in high dimensions.

Newton showed us that optimization is possible – but not always practical.
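
To make the curvature idea concrete, here is a minimal Python sketch of Newton's method applied to a one-dimensional minimum. The name `newton_minimize` and the example quadratic are illustrative assumptions, not a historical reconstruction. The update divides the slope by the curvature:

```python
def newton_minimize(grad, hess, x0, steps=10):
    """Newton's method for a 1-D minimum: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(steps):
        # Slope tells us which way to go; curvature tells us how far.
        x = x - grad(x) / hess(x)
    return x

# Hypothetical example: f(x) = (x - 3)**2 + 1
# so f'(x) = 2 * (x - 3) and f''(x) = 2.
x_min = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=10.0, steps=1)
print(x_min)
```

On a quadratic, a single step lands on the minimum. In n dimensions the division becomes solving a linear system with an n-by-n matrix of second derivatives, which is where the cost comes from.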

2. Defining Error Instead of Truth

By the early 19th century, scientists fitting models to observations had realized:

measurements are noisy,
truth is uncertain.

So the question changed:

"Which answer is correct?"

became

"Which answer makes the least error?"

This led to the idea of a loss function.

Squaring and summing errors, the method of least squares published by Legendre in 1805 and developed by Gauss, made the problem smooth and solvable. This idea still underlies much of machine learning today.
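
A loss function of this kind fits in a few lines. The sketch below is only an illustration; the function name `sum_squared_error` and the sample numbers are made up for the example:

```python
def sum_squared_error(predictions, targets):
    """Add up the squared difference between each prediction and its target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))

targets = [1.0, 2.0, 3.0]

# A perfect answer has zero loss.
loss_a = sum_squared_error([1.0, 2.0, 3.0], targets)

# An imperfect answer is penalized, and larger misses are penalized more.
loss_b = sum_squared_error([1.5, 2.0, 2.0], targets)

print(loss_a, loss_b)
```

The loss turns "how wrong is this answer?" into a single number, and that number is what the rest of the story tries to make smaller.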

But one question remained:

How do we actually reduce this error?

3. Knowing the Minimum Is Not Enough

Mathematics told us:

at a minimum of a smooth function, the derivative is zero.

But that does not explain how to reach it.

This is where optimization truly begins.

4. Zoom In and Everything Becomes Flat

A simple but crucial insight followed:

When you zoom in closely enough, any smooth function looks almost linear.

Small steps are trustworthy. Large steps are not.

Gradient descent is built on this local trust.

This local linear view is formalized in mathematics as the first-order Taylor approximation.

It is the decision to momentarily set aside the full complexity of a function and treat it as approximately linear at the point where you are standing.

The entire logic of gradient descent rests on this simple but powerful assumption.
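
The "zoom in" claim can be checked numerically. The sketch below, using the exponential function purely as an assumed example, compares the true value f(x + h) with the straight-line estimate f(x) + f'(x)·h for shrinking steps h. The helper name `linear_approx` is hypothetical:

```python
import math

def linear_approx(f, df, x, h):
    """First-order Taylor estimate of f(x + h), using the slope df(x)."""
    return f(x) + df(x) * h

x = 1.0
errors = []
for h in (1.0, 0.1, 0.01):
    exact = math.exp(x + h)
    # exp is its own derivative, so it is passed as both f and df.
    approx = linear_approx(math.exp, math.exp, x, h)
    errors.append(abs(exact - approx))

for h, err in zip((1.0, 0.1, 0.01), errors):
    print(f"h = {h:<5} error = {err:.6f}")
```

Each time the step shrinks by a factor of ten, the error of the linear estimate shrinks by roughly a factor of a hundred. That is the sense in which small steps are trustworthy.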

5. Slope Can No Longer Be a Number

Real problems are high-dimensional.

If you can move in infinitely many directions, a single number cannot describe change.

The solution:

measure change along each variable,
collect them into a vector.

That vector became known as the gradient.
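
As a rough sketch of "measure change along each variable, collect them into a vector", the code below estimates a gradient by nudging one variable at a time. The function `numerical_gradient`, the step `eps`, and the example function are assumptions chosen for illustration; real systems usually compute gradients exactly rather than by nudging:

```python
def numerical_gradient(f, point, eps=1e-6):
    """Estimate the gradient of f at point with central differences."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps      # nudge variable i forward
        down[i] -= eps    # nudge variable i backward
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Hypothetical example: f(x, y) = x**2 + 3*y
# Its exact gradient is (2x, 3).
def f(p):
    x, y = p
    return x ** 2 + 3 * y

g = numerical_gradient(f, [2.0, 5.0])
print(g)  # close to [4.0, 3.0]
```

One number per variable, stacked together: that is all a gradient is.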

6. The Steepest Direction Question

Now the key question could be asked:

"If I can take only one small step, which direction reduces the error the fastest?"

The answer is derived, not assumed:

fastest increase → gradient direction
fastest decrease → negative gradient
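
The reasoning can be checked by brute force. Under the local linear view, a unit step in direction d changes the function by about the dot product of the gradient with d. The sketch below tries 360 directions for an assumed gradient of (3, 4) and keeps the one with the largest decrease:

```python
import math

grad = (3.0, 4.0)            # hypothetical gradient at the current point
norm = math.hypot(*grad)     # its length, here 5.0

best_dir, best_change = None, float("inf")
for degrees in range(360):
    theta = math.radians(degrees)
    d = (math.cos(theta), math.sin(theta))        # unit direction
    change = grad[0] * d[0] + grad[1] * d[1]      # first-order change per unit step
    if change < best_change:
        best_dir, best_change = d, change

neg_unit_grad = (-grad[0] / norm, -grad[1] / norm)
print(best_dir, neg_unit_grad, best_change)
```

The winning direction lines up with the negative gradient, up to the one-degree grid, and no direction decreases the function faster than the gradient's length allows.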

7. Cauchy: Turning Insight into an Algorithm

In 1847, Cauchy made a simple decision:

"At each step, move slightly in the negative gradient direction."

Repeat.

At that moment:

a goal existed,
a direction was known,
a step mattered,
iteration began.

👉 Gradient descent was born.
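
In modern notation the whole method is a short loop. This is a minimal sketch rather than Cauchy's original formulation; the names `gradient_descent` and `learning_rate`, the fixed step count, and the example function are all assumptions made for illustration:

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move a small step against the gradient."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point

# Hypothetical example: f(x, y) = (x - 1)**2 + (y + 2)**2
# Its minimum is at (1, -2) and its gradient is (2(x - 1), 2(y + 2)).
def grad_f(p):
    x, y = p
    return [2 * (x - 1), 2 * (y + 2)]

minimum = gradient_descent(grad_f, start=[5.0, 5.0])
print(minimum)  # close to [1.0, -2.0]
```

Nothing in the loop knows where the minimum is. It only knows which way is downhill from where it stands.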

8. Why Small Steps Matter

Because the method relies on local linearity.

Large steps break the assumption. Small steps preserve it.

Gradient descent favors stability over speed.
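
A tiny experiment shows the difference. The sketch below runs the same update on f(x) = x², whose gradient is 2x, with two assumed step sizes. The helper `run` and both values are chosen only for illustration:

```python
def run(learning_rate, x0=1.0, steps=20):
    """Gradient descent on f(x) = x**2, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient of x**2 is 2x
    return x

small = run(0.1)   # each step multiplies x by 0.8, so x shrinks toward 0
large = run(1.1)   # each step multiplies x by -1.2, so x overshoots and grows

print(small, large)
```

With the small step the iterate settles toward the minimum at zero. With the large step every move jumps past the minimum and lands farther away than it started.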

9. Where AI Enters the Story

By the 20th century, learning was reframed as:

minimizing a loss function.

Deep learning does exactly this:

compute error,
compute gradients (via backpropagation),
update parameters slightly.

SGD, Momentum, Adam – these refine how the gradient is estimated and how the step is sized. They are refinements, not new ideas.
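
The three steps above can be sketched on the smallest possible model, a line y = w·x + b. This is an illustration under stated assumptions: the data is made up, and the gradients are written out by hand, where a deep learning framework would obtain them through backpropagation:

```python
# Hypothetical data generated by y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0          # parameters start with no knowledge
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # 1. Compute error: prediction minus target for every example.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]

    # 2. Compute gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n

    # 3. Update parameters slightly, against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # close to 2.0 and 1.0
```

A deep network has far more parameters and a far more tangled loss, but the loop has the same three lines at its heart.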

Final Thought

Newton gave us speed.
Cauchy gave us patience.
Artificial intelligence is built on that patience.

We did not find perfection in one leap.
We found a way to move consistently toward better –
and then taught machines to do the same.
