Session 7: Solutions of Nonlinear Equations; Newton-Raphson Method

Flash and JavaScript are required for this feature.

Download the video from iTunes U or the Internet Archive.

Description: This lecture talked about the system of non-linear equations.

Instructor: James Swan

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at

OK, we're moving on. No more linear algebra. We're going to try solving some more difficult problems-- of course, all those problems will just be turned into linear algebra as we move on, so your expertise now with different techniques from linear algebra is going to come in handy.

The next section of this course, we're talking about systems of nonlinear equations, and we'll transition into problems in optimization-- which, turns out, look a lot like systems of nonlinear equations, as well. And we'll try to leverage what we learn in the next few lectures to solve different optimization problems.

Before going on, I just want to recap. All right, last time we talked about singular value decomposition-- which is like an eigenvalue decomposition for any matrix. And associated with that matrix are singular vectors, left and right singular vectors. And the singular values of that matrix.

Your TA, Kristen, reminded me that you can actually define a condition number for any matrix, as well. The condition we gave originally was associated with solving square systems of equations that actually have a solution. But there is a condition number associated with any matrix, and it's defined in terms of its singular values. So if you go back and look at the definition of the two norm, you try to think about the condition number associated with the two norm, you'll see that the conditions of any matrix maybe can be defined as the square root of the ratio of the biggest and the smallest singular values of that matrix.

So there's a condition number associated with any matrix. The condition number as an entity makes most sense when we're thinking about how we amplify errors-- numerical errors, solving systems of equation zones. It's most easily applied to square systems that actually have unique solutions, but you can apply it to any system of equations you want. And the singular value decomposition is one way to tap into that.

OK, the last thing we did discussing linear algebra was to talk about iterative solutions, the systems of linear equations. And that's actually our hook into solutions to systems of nonlinear equations. It's going to turn out that exact solutions are hard to come by for anything but linear systems of equations.

And so we're always going to have these iterative algorithms, where we refine initial guesses for solutions until we converge to something that's a solution to the problem that we wanted before. And one question you should ask, when you do these sorts of iterations, is when do I stop? I don't know the exact solution to this problem. I can't say I'm close enough-- what does close enough even mean? So how do I decide to stop?

Do you have any ideas or suggestions for how you might do that? You've done some of this already in a homework assignment. But what do you think? How do you decide to stop? Yeah?


PROFESSOR: OK, so this is one suggestion-- look at my current iteration, and my next iteration, ask how far apart are these two numbers? If they're sufficiently far apart, seems like I've got some more steps to make before I converge to my solution. And if they're sufficiently close together, the steps I'm taking are small enough that I might be happy accepting this solution as a good approximation.

That's called the step norm criteria-- I'll give you a formalization of that later-- it's called the step norm criteria. How big are the steps that I'm taking? And are they sufficiently small that I don't care about any future steps? Another suggestion?

AUDIENCE: I've got a question.


AUDIENCE: [INAUDIBLE] absolute [INAUDIBLE] when we did that for homework, I tried to do it [INAUDIBLE], and [INAUDIBLE].

PROFESSOR: Yes. I will show you a definition of the step norm criteria that integrates both relative and absolute error into the definition. And we'll see why, OK?

One problem may be-- what if the solution I'm trying to converge to is 0? How do you define the relative error with respect to the number 0? There isn't one-- there is only absolute error when you're trying to converge to 0. So you may want to have some measure of both absolute and relative step size in order to determine whether you're converged.

Is that the only way to do it, though? Have any ideas, alternative proposals for deciding convergence?

AUDIENCE: [INAUDIBLE] the residual.

PROFESSOR: The residual. OK, what's the residual?


PROFESSOR: Good, OK, so we asked, how good a solution-- we can ask how good a solution is this value that we've converged to by putting it back into the original system of equations, and asking, how far out of balance are we? I take my best guess for solution x, and multiply it by and A, and I subtract b-- we call that the residual.

And is the residual sufficiently converged, or not? If it's small enough in magnitude, then we would say, OK, maybe we're sufficiently close to the solution. If it's too big, then we say, OK, let's iterate some more until we get there. That is called the function norm criterion.

We're going to talk about these in detail, as applied to systems of nonlinear equations. But these same criteria apply to all iterative processes. Neither is preferred over the other. You don't know the exact solution-- you have no way of measuring how close or far away you are from the exact solution.

So usually you use as many tools as possible to try to judge how good your approximation is. But you don't know for certain.

What do you do, when that's the case? We haven't really talked about this in this class. You have a problem, you program it into a computer, you get a solution. Is at the end of the story? We just stop? You get a number back out, and that's the answer?

How do you know you're right? How do you know? We talked about numerical error-- every calculation has a numerical error-- how do you know you got the right answer?

What do you think?


PROFESSOR: OK, yeah, this is true-- so you plug your solution back into the equation, you ask, how good a job does it do satisfying that? But maybe this equation is relatively insensitive to the solution you provide. Maybe many solutions nearby look like they also satisfy the equation, but those solutions are sufficiently far apart. So how do you--


PROFESSOR: OK, so that sort of physical reasoning is a good one. In your transfer class, you'll talk about doing asymptotic expansions, or asymptotic solutions to problems. Solve this complicated problem in a limit where it has some analytical solution, and figure how the solution scales, with respect to different parameters. So you can have an analytical solution that you compare against your numerical solution in certain limits.

You have experiments that you've done. Experiment-- that's the reality. The computer is a fiction that's trying to model reality, so you can compare your solution to experiments.

You could also solve the problem a bunch of different ways, and see if all these answers converge in the same place. So we're going to talk about solving nonlinear equations-- we're going to need different initial guesses for our iterative methods. We might try several different initial guesses, and see what solutions we come up with. Maybe we converge all to the same solution, or maybe this problem has some weird sensitivity in it, and we get lots of different solutions that aren't coordinated with each other.

One of the duties of someone who's using numerical methods to solve problems is to try to validate their result-- by solving it multiple times or multiple ways, or comparing against experiment, or against known analytical results, and certain limits where the answer should be exact.

But you can't just accept what the computer tells you-- you have to validate it against some sort of external solution that you can compare with. Sometimes it's hard to come up with that solution, but it's immensely important. We know every calculation can be an error. And as we go on to more complicated problems, it's even more important to validate things.

So, systems of nonlinear equations-- so these are problems of a type f of x equals 0, where x is some vector of unknowns, and has dimension N. And f is a function that takes as input vectors of dimension N, and gives an output a vector of dimension N. It's a map from R N to R N. But it's no longer necessarily a linear map-- it can be some nonlinear function of all the elements of this x.

And the solution to this equation, the particular solution of this equation, x-- for which f of x equals 0-- are called the roots of this vector-valued function. The linear equations, then, are just represented in this form as A x minus b, A x minus b equals 0-- it's the same as the linear equations we were solving before. So we're searching for the roots of these functions.

Common chemical engineering examples include equations of state, often nonlinear, in terms of the variables we're interested in. Energy balances have lots of non-linearities introduced into them. Mass balances with nonlinear reactions. Or reactions that are non-isothermal, so their kinetics are sensitive to temperature, and temperatures are variable, we want to know.

These sorts of nonlinear equations crop up all over the place, and you want to be able to solve them reliably. Here's a simple, simple example. The Van der Waals equation of state-- here I've written it in terms of reduced pressure, temperature, and molar volume. Nonetheless, this is the Van der Waals equation of state.

And somebody told you once that if I plot pressure versus molar volume for different temperatures, I may see that, at a given pressure, there could be just one root-- one possible molar volume that satisfies the equation of state. Or there can be one, two, three potential roots.

It turns out we don't know, with nonlinear equations, how many possible solutions there are. We knew, for linear equations, I either had zero, one, or an infinite number of solutions. But with nonlinear equations, in general, there's no way to predict them.

This problem can be transformed into a polynomial, and polynomials are one of the few nonlinear equations where we know how to bound the possible number of solutions.

So we can transform this nonlinear equation to the form I showed you before-- we just move 8/3 T to the other side of this equation. We want to find the roots of this equation. So possibly, given pressure and temperature-- pressure and temperature-- find all the molar volumes that satisfy the equation of state.

This is actually overly simplified for a particular physical problem, of looking at vapor liquid coexistence of the Van der Waals fluid. You can't specify pressure and temperature independently-- the saturation pressure, the coexistence pressure depends on the temperature as the Gibbs phase rule.

So actually, phase equilibria is made up of three parts-- there's thermal equilibrium. I'm going to add two phases, a gas and a liquid, and they have to have the same temperature, otherwise they're not in equilibrium. There's got to be mechanical equilibrium of two phases, the gas and the liquid. They better have the same pressure, otherwise one is going to be pushing harder on the other one. They'll be in motion-- that's not in equilibrium.

They've got to have the same chemical potential-- they have to be in chemical equilibrium. So there can't be any net mass flux from one phase to another, otherwise, one phase is going to grow and the other is going to shrink. They're not in equilibrium with each other.

So actually, the problem of determining vapor liquid coexistence, in this Van der Waals fluid, involves satisfying a number of different equations, some of which are nonlinear, and are constrained by the equation of state.

Given the temperature, there are three unknowns-- the pressure, and the more volumes of the gas and the liquid. And there are three nonlinear equations we have to solve. Two of those are the equation of state, in the gas and the liquid-- I'll show them to you in a second-- and the other is this Maxwell equal area construction, which essentially says that the chemical potential in the two phases is equal. This is one way of representing that.

So we have to solve this system of nonlinear equations for the saturation pressure, the molar volume of the gas or vapor, and the molar volume of the liquid. And these are those equations. Here's the equation of state and the gas, here's the equation of state in the liquid, and here's the Maxwell equal area construction at defined values of P sat, V G and V L that satisfy all three equations. There's not going to be an analytical way to do this-- it has to be done numerically.

Here's a simplification that I can make, though. So I can take that equal area construction, and I can solve for P sat in terms of V G and V L. And so that reduces the dimensionality of these equations from three to two. And when it's two dimensional, I can plot these things, so that's helpful.

So let's plot f 1-- the equation of state in the gas is a function of V G and V L. Where that's equal to 0, that's this black curve here. Let's plot f 2-- the equation of state in the liquid as a function of V G and V L, where that's equal to 0, that's this blue curve here. And the solutions are where these curves intersect. So we're seeking out the specific points graphically where these curves intersect.

First, this solution, and that solution aren't the solutions we're interested in at all. That would say that the molar volume of the gas and the liquid is the same-- that's not really phase separation. We want these heterogeneous solutions out here.

So we need some methodology that can reliably take us to those solutions. We'll see that that methodology-- the most reliable methodology, and one of the ones that converges fastest to the solutions-- is called the Newton-Raphson method. But even before we do that, let's talk more about the structure of systems of nonlinear equations, and what sort of solutions we can expect.

Does this example makes sense to everyone? Have you thought about this before, maybe? Yeah.

So given a function, which is a map from R N to R N, find the special solution, x star, such that f of x star equals 0. That's our task. And there could be no solutions. There can be one to an infinite number of locally unique solutions, and there can be an infinite number of solutions.

A solution is to be locally unique if I can wrap that solution by some ball of points, in which there are no other solutions. That ball can be very, very small, but as long as I can wrap that solution in some ball of points, which are not solutions, we term that locally unique.

So consider the simple function, one which depends on two variables-- x1 and x2, and f 2, which depends on x1 and x2, equals 0. And I'm going to plot, in the x1, x2 plane, were f 1 and f 2 are equal to 0, so that these curves here. And here, we have a locally unique solution-- we see the curves cross at exactly one point. Here, you can see these two curves are tangent with each other.

They could be coincident with each other, over some finite distance. In which case there's a lot of solutions that live on some locally tangent area. Or they could just touch in one point. So they may be tangent, and the solutions there are not locally unique, or they may touch at one point, and the solutions are-- there's one solution, and it's locally unique there.

The reason why we talk about locally unique solutions is it's going to be hard for a numerical method to find anything that's not locally unique in a reliable way. Locally unique solutions, numerical methods can find very reliably. But if they're not locally unique? My iterative method could converge to any one of these solutions that lives on this line, any of these tangent points. And I'm going to have a hard time predicting which one it's going to go to. That's a problem if you're trying to solve something reliably over and over again-- if I converge to one of these solutions, or another solution, or another solution, the data that comes out of that process isn't going to be easy to interpret.

There's something called the inverse function theorem, which says if f of x star is equal to 0, and the determinant of this matrix, J, which we call the Jacobian, evaluated at x star is not equal to 0, then x star necessarily is a locally unique solution. So it's the inverse function theorem.

The Jacobian is a matrix of the partial derivatives of f, if an elements of f, with respect to different elements of x. So the first row of the Jacobian is the derivatives of the first elements of f, with respect to all the elements of x, and the other rows proceed accordingly. If the determinant of this matrix, evaluated at the solution, is not equal to 0, then this solution is necessarily locally unique. That's the inverse function theorem.

The Jacobian describes the rate of change of this vector-valued function with respect to all of its independent variables. And you may find that, for some solutions, the determinant in the Jacobian is equal to 0. We can't really say what's going on there-- the solution may be locally unique, it may not be locally unique. I'm going to provide you some examples in a second.

And most numerical methods are only going to find one of these local unique solutions at a time. If we have some non-local solutions, that'll cause us problems. So we tend to want to work with functions that have locally unique solutions to begin with.

OK, here's an example. Oh, you have your notes, so you know the formula for the Jacobian. Compute the Jacobian of this function.

So this function has a root at x1 equals 0, x2 equals 0. If you think graphically about what each of these little functions represents, you would agree that that route is locally unique-- it's just one point where both elements of this vector-valued function are equal to 0. Here's what the Jacobian of this function should be.

If you take the derivative of the first element with respect to x1, and then x2, take the derivative of the second element with respect to x1 and then x2-- at the solution, at the root of this function, where x1 is 0, and x2 is 0, the Jacobian is a matrix of zeros. Its determinant is 0. But the solution is locally unique.

The inverse function theorem only tells us about what happens when the determinant's not equal to 0. If the determinant's not 0, then we have a locally unique solution. Solution may be locally unique, its determinant may be equal to 0. Does that makes sense? You see how that plays out?

OK. There's a physical way to think about-- or a geometric way to think about this inverse function theorem. So think about the linear equation, f of x is A x minus b. You can show-- and you should actually work through this-- that the Jacobian of this function is just the matrix A. It says how the function changes, with respect to small changes in x. Well, that's just A-- this is a linear function.

So the equation, f of x equals 0, has a locally unique solution when the determinant of the Jacobian-- which is the determinant of A-- is not equal to 0. But you already knew that, right? We already talked through linear algebra. And so you know when this matrix A is singular, then we can't invert this system of equations and find a unique solution in the first place.

So the inverse function theorem is nothing more than an extension of what we learned about when functions are and aren't invertible. Because a locally unique solution when A is invertible.

In the neighborhood of f of x, in the neighborhood of a root of f of x, we can often approximate the function as being linear. We can treat it as though it's a system of linear equations, very close to that root. And then the things that we learned from linear algebra are inherited by these linearized solutions.

So here's this set of curves that I showed you before. Near this root, let's zoom in-- let's zoom in. These lines look mostly straight. It's like the place where two planes intersect-- intersect this x1, x2 plane-- they each intersect at a line, and the crossing of those lines is the solution. And it's locally unique. Because these two planes span different subspaces.

Here's the case where we may have non-locally unique [INAUDIBLE]. Zoom in on this root, and very close to this root, well, it's hard to tell. Maybe these two planes are coincident with each other, and they intersect and form the same line-- in which case they may not be locally unique. Maybe if I zoom in close enough, I see, no, actually, they have a slightly different orientation with respect to each other, and there is a locally unique solution there. It's difficult to tell here.

So these cases where the curves cross are easy to determine. These are the ones that the inverse function theorem tells us about. These ones are a little harder to work with.

So I mentioned that you can zoom in, and look close to a root, and approximate the function is linear. This is a process called linearization. You are in this for 1-D functions-- f of x, at a point x plus delta x, is f of x plus its derivative times delta x. And this'll typically be valid for reasonably well-behaved functions-- this sort of a linearization is going to be valid as delta x goes to 0. So as long as I haven't moved too far away from the point to x, I can approximate my function in the neighborhood of x using this linearization.

You know, turns out the same thing is true for nonlinear functions. So f of x plus delta x is f of x plus-- well, I need the derivatives of my function with respect to x, and those derivatives are partial derivatives now, because we're in higher dimensional spaces. That's the Jacobian multiplied by this vector of displacements, delta x.

And this will typically be valid as long as the length of this delta x is not too big-- as long as I haven't moved too far away from the point I'm interested in, f of x, this will be a reasonably good approximation. As long as our functions are well-behaved.

There's an error that's incurred in making these approximations. And for a general function that's well-behaved, that error is going to be ordered delta x squared-- and they're in the 1-D, or the multi-dimensional case.

And this sort of an expansion, it's just part of a Taylor expansion for each component of f of x. So we take element I of f. I want to know its value at x plus delta x. That's its value at x. Plus the sum of partial derivatives of that element, with respect to each of the elements of x, times delta x-- each of those delta x's.

Plus, there are some high order term in this Taylor expansion, which is quadratic in delta x instead. There's a cubic term, and so on. These quadratic terms are what give rise to this order, delta x squared error.

In higher dimensions, we typically don't worry about these quadratic terms. We're pretty satisfied with linearization of our system of equations. Sometimes for 1-D nonlinear functions, you can take advantage of knowing what these quadratic terms are to do some funny things. But in many dimensions, you usually don't use that. Usually you just think about linearizing the solution.

So if I know where the solution is, if I know it's close, if I can figure out points that are close to the solution, then I can linearize the function in that neighborhood-- I can find the solution to the linearized equation, instead. That's going to be suitably close to the exact solution. Does that makes sense?

Nonlinear equations, like I said, are solved iteratively. Which means we make a map in our algorithmic map which takes some value xy and generates some new value x plus 1, which is a better approximation for the solution we're after. And we designed the map so that the root, x star, is what's called a fixed point of the map.

So if I put x star in on this side, I get x star in on the other side. By design, the root is a fixed point of the map. The map may converge, or it may not converge, but the root is a fixed point. And we'll stop iterating when the map is sufficiently converged. You guys came up with two different criteria for stopping.

One is called the function norm criteria. I look at how big my function is-- I'm trying to find the place where the function is equal to 0. So I look at how big, in the norm space, my function is for my current best solution, and ask if it's smaller than some tolerance epsilon. If it is, then I say, well, my function is sufficiently close to 0-- I'm happy with this solution. The solution is close enough to satisfying the original equation that I'll accept it, and I stop.

The other criteria, it's called the step norm criteria. I look at two successive approximations for the solution. I take their difference, and ask, is the norm of that difference smaller than either some absolute tolerance, or some relative tolerance, multiplied by the norm of my current solution?

So suppose x is a large number, that spacing between these x's may be quite big, but the relative spacing may actually be quite small. And if the relative spacing is small enough, you might say, well, this is sufficiently converged. And so that's where this relative error, relative error tolerance, comes into play in the step norm criteria.

Suppose x is a small number, close to 0 instead. These steps may be very tiny-- these steps may be quite tiny. They may satisfy this relative criteria quite well, but you may want to put some absolute tolerance on how far these steps are before you stop instead. Because these x's may be small in and of themselves.

And so this one is easy to satisfy, but this one becomes harder. So you usually use both of these. Sometimes you have solutions that are converging toward small numbers, and then the absolute error tolerance becomes important. Sometimes you have solutions that are converging towards large numbers, and so the relative error tolerance becomes important. Does that make sense?

Of course, you can't just use this one, or just use that one. You typically like to use all of these. Because they can fail.

And if the function norm criterion fail-- here's an example where I'm taking some iterations, some approximate solutions that are headed towards the actual root of this function. And at some point, I find that this solution is within epsilon of 0. And so I'd like to accept this solution, but graphically it looks like it's very far away from the root. So this is a case where the function has a very shallow slope.

It's a very shallow slope, and the functional criteria, not so good, really. I call this a solution, but it's quite a ways away from star. So sometimes it's going to work, but sometimes it's not going to work.

Here's the step norm criteria-- here, I have a function nowhere near a root-- I have no idea where I am on this function, I don't know what value this function has, but my steps suddenly got small enough that they're smaller than this absolute tolerance, or they're smaller than the relative error tolerance. I might say, OK, let's stop. I'm not taking very large steps anymore, this seems like a good place to quit.

But actually, my function just after this didn't go to 0 at all, it curved up and went the other way. There's not even a solution nearby. So both of these things can fail, and we try to use both of them instead to evaluate whether we have a reasonable solution to our nonlinear equation or not. Make sense?

Are there any questions about that before you I go on? No. OK.

We also talk oftentimes about the rate of convergence of the iterative process that we're using. We might say it converges linearly, or quadratically. And the rate of convergence is always assessed by looking at the difference between successive-- well, we look at the ratio of differences for successive approximation. So here's the difference between my best approximation, step i plus 1 minus the exact solution, normed, divided by my best approximation at step i, minus the exact solution normed, and raised to some power, q.

And as I go to very large numbers of iterations, i-- this limit should be i, I apologize. I'll fix that in the notes online, but this limit should be i-- is i, the number of steps gets very large. This ratio should converge to some constant. And the ratio will converge to a constant when I choose the right power of q here.

So when this limit exists, and it doesn't go to 0, we can identify what sort of convergence we get. So if q equals 1, and C is smaller than when we say that convergence is linear, what is that saying, really? This top step here is the absolute error, an approximation i plus 1. This bottom step here is the absolute error in step i-- remember two is one for linear convergence. So the ratio of absolute errors, as long as that's less than 1-- I'm converging, I'm moving my way towards the solution. And we say that rate is linear.

If C is 10 to the minus 1, then each iteration will be one digit more accurate than the previous one. The absolute error will be 10 times smaller in the next iteration versus the previous one. That would be great-- usually C isn't that small.

If this power, for which this limit exists, q, is bigger than 1, we say the convergence is super linear. If q is 2, which we'll see is something that results from the Newton-Raphson method, then we say convergence is quadratic. What that means is the number of accurate digits in my solution will actually double with each iteration. Linear, which equals 10 to the minus 1, I get one digit per iteration. Quadratic, I double the number of digits per iteration-- I have one digit on one iteration, I get two the next one, and four the next one, and eight the next one.

So quadratic convergence is marvelous. Linear convergence, that's OK. That's about the minimum you'd be willing to accept. Quadratic convergence is great, so we aim for methods that try to have these higher order convergences. So you really quickly get highly accurate solutions.

You can go back and look at your notes and see that the Jacobian method, and the Gauss-Seidel method both show linear convergence rates. They're linear methods. Does this make sense? OK.

So I meant it mentioned Newton-Raphson. Hopefully, somebody at some point told you about the Newton-Raphson method for solving at least one-dimensional, nonlinear equations. It goes like this, though. You say, I guess my solution is close to this green point here. Let me linearize my function at that point, and find where that linear approximation has a root-- which is this next green point.

And then repeat that process here. I find the linearization of my function, this pink arrow. I look for where that linear function has a root, and that's my next best approximation. And I repeat this process over and over, and it will reliably-- under certain circumstances-- converge to the root.

What does that look like, in terms of the equations? So I linearized my function, so I want to approximate f of x at i plus 1, in terms of f of x at i-- so it's f of x at i, plus the derivative, multiplied by the difference between x i plus 1 and x i. And I say, find the place where this approximation is equal to 0, and determine what the next point that I'm going to use to approximate my solution is. So I solve for x i plus 1, in terms of x i.

How big of a step do I take from x i to x i plus 1? It's this big, so the ratio of the function to its derivative at x i. And the derivative does the job of telling me which direction I should step in. Derivative gives me directionality, and this ratio here tells me the magnitude of the step.

The magnitudes, you know, they're not very good oftentimes, because these functions that we're trying to solve aren't very linear. Usually they're highly nonlinear. What's really helpful is getting the direction right. You could go right, you could go left-- only one of those ways is getting you to the root. Newton-Raphson has this advantage-- it always points you in the right direction. OK?

Of course, you can do this in any number of dimensions, not just one dimension. So you can approximate your function as linear-- f of x i plus 1 is approximately 0. And then let's take our linearized version of the function, and let's find where it's equal to 0. Sometimes what's done is to replace this difference, x i plus 1, minus x i, with an unknown vector, d i-- which is the step size. How big a step am I going to take from x i to x i plus 1?

And so we solve this equation for the displacement, d i. Move f to the other side, so you have Jacobian times d i is minus f. And solve-- d i is Jacobian inverse times f. It's just a system of linear equations. Now we know our step size. So x i plus 1 is x i plus b, or x i plus 1 is x i minus Jacobian inverse times f.

The inverse of the Jacobian plays the same role is 1 over the derivative. It's telling us what direction to step in, in this multi-dimensional space. And this solution to the system of equations is giving us a magnitude of the step that's good, not great, but is taking us closer and closer to the root.

So this is the Newton-Raphson method applied to the system of nonlinear equations. This is really the way you want to solve these sorts of problems. It doesn't always work-- things can go wrong.

What sorts of things go wrong here? Can you guess? Yeah?


PROFESSOR: OK, this is good. So in the 1-D problem, sometimes the Newton-Raphson method can get stuck. So it won't have good necessarily global convergence properties. If you have a bad initial guess, it might get stuck someplace, and the iterates will converge. That can be true. What else can go wrong?


PROFESSOR: Good, so if you're derivative is 0, that's going to be problematic. What's the multi-dimensional equivalent of the derivative being 0?


PROFESSOR: What's that? Singular Jacobian, right? So if this is J, the Jacobian, has some null space associated with it, how am I supposed to figure out which direction to step in? There's some arbitrariness associated with the solution of this system of equations. So the derivative is 0 in the 1-D example, that's a problem.

That problem gets a little fuzzier, but it's still a big problem when we try to solve for the step size, or the step-- the Newton-Raphson step. They may not be able to do this.

You don't run into this very often, but you can, from time to time. One place where this is going to happen is if we have a non-locally unique solution. When we have one of those, we know the determinant of the Jacobian at that point is going to be 0. If we're close to those solutions, well the determinant of the Jacobian is going to be close to 0-- you might expect that the system of equations you have to solve becomes ill-conditioned.

So even though there may be an exact solution for all the steps leading up to that point, the equations may become ill-conditioned. You may not be able to reliably find those solutions with your computer, either. So then these steps you take, well, who knows where they're going at that point. It's going to be crazy.

There are ways of fixing all these problems. Let's do an example.

This is a geometry example, but you can write it is a system of nonlinear equations, as well. So we have two circles-- circle f 1, circle f 2 in the x1, x2 plane. They satisfy-- these are the locus of points, that satisfy the equation f 1 of x 1 and x 2 equals 0-- this is the equation for one circle, and this is the equation for the other circle. And we want the solution, we want the roots of this vector-valued function, f, for vector-valued x. And those are the intercepts between these two circles.

You can do it using Newton-Raphson, so you're going to need to know the Jacobian. So compute the Jacobian of this function. This is practice-- maybe most of you know how to compute a Jacobian, but some people haven't done it before. So it's always good to make sure you remember that the first row of the Jacobian is the derivatives of the first element of f. And later rows are later elements.

You don't want the transpose of the Jacobian. Then it won't work.

OK, so should look something like this. There's your Jacobian. The Newton-Raphson process tells us how to take steps from one approximation to the next. The step is equal to minus the Jacobian inverse evaluated, and my best guess for the solution multiplied by the function-- evaluated at my best guess of the solution.

You're never going to compute Jacobian inverse explicitly-- that's just code for solve this system of linear equations. So use the slash operator in MatLab, for example.

And here you go-- I had an initial guess for the solution, iterate 0 at minus 1 and 3. This is somewhere outside the circles-- it's pretty far away from the solutions. But I do my Newton-Raphson steps, I iterate on and on. And after four steps, you can see that the step size in absolute value is order 10 to the minus 3.

The function norm is order 10 to the minus 3, as well-- and maybe order 10 to the minus 2, but it's getting down there. These things are decreasing pretty quickly. And I move to a point that you'll see is pretty close to the solution.

Here's some things you need to know about Newton-Raphson, you'll want to think carefully about as we go forward. So it possesses a local convergence property. I'm going to illustrate that graphically for you.

So here, I didn't solve the problem once, I solved it-- I don't know, 10,000 times. And I chose different initial points to start iterating with. Here, minus 1, 3, that was one point. But I chose a whole bunch of them. And I asked, how many iterations-- how many steps did my Newton-Raphson method have to take before I got sufficiently close to either this root here, or this root there?

I don't remember what that convergence criterion was-- it doesn't really matter, but there was some 10 to the minus 3, or 10 to the minus 5 or 10 to the minus 8, the convergence criterion that I made sure the Newton-Raphson method hit, in both the function norm and step norm cases. And then, if the color on this map is blue-- the solution converged to this star in the blue zone-- if the color's orange, the solution converged to the star in the orange zone. And if the color is light, it didn't take so many iterations to converge. And if the color gets darker, it takes more and more iterations to converge.

So that's the picture-- that's this map. I solved it a bunch of times, and then I mapped out how many iterations did it take me to converge to different solutions? So you can see if I start close to the solution, the color is really light. It doesn't take very many iterations to get there. And the further way I move in this direction-- still doesn't seem like it take so many iterations to get there, either.

I need a good initial guess-- I want to be close to where I think the solution is. Because once I'm over here somewhere, I do pretty well. And the same is true on the other side, because this problem is symmetric.

There's a line down the middle here, and along this line, the determinant of the Jacobian is equal to 0.

So we talked about these points with the Newton-Raphson method. And if I pick initial guesses sufficiently close to this line, you can see the color gets darker and darker. The number of iterations required to converge the solution goes way up.

Now, the Newton-Raphson method possesses a local convergence property. Which means, if a locally unique solution, there's always going to be some neighborhood around that solution for which the determinant of the Jacobian is not equal to 0. And in that neighborhood, I can guarantee that this iterative process will eventually reach the solution.

That's pretty good. That's handy. These iterates can go anywhere-- how do you know you're getting to the solution? Are you going to waste your time, or are you going to get there? So that's this local convergence property associated with it, which is nice.

But it'll break down as I get to places where the determent of the Jacobian is equal to 0. so there could be a zone-- it's not in this one-- there could be a zone, for example, like a ring on which the determinant of the Jacobian sequence is 0. And if I take a guess inside that ring, who knows where that solution is going to go to. Could be something like Sam said, where the iterative method just bounces around inside that ring and never converges.

But when I have roots that are locally unique, and I start with good guesses close to those roots, I can guarantee the Newton-Raphson method will converge. I'll show you next time that not only does it converge, but it also converges quadratically. So you start sufficiently close to the solution, you get to double the number of accurate digits in each iteration. You can see that happening here.

OK, so I have one accurate digit, now I have two. The next iteration I'll have four, and so on. The number of accurate digits is going to double at each iteration.

So that's going to conclude for today. Next time, we'll talk about how to fix these problems with the Newton-Raphson method. So there are going to be cases where the convergence isn't ideal, where we can improve things. There are going to be cases where we don't want to compute the Jacobian or the Jacobian inverse. And we can improve the method. Thanks.