Area calculations under line graphs

  • Thread starter Thread starter Tryfan
  • Start date Start date
T

Tryfan

I have found it impossible to find out any information
regarding the calculation of area undernieth a specified
line in the 'area chart' option. How would you calculate a
given area undernieth a line?
 
Tryfan said:
I have found it impossible to find out any information
regarding the calculation of area undernieth a specified
line in the 'area chart' option. How would you calculate a
given area undernieth a line?

If you mean the area under the line that Excel uses to smooth its plots,
I can't yet tell you; haven't taken the time to work through it.
Apparently MS uses Bezier curves, but I haven't verified this, and am
not sure how end-point constraints are handled. An OK (often)
approximation to this, and *certainly* a reasonable, if not superior,
alternative, is to use a (cubic) spline fit. You can find posts with
code for doing the fit that also can return the integral of the curve
under my name, posted some years back under something like "splines" in
the archives maintained by Google. They fall under excel.*

If you are, on the other hand, comfortable with what Bernard suggests,
namely getting the area of a known function using the trapezoidal
method, you can do it very quickly as follows:
1) Enter some labels in column A.
In A1, enter "X_1"
In A2, enter "X_n"
In A3, enter "Panels"
In A4, enter "delta"

2) Specify bounds:
In B1, enter the lower bound of your range (the lowest value of "x"
for the area youy have in mind), e.g. 1
In B2, enter the highest value of "x", e.g. 5
3) Specify # of panels
In B3, enter something like 6500. Note--- this will be the # of
trapezoids in Bernard's diagram
4) Define some names:
Select A1:B3, choose Insert->Name->Create, click on the "left" box.
IMHO, defining names is a really good and useful practice that MS
Excel's current design sort of hides.
4) Enter formula to calculate delta
In B4, enter the formula =(X_n-X_1)/Panels
This is the base width of the trapezoidal panels.
5) Name delta:
Select B4, then go to Insert->Name->Define, and assign to cell B4 the
name "delta"
6) Create Midpoints
While in the Insert->Name->Define box, create a new name "midpoints",
and assign to it the formula
=X_1+ROW(INDIRECT("1:"&Panels-1))*delta

Ok, ok, this may have seemed like a hassle, but I am trying to help you
make this somewhat self-documenting. Finally, the good stuff:
Suppose your function is something like f(x) = Ln(x)^3. Then in, say,
A5, type in =(SUM(LN(midpoints)^3)+SUM(LN(B1:B2)^3)*0.5)*$B$4, and hold
ctrl-shift keys when you press Enter (or "return"). For the general case
of some f(x), you would have
=(SUM(f(midpoints)^3)+SUM(f(B1:B2)^3)*0.5)*$B$4
again, ctrl-shift-entered (it's an array formula).

The calc is very fast, and usually darn good. You can goof around with
the number of panels (cell B3) to get a feel for what additional calcs
will elicit.

If your function has "jumps" and the like, then this whole approach may
not be apprpriate, in which case repost. And, of course, if my
explanation didn't cut it somehow, please let us know.

HTH
Dave Braden
 
One minor tweak, David, and an oversight correction to the general
case.

The correction for the general case of some f(x): One would use, not
=(SUM(f(midpoints)^3)+SUM(f(B1:B2)^3)*0.5)*$B$4, but
=(SUM(f(midpoints))+SUM(f(B1:B2))*0.5)*$B$4

Tweak: If Panels=1 the definition of midpoints as
X_1+ROW(INDIRECT("1:"&Panels-1))*delta will fault. An alternative
formulation is X_1+(ROW(INDIRECT("1:"&Panels))-1)*delta
or
X_1+(ROW(OFFSET(Sheet1!$A$1,0,0,Panels,1))-1)*delta

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 
Thanks for spotting my mistype, Tushar. That is certainly what I had in
mind, evidenced by the original example of wanting to integrate f(x) =
Ln(x)^3.
It's darn fast, isn't it?
Dave B
 
Tushar,
What's the point of adding "+x*0" in the EVALUATE? To induce some
volatility? I ask because I don't get such on my setup (XL10 on Mac).
Your suggestion, as best I can implement it, doesn't update the result
with a change in parameters without going into the define name dialog.

Any clue?
TIA
Dave
 
Tushar,
Any feel for using OFFSET versus INDIRECT? As I see it, INDIECT is more
robust to woeksheet changes.
TIA
Dave
 
Something to do with forcing an array result/operation. See Stephen
Bullen's chrtfrm.zip at www.bmsltd.co.uk for an equally terse
explanation. I couldn't get my initial formulation to work. So, after
struggling with it for way too long, I checked his version.

Originally, I had a name XVals (instead of X), and the name YVals was
defined as =EVALUATE(SUBSTITUTE(myFx,"X",XVals)). It gave the correct
YVals when used as an array formula in a worksheet range, but any
arithmetic operation on YVals, such as SUM(YVals), yielded incorrect
results. So, after reading his implementation, I tried naming YVals as
=EVALUATE(myFx) and that also didn't work. But =EVALUATE(myFx&"+x*0")
did.

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 
Not sure what you mean by more robust to worksheet changes, but for
charting purposes, INDIRECT requires a lot more care setting up the
correct formulation (compared to OFFSET). In other words, OFFSET is a
lot more robust. So, I use it whenever possible.

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 
One comment, David.

I can't think of any satisfactory solution for this entire class of
problems. The class consists of problems for which UDFs are ideal, but
which require changing the worksheet to get the function result --
plotting a function, differentiating/integrating a function, or
optimizing a function. Note that by 'function' I mean anything that
yields a result in a worksheet cell. In a sense it requires treating
an XL workbook (worksheet?) as a function with a well-defined input and
a well-defined output. Interestingly enough, this feature already
exists, albiet in a very restricted manner. It is XL's Data | Table...
capability.

Using named formulas is limiting since they cannot be parameterized.
So, the technique that we've been discussing is limited in its
applicability to multiple functions in the same workbook/sheet.

That leaves us with a subroutine. Of course, the biggest downside is
that it any such solution is 'on demand' and not part of XL's
recalculation chain.

A long standing low priority task has been improvements to the Plot
Manager add-in. Part of the reason for not working on the program has
been that other than cosmetic changes -- like making the process of
data generation and plotting invisible -- I cannot think of any useful
improvements. I might use its source as the basis for a separate add-
in that provides numerical integration using one of a few methods
(Trapezoid, the 2 Simpson rules, ??)

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

Tushar and all,
This has grabbed me because we get the "area under a curve" question
fairly frequently. Yes, we can toss this into VBA, but I like the idea
of keeping it in the wks. This is an open invitation for folks to come
towards some sort of open "recipe" that most anyone could follow (ok,
that's a stretch, perhaps)

Just noticed that for your posted implementation (simpson's) to work as
intended, it is important that nbrVals be an even integer. So adjusting
your statement to something like
Nbrvals =Sheet1!$F$1+MOD(Sheet1!$F$1,2)
would be safer.

With that in mind, and adapting your ideas to my preference of keeping
the evaluated function "in the open" in the worksheet, we end up with
this:
==========
"Area under a curve"

To use Excel for evaluating the integral of, say, 2+3*(Ln(x))^0.6 using
Simpson's Rule (also see notes below):

1) Enter some labels:
In cell.....enter
A1 "X_1"
A2 "X_n"
A3 "NbrPanels"

2) Set some values:
In cell.....enter
B1 1
B2 2.5
B3 1000

3) Define some names:
Select A1:B3, then choose Insert->Name->Create. Make sure that only the
"Left Column" box is selected. If it isn't, you might have entered text
instead of numbers in the right column, or mis-selected the range. Press
OK.

4) Choose Insert->Names->Define
Enter each of the following names and their definitions, pressing Add
with each entry (you can copy and paste these):
EPanels =NbrPanels+MOD(NbrPanels,2)
delta =(X_n-X_1)/EPanels
Steps =ROW(INDIRECT("1:"&EPanels+1))-1
EvalPts =X_1+delta*Steps
SimpWts =IF(MOD(Steps,EPanels)=0,1,IF(MOD(Steps,2)=1,4,2))*delta/3

(optional) If interested in a trapezoidal approximation, define
TrapWts =IF(MOD(Steps,EPanels)=0,0.5*delta,delta)

5) Close the Define Names box, and in, say, cell D1, array-enter
=SUM(SimpWts*(2+3*LN(EvalPts)^0.6))
That is, type in the function, and hold ctl-shift when pressing Enter.

In general, ctrl-shift-enter =SUM(SimpWts*f(EvalPts))
where f() is a legitimate Excel expression that yields a scalar numeric
value.

To use the trapezoidal method, substitute in the above expression
TrapWts for SimpWts.

Notes:
(1) With this implementation, an odd number for NbrPanels doesn't cut it
(for Simpson's rule), so there will be no improvement moving from an odd
number to the next integer (odd ones are automatically changed to the
next even, internally).

(2) If you have "jumps" in your function, break it up at the points
where those occur, and add the pieces.

(3) If you want to know the are under the curve that Excel draws for
*data points* where the line isn't smoothed (i.e., there are straight
lines joining successive points... right-click [or ctrl-click] the
series, choose "Format Data Series",select the Patterns tab, deselect
"Smoothed line"):
(a) Suppose the "x values" are in a column range Xpts, the "y values"
in column range Ypts; both should have the same number of values.
(b) Insert->Name->Define "Pnls" to be =COUNT(Xrng)-1.
(c) Enter the function
=SUMPRODUCT((OFFSET(Xrng,1,0,Pnls)-OFFSET(Xrng,0,0,Pnls))*(OFFSET(Yrng,1,
0,Pnls)+OFFSET(Yrng,0,0,Pnls)))*0.5

(d) if, say, Xrng is a Row range, adjust the above OFFSETs to
OFFSET(Xrng,1,0,0,Pnls) and OFFSET(Xrng,0,0,0,Pnls)

(4) If you want to know the area under a smoothed graph of data points,
refer to the SplineIntegrate function at
http://groups.google.com/[email protected]

Excel apparently uses Bezier smoothing; to know that are under *that*
smooth, well, I dunno (yet).

(5) For other ideas and "visuals", see Tushar Mehta's post at
http://www.mrexcel.com/board2/viewtopic.php?t=59869
and Bernard Liengme's page:
http://www.stfx.ca/people/bliengme/ExcelTips/AreaUnderCurve.htm

For some tech info, refer to "Numerical Recipes in C" or such.

(6) Don't forget that there are sophisticated packages that you can call
from Excel to handle integration, such as Matlab, Maple and Mathematica.

=======

Do you (or anyone!!!) see any room for improvement here? If so, could
you please correct/edit as you see fit and repost (I trust you). As far
as I can tell, this framework allows one to toss in wts for most any
Newton-Cotes formula, so it's pretty flexible, though I don't see why
one would bother with more than the two listed here, with the possible
exception of Boole's (aka Bode) method.

TIA
Dave B

Tushar Mehta said:
If you want to see how to use Simpson's Rule using named formulas see
my post in http://www.mrexcel.com/board2/viewtopic.php?t=59869.

Implementing the trapezoidal rule with a variant of David's suggestion
that uses only named formulas:

Suppose you have the function as a *literal* (not as a formula) in B1.
Suppose the min. and max. x values are in D1 and D2 respectively.
Finally, suppose the number of slices you want is in F1.

Then, with the named formulas below, =TrapezoidalRlst in a cell will
give you the result.

myFx =Sheet1!$B$1
XMax =Sheet1!$D$2
XMin =Sheet1!$D$1
Nbrvals =Sheet1!$F$1
delta =(XMax-XMin)/Nbrvals
incrSteps =(ROW(OFFSET(Sheet1!$A$1,0,0,Nbrvals+1,1))-1)
X =XMin+incrSteps*delta
YVals =EVALUATE(myFx&"+x*0")
MultFacts =IF(MOD(incrSteps,Nbrvals)=0,0.5,1)
TrapezoidalRslt =SUMPRODUCT(MultFacts,YVals)*delta
 
Tushar,
Thanks for the thoughts. I'm in the same boat as you as regards a
solution for the entire class of problems.

What I *do* like about our techniques is that they are seriously
elegant, and appropriate, for many situations. I really like your way of
using the MOD function to set up weights. My taste is to keep the
function being integrated "open", but what the hey. Where I see our
(independently derived!) solutions falling short at this level is with
respect to the integral's bounds: If one wants to piecewise-integrate to
deal with discontinuities, then our approach seems to fall short of
anything but a clumsy solution.

But keeping in mind that the question is "what is the area under my
(Excel) graph", the problem is much, much easier to address than the
sort of stuff you, I and many others think of. The statement precludes
singularities and open intervals. And it is univariate (sigh).

Taking that posed problem quite literally, the question comes down to
(in the smoothed case) determining the integral under a cubic Bezier
curve. Brian Murphy has convinced me that this is what's up with Excel's
smooth. What I haven't yet figured out is how to best calculate the
integral of such a curve.

FWIW, consider Weddle weights instead of the Simpson ones. I just
discovered these after your last input to this stuff, and they look very
promissing for what one would see in the context of Excel. For a single
partition, they run [1 5 1 6 1 5 1], with the whole shebang multiplied
by (3/10). Error term is much tighter, the calc is just as fast.

I would greatly appreciate any other ideas as they come to mind,
*especially* regarding integrating those pesky Bezier curves.

Dave B

Tushar Mehta said:
One comment, David.

I can't think of any satisfactory solution for this entire class of
problems. The class consists of problems for which UDFs are ideal, but
which require changing the worksheet to get the function result --
plotting a function, differentiating/integrating a function, or
optimizing a function. Note that by 'function' I mean anything that
yields a result in a worksheet cell. In a sense it requires treating
an XL workbook (worksheet?) as a function with a well-defined input and
a well-defined output. Interestingly enough, this feature already
exists, albiet in a very restricted manner. It is XL's Data | Table...
capability.

Using named formulas is limiting since they cannot be parameterized.
So, the technique that we've been discussing is limited in its
applicability to multiple functions in the same workbook/sheet.

That leaves us with a subroutine. Of course, the biggest downside is
that it any such solution is 'on demand' and not part of XL's
recalculation chain.

A long standing low priority task has been improvements to the Plot
Manager add-in. Part of the reason for not working on the program has
been that other than cosmetic changes -- like making the process of
data generation and plotting invisible -- I cannot think of any useful
improvements. I might use its source as the basis for a separate add-
in that provides numerical integration using one of a few methods
(Trapezoid, the 2 Simpson rules, ??)

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions

Tushar and all,
This has grabbed me because we get the "area under a curve" question
fairly frequently. Yes, we can toss this into VBA, but I like the idea
of keeping it in the wks. This is an open invitation for folks to come
towards some sort of open "recipe" that most anyone could follow (ok,
that's a stretch, perhaps)

Just noticed that for your posted implementation (simpson's) to work as
intended, it is important that nbrVals be an even integer. So adjusting
your statement to something like
Nbrvals =Sheet1!$F$1+MOD(Sheet1!$F$1,2)
would be safer.

With that in mind, and adapting your ideas to my preference of keeping
the evaluated function "in the open" in the worksheet, we end up with
this:
==========
"Area under a curve"

To use Excel for evaluating the integral of, say, 2+3*(Ln(x))^0.6 using
Simpson's Rule (also see notes below):

1) Enter some labels:
In cell.....enter
A1 "X_1"
A2 "X_n"
A3 "NbrPanels"

2) Set some values:
In cell.....enter
B1 1
B2 2.5
B3 1000

3) Define some names:
Select A1:B3, then choose Insert->Name->Create. Make sure that only the
"Left Column" box is selected. If it isn't, you might have entered text
instead of numbers in the right column, or mis-selected the range. Press
OK.

4) Choose Insert->Names->Define
Enter each of the following names and their definitions, pressing Add
with each entry (you can copy and paste these):
EPanels =NbrPanels+MOD(NbrPanels,2)
delta =(X_n-X_1)/EPanels
Steps =ROW(INDIRECT("1:"&EPanels+1))-1
EvalPts =X_1+delta*Steps
SimpWts =IF(MOD(Steps,EPanels)=0,1,IF(MOD(Steps,2)=1,4,2))*delta/3

(optional) If interested in a trapezoidal approximation, define
TrapWts =IF(MOD(Steps,EPanels)=0,0.5*delta,delta)

5) Close the Define Names box, and in, say, cell D1, array-enter
=SUM(SimpWts*(2+3*LN(EvalPts)^0.6))
That is, type in the function, and hold ctl-shift when pressing Enter.

In general, ctrl-shift-enter =SUM(SimpWts*f(EvalPts))
where f() is a legitimate Excel expression that yields a scalar numeric
value.

To use the trapezoidal method, substitute in the above expression
TrapWts for SimpWts.

Notes:
(1) With this implementation, an odd number for NbrPanels doesn't cut it
(for Simpson's rule), so there will be no improvement moving from an odd
number to the next integer (odd ones are automatically changed to the
next even, internally).

(2) If you have "jumps" in your function, break it up at the points
where those occur, and add the pieces.

(3) If you want to know the are under the curve that Excel draws for
*data points* where the line isn't smoothed (i.e., there are straight
lines joining successive points... right-click [or ctrl-click] the
series, choose "Format Data Series",select the Patterns tab, deselect
"Smoothed line"):
(a) Suppose the "x values" are in a column range Xpts, the "y values"
in column range Ypts; both should have the same number of values.
(b) Insert->Name->Define "Pnls" to be =COUNT(Xrng)-1.
(c) Enter the function
=SUMPRODUCT((OFFSET(Xrng,1,0,Pnls)-OFFSET(Xrng,0,0,Pnls))*(OFFSET(Yrng,1,
0,Pnls)+OFFSET(Yrng,0,0,Pnls)))*0.5

(d) if, say, Xrng is a Row range, adjust the above OFFSETs to
OFFSET(Xrng,1,0,0,Pnls) and OFFSET(Xrng,0,0,0,Pnls)

(4) If you want to know the area under a smoothed graph of data points,
refer to the SplineIntegrate function at
http://groups.google.com/[email protected]

Excel apparently uses Bezier smoothing; to know that are under *that*
smooth, well, I dunno (yet).

(5) For other ideas and "visuals", see Tushar Mehta's post at
http://www.mrexcel.com/board2/viewtopic.php?t=59869
and Bernard Liengme's page:
http://www.stfx.ca/people/bliengme/ExcelTips/AreaUnderCurve.htm

For some tech info, refer to "Numerical Recipes in C" or such.

(6) Don't forget that there are sophisticated packages that you can call
from Excel to handle integration, such as Matlab, Maple and Mathematica.

=======

Do you (or anyone!!!) see any room for improvement here? If so, could
you please correct/edit as you see fit and repost (I trust you). As far
as I can tell, this framework allows one to toss in wts for most any
Newton-Cotes formula, so it's pretty flexible, though I don't see why
one would bother with more than the two listed here, with the possible
exception of Boole's (aka Bode) method.

TIA
Dave B

Tushar Mehta said:
If you want to see how to use Simpson's Rule using named formulas see
my post in http://www.mrexcel.com/board2/viewtopic.php?t=59869.

Implementing the trapezoidal rule with a variant of David's suggestion
that uses only named formulas:

Suppose you have the function as a *literal* (not as a formula) in B1.
Suppose the min. and max. x values are in D1 and D2 respectively.
Finally, suppose the number of slices you want is in F1.

Then, with the named formulas below, =TrapezoidalRlst in a cell will
give you the result.

myFx =Sheet1!$B$1
XMax =Sheet1!$D$2
XMin =Sheet1!$D$1
Nbrvals =Sheet1!$F$1
delta =(XMax-XMin)/Nbrvals
incrSteps =(ROW(OFFSET(Sheet1!$A$1,0,0,Nbrvals+1,1))-1)
X =XMin+incrSteps*delta
YVals =EVALUATE(myFx&"+x*0")
MultFacts =IF(MOD(incrSteps,Nbrvals)=0,0.5,1)
TrapezoidalRslt =SUMPRODUCT(MultFacts,YVals)*delta
 
Back
Top