This note discusses methods for estimating the values of the derivatives of an unknown function, given measured equally-spaced function values with a certain amount of random measurement noise. I was interested in a curvature analysis for which a second derivative would be useful. I knew that this would be a challenging problem, but I was surprised to find that the numerical methods usually suggested for this are really not very good.
As an example of the drawbacks of some of these commonly cited methods, consider the classic central differences "exact fit" polynomials. The idea is that if you construct a polynomial curve that passes through the data points, then the derivative of this polynomial is an estimate of the derivative of the underlying function. The following plot shows what happens if you apply this approximation strategy to estimate the second derivative of a cosine wave with peak amplitude 1, and then you adjust the frequency of the cosine wave. The green curve shows what the correct response magnitude should be at each frequency, while the blue curve shows what the approximation formula calculates.
f''(0) ≅ [ 2.0 f(-3h) - 27.0 f(-2h) + 270.0 f(-h) -490.0 f(0) + 270.0 f(h) - 27.0 f(2h) + 2.0 f(3h) ] * 1/(720.0 h2) where h is the interval between samples
While at first this seems like a very good approximation, it has a severe problem — its sensitivity to noise. A normalized frequency of 1.0 represents the sampling rate of 1 sample per 1 time interval. The Nyquist limit, above which frequencies can no longer be uniquely distinguished, falls at frequency 0.5, so this is the maximum range plotted. A frequency of 0.1 means roughly that the fastest meaningful "up and down" variation is spanned by 10 or more samples. If your data does not contain enough points for this, it is probably insufficient to represent the function well anyway, and you can expect poor estimates. Consequently, anything above normalized frequency 0.1 is probably not useful, and anything above 0.2 is almost certainly noise.
But notice in the preceding plot how the gains become very high above frequency 0.2. That is: noise is significantly amplified relative to the level of the values you are attempting to compute. It is better for the estimator not to track the ideal curve in this range, but instead to swing down as close as possible to zero, so that there is very little response to the noise.
Lanczos proposed a method described by R. W. Hamming :
"... a low noise differentiation filter is proposed that has (a) a unit slope at f=0 and (b) minimizes the sum of squares of the coefficients."
Large gain does tend to follow from filters that have large terms, so a filter that does not have these should be quieter... but the accuracy is poor at the low frequencies where accuracy matters.
Pavel Horoborodko presented another approach  to a weighted error design, producing derivative estimators that are better but that remain insensitive to noise. The following graph illustrates the frequency response characteristic of one of these second-derivative filter designs.
f''(0) ≅ [ 1.0 f(-4h) + 4.0 f(-3h) + 4.0 f(-2h) - 4.0 f(-h) - 10.0 f(0) - 4.0 f(h) + + 4.0 f(2h) + 4.0 f(3h) + 1.0 f(4h) ] * 1/(64.0 h2)
This was still not completely satisfactory. The high frequency noise rejection is quite good given the size of the filter, but the approximation error remains relatively poor at frequencies below 0.1 where you probably want near-perfect accuracy. If you increase the order of the filter used in this design (thus adding more terms), this gives better high-frequency rejection but further reduces the useful range at the "top end". If you reduce the order of the filter, this both reduces the effectiveness of the noise reduction and also compromises how well the approximation can match changes in curvature at the low end, limiting the estimator accuracy.
I tried a naive approach to this problem. The classical polynomial-based estimators become sensitive to noise because their approximation curves exactly match every measurement point, noise and all. A "standard" noise suppresion technique is to fit a smooth curve to the data set, without attempting to match the values exactly. What if this idea is used to fit a polynomial to the data, rather than fitting every point exactly? Low order polynomial approximations have limited curvature, hence a natural tendency to not track high frequencies, and this should somewhat limit the response to high-frequency noise. However, there is nothing to explicitly force noise rejection. It just happens naturally — or not! Here is the result of a typical design.
% To frequency 0.10 of sample rate with 0.01% peak error filter2 = [ ... -0.0025402 0.0224100 -0.0779679 0.1199416 -0.0274123 -0.1321265 ... 0.0337787 0.2130250 0.1305009 -0.1379610 -0.2832965 -0.1379610 ... 0.1305009 0.2130250 0.0337787 -0.1321265 -0.0274123 0.1199416 ... -0.0779679 0.0224100 -0.0025402 ] ;
This was rather surprising. Using approximately twice as many terms, this estimator came very close to matching the low noise properties of Holobrodko's "robust low noise design" at high frequencies, but with much better accuracy in the desired frequency range. The following compares the two methods given a noisy data sequence with at an input frequency of about 0.02. The ideal response is in green. The "robust low noise" estimate is in red. The "naive least squares" estimate is in blue.
If the fastest disturbance is spanned by 50 or more points, it is clear that the "least squares design" offers no advantage. However, the "least squares design" is accurate within 0.01% (relative to the input magnitude of 1.0) through frequency 0.10, and useful though not fully accurate beyond that.
I was starting to think maybe this was a useful idea — a relatively minor adjustment to control the high frequency noise response would produce a really superior design. Unfortunately, I couldn't find any reasonable way to do this. This approach was at a dead-end.
So I turned my attention to a frequency-based approach. I knew that a derivative operator would show a negative symmetry, which would correpond to a harmonic series of sine waves. So, if I could find a good sine-series approximation to the ideal frequency response curve, this frequency-oriented approach should produce a good filter design. Furthermore, since I was working with frequencies, it is straightforward to include some additional artificial terms to penalize any tendency to respond at high frequencies.
I won't go through the details of this aproach. It appeared to work very well, with noise rejection as good as the "robust low noise" designs, yet improved acuracy at low frequencies. This turned out to not be the breakthrough I was looking for.
However, two other approaches eventually turned up that show a lot more promise. If you need the best of the best, proceed directly to the FFT Design Method page. While the properties of these estimators are appealing, all of the manual tweaking necessary to get a design that fully meets all requirements is a bit tedious, and then applying the results takes a lot of computation. Another design approach produces more economical estimators with accuracy almost as good as the FFT methods, at the cost of allowing slightly more noise to leak through. See the Optimal Design Method page.
 The classic "Digital Filters" by R. W. Hamming covers the derivative estimator filters from a frequency analysis point of view, and describes the Lanczos low-noise designs in "an exercise."
 You can find Pavel's pages at:
Site: RidgeRat's Barely Operating Site http://home.earthlink.net/~ltrammell Author: Larry Trammell Created: Mar 24, 2011 Revised: Jan 22, 2012 Status: Stable Contact: NOSPAM ltramme1476 AT earthlink DOT net NOSPAM Related: Restrictions: This information is public