Hamming suggested the way.
To go straight to the answer: you can get pretty much any accuracy you want, with whatever noise rejection you want, but you must be prepared to invest some effort to achieve it. (Or, you can use the suggested example design as it stands.) Let me show how these designs can be done.
A derivative operator will take a cosine wave, which is real symmetric,
and produce from it a negative sinewave, which is real
anti-symmetric. We know that an anti-symmetric real sequence will have
a spectrum with zero real parts. Examining the derivative of a cosine
wave as a function of its frequency, we find that the magnitude runs
from 0.0 to π as the frequency, expressed as a fraction of the sampling rate, varies from
0 to 0.5. Hence, the imaginary spectrum terms are this linear sequence
multiplied by -1. To make the spectrum correspond to a real-valued
filter in the time domain, the second half of the spectrum must be the "complex
conjugate" of the first half. This information is
everything necessary to uniquely define the transform of an ideal
differentiator. (To apply this approach to a second derivative:
square these terms, getting a negative, real-valued, symmetric ideal spectrum.)
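The ideal spectrum just described can be written out numerically. Here is a minimal NumPy sketch (my reconstruction, not the article's Octave code), assuming an FFT length of 1000 to match the example design that appears later:

```python
import numpy as np

N = 1000                          # assumed FFT length, as in the example design
k = np.arange(1, N // 2)

# Ideal differentiator: purely imaginary spectrum, with magnitude rising
# linearly from 0 at DC toward pi at the Nyquist location N/2, and the
# linear sequence multiplied by -1 in the imaginary parts.
H = np.zeros(N, dtype=complex)
H[1 : N // 2] = -1j * 2.0 * np.pi * k / N
# Second half: complex conjugate of the first half, so that the
# time-domain result is real. H[0] and H[N//2] stay zero, as an
# anti-symmetric sequence requires.
H[N // 2 + 1 :] = np.conj(H[1 : N // 2])[::-1]

h = np.fft.ifft(H)
# h is an anti-symmetric real sequence; its imaginary parts are roundoff only.
```

The sign convention here is an assumption; it matters only for whether the eventual filter is applied as a correlation or a convolution.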
But as we also know, in practice we only want the derivative estimator to match the ideal curve through a limited range, otherwise it will severely amplify noise and produce compromised results. I typically take 0.1 of the sampling frequency as the end of the accurate range. Shortly past that point, the response should begin to drop off quickly toward zero and stay there. The strategy of "track an upward sloping line" must give way to a strategy of "stay close to zero" through some kind of transition.
The scheme that Hamming suggested was as follows.

1. Over some range that you want to approximate, set the spectrum terms to the desired ideal line.
2. Everywhere else in the spectrum, specify that the values should be zero.
3. Take the inverse transform of this spectrum.
4. Apply a suitable window to truncate the filter to a desired number of terms.
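Taken literally, with an abrupt band edge, those steps might be sketched like this (a Python/NumPy reconstruction; the choice of 100 bins out of 1000 for a 0.1 band edge is my assumption):

```python
import numpy as np

N = 1000
band = 100                        # bins 0..99 ~ frequencies 0.0..0.1 of the sample rate

# Steps 1-2: the ideal line over the band to approximate, zero everywhere else.
H = np.zeros(N, dtype=complex)
k = np.arange(1, band)
H[k] = -1j * 2.0 * np.pi * k / N
H[N - k] = np.conj(H[k])          # preserve the conjugate-symmetry property

# Step 3: inverse transform (real up to roundoff).
h = np.fft.ifft(H).real

# Step 4: truncate to 25 centered terms. Using a plain rectangular cut like
# this exposes the Gibbs ripple the text warns about; a "suitable window"
# would taper these terms instead.
taps = np.concatenate([h[-12:], h[:13]])
```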
There was a reason for using this model with an abrupt step edge: it made the theory easier. But it is well known in practice that such abrupt edges introduce "Gibbs phenomenon" oscillations in transforms, resulting in finite-length filters that either wobble excessively (inaccurate, leak noise) or have a very large number of terms (too inefficient). The "suitable window" will attempt to suppress all of these side effects, but it seems like the task will be much easier if any such disturbances are already very small.
To reduce these difficulties, I chose to pre-shape the spectrum so that it has the right values at low frequencies, zero values at high frequencies, and a smooth transition between. This should avoid the worst of the discontinuities that could cause problems.
This is not as hard as it sounds. I chose a window in the frequency domain that consists of a "plateau" (similar to a rectangular window) in the center, and "tails" similar to a von Hann window at the ends. Apply this window by multiplying it term-by-term with the "theoretically ideal spectrum".
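The plateau-plus-tails shaping window might be built like this in NumPy (my sketch; the exact raised-cosine tail formula is an assumption, using the plateau and transition sizes from the example design described below):

```python
import numpy as np

N, match, transit = 1000, 170, 84

# Positive-frequency half of the shaping window: a plateau of ones,
# then a half von Hann (raised cosine) tail falling smoothly to zero,
# then zeros out to the Nyquist location.
W = np.zeros(N // 2 + 1)
W[:match] = 1.0
t = np.arange(transit)
W[match : match + transit] = 0.5 * (1.0 + np.cos(np.pi * t / transit))
```

This half-window is then multiplied term-by-term with the ideal spectrum terms, and the product is mirrored (conjugated) onto the second half of the spectrum.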
Use an inverse FFT to convert the shaped spectrum into a time sequence. By construction, the inverse FFT result is an anti-symmetric real sequence. Discard any imaginary parts, because they consist of nothing more than numerical roundoff noise.
This has the desired effect. The high-frequency terms are very small. They are not zero, however, so truncating the real sequence from the full FFT length to the desired finite length can have unintended consequences — this could cause unacceptable "ripple" artifacts equivalent to the combined effects of all the terms that were cleared. A way to suppress the ripple effects is to apply a finite-length window, multiplying the window term-for-term with the raw transform result. The length of the window vector dictates how many terms remain in the final filter.
Hamming recommended using a Kaiser window for this final windowing operation. It is close to optimal for most purposes, and it has an adjustable window-width parameter, which allows a tradeoff between the narrowness of the transition shape and the high-frequency ripple rejection. I took his advice on this; it worked out very well.
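NumPy provides a Kaiser window directly, which makes experimenting with the tradeoff easy. A small illustration (assuming the article's window parameter corresponds to numpy's `beta` argument, which is my interpretation):

```python
import numpy as np

# Two 25-term Kaiser windows at the ends of the typical parameter range:
# a larger parameter tapers the edges harder, suppressing ripple more
# strongly at the cost of a wider (softer) transition.
w_soft = np.kaiser(25, 3.5)
w_hard = np.kaiser(25, 6.2)
```

Both windows are symmetric with a peak value of 1.0 at the center term; the harder window has much smaller values at its edges.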
To recap, these are the adjustable parameters used to obtain the filter design:
The number of terms in the "flat plateau region" of the
first spectrum-shaping window. In concept, this defines the band that
you want to attempt to match exactly. In practice, this approximately
positions the peak of your filter response. When windowing is applied
later, terms before and after this point will be affected, so allow
extra terms in the "accurate region" to cover this. For an example design,
170 was selected, meaning 170 locations out of
1000 for the FFT, or frequency 0.17. I only want accuracy through 0.10,
so the extra terms from 0.10 to 0.17 anticipate effects of the windowing.
The number of terms in the "cosine transition shape" of the
initial spectrum-shaping window. The wider this is, the better
the tracking accuracy in the band you want, but the worse
the noise rejection immediately above the desired band. For an
example design, the transition size
84 was used to set
the nominal target for reaching zero response at roughly location 170+84,
or frequency 0.254.
The number of terms you wish to preserve in the final filter, which is also the number of terms in the final Kaiser window for ripple reduction. For an example design, the final filter size was set to 25 terms. You can use more terms to incrementally improve on the noise rejection at the expense of more computation when the filter is used.
The windowing parameter of the Kaiser window, to determine the degree of ripple suppression, which comes at the expense of the transition narrowness and more noise transmission there. Typical parameter values are 3.5 to 7. For an example design, I used the parameter value 6.2, because the 25-term filter is becoming small and needs lots of ripple smoothing.
To build the filter according to these parameters, enter them into the filter design script. This script implements all of the steps previously described. Let it grind out the coefficients.
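The script itself is not reproduced here, but the whole procedure can be sketched in Python/NumPy. This is my reconstruction under stated assumptions — the exact cosine tail formula, the sign convention, the function name, and the mapping of the Kaiser parameter onto numpy's `beta` may all differ in detail from the author's Octave script:

```python
import numpy as np

def design_derivative_filter(N=1000, match=170, transit=84, nfilt=25, alpha=6.2):
    """Sketch of the FFT-based differentiator design described in the text."""
    # Ideal differentiator spectrum: imaginary linear ramp, 0 at DC, pi at Nyquist.
    k = np.arange(N // 2 + 1)
    ideal = -1j * 2.0 * np.pi * k / N

    # Frequency-domain shaping window: plateau, von Hann tail, then zeros.
    W = np.zeros(N // 2 + 1)
    W[:match] = 1.0
    t = np.arange(transit)
    W[match : match + transit] = 0.5 * (1.0 + np.cos(np.pi * t / transit))

    # Shaped spectrum, with the second half the conjugate of the first.
    H = np.zeros(N, dtype=complex)
    H[: N // 2 + 1] = ideal * W
    H[N // 2] = 0.0
    H[N // 2 + 1 :] = np.conj(H[1 : N // 2])[::-1]

    # Inverse FFT; the imaginary parts are roundoff noise only.
    h = np.fft.ifft(H).real

    # Keep nfilt centered terms, tapered by the Kaiser window.
    half = nfilt // 2
    taps = np.concatenate([h[-half:], h[: half + 1]])
    return taps * np.kaiser(nfilt, alpha)
```

With the default parameters this produces a 25-term anti-symmetric filter whose response tracks the ideal 2πf line closely through the low band and stays near zero at high frequencies; the coefficients will be close to, though not necessarily identical with, the published design below.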
This script also uses a separate window construction function that you will need. Download these script files. The design script is intended to be compatible with GNU Octave / Mathworks Matlab. These scripts are submitted to the public domain, hence free as in beer, and also free as in freedom, but that does not mean free as in no worries. If you choose to use them, you are strictly on your own. You have been warned!
Here are the results of a design using the example parameter values given above:
match = 170; transit = 84; nfilt = 25; alpha = 6.2;
% High-performance derivative estimator
filter1 = [ ...
    -0.000128291316161, -0.000421427471795,  0.001039598292226, ...
     0.003793406986458, -0.001298001343678, -0.015244969850269, ...
    -0.008439774762739,  0.036664149706911,  0.052652912201092, ...
    -0.049494997825050, -0.204043104762539, -0.207564884934383, ...
     0.000000000000000,  0.207564884934383,  0.204043104762539, ...
     0.049494997825050, -0.052652912201092, -0.036664149706911, ...
     0.008439774762739,  0.015244969850269,  0.001298001343678, ...
    -0.003793406986458, -0.001039598292226,  0.000421427471795, ...
     0.000128291316161 ];
This design matches the ideal curve from frequency 0.0 to frequency 0.10 to within 0.01% error. The "accurate band" is equivalent to the accuracy of the experimental "least squares" designs, but the noise rejection outperforms all of the other formulas I have found so far.
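The accuracy and noise-rejection behavior can be spot-checked directly from the published coefficients. For an anti-symmetric filter applied as a centered dot product against the input samples, a sine input of frequency f (as a fraction of the sample rate) comes out as a cosine scaled by a simple sum, which should match the ideal differentiator gain 2πf in the accurate band (a Python/NumPy check; the centered dot-product convention is my assumption):

```python
import numpy as np

# Coefficients copied from the published Octave design above.
filter1 = np.array([
    -0.000128291316161, -0.000421427471795,  0.001039598292226,
     0.003793406986458, -0.001298001343678, -0.015244969850269,
    -0.008439774762739,  0.036664149706911,  0.052652912201092,
    -0.049494997825050, -0.204043104762539, -0.207564884934383,
     0.000000000000000,  0.207564884934383,  0.204043104762539,
     0.049494997825050, -0.052652912201092, -0.036664149706911,
     0.008439774762739,  0.015244969850269,  0.001298001343678,
    -0.003793406986458, -0.001039598292226,  0.000421427471795,
     0.000128291316161])

def gain(f):
    # Response of the anti-symmetric filter applied as a centered dot
    # product: by symmetry, only the sine terms of the second half survive.
    m = np.arange(1, 13)
    return 2.0 * np.sum(filter1[13:] * np.sin(2.0 * np.pi * f * m))
```

In the accurate band, `gain(0.05)` lands very close to the ideal value 2π·0.05 ≈ 0.31416, while well above the band, `gain(0.40)` is smaller by several orders of magnitude.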
If you need accuracy and noise insensitivity, the FFT-based differentiator filter designs can give you whatever accuracy you need through the frequency range you need, with excellent noise rejection, but deriving a design can require a lot of effort. Because the filter has quite a lot of terms, evaluation requires extra computation, but that is part of the compromise: you "get what you pay for." The example design is a reasonable choice if you want to avoid the design work.
While the filter length might seem rather excessive compared to other designs, this is not as bad as it first seems. Taking the point of view that this design is a kind of bandpass filter, the high frequency noise rejection is typical of what you would expect for such a filter given the number of filter terms.
Though it performs well, there is something troubling about this design strategy. It constructs an estimator formula with good properties but with no guarantee that the choice is optimal. The computations may not be as efficient as they could be: too much numerical effort is directed toward fitting the specified spectrum shape, and not enough toward the real goals of acceptable low-frequency accuracy, limited high-frequency noise, and a minimal number of terms. I speculate that a truly optimal design would show some kind of bounded error property, with zero crossings at high frequencies in the manner of the "Alternation Theorem."
Digital Filters, R. W. Hamming.
Experiments with the much simpler least-squares designs were described briefly elsewhere on this site.
For background on Chebyshev equal-ripple approximations, try
Site: RidgeRat's Barely Operating Site http://home.earthlink.net/~ltrammell
Author: Larry Trammell
Created: Mar 24, 2011
Revised:
Status: Stable
Contact: NOSPAM ltramme1476 AT earthlink DOT net NOSPAM
Related:
Restrictions: This information is public