7 April 1999

 

Dear PC WEEK Editors;

 

I opened my first issue of PC WEEK (April 5, 1999) with great anticipation of reading the App Server test ("Measuring Up App Servers"). However, my anticipation turned to acute disappointment at the poor metrics used for evaluation and at the marketing hype presented as prose. The poor metrics can be summed up by the phrase, "Numbers don't lie, but liars figure". Writer Timothy Dyck either wanted to show Sapphire/Web as the "winner", or chose very poor metrics and made significant errors in graphing the data. I have detailed my concerns with the article in the seven points below.

 

Poor Metrics:

 

1. Partial truth - "But the bottom line is performance--if the server can't keep up with demand, your e-business is in e-trouble."


Of course, an Internet server needs to handle customers' requests in a *reasonably* expeditious fashion. However, server crashes and web pages overloaded with useless graphics do more to slow customers' web page viewing ('browsing, searching, & shopping') than do server delays. Furthermore, most web sites are currently driven by CGI (Common Gateway Interface), which is notoriously slow. This raises the question of why no comparisons were made with "legacy" CGI-based servers. Would any of the tested App Servers have been significantly quicker than a CGI equivalent? The answers to these questions are what systems personnel need to know.

First Graph
(PC Week's Graph #1 - Useless Web Page Output Measurement)

 

2. Meaningless graphs - The first graph (above; "Sapphire/Web cranks out the pages"), which shows "Pages per second" vs. number of "Users", is virtually meaningless. What is the size of the page being served? What bandwidth (server-to-Internet connection speed) is required to serve that number of pages? Unless the pages contain virtually no information (data), just a few hundred pages per second is enough to stuff a T-1 line and possibly even a 10-Mbps Ethernet LAN! Nor is there any mention of the significance of the download time for the dial-up user (the majority of Internet users today). A 3-second page server delay is just 10% of a 30-second download time for that page (note: 30 seconds is a suggested maximum time for a web page to download given most users' patience levels; however, many web pages violate this suggestion).
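The bandwidth question can be checked with rough arithmetic. The sketch below is only an illustration: the 10 KB page size is an assumption (the article gives no page size), the link speeds are the nominal T-1 and 10-Mbps Ethernet rates, protocol overhead is ignored, and the function names are made up for this example.

```python
# Rough bandwidth check. PAGE_BYTES is an assumption: the article never
# states the size of the pages being served.
PAGE_BYTES = 10 * 1024        # hypothetical 10 KB page
T1_BPS = 1_544_000            # nominal T-1 line rate, bits per second
ETHERNET_BPS = 10_000_000     # nominal 10-Mbps Ethernet rate, bits per second


def required_bps(pages_per_sec, page_bytes=PAGE_BYTES):
    """Bits per second needed to deliver pages_per_sec pages."""
    return pages_per_sec * page_bytes * 8


def max_pages_per_sec(link_bps, page_bytes=PAGE_BYTES):
    """Pages per second that fit through a link, ignoring protocol overhead."""
    return link_bps / (page_bytes * 8)


def server_delay_share(server_delay_s, download_s):
    """Server delay expressed as a fraction of the download time."""
    return server_delay_s / download_s


if __name__ == "__main__":
    print(f"T-1 carries about {max_pages_per_sec(T1_BPS):.1f} pages/sec")
    print(f"10-Mbps Ethernet carries about {max_pages_per_sec(ETHERNET_BPS):.1f} pages/sec")
    print(f"800 pages/sec needs {required_bps(800) / 1e6:.1f} Mbps")
    print(f"3 s delay vs. 30 s download: {server_delay_share(3, 30):.0%}")
```

With any page size in this range, the 800 pages/sec figure would need far more bandwidth than either link provides, which is the point of the questions above.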

Another useless factoid: is Sapphire's 800 pages/sec for 100 users, or 8 pages/sec per user, meaningful? What customer reads or uses 8 pages in a second? Even 'lowly performing Hahtsite' gives 3.7 pages/sec per user, which is still well beyond any real-world purpose! Also, why was the number of users cut off at 1000, when handling more user connections is more meaningful for a large Internet site? Did Sapphire's performance degradation (the downward sloping portion of its curve) with increasing users make its comparison with the 'lower performing WebObjects and Hahtsite' less dramatic, thus the arbitrary 1000 user cutoff? (see below)

The number, or rate, of web pages output is more critical at higher user loads for an obvious reason: more pages are being demanded by more users. A decrease in web page output indicates a degradation of internal server processes (i.e., the server is having trouble keeping track of the users, so the page output is cut back).

As the questions above illustrate, Dyck measures things that are not useful for determining whether a web site can handle a real-world load.

 

3. Unknown formulas - If the graphs are analyzed using simple mathematics, it becomes apparent that there are significant errors in them (see point #4). Compounding the problem are the unknown measurements and calculations used to obtain the graphs. Reasonable deduction can supply the missing formulas.

PC Week's Graph #1

PC Week's Graph #2

For example, Graph #2 ("Keeping response time to a minimum") was determined by an unknown algorithm; however, Graph #2 seems to be related to Graph #1 by the formula:

[(# Users)/(pages/sec)]

Even if this formula does not describe Graph #2, there is a direct relationship between the two functions, as the 'number of cranked out pages' is surely inversely related to 'response time'. A change in slope in one graph should therefore cause a change in slope in the related graph, yet there are some anomalies in the relationship between graphs 1 & 2. (see below)
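The postulated relationship is simple enough to state as code. This is a minimal sketch of the formula deduced in this letter, not PC Week's actual (unknown) calculation; the function names are illustrative. It uses the 800 pages/sec at 100 users figure quoted in point 2.

```python
def response_time_s(users, pages_per_sec):
    """Postulated response time in seconds: (# Users) / (pages/sec)."""
    if pages_per_sec <= 0:
        raise ValueError("pages_per_sec must be positive")
    return users / pages_per_sec


def pages_per_sec_per_user(users, pages_per_sec):
    """Pages served each second to each user (the inverse quantity)."""
    if users <= 0:
        raise ValueError("users must be positive")
    return pages_per_sec / users


if __name__ == "__main__":
    # Figures quoted in point 2: 800 pages/sec for 100 users.
    print(response_time_s(100, 800))
    print(pages_per_sec_per_user(100, 800))
```

Because one quantity is the reciprocal of the other, any bend in the pages/sec curve has to show up as a bend in the response time curve at the same number of users.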

 

4. Possible Errors in the graphs - Slope changes in Graph #1's curves should, and usually do, produce corresponding slope changes in Graph #2. However, there are some significant exceptions. A fixed upper limit to pages served, combined with no real decrease in the number of pages served, should produce a steadily increasing response time; i.e., the number of users keeps increasing, so the time increases due to the greater number of users who must 'share' the pages. However, not all slope changes in Graph #1 are reflected in Graph #2 and, even worse for WebObjects, a performance degradation in Sapphire/Web's Graph #1 curve is reflected in WebObjects' performance curve in Graph #2. For example, the slope changes in Graph #1 at 100 users (125 for WebObjects), at 200 users for Sapphire, and at 500 users for Sapphire do not all correspond to slope changes at similar places in Graph #2. In fact, the change in Sapphire's Graph #1 slope from positive to negative (at 200 users) is reflected (accurately) in the sharp increase in its slope in Graph #2; however, an increasingly negative slope for Sapphire (at 500 users) causes WebObjects' slope to increase instead of Sapphire's, which remains unperturbed by its decreasing web page output! Both Hahtsite's and WebObjects' web page output remains virtually constant under increasing load, but Graph #2 shows an increasing response time beyond that implied in Graph #1. The only likely explanation is an error, but one which benefits Sapphire/Web in comparison with the other two.

Below is a graph I constructed using the admittedly suspect numbers from Graph #1; in lieu of any better data, it is at least consistent with the changes in rates in Graph #1. I also extrapolated the data beyond the arbitrary 1000-user cutoff (the extrapolation may not be correct, i.e., there may be non-linearities past 1000 users) to illustrate how the decreasing performance trend of Sapphire/Web causes it to equal WebObjects' performance at 1500 users, the limit described in the text of the article. Although there may be non-linearities, the text of the article indicates that the test was conducted up to 1500 users, and therefore Dyck should have graphed it. If a server crash occurred (a big non-linearity), then it should have been noted on the graph. This was not done, and the reason seems to be the poor showing Sapphire/Web displays above 1500 users. (see "Users Response Time" graph below)

PC Week's Suspect (Erroneous) Graph of Response Time & A Corrected-Extrapolated Graph

PC Week's Graph #2

Corrected Graph #2

Note how there is really very little difference between WebObjects' and Sapphire/Web's performance in the low ranges of users. Interestingly, WebObjects and Hahtsite seem to be more scalable than Sapphire/Web under the highest load factors, as they have the flattest, most consistent slopes. WebObjects' much higher page-serving ability at higher loads makes it the highest performing App Server; it serves the most pages when you need them the most in the real world, i.e., when there are the most users! This conclusion (WebObjects is the better server) is in direct contradiction to PC Week's, but in line with WebObjects' much higher scalability factor (20% higher than Sapphire/Web's; 20.5 vs. 17) buried in the article's text. Even 'laggard' Hahtsite's performance is extremely stable (linear), indicating a superior server to Sapphire/Web, which begins an exponential decay of service under increasing loads. (see below)

 

5. Curious number of users cutoff - The text on page 38 states, "Sapphire/Web also posted the best response times in the field and didn't pass the 3-second page limit until we had 1,500 users pounding on its Nile.com--half again as many as either WebObjects or Hahtsite could handle." Did the other servers completely fail right after 1,000 users and only Sapphire/Web kept handling the "pounding"? Why didn't graphs 1 & 2 show the continued "pounding" handling ability of Sapphire/Web and the complete failure of the other two by extending the X-axis to 1500 or 2000 tested users? I suspect the trend lines of Graph #1 continued until Sapphire/Web performance became less than WebObjects. By extrapolating the slopes in Graph #2 the following graph is obtained:

A Corrected & Extrapolated
"Users Response Time"

 

Note that the curves of WebObjects and Sapphire/Web in my graph of "Users Response Time" are much closer and that WebObjects actually surpasses Sapphire/Web after 1,500 users. I constructed this graph from the data points in the first graph and by using the formula I postulated above
[(# Users)/(pages/sec)]. I also assumed the data points in Graph #1 are correct.
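The extrapolation step can be sketched as a straight-line projection through the last two points of each curve. The numbers below are placeholders chosen only to show the mechanics (one declining curve, one flat curve); they are not the values read from PC Week's graph, and the helper names are hypothetical.

```python
def slope(p0, p1):
    """Slope of the line through two (users, pages/sec) points."""
    return (p1[1] - p0[1]) / (p1[0] - p0[0])


def extrapolate(p0, p1, x):
    """Linear extrapolation of the line through p0 and p1, evaluated at x."""
    return p1[1] + slope(p0, p1) * (x - p1[0])


def crossover(line_a, line_b):
    """Users value where two extrapolated lines meet, or None if parallel."""
    ma, mb = slope(*line_a), slope(*line_b)
    if ma == mb:
        return None
    (ax, ay), (bx, by) = line_a[1], line_b[1]
    return (by - ay + ma * ax - mb * bx) / (ma - mb)


# Placeholder data, NOT taken from the article: last two plotted points
# of a declining curve and of a flat curve, as (users, pages/sec).
declining = [(500, 700.0), (1000, 500.0)]
flat = [(500, 400.0), (1000, 400.0)]

x_cross = crossover(declining, flat)

if __name__ == "__main__":
    print(f"declining curve at 1500 users: {extrapolate(*declining, 1500):.0f} pages/sec")
    print(f"curves cross near {x_cross:.0f} users")
```

A straight-line projection like this says nothing about non-linearities past the last measured point, which is why the measured data beyond 1000 users should have been graphed.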

This corrected graph above makes more sense too, as the article points out that WebObjects has a higher "user-load scaling factor" than does Sapphire/Web (20.5 vs. 17) and an "overall scaling factor" that is just 1% lower (0.92 versus 0.93). Even 'poor' Hahtsite has just a 4% lower "overall scaling factor" of 0.89; trivial differences! Thus, it is hard to believe that WebObjects would score so poorly against a product when it has a much higher (20%) 'user-load scaling factor' and just a 1% lower 'overall scaling factor'.
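The percentages quoted above follow directly from the scaling factors printed in the article. A short check, with Sapphire/Web taken as the baseline (the function name is illustrative):

```python
def percent_diff(value, baseline):
    """Percentage by which value differs from baseline."""
    return (value - baseline) / baseline * 100


if __name__ == "__main__":
    # "User-load scaling factor": WebObjects 20.5 vs. Sapphire/Web 17.
    print(f"WebObjects user-load factor: {percent_diff(20.5, 17):+.1f}%")
    # "Overall scaling factor": WebObjects 0.92, Hahtsite 0.89, Sapphire/Web 0.93.
    print(f"WebObjects overall factor:   {percent_diff(0.92, 0.93):+.1f}%")
    print(f"Hahtsite overall factor:     {percent_diff(0.89, 0.93):+.1f}%")
```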

 

6. More suspect graphs - Graph #3 ("Scaling to meet heavy demand") is the final attempted nail in the coffins of the WebObjects and Hahtsite products. Building upon already suspect numbers, Dyck chooses an arbitrary cutoff: a 3-second limit on the time it takes to load a web page. Why 3 seconds? We are never told, but are led to believe it is a standard, widely accepted limit. Especially when combined with the previously mentioned suspect numbers from Graph #2, it has the benefit (for Sapphire/Web) of giving the following major differences in the key "maximum user carrying capacity" in Graph #3:


PC Week's Graph #3

Graph showing the
"Maximum User Carrying Capacity"

(depending upon what time is selected as the maximum permissible time for a web page to download)

(1, 2, & 3 seconds max. download permitted & using questionable numbers from graph #2)

The graph on the right was constructed by multiplying the number of users at the Response Time limits (1 second is 1000 milliseconds) in Graph #2 by the "user-load scaling factor" (pg. 38) for the Hahtsite ("9"), WebObjects ("20.5"), and Sapphire/Web ("17") servers. This number becomes the "maximum user carrying capacity". I do not agree with this calculation or conclusion, but am just trying to illustrate how the choice of time limit greatly affects the calculated factor.
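The calculation just described can be sketched as follows. The scaling factors are the ones quoted from page 38. The user counts at the 3-second limit are an inference from the sentence quoted in point 5 (1,500 users for Sapphire/Web, "half again as many" as the other two, hence 1,000 each); the counts at the 1- and 2-second limits would have to be read off Graph #2 and are deliberately left out. As stated above, this reproduces the calculation only to show how it works, not to endorse it.

```python
# "User-load scaling factor" values quoted from the article (pg. 38).
SCALING_FACTOR = {"Hahtsite": 9, "WebObjects": 20.5, "Sapphire/Web": 17}

# Users at the 3-second limit, inferred from the article's text
# (an assumption, not values measured from the graph).
USERS_AT_3_SECONDS = {"Hahtsite": 1000, "WebObjects": 1000, "Sapphire/Web": 1500}


def max_user_carrying_capacity(users_at_limit, scaling_factor):
    """Users at the response-time limit multiplied by the scaling factor."""
    return users_at_limit * scaling_factor


capacity = {
    server: max_user_carrying_capacity(USERS_AT_3_SECONDS[server], factor)
    for server, factor in SCALING_FACTOR.items()
}

if __name__ == "__main__":
    for server, value in capacity.items():
        print(f"{server}: {value:,.0f}")
```

Because the user count at the limit is a multiplier, a different choice of time limit changes every server's result, and it changes them by different amounts whenever the response time curves have different shapes.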

PC Week gave only the values for the "Maximum User Carrying Capacity" at a 3 second time limit for a web page download. Why a 3 second limit is chosen is never explained, but it is clear from the graph on the right ("Graph Scaling Factor Follies") that Sapphire shines best at that time limit.

 

Marketing Hype as Reporter's Prose:

7. Cheer-Leading - PC Week is supposed to be an independent evaluator and reporter of PC computing technologies. This role is questionable when marketing terms are used to describe performance. For example, Graph #1 is labeled, "Sapphire/Web cranks out the pages". Could Sapphire's marketing have done any better? Obviously not, as they use that graph on their main corporate web page (www.bluestone.com). Or how about, "Sapphire/Web also posted the best response times in the field and didn't pass the 3-second page limit until we had 1,500 users pounding on its Nile.com--half again as many as either WebObjects or Hahtsite could handle." This is another pure marketing phrase, and as detailed in point 5, no explanation is given for the non-favored WebObjects' or Hahtsite's performance under similar "pounding".

Consumer Reports magazine does not allow any advertising, so as to prevent a financial motive, or even the suggestion of one, from coloring its reporting. I am not suggesting PC Week give up advertisements, but in clearly biased cases such as this article, there at least needs to be caution in the writing (e.g., explain the limitations of the test) and accuracy in the data and its presentation. In both regards, PC Week is lacking and looks like marketing flak to this new reader of your magazine.

 

In Conclusion

My first issue of PC Week was not an auspicious one. In fact, I am greatly disappointed with it. Although I receive it for free as a perk of my position, it is not worth my time to read if it routinely contains articles that are more marketing than reporting. Most of us are bombarded with marketing, but I suppose it is mainly the weak-minded who succumb to marketing disguised as reporting. Please cancel my PC Week subscription if marketing is your primary product.

 

Sincerely,

 


Bruce Hoglund
Systems Analyst

 

NOTE: On all the calculations and derived graphs above, there may be minor errors due to inaccuracies in reading the data from the various graphs (which is how I got my data). These errors are most likely minor, but if there are significant errors, please contact me (Bruce Hoglund <bhoglund@earthlink.net>) with your comments and suggestions.


 

Proof of Slope of "Response Time Graph" when page/sec (page serving rate) is constant

 

Given:

#P2/sec = #P1/sec

The slope of a line (m) is:

m = (Y2 - Y1)/(X2 - X1)

In this case:

m = (Y2 - Y1)/(X2 - X1) = (Rt2 - Rt1)/(#U2 - #U1)

Since: Rtx = #Ux/(#Px/sec)

m = [#U2/(#P2/sec) - #U1/(#P1/sec)]/(#U2 - #U1)

Since: #P2/sec = #P1/sec, or just #P/sec (a constant)

m = [(#U2 - #U1)/(#P/sec)]/(#U2 - #U1)

m = 1/(#P/sec)

 

Where:

Variable symbol    Meaning

m                  slope
Rtx                response time for a # of users (y axis)
#Px/sec            number of pages per second served for a # of users
#Ux                number of users (x axis)
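The result of the proof can be checked numerically. The sketch below assumes the response time formula postulated in point 3 and a hypothetical constant rate of 400 pages/sec; for any two user counts, the slope should come out as 1 divided by that rate.

```python
def response_time_s(users, pages_per_sec):
    """Postulated response time: (# Users) / (pages/sec)."""
    return users / pages_per_sec


def response_slope(users_1, users_2, pages_per_sec):
    """Slope of response time vs. users when pages/sec is held constant."""
    rt_1 = response_time_s(users_1, pages_per_sec)
    rt_2 = response_time_s(users_2, pages_per_sec)
    return (rt_2 - rt_1) / (users_2 - users_1)


PAGES_PER_SEC = 400.0  # hypothetical constant page-serving rate

if __name__ == "__main__":
    for u1, u2 in [(100, 200), (500, 1000), (1000, 1500)]:
        print(u1, u2, response_slope(u1, u2, PAGES_PER_SEC), 1 / PAGES_PER_SEC)
```

A server whose page output stays flat should therefore show a straight response time line, with a steeper line for a lower page-serving rate.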

 Last Modified, 12 Apr 1999.

Copyright 1999, by Bruce N. Hoglund