Path: newsspool2.news.pas.earthlink.net!stamper.news.pas.earthlink.net!stamper.news.atl.earthlink.net!elnk-atl-nf1!newsfeed.earthlink.net!in.100proofnews.com!in.100proofnews.com!newsfeed.cs.utexas.edu!news.cs.utexas.edu!not-for-mail Message-ID: From: djimenez@cs.utexas.edu (Daniel A. Jimenez) Newsgroups: sci.math Subject: Re: Second Call For Primecounter Contest Entries Date: 9 Sep 2003 14:48:13 -0500 References: Lines: 112 Organization: CS Dept, University of Texas at Austin Sender: djimenez@zorkmid.cs.utexas.edu NNTP-Posting-Host: zorkmid.cs.utexas.edu X-Trace: news.cs.utexas.edu 1063136894 20045 128.83.120.248 (9 Sep 2003 19:48:14 GMT) X-Complaints-To: usenet@cs.utexas.edu NNTP-Posting-Date: Tue, 9 Sep 2003 19:48:14 +0000 (UTC) X-Received-Date: Tue, 09 Sep 2003 12:48:17 PDT (newsspool2.news.pas.earthlink.net) Xref: lexi2.athghost7038suus.net sci.math:36798 In article , Stan Gula wrote: >"Daniel A. Jimenez" wrote in message >news:bjl2dp$l9d$1@zorkmid.cs.utexas.edu... > >The JSH program variants produce the same answers through 4*10^9. So your >answer for 10^10 is way off. Hmm. > >Thanks for the info. I think we've won the contest. This was a lot of >fun. Yes, I've been having fun, too. I've got one more modification (sorry, I won't promise to stop, but I'll try :-) Notice that the loop in S2 only needs to iterate over the primes. I peeled off the first iteration of the loop for i=2, then propagated a bunch of constants to simplfy things. I moved the code inside the loop to a function called "compute_sum," then had the inner loop iterate over values of i of the form 6*k+1 and 6*k+5 where k is an integer; all primes except for 2 and 3 take this form. The result is 30% faster for large values on a Pentium 4 and about 7% (there's a lot of noise in the measurement) on an Athlon. Here's the code: #include #include #include static unsigned S2(unsigned x, unsigned yin); #define cache_size 65536 int f_cache[cache_size]; static unsigned JSHpi1(unsigned xin) { // cache most common answers if (xin yin) return sum; int tmp2 = x / 2; if (2 < tmp2) a3 = tmp2 - S2 (tmp2, 2) - 1; else a3 = JSHpi1 (tmp2); sum += a3; if (9 <= yin) sum += compute_sum (3, x); if (25 <= yin) sum += compute_sum (5, x); for (i=6; ; i+=6) { j = i + 1; if (j*j > yin) return sum; sum += compute_sum (j, x); j = i + 5; if (j*j > yin) return sum; sum += compute_sum (j, x); } } int main(int argc,char *argv[]) { unsigned p; clock_t start,end; // cache elements are marked as uninitialized for (int i=0; i0 on the command line"<