The Implicate Order Within Randomness

S.Roof, May-June 2003

>?<

"The question is not how can we know the un-knowable. The question, rather, is how can we structure our expectations?" ...SAR

"The longer you observe the stock market, the more apt you are to make a big mistake!" ...SAR

[ ABSTRACT: When we observe randomness, what we see is conditioned by the finite constraints of our point of view or measuring apparatus. This paper will show that the constraint of finite sequences creates an implicit order in a random sequence that can be measured. The implicit organizing influence of constraints is symmetric, scale invariant, unbiased, cohesive, parsimonius and relative. This order amidst randomness is implicit in our founding axiom of uncertainty. It cannot be extracted from randomness, but it can serve as a prime metric against which the degree of randomness can be measured. This gives us an objective definition of randomness. Seen in this light, randomness can serve as the fundamental lynchpin of Probability Theory. CAVEAT: These ideas deal with arbitrary strings of bits. Any extensions beyond the mathematical realm of binary strings is to be considered speculation but still worth looking at. While this paper claims no discoveries that cannot be found elsewhere, it nonetheless provides an alternative and perhaps illuminating approach to conventional theories. ]

Introduction to the Axiom of Uncertainty

Each of us sits at the nexus of past and furture uncertainties. The past is a branching tree of possible causes leading up to the present. Ahead of us lies another branching tree of possibilities, the future. The further we look in either direction, the more uncertain the view becomes. The nexus of 'now' is born of chance and choice. Every chance event or act of choosing can potentially spawn a multitude of other events, but it in itself is only one of a possible number of events. Events therefore expand exponentially.

We deal with this enormous complexity of large uncertainties by using symbols. Information Theory describes the use of symbols at the most elementary level of expression. Binary information or the 'bit' is the epistemological foundation of Information Theory. It symbolizes the prime duality between that which is and that which is-not (1/0). One bit of information describes the smallest uncertainty one can have about an outcome without knowing the result. One bit represents 2 possiblities i.e. 1/0.

Information, by definition, compresses uncertainties. Shannon defined the reduction in uncertainty from before an event to after as the negative log base2 of the probability of that event happening. A more simple formulation is that information measure is just the log2 of the number of possibilities. Possiblities are exponential and daunting in magnitude. Logarithms reduce exponentials to linear progressions.

Information thus is the compressive necking down of the nexus of now. This compression of non-linear explosive branching uncertainty down into compressed linear sequence is perhaps the foundational template of semiotics. It may be the very thing that enables us to experience time as a linear sequence of events. Our very thoughts, at their most succint expression, reduce to linear strings of language symbols. If >?< represents the nexus of uncertainty, we can now replace it with >(1/0)<. Information is the mechanism for transforming chance into choice.

But what is randomness? Conventionally we think of randomness as unpredictable, chaotic, having no pattern or structure. This paper, however, will demonstrate that randomness is nothing more, nothing less, than the assumption of maximum uncertainty. Starting with this primitive definition, one can show that randomness does indeed have deep structure, an underlying order with profound consequences to our ability to cognize reality. The Concept of Randomness turns out to be the very scaffolding around which we organize and structure information.

Fundamental Axiom of Uncertainty: The minimal amount of randomness is defined as one bit of maximal uncertainty.

This paper will show that randomness is symmetric, scale invariant, indifferent to bias, cohesive, parsimonius and relative. This order amidst randomness is implicit in our founding axiom of uncertainty. It cannot be extracted from randomness, but it can serve as a prime metric against which order can be measured. We shall see that it is possible to have an objective measure of what we mean by order.

Symmetry

Our definition of randomness is the assumption of maximum uncertainty. The duality of 1/0 is the cornerstone of symmetry and the fundamental limit of uncertainty. One bit equals one bit of uncertainty about two possiblities 1 or 0. When we access a bit from a random string we face maximum uncertainty about its state. The assumption of randomness means we have no reason to believe either possibility is favored. The more possibilities there are, the greater the uncertainty about any observation or outcome. When considering strings of bits, we shall see that maximum uncertainty peaks at a central point of balance.

Statisticians and Probability Theorists always introduce probability at this point with the idea of Equal Likelihood or The Principle of Indifference. I believe, however, that one can avoid some of the epistemic difficulties by not introducing probability. Let's look at an example of what I mean. Consider one bit b from a random stream.

The uncertainty before we look at b is unity. H(b) = 1.

The uncertainty after we know its value = H(b) = 0.

The information gained by the observation is the reduction in uncertainty.

Hbefore - Hafter = 1 - 0 = 1.

The above seems trivial but it is fundamental and should always be kept in mind. Hafter is not always zero, so it should not be forgotten. In this paper, though, we will be dealing only with bit strings and the assumption that accessing a bit is always 100% certain and successful.

Now Shannon defined H = -log2( Prob(b) ). He used the probability of b as the basis for calculating H. Theorists use all sorts of ad hoc justifications for saying that Prob(b) for one bit being 'on' or 'off' is 1/2. It is the probability used in the proverbial 'coin toss' experiment. But we can dispense with probability and use a simpler expression: H = log2( Poss(b) ) where Poss(b) is the number of possible states or arrangements of b.

There are two possibilities (1/0), so H = log2(2) = 1. The probability can be deduced from this by inventing another term, the idea of relative frequency. This is the cornerstone of Statistical Probability. The statistical probability of b is the limit of Freq(b)/Poss(b) as the number of cases becomes very large. We can thus deduce a probability of b as 1/2 because there is only one case here (frequency of one) and two possibilities.

It all works out because H = -log2(1/2) = -log2(2^-1) = (-1)(-log2(2)) = 1

That's why Shannon used the negative log in his formulation. He included probability from the get go but theorist today still struggle with the issue that statistical relative frequency involves an assumption about limits and trends which seems to imply a natural force directing the probabilities of events happening. We shall see that this is really only an epistemic 'red-herring' that goes away if we delay the introduction of that terminology until we have established some conclusion to justify it first. In the process of doing this we will discover symmetry.

When exploring all the possible binary strings that are a given number of bits long, it is instructive to think of them as a decision tree. Each bit is a choice of which branch (left or right) to take through the tree. All the possible combinations of bits together make up the total tree with a depth of branching equal to b. A binary tree, after only 8 levels of branching, has 2^8 = 256 branches. We can list the successive levels as having 2, 4, 8, 16, 32, 64, 128, and 256 branches. The number of possible bitstrings b-bits long is given by 2^b.

Exponential branching is also combinatorial. Binary numbers can be subdivided into combinatorial classes i.e 2^8 = {Sum over i of 8Ci } = { 8C0 + 8C1 + 8C2 + 8C3 + 8C4 + 8C5 + 8C6 +8C7 + 8C8 } = { 1+ 8 + 28 + 56 + 70 + 56 + 28 + 8 + 1 } = 256.

The combinatorial formula nCr = n!/r!(n-r)! was used to to calculate the number of binary numbers in each subset of the 256 possible binary numbers 8bits long above. For example 8C1 = 8 can be shown to represent numbers with one bit set to 'on'. There are 8 possible such numbers 8 bits long: 0000 0001, 00000010, 00001000, 00010000, 00100000, 01000000, and 10000000. 8C2 represents 28 numbers with 2bits set 'on'. etc.

The combinatorial formula nCr = n!/r!(n-r)! is the template for elucidating the foundations of symmetry. When we start with the simplest case and then map out all the others, we can see that symmetry is embedded as the prime point of balance. Below is a graph of all the possibly sub-classes of possibilities for any bit string that is b-bits long. It has been normalized to what I call BMD or bitmass density. BMD is just the number of 'on' bits divided by the total number of bits b. The y-axis is normalized relative to b also. We will take the information measure of nCr by using the log to base2. This gives an informational measure of the uncertainty for any particular combination of possibilities given by nCr. The graphs below represent a series of plots for different lengths of possible binary strings. We could easily list out each and every possible combination or string of bits but the exponential explosion makes this a daunting task. That is why we use the compressive ability of logarithms to tame the exponential beast.

 

The first thing to notice is that the curves have a peak at BMD = 1/2. This is the central balance point between 1/0 or heads vs tails in a coin toss experiment after 'b' tosses. This symmetry is present in ALL combinatorial sub-classes for any possible binary string. Notice also the the curves tend very quickly to a limit curve. I call this the KBIT limit curve where k is just an arbitrary variable that stands for the log of possible combinations. The KBIT limit curve establishes symmetry as the cornerstone of randomness. Astute readers will recognize it as the logarithmic form of the Binomial Distribution.

Recall that randomness is the assumption of maximum uncertainty. The curve above shows that uncertainty is maximum at the peak where BMD = 0.5. Recall that uncertainty H = log2(Poss(b)) and Poss(b) = bCm for any given number of 'on' bits i.e. m. What this is telling us is that, regardless of the length of any binary string, the number of variations or possible combinations of bits is at a maximum when the number of ones is equal to the number of zeros. The uncertainty is thus greatest when the two are in balance. Since we have already defined randomness as the assumption of maximum uncertainty, the only conclusion is that randomness is symmetric and balanced at 1/2.

Either side of the peak of the curve above represents a deviation from symmetry. The possibility classes to the left are for binary combinations with more zeros than ones. The possibility classes to the right are for binary combinations with more ones than zeros. For each and every binary string on the right there is a complimentary variation on the left which exactly mirrors it with a zero everywhere there was a one. The maximum possibilities lie in the center. Therefore the maximum uncertainty lies in the center. Asymmetry implies redundancy or excess. The extreme on either side is all ones or all zeros. That is a case of complete redundancy and allows of only ONE possibility. The calculated H for a possibility of one gives H(1) = 0. Deviation from randomness thus is a deviation from the point of balance. Randomness is at the heart of yin/yang.

We can empirically take any binary string of bits and count the 'ones' to get a value for the KBIT uncertainty of that string. K = log2(bCm) where b is the length or number of bits and m is the number of 'ones' in the string. When we do this for arbitrarily chosen files we discover something remarkable. Almost all files cluster near the peak of the kbit limit curve. Now most of the files on any computer are most decidably not random so what gives? A further plot will help.

 

We have plotted log2(bCb/2)/b vs b here because b/2 for any given b is the point of maximum uncertainty or greatest number of possibilities or combinations. We easily see that as the number of bits b increases, the relative uncertainty (relative KBITS) very quickly approaches 1.0. It is asymptotic and gets very close VERY quickly. We only plotted up to 50 bits.

Note that the KBITS plotted is relative to b, so what we are depicting is the ratio of binary information contained in the peak combination class bCb/2 versus the total amount of information in the string itself. This shows that most of the possibilities for any string of length b fall into this class, and that the ratio gets much closer to unity as size increases. If one divides the possible arrangements into separate sets or combination classes, he can see that the central peak class represents only 1/(b+1) of the total number of bCm classes, yet that one class at the center contains most of the possible arrangements.

Recall that bCb/2 is the combination class in which the number of 'ones' equals the number of 'zeros'. The above facts thus demonstrate that the overwhelming tendency for long bit strings is to have an equal share of ones and zeros. Symmetry is inevitable in large numbers. Asymmetries form just a small percentage of variations from the norm. One could construe this to mean that randomness is inevitable, but dont forget that the most central balanced variation of all is simply 010101010101010........ which is most decidably not random. It is perfectly symmetrical periodicity. Such a string could easily be compressed down to the pattern '01' and a number giving the repitions of '01'.

The final conclusion we must face is that the 50/50 balance between heads and tails is an inevitable fact of mathematics. We have not used probability or arguments about the balance or fairness of a coin toss. We have simply shown that as a string of bits gets progressively longer, the number of possible combinations that are symmetrically balanced between heads and tails immensely out number the variations that aren't balanced.

Since we have defined randomness as an assumption of maximum uncertainty, we now have a good reason to say that randomness is most likely to produce binary strings at the center of the kbit limit curve. We can also say that the relative frequency of heads/tails surely approaches a probability equal to 1/2. We now have a theoretical justification for forming a theory of probability. In fact, we have just demonstrated Bernoulli's Golden Theorem of Probability also called the Law of Large Numbers. Just remember that all of this comes from the implicit mathematical order buried in the Fundamental Axiom of Uncertainty. No assumptions about nature or equal likelihood have been made. Rather, those ideas emerge from the implicit structure to begin with.

When faced with any unknown situation (binary stream) we can now make at least some predicitons about how things turn out even if we know nothing! If we prove to be wrong, then we can surely say that some bias or odering influence producing an asymmetry was in effect. The point of all this is that randomness is our best guess assumption in the face of the unknown.

Randomness provides the unbiased benchmark against which we can compare results as new information is gained about a source. The assumption of randomness is the hallmark of symmetry. Epistemologically, randomness is the foundation on which all information is based.

Scale Invariance

To discuss scale invariance we need to derive some relationships about how patterns can occur amidst randomness. This is where we get to the real 'meat' of this paper.

First though we need to point out that all the discussion above was framed in the context of a certain amount of knowledge. We assume we are talking about binary strings and that a bit can only be a '1' or '0'. We have also been using 'b' as the length or number of bits. This of course may not be known ahead of time. The point is that any thing we do always has a context. This context establishes the constraints of the discussion.

We can always count the number of bits as they come in and establish the total number. This count produces a constraint on the possible combinations we can encounter along the way! We are saying here that a finite binary string is an entirely different beast than an infinite string! Theorists who willfully bandy infinities about are treading dangerous waters, but amazingly most writers and thinkers use infinity implictly all the time. Any time you see a matematician use a continuous function or variable across the real number domain, they are using an assumption of infinity! It is my contention that this is where huge errors in thinking are easily made. It is also the entry point for all sorts of horrid parodox.

[ SIDEBAR: I am grappling with this myself because Information Theory uses the log2() function willy nilly. When you take log2(0) the thing blows up in your face! It is undefined for zero. This means one has to be very careful in his formulations where the log2(0) might possibly occur. It represents the case of uncertainty for no bits at all! ]

Having said all this, I re-emphasize that constraints form the finite box in which any discussion occurs. These constraints have a profound influence on the results we get. To get a flavor of where we are headed, just remember - randomness is an assumption of maximum uncertainty within the constraints of what is known. Knowing the number of bits 'b' gives just such a constraint.

Now, back to a bit stream that we want to sample. We are going to assume maximum uncertainty about its contents i.e. that it is random. We will assume also that someone has given us the total number of bits, or we can just arbitraily stop our experiment after a pre-decided number of accesses. Given these constraints, can we apriori make some deductions about the patterns we might encounter? It turns out that yes we can! Amazingly there is more implicit order hidden within randomness. This order is there for free. It's built into the Axiom of Uncertainty.

First we need to discuss what I mean by pattern. Simply put, call a pattern 'p' any contiguous p-bits long subset of our binary string b. I will be calling patterns by the name p but it is to be understood that p is merely the number of bits in that particular pattern.

What we will be looking for is patterns within the random string we are accessing. I say 'accessing' because of the obvious computer metaphor, but one can use the words observation, trial, experiment etc. depending on his particular education or orientation. We want to discover if there are any regularities about how patterns appear in a random string. Now the intuitive feeling about randomness is that any and all patterns can appear and that there is thus no underlying structure to randomness. Kolmogorov-Chaitin Theory takes a strong position and says there is absolutely no mathematical structure to randomness. I take an alternative view, and demonstrate that there is implicit structure even if it is only statistical or probalistic.

This is not to say that Chaitin is wrong. A random string is still uncompressible, because it already represents maximal uncertainty and conveys maximal information as the totality is observed. The total uncertainty H reduces to zero as the full contents of the string are known, and H is simply equal to b for a b-bits long string.

So let's start with maximum uncertainty.

A pattern p that is one bit long has H = 1 i.e. 1 bit of uncertainty. The uncertainty for a pattern p-bits long is H = p

Now H as defined by Shannon is H = -log2( Prob(p) ) where Prob(p) is the probability of that p-bit pattern occuring within our bit string b. As before, though, I wish to avoid the use of probability terms so I am going with the shorter version:

H = log2( Poss(p) ) where Poss(p) is the number of possible ways a p-bit pattern can occur in the b-bit string we talking about. We will now assume we choose any particular pattern p and scan the entire string for occurrences of that particular pattern. Poss(p) will refer not to all possible patterns p-bits long, but to all possible occurrences of that one particular pattern p-bits long. We really dont care at this point what that pattern is. We only need to know its length. Just keep in mind though that we are talking about one specific pattern as specified i.e. a given. This distinction is important to understanding the argument that follows.

Now any given p-bit pattern that occurs within our string uses up p bits of space so there is less room for that pattern to occur again. Given the total space of b bits we can readily see that the number of ways or places any given p can occur is given by the expression Poss(p) = b - p + 1. This is because for every bit we access in the string, there is the potential or possibility that the pattern p will be starting. We may get to the last bit of our pattern p and discover that there was a mismatch so we start looking for p all over again but we must go back to one bit past where we started. There is potential for patterns to overlap.

Thus we see that every bit all the way up to the last (p-1) bits is a possible location for pattern p to occur. If p is only one bit long ( say it is a '1' for instance), there are (b - 1 + 1) = b possible places p could occur. If p = b then Poss(p) = b - b + 1 = only one way p can occur.

What we have done is establish the possiblities of a specific pattern p occuring at least once. What if p occurs more than once though? How does that affect the possiblities? One can see that the more occurences there are, the less the uncertainty. If the string is completely full of p we can be sure all possible available slots for p have been taken up and the uncertainty must reduce to zero. What expression tells us about the possibilites for intermediate cases?

We now introduce a quantity Freq(p) as the number of occurences of p. This includes any overlapping cases as well. The start and end of '11011' for example overlaps, so the last two bits of that 4-bit pattern could be starting another occurence of that pattern.

Since we have Poss(p) = b - p + 1 for one occurence of p, there are b-p+1 bits of possibility space or room left to put other occurences of p in. We thus see that if we divide (b-p+1) by Freq(p) we get the number of possiblity slots available to each individual occurence of p. Freq(p) is just the total number of occurences.

Therefore Poss(p) = (b-p+1)/Freq(p)

We can insert this into the H = log2( Poss(p) ) equation and formulate H for Freq(p) number of occurences of any specific pattern p-bits long.

H = log2( (b-p+1)/Freq(p) )

but H for any p has already seen to be H = p for any p-bit pattern!

Therefore log2( (b-p+1)/Freq(p) ) = p

The reader may balk at suddenly equating the two different formulations of H, but it is plausible. Our expectations tell us that the uncertainty of pattern p should not change just because we are looking at it in a longer stream. All we are doing is getting an idea of how many times we expect it to occur given that very uncertainty 'p' operating over and over again as we scan the bits in.

Now we can solve for the very thing we wanted to know - How many times can we expect any specific pattern p to occur in a random string b bits long?

rearranging we get p = log2( (b-p+1)/Freq(p) ) = log2(b-p+1) - log2(Freq(p))

:. log2(Freq(p)) = log2(b-p+1) - p

simplifying we have :

log2(Freq(p)) = log2( (b-p+1)/2^p ) =

Freq(p) = (b-p+1)/2^p

This establishes an amazing fact about randomness. The implicit frequency of occurence on any particular pattern is dependent ONLY on the total number of bits and the length of the pattern. All patterns of a given length are created equal in the assumption of randomness. It is important to state that Freq(p) will only make sense in the light of having chosen a particular specific pattern to look for though. Once, chosen, our best expectation given the Axiom of Uncertainty is that it will likely occur Freq(p) = (b-p+1)/2^p times. If it doesn't then the string deviates from pure randomness.

Notice that the formulation is exponential and decreases as the length of our pattern p increases. This concords with our observation that randomness is symetric and the ratio of ones to zeros approaches 1/2 as b increases. Our expectation is that any long runs of ones or zeros should occur with an exponentially decreasing frequency. This indeed turns out to be the case. I will demonstrate this with some empirical data from real-world files taken arbitrarily on my computer. It will be convenient to use the log2(Freq) form of the equation as it begets linearity and we can most easily digest linear information.

Before we look at the results, let's take notice of the finer points of the equation. For very large b compared to p we see that the expression log2(b-p+1) -> log2(b). Notice that the slope of the line is -1 in this case. Actually the expression is not perfectly linear but for most cases, b really is very much larger than p and 1 so the linearity is very good. Now we can solve the linear equation for the x and y intercepts where p =0 and where log2(Freq) = 0. The intercept in both cases since the slope is -1 turns out to be very close to the number log2(b). Thus we see that the total length of the string b is the defining constraint or bounding box in which any random string must portion its frequency distribution.

Now we are ready to look at a real string of bits taken from a file. Below is a log frequency histogram of an arbitrarily selected text file (73KB) and its compressed zip file. I wrote an algorithm to scan the binary string for each file looking for all occurences of a particular pattern including cases that overlap and plotted the log frequency for each pattern having a different length p. For my patterns I used different methods. I tried just plain strings of zeros or strings of ones. I even tried randomly generated test patterns. The results were always similar with some caveats which I will discuss toward the end of this paper.

The white line with negative slope forming the bounding triangle in the above graphs is the plot of our theoretical equation for expected frequencies given by log2(Freq(p)) = log2(b-p+1) - p. We see that for the compressed file, there is excellent agreement of the empirical histogram with the theoretical line. The file before compression has obvious discrepancies. The compressed file appears to be very close to random when judged by the criteria we have theorized.

Next is an example of an image bitmap file (551KB):

Below are two more examples. The red is an executable program binary file. The white graph is for a file with randomly generated bits.

Note that the exe type binary file (red) has rather large deviations at the longer pattern lengths. Its compressed version, however, maps straight to theory. The random generated file (white) shows that you basically cant compress random files because they are already random!

Every compressed or random generated file I have tested always fits very neatly into the bounding triangle of log2(Freq(p)) = log2(b-p+1) - p.

Notice that the only parameter that determines the scale of this distribution is log2(b) i.e. the length of the bit string. The distribution is scale invariant. Cut a random file in half or double it and you get essentially the same distribution. This means that randomness is a kind of fractal.

Notice also that deviations of non random files from the random distribution invariably are greater at the right bottom corner with longer pattern lengths. Even random and compressed files deviate here a bit. Only one or two or a few long strings of ones or zeros is the essential culprit here. The math is telling us that at the short pattern side, ones and zeros inevitably balance even for non-random files. It's the big clumps of pattern that make or break symmetry and the odds are small for their occurence in some finite string length b.

We now have a way to graphically look at a file and very easily spot whether it is random. Of course, it is usually (not always though) easy for us to do this with just about any file anyway if it is displayed appropriately. Our eyes are built to detect patterns. We even see patterns in randomness but they seem to shift and disappear before our gaze. What we frequently notice are large clumps of black or white (0/1) that are an inevitable part of randomness. If they are too big they become a major deviation from randomness. This is where scale comes in. It occured to me while watching water gush out a downspout during a rain that many natural phenomena appear random at one scale and orderly at another.

Large clumps of variance from symmetry seem to imply order or some organizing force. If these clumps are viewed at their own scale, the context of log2(b) suggests that there is no randomness. At a larger scale, though, these large clumps occur with at least some frequency under the bounding triangle of randomness. They then appear just part of the random variations. Given a large enough b-bits long string, any finite clump is part of randomness, but at its own scale 'p' it is decidably not random. We thus must acknowledge that the appearance of randomness depends on our scale of view, but randomness itself is self-affine. Randomness is a function of the contextual parameters of observation. It is not an essential property of outside events themselves. With a large enough scale, just about anything can appear random.

 

Unbiased

This part is easy. We saw that log2(Freq(p)) = log2(b-p+1) - p means that the expected frequency of a pattern depends only on b and its length p. What do we get if we keep p fixed in size? The equation tells us that the distribution for all patterns of the same size should be equal in a random string. We should get a flat distributuion. So far we have been discussing only patterns with no regard to binary number values i.e their magnitudes. The magnitude of a binary number can have 2^p different values i.e. an 8-bit pattern of ones and zeros can have 256 different values or magnitudes from 0 through 255. Taking 8 bits as our base pattern I plotted the distribution for a few files:

The first file is text, the second is an image.

As you can see the files are not random and there is obvious deviation from a flat distribution for the image file. Next is 8-bit value distributions for the same files compressed:

The distributions have flattened out somewhat. There is still a lot of variance though, and this suggests that the value oriented method is not a good test for randomness. My formula for expected frequencies does give support to the idea that distribution of values approaches flatness for random strings. My explanation for the large variance of this method is to observe that 8-bits is already a large pattern size so variance is to be expected because the expected frequency of any one value for a p=8 pattern is small. When observing random variables with even greater precision (more significant digits), this fact would be good to keep in mind.

The net result of this section is that randomness is unbiased with regard to value or magnitude. Only the length of the pattern impresses it. Large magnitudes represent large bit lengths so their frequency will be more likely to deviate from theory. Randomness is definitely biased when it comes to pattern length, though, so that is the appropriate quantitiy to use when computing an objective measure.

To repeat, randomness is unbiased with regard to value or magnitude. When pattern size is the same, randomness does not prefer one pattern over another.

 

Cohesion About Means

I had at this point wanted to introduce a derivation for the Central Limit Theorem which produces the so-called Normal Distribution. I have as yet not been able to give a complete information theoretic account of this idea from the perspective of this paper. I have a partial solution, though, and results to demonstrate the concept. The normal distribution involves real world measurements and floating point variables. The essential idea is that some random variable distributions are the result of a large number of independant contributions from different sources. The net result tends to cluster around a mean value and has a distribution curve shaped like a bell.

First we need to dismiss real-world style floating point numbers as being exceptional since we are dealing with binary strings in this paper. Any physical measurement always has a finite precision. This fact is of supreme importance and is often overlooked. We are accustomed to using symbols for variables with an implicit infinite precision. This never happens in the real world. Observations in the physical world operate within finite constraints. Given this fact, we can assume a random variable 'x' will have a certain number of bits or digits of precision. The decimal point is easily removed by simply multiplying all observations or trials by a fixed number equal to ten to the power of the number of significant fractional digits. Then we can convert to pure binary digits. Computers are doing this all the time with real world data, so this is nothing new.

Ok, having established x as a binary number with a fixed number of digits, we assume that it is a random variable that is the sum of some number of independant contributions. This is the crux of how a normal distribution comes about. We will first do an elementary approximation to the concept by performing the classic 'coin toss' experiment. Then we will move on to the more generalized case.

Take 'c' coins and toss them. Then count the number of heads. Let our random variable be x = number of heads. In other words, x = the sum of the contribution from each coin which can be either 0 or 1. We assume the coins are fair and independant of each other's actions. Now we can construct a binary number C which has c bits, each bit of which is the result for one of the coins in the toss.

Now do the toss several times, each time counting the heads. Let 'T' be the number of trials. We can now write the entire experiment as a binary string which has b = T*c bits. The string simply consists of T repititions of the binary number C. At any point we can take the bits of any C and add them up to get our random variable x. What we have done here is introduce a coding scheme. The value of x is not the binary value of C. It is merely the sum of the bits. The maximum value of x is thus c, whereas normally it's value could range to 2^c .

Given the above scenario, we need to figure out the expected frequency of occurence of any given x, if conditions of randomness exist. We know from our discussions previously that the frequency distribution of any fixed pattern size is flat for a random string. Therefore we expect any particular toss 'C' to occur with equal frequency. We note, however, that x being the sum of C's digits may fall into a particular class mentioned under symmetry. The number x may be classed as one of arrangements of the combinatorial class cCx i.e. the combination of c things taken x at a time. This fact puts constraints on the possible arrangements of C's bits that have the value x. There are cCx ways that the bits of C can add up to x.

Using this fact we can construct a formula for frequency of x using the same technique that we previously used for random strings.

Our original formulation for a random string was Poss(p) = (b-p+1)/Freq(p). Here, though, the possible ways x can happen is different. We are not allowing overlapping patterns. Each C is a distinct separate trial. cCx gives the number of ways c can have x as its sum. Since there are T instances of C, there are T*(cCx) ways x can happen in the whole string experiment.

Poss(x) = (T*(cCx))/Freq(x)

and H = log2(Poss(x)) = log2((T*(cCx))/Freq(x)) = c

solving for Freq(x) as before we get:

log2(Freq(x)) = log2(T) + log2(cCx) - c

and

Freq(x) = (T*cCx)/2^ c

There we have it! As before, I had to write an algorithm to load a file and test it's bits using the coding scheme devised in this experiment. The file bits thus form one long coin toss experiment. Below is one result on a file before and after compression. The file was a 443KB exe file. The number of coins was arbitrarily set at 20.

The two distributions are for the file before compression and after. The red histograms are log frequency and the white horizontal dashes are the theoretical values using the formula above. As you can see, the agreement between empirical experiment and theoretical is almost exact for the compressed file. Since we are using log2 of frequency, the shape is slightly different from the normal bell curve, but the bell is umistakeable. Below is the more common exponential form for the same file. The coin toss experiment follows the so-called Normal Distribution.

 

Now let's pursue a more general derivation. It would be rare indeed for random variables in the real world to result from independent one-bit contributions. The more general case is that a number of small contributions involving a larger precision than one bit will be in effect. Perhaps we are studying growth curves of the general population. We know that there are several additive factors: genetics, growth hormones, diet, stress, toxic exposure to lead etc etc. Each contribution from these influences add a incremental deviation from the mean height expected for an individual. The value of improved diet might only amount to say a 0.37231 inch difference or some other number. The individual height though is supposedly a result of all these additive factors.

In general we might represent our random variable x by some sum of binary numbers such as:

x = ### + ## + #### + # + ## + ##### where each # is a bit of an additive influence. We are showing here that the each additive influence might have a different number of bits of precision needed to convey its magnitude. Instead of each influence contributing 1 or 0, each contribution now has a possible range of values from 0 up to 2^number of bits in that contribution.

Our original expression cCx number of ways x can happen is now much more complicated. The combinatorial expression now becomes a deep problem in the Partition Theory branch of Combinatorics math. The fundamental question of Partition Theory is how many ways can a number N be partitioned into numbers of which it is the sum. The number 5, for example, can be partitioned into 1+1+1+1+1, 1+1+1+2, 1+2+2 , 2+3, 1+1+3, and 1+4. If our variable x = 5, we see that it can be the sum of many possible contributions, some of which could be zero.

The order of the terms is important also so this adds another increase in complexity to the partition formula. In addition we might not be considering all possible partitions if the number of contributions to x is less than the possible number of partition terms. Contributions of zero have to be included also. The net result of this is that I probably would not be able to do an exact derivation from scratch.

Research on the web revealed:

the mathematicians Hardy and Ramanujan did a lot of work with partitions around 1918, and found that the log of partitions(n) is asymptotic to PI sqrt(2n/3), which led to the approximate formula

partitions(n) ~= e^(PI sqrt(2n/3))/4n sqrt(3)

This formula only gives the full partitions and refers to unordered terms. The partitions I seek are taking order into account. I want a formula for the ordered partitions of n which are less than k terms long! Guess I'll have to wait for another mathematical genus to solve that one for me. One can see from Ramanujan's expression, however, that the appearance of e and pi are already starting to look like the formulation involved in the Normal distribution.

Not willing to wait for an exact mathematical expression, I was nonetheless able to cook up a random variable experiment with binary strings to get empirical results. I could have used a random generator to cook up the exact form of bits for each summary contribution for x but instead I used a simpler scheme. I decided to simply let each successive contribution have one more bit of precision than the one before it. The template for x now became:

x = # + ## + ### + #### + ##### + etc.

My algorithm now scans through a file bit stream and extracts the first contribution and adds its value, then gets the next and so forth. When it reaches a preset number of bits to consider as being a trial for x it hands that sum over to the frequency array and puts it in its correct slot. The process is completed until b/c trials have been completed i.e. the end of file is reached.

This method approximates accessing any arbitrary random variable in the real world with this caveat: We assume that we are in complete knowledge of how the additive influences of the variable are composited by nature and that we have strung them into a binary sequence representing the final experiment. This binary sequence is our file. In other words, we are taking an arbitrary file off the computer and pretending that it was the result of this strange machination. Having made this rather wild assumption, what do we expect to get?

Well here are some of the results doing a dual test as always - before compression and after:

There are several observations we can make. First, the different colors are for different types of file. Red = exe, green = txt, blue = .bmp, and white = random binary. The white vertical line is just the predicted mean value. It is simply the max value of x divided by two. Notice that the 'after compression' figures follow the normal distribution rather well. The exe and txt files before compression, in contrast, do not make good examples of what a random variable normally distributed in a real world experiment would be like. Interestingly the random bin file looks sort of normal before and after as expected, but notice that it really isn't a nice smooth curve. This suggests to me that since a normally distributed random variable is the result of ordering influences, a pure random file would naturally not be a good example necessarily. In contrast, the compression distributions suggest that compression normalizes the random variable.

The white horizontal bars are predicted frequencies. But wait, I said I had no precise formulation yet so how did I get those predicted values? I simply discovered a kludge that works rather well. In our attempted derivation avove we let c = # of contributions to x. Dividing x by c gave me a smaller variable which enabled me to use the simple expression (T*cCx/c) as in the coin toss experiment. This is not the true partition possibilites as discussed but the resulting shape is the same and it scales very nicely onto the frequency distributions. It may even turn out that this expression is the limit case for the true values. I dont know. The so-called Normal Curve, itself, in statistics is a limit expression for the Binomial Distribution using continuous real number variables. The information theoretic approach is thus fundamentally closer to the truth (assuming I ever get the full derivation).

Returning to our main focus, we now need to understand randomness in terms of this latest set of results. Normally distributed random variables cluster about a mean value. That only occurs when there is some set of organizing influences. The conventional idea of statistics and probability theory is that there is order because of the clustering, but that the individual trials themselves are subject to random perturbations. Therefore you get the main peak with some fewer trials falling to either side. The so-called random variable is not itself supposed to be random. It is just randomly selected or the result of real world measurement. In my derivations and empirical testing, though, I never injected any random perturbations and I didn't even inject the idea of having a mean value! Therefore, whence cometh the cohesiveness about a mean and whence cometh the tapering off to either side?

It is of profound importance to the ideas in this paper to realize that we are basing everything on the assumption of maximum uncertainty. Any results or patterns we obtain are produced by the introduction of constraints. The first constraint we previously introduced was the number of bits under consideration. In this section, we introduced the constraint of the number of additive contributions and hence the number of bits in each contribution. Since, this is the only thing new we added, that has to be the source of the cohesion about a mean. A look at this will reveal why.

Recall that we defined the value or magnitude of x as:

x = ### + ## + #### + # + ## + ##### ....

This was a departure from our original thesis of looking only at the patterns not the magnitudes of numbers. This parcelling out into sums of magnitudes has a consequence. The maximum value of any one contribution ### is simply 2 raised to power of the number of bits in that contribution. 2 bits yields a range of 4 possible magnitudes - 0,1,2,and 3. Now imagine x were constructed from three sets of 2-bit contributions. It would be represented by 6 bits total, but its summary value could only have a maximum of 3*3 = 9. The range of possible values for x would thus have to be 0,1,2,3,4,5,6,7,8, and 9. In contrast, a normal 6-bit binary number would have a range of 0-64. Thus we see that additive summation of influences presents a large constriction or reduction of possiblities!

This reduction of possiblities is a box that we force the variable x to lie within. The pure consideration of fixed pattern size in random strings allows no bias and yields a flat distribution. Additive summation, though, forces the possibilities into following the combinatorial binomial distribution. Simply put, there are more possible ways x can happen for any trial, if the there is symmetry among the separate contributions. If all the value is coming from one large contribution there are fewer ways it can happen. If x is small and coming from mostly one contribution there are likewise fewer ways it can happen. These two cases are extremes at the edge of the box. The central region, where there are more possible arrangements, occurs when each contribution is balanced against the others. This is what causes the apparent clustering. It is nothing more than a combinatorial constraint imposed upon randomness! The organizing influence is implicit and comes entirely from the mathematics, not some external influence!

This has significance to statistics and probability theory, and perhaps even to physics. We have revealed that cohesion about a mean is not some act of nature except in the sense that natural influences can be additive. They are not always additive. They can be multiplicative for some real world variables. The point is that the clustering is the result of addition and the box of finite precision of measurement. The individual influences themselves can be quite random and have a flat distribution! These results have implications for what natural law really is to begin with. It appears that any organizing influence about a mean is quite simply a result of the constraints we place upon any experiment.

My intuition suggests that why some things are additive and others not probably has to do with feedback. Feedback generates multiplicative results. Feedback usually results from proximity, relatedness, or some sort of connectedness. Additive influences, in contrast, result from independent unrelated sources. Cohesion about means is thus a hallmark of the implicate symmetry of randomness.

 

Parsimonius and Relative

Parsimony is apparent in randomness also. Randomness seems to like the small patterns best. The treatment in the scale invariance section shows that pattern overlap is part of the mix also. Empirical data for the files shown above indicates that file compression seems to work by picking the places where pattern frequencies deviate from the random and balancing out the overall mix. The more efficient the compressor, the more efficiently it parses out the bits according to this scheme. Since pattern overlap is involved in the frequency distribution also, there is no wasted space in a random string.

Strangely, humans perform poorly when trying to manually generate random sequences. We always tend to over-exagerate the balance of ones and zeros because we fear any long clump is obviously not random. The longer the trial runs, the worse this performance becomes because randomness allows long patterns to occur if 'b' is great enough. This brings up the subject of relativity.

Randomness is relative to the context under which it is observed or measured. We have seen that given 'b' we can deduce the frequencies of various p-bit patterns. The context, therefore, is always about how long (b bits) a sequence we are considering. Humans obviously focus on what is immediately before their eyes. We tend to think in small numbers of say 6 or 7 items at max. We can short term recall that many digits as in a phone number for example. Visually we notice either a large overall pattern or we focus up close and notice small patterns. Either way, the complexity of our judgement is small in terms of pattern complexity. For this reason we tend to judge randomness by the scale at which we are able to pick out patterns. The lesson of this paper is that this relativity is built in and should be emphasized! Randomness is not an absolute! The same sequence judged orderly can be merely a subset of random in the larger context.

There is a lesson here for some people. We can take the results of this paper and project that the longer someone observes the fluctuations in the stock market, the more apt he is to make an error in judgement. This seems contrary to common sense, but in this case, more study is a bad thing. This is because a stock player is looking for patterns that indicate the appearance of a trend which he can captilize on. We have seen though, in a random string longer patterns (trends) are inevitable and more probable, the longer the string one is observing. We humans forget to incorporate the broad context of the total string of events and tend to relate a pattern to its short term context. This is a deadly mistake when dealing with randomness and the stock market!

 

Order, Implicate Order, and Randomness

There appear to be two types of explicit order - grouping order and symmetry order. Recall the kbit curve in the section on symmetry. The informational uncertainty reaches a maximum for combinatorial classes at the center of the curve. This central point of maximum possible combinations occurs when the number of ones balances the number of zeros. This is symmetry order. The single most balanced string is ...010101010101010.... The curve also illustrates the opposite kind of order - grouping order. 'Grouping' order is the kind of order where strings of ones appear or strings of zeros. The extremes are ...0000000..... and ...1111111.... The extremes of grouping order lie at the two tail ends of the kbit curve on either side of the central balance point. Grouping order is thus in opposition to symmetry order.

As fallible humans, we tend to see order if one or the other type is dominant. Randomness or complexity seems to be somewhere between the two extremes. Now an essential aspect of randomness we uncovered is that it is attracted to symmetry, yet at the very most central symmetric point, there is strong symmetry order and randomness cannot exist there. Recalling our log(Freq) plots for randomness, we note that the central ...0101010101... type of string would give a histogram with one bar only at the p=1 measure. It would be most decidably not random. Randomness thus appears as a tight halo about the extreme of symmetry order. My own intuition argues for saying the term complexity applies at the other end. I suspect that complexity hovers around grouping order. The total conceptual picture would be symmetry on one end merging into randomness, merging into complexity, merging into grouping.

The order we have discussed at this point is explicit order easily observed in real world events or binary strings. I use the term 'implicate' order to refer to algorithmic order. It is the domain of symbolic patterning and abstractions. It is functional and structures the things it operates on, but its operation may not be apparent. It is implicit to the mathematical structure of the algorithm. The theoretical treatment of randomness given in this paper shows that randomness has an algorithmic structure that is always operative but not very apparent in our observations. The question becomes now - where does this algorithmic order in randomness come from? Have I performed some mathematical ruse?

Well, recall that the fundamental context for log2(Freq) is 'b', the length of the binary string. This constraint creates a box within which our definition of randomness must operate. Boxes by definiton organize i.e. there is implicate order from the start. This order puts a strong clamp on the expected frequency for large pattern sizes relative to b. There is an even more fundamental algorithm structuring randomness from within though. It is the binary number system itself. We need to take a closer look at that.

The binary number system is base2 arithmetic. Each digit in a base2 number has a value equal to 2 raised to the power of the position. For example, the number '011' = 3, because 011 = 0*2^2 + 1*2^1 + 1*2^0 = 0+2+1 = 3

This is how the binary system works, and this is how the combinatorial arrangements of binary strings works. The number of arrangements equals two to the power of number of bits. This is also how probability works. If the probability for any bit is 1/2, then the net probability of a string of bits is one divided by two raised to the power of the number of bits.

All our theoretical treatment of randomness has done is to start with the fundamental axiom of uncertainty and deduce the structure of the binary number system when constrained within a box of finite size. Pure tautology! The implicate order of the algoritmic structure that was built into our axiom to begin with. This is of course how mathematics works isn't it?

The final conclusion, I think all honest thinkers must reach is that randomness is the fundamental epistemic concept behind information theory, probability, and statistics. It may even have more far-reaching significance. G.J. Chaitin (Kolmogorov-Chaitin Algorithmic Complexity Theory) has produced results which challenge the very foundations of mathematics itself.

My own predilection is to stop viewing randomness as something in itself or as a property of outside systems. It seems to me that it is merely the unbiased structuring of our expectations when faced with uncertainty. One can argue that my definition is just one of many definitions available. I would argue though, that mine is the only one possible that provides a metric with the minimal number of axiomatic assumptions built in. As such it is universal.

Ok, so now that I have fessed up. Let's really lay it out. Recall that the log2(Freq) plots were logarithms of an exponential. Below is the raw exponential form of a histogram for a random string. It was empirically measured on a random generated file by the way. The white line is the theoretical log. The actual frequencies have been scaled to fit in this box just for context only.

The first bar is for p=1 and it represents a log2(freq) = log2(b) - 1 and

freq = (b-1+1)/2^1 = b/2

:. freq = b/2

This is the symmetric balance for the smallest pattern size as expected. Now look at each successive frequency bar. p=2 is half of p=1. p=3 is half again and so forth all the way down to no frequency at all. Each successive frequency is half the preceeding value.

:. freq(p) = b * (1/2)^p

Well duh! (1/2)^p is just the binary multiplicative reduction in probability. It's pure binary inevitability built into the definition. It follows exactly the form of the cyclical positional notation of the binary number system itself.

Since I have just demonstrated randomness to be nothing more that an imaginary scaffold of tautologies, is it even useful to talk about it in terms of real world numbers? Let's take a look at that.

Can we Measure Compressibility?

Even though randomness might be an imaginary scaffold of tautologies, that very structure might provide a metric for measuring real world data. In the same way a yardstick or ruler is vacuous but orderly, so too might be randomness. A yardstick really has very little to say about external reality. It is in its use that observations are obtained. So too with randomness.

Recall the section on Scale Invariance. The neat triangular log histogram we arrived at for randomness suggests an informal measure of the information content and how it is distributed under conditions of maximum compression. It represents the theoretical limit of compressibility. Now when we look at the log histograms of arbitrary files with apparent order, they suggest that the deviations from the ideal are a result of redundancies or grouping order. Compressors take advantage of these redundancies and rebalance the binary mix of patterns to emulate the ideal. Might we not therefore presume that these deviations are a measure of the degree of order?

Since the histograms are already in logarithm form, they are information measures i.e. bits. By going through a log distribution for a file and summing up all the deviations (absolute value not the signed value), we can arrive at an estimate of the order of the file.

Dev = Sum over i of abs(Log2Freq(i) - TheoreticalLog(i))

We want this to be normalized relative to the total so we will divide by the area under the line which turns out to be A = (1/2)*(log2(b)-1)*(log2(b)). This will confine most values from 0.0 to 1.0. Now we have relative deviation RDev = Dev/A.

Then might not the measure of randomness simply be Randomness = 1.0 - RDev.

To shorten things let R = 1.0 - D

Now the compressibility of a file is a function of it's randomness, so we aught to see this for real world files.

I extracted R for several files and tested it against the reciprocal of the compression ratio i.e. 1/CR. I used the reciprocal because that keeps the numbers between 0 and 1 basically. I call this 1/CR factor compressive reluctance. It is merely the file size after compression divided by the size before.

Here are my results. The color coding is the same as previously. Red = executable files. Green = text files. Blue = bitmap image files. Yellow = jpeg files. White = random binary files.

The above plot shows that there is a definite correlation between randomness and resistance to compression. Notice, though, that each type of file seems to lie in it's own group with a different slope than the others. There is a general trend but I found the result unacceptable. Maybe I was seeing patterns that weren't there. Then it occured to me that the compression factor needed a sort of reification. It ocurred to me that a compressed file always has an amount of overhead built into it. A symbol table or coding scheme has to be specified in addition to the raw encoded bits. There is thus an amount of order which isn't squeezed out. We can express this by:

bits after compression = zipbits = rawbits + overhead

Now it seems reasonable to expect the overhead to approximate the relative deviation times the bits before compression. This gives:

overhead = D * filebits

substituting we have: zipbits = rawbits + D*filebits

Now our compressive reluctance by definition is 1/CR = zipbits/filebits so

Substituting again we have:

1/CR = (rawbits + D*filebits)/filebits = rawbits/filebits + D

The true reified compressive reluctance (TCR) we seek can now be identified as rawbits/filebits so:

1/CR = TCR + D

:. TCR = 1/CR - D

Curious to see if this reified approach yielded interesting results, I made another set of plots:

The above plot using the reified compressive reluctance shows everything now basically lined up with a similar slope. We have in effect stripped out the coding overhead from the zipped files and obtained a more direct correlation because the coding overhead is not fully random. There is still a fair amount of variance between the different file types though. What could be the source of that I wondered. Well one snag that I haven't covered yet is that I developed several methods for scanning for patterns in binary strings. The problem you see is that the theory concerns itself only with pattern length and not the pattern itself. To do a scan, though, one has to pick arbitrary patterns. I had been using a technique above that was fast and sequential. There is a more accurate technique though that takes longer. Using that method, I replotted and got the result below:

The plot above shows that there is now a sort of general agreement between all the file types. We need to discuss coding schemes that I used to scan for patterns though. This presented a big problem. Since the theory is independent of what the patterns are, but the empirical data depends on scanning for the existence of a particular pattern and its frequency, the empirical data are dependent on which patterns are chosen. This makes the data file-dependent and if random patterns are chosen, the results will vary a little each time the algorithm is run.

The algorithm is a type of the so-called 'busy beaver' algorithms. Since allowing pattern overlap is a feature of the theory, the algorithm has to back up and repeat if it gets to a mismatch. The longer the pattern length, the busier this beaver has to get. In addition, the whole file has to be scanned for each length of pattern, so the beaver gets even busier still. Now to top it all off, to get a really accurate idea of the terrain, we would also have to scan for the existence of each and every possible variation of pattern that is a given length. This means 2^length separate full trips per pattern length. At this point the beaver gives up and starts looking for an easier method.

I found a really quick and dirty method by simply going with the bitmass concept. Scanning for clumps of contiguous ones is sequential and really fast. The result had to be multiplied by two, presumably because of the mirror-imaged symmetry of ones and zeros but the results were quick and a good enough match with theory to justify its use.

The idea of coding schemes points out another issue with order and randomness. We can take a psuedo random binary file and show that it is indeed close to random. If we take that same file and convert the binary bits to strings of ascii zeros and ones, it becomes a text file essentially. The same randomness of data is there but the file will now test as not random and will in fact compress quite a bit. The coding scheme has introduced a lot of redundancy into the file. Thus we see that the way the data is encoded has a direct impact on the structural qualities with respect to randomness. After compression, though, there should be a similar file size regardless of coding.

Compression thus is a great equalizer and Randomness proves to be the ultimate measure of amount of information content, because it is independent of coding scheme.

Conclusions

We have demonstrated three basic types of distributions for randomness - the linear log scale invariance of pattern size, the flat unbiased distribution of fixed pattern size, and the bell curve distribution of additive variables. What we see depends on what we look for. It also depends on what is actually out there i.e. the raw data. Like everything else in life, our concept is relative and part subjective. The fascinating point of this paper though is that the things that tend to make us project order onto the external world is really the implicate order built into our testing apparatus. The real external order occurs only as deviations from the implicate order. The order we have been looking at in the above discussions occurs essentially as deviations from randomness. Randomness as a concept is the idealized and unbiased structuring of expectation when faced with maximum uncertainty.

Recall that there are two types of order - symmetry and grouping. My theory shows that grouping order is where most of the deviation occurs. Small patterns are present in symmetry order but due to the law of large numbers discrepancies are harder to detect at this end of the scale. This suggests that the theory needs to address symmetry order also. Further research is needed.

Given that much of the order we see is a result of implicate structuring of our expectations, how exactly does this come about? It is all about putting things in a box. Boxes constrain and organize things. They trim the field of possibilities. Information is a special type of box. It squashes exponential possiblities into linear bits.

Consider a guitar string. It is constrained by two end points. At the ends there are exactly no possiblities for variation. In the middle though there is a maximum of possible vibration modes. The fundamental vibration mode is one half a wavelength and it has the shape of well something like our kbits curve. Maximum freedom in the middle and zero on the ends. Possibilities are degrees of freedom. Tying the string at the ends has reduced the overall degrees of freedom down to a structured relationship amongst those remaining. The remaining degrees of freedom are thus implicitly ordered by the constraints of the system. This ordering is abstract and algorithmic in nature. The resulting possibilities are of course real and concrete though. Just remember that the order we are observing is in part, caused by the constraints.

Since randomness is the ideal unbiased assumption of maximum uncertainty, randomness essentially is an exploration of all the degrees of freedom available. When we put constraints on a system, we are injecting order onto randomness and relativizing it. Randomness as a concept though can thrive all the way down to one minimal bit of maximum uncertainty.

Randomness provides a benchmark for scientific endeavors. Deviations from expectations provided by constraints suggest the presence of external organizing influences i.e. natural law. Constraints introduce symmetries. Natural Law seems to occur as symmetries are broken. Or is it that natural law itself is what provides constraints and introduces symmetry? One of the great mysteries of particle physics is how some interactions violate symmetry. Cosmology struggles with the ultimate constraints of space and time.

This latest reseach of mine provides a missing link between my earlier thoughts on meta-level recursion and the compressive valving that I discuss in my paper 'The Universe Algorithm'. It seems to me that from a cognitive theoretic view, our minds work by symbolic compression of data into information. When we reach the limit of our compressive ability, everthing extra appears as random. The flared nose twitch, red face, and threatening gesture of a fellow primate becomes symbolized into a picture of 'anger'. We have boxes full of such symbolic patternings that relieve the load on our cognitive processing. Any behaviour above and beyond that is just random and unassimilated. Once we formulate a higher level order of compression though, we might put everything into boxes labeled behavioural scenarios. Then what was once random might now be part of a larger scenario. Thus we have randomness as the ideal limit of compression which enables detection of pattern relative to the level of cognition available to us.

On the physical plane, randomness in nature seems to imply that some sort of compressive valving has met its limit. Constraints introduce periodicities and implicate order. When these constraints are over-run by too much energy, randomness erupts. Energy thus set free explores the variations open to it in space-time but in a random fashion because there are fewer constraints to organize it! At least those are my intuitions on the matter.

Just remember:

"The longer you observe the stock market, the more apt you are to make a big mistake!" ...SAR