
The Semantic Web

and

Further Thoughts About Where Software Meets the Hardware

by Swinton Roof

May 21, 2001

Eldon New and I have, over the course of time, discussed the problem of where exactly software touches or interacts with hardware. This problem is similar to the mind-body problem and can cause much confusion and vexation. How can an abstract 'intangible' influence something tangibly physical? A program running on a computer can be seen as an assemblage of physical voltage switches deterministically changing states over time, reaching an inevitable conclusion given fixed inputs from physical devices. It can also be seen as a physical manifestation of an abstract constellation of ideas interacting with some person or with another constellation of abstractions (another machine or program). The first part of this paper will rehash some early concepts that can actually confuse the issue but that nonetheless form a minimal basis for understanding. The last part will introduce some clarifying concepts I recently came across while studying the idea of a 'semantic web', the next great paradigm that will transform the internet.

Before engaging in a detailed investigation of the hardware-software question, it is appropriate to mention a philosophical limitation of such an inquiry. Just as 'Yin' can never be totally separated from 'Yang', we must concede that all such complementary pairs exist only in relationship. No tangible physical object can be a 'thing in itself'. Any properties it may have are a function of the context or environment in which the object is embedded and observed. This means that hardware, to exist in the sense we mean, must have properties designed into it with software in mind. A hockey puck, while sharing some properties with computer hardware, will not be considered hardware unless it is somehow able to run software. Likewise, software must be encoded in some physical medium to have any useful behaviour other than just existing as an object of our mentation.

Let us begin at the lowest level of hardware organization where software has any meaning at all. This is the transistor level of on-off bits on the silicon chip, the point at which minimal units of information are encoded into the hardware. Below this level there is a lot of physical engineering design and organization, but it serves only as substrate mechanics that maintain the hardware-software system's integrity and functionality. After discussing the low-level aspects of the hardware-software question, we will move on to high-level software and the process whereby it is encoded into a running computer program.

A first-order solution to the low-level mystery lies in the observation that 'pattern' seems to lie at the heart of the question. Abstract intangibles are represented by patterns of on-off bits. The number '3', for example, can be represented by two 'on' bits, as in 0011, where '0011' is a 4-bit binary number pattern or set of digital 'bits' (0×8 + 0×4 + 1×2 + 1×1 = 3). Using binary numbers to encode alphanumeric and other information is the modus operandi of digital computers. These patterns are easily encoded physically in the hardware as addressable units of high/low voltages in transistors on the computer chip. So far so good. No real mystery here, or is there?
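To make the 'pattern' idea concrete, here is a minimal C sketch that writes the lowest bits of a number as a string of '0' and '1' characters and counts how many bits are 'on'. The function names `to_bits` and `count_on_bits` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Write the lowest `width` bits of `value` into `out` as '0'/'1'
   characters, most significant bit first. `out` must have room for
   width + 1 characters (the last one is the terminating '\0'). */
void to_bits(unsigned value, int width, char *out)
{
    assert(width > 0 && width <= 16);  /* unsigned is at least 16 bits */
    for (int i = 0; i < width; i++) {
        /* Character i reflects bit number (width - 1 - i). */
        out[i] = ((value >> (width - 1 - i)) & 1u) ? '1' : '0';
    }
    out[width] = '\0';
}

/* Count how many bits of `value` are 'on' (equal to 1). */
int count_on_bits(unsigned value)
{
    int count = 0;
    while (value != 0u) {
        count += (int)(value & 1u);  /* add the lowest bit */
        value >>= 1;                 /* shift the next bit into place */
    }
    return count;
}
```

Calling `to_bits(3, 4, buf)` fills `buf` with "0011", and `count_on_bits(3)` returns 2: the abstract number three, held as nothing more than a pattern.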

Well, the deterministic or mechanical behavior can be understood once one realizes that the physical high/low voltages convey energy in exactly the increments required to flip other high/low bits when connected together. These flip-flops in turn enable other flip-flops, and the whole process cascades into a clock-regulated flow of program execution on the silicon. It is important to realize, at this point, that the machine cares not one whit about semantics or meaning. As far as the machine is concerned, syntax is everything. What do I mean by syntax here? I mean that the behavior of the connections between logic elements is determined solely by the binary values fed into these units, while the logic units themselves follow hardwired, fixed rules (as in AND/OR/NOT/NAND logic gates). The whole process has a deterministic outcome based solely on the input values and the fixed logic built into the silicon itself. The reader may be disturbed that I have already introduced such abstractions as syntax and logic and mixed them up with the hardware. The point I wish to make, though, is that these elementary concepts can be precisely illustrated by pointing to actual physical things printed on the computer chip that do precise physical things when certain operating conditions are met. In the same way that syntax rules operate on words in a language without caring about what the words mean, the hardware logic on a chip keeps the behavior of binary signals 'well formed' (barring a system crash, of course, eh eh). There is even a computer term, 'WORD', meaning a fixed number of bits (for example 16, 24, or 32, depending on the machine) operated on as a unit. The computer is quite happy to 'operate' on these WORDs, but only as values to be fed into its syntactic gristmill of logic gates.
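The claim that the logic is purely syntactic can be sketched in C by modelling gates as functions on 0/1 values. This is a simulation, not the layout of any particular chip: because NAND is functionally complete, every other gate here is wired from it alone, and a half adder (a standard one-bit adding circuit) is built from the result. The function names are hypothetical. None of them looks at what the bits stand for.

```c
#include <assert.h>

/* NAND: output is 0 only when both inputs are 1. A fixed rule on bits. */
static int nand_gate(int a, int b) { return !(a && b); }

/* NAND is functionally complete, so the other gates are wired from it. */
static int not_gate(int a)        { return nand_gate(a, a); }
static int and_gate(int a, int b) { return not_gate(nand_gate(a, b)); }
static int or_gate(int a, int b)  { return nand_gate(not_gate(a), not_gate(b)); }
static int xor_gate(int a, int b) { return and_gate(or_gate(a, b), nand_gate(a, b)); }

/* Half adder: adds two one-bit inputs into a sum bit and a carry bit. */
void half_adder(int a, int b, int *sum, int *carry)
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    *sum = xor_gate(a, b);    /* 1 when exactly one input is 1 */
    *carry = and_gate(a, b);  /* 1 when both inputs are 1 */
}
```

Adding 1 and 1 this way yields sum bit 0 and carry bit 1, that is, binary 10. The gates reach the right answer without any notion of 'two'.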

The lowest level of hardware, while quite complex in terms of the sheer number of transistors, is fundamentally simple and can perform only a small set of basic logical operations. The truly staggering complexity of computation comes in when the software is loaded and executed. Now here is a really interesting piece of the puzzle to consider. Just how does one get the software into the hardware? It should be immediately obvious that a computer (if it is more than just a mechanical calculator) must already be on and running some kind of software in order to be imprinted with any new software. I will leave the chicken-and-egg question and bootstrap teleology alone for now and simply ask the reader to agree that this must be so. I am more interested right now in the actual process of encoding abstract ideas into 'patterns of on/off bits' that tell the low-level hardware what to do. From the programmer's viewpoint, this process begins when he writes 'code' using a text editor within some programming environment complete with an assembler and a compiler or interpreter. These tools are software processes themselves: they turn keystrokes into language symbols and, ultimately, those symbols into machine instructions. In the early days of computing, the programmer had to literally flip switches and plug in wires to get the software patterns into the computer. Things are a bit easier now, I suppose.

There are many different programming languages, but they all have a set of ideas in common. One is the idea of a language with a syntax, and of statements in that language. Syntax means that there are rules for how statements can be validly constructed. To be valid, a statement has to do something (even if it is just a no-op, a do-nothing statement). It is usual for a language to allow the creation of entities that have some value or set of values or properties. In high-level languages like C++, these entities may even have behaviours or actions that they can perform. To keep things simple, though, let us consider the lowest-level software entity: the variable. We shall investigate the lifespan of a simple variable and see what insight it gives us into the hardware-software connection. Software has other features, such as operators (arithmetic etc.) and functions, but they must have something to operate on, and that something is the variable or some aggregate of variables. Variables are where the data is encoded.

It is extremely important to realize that variables take up space. This is why programmers came up with the idea of a 'type'. Common basic types are character, integer, float, and string. A 'char' type (a character, as in 'a') takes up 8 bits of memory on many computers. An integer may take 16 or 32 bits, and so on. What we're getting at here is that the basic types are the lowest-level entities or objects in the abstract world of programming, and these entities, like real physical things, have an intrinsic size separate from and regardless of what they may represent. We might use 8 bits to represent the letter 'a', but we might also use 16 bits to represent that 'a' if we were using a larger scheme capable of representing a more universal character set. Shannon's theory of information tells us that n bits can encode 2^n distinct patterns: 8 bits give 2^8 = 256 possible representations, and 16 bits give 65,536.
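The sizes mentioned above vary from machine to machine, which a short C sketch can make visible. The helper names are hypothetical; `sizeof` and `CHAR_BIT` are standard C. The C standard fixes `sizeof(char)` at 1 and requires `int` to be at least 16 bits wide, but leaves the exact widths to the platform. The last function computes the 2^n count of distinct patterns.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT: number of bits in one byte */
#include <stdint.h>

/* Width in bits of a few basic types on the machine compiling this code. */
unsigned bits_of_char(void)  { return (unsigned)(sizeof(char)  * CHAR_BIT); }
unsigned bits_of_int(void)   { return (unsigned)(sizeof(int)   * CHAR_BIT); }
unsigned bits_of_float(void) { return (unsigned)(sizeof(float) * CHAR_BIT); }

/* Number of distinct patterns that fit in n bits: 2^n.
   Restricted to n < 64 so the result fits in a uint64_t. */
uint64_t pattern_count(unsigned n_bits)
{
    assert(n_bits < 64);
    return (uint64_t)1 << n_bits;
}
```

`pattern_count(8)` returns 256, the number of values a single 8-bit char can hold; on many current desktop machines `bits_of_int()` reports 32, but that is a property of the platform, not of the language.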

The above gave a short description of the low-level encoding that goes on in software, just as it does in the hardware (remember the logic gates and on/off bits?). Patterns encoded into simple blocks of bit space are used to represent binary numbers, which in turn are used to represent higher-level things like characters or symbols of a language (e.g. an 'a' in English). The basic types have size boundaries which, like atoms in the real world, cannot be violated without mayhem ensuing. These basic types are the building blocks from which software variables are constructed. How does this happen?

In a statically typed language such as C, it is usual to first declare the variable and its type so that the compiler or interpreter can begin its job of somehow making that variable real and available for computation. An example in the C language would be:

int sum;

This is a statement (terminated by the ';' character) that declares a variable named 'sum' of type 'int' (integer). In the abstract world of the programmer, he is telling the software environment to set aside some space (say 32 bits) for an integer. In addition, he is declaring that this particular space will have a name to identify it. In this case, the name is 'sum'.

We should point out that in most cases this act of programming corresponds to what is called 'source code level programming'. This is the initial stage at which pure ideas in the programmer's mind are made tangible by encoding them onto some substrate or process. Further processing and massaging by software tools will later transform this code into instructions the hardware can execute. The reader will hopefully see that there is a continuity and general similarity in how this happens, whether we are talking about hardware or software.

Continuing our discussion, we should note that our variable 'sum' has no defined value yet. (In C, a local variable declared this way holds an indeterminate value until it is assigned; one declared outside any function starts at zero.) We give it a value by assignment, as in 'sum = 0;', after which it is ready for use. In more complicated cases, with aggregate variables such as structures or classes, we must first define a template for the type of bit space to allocate, then declare the variable, allocate it, and initialize it. This whole process is called INSTANTIATION. I capitalize this term to indicate how important it is to the discussion of the software-hardware problem. Instantiation is the process whereby something abstract and intangible becomes real. Becoming real (even in a relative sense) means that an entity takes on the properties of physical size and location. The size delimits the possible representations the entity may express. The location makes the entity a unique object capable of being addressed or found. These properties are necessary in the physical world of hardware just as they are in software. If your shoes had no size or location in the real world, you would have a very hard time finding them and putting them on in the morning!
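The four steps named above (define a template, declare, allocate, initialize) map directly onto C. In this sketch the `struct bill` type and the `instantiate_bill` function are hypothetical, invented to echo the monthly-bills example later in the paper; `malloc` and `free` are the standard allocation functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Step 1: define a template describing the bit space to allocate. */
struct bill {
    int id;         /* which bill this is */
    long cents;     /* amount owed, in cents */
};

/* Steps 2-4: declare a pointer, allocate space sized by the template,
   and initialize every field. Returns NULL if no memory is available. */
struct bill *instantiate_bill(int id, long cents)
{
    assert(cents >= 0);          /* this illustration assumes no negative bills */
    struct bill *b;              /* declare: a name, but no object yet */
    b = malloc(sizeof *b);       /* allocate: size comes from the type */
    if (b == NULL) {
        return NULL;
    }
    b->id = id;                  /* initialize: give the space defined values */
    b->cents = cents;
    return b;
}
```

The caller owns the returned object and releases it with `free(b)` when done; until then the object keeps both its size and its location, the two properties singled out above.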

When speaking of software, it is a philosophical point that representation is the enabler of instantiation. Since abstract ideas have no implicit size or location, they must be 'represented' by something that does. That something is our 'variable' with its associated type or size. Convention or arbitrary standard decides the actual representational associations (ASCII, for instance, assigns the letter 'a' the pattern 01100001, decimal 97), but the process itself must occur for instantiation to succeed at all. It is tempting to further explore the idea of representation, but I don't want to digress into those depths here.

The other half of instantiation not yet discussed is the idea of 'location', or addressability. To be located, we need a unique identity. In our declaration of the variable 'sum', the name itself is its identifier. This tells the software environment to cordon off some type-sized space with a unique, inviolate location and give it the name 'sum'. The labeling or naming itself is a relatively high-level affair used to represent the identity of the named entity. Just as a person has a name, so do variables. Likewise, just as more than one person might have the same name, so can variables, but, most importantly, not in the same scope or namespace, because that would violate the identity requirements and cause mass confusion! In the real world, a person's ID might consist of name, social security number, driver's license, credit cards, birth certificate, physical description, etc. The whole purpose of these is to establish individuality or uniqueness and so avoid confusion between individuals. But what if you were cloned? The one overriding property that would establish your uniqueness is location in space-time! No two objects can occupy the same space at the same time; this is a kind of 'Exclusion Principle'. Likewise, no two variables can have the same location at the same time.
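The rule that names may repeat, but not within the same scope, can be checked in C. In this sketch the function name `shadowing_demo` is hypothetical: a file-scope `sum` and a block-scope `sum` coexist, and comparing their addresses shows that they occupy different locations.

```c
#include <assert.h>

int sum = 100;                /* file-scope variable named 'sum' */

/* Returns 1 if a second variable named 'sum', declared in a nested
   scope, lives at a different location from the file-scope one. */
int shadowing_demo(void)
{
    int *outer = &sum;        /* still refers to the file-scope 'sum' */
    int sum = 0;              /* a new 'sum' that shadows the outer one */
    /* int sum = 1; */        /* a second 'sum' in this same block would not compile */
    int *inner = &sum;

    assert(outer != inner);   /* two live objects, two distinct addresses */
    sum += 5;                 /* changes only the inner variable */
    return *outer == 100 && *inner == 5;
}
```

Inside the function, the name 'sum' resolves to the innermost declaration. The compiler tells the two variables apart by scope; the hardware tells them apart by address.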

Computer programs that violate size and location boundaries often elicit swift retribution in the form of a total program crash and, in some cases, a system crash. The whole software construct comes tumbling down, just like a collapsing star whose gravity has overwhelmed the resistance provided by the 'Pauli Exclusion Principle', ending up as an inert heap or, worse yet, a spectacular explosion. In my early days of programming the Amiga computer, I remember some awesome displays as a rogue program I wrote wandered into system screen memory and blew the lid off all rational behaviour!
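C itself does not check array boundaries: writing past the end of an array is undefined behaviour, which is exactly how a rogue program can wander into memory it does not own. A minimal sketch of a guard (the function name `checked_store` is hypothetical) refuses any write whose index falls outside the array's size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Store `value` in slot `index` of a buffer with `len` slots, but only if
   the slot lies inside the buffer. Returns false, and writes nothing,
   when the index would cross the boundary. */
bool checked_store(int *buf, size_t len, size_t index, int value)
{
    assert(buf != NULL);
    if (index >= len) {
        return false;         /* out of bounds: refuse instead of corrupting memory */
    }
    buf[index] = value;
    return true;
}
```

The check costs one comparison per write; languages that perform it automatically trade that small cost for turning a silent corruption into a visible error.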

In the world of software, identity becomes merely the sum total of all properties. The size and location properties, however, are the most fundamental to the process. In fact, one can write working software without directly using names at all. It can all be done indirectly, by referencing through pointers and accessing the bits directly. Indeed, this is what happens when the software environment crunches the high-level code into instructions the machine can follow. High-level labels are stripped out of the final code (with some caveats, such as debugging symbols and exported names) until only low-level bit flipping is involved. The high-level language syntax is what keeps all this in line with the validity requirements discussed above.
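Working by location rather than by name can be sketched with C pointers. In this example (hypothetical function names), array elements are reached only through pointer arithmetic, and an object's storage is cleared by treating it as raw bytes through an `unsigned char` pointer, which C permits for any object.

```c
#include <assert.h>
#include <stddef.h>

/* Sum `count` ints starting at address `base`, using only pointer
   arithmetic: the elements are reached by location, never by name. */
long sum_by_address(const int *base, size_t count)
{
    assert(base != NULL || count == 0);
    long total = 0;
    for (const int *p = base; p < base + count; p++) {
        total += *p;          /* dereference: fetch the bits stored here */
    }
    return total;
}

/* Set every byte of an object to zero by walking its raw bytes. */
void zero_bytes(void *obj, size_t size)
{
    unsigned char *bytes = obj;   /* view the object as plain bit space */
    for (size_t i = 0; i < size; i++) {
        bytes[i] = 0;
    }
}
```

Neither function knows the name of the data it touches; each receives only an address and a size.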

Now let us summarize. We wish to somehow transform an abstract idea into a real physical process. To do this, we learn a programming language, which is full of conventions and syntax rules for how to go about it. Then we instantiate some entities and perform some operations on them. The crucial step where the 'bit hits the bucket', so to speak, is the process of instantiation, or entification. Something abstract has become represented by something 'concrete'. This, in a nutshell, is where the hardware meets the software. Speaking of the bit hitting the bucket, this is a term from the old IBM punch card days, when bit patterns were actual holes punched in paper cards. The punched-out 'holes' fell into a bucket for later disposal. That very moment when a bit was punched is the exact moment of instantiation! The software has met the hardware.

At this point, the question posed at the beginning of the paper seems clarified, but is it? Where do the semantics and the meaning reside in all this? Everything so far seems to have been rules and syntax with no real content. Does the program really do what the programmer meant it to do? Does the program depend in any fashion on 'meaning'? These questions are not so easily answered. Perhaps the programmer mentioned above wanted to get the sum of numbers he inputs into the program so he can itemize and summarize his monthly bills. It matters not one whit to the program what each individual number represents in terms of what the bill is for, unless the programmer explicitly adds that bit of information, and even then it would only be a linkage of one variable to another. No semantics there at all. Strangely, the semantics or meaning of the process arises only when people interact with the program. Meaning is a subtlety that eludes encapsulation or object entification. The most one can do, perhaps, is to encode as many properties as necessary to conform with the intended meanings.
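The observation that a label is only 'a linkage of one variable to another' can be illustrated with a hypothetical record pairing each amount with a text label. The totalling function never reads the label, so relabelling the items cannot change the result. The `struct item` type and `total_cents` function are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an amount linked to a descriptive label. */
struct item {
    const char *label;   /* e.g. "rent": meaningful to people, not to the code */
    long cents;
};

/* Sum the amounts of n items. Only the numbers take part in the
   computation; the labels are carried along but never consulted. */
long total_cents(const struct item *items, size_t n)
{
    assert(items != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += items[i].cents;
    }
    return total;
}
```

Swapping "rent" for "x" leaves the total untouched: the label is linked to the number, but nothing in the program depends on what it means.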

This problem resolves into a double cone of information flow. On one side is a downward-expanding cone of hardware interactions reaching all the way down to atoms and particle physics. On the other is an upward-expanding cone of software relations reaching up into the ethereal but human world of intangible ideas and associations. The point where the tips of these two cones touch is the singularity event whereby instantiation occurs and computer bits flip. Meaning seems to encompass the entire process flowing back and forth within these two cones. Quantum physics is the discipline devoted to understanding the turnaround point at the lowest level of the bottom cone. Semiotics is the discipline devoted to understanding the turnaround point at the highest level of the upper cone. This turnaround point is where symbols are dereferenced and a representation becomes a semantic, or vice versa: a semantic is referenced and encapsulated or instantiated into a symbolic representation. It is my belief that 'meaning' is a term for the process inclusively, while 'semantics' is a term for the delineation of specific symbolic encodings. Semiotics is properly the study of how this process is enabled and structured.

This brings me to a final conclusion, and to how I came upon these ideas while exploring a new initiative on the internet: the development of a Semantic Web. The Web as we know it today consists of linked documents encoded in HTML. There is no way for machines to access the meaning of documents except in rigid syntactic ways. For this reason, people have to rely on word searches and visual reading of documents to find the information they seek.

The Semantic Web initiative seeks to encode information descriptively, so that software can access and find information by way of actual logical linkages instead of the mere hardwired physical linkages provided by HTML hyperlinks. A meta-design for doing this is now in development. It is called RDF, the Resource Description Framework. Ultimately the RDF relations will be encoded in XML, a more extensible and general markup language than HTML. The basic premise of RDF is that all useful semantic relations can be encoded as logical triples embodying a subject-verb-object type of relation. These triples can also be seen as identity-property-value statements. The identity of a subject is established by the 'hardwired linkages' mentioned above: they link to an actual document or resource on the Web (in RDF, a URI). Properties are 'type descriptive' and give a logical clue as to the nature or quality of the relation; that is, properties give a logical, set-theoretic basis for computation. The object or value part of the triple is either another resource or a 'literal' encoding that gives an 'instantiated' version, value, or example of the semantic property expressed in relation to the subject or identity.
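The identity-property-value shape of an RDF statement can be sketched as a plain C struct. This is only an illustration: real RDF identifies subjects and properties with URIs and has its own serialisations and query tools, whereas plain strings stand in for them here. The example URI uses the reserved example.org domain, `dc:creator` borrows the Dublin Core property name, and the `lookup` function is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One statement: subject (identity), property, and value. */
struct triple {
    const char *subject;    /* identity: the resource being described */
    const char *property;   /* the nature of the relation */
    const char *value;      /* another resource or a literal */
};

/* Return the value of the first triple whose subject and property both
   match, or NULL if the store holds no such statement. */
const char *lookup(const struct triple *store, size_t n,
                   const char *subject, const char *property)
{
    assert(store != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(store[i].subject, subject) == 0 &&
            strcmp(store[i].property, property) == 0) {
            return store[i].value;
        }
    }
    return NULL;
}
```

Finding a document's creator then becomes a lookup on a property rather than a word search through the document's text, which is the shift the initiative aims for.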

Well, there you have it! That's the connection I noticed: variables instantiated into hardware that have a unique software identity. Identity is easy (technically, at least) to establish, as is object value. To have more than simple linkage or association, however, there has to be a semantic hook into some description of the nature of that subject-object association. That's where the properties come in. In fact, throughout my entire discussion above, I deliberately left out programming features like operators, functions, inheritance hierarchies, attributes, polymorphism, etc. As programming languages have evolved, it appears that the major thrust has been to include semantic structuring of the programming process in a more explicit way.

The Semantic Web, if successful, will transform the web into a machine-readable resource which, in a primitive way (at first; who knows what will come later), will have the ability to make semantic connections and perhaps reason on its own. As this transformation matures, the dividing line between hardware and software will become just as fuzzy as it is in our own human case. The mystery will perhaps deepen. The same difficulties we humans have will be a stumbling block for the Semantic Web too. The design initiative at present is the establishment of standard RDF repositories, or 'ontologies', wherein useful identity-property-value triples are defined. These ontologies will have a scope, delineated by their source documents, and will apply only within that scope. This will enable different ontologies to be shifted in and out of scope. The resulting web of relations can thus be extensible and adaptable, but herein lies the difficulty. Just as humans disagree and shift their context in unpredictable ways, so too will RDF ontologies. The Semantic Web software engines of tomorrow must be able to deal with this and establish their own internal, scope-limited consensus of what is going on, or we will end up with techno-babble.
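One way to picture ontologies being shifted in and out of scope is to tag each triple with the ontology that asserts it and consult only the ontologies currently active. This is a hypothetical sketch, not a mechanism described in the text (later RDF work uses 'named graphs' for a related idea); the struct, the functions, and the whale example are invented. Two ontologies that disagree give different answers depending on which one is in scope, which is the consensus problem raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A triple tagged with the ontology (scope) that asserts it. */
struct scoped_triple {
    const char *ontology;
    const char *subject;
    const char *property;
    const char *value;
};

/* True if `ontology` is one of the n_active names currently in scope. */
static bool in_scope(const char *ontology,
                     const char *const *active, size_t n_active)
{
    for (size_t i = 0; i < n_active; i++) {
        if (strcmp(ontology, active[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Value of the first in-scope triple matching subject and property,
   or NULL. Triples from ontologies outside the active set are ignored. */
const char *lookup_in_scope(const struct scoped_triple *store, size_t n,
                            const char *const *active, size_t n_active,
                            const char *subject, const char *property)
{
    assert(store != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (in_scope(store[i].ontology, active, n_active) &&
            strcmp(store[i].subject, subject) == 0 &&
            strcmp(store[i].property, property) == 0) {
            return store[i].value;
        }
    }
    return NULL;
}
```

With only a 'folk' ontology active, the query about whales answers "fish"; swap in a 'biology' ontology and the same query answers "mammal". Deciding which answer to trust is left to the engine, exactly as the paragraph warns.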

Lest any human beings out there feel disenfranchised, it should be pointed out that there will always be a need for encoding the ontologies to our own purpose. Software agents will inevitably acquire the ability to format and compose their own ontologies given enough grist, but it is up to us to shape that vision and imprint our own connection to the machine, both within and without.