The Five Things You Have to Know, and Why
The News & Observer
Oct. 21, 1993
When you decide to ask the government for an electronic database -- or while you're still trying to make up your mind -- the first thing you should do is ask for a copy of the record layout.
A record layout is a map of the database. It describes, however briefly, the information contained in the database and shows you how it is arranged. It names each field of information, for example: firstname, lastname, city. It tells you the type of each field. Most are character; most of the rest are numeric. It shows you the width of each field. You'll need the record layout to load the database.
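To see why the layout matters, here is a minimal Python sketch of how a record layout drives the loading of one fixed-width record. The field names, types and widths below are made up for illustration; a real layout comes from the agency that owns the database.

```python
# Hypothetical record layout: (field name, field type, width in bytes).
# These names and widths are illustrative, not from any real agency.
LAYOUT = [
    ("firstname", "character", 10),
    ("lastname", "character", 15),
    ("city", "character", 12),
    ("age", "numeric", 3),
]


def record_length(layout):
    """Total width of one record: the sum of the field widths."""
    return sum(width for _, _, width in layout)


def parse_record(line, layout):
    """Cut one fixed-width record into named fields, using the layout."""
    fields = {}
    pos = 0
    for name, ftype, width in layout:
        raw = line[pos:pos + width]
        pos += width
        if ftype == "numeric":
            text = raw.strip()
            fields[name] = int(text) if text else None
        else:
            # Character fields are padded with trailing blanks.
            fields[name] = raw.rstrip()
    return fields


sample = "Jane".ljust(10) + "Doe".ljust(15) + "Raleigh".ljust(12) + "042"
print(record_length(LAYOUT))
print(parse_record(sample, LAYOUT))
```

Without the layout, the sample record is just 40 characters with no indication of where one field stops and the next begins.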
There are two reasons to get it early.
In addition to the record layout, there are four other things you need to know about the tape structure: density, blocking, coding, and labeling.
Density is expressed in bits per inch, or bpi. Bit is short for binary digit, the smallest unit of computer data. Bits are arranged in rows, nine bits to the row, like ranks of soldiers marching nine abreast down a street. That's why the tape is called a nine-track tape.
Eight bits make up a byte. The ninth bit is a parity bit, which is used to check whether the byte was changed during transmission. Don't concern yourself with the ninth bit. The more closely these rows of bits are packed, the higher the density of the tape.
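If you are curious how a parity bit does its job, the idea can be sketched in a few lines of Python. This sketch assumes odd parity, meaning the ninth bit is chosen so that each row of nine holds an odd number of 1s; that convention is an assumption here, and the function names are made up for illustration.

```python
def parity_bit(byte, odd=True):
    """Return the ninth bit for an 8-bit value.

    With odd parity (assumed here), the bit is chosen so that the
    nine bits together contain an odd number of 1s.
    """
    ones = bin(byte).count("1")
    if odd:
        return 0 if ones % 2 == 1 else 1
    return ones % 2


def row_is_intact(byte, parity, odd=True):
    """True if the stored parity bit still matches the byte."""
    return parity_bit(byte, odd) == parity


original = 0b11000001           # three 1s, so the odd-parity bit is 0
stored_parity = parity_bit(original)
damaged = original ^ 0b00000001  # flip one bit

print(row_is_intact(original, stored_parity))
print(row_is_intact(damaged, stored_parity))
```

A single flipped bit changes the count of 1s from odd to even, so the check fails. Two flipped bits in the same row would cancel out and go unnoticed, which is the limit of a single parity bit.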
Almost every tape you acquire will have a density of 6,250 bpi. But a tape could be 3,200 bpi or 1,600 bpi.
Records are grouped on a tape to form a block. The number of records in each block is known as the blocking factor, and the number of bytes in the block is known as the block size.
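The arithmetic connecting these terms is simple: block size equals record length times blocking factor. Here is a short Python sketch, with made-up function names, that works out the blocking factor and cuts a block back into its records.

```python
def blocking_factor(block_size, record_length):
    """Number of fixed-length records that fit in one block."""
    if block_size % record_length != 0:
        raise ValueError("block size is not a multiple of record length")
    return block_size // record_length


def split_block(block, record_length):
    """Cut one block of bytes into its individual records."""
    return [
        block[i:i + record_length]
        for i in range(0, len(block), record_length)
    ]


print(blocking_factor(8000, 80))
print(split_block(b"AAAABBBBCCCC", 4))
```

So a tape with 80-byte records and a block size of 8,000 bytes has a blocking factor of 100.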
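Four bits make a nibble. Two nibbles make a byte. Bytes make fields. Fields make records. And records make blocks. Why are records blocked? To save space. Every block on a tape is followed by a stretch of blank tape, so the fewer blocks you write, the less tape you waste.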
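A rough Python estimate shows how much space blocking can save. It assumes each block is followed by a blank gap of 0.3 inch and that a 6,250 bpi tape holds 6,250 bytes per inch; both figures are assumptions for illustration, so check them against the drive that wrote your tape.

```python
import math


def tape_inches(num_records, record_length, blocking_factor,
                density=6250, gap=0.3):
    """Estimate the length of tape, in inches, needed for a file.

    density: bytes per inch (assumed equal to the bpi rating).
    gap: inches of blank tape after each block (an assumed figure).
    """
    blocks = math.ceil(num_records / blocking_factor)
    data_inches = num_records * record_length / density
    return data_inches + blocks * gap


unblocked = tape_inches(100_000, 80, blocking_factor=1)
blocked = tape_inches(100_000, 80, blocking_factor=100)
print(round(unblocked / 12), "feet unblocked")
print(round(blocked / 12), "feet blocked")
```

Under these assumptions the data itself takes the same length of tape either way. The difference is the gaps: 100,000 of them when every record is its own block, only 1,000 when records are blocked by the hundred.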
Spanish, French, and Italian are different languages that use the same alphabet. In a similar fashion, there are different kinds of binary code based on a computer's 1s and 0s. The two most common binary codes are ASCII and EBCDIC. ASCII stands for American Standard Code for Information Interchange. EBCDIC stands for Extended Binary Coded Decimal Interchange Code. You don't need to remember that. What you need to remember is that virtually every tape you get will be in EBCDIC, because most of them will come from IBM mainframes. Some other vendors, such as Digital Equipment Corp., use ASCII.
The EBCDIC tapes will have to be translated into ASCII, the binary code understood by almost all personal computers. Translation is no problem. You can do that with the touch of a button.
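For a sense of what that button does, here is a minimal Python sketch of the translation. It assumes the tape uses the EBCDIC variant that Python calls cp037 (the U.S. and Canada code page) and that every byte is character data; your tape may use a different variant, so treat the code page as an assumption.

```python
def ebcdic_to_ascii(data, codepage="cp037"):
    """Translate EBCDIC bytes into ordinary text.

    codepage: the EBCDIC variant to assume; cp037 is a guess that
    fits many U.S. tapes, not a guarantee.
    """
    return data.decode(codepage)


# The word RALEIGH as it would appear in EBCDIC.
sample = bytes([0xD9, 0xC1, 0xD3, 0xC5, 0xC9, 0xC7, 0xC8])
print(ebcdic_to_ascii(sample))
```

The same letters have entirely different byte values in the two codes. The letter A is 0xC1 in EBCDIC and 0x41 in ASCII, which is why an untranslated tape looks like gibberish on a personal computer.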
Some tapes are preceded by what amounts to an index of sorts, called a label. A label contains information about the block size and the logical record length. Tapes do not have to be labeled. They can consist of hundreds of thousands of records that terminate in a tape mark, which tells the computer THE END. But tapes usually are labeled. At the News & Observer, we always ask for labeled tapes because we want all the information we can get. IBM developed a form of labeling called "IBM Standard Label." That label consists of three 80-byte records, followed by a tape mark, followed by the data. At the end of the file you will find another tape mark, two small records, and a final tape mark.
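Here is a hedged Python sketch of reading those three 80-byte label records. It assumes the records are named VOL1, HDR1 and HDR2, that they are written in EBCDIC (cp037 here), and that the fields sit at the column positions noted in the comments. Those positions are assumptions for illustration; confirm them against the documentation for your own tapes. The sample label is invented.

```python
def parse_labels(raw, codepage="cp037"):
    """Pull the useful facts out of three 80-byte label records.

    Column positions below are assumptions, counted from 1:
      VOL1 cols 5-10   volume serial number
      HDR1 cols 5-21   file name
      HDR2 col  5      record format (F = fixed length)
      HDR2 cols 6-10   block size
      HDR2 cols 11-15  logical record length
    """
    if len(raw) < 240:
        raise ValueError("expected three 80-byte label records")
    text = raw[:240].decode(codepage)
    vol1, hdr1, hdr2 = (text[i:i + 80] for i in range(0, 240, 80))
    if (vol1[:4], hdr1[:4], hdr2[:4]) != ("VOL1", "HDR1", "HDR2"):
        raise ValueError("not the label records this sketch expects")
    return {
        "volume_serial": vol1[4:10].strip(),
        "file_name": hdr1[4:21].strip(),
        "record_format": hdr2[4],
        "block_size": int(hdr2[5:10]),
        "record_length": int(hdr2[10:15]),
    }


def make_label(text, codepage="cp037"):
    """Build one invented 80-byte label record for the demonstration."""
    return text.ljust(80).encode(codepage)


sample_labels = (
    make_label("VOL1NANDO1")
    + make_label("HDR1VOTERS.DATA")
    + make_label("HDR2F0800000080")
)
print(parse_labels(sample_labels))
```

This is why we ask for labeled tapes: the block size and record length come off the tape itself, and you can check them against the record layout before loading anything.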