
Monday, July 1, 2013

System Design: Solving Long-Term Digital Data Storage Problems

by Walter Shawlee 2

The real beauty of digital technology is how easily large amounts of data can be stored and transferred, and how compact that storage can be. This ease of use and high density have persuaded many to use this format as the primary data archival technique within their company for technical and design data files. Minimal storage space and cost, and fast retrieval, are all compelling benefits. Yet it’s important to remember that all that data can also be easily lost with just a few keystrokes, a failed hard drive or a system malfunction.

Now that this digital storage pattern has become well established, the underlying difficulties with digitally stored data are becoming widely apparent. A blueprint, manual or drawing remains readable for a very long time, but the same cannot really be said of digital records. Paper drawings also cannot erase or delete themselves. Once the technology behind the electronic storage is out of vogue, the data contained in the system is often lost. Historically, the hot data storage technology of the day has included punched paper tape, punched cards, reel-to-reel tape, tape cartridges, removable disk packs, floppies, complete hard drives and now optical and semiconductor storage.

The problems with digital storage initially materialized as problems with the physical media itself. Reel-to-reel magnetic tape was one of the first true widespread digital archive formats, and required very large and costly drives. This was mainframe computer technology, used for high-level data storage and transfer in large firms, especially for financial and inventory data. High-quality tapes of this type can have a 10- to 30-year lifespan if stored well in a cool, humidity-controlled and magnetic-field-free environment. Unfortunately, the tape is mechanically degraded with every operation as it is rubbed across the read/write head. The data is easily erased by magnetic fields and heat, as well as by deliberate erasure operations, and can even print through to the next layer of tape over long periods.

For the smaller computer world, reel-to-reel tape was followed by small tape cartridges of many types from an assortment of vendors, all incompatible. Cartridge tape was a highly problematic media, with many recovery problems, and constantly changing densities and formats. It was also incredibly slow, easily worn and stretched, and unfortunately, due to the serial compressed record structure, a single error could cripple the recovery of an entire long record. Mercifully, cartridge tape has for the most part completely fallen away as a common data format.

Floppy disks are an especially interesting chapter in the data storage saga. We started out with 8-inch floppies, then eventually moved to a 5.25-inch version in a plethora of hard- and soft-sectored, single- and double-sided incompatible versions. These scaled down further to 3.5-inch microfloppies, again in single- and double-sided, high- and low-density formats, to make everyone crazy. Early floppy versions had an unprotected opening with exposed magnetic film, and were very easily damaged by handling and dirt; the later 3.5-inch microfloppies had a sliding protective gate and a rigid case, a real improvement. These disks wear significantly in use, so a long lifespan was not possible. If the drive became dirty, it would also happily destroy every disk put into it until it was cleaned. Individual drive step alignment was a problem, and it was quite possible to have disks that worked in only one machine, to everyone’s later chagrin. And of course, floppies can be accidentally erased, formatted or re-written, causing unrecoverable data loss.

Like tape, floppies exhibit unavoidable contact wear on reading and writing, so lifespan can be quite short, perhaps only a few weeks. They can also be damaged by heat and magnetic fields. Floppies were the technical battleground for vendors trying to assert market dominance, to the detriment of customers. In the end, all that magnetic storage technology passed away, but not before being used to store a lot of critical company data files, now in various states of unhappy potential recovery.

Optical data storage followed magnetic media, along with some interim hybrid magneto-optical drives that disappeared quickly once the recordable optical CD became common and cost effective. Predictably, once again we got the war of the vendor-specific formats, with CD-ROM, CD-R, CD+R, CD-RW and CD+RW formats all flailing around for attention from different vendors. Drives also became much faster, and now high-speed 48X drives are common. For video storage use, higher density was needed, so DVD disks appeared, again in the same idiotic range of format options, PLUS single- and double-sided and layered formats. Clearly, no useful lessons were learned about the real value of a common industry standard by this media group either.

Optical drive makers, physical media and different burn speeds combine to produce a huge range of optical storage variability, so making accurate estimates of recorded data life is quite difficult. Early recordable optical media and drives were very problematic, with all kinds of defects and deterioration, and certainly were not good for real archival storage. Today, the problems have largely stabilized, but a lot of marginal and low-quality media is in the marketplace, making it extremely hard to choose a good media type for true long-term records. Budgets, not established reliability, often decide these office purchases.

Be aware that recorded optical media can be quite faulty. Unless a verification read-after-write test is conducted (which almost no one does), you cannot in fact be sure the written data is any good. Drive “faults” usually are gross media and buffer problems, and a successful burn does NOT signal a perfect copy. If the data is truly important, then a read-after-write verification test should be done, wickedly time-consuming as it is.
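The read-after-write check described above is easy to automate on any platform once the burned disc is remounted. A minimal sketch in Python (the article does not prescribe a tool; the function names and paths here are illustrative, not from the original):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large disc images don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(source, copy):
    """Read both files back and compare digests.

    A 'successful' burn alone proves nothing; only re-reading the
    written copy and matching it against the source does.
    """
    return sha256_of(source) == sha256_of(copy)
```

Running `verify_copy("project.zip", "/mnt/cdrom/project.zip")` after every burn catches the silent write failures the paragraph warns about, at the cost of one full read pass.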

The digital marketplace is fairly savage, and manufacturers of hardware, media, software and drives all keep going out of business with astonishing regularity. This can derail the most well thought-out company data storage plans simply because the entire technology platform behind your storage can disappear suddenly, forcing a change without good preparation. This has been especially true of cartridge tape, super capacity floppy technology, and early magneto-optical and optical media. They simply have not lasted very long in the marketplace.

A secondary problem underlying storage media is physical formatting and operating system compatibility. Very few computer systems achieve operational life spans longer than five years with the original users; in fact, it is now often only three years. Yet data usually has enduring value that spans many equipment generations, as well as operating system changes. This fundamental time problem can be extremely serious, as it has no good universal solution in today’s digital landscape.

Some distribution CD and DVD media is actually recordable media, and so is not long-lived. Recordable media should be “closed” (finalized) after writing to prevent further writes.

It is noteworthy that high-quality paper copies remain one of the most durable techniques for data storage. Recordable digital media of all types is just not very impressive by comparison; even microfilm is better. Many media types can also simply be erased at any time, which poses an ongoing serious threat to secure data storage. CD and DVD recordable data is proving especially problematic, as cheaper suppliers relentlessly squeeze higher-quality media vendors out of the marketplace, as they did in the hard drive world. Further, despite lofty claims of up to a 100-year lifespan for the technology, real-life testing has shown far shorter lifespans for this media than expected.

Best results for optical media will be found with properly cased, dark-stored media, kept between 4 and 20 C at 20 to 50 percent relative humidity. Also be careful of the label side, which people think is the tough side; damage here can also destroy recovery. The new M-Disc has a purported lifespan of 1,000 years (of course, not verifiable in any real way), plus full read capability in existing DVD players. It requires a special drive to write the special media, and of note, it has exactly the same care and handling limitations as conventional CD/DVD recordable media, but it does create a far more rugged physical data record with at least the potential for long-term recovery and storage. Very useful, if anybody even knows what a DVD is in a thousand years.

This is a bit of a wake-up call for many, as there has been an unfortunate trend to treat digital storage as “robust and sophisticated” and paper records as “fragile and primitive,” but real data shows just the opposite to be true. I own lots of books older than any computer or digital storage, and they still work perfectly today and require no special equipment. Digital records of the same era are for the most part hopelessly degraded, worthless or physically unreadable. Not a good trend.

Of special practical interest for everyone is the “hot back up” technique, where all data is stored on multiple hard disks, essentially as a networked “storage cabinet.” I often use this method myself when I have many related projects at once. At one time, five-year warranties for hard drives were at least available, if not common. Now literally all drives, regardless of price and supposed quality, have only a one-year warranty. This is completely appropriate, as my own testing shows few current-generation hard drives, regardless of maker or cost, achieve long life spans as backup devices. RAID disk arrays can help overcome individual drive failures in servers, but individual drives remain problematic for reliability on the desktop. If this technique is used, then regular backup to multiple sites is critical, as drive lifespan is very poor in absolute terms.

Assuming one actually had durable media to store data on, the next problem is that the application that can read and make use of the data is itself highly volatile, and may not last as long as the media. Many graphics, compiler, word processing and Computer Aided Design/Electronic Design Automation (CAD/EDA) programs and versions have come and gone in the last 20 years. I can remember using WordStar, Orcad SDT, CBasic and Uniplex, and I created a lot of material with those programs. Today, those files would be almost worthless to me in practical terms, even if I could magically recover them.

Of particular significance to us here is the issue of CAD/EDA software; no other single user application group has had as much deliberate format changing and forced incompatibility. This has proved to be especially destructive for users. Virtually all the inexpensive CAD/EDA software makers were bought up by more costly vendors, and then the inexpensive versions were methodically eliminated. New software versions were deliberately limited in terms of backward compatibility, and each version was subsequently more expensive. As a result, this single set of design data, which comprises the bulk of the critical intellectual property owned by a company, is also the most at risk for obsolescence and incompatibility going forward. Many times, the new corporate CAD entity cannot even offer any software that is compatible with or operationally similar to your existing files, yet appeals to your “product loyalty” to stay with them.

A similar trend has also occurred with common word processing files. Endless application and format churning has gone on, to virtually no useful purpose except to frustrate office workers worldwide and make useful training impossible. At this stage, I think many users are simply turning to the free and more stable OpenOffice/LibreOffice suites (which offer full file compatibility), and exiting the costly office software rat race.

For many users, the existing 3-plus-year-old software tool is perfectly fine, so forcing new application purchases has become a real spectacle in the digital marketplace. The Windows XP operating system, with one of the largest user bases in the world, is the classic example. A few new features and fixes are cleverly bundled with some critical incompatibilities, and a new release is promoted. Once any new copies are bought at an established site, new data, or old data imported into them, can no longer be worked on with the existing older software, forcing still more purchases. A great plan for the vendors, but not so wonderful for users, and ultimately highly destructive for all, as files and program operation completely fail to achieve any long-lasting stability, destroying any hope of high productivity or effective training.

In addition, Microsoft Windows, one of the more ubiquitous desktop platforms, has experienced pretty regular upheaval. Of particular significance to everyone has been the steady dropping of specific device drivers and hardware and the adoption of others, which has rendered a veritable mountain of critical hardware obsolete. Plus, many applications no longer run in later versions (a significant problem in the transition from XP to Windows 7 and later), and, of tragic significance, the serial and even parallel ports have disappeared from almost all new machines running this software.

Why are serial and parallel ports significant? Many people used both to run external device programmers and interfaces of all kinds, and as users have found, this now often fails when attempted through an ad hoc generic USB-to-serial or USB-to-parallel converter under a later operating system’s compatibility layer. The loss of these interfaces is profound, as they were widely used in industry to load and interrogate data in remote systems, check for faults, perform a host of digital housekeeping tasks, and program devices from EPROMs and microcontrollers to PALs. This leads us naturally to the issue of programmed-part life spans, as those parts were often loaded using these now-extinct interfaces.

Silicon Storage

UV-erasable EPROMs were the backbone of industrial firmware for decades, and are still in wide use today in everything from elevator controls to avionics. Of note here is that the stated program-retention lifespan for these parts has typically been 25 years, and for later non-UV, fully electronic EEPROM reprogrammable devices, as little as 10 years. While that possibly seems like a lot, it is really not, especially when the system lifespan is far longer than that period. Think “elevator or environmental controls” in your building, “hydroelectric power station” or “autopilot,” and consider the implications of random data loss. Essentially, all systems based on microcontrollers, FPGAs or similar technology with any kind of EEPROM data or program storage will eventually and inevitably fail in this way, as the stored data (buried charge layer) is lost. There is no fix for this other than re-programming, if that is even possible.

A common practice for the production of programmed parts is to use a master pre-programmed part to make more in a gang programmer, but of course this breaks down once the reference part deteriorates in storage. If the required hardware and programmer (and maybe even the parts themselves) no longer exist, then making more firmware is no longer possible for equipment support. Dark storage of parts and careful anti-static handling are critical to maintain these parts, and frequent checks and re-burns are good practice to maintain viable copies. UV EPROMs whose program window is not covered will eventually be erased by the UV content in sunlight or fluorescent lights.

NAND flash EEPROM memory is being widely used in very high densities as portable storage (thumb drives), and even as a replacement for mechanical disk drives in many portable computing devices. The limited write endurance of this technology, its medium storage life and its various parasitic fault modes make it problematic for real archive storage or any long-term application. It is a good match for low-cost consumer applications, but it is not optimal for more serious, longer-term uses.

There are many techniques that are inexpensive and easily achieved that will help you deal with these problems. Here are a few of the ones we have had good success with:

1. Always print out any critical drawings and related files and store them safely as your ultimate back up.

2. When using CAD/EDA programs, use a PDF printer driver to create an easily read and exportable file of all drawing output. This is extremely valuable for documentation and recovery, long after the program itself is no longer useful.

3. DO NOT compress, password protect or encrypt working data files. You may think this is security conscious, but you will regret it when, five years later, you cannot recover them because nobody remembers how it was done.

4. Defense in depth is the watchword for back ups. Make several, on both networked hot drives, and physical removable media, then use periodic off site storage to combat disasters like fire, theft and flood.

5. When buying a software upgrade, CHECK COMPATIBILITY with your operating system, data files and two-way file interchange with existing software. If compatibility will be lost, consider all your options very carefully.

6. Before a program or system is lost, make every attempt to port the data to the new tools and machine. If that fails, maintain a back-up machine that can still work with the files, and export outputs to PDFs or other exchange formats so they can remain usable in at least some form.

7. If hardware is the repository for code, make sure it gets refreshed and verified to still work at intervals no longer than 3 to 5 years.

8. Store everything properly, and make sure all working machines have surge suppression and UPS capability to prevent operating damage and unexpected loss of data.

9. Every so often, try to recover something; hopefully there won’t be any surprises.

10. In your PC, set all the energy-saver options ON. This will power down your drives when not in use and increase their service life, so you are less likely to have to resort to back-ups.
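Tip 9, the periodic recovery test, can be made routine with a stored checksum manifest: record a digest for every archived file once, then re-check the archive against that record on a schedule. A sketch in Python under my own assumptions (the article names no tool; `build_manifest` and `check_archive` are hypothetical names, and a real manifest would itself be printed or stored off-site per tips 1 and 4):

```python
import hashlib
import os

def build_manifest(root):
    """Record a SHA-256 digest for every file under an archive directory."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def check_archive(root, manifest):
    """Return (file, problem) pairs for files that are missing or changed
    since the manifest was built; an empty list means the archive verifies."""
    problems = []
    for rel, digest in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            problems.append((rel, "missing"))
        else:
            with open(path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != digest:
                    problems.append((rel, "changed"))
    return problems
```

Running `check_archive` quarterly against each backup site turns "hopefully there won’t be any surprises" into a concrete report, and costs nothing but read time.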

Digital back-ups of every type can become infected with viruses, ESPECIALLY if re-writeable, moved between machines, or networked. Thumb drives are notoriously susceptible. If possible, buy a thumb drive that allows physical write disabling (with a physical switch) so it works as read-only unless you really want to write. Be certain these drives are swept regularly with antivirus and anti-malware tools.

Every digital method and media has an inescapable error rate. In hard drives it is usually a pretty low read or write error rate, but it is never zero. As the data is copied and transferred, this error accumulates, and never decreases. In a big video file it might be a bad pixel, literally undetectable, but in executable code, data corruption can be fatal. As hardware and media age, this rate increases. Multiple stored copies with hash totals can help deal with these errors, but can also lead to three copies, all slightly different for unknown reasons, which is not really an improvement. Just be aware this is a low-level background irritation to deal with.
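The three-slightly-different-copies stalemate described above is conventionally broken by keeping an odd number of copies and taking a majority vote on their digests. A hedged sketch (this is a common technique, not one the article prescribes; `majority_copy` is a name I am inventing for illustration):

```python
import hashlib
from collections import Counter

def majority_copy(paths):
    """Given an odd number of copy paths, return the path of one copy whose
    digest matches the majority of copies, or None if no digest wins a
    majority (all copies disagree, so none can be trusted automatically)."""
    digests = {}
    for path in paths:
        with open(path, "rb") as f:
            digests[path] = hashlib.sha256(f.read()).hexdigest()
    digest, count = Counter(digests.values()).most_common(1)[0]
    if count <= len(paths) // 2:
        return None  # no majority: manual inspection required
    return next(p for p, d in digests.items() if d == digest)
```

With three copies, a single silently corrupted one is outvoted and can be re-cloned from a surviving good copy; if all three differ, the function refuses to guess, which is exactly the "not really an improvement" case the paragraph warns about.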

Walter Shawlee 2 is the president of Sphere Research Corp. in West Kelowna, British Columbia, Canada, and a senior designer at Technisonic Industries. He can be reached at walter2@sphere.bc.ca.

To see an archive of Shawlee’s System Design columns, visit www.aviationtoday.com/shawlee.

 
