Sunday, October 7, 2007

Disk formatting

Disk formatting is the process of preparing a hard disk or other storage medium for use, including setting up an empty file system. A variety of utilities and programs exist for this task; a well-known example is FORMAT.COM of MS-DOS and PC-DOS.

Large disks can be partitioned, that is, divided into logical sections, each formatted with its own file system. This is normally done only on hard disks, because other disk types are small and because of compatibility issues.

A corrupted operating system can be reverted to a clean state by formatting the disk and reinstalling the OS, as a drastic way of combatting a software problem or malware infection. Obviously, important files should be backed up beforehand.

Two levels of formatting

Formatting a disk involves two quite different processes known as low-level and high-level formatting. The former deals with the formatting of disk surfaces and installing characteristics like sector numbers that are visible to, and used by, the disk controller hardware, while the latter deals with file-system-specific information written by the operating system.

Low-level formatting of floppy disks

The low-level format of floppy disks (and early hard disks) is performed by the disk drive hardware.

The process is most easily described with a standard 1.44 MB floppy disk in mind. Low-level formatting of the floppy normally writes 18 sectors of 512 bytes each on each of 160 tracks (80 on each side) of the floppy disk, providing 1,474,560 bytes of storage on the floppy.
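
The arithmetic is simply sides × tracks per side × sectors per track × bytes per sector. A minimal Python sketch (the function name is made up for illustration) that counts only the user data, not the per-sector overhead such as sector numbers, CRC bytes and gaps:

```python
def floppy_capacity(sides, tracks_per_side, sectors_per_track, bytes_per_sector=512):
    """Usable bytes on a low-level formatted floppy (data fields only)."""
    return sides * tracks_per_side * sectors_per_track * bytes_per_sector


standard = floppy_capacity(2, 80, 18)
print(standard)          # 1474560
print(standard // 1024)  # 1440 (KiB): the "1.44 MB" label means 1440 x 1024 bytes
```

The same formula applies to any other geometry, for example more sectors per track or more tracks per side.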

Sectors are actually physically larger than 512 bytes as they include sector numbers, CRC bytes, and other information required in order to identify and verify the sector during reading and writing. These additional bytes do not add to the overall storage capacity of the disk.

To complicate matters, different low-level formats can be used on the same media; for example, larger records (sectors) can be used so that there are fewer inter-record gaps, and less space is lost to them.

Several freeware, shareware and free software programs (e.g. FDFORMAT, NFORMAT and 2M) allowed considerably more control over formatting, making it possible to format high-density 3 1/2" disks to a capacity of up to 2 MB.

Techniques used include:

head/track sector skew (moving the sector numbering forward at side change and track stepping to reduce mechanical delay),

interleaving sectors (to minimize sector gaps, thereby allowing the number of sectors per track to be increased),

increasing the number of sectors per track (while a normal 1.44 MB format uses 18 sectors per track, it's possible to increase this to a maximum of 21), and

increasing the number of tracks (most drives could tolerate extension to 82 tracks, though some could handle more and others jammed).
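
To make the first two techniques more concrete, here is a small Python sketch of how logical sector numbers might be laid out around one track for a given interleave factor and per-track skew. The function and its parameters are hypothetical; it models only the numbering order (not gap sizes or timing), and real formatters such as FDFORMAT may compute their layouts differently.

```python
def interleaved_layout(sectors_per_track, interleave=1, skew=0, track=0):
    """Return logical sector numbers (1-based) in physical order around a track.

    interleave: physical slots between consecutive logical sectors (1 = none).
    skew: slots by which the first sector is shifted for each successive
          track index (here `track` counts every side change or step).
    """
    n = sectors_per_track
    slots = [None] * n
    pos = (skew * track) % n
    for logical in range(1, n + 1):
        # If the target slot is already taken, slide forward to the next free one.
        while slots[pos] is not None:
            pos = (pos + 1) % n
        slots[pos] = logical
        pos = (pos + interleave) % n
    return slots
```

For example, interleaved_layout(9, interleave=2) gives [1, 6, 2, 7, 3, 8, 4, 9, 5], so consecutive logical sectors sit two physical slots apart, while interleaved_layout(9, skew=2, track=1) starts the numbering two slots later: [8, 9, 1, 2, 3, 4, 5, 6, 7].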

Linux supports a variety of sector sizes, and DOS and Windows support a large-record-size DMF-formatted floppy format.

Low-level formatting (LLF) of hard disks

User-instigated low-level formatting (LLF) of hard disks was common in the 1980s. Typically this involved setting up the MFM pattern on the disk so that sectors of bytes could be successfully written to it. With the advent of RLL encoding, low-level formatting grew increasingly uncommon, and most modern hard disks are embedded systems that are low-level formatted at the factory with their physical geometry and are thus not subject to user intervention.
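
MFM (modified frequency modulation) stores each data bit as a clock bit followed by the data bit: a 1 is written as 01, and a 0 is written as 10 if the previous data bit was also 0, otherwise as 00. The Python sketch below (a hypothetical helper working on lists of bits, not on real flux transitions) applies that rule. The result never contains two adjacent 1s and never more than three 0s in a row, which keeps flux transitions neither too close together nor too far apart.

```python
def mfm_encode(data_bits, prev_bit=0):
    """Encode data bits as an MFM string of clock/data bit pairs."""
    cells = []
    for bit in data_bits:
        # A clock bit of 1 is inserted only between two consecutive zero data bits.
        clock = 1 if (bit == 0 and prev_bit == 0) else 0
        cells.append(f"{clock}{bit}")
        prev_bit = bit
    return "".join(cells)


print(mfm_encode([1, 0, 0, 1]))  # 01001001
```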

Early hard disks were quite similar to floppies, but low-level formatting was generally done by the BIOS rather than by the operating system. This process involved using the MS-DOS debug program to transfer control to a routine hidden at different addresses in different BIOSs.

Starting in the early 1990s, low-level formatting of hard drives became more complex as technology improved to:

use RLL encoding,

store a higher number of sectors on the longer outer tracks (traditionally, all tracks had the same number of sectors, as is still the case with floppy disks),

encode track numbers into the disk surface to simplify hardware, and

increase the mechanical speeds of the drive.

Disk reinitialization

While it is impossible to perform an LLF outside the factory on most hard drives made since the mid-1990s, the term "low-level format" is still used (erroneously) for what should be called the reinitialization of an IDE or ATA hard drive to its factory configuration (and even these terms may be misunderstood). Reinitialization should include identifying (and, if possible, sparing out) any sectors that cannot be correctly written to and read back from the drive. However, some people use the term for only part of that process, in which every sector of the drive is written to, usually by writing a zero byte to every addressable location on the disk; this is sometimes called zero-filling.
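
The write-and-read-back part of reinitialization can be sketched as below. This is an illustration only: write_sector and read_sector are hypothetical callables standing in for whatever raw sector access a real utility uses, and actually sparing out bad sectors is left to the drive firmware. The sketch just reports which logical sector numbers failed.

```python
from typing import Callable, List

SECTOR_SIZE = 512


def verify_sectors(total_sectors: int,
                   write_sector: Callable[[int, bytes], None],
                   read_sector: Callable[[int], bytes],
                   pattern: bytes = bytes(SECTOR_SIZE)) -> List[int]:
    """Write `pattern` to every sector, read it back, and return the numbers
    of sectors that failed (raised OSError or returned different data).

    With the default all-zero pattern this also zero-fills the drive.
    """
    bad = []
    for lba in range(total_sectors):
        try:
            write_sector(lba, pattern)
            if read_sector(lba) != pattern:
                bad.append(lba)  # data did not survive the round trip
        except OSError:
            bad.append(lba)      # the write or read itself failed
    return bad
```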

The present ambiguity in the term "low-level format" seems to stem both from inconsistent documentation on web sites and from the belief of many users that any process below a "high-level" (file system) format must be a low-level format. Instead of correcting this mistaken idea (by clearly stating that such a process cannot be performed on specific drives), various drive manufacturers have actually described reinitialization software as LLF utilities on their web sites. Since users generally have no way to tell a true LLF from a reinitialization (they simply observe that running the software leaves a hard disk that must be partitioned and "high-level formatted"), misinformed users and mixed signals from drive manufacturers have together perpetuated this error. Whatever misuse of these terms may exist (search hard drive manufacturers' web sites for all of them), many sites do make such reinitialization utilities available, sometimes as bootable floppy diskette or CD image files, which both overwrite every byte and check for damaged sectors on the hard disk.

One popular method for performing only the "zero-fill" operation on a hard disk is to write zero bytes to the drive using the Unix dd utility (also available under Linux), with /dev/zero as the input file (if=) and the drive itself (either the whole disk or a specific partition) as the output file (of=).
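
For readers more comfortable with a programming language than with dd, the following Python sketch does roughly what a command of the form dd if=/dev/zero of=TARGET does: it overwrites every byte of an existing file or device with zeros. It is a sketch, not a tested disk utility. The function name and the 1 MiB block size are arbitrary choices, it assumes the target's size can be found by seeking to its end (true for regular files and, on Linux, for block devices), and pointed at a real drive or partition (which normally requires root privileges) it irreversibly destroys the data on it.

```python
import os


def zero_fill(path, block_size=1024 * 1024):
    """Overwrite every byte of an existing file or device with zeros.

    Returns the number of bytes written. DESTROYS all data at `path`.
    """
    with open(path, "r+b", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # total size in bytes
        f.seek(0)
        zeros = bytes(block_size)
        written = 0
        while written < size:
            # The last block may be shorter than block_size.
            written += f.write(zeros[: min(block_size, size - written)])
        os.fsync(f.fileno())  # make sure the zeros actually reach the medium
    return written
```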

High-level formatting

High-level formatting is the process of setting up an empty file system on the disk and installing a boot sector. This alone takes little time and is sometimes referred to as a "quick format".

In addition, the entire disk may optionally be scanned for defects, which takes considerably longer, up to several hours on larger hard disks.
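
As an illustration of how little a quick format writes, here is a Python sketch that lays an empty FAT12 file system onto an in-memory image of a 1.44 MB floppy. The field values (512-byte sectors, 2 FATs of 9 sectors each, 224 root directory entries, media descriptor F0h) are the standard ones for that format. The function name and OEM string are made up, no boot code or extended BPB fields are written (so the image would not boot), no defect scan is done, and this is not how FORMAT itself is implemented. Note that the data area from sector 33 onwards is never touched.

```python
import struct

SECTOR = 512


def quick_format_fat12_144(image: bytearray, oem: bytes = b"SKETCH  ") -> None:
    """Write an empty FAT12 file system onto a 1.44 MB floppy image in place."""
    if len(image) != 2880 * SECTOR:
        raise ValueError("expected a 1,474,560-byte image")

    # BIOS Parameter Block for a standard 3 1/2" 1.44 MB floppy.
    bpb = struct.pack(
        "<3s8sHBHBHHBHHHII",
        b"\xeb\x3c\x90",  # jump over the BPB (boot code itself omitted)
        oem,              # OEM name, 8 bytes
        SECTOR,           # bytes per sector
        1,                # sectors per cluster
        1,                # reserved sectors (just the boot sector)
        2,                # number of FATs
        224,              # root directory entries
        2880,             # total sectors
        0xF0,             # media descriptor
        9,                # sectors per FAT
        18,               # sectors per track
        2,                # heads
        0,                # hidden sectors
        0,                # large total sectors (unused here)
    )
    boot = bytearray(SECTOR)
    boot[: len(bpb)] = bpb
    boot[510:512] = b"\x55\xaa"  # boot sector signature
    image[0:SECTOR] = boot

    # Two identical FATs (sectors 1-9 and 10-18): the reserved first entries
    # hold the media byte plus 0xFF fill; every cluster is marked free (0).
    fat = bytearray(9 * SECTOR)
    fat[0:3] = b"\xf0\xff\xff"
    image[1 * SECTOR:10 * SECTOR] = fat
    image[10 * SECTOR:19 * SECTOR] = fat

    # Empty root directory: 224 entries x 32 bytes = 14 sectors (19-32).
    image[19 * SECTOR:33 * SECTOR] = bytes(14 * SECTOR)
```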

In the case of floppy disks, both high- and low-level formatting are customarily done in one pass by the software. In recent years, most floppies have shipped preformatted from the factory as DOS FAT12 floppies. It is possible to format them again to other formats, if necessary.

Formatting in DOS

Under MS-DOS and PC-DOS, disk formatting is performed by the FORMAT program. FORMAT usually asks for confirmation beforehand to prevent accidental removal of data, but some versions of DOS had an undocumented /AUTOTEST option; if used, the usual confirmation is skipped and the format begins right away. The WM/FormatC macro virus uses this command to format the C: drive as soon as a document is opened.

There is also the undocumented /U parameter, which performs an unconditional format that overwrites the entire partition, preventing the recovery of data through software (but see below).

Recovery of data from a formatted disk

As with regular deletion, data on a disk is not fully destroyed during a high-level format. Instead, the area on the disk containing the data is merely marked as available (in whatever file system structure the format uses), and retains the old data until it's overwritten. If the reformatting is done with a different file system than previously existed in the partition, some data may be overwritten that wouldn't be if the same file system had been used. However, under some file systems (e.g., NTFS; but not FAT), the file indexes (such as $MFTs under NTFS, "inodes" under ext2/3, etc.) may not be written to the same exact locations. And if the partition size is increased, even FAT file systems will overwrite more data at the beginning of that new partition.

From the perspective of preventing the recovery of sensitive data through recovery tools, the data must either be completely overwritten (every sector) with random data before the format, or the format program itself must perform this overwriting; as the DOS FORMAT command did with floppy diskettes, filling every data sector with F6h bytes.
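
A minimal Python sketch of the overwriting step, working on an in-memory disk image: it fills a range of sectors either with random bytes or with a fixed byte value such as F6h. The function name and parameters are hypothetical, and the sketch only touches the sectors it is given.

```python
import os
from typing import Optional

SECTOR = 512


def wipe_sectors(image: bytearray, first: int, count: int,
                 fill: Optional[int] = None) -> None:
    """Overwrite `count` sectors starting at sector `first`, in place.

    fill=None writes random bytes; an integer writes that byte value
    repeatedly (0xF6 mimics what DOS FORMAT wrote to floppy data sectors).
    """
    start, end = first * SECTOR, (first + count) * SECTOR
    if end > len(image):
        raise ValueError("range extends past the end of the image")
    length = end - start
    image[start:end] = os.urandom(length) if fill is None else bytes([fill]) * length
```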
