File Techniques

Introduction

One of the most basic distinctions among files is whether the file is a text file or a binary file. Text files contain readable text characters, and can be downloaded to a Mac or PC to be printed or imported into a word processing program. If you are not sure whether a file is text or binary, try typing "more" followed by the file name. If the file is readable, it's probably text; if it's just a jumble of characters, it's probably binary.
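
The same check can be roughly automated. The sketch below assumes that a file containing NUL bytes is binary, which is a common heuristic rather than a rule; the helper name is_text and the two sample files are made up for illustration.

```shell
# Heuristic sketch: treat a file as binary if it contains NUL bytes.
# is_text is a hypothetical helper name, not a standard command.
is_text() {
    total=$(wc -c < "$1")
    # Count the bytes that remain after deleting NUL characters.
    kept=$(tr -d '\000' < "$1" | wc -c)
    [ "$total" -eq "$kept" ]
}

# Two small sample files in a scratch directory.
scratch=$(mktemp -d)
printf 'hello, world\n' > "$scratch/text.txt"
printf 'GIF89a\000\001\002' > "$scratch/image.gif"

for f in "$scratch/text.txt" "$scratch/image.gif"; do
    if is_text "$f"; then
        echo "$(basename "$f"): probably text"
    else
        echo "$(basename "$f"): probably binary"
    fi
done
```

This reports text.txt as probably text and image.gif as probably binary. A binary file that happens to contain no NUL bytes would fool it, so treat the answer as a hint.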

The individual bytes in a binary file represent program instructions, pixels, or other information. A binary file is either a program you can run or data that must be interpreted by a specialized program (e.g., a picture encoded in the GIF format). Since binary files must usually be transmitted perfectly to be useful, they are often encoded as text so they can be sent by ordinary electronic mail; we use uuencode and uudecode. Standard types of binary files are:

	Archive -- A group of files packaged together
	Compressed -- File(s) with redundant information compressed
	Executable -- Program in machine language, ready to run
	Graphics -- Pixel data, colors, etc.
	PostScript -- Text and printer commands
	Sound -- Sound wave, patch, etc.

Figuring out what to do with a file.

It isn't always obvious that a file needs further processing before it can be used. The extensions to the filename often indicate what needs to be done, or what type of file it is. Process the file one extension at a time, starting with the last extension used.
For example, cubs.gif.uue.Z should be uncompressed first and then decoded.
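
The "last extension first" rule can be sketched as a small loop. The helper next_step is hypothetical, and its table covers only a few of the extensions discussed on this page; the loop only prints what to run and does not touch any files.

```shell
# Sketch: peel extensions off a file name from the right and name the
# tool for each step. next_step is a hypothetical helper.
next_step() {
    case "$1" in
        *.Z)   echo "uncompress" ;;
        *.gz)  echo "gunzip" ;;
        *.uue) echo "uudecode" ;;
        *.tar) echo "tar xvf" ;;
        *)     echo "" ;;
    esac
}

name=cubs.gif.uue.Z
steps=""
while tool=$(next_step "$name") && [ -n "$tool" ]; do
    echo "$name: run $tool"
    steps="$steps$tool;"
    name=${name%.*}      # drop the last extension and look again
done
echo "left with: $name"
```

For cubs.gif.uue.Z this prints uncompress first, then uudecode, and stops at cubs.gif.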

If the file name has no extensions, look at it with more to see if it is a text file, or use the UNIX file command. For a compiled binary, file reports something like

   isles% file /bin/ls
   /bin/ls:        s800 shared executable
   isles% file /usr/local/bin/gcc
   /usr/local/bin/gcc:     s800 shared executable dynamically linked
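
The file command recognizes many formats by fixed "magic" bytes at the start of the file. As a rough illustration, the sketch below looks only at the first two bytes and knows only two formats: 1f 8b for gzip and 1f 9d for compress. The helpers magic_of and describe are hypothetical names, and the sample file is generated on the spot.

```shell
# Sketch: identify gzip and compress files by their first two bytes.
# magic_of and describe are hypothetical helpers; the real file command
# knows far more formats than these two.
magic_of() {
    # Print the first two bytes of the file as hex digits, e.g. 1f8b.
    od -An -tx1 -N2 "$1" | tr -d ' \n'
}

describe() {
    case "$(magic_of "$1")" in
        1f8b) echo "gzip compressed data" ;;
        1f9d) echo "compress (.Z) data" ;;
        *)    echo "something else" ;;
    esac
}

scratch=$(mktemp -d)
printf 'some sample text\n' > "$scratch/sample.txt"
gzip -c "$scratch/sample.txt" > "$scratch/sample.txt.gz"

describe "$scratch/sample.txt"
describe "$scratch/sample.txt.gz"
```

Because the check reads the contents, it still works when a file has been renamed and its extension is missing or misleading.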

The following table lists some common extensions:

Extension	Process by running

.arc		arc (PC)
.arj		arj (PC)
.exe		File name at DOS command line (may be self-extracting)
.gz		gzip (Unix, PC)
.hqx		BinHex (Mac)
.lzh		lharc (PC)
.sea		Run on Mac (self-extracting)
.sit		Stuffit (probably for Mac)
.tar		tar
.uue		uudecode
.Z		uncompress
.zip		pkunzip (PC)
.zoo		zoo (may be PC)

Preparing a binary file for mailing.

Binary files cannot be mailed as is because mail systems expect plain text and add headers for the sender, routing, and other information, which leaves a raw binary file unusable. To encode the file, type uuencode followed by the name of the file to be encoded and the name that the file is to have when decoded; follow that with > and the name of the encoded file (by convention, the name of the original file with the .uue extension added).

> uuencode cubs.gif cubs.gif > cubs.gif.uue
The extension .uue reminds the recipient that the file must be decoded with uudecode. Note that the encoded file is about 38% longer, since every three bytes of the original binary file are replaced with four ASCII characters, and each line of the encoded file also carries a length character and a newline.
> ls -l cubs.gif*
-rwxr-x-wx   1 agin     grads      11079 Apr  9  1993 cubs.gif
-rw-r--r--   1 agin     grads      15291 May  4 12:11 cubs.gif.uue
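
The size in the listing can be checked by hand. The sketch below assumes the classic uuencode layout (45 input bytes per full line, a begin line with a three-digit mode, and an empty line plus end at the bottom); other implementations may differ slightly.

```shell
# Sketch: predict the size of a uuencoded file from the original size.
# Assumes the classic layout: 45 input bytes per full line, each line
# being one length character, the encoded text, and a newline.
size=11079                 # bytes in cubs.gif
name=cubs.gif

full=$(( size / 45 ))      # full lines of 45 input bytes
rest=$(( size % 45 ))      # input bytes on the final short line

body=$(( full * 62 ))      # 1 + 60 + 1 characters per full line
if [ "$rest" -gt 0 ]; then
    # Every 3 input bytes (rounded up) become 4 characters.
    body=$(( body + 1 + (rest + 2) / 3 * 4 + 1 ))
fi

header=$(( 6 + 4 + ${#name} + 1 ))   # "begin " + "753 " + name + newline
trailer=6                            # empty line (2 bytes) + "end" + newline

encoded=$(( body + header + trailer ))
echo "predicted size: $encoded bytes"
echo "growth: $(( (encoded - size) * 100 / size ))%"
```

This prints a predicted size of 15291 bytes, the size of cubs.gif.uue in the listing, and a growth of 38%.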

Restoring an encoded binary file.

To restore an encoded binary file, type uudecode < encoded_file_name. The uudecode program decodes the encoded file and gives it the filename that was specified when the file was originally encoded.

> uudecode < cubs.gif.uue

> ls -l cubs.gif*
-rwxr-x-wx   1 agin     grads      11079 May  4 12:17 cubs.gif
-rw-r--r--   1 agin     grads      15291 May  4 12:11 cubs.gif.uue

Compressing a file.

Compressing files saves disk space. There are two UNIX file compression programs, compress and gzip. The latter is newer and usually compresses better. To compress a file using gzip, type gzip filename. The compressed file replaces the original and has the extension .gz. To get better compression, use the flag --best.
isles% ls -l alice.tex bob.tex
-rw-r--r--   1 charlie  faculty    13983 Dec 29 13:32 alice.tex
-rw-r--r--   1 charlie  faculty    13983 Apr 17 17:39 bob.tex
Same size. Compress one with gzip and the other with compress.
isles% gzip --best alice.tex
isles% compress bob.tex
And compare results.
isles% ls -l alice.tex* bob.tex*
-rw-r--r--   1 charlie  faculty     5605 Dec 29 13:32 alice.tex.gz
-rw-r--r--   1 charlie  faculty     6942 Apr 17 17:39 bob.tex.Z

For more options, see the man pages. You might find gzip --recursive useful. It compresses all the files in a directory and all its subdirectories.
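
If you want to try this without risking a real file, a sketch like the following works in a scratch directory. The sample file and its contents are made up; -9 is the short form of --best.

```shell
# Sketch: compress a generated sample file and compare sizes.
scratch=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$(( i + 1 ))
done > "$scratch/sample.txt"

before=$(wc -c < "$scratch/sample.txt")

# -9 is the short form of --best. gzip replaces sample.txt
# with sample.txt.gz.
gzip -9 "$scratch/sample.txt"

after=$(wc -c < "$scratch/sample.txt.gz")
echo "before: $before bytes, after: $after bytes"
ls "$scratch"
```

The final ls shows only sample.txt.gz, since gzip removes the original once the compressed copy is written.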

Restoring a compressed file.

To uncompress a file, type gunzip filename(s). For example,
isles% gunzip alice.tex bob.tex
restores both files. gunzip handles both formats, the old compress format (extension .Z) and the newer gzip format (extension .gz).

For more information, see the man pages.

Reading a compressed file without uncompressing it.

The gzcat command reads (.Z or .gz) files and writes their decompressed versions to standard output, leaving the files themselves unchanged. For example,

gzcat filename.Z | more
reads a compressed file without changing it.

Note that the output is readable on the screen only if the compressed file was a text file. See "Unpacking a tar file archive" below for another common use of gzcat.
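
On a system that has gzip but no command named gzcat, gzip -dc does the same job. The sketch below uses that form on a made-up file and checks that the compressed file is left alone.

```shell
# Sketch: read a compressed file without changing it.
# gzip -dc decompresses to standard output, as gzcat does.
scratch=$(mktemp -d)
printf 'first line\nsecond line\n' > "$scratch/notes.txt"
gzip "$scratch/notes.txt"              # leaves only notes.txt.gz

size_before=$(wc -c < "$scratch/notes.txt.gz")
contents=$(gzip -dc "$scratch/notes.txt.gz")
size_after=$(wc -c < "$scratch/notes.txt.gz")

echo "$contents"
ls "$scratch"                          # still just notes.txt.gz
```

The two lines of text are printed, and the directory still holds only notes.txt.gz.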

Compressing and uncompressing files for a PC.

To compress and uncompress files for the PC, first identify the file compression program; .zip implies pkzip was used, .arc implies arc, and .lzh implies lharc.

To unzip a file, type pkunzip followed by the name of the compressed (.zip) file; e.g., pkunzip prog.zip. If the command was successful, you should see something like

...
Searching ZIP: PROG.ZIP
  Exploding: PROG.DOC
  Exploding: PROG.EXE
  Exploding: PROG.READ.ME
The original prog.zip file is not altered.

To make your own .zip archive, type pkzip -a followed by the name of the archive file you wish to create, and then the names of files to be included in the archive.

>pkzip -a prog.zip prog*.*
will include all files whose names begin with prog.

You should see

...
  Creating ZIP: PROG.ZIP
    Adding: PROG.DOC      imploding (40%), done.
    Adding: PROG.EXE      imploding (42%), done.
"Imploding" and "exploding" are pkzip terminology for compression and uncompression, respectively.

To extract files from an .lzh archive, type lha e followed by the archive name (e for extract); e.g., to unpack the archive prog.lzh, type

>lha e prog
(The .lzh extension is optional.)

To create an archive, type lha a (a for archive) followed by the archive name and the names of files to be added to the archive.

Arc works the same as lharc; typing arc e prog.arc extracts files from the archive prog, and typing arc a prog followed by the files to be archived creates the archive prog.arc.

Compressing and uncompressing files for a Macintosh.

The two most popular programs for the Macintosh are BinHex and Stuffit. BinHex is an encoding and decoding program (similar to uuencode and uudecode) which turns binary files into text files so they can be transmitted.

Stuffit is a multipurpose compression program which handles files processed by BinHex and a variety of other methods. Stuffit Lite is available as shareware from ftp sites and bulletin boards; our local Gopher has Unstuffit available.

Assuming you have downloaded the Stuffit archive to your Mac, run Stuffit and choose Open Archive from the File menu. From the Open dialog box, choose the archive you want to work with; specify the file name for the unstuffed file or click on Unstuff or Unstuff All.

To create a Stuffit archive, run Stuffit and choose New Archive from the File menu. Fill in the name of the archive and where it is to reside in your file system.

Once you have created an archive, you may add files or remove them from the archive.

Packaging files with tar.

The tar program packages separate files into a single archive. To create a tar archive, use the command tar -cvf (c for create, v for a verbose report of its processing, and f to use the named file for the archive), then the name of the archive, followed by the files to be put into the archive.

For example, if homework1.txt and homework2.txt are the only .txt files in the current directory, you can package them in an archive called hwork.tar by typing

> tar -cvf hwork.tar *.txt
Since tar does no compression, you can then type
> gzip --best hwork.tar
and the result will be a compressed file called hwork.tar.gz.

Note that tar does not remove the original files. It just places a copy of them in the archive.
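
The whole sequence can be tried safely in a scratch directory. This is only a sketch: the file contents are made up, the files are named explicitly instead of with a wildcard, and -9 is used as the short form of --best.

```shell
# Sketch: package two files with tar, then compress the archive.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "answers for week 1" > homework1.txt
echo "answers for week 2" > homework2.txt

tar -cvf hwork.tar homework1.txt homework2.txt
gzip -9 hwork.tar                      # -9 is the short form of --best

ls
```

The final ls shows homework1.txt, homework2.txt, and hwork.tar.gz: tar kept the originals, while gzip replaced hwork.tar with the compressed file.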

Unpacking a tar file archive.

Before unpacking a tar file archive, first see what it contains by listing the files in the archive using

isles% tar tvf auctex.tar
if the archive is not compressed, or
isles% gzcat auctex.tar | tar tvf -
if it is compressed. Once the listing looks right, change tvf to xvf to extract all the files; for example,
isles% gzcat auctex.tar | tar xvf -
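
A cautious habit is to extract inside an empty directory, so that nothing you already have can be overwritten. The sketch below builds a small practice archive first; the names demo, README, and notes.txt are made up, and gzip -dc stands in for gzcat.

```shell
# Sketch: list a compressed tar archive first, then extract it in an
# empty scratch directory so nothing existing can be overwritten.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small archive to practise on (file names are made up).
mkdir demo
echo "read me first" > demo/README
echo "some notes"    > demo/notes.txt
tar cf demo.tar demo
gzip demo.tar
rm -r demo

# Step 1: look without touching anything.
listing=$(gzip -dc demo.tar.gz | tar tf -)
echo "$listing"

# Step 2: extract inside a fresh directory.
mkdir unpack
cd unpack || exit 1
gzip -dc ../demo.tar.gz | tar xvf -
```

After the second step, the files reappear under unpack/demo, and demo.tar.gz itself is unchanged.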

Creating or unpacking a shell archive (shar).

If you want to package a group of files but don't know what archiving facilities your recipient has available, you can use shar (shell archive program), since a shar archive is a script that standard UNIX shells can run. (It is useless if the recipient has a Macintosh or PC.)

For example, to archive the files chap1, chap2, and chap3 into the archive called thesis.shar, type

> shar chap? > thesis.shar
The wildcard ? matches the names chap1, chap2, and chap3. You may also type the names separately.

To see what a shar archive looks like, you can type

> more thesis.shar
To unpack a shar archive, type
> sh thesis.shar
Warning: This is dangerous with shar files that you did not create yourself. The archive is a shell script, and sh will run every command in it, so read it first.
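
To see why, it helps to look at what such an archive contains. The file built below is a hand-written, minimal stand-in for shar output (the real program adds more checks), and the chapter contents are made up. It is then unpacked in an empty directory.

```shell
# Sketch: a hand-written, minimal shell archive and a cautious way
# to unpack it. This is not actual shar output.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

cat > thesis.shar <<'OUTER'
# This is a shell archive. Unpack it with: sh thesis.shar
echo "x - chap1"
cat > chap1 <<'END_OF_chap1'
Chapter 1: Introduction
END_OF_chap1
echo "x - chap2"
cat > chap2 <<'END_OF_chap2'
Chapter 2: Methods
END_OF_chap2
OUTER

# Read the archive before running it: every line is a shell command.
cat thesis.shar

# Unpack in an empty directory so that nothing can be overwritten.
mkdir unpack
cd unpack || exit 1
sh ../thesis.shar
ls
```

Each file is recreated by an ordinary cat command, which is exactly why an archive from an unknown source could just as easily contain commands that delete or overwrite files.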




Originally written by Marilyn Agin. Last modified 4/17/98 by Charles Geyer (charlie@stat.umn.edu).