One of the most basic distinctions among files is whether the file is a text file or a binary file. Text files contain readable text characters, and can be downloaded to a Mac or PC to be printed or imported into a word processing program. If you are not sure whether a file is text or binary, try typing "more" followed by the file name. If the file is readable, it's probably text; if it's just a jumble of characters, it's probably binary.
The individual bytes in a binary file represent program instructions, pixels, or other information. A binary file either is a program you can run, or it must be interpreted by a specialized program (e.g., a picture encoded in the GIF format). Since binary files must usually be transmitted perfectly to be useful, they are often encoded as text so they can be sent by regular electronic mail; we use uuencode and uudecode. Standard types of binary files are:
Archive -- A group of files packaged together
Compressed -- File(s) with redundant information compressed
Executable -- Program in machine language, ready to run
Graphics -- Pixel data, colors, etc.
PostScript -- Text and printer commands
Sound -- Sound wave, patch, etc.
It isn't always obvious that a file needs further processing before it
can be used. The extensions to the filename often indicate what needs to be
done, or what type of file it is. Process the file one extension at a time,
starting with the last extension used.
For example, cubs.gif.uue.Z should be uncompressed first and then decoded.
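The rule of working from the last extension inward can be sketched as a small shell function. This is only an illustration: the name unpack_steps is made up, it knows just a few common extensions, and it prints the suggested steps rather than running anything.

```shell
# Hypothetical helper: print the processing steps for a file name,
# peeling off one extension at a time, starting with the last one.
unpack_steps() (
    name=$1
    while :; do
        case $name in
            *.Z)   echo "uncompress $name"; name=${name%.Z} ;;
            *.gz)  echo "gunzip $name";     name=${name%.gz} ;;
            *.uue) echo "uudecode < $name"; name=${name%.uue} ;;
            *.tar) echo "tar xvf $name";    name=${name%.tar} ;;
            *)     echo "left with $name";  break ;;
        esac
    done
)

unpack_steps cubs.gif.uue.Z
```

For cubs.gif.uue.Z this prints the uncompress step first and the uudecode step second, matching the order described above.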
If the file name has no extensions, look at it with more to see if it is a text file, or use the UNIX file command. For a compiled binary, the output of file looks like
isles% file /bin/ls
/bin/ls: s800 shared executable
isles% file /usr/local/bin/gcc
/usr/local/bin/gcc: s800 shared executable dynamically linked
The following table lists some common extensions:
Extension   Process by running
.arc        arc (PC)
.arj        arj (PC)
.exe        File name at DOS command line (may be self-extracting)
.gz         gzip (Unix, PC)
.hqx        BinHex (Mac)
.lzh        lharc (PC)
.sea        Run on Mac (self-extracting)
.sit        Stuffit (probably for Mac)
.tar        tar
.uue        uudecode
.Z          uncompress
.zip        pkunzip (PC)
.zoo        zoo (may be PC)
Binary files cannot be mailed as is because the mail program adds headers
for the sender, routing, and other information to the file, making the binary
file unusable.
To encode the file, type uuencode followed by the name of the file to be encoded and the name that the file is to have when decoded; follow that with > and the name of the encoded file (by convention, this is the same as the name of the original file with the .uue extension added).
>uuencode cubs.gif cubs.gif > cubs.gif.uue
The extension .uue reminds the recipient that the file must be decoded with uudecode. Note that the encoded file is roughly 38% longer, since every three bytes of the original binary file are replaced with four bytes of ASCII characters plus some additional information.
> ls -l cubs.gif*
-rwxr-x-wx 1 agin grads 11079 Apr 9 1993 cubs.gif
-rw-r--r-- 1 agin grads 15291 May 4 12:11 cubs.gif.uue
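The size of the encoded file can be checked by arithmetic. The sketch below assumes the traditional uuencode layout: a "begin MODE NAME" header line with a three-digit mode, full lines that carry 45 input bytes as one length character plus 60 text characters and a newline, a closing zero-length line, and an "end" line. The function name uue_size is made up for illustration.

```shell
# Hypothetical helper: estimate the size of uuencode output.
# $1 = size of the original file in bytes
# $2 = length of the file name written on the "begin" line
uue_size() (
    bytes=$1
    namelen=$2
    full=$(( bytes / 45 ))          # full lines of 45 input bytes
    rest=$(( bytes % 45 ))          # bytes left over for a final short line
    total=$(( full * 62 ))          # 1 length char + 60 chars + newline
    if [ "$rest" -gt 0 ]; then
        groups=$(( (rest + 2) / 3 ))            # every 3 bytes become 4 characters
        total=$(( total + 1 + groups * 4 + 1 ))
    fi
    header=$(( 6 + 3 + 1 + namelen + 1 ))       # "begin " + mode + " " + name + newline
    trailer=$(( 2 + 4 ))                        # zero-length line + "end" line
    echo $(( total + header + trailer ))
)

uue_size 11079 8    # cubs.gif (8 characters): prints 15291
```

Under these assumptions the result for the 11079-byte cubs.gif is 15291 bytes, the size shown in the listing above.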
To restore an encoded binary file, type
uudecode < encoded_file_name.
The uudecode program decodes the encoded file and gives it the filename that
was specified when the file was originally encoded.
> uudecode < cubs.gif.uue
> ls -l cubs.gif*
-rwxr-x-wx 1 agin grads 11079 May 4 12:17 cubs.gif
-rw-r--r-- 1 agin grads 15291 May 4 12:11 cubs.gif.uue
UNIX offers two compression programs, compress and gzip. The latter is newer and better. To compress a file using gzip, type gzip filename. The compressed file will have the extension .gz. To get better compression, use the flag --best. For example, start with two files:
isles% ls -l alice.tex bob.tex
-rw-r--r-- 1 charlie faculty 13983 Dec 29 13:32 alice.tex
-rw-r--r-- 1 charlie faculty 13983 Apr 17 17:39 bob.tex
The two files are the same size. Compress one with gzip and the other with compress.
isles% gzip --best alice.tex
isles% compress bob.tex
And compare results.
isles% ls -l alice.tex* bob.tex*
-rw-r--r-- 1 charlie faculty 5605 Dec 29 13:32 alice.tex.gz
-rw-r--r-- 1 charlie faculty 6942 Apr 17 17:39 bob.tex.Z
For more options, see the man pages. You might find
gzip --recursive useful. It compresses all the files
in a directory and all its subdirectories.
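To see how much a file would shrink without replacing it, you can send the compressed output to standard output with -c and count the bytes. This is a sketch only: the name gz_size and the sample file are made up, -9 is the short form of --best, and only gzip is measured because compress is not installed everywhere.

```shell
# Hypothetical helper: report how many bytes a file would occupy after
# gzip -9 (same as --best), without touching the file (-c writes to stdout).
gz_size() {
    gzip -9 -c "$1" | wc -c
}

# Build a sample text file with plenty of repetition, then compare sizes.
tmp=$(mktemp)
i=0
while [ "$i" -lt 500 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$(( i + 1 ))
done > "$tmp"

echo "original:   $(wc -c < "$tmp") bytes"
echo "compressed: $(gz_size "$tmp") bytes"
rm -f "$tmp"
```

Highly repetitive text like this sample compresses far better than typical files, so treat the numbers as an illustration rather than a benchmark.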
To uncompress files, type gunzip filename(s). For example,
isles% gunzip alice.tex bob.tex
uncompresses both formats, the old compress format (extension .Z) and the newer gzip format (extension .gz).
For more information, see the man pages.
The gzcat command reads compressed (.Z or .gz) files and writes their decompressed versions to standard output, leaving the files themselves unchanged. For example,
gzcat filename.Z | more
reads a compressed file without changing it.
Note that gzcat provides useful results only with regular text files. See Unpacking a tar file archive, below, for another common use of gzcat.
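On systems that have no gzcat command, gzip -dc does the same job (zcat is another common name for it). The sketch below wraps it in a function called zshow, a made-up name, and runs it in a scratch directory so none of your own files are touched.

```shell
# Hypothetical stand-in for gzcat: decompress to standard output and
# leave the compressed file alone (-d decompress, -c write to stdout).
zshow() {
    gzip -dc "$@"
}

(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    printf 'line one\nline two\n' > notes.txt
    gzip notes.txt                 # replaces notes.txt with notes.txt.gz
    zshow notes.txt.gz             # prints the two lines
    ls                             # notes.txt.gz is still the only file
    cd / && rm -rf "$dir"
)
```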
To compress and uncompress files for the PC, first identify the file compression program; .zip implies pkzip was used, .arc implies arc, and .lzh implies lharc.
To unzip a file, type pkunzip followed by the name of the compressed (.zip) file; e.g., pkunzip prog.zip. If the command was successful, you should see something like
...
Searching ZIP: PROG.ZIP
Exploding: PROG.DOC
Exploding: PROG.EXE
Exploding: PROG.READ.ME
The original prog.zip file is not altered.
To make your own .zip archive, type pkzip -a followed by the name of the archive file you wish to create, and then the names of files to be included in the archive.
>pkzip -a prog.zip prog*.*
will include all files whose names begin with prog.
You should see
...
Creating ZIP: PROG.ZIP
Adding: PROG.DOC imploding (40%), done.
Adding: PROG.EXE imploding (42%), done.
"Exploding" and "imploding" are pkzip terminology for uncompression and compression, respectively.
To extract files from an .lzh archive, type lha e followed by the archive name (e for extract); e.g., to unpack the archive prog.lzh, type
>lha e prog
(The .lzh extension is optional.)
To create an archive, type lha a (a for archive) followed by the archive name and the names of files to be added to the archive.
Arc works the same way as lharc; typing arc e prog.arc extracts files from the archive prog.arc, and typing arc a prog followed by the files to be archived creates the archive prog.arc.
The two most popular programs for the Macintosh are BinHex and Stuffit. BinHex is an encoding and decoding program (similar to uuencode and uudecode) which turns binary files into text files so they can be transmitted.
Stuffit is a multipurpose compression program which handles files processed by BinHex and a variety of other methods. Stuffit Lite is available as shareware from ftp sites and bulletin boards; our local Gopher has Unstuffit available.
Assuming you have downloaded the Stuffit archive to your Mac, run Stuffit and choose Open Archive from the File menu. From the Open dialog box, choose the archive you want to work with; specify the file name for the unstuffed file or click on Unstuff or Unstuff All.
To create a Stuffit archive, run Stuffit and choose New Archive from the File menu. Fill in the name of the archive and where it is to reside in your file system.
Once you have created an archive, you may add files or remove them from the archive.
The tar program packages separate files into a single tar archive.
To create a tar archive, use the command tar -cvf (c for create, v for a verbose explanation of its processing, and f to use the named file for the archive), followed by the name of the archive and then the files to be put into the archive.
For example, if you have the files homework1.txt and homework2.txt, you can package these in an archive called hwork.tar by typing
>tar -cvf hwork.tar *.txt
Since tar does no compression, you can then type
>gzip --best hwork.tar
and the result will be a compressed file called hwork.tar.gz.
Note that tar does not remove the original files. It just places a copy of them in the archive.
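The two steps can also be combined by piping tar's output straight into gzip. The following is a sketch under assumptions: the function name pack is made up, -9 is the short form of --best, and the demonstration runs in a scratch directory with made-up homework files.

```shell
# Hypothetical helper: pack the named files into ARCHIVE.tar.gz in one step.
# Usage: pack ARCHIVE FILE...
pack() (
    archive=$1
    shift
    # "-f -" sends the archive to standard output instead of a file.
    tar -cf - "$@" | gzip -9 > "$archive.tar.gz"
)

(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    echo "problem 1" > homework1.txt
    echo "problem 2" > homework2.txt
    pack hwork homework1.txt homework2.txt
    ls                                  # the original files are still present
    gzip -dc hwork.tar.gz | tar -tf -   # lists the two archived files
    cd / && rm -rf "$dir"
)
```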
Before unpacking a tar file archive, first see what it contains by listing the files in the archive using
isles% tar tvf auctex.tar
if the archive is not compressed, or
isles% gzcat auctex.tar | tar tvf -
if it is compressed. If you don't mind creating those files, then changing tvf to xvf will extract all the files, for example
isles% gzcat auctex.tar | tar xvf -
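One thing worth looking for in the listing is a file name that starts with / or contains .., since such a file would be created outside the current directory. A rough sketch of that check follows; the name tar_check is made up, and it uses gzip -dc in place of gzcat so it also runs where gzcat is missing.

```shell
# Hypothetical helper: list a (possibly compressed) tar archive and flag
# member names that would land outside the current directory.
# Usage: tar_check ARCHIVE
tar_check() (
    case $1 in
        *.gz|*.Z) list=$(gzip -dc "$1" | tar -tf -) ;;
        *)        list=$(tar -tf "$1") ;;
    esac
    printf '%s\n' "$list"
    # Match names that begin with "/" or have ".." as a path component.
    if printf '%s\n' "$list" | grep -E -q '^/|(^|/)\.\.(/|$)'; then
        echo "warning: absolute or parent-directory paths found" >&2
        exit 1
    fi
)
```

A call such as tar_check auctex.tar prints the member names and returns a nonzero status if any of them looks suspicious, so extraction can be skipped in that case.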
If you want to package a group of files but don't know what archiving facilities your recipient has available, you can use shar (shell archive program), since shar uses a language understood by standard UNIX shells. (It is useless if the recipient has a Macintosh or PC.)
For example, to archive the files chap1, chap2, and chap3 into the archive called thesis.shar, type
>shar chap? > thesis.shar
The wildcard ? matches the names chap1, chap2, and chap3. You may also type the names separately.
To see what a shar archive looks like, you can type
>more thesis.shar
To unpack a shar archive, type
>sh thesis.shar
Warning: This is dangerous with shar files that you did not create yourself. The archive is a shell script, so it can run any command with your permissions.
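To make the idea of a shell archive concrete, here is a much simplified imitation. It is not the real shar program: the name mini_shar is made up, it handles only text files that end in a newline and have simple names, and it omits the checks a real shar adds. It does show why unpacking means running a script.

```shell
# Hypothetical mini_shar: write a shell script that recreates the named
# text files. Assumes simple file names and files ending in a newline.
mini_shar() {
    echo '#!/bin/sh'
    echo '# This is a shell archive. Unpack it with: sh FILE'
    for f in "$@"; do
        echo "echo x - $f"
        echo "sed 's/^X//' > '$f' << 'SHAR_EOF'"
        sed 's/^/X/' "$f"      # prefix each line so none can equal the end marker
        echo 'SHAR_EOF'
    done
    echo 'exit 0'
}

(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    echo "Chapter one" > chap1
    echo "Chapter two" > chap2
    mini_shar chap? > thesis.shar
    mkdir out && cd out && sh ../thesis.shar   # recreates chap1 and chap2 in out
    cd / && rm -rf "$dir"
)
```

Reading the generated thesis.shar with more shows ordinary shell commands wrapped around the file contents, which is why an archive from an unknown source should be inspected before it is run.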