Graphics File Formats FAQ (Part 1 of 4): General Graphics Format Questions
Section - II. General Graphics File Questions

------------------------------

Subject: 0. Who cares about graphics file formats?

Well, programmers do mostly. But end-users (that is, non-programmers) do as
well.

The typical end-user only cares about storing their graphics information
using a format that most graphics programs and filters can read. End-users
are typically not concerned with the internal arrangement of the data
within the graphics file itself. They only want the format to do its job
by representing their data correctly in a permanent form.

Programmers, on the other hand, are that rare breed of human that just
can't leave information well enough alone. They need to know how every
byte is arranged to see if someone knows something that they don't (and
often snicker contentedly to themselves when they find that it is really
they that know more). Programmers will then use this information to write
code that may never see the light of distribution, but nevertheless, they
will have had fun and gained enlightenment from writing it.

It doesn't matter which of these two types of people you are. If you have
even the slightest interest in graphics file formats then you may be
counted as one who cares.

------------------------------

Subject: 1. What is raster, vector, metafile, PDL, VRML, and so forth?

These terms are used to classify the type of data a graphics file contains.
Raster files (also called bitmapped files) contain graphics information
described as pixels, such as photographic images. Vector files contain
data described as mathematical equations and are typically used to store
line art and CAD information. Metafiles are formats that may contain
either raster or vector graphics data. Page Description Languages (PDL)
are used to describe the layout of a printed page of graphics and text.
Animation formats are usually collections of raster data that is displayed
in a sequence. Multi-dimensional object formats store graphics data as a
collection of objects (data and the code that manipulates it) that may be
rendered (displayed) in a variety of perspectives. Virtual Reality
Modeling Language (VRML) is a 3D, object-oriented language used for
describing "virtual worlds" networked via the Internet and hyperlinked
within the World Wide Web. Multimedia file formats are capable of storing
any of the previously mentioned types of data, including sound and video
information. 
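
The difference between vector and raster data can be sketched in a few
lines of Python. This is only an illustration: the Line class and the
rasterize() function are made-up names, and real formats store far more
than a single line segment.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Vector description: just the two endpoints of a line segment."""
    x0: int
    y0: int
    x1: int
    y1: int


def rasterize(line: Line, width: int, height: int) -> list:
    """Turn the vector description into a raster: a grid of 0/1 pixels."""
    grid = [[0] * width for _ in range(height)]
    dx, dy = line.x1 - line.x0, line.y1 - line.y0
    steps = max(abs(dx), abs(dy), 1)
    for i in range(steps + 1):
        # Walk along the segment and mark the nearest pixel at each step.
        x = round(line.x0 + dx * i / steps)
        y = round(line.y0 + dy * i / steps)
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] = 1
    return grid


diagonal = Line(0, 0, 3, 3)          # four numbers describe the whole line
pixels = rasterize(diagonal, 4, 4)   # sixteen pixels describe the same line
for row in pixels:
    print("".join("#" if p else "." for p in row))
```

The vector form stays four numbers no matter how large the picture gets,
while the raster form grows with the size of the pixel grid.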

------------------------------

Subject: 2. Why should I care about previous versions of a file format?

When version 2.0 of the XXX format is released all of the thousands of
files created using version 1.0 of the XXX format don't magically
disappear or transform to version 2.0 overnight. Although version 2.0
might claim to be fully backwards compatible, the new specification may
obfuscate or even omit details of the previous version of the format. In
short, never throw away older information just because you have something
newer. At one point in time that "outdated" format spec was
state-of-the-art, and it may still contain a singular precious tidbit of
information that the caretakers of the format didn't carry over to the new
spec (but Murphy will make sure you desperately need to know).

------------------------------

Subject: 3. Can graphics files be infected with a virus?

For most types of graphics file formats currently available the answer is
"no". A virus (or worm, Trojan horse, and so forth) is fundamentally a
collection of code (that is, a program) that contains instructions which
are executed by a CPU. Most graphics files, however, contain only static
data and no executable code. The code that reads, writes, and displays
graphics data is found in translation and display programs and not in the
graphics files themselves. If reading or writing a graphics file caused a
system malfunction, it is most likely the fault of the program reading the
file and not of the graphics file data itself.

With the introduction of multimedia we have seen new formats appear, and
modifications to older formats made, that allow executable instructions to
be stored within a file format. These instructions are used to direct
multimedia applications to play sounds or music, prompt the user for
information, or display other graphics and video information. And such
multimedia display programs may perform these functions by interfacing
with their environment via an API, or by direct interaction with the
operating system. One might also imagine a truly object-oriented graphics
file as containing the code required to read, write, and display itself.

Once again, any catastrophes that result from using these multimedia
applications are most likely the result of unfound bugs in the software and
not some sinister instructions in the graphics file data. Such "logic
bombs" are typically exorcised by testing with a wide variety of
different image files as test cases.

If you have a virus scanning program that indicates a specific graphics
file is infected by a virus, then it is very possible that the file
coincidentally contains a byte pattern that the scanning program
recognizes as a key byte signature identifying a virus. Contact the author
(or even read the documentation!) of the virus scanning program to discuss
the probability of the mis-identification of a clean file as being
infected by a virus. Save the graphics file, as the author will most
likely wish to examine it as well.

If you suspect a graphics file to be at the heart of a virus problem you
are experiencing, then also consider the possibility that the graphics
file's transport mechanism (floppy disk, tape or shell archive file,
compressed archive file, and so forth) might be the original source of the
virus and not the graphics file itself.

------------------------------

Subject: 4. Can graphics files be encrypted?

Of course you can encrypt a graphics file. After all, most encryption
algorithms don't care about the intellectual content of a file. All they
chew on is a series of byte values. Therefore, most any encryption program
that works on ordinary text files will work on graphics files as well.

Why would you want to encrypt a graphics file? Mostly to control who can
view its contents. You can invent a proprietary file format and that might
slow a file format hacker down for, say, five or ten minutes. You could add
a proprietary data compression scheme, possibly a twisted variation of an
already public algorithm. But there are so many people out there with
nothing better to do than hack at unknown data formats that your data
would probably be exposed in little time. But suppose we top off all this
effort by encrypting the graphics file itself as we would an ordinary text
file. Would your data then be safe?

Realize that an encrypted graphics file still might not be very secure.
For every data encryption algorithm there exists at least one method of
getting around it, although it may take hundreds of computers and many
years to fully employ and execute that method!

For example, one of the more popular methods used to encrypt data is the
Vernam or XOR cipher. This cipher Exclusive ORs the plain-text data with a
single, random, fixed-length key. The longer the key the harder it is to
break the cipher. A totally random key the length of your data, used only
once, is impossible to break. Shorter and less-random keys are easier to
break.

XOR is very simple and fast, which is a must for graphics file
translators/viewers that must decrypt a file on the fly. A problem,
however, is that most graphics files contain fixed size headers which vary
only slightly in content from file to file. If you knew the approximate
contents of the header of an encrypted file you could XOR a "decrypted"
header with the encrypted file and possibly produce the key used to
encrypt the file. A short key might be very easily discovered in this way.
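
The header attack described above can be sketched as follows. Everything
here is invented for the demonstration: the "IMGFMT01" header, the key,
and the xor_bytes() helper do not belong to any real format or program,
and the sketch assumes the attacker knows or guesses the key length.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


# Hypothetical file: a predictable 8-byte header followed by pixel data.
known_header = b"IMGFMT01"            # what an attacker expects to find
plain_file = known_header + bytes(range(32))
secret_key = b"k3y!"                  # short repeating key (the weak point)

encrypted = xor_bytes(plain_file, secret_key)

# XORing the guessed header with the ciphertext exposes the key stream.
key_stream = xor_bytes(encrypted[:len(known_header)], known_header)
recovered_key = key_stream[:len(secret_key)]   # key length assumed known

print(recovered_key)
print(xor_bytes(encrypted, recovered_key) == plain_file)
```

Because the key is shorter than the known header, the whole key falls out
of a single XOR. A key longer than the predictable part of the file would
only be partly exposed this way.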

If you wish to use a public key/private key encryption method, then
storing the public key in the file format header (usually as a 4-byte
field) and only encrypting the image data would be the way to go. The
SMPTE DPX file format supports such an encryption feature.

If you really need to make the contents of a graphics file secure, then
I'd suggest not only using some form of data encryption, but also creating
an unconventional and proprietary file format and not publishing its
format specification.

For more info on data encryption:

  Bruce Schneier, "Applied Cryptography: Protocols, Algorithms,
    and Source Code in C", John Wiley & Sons, 1994.

------------------------------

Subject: 5. How can I convert the XXX format to the YYY format?

With a file conversion program, of course! Without a doubt one of the
most frequently asked categories of questions on comp.graphics.misc
is how to convert one format to another. In every case the answer is
some type of conversion program or filter, but which one?

Section IV of the FAQ is an attempt to list every known graphics file
display and conversion program and application. Although far from
complete, this list may contain the program you need. Go to the
subsection of the particular operating system you are using, scan
through the Imports: and Exports: formats listed, and see if the formats
you need to use are there.

In some cases the information in a listing may make the conversion
capabilities of a program a bit misleading. For example, a program
that can import a vector .DWG file and export a raster .BMP file may
not necessarily be able to perform a .DWG->.BMP (vector->raster)
conversion (AutoCAD R12 can, BTW). And just because a program can
both import and export TIFF files doesn't mean it's capable of a
TIFF(CMYK)->TIFF(RGB) conversion (as Adobe Photoshop can do). As
always, read the documentation, contact and ask the author of the
program, or find a user of the program and ask them.

------------------------------

Subject: 6. Do I really need the specification of the format I'm using?

It depends upon the results you are trying to obtain. If you have code
that supports the XXX format and you find it easy (and legal) to integrate
that code into your program, then you may be tempted to do so. But realize
that your program will support the XXX format in just the same way as the
previous program did. In other words, your program will now work the same,
but it will really be no better.

By obtaining the format specification you can make an attempt to fully
support all of the features and capabilities a graphics or multimedia file
format has to offer. If you use pre-written code that only supports a
small subset of the format's features then you are not doing justice to
the format and cheating your users out of functionality they might need.

Always strive to create the best programs possible within reason of time
and money. Obtain the specs, look at code, and talk to programmers who
have worked with the format before. You might gain some insight and save
yourself some hair-pulling by supporting a feature that someone didn't
think to include in the original requirements for your program.

------------------------------

Subject: 7. How can I tell if a graphics file is corrupt?

The easiest way is to display the file and decide if what you see on the
screen or the printer is correct. This method is not fool-proof, however,
because not all information stored in a graphics file is used for
displaying the data it contains. Textual comments, alternate color maps,
and unused fields in the header might be munged and go undetected.

A frequent source of corruption occurs when 8-bit graphics data is
transported via a 7-bit communications channel. The 8th bit of each byte
is cleared (set to zero) and you are left with garbage. ASCII-mode file
transfers may also translate carriage returns (0Dh) to line feeds (0Ah),
or to CR/LF pairs, depending on whether the file is being transferred to a
Unix (LF-only), Macintosh (CR-only), or MS-DOS (CR/LF) system.

The PNG file format supports an elegant solution to the quick detection of
this type of corruption. The first character of every PNG file is the
8-bit value 89h. If this value is read as 09h, the 8th bit has been zeroed
and you know the file is corrupt.
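
A minimal sketch of such a check is shown below. The FAQ text only
mentions the first byte (89h); the other seven signature bytes used here
come from the PNG specification, and the function names and returned
messages are made up for the example.

```python
# The 8-byte PNG signature: 89h, "PNG", CR, LF, 1Ah (Ctrl-Z), LF.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def check_png_signature(first_bytes: bytes) -> str:
    """Classify the start of a file that is supposed to be a PNG."""
    if first_bytes[:8] == PNG_SIGNATURE:
        return "ok"
    if first_bytes[:1] == b"\x09":
        # 89h with its top bit cleared becomes 09h.
        return "8th bit stripped (7-bit channel)"
    if first_bytes[:4] == b"\x89PNG":
        # The first four bytes survived, so the CR/LF part was changed.
        return "line endings altered (text-mode transfer)"
    return "not a PNG signature"


def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


print(check_png_signature(PNG_SIGNATURE))
print(check_png_signature(strip_high_bit(PNG_SIGNATURE)))
print(check_png_signature(PNG_SIGNATURE.replace(b"\r\n", b"\n")))
```

The CR/LF pair inside the signature plays the same role for ASCII-mode
transfers that the 89h byte plays for 7-bit channels.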

Most graphics files do not contain any real, built-in error detection
features. The standard way to check for corruption of any type of data
file is to perform some sort of error-detection scheme on the file.
Commonly used schemes include checksum calculations and the Cyclic
Redundancy Check (CRC). These algorithms are widely used in the world of
synchronous serial communications for detecting errors in data streams.

If you only wanted to provide error detection for the graphical data
contained in a file, but not the header, then a 2- or 4-byte field in the
header could be used to store the CRC-16 or CRC-32 value of the data. But
what good is pure data if the header is possibly corrupt? 

If we calculate the CRC value of the entire file and then store that
calculated value in the header we will have just "corrupted" the file! You
could initialize the CRC field with zeros, calculate the value, store the
value, and specify that the entire file need be read into memory and the
CRC value field set to all zeros before the CRC calculation is made. 
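
The zero-the-field approach might look like the sketch below. The layout
(a 4-byte magic value followed by a 4-byte big-endian CRC-32 field) is
purely hypothetical, as are the function names; only zlib.crc32() and the
struct module are real.

```python
import struct
import zlib

CRC_OFFSET = 4   # hypothetical layout: 4-byte magic, then a 4-byte CRC field


def crc_with_field_zeroed(blob: bytes) -> int:
    """CRC-32 of the whole file, treating the CRC field as four zero bytes."""
    zeroed = blob[:CRC_OFFSET] + b"\x00" * 4 + blob[CRC_OFFSET + 4:]
    return zlib.crc32(zeroed) & 0xFFFFFFFF


def seal(blob: bytes) -> bytes:
    """Return a copy of blob with its CRC field filled in."""
    crc = crc_with_field_zeroed(blob)
    return blob[:CRC_OFFSET] + struct.pack(">I", crc) + blob[CRC_OFFSET + 4:]


def is_intact(blob: bytes) -> bool:
    """Compare the stored CRC with one recomputed over the zeroed copy."""
    stored = struct.unpack_from(">I", blob, CRC_OFFSET)[0]
    return stored == crc_with_field_zeroed(blob)


original = seal(b"DEMO" + b"\x00" * 4 + b"header fields and pixel data")
damaged = original[:-1] + bytes([original[-1] ^ 0x01])   # flip one bit
print(is_intact(original), is_intact(damaged))
```

Note that both the writer and the reader work on an in-memory copy with
the field zeroed, which is exactly the cost the paragraph above warns
about.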

File formats that segment their data into blocks or chunks would be able
to perform a CRC on each section individually (another feature found in
the PNG file format). Each section would store the CRC value as the last 2
or 4 bytes of the block and the CRC value field would never be read for
the purpose of the CRC calculation. This method makes it easier to find
the location of the error(s) in a file. If the CRC error occurred in an
unnecessary block of data, the file might still be useful anyway. This
block-style CRC checking also saves the reader from performing a
time-consuming CRC calculation on an entire, possibly very large, graphics
file.
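
As a rough illustration, the sketch below builds and checks chunks laid
out the way PNG does it: a 4-byte length, a 4-byte type, the data, and a
CRC-32 computed over the type and data. The helper names are invented,
the stream omits the 8-byte PNG signature, and the chunk contents are
dummy values rather than a valid image.

```python
import struct
import zlib


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG-style chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def find_bad_chunks(stream: bytes) -> list:
    """Return (offset, type) for every chunk whose stored CRC does not match."""
    bad = []
    pos = 0
    while pos + 12 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        chunk_type = stream[pos + 4:pos + 8]
        end = pos + 12 + length
        if end > len(stream):
            bad.append((pos, chunk_type))   # the length field itself is damaged
            break
        data = stream[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack_from(">I", stream, pos + 8 + length)
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != stored:
            bad.append((pos, chunk_type))
        pos = end
    return bad


stream = (make_chunk(b"IHDR", b"\x00" * 13)
          + make_chunk(b"tEXt", b"Comment\x00hello")
          + make_chunk(b"IEND", b""))
damaged = bytearray(stream)
damaged[35] ^= 0xFF                     # corrupt one byte inside the tEXt data
print(find_bad_chunks(stream), find_bad_chunks(bytes(damaged)))
```

Here the damage lands in a text comment, so a reader could report the bad
chunk and still display the image, which is the benefit described above.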

But all this can be quite a pain. Can't we avoid modifying a file and just
store the CRC value externally to the file? Maybe using some sort of
encapsulating "wrapper"?

If you want to make sure a graphics file (or any file for that matter) has
been transported through a communications channel without sustaining any
corruption, first store it using a file archiving program that supports
error checking of the files contained in the archive. (Several good
error-checking file archiving programs include PKZIP, gzip, and zoo. The
ar and tar Unix archiving programs do not support error checking). When
the graphics file is stored, the archival program calculates the CRC value
of the file. If that stored CRC value does not match the CRC calculated
when the file is unarchived after transport, you know that the file has
been corrupted.
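
One way to try this without leaving a script is Python's standard zipfile
module, which writes ZIP archives like those PKZIP produces. In this
sketch the file name and the image bytes are stand-ins, and the archive
lives in memory only to keep the example self-contained.

```python
import io
import zipfile

image_bytes = bytes(range(256)) * 16    # stand-in for real graphics data

# Write the archive into memory; ZIP_STORED means the data is not compressed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("picture.img", image_bytes)

# After "transport", re-read every member and compare against its stored CRC.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    first_bad = archive.testzip()       # None when every CRC matches

# Simulate line noise by flipping one byte inside the stored file data.
damaged = bytearray(buffer.getvalue())
damaged[100] ^= 0xFF
with zipfile.ZipFile(io.BytesIO(bytes(damaged))) as archive:
    bad_after_damage = archive.testzip()

print(first_bad, bad_after_damage)
```

testzip() returns the name of the first member that fails its check, so a
clean archive yields None and the damaged copy names the broken file.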

Note: make sure you turn compression OFF when archiving many types of
graphics files. File archival programs use compression by default and will
attempt to make your compressed data even smaller (which usually results
in larger data, unless the archiver is smart enough to detect the negative
compression and not attempt to compress the file). ASCII-based files (such
as PostScript and DXF) and some RLE-encoded files (such as PCX) will be
compressed, while formats that already use more advanced data compression
methods (such as JPEG or LZW) will most likely grow in size.
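
The effect is easy to see with a small experiment. The sketch below uses
zlib as a stand-in for whatever method an archiver applies, a repeated
PostScript-style line as the ASCII data, and hash output as a substitute
for already-compressed image data; none of these are real graphics files.

```python
import hashlib
import zlib


def pseudo_compressed_data(size: int) -> bytes:
    """Deterministic high-entropy bytes, standing in for JPEG or LZW output."""
    out = bytearray()
    block = b"seed"
    while len(out) < size:
        block = hashlib.sha256(block).digest()
        out += block
    return bytes(out[:size])


text_like = b"0 0 moveto 100 100 lineto stroke\n" * 128   # ASCII-based data
dense = pseudo_compressed_data(len(text_like))

for label, payload in (("ASCII", text_like), ("already compressed", dense)):
    packed = zlib.compress(payload)
    print(f"{label}: {len(payload)} -> {len(packed)} bytes")
```

The ASCII data shrinks dramatically, while the dense data comes out a few
bytes larger because the compressor can only add its own overhead.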

------------------------------

Subject: 8. What do I put in my own graphics file format specification?

For people that are faced with the task of writing up a specification for
their own format (or perhaps to better document someone else's), a few
suggestions are hereby offered.

  A large spec needs a table of contents, bibliography, and an index.
  Most specs do not fall into this category though.

  On the cover sheet give the full information of your company, products
  associated with the format, the format version, date of release, 
  where the latest copy of the spec may be obtained, and how developers
  may get in contact with you to ask questions.

  Detail the full history of the spec (including the difference between the
  current version and all previous versions) and not just the dates of its
  revision. Tell why the format was created. Detail some insights of
  how it was designed. Speculate on what features future versions might
  contain. And give the names of your developers and other people
  involved. Show the human thought that exists behind the cold chunk of
  data that is your format.

  List the features of your format and explain how you intend that it
  should be used and not used (tell what your format is and is not).
  Give the developer your reasons that they should use your format (and
  why they should not bother with others).
  
  Include both block diagrams and ANSI C code examples of the format's
  internal data structures. Illustrate actual examples of ASCII file
  format data and hexadecimal dumps of binary format data (very useful
  to programmers, I might say).

  If your format includes one or more forms of data compression, error
  checking, encryption, etc., place this information in a separate
  section and give plenty of examples (both written and code) of how
  these algorithms work. Include mathematical formulas if you believe it
  makes your concepts clearer.

  Make the specification available both in hardcopy and electronic
  form. The hardcopy version should be formatted as a technical
  document and use a font that won't degrade badly when the spec is
  photocopied or faxed. Use a standard sized page layout so the spec
  isn't a hassle to fit in an envelope when mailed. The electronic
  version should be available as both ASCII text and PostScript files.
  Making the spec available in a word processing format (such as
  Microsoft Word or Framemaker) is nice, but not absolutely necessary.

  Consider making a developer's toolkit for your format. A collection
  of benchmark graphics files (one of each flavor of your format), and
  a parser written in ANSI C that reads and writes your format, is of
  tremendous help to programmers. Such a kit will allow developers to
  implement your format quickly in their products and helps minimize
  the chances of numerous software packages appearing which create
  graphics files that don't meet your spec. Examples of formats with
  toolkits include TIFF, TGA (Truevision), WordPerfect Graphics (WPG),
  and PNG.

  Submit your specification to every FTP/Gopher/WWW site and BBS that
  archives file format specs. Notify the maintainers of related FAQs
  (graphics, animation, multimedia, audio, medical, etc.) that your
  format exists and ask for a mention. Send your literature to graphics
  and imaging software companies to sell support of your format and/or
  software products.
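
To illustrate the earlier suggestion about data structure examples and
hexadecimal dumps, here is a sketch of what a spec might show for a
header. The 12-byte "DEMO" header is entirely hypothetical, and Python's
struct module stands in for the ANSI C structure a real spec would list.

```python
import struct

# Hypothetical header layout, little-endian, 12 bytes in total:
#   4s  magic     "DEMO"
#   H   version   format version number
#   H   width     image width in pixels
#   H   height    image height in pixels
#   B   depth     bits per pixel
#   B   flags     reserved, zero
HEADER = struct.Struct("<4sHHHBB")


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:04X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


header = HEADER.pack(b"DEMO", 1, 640, 480, 24, 0)
print(hex_dump(header))
```

Showing the field list next to the dump lets a programmer match each byte
to a field, and makes the byte order obvious without a paragraph of prose.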

And a few guidelines on good technical writing in general:

  Write in a tutorial style with explanations and examples of your
  topics. Don't just give a terse, dictionary description of a topic
  which often leaves the readers confused and needing more.

  Write in simple terms. Don't assume your readers enjoy 70-word
  sentences, or have advanced degrees in mathematics or computer
  graphics. 

  Have other people read and attempt to understand your spec. Don't
  assume that just because you understand what you've written that
  every reader will too. You, as the file format specification's
  author, understand the format inside and out. Your readers, however,
  do not. An explanation that may seem clear to you may be just
  another confusing paragraph to your readers.

  Write for a world-wide audience of programmers. Omit slang or regional
  expressions that a developer living on the other side of the planet
  might not understand.

  Programs that check spelling and grammar are our friends. Use them.

Examples of some well-written format specs include: TGA, TIFF, PNG, EPSF,
and PostScript. Some specs are written well, but contain so much
extraneous information that they are quite complex and very tedious to
read. Most government and military formats are in this group (for example,
CALS, NITF, NAPLPS, IGES, GKS, and CGM). Format specs such as PCX, GIF,
JFIF, and Sun Raster definitely fall into the "don't let this happen to
you" category.

Send corrections/additions to the FAQ Maintainer:
jdm@ora.com (James D. Murray)





Last Update March 27 2014 @ 02:11 PM