Top Document: Graphics File Formats FAQ (Part 1 of 4): General Graphics Format Questions

------------------------------

Subject: 0. Who cares about graphics file formats?

Well, programmers do mostly. But end-users (that is, non-programmers) do as well.

The typical end-user only cares about storing their graphics information using a format that most graphics programs and filters can read. End-users are typically not concerned with the internal arrangement of the data within the graphics file itself. They only want the format to do its job by representing their data correctly in a permanent form.

Programmers, on the other hand, are that rare breed of human that just can't leave information well enough alone. They need to know how every byte is arranged to see if someone knows something that they don't (and often snicker contentedly to themselves when they find that it is really they that know more). Programmers will then use this information to write code that may never see the light of distribution, but nevertheless, they will have had fun and gained enlightenment from writing it.

It doesn't matter which of these two types of people you are. If you have even the slightest interest in graphics file formats then you may be counted as one who cares.

------------------------------

Subject: 1. What is raster, vector, metafile, PDL, VRML, and so forth?

These terms are used to classify the type of data a graphics file contains.

Raster files (also called bitmapped files) contain graphics information described as pixels, such as photographic images. Vector files contain data described as mathematical equations and are typically used to store line art and CAD information. Metafiles are formats that may contain either raster or vector graphics data.
Page Description Languages (PDL) are used to describe the layout of a printed page of graphics and text.

Animation formats are usually collections of raster data that are displayed in a sequence. Multi-dimensional object formats store graphics data as a collection of objects (data and the code that manipulates it) that may be rendered (displayed) from a variety of perspectives. Virtual Reality Modeling Language (VRML) is a 3D, object-oriented language used for describing "virtual worlds" networked via the Internet and hyperlinked within the World Wide Web. Multimedia file formats are capable of storing any of the previously mentioned types of data, as well as sound and video information.

------------------------------

Subject: 2. Why should I care about previous versions of a file format?

When version 2.0 of the XXX format is released, the thousands of files created using version 1.0 of the XXX format don't magically disappear or transform to version 2.0 overnight. Although version 2.0 might claim to be fully backwards compatible, the new specification may obfuscate or even omit details of the previous version of the format.

In short, never throw away older information just because you have something newer. At one point in time that "outdated" format spec was state-of-the-art, and it may still contain a single precious tidbit of information that the caretakers of the format didn't carry over to the new spec (but that Murphy will make sure you desperately need to know).

------------------------------

Subject: 3. Can graphics files be infected with a virus?

For most types of graphics file formats currently available the answer is "no". A virus (or worm, Trojan horse, and so forth) is fundamentally a collection of code (that is, a program) that contains instructions which are executed by a CPU. Most graphics files, however, contain only static data and no executable code.
The code that reads, writes, and displays graphics data is found in translation and display programs and not in the graphics files themselves. If reading or writing a graphics file causes a system malfunction, it is most likely the fault of the program reading the file and not of the graphics file data itself.

With the introduction of multimedia we have seen new formats appear, and modifications made to older formats, that allow executable instructions to be stored within a file format. These instructions are used to direct multimedia applications to play sounds or music, prompt the user for information, or display other graphics and video information. Such multimedia display programs may perform these functions by interfacing with their environment via an API, or by direct interaction with the operating system. One might also imagine a truly object-oriented graphics file as containing the code required to read, write, and display itself.

Once again, any catastrophes that result from using these multimedia applications are most likely the result of undiscovered bugs in the software and not some sinister instructions in the graphics file data. Such "logic bombs" are typically exorcised by testing the software with a wide variety of different image files.

If you have a virus scanning program that indicates a specific graphics file is infected by a virus, then it is very possible that the file coincidentally contains a byte pattern that the scanning program recognizes as a key byte signature identifying a virus. Contact the author of the virus scanning program (or even read the documentation!) to discuss the probability of a clean file being misidentified as infected. Save the graphics file, as the author will most likely wish to examine it as well.
If you suspect a graphics file to be at the heart of a virus problem you are experiencing, then also consider the possibility that the graphics file's transport mechanism (floppy disk, tape or shell archive file, compressed archive file, and so forth) might be the original source of the virus and not the graphics file itself.

------------------------------

Subject: 4. Can graphics files be encrypted?

Of course you can encrypt a graphics file. After all, most encryption algorithms don't care about the intellectual content of a file. All they chew on is a series of byte values. Therefore, almost any encryption program that works on ordinary text files will work on graphics files as well.

Why would you want to encrypt a graphics file? Mostly to control who can view its contents. You can invent a proprietary file format and that might slow a file format hacker down for, say, five or ten minutes. You could add a proprietary data compression scheme, possibly a twisted variation of an already public algorithm. But there are so many people out there with nothing better to do than hack at unknown data formats that your data would probably be exposed in little time.

But suppose we top off all this effort by encrypting the graphics file itself as we would an ordinary text file. Would your data then be safe?

Realize that an encrypted graphics file still might not be very secure. For nearly every data encryption algorithm there exists at least one method of getting around it, although it may take hundreds of computers and many years to fully employ and execute that method!

For example, one of the more popular methods used to encrypt data is the Vernam or XOR cipher. This cipher Exclusive ORs the plain-text data with a single, random, fixed-length key. The longer the key the harder it is to break the cipher. A totally random key the length of your data, used only once, is impossible to break. Shorter and less-random keys are easier to break.
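
As a rough illustration of the XOR cipher described above, here is a minimal Python sketch. The function name xor_cipher is hypothetical, and the choice to reuse the key from its start whenever the data is longer than the key is an assumption made for this example. It is not taken from any particular graphics format and is not meant as a recommendation for real-world security.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed.

    Hypothetical helper for illustration only. The same call both
    encrypts and decrypts, because (b ^ k) ^ k == b.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


if __name__ == "__main__":
    key = b"\x5a\xc3\x0f"                    # a short (and therefore weak) key
    plain = b"GIF89a" + bytes(range(16))     # stand-in for graphics file data
    encrypted = xor_cipher(plain, key)
    print(encrypted.hex())
    print(xor_cipher(encrypted, key) == plain)
```

Because XOR is its own inverse, applying xor_cipher a second time with the same key returns the original bytes, which is what makes decryption so cheap.
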
XOR is very simple and fast, which is a must for graphics file translators and viewers that must decrypt a file on the fly. A problem, however, is that most graphics files contain fixed-size headers which vary only slightly in content from file to file. If you knew the approximate contents of the header of an encrypted file you could XOR a known plain-text header with the encrypted file and possibly produce the key used to encrypt the file. A short key might be very easily discovered in this way.

If you wish to use a public key/private key encryption method, then storing the public key in the file format header (usually as a 4-byte field) and only encrypting the image data would be the way to go. The SMPTE DPX file format supports such an encryption feature.

If you really need to make the contents of a graphics file secure, then I'd suggest not only using some form of data encryption, but also creating an unconventional and proprietary file format and not publishing its format specification.

For more info on data encryption:

    Bruce Schneier, "Applied Cryptography: Protocols, Algorithms, and Source Code in C", John Wiley & Sons, 1994.

------------------------------

Subject: 5. How can I convert the XXX format to the YYY format?

With a file conversion program, of course!

Without a doubt one of the most frequently asked categories of questions on comp.graphics.misc is how to convert one format to another. In every case the answer is some type of conversion program or filter, but which one?

Section IV of the FAQ is an attempt to list every known graphics file display and conversion program and application. Although far from complete, this list may contain the program you need. Go to the subsection for the particular operating system you are using, scan through the Imports: and Exports: formats listed, and see if the formats you need to use are there.

In some cases the information in a listing may make the conversion capabilities of a program a bit misleading.
For example, a program that can import a vector .DWG file and export a raster .BMP file may not necessarily be able to perform a .DWG->.BMP (vector->raster) conversion (AutoCAD R12 can, BTW). And just because a program can both import and export TIFF files doesn't mean it's capable of a TIFF(CMYK)->TIFF(RGB) conversion (as Adobe Photoshop can do). As always, read the documentation, contact and ask the author of the program, or find a user of the program and ask them.

------------------------------

Subject: 6. Do I really need the specification of the format I'm using?

It depends upon the results you are trying to obtain. If you have code that supports the XXX format and you find it easy (and legal) to integrate that code into your program, then you may be tempted to do so. But realize that your program will support the XXX format in just the same way as the previous program did. In other words, your program will now work the same, but it will really be no better.

By obtaining the format specification you can make an attempt to fully support all of the features and capabilities a graphics or multimedia file format has to offer. If you use pre-written code that only supports a small subset of the format's features then you are not doing justice to the format and are cheating your users out of functionality they might need.

Always strive to create the best programs possible within the limits of time and money. Obtain the specs, look at code, and talk to programmers who have worked with the format before. You might gain some insight and save yourself some hair-pulling by supporting a feature that someone didn't think to include in the original requirements for your program.

------------------------------

Subject: 7. How can I tell if a graphics file is corrupt?

The easiest way is to display the file and decide if what you see on the screen or the printer is correct.
This method is not fool-proof, however, because not all information stored in a graphics file is used for displaying the data it contains. Textual comments, alternate color maps, and unused fields in the header might be munged and go undetected.

A frequent source of corruption occurs when 8-bit graphics data is transported via a 7-bit communications channel. The 8th bit of each byte is cleared (set to zero) and you are left with garbage. ASCII-mode file transfers may also translate carriage returns (0Dh) to line feeds (0Ah), or to CR/LF pairs, depending upon whether the file is being transferred to a Unix (LF-only), Macintosh (CR-only), or MS-DOS (CR/LF) system.

The PNG file format supports an elegant solution to the quick detection of this type of corruption. The first byte of every PNG file is the 8-bit value 89h. If this value is read as 09h, the 8th bit has been zeroed and you know the file is corrupt.

Most graphics files do not contain any real, built-in error detection features. The standard way to check for corruption of any type of data file is to apply some sort of error-detection scheme to the file. Commonly used schemes are checksum calculations and the Cyclic Redundancy Check (CRC). These algorithms are commonly used in the world of synchronous serial communications for detecting errors in data streams.

If you only wanted to provide error detection for the graphical data contained in a file, but not the header, then a 2- or 4-byte field in the header could be used to store the CRC-16 or CRC-32 value of the data.

But what good is pure data if the header is possibly corrupt? If we calculate the CRC value of the entire file and then store that calculated value in the header we will have just "corrupted" the file! Instead, you could initialize the CRC field with zeros, calculate the CRC of the entire file, and store the result in the field. The format specification would then require a reader to load the entire file into memory and set the CRC field to all zeros before making the CRC calculation.
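
The zeroed-field approach just described might be sketched in Python as follows. The file layout here (a 4-byte signature followed by a 4-byte big-endian CRC-32 field at offset 4) and the names seal and is_intact are hypothetical, invented only for this example. The CRC itself comes from the standard zlib.crc32 function.

```python
import struct
import zlib

# Hypothetical layout: 4-byte signature, then a 4-byte CRC-32 field.
CRC_OFFSET = 4
CRC_SIZE = 4


def _crc_with_field_zeroed(blob: bytes) -> int:
    """CRC-32 of the whole file, treating the CRC field as all zeros."""
    zeroed = (blob[:CRC_OFFSET]
              + b"\x00" * CRC_SIZE
              + blob[CRC_OFFSET + CRC_SIZE:])
    return zlib.crc32(zeroed) & 0xFFFFFFFF


def seal(blob: bytes) -> bytes:
    """Return a copy of blob with the CRC field filled in."""
    crc = _crc_with_field_zeroed(blob)
    return (blob[:CRC_OFFSET]
            + struct.pack(">I", crc)
            + blob[CRC_OFFSET + CRC_SIZE:])


def is_intact(blob: bytes) -> bool:
    """True if the stored CRC matches one recomputed with the field zeroed."""
    (stored,) = struct.unpack_from(">I", blob, CRC_OFFSET)
    return stored == _crc_with_field_zeroed(blob)


if __name__ == "__main__":
    image = b"XXXF" + b"\x00" * CRC_SIZE + bytes(range(256))
    sealed = seal(image)
    print(is_intact(sealed))

    # Flip one bit in the data area to simulate corruption in transit.
    damaged = bytearray(sealed)
    damaged[20] ^= 0x01
    print(is_intact(bytes(damaged)))
```

Writer and reader only have to agree on where the field lives. Because the field is zeroed before both calculations, storing the CRC inside the file no longer invalidates it.
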
File formats that segment their data into blocks or chunks would be able to perform a CRC on each section individually (another feature found in the PNG file format). Each section would store the CRC value as the last 2 or 4 bytes of the block, and the CRC value field itself would never be included in the CRC calculation. This method makes it easier to find the location of the error(s) in a file. If the CRC error occurred in an unnecessary block of data, the file might still be useful anyway. This block-style CRC checking also saves the reader from performing a time-consuming CRC calculation on an entire, possibly very large, graphics file.

But all this can be quite a pain. Can't we avoid modifying a file and just store the CRC value externally to the file? Maybe using some sort of encapsulating "wrapper"?

If you want to make sure a graphics file (or any file for that matter) has been transported through a communications channel without sustaining any corruption, first store it using a file archiving program that supports error checking of the files contained in the archive. (Good error-checking file archiving programs include PKZIP, gzip, and zoo. The ar and tar Unix archiving programs do not support error checking.) When the graphics file is stored, the archival program calculates the CRC value of the file and records it in the archive. If the CRC value calculated when the file is unarchived after transport does not match the recorded value, you know that the file has been corrupted.

Note: make sure you turn compression OFF when archiving many types of graphics files. File archival programs use compression by default and will attempt to make your already compressed data even smaller (which usually results in larger data, unless the archiver is smart enough to detect the negative compression and not attempt to compress the file).
ASCII-based files (such as PostScript and DXF) and some RLE-encoded files (such as PCX) will be compressed, while formats using more advanced data compression methods (such as JPEG and LZW) will usually grow in size.

------------------------------

Subject: 8. What do I put in my own graphics file format specification?

For people who are faced with the task of writing up a specification for their own format (or perhaps better documenting someone else's), a few suggestions are hereby offered.

A large spec needs a table of contents, bibliography, and an index. Most specs do not fall into this category though.

On the cover sheet give the full information of your company, products associated with the format, the format version, date of release, where the latest copy of the spec may be obtained, and how developers may get in contact with you to ask questions.

Detail the full history of the spec (including the differences between the current version and all previous versions) and not just the dates of its revisions. Tell why the format was created. Detail some insights into how it was designed. Speculate on what features future versions might contain. And give the names of your developers and other people involved. Show the human thought that exists behind the cold chunk of data that is your format.

List the features of your format and explain how you intend that it should be used and not used (tell what your format is and is not). Give the developer your reasons that they should use your format (and why they should not bother with others).

Include both block diagrams and ANSI C code examples of the format's internal data structures. Illustrate actual examples of ASCII file format data and hexadecimal dumps of binary format data (very useful to programmers, I might say).
If your format includes one or more forms of data compression, error checking, encryption, etc., place this information in a separate section and give plenty of examples (both written and code) of how these algorithms work. Include mathematical formulas if you believe they make your concepts clearer.

Make the specification available both in hardcopy and electronic form. The hardcopy version should be formatted as a technical document and use a font that won't degrade badly when the spec is photocopied or faxed. Use a standard-sized page layout so the spec isn't a hassle to fit in an envelope when mailed. The electronic version should be available as both ASCII text and PostScript files. Making the spec available in a word processing format (such as Microsoft Word or FrameMaker) is nice, but not absolutely necessary.

Consider making a developer's toolkit for your format. A collection of benchmark graphics files (one of each flavor of your format), and a parser written in ANSI C that reads and writes your format, are of tremendous help to programmers. Such a kit will allow developers to implement your format quickly in their products and helps minimize the chances of numerous software packages appearing which create graphics files that don't meet your spec. Examples of formats with toolkits include TIFF, TGA (Truevision), WordPerfect Graphics (WPG), and PNG.

Submit your specification to every FTP/Gopher/WWW site and BBS that archives file format specs. Notify the maintainers of related FAQs (graphics, animation, multimedia, audio, medical, etc.) that your format exists and ask for a mention. Send your literature to graphics and imaging software companies to sell support of your format and/or software products.

And a few guidelines on good technical writing in general:

Write in a tutorial style with explanations and examples of your topics. Don't just give a terse, dictionary description of a topic, which often leaves the readers confused and needing more.
Write in simple terms. Don't assume your readers enjoy 70-word sentences, or have advanced degrees in mathematics or computer graphics.

Have other people read and attempt to understand your spec. Don't assume that just because you understand what you've written, every reader will too. You, as the file format specification's author, understand the format inside and out. Your readers, however, do not. An explanation that may seem clear to you may be just another confusing paragraph to your readers.

Write for a world-wide audience of programmers. Omit slang or regional expressions that a developer living on the other side of the planet might not understand.

Programs that check spelling and grammar are our friends. Use them.

Examples of some well-written format specs include: TGA, TIFF, PNG, EPSF, and PostScript.

Some specs are written well, but contain so much extraneous information that they are quite complex and very tedious to read. Most government and military formats are in this group (for example, CALS, NITF, NAPLPS, IGES, GKS, and CGM).

Format specs such as PCX, GIF, JFIF, and Sun Raster definitely fall into the "don't let this happen to you" category.

------------------------------

Send corrections/additions to the FAQ Maintainer: jdm@ora.com (James D. Murray)
Last Update March 27 2014 @ 02:11 PM