PDP-8 Frequently Asked Questions (posted every other month)
Section - What character sets does the PDP-8 support?

From the beginning, PDP-8 software has generally assumed that textual
I/O would be in 7 bit ASCII.  Most early PDP-8 systems used teletypes
as console terminals; as sold by DEC, these were configured for mark
parity, so most older software assumes 7 bit ASCII, upper case only,
with the 8th bit set to 1.  On output, lines are generally terminated
with both CR and LF; on input, CR is typically (but not always) the
line terminator and LF is typically ignored.  In addition, the tab
character (HT) is generally allowed, but software support for output of
text containing tabs varies.
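
The conventions above can be sketched in Python.  This is only an
illustration, not code from any PDP-8 system; the names encode_line and
decode_lines are hypothetical, and the sketch assumes the input text is
plain ASCII and that CR always ends a line.

```python
CR, LF = 0o015, 0o012

def encode_line(line):
    """Return one output line as upper case ASCII with the 8th bit set
    to 1 (mark parity), terminated with both CR and LF."""
    codes = [ord(ch) & 0o177 for ch in line.upper()]
    codes += [CR, LF]
    return bytes(code | 0o200 for code in codes)

def decode_lines(raw):
    """Split input bytes into lines: drop the 8th bit, treat CR as the
    line terminator (an assumption; it is not always so), ignore LF."""
    lines, current = [], []
    for byte in raw:
        code = byte & 0o177
        if code == LF:
            continue
        if code == CR:
            lines.append("".join(current))
            current = []
        else:
            current.append(chr(code))
    if current:
        lines.append("".join(current))
    return lines
```

For example, encode_line("Hi") yields the octal codes 310, 311, 215 and
212, and decode_lines turns those four bytes back into the one line "HI".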

One difficulty with much PDP-8 software is that it bypasses the device
handlers provided by the operating system and goes directly to the
device.  This results in very irregular device support, so that, for
example, control-S and control-Q work to start and stop output under
OS/8, but the OS/8 PAL assembler ignores them when reporting errors.

Most of the better engineered PDP-8 software tends to fold upper and
lower case on input, and it ignores the setting of the 8th bit.  Older
PDP-8 software will generally fail when presented with lower case
textual input (this includes essentially all OS/8 products prior to
OS/278 V1).
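
A minimal Python sketch of this input treatment follows.  The name
fold_input is hypothetical, and folding lower case to upper case (rather
than the reverse) is an assumption made for the illustration.

```python
def fold_input(code):
    """Normalize one 8 bit input character code: ignore the 8th bit and
    fold lower case letters to upper case (assumed direction)."""
    code &= 0o177                      # ignore the setting of the 8th bit
    if 0o141 <= code <= 0o172:         # ASCII 'a' through 'z'
        code -= 0o40                   # shift to 'A' through 'Z'
    return code
```

With this treatment, the codes 0101, 0141, 0301 and 0341 (octal) all
read as the same letter A.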

Internally, PDP-8 programmers are free to use other character sets, but
the "X notation provided by the assembler (a double quote followed by a
single character) encourages use of 7 bit ASCII with the 8th bit set to
1, and the TEXT pseudo-operation encourages the 6 bit character set
called "stripped ASCII".  To map from upper-case-only ASCII to stripped
ASCII, each 8 bit character is ANDed with octal 77 and the results are
packed 2 characters per 12 bit word, left to right.  Many programs use a
semi-standard scheme for packing mixed upper and lower case into 6 bit
TEXT form; this uses ^ to flip from upper to lower case or lower to
upper case, % to encode CR-LF pairs, and @ (octal 00) to mark end of
string.  Note that this scheme makes no provision for encoding the %,
^ and @ characters, nor does it allow control characters other than the
CR-LF pair.
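
The stripping, packing and case-shift scheme can be sketched in Python
as follows.  This is an illustration under stated assumptions, not the
code of any particular program: the function names are hypothetical,
an odd-length string is assumed to be padded with 00, and a string is
assumed to start in upper case.

```python
def strip6(ch):
    """Map one upper case ASCII character to its 6 bit stripped code."""
    return ord(ch) & 0o77

def pack_text(text):
    """Pack stripped characters two per 12 bit word, left to right."""
    codes = [strip6(ch) for ch in text]
    if len(codes) % 2:
        codes.append(0)                # assumption: pad with 00
    return [(codes[i] << 6) | codes[i + 1]
            for i in range(0, len(codes), 2)]

def unstrip6(code):
    """Map a 6 bit code back to 7 bit upper case ASCII."""
    return chr(code + 0o100) if code < 0o40 else chr(code)

def decode_mixed(words):
    """Decode packed text using ^ to flip case, % for CR-LF and
    @ (octal 00) as the end of string marker."""
    out = []
    lower = False                      # assumption: start in upper case
    for word in words:
        for code in ((word >> 6) & 0o77, word & 0o77):
            if code == 0:              # @ marks end of string
                return "".join(out)
            ch = unstrip6(code)
            if ch == "^":
                lower = not lower
            elif ch == "%":
                out.append("\r\n")
            else:
                out.append(ch.lower() if lower else ch)
    return "".join(out)
```

For example, pack_text("AB") gives the single word 0102 (octal), and
decode_mixed(pack_text("H^ELLO^%@")) gives "Hello" followed by CR-LF.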

The P?S/8 operating system supports a similar 6 bit text file format,
where upper and lower case are folded together, tabs are stored as _
(underline), end-of-line is represented by 00, padded with any
nonzero filler to a word boundary, and end of file is 0000.

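Files under the widely used OS/8 system consist of sequences of 256 word
blocks.  When used for text, each block holds 384 bytes, packed 3 bytes
per pair of words as follows, with the high 4 bits of the third byte
(cccc) stored above the first byte and its low 4 bits (CCCC) stored
above the second byte:

		bytes			words
		aaaaaaaa		ccccaaaaaaaa
		bbbbbbbb		CCCCbbbbbbbb
		ccccCCCC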

Control Z is used as an end of file marker.  Because most of the PDP-8
system software was originally developed for paper tape, binary object
code is typically stored in paper-tape image form using the above packing
scheme.
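
The OS/8 packing scheme can be sketched in Python as follows.  The
function names are hypothetical, and read_text assumes that the 8th bit
of each stored byte can be ignored when looking for Control Z; treat it
as an illustration rather than a description of any OS/8 utility.

```python
def pack_os8(data):
    """Pack 8 bit bytes into 12 bit words, 3 bytes per pair of words."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    words = []
    for i in range(0, len(data), 3):
        a, b, c = data[i:i + 3]
        words.append(((c & 0o360) << 4) | a)   # cccc aaaaaaaa
        words.append(((c & 0o017) << 8) | b)   # CCCC bbbbbbbb
    return words

def unpack_os8(words):
    """Recover the byte stream from pairs of 12 bit words."""
    data = bytearray()
    for i in range(0, len(words) - 1, 2):
        w0, w1 = words[i], words[i + 1]
        data.append(w0 & 0o377)                # first byte
        data.append(w1 & 0o377)                # second byte
        data.append(((w0 >> 8) << 4) | (w1 >> 8))  # third byte
    return bytes(data)

CTRL_Z = 0o032

def read_text(words):
    """Return the text up to, but not including, the Control Z marker."""
    chars = []
    for byte in unpack_os8(words):
        code = byte & 0o177            # assumption: ignore the 8th bit
        if code == CTRL_Z:
            break
        chars.append(chr(code))
    return "".join(chars)
```

Packing a full 384 byte block with pack_os8 produces exactly 256 words,
matching the block size given above.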

Send corrections/additions to the FAQ Maintainer:
jones@cs.uiowa.edu (Douglas W. Jones)

Last Update March 27 2014 @ 02:11 PM