Efficient text editing on a PDP-10

I did not know into what rabbit hole I’d fall when I clicked on
last week’s Lazy Reading post
and discovered the link to SAILDART.
The linked e-book gives a good preview of what we can find there:
a complete archive of the Stanford AI project’s PDP-10 setup (a.k.a SAIL),
and large parts of it are open for public access!

Back in the early seventies, when American universities got the first
computers suitable for multi-user usage, it was common to develop
custom operating systems (or heavily modify the vendor’s one), as
research facilities had special needs, and also a different mindset
than commercial offerings.

I was mostly familiar with the Incompatible Timesharing System (ITS)
of MIT (written for a PDP-6, later on a PDP-10), due to its leakage
of concepts and culture into Lisp Machines and the GNU project.
And of course, Berkeley is known for developments around Unix in the late
seventies which then turned into BSD. However, I had only remotely heard of
WAITS, the operating system used at the Stanford Artificial
Intelligence Laboratory, which first ran on a PDP-6, and later on a PDP-10.
It was based on an early version of TOPS-10.

So I started to dig around in the SAILDART archives, and quickly found
Donald Knuth’s TeX working directory,
because this actually was the system TeX initially was developed on!
Not only the early TeX78 can be found there (look for the TEX*.SAI files),
but also the source code of TeX82,
written in literate Pascal, which essentially still is the core of
today’s TeX setups.

But not only that, we also can find the
source of the TeXbook
and parts of TAOCP.
(Looking at TeX code from its creator himself is very instructive, by the way.)

One thing that surprised me was that all these files were rather big
for the time;
while the TeX78 code was split up into 6 files, TeX82 was a 1 megabyte
file and the TeXbook a single 1.5 megabyte file. This makes sense
for redistribution of course, but there is no evidence the files were
not kept around as-is, which brought me to the central question of
this post:

How was it possible to efficiently edit files bigger than a megabyte on a PDP-10?

Remember that, at the time in question, these systems supported at most 4
megawords of main memory (the PDP-10 is a 36-bit machine, and one usually
packed five 7-bit ASCII characters into a word) and 262 kilowords per
process, so simply loading the file fully into memory was impossible. A
smarter approach was needed. Earlier editors on WAITS
(including one based on an even older editor) had
to write the file with the changes you made into a different output
file which was then moved to overwrite the original contents.
Of course, this had the disadvantage that saving big files was very slow,
as rewriting a megabyte file easily could take several tens of seconds.
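
To make the numbers above concrete, here is a small sketch of the packing and the size arithmetic. The function names are made up for illustration, and it assumes that "262 kilowords" means 2^18 words and that a megabyte is a million characters; the source only gives the rounded figures.

```python
def pack_word(chars: str) -> int:
    """Pack up to five 7-bit ASCII characters into one 36-bit word.

    The characters fill the high 35 bits from left to right;
    the lowest bit is left over and stays 0.
    """
    if len(chars) > 5:
        raise ValueError("at most five characters fit in one word")
    word = 0
    for ch in chars.ljust(5, "\0"):  # pad short input with NULs
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not 7-bit ASCII")
        word = (word << 7) | code
    return word << 1  # 5 * 7 = 35 bits used, one spare bit


def words_needed(n_chars: int) -> int:
    """Number of words required to hold n_chars characters."""
    return -(-n_chars // 5)  # ceiling division


PROCESS_LIMIT_WORDS = 2 ** 18  # assumption: "262 kilowords" = 262,144 words

texbook_words = words_needed(1_500_000)  # the 1.5 megabyte TeXbook source
tex82_words = words_needed(1_000_000)    # the 1 megabyte TeX82 source

print(texbook_words, texbook_words > PROCESS_LIMIT_WORDS)
print(tex82_words, PROCESS_LIMIT_WORDS - tex82_words)
```

Under these assumptions the TeXbook source alone needs more words than one process could address, and the TeX82 source would leave only a small remainder for the editor itself.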

The most popular editor of WAITS, called E
(a manual is available as PDF),
had a better approach:
big text files were split into pages, separated by form feeds (^L),
and the editor needed to load only some of these pages into main memory.
It was recommended to keep pages under 200 lines, which is roughly
3 kilowords.
Finally, E edited files in-place and only wrote out changed pages,
so fixing a single character typo, for example, just required a quick write.

In order to know where the pages start, E maintained a directory page,
which was the first page of a file, starting with COMMENT (the SAILDART
files mentioned above show examples), followed by
the list of pages and their offsets.
Thus, seeking to a specific page
was very quick, and the directory page doubled as a table of contents
for bigger files, which improved the user experience.
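
The idea of a page directory can be sketched in a few lines. This is not E's actual directory format, which is not reproduced here; the function names and the use of character offsets in an in-memory string are assumptions made for illustration.

```python
FORM_FEED = "\f"  # ^L, the page separator described above


def build_directory(text: str) -> list[int]:
    """Return the offset at which each page starts."""
    offsets = [0]
    pos = text.find(FORM_FEED)
    while pos != -1:
        offsets.append(pos + 1)  # the next page begins right after the form feed
        pos = text.find(FORM_FEED, pos + 1)
    return offsets


def read_page(text: str, directory: list[int], page: int) -> str:
    """Fetch one page (numbered from 1) using only the directory."""
    start = directory[page - 1]
    # A page ends just before the form feed that precedes the next page,
    # or at the end of the text if it is the last page.
    end = directory[page] - 1 if page < len(directory) else len(text)
    return text[start:end]


sample = "COMMENT directory\n\fpage two\n\fpage three\n"
directory = build_directory(sample)
print(directory)
print(repr(read_page(sample, directory, 2)))
```

With the offsets at hand, reaching page 2 needs no scan through page 1. On a real file the same lookup would turn into a seek to the stored offset.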

This directory page was part of the file. Compilers and tools had
to be adjusted to ignore it if needed (well, FORTRAN ignored lines
with C anyway…), but for example TeX
had modifications (see
“Reading the first line of a file”) to skip the directory page.
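
What such an adjustment amounts to can be shown with a short sketch. This is not the code TeX used; it is a hypothetical filter that assumes a directory page is recognizable by a leading COMMENT and ends at the first form feed.

```python
def strip_directory_page(text: str) -> str:
    """Drop the first page if it looks like an E directory page."""
    if not text.startswith("COMMENT"):
        return text  # no directory page: leave the input untouched
    end = text.find("\f")
    if end == -1:
        return text  # a single page only: nothing can be skipped safely
    return text[end + 1:]  # continue right after the form feed


with_directory = "COMMENT list of pages\n\f\\input manmac\n"
print(repr(strip_directory_page(with_directory)))
```

A tool written this way reads files with and without a directory page alike, which is the property the modified programs needed.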

This all sounded plausible to me, until I realized that it would not
work for actual editing, because you of course not only overwrite
characters, but also insert a word or a line here or there when you
are working on a big program. E would still have to rewrite the whole
file after the insertion, I thought!

So I dug deeper and realized I had to rethink some implicit
assumptions I had from using Unix-like systems for the last 20 years.
On Unix, a file is a linear stream of bytes: you cannot insert data in
the middle without rewriting the file until the end.

However, WAITS used a record-based file system.
We can read about it in the documentation on
UUOs (what a Unix user would
call syscalls):

A disk or dectape file consists of a series of 200 word records.
Often, these records are read (or written) sequentially from the
beginning of the file to the end. But sometimes one wishes to read
or alter only selected parts of a file. Three random access UUOs
are provided to allow the user to do exactly that. To do random
access input or output, you must specify which record you want to
reference next. On the disk, the records of a file are numbered
consecutively from 1 to n, where the file is n records long.

This means that a WAITS file is a sequence of records, not a sequence
of bytes. And if a record was not full, it could be re-written with more
content! Of course, if you inserted so much you actually needed to
insert a new record, the file needed to be rewritten. This was called
“bubbling”, and E also did it in-place. But for small edits, rewriting
the records that contained the changed pages was enough.
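
The difference between a cheap edit and a bubbling one can be modelled with a toy file made of records. This is only a sketch: the record capacity is tiny on purpose and is not the WAITS record size, the names are invented, and counting everything from the edited record onward as rewritten is an assumption of this model.

```python
RECORD_CAPACITY = 16  # characters per record; deliberately tiny for the demo


def insert_text(records: list[str], index: int, offset: int, new: str) -> int:
    """Insert `new` into records[index] at `offset`, editing `records` in place.

    Returns the number of records that had to be rewritten.
    """
    rec = records[index]
    merged = rec[:offset] + new + rec[offset:]
    if len(merged) <= RECORD_CAPACITY:
        records[index] = merged  # fits into the free space of the record
        return 1
    # Overflow: split the content into records and shift the rest of the
    # file along, which the text above calls "bubbling".
    pieces = [merged[i:i + RECORD_CAPACITY]
              for i in range(0, len(merged), RECORD_CAPACITY)]
    tail = len(records) - index - 1  # records after the edited one
    records[index:index + 1] = pieces
    return len(pieces) + tail


records = ["hello ", "world"]
print(insert_text(records, 0, 6, "big "), records)
print(insert_text(records, 0, 10, "wonderful "), records)
```

The first insertion fits into the slack of its record and touches only that record. The second one overflows and forces every later record to move.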

I think the record-oriented file system of WAITS was actually key to
supporting the editing of big files in this environment. Other systems at the
time did not support this as well: Unix ed loaded the whole file into
memory and wrote it out again, and the Unix sources consist of many small
files not larger than 1000 lines. On ITS, the only bigger files I could find
were assembled from other inputs, or mail archives which were only
read or appended to, but not modified inside.

However, as larger memories became feasible, all these optimizations
became obsolete. All modern text editors load files directly into
memory and rewrite them when saving.

The other thing I found that amazed me was how much the E command set
influenced Emacs! Richard Stallman saw the E editor in action and
wanted a real-time screen editor for ITS as well, so their TECO got a
full screen mode. I think that Emacs’ choice of modifier keys
(Control, Meta, or both) and
things like prefix arguments are directly taken from E.
However, E was still fundamentally line-based and
only supported interactive editing of whole lines (reusing the system
line editor for performance reasons).
TECO was stream-oriented and then supported direct editing on the screen.

Digging through the SAILDART archives, and then looking into fragments
of ITS for comparison also showed interesting cultural differences:
WAITS used mechanisms for accounting disk space and CPU usage, and
projects had to be registered to be paid for (I have not heard of any
such features for ITS). WAITS required logins with passwords from
remote connections (this was added to ITS very late). The ITS
documentation is full of slang and in-jokes. But not everything was
serious on WAITS: SPACEWAR was very important and there are references
to it all over the place.

There are many interesting things to be found in SAILDART;
I recommend that you look around for yourself.

If you are curious now, it’s actually possible to run WAITS on your
own machine! Grab SIMH and a
WAITS image and you can get it
running pretty easily. I recommend having a
Monitor Command Manual at hand.
(Also note that currently there is a bug in the SIMH repository which makes
graphical login impossible. I can vouch that commit c062c7589 works.)

Thanks go to Madeline Autumn-Rose, Richard Cornwell and Shreevatsa R for helpful comments. All errors in the above text are mine; drop me a mail if you have a correction.

NP: Bob Dylan—I Contain Multitudes
