Tuesday, April 19, 2016

Programmers in MOBI

Some time ago, I downloaded a PDF copy of Susan Lammers' book, "Programmers at Work". Although it is possible to read PDF files on a Kindle, the result is generally unsatisfactory, which is why I decided to convert the file to MOBI format (the Kindle's native format). Unfortunately, the conversion was not very satisfactory either: the formatting was mangled, and bold text in particular came out doubled and almost unreadable.

A few days ago, I wanted to read the chapters with Dan Bricklin and Bob Frankston, the writers of VisiCalc, in order to see whether they contained any background material suitable for my thesis. As reading the document was almost impossible, I decided to reformat the book (as I did a few months ago with an Ian Rankin novel). This time, I discovered some very useful tricks which greatly improved the format of the book; I want to share them here.

Chapters: we all know what a chapter is, but the conversion from PDF to MOBI (and thence to EPUB for editing) totally ignores them. The book was divided into nine arbitrary files, two of which contained metadata and seven the text. For a chapter to appear correctly in MOBI format, it must be stored in its own file; the file should begin with metatext, which unfortunately I can't show here (as it doesn't display). This metatext does two things: it sets up an id, which is needed for the table of contents, and it defines the style sheet (which enables formatting such as bold and italics). After editing, the book now contains twenty-one files: two of metadata (one is the table of contents) and nineteen containing text, one file per interviewee.
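The exact metatext used for this book isn't reproduced here, so what follows is only a minimal sketch of what a per-chapter file with those two ingredients might look like, assuming a typical EPUB XHTML layout. The function name `make_chapter`, the stylesheet path and the sample id are all hypothetical, chosen purely for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

# Skeleton of one chapter file. The <link> element points at the style sheet
# and the id on the heading gives the table of contents something to target.
# This layout is an assumption, not the book's actual markup.
CHAPTER_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>{title}</title>
  <link href={css} rel="stylesheet" type="text/css"/>
</head>
<body>
  <h1 id={anchor}>{title}</h1>
{body}
</body>
</html>
"""


def make_chapter(title, anchor, paragraphs, css="../Styles/stylesheet.css"):
    """Return the XHTML text for a single chapter (one interviewee per file)."""
    # Escape each paragraph so characters such as & and < stay valid XHTML.
    body = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)
    return CHAPTER_TEMPLATE.format(
        title=escape(title),
        css=quoteattr(css),        # quoteattr adds the surrounding quotes
        anchor=quoteattr(anchor),
        body=body,
    )


if __name__ == "__main__":
    print(make_chapter("Dan Bricklin", "bricklin", ["First paragraph of the interview."]))
```

A table of contents entry would then refer to the chapter by file name plus id, for example `bricklin.xhtml#bricklin` (again, a made-up name).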

Search and replace - a very important function when the text is mangled - can be set to work on all the files instead of only on the current file; this can save a great deal of time. There is also a command 'Beautify all files' which has no effect on the output, but can make the text files easier to read.
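The same "replace in all files" idea can also be scripted outside the editor. The sketch below assumes the chapter files have been unpacked into one folder as `.xhtml` files; the function name `replace_in_files` and the folder and pattern in the usage line are hypothetical.

```python
import re
from pathlib import Path


def replace_in_files(folder, pattern, replacement, glob="*.xhtml"):
    """Apply one regex replacement to every matching file in a folder.

    Returns a dict mapping file name to the number of replacements made;
    files without a match are left untouched and are not listed.
    """
    regex = re.compile(pattern)
    changed = {}
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8")
        new_text, count = regex.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed[path.name] = count
    return changed


if __name__ == "__main__":
    # Hypothetical usage: rejoin a product name that the conversion split in two.
    print(replace_in_files("Text", r"Visi\s+Calc", "VisiCalc"))
```

Since the files are overwritten in place, it is worth running something like this on a copy of the book first.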

Now that I have completed the editing, I can read the book properly, although I haven't actually done so yet. The little which I have read, however, is fascinating; one has to remember that the book is composed of interviews from 1985 with leading computer programmers. Some of what they say is prophetic, but they completely missed several ideas.

For example, there is an interview with John Warnock, a co-founder of Adobe Systems. Much of the interview is concerned with PostScript, which is (was?) a language for typesetting the output for laser printers. One of the main selling points of PostScript was that it was device independent. A few years later, Adobe would develop another language/product which was also device independent: the PDF document! There is no hint of this in Warnock's interview.

None of the interviewees mentioned the Internet, although several must have used an early version (the Arpanet) by the time of the interviews. The Internet gave birth to another device independent markup language - HTML - which is more ubiquitous than PDF. Even the EPUB format for electronic books is basically HTML - this is how I edited the book.

The Internet also quashed another blossoming idea mentioned by several interviewees - the CD ROM. People talk about CD ROM encyclopedias and interactive games stored on a disk - again, superseded by the Internet. No one foresaw the disk on key (USB flash drive) or cheap storage.

Closer to where I am professionally, no one mentioned ERP systems or anything similar. There were a few database products mentioned, but no Oracle, no Postgres and no SQL Server. Only in the final chapters are Turbo Pascal and Windows mentioned.
