Behind the Scenes at the WSJ Interactive Edition
By Liora Alschuler
04/01/1997
The Seybold Report on Internet Publishing (COPYRIGHT 1997 Seybold Publications Inc.) Copyright 1997 Information Access Company. All rights reserved.
Visiting the Dow Jones offices where editors create the Interactive Edition of the Wall Street Journal, we found a good example of how to build a contemporary editorial system for online news publishing. Mixing off-the-shelf and home-grown components, the system shows how structured markup and WYSIWYG text editing can fit together in the automation of an online newspaper.
Perhaps history's lead for the Wall Street Journal Interactive Edition will always be that it was the first online paper to give nothing away free, but when we visited its offices recently we found more of interest than just the price of admission. Behind the toll gate, Dow Jones has built an editorial production system that is easy to use, gives a high degree of control to writers and editors, carries over to the little flickering screen a good measure of the elegance of print typography, and builds a war chest of reusable content that will last long after the landfills have turned the paper's folios into a reusable substance.
The Interactive Journal, as its staff call it, has been shaped by two forces: the print paper and the long history of online publishing by Dow Jones. On the one hand is the legacy of the print edition, with its delicate type and high (some say highest) premium on the quality of the written word. The retro refusal to go the way of USA Today must strike a responsive chord in readers-the paper was one of few major dailies to increase its circulation in 1996. On the other hand, Dow Jones Newswires puts a premium on instant delivery of data, with little regard for presentation and a minimum of editorial intervention.
The crew that started the Interactive Journal has roots in both the paper and online database publishing. One of the key, early decisions was to send Alan
Karben, then a graduate student, to the GCA's
annual sgml conference in 1993. On his first business trip,
Karben, in his words, "got both the practical and the
religious sides of SGML." On his return, he found a receptive
audience. Neil Budde, editor of the Interactive Journal, had
come to the project from Dow Jones News Retrieval. Budde
immediately saw the benefit of searching for a byline inside
Given this mandate, Karben, who now works for Dow Jones
full time, created an innovative, sleek editorial system, one
that designers of Web publishing tools and systems would do
well to examine.
As readers of Seybold publications are aware, the bedrock
of sgml is the separation of content and structure from the
codes that specify format and representation. It is a
separation not often made in newspapers. While the Interactive
Journal editorial system leverages this in many ways-and the
staff received a payback for it almost immediately-the
customized version of Word renders a nearly wysiwyg version of
how the story will look online. The composition end of the
system uses article metadata such as placement and article
type to impose hundreds of variations in style. Writers can
preview how the piece will look in any section of the paper at
any time before filing it. The end result is that the
structured markup with layered templates and style sheets
renders a greater control over the final html than would be
possible working in html directly.
A look at the product
From the outset, Budde and fellow editor Rich Jaroslovsky did not intend to
duplicate the print product. Their objective was to use the
online medium to bring the paper's editorial excellence and
breadth of coverage to a new audience and to enhance and
expand their coverage in ways appropriate to the new medium.
They wanted to create an online newspaper, not a library of
articles, so the look and feel of the Interactive Edition has
been a primary concern from the beginning. They have demanded
that, to the extent possible, the visual presentation not be
sacrificed for the ease of batch composition.
Starting in mid-1993, the team spent about a year planning
around proprietary client software. In early 1995, the
group shifted its plans to the Web while work on the editorial
and archival system continued uninterrupted. There was some
temptation at the time to move the entire project to html, but
they maintained their belief in the long-term advantages of
sgml. A prototype publication, called Money and Investing
Update, focused primarily on breaking business news and
updated market information, was launched in July 1995, and the
full Interactive Journal was inaugurated on April 29, 1996.
Standard pages, with personal options. Online subscribers
open the daily editions at the front page, where the familiar
"What's News" summaries are hyperlinked to the full stories.
Each section-Front, Marketplace, Money, Sports-has its own
"front page" with submenu and summaries linked to associated
articles.
Often a summary can show up on several pages, with each one
linked to a single master version of the full article. A
simple, hyperlinked table of contents gives an overview of the
paper.
The "interactive" portion of the title refers to the
Personal Journal, the Portfolio, the online discussion groups
and other customizable areas of the paper. In the Personal
Journal, the subscriber can set up a profile of stories ranked
by interest according to key words, company names and Journal
features. Selecting the Personal Journal displays the list of
recent articles that match the profile. Note that articles
come from both Dow Jones and the Wall Street Journal,
including its European and Asia editions, as well as from
special Interactive Journal features.
Subscribers also have the option of setting up a portfolio
that tracks and reports on up to 30 stocks and mutual funds.
Other sections of the paper come directly from data feeds. In
addition to a 14-day text-search archive of Journal and Dow
Jones stories, subscribers to the Interactive Journal get
access to the Dow Jones News Retrieval database, although
direct links between the Journal articles and the database are
limited.
24 hours, 365 days, 20 markets. Unlike print editions, for
which writers and editors have one, maybe two deadlines a day,
the Interactive Journal is in a near-constant state of
renewal. It goes through a complete roll-over in the early
hours of the morning, when the front page banner date changes
and multiple stories are swapped in and out. But this new
"edition" is never static. There is an ongoing rolling in of
content, as third-shift editors insert news on the Far Eastern
markets, which are in full swing while New York sleeps. News
that breaks in the morning that won't appear on newsstands for
24 hours is brought online as quickly as possible.
While some online papers shy away from scooping their paper
counterpart, the Interactive Journal is delighted to get the
news out fast. At times, the Interactive Journal will develop
and publish a story while the paper reporters are still hours
away from filing parallel stories. Once the print story is
complete, the Interactive Journal will replace its original
coverage with the later story.
The Interactive Journal is an international service, with a
staff of about 40 reporting and updating stories on 20 global
markets every day. While the print Journal draws from the same
sources, its news hole for global markets is only 20-25 inches
and is localized for each edition; print subscribers rarely
see the full scope of coverage.
System design
There are three primary components to the Interactive
Journal's editorial system:
* Microsoft Word customized with macros, templates and
keyboard shortcuts for assigning stories and summaries to
sections of the publication;
* Edition Maintenance, a database application that keeps
track of all of the pieces associated with each day's edition
and positions stories and summaries in each edition.
* A series of conversion routines that take the rtf through
two styles of sgml, parse it, archive the sgml and apply html
formatting to create the final output sent to the Web server.
In addition to these components, there is an underlying
database, called Copy Flow, that tracks slug, story type,
section desk and revision times and manages
check-in/check-out. Copy Flow was designed under the auspices
of the Dow Jones Global News Management System Team, which
includes integrator EDS, for the print edition, but the
Interactive Journal is the first group to use it in
production. (The print publication staff will make the
migration at some point, when Dow Jones replaces the current
IMOS and CSI systems with the new one under development.)
Structured editing with feedback. Writers and editors work
in a heavily customized, keyboard-friendly version of
Microsoft Word. They use templates with paragraph and in-line
styles named for the type of content they contain, and they
make limited use of hidden text for layout instructions. Users
can show or hide tags.
The word processor is optimized for fast keyboard entry of
precise, structured markup. To link to a profile of a company
mentioned in text, for example, the writer highlights the name
in text and uses a keystroke combination or a toolbar icon to
invoke the Link To Snapshot dialog. The writer inserts the
ticker symbol (if there is one) and accepts or changes the
significance ranking, which can determine the relevance
ranking a particular story receives in a search. A similar
dialog speeds byline entry. Writers can link their articles to
other current stories, archived stories, urls or other points
within the current story. Comments between writers and editors
are stripped out before the story is archived and published.
Story metadata are entered into a Document Attributes
dialog that classifies the story and captures the information
required for routing it into a user's Personal Journal. As the
writer selects categories, starting with Section on the left,
Page and Type are populated with defaults. The writer can
accept, augment or override these defaults. Writers who know
the two-letter industry codes for their beat can use the
type-in Rapid Industry Entry field. This information is
exported with the rtf within the Summary Info carried by all
Word documents.
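The default-then-override behavior of the Document Attributes dialog can be sketched in a few lines of Python. This is only an illustration: the section names, default pages and types, and the industry code below are invented, and the real dialog is built in Word Basic, not Python.

```python
# Hypothetical defaults table: section -> (default page, default article type).
# None of these values come from the Interactive Journal's real configuration.
SECTION_DEFAULTS = {
    "Marketplace": ("Marketplace Front", "News Story"),
    "Money": ("Money Front", "Markets Report"),
}

def document_attributes(section, page=None, article_type=None, industry_codes=()):
    """Return story metadata, filling Page and Type from the section defaults
    unless the writer supplied an override."""
    default_page, default_type = SECTION_DEFAULTS[section]
    return {
        "section": section,
        "page": page or default_page,
        "type": article_type or default_type,
        # Rapid entry: two-letter industry codes typed directly by the writer.
        "industries": [code.upper() for code in industry_codes],
    }

accepted = document_attributes("Money", industry_codes=["bk"])
overridden = document_attributes("Money", article_type="Column")
print(accepted)
print(overridden)
```

Choosing the section first keeps the common case to a single selection, while the keyword arguments stand in for the writer augmenting or overriding a default.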
Writers are more likely to add information (Industry Type)
that is not part of their narrative if it has a direct bearing
on the usability of the story. Links to company profiles,
which are prominent in the online edition, are used
frequently, while glossary entries, which won't be implemented
until browsers have an easy way to do popup windows, are
largely ignored.
The on-screen format in Word mimics the look of the final
html pages. Karben has supplied an additional element of
feedback reminiscent of Passage Systems' Passage Pro. Writers
can click a preview button and get a very close simulation of
how the story will look in a browser, with live icons and
links between articles and summaries.
Tables, according to Karben, were the trickiest part of
converting to sgml. The rtf table model is a limiting factor,
but Karben has given writers the ability to put a paragraph or
series of paragraphs within a cell, including soft returns,
pictures and pictures with captions. A true sgml editor might
provide additional capability, such as nested lists, but given
the Journal's limited use of tables, so far Word has been
sufficient.
Placing stories. When the author is done with a piece, it
is exported to Edition Maintenance, where it shows up on the
list of Ready to Place articles. Edition Maintenance consists
of a Visual Basic front end on top of a UniSQL database
server. A slot editor then places the article in the Working
Edition according to section, page, and column. Placing the
story is a drag-and-drop operation onto the hierarchical tree
of the Working Edition. The slot editor can preview the
Working Edition and prepublish an individual page, section or
the entire edition.
When the slot editor is ready to update the online edition,
he publishes it and it becomes the current online version.
Once a day, when printed Journal content is published for the
first time, the slot editor works from the print lineup.
During most hours, the slot editor works directly in Edition
Maintenance. In the evening, when a huge mass of material is
moving through the system, a news assistant has physical
control of placement.
SGML in two steps. Dragging a story from the Ready-to-Place
list to the hierarchical edition structure triggers the first
conversion from rtf to a simple sgml document type dubbed Dow
Jones Markup Language or DJML-Lo. Parsing errors detected
during the OmniMark conversion are reported to the editor.
Karben uses conversion software from OmniMark Technologies to
rewrite some error messages for nontechnical users. (Some
contain his pager number.) In all cases, the parser returns
the line of text that caused the error.
* * *
Sample parse error (Courtesy Alan Karben)
One line of this article did not translate into "Valid"
DJML.
In this part of the document, you are not allowed to have a
BREAK element.
* * *
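To illustrate how a raw parser complaint might be rewritten into a message like the sample above, here is a minimal Python sketch. The production system does this in OmniMark; the ParseError record, its fields and the example story line are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass
class ParseError:
    """Hypothetical record of one validation failure."""
    element: str    # element the parser rejected, e.g. "BREAK"
    line_text: str  # the line of story text that caused the error

def friendly_report(errors):
    """Build a plain-language report for a nontechnical editor."""
    count = len(errors)
    noun = "line" if count == 1 else "lines"
    parts = [f'{count} {noun} of this article did not translate into "Valid" DJML.']
    for err in errors:
        parts.append(
            "In this part of the document, you are not allowed to have a "
            f"{err.element} element."
        )
        # Always echo the offending line so the editor can find it in Word.
        parts.append(f"Offending text: {err.line_text}")
    return "\n".join(parts)

report = friendly_report([ParseError("BREAK", "Shares rose in early trading.")])
print(report)
```

Returning the offending line with every message mirrors the behavior described above, where the parser always hands back the text that triggered the error.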
Immediately following the conversion to DJML-Lo is a second
conversion to DJML-Hy (for HyTime, the ISO standard for sgml
linking and entity management). DJML-Hy substitutes entities
(e.g., %plus;) for special characters (+) and ids for path
names and removes all tag minimization. Karben chose HyTime
conventions because they describe one-to-many links and make
it easier to manage entities. The subobjects in the Edition
Maintenance application are taken directly from the sgml
entity files.
Converting and parsing a document takes about 10 seconds on
the Sun Solaris server. The most common error is the
accidental deletion of a portion of the hidden text making it
impossible to supply a complete set of tags during conversion.
Karben reports that about three to four parse errors occur in
a typical 24-hour period and that most can be handled
routinely by the slot editor.
One change Karben would make if he were redesigning the
system today would be to invoke this conversion on the move
into Ready To Place instead of during placement in the
edition. Earlier conversion would ensure that the original
author was on hand to correct the error and would speed up the
placement process. Karben noted that the general design
principle ought to be to convert to sgml as soon as possible
and leave it as late as possible.
Karben said he was "looking for what was neat in the
logical sense and was not jaded by what was available in
current tools." He was confident that he could build anything
required to "make the data do wonders." He hopes that the xml
standard will encourage the creation of new tool sets that
take advantage, as he has, of sgml/HyTime linking mechanisms.
(For more on XML, see story on page 3.)
The resulting sgml archive, stored in the UniSQL database,
contains the DJML-Hy markup but excludes the temporal pieces
of the story such as requests for comments and pointers to
online discussions. While most of the paper's pages are placed
on the Web server as html, the individual views of the paper,
such as the Personal Journal, are created dynamically from the
sgml whenever the user invokes the custom pages.
On the preprocessing, authoring side, the conversion is
done using programming tools from OmniMark Technologies. On
the searching and Personal Journal side, Karben uses the set
of Perl libraries put into the public domain by David
Megginson of the University of Ottawa
(www.uottawa.ca/~dmeggins). These libraries read the output of
nsgmls, the command-line parser built on the SP parser from
James Clark (www.jclark.com). The implementation required an
enhancement in SP for HyTime entity management, which Clark
provided.
DJML does not use the News Industry Transfer Format (nitf)
document type created by the International Press
Telecommunications Council (www.iptc.org/iptc/). Nitf was
designed for transmission between news agencies and as such
has no element names for article, page and section. Karben
believes that as long as Dow Jones can translate to and from
the industry-standard document type, it loses nothing by using
its own document type definition, created with help from sgml
consulting firm Martin Hensel Corporation.
At present, the online editorial system loses all coding
from the print side and from the wire services. When the wire
services implement an sgml header, Karben's group will be able
to take direct advantage of this, yet still add its own tags.
It would be a tremendous advantage to know bylines, company
names and headlines instead of using macros to make a best
guess, which is what they are forced to do now. Karben's
advice to developers is to think of their own content and to take
advantage of the great translation tools available and the
ease of translation between different forms of sgml. Ideally,
they should capture at the time of creation everything needed
to describe all the content of the article.
Turning words into home pages
With richly tagged sgml files as his source, Karben creates
html with the use of down-translation scripts, also written in
OmniMark. Rendering from an sgml source yields consistent html
formatting without tedious hand tagging or complex
manipulations. The translation software, written by Karben,
knows the page, article and intended placement and applies one
of ten common templates. Each template has html boilerplate
that supplies header, footer, gifs and other standard page
features. In this way, the editorial system renders 350
distinct article types, all within an easily controlled
stylistic vocabulary. On an editor's preview screen, a single
summary is rendered three different ways according to
placement.
A single template can render a wide variety of styles
contoured to fit the specific section and page. Each template
has variables for section name, color scheme and other
distinctive features. The conversion program pulls the
context-appropriate values from three tables that correspond
to the article type, page and section. The page type (e.g.,
Review & Outlook) determines formatting characteristics,
such as column separators, column widths, headline
suppression, column and summary logos, and whether ads are
allowed. The article type (e.g., Heard on the Street)
determines icons, headline size, logos and other
characteristics.
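The table-driven approach described above can be sketched as follows. This is a minimal Python illustration, not Karben's OmniMark code: the table contents, variable names and HTML boilerplate are all invented, and only the idea of one template filled from three lookup tables comes from the description.

```python
from string import Template

# Three lookup tables keyed by section, page type and article type.
# Every value below is a made-up placeholder.
SECTION_TABLE = {"Marketplace": {"section_name": "Marketplace", "color": "#003366"}}
PAGE_TABLE = {"Review & Outlook": {"column_width": 300, "ads_allowed": False}}
ARTICLE_TYPE_TABLE = {"Heard on the Street": {"headline_size": 4, "logo": "heard.gif"}}

# One shared template; the variables are filled in from the tables.
PAGE_TEMPLATE = Template(
    '<body bgcolor="$color">\n'
    '<img src="$logo" alt="$section_name">\n'
    '<font size="$headline_size">$headline</font>\n'
    '<table width="$column_width"><tr><td>$body</td></tr></table>\n'
    "</body>"
)

def render(section, page, article_type, headline, body):
    """Merge the context-appropriate values and fill the template."""
    values = {}
    values.update(SECTION_TABLE[section])
    values.update(PAGE_TABLE[page])
    values.update(ARTICLE_TYPE_TABLE[article_type])
    values.update(headline=headline, body=body)
    # Keys the template does not use (such as ads_allowed) are ignored.
    return PAGE_TEMPLATE.substitute(values)

html_out = render("Marketplace", "Review & Outlook", "Heard on the Street",
                  "Cable Deal Nears", "Story text.")
print(html_out)
```

Because the styling lives in the tables, a change such as a new color scheme means editing one table row, not every template.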
Karben explains that keeping the number of basic templates
small makes the system easy to update and maintain. Recently,
the paper contracted with the DoubleClick Network to manage
all of its online advertising placement. Replacing the local
ads with the DoubleClick source was accomplished in less than
a day for all areas of the paper.
Editors can customize the layout of special features with a
specialty template. They also have write permission for the
published html pages, but rarely use it. When a writer wants
something new, Karben reports, it is usually for the sake of
consistency, such as a new separator or the application of
asterisks or dashes to match an existing feature.
Rendering from sgml also means that the layout can
associate a picture with its caption and move and place the
two as a single unit.
Unfortunately, even with this attention to detail in
formatting, we find the screen remains a tedious medium for
extensive reading, and the online paper's mimicking of the
print paper's column widths accentuates that weakness. The
Interactive Journal does use Cascading Style Sheets, so
readers can benefit wherever CSS is supported, but at present
the browsers do not provide a convenient mechanism for
subscribers to override the styles with their own, such as
wider column widths.
The searching advantage. Consistency and automation are not
the only benefits of table-driven markup from a rich source;
searching is also improved. Karben provided this example.
Compare a byline in DJML:
* * *
<BYLINE>By <AUTHOR>Mark Robichaux</AUTHOR>
<CREDIT>Staff Reporter of <TITLE>The Wall Street Journal</TITLE></CREDIT></BYLINE>
* * *
with the html it gets converted to:
* * *
<B>By M<FONT size=-1>ARK</FONT>
* * *
which renders:
By MARK ROBICHAUX
Rendering from an sgml source means that author Mark
Robichaux is searchable as a single string, which may not be
possible if the source is littered with formatting codes. It
also means that once browsers get hip to more of the finer
points of rendering, the small capitals can be applied with a
single format command, as they would be in any reasonably
adept composition system (or even Microsoft Word).
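The search advantage can be demonstrated with the byline example above. The Python sketch below is illustrative only: it treats the DJML fragment as XML, which works here because this particular snippet happens to be well formed, whereas real SGML would need an SGML parser such as nsgmls. The HTML string is reproduced only as far as the article shows it.

```python
import xml.etree.ElementTree as ET

djml = (
    "<BYLINE>By <AUTHOR>Mark Robichaux</AUTHOR>\n"
    "<CREDIT>Staff Reporter of <TITLE>The Wall Street Journal</TITLE>"
    "</CREDIT></BYLINE>"
)
# HTML fragment exactly as quoted in the article (it is cut off there).
html_fragment = "<B>By M<FONT size=-1>ARK</FONT>"

# Structured source: the author is one element, so lookup is direct.
author = ET.fromstring(djml).findtext("AUTHOR")

def raw_contains(markup, name):
    """Case-insensitive substring search over the raw markup."""
    return name.lower() in markup.lower()

print(author)
print(raw_contains(djml, "Mark Robichaux"))
print(raw_contains(html_fragment, "Mark"))
```

In the formatted HTML the font tag falls between the "M" and "ARK", so even the first name no longer exists as a contiguous string, while the DJML source carries the full name inside a single AUTHOR element.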
Conclusions
The Wall Street Journal Interactive Edition has created an
editorial system in which the professionals at the center of
the enterprise, the journalists and editors, can continue to
work much as they always have. Differences in editorial
requirements, such as the need for tighter classification of
stories, are imposed by the new medium, not by the technology
used to create it. It is pleasing to see a system that uses
structured markup and yet stays within the reach of an
editorial staff that considers itself the best in the world
and doesn't give a hoot for whatever is under the hood. For
writers and editors, the greatest difference lies in the tempo
and timing of their day-they are always on deadline.
As writers work, the Web technology is mostly hidden, but
they can be called on for front-line troubleshooting. Budde
and Jaroslovsky compare this stage of technology and interface
design to the Model T stage of the automobile: "You didn't
have to be a mechanic to drive, but it helped if you could
open the hood and tell the carburetor from the gear box."
The Interactive Journal extends the Journal's rich
typographic tradition into a new medium without creating a
facsimile of the print edition. The firm has achieved this
with a composition system driven by structured markup and
batch composition that nevertheless preserves much of the
interactivity of desktop publishing. Writers and editors
determine the final look of a story by the way they define it
and place it in the edition, and they can check their results
with an immediate preview.
The control over the final look of the page through
predetermined, context-driven templates has been achieved
without sacrificing the ability to tweak it as needed and
without compromising the paper's ability to take advantage of
improvements to the Web browsers that ultimately format the
material for the reader. The structured approach does impose
some controls compared to the freedom of building pages from
scratch, but Budde and Jaroslovsky want their writers to focus
on words and logical links, not dropped caps. Given that
journalists' overriding concern is getting the words right and
only when that is accomplished (on deadline) do they care if
the headline fits, we think the Interactive Journal has
provided a good balance among ease of entry, control over
hyperlinking and metadata, and fidelity to the composed
screen.
Little downside. There is a price, however, for in-house
development. Karben has stretched Word Basic to its limits,
and admitted he is probably doing a little more than he should
do with it. He's found that Word Basic cannot nest more than
four subroutines and that there are limits to the macro
interpreter. He can't create a popup dialog box from a popup
and, of course, can't catch parsing errors until they are
further downstream.
More importantly, Dow Jones is now responsible for
maintaining and upgrading the one-off configuration management
applications developed by EDS (Copy Flow and Edition
Maintenance), in contrast with Word, which is already two
upgrades past the 6.0 version in use at the Interactive
Journal.
Another area that might be improved is indexing and
retrieval. The current keyword entry with the Verity search
engine seems more primitive than it need be, given the
sophistication of the paper and archive. A Personal Journal
view pulls from both Dow Jones News Retrieval and the
Interactive Journal, but there is insufficient filtering of
duplicates, so the reader can be inundated with multiple
copies of the same story. It also would be nice to see Dow
Jones leverage its own classification scheme by giving the
reader direct access to the same categories in conducting
searches, or to make use of sgml tags in the searches. Lastly,
it also would be nice if the user could point to a story of
interest and say, "Give me more that are like this," a
technique that other engines offer. The new version of Verity,
Karben hopes, will allow tighter coupling with the sgml source
markup, and, should the paper switch to another indexing tool,
the change could be made without disrupting other aspects of
the system.
Strong upside: SGML payback. In this era of Quark
pagination it is highly unusual for a newspaper editorial
system to generate an sgml repository. But without the
constraints of editing to fit, the Interactive Journal's
experience clearly shows that it can pay off for online news
publications. The rich markup aids the automation and
consistency of the formatting process, and makes it much
easier to support alternative presentations. For example,
Interactive Journal subscribers can get personalized content
"pushed" onto their screens using an After Dark screen saver
that is updated automatically. The translation from DJML to
the HTMx format used by After Dark for its screen savers is
quick and simple.
Karben has also worked on enhancements that make it easier
to use the Interactive Journal with pwWebSpeak's nonvisual web
browser. All he had to do was insert
Equally important, Dow Jones has since ported the sgml to
several spin-off products and media, deriving additional
revenue at a lower production cost than would otherwise be
possible. For the past few weeks, the Interactive Journal has
delivered its content around the clock for broadcast over the
PointCast Network. Recently PointCast upgraded its client; an
Interactive Journal channel is included.
Budde and Jaroslovsky decided to stick with sgml as long as
they could make it work and make an editorial system that
their editors could live with, that fit within the established
workflow. They credit Alan Karben for continuing
to find creative ways to do so. They have pressured both
sides-Karben to come up with less intrusive technical fixes,
and the editorial staff to slow down and add the information
required to take advantage of the power of the new medium. In
this way, they avoid two of the usual downsides to sgml-the
awkwardness of the editorial environment and the disconnect
between the writer's screen and the final layout.
Having already ported to several browsers, the team is
ready for the next media platform, whether it is airplane
seat-back displays or heads-up eyeglasses. When the time
comes, Alan Karben is confident that they can
get there from sgml.
Interactive Journal Editorial System Components
Editorial platform: Microsoft Windows 95 or 3.1
Editorial workflow: Copy Flow, in-house system written by EDS
Placement: Edition Maintenance, in-house application written by EDS in Visual Basic on top of a UniSQL database server
Text editor: Microsoft Word, 16-bit, heavily customized using Word Basic
Conversion and validation: OmniMark from OmniMark Technologies; SP, nsgmls from James Clark; Perl libraries from David Megginson
Ninety Percent Empty or Two-Thirds Full?
The week we began this story, a New York Times headline
read "700 Newspapers to Read Online; Only One Charges for
Everything." The point of the article was that with the
imposition of the universal toll, the audience for the Wall
Street Journal Interactive Edition had largely evaporated,
going from roughly 700,000 free-loaders to 70,000 paid
subscribers. Even more alarming, the Times reported that some
online subscribers had pulled their print subscriptions, thus
"cannibalizing" the audience for the print edition.
Rather than look at this as a Niagara-like fall-off,
Interactive Journal editors Neil Budde and Rich Jaroslovsky
present an alternate view: Not all of the 700K were readers;
many were surfers. On a really good day, before the toll gate
went up over the entrance, the site had about 45,000 visitors.
Today, it has about 30,000 a day, turning a 90% reduction into
a 33% reduction. (The Times, in contrast, has about 1 million
registered users, about 70,000 of whom use the paper daily.)
Jaroslovsky suggests yet another way to look at the same
figures: If the 700,000 are viewed not as subscribers to a
free service but as recipients of a direct mail promotional
piece, the 10% who paid represent a triumph of marketing.
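The three readings of the same figures come down to simple arithmetic, worked through in the short Python sketch below. The numbers are the ones quoted above; the function and variable names are ours.

```python
def percent_drop(before, after):
    """Percentage decline from 'before' to 'after'."""
    return 100 * (before - after) / before

# The Times's framing: registered free users versus paid subscribers.
registered_drop = percent_drop(700_000, 70_000)

# The editors' framing: best-day visitors before the toll versus daily visitors now.
visitor_drop = percent_drop(45_000, 30_000)

# Jaroslovsky's framing: paid subscribers as a direct-mail response rate.
response_rate = 100 * 70_000 / 700_000

print(round(registered_drop, 1))  # 90.0
print(round(visitor_drop, 1))     # 33.3
print(round(response_rate, 1))    # 10.0
```

The subscriber count is the same in all three cases; only the choice of baseline changes.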
The editors see the Interactive Journal as neither a
cannibal nor a clone of the print parent. They want to create a
publication of substance that will attract new readers and
extend the overall franchise of the paper. The profile of the
Interactive Journal reader is younger and more high-tech than
the audience for the print Journal. As long as both
publications continue to grow overall, they see no need to be
concerned. Jaroslovsky's E-mail indicates that there are new
print subscribers being generated from the online product,
though not yet as many as drop print once they get the online
product.
Still, looking at the daily profit and loss statement, the
Interactive Journal, a separate business unit within Dow Jones
Interactive Publishing, is not in the black. It may seem
anomalous to demand a direct payback on the basis of one
product when it is, at the same time, building an archive that
will continue to spin off products and new media deliverables.
The expectation at Dow Jones, however, is that both print and
the Interactive Journal will be profitable, each in its own
sphere and each catering to distinct, albeit overlapping,
audiences.
The Wall Street Journal is not eager to replace the print
product (among other reasons, the paper owns its printing
plants), but if paper does go away, management doesn't expect
this change to happen overnight. Whatever the mix of
readership in coming years, as long as they continue to
expand, not shrink, the franchise, it will be to the company's
benefit.