The purpose of this article is to acquaint chemists with some basic features of
the World Wide Web (WWW), to instil the notion that there is something in the
WWW for
the chemical community and to encourage participation. It is not in any way a
definitive description of the WWW.
By now many will be aware of the existence of a computer network called the
Internet, and perhaps to some extent of the World Wide Web (WWW). The WWW was
first conceived at the European Laboratory for Particle Physics (CERN) in 1989
and first implemented in late 1990. It is all about accessing information on
remote computers. Until relatively recently, accessing information on other
Internet-linked computers has been perceived as only for wireheads typing out
bizarre and cryptic textual incantations on a keyboard throughout the night
until the early hours. If that was ever true, it is at least no longer
completely true.
Early mechanisms for transferring files from remote computers involved a
method called file transfer protocol (FTP) whose operation was not easy for
most people. A further disadvantage is that the contents of a text file cannot
be inspected without transferring the whole file to the user's own computer.
Another method of file access called GOPHER is more sophisticated and allows
the user to read text files, at least, during transfer, as well as allowing the
transfer of graphics files, application files, and formatted text files. These
days, implementations of FTP and GOPHER on sophisticated microcomputers such as
the Macintosh, using programs such as Fetch, Anarchie, and Turbogopher, are
superb, with these programs taking care of all the difficult commands for the
user.
At the heart of the WWW system is another file transfer method called HTTP
(Hypertext Transfer Protocol). At its simplest, the WWW system is a 'web' of
linked 'hypertext' documents. By hypertext, here we mean that clicking the
mouse on a highlighted piece of text contained in a document being displayed on
a computer screen causes the computer to retrieve and display some new document.
Its beauty is its interface. Once the user's computer is linked in some way to
the Internet (this is likely to involve the novice seeking help), the user can
make very effective use of the WWW system with little more knowledge of
computers than how to manipulate a mouse.
The new document might reside on a completely different computer to the first
document. This basic concept of hypertext is now extended so that the WWW has
evolved into an on-line hypermedia system. Hypermedia documents are hypertext
documents that contain, in addition to text, embedded graphics, movie clips,
and sound clips.
In order to read documents on the WWW, the user's computer must be connected to
the Internet. In many institutions such connections will already exist, while
the home user may have to resort to a modem connection. The user also requires
a computer program called a browser in order to view the WWW documents. Good
examples of such programs are NCSA Mosaic and Netscape Navigator. These are
available for many hardware systems (such as Macintosh, UNIX workstations, or
PC). For educational users at least, some of these programs possess the
endearing quality that, while copyrighted, a license to use them is often free.
In addition to coping with hypertext documents, many of these programs are also
capable of transferring files by FTP, GOPHER (and other important methods not
discussed here) using the same 'click-and-get' interface.
The documents which the user 'reads' are provided, or 'served', by 'server'
programs running on remote computers. These programs may be mounted on any of
several hardware platforms such as UNIX workstations, Macintoshes, or even PCs.
There is no need
for the browser program to reside on a similar piece of hardware to the server
program. This is a key feature - the transparency of the interface between
different pieces of hardware.
A document being read by a user will normally contain one or more 'hotspots'
(usually colour coded). Clicking the mouse cursor on a hotspot causes
something to happen. Typically, the user is taken automatically to a
different document. This is the basic concept of 'hypertext' and will be
familiar to those who have used 'HyperCard' or 'ToolBook' on Macs and PCs. The
key feature of a WWW hypertext document is its linking via these hotspots to
other documents which might well be written by completely different authors in
a different country. In a sense, hotspot linking in this fashion is rather like
having instant access to another document referenced in a footnote of the first
document.
There are other possibilities, however. Perhaps clicking on the hotspot causes
a file to be transferred (downloaded) automatically to the user's computer.
More advanced facilities allow the user to search a database for documents
containing specific words chosen by the user. The user is required to fill in
boxes in an on-line 'form' with a short piece of text and a remote computer
program does the rest. Even when searching databases on another continent, the
search normally only takes a few seconds. The user need not know how
the search works, and may not even know in which country the search computer is
located.
It is not uncommon to read suggestions that in some way on-line hypertext books
will replace printed books. This seems somewhat unlikely as a near-future
scenario, simply on grounds of convenience. However, on-line hypertext
documents do have some advantages over the printed word. Hypertext documents
contain text, but can also contain embedded graphics (such as reaction
schemes), movie clips (perhaps for animating molecular vibrations), and even
sound clips (good for commentaries). If nothing else, these embellishments can
make the document more interesting and hopefully informative. Hypertext
documents containing these features are referred to as hypermedia documents. At
the user's choice, the graphics are displayed automatically. The sounds and
movie clips are activated with a single mouse click on an appropriate hotspot.
Displaying and using these features is expensive in terms of transfer times,
since sound and graphic files are often large.
No programming experience is required in order to construct documents to be
placed on a WWW server. Documents on the WWW are written in a 'markup language'
called HTML (Hypertext Markup Language). These documents are plain text files
which any word or text processor is capable of writing to disk. The hotspots
and various formatting options such as headings or emphasized text are special
plain text strings (called tags or elements) contained between <angle>
brackets within the document. A number of on-line tutorials and guides for
writing HTML are available.
For instance, the start of a major heading (heading level 1) is signalled by
<H1> and terminated by </H1>. A short example of HTML text is given
in Fig. 1. Excellent on-line files are available which give advice on
preparation of HTML files. The tags are recognized by the browser program when
it comes across them, and dealt with in an appropriate fashion. So, headings
might appear larger and bolder than normal body text, for instance. The tags
are not displayed to the user; they are instructions to the browser program on
how to display the file.
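
The way a browser treats tags as instructions rather than as text to be shown can be sketched in a few lines of Python. This is only an illustration using the standard html.parser module, not a description of how any real browser is written. The sample HTML string is invented for the purpose (it is not the text of Fig. 1), and printing the heading in capitals merely stands in for the larger, bolder type a browser might choose.

```python
from html.parser import HTMLParser


class TinyRenderer(HTMLParser):
    """Collect the text a reader would see; tags only steer the formatting."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case, so <H1> arrives as "h1".
        if tag == "h1":
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            # Stand-in for "larger and bolder": capitals on a line of their own.
            self.parts.append(data.upper() + "\n")
        else:
            self.parts.append(data)


def render(html):
    """Return the visible text of a small HTML fragment."""
    renderer = TinyRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.parts)


# Invented sample, for illustration only.
sample = "<H1>The periodic table</H1>Hydrogen is the <EM>lightest</EM> element."
print(render(sample))
```

None of the angle-bracketed tags survive into the output; a different renderer would be free to make quite different choices for the heading.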
The publisher of a book has complete control over the appearance of the
publication, but this is not the case for a WWW publication. This feature of
the WWW takes a little getting used to. While the text content is fixed, the
browser, not the publisher, has control over the appearance of the document in
terms of, for instance, the text font, font sizes, some colours, and heading
formatting.
One of the most important constructions in HTML is the anchor tag - a text
string which defines a hotspot and contains information referred to as a
'URL'.
The URL (Uniform Resource Locator) specifies uniquely an object (such as
another file) on the Internet. In effect, the URL is the address of a document
on the Internet. The form of a URL is shown in Fig. 2.
The part of the URL before the colon (HTTP in this case) specifies an access
method or protocol. The part of the URL after the two slashes and before the
first single slash is the address of the machine on which the target document
is held. The remaining part is the directory address of the file on that
machine.
In the case of this URL, the browser would retrieve and display the 'home page'
of the Department of Chemistry at the University of Sheffield. The URLs
'gopher://acsinfo.acs.org/1' and 'gopher://jchemed.chem.wisc.edu/1' represent
the gopher addresses of the American Chemical Society gopher server and the
Journal of Chemical Education gopher server. While the documents at these
addresses can be read by WWW browser programs, gopher documents are not
hypertext documents. The URL 'telnet://bids.ac.uk/' activates an interactive
TELNET session to the BIDS system at Bath. The URL
'ftp://ftp.shef.ac.uk/pub/uni/academic/A-C/chem/' gives a list of files
available by FTP at the Sheffield Macintosh Archive of Chemistry Software.
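
The three parts of a URL described above (access method, machine address, and directory address of the file) can be picked apart mechanically. As a minimal sketch, the Python standard library function urllib.parse.urlsplit is applied here to URLs quoted in this article. The helper name describe is ours, chosen only for illustration.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into (access method, machine address, path on that machine)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# URLs taken from the text of this article.
examples = [
    "http://www.shef.ac.uk/uni/academic/A-C/chem/chemistry-www-sites.html",
    "gopher://acsinfo.acs.org/1",
    "telnet://bids.ac.uk/",
    "ftp://ftp.shef.ac.uk/pub/uni/academic/A-C/chem/",
]

for url in examples:
    method, machine, path = describe(url)
    print(method, machine, path)
```

The same splitting works whatever the access method, which is part of what lets one browser program handle HTTP, gopher, telnet, and FTP addresses through a single interface.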
The markup language HTML is an evolving standard. It seems that the next
version will be version 3. Standards for it are not yet fixed, making the task
of those who write browser programs more difficult. Currently, most browsers
support only some features of HTML version 3. The chemist has always placed
great demands upon
conventional typesetters, requiring a variety of special and Greek characters,
not to mention sub- and superscripts and more exotic symbols. Standards have
not been agreed for these requirements as yet, and most are not implemented.
However, some browser programs such as NCSA Mosaic do at least support sub- and
superscripts. At this stage, only a
limited number of Greek characters are supported by browser programs, while the
implementation of equations is still at a discussion stage (see Box 1). It is
only a question of time before full implementation of these features.
Fortunately, it is not always necessary to retype existing documents in order
to create HTML files. There are many utilities that are capable of converting
already-existing but properly constructed word-processor files into the
required HTML format. For instance, a number of word processors such as
Microsoft Word read and write a format called RTF (rich text format). Programs
(called filters) such as rtftohtml convert, as the name suggests, such files
into HTML format. Filters also exist to convert LaTeX files to HTML.
So, what is there in all this for the chemist? The possibilities for chemists
are numerous, profound, and barely perceived. Currently, well over 200
Departments of Chemistry around the world have some kind of WWW site and are
listed at
http://www.shef.ac.uk/uni/academic/A-C/chem/chemistry-www-sites.html.
Most sites are in the USA, England, and Germany. At least 25 chemistry
departments in older universities in the UK have their own sites. Arguably,
most of these were developed by and are maintained by enthusiastic individuals
as a side-line to their normal tasks of research and teaching. Most sites are
relatively simple and typically advertise, often elegantly, details of
undergraduate courses, postgraduate opportunities, and academic staff research
interests. This is clearly useful to both the reader and the department
providing the information. However, there are some innovative and interesting
chemical uses of the WWW.
A natural use of hypertext is in teaching. There are a number of preliminary
efforts in this direction around the world, for instance at Duke University,
the University of Leeds, 'The Virtual Classroom' at Rensselaer Polytechnic
Institute, and, perhaps in particular, Virginia Tech. A useful list of
resources is maintained at:
http://www-hpcc.astro.washington.edu/scied/science.html.
The University of Sheffield's contribution to chemistry on the WWW is
WebElements - an evolving periodic table database developed in Sheffield and
now 'mirrored' at nine other sites (three in the USA, two in Germany, one each
in Austria, Brazil, Australia, and China) around the world. The WebElements
home page presents a periodic table (Fig. 3); clicking on an element gives the
user information on that element.
The user can also view attractive and informative graphical representations of
the data (Figures 4 and 5). These representations were created originally using
MacElements, a periodic table database program running on a Macintosh, but can
be viewed on any other hardware system. While not yet implemented, one can
envisage a situation in the near future where such graphical representations
are created on-line (for a set of elements chosen by the user) at the server
computer for transmission to the browser.
WebElements also illustrates the concept of clickable graphics. When the user
is presented with a periodic table such as the representation in Figures 4 and
5, clicking on an element will take the user to a file containing data for that
element. Clickable graphics also offer interesting and interactive ways to
unfold the complexities of reaction sequences (Box 2).
WebElements already offers some other interactive features, currently two
simple on-line calculation services: isotope patterns and element percentages.
The user fills in a simple on-screen box with a chemical formula and the
browser program then requests that the server program executes the calculation.
In turn, the server program requests a slave program (in this case a component
of MacElements, running on a Macintosh) to execute the calculation and to
return the result via an automatically generated HTML document (Fig. 6). All
the work is hidden from the user. These calculation services were developed
originally as proof-of-concept devices and can clearly be extended in scope.
Figure 6 also demonstrates NCSA Mosaic's support for subscript characters.
Figure 6. An isotope pattern calculated over the WWW
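
To give a flavour of what an element percentage service has to do once it receives a formula, here is a minimal Python sketch. It is not the MacElements code, which is not shown in this article. It assumes a simple formula without brackets or charges, and it carries rounded atomic masses for only four elements, so anything beyond that would need a fuller table and parser.

```python
import re

# Rounded standard atomic masses for a handful of elements (illustration only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def element_percentages(formula):
    """Return the percentage by mass of each element in a simple formula.

    Assumes a flat formula such as 'C2H6O'; brackets are not handled.
    """
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    total = sum(ATOMIC_MASS[s] * n for s, n in counts.items())
    return {s: 100 * ATOMIC_MASS[s] * n / total for s, n in counts.items()}


for symbol, percent in element_percentages("C2H6O").items():
    print(f"{symbol}: {percent:.2f}%")
```

In a service of the kind described above, a server-side program would run a calculation like this and wrap the result in a freshly generated HTML document for return to the browser.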
A number of sites, particularly in the USA (for instance Duke University,
Virginia Tech, the University of California at Berkeley, and the Edison project
at Columbia University), are using the WWW as a shop window for chemistry
multimedia projects which, once obtained by the user, are designed to run on a
local machine.
Chemistry is a very visual subject. A WWW site called the Chemist's Art Gallery
at
http://www.csc.fi/lul/chem/graphics.html
in Finland gives hypertext links to many examples of visualizations in
chemistry from various groups around the world. One desirable aim of the
chemist would be the transfer of small files containing molecule data (such as
coordinates) so that they can be processed and manipulated on the user's
machine. This is part of the aim of the chemical MIME project based at the
Departments of Chemistry at the University of Leeds and Imperial College - a
mechanism (Boxes 3 and 4) that has been proposed to enable the intelligent
handling of molecular information using 'helper' programs.
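
The idea of handing a file to a 'helper' according to its declared type can be sketched as a simple lookup. The Python below is an assumption-laden illustration and not the mechanism of Boxes 3 and 4. The media type names follow the 'chemical/x-...' pattern associated with the chemical MIME proposal, and the helper names are invented placeholders and not real programs.

```python
# Hypothetical table for illustration: the helper names are invented, and the
# media types are assumed examples of the 'chemical/x-...' naming pattern.
HELPERS = {
    "chemical/x-pdb": "molecule-viewer",
    "chemical/x-xyz": "molecule-viewer",
    "text/html": "browser-itself",
}


def choose_helper(content_type, default="save-to-disk"):
    """Pick a helper for a declared content type, ignoring any parameters."""
    # 'chemical/x-pdb; charset=us-ascii' -> 'chemical/x-pdb'
    media_type = content_type.split(";")[0].strip().lower()
    return HELPERS.get(media_type, default)


print(choose_helper("chemical/x-pdb"))
print(choose_helper("image/gif"))
```

The point of the design is that the browser itself needs no chemical knowledge; it only needs a table telling it which local program understands which kind of file.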
The WWW makes an ideal interface for providing gathered information to
interested individuals on given topics. As examples, everything you ever wished
to know about crystallography is to be found at
http://www.unige.ch/crystal/crystal_index.html,
the WWW Virtual Library: Crystallography site in Switzerland. The University of
York's Department of Chemistry NMR service maintains a useful WWW site. Some of
the information is special to local users but external users will find a good
list of NMR-related software and information.
The only way to discover more about the WWW is to browse the WWW for a while.
One useful and sophisticated feature of the WWW is the ability to search the
WWW for documents whose names or content contain specific text. A site
maintained by the BBC at
http://www.bbcnc.org.uk/babbage/iap.html
and the University of Sheffield's Department of Information Studies both
provide good interfaces to and explanations of these 'search engines'. Start
off by entering the single word 'chemistry' as a target
piece of text. Browsing around the WWW can be as addictive as browsing around
any good book shop, and perhaps far more time consuming.
This article is available as a WWW document on-line with the URL:
http://www.shef.ac.uk/uni/academic/A-C/chem/www-publications/chem-in-brit-95.html.
You are encouraged to read the document on-line. The on-line version contains
active links embedded within the text whose positions in this article are
indicated as underlined text. It also contains a dynamic appendix consisting of
a list of 'highlights' on the WWW, to which innovative chemistry links will be
added as the author becomes aware of their existence.
Readers wishing to obtain computer software or advice related to the WWW are
requested respectfully to consult their institution's computer advisory service
and not the authors of this article. If you already have access to the WWW, you
should have a look at the WWW FAQ (frequently asked questions) at
http://sunsite.unc.edu/boutell/faq/www_faq.html
and at the 'YAHOO' site at
http://akebono.stanford.edu/yahoo/Computers/World_Wide_Web/.