The Network Effect

Weaving the Semantic Web of information

February 27, 2003

No matter what business you're in, you should be aware of emerging standards in cross-referencing digital assets. As we've seen in past Reports, the technical journal publishing industry again takes the lead.

NETTING IT OUT

The publishers of technical journals have begun to create an interlinked Web of semi-structured knowledge-in-the-making using the CrossRef collaborative reference linking service. This two-year-old grass-roots initiative is the first successful global attempt to link networked information in a structured manner.

However, CrossRef is not enough. While it solves the problem of locating articles that cite one another across the network, it doesn't address several other thorny problems, such as how to ensure access to the appropriate copy of the digital resource, how to recommend additional related materials, and how to provide services such as abstracting, translating, etc. that can act upon a digital object.

Enter the OpenURL standard, which facilitates the delivery of services pertaining to resources that are referenced in a networked environment. OpenURL introduces the concept of the ContextObject, which can be considered a referenced resource in a networked environment that can be acted upon by services.

No matter what business you are in, you should be aware of these standards. You can begin your company's migration towards OpenURLs by making sure that all of your "published" content gets tagged with a base set of Digital Object Identifiers (DOIs) in preparation for a truly semantic web of digital information.

USING THE NETWORK EFFECT TO EXPAND MARKETS

As we've discussed over the past few weeks(1), the publishers of scientific, technical, and medical journals have made great strides in creating sustainable business models, in uniquely identifying digital assets, and in tracking their usage. But the real excitement comes when you discover how they've already begun to build on these foundations to create an interlinked Web of semi-structured knowledge-in-the-making. This is the true "Noosphere" of which the Jesuit philosopher Teilhard de Chardin wrote so seductively in "The Phenomenon of Man"(2) in the late 1930s: a layer of knowledge and consciousness that surrounds the earth.

What lies beyond electronically searchable Web pages? It's a world of interlinked, well-categorized, and constantly expanding human knowledge. It's a world in which there are no dead ends, but lots of discoveries. It's a world of knowledge that continues to expand as each contribution automatically links itself to other related information. And it's a world in which automated Web Services (cross-linking, searching, clustering, categorization) and human intention (tagging using taxonomies and synonyms) both play critical roles.

Weaving Cross-References, Citations, Forward & Backward Linkages: CrossRef

One of the most advanced shared services currently provided across e-journals is the CrossRef collaborative reference linking service. CrossRef is a citation linking service that is run by a not-for-profit member organization. Its members include over 160 publishers, representing more than 7,000 journals.

WHY IS THE CROSSREF INITIATIVE SO SIGNIFICANT? It's the first successful global initiative to link networked information in a structured manner. Unlike the random nature of a Google-type search, cross-reference linkages provide accurate, relevant connections both within and across bodies of knowledge. These citations can be both backward linkages--using the convention of citing prior related work--and forward linkages--taking advantage of the networked environment to link an article to a newer, related one that may supersede it.

In the space of less than two years, CrossRef has experienced rapid adoption. It's truly a grass-roots phenomenon, and one that has gone largely unnoticed outside of the scholarly journal and library communities.

Although they have taken root in the scholarly journal publishing field, CrossRef and its companion standards (DOI, OpenURL) are designed to be relevant for any kind of information--both digitized information and physical artifacts (e.g., print, film, manuscripts, dinosaur bones, moon rocks, etc.).

HOW DOES CROSSREF WORK? CrossRef members register each of their journal articles with CrossRef.org, by providing "a minimal set of metadata about each article, in a defined format (XML DTD and XML Data Rules)." This minimal set includes:

* Journal title
* ISSN
* First author (additional authors are optional)
* Publication year
* Volume and Issue numbers
* Page numbers
* DOIs and URLs (including, optionally, OpenURLs; see below)

According to the guidelines published by the CrossRef organization, a publisher may also submit additional metadata if it chooses. The DOI/URL will be registered with the International DOI Foundation in the DOI Directory automatically. So, once you register an article with CrossRef, it is automatically assigned a Digital Object Identifier.
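To make the minimal metadata set concrete, here is a small sketch of how a publisher's system might hold one article's record before depositing it. This is not CrossRef's XML deposit format: the class name, field names, and sample values are all hypothetical and were chosen only to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleRecord:
    """Hypothetical container for the minimal metadata set (not CrossRef's schema)."""
    journal_title: str
    issn: str
    first_author: str
    publication_year: int
    volume: str
    issue: str
    pages: str
    doi: str
    url: str
    additional_authors: List[str] = field(default_factory=list)  # optional

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that were left empty."""
        required = ["journal_title", "issn", "first_author", "publication_year",
                    "volume", "issue", "pages", "doi", "url"]
        return [name for name in required if not getattr(self, name)]


# All values below are made up for illustration.
record = ArticleRecord(
    journal_title="Journal of Examples",
    issn="0000-0000",
    first_author="Doe",
    publication_year=2003,
    volume="12",
    issue="3",
    pages="45-67",
    doi="10.9999/example.2003.001",
    url="http://publisher.example/articles/001",
)
print(record.missing_fields())  # prints [] because every required field is filled in
```

A check like missing_fields() is one simple way to catch an incomplete record before it is submitted; a real deposit would be validated against the DTD and data rules that CrossRef defines.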

In order to participate in CrossRef, each publisher has to have an online publishing environment for its journals, including full text that is available to authorized users. The publisher's online environment must also be able to receive incoming DOI-routed users with at least a minimal response (CrossRef defines this minimum; e.g., a full bibliographic citation must be shown to the user).

In addition, each registered article must have "Active Response Pages," i.e., be able to receive incoming links for those articles. This is necessary because, when metadata is deposited in CrossRef, other members and users (including Web Services) of the system can look up the DOIs immediately and create reference links.
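The deposit-then-look-up cycle described above can be sketched in a few lines. In this toy version, an in-memory dictionary stands in for the CrossRef service, the function names are hypothetical, and the resolver base URL is our assumption rather than something this Report specifies. It is meant only to show the idea: once metadata is deposited, anyone holding a citation can find the DOI and build a reference link.

```python
from typing import Dict, Optional, Tuple

# Assumption: a public DOI resolver lives at this base URL.
DOI_RESOLVER = "https://doi.org/"

# Lookup key: (ISSN, volume, first page). A simplified stand-in for real metadata matching.
CitationKey = Tuple[str, str, str]

# Stand-in for the shared registry; the real service is not a Python dictionary.
registry: Dict[CitationKey, str] = {}


def deposit(issn: str, volume: str, first_page: str, doi: str) -> None:
    """Register an article so that others can look up its DOI."""
    registry[(issn, volume, first_page)] = doi


def lookup(issn: str, volume: str, first_page: str) -> Optional[str]:
    """Return the DOI for a citation, or None if it was never deposited."""
    return registry.get((issn, volume, first_page))


def reference_link(issn: str, volume: str, first_page: str) -> Optional[str]:
    """Turn a citation into a link that is routed through the DOI resolver."""
    doi = lookup(issn, volume, first_page)
    return DOI_RESOLVER + doi if doi else None


# Made-up sample data.
deposit("0000-0000", "12", "45", "10.9999/example.2003.001")
print(reference_link("0000-0000", "12", "45"))  # https://doi.org/10.9999/example.2003.001
print(reference_link("0000-0000", "99", "1"))   # None: nothing was deposited for this citation
```

Because the link points at the resolver rather than at the publisher's server, the publisher can move the article later and only has to update the URL registered against the DOI.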

Weaving the Semantic Web

Tim Berners-Lee, the "inventor" of the World Wide Web, has been promulgating the concept of the "Semantic Web."(3) We believe that the CrossRef service is a major piece of the loom. But it's just the basics. CrossRef provides a semi-automated way for articles to be linked to one another, with no dead ends. If you follow the trail of linked references, you will also receive at least the summary and bibliographic reference, including a DOI, which will lead you to the actual source of the object (even if it is a reference to a physical artifact).

While the CrossRef linking service solves the problem of ensuring that it's possible to locate articles that cite one another across the network, it does not, in and of itself, address several other thorny problems. These issues include:

* How to ensure that the reader gains access to the appropriate copy of the digital resource--the one to which he or she has access rights

* How to recommend additional, related materials--ones that may not be directly cited or linked

* How to provide a variety of services--abstracting, indexing, translation, modeling, etc.--that can act upon a digital object.

Needed: Context-Based Linking

At the same time that the CrossRef standard was being hammered out and promulgated, the National Information Standards Organization (NISO) was soliciting input from representatives of the library, publishing, and information services communities on the issue of reference linking. NISO hosted a series of workshops in 1999. Here's the background to the emergence of the NISO OpenURL standard, from the NISO Web site: "A major issue that emerged from the workshops was the 'appropriate copy problem.'

This problem arises when multiple copies of a resource exist, and each copy is governed by a different access policy. A specific user should be directed to a copy of the resource that is governed by an access policy that is compatible with the user's access privileges. Conventional URL-based links cannot accomplish this. The workshop participants recognized that solving the 'appropriate copy problem' would lead to solutions for other context-based link resolution problems."
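The "appropriate copy problem" is easy to picture in code. The sketch below is our own illustration, not anything prescribed by NISO: the Copy class, the entitlement labels, and the URLs are all hypothetical. Each copy of a resource carries an access policy, and the resolver returns the first copy whose policy is compatible with what the user is entitled to.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Copy:
    """One copy of a resource plus its (hypothetical) access policy."""
    url: str
    required_entitlement: Optional[str]  # None means the copy is open to everyone


def appropriate_copy(copies: List[Copy], user_entitlements: Set[str]) -> Optional[Copy]:
    """Return the first copy whose access policy the user satisfies, or None."""
    for copy in copies:
        if copy.required_entitlement is None or copy.required_entitlement in user_entitlements:
            return copy
    return None


# Made-up copies of the same article, listed in order of preference.
copies = [
    Copy("http://publisher.example/full-text/001", "publisher-subscription"),
    Copy("http://aggregator.example/item/001", "aggregator-license"),
    Copy("http://publisher.example/abstract/001", None),
]

library_user = appropriate_copy(copies, {"aggregator-license"})
visitor = appropriate_copy(copies, set())
print(library_user.url)  # http://aggregator.example/item/001
print(visitor.url)       # http://publisher.example/abstract/001
```

A conventional URL hard-codes one of these destinations for every reader. Here the same reference leads different users to different copies, which is the context-based resolution the workshops called for.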

The Emergence of the OpenURL Standard & ContextObjects

The proposed solution to the need for digital information objects to be actionable--to be able to request services and/or to be acted upon by services--is found in the OpenURL standard: ANSI/NISO Z39.88-2003: OpenURL: A Transport Mechanism for ContextObjects.

To paraphrase and to summarize, the OpenURL standard facilitates the delivery of services pertaining to resources that are referenced in a networked environment. It provides a framework for the cross-domain representation (not dependent on a physical location) and for the transportation of such references (the appropriate metadata) along with the context in which these references occur (e.g., ownership, access rights, entitlements, permitted services). To capture this context, the OpenURL standard introduces the concept of the ContextObject.

So, think of a ContextObject as a referenced resource in a networked environment that can be acted upon by services. The object carries with it the metadata that describes it. To facilitate the delivery of context-sensitive services, the ContextObject also contains descriptions of other resources that constitute the context in which the core resource is referenced.
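As a rough illustration of the idea, a ContextObject can be pictured as two bundles of metadata traveling together in a link: one describing the referenced resource and one describing the context of the reference. The sketch below is a simplification under our own assumptions. The class layout, the "ref." and "ctx." key prefixes, and the resolver address are hypothetical and are not taken from the OpenURL standard.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qs, urlencode


@dataclass
class ContextObject:
    """Simplified, hypothetical model: a referenced resource plus its context."""
    referent: Dict[str, str]                               # metadata describing the resource
    context: Dict[str, str] = field(default_factory=dict)  # e.g., who is asking, what they may do

    def to_query(self) -> str:
        """Serialize both bundles into a URL query string (illustrative key names)."""
        params = {f"ref.{key}": value for key, value in self.referent.items()}
        params.update({f"ctx.{key}": value for key, value in self.context.items()})
        return urlencode(params)


def service_link(resolver_base: str, ctx_obj: ContextObject) -> str:
    """Build a link that hands the ContextObject to a service component."""
    return f"{resolver_base}?{ctx_obj.to_query()}"


# Made-up sample values.
ctx_obj = ContextObject(
    referent={"doi": "10.9999/example.2003.001", "journal": "Journal of Examples"},
    context={"requester": "library-a", "service": "full-text"},
)
link = service_link("http://resolver.example/menu", ctx_obj)
print(link)

# The receiving service can recover both the reference and its context.
received = parse_qs(link.split("?", 1)[1])
print(received["ref.doi"], received["ctx.requester"])
```

The point of the design is that the link no longer names a fixed destination. It names a resource and the circumstances of the request, and leaves it to the receiving service to decide what to offer.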

Here's the best description that we've found to date that really captures the breadth and depth of this idea and implementation. It comes from a seminal article published by Herbert Van de Sompel and Oren Beit-Arie in 2001.

"The OpenURL specification is the glue that enables interoperability between information services and service components.

"One can easily imagine this architecture to be extended to references made on the Web in general, not just for scholarly material, but also to cities, diseases, cars, houses, abstract concepts, etc. The main pre-requisite for this extension is the existence of metadata and/or identifiers that describe the referenced items. This is very commonly the case, as many communities have created identifier or metadata schemes to achieve interoperability between systems.

"These references made on the Web in general can be regarded to be in the basic Web-plane. They can come with--default--author-embedded links. One can imagine a user reaching out into an overlaying service plane to ask specialized Web-services for alternative service-links related to a referenced item. Such service-links could be thought of as alternative routes across the web that are dynamically provided by third parties--i.e., not by the actual author of the Web-page where the reference is made--upon request of a user. They are routes in an overlaying service plane that are not available from the Web-document in which the reference occurs."(4)

Van de Sompel and Beit-Arie were both major contributors to the current OpenURL standard. They point out that the emergence and generalization of the OpenURL standard supplants a variety of proprietary efforts to associate services with tagged information, such as "Microsoft's SmartTag, NBCi's QuickClick, the Dialpad agent, the link session solution of Steve Hitchcock's PeP [Hitchcock and Hall 2001], the hypermedia link service of Microcosm, among others."

*****ENDNOTES*****
1) See "Understanding Digitization: Trends in Business Models," "Protecting Your Digital Assets," and "Who Is Accessing Your Information."

2) Pierre Teilhard de Chardin, The Phenomenon of Man, first written in 1938; first published in 1955, the year of his death (17 years after it was written).

3) Berners-Lee, Tim, James Hendler, and Ora Lassila. 2001. "The Semantic Web." Scientific American. May 2001. <http://www.sciam.com/2001/0501issue/0501berners-lee.html>.

4) See http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html.
*****ENDNOTES*****