FRBRization in the Open Library

Work Sets in the Open Library (FRBR-ization)

The thing called "FRBR-izing" is really the creation of a set of records that all represent the same work. So rather than having a display that shows all of the different editions of the work separately:

... you have a single display for the work, that links to all of the editions. Different systems display this differently. Here is the OCLC FictionFinder display:

And here is the beginning of the display of all of the many editions:

This creates a two-tiered database with Works and Editions. (Note: Editions are called "Manifestations" in library lingo.) The two big questions are:

What is a work?
How is this hierarchy represented in the database in a way that is efficient for searching and for display?

Note that FRBR-ization affects only a small percentage of bibliographic records. OCLC's statistics show that 78% of the items in WorldCat are unique Works. Only 1% of Works have more than 7 Editions, and only 30,000 in their database have more than 20 Editions.

What is a work?

There is no definitive answer to what is a work, especially when it comes to changes in format, such as a book that has become a screenplay and then is made into a movie. But since we only have books in the OL database at the moment, the task is somewhat simpler: bring together books that are essentially the same text. Basically, the elements that define a work are:

The title of the (original) work
The primary author

This isn't quite as simple as it seems because ideally one would also bring together different translations of the same work, and of course those do not have the same title. In some records that we receive from libraries there will be a special "work title" that contains the original title of the work regardless of the language of the translation.

  Mann, Thomas
    [Zauberberg]
    Magic Mountain.
  Mann, Thomas
    [Zauberberg]
    La Montagna incantata.
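
The grouping idea above can be sketched in a few lines of Python. This is a minimal illustration only, not the Open Library matching code: the function names and the record fields (author, title, work_title) are hypothetical, and the normalization is deliberately crude. When a record carries no work title, the sketch falls back to the edition title.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record):
    """Build a grouping key from the primary author and the work title.

    Falls back to the edition title when no work title is present.
    """
    title = record.get("work_title") or record["title"]
    return (normalize(record["author"]), normalize(title))


def group_into_works(records):
    """Return a dict mapping each work key to the list of its editions."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record)
    return dict(works)


# Hypothetical records modelled on the example above.
editions = [
    {"author": "Mann, Thomas", "work_title": "Zauberberg", "title": "Magic Mountain."},
    {"author": "Mann, Thomas", "work_title": "Zauberberg", "title": "La Montagna incantata."},
    {"author": "Mann, Thomas", "title": "Buddenbrooks"},
]
works = group_into_works(editions)
```

Here the two translations end up under the same key because they share a work title, while the third record forms a Work of its own. A translation that lacks a work title would not be grouped by this approach.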

There are also Works that are the same but have been printed with different titles at different times or in different countries, such as the works of Shakespeare and Harry Potter. The work titles (called "uniform titles" in library lingo) are unfortunately not used consistently even in library records, and don't exist at all in records from our other sources. At some point we will have to rely on users to bring together works that do not get identified algorithmically. We also have a set of ISBNs from LibraryThing to use, and could probably make some use of the xISBN service from OCLC. This, however, only helps us with works that have an ISBN.

In terms of an algorithm, OCLC's work set algorithm is available. However, it makes use of some data elements that we will not have, in particular those that OCLC derived from LC Authority records.

The Work-set display and the Edition display will make use of different fields. A page on the fields and display is here.

It is quite possible that the current edition matching algorithm that we use can be adapted to determine works in a way that approximates the OCLC results. This won't be as accurate as the OCLC algorithm, but we can use OCLC's FictionFinder database as a test set against which we can measure our results.
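
One way to measure results against a test set is to compare pairs of editions: for every pair that one grouping places in the same Work, check whether the other grouping agrees. The Python below is a sketch under assumptions. It supposes we can obtain, for a shared set of editions, a mapping from edition ID to work ID from both our own algorithm and the reference set; the data format, IDs and function names are hypothetical.

```python
from itertools import combinations


def same_work_pairs(assignment):
    """Return the set of edition-ID pairs that share a work ID."""
    by_work = {}
    for edition_id, work_id in assignment.items():
        by_work.setdefault(work_id, []).append(edition_id)
    pairs = set()
    for members in by_work.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(ours, reference):
    """Compare two edition-to-work assignments over the editions they share.

    Returns (precision, recall) computed on same-work edition pairs.
    """
    shared = ours.keys() & reference.keys()
    our_pairs = same_work_pairs({e: ours[e] for e in shared})
    ref_pairs = same_work_pairs({e: reference[e] for e in shared})
    agreed = len(our_pairs & ref_pairs)
    precision = agreed / len(our_pairs) if our_pairs else 1.0
    recall = agreed / len(ref_pairs) if ref_pairs else 1.0
    return precision, recall


# Made-up assignments: our algorithm vs. a reference grouping.
ours = {"e1": "w1", "e2": "w1", "e3": "w2", "e4": "w2"}
reference = {"e1": "A", "e2": "A", "e3": "A", "e4": "B"}
precision, recall = pairwise_scores(ours, reference)
```

Precision falls when we merge editions that the reference keeps apart, and recall falls when we fail to bring together editions that the reference groups.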

The Database Design

There are undoubtedly many different ways that we could design a database to support FRBR. Some possible designs are:

Work-centric access and display: In this scenario, there is a Work record that contains the primary author(s), the title, and subject information. General searching goes against this Work record, which contains links (probably identifiers) to all of the related Edition records. This is how FictionFinder appears to work. In FictionFinder, searches on elements specific to a particular edition (e.g. the name of an illustrator) do not return results. Individual editions can be displayed in detail from the Work display, however. This design is very different from what we have today in infogami and may not be feasible. It also requires a Work record for each Work in the database, including those that consist of only one Edition.
Edition-centric access, Work-centric display: This design would index Edition records much as they are indexed today, but would bring together records for the same work in a tiered display. There could be a minimal Work record that has approximately the same functionality that the Author record does today. With the retrieval of any record in a Work set, the Work record would be displayed with all of the editions subordinate to it on the page. This is usually done with a table that contains an entry for each edition with its equivalent Work, so that the Work is displayed in the place of the Edition. This requires de-duplicating the entire retrieved set so that the Work record is displayed only once even though multiple editions for the same work are retrieved. The Work record then can be used to retrieve all of the edition records based on their having the same Work ID. This design probably requires a Work record for each Work in the database, including those that consist of only one Edition. What this design does not easily provide is a display of the number of editions on the Work page. It also presents some potential performance problems which would have to be studied.
Edition-centric access, Edition-centric display, Optional Work display: This would mimic the current treatment of Editions and Authors. Retrieval would retrieve Editions and would display Editions. However, there would be something in the display of retrieved records that would allow you to move to a Work view if it is available for that record (a displayed link or button). One advantage of this is that it would only require a Work record for the books that have more than one Edition (guesstimated at 25% of the database). The disadvantage is that Works will be less obvious to users, and most users will still see multiple editions in displays for that small percentage of works that have many editions (like all of our Mark Twain examples).
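
To make the second design more concrete, here is a rough Python sketch of the de-duplication step: search hits are Editions, each hit is mapped to its Work through a lookup table, and each Work is shown only once. It is an illustration under assumptions, not a description of infogami. The two lookup tables are modelled as in-memory dicts with hypothetical IDs; in a real system they would be database lookups, which is where the performance questions mentioned above would arise.

```python
def work_centric_display(hit_edition_ids, edition_to_work, work_to_editions):
    """Collapse edition-level search hits into one entry per Work.

    Works are returned in the order their first matching edition appeared,
    each with all of its editions (not only the ones that matched).
    """
    seen = set()
    display = []
    for edition_id in hit_edition_ids:
        work_id = edition_to_work[edition_id]
        if work_id in seen:
            continue  # this Work is already on the page
        seen.add(work_id)
        display.append((work_id, work_to_editions[work_id]))
    return display


# Hypothetical lookup tables; a real system would keep these in the database.
edition_to_work = {"e1": "w1", "e2": "w1", "e3": "w2"}
work_to_editions = {"w1": ["e1", "e2"], "w2": ["e3"]}

# Three edition hits collapse into two Work entries.
page = work_centric_display(["e2", "e3", "e1"], edition_to_work, work_to_editions)
```

Under the third design the same lookup could simply be skipped for any Edition that has no entry in the edition-to-work table, since single-edition books would have no Work record.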

Note that based on the OCLC statistics, if we create a Work record for each work (even those that have a single edition) we will increase the number of records in the database by about 75%. Creating a Work record only when there are multiple editions, however, may add complexities to display.
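
The trade-off between the two policies can be estimated from the number of editions per work. The Python below is a small sketch with made-up numbers, not OCLC or Open Library data; the function names are hypothetical.

```python
def added_work_records(editions_per_work, only_multi_edition=False):
    """Count the Work records a policy would add.

    editions_per_work holds one integer per work: its number of editions.
    """
    if only_multi_edition:
        return sum(1 for n in editions_per_work if n > 1)
    return len(editions_per_work)


def relative_growth(editions_per_work, only_multi_edition=False):
    """Added Work records as a fraction of the existing Edition records."""
    total_editions = sum(editions_per_work)
    added = added_work_records(editions_per_work, only_multi_edition)
    return added / total_editions


# Toy distribution: five works with 1, 1, 1, 3 and 2 editions (8 editions).
toy = [1, 1, 1, 3, 2]
growth_all = relative_growth(toy)
growth_multi = relative_growth(toy, only_multi_edition=True)
```

Running the same calculation over real edition counts would show how much smaller the database stays when Work records are created only for works with multiple editions.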

History

Created July 17, 2008
4 revisions

July 17, 2020	Edited by Mek	fixing frbrization page
August 17, 2008	Edited by Karen Coyle	Edited without comment.
July 17, 2008	Edited by Karen Coyle	images from kcoyle.net
July 17, 2008	Created by Karen Coyle	Edited without comment.