Last edited by Edward Betts, July 9, 2010

Catalog Implementation

RECORDS

a record identifier is a string encapsulating an identifier for a
catalog and an identifier for a particular record in that catalog.
for example, a record from the LC might have an identifier that is
"LC" concatenated with MARC fields 001 and 003, which uniquely
identify the record within the LC catalog.
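
a minimal sketch of building such an identifier, in Python. the
function name, the ":" separator, and the sample field values are
assumptions made for illustration; the text above only says that the
parts are concatenated.

```python
def make_record_id(catalog, field_001, field_003):
    """Build a record identifier from a catalog name and MARC fields 001/003.

    The ":" separator is an assumption; it keeps the parts unambiguous.
    """
    # control fields are often space-padded, so trim them first
    parts = [catalog, field_001.strip(), field_003.strip()]
    return ":".join(parts)


# hypothetical values for MARC 001 and 003
print(make_record_id("LC", "   00000002 ", "DLC"))
```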

all records used by the Open Library, even if later replaced by
updated records, will be stored in Archive items. metadata in those
items may provide information about the records' format and
provenance.

a file locator is an Archive item identifier concatenated with
the path to a particular file in that item, e.g.:

"marc_records_scriblio_net/part01.dat"

the file's data may be retrieved via HTTP by performing a GET
request on the appropriate URL, e.g.:

"http://www.archive.org/download/marc_records_scriblio_net/part01.dat"
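
the mapping from file locator to URL can be sketched as below. the
function name is hypothetical; the base URL is the one used in the
example above.

```python
from urllib.parse import quote

DOWNLOAD_BASE = "http://www.archive.org/download/"  # from the example above


def file_locator_url(file_locator):
    """Return the download URL for a file locator ("<item id>/<path>")."""
    # quote() leaves "/" intact by default and escapes unsafe characters
    return DOWNLOAD_BASE + quote(file_locator)


print(file_locator_url("marc_records_scriblio_net/part01.dat"))
```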

a record locator is a file locator together with a byte offset
and length, e.g.:

("marc_records_scriblio_net/part01.dat", 29834, 543)

the record may be retrieved by performing an HTTP GET on the URL
associated with the file locator and providing a "Range" header
requesting the relevant bytes.
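
a sketch of that retrieval using Python's standard library. the
function names are hypothetical. note that HTTP byte ranges are
inclusive at both ends, so the last byte requested is
offset + length - 1; a server that honours the range replies with
status 206 (Partial Content).

```python
from urllib.request import Request, urlopen

DOWNLOAD_BASE = "http://www.archive.org/download/"  # from the example above


def record_request(file_locator, offset, length):
    """Build a GET request for the bytes of one record."""
    last = offset + length - 1  # Range end is inclusive
    return Request(
        DOWNLOAD_BASE + file_locator,
        headers={"Range": "bytes=%d-%d" % (offset, last)},
    )


def fetch_record(file_locator, offset, length):
    """Fetch the record bytes (performs a network request)."""
    with urlopen(record_request(file_locator, offset, length)) as resp:
        return resp.read()


req = record_request("marc_records_scriblio_net/part01.dat", 29834, 543)
print(req.get_header("Range"))
```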

(we can provide a simple web service that, given a URL, byte-range,
and record type, will retrieve the record data and return a
human-readable representation. this way, the OL book interface may
easily provide links to the original records.)

MANIFESTATIONS

manifestation objects in tdb (currently called "editions") use the
multi-valued source_record_loc field to store record locators
(serialized into a string) for records that are considered to describe
the manifestation. for each source_record_loc entry, there will also be
a source_record_id entry giving that record's record identifier.
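
the serialized form is not specified here. one possible encoding,
shown purely as an assumption, joins the three parts with ":" and
splits from the right when reading, so a ":" inside the file path
does no harm. both function names are hypothetical.

```python
def dump_record_loc(file_locator, offset, length):
    """Serialize a record locator into a single string."""
    return "%s:%d:%d" % (file_locator, offset, length)


def load_record_loc(text):
    """Parse a string produced by dump_record_loc()."""
    # split from the right so a ":" in the file path is left alone
    file_locator, offset, length = text.rsplit(":", 2)
    return file_locator, int(offset), int(length)


loc = ("marc_records_scriblio_net/part01.dat", 29834, 543)
print(dump_record_loc(*loc))
```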

DISTILLATION

given a manifestation and a set of source records, distillation
means to derive the Open Library schema fields from those records.
the distillation algorithm may choose a single, most-authoritative
record to provide the data, or merge data from multiple records.
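
a sketch of the two strategies. everything here is an assumption
made for illustration: records are plain dicts with a numeric "rank"
(higher means more authoritative) and a "fields" dict, and merging
simply lets the more authoritative record win for each field. the
actual algorithm is not defined in this document.

```python
def distill(records, merge=True):
    """Derive schema fields from source records (hypothetical shape)."""
    # most authoritative record first
    ordered = sorted(records, key=lambda r: r["rank"], reverse=True)
    if not ordered:
        return {}
    if not merge:
        # single-record strategy: take the top record as-is
        return dict(ordered[0]["fields"])
    result = {}
    for rec in ordered:
        for name, value in rec["fields"].items():
            # keep the value from the highest-ranked record that has it
            result.setdefault(name, value)
    return result


records = [
    {"rank": 1, "fields": {"title": "A", "publisher": "P"}},
    {"rank": 2, "fields": {"title": "B"}},
]
print(distill(records))
```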

if distillation is performed after user edits have been recorded on
the manifestation (as evidenced by previous thing versions created by
non-system users), the algorithm will determine whether to overwrite
those edits or incorporate them.

IMPORT

to import a record into the OL database (given its contents and its
record locator), compute its record identifier and determine whether
it already exists in the database by looking for a manifestation
object with a matching source_record_id entry. if one is found, compare
transaction dates: if this record's date is the same as or older than
that of the existing record, do nothing; if it is newer, replace the
corresponding source_record_loc entry with the new record's record
locator, then re-distill the manifestation object's fields to reflect
the changed data. (it is possible, although perhaps uncommon, that the
updated record should now belong to a different manifestation. so
perhaps we should remove the old version and then treat this as a
"new" record, as below.)

if the record's identifier is not already recorded, use the merge
algorithm to determine in which manifestation it belongs, or whether
a new one
ought to be created. record this record's locator and identifier in
that existing or new manifestation object, and perform distillation on
the manifestation.
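
the import steps above can be sketched in memory as follows. this is
not the OL implementation: the database is a plain list, each
manifestation keeps a hypothetical "sources" dict mapping
source_record_id to (source_record_loc, transaction date) instead of
two parallel multi-valued fields, dates are assumed to be strings that
sort chronologically, and the merge and distillation algorithms are
passed in as stand-in callables.

```python
def import_record(db, record_id, record_loc, date, match=None, distill=None):
    """Import one record; return what happened as a short status string."""
    match = match or (lambda db, record_id: None)  # stand-in for the merge algorithm
    distill = distill or (lambda manifestation: None)  # stand-in for distillation

    # case 1: the record identifier is already known
    for m in db:
        if record_id in m["sources"]:
            _, old_date = m["sources"][record_id]
            if date <= old_date:
                return "unchanged"  # same or older: do nothing
            m["sources"][record_id] = (record_loc, date)
            distill(m)
            return "updated"

    # case 2: new identifier; ask the merge algorithm where it belongs
    m = match(db, record_id)
    if m is None:
        m = {"sources": {}}
        db.append(m)
        status = "created"
    else:
        status = "attached"
    m["sources"][record_id] = (record_loc, date)
    distill(m)
    return status


db = []
print(import_record(db, "LC:1", "item/part01.dat:0:10", "20090305"))
print(import_record(db, "LC:1", "item/part02.dat:10:12", "20100709"))
```

the parenthetical caveat above (an updated record moving to a
different manifestation) is not handled in this sketch.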

History

July 9, 2010 Edited by Edward Betts correct page type
March 7, 2009 Edited by webchick adding english translation
March 7, 2009 Edited by webchick Edited without comment.
March 5, 2009 Created by webchick Edited without comment.