BookOps-Worldcat
Overview
Requires Python 3.7 and up.
Bookops-Worldcat is a Python wrapper around OCLC's Worldcat Metadata API that supports the changes released in version 1.1 (May 2020) of the web service. The package features methods that use the search functionality of the API as well as its read-write endpoints.
The Bookops-Worldcat package simplifies some of the OCLC API boilerplate and, ideally, lowers the technological threshold for cataloging departments that may not have sufficient programming support to access and use those web services. The Python language, with its gentle learning curve, is well suited to this goal.
This package takes advantage of the functionality of the popular Requests library. Interaction with OCLC's services is built around Requests sessions. MetadataSession inherits all requests.Session properties. Returned server responses are requests.Response objects with all of their properties and methods.
Authorizing a session simply requires passing an access token into MetadataSession. Opening a session allows the user to call specific methods which facilitate communication between the user's script/client and a particular endpoint of OCLC's service. Many of the hurdles related to making valid requests are hidden under the hood of this package, making it as simple as possible to access the functionalities of OCLC APIs.
Please note, not all features of the Metadata API are implemented because this tool was primarily built for our organization's specific needs. However, we are open to any collaboration to expand and improve the package.
Supported OCLC web services:
At the moment, the wrapper supports only OAuth 2.0 endpoints and flows. The specific protocols are Client Credential Grant and Access Token for authorization.
Worldcat Metadata API is a read-write service for WorldCat. It allows adding and updating records in WorldCat, maintaining holdings, and working with local bibliographic data. Access to Metadata API requires OCLC credentials. The BookOps wrapper focuses on the following API operations:
- Search functionality
  - Find member shared print holdings (/bibs-retained-holdings)
  - Get summary of holdings for known items (/bibs-summary-holdings)
  - Brief bibliographic resources:
    - Search brief bibliographic resources (/brief-bibs)
    - Retrieve specific brief bibliographic resource (/brief-bibs/{oclcNumber})
    - Retrieve other editions related to a particular bibliographic resource (/brief-bibs/{oclcNumber}/other-editions)
- Full bibliographic resources
  - Retrieve full bibliographic record (/bib/data)
  - Get current OCLC number (/bib/checkcontrolnumber)
- Holdings
  - Set and unset institution holding (/ih/data)
  - Retrieve status of institution holdings (/ih/checkholdings)
  - Set and unset institution holdings for a batch of records (/ih/datalist)
  - Set and unset holdings for a single record for multiple institutions (/ih/institutionlist)
Installation
To install use pip:
$ python -m pip install bookops-worldcat
Quickstart
Worldcat Metadata API requires OCLC credentials which can be obtained at the OCLC Developer Network site.
Obtaining Access Token
The Worldcat access token can be obtained by passing credential parameters into the WorldcatAccessToken object.
>>> from bookops_worldcat import WorldcatAccessToken
>>> token = WorldcatAccessToken(
    key="my_WSKey",
    secret="my_secret",
    scopes=["WorldCatMetadataAPI"],
    principal_id="my_principal_id",
    principal_idns="my_principal_idns"
)
>>> print(token)
"access_token: 'tk_Yebz4BpEp9dAsghA7KpWx6dYD1OZKWBlHjqW', expires_at: '2020-01-01 17:19:58Z'"
>>> print(token.is_expired())
False
The created token object can be passed directly into MetadataSession to authorize requests to the Metadata API web service:
>>> from bookops_worldcat import MetadataSession
>>> session = MetadataSession(authorization=token)
Searching Brief Bibliographic Records Using Metadata API
The MetadataSession is authenticated using the WorldcatAccessToken object. The session allows searching brief records as well as retrieving full bibs in the MARC XML format.
Basic usage:
from bookops_worldcat import MetadataSession
with MetadataSession(authorization=token) as session:
    results = session.search_brief_bibs(q="ti:zendegi AND au:greg egan")
    print(results.json())
Returned brief bibliographic records are in JSON format and can be parsed with the .json() method.
{
"numberOfRecords": 24,
"briefRecords": [
{
"oclcNumber": "430840771",
"title": "Zendegi",
"creator": "Greg Egan",
"date": "2010",
"language": "eng",
"generalFormat": "Book",
"specificFormat": "PrintBook",
"edition": "First edition.",
"publisher": "Night Shade Books",
"mergedOclcNumbers": [
"664026825"
],
"catalogingInfo": {
"catalogingAgency": "BTCTA",
"transcribingAgency": "DLC"
}
},
{
"oclcNumber": "961162511",
"title": "Zendegi",
"creator": "Greg Egan",
"date": "2013",
"language": "eng",
"generalFormat": "AudioBook",
"specificFormat": "CD",
"publisher": "Audible Studios on Brilliance Audio",
"mergedOclcNumbers": [
"947806980"
],
"catalogingInfo": {
"catalogingAgency": "AU@",
"transcribingAgency": "AU@"
}
}
]
}
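As a rough illustration of working with the parsed payload, the sketch below pulls OCLC numbers, titles, and dates out of a dictionary shaped like the response above. The summarize_brief_bibs helper and the trimmed sample payload are hypothetical and not part of the package; in real use the dictionary would come from results.json().

```python
def summarize_brief_bibs(payload):
    """Return (oclcNumber, title, date) tuples from a brief-bibs payload."""
    # Assumption: "briefRecords" may be missing when a search has no hits
    records = payload.get("briefRecords", [])
    return [(r["oclcNumber"], r.get("title"), r.get("date")) for r in records]


# Trimmed-down stand-in for results.json()
sample = {
    "numberOfRecords": 24,
    "briefRecords": [
        {"oclcNumber": "430840771", "title": "Zendegi", "date": "2010"},
        {"oclcNumber": "961162511", "title": "Zendegi", "date": "2013"},
    ],
}

summary = summarize_brief_bibs(sample)
for oclc_number, title, date in summary:
    print(oclc_number, title, date)
```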
Retrieving Full Bibliographic Records
To retrieve a full bibliographic record from WorldCat use the .get_full_bib method. The server returns records in MARC XML format by default.
from bookops_worldcat import MetadataSession
with MetadataSession(authorization=token) as session:
    results = session.get_full_bib(oclcNumber=430840771)
    print(results.text)
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<content type="application/xml">
<response xmlns="http://worldcat.org/rb" mimeType="application/vnd.oclc.marc21+xml">
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000cam a2200000 i 4500</leader>
<controlfield tag="001">on1143317889</controlfield>
<controlfield tag="003">OCoLC</controlfield>
<controlfield tag="005">20200328101446.1</controlfield>
<controlfield tag="008">200305t20202019nyuabf b 001 0 eng c</controlfield>
<datafield tag="010" ind1=" " ind2=" ">
<subfield code="a"> 2018957420</subfield>
</datafield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">NYP</subfield>
<subfield code="b">eng</subfield>
<subfield code="e">rda</subfield>
<subfield code="c">NYP</subfield>
<!--...-->
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">9780316230049</subfield>
<subfield code="q">(pbk.)</subfield>
<!--...-->
<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Christakis, Nicholas A.,</subfield>
<subfield code="e">author.</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Blueprint :</subfield>
<subfield code="b">the evolutionary origins of a good society /</subfield>
<subfield code="c">Nicholas A. Christakis.</subfield>
</datafield>
<datafield tag="250" ind1=" " ind2=" ">
<subfield code="a">First Little, Brown Spark trade paperback edition.</subfield>
</datafield>
<datafield tag="264" ind1=" " ind2="1">
<subfield code="a">New York, NY :</subfield>
<subfield code="b">Little, Brown Spark,</subfield>
<subfield code="c">2020</subfield>
</datafield>
<!--...-->
</record>
</response>
</content>
<id>http://worldcat.org/oclc/1143317889</id>
<link href="http://worldcat.org/oclc/1143317889"></link>
</entry>
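One possible way to read individual MARC fields from such a response is the standard library's ElementTree with the MARC21 slim namespace. The get_subfields helper and the shortened sample document below are hypothetical, meant only as a sketch; with a live response, results.content could be passed in place of the sample string.

```python
import xml.etree.ElementTree as ET

# MARC21 slim namespace, as declared on <record> in the response above
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def get_subfields(xml_text, tag):
    """Return {code: value} for the first datafield with the given tag."""
    root = ET.fromstring(xml_text)
    field = root.find(f".//marc:datafield[@tag='{tag}']", NS)
    if field is None:
        return {}
    return {sf.get("code"): sf.text for sf in field.findall("marc:subfield", NS)}


# Shortened stand-in for the Atom-wrapped MARC XML shown above
sample = """<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/xml">
    <response xmlns="http://worldcat.org/rb" mimeType="application/vnd.oclc.marc21+xml">
      <record xmlns="http://www.loc.gov/MARC21/slim">
        <datafield tag="245" ind1="1" ind2="0">
          <subfield code="a">Blueprint :</subfield>
          <subfield code="b">the evolutionary origins of a good society /</subfield>
        </datafield>
      </record>
    </response>
  </content>
</entry>"""

title = get_subfields(sample, "245")
print(title)
```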
Updating Holdings
MetadataSession can be used to check, set, or unset your library's holdings on a master record in Worldcat. For example:
result = session.holding_set(oclcNumber="850939579")
print(result)
<Response [201]>
result = session.holding_get_status("850939579")
print(result.json())
{
"title": "850939579",
"content": {
"requestedOclcNumber": "850939579",
"currentOclcNumber": "850939579",
"institution": "NYP",
"holdingCurrentlySet": true,
"id": "http://worldcat.org/oclc/850939579"
},
"updated": "2020-10-01T04:10:13.017Z"
}
For holdings operations on batches of records, see Advanced Usage > MetadataSession > Holdings.
Advanced Usage
Identifying your application
BookOps-Worldcat provides a default user-agent value in the headers of all requests to OCLC web services: bookops-worldcat/{version}. It is encouraged to update the user-agent value to properly identify your application to OCLC servers. This will provide a useful piece of information for OCLC staff if they need to assist with troubleshooting problems that may arise.
To set a custom "user-agent" in a session, simply pass it as an argument when initiating the session:
session = MetadataSession(authorization=token, agent="my_client_name")
... or simply update its headers attribute:
session.headers.update({"user-agent": "my-app/version 1.0"})
The user-agent header can be set for an access token request as well. To do that, simply pass it as the agent parameter when initiating the WorldcatAccessToken object:
token = WorldcatAccessToken(
    key="my_WSKey",
    secret="my_secret",
    scopes=["WorldCatMetadataAPI"],
    principal_id="my_principal_id",
    principal_idns="my_principal_idns",
    agent="my_app/1.0.0"
)
Event hooks
MetadataSession methods support Requests event hooks which can be passed as an argument:
def print_url(response, *args, **kwargs):
    print(response.url)
hooks = {'response': print_url}
session.get_brief_bib(850939579, hooks=hooks)
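A hook can also collect information instead of printing it. The sketch below uses the same callback signature as above and exercises it offline with a stand-in object; the record_call name and the example.org URL are made up for illustration, and a real hook would receive a requests.Response.

```python
from types import SimpleNamespace

seen = []


def record_call(response, *args, **kwargs):
    """Response hook: remember the status code and URL of each response."""
    seen.append((response.status_code, response.url))
    # Returning None leaves the original response unchanged in Requests


hooks = {"response": record_call}

# Stand-in for a requests.Response, used only to exercise the hook offline
fake = SimpleNamespace(status_code=200, url="https://example.org/brief-bibs/850939579")
hooks["response"](fake)
print(seen)
```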
WorldcatAccessToken
Bookops-Worldcat uses OAuth 2.0 and the Client Credential Grant flow to acquire an access token. Please note that your OCLC credentials must include the Metadata API in their scope to be permitted to make requests to the web service.
Obtaining:
from bookops_worldcat import WorldcatAccessToken
token = WorldcatAccessToken(
    key="my_WSKey",
    secret="my_secret",
    scopes=["WorldCatMetadataAPI"],
    principal_id="my_principal_id",
    principal_idns="my_principal_idns",
    agent="my_app/version 1.0"
)
The token object retains the underlying Requests response object (requests.Response), which can be accessed via the .server_response attribute:
print(token.server_response.status_code)
200
print(token.server_response.elapsed)
0:00:00.650108
print(token.server_response.request.headers)
{
"user-agent": "bookops-worldcat/0.1.0",
"Accept-Encoding": "gzip, deflate",
"Accept": "application/json",
"Connection": "keep-alive",
"Content-Length": "67",
"Content-Type": "application/x-www-form-urlencoded",
"Authorization": "Basic encoded_authorization_here="
}
Checking if the token has expired can be done by calling the is_expired method:
print(token.is_expired())
True
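Conceptually, an expiry check compares the token's expiration timestamp with the current UTC time. The standalone function below is only a sketch of that idea, not the package's implementation; it assumes the timestamp format seen in the printed token earlier ("2020-01-01 17:19:58Z"), and the optional now parameter is an addition for testing.

```python
from datetime import datetime, timezone


def is_expired(expires_at, now=None):
    """Return True when the expiry timestamp is not in the future."""
    # Parse e.g. "2020-01-01 17:19:58Z" and mark it as UTC
    expiry = datetime.strptime(expires_at, "%Y-%m-%d %H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return expiry <= now


print(is_expired("2020-01-01 17:19:58Z"))
```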
A failed token request raises WorldcatAuthorizationError, which provides the error code and detailed message returned by the server.
MetadataSession
A wrapper around the WorldCat Metadata API. MetadataSession inherits requests.Session methods.
Returned full bibliographic records are by default in MARC/XML format, but it is possible to receive OCLC's native CDF XML and the CDF translation into JSON serializations by supplying appropriate values in the response_format argument to the get_full_bib method. Search endpoints of the Metadata API return responses serialized into JSON format only.
All requests issued by MetadataSession have a built-in access token auto-refresh feature. While a session is open, the current token is checked for expiration before any request is sent, and a new access token is obtained automatically if needed.
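The check-then-refresh pattern described here can be sketched in a few lines. The FakeToken and RefreshingClient classes below are hypothetical stand-ins that make no network calls and do not reflect how the package is actually written; they only show the order of operations.

```python
import time


class FakeToken:
    """Stand-in for an access token with a limited lifetime (hypothetical)."""

    def __init__(self, lifetime):
        self.expires_at = time.monotonic() + lifetime

    def is_expired(self):
        return time.monotonic() >= self.expires_at


class RefreshingClient:
    """Checks the token before each request and replaces it when expired."""

    def __init__(self, token_factory):
        self._token_factory = token_factory
        self.token = token_factory()
        self.refresh_count = 0

    def request(self, url):
        if self.token.is_expired():
            # Obtain a fresh token before sending anything
            self.token = self._token_factory()
            self.refresh_count += 1
        # A real client would send the HTTP request here
        return f"GET {url}"


# A zero-second lifetime forces a refresh before every request
client = RefreshingClient(lambda: FakeToken(lifetime=0))
client.request("/brief-bibs")
client.request("/brief-bibs")
print(client.refresh_count)
```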
OCLC numbers in methods' arguments
MetadataSession accepts OCLC numbers in methods' arguments as integers or strings with or without a prefix ("ocm", "ocn", "on"). The following are all acceptable:
session.get_brief_bib(oclcNumber="ocm00012345")
session.get_brief_bib(oclcNumber="00012345")
session.get_brief_bib(oclcNumber=12345)
session.search_current_control_numbers(oclcNumbers=["ocm00012345", "00012346", 12347])
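To make the accepted forms concrete, here is a minimal sketch of how such inputs could be reduced to a plain integer. The normalize_oclc_number function is hypothetical and is not the package's own validation code.

```python
def normalize_oclc_number(value):
    """Return an OCLC number as an int, dropping an 'ocm', 'ocn', or 'on' prefix."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    for prefix in ("ocm", "ocn", "on"):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    if not text.isdigit():
        raise ValueError(f"Invalid OCLC number: {value!r}")
    # int() also drops the leading zeros used for padding
    return int(text)


for raw in ("ocm00012345", "00012345", 12345, "on1143317889"):
    print(raw, "->", normalize_oclc_number(raw))
```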
Search Functionality
MetadataSession supports the following search functionality:
- get_brief_bib retrieves a specific brief bibliographic resource
- search_brief_bib_other_editions retrieves other editions related to a bibliographic resource specified with an OCLC number
- search_brief_bibs retrieves brief resources for a keyword or a fielded query
- search_current_control_numbers retrieves current OCLC control numbers
- search_general_holdings retrieves a summary of holdings for a specified item
- search_shared_print_holdings finds member library holdings with a commitment to retain (Shared Print)
The server responses are returned in JSON format by default.
Obtaining brief record
with MetadataSession(authorization=token) as session:
    result = session.get_brief_bib(850940548)
    print(result.json())
{
"oclcNumber": "850940548",
"title": "Record Builder Added This Test Record On 06/26/2013 13:07:06.",
"creator": "OCLC RecordBuilder.",
"date": "2012",
"language": "eng",
"generalFormat": "Book",
"specificFormat": "PrintBook",
"catalogingInfo": {
"catalogingAgency": "OCPSB",
"transcribingAgency": "OCPSB"
}
}
Querying WorldCat
The Metadata API provides robust methods to query WorldCat. In addition to a flexible query string that supports keyword and fielded searches, it is possible to set further limits using elements such as type of item, language, and publishing date. The order of returned records can be specified with the orderBy argument. Results are returned as brief records in JSON format.
The query syntax is case-insensitive and allows keyword and phrase search (use quotation marks), boolean operators (AND, OR, NOT), wildcards (# - single character, ? - any number of additional characters), and truncation (use * character).
Keyword search with item type, language, and publishing date limiters:
session.search_brief_bibs(
    q="czarne oceany dukaj",
    itemType="book",
    inLanguage="pol",
    datePublished="2015-2020",
    orderBy="publicationDateDesc"
)
Fielded query:
session.search_brief_bibs(
    q='ti="czarne oceany" AND au:jacek dukaj AND ge="science fiction"')
More about the query syntax can be found in the OCLC documentation.
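When queries are assembled from user input, a small helper can keep the string consistent. The build_query function below is a hypothetical convenience, not part of the package; it only reuses the two forms shown above (index:value and index="phrase") and joins them with AND. Whether a given index behaves as expected with a quoted phrase should be checked against the OCLC documentation.

```python
def build_query(**fields):
    """Join index/value pairs with AND, quoting multi-word values as phrases."""
    parts = []
    for index, value in fields.items():
        if " " in value:
            parts.append(f'{index}="{value}"')
        else:
            parts.append(f"{index}:{value}")
    return " AND ".join(parts)


query = build_query(ti="czarne oceany", au="dukaj")
print(query)  # ti="czarne oceany" AND au:dukaj
```

The resulting string could then be passed as the q argument of search_brief_bibs.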
Obtaining Full Bibliographic Records
The session.get_full_bib() method, called with an OCLC number as an argument, sends a request for a matching full bibliographic record in WorldCat. The Metadata API correctly matches requested OCLC numbers of records that have been merged together by returning the current master record. By default, get_full_bib returns records in MARC XML format.
The returned response is a requests.Response object with all its features:
with MetadataSession(authorization=token) as session:
    result = session.get_full_bib("00000000123")
    print(result.status_code)
    print(result.url)
200
"https://worldcat.org/bib/data/00000000123"
To avoid a UnicodeEncodeError, it is recommended to access retrieved data with the .content attribute of the response object:
print(result.content)
Retrieving Current OCLC Number
The MetadataSession.search_current_control_numbers method retrieves the current control number of the master record in WorldCat. Occasionally, records identified as duplicates in WorldCat are merged. In that case a local control number may no longer refer to the correct OCLC master record. Returned responses are in JSON format by default, but it is possible to pass 'application/atom+xml' in the response_format argument to have the response serialized into XML.
The search_current_control_numbers method accepts control numbers as a list or a comma-separated string:
session.search_current_control_numbers(oclcNumbers="00012345,00012346,00012347")
session.search_current_control_numbers(oclcNumbers=[12345, 12346, 12347], response_format="application/atom+xml")
Holdings
MetadataSession supports the following holdings operations:
- holding_get_status retrieves the holding status of a requested record
- holding_set sets a holding on an individual bibliographic record
- holding_unset deletes a holding on an individual bibliographic record
- holdings_set sets holdings on multiple records, and is not limited by OCLC's 50 bib record limit
- holdings_unset deletes holdings on multiple records, and is not limited by OCLC's 50 bib record limit
- holdings_set_multi_institutions sets holdings on a single record for multiple institutions
- holdings_unset_multi_institutions deletes holdings on a single record for multiple institutions
By default, responses are returned in atom+json format, but atom+xml can be specified:
result = session.holding_get_status(oclcNumber="ocn123456789", response_format="application/atom+xml")
print(result.text)
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<title type="text">1143317889</title>
<updated>2020-04-25T05:21:10.233Z</updated>
<content type="application/xml">
<holdings xmlns="http://worldcat.org/metadata-api-service">
<requestedOclcNumber>1143317889</requestedOclcNumber>
<currentOclcNumber>1143317889</currentOclcNumber>
<institution>NYP</institution>
<holdingCurrentlySet>true</holdingCurrentlySet>
<id>http://worldcat.org/oclc/1143317889</id>
</holdings>
</content>
</entry>
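If the XML serialization is requested, the holding flag can be read with the standard library. The holding_is_set helper below is a hypothetical sketch that relies on the namespace shown in the response above; the sample string is a shortened copy of that response.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <holdings> in the response above
NS = {"h": "http://worldcat.org/metadata-api-service"}


def holding_is_set(xml_text):
    """Return True when <holdingCurrentlySet> is 'true', False otherwise."""
    root = ET.fromstring(xml_text)
    flag = root.find(".//h:holdingCurrentlySet", NS)
    return flag is not None and (flag.text or "").strip().lower() == "true"


# Shortened stand-in for result.text
sample = """<entry xmlns="http://www.w3.org/2005/Atom">
  <content type="application/xml">
    <holdings xmlns="http://worldcat.org/metadata-api-service">
      <currentOclcNumber>1143317889</currentOclcNumber>
      <holdingCurrentlySet>true</holdingCurrentlySet>
    </holdings>
  </content>
</entry>"""

print(holding_is_set(sample))
```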
For batch operations, pass OCLC record numbers as a list of strings or integers, or as a comma-separated string, with or without a prefix:
session.holdings_set(
oclcNumbers="00000000123,00000000124,00000000125,00000000126")
session.holdings_unset(oclcNumbers=[123, 124, 125, 126])
MetadataSession permits larger batches by splitting the batch into chunks of 50 and automatically issuing multiple requests. The returned object is a list of server responses, one per chunk.
results = session.holdings_unset(oclcNumbers=[123, 124, 125, 126])
# print the result of each batch of 50
for r in results:
    print(r.json())
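The splitting itself can be pictured with a short generator. This is only an illustration of chunking a list into groups of 50, under the assumption that each group maps to one request; the chunk function is hypothetical and is not taken from the package.

```python
def chunk(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 120 made-up OCLC numbers end up in three batches
oclc_numbers = list(range(1, 121))
batches = list(chunk(oclc_numbers))
print([len(batch) for batch in batches])  # [50, 50, 20]
```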
Consortia and other organizations that serve multiple libraries can use the holdings_set_multi_institutions and holdings_unset_multi_institutions methods to set and unset holdings for selected libraries. The list of library OCLC codes is passed as a comma-separated string to the instSymbols parameter.
results = session.holdings_set_multi_institutions(oclcNumber=1234, instSymbols="BKL,NYP")
Examples
Complex search query:
from bookops_worldcat import WorldcatAccessToken, MetadataSession
# obtain access token
token = WorldcatAccessToken(
    key="my_WSKey",
    secret="my_secret",
    scopes=["WorldCatMetadataAPI"],
    principal_id="my_principal_id",
    principal_idns="my_principal_idns",
    agent="my_app/version 1.0"
)
with MetadataSession(authorization=token) as session:
    # search Worldcat
    response = session.search_brief_bibs(
        q="su:civil war AND (su:antietam OR su:sharpsburg)",
        datePublished="2000-2020",
        inLanguage="eng",
        inCatalogLanguage="eng",
        catalogSource="dlc",
        itemType="book",
        itemSubType="digital",
        orderBy="mostWidelyHeld",
        limit=20)
    # take the first record from the search results
    first_bib = response.json()["briefRecords"][0]
    first_bib_number = first_bib["oclcNumber"]
    # get full bib
    response = session.get_full_bib(oclcNumber=first_bib_number)
    print(response.content)