Below is a draft of an RFC for Z39.50 URLs that extends the
specification with details on how one can communicate the typical
Z39.50 parameters directly within a URL. Except for replacing one
semicolon with an ampersand, this specification is backward-compatible
with the current RFC 2056.

-------------------------------------------------------------------------------

Network Working Group
Request for Comments: ____ (extends RFC 2056 published November 1996)
Category: Standards Track

                 Uniform Resource Locators for Z39.50

Status of this Memo

   This document specifies an Internet standards track protocol for
   the Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

1. Introduction

   Z39.50 is an information retrieval protocol with two URL schemes:
   the Session URL and the Retrieval URL. The Z39.50 Session URL has
   the value "z3950", "z3950s" or "z39.50s" in the scheme portion of
   the URL. The Z39.50 Retrieval URL has the value "z3950r" or
   "z39.50r" in the scheme portion of the URL.

   Both Z39.50 URL schemes define semantics for the conventional URL
   "query string", i.e., anything which follows the first "?" in the
   URL. This string is "URL encoded", i.e., spaces are changed to "+"
   and reserved characters are changed to "%"xx (hexadecimal encoding).

   When either Z39.50 URL is used, the client is to open a new session
   or use an already open session with the specified host and port.
   When the Z39.50 Session URL is used, a query string parameter may
   specify whether to keep the session open or to close it after
   retrieval. When the Z39.50 Retrieval URL is used, the client session
   is to be closed immediately after retrieval.

   The Z39.50 protocol specifies that a session may involve Search and
   Present services, among others.
   A client using the Search service provides in the Z39.50 URL the
   names of one or more databases and certain search criteria. The
   server returns pointers to selected database records comprising a
   result set created at the server. A client using the Present service
   retrieves these database records from the server result set.

   An alternate use of Z39.50 Search and Present services is to
   retrieve a single database record whose unique identifier is already
   known. This identifier is here called a document identifier, or
   "docid". If docid is included in either Z39.50 URL, the client will
   perform the specified search as described for the Z39.50 Retrieval
   URL.

2. The Z39.50 Retrieval URL

   A Z39.50 Retrieval URL conveys to the server a client Search Request
   for a specific docid. The docid string is server-defined and opaque
   to the client.

   In the terminology of the Z39.50 protocol [2], the docid is placed
   into a type-1 query, as the single term, in the general format
   (tag 45), using the Bib-1 attribute set, with a Use attribute value
   of docid, and a structure attribute of URx.

   The server Search Response indicates how many records match the
   request. If the number of matching records does not equal one, the
   retrieval is considered unsuccessful and the client behavior in such
   a case is not defined. If the number of matching records equals one,
   the server may have included the matched record in the Search
   Response. If not, the client may request the record with a Present
   Request. After the client has received the matched record, it will
   close the Z39.50 session.

   An operation equivalent to that of the Z39.50 Retrieval URL can be
   specified with a Z39.50 Session URL. This is accomplished in the
   query string by setting the session parameter value "&close=1" and
   otherwise using docid as defined for the Z39.50 Retrieval URL.

3. Element Set and Record Syntax

   The Z39.50 URL may specify via query string parameters the desired
   composition and format of the requested database records.
   Composition is specified as an abstract "element set"
   representation. Format of selected elements is specified as a
   "record syntax" structure.

   In the terminology of the Z39.50 protocol [2], when element set is
   specified it should be used in the Search request for the value of
   small- and/or medium-set-element-set-names or in a Present request
   following a Search. If one or more record syntaxes are specified,
   the client should select one (preferably the first in the list that
   it supports) and use it in a Search or Present request as the value
   of "PreferredRecordSyntax".

5. Prefix Query Notation

   The Z39.50 URL may specify in the query string certain search terms
   and attributes defined with Reverse Polish Notation (RPN) in the
   Z39.50 protocol. In the Z39.50 URL, RPN is replaced with equivalent
   Prefix Query Notation (PQN). The following overview of PQN is
   paraphrased from documentation at .

   PQN search terms are sequences of characters, as in:

      science

   PQN Boolean operators use a prefix notation, as in:

      @and science technology

   PQN search terms may be associated with attributes. These attributes
   are indicated by the PQN @attr operator. Assuming the Z39.50 "bib-1"
   attribute set, the search can be constrained to the title access
   point by setting use-attribute (type is 1) to title (value is 4):

      @attr 1=4 science

   PQN attributes may be applied to each of a range of search terms.
   In the PQN search below, both search terms are constrained to the
   "title" access point, but the "tech" term is also specified as right
   truncated:

      @attr 1=4 @and @attr 5=1 tech beta

   A PQN search for the DatabaseInfo records from an Explain server
   could be expressed as:

      @attrset exp1 @attr 1=1 DatabaseInfo

   The following describes in general how PQN operators are applied.

      @attrset set          Whole query uses the specified attribute
                            set. If this operator is used, it must be
                            defined at the beginning of the query.
      @attr list op         The attributes in list are applied to op
      @and op1 op2          Boolean and on op1 and op2
      @or op1 op2           Boolean or on op1 and op2
      @not op1 op2          Boolean not on op1 and op2
      @prox list op1 op2    Proximity operation on op1 and op2
      @set name             Result set reference

   The grammar of PQN is as follows:

      Query ::= [ AttSet ] QueryStruct.
      AttSet ::= string.
      QueryStruct ::= { Attribute } Simple | Complex.
      Attribute ::= '@attr' AttributeType '=' AttributeValue.
      AttributeType ::= integer.
      AttributeValue ::= integer.
      Complex ::= Operator QueryStruct QueryStruct.
      Operator ::= '@and' | '@or' | '@not' | '@prox' Proximity.
      Simple ::= ResultSet | Term.
      ResultSet ::= '@set' string.
      Term ::= string | '"' string '"'.
      Proximity ::= Exclusion Distance Ordered Relation WhichCode UnitCode.
      Exclusion ::= '1' | '0' | 'void'.
      Distance ::= integer.
      Ordered ::= '1' | '0'.
      Relation ::= integer.
      WhichCode ::= 'known' | 'private' | integer.
      UnitCode ::= integer.

   The following examples are valid PQN searches.

      dylan
      "bob dylan"
      @or "dylan" "zimmerman"
      @set Result-1
      @or @and bob dylan @set Result-1
      @attr 4=1 @and @attr 1=1 "bob dylan" @attr 1=4 "slow train coming"
      @attr 4=1 @attr 1=4 "self portrait"
      @prox 0 3 1 2 k 2 dylan zimmerman

6. BNF for Z39.50 URLs

   The Z39.50 Session and Retrieval URLs follow the Common Internet
   Scheme Syntax as defined in RFC 1738, "Uniform Resource Locators
   (URL)" [1]. In the definition, literals are quoted with "", optional
   elements are enclosed in [brackets], "|" is used to designate
   alternatives, and elements may be preceded with * to designate n or
   more repetitions of the following element; n defaults to 0.

   z39.50url = zscheme "://" [username ":" password "@"] host [":" port]
               ["/" [database *["+" database] ["?" docid]
                     | ["search?query=(" pqn ")"]
                     | ["scan?query=(" pqn ")"]]
                ["&close=" session]
                ["&esn=" elementset]
                ["&rs=" recordsyntax *["+" recordsyntax]]
                ["&encode=" encodehtml]
                ["&maxrecs=" max]
                ["&ss=" URL]]
   zscheme      = "z3950" | "z3950s" | "z39.50s" | "z3950r" | "z39.50r"
   username     = uchar
   password     = uchar
   host         = TCP/IP host name
   port         = TCP/IP port number
   database     = uchar
   docid        = uchar
   pqn          = uchar
   session      = "1" | "0"
   elementset   = "B" | "F" | "G" | "S" | uchar
   recordsyntax = "GRS-1" | "SUTRS" | "USMARC" | uchar
   encodehtml   = "1" | "0"
   max          = integer
   URL          = TCP/IP URL

   Because they have special meanings in URLs, the reserved characters
   ( ; / ? : @ & = ) must be escaped, i.e., expressed with hexadecimal
   encoding, wherever "uchar" is specified.

   In a Z39.50 URL, "z3950" and "z3950s" are equivalent to "z39.50s"
   and all three designate a Z39.50 Session URL. A Z39.50 Retrieval URL
   can be designated with "z3950r" or "z39.50r".

   The parameter "host" is always required and all other parameters are
   optional. The parameter "database" must be included if "docid" is
   included.

   The element "search?query" conveys the query string for Search. The
   element "scan?query" conveys the query string for Scan. The
   parameter "pqn" refers to Prefix Query Notation (see above).

   The parameter "session" indicates whether to close the session after
   retrieval: yes for value "close=1", no for value "close=0".

   The element "max" specifies the maximum number of records to
   retrieve. The element "recordsyntax" specifies record syntax. The
   element "encodehtml" specifies to turn on ("1") or off ("0") the
   HTML encoding of characters. The element "ss" provides the URL to a
   stylesheet to be applied to the results when represented in XML.

   The following are default values if an element is not present:

      "port" defaults to 210
      "close" defaults to "1" (yes) in a Z39.50 Retrieval URL
      "close" defaults to "0" (no) in a Z39.50 Session URL
      "max" defaults to 5000
      "rs" defaults to "GRS-1"
      "encodehtml" defaults to "1" (yes)

7. Security Considerations

   The Z39.50 URL schemes are subject to the same security implications
   as the general URL scheme [1], so the usual precautions apply. This
   means, for example, that a locator might no longer point to the
   object that was originally intended. It also means that it may be
   possible to construct a URL so that an attempt to perform a harmless
   operation such as the retrieval of an object will in fact cause a
   possibly damaging remote operation to occur.

8. References

   [1] Berners-Lee, T., Masinter, L., McCahill, M. (editors), "Uniform
       Resource Locators (URL)", RFC 1738, December 1994.

   [2] ANSI/NISO Z39.50-1995, "ANSI Z39.50: Information Retrieval
       Service and Protocol", 1995.

9. Editors' Addresses

Appendix. Examples of Z39.50 URLs

   The following example includes use of the PQN query syntax. This
   Z39.50 Session URL performs a search in the database "kubirds"
   defined in PQN as (@attr 1=1 "falcon"). This means to search for the
   word "falcon" as a personal name (by default the Attribute Set is
   "Bib-1", wherein Use attributes are type "1" and the value "1" is
   "personal name").

   z3950://habanero.nhm.ukans.edu/kubirds/search?query=(@attr%201=1%20"falcon")

   Note that hexadecimal encoding ("escaped characters" such as %20
   representing a space) is used for reserved characters embedded
   within the query string.

   Further examples are provided in documentation at

-------------------------------------------------------------------------------
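
As an informal illustration of the appendix, the following Python sketch assembles a Session URL of the same shape as the example above. It is not part of the draft: the function name build_z3950_search_url and its parameters are hypothetical. It assumes that spaces are escaped as %20 and that the characters @, = and the double quote are left literal inside the PQN string, as in the appendix example, and that a "/" separates the database from "search?query". A different escaping policy may be equally valid under Section 6.

```python
from urllib.parse import quote


def build_z3950_search_url(host, database, pqn, port=None, close=None,
                           scheme="z3950"):
    """Assemble a Z39.50 search URL (hypothetical helper, not from the draft)."""
    # Assumption: keep @, = and the double quote literal, as the appendix
    # example does; every other reserved character is percent-encoded.
    encoded_pqn = quote(pqn, safe='@="')
    # The port is optional; the draft gives 210 as the default when absent.
    authority = host if port is None else f"{host}:{port}"
    url = (f"{scheme}://{authority}/{quote(database, safe='')}"
           f"/search?query=({encoded_pqn})")
    # "close" is optional; emit it only when the caller states a preference.
    if close is not None:
        url += f"&close={1 if close else 0}"
    return url


print(build_z3950_search_url("habanero.nhm.ukans.edu", "kubirds",
                             '@attr 1=1 "falcon"'))
```

Passing close=True appends "&close=1", which per Section 2 would make the Session URL behave like a Retrieval URL when combined with a docid.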