GILS
UDDI to GILS Gateway
(under development by Matthew Dovey, et al., Oxford University)
GILS Technical Topic page on UDDI
1. Search public UDDI database
1.1 Send UDDI request, e.g.,
<find_business maxRows="100">
<name>Utah</name>
</find_business>
1.2 Hold UDDI response as "public businessList", e.g.,
<businessInfo businessKey="80A15BE8... A4F5">
<name>BIBLIOGRAPHY OF UTAH GEOLOGY</name>
2. Search GILS sources
2.0 Note: A "zurl database" is maintained in the
Gateway that correlates a public UDDI businessKey uuid (universally unique
identifier) with the identifier for a businessEntity record retrievable through
the other Gateway sources, here referred to as a "zurl" (see
zurl note below).
2.1 Pass find_business UDDI request to other gateway sources
2.1.1 Records are to be retrieved from GILS sources up to
the limit given by the maxRows attribute of the find_business request (see
query note below)
2.2 Identify retrieved records against zurl database and
public UDDI register
2.2.1 If businessEntity retrieved from GILS does not exist
in public register,
register it and identify the record by its businessKey uuid in the zurl
database
2.2.2 If businessEntity is already in the public register
but is not equivalent (see comparison note below), update it
2.3 Hold retrieved and identified records as "GILS businessList"
3. Merge public with GILS and de-dup
3.1 Merge public and GILS businessLists, using scores or
other criteria (see ranking note below).
3.2 De-dup records in merged businessList using businessKey
3.3 Pass up to maxRows records from merged businessList as UDDI response businessList
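Steps 3.2 and 3.3 can be sketched as follows. This is a minimal illustration in Python, not the Gateway's implementation: the function name `merge_business_lists` and the dict-shaped records are hypothetical, and placing public records ahead of GILS records is an arbitrary assumption standing in for the open ranking question (see ranking note below).

```python
def merge_business_lists(public, gils, max_rows):
    """Merge two businessLists, de-dup on businessKey, and cap at max_rows.

    Each record is assumed to be a dict with a "businessKey" entry.
    Returns (records, truncated), mirroring the businessList "truncated" flag.
    """
    seen = set()
    merged = []
    # Assumption: public records come first; a real ranking rule would go here.
    for record in list(public) + list(gils):
        key = record["businessKey"]
        if key in seen:
            continue  # the same businessKey was already taken from an earlier record
        seen.add(key)
        merged.append(record)
    truncated = len(merged) > max_rows
    return merged[:max_rows], truncated
```

The second return value indicates whether records were dropped to honor maxRows, which is what the truncated attribute of the response would report.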
<?xml version="1.0" encoding="UTF-8" ?>
<Envelope xmlns=".../soap/envelope/">
  <Body>
    <find_business generic="1.0" maxRows="100">
      <name>Utah</name>
    </find_business>
  </Body>
</Envelope>
<?xml version="1.0" encoding="UTF-8" ?>
<Envelope xmlns=".../soap/envelope/">
  <Body>
    <businessList generic="1.0" truncated="false">
      <businessInfos>
        <businessInfo businessKey="163.1.91.181-5e9756:e4a4de6208:-7fb8">
          <name>BIBLIOGRAPHY OF UTAH GEOLOGY</name>
          <description>Keyworded compilation of 11,300 bibliographic entries</description>
          <serviceInfos>
            <serviceInfo
                businessKey="80A15BE8-1C18-47B4-8A4D-5A047277A4F5"
                serviceKey="6C80C032-AB12-4A72-B27B-2E03DF285818">
              <name>Z39.50 Service</name>
            </serviceInfo>
          </serviceInfos>
        </businessInfo>
      </businessInfos>
    </businessList>
  </Body>
</Envelope>
I'm assuming that there is a GILS-defined persistent DocId or zurl which
we can use to identify and retrieve any given GILS record; we can sort out
the details of this later. For now I'll just refer to this as DocId. The
gateway has a local database of UDDI uuids and their corresponding GILS
DocIds.
On receiving a find..., the gateway performs a search on the GILS server(s)
and a UDDI server (e.g. Microsoft). For each GILS record, the gateway checks
whether there is a corresponding entry in the database for the DocId:
- If there is, the record for the corresponding uuid is pulled from the UDDI
server and compared with the GILS record (converted to UDDI). If there are
changes, an update is posted to the UDDI server.
- If there isn't, an add is posted to the UDDI server and the corresponding
uuid is stored against the DocId in the local database.
The GILS records (converted to UDDI and with uuid from the local database)
are then merged with the UDDI records from the UDDI server (de-duping on uuid)
and the full set returned (this is necessary to meet the requirement
that searching any UDDI server should produce the same result as searching
any other).
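The add-or-update logic described above might look like the sketch below. Everything here is an assumption made for illustration: `InMemoryRegistry` is a hypothetical stand-in for the public UDDI server rather than a real client, records are plain dicts already converted to UDDI form, and plain equality stands in for the equivalence test discussed in the comparison note.

```python
import uuid


class InMemoryRegistry:
    """Hypothetical stand-in for the public UDDI server."""

    def __init__(self):
        self.records = {}  # uuid -> record
        self.calls = []    # log of posted operations, for illustration only

    def add(self, record):
        key = str(uuid.uuid4())
        self.records[key] = record
        self.calls.append("add")
        return key

    def get(self, key):
        return self.records[key]

    def update(self, key, record):
        self.records[key] = record
        self.calls.append("update")


def reconcile(gils_records, id_map, registry):
    """Align the UDDI server with records retrieved from GILS.

    gils_records: DocId -> record already converted to UDDI form
    id_map:       DocId -> uuid (the gateway's local database)
    Returns uuid -> record for the GILS results.
    """
    identified = {}
    for doc_id, record in gils_records.items():
        key = id_map.get(doc_id)
        if key is None:
            # Unknown DocId: post an add and remember the new uuid.
            key = registry.add(record)
            id_map[doc_id] = key
        elif registry.get(key) != record:
            # Known DocId whose registered record differs: post an update.
            registry.update(key, record)
        identified[key] = record
    return identified
```

The returned mapping is keyed by uuid, so it can be merged with the UDDI server's own results and de-duped on uuid.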
- comparison
- A comparison is needed to determine when a public UDDI businessDetail record
matches what is retrieved from the GILS source and converted to a UDDI
businessDetail record. Both records will have been cast as XML DOM (Document
Object Model). Equivalence should ignore white space, UDDI keys, and order of
elements where the parent is a "bag".
One suggestion is to use a hash function for comparison. The hash function
would have to be commutative over a bag, that is, yield the same value
whatever the order of the bag's children. Another suggestion is that two DOM
nodes might be regarded as equivalent if they have identical child nodes
(count and names), and if the lengths of the text values at each child node
are equal between the two nodes. This suggestion ignores values in attributes
and will miss changes wherein text has been replaced with other text of exactly
the same length.
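The first suggestion, a hash that is commutative over bags, could be sketched as below using Python's standard library. This is an assumption-laden illustration, not a specified algorithm: the sets `BAG_ELEMENTS` and `KEY_ATTRIBUTES` are guesses at which elements are bags and which attributes carry UDDI keys, and the function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumptions: elements whose children are unordered, and attributes holding UDDI keys.
BAG_ELEMENTS = {"businessServices", "bindingTemplates", "categoryBag", "identifierBag"}
KEY_ATTRIBUTES = {"businessKey", "serviceKey", "bindingKey"}


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def node_hash(node):
    """Hash a node, ignoring white space, key attributes, and order within bags."""
    name = local_name(node.tag)
    parts = [name]
    for attr in sorted(node.attrib):
        if attr not in KEY_ATTRIBUTES:
            parts.append("@%s=%s" % (attr, node.attrib[attr]))
    # Collapse runs of white space; text trailing a child element is not hashed.
    parts.append("#" + " ".join((node.text or "").split()))
    child_hashes = [node_hash(child) for child in node]
    if name in BAG_ELEMENTS:
        child_hashes.sort()  # sorting makes the result independent of child order
    parts.extend(child_hashes)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def equivalent(xml_a, xml_b):
    return node_hash(ET.fromstring(xml_a)) == node_hash(ET.fromstring(xml_b))
```

Unlike the text-length heuristic, this approach detects a same-length text replacement and takes non-key attribute values into account.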
- query
- UDDI is currently designed from the perspective of "Data Query"
rather than "Information Retrieval". Data Query is less effective when
the database is heavily populated or typical queries are "fuzzy" (e.g.,
full recall is less important than relevance ranking). UDDI may also have
performance problems when large results are instantiated regardless of whether
the searcher is likely to request all of the records.
A UDDI request can specify the maximum number of rows, and a response can
indicate whether the result has been truncated. UDDI does not have the notion
of a cursor to position within a table, nor the ability to handle named result
sets.
- ranking
- The questions raised here are not unique to this gateway; they occur whenever
a search must combine results from sources that do not have precisely
equivalent ranking schemes and comparable records.
In the merging of multiple businessLists, there is an issue of how to rank
the records of each. Say, for instance, the find_business request specifies a
max of 100 rows (records), the public businessList contains 75 matching
records and the GILS businessList contains 75 matching records. The GILS
businessList includes a relevance score for each record but the public
businessList records have no scores. If the relevance scores are to be
considered in the ranking, should a score be forced for the public records? If so,
should the forcing be contrived to favor one list or the other, or to spread
the selection? If not, on what other basis should records be drawn from the
two lists?
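One possible answer to the "spread the selection" option, offered only as a sketch and not as a recommendation, is to alternate between the two lists without forcing scores on the public records. The function name `interleave`, the `score` field, and the choice to start with the GILS list are all assumptions.

```python
from itertools import zip_longest


def interleave(public, gils, max_rows):
    """Draw records alternately from the two lists, up to max_rows.

    GILS records are taken in descending order of their "score" field;
    public records, which have no scores, keep the order in which they arrived.
    """
    gils_ranked = sorted(gils, key=lambda r: r["score"], reverse=True)
    picked = []
    for gils_record, public_record in zip_longest(gils_ranked, public):
        for record in (gils_record, public_record):
            if record is not None:  # one list may run out before the other
                picked.append(record)
    return picked[:max_rows]
```

With the figures from the example above (75 public records, 75 GILS records, a max of 100 rows), this draws 50 records from each list.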
- zurl
- If the retrieval zurl (i.e., "z3950://host:port/dbname?docId") were both
globally unique and persistent, a simple cross-reference between UDDI uuid and
retrieval zurl would suffice. But a zurl with a docId is often no more persistent
than the interval between re-indexing of a database. One would also have
difficulty distinguishing duplicates, because the same business
can be found through various Z39.50 servers.
Persistence is a thorny problem and UDDI may not have a complete solution
yet, either. For the time being, we can pretend that a retrieval zurl is as
globally unique and persistent as the UDDI uuid. Periodically, we can enforce
alignment between the two identifier spaces by retrieving everything registered
by the UDDI/GILS Gateway and running a batch comparison of retrieval zurls to
UDDI records. For any broken DocId, the batch program would send a delete_business
message to the public UDDI register.
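The periodic batch check could be sketched as below. This is a minimal illustration under stated assumptions: `parse_zurl` and `find_broken` are hypothetical names, the zurl is assumed to follow the "z3950://host:port/dbname?docId" shape given above, and `is_retrievable` is a caller-supplied stand-in for an actual Z39.50 retrieval attempt.

```python
from urllib.parse import urlsplit


def parse_zurl(zurl):
    """Split a zurl of the form z3950://host:port/dbname?docId into its parts."""
    parts = urlsplit(zurl)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "doc_id": parts.query,
    }


def find_broken(id_map, is_retrievable):
    """Return the uuids whose zurl no longer retrieves a record.

    id_map:         uuid -> retrieval zurl, as registered by the gateway
    is_retrievable: callable taking a zurl and returning True or False
    """
    return [key for key, zurl in id_map.items() if not is_retrievable(zurl)]
```

Each uuid returned by `find_broken` is a candidate for a delete_business message to the public UDDI register.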
(Use with UDDI client such as Microsoft's apiExplore) http://163.1.91.181:8080/uddi/servlet/registry
UDDI, by Chris Kurt, UDDI.ORG Program Manager
GILS and UDDI, by Eliot Christian