Distributed Queries
A robust infrastructure for a collaborative problem-solving environment must provide access to the widest possible range of information services, so that the researcher is presented with the largest range of possible solutions. This is a daunting task: information service providers may be geographically distributed across multiple grids, reside on different database platforms, and use different database schemas to describe their services. The problem is compounded by the fact that each service database depends on a query language that may be algebraically limited in the operations it can carry out. For example, the join operation that is common in the Structured Query Language (SQL) [1] is missing from LDAP [2]. A join in SQL allows rows from multiple tables to be combined on common keys and returned as one result. Because LDAP lacks this mechanism, the client is forced to query each directory independently and join the results itself.
One possible answer is a query translation middleware framework that not only handles direct translation of operations between two query grammars, but also allows robust query languages to be used with storage systems that offer only a limited query interface. The middleware would need to translate the request into a common query algebra, translate the query algebra statements into each repository's query language, assemble the results, and return the answers in the format of the requester's query language. Fortunately, research on query translation services over the last five years has shown that this is possible, at least in concept. However, the tools used have been based on either outdated or proprietary technology.
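The client-side join that LDAP forces on the requester can be illustrated with a minimal Python sketch. It assumes each directory has already been searched and that its entries are available as plain dictionaries. The function name hash_join, the attribute names (hn, cpu, queue), and the sample entries are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two lists of directory entries (dicts) on a shared attribute."""
    # Index the right-hand entries by the join attribute.
    index = defaultdict(list)
    for entry in right:
        if key in entry:
            index[entry[key]].append(entry)

    # Probe the index with each left-hand entry and merge the matches.
    joined = []
    for entry in left:
        for match in index.get(entry.get(key), []):
            merged = dict(match)
            merged.update(entry)  # left-hand values win on attribute clashes
            joined.append(merged)
    return joined


# Hypothetical results of two independent directory searches.
hosts = [{"hn": "node1", "cpu": "4"}, {"hn": "node2", "cpu": "8"}]
queues = [{"hn": "node1", "queue": "batch"}, {"hn": "node3", "queue": "debug"}]

result = hash_join(hosts, queues, "hn")
print(result)
```

Only entries whose join attribute appears in both result sets survive, which is the behavior of an inner join in SQL. A translation middleware could perform this step on the client's behalf.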
Recently, XML [3] technology has helped bridge the gap between heterogeneous data storage systems by transporting data model constructs as XML documents, by combining disparate data models through recent developments in XML Schema and XML Schema translators, and by allowing complex queries through the XML Query language. We believe that the recent W3C work on the XQuery 1.0 Formal Semantics [4] (formerly the Query Algebra), coupled with the XML technologies mentioned above, provides the core for a universal query translation service.
Our initial research concludes that query grammars can be broken down into mathematical statements and that it is possible to reformulate these mathematical statements into different query languages. This is based on two facts:
· Most database query languages today rely completely or partially on a common set of algebraic operations [5].
· Database servers today already transform their user requests from query grammar into algebraic statements in order to process the user request.

Figure 1. Common translation practice used in today's databases.
By the same token, we believe that a query grammar can be broken down into algebraic statements through XML translators and then re-translated into a new query grammar. In a simple example, an RDBMS project operation expressed in SQL could be run against an LDAP server if it were translated from SQL into an LDAP search request. Building on this scenario, one could imagine adding middleware services that extend LDAP's query operations to include a join operation. As shown in Figure 2, this could be accomplished by having the middleware service accept the SQL request via SOAP [6], use the translator to transform the SQL request into native LDAP requests, query each LDAP information service as a client, collect the results, translate the LDAP results into a single SQL result, and return that result to the user.
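The SQL-to-LDAP step described above can be sketched in Python as follows. This is only an illustration under strong assumptions: it accepts a single restricted form of SELECT (a column list, one table, and equality conditions joined by AND), it is not a real SQL parser, and it stops at building the search parameters rather than contacting a server. The names AlgebraQuery, parse_sql, to_ldap_search, and the table-to-base-DN mapping are hypothetical. The filter text follows the standard LDAP string representation of search filters.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping from SQL table names to LDAP search bases.
TABLE_TO_BASE = {"people": "ou=people,dc=example,dc=org"}


@dataclass
class AlgebraQuery:
    """Hypothetical intermediate form: project(attributes, select(conditions, source))."""
    source: str
    attributes: list
    conditions: list  # (attribute, value) equality pairs, combined with AND


_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<attrs>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)
_EQUALS = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def parse_sql(sql):
    """Translate a restricted SELECT statement into the intermediate form."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("unsupported SQL statement: %r" % sql)
    attributes = [a.strip() for a in match.group("attrs").split(",")]
    conditions = []
    if match.group("where"):
        terms = re.split(r"\s+AND\s+", match.group("where"), flags=re.IGNORECASE)
        for term in terms:
            cond = _EQUALS.fullmatch(term.strip())
            if cond is None:
                raise ValueError("unsupported condition: %r" % term)
            conditions.append((cond.group(1), cond.group(2)))
    return AlgebraQuery(match.group("table"), attributes, conditions)


def escape_filter_value(value):
    """Escape characters that are special inside an LDAP filter value."""
    special = {"\\": "\\5c", "*": "\\2a", "(": "\\28", ")": "\\29", "\0": "\\00"}
    return "".join(special.get(ch, ch) for ch in value)


def to_ldap_search(query):
    """Translate the intermediate form into LDAP search parameters."""
    terms = ["(%s=%s)" % (a, escape_filter_value(v)) for a, v in query.conditions]
    if not terms:
        ldap_filter = "(objectClass=*)"  # no WHERE clause: match every entry
    elif len(terms) == 1:
        ldap_filter = terms[0]
    else:
        ldap_filter = "(&" + "".join(terms) + ")"
    return {
        "base": TABLE_TO_BASE[query.source],
        "scope": "subtree",
        "filter": ldap_filter,
        "attributes": query.attributes,
    }


request = to_ldap_search(
    parse_sql("SELECT cn, mail FROM people WHERE uid = 'jdoe' AND ou = 'physics'")
)
print(request)
```

The SQL column list becomes the requested attribute list (the project operation), and the WHERE clause becomes the search filter (the select operation). Keeping an intermediate form between the two grammars is what would let other source or target languages be plugged in.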
A key part of the translation service is the ability to translate between service schemas. It is reasonable to expect that some service schemas will be unique and difficult to map into a general schema, while others, such as the Globus Metacomputing Directory Service (MDS) [7], will be based on a standard approach. For this reason, we believe that the translation service will need to maintain a map of information service providers, their schemas, and a mapping algorithm from each schema to the others. Once a schema is registered, information service detection agents could monitor the distributed information services for schema changes and update the translation service database as necessary.
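One way to picture the schema map is a registry that records, for each provider, how its attribute names correspond to a shared vocabulary. The Python sketch below assumes that a mapping can be expressed as a simple attribute rename, which will not hold for schemas that differ structurally. The class name SchemaRegistry, the provider names, and all attribute names are hypothetical.

```python
class SchemaRegistry:
    """Hypothetical map of providers to attribute mappings onto a common schema."""

    def __init__(self):
        self._maps = {}  # provider name -> {local attribute: common attribute}

    def register(self, provider, attribute_map):
        # Registering again replaces the old mapping, which is how a
        # monitoring agent could record a detected schema change.
        self._maps[provider] = dict(attribute_map)

    def to_common(self, provider, entry):
        mapping = self._maps[provider]
        return {mapping[k]: v for k, v in entry.items() if k in mapping}

    def from_common(self, provider, entry):
        inverse = {common: local for local, common in self._maps[provider].items()}
        return {inverse[k]: v for k, v in entry.items() if k in inverse}

    def translate(self, source, target, entry):
        """Translate an entry between two providers via the common schema."""
        return self.from_common(target, self.to_common(source, entry))


registry = SchemaRegistry()
registry.register("ldap_site", {"hn": "hostname", "cpucount": "cpu_count"})
registry.register("sql_site", {"host_name": "hostname", "num_cpus": "cpu_count"})

translated = registry.translate(
    "ldap_site", "sql_site", {"hn": "node1", "cpucount": "4", "unmapped": "x"}
)
print(translated)
```

Routing every translation through a common schema means each provider needs only one mapping rather than one per peer. Attributes without a registered mapping are dropped in this sketch; a real service would need a policy for them.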
Another feature of the translation service must be the ability to provide query optimization. Query optimization will use local caching services to store the results of frequent or recent requests. When a query request is sent to an information service and a query optimization service exists, the request is first routed through the query optimizer, which returns any cached data. If the results are not found in the cache, the request is routed directly to the information service [8] [9].
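The cache-then-forward routing described above can be sketched in Python as follows. This is an illustrative sketch, not a design from the cited work: it assumes a least-recently-used eviction policy and no invalidation of stale results, and the class name CachingQueryRouter, the fetch callback, and the service URL are hypothetical.

```python
from collections import OrderedDict


class CachingQueryRouter:
    """Return cached results when available, otherwise forward the request."""

    def __init__(self, fetch, max_entries=128):
        self._fetch = fetch              # callable(service, request) -> result
        self._max_entries = max_entries
        self._cache = OrderedDict()      # (service, request) -> result

    def query(self, service, request):
        key = (service, request)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        # Cache miss: route the request directly to the information service.
        result = self._fetch(service, request)
        self._cache[key] = result
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used
        return result


# Stand-in for a real information service; it records each call it receives.
calls = []


def fake_fetch(service, request):
    calls.append((service, request))
    return ["result for " + request]


router = CachingQueryRouter(fake_fetch, max_entries=2)
first = router.query("ldap://directory.example.org", "(uid=jdoe)")
second = router.query("ldap://directory.example.org", "(uid=jdoe)")
print(first, len(calls))
```

The second identical request is answered from the cache, so the information service is contacted only once. A production service would also need an expiry or invalidation rule so that cached answers do not outlive changes in the underlying directory.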
Finally, the query translation service must be extensible to incorporate new grammars and mathematical languages. While SQL is very robust, it provides a limited set of operations on simple data types. An extensible query translation service must allow new ideas such as Boolean algebra searches, queries on complex data types, query by example, and natural language queries [10] [11].
Reference Section
1. Information Technology Database Language SQL, July 1992, http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
2. Lightweight Directory Access Protocol (LDAP), http://www.umich.edu/~dirsvcs/ldap/doc/rfc/rfc1777.txt
3. Extensible Markup Language (XML) version 1.0, http://www.w3c.org/XML
4. XQuery 1.0 Formal Semantics, http://www.w3.org/TR/query-semantics/
5. Query Algebra, http://www.w3.org/TR/2001/WD-query-algebra-20010215/
6. Simple Object Access Protocol (SOAP) 1.1, W3C Note 08 May 2000, http://www.w3.org/TR/SOAP
7. Globus Metacomputing Directory Service (MDS), http://www-fp.globus.org/mds
8. Querying Network Directories (1999), http://citeseer.nj.nec.com/106339.html. A paper based on a thesis that explored sophisticated, distributed querying techniques for directory services. It addresses the inadequacies of the LDAP query language.
9. Counting Twig Matches in a Tree (2000-2001), http://citeseer.nj.nec.com/417035.html. Discusses query optimization on LDAP using XML-QL.
10. Presentation given at OMG discussing RDF, http://www.w3.org/2001/Talks/0710-ep-grid/
11. An Extensible Approach for Modeling Ontologies in RDF(s), http://www.research.att.com/~mff/files.final.html. A good introductory paper on RDF and ontologies.

