Recent Submissions

  • In-home solid fuel use and cardiovascular disease: a cross-sectional analysis of the Shanghai Putuo study

    Lee, Mi-Sun; Hang, Jing-qing; Zhang, Feng-ying; Dai, He-lian; Su, Li; Christiani, David C (2013-09-23)
  • Facilitating interoperability among heterogeneous geographic database systems: A theoretical framework, a prototype system, and evaluation

    Park, Jinsoo (The University of Arizona., 1999)
    The objective of this research is to develop a formal semantic model, theoretical framework, and methodology to facilitate interoperability among distributed and heterogeneous geographic database systems (GDSs). The primary research question is how to identify and resolve various data- and schematic-level conflicts among such information sources. Set theory is used to formalize the semantic model, which supports explicit modeling of the complex nature of geographic data objects. The semantic model is used as a canonical model for conceptual schema design and integration. The intension (including structure, integrity rules, and meta-properties) of the database schema is captured in the semantic model. A comprehensive framework classifying various semantic conflicts is proposed. This framework is then used as a basis for automating the detection and resolution of semantic conflicts among heterogeneous databases. A methodology for conflict detection and resolution is proposed to develop an interoperable system environment. The methodology is based on the concept of a "mediator." Several types of semantic mediators are defined and developed to achieve interoperability. An ontology is developed to capture various semantic conflicts. The metadata and ontology are stored in a common repository and manipulated by description logic-based operators. A query processing technique is developed to provide uniform and integrated access to the multiple heterogeneous databases. Logic is employed to formalize our methodology, which provides a unified view of the underlying representational and reasoning formalism for the semantic mediation process. A usable prototype system is implemented to provide proof of the concept underlying this work. The system has been integrated with the Internet and can be accessed through any Java-enabled web browser. Finally, the usefulness of our methodology and the system is evaluated using three cases that represent different application domains. Various heterogeneous geospatial datasets and non-geographic datasets are used during the evaluation phase. The results of the evaluation suggest that correct identification and construction of both schema mapping and ontology-schema mapping knowledge play a critical role in achieving interoperability at both the data and schema levels. The research adopts a multi-methodological approach that incorporates set theory, logic, prototyping, and case study.
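    A minimal sketch of the kind of naming-conflict resolution a semantic mediator performs, assuming a toy ontology and two hypothetical source schemas (none of the names below come from the dissertation):

      # Sketch: ontology-based resolution of naming conflicts between
      # heterogeneous sources. All schema and ontology names are invented
      # illustrations, not taken from the dissertation.

      # Each local database exposes its own attribute names; the ontology
      # maps them onto shared canonical concepts.
      ONTOLOGY = {
          "parcels_db": {"parcel_no": "parcel_id", "owner_nm": "owner"},
          "zoning_db":  {"pid": "parcel_id", "zone_cd": "zone"},
      }

      def to_canonical(source: str, record: dict) -> dict:
          """Rewrite a source record into the canonical (ontology) vocabulary."""
          mapping = ONTOLOGY[source]
          return {mapping.get(attr, attr): value for attr, value in record.items()}

      def mediate(query_attrs, sources):
          """Answer a canonical-vocabulary query by translating each source's
          records and projecting the requested canonical attributes."""
          for source, records in sources.items():
              for record in records:
                  canonical = to_canonical(source, record)
                  yield {a: canonical.get(a) for a in query_attrs}

      sources = {
          "parcels_db": [{"parcel_no": 17, "owner_nm": "Ada"}],
          "zoning_db":  [{"pid": 17, "zone_cd": "R-1"}],
      }
      print(list(mediate(["parcel_id", "owner", "zone"], sources)))
      # [{'parcel_id': 17, 'owner': 'Ada', 'zone': None},
      #  {'parcel_id': 17, 'owner': None, 'zone': 'R-1'}]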
  • Aggregation in temporal databases

    Kline, Rodger Nickels (The University of Arizona., 1999)
    Temporal database systems extend relational database systems to support time-varying information. One important such extension is support for time-varying aggregate functions, such as a time-varying average. Our research shows that temporal aggregates can be specified in a semantically well-defined manner and yet be efficiently implemented as simple extensions to relational databases. We introduce a taxonomy of temporal aggregation, based on a study of all major temporal query languages containing aggregates. The taxonomy categorizes the expressiveness and functionality of temporal aggregation. Based on this taxonomy, we introduce extensions to TSQL2 for temporal aggregation. The proposed language constructs allow one to express the variety of features identified in the taxonomy. We briefly discuss the semantics of the temporal aggregate language extension. We introduce an operator for evaluating temporal aggregates in a temporal relational algebra; the operator was designed to implement the tuple semantics. We show that, in the decision-tree model, the most efficient evaluation of a temporal aggregate over a relation with n unique timestamps requires Θ(n log n) time, with O(n log n) space. We provide an example algorithm meeting these bounds, utilizing a 2-3 tree. Based on the requirements for evaluating the algebraic operator, we introduce a series of main-memory algorithms for evaluating temporal aggregates, including the aggregation tree, the k-ordered aggregation tree, the chalkboard algorithm, and the linked-list algorithm, and we show how to perform aggregation using a 2-3 tree. The algorithms differ in applicability depending on aspects of the input relation, including sort order, percentage of long-lived tuples, and number of tuples. We also provide the paging aggregation tree, an algorithm that executes using only a user-limited amount of memory. We characterize the effectiveness of these algorithms based on an empirical study of their performance.
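    As a concrete illustration of a sequenced temporal aggregate, the sketch below computes a time-varying COUNT by sweeping sorted interval endpoints. It is a simple O(n log n) baseline for the concept, not the aggregation-tree or 2-3-tree algorithms the dissertation develops:

      # Time-varying COUNT over period-stamped tuples via an endpoint sweep.
      # Sorting the 2n endpoints dominates, giving O(n log n) overall.

      def temporal_count(intervals):
          """intervals: list of (start, end) half-open periods.
          Returns (period, count) pairs, one per constant period."""
          events = []
          for start, end in intervals:
              events.append((start, +1))   # tuple becomes current
              events.append((end, -1))     # tuple ceases to be current
          events.sort()

          result, count, prev = [], 0, None
          for time, delta in events:
              # Emit the constant period that just ended, if non-empty.
              if prev is not None and time != prev and count > 0:
                  result.append(((prev, time), count))
              count += delta
              prev = time
          return result

      # Three tuples, all three current during [3, 5):
      print(temporal_count([(1, 5), (3, 8), (3, 5)]))
      # [((1, 3), 1), ((3, 5), 3), ((5, 8), 1)]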
  • Indexing and path query processing for XML data

    Li, Quanzhong (The University of Arizona., 2004)
    XML has emerged as a new standard for information representation and exchange on the Internet. To efficiently process XML data, we propose the extended preorder numbering scheme, which determines the ancestor-descendant relationship between nodes in the hierarchy of XML data in constant time and adapts to the dynamics of XML data by allocating extra space. Based on this numbering scheme, we propose sort-merge based algorithms, εA-Join and εε-Join, to process ancestor-descendant path expressions. The experimental results showed an order-of-magnitude performance improvement over conventional methods. We further propose partition-based algorithms, which can be chosen by a query optimizer according to the characteristics of the input data. For complex path expressions with branches, we propose the Containment B⁺-tree (CB-tree) index and the IndexTwig algorithm. The CB-tree, an extension of the B⁺-tree, supports both the containment query and the reverse containment query. It is an effective indexing scheme for XML documents with no or only a small number of recursions. The proposed IndexTwig algorithm works with any index supporting containment and reverse containment queries, such as the CB-tree. We also introduce a simplified output model, which outputs only the necessary result of a path expression. The output model enables the Fast Existence Test (FET) optimization, which skips unnecessary data and avoids generating unwanted results. We also introduce techniques to process the predicates in XML path expressions using the EVR-tree. The EVR-tree combines the advantages of indexing on values or elements individually using B⁺-trees. It exploits high value selectivity and/or high structural selectivity, and provides ordered element access by using a priority queue. Finally, we introduce the XISS/R system, an implementation of the XML Indexing and Storage System (XISS) on top of a relational database. XISS/R includes a web-based user interface and an XPath query engine that translates XPath queries into efficient SQL statements.
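    The constant-time ancestor-descendant test at the heart of an (order, size) preorder numbering can be sketched as follows; the labeling function and the slack factor used to absorb future insertions are illustrative assumptions, not the dissertation's exact bookkeeping:

      # Each node gets an (order, size) label assigned in preorder, where
      # size over-allocates room for later insertions. Node y is a
      # descendant of node x exactly when
      #   order(x) < order(y) <= order(x) + size(x),
      # so the test takes constant time. Node names are assumed unique.

      def label(tree, order=0, slack=2):
          """tree: (name, [children]). Returns {name: (order, size)}."""
          name, children = tree
          labels, child_order = {}, order + 1
          for child in children:
              labels.update(label(child, child_order, slack))
              child_order += labels[child[0]][1] + 1
          size = (child_order - order - 1) + slack   # subtree extent + spare room
          labels[name] = (order, size)
          return labels

      def is_descendant(labels, x, y):
          ox, sx = labels[x]
          oy, _ = labels[y]
          return ox < oy <= ox + sx

      doc = ("book", [("title", []), ("chapter", [("section", [])])])
      L = label(doc)
      print(is_descendant(L, "book", "section"))   # True
      print(is_descendant(L, "title", "section"))  # False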
  • Micro-Specialization: Dynamic Code Specialization in DBMSes

    Zhang, Rui (The University of Arizona., 2012)
    Database management systems (DBMSes) form a cornerstone of modern IT infrastructure, and it is essential that they have excellent performance. In this research, we exploit opportunities to apply dynamic code specialization to DBMSes, particularly by focusing on runtime invariants present in DBMSes during query evaluation. Query evaluation involves extensive references to the relational schema, predicate values, and join types, which are all invariant during query evaluation and thus are subject to dynamic value-based code specialization. We observe that DBMSes are general in the sense that they must contend with arbitrary schemas, queries, and modifications; this generality is implemented using runtime metadata lookups and tests that ensure that control is channelled to the appropriate code in all cases. Unfortunately, these lookups and tests are carried out even when information is available that renders some of these operations superfluous, leading to unnecessary runtime overhead. We introduce micro-specialization, an approach that uses relation- and query-specific information to specialize DBMS code at runtime and thereby eliminate some of these overheads. We develop a taxonomy of approaches and specialization times and propose a general architecture that isolates most of the creation and execution of the specialized code sequences in a separate DBMS-independent module. We show that this approach requires minimal changes to a DBMS and can simultaneously improve performance across a wide range of queries, modifications, and bulk loading, in terms of storage, CPU usage, and I/O time, on the TPC-H and TPC-C benchmarks. We also discuss an integrated development environment that helps DBMS developers apply micro-specializations to identified target code sequences.
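    The following sketch illustrates the general idea in miniature, contrasting a generic, metadata-driven tuple extractor with one specialized at runtime for a fixed (hypothetical) schema; the actual work specializes compiled DBMS code paths rather than Python:

      # The generic path re-reads schema metadata and branches per attribute
      # of every tuple; the specialized path folds the schema in once,
      # removing those per-tuple lookups and branches.

      SCHEMA = [("id", "int"), ("name", "str"), ("price", "float")]  # hypothetical

      def extract_generic(raw, schema):
          """Generic path: consults metadata for every attribute of every
          tuple (the overhead micro-specialization removes)."""
          out = {}
          for value, (attr, typ) in zip(raw, schema):
              if typ == "int":
                  out[attr] = int(value)
              elif typ == "float":
                  out[attr] = float(value)
              else:
                  out[attr] = value
          return out

      def specialize_extractor(schema):
          """Specialized path: resolve the schema once, yielding a
          branch-free extractor for this relation."""
          casts = {"int": int, "float": float, "str": str}
          steps = [(attr, casts[typ]) for attr, typ in schema]
          def extract(raw):
              return {attr: cast(v) for (attr, cast), v in zip(steps, raw)}
          return extract

      extract = specialize_extractor(SCHEMA)
      print(extract(["7", "widget", "9.95"]))
      # {'id': 7, 'name': 'widget', 'price': 9.95}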
  • Supporting the Procedural Component of Query Languages over Time-Varying Data

    Gao, Dengfeng (The University of Arizona., 2009)
    As everything in the real world changes over time, the ability to model this temporal dimension of the real world is essential to many computer applications. Almost every database application involves the management of temporal data. This applies not only to relational data but also to any data that models the real world, including XML data. Expressing queries on time-varying (relational or XML) data using a standard query language (SQL or XQuery) is more difficult than writing queries on nontemporal data. In this dissertation, we present minimal valid-time extensions to XQuery and SQL/PSM, focusing on the procedural aspect of the two query languages and efficient evaluation of sequenced queries. For XQuery, we add valid-time support by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach that maps a τXQuery query to a conventional XQuery query. The first part of the dissertation focuses on how to perform this mapping, in particular on mapping sequenced queries, which are by far the most challenging. The critical issue in supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose five optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. We also extend a conventional XML query benchmark to create a temporal XML query benchmark. Experiments on this benchmark show that in-place slicing is the best approach. We then apply the approaches used in τXQuery to temporal SQL/PSM. The stratum architecture and most of the time-slicing techniques work for temporal SQL/PSM. An empirical comparison is performed by running a variety of temporal queries.
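    A small sketch of the time-slicing idea behind sequenced evaluation: split the timeline into constant periods over which the data do not change, evaluate the conventional query on each slice, and stamp each answer with its period. This corresponds to the maximally-fragmented baseline that the optimizations above improve on; the data and query below are invented:

      # Sequenced evaluation by maximally-fragmented time-slicing.

      def constant_periods(tuples):
          """tuples: (value, start, end) with half-open valid-time periods.
          Returns the constant periods induced by the interval endpoints."""
          instants = sorted({t for _, s, e in tuples for t in (s, e)})
          return list(zip(instants, instants[1:]))

      def sequenced(query, tuples):
          """Evaluate a nontemporal query on each time slice and stamp
          each answer with its constant period."""
          results = []
          for s, e in constant_periods(tuples):
              snapshot = [v for v, ts, te in tuples if ts <= s and e <= te]
              results.append((((s, e)), query(snapshot)))
          return results

      data = [("a", 1, 4), ("b", 2, 6)]
      print(sequenced(len, data))   # a sequenced COUNT
      # [((1, 2), 1), ((2, 4), 2), ((4, 6), 1)]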