• In-home solid fuel use and cardiovascular disease: a cross-sectional analysis of the Shanghai Putuo study

      Lee, Mi-Sun; Hang, Jing-qing; Zhang, Feng-ying; Dai, He-lian; Su, Li; Christiani, David C (2013-09-23)
    • Indexing and path query processing for XML data

      Li, Quanzhong (The University of Arizona., 2004)
      XML has emerged as a new standard for information representation and exchange on the Internet. To efficiently process XML data, we propose the extended preorder numbering scheme, which determines the ancestor-descendant relationship between nodes in the hierarchy of XML data in constant time, and adapts to the dynamics of XML data by allocating extra space. Based on this numbering scheme, we propose sort-merge based algorithms, εA-Join and εε-Join, to process ancestor-descendant path expressions. The experimental results showed an order of magnitude performance improvement over conventional methods. We further propose the partition-based algorithms, which can be chosen by a query optimizer according to the characteristics of the input data. For complex path expressions with branches, we propose the Containment B⁺-tree (CB-tree) index and the IndexTwig algorithm. The CB-tree, which is an extension of the B⁺-tree, supports both the containment query and the reverse containment query. It is an effective indexing scheme for XML documents with or without a small number of recursions. The proposed IndexTwig algorithm works with any index supporting containment and reverse containment queries, such as the CB-tree. We also introduce a simplified output model, which outputs only the necessary result of a path expression. The output model enables the Fast Existence Test (FET) optimization to skip unnecessary data and avoid generating unwanted results. Also in this dissertation, we introduce techniques to process the predicates in XML path expressions using the EVR-tree. The EVR-tree combines the advantages of indexing on values or elements individually using B⁺-trees. It utilizes the high value selectivity and/or high structural selectivity, and provides ordered element access by using a priority queue. At the end of the dissertation, we introduce the XISS/R system, which is an implementation of the XML Indexing and Storage System (XISS) on top of a relational database. The XISS/R includes a web-based user interface and an XPath query engine to translate XPath queries into efficient SQL statements.
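      The constant-time ancestor-descendant test mentioned in this abstract can be illustrated with a small sketch. It assumes each node carries an (order, size) pair assigned in preorder, with a few unused numbers reserved per node for later insertions. The names `Node`, `number`, `is_ancestor`, and the `slack` parameter are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    order: int = 0  # preorder number of this node
    size: int = 0   # width of the number range owned by this node's subtree


def number(node, next_order=1, slack=2):
    """Assign (order, size) in preorder and return the next free number.

    `slack` unused numbers are reserved at the end of every subtree so that
    later insertions can be numbered without renumbering the whole tree.
    """
    node.order = next_order
    cursor = next_order + 1
    for child in node.children:
        cursor = number(child, cursor, slack)
    cursor += slack  # extra space reserved for future descendants
    node.size = cursor - 1 - node.order
    return cursor


def is_ancestor(x, y):
    """Constant-time test: y's order falls inside the range owned by x."""
    return x.order < y.order <= x.order + x.size


root = Node("book", [Node("chapter", [Node("title")]), Node("appendix")])
number(root)
chapter, appendix = root.children
title = chapter.children[0]
print(is_ancestor(root, title), is_ancestor(chapter, appendix))
```

      The test needs only two integer comparisons per pair of nodes, which is what makes sort-merge style joins over lists of numbered nodes practical.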
    • Micro-Specialization: Dynamic Code Specialization in DBMSes

      Zhang, Rui (The University of Arizona., 2012)
      Database management systems (DBMSes) form a cornerstone of modern IT infrastructure, and it is essential that they have excellent performance. In this research, we exploit the opportunities of applying dynamic code specialization to DBMSes, particularly by focusing on runtime invariants present in DBMSes during query evaluation. Query evaluation involves extensive references to the relational schema, predicate values, and join types, which are all invariant during query evaluation, and thus are subject to dynamic value-based code specialization. We observe that DBMSes are general in the sense that they must contend with arbitrary schemas, queries, and modifications; this generality is implemented using runtime metadata lookups and tests that ensure that control is channelled to the appropriate code in all cases. Unfortunately, these lookups and tests are carried out even when information is available that renders some of these operations superfluous, leading to unnecessary runtime overheads. We introduce micro-specialization, an approach that uses relation- and query-specific information to specialize the DBMS code at runtime and thereby eliminate some of these overheads. We develop a taxonomy of approaches and specialization times and propose a general architecture that isolates most of the creation and execution of the specialized code sequences in a separate DBMS-independent module. We show that this approach requires minimal changes to a DBMS and can improve the performance simultaneously across a wide range of queries, modifications, and bulk-loading, in terms of storage, CPU usage, and I/O time of the TPC-H and TPC-C benchmarks. We also discuss an integrated development environment that helps DBMS developers apply micro-specializations to identified target code sequences.
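      As a rough analogy for the idea of specializing on runtime invariants, the sketch below contrasts a generic column extractor, which walks the schema metadata on every call, with a closure built once per (schema, column) pair that has the offset and decoder fixed in advance. This is a Python illustration under assumed names (`extract_generic`, `specialize_extractor`, the toy `SCHEMA`); the dissertation specializes native DBMS code, which this does not attempt to reproduce.

```python
import struct

# Toy fixed-width schema: (column name, struct format). Hypothetical example data.
SCHEMA = [("id", "<i"), ("price", "<d"), ("qty", "<i")]


def extract_generic(schema, raw, column):
    """Generic path: consults the schema metadata on every call."""
    offset = 0
    for name, fmt in schema:
        if name == column:
            return struct.unpack_from(fmt, raw, offset)[0]
        offset += struct.calcsize(fmt)
    raise KeyError(column)


def specialize_extractor(schema, column):
    """Specialized path: resolve offset and decoder once, return a closure."""
    offset = 0
    for name, fmt in schema:
        if name == column:
            unpack = struct.Struct(fmt).unpack_from
            # The schema is invariant during query evaluation, so the
            # metadata loop above never runs again for this column.
            return lambda raw: unpack(raw, offset)[0]
        offset += struct.calcsize(fmt)
    raise KeyError(column)


row = struct.pack("<i", 7) + struct.pack("<d", 2.5) + struct.pack("<i", 42)
get_qty = specialize_extractor(SCHEMA, "qty")
print(extract_generic(SCHEMA, row, "qty"), get_qty(row))
```

      Both paths return the same value; the specialized one simply skips the per-call metadata lookups, which is the kind of overhead the abstract describes as superfluous once the schema is known.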
    • Supporting the Procedural Component of Query Languages over Time-Varying Data

      Gao, Dengfeng (The University of Arizona., 2009)
      As everything in the real world changes over time, the ability to model this temporal dimension of the real world is essential to many computer applications. Almost every database application involves the management of temporal data. This applies not only to relational data but also to any data that models the real world including XML data. Expressing queries on time-varying (relational or XML) data by using standard query language (SQL or XQuery) is more difficult than writing queries on nontemporal data. In this dissertation, we present minimal valid-time extensions to XQuery and SQL/PSM, focusing on the procedural aspect of the two query languages and efficient evaluation of sequenced queries. For XQuery, we add valid time support to it by minimally extending the syntax and semantics of XQuery. We adopt a stratum approach which maps a τXQuery query to a conventional XQuery. The first part of the dissertation focuses on how to perform this mapping, in particular, on mapping sequenced queries, which are by far the most challenging. The critical issue of supporting sequenced queries (in any query language) is time-slicing the input data while retaining period timestamping. Timestamps are distributed throughout an XML document, rather than uniformly in tuples, complicating the temporal slicing while also providing opportunities for optimization. We propose five optimizations of our initial maximally-fragmented time-slicing approach: selected node slicing, copy-based per-expression slicing, in-place per-expression slicing, and idiomatic slicing, each of which reduces the number of constant periods over which the query is evaluated. We also extend a conventional XML query benchmark to effect a temporal XML query benchmark. Experiments on this benchmark show that in-place slicing is the best. We then apply the approaches used in τXQuery to temporal SQL/PSM. The stratum architecture and most of the time-slicing techniques work for temporal SQL/PSM. Empirical comparison is performed by running a variety of temporal queries.
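      The notion of constant periods used in this abstract can be sketched as follows, assuming half-open integer periods `[start, end)` and a flat list of timestamped items rather than an XML document. The function names `constant_periods` and `timeslice` are hypothetical; this shows only the basic maximally-fragmented idea, not any of the optimizations the dissertation proposes.

```python
def constant_periods(periods):
    """Split time into maximal periods during which no timestamp changes.

    `periods` is an iterable of (start, end) pairs, end exclusive. Candidate
    periods not covered by any input period are dropped.
    """
    periods = list(periods)
    points = sorted({t for p in periods for t in p})
    candidates = zip(points, points[1:])
    return [
        (a, b)
        for a, b in candidates
        if any(s <= a and b <= e for s, e in periods)
    ]


def timeslice(items, period):
    """Return the values of items valid throughout the given constant period."""
    a, b = period
    return [value for value, s, e in items if s <= a and b <= e]


# Hypothetical time-varying data: (value, valid-from, valid-to).
items = [("price=10", 1, 5), ("stock=3", 3, 8)]
for cp in constant_periods((s, e) for _, s, e in items):
    print(cp, timeslice(items, cp))
```

      A sequenced query is then evaluated once per constant period on the corresponding slice, so any technique that reduces the number of constant periods reduces the number of evaluations.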