Title

Optimization of Semantic Cache Query Processing System

Abstract

High availability and low latencies of data are major requirements in accessing contemporary large and networked databases. However, it becomes difficult to achieve high availability and reduced data access latency with unreliable connectivity and limited bandwidth. These two requirements become crucial in ubiquitous environment when data is required all the times and everywhere. Disconnectivity with data is more likely with multimedia streaming, especially with frequent queries and low bandwidth. Inter-queries common data is one of the important qualities of consecutive queries. Cache is one of the promising solutions to improve availability and reduce latencies of remotely accessed data by storing and reusing the results of already processed similar queries. Conventional cache lacks in partial reuse of already accessed & locally stored data, while semantic cache overcomes the limitation of conventional cache by reusing the data for partial overlapped queries by storing description of queries with results. There is a need of an efficient cache system to improve the availability, reduce the data access latencies and the network traffic by reusing the already stored results for fully and partially overlapped queries. An efficient cache system demands efficient query processing and cache management.

In this study, a qualitative benchmark with four qualities as Accuracy, Increased Data Availability, Reduced Network Traffic and Reduced Data Access Latency is proposed to evaluate a semantic cache system, especially from query processing point of view. The qualitative benchmark is then converted into six quantitative parameters (Semantics and Indexing Structure IS, Generation of Amending Query GoAQ, Zero Level Rejection ZLR, Predicate Matching, SELECT CLAUSE Handling, Complexity of Query Matching CoQM) that help in measuring the efficiency of a query processing algorithm. As the result of evaluation, it is discovered that existing algorithms for query trimming can be optimized.

Architecture of a semantic cache system is proposed to meet the benchmark criteria. One of the important deficiencies observed in the existing system is the storage of query semantics in segments (indexing of the semantics) and the organization of these segments. Therefore, an appropriate indexing scheme to store the semantics of queries is needed to reduce query matching time. In the existing indexing schemes the number of segments grows faster than exponential, i.e., more than 2n. The semantic matching of a user query with number of segments more than 2n will be exponential and not feasible for a large value of n. The proposed schema-based indexing scheme is of polynomial time complexity for the matching process.

Another important deficiency observed is the large complexity of query trimming algorithm which is responsible to filter the semantics of incoming query into local cache and remote query. A rule based algorithm is proposed for query trimming that is faster and less complex than existing satisfiability/implication algorithms. The proposed trimming algorithm is more powerful in finding the hidden/implicit semantics, too.

The significance of the proposed algorithms is justified by examples/case studies in comparison with the previous algorithms. Correctness of proposed algorithms is tested by implementing a prototype and executing a number of case studies. The final outcomes revealed that the proposed scheme has achieved sufficient accuracy, increased availability, reduced network traffic, and reduced data access latency.

Download full paper