ABSTRACT

Encryption has been the standard for protecting data from an attack. However, in the database service provider model, there are needs to execute SQL queries over encrypted database because the server is not trusted. There have been several encryption techniques that enable users to execute SQL queries over encrypted data. Current available techniques do not operate efficiently and do not support all SQL operators. In this paper, we present an optimized framework that is able to execute SQL queries over encrypted database in an efficient manner and support more SQL operators.  

  1. INTRODUCTION

Database as a service is a current technology that lets people store their database in a cloud server and access it from anywhere at anytime via internet network. Every database operation can be sent to cloud server, which has more powerful computing resources than our usual local machines. Moreover, our local machines can get results quicker regardless of the computing resources available locally. These benefits have inspired the growth of cloud database adoption. However, this growth has not achieved its expected potential due to security and privacy concerns. Users are concerned about the safety of their sensitive database stored on the cloud server.

Encryption has been the standard for protecting data from an attack. However, in the database service provider model, there are needs to execute SQL queries over encrypted database because the server is not trusted. The client will lose the benefit of cloud database if he needs to decrypt it before executing a query. The reason behind losing that benefit is that, in general, encryption and decryption computations are more costly than query computation. Moreover, it is desirable that those operations require less work on the client because the client has limited computation power compared to the server.

There have been several encryption techniques that enable users to execute SQL queries over encrypted data. However, current available techniques do not operate efficiently and do not support all SQL operators. In this paper, we present an optimized framework that is able to execute SQL queries over encrypted cloud database in an efficient manner and support more SQL operators.

Following are the organization of this paper. In the next section, we will explain related works that will be used partially in our framework. In section 3, we will describe the reasoning behind each component in the framework and the architecture of the framework. In section 4, we will discuss our contribution over previous works. Then in section 5 we will summarize the conclusion followed by references.

  1. RELATED WORK

Hacigumus et al. developed a method to execute SQL query over encrypted data by introducing mapping function and row level encryption. The client will first compose an original query. Then a query translator will translate the original query from the client using the mapping function to make it possible to execute query on encrypted data at the server. Then the client will receive superset of the query results, decrypt and filter them to get the answer to the original query [1]. While this is a novel approach, it has several limitations such as: producing significant numbers of wrong query results where it cannot be filtered accurately without first decrypting it. In addition, it lacks support for several SQL operators such as aggregation function (SUM, COUNT, AVG, MIN, and MAX) and pattern query (LIKE).

Ravan et al. compared current available methods for executing query over encrypted data in a database as a service model (cloud database). The comparison includes performance, supported queries, and security quality [2].

Hacigumus et al. have developed an improvement over their previous work that enables users to execute SQL aggregation queries over numerical data in encrypted database. Aggregation queries here refer to SQL aggregation functions, which are SUM, COUNT, AVG, MIN, and MAX. They did it by modifying the mapping functions involving privacy homomorphism [3].

Hacigumus et al. proposed an optimization method to reduce the amount of work at the client side by partitioning the query. Then the server will send the results of first partition of the query. Then the client will decrypt and filter the results and send it back to the server. Using the filtered results of the first partition of the query, the server side can compute more accurate results on the rest of the query [4].

            Wang et al. have developed a better method to encrypt character data that enable users to execute SQL LIKE operator over encrypted character data [5][6].

  1. OPTIMIZED FRAMEWORK

3.1. Framework Component Background

Our goal is to balance in achieving accuracy and saving time. Therefore, before describing our framework architecture, following are the reasoning for including each component in our framework.

3.1.1 Data Item Level Encryption and Increased Number of Buckets

In this framework, we will use data item level encryption instead of using row level encryption. It will increase the response time at the server because the server will search for individual item one by one. However, it will save a lot of time at the client side since it will return more accurate query results to the client. In addition, to reduce the amount of work at the server side, we will also apply query optimization, which will be explained in the next section.

As described in [1], a bucket is a partition in the encryption mapping that contains several records in the same range. We will also use increased number of buckets for better query results accuracy and better performance. Increased number of buckets in the system will increase query results accuracy because each bucket will contain smaller number of records. Therefore, each query will return smaller number of wrong records from each bucket. It will also achieve better performance because the mapping condition will point to more specific bucket. In other words, increased number of buckets will achieve better filtering at the server side thus reducing response time at the server site. Moreover, it will need lower decryption computation cost at the client because less records will need to be decrypted. We will use 1,500 buckets by default in the system because it is the optimum number of buckets. It has been proven in [1] that adding number of buckets more than 1,500 will not achieve more performance.

3.1.2 Condition Checker and Query Optimization

Query optimization works by sending results of left bottom part of the query in the query tree from the server to the client. Then the client decrypts and filters it and encrypts it again and sends it back to the server. Then the server computes more accurate query results on the rest of the query. The results that have been computed by the server still not a hundred percent accurate; however, the amount of records computed will have been reduced. To reduce the communication cost between the client and the server, this will be done only once. This will be more efficient if the query has only on condition.

However, this query optimization will only help in the case of AND condition where number of records processed will imply multiplication of involved records. While in the case of OR condition, number of records processed will imply addition of involved records. Therefore, there will be no efficiency in the case of OR condition. In that regard, we will implement a condition checker before executing query optimization. Then we will only apply query optimization on the first left part of the query that contains AND condition in it.

3.1.3 Probability Checker and Most Used Attributes Table

In order to fetch most frequently used items faster, we will implement probability checker and most used attributes table. The probability checker will monitor and calculate the number of usage of each attribute. Then it will use its dynamic probability, static probability, and a threshold to copy most used items to most used attributes table. Then whenever a query request most used attributes, it will direct the query to the most used attributes table instead of the original table. In addition, if the number of usage of an item is decreasing below the threshold, it will remove it from the most used attributes table. As a result, the most used attributes table will always be updated.

3.1.4 Character Encryption

We will use different encryption for character data as described in [5][6]. This different mapping function for the encryption will be saved at the query translator and the metadata. This method works by turning the character data into characteristic values via a characteristic function, and store them in an additional field to support pattern query, e.g. SQL “Like” operator. Following are details of this method.

Definition 1. If there is a function PC: s1 → s2, where s1 denotes a string of characters c1c2 . . . cn, s2 is a string of bits b0b1 . . . bm−1, bi=0, 0 ≤ i ≤ m −1, n < m. H denotes a hash function which encodes each connected character pairs c1c2, c2c3, . . . , cn−1cn of s1 into a number between 0 and m−1, then the “signature” of the line c1c2 . . . cn is the string of m bits b0b1 . . . bm−1, where bi = 1 if and only if H(cj, cj+1) = i for some j. We call PC the Pairs Coding Function.

For example string s1 is the word “abcehklst”, a hash function maps (ab, bc, . . . , st) into an integer between 0 and 15, then s2 = PC(s1) = PC (abcehklst) = (0010100010101001)2, where there are six bits whose values are 1 in s2. The reason is that some of eight pairs have the same hash value.

Definition 2. For each relation schema R (X1, . . . , Xr, . . . Xn), where Xr field need to be encrypted, the corresponding encrypted relation schema is RE(X1, . . . , XEr , . . . Xn,XSr), where, XEr is the encrypted field, XSr = PC(Xr) and PC is the pairs coding function. We call XSr Pairs Coding Field of Xr, which is also called Index Field.

3.2 Framework Operation Description

Following are the step-by-step operation of our framework as also can be seen in Figure 1.

Step 1

The query translator will translate original query composed by the client. The query translator will utilize the metadata saved at the client side. The metadata is saved at the client because it contains sensitive data, which includes all the mapping functions. The query translator and metadata will support all the improvements previously described, such as character encryption and aggregation functions. Then the translated query will be sent to the server side.

Step 2

First task at the server side would be to check for ‘AND’ and ‘OR’ condition. Then the framework will send first left part of the query that contains ‘AND’ condition to the optimized query component. Otherwise, if there is only ‘OR’ condition, the query will be sent to standard query component.

Step 3

The results of either optimized query or standard query will then be sent to the probability checker before sending the results back to the client. The probability checker will then send the query to the most used attributes table if and only if the query contains records from the most used attributes table. Otherwise, if the query contains attributes not available at the most used attributes table, the probability checker will point it to the original table. It will also automatically monitor and calculate the number of usage of each attribute.

Step 4

And then it will check if there is LIKE operator in the query before actually performing the query. If there is a like operator in the query, then the framework will call the character encryption mapping algorithm in order to perform the LIKE query as described in Section 3.1.4.

Amin - Nidhal - Framework 5

Figure 1. Framework Architecture

  1. DISCUSSION

Our framework contributes following improvements over previous works:

  1. More accurate query results and better performance without sacrificing security.
  2. Less work on the client.
  3. Support more query type such as pattern query (SQL LIKE operator) and SQL aggregation functions, which are SUM, COUNT, AVG, MIN, and MAX.
  1. CONCLUSION

Encryption has been the standard for protecting data from an attack. However, in the database service provider model, there are needs to execute SQL queries over encrypted database because the server is not trusted. In this paper, we have shown an optimized framework that is able to execute SQL queries over encrypted database in more efficient manner and support more SQL operators than previous works. To achieve those goals, we have developed a framework combining our ideas and partial improvement ideas from other research.

REFERENCES

[1]      H. Hacigümüş, B. Iyer, C. Li, and S. Mehrotra, “Executing SQL over encrypted data in the database-service-provider model,” Proc. 2002 ACM SIGMOD Int. Conf. Manag. data SIGMOD 02, vol. 7, pp. 216–227, 2002.

[2]      R. R. Ravan, N. B. Idris, and Z. Mehrabani, “A survey on querying encrypted data for database as a service,” Proc. – 2013 Int. Conf. Cyber-Enabled Distrib. Comput. Knowl. Discov. CyberC 2013, pp. 14–18, 2013.

[3]       H. Hacigumus, B. Iyer, and S. Mehrotra, “Efficient Execution of Aggregation Queries over Encrypted Relational Databases,” Lect. Notes Comput. Sci., vol. 2973, no. 2, pp. 125–136, 2004.

[4]       H. Hacigümüs, B. Iyer, and S. Mehrotra, “Query Optimization in Encrypted Database Systems,” in Proceedings of the 10th International Conference on Database Systems for Advanced Applications, 2005, pp. 43–55.

[5]      Z. F. Wang, W. Wang, and B. L. Shi, “Storage and query over encrypted character and numerical data in database,” Proc. – Fifth Int. Conf. Comput. Inf. Technol. CIT 2005, vol. 2005, no. 4, pp. 77–81, 2005.

[6]       Z.-F. Wang, J. Dai, W. Wang, and B.-L. Shi, “Fast Query over Encrypted Character Data in Database,” Commun. Inf. Syst., vol. 04, no. 4, pp. 289–300, 2004.