On the Effectiveness of Private Information Retrieval Protocols
Due to the exponential growth of information on the Internet, Web Search Engines (WSEs) have become indispensable for the eective retrieval of information. In order to provide the results most relevant to the user, WSEs store the users’ prole that may contain sensitive information including the users age, gender, health condition, personal interests, religious or political aliation, etc. However, this raises serious concerns for the privacy of the user since the identity of a user may get exposed and misused by third parties. To address the issue of privacy infringement while using WSE, researchers have proposed several techniques such as anonymizing networks, prole obfuscation, private information retrieval (PIR) protocols. In the anonymizing network, the user’s query is forwarded to the WSE through a chain of routers. In prole obfuscation technique, fake queries are forwarded with the user’s query in order to mislead the WSE. While one well-known solution to preserve privacy is a Private Information Retrieval (PIR) protocol called Useless User Prole (UUP) that issues the queries via Peer nodes, thereby hiding user’s identity from the WSE. Despite the fact that the aforementioned methods improve the user privacy, yet some previous studies using a machine learning algorithm and user prole show that an adverse WSE is able to break prole obfuscation and anonymizing network methods. However, it is not clear if an adverse WSE is able to break UUP using machine learning techniques.
This thesis investigates the protection oered by UUP. To evaluate the eectiveness of UUP in privacy protection, we propose QuPiD (Query Prole Distance) Attack. QuPiD Attack is a machine learning based attack that determines the distance between the user’s Prole (web search history) and upcoming query using a novel feature vector. The proposed feature vector is composed of a set of numeric values of 10 major topics acquired from uClassify service. The results show that the proposed QuPiD attack associates more than 40% queries to the correct user with a precision of over 70%. Moreover, during the investigations, the proposed QuPiD attack behave unexpectedly in some cases, aecting its precision and recall. Upon detail investigation, three reasons are found responsible for that behavior: (I) variable similarity score between incoming query and user prole, (ii) lack of traces of incoming query in the user prole, and (iii) presence of more than one matching user proles. We call this behavior as ProQSim (Prole Query Similarity) Eect.
Based on the results, it is concluded that UUP does not provide satisfactory protection to users. We, therefore, develop PEM (Privacy Exposure Measure), a technique that minimizes the privacy exposure of a user while using the PIR protocols. PEM assesses the similarity between the user’s prole and query before posting to WSE and assists the user to avoid further privacy exposure. The privacy evaluation experiments with PEM (Privacy Exposure Measure) demonstrates that a user prole created with a web search engine through PEM is 95.97% different as compare to the usual user’s prole and thus oers more privacy to the user even in the case of machine-learning attack for mpeT = 10%. (Maximum privacy exposure (mpeT) is the threshold value set by the user for exposure of his/her prole to the web search engine). We have established empirically that our proposed privacy protection mechanisms (PEM) signicantly improves performance with regard to preserving privacy. Finally, we conclude this thesis with our perspectives and point out some key future research issues in the areas of PIR protocols, adverse models, and our suggested module PEM.