• 検索結果がありません。

File Information Additional Information Type Rights(URL) Doc URL Issue Date Citation Author(s) Title

N/A
N/A
Protected

Academic year: 2021

シェア "File Information Additional Information Type Rights(URL) Doc URL Issue Date Citation Author(s) Title"

Copied!
3
0
0

読み込み中.... (全文を見る)

全文

(1)

Instructions for use

Title Query-Aware Locality Sensitive Hashing for Similarity Search Problems [an abstract of dissertation and a summary of dissertation review]

Author(s) 陸, 可鏡

Citation 北海道大学. 博士(情報科学) 甲第14585号

Issue Date 2021-03-25

Doc URL http://hdl.handle.net/2115/81261

Rights(URL) https://creativecommons.org/licenses/by/4.0/

Type theses (doctoral - abstract and summary of review)

Additional Information There are other files related to this item in HUSCAP. Check the above URL.

File Information Kakyo̲Riku̲abstract.pdf (論文内容の要旨)

Hokkaido University Collection of Scholarly and Academic Papers : HUSCAP

(2)

学 位 論 文 内 容 の 要 旨

博士の専攻分野の名称  博士(情報科学)  氏名 陸 可鏡 学 位 論 文 題 名

Query-Aware Locality Sensitive Hashing for Similarity Search Problems

(類似検索問題に対するクエリ・アウェア局所鋭敏ハッシング)

Similarity search is an old but fundamental research topic which has various applications in database, data mining, information retrieval and machine learning. Although the goal of similarity search, that is, finding most similar points of issued queries in the dataset given a specified measure, is quite straightforward, it is very challenging to solve this problem efficiently due to the following two reasons. (1) In recent years, as the data size increases rapidly, we need to deal with up to billion-scale datasets, on which many traditional techniques lose the eectiveness. (2) Due to the phenomenon called the curse of dimensionality, the distances among data points become much closer as the dimension increases, which makes most of tree structures perform even worse than the linear scan. In order to overcome these two obstacles, many researchers have devoted to find more ecient techniques in the past two decades and proposed various algorithms.

Although these algorithms vary much, the following two performance metrics always attract par- ticular attention in their designs: one is the accuracy, which is often indicated by the percentage of true points which are successfully found, also called, the recall rate, and the other one is the eciency which is often indicated by the running time of searching. In order to make the accuracy more con- trollable, researchers have developed some techniques with theoretical guarantees to satisfy specified success probabilities. Among these techniques, Locality Sensitive Hashing (LSH) draws a particular interest due to its attractive query performance and robust probability guarantee. The basic idea of LSH is to build a group of projected vectors and select the most promising candidates only from the projection information of those vectors. Such an idea becomes the starting point of the research work introduced in this paper.

Since the solutions of similarity search problems vary highly over different spaces and different measures, in this paper, we particularly focus on the most important two situations: the Euclidean space equipped with2 metric and the inner product space. In practice, many applications are intimately related to either of these two situations. Accordingly, this thesis is divided into two major parts.

In the first part, we focus on the 2 metric and aim to solve approximate nearest neighbor search problems. For this purpose, we propose two disk-based LSH variants called VHP and R2LSH, which are highly inspired by the concept of query-aware search window. By exploiting more accurate pro- jection information of the generated one-dimensional projected vectors, these two methods build more eective query-centric search regions in the projected spaces. Specifically, R2LSH builds multiple query-centric balls in a group of two-dimensional projected subspaces, while VHP utilizes the infor- mation of one-dimensional distances between the query and data points on every projected vector.

Both of these two methods can prune false points more accurately than the existing query-aware LSH

(3)

variants.

In the second part, we focus on the inner product space and solve the maximum inner product search problem. For this purpose, we propose a query-aware LSH variant called AdaLSH which is built on a multi-ring structure. The basic idea of AdaLSH is that, based on dierent norms of data points, we can control adaptively the width of the query-aware search windows such that those points having larger norms can be examined more carefully to ensure the accuracy. In order to realize this idea, we design a multi-round search strategy such that query-aware search windows in different rings extend at dierent paces, which not only achieves the goal mentioned above, but also significantly decreases the total searching cost.

Since all three proposed methods are based on LSH, all of the proposed algorithms possess prob- ability guarantees in accuracy. Extensive experiments confirm the superiority of these methods over existing state-of-the-art methods. In particular, R2LSH and VHP scale up to billion-scale datasets, because they are disk-based.

参照

関連したドキュメント

Comparing the Gauss-Jordan-based algorithm and the algorithm presented in [5], which is based on the LU factorization of the Laplacian matrix, we note that despite the fact that

2 Combining the lemma 5.4 with the main theorem of [SW1], we immediately obtain the following corollary.. Corollary 5.5 Let l > 3 be

The first case is the Whitham equation, where numerical evidence points to the conclusion that the main bifurcation branch features three distinct points of interest, namely a

In this paper, we establish the boundedness of Littlewood- Paley g-functions on Lebesgue spaces, BMO-type spaces, and Hardy spaces over non-homogeneous metric measure spaces

“Breuil-M´ezard conjecture and modularity lifting for potentially semistable deformations after

Then it follows immediately from a suitable version of “Hensel’s Lemma” [cf., e.g., the argument of [4], Lemma 2.1] that S may be obtained, as the notation suggests, as the m A

In this paper we focus on the relation existing between a (singular) projective hypersurface and the 0-th local cohomology of its jacobian ring.. Most of the results we will present

Applying the frame characterization, we will then obtain some estimates of entropy numbers for the compact embeddings between Besov spaces or between Triebel–Lizorkin spaces and we