Publication by Year Web Intelligence and Data Mining Laboratory TAAI2015

(1)

Improving POI Search Effectiveness by

Integrating Multiple Search Results

整合多種搜尋引擎結果以提高POI搜尋的準確性

鄭仲庭、莊秀敏、張嘉惠, Department of Computer Science and Information Engineering, National Central University, 2015.11.21

(2)

Outline

► Introduction

► Related work

► System Architecture

► Experiment

► Conclusion

疾疾店家現身

(3)

Introduction

►Map search is very common in mobile applications

◆Queries are either store names or keywords (i.e., categories)

◆Locating POIs (points of interest) for phone call, business hours, address, etc.

Challenge

►The number of POIs is insufficient

►Identify users' intent for POI search

Goal

► Provide an effective POI search service

(4)

Example of General Query for Google Map

User's Location: near Tainan University

Query: Japanese cuisine

Result orderings: Popular, Relevant, Distance

(5)

Example of General Query for Yahoo Local Search

User's Location: near Tainan University

Query: Japanese cuisine

Result orderings: Best Match, Highest Rating, Distance

(6)

Example of Specific Query for Bing Map

Local search

Query: Hamburg

Search Scope: Tainan

Global search

Query: Hamburg

Search Scope: World

(7)

Goal

Problem

► The number of POIs is insufficient

► Ranking by relevance vs. distance

► Local search vs. global search

► No satisfactory result for a query

Solution

► Multiple sources integration

► Intelligent search method

► POI ranking model

► Query expansion

(8)

Related Work

► POI Extraction from Web

◆ Mining the Web for points of interest [SIGIR 2012]

◆ Extraction, integration and analysis of crowdsourced points of interest from multiple Web sources [ACM SIGSPATIAL 2014]

► Information Retrieval & Ranking

◆ Web search without 'stupid' results [SIGIR 2014]

◆ Learning temporal-dependent ranking models [SIGIR 2014]

► Query Expansion

◆ Adaptive query suggestion for difficult queries [SIGIR 2012]

◆ Massive query expansion by exploiting graph knowledge bases for image retrieval [ICMR 2014]

(9)

System Architecture

[Figure: system architecture. Components: Query, Query Suggestion, Keyword-Tag Matching, Keyword-Tag Graph, Corpus, User Log, POI DB, Solr (Offline DB), Google Places, Online Search, Ranking Model, POI Ranking, Output]

(10)

Multiple Sources Integration

►Integrate three search results: Solr, Google Maps, and online search

1. Utilize Solr 4.0 to index the offline database

• Provide POI search in local search or global search mode

• Data sources include Yellow Pages and Facebook locations

2. Request the Google Places API

3. Online search by Google SE with query + user's location

[Figure: online module of the POI search server. Components: Query & Location, Online search by Google, Snippets, URLs, Page download, POI Extraction, POI pairing (address, POI name), Output]

(11)

POI Search Method

Algorithm Search(q, r, GPS, i)
Input: user query q, user's GPS, search scope r
Output: POI list
Initial: i is a constant, i > 0; δ = 0.5
If (i = 0) EXIT
MS = Solr ∪ Google Places API ∪ Online search
C = Ranking(MS, δ)
If (C = null)
    return Search(q, r×3, GPS, i-1)
Else
    return C ordered by relevance and distance

►Two concerns: search scope and ranking criterion

– Local search with an expanding scope until it becomes a global search

– Ranking by POI relevance and distance

• If POIs have the same relevance, they are ranked by distance.
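The expanding-scope search can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `POI` record, the `Source` callable type, and the reading of δ as a relevance threshold inside `Ranking(MS, δ)` are all hypothetical, and each source is assumed to return POIs that already carry a relevance score and a distance.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class POI:
    name: str
    relevance: float    # score from some ranking model (assumed to lie in [0, 1])
    distance_km: float  # distance from the user's GPS position


# Hypothetical source signature: (query, gps, radius_km) -> candidate POIs.
Source = Callable[[str, Tuple[float, float], float], List[POI]]


def search(
    query: str,
    gps: Tuple[float, float],
    radius_km: float,
    sources: Sequence[Source],
    retries: int,
    delta: float = 0.5,
) -> List[POI]:
    """Search with a scope that triples until something relevant is found."""
    if retries == 0:
        return []
    # MS = union of the results from every source (Solr, Google Places, online search).
    merged = [poi for source in sources for poi in source(query, gps, radius_km)]
    # Assumption: Ranking(MS, delta) keeps POIs whose relevance reaches delta.
    candidates = [poi for poi in merged if poi.relevance >= delta]
    if not candidates:
        return search(query, gps, radius_km * 3, sources, retries - 1, delta)
    # Higher relevance first; ties are broken by shorter distance.
    return sorted(candidates, key=lambda poi: (-poi.relevance, poi.distance_km))
```

Passing the three sources as plain callables keeps the sketch independent of any particular client library; in practice each callable would wrap the corresponding backend.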

(12)

Problem Statement

►Filter irrelevant POIs

◆Integrating multiple sources increases not only relevant POIs but also irrelevant ones.

◆Some noise comes from Google Maps, the Solr DB, and online search

►Binary classification problem

◆ $f(q, \mathrm{POI}) = \begin{cases} 1, & \text{if query and POI are relevant} \\ 0, & \text{otherwise} \end{cases}$

►Ranking problem

◆Given a set of POIs for query q, it returns the rank of each POI_i according to its relevance.

(13)

Features of POI Ranking Model

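Id  Name           Description
1   MatchWord      word-match score between q and T, summed over term positions i (remaining terms illegible in the source)
2   MatchPosition  position-match score between q and T, summed over term positions i (remaining terms illegible in the source)
3   Cosine(q,T)    $\frac{\sum_{i=1}^{n} V_{q,i} \, V_{T,i}}{\sqrt{\sum_{i=1}^{n} V_{q,i}^{2}} \; \sqrt{\sum_{i=1}^{n} V_{T,i}^{2}}}$
4   LCS_q(q,T)     $\frac{\mathrm{LCS}(q,T)}{\mathrm{length}(q)}$
5   LCS_T(q,T)     $\frac{\mathrm{LCS}(q,T)}{\mathrm{length}(T)}$
6   CT(q,T)        # of click-throughs for pair (q,T)

• q denotes the user's query

• T denotes the POI name; i is the position of a term

• V_q and V_T are the term vectors of q and T, respectively.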
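The cosine and LCS features can be illustrated with a short Python sketch. It rests on assumptions the slides do not confirm: term vectors are plain term counts, LCS is read as the longest common subsequence (it could equally mean the longest common substring), and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def cosine(q_terms: Sequence[str], t_terms: Sequence[str]) -> float:
    """Cosine similarity between the term-count vectors of q and T."""
    vq, vt = Counter(q_terms), Counter(t_terms)
    dot = sum(vq[w] * vt[w] for w in vq)
    norm = math.sqrt(sum(c * c for c in vq.values())) * math.sqrt(
        sum(c * c for c in vt.values())
    )
    return dot / norm if norm else 0.0


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_features(q: Sequence[str], t: Sequence[str]) -> Tuple[float, float]:
    """Return (LCS_q, LCS_T): the LCS length normalised by len(q) and by len(T)."""
    common = lcs_length(q, t)
    return (common / len(q) if q else 0.0, common / len(t) if t else 0.0)
```

Because Python strings are sequences of characters, a Chinese query and POI name can be passed directly, which gives character-level matching: `lcs_features("星巴克", "星巴克咖啡")` normalises a common length of 3 by 3 and by 5.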

(14)

Query Expansion

• Bi-partite Graph Construction

Input: A corpus of POI descriptions.

Output: A set of relations between words and tags.

1. Topic modeling by LDA (latent Dirichlet allocation)

2. For each topic, use the top 100 words for mapping to POIs

3. Get the tags from the corresponding sub-categories of POIs

4. Build a bipartite graph between the words and the tags

[Figure: the corpus is passed through LDA to obtain topics T₁ … T_i; the words (w) of each topic are mapped to POIs, and POI membership gives the tags]
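The last three steps can be sketched in Python once the topic words are available. The sketch assumes that LDA has already been run elsewhere and that descriptions are tokenised; the function name, the input shapes, and the choice to count one unit of edge weight per POI linking a word to a tag are illustrative assumptions, not the authors' exact definition.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_word_tag_graph(
    topic_words: Sequence[Sequence[str]],      # top words per LDA topic (computed elsewhere)
    poi_descriptions: Dict[str, List[str]],    # POI id -> tokenised description
    poi_tags: Dict[str, List[str]],            # POI id -> tags from its sub-categories
) -> Dict[Tuple[str, str], int]:
    """Return weighted (word, tag) edges of a bipartite graph."""
    vocabulary = {word for words in topic_words for word in words}
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for poi, description in poi_descriptions.items():
        # Map topic words to this POI when they occur in its description.
        matched = vocabulary & set(description)
        for word in matched:
            for tag in poi_tags.get(poi, ()):
                # Assumption: each POI linking a word to a tag adds 1 to the edge weight.
                weights[(word, tag)] += 1
    return dict(weights)
```

Keeping the graph as a dictionary keyed by `(word, tag)` makes the edge weights easy to look up without a graph library.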

(15)

Bi-partite Graph Construction (cont.)

Relation Property

► Construct relations (edges) between words and tags via POIs

► Each edge has a weight representing the number of words corresponding to the tag.

Tag Suggestion

►Recommend the three tags with the highest weights

►A tag groups its relevant words

►Each tag maps to an average of 42 words

[Figure: example bipartite graph between words (w) and tags, with edge weights 10, 8, 3, and 5]
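Tag suggestion over such a graph can be sketched as a top-k selection. The function name, the edge dictionary layout, and the rule of summing weights when several query words point to the same tag are assumptions made for illustration; the tag names in the usage note are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def suggest_tags(
    query_words: Iterable[str],
    edge_weights: Dict[Tuple[str, str], int],  # (word, tag) -> weight
    k: int = 3,
) -> List[str]:
    """Return the k tags with the highest total weight for the query words."""
    wanted = set(query_words)
    scores: Dict[str, int] = defaultdict(int)
    for (word, tag), weight in edge_weights.items():
        if word in wanted:
            scores[tag] += weight
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]
```

With four hypothetical edges from the word "sushi" weighted 10, 8, 3, and 5, the tag on the weight-3 edge is the one left out of the suggestions.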

(16)

Experimental Dataset

►We crawled SuperhiPage and iPeen from 2013.07 to 2013.08 to build the offline database (i.e., the corpus of the bi-partite graph)

► The POI database has 29 categories

Category             Food (美食)   Life (生活)   Travel (旅遊)   Total (29)
# of POIs            68,366        73,766        11,207          995,748
# of sentences       954,848       1,200,288     140,769         8,647,176
# of words           55,477,352    65,869,563    8,148,446       531,955,248
# of distinct words  1,001,037     922,884       299,488         12,024,798
Avg. sent. length    58            54            57              53
Avg. # of topics     48            49            42              47
# of total tags      131           151           107             1,290

(17)

Evaluation

► Adopt NDCG@10 to evaluate the IR performance

– rating(i) : the relevance rating of the POI at position i

►We compare our POI search system with three map services

– Wikimapia, What's the Number, Google

►Experimental setting

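– Urban: train stations (Taipei, Taoyuan, Taichung, Kaohsiung)

– Rural: National Dong Hwa University, National Chi Nan University, National Chung Cheng University, National Pingtung University of Science and Technology

– General query: 20 common queries (e.g., 餐廳 (restaurant), 旅館 (hotel), 診所 (clinic))

– Specific query: 20 common stores (e.g., 星巴克 (Starbucks), 家樂福 (Carrefour), 嘟嘟 )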

$\mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}, \qquad \mathrm{DCG@10} = \sum_{i=1}^{10} \frac{\mathrm{rating}(i)}{\log_2(i+1)}$
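NDCG@10 can be computed with a few lines of Python. The sketch assumes the common DCG form in which rating(i) is divided by a logarithmic position discount, and it derives the ideal ordering from the same list of ratings, which presumes that all judged POIs for the query are in that list. The logarithm base cancels in the DCG/IDCG ratio, so base 2 is used here only by convention; the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(ratings: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions (1-indexed)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(ratings[:k], start=1))


def ndcg_at_k(ratings: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0
```

A ranking that is already sorted by rating scores 1.0, and any ranking that places a lower-rated POI above a higher-rated one scores below 1.0.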

(18)

Improved Performance of our POI Search

►Solr

– Offline database

►MS

– Solr + Google Map API + online search module

►Combinations

– General + Urban

– General + Rural

– Specific + Urban

– Specific + Rural

(19)

Performance Comparison of POI Search Services

►To evaluate the effectiveness of our system, we compared it with three other map search services using NDCG.

►We used 40 queries (20 general and 20 specific) for urban and rural areas within a 10 km search scope.

(20)

Effectiveness of Recommended Tags

►35 users evaluated 600 queries

– Each query has three recommended tags, each judged relevant or not

►Experimental result

– A query has an average of 17 tag click-throughs

– The CTR is 38% for the first recommended tag

– The cumulative CTR is 53% for the three recommended tags

(21)

Conclusion & Future Work

►The map search service outperforms Wikimapia and What's the Number, and is comparable to Google Map.

►We collect more relevant POIs by integrating multiple sources

►Provide more relevant queries for users

Future work

►Recommend more suitable queries for users

►Deduplication of POIs for aliases

►POI relation verification

►POI category classification

(22)

THANK YOU FOR LISTENING
