Improving POI Search Effectiveness by
Integrating Multiple Search Results
整合多種搜尋引擎結果以提高POI搜尋的準確性
鄭仲庭、莊秀敏、張嘉惠, Department of Computer Science and Information Engineering, National Central University, 2015.11.21
Outline
► Introduction
► Related work
► System Architecture
► Experiment
► Conclusion
疾疾店家現身
Introduction
►Map search is very common in mobile applications
◆Users query either by store name or by keyword (e.g., a category)
◆They locate POIs (points of interest) to find phone numbers, business hours, addresses, etc.
Challenge
►The number of POIs is insufficient
►Identify users' intent for POI search
Goal
►Provide an effective POI search service
Example of General Query for Google Maps
User's location: near Tainan University
Query: Japanese cuisine
Result rankings shown: Popular, Relevant, Distance
Example of General Query for Yahoo Local Search
User's location: near Tainan University
Query: Japanese cuisine
Result rankings shown: Best Match, Highest Rating, Distance
Example of Specific Query for Bing Maps
Local search: Query: Hamburg; Search Scope: Tainan
Global search: Query: Hamburg; Search Scope: World
Goal
Problem
► The number of POIs is insufficient
► Ranking by relevance vs. distance
► Local search vs. global search
► No satisfactory result for a query
Solution
► Multiple sources integration
► Intelligent search method
► POI ranking model
► Query expansion
Related Work
► POI Extraction from Web
◆ Mining the Web for points of interest [SIGIR 2012]
◆ Extraction, integration and analysis of crowdsourced points of interest from multiple Web sources [ACM SIGSPATIAL 2014]
► Information Retrieval & Ranking
◆ Web search without 'stupid' results [SIGIR 2014]
◆ Learning temporal-dependent ranking models [SIGIR 2014]
► Query Expansion
◆ Adaptive query suggestion for difficult queries [SIGIR 2012]
◆ Massive query expansion by exploiting graph knowledge bases for image retrieval [ICMR 2014]
System Architecture
[Figure: system architecture. Components: Query; Query Suggestion (Keyword-Tag Matching, Keyword-Tag Graph, Corpus); search sources (Solr offline POI DB, Google Places, Online Search); POI Ranking (Ranking Model, User Log); Output]
Multiple Sources Integration
►Integrate three sets of search results: Solr, Google Maps, and online search
1. Use Solr 4.0 to index the offline database
• Provide POI search in local-search or global-search mode
• Data sources include Yellow Pages and Facebook locations
2. Send requests to the Google Places API
3. Run an online search on the Google search engine with the query plus the user's location
[Figure: Online Module of the POI Search Server. Labels: Query & Location; Online-search by Google; Snippets; URLs; Page download; POI Extraction; Address, POI name; POI pairing; Output]
POI Search Method
Algorithm Search(q, r, GPS, i)
Input: user query q, user's GPS, search scope r
Output: POI list
Initial: i is a constant, i > 0; δ = 0.5
If (i = 0) EXIT
MS = Solr ∪ Google Places API ∪ Online search
C = Ranking(MS, δ)
If (C = null)
    Search(q, r×3, GPS, i-1)
Else
    Order C by relevance and distance
►Search scope and ranking criterion:
– Local search with an expanding scope until it becomes a global search
– Ranking by POI relevance and distance
• If POIs have the same relevance, they are ranked by distance.
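The expanding-scope search above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `POI` tuple, the `sources` callables (stand-ins for Solr, the Google Places API, and the online search module), and the reading of Ranking(MS, δ) as "keep POIs whose relevance score is at least δ" are all assumptions made for the example. The recursion is written as a loop with the same stopping rule.

```python
from typing import Callable, List, NamedTuple, Tuple


class POI(NamedTuple):
    name: str
    relevance: float    # hypothetical ranking-model score, assumed in [0, 1]
    distance_km: float  # distance from the user's GPS position


# Hypothetical source signature: (query, radius_km, gps) -> candidate POIs.
Source = Callable[[str, float, Tuple[float, float]], List[POI]]


def search(query: str, radius_km: float, gps: Tuple[float, float],
           sources: List[Source], max_rounds: int = 3,
           delta: float = 0.5) -> List[POI]:
    """Local search whose scope triples until some POI passes the filter."""
    for _ in range(max_rounds):
        # MS: union of the candidates returned by every source.
        candidates = [p for src in sources for p in src(query, radius_km, gps)]
        # Assumed meaning of Ranking(MS, delta): drop POIs scored below delta.
        kept = [p for p in candidates if p.relevance >= delta]
        if kept:
            # Order by relevance (descending); ties are broken by distance.
            return sorted(kept, key=lambda p: (-p.relevance, p.distance_km))
        radius_km *= 3  # nothing relevant found: widen the scope and retry
    return []
```

With a starting radius of 1 km and three rounds, the scopes tried are 1, 3, and 9 km, so the search moves from local toward global only when the smaller scope yields no relevant POI.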
Problem Statement
►Filter irrelevant POIs
◆Integrating sources increases not only the relevant POIs but also the irrelevant ones.
◆Some noise comes from Google Maps, the Solr DB, and online search
►Binary classification problem
◆ $f(q, \mathrm{POI}) = \begin{cases} 1, & \text{if the query and the POI are relevant} \\ 0, & \text{otherwise} \end{cases}$
►Ranking problem
◆Given a set of POIs for query q, return the rank of each POI_i according to its relevance.
Features of POI Ranking Model
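Id  Name           Description
1   MatchWord      match score between each query term q_i and T, summed over i
2   MatchPosition  match score that accounts for term position, summed over i
3   Cosine(q,T)    $\frac{\sum_i V_{q,i} V_{T,i}}{\sqrt{\sum_i V_{q,i}^2}\,\sqrt{\sum_i V_{T,i}^2}}$
4   LCSq(q,T)      $\mathrm{LCS}(q,T) / \mathrm{Length}(q)$
5   LCST(q,T)      $\mathrm{LCS}(q,T) / \mathrm{Length}(T)$
6   CT(q,T)        # of click-throughs for the pair (q, T)
• q denotes the user's query
• T denotes the POI name; i is the position of a term
• V_q and V_T are the term vectors of q and T, respectively.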
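A few of the features listed above can be sketched in Python. This is an illustrative sketch under stated assumptions: the term vectors are taken to be raw term counts, LCS is read as the longest common subsequence over characters, and the LCS length is normalized by the length of q (LCSq) or of T (LCST). The slides do not spell out these details, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(q_terms, t_terms):
    """Cosine similarity between term-count vectors of query and POI name."""
    vq, vt = Counter(q_terms), Counter(t_terms)
    dot = sum(vq[w] * vt[w] for w in vq)
    norm = (math.sqrt(sum(c * c for c in vq.values()))
            * math.sqrt(sum(c * c for c in vt.values())))
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_features(q, t):
    """Return (LCSq, LCST): LCS length normalized by len(q) and by len(t)."""
    common = lcs_length(q, t)
    return (common / len(q) if q else 0.0,
            common / len(t) if t else 0.0)
```

For a query that is fully contained in a longer POI name, LCSq is 1.0 while LCST is smaller, so the two features together indicate how much of each string is covered by the other.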
Query Expansion
• Bi-partite Graph Construction
Input: A corpus about the descriptions of POIs.
Output: A set of the relations between words and tags
Topic modeling by LDA (latent Dirichlet allocation)
For each topic, use the top 100 words for mapping to POIs
Get the tags from the corresponding sub-categories of POIs
Build a bipartite graph between the words and the tags
[Figure: bipartite graph construction. Labels: corpus; LDA; topics T1 … Ti; words (w); Mapping; POIs; Membership; tags]
Bi-partite Graph Construction (cont.)
Relation Property
► Construct relations (edges) between words and tags through the POIs that link them
► Each edge has a weight which represents the number of the words corresponding to the tag.
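The graph construction can be sketched in Python as follows. The sketch assumes the LDA step has already produced a set of top topic words, that each POI is given as a pair of (description words, tags), and that an edge weight counts how many POIs connect a word to a tag. That weighting is one plausible reading of the slide rather than a confirmed detail, and `build_word_tag_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_word_tag_graph(top_words, pois):
    """Build weighted word-tag edges.

    top_words: set of topic words (assumed to come from a prior LDA step).
    pois: iterable of (description_words, tags) pairs, one per POI.
    Returns a dict mapping (word, tag) to an integer edge weight.
    """
    weights = defaultdict(int)
    for description_words, tags in pois:
        # A word is linked to a POI when it occurs in the POI description.
        for word in set(description_words) & top_words:
            # The POI's sub-category tags become the word's neighbours.
            for tag in tags:
                weights[(word, tag)] += 1
    return dict(weights)
```

Storing edges as a flat `(word, tag) -> weight` dictionary keeps the sketch short; an adjacency list per word would serve equally well for lookups by query word.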
Tag Suggestion
►Recommend the three tags with the highest weights
►Each tag groups its relevant words
►Each tag maps to an average of 42 words
[Figure: example word-tag bipartite graph with edge weights 10, 8, 3, 5]
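Tag suggestion over such a weighted graph might look like the sketch below. It assumes the query has already been split into words that appear in the graph, and that a tag's score is the sum of the weights of its edges to the query words. Both the scoring rule and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict


def suggest_tags(query_words, weights, k=3):
    """Return up to k tags with the highest total edge weight to the query words.

    weights: dict mapping (word, tag) to an edge weight.
    """
    scores = defaultdict(int)
    for (word, tag), weight in weights.items():
        if word in query_words:
            scores[tag] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]
```

A query whose words are absent from the graph yields an empty list, which a caller could treat as "no suggestion available".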
Experimental Dataset
►We crawled SuperhiPage and iPeen from 2013.07 to 2013.08 to build the offline database (i.e., the corpus for the bipartite graph)
►The POI database has 29 categories
Category              Food (美食)   Life (生活)   Travel (旅遊)   Total (29)
# of POIs             68,366        73,766        11,207          995,748
# of sentences        954,848       1,200,288     140,769         8,647,176
# of words            55,477,352    65,869,563    8,148,446       531,955,248
# of distinct words   1,001,037     922,884       299,488         12,024,798
Avg. sentence length  58            54            57              53
Avg. # of topics      48            49            42              47
Total # of tags       131           151           107             1,290
Evaluation
► Adopt NDCG@10 to evaluate the IR performance
– rating(i) : the relevance rating of the POI at position i
►We compare our POI search system with three map services
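– Wikimapia, What's the Number, Google
►Experimental setting
– Urban: train stations (Taipei, Taoyuan, Taichung, Kaohsiung)
– Rural: National Dong Hwa University, National Chi Nan University, National Chung Cheng University, National Pingtung University of Science and Technology
– General query: 20 common queries (e.g., restaurant, hotel, clinic)
– Specific query: 20 common stores (e.g., Starbucks, Carrefour, 嘟嘟)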
$\mathrm{NDCG@10} = \frac{\mathrm{DCG@10}}{\mathrm{IDCG@10}}, \qquad \mathrm{DCG@10} = \sum_{i=1}^{10} \frac{\mathrm{rating}(i)}{\log_2(i+1)}$
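A short Python sketch of the NDCG@10 metric follows. It assumes the common form DCG@k = Σ rating(i) / log2(i + 1) over positions i = 1..k, and it computes the ideal DCG by re-sorting the ratings of the returned list only. An evaluation that knows every relevant POI for a query would build the ideal ranking from the full judged set instead.

```python
import math


def dcg_at_k(ratings, k=10):
    """Discounted cumulative gain over the first k positions (1-indexed)."""
    return sum(r / math.log2(i + 1)
               for i, r in enumerate(ratings[:k], start=1))


def ndcg_at_k(ratings, k=10):
    """DCG normalized by the DCG of the same ratings in descending order."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0
```

Here `ratings` is the list of relevance ratings in the order the system returned the POIs; a list that is already sorted from most to least relevant scores 1.0.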
Improved Performance of our POI Search
►Solr
– Offline database
►MS
– Solr + Google Map API + online search module
►Combinations
– General + Urban
– General + Rural
– Specific + Urban
– Specific + Rural
Performance Comparison of POI Search Services
►To evaluate the effectiveness of our system, we compared it with three other map search services using NDCG.
►We used 40 queries (20 general and 20 specific) for urban and rural areas within a 10 km search scope.
Effectiveness of Recommended Tags
►35 users evaluated 600 queries
– Each query has three recommended tags, each judged relevant or not
►Experimental result
– A query has an average of 17 tag click-throughs
– The CTR is 38% for the first recommended tag
– The cumulative CTR is 53% for the three recommended tags
Conclusion & Future Work
►The map search service outperforms Wikimapia and What's the Number, and is comparable to Google Maps.
►We collect more relevant POIs by integrating multiple sources
►Provide more relevant queries for users
Future work
►Recommend more suitable queries for users
►Deduplication of POIs for aliases
►POI relation verification
►POI category classification