Reverse Maximum Inner Product Search: How to efficiently find users who would like to buy my item@RecSys2021

>100 Views

June 13, 24

スライド概要

Presenting our work "Reverse Maximum Inner Product Search: How to efficiently find users who would like to buy my item?" at RecSys2021.

profile-image

I'm a DB and AI researcher.

シェア

またはPlayer版

埋め込む »CMSなどでJSが使えない場合

関連スライド

各ページのテキスト
1.

REVERSE MAXIMUM INNER PRODUCT SEARCH: HOW TO EFFICIENTLY FIND USERS WHO WOULD LIKE TO BUY MY ITEM? Daichi Amagata1,2 and Takahiro Hara1 1. Osaka University 2. JST PRESTO

2.

BACK GROUND: MATRIX FACTORIZATION Rating Matrix: 𝑝1 User & Item Matrices <- we have these. Items 𝑝2 𝑝3 𝑝4 𝑝5 … 𝑑 𝑝𝑚 𝑚 𝑢1 𝑢2 Users 𝑢3 𝑛 Users x Items 𝑑 … 𝑢𝑛 Motivation: [1] empirically shows that MF-based recommender systems are better than Neural CF-based ones. [1] Steffen Rendle, Walid Krichene, Li Zhang, and John Anderson, Neural Collaborative Filtering vs. Matrix Factorization Revisited, In RecSys, 2020. #1

3.

BACK GROUND: MAXIMUM INNER PRODUCT SEARCH Problem 1 (MIPS): Given a user vector 𝐮 and an item matrix 𝐏, MIPS finds 𝐩∗ = arg max𝐩∈𝐏 𝐮 ⋅ 𝐩 𝑢1 𝑢2 𝑢3 𝑝1 𝑝2 𝑝3 𝑝4 𝑝5 4.1 5.0 2.5 1.8 3.0 … 𝑝𝑚 0.2 𝑚 𝑑 𝐮𝟏 x P (item vectors) 𝑑 … 𝑢𝑛 𝐩∗ = 𝐩𝟐 Note: This corresponds to ✓ making a recommendation list for a given user. <- a “user-driven” service #2

4.

BACK GROUND: HOW TO FIND USERS WHO MAY BUY A GIVEN ITEM? Question: For a given item 𝐩, which user vectors 𝐮 do satisfy 𝐩 = arg max𝐩′ ∈𝐏 𝐮 ⋅ 𝐩′ ? 𝑝1 𝑝2 𝑝3 𝑝4 𝑝5 𝑢1 4.1 5.0 2.5 1.8 3.0 𝑢2 4.9 4.3 2.9 4.5 2.1 𝑢3 2.3 4.6 4.1 3.1 1.5 𝑢4 3.3 4.1 3.4 3.6 1.9 𝑢5 2.1 3.9 2.6 4.3 3.1 𝑑 𝑚 𝑛 Users x Items 𝑑 Note: This corresponds to ✓ making advertisements of a given item to users who prefer it. <- an “item-driven” service • Easy to know targets for new items, restocked items, and etc. ✓ knowing market size (about 𝐩). #3

5.
[beta]
REVERSE MAXIMUM INNER PRODUCT SEARCH
DEFINITION 1 (k-MIPS problem):
• Input: a set of vectors P, a query vector q, and k,
• Output: k vectors in P that have the highest inner products with q.

DEFINITION 2 (Reverse k-MIPS problem):
• Input: a query (item) vector 𝐪, k, and two sets of vectors 𝐐 (set of user vectors) and 𝐏
(set of item vectors)
• Output: all vectors 𝐮 ∈ 𝐐 s.t. q is included in the k-MIPS result of 𝐮 among 𝐏 ∪ {𝐪}.

How to solve the reverse k-MIPS problem?
• For each user vector in 𝐐, run state-of-the-art k-MIPS algorithm [2,3] on 𝐏 ∪ {𝐪}?
• The above approach incurs 𝑂 𝑛𝑚𝑑 time.
For large scale recommender systems, 𝑛 (#users) and 𝑚 (#items)

are both large, so this is impractical!
[2] Hui Li, Tsz Nam Chan, Man Lung Yiu, and Nikos Mamoulis, FEXIPRO: fast and exact inner product retrieval in recommender systems, In SIGMOD, 2017.
[3] Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk, LEMP: Fast retrieval of large entries in a matrix product, In SIGMOD, 2015.

#4

6.

CONTRIBUTION 1. We propose the reverse k-MIPS problem. 2. We propose Simpfer, an simple, fast and exact algorithm for reverse k-MIPS. • Simpfer is easy to deploy, if you use a MF-based recommender system. • Source code is available at https://github.com/amgt-d1/Simpfer. 3. We analyze Simpfer theoretically, and its time complexity is 𝑶 𝟏 − 𝜶 𝒏𝒎′ 𝒅 . • 𝛼: pruning ratio of user vectors. • 𝑚′: the average number of accessed item vectors (empirically 𝑚′ = 𝑂 𝑘 ). 4. We analyze Simpfer empirically on four real datasets. • Simpfer is two orders of magnitude faster than baseline algorithms. • For a dataset with 2 million users, Simpfer needs 0.5[sec], while the baselines need a half hour! #5

7.
[beta]
HOW TO SPEED-UP REVERSE k-MIPS PROCESSING?
Straightforward solution: for each 𝐮 ∈ 𝐐, run k-MIPS on 𝐏 ∪ {𝐪}
𝑚

𝑛

𝐐:

x

𝐏: items

Strong points:
𝐪

users

•

The result is correct.

•

Some item vectors can be skipped by using [2,3].

Weak points:
These white vectors include 𝐪
as k-MIPS result.

•

All user vectors must be investigated.

•

Most user vectors are not included in the result.

Our Idea:
1.

User vectors that are not included in the reverse k-MIPS result should be ignored.

2.

We don’t need the k-MIPS result (i.e., vectors), and just Yes/No answer for the k-MIPS result is sufficient.

-> How to implement these ideas?
[2] Hui Li, Tsz Nam Chan, Man Lung Yiu, and Nikos Mamoulis, FEXIPRO: fast and exact inner product retrieval in recommender systems, In SIGMOD, 2017.
[3] Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk. LEMP: Fast retrieval of large entries in a matrix product, In SIGMOD, 2015.

#6

8.

PRE-PROCESSING Lines 1-6: General data structure 𝐐 𝐮𝟏 𝐮𝟐 𝐮𝟑 𝐮𝟒 𝐮𝟓 𝐮𝟔 𝐮𝟕 𝐮𝟖 … 𝐮𝒏 𝐏 𝐩𝟏 𝐩𝟐 𝐩𝟑 𝐩𝟒 𝐩𝟓 𝐩𝟔 𝐩𝟕 𝐩𝟖 … 𝐩𝒎 norm: 𝐯 𝐏′ : the first 𝑂 𝑘𝑚𝑎𝑥 vectors in 𝐏 Lines 7-10: Lower-bound arrays (𝑘𝑚𝑎𝑥 is the maximum 𝑘) 𝐿1 4.5 4.1 3.9 … 3.2 … … … … … … 𝐿𝑛 4.8 4.7 4.2 … 3.8 ← 𝑘𝑚𝑎𝑥 -MIP in 𝐮𝟏 × 𝐏 ′ (≤ 𝐩∗ ) ← 𝑘𝑚𝑎𝑥 -MIP in 𝐮𝒏 × 𝐏′ Lines 11-18: Blocks (the block size is 𝑂(log 𝑛)) 𝐮𝟏 Only 𝑂 𝑛 𝑑 + log 𝑛 𝐮𝟐 𝐮𝟑 𝐮𝟒 𝐮𝟓 𝐮𝟔 𝐁𝟏 time & 𝑂 𝑛 space. 𝐿 𝐁𝟏 4.3 4.1 𝐮𝟕 𝐮𝟖 … 𝐮𝒏 𝐁𝟐 3.5 … 2.9 𝑗 ← 𝐿𝑗 𝐁 = min 𝐿𝑖 𝐮𝐢 ∈𝐁 … … … … … #7

9.

UPPER- & LOWER-BOUNDING Assume that we have 𝐮𝒊 ⋅ 𝐪. LEMMA 1: If 𝐮𝒊 ⋅ 𝐪 ≤ 𝐿𝑘𝑖 , 𝐪 is not in the k-MIPS result of 𝐮𝒊. <- 𝑂 1 time for saying “No”. LEMMA 2: If 𝐮𝒊 ⋅ 𝐪 ≥ 𝐮𝒊 𝐩𝒌 , 𝐪 is in the k-MIPS result of 𝐮𝒊 . <- 𝑂 1 time for saying “Yes”. LEMMA 3: Let 𝐮𝒊 be the first vector of 𝐐(𝐵). If 𝐮𝒊 𝐪 ≤ 𝐿𝑘 (𝐵), ∀𝐮 ∈ 𝐐(𝐵), <- 𝑂 1 time for saying “No”, 𝐪 is not in the k-MIPS result of 𝐮. ∀𝐮 ∈ 𝐐(𝐵) in a batch. If 𝐮𝒊 does not have Lemmas 1-3, consider a sequential scan of 𝑃. Assume that the current top-k threshold is 𝜏. 𝐩𝟏 𝐩𝟐 𝐩𝟑 𝐩𝟒 𝐩𝟓 𝐩𝟔 𝐩𝟕 𝐩𝟖 … 𝐩𝒎 COROLLARY 1: Assume that 𝐪 is in an intermediate k-MIPS result of 𝐮𝑖 , and we now <- Early stop the scan COROLLARY 2: If 𝐮𝒊 ⋅ 𝐪 < 𝜏, 𝐪 is not in the k-MIPS result of 𝐮𝒊. <- Early stop the scan evaluate 𝐩𝑗 . If 𝐮𝒊 ⋅ 𝐪 (≥ 𝜏) ≥ 𝐮𝒊 𝐩𝒋 , 𝐪 is in the k-MIPS result of 𝐮𝒊 . #8

10.

SIMPFER: SIMPLE, FAST, AND EXACT FOR REVERSE k-MIPS <- Early stop (Corollary 1) <- Batch pruning (Lemma 3) <- Lower-bounding (Lemma 1) <- Upper-bounding (Lemma 2) <- Early stop (Corollary 2) Only 𝑂 𝑚′ 𝑑 time, where 𝑚′ ≤ 𝑚 𝑶 𝟏 − 𝜶 𝒏𝒎′ 𝒅 time, where 𝛼 is pruning rate of 𝐐 (almost 𝑘 in practice). Note: Simpfer can improve its efficiency by multi-threading (blocks are evaluated in parallel). #9

11.

EXPERIMENTAL RESULT 1: IMPACT OF 𝒌 LEMP 100 FEXIPRO Simpfer FEXIPRO Simpfer 10 1 Time [sec] Time [sec] 10 vs. LEMP: x680 faster vs. FEXIPRO x536 faster 0.1 vs. LEMP: x988 faster vs. FEXIPRO x747 faster 1 0.1 0.01 0.001 0.01 5 10 15 k (MovieLens) 20 25 5 10000 10 15 k (Netflix) 20 25 10000 1000 Time [sec] 1000 Time [sec] LEMP 100 vs. LEMP: x9246 faster vs. FEXIPRO x4450 faster 100 10 1 100 vs. LEMP: x10691 faster vs. FEXIPRO x6703 faster 10 1 0.1 0.1 0.01 5 10 15 k (Yahoo!) 20 25 5 10 15 k (Amazon) 20 MovieLens (𝑛 = 138,493, 𝑚 = 26,744), Netflix (𝑛 = 480,189, 𝑚 = 17,770), Yahoo! (𝑛 = 2,088,620, 𝑚 = 200,941), Amazon (𝑛 = 1,948,882, 𝑚 = 98,211) 25 # 10

12.

EXPERIMENTAL RESULT 2: IMPACT OF 𝒏 (#USERS) LEMP 10 FEXIPRO Simpfer FEXIPRO Simpfer 10 Time [sec] Time [sec] 1 0.1 0.01 0.001 1 0.1 0.01 0.0001 0.001 0.2 0.4 0.6 #users (MovieLens) 0.8 1 10000 10000 1000 1000 100 100 Time [sec] Time [sec] LEMP 100 10 1 0.1 0.2 0.4 0.2 0.4 0.6 0.8 1 0.6 0.8 1 #users (Netflix) 10 1 0.1 0.01 0.01 0.2 0.4 0.6 #users (Yahoo!) 0.8 1 #users (Amazon) MovieLens (𝑛 = 138,493, 𝑚 = 26,744), Netflix (𝑛 = 480,189, 𝑚 = 17,770), Yahoo! (𝑛 = 2,088,620, 𝑚 = 200,941), Amazon (𝑛 = 1,948,882, 𝑚 = 98,211) # 11

13.

EXPERIMENTAL RESULT 3: IMPACT OF 𝒎 (#ITEMS) LEMP 10 FEXIPRO Simpfer FEXIPRO Simpfer 10 Time [sec] Time [sec] 1 0.1 0.01 1 0.1 0.001 0.01 0.2 0.4 0.6 #items (MovieLens) 0.8 1 10000 0.2 0.4 0.2 0.4 0.6 0.8 1 0.6 0.8 1 #items (Netflix) 10000 1000 Time [sec] 1000 Time [sec] LEMP 100 100 10 1 100 10 1 0.1 0.1 0.01 0.2 0.4 0.6 #items (Yahoo!) 0.8 1 #items (Amazon) MovieLens (𝑛 = 138,493, 𝑚 = 26,744), Netflix (𝑛 = 480,189, 𝑚 = 17,770), Yahoo! (𝑛 = 2,088,620, 𝑚 = 200,941), Amazon (𝑛 = 1,948,882, 𝑚 = 98,211) # 12

14.

EXPERIMENTAL RESULT 4: SPEED-UP RATIO VS. #CPU CORES We certainly get efficiency improvement with increase of #cores. MovieLens (𝑛 = 138,493, 𝑚 = 26,744), Netflix (𝑛 = 480,189, 𝑚 = 17,770), Yahoo! (𝑛 = 2,088,620, 𝑚 = 200,941), Amazon (𝑛 = 1,948,882, 𝑚 = 98,211) # 13

15.

CONCLUSION 1. We proposed the reverse k-MIPS problem. 2. We proposed Simpfer for reverse k-MIPS. • Source code is available at https://github.com/amgt-d1/Simpfer. 3. We analyzed Simpfer theoretically, and its time complexity is 𝑂 1 − 𝛼 𝑛𝑚′ 𝑑 . 4. We analyzed the empirical efficiency of Simpfer on four real datasets. # 14