Knowledge-Based Systems DIAG: A Deep Interaction-Attribute-Generation model for user-generated item recommendation


English Pages [11] Year 2022


Table of contents :
DIAG: A Deep Interaction-Attribute-Generation model for user-generated item recommendation
Introduction
Related work
Background and challenges
The proposed DIAG model
Latent vector learning
Feature vector learning
Item-item co-generation network construction
Item feature vector learning
User feature vector learning
Predictive vector learning
Prediction and loss function
Experiments
Datasets and evaluation measures
Datasets
Evaluation measures
Comparison experiments
Baselines and settings
Comparison results and analysis
Parameter analysis
Dimension l
Negative sampling ratio
Ablation study
Conclusions and future work
Declaration of competing interest
Acknowledgments
References

Knowledge-Based Systems 243 (2022) 108463

Contents lists available at ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

DIAG: A Deep Interaction-Attribute-Generation model for user-generated item recommendation ∗

Ling Huang a,b, Bi-Yi Chen c, Hai-Yi Ye d, Rong-Hua Lin e, Yong Tang e, Min Fu f, Jianyi Huang f, Chang-Dong Wang c,g,h

a College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
b Guangdong Provincial Key Laboratory of Public Finance and Taxation with Big Data Application, China
c School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
d School of Mathematics, Sun Yat-sen University, Guangzhou, China
e School of Computer Science, South China Normal University, Guangzhou, China
f LIZHI Inc., Guangzhou, China
g Guangdong Province Key Laboratory of Computational Science, Guangzhou, China
h Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

Article info

Article history: Received 5 June 2021; received in revised form 13 February 2022; accepted 16 February 2022; available online 23 February 2022.

Keywords: Recommendation; User-generated item; User-item interaction; User-item generation; Item attribute; Item-item co-generation network; Deep learning

Abstract

Most existing recommendation methods assume that all the items are provided by separate producers rather than users. However, this assumption can be inappropriate in some recommendation tasks, since users may generate some items. Considering the user–item generation relation may benefit recommender systems that only use implicit user–item interactions. However, doing so may suffer from a dramatic imbalance: the number of user–item generation relations may be far smaller than the number of user–item interactions, because each item is generated by at most one user while it can be interacted with by many users. To overcome this challenging imbalance issue, we propose a novel Deep Interaction-Attribute-Generation (DIAG) model. It integrates the user–item interaction relation, the user–item generation relation, and the item attribute information into one deep learning framework. The novelty lies in the design of a new item–item co-generation network for modeling the user–item generation information. A graph attention network is then adopted to learn the item feature vectors from the user–item generations and the item attribute information by considering the adaptive impact of one item on its co-generated items. Extensive experiments conducted on two real-world datasets confirm the superiority of the DIAG method. © 2022 Elsevier B.V. All rights reserved.

∗ Corresponding author.
∗∗ Corresponding author at: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China.
https://doi.org/10.1016/j.knosys.2022.108463
0950-7051/© 2022 Elsevier B.V. All rights reserved.

1. Introduction

Recommender systems have been a hot research topic in the past decades due to their wide applications in various tasks such as e-commerce [1], education [2], travel [3], entertainment [4], and catering [5]. The first step of a recommender system is to analyze the input information, such as user–item interactions and item attributes; it then calculates matching scores between users and items to make recommendations. For instance, by analyzing user–item interactions (e.g. ratings, reviews, and purchase information), one can learn user latent factors and item latent factors that reflect users' preferences and items' characteristics, respectively. The dot product between a user's latent factors and an item's latent factors then represents the user's preference for this item. Many recommendation algorithms have been developed from different perspectives [6], which can be roughly categorized into deep learning based methods [7] and shallow learning based methods [8].

Despite this success, most existing recommendation algorithms assume that all the items are provided by separate producers rather than users, which could be inappropriate in some recommendation tasks. For instance, on social network platforms such as scholat.com,¹ most of the items to be recommended (e.g. posts) are generated by users. Another example is TikTok,² where most of the short-form mobile videos are generated by users. Although user–item interactions (e.g. shares and likes) are essential, the user–item generation information may carry more intrinsic information about how much attention a user pays to a type of item. The user–item generation relation could therefore be another non-negligible source for making accurate recommendations. In particular, in implicit recommendation, where there is insufficient information for modeling user preference, ignoring the generation relation between users and items has a more severe negative impact than in explicit recommendation. However, to the best of our knowledge, integrating the user–item generation relation into recommender systems remains largely unsolved.

¹ http://www.scholat.com/.
² https://www.tiktok.com/.

Utilizing the user–item generation relation raises a challenging issue: the dramatic imbalance between the number of user–item interactions and the number of user–item generations. That is, there are far fewer user–item generations than user–item interactions. Directly integrating the two sources of information is unsuitable because the user–item generation information is extremely sparse, so the interaction information could overpower the generation information.

Fig. 1. The overview of the proposed DIAG model.

To this end, we propose a novel model for user-generated item recommendation called Deep Interaction-Attribute-Generation (DIAG), which integrates the user–item interaction relation, the user–item generation relation, and the item attribute information into one deep learning framework. As shown in Fig. 1, the framework consists of three modules. The first module is the latent vector learning module, shown in the bottom-left part of Fig. 1. In this module, the user/item latent vectors are learned from the user–item interactions using multi-layer perceptron (MLP) networks. The main novelty of DIAG lies in the second module, namely the feature vector learning module, shown in the bottom-right part of Fig. 1.
In this module, a novel item–item co-generation network is designed, on which a graph attention network (GAT) is adopted to learn the item feature vectors from the user–item generations and the item attribute information, taking into account the adaptive impact of one item on its co-generated items. To the best of our knowledge, this is the first work that designs an item–item co-generation network to encode user–item generation information into feature vector learning. The user feature vectors are then obtained as a weighted combination of the feature vectors of the items generated by the corresponding users. Finally, in the third module, shown in the top part of Fig. 1, the user/item latent vectors and the user/item feature vectors are concatenated, respectively, to form the user/item predictive vectors, which are fed into the prediction layer to learn the final matching scores. Extensive experiments are conducted on two real-world datasets containing user-generated items, and the comparison results confirm the superiority of the proposed DIAG model.

The main contributions of this paper are summarized as follows.

1. We propose a novel DIAG model. Different from existing recommendation algorithms that only utilize user–item interaction information, the proposed model recommends items by additionally considering the user–item generation relation.
2. We design a new item–item co-generation network for modeling user–item generation information. It addresses the challenging issue caused by the dramatic imbalance between the number of user–item interactions and the number of user–item generations.
3. Extensive experiments are conducted on two real-world datasets. The results confirm the superiority of the proposed model and the necessity of using user–item generation information.

The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces the background of utilizing user–item generation relations and emphasizes its challenges. Section 4 describes the proposed DIAG method in detail. Section 5 reports the experimental results. Finally, Section 6 draws conclusions and discusses future work.


2. Related work

In the past decades, recommender systems have drawn increasing attention, and many different recommendation algorithms have been proposed [6–9]. They can be roughly categorized into deep learning based methods [7] and shallow learning based methods [8]. As shown in [7], deep learning can learn nonlinear representations of users/items and capture the nonlinear user–item relation; deep learning is therefore adopted in this paper. In [10], Wang et al. propose Collaborative Deep Learning (CDL), which jointly performs deep representation learning and collaborative filtering. In addition, auto-encoders have been applied to learn user and item latent representations [11–13]. In [14], a deep learning architecture called Deep Matrix Factorization (DMF) is developed to learn user/item representations in a common low-dimensional space. In [15], He et al. propose three instantiations of the neural collaborative filtering framework. To address the sparsity and noise issues of user–item rating information, Li et al. [11] propose a general deep architecture for collaborative filtering. In [16], Nassar et al. propose a multi-criteria collaborative filtering model based on deep learning. User–user social networks have also been exploited in deep learning based recommendation algorithms [17–19]. Despite their success, most of the aforementioned algorithms do not take the user–item generation relation into account. This relation could be an essential factor for improving recommendation accuracy in scenarios where user–item interaction information is insufficient.

In the early years, some attempts were made to recommend user-generated content such as user reviews and social tags [20,21]. For instance, in [20], Pessemier et al. develop a simple content-based algorithm for recommending user-generated content. In [21], Chiluka et al. propose a link prediction method for recommending user-generated content. In [22], Huang et al. propose an aspect sentiment similarity-based personalized review recommendation method. However, these methods cannot directly integrate user–item generation relations; that is, they do not simultaneously consider the user–content interaction information and the user–content generation information, both of which are very important for content recommendation. In addition, user-generated content is quite different from a user-generated item. For instance, a user review is a type of user-generated content that reflects a user's experience with an item. It has been found that using user-generated content may significantly improve item recommendation performance [23–27]. For instance, to simultaneously utilize various types of user-generated content for item recommendation, Xu and Yin [23] develop two statistical models based on collaborative filtering and topic modeling, respectively. In [24], Yu et al. propose a tensor approach for tag-driven item recommendation, which additionally considers sparse user-generated content. To make full use of multilingual reviews, Liu et al. [27] develop a multilingual review-aware deep recommendation method.

Recently, user-generated item list recommendation has also been investigated [28–31]. A user-generated item list is a list of items created by a user. Recommending item lists instead of individual items may help users organize and share items, thereby further improving their experience. One of the early attempts on this topic is [28], where a Bayesian ranking model is proposed that considers users' previous interactions with both item lists and individual items. He et al. [29] propose a hierarchical self-attentive recommendation model, which fits the user–list–item hierarchical structure and uses a self-attentive aggregation layer to capture user and list consistency. In [30], a consistency-aware user-generated item list recommendation method is developed. Similarly, in [31], Yang et al. propose a gated and attentive neural collaborative filtering model for user-generated list recommendation. However, user-generated item list recommendation differs from the problem of utilizing user–item generation relations, because the objects to be recommended in the two problems (user-generated item lists versus user-generated items) are different, even though both are generated by users.

3. Background and challenges

In this section, we introduce the background of utilizing user–item generation relations and elaborate on its challenges. For clarity, Table 1 summarizes the main notations used in this paper.

Table 1
The main notations used in this paper.

Notation                                        Description
$m$, $n$                                        Number of users and items
$R \in \mathbb{R}^{m\times n}$                  User–item interaction matrix
$R_{ui}$                                        User–item interaction relation between user $u$ and item $i$
$G \in \mathbb{R}^{m\times n}$                  User–item generation matrix
$F \in \mathbb{R}^{n\times d}$                  Item attribute matrix
$F_{i,:} \in \mathbb{R}^{1\times d}$            Item attribute vector of the $i$th item
$l$                                             Dimension of latent vectors and feature vectors
$p_i \in \mathbb{R}^{1\times l}$                Item latent vector of item $i$
$q_u \in \mathbb{R}^{1\times l}$                User latent vector of user $u$
$\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$    Item–item co-generation network
$\mathcal{V}$                                   Node set, with each node corresponding to one item
$\mathcal{E}$                                   Edge set, with each edge denoting an item–item co-generation relation
$x_i \in \mathbb{R}^{1\times l}$                Item feature vector of item $i$
$y_u \in \mathbb{R}^{1\times l}$                User feature vector of user $u$
$\bar{p}_i \in \mathbb{R}^{1\times 2l}$         Item predictive vector of item $i$
$\bar{q}_u \in \mathbb{R}^{1\times 2l}$         User predictive vector of user $u$
$\hat{R}_{ui}$                                  Predicted user–item interaction relation between user $u$ and item $i$
$\rho$                                          Negative sampling ratio
$\mathcal{Y}^{+}$                               Observed interaction set
$\mathcal{Y}^{-}$                               Randomly selected negative sample set
$\Theta$                                        Set of parameters to be trained in the model

In the recommendation task, let $m$ and $n$ denote the numbers of users and items, respectively. Following convention [15,32], a user–item interaction matrix $R \in \mathbb{R}^{m\times n}$ is constructed to represent the user–item interaction relation as follows:³

$$
R_{ui} = \begin{cases} 1, & \text{if there is an interaction between user } u \text{ and item } i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}
$$

The type of interaction may vary from one task or system to another. For instance, in e-commerce, the interactions between a consumer and a product usually include "browse", "collect", "cart", and "purchase", while on an online movie platform, the interactions between a user and a movie usually include "browse" and "watch". Compared with explicit ratings, implicit interactions have two main problems. First, an observed interaction (i.e. $R_{ui} = 1$) only indicates the user's preference indirectly, without showing how much user $u$ likes item $i$. Second, an unobserved interaction (i.e. $R_{ui} = 0$) does not mean that user $u$ dislikes item $i$; for example, user $u$ may never have seen item $i$. Therefore, it is more challenging to model user preference in implicit recommendation.

³ In this paper, we focus on implicit recommendation, but our technique can be easily extended to explicit recommendation.
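As a minimal illustration of Eq. (1), the sketch below builds $R$ from a hypothetical interaction log of (user, item, interaction type) triples. The log format and the function name `interaction_matrix` are illustrative assumptions, not part of the described system. Every interaction type collapses to 1, and pairs absent from the log stay 0, which in implicit feedback means "unobserved" rather than "disliked".

```python
def interaction_matrix(num_users, num_items, log):
    """Build the binary implicit-feedback matrix R of Eq. (1).

    `log` is a hypothetical list of (user, item, interaction_type) triples.
    """
    R = [[0] * num_items for _ in range(num_users)]
    for u, i, _kind in log:
        # Any interaction type ("browse", "cart", "purchase", ...) sets R_ui = 1;
        # repeated interactions on the same pair do not change the value.
        R[u][i] = 1
    return R


log = [(0, 1, "browse"), (0, 1, "purchase"), (1, 0, "like"), (2, 2, "view")]
R = interaction_matrix(3, 3, log)
observed = sum(map(sum, R))        # number of observed pairs with R_ui = 1
sparsity = 1 - observed / (3 * 3)  # share of unobserved pairs (R_ui = 0), not "negative" pairs
print(R, observed, round(sparsity, 3))
```

The two log entries for the pair (user 0, item 1) produce a single 1, which is exactly the information loss that makes implicit feedback harder to model than explicit ratings.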

Although many efforts have been made to develop implicit recommendation algorithms [33–37], most of them assume that all the items are provided by separate producers, which is not true in some recommendation tasks. As an example, Fig. 2 illustrates a user-generated post (i.e. item) on the scholarly social network scholat.com,⁴ which was generated by one user and viewed, liked, and commented on by 5125, 5, and 3 users, respectively. Although user–item interactions such as "view", "like", and "comment" contain meaningful information reflecting how well a user–item pair matches, the user–item generation information encodes intrinsic information about how much attention a user pays to a type of item. For instance, a user who generates a post about "call for papers" is likely to pay special attention to other posts about "call for papers". Therefore, modeling only user–item interactions such as "user–item view", "user–item like", and "user–item comment" while ignoring the user–item generation relation may seriously harm item recommendation performance. In particular, in implicit recommendation, where explicit user–item rating information for modeling user preference is lacking, ignoring the user–item generation relation has a more severe negative impact than in explicit recommendation. However, to the best of our knowledge, little effort has been made to integrate the generation relation between users and items into recommendation. Let $G \in \mathbb{R}^{m\times n}$ denote the user–item generation matrix, where the $(u, i)$th entry $G_{ui}$ represents the user–item generation relation between user $u$ and item $i$ as follows:

$$
G_{ui} = \begin{cases} 1, & \text{if item } i \text{ is generated by user } u, \\ 0, & \text{otherwise.} \end{cases} \tag{2}
$$

Fig. 2. An example of a user-generated post (i.e. item) on the scholarly social network scholat.com.

Usually, it is assumed that each item is generated by at most one user. Therefore, each column of $G$ has at most one non-zero entry, and the entire user–item generation matrix $G$ contains at most $n$ non-zero entries, which makes it an extremely sparse matrix. In particular, the number of user–item generations is far smaller than that of user–item interactions. For instance, on the Lizhi dataset used in our experiments, the number of non-zero entries in the user–item generation matrix is only 5744, while the number of non-zero entries in the user–item interaction matrix reaches 858 124, which is 148 times larger than that in the user–item generation matrix. It is not suitable to directly integrate the user–item interaction matrix and the user–item generation matrix, because direct integration would let the user–item interaction information overpower the extremely sparse user–item generation information. Fortunately, since the items are generated by users, the item attributes contain meaningful information that can be used to overcome the imbalance between the user–item interaction relation and the user–item generation relation. Let $F \in \mathbb{R}^{n\times d}$ denote the item attribute matrix, where the $i$th row $F_{i,:} \in \mathbb{R}^{1\times d}$ is the attribute vector of the $i$th item. In this paper, we propose a novel model for user-generated item recommendation called Deep Interaction-Attribute-Generation (DIAG), which integrates the user–item interaction matrix $R$, the user–item generation matrix $G$, and the item attribute matrix $F$ into one deep learning framework.

4. The proposed DIAG model

In this section, we describe in detail the proposed DIAG model, which is illustrated in Fig. 1. It consists of three main modules, namely latent vector learning, feature vector learning, and predictive vector learning. In what follows, we first elaborate on the three modules and then describe the prediction layer and the loss function.

4.1. Latent vector learning

This module consists of two parts, namely item latent vector learning and user latent vector learning. Both take the user–item interaction matrix $R \in \mathbb{R}^{m\times n}$ as input, but they use two different MLP networks to generate the item latent vectors and the user latent vectors. We first describe how the item latent vectors are learned. For each item $i$, $\forall i = 1, 2, \ldots, n$, the transposed item–user interaction vector $R^{T}_{:,i} \in \mathbb{R}^{1\times m}$ is obtained from $R$. From $R^{T}_{:,i}$, the item latent vector $p_i \in \mathbb{R}^{1\times l}$ of item $i$ is learned as follows:

$$
\begin{aligned}
h^{p}_{1} &= f\left(R^{T}_{:,i} W^{p}_{1} + b^{p}_{1}\right), \\
&\;\;\vdots \\
h^{p}_{L-1} &= f\left(h^{p}_{L-2} W^{p}_{L-1} + b^{p}_{L-1}\right), \\
p_i &= f\left(h^{p}_{L-1} W^{p}_{L} + b^{p}_{L}\right),
\end{aligned} \tag{3}
$$

where $L$ is the number of layers in the MLP, $W^{p}_{1}, \ldots, W^{p}_{L}$ are the $L$ learnable weight matrices, $b^{p}_{1}, \ldots, b^{p}_{L}$ are the $L$ learnable bias vectors, $h^{p}_{1}, \ldots, h^{p}_{L-1}$ are the $L-1$ hidden representation vectors, and $f(\cdot)$ denotes the activation function. Similarly, for each user $u$, $\forall u = 1, 2, \ldots, m$, the user latent vector $q_u \in \mathbb{R}^{1\times l}$ is learned from $R_{u,:} \in \mathbb{R}^{1\times n}$ as follows:

$$
\begin{aligned}
h^{q}_{1} &= f\left(R_{u,:} W^{q}_{1} + b^{q}_{1}\right), \\
&\;\;\vdots \\
h^{q}_{L-1} &= f\left(h^{q}_{L-2} W^{q}_{L-1} + b^{q}_{L-1}\right), \\
q_u &= f\left(h^{q}_{L-1} W^{q}_{L} + b^{q}_{L}\right),
\end{aligned} \tag{4}
$$

where the notations have meanings analogous to those in Eq. (3). The bottom-left dashed rectangle of Fig. 1 shows the latent vector learning module.

⁴ https://www.scholat.com/vpost.html?pid=143675.

4.2. Feature vector learning

This module consists of three parts, namely item–item co-generation network construction, item feature vector learning, and user feature vector learning, as shown in the bottom-right dashed rectangle of Fig. 1.

4.2.1. Item–item co-generation network construction

From the user–item generation matrix $G \in \mathbb{R}^{m\times n}$, a novel item–item co-generation network is constructed, which represents the co-generation relation between any pair of items.

Definition 1 (Item–Item Co-generation Network). Let $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ denote the item–item co-generation network, where $\mathcal{V}$ is a set of nodes, each corresponding to one item, i.e. $|\mathcal{V}| = n$, and $\mathcal{E}$ is a set of edges corresponding to the item–item co-generation relation, i.e. $\mathcal{E} = \{(i, j) \mid \exists u,\ G_{ui} = G_{uj} = 1\}$.

That is, for any pair of nodes $i$ and $j$, if item $i$ and item $j$ are generated by the same user, an edge is established between nodes $i$ and $j$. Compared with directly integrating the user–item generation relations into the deep learning framework, encoding them as an item–item co-generation network can eliminate the dramatic imbalance between the number of user–item interactions and the number of user–item generations.

4.2.2. Item feature vector learning

After constructing the item–item co-generation network, a graph attention layer [38] is used to transform the original item attribute vectors into higher-level item feature vectors that encode the complex co-generation relation between items. The input of the graph attention layer is the set of original item attribute vectors, i.e. $\{F_{1,:}, F_{2,:}, \ldots, F_{n,:}\}$, where $F_{i,:} \in \mathbb{R}^{1\times d}$, $\forall i = 1, \ldots, n$, is the original attribute vector of item $i$. The output of the graph attention layer is a new set of item feature vectors, denoted as $\{x_1, x_2, \ldots, x_n\}$, where $x_i \in \mathbb{R}^{1\times l}$, $\forall i = 1, \ldots, n$, is the higher-level feature vector of item $i$. First, an item-shared linear transformation, parametrized by a weight matrix $W_{fx} \in \mathbb{R}^{d\times l}$, is applied to each item, i.e. $F_{i,:} W_{fx}$. Then, self-attention is performed on the items via a shared attentional mechanism $a: \mathbb{R}^{1\times l} \times \mathbb{R}^{1\times l} \to \mathbb{R}$, which computes the attention coefficients as follows:

$$
e_{ij} = a\left(F_{i,:} W_{fx},\ F_{j,:} W_{fx}\right). \tag{5}
$$

The attention coefficient $e_{ij}$ indicates the importance of item $j$'s features to item $i$. In particular, the attention mechanism $a$ is a single-layer feedforward neural network parametrized by a weight vector $\mathbf{a} \in \mathbb{R}^{1\times 2l}$, followed by the LeakyReLU nonlinearity with negative input slope 0.2. That is, Eq. (5) can be expanded as

$$
e_{ij} = \mathrm{LeakyReLU}\left(\left[F_{i,:} W_{fx} \,\|\, F_{j,:} W_{fx}\right] \mathbf{a}^{T}\right), \tag{6}
$$

where $\|$ is the concatenation operation. To inject the co-generation relation of items into the above attention coefficients and make them easily comparable across different items, a neighbor-item based softmax function is applied, which yields the normalized attention coefficients

$$
\alpha_{ij} = \mathrm{softmax}_{j}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \tag{7}
$$

where $\mathcal{N}_i$ denotes the neighbor items of item $i$ in the item–item co-generation network, including item $i$ itself, i.e. all the items generated by the same user as item $i$. The normalized attention coefficients $\alpha_{ij}$ are then used to compute a linear combination of the corresponding transformed vectors, giving the output feature vector of each item $i$:

$$
x_i = \sum_{j \in \mathcal{N}_i} \alpha_{ij} F_{j,:} W_{fx}. \tag{8}
$$

In our experiments, only one graph attention layer is used to learn the item feature vectors; it captures the co-generation relation between items while accounting for the adaptive importance of each item's co-generated items. In this way, the user–item generation information is encoded into the item feature vectors.

4.2.3. User feature vector learning

After the item feature vectors are learned, the user feature vectors are obtained as a weighted combination of the feature vectors of the items generated by the corresponding users, with item popularity as the weight. Since the user–item generation information is encoded into the item feature vectors, the user feature vectors also carry this information. For each user $u$, the index set of items generated by $u$ is first obtained, denoted as $\mathcal{U}_u = \{g_{u,1}, g_{u,2}, \ldots, g_{u,k_u}\}$, where $g_{u,k}$, $\forall k = 1, \ldots, k_u$, is the index of the $k$th item generated by user $u$ and $k_u$ is the number of items generated by user $u$. Accordingly, the user feature vector of user $u$, denoted as $y_u \in \mathbb{R}^{1\times l}$, is defined as the weighted combination of the item feature vectors indexed by $\mathcal{U}_u$:

$$
y_u = \frac{\sum_{k=1}^{k_u} s_{g_{u,k}}\, x_{g_{u,k}}}{\sum_{k=1}^{k_u} s_{g_{u,k}}}, \tag{9}
$$

where $s_{g_{u,k}}$ denotes the popularity of item $g_{u,k}$, i.e. the number of users who interact with item $g_{u,k}$.

4.3. Predictive vector learning

In this module, the latent vectors and the feature vectors are integrated into the predictive vectors. In particular, for each item, the item latent vector and the item feature vector are concatenated to generate the item predictive vector. Similarly, for each user, the user latent vector and the user feature vector are concatenated to generate the user predictive vector. This module is shown in the top dashed rectangle of Fig. 1. Let $\bar{p}_i \in \mathbb{R}^{1\times 2l}$ denote the item predictive vector; it is calculated as follows:

$$
\bar{p}_i = p_i \,\|\, x_i. \tag{10}
$$
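The path from the matrices $R$, $G$ and $F$ to the predictive vectors (Eqs. (3) and (5)–(10), with Definition 1) can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The toy matrices, the helper names (`neighbours`, `item_features`, `user_features`, `dense`), the choice $L = 1$ with $f = \mathrm{ReLU}$ for the item MLP, and the zero-vector fallback for users who generated no items are all assumptions made for illustration; in the real model $W_{fx}$, $\mathbf{a}$ and the MLP weights would be learned during training rather than fixed by hand.

```python
import math

def row_times_matrix(row, W):
    # (1 x d) row vector times (d x l) matrix -> (1 x l) row vector.
    return [sum(row[k] * W[k][c] for k in range(len(row))) for c in range(len(W[0]))]

def dense(v, W, b):
    # One MLP layer f(vW + b); f = ReLU is an assumption (the text leaves f generic).
    return [max(0.0, s + bc) for s, bc in zip(row_times_matrix(v, W), b)]

def leaky_relu(v, slope=0.2):
    # LeakyReLU with negative input slope 0.2, as used by the attention mechanism.
    return v if v > 0 else slope * v

def neighbours(G):
    # N_i: item i itself plus every item generated by the same user (Definition 1).
    nbrs = {i: {i} for i in range(len(G[0]))}
    for row in G:
        owned = [i for i, g in enumerate(row) if g == 1]
        for i in owned:
            nbrs[i].update(owned)
    return nbrs

def item_features(F, G, W_fx, a):
    # One single-head graph attention layer over the co-generation network, Eqs. (5)-(8).
    z = [row_times_matrix(f, W_fx) for f in F]       # F_{i,:} W_fx for every item
    l = len(W_fx[0])
    X = []
    for i, nbr in sorted(neighbours(G).items()):
        js = sorted(nbr)
        # Eq. (6): e_ij = LeakyReLU([z_i || z_j] a^T); list "+" concatenates z_i and z_j.
        e = [leaky_relu(sum(ak * ck for ak, ck in zip(a, z[i] + z[j]))) for j in js]
        m = max(e)                                    # subtract the max for numerical stability
        w = [math.exp(v - m) for v in e]
        total = sum(w)
        alpha = [v / total for v in w]                # Eq. (7): softmax over N_i
        X.append([sum(al * z[j][c] for al, j in zip(alpha, js)) for c in range(l)])  # Eq. (8)
    return X

def user_features(X, G, R):
    # Eq. (9): popularity-weighted mean of the feature vectors of the items user u generated.
    popularity = [sum(row[i] for row in R) for i in range(len(X))]  # s_i = users interacting with i
    l = len(X[0])
    Y = []
    for row in G:
        owned = [i for i, g in enumerate(row) if g == 1]
        denom = sum(popularity[i] for i in owned)
        if denom == 0:
            Y.append([0.0] * l)  # assumption: the text does not define y_u for this case
        else:
            Y.append([sum(popularity[i] * X[i][c] for i in owned) / denom for c in range(l)])
    return Y

# Hypothetical toy data: 3 users, 4 items, d = l = 2.
G = [[1, 1, 0, 0],   # user 0 generated items 0 and 1
     [0, 0, 1, 0],   # user 1 generated item 2
     [0, 0, 0, 0]]   # user 2 generated nothing; item 3 has no generating user
R = [[1, 1, 1, 1],
     [1, 0, 0, 1],
     [1, 1, 0, 0]]
F = [[1.0, 0.0], [0.0, 1.0], [0.5, -1.0], [2.0, 2.0]]
W_fx = [[1.0, 0.0], [0.0, 1.0]]                  # identity, so F_{i,:} W_fx = F_{i,:}
a = [0.1, 0.2, 0.3, 0.4]                         # attention weight vector of length 2l
W_p = [[0.5, -0.5], [0.25, 0.25], [-1.0, 1.0]]   # one-layer item MLP (L = 1), shape m x l
b_p = [0.0, 0.1]

X = item_features(F, G, W_fx, a)
Y = user_features(X, G, R)
p = [dense([row[i] for row in R], W_p, b_p) for i in range(len(F))]  # Eq. (3) on R^T_{:,i}
p_bar = [p_i + x_i for p_i, x_i in zip(p, X)]                       # Eq. (10): p_i || x_i
print(X, Y, p_bar, sep="\n")
```

Because items 0 and 1 share a generator, each one's feature vector is an attention-weighted mix of both transformed attribute vectors, while items 2 and 3 (no co-generated neighbours) keep their own transformed attributes; the user feature vector of user 0 then blends items 0 and 1 with weights 3 and 2, their interaction counts in $R$.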

5.1. Datasets and evaluation measures 5.1.1. Datasets In our experiments, two real-world datasets, namely Scholat and Lizhi, are adopted.5

(10)

1. Scholat: The Scholat dataset is a post recommendation dataset obtained from the scholar social network https: //www.scholat.com/. The scholars are taken as users, and the posts are taken as items. Each post is generated by at most one scholar (i.e. user–item generation) and will be read by many scholars (i.e. user–item interaction). The item attribute vector is obtained by applying word2vec to the 0/1-valued word vector representing the absence/presence of each word in the title and content of the post, which achieves the best performance among all the item attribute vector construction methods we have tried. The dictionary size, i.e. the total number of distinct words in the collection of titles and contents of all posts, is 37 191. After applying word2vec, the item attribute vector of dimension 372 is obtained for each item. Only the users with no less than 2 interactions are preserved. After preprocessing, a subset consisting of 1991 users and 5008 items is obtained. The number of user–item interactions, i.e. the total number of non-zero entries in the user–item interaction matrix R, is 13 960. Because there are 1991 × 5008 = 9 970 928 user–item pairs, the user–item interac−13 960 × 100% = 99.86%. The number tion sparsity is 9 9709 928 970 928 of user–item generations, i.e. the total number of usergenerated items, is 3776, which is smaller than 5008. That is to say, there are 1232 items that are not generated by users but by the platform, i.e. webmasters. The number of edges in the item–item co-generation network is 131 594. Therefore, the edge sparsity of the item–item co-generation network is 99.48%. 2. Lizhi: The Lizhi dataset is an audio recommendation dataset obtained from the online audio platform https: //www.lizhi.fm/. The normal users are taken as users, and the audios are taken as items. Each audio is generated by at most one user (i.e. user–item generation) and will be listened to by many users (i.e. user–item interaction). 
The item attribute vector characterizes the key information of the audio, namely gender and age of the user generating this audio, and the bag of words (BOW) representation of the audio tags. The dimension of the item attribute vectors is 514. Only users with at least 10 user– item interactions are preserved. After preprocessing, a subset consisting of 3716 users and 9205 items is obtained. The number of user–item interactions, i.e. the total number of non-zero entries in the user–item interaction matrix R, is 858 124. Because there are 3716 × 9205 = 34 205 780 user–item pairs, the user–item interaction spar780−858 124 × 100% = 97.49%. The number sity is 34 205 34 205 780 of user–item generations, i.e. the total number of usergenerated items, is 5744, which is smaller than 9205. That is to say, there are 3461 items that are not generated by users but by the platform or some very important users (not the normal users). The number of edges in the item– item co-generation network is 44 784. Therefore, the edge sparsity of the item–item co-generation network is 99.95%.

Similarly, for each user u, the user predictive vector q¯ u ∈ R1×2l can be obtained by q¯ u = qu ∥ yu . 4.4. Prediction and loss function After obtaining the item predictive vectors and the user predictive vectors, for each pair of user u and item i to be predicted, the predicted matching score is calculated as follows:

$$\hat{R}_{ui} = \sigma\big((\bar{\mathbf{q}}_u \odot \bar{\mathbf{p}}_i)\,\mathbf{w}^{T}\big) \tag{11}$$

where $\bar{\mathbf{q}}_u \odot \bar{\mathbf{p}}_i$ denotes the element-wise product of $\bar{\mathbf{q}}_u$ and $\bar{\mathbf{p}}_i$, $\mathbf{w} \in \mathbb{R}^{1\times 2l}$ is a learnable weight vector (its length must match that of the predictive vectors), and $\sigma(\cdot)$ denotes the activation function. It is well known that implicit recommendation tasks often lack negative feedback, which is necessary for training the neural network. There are usually two approaches to this problem. The first is to sample some unobserved interactions as negative feedback, while the second is to treat all the unobserved interactions as weak negative feedback. As mentioned above, an unobserved interaction does not mean that the interaction cannot happen; on the contrary, a user is very likely to buy items he or she likes but has not yet purchased. Therefore, in this work, we consider it more reasonable to adopt the first approach for all the datasets in our experiments. That is, we uniformly sample negative instances from the unobserved interactions with the negative sampling ratio ρ, i.e. the number of negative instances per positive instance. To train the neural network, an appropriate loss function needs to be designed. The implicit recommendation problem is often formulated as an interaction prediction problem. In particular, assuming that the implicit interaction $R_{ui}$ follows a Bernoulli distribution, the likelihood function can be defined as follows:



$$L(\Theta) = \prod_{(u,i)\in\mathcal{Y}^{+}\cup\mathcal{Y}^{-}} \hat{R}_{ui}^{\,R_{ui}}\big(1-\hat{R}_{ui}\big)^{1-R_{ui}} \tag{12}$$

where $\mathcal{Y}^{+}$ and $\mathcal{Y}^{-}$ respectively denote the observed interaction set and the randomly sampled negative instance set, and $\Theta$ denotes the set of parameters to be trained. Finally, by taking the negative logarithm of the likelihood (NLL), we obtain the loss function as follows:

$$\ell(\Theta) = -\sum_{(u,i)\in\mathcal{Y}^{+}\cup\mathcal{Y}^{-}} \Big[ R_{ui}\log\hat{R}_{ui} + (1-R_{ui})\log\big(1-\hat{R}_{ui}\big) \Big] \tag{13}$$
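The training signal of Eqs. (11)–(13), together with the uniform negative sampling described above, can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the predictive vectors are random placeholders rather than learned ones, nothing is trained, and drawing the ρ negatives for the same user as each positive is an assumption about how the sampling is organized.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, the prediction-layer activation used in the experiments."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(q_bar, p_bar, w):
    """Eq. (11): sigmoid((q_bar ⊙ p_bar) w^T); all three vectors have length 2l."""
    if not (len(q_bar) == len(p_bar) == len(w)):
        raise ValueError("q_bar, p_bar and w must have the same length")
    return sigmoid(sum(q * p * wk for q, p, wk in zip(q_bar, p_bar, w)))


def sample_training_pairs(observed, num_items, rho, rng):
    """Label each observed (u, i) with 1 and add rho uniformly drawn items that
    u has not interacted with, labelled 0. Assumes no user has seen every item."""
    seen = {}
    for u, i in observed:
        seen.setdefault(u, set()).add(i)
    pairs = []
    for u, i in observed:
        pairs.append((u, i, 1))
        for _ in range(rho):
            j = rng.randrange(num_items)
            while j in seen[u]:  # resample until the item is unobserved for u
                j = rng.randrange(num_items)
            pairs.append((u, j, 0))
    return pairs


def nll_loss(labels, scores, eps=1e-12):
    """Eq. (13): negative log-likelihood of the Bernoulli model in Eq. (12)."""
    total = 0.0
    for r, r_hat in zip(labels, scores):
        r_hat = min(max(r_hat, eps), 1.0 - eps)  # clip to avoid log(0)
        total += r * math.log(r_hat) + (1 - r) * math.log(1.0 - r_hat)
    return -total


# Hypothetical toy data: 2 users, 6 items, rho = 4, l = 2 (vectors of length 2l = 4).
rng = random.Random(0)
observed = [(0, 1), (0, 3), (1, 2)]
pairs = sample_training_pairs(observed, num_items=6, rho=4, rng=rng)
q_bar = {u: [rng.uniform(-1, 1) for _ in range(4)] for u in (0, 1)}
p_bar = {i: [rng.uniform(-1, 1) for _ in range(4)] for i in range(6)}
w = [rng.uniform(-1, 1) for _ in range(4)]
scores = [predict(q_bar[u], p_bar[i], w) for u, i, _ in pairs]
loss = nll_loss([r for _, _, r in pairs], scores)
print(len(pairs), round(loss, 4))
```

Each observed interaction contributes 1 + ρ labelled pairs, so the three toy interactions yield 15 training pairs; in practice the loss would be minimized with a gradient-based optimizer such as Adam.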

By minimizing the loss function in Eq. (13), we obtain the parameters Θ of the entire DIAG model. With the trained parameters Θ, DIAG makes recommendations by jointly integrating the user–item interaction relations, the user–item generation relations, and the item attribute information.

5. Experiments

In this section, extensive experiments are conducted on two real-world datasets to evaluate the proposed DIAG model. First, we introduce the real-world datasets and the evaluation measures used in our experiments. Second, we report the comparison results with seven baselines to demonstrate the effectiveness of DIAG. Then, we conduct a parameter analysis to investigate how the parameters affect the performance of DIAG. Finally, we conduct an ablation study to show the effectiveness of using user–item generation information and item attribute information.

From the above description, we can see that the Scholat dataset suffers more seriously from the sparsity of user–item interactions than the Lizhi dataset, whereas the Lizhi dataset suffers more seriously from the sparsity of user–item generations than the Scholat dataset, i.e. its item–item co-generation network has a larger edge sparsity.

⁵ The two datasets and the DIAG code can be downloaded from https://www.scholat.com/research/opendata/.

5.1.2. Evaluation measures

Following [15], we adopt the leave-one-out evaluation, i.e. the latest interaction of each user is used for testing, while the remaining data are used for training. Since ranking all items is time-consuming, we randomly select 100 unobserved interactions as negative samples for each user. We then rank the test item among these 100 negative items for each user according to the predicted scores. We evaluate the ranking performance with two widely adopted evaluation measures, namely Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG), which are defined respectively as follows:

$$\mathrm{HR} = \frac{\#hits}{\#users} \tag{14}$$

$$\mathrm{NDCG} = \frac{1}{\#users}\sum_{i=1}^{\#hits}\frac{1}{\log_2(p_i + 1)} \tag{15}$$

where #hits is the number of users whose test item appears in the recommended list and $p_i$ is the position of the test item in the list for the ith hit. By truncating the ranked list at k for both measures, we report the results in terms of HR@k and NDCG@k. Intuitively, HR@k measures whether the test item is present in the top-k list, and NDCG@k measures the ranking quality by assigning higher scores to hits at top positions of the top-k list. Larger values of HR@k and NDCG@k indicate better performance.

5.2. Comparison experiments

In this subsection, comparison experiments are conducted to confirm the effectiveness of the proposed model against seven baselines.

5.2.1. Baselines and settings

In our experiments, we compare the proposed DIAG method with the following seven baselines⁶.
1. Probabilistic Matrix Factorization (PMF) [39]: It utilizes a probabilistic linear model with Gaussian observation noise, modeling the user preference matrix as the product of a user matrix and an item matrix.
2. Multi-Layer Perceptron (MLP) [15]: It is a deep learning recommendation method based on matching function learning. In this method, the user interaction vector and the item interaction vector are concatenated into one vector, which is then fed into a multi-layer perceptron (MLP) network to predict the matching score.
3. Generalized Matrix Factorization (GMF) [15]: It is a deep matrix factorization method based on representation learning, which can be regarded as the deep learning version of matrix factorization. Different from MLP, it adopts the element-wise product of the MF user vector and the MF item vector rather than their concatenation.
4. Neural Matrix Factorization (NeuMF) [15]: It is a deep learning recommendation method that concatenates the latent representations of GMF and MLP and then uses a fully connected layer to make the prediction.
5. Recommender VAE (RecVAE) [13]: It is a collaborative filtering method for implicit feedback based on the variational autoencoder.
6. LightGCN [18]: It is a GCN-based collaborative filtering method. Different from conventional GCNs, it learns user and item embeddings by only linearly propagating them on the user–item interaction network, and it uses the weighted sum of the embeddings learned at all layers as the final embedding.
7. DiffNet++ [19]: It is a GCN-based social recommendation method. It takes as input the user–item interaction information, user attribute information, item attribute information and user–user social information. For a fair comparison, in this experiment we exchange the roles of users and items, so that the item–item co-generation network can be regarded as the user–user social information to be fed into DiffNet++.

In all methods, including the above baselines and the proposed method, we adopt the same settings for the common parameters and procedures, choosing those that are most suitable for most methods. For instance, the Adam method [40] is adopted for optimization with an initial learning rate of 0.001. The activation functions of the hidden layers and the output layer in the latent vector learning are ReLU and LeakyReLU, respectively. The activation function of the prediction procedure is Sigmoid. The number of MLP layers is 3 on the Scholat dataset and 5 on the Lizhi dataset. The dimension l of the latent vectors and the negative sampling ratio ρ are set to 32 and 4, respectively, on both datasets. For the other parameters and procedures that are not shared, we either adopt the default settings suggested by the authors or tune them within certain ranges to generate the best results.

To further investigate how additionally considering user–item generation would affect the performance of the baselines, we extend three baselines, namely MLP, GMF and RecVAE, with user–item generation information. In particular, the user–item generation matrix is concatenated to the user–item interaction matrix as supplementary information. The augmented user–item interaction matrix is then fed into each of the three baselines to obtain the final recommendation results, resulting in three variants. For notational simplicity, the three variants are denoted as MLP-var, GMF-var and RecVAE-var, respectively.
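The HR@k and NDCG@k measures of Eqs. (14) and (15), under the leave-one-out protocol with sampled negatives described in Section 5.1.2, can be sketched as follows. The function names are hypothetical, each user is represented only by the score of the held-out test item and the scores of its sampled negatives, and ties are assumed to be broken in favour of the test item.

```python
import math


def hit_and_ndcg_at_k(test_score, negative_scores, k):
    """Rank the held-out test item among its sampled negatives for one user."""
    # 1-based position p of the test item: 1 + number of negatives scored higher.
    # ASSUMPTION: ties with negatives are broken in favour of the test item.
    p = 1 + sum(1 for s in negative_scores if s > test_score)
    if p <= k:
        return 1.0, 1.0 / math.log2(p + 1)  # a hit contributes 1 / log2(p + 1)
    return 0.0, 0.0  # a miss contributes nothing to either measure


def evaluate(per_user, k):
    """per_user: list of (test_score, negative_scores). Returns (HR@k, NDCG@k)."""
    hits, dcg = 0.0, 0.0
    for test_score, negatives in per_user:
        h, g = hit_and_ndcg_at_k(test_score, negatives, k)
        hits += h
        dcg += g
    n = len(per_user)
    return hits / n, dcg / n  # both are averaged over #users


# Three hypothetical users, each with 100 sampled negatives.
users = [
    (0.9, [0.1] * 100),                # test item ranked 1st
    (0.5, [0.8, 0.7] + [0.1] * 98),    # ranked 3rd
    (0.2, [0.3] * 10 + [0.1] * 90),    # ranked 11th
]
hr5, ndcg5 = evaluate(users, 5)
print(hr5, ndcg5)
```

With k = 5, two of the three users are hits, so HR@5 = 2/3, and NDCG@5 = (1 + 1/log2 4 + 0)/3 = 0.5; the third user only becomes a hit once k reaches 11.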

⁶ The codes of the baselines are downloaded from the following websites:
• PMF: https://github.com/xuChenSJTU/PMF;
• MLP, GMF and NeuMF: https://github.com/guoyang9/NCF;
• RecVAE: https://github.com/ilya-shenbin/RecVAE;
• LightGCN: https://github.com/gusye1234/LightGCN-PyTorch;
• DiffNet++: https://github.com/PeiJieSun/diffnet.

5.2.2. Comparison results and analysis

The comparison results on the two datasets are listed in Tables 2 and 3, respectively, in terms of HR@k and NDCG@k with k being 5, 10, 15 and 20. From the two tables, we can see that the proposed DIAG model outperforms the baselines in most cases on both datasets. In particular, the improvement achieved by DIAG is larger on the Scholat dataset than on the Lizhi dataset. On the Scholat dataset, except for NDCG@5, DIAG achieves at least 2.58% improvement, and the maximum improvement reaches 4.30% in terms of HR@10. This is because the Scholat dataset suffers more seriously from the cold-start user and sparsity issues than the Lizhi dataset: most users on the Scholat dataset have only 2 or 3 interacted items. By incorporating the user–item generation information, DIAG alleviates the cold-start user and sparsity issues and therefore obtains a larger improvement.

Note that DiffNet++ also incorporates the user–item generation information, as a type of social relation information, and therefore obtains relatively large NDCG@k values compared with the other baselines. However, the HR@k values obtained by DiffNet++ are almost the smallest among all methods, implying that DiffNet++ is relatively weak at prediction although it is good at ranking. The reason is that DiffNet++ can capture the item–item relation using graph neural networks, but it fails to learn good latent representations of users because it cannot make full use of the user–item interaction information. In addition, due to its relatively complex structure, DiffNet++ fails to generate results within the given time limit (168 h) on the Lizhi dataset; the corresponding values in Table 3 are therefore "N/A".

For the variants of MLP, GMF and RecVAE, namely MLP-var, GMF-var and RecVAE-var, in most cases their performance is very close to that of their original counterparts. That is to say, simply adding the user–item generation matrix to the user–item interaction matrix brings only marginal improvement, and in a few cases the variants even perform worse than their original counterparts. For instance, GMF-var performs worse than GMF on the Lizhi dataset in all measures. This confirms that simple matrix concatenation is not a suitable way to integrate the two information sources.

Comparing NeuMF and LightGCN, although both of them only utilize user–item interaction information, NeuMF performs better than LightGCN in all cases. This is because NeuMF integrates the benefits of both MLP and GMF. However, NeuMF cannot outperform DIAG, which confirms the effectiveness of utilizing user–item generation information for better recommendation. Another observation is that the difference between LightGCN and NeuMF is smaller in terms of NDCG@k than in terms of HR@k. Moreover, on the Lizhi dataset, which contains more user–item interaction information (i.e. each user has at least 10 user–item interactions), the HR@k difference between LightGCN and NeuMF is smaller than on the Scholat dataset. This suggests that MLP has a stronger capability of capturing user latent preferences than graph neural networks (GNN), so its prediction performance is better than that of GNN. On the other hand, GNN has a stronger capability of capturing the relations between users and items and between items, and therefore has better ranking performance than MLP. LightGCN outperforms MLP and GMF in NDCG@k on the Scholat dataset and performs better than both of them on the Lizhi dataset in most cases, which suggests that GNN outperforms MLP given enough training data. In most cases, DIAG obtains better HR@k and NDCG@k than all baselines. This is because DIAG adopts MLP to model the user–item interaction information, and GAT to model the user–item generation information (represented as the item–item co-generation network) and the item attribute information.

Table 2
Comparison results on the Scholat dataset. In each column, the best result is highlighted in bold. The improvement percentage obtained by DIAG is defined as (DIAG result − best baseline result) / best baseline result.

Method       HR@5    HR@10   HR@15   HR@20   NDCG@5  NDCG@10 NDCG@15 NDCG@20
PMF          0.5565  0.6334  0.6630  0.6886  0.4428  0.4609  0.4689  0.4744
MLP          0.5716  0.6389  0.6776  0.6992  0.4658  0.4903  0.4980  0.5029
MLP-var      0.5831  0.6499  0.6916  0.7217  0.4823  0.4923  0.5020  0.5168
GMF          0.5625  0.6479  0.6836  0.7002  0.4545  0.4771  0.4864  0.4918
GMF-var      0.5902  0.6554  0.6821  0.7077  0.4820  0.4964  0.5137  0.5100
NeuMF        0.5866  0.6499  0.6806  0.7007  0.4933  0.5135  0.5222  0.5264
RecVAE       0.5264  0.5892  0.6263  0.6565  0.4465  0.4662  0.4788  0.4817
RecVAE-var   0.5269  0.5876  0.6183  0.6424  0.4463  0.4632  0.4676  0.4728
LightGCN     0.5610  0.6097  0.6369  0.6570  0.4772  0.4911  0.5000  0.5051
DiffNet++    0.5234  0.5455  0.5711  0.6032  0.5082  0.5154  0.5211  0.5297
DIAG         0.6128  0.6836  0.7147  0.7459  0.5077  0.5287  0.5426  0.5470
Improvement  3.83%   4.30%   3.34%   3.35%   −0.10%  2.58%   3.91%   3.27%

Table 3
Comparison results on the Lizhi dataset. In each column, the best result is highlighted in bold. The improvement percentage obtained by DIAG is defined as (DIAG result − best baseline result) / best baseline result.

Method       HR@5    HR@10   HR@15   HR@20   NDCG@5  NDCG@10 NDCG@15 NDCG@20
PMF          0.3340  0.4559  0.5568  0.6305  0.2322  0.2636  0.2903  0.3023
MLP          0.6128  0.7559  0.8353  0.8848  0.4488  0.4951  0.5219  0.5250
MLP-var      0.6109  0.7675  0.8383  0.8837  0.4496  0.4998  0.5178  0.5236
GMF          0.6136  0.7573  0.8264  0.8738  0.4446  0.4923  0.5098  0.5203
GMF-var      0.6090  0.7476  0.8208  0.8698  0.4430  0.4859  0.5052  0.5167
NeuMF        0.6437  0.7828  0.8515  0.8975  0.4785  0.5228  0.5415  0.5528
RecVAE       0.6254  0.7618  0.8326  0.8811  0.4554  0.4991  0.5171  0.5292
RecVAE-var   0.6168  0.7637  0.8388  0.8829  0.4535  0.4965  0.5184  0.5261
LightGCN     0.6111  0.7616  0.8431  0.8883  0.4512  0.5005  0.5209  0.5318
DiffNet++    N/A     N/A     N/A     N/A     N/A     N/A     N/A     N/A
DIAG         0.6523  0.7931  0.8606  0.9061  0.4816  0.5272  0.5452  0.5559
Improvement  1.34%   1.31%   1.07%   0.96%   0.64%   0.83%   0.68%   0.56%
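The improvement percentages in the last rows of Tables 2 and 3 follow the definition given in the table captions. The sketch below recomputes the HR@10 entry of Table 2 from the four-decimal table values; the function name is illustrative. Because the table entries are rounded, a recomputed percentage can differ from the reported one in the last digit (for example, Lizhi NDCG@5 recomputes to 0.65% rather than the reported 0.64%), presumably because the reported percentages were computed from unrounded scores.

```python
def improvement_percentage(diag: float, best_baseline: float) -> float:
    """(DIAG result - best baseline result) / best baseline result, in percent."""
    return 100.0 * (diag - best_baseline) / best_baseline


# HR@10 column of Table 2 (Scholat), baselines only; values copied from the table.
hr10_scholat = {
    "PMF": 0.6334, "MLP": 0.6389, "MLP-var": 0.6499, "GMF": 0.6479,
    "GMF-var": 0.6554, "NeuMF": 0.6499, "RecVAE": 0.5892,
    "RecVAE-var": 0.5876, "LightGCN": 0.6097, "DiffNet++": 0.5455,
}
best_name = max(hr10_scholat, key=hr10_scholat.get)  # strongest baseline
gain = improvement_percentage(0.6836, hr10_scholat[best_name])  # DIAG HR@10
print(best_name, f"{gain:.2f}%")  # prints: GMF-var 4.30%
```

A negative value, such as the −0.10% for NDCG@5 on Scholat, simply means that the best baseline (DiffNet++ in that column) scored higher than DIAG.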

5.3. Parameter analysis

In this subsection, we analyze the impact of two parameters, namely the dimension l and the negative sampling ratio ρ, on the performance of the proposed DIAG model.

5.3.1. Dimension l

First, we analyze the impact of the dimension l on the performance of the proposed DIAG model. To this end, we run DIAG on the two datasets with l selected from {8, 16, 32, 64} while fixing the other parameters. The results are plotted in Fig. 3. The two subfigures show that DIAG is insensitive to the value of l: HR@10 and NDCG@10 remain relatively stable on both datasets under different l. Both measures increase slightly as l grows from 8 to 32 and decrease slightly as l grows from 32 to 64, and the best results are obtained with l = 32. Therefore, we set the dimension l to 32 on both datasets in our experiments.

Fig. 3. Parameter analysis: The impact of the dimension l on the performance of DIAG on the two datasets.

Fig. 4. Parameter analysis: The impact of the negative sampling ratio ρ on the performance of DIAG on the two datasets.

Table 4
Ablation study on the Scholat dataset: Comparing the proposed DIAG model with its two simplified variants. The best result generated by the three methods in each row is highlighted in bold.

Evaluation measure  DIAG_L  DIAG_LA  DIAG    Improvement (DIAG vs. DIAG_LA)  Improvement (DIAG_LA vs. DIAG_L)
HR@10               0.6334  0.6527   0.6836  4.73%                           3.05%
NDCG@10             0.4911  0.5101   0.5287  3.65%                           3.87%
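The simplified predictive vectors of the DIAG_LA variant compared in Table 4 (described in Section 5.4) can be sketched as follows: item vectors concatenate the item latent vector with the raw item attribute vector, and user vectors concatenate the user latent vector with the mean attribute vector of the items the user generated. This is a minimal illustration with hypothetical names and toy data; in particular, using a zero vector for users who generated no items is an assumption, since the text does not say how such users are handled.

```python
def mean_vector(vectors, dim):
    """Element-wise mean of equal-length vectors.
    ASSUMPTION: a zero vector is returned when the user generated no items."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def diag_la_predictive_vectors(user_latent, item_latent, item_attr, generator_of):
    """user_latent: {user: latent vector}; item_latent, item_attr: {item: vector};
    generator_of: {item: generating user}, platform-generated items omitted."""
    attr_dim = len(next(iter(item_attr.values())))
    items_of = {}
    for item, user in generator_of.items():
        items_of.setdefault(user, []).append(item)
    # Simplified item predictive vector: latent vector || attribute vector.
    item_pred = {i: list(item_latent[i]) + list(item_attr[i]) for i in item_latent}
    # Simplified user predictive vector: latent vector || mean attribute vector
    # of the items generated by the user.
    user_pred = {
        u: list(user_latent[u])
        + mean_vector([item_attr[i] for i in items_of.get(u, [])], attr_dim)
        for u in user_latent
    }
    return user_pred, item_pred


# Hypothetical toy data: latent dimension 2, attribute dimension 3.
user_latent = {"a": [0.1, 0.2], "b": [0.3, 0.4]}
item_latent = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
item_attr = {0: [1.0, 0.0, 2.0], 1: [3.0, 4.0, 0.0], 2: [0.0, 0.0, 1.0]}
generator_of = {0: "a", 1: "a"}  # item 2 is platform-generated
user_pred, item_pred = diag_la_predictive_vectors(user_latent, item_latent, item_attr, generator_of)
print(user_pred["a"], item_pred[2])
```

User "a" generated items 0 and 1, so its attribute part is their element-wise mean; user "b" generated nothing and falls back to zeros under the stated assumption.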

5.3.2. Negative sampling ratio ρ

To investigate how the negative sampling ratio ρ affects the performance of DIAG, we run DIAG on the two datasets with ρ selected from {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} while fixing the other parameters. The HR@10 and NDCG@10 values are plotted as functions of ρ in Fig. 4. The two figures show that DIAG is insensitive to the value of ρ, with no apparent change across its values. Setting ρ to 4 or 5 gives the best results in most cases, and ρ = 4 is slightly better than ρ = 5. Therefore, we set the negative sampling ratio ρ to 4 in our experiments.

5.4. Ablation study

To investigate the effectiveness of using user–item generation information and item attribute information, in this subsection we conduct an ablation study comparing the proposed DIAG model with two simplified variants. The first simplified variant is DIAG_L, which removes the predictive vector learning and feature vector learning modules and uses the user/item latent vectors learned in the latent vector learning module for matching score prediction. The second simplified variant is DIAG_LA, which feeds simplified user/item predictive vectors into the prediction layer for matching score prediction. In particular, the simplified item predictive vectors are generated by concatenating the item latent vectors and the item attribute vectors that are obtained directly from the item attribute information. Similarly, the simplified user predictive vectors are generated by concatenating the user latent vectors and the user attribute vectors, which are obtained by averaging the attribute vectors of the items generated by the user.

The results on the Scholat and Lizhi datasets are listed in Tables 4 and 5, respectively. The tables show that DIAG consistently outperforms DIAG_L and DIAG_LA, and that DIAG_LA performs better than DIAG_L. On the one hand, DIAG_LA obtains at least 1.27% improvement over DIAG_L, which confirms the necessity of utilizing the item attribute information for better recommendation. On the other hand, the improvement percentages of DIAG over DIAG_LA are larger than those of DIAG_LA over DIAG_L in all cases except NDCG@10 on the Scholat dataset. This implies that additionally considering the user–item generation information, by applying GAT on the item–item co-generation network to generate the user/item feature vectors, contributes more to the performance improvement than considering the item attribute information.

Table 5
Ablation study on the Lizhi dataset: Comparing the proposed DIAG model with its two simplified variants. The best result generated by the three methods in each row is highlighted in bold.

Evaluation measure  DIAG_L  DIAG_LA  DIAG    Improvement (DIAG vs. DIAG_LA)  Improvement (DIAG_LA vs. DIAG_L)
HR@10               0.7553  0.7649   0.7931  3.69%                           1.27%
NDCG@10             0.4959  0.5107   0.5272  3.23%                           2.98%

6. Conclusions and future work

Considering the user–item generation relation may benefit recommender systems, particularly those lacking enough information for learning user preferences, e.g. implicit recommender systems with only implicit user–item interactions. However, doing so raises a challenging issue: the dramatic imbalance between the number of user–item interactions and the number of user–item generations. To this end, we have proposed a novel DIAG model, which integrates the user–item interaction relation, the user–item generation relation, and the item attribute information into one deep learning framework. Compared with existing models, the novelty of DIAG lies in designing an item–item co-generation network for modeling the user–item generation information, which alleviates the dramatic imbalance between the number of user–item interactions and the number of user–item generations. The item–item co-generation network and the item attribute information are then fed into GAT to compute the item feature vectors. In this manner, the item attribute information and the user–item generation information are integrated into the item feature vectors. In addition, the user feature vectors are obtained by a weighted combination of the feature vectors of the items generated by the corresponding users, with item popularity as the weight. Therefore, the user feature vectors also encode the item attribute information and the user–item generation information. Extensive experiments conducted on two real-world datasets have confirmed the effectiveness of the proposed DIAG model, and the ablation study has demonstrated the necessity of using user–item generation information and item attribute information. One limitation of the proposed model may lie in the predictive vector learning module, where a simple vector concatenation is adopted to integrate the latent vector and the feature vector into the predictive vector. In our future work, we will leverage multi-view learning [41] to better integrate the latent vector and the feature vector by regarding them as two views of the representation. In addition, it would be interesting to investigate the possibility of utilizing the user–item generation relation for explainable recommendation, which is an active research topic in recommender systems [42].

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by NSFC, China (62106079, 61876193, U1811263 and 61772211), Natural Science Foundation of Guangdong Province, China (2020A1515110337), Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University, China (2020B1212060032) and Open Foundation of Guangdong Provincial Key Laboratory of Public Finance and Taxation with Big Data Application, China.

References

[1] L. Qi, X. Xu, X. Zhang, W. Dou, C. Hu, Y. Zhou, J. Yu, Structural balance theory-based E-commerce recommendation over big rating data, IEEE Trans. Big Data 4 (3) (2018) 301–312.
[2] S.-T. Zhong, L. Huang, C.-D. Wang, J.-H. Lai, Constrained matrix factorization for course score prediction, in: ICDM, 2019, pp. 1510–1515.
[3] G. Zhu, Y. Wang, J. Cao, Z. Bu, S. Yang, W. Liang, J. Liu, Neural attentive travel package recommendation via exploiting long-term and short-term behaviors, Knowl. Based Syst. 211 (2021) 106511.
[4] C.-D. Wang, W.-D. Xi, L. Huang, Y.-Y. Zheng, Z.-Y. Hu, J.-H. Lai, A BP neural network based recommender framework with attention mechanism, IEEE Trans. Knowl. Data Eng. (2020) 1–14, in press.
[5] M. Mao, S. Chen, F. Zhang, J. Han, Q. Xiao, Hybrid ecommerce recommendation model incorporating product taxonomy and folksonomy, Knowl. Based Syst. 214 (2021) 106720.
[6] J. Bobadilla, F. Ortega, A. Hernando, A. Gutiérrez, Recommender systems survey, Knowl. Based Syst. 46 (2013) 109–132.
[7] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Comput. Surv. 52 (1) (2019) 5:1–5:38.
[8] Y. Shi, M.A. Larson, A. Hanjalic, Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges, ACM Comput. Surv. 47 (1) (2014) 3:1–3:45.
[9] L. Chen, J. Cao, H. Chen, W. Liang, H. Tao, G. Zhu, Attentive multi-task learning for group itinerary recommendation, Knowl. Inf. Syst. 63 (7) (2021) 1687–1716.
[10] H. Wang, N. Wang, D.-Y. Yeung, Collaborative deep learning for recommender systems, in: KDD, 2015, pp. 1235–1244.
[11] S. Li, J. Kawale, Y. Fu, Deep collaborative filtering via marginalized denoising auto-encoder, in: CIKM, 2015, pp. 811–820.
[12] S. Sedhain, A.K. Menon, S. Sanner, L. Xie, Autorec: Autoencoders meet collaborative filtering, in: WWW, 2015, pp. 111–112.
[13] I. Shenbin, A. Alekseev, E. Tutubalina, V. Malykh, S.I. Nikolenko, RecVAE: A new variational autoencoder for Top-N recommendations with implicit feedback, in: WSDM, 2020, pp. 528–536.
[14] H.-J. Xue, X.-Y. Dai, J. Zhang, S. Huang, J. Chen, Deep matrix factorization models for recommender systems, in: IJCAI, 2017, pp. 3203–3209.
[15] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, T.-S. Chua, Neural collaborative filtering, in: WWW, 2017, pp. 173–182.
[16] N. Nassar, A. Jafar, Y. Rahhal, A novel deep multi-criteria collaborative filtering model for recommendation system, Knowl. Based Syst. 187 (2020) 104811.
[17] W. Fan, Y. Ma, Q. Li, Y. He, Y.E. Zhao, J. Tang, D. Yin, Graph neural networks for social recommendation, in: WWW, 2019, pp. 417–426.
[18] X. He, K. Deng, X. Wang, Y. Li, Y.-D. Zhang, M. Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: SIGIR, 2020, pp. 639–648.
[19] L. Wu, J. Li, P. Sun, R. Hong, Y. Ge, M. Wang, DiffNet++: A neural influence and interest diffusion network for social recommendation, IEEE Trans. Knowl. Data Eng. (2020) 1–14, in press.
[20] T. De Pessemier, T. Deryckere, L. Martens, Context aware recommendations for user-generated content on a social network site, in: EuroITV, 2009, pp. 133–136.
[21] N. Chiluka, N. Andrade, J.A. Pouwelse, A link prediction approach to recommendations in large-scale user-generated content systems, in: ECIR, 2011, pp. 189–200.
[22] C. Huang, W. Jiang, J. Wu, G. Wang, Personalized review recommendation based on users' aspect sentiment, ACM Trans. Internet Tech. 20 (4) (2020) 42:1–42:26.
[23] Y. Xu, J. Yin, Collaborative recommendation with user generated content, Eng. Appl. Artif. Intell. 45 (2015) 281–294.
[24] L. Yu, J. Huang, G. Zhou, C. Liu, Z.-K. Zhang, TIIREC: A tensor approach for tag-driven item recommendation with sparse user generated content, Inf. Sci. 411 (2017) 122–135.
[25] G. Lv, K. Zhang, L. Wu, E. Chen, T. Xu, Q. Liu, W. He, Understanding the users and videos by mining a novel danmu dataset, IEEE Trans. Big Data (2019) 1–16, in press.
[26] W. Fu, Z. Peng, S. Wang, Y. Xu, J. Li, Deeply fusing reviews and contents for cold start users in cross-domain recommendation systems, in: AAAI, 2019, pp. 94–101.

[27] P. Liu, L. Zhang, J.A. Gulla, Multilingual review-aware deep recommender system via aspect-based sentiment analysis, ACM Trans. Inf. Syst. 39 (2) (2021) 15:1–15:33.
[28] Y. Liu, M. Xie, L.V.S. Lakshmanan, Recommending user generated item lists, in: RecSys, 2014, pp. 185–192.
[29] Y. He, J. Wang, W. Niu, J. Caverlee, A hierarchical self-attentive model for recommending user-generated item lists, in: CIKM, 2019, pp. 1481–1490.
[30] Y. He, Y. Zhang, W. Liu, J. Caverlee, Consistency-aware recommendation for user-generated item list continuation, in: WSDM, 2020, pp. 250–258.
[31] C. Yang, L. Miao, B. Jiang, D. Li, D. Cao, Gated and attentive neural collaborative filtering for user generated list recommendation, Knowl. Based Syst. 187 (2020) 104839.
[32] Z.-H. Deng, L. Huang, C.-D. Wang, J.-H. Lai, P.S. Yu, DeepCF: A unified framework of representation learning and matching function learning in recommender system, in: AAAI, 2019, pp. 61–68.
[33] D.W. Oard, J. Kim, Implicit feedback for recommender systems, in: Proceedings of the AAAI workshop on Recommender Systems, 1998, pp. 81–83.
[34] Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: ICDM, 2008, pp. 263–272.
[35] G.-N. Hu, X. Dai, F.-Y. Qiu, R. Xia, T. Li, S. Huang, J. Chen, Collaborative filtering with topic and social latent factors incorporating implicit feedback, ACM Trans. Knowl. Discov. Data 12 (2) (2018) 23:1–23:30.
[36] J. Ding, G. Yu, X. He, Y. Quan, Y. Li, T.-S. Chua, D. Jin, J. Yu, Improving implicit recommender systems with view data, in: IJCAI, 2018, pp. 3343–3349.
[37] S.-Y. Chou, J.-S.R. Jang, Y.-H. Yang, Fast tensor factorization for large-scale context-aware recommendation from implicit feedback, IEEE Trans. Big Data 6 (1) (2020) 201–208.
[38] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph attention networks, in: ICLR, 2018, pp. 1–12.
[39] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, in: NIPS, 2007, pp. 1257–1264.
[40] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: ICLR, 2015, pp. 1–13.
[41] Y. Li, M. Yang, Z. Zhang, A survey of multi-view representation learning, IEEE Trans. Knowl. Data Eng. 31 (10) (2019) 1863–1883.
[42] Y. Zhang, X. Chen, Explainable recommendation: A survey and new perspectives, Found. Trends Inf. Retr. 14 (1) (2020) 1–101.