Abstract
Multilabel text classification assigns multiple relevant category labels to each text and has been widely applied in the real world. To improve performance, most existing methods focus only on optimizing document and label representations, assuming that accurate label-document similarity is crucial. However, whether the potential relevance between labels is exploited and whether the long-tail distribution of labels is addressed are also key factors affecting performance. To this end, we propose a multilabel text classification model called DVMLTC, which uses a dual-view graph convolutional network to predict multiple labels for a text. Specifically, we use graph convolutional networks to explore the potential correlation between labels in both global and local views. First, we capture the global consistency of labels on the global label graph built from existing statistical information, and generate label paths through a random walk algorithm to reconstruct the label graph. Then, to capture relationships between low-frequency cooccurring labels on the reconstructed graph, we use the local consistency of labels to guide the generation of reasonable cooccurring label pairs within the local neighborhood, which also helps alleviate the long-tail distribution of labels. Finally, we integrate the global and local consistency of labels to address the highly skewed distribution caused by incomplete label cooccurrence patterns in the label cooccurrence graph. The evaluation shows that the proposed model achieves competitive results compared with existing state-of-the-art methods and a better balance between efficiency and performance.
1 Introduction
Multilabel text classification (MLTC) is a crucial task in natural language processing with applications in various domains, including sentiment analysis [1], patent classification [2], and question answering [3]. The primary objective of MLTC is to assign one or more appropriate categories to a document from a set of predefined categories or labels. In recent years, MLTC has garnered significant attention and has become an active area of research. However, the increasing number of labels and documents, coupled with the complex interrelationships between labels and documents, poses significant challenges to MLTC. These challenges have prompted researchers to delve deeper into the field of multilabel learning.
Previous research on MLTC focused on developing enhanced document representations. Various methods have been proposed for learning label-specific document representations [4, 5]. Moreover, some studies have used attention mechanisms to capture label-semantic-based representations [6,7,8] and document-label interaction representations [9,10,11,12]. Although these approaches have shown promising results, they have not fully explored the interactions between label-specific semantic components, thereby ignoring the rich label cooccurrence information within documents.
In recent years, label cooccurrence graph-based methods have gained attention for their ability to exploit statistical correlations between labels to construct label cooccurrence graphs [13,14,15,16,17]. In this study, we refer to the view of label cooccurrence graphs built using statistical correlations as label global consistency. We identified two issues regarding label global consistency. First, statistical label correlations may exhibit a long-tailed distribution, with some categories being common and most having only a few relevant documents. Figure 1 shows the long-tailed distribution on RCV1 [18], where only a few labels have a large number of articles, and these head labels also have a high cooccurrence with other labels. Second, the cooccurrence patterns between label pairs obtained from the training data are frequently incomplete. For instance, in AAPD, the labels “Computers and Society (cs.CY)” and “Physics and Society (physics.soc-ph)” cooccurred 300 times in the training set but only 6 times in the test set (0.009%). This imbalance in the cooccurrence frequency of labels, both within the data and between the training and test sets, leads to a highly skewed distribution [15]. Existing methods that model label relationships solely through label global consistency rely on prior statistical information from the training data and therefore fail to address these two challenges effectively.
To address these challenges, we propose a dual-view graph convolutional network for multilabel text classification (DVMLTC). The proposed method models label cooccurrences from both global and local perspectives. First, to address the long-tailed distribution problem, we introduce a strategy that generates label paths for the local label graph using a random walk. By reconstructing the local label graph based on this strategy, we capture the relationships between low-frequency cooccurring labels, which helps alleviate the long-tail distribution issue. Second, to address the highly skewed distribution caused by the incompleteness of label cooccurrence patterns in the label cooccurrence graph, we use graph convolutional networks (GCN) [16] to model rich cooccurrence patterns between labels from both global and local consistency perspectives. Additionally, label local consistency is proposed to measure the rationality of label cooccurrence in local neighborhoods. Furthermore, we incorporate a dual attention flow to extract label-specific semantic components from the document content, which allows us to merge the semantic information of the labels and obtain the initial embedding for the dual-view graph convolution. Finally, we fuse the fine-grained document information with the learned label correlations for classification.
This paper makes the following contributions:

We introduce a novel neural network that applies dual-view convolutions on label cooccurrence graphs for MLTC tasks. Our model combines the label information learned by the dual-view graph convolution with label-specific document representations obtained through a dual attention flow.

To capture the cooccurrence patterns between labels, we use both global and local label consistency. In addition, we construct the local label graph dynamically using a random-walk strategy, which enriches the cooccurrence patterns between labels.

To evaluate the effectiveness of the proposed model, we conducted experiments on three commonly used benchmark datasets. The experimental results show that our model is competitive with existing methods on these datasets.
2 Related work
2.1 Enhancing documentlabel interaction in MLTC
With the widespread application of neural network methods in document representation, innovative deep-learning approaches have been developed. XMLCNN [4] uses a convolutional neural network (CNN) and dynamic pooling to learn text representations for multilabel text classification. Sequence-to-sequence (Seq2Seq) models based on recurrent neural networks (RNNs) have been used to capture the correlations between labels [19,20,21]. Nevertheless, they treat all words uniformly and fail to discern the informative content within the documents. Considering the negative impact of label sequence order on Seq2Seq models, S2SLSAM [22] introduces a Seq2Seq model with distinct semantic attention mechanisms for labels. This model fuses label semantics and textual features through the interaction of the label semantic attention mechanism. MLReasoner [23] uses a sequence model as a text feature extractor and feeds the prediction probabilities from the previous round back into the model as an additional input to reflect label correlation, which mitigates the reliance on label order. However, the aforementioned methods do not model the rich cooccurrence relationships among labels, and they struggle to address the long-tail issue associated with labels.
Recently, attention mechanisms have been used in several studies to enhance the interaction between labels and words [24], labels and documents [6, 11, 25,26,27], and labels and labels [7], in order to learn label-specific document representations for classification tasks. Some methods take a different approach and incorporate additional sources of knowledge to enhance label-specific document representations [28,29,30]. These approaches exhibit promising results in MLTC, underscoring the importance of investigating semantic connections. However, they do not thoroughly explore the interactions among label-specific semantic components, which could potentially enhance the prediction of low-frequency labels. In our research, we introduce a label-word attention module and a label-semantic self-attention module. The former extracts important label-specific semantics from the word-level document information. The latter further helps capture label-level semantic features. Combining these two modules enriches the semantic information of labels, and this enhanced representation has the potential to improve prediction accuracy, particularly for low-frequency labels.
2.2 Label cooccurrence graph in MLTC
To capture deep correlations among labels in a graph structure and to explore the semantic interactions between label-specific components in documents, a common approach is to use label graphs based on statistical cooccurrence. MAGNET [14] constructs a label graph based on frequency. DXML [31] establishes an explicit label cooccurrence graph to explore label embeddings in a low-dimensional latent space. LiGCN [32] uses a pretrained language model to initialize the embeddings of a label-word heterogeneous graph and achieves strong classification performance while attending to different word choices. LRGCN [33] and GCNMTC [15] are similar: both construct label graphs from data-driven statistical information, and the former performs better than the latter. LDGN [34] adaptively models the interactions among labels using dual-graph convolutional neural networks. CFTC [35] first constructs a global label cooccurrence graph and then prevents confounding shortcuts using counterfactual techniques with the help of a human causal graph. SGCN [36] uses text, words, and labels to construct a global heterogeneous graph for mining correlations between similar documents; an encoder is then trained to extract semantic features from document nodes, and graph convolutional networks classify the text nodes. TLCXML [37] first constructs a label correlation graph using the semantic information of labels and symmetric conditional probabilities, then groups strongly correlated labels into the same cluster, and finally uses graph convolutional networks to extract the inter-cluster correlations among the label clusters. Nevertheless, each label is assigned to only one cluster, which neglects much of the semantic correlation between labels.
However, most of the above methods focus on the label global consistency of label cooccurrence and neglect label local consistency, which could potentially enhance classification performance. By contrast, our proposed dual-view convolution module is guided by prior knowledge from cooccurrence statistics and by posterior information obtained from a dynamic random walk. It can therefore capture comprehensive interactions from different views, model the potential relationships between labels through both global and local patterns in the data, and improve classification performance.
3 Proposed model
As shown in Fig. 2, our model comprises two primary modules: 1) a label-specific document representation based on dual attention flow, which extracts label-specific semantic components from the word-level information of each document and then refines them at the label level; and 2) dual-view graph convolutional networks for semantic interactive learning, which explore and capture comprehensive interactions from distinct views, guided by prior knowledge of statistical cooccurrences and posterior information obtained from a dynamic random walk.
3.1 Problem definition
In the MLTC problem, we have a document set \(D = \{d_1, d_2, \ldots, d_{D}\}\) and a corresponding label set \(C = \{c_1, c_2, \ldots, c_{C}\}\). Here, the subscripts D and C denote the number of documents in the document set and the total number of labels, respectively. Each document \(d_i\) contains n words and is associated with a label vector \(y_i \in \{0, 1\}^{C}\), whose jth entry indicates whether label \(c_j\) is relevant.
To achieve the goal of MLTC, which is to assign the most relevant labels to a new document, we define a global label cooccurrence graph \(G = (V, E)\), where V is the node set and E is the edge set, as in previous work [13, 34, 38]. In this graph, the nodes represent the categories: node \(v_i\) corresponds to label \(c_i\) in the label set C. The edges represent the statistical cooccurrences between categories. Specifically, we compute the conditional probability for all label pairs in the training set, yielding the global label cooccurrence matrix \(A^{G} \in \mathbb{R}^{C \times C}\). Here, \(A^{G}_{(i,j)} = p(v_j \mid v_i)\) is the conditional probability that a document is categorized as \(c_j\) given that it belongs to category \(c_i\). Notably, G is a directed graph; therefore, \(A^{G}_{(i,j)}\) may not be equal to \(A^{G}_{(j,i)}\), owing to the conditional probability calculation.
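The construction of the global cooccurrence matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `global_cooccurrence`, the toy label matrix `Y_train`, and the choice to zero the diagonal are all assumptions made for the example.

```python
import numpy as np

def global_cooccurrence(Y):
    """Y: (num_docs, num_labels) binary matrix.
    Returns A with A[i, j] = p(c_j | c_i) estimated from counts."""
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                      # counts[i, j] = docs carrying both labels i and j
    label_freq = np.diag(counts).copy()   # label_freq[i] = docs carrying label i
    np.fill_diagonal(counts, 0.0)         # assumption: no self-cooccurrence edges
    denom = np.where(label_freq > 0, label_freq, 1.0)  # avoid division by zero
    return counts / denom[:, None]

# Hypothetical toy training labels: 4 documents, 3 labels.
Y_train = np.array([[1, 1, 0],
                    [1, 0, 0],
                    [1, 1, 1],
                    [0, 0, 1]])
A_G = global_cooccurrence(Y_train)
print(A_G)
```

In this toy example label 0 appears in three documents and cooccurs with label 1 in two of them, while label 1 appears in two documents and always cooccurs with label 0, so the two directions of the edge carry different weights, which is the asymmetry noted above.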
3.2 Labelspecific attention networks
Given a document d containing n words, we use a bidirectional long short-term memory network (BiLSTM) to encode word-level semantic information. The BiLSTM processes the word sequence in both forward and backward directions, so each hidden state captures the context on both sides of a word. We concatenate the forward and backward hidden states to obtain the word sequence representation \(H \in \mathbb{R}^{n \times d_a}\), where \(d_a\) denotes the dimension of the word vectors.
3.2.1 Labelword attention
Labels possess distinctive semantics in the context of text classification, concealed within their textual representations or descriptions. To exploit this semantic information, labels are preprocessed and represented as a trainable matrix \(L \in \mathbb{R}^{C \times d_a}\) in the same latent \(d_a\)-dimensional space as the words. To determine the semantic relationship between each pair of words and labels, scaled dot-product attention is employed:
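Assuming the standard scaled dot-product form, with the softmax taken over the n words and the scaling factor \(\sqrt{d_a}\) (both assumptions of this sketch), the label-word attention can be written as:

\[
U^{w} = \mathrm{softmax}\!\left(\frac{L H^{\top}}{\sqrt{d_a}}\right) H
\]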
where L serves as the query and H serves as both the key and the value. \(u_i\) is the ith row vector of \(U^{w} \in \mathbb{R}^{C \times d_a}\) and denotes the semantic component of the document associated with label \(c_i\). Because this representation is computed from the label text, we call it the label-word (LW) attention mechanism.
3.2.2 Labelsemantic selfattention
A document may be assigned multiple labels, and it should contain the contexts most relevant to each of them. Consequently, a document may comprise multiple components, and its words may contribute differently to each label. To capture these distinct components, a self-attention mechanism is employed. The label-semantic (LS) self-attention score \(Q \in \mathbb{R}^{C \times n}\) is calculated as follows:
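One form consistent with the parameter shapes given next is the structured self-attention commonly used in label-aware attention models. The \(\tanh\) nonlinearity and the definition of \(U^{s}\) as the score-weighted sum of word states are assumptions of this sketch:

\[
Q = \mathrm{softmax}\!\left(W_2 \tanh\!\left(W_1 H^{\top}\right)\right), \qquad U^{s} = Q H
\]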
where \(W_1 \in R^{d_b \times d_a}\) and \(W_2 \in R^{C \times d_b}\) are selfattention parameters that must be trained. \(d_b\) is a hyperparameter.
Label-specific semantic components are extracted from the text content by combining label-word attention \(U^{w}\) and label-semantic self-attention \(U^{s}\). Summing these two attention flows gives the label-specific document representation \(U = U^{w} + U^{s}\). Our approach draws inspiration from previous works, such as [25] and [39], which also use attention mechanisms. However, the dual attention flow module differs in two respects. First, we focus on the interaction between documents and labels, enabling a more targeted exploration of their relationships. Second, our calculation is designed to be simpler and more efficient while still delivering strong performance.
The resulting labelspecific document representation U serves as the input for the subsequent module: the dualview convolutional networks. These networks further process and capture the interactions between the extracted semantic components.
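The dual attention flow described above can be sketched with NumPy as follows. This is an illustrative reading rather than the authors' implementation: the function names, the random toy inputs, the \(\tanh\) nonlinearity, and the softmax over words are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention_flow(H, L, W1, W2):
    """H: (n, d_a) word states, L: (C, d_a) label embeddings,
    W1: (d_b, d_a), W2: (C, d_b). Returns U of shape (C, d_a)."""
    d_a = H.shape[1]
    # Label-word attention: each label queries the word states.
    U_w = softmax(L @ H.T / np.sqrt(d_a), axis=1) @ H
    # Label-semantic self-attention (tanh is an assumed nonlinearity).
    Q = softmax(W2 @ np.tanh(W1 @ H.T), axis=1)      # (C, n) attention scores
    U_s = Q @ H
    return U_w + U_s                                  # sum of the two attention flows

# Hypothetical toy sizes: 6 words, 4 labels.
rng = np.random.default_rng(0)
n, C, d_a, d_b = 6, 4, 8, 5
H = rng.normal(size=(n, d_a))
L = rng.normal(size=(C, d_a))
W1 = rng.normal(size=(d_b, d_a))
W2 = rng.normal(size=(C, d_b))
U = dual_attention_flow(H, L, W1, W2)
print(U.shape)
```

Each row of `U` is the sum of two attention-weighted averages of the word states, one per attention flow, so every label receives its own view of the same document.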
3.3 Dualview graph convolutional networks
To capture the interactions between label-specific semantic components from multiple perspectives, we employ a dual-view interaction approach consisting of a global consistency convolution and a local consistency convolution. In the global consistency convolution, we construct a global label cooccurrence graph and apply a GCN to it, which exploits the cooccurrence patterns between labels captured by the global graph. In the local consistency convolution, we generate a local label cooccurrence graph using a random walk strategy and then apply a second GCN, which enriches the cooccurrence patterns between labels based on the local context captured by the local graph. Because the two convolutions consider distinct interaction views, together they enhance the cooccurrence patterns between labels.
3.3.1 GlobalConv
To establish deep relationships between label-specific semantic components guided by statistical label correlations, we employ a global consistency convolution (GlobalConv). We use a GCN layer to propagate messages between neighboring label nodes, thereby enhancing the representations of these nodes. The layer-by-layer propagation rules are defined as follows:
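A single-layer form consistent with the symbols defined next is given below. The symmetric normalization is an assumption of this sketch; the exact normalization (for example, whether self-loops are added) cannot be recovered from the surrounding text.

\[
H^{G} = \sigma\!\left(\hat{A}^{G}\, U\, W^{G}\right), \qquad \hat{A}^{G} = D_1^{-\frac{1}{2}} A^{G} D_1^{-\frac{1}{2}} \tag{3}
\]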
where \(A^G\) in (3) is the global label cooccurrence graph. \(\sigma (\cdot )\) represents the LeakyReLU activation function. \(\hat{A}^G\) represents the normalized adjacency matrix of \(A^G\). \(D_1\) is the degree matrix of \(A^G\) and \(W^G \in \mathbb {R}^{d_a \times d_c}\) denotes the transformation matrix that must be learned. GlobalConv uses the initialized components \(U \in \mathbb {R}^{C \times d_a}\) and \(A^G\) as inputs and ultimately generates \(H^G \in \mathbb {R}^{C \times d_c}\), where \(d_c\) denotes the dimensionality of the final node representation.
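A minimal NumPy sketch of one such GCN layer is shown below. The helper names, the added self-loops, the symmetric normalization, and the LeakyReLU slope of 0.2 are assumptions made for illustration, not settings reported in the paper.

```python
import numpy as np

def normalize_adjacency(A, add_self_loops=True):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumed scheme)."""
    A = np.asarray(A, dtype=float)
    if add_self_loops:
        A = A + np.eye(A.shape[0])        # assumption: keep each node's own features
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5   # isolated nodes keep a zero scale
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gcn_layer(A_hat, X, W):
    """One propagation step: aggregate neighbors, transform, activate."""
    return leaky_relu(A_hat @ X @ W)

# Hypothetical two-label graph with identity features and weights.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A_hat = normalize_adjacency(A)
U = np.eye(2)
W_G = np.eye(2)
H_G = gcn_layer(A_hat, U, W_G)
print(H_G)
```

With identity features and weights, the output equals the normalized adjacency itself, which makes the 1-hop mixing of neighboring label representations easy to inspect.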
GlobalConv primarily performs a 1hop diffusion process in each layer by leveraging prior statistical relationships present in the dataset. As described in a previous study [40], this process only considers the addition of feature vectors from neighboring nodes to account for the feature relationships between them. However, the statistical label correlations obtained from training data can be incomplete and noisy, and the cooccurrence patterns between label pairs may suffer from longtailed distributions [15]. Recognizing this limitation motivated us to assign a certain probability to lowfrequency cooccurring labels, indicating that they might belong to the same text rather than being directly filtered as noise. We enabled the model to learn more effective propagation and richer cooccurrence patterns by introducing local consistency convolution.
3.3.2 LocalConv
In addition to the graph structure information defined by the adjacency matrix \(A^G\), we use positive pointwise mutual information (PPMI) to encode the potential relationship between label pairs. First, we calculate the frequency matrix F using a random walk. Subsequently, we derive the local label cooccurrence graph \(A^L \in \mathbb{R}^{C \times C}\) from F. Finally, we perform a local consistency convolution.
A random walk can be characterized as a Markov chain that describes the sequence of nodes visited by a random walker [40]. We define the state as \(s(m) = v_i\) if the random walker is on node \(v_i\) at time m. The transition probability of moving from the current node \(v_i\) to one of its neighbors \(v_j\) is denoted as \(p(s(m+1) = v_j \mid s(m) = v_i)\). In our problem setting, given the prior label cooccurrence matrix \(A^G\), we assign:
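A form consistent with the description that follows is the row-normalized transition probability below, writing A for the prior label cooccurrence matrix on which the walk runs (the row normalization is an assumption of this sketch):

\[
p\big(s(m+1) = v_j \mid s(m) = v_i\big) = \frac{A_{i,j}}{\sum_{k} A_{i,k}}
\]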
This assignment ensures that the transition probability is proportional to the label cooccurrence in \(A^G\), thereby incorporating semantic information into the random walk process.
Algorithm 1 outlines the calculation of the frequency matrix F using random walk. This algorithm can be parallelized by simultaneously performing multiple random walks on different parts of a graph.
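The frequency-matrix computation can be sketched as below. This is a hedged illustration and not a reproduction of Algorithm 1: the parameter names `path_len` and `walks_per_node`, the rule of counting every pair of distinct nodes on a path, and the toy chain graph are assumptions.

```python
import numpy as np

def random_walk_frequency(A, path_len, walks_per_node, seed=0):
    """Sample random-walk paths on A and count how often two labels
    share a path. Returns a symmetric frequency matrix F."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    num_nodes = A.shape[0]
    F = np.zeros((num_nodes, num_nodes))
    for start in range(num_nodes):
        for _ in range(walks_per_node):
            path = [start]
            while len(path) < path_len:
                weights = A[path[-1]]
                total = weights.sum()
                if total == 0:                 # dead end: stop this walk early
                    break
                # Transition probability proportional to the cooccurrence weight.
                path.append(int(rng.choice(num_nodes, p=weights / total)))
            # Count every pair of distinct nodes that appear on the same path.
            for a in range(len(path)):
                for b in range(a + 1, len(path)):
                    if path[a] != path[b]:
                        F[path[a], path[b]] += 1
                        F[path[b], path[a]] += 1
    return F

# Hypothetical chain 0 - 1 - 2: labels 0 and 2 never cooccur directly.
A_prior = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
F = random_walk_frequency(A_prior, path_len=3, walks_per_node=50)
print(F)
```

In the toy chain, labels 0 and 2 have no direct edge, yet they end up with a nonzero count because they share paths through label 1. This is the mechanism by which low-frequency label pairs can acquire an edge in the local graph. Since walks from different start nodes are independent, the outer loop is also the natural unit for parallelization.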
Following the computation of the frequency matrix F, the ith row in F corresponds to the row vector \(F_{i,:}\), while the jth column in F corresponds to the column vector \(F_{:,j}\). Specifically, \(F_{i,:}\) represents the path node context for node \(v_i\), and \(F_{:,j}\) represents the path neighbor node context\(_j\). Moreover, \(F_{i,j}\) denotes the number of cooccurrences of \(v_i\) and \(v_j\) in all generated paths. A higher value of \(F_{i,j}\) indicates a greater frequency of cooccurrence between the two nodes.
Using the frequency matrix F, we transform it into a PPMI matrix as follows:
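Following the usual PPMI construction, the probabilities can be estimated from F by normalizing with the total count, where \(p_{i,*}\) and \(p_{*,j}\) denote the row and column marginals (the notation for the marginals is an assumption of this sketch):

\[
p_{i,j} = \frac{F_{i,j}}{\sum_{i,j} F_{i,j}}, \qquad
p_{i,*} = \frac{\sum_{j} F_{i,j}}{\sum_{i,j} F_{i,j}}, \qquad
p_{*,j} = \frac{\sum_{i} F_{i,j}}{\sum_{i,j} F_{i,j}} \tag{6}
\]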
We apply (6) to encode the potential relationship between label pairs in F. Here, \(p_{i,j}\) represents the estimated probability of node \(v_i\) appearing in context\(_j\); \(p_{i,*}\) denotes the estimated probability of node \(v_i\), and \(p_{*,j}\) indicates the estimated probability of context\(_j\). The adjacency matrix based on the label local consistency is computed as follows:
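In the standard PPMI definition, which matches the description that follows, the pointwise mutual information is clipped at zero:

\[
PMI_{i,j} = \log\frac{p_{i,j}}{p_{i,*}\, p_{*,j}}, \qquad
A^{L}_{i,j} = \max\!\left(PMI_{i,j},\, 0\right)
\]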
where \(PMI_{i,j}\) is the pointwise mutual information between node \(v_i\) and context\(_j\). The PPMI matrix \(A^L\) represents the adjacency matrix based on label local consistency, where any negative PMI value is set to zero.
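The transformation from a frequency matrix to a PPMI matrix can be sketched as follows. The function name `ppmi` and the toy frequency matrix are hypothetical; entries with zero count are treated as having a PMI of negative infinity and are therefore clipped to zero.

```python
import numpy as np

def ppmi(F):
    """Positive pointwise mutual information of a frequency matrix."""
    F = np.asarray(F, dtype=float)
    p_ij = F / F.sum()                          # joint probability estimates
    p_i = p_ij.sum(axis=1, keepdims=True)       # row marginals
    p_j = p_ij.sum(axis=0, keepdims=True)       # column marginals
    expected = p_i * p_j
    pmi = np.full_like(F, -np.inf)              # zero counts behave like log(0)
    mask = (p_ij > 0) & (expected > 0)
    pmi[mask] = np.log(p_ij[mask] / expected[mask])
    return np.maximum(pmi, 0.0)                 # clip negative PMI to zero

# Hypothetical frequency matrix for three labels.
F_toy = np.array([[0.0, 4.0, 0.0],
                  [4.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
A_L = ppmi(F_toy)
print(A_L)
```

In this example the rare pair (1, 2) receives the same PPMI weight as the frequent pair (0, 1), because PMI measures cooccurrence relative to how often each label appears at all. This illustrates why the local view can give low-frequency pairs a meaningful edge weight.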
Similar to GlobalConv, we define an independent single-layer GCN for LocalConv based on \(A^L\). The graph convolution is given by:
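Because \(W^L\) is a \(d_c \times d_c\) matrix, one reading consistent with the stated shapes is that LocalConv takes the GlobalConv output \(H^G\) as its input. Both this choice of input and the symmetric normalization are assumptions of this sketch:

\[
H^{L} = \sigma\!\left(\hat{A}^{L}\, H^{G}\, W^{L}\right), \qquad \hat{A}^{L} = D_2^{-\frac{1}{2}} A^{L} D_2^{-\frac{1}{2}}
\]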
where \(\hat{A}^L\) denotes the normalized label local consistency matrix, \(D_2\) is the degree matrix of \(A^L\), and \(W^L \in \mathbb{R}^{d_c \times d_c}\) is a trainable parameter. Notably, the dynamically reconstructed \(A^L\) based on the random walk ensures label local consistency: labels that appear on the same path are reasonably considered to belong to the same text. In addition, as the path length increases within a reasonable range, the importance of the labels becomes more prominent. Moreover, nonpositive values in the PPMI matrix are filtered out automatically, which prevents noisy low-frequency cooccurrences from disturbing the model.
Both \(H^G\) and \(H^L\) represent graph convolutionbased label representations, with the former focusing on the similarity of global labels and the latter emphasizing the cooccurrence plausibility from local perspectives. These representations had different training parameters. In this task, concatenation is employed to integrate them.
The labelspecific document representation generated under the guidance of global and local consistency can be described as matrix \(Z \in \mathbb {R}^{C \times 2d_c}\). We then make label predictions using a trainable linear layer followed by a sigmoid activation function:
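One form consistent with the stated shapes, assuming that Z is the row-wise concatenation of the two views and that \(W_3 \in \mathbb{R}^{2d_c}\) scores each label row (both are assumptions of this sketch), is:

\[
Z = \left[\, H^{G} \,;\, H^{L} \,\right], \qquad \hat{y} = \mathrm{sigmoid}\!\left(Z\, W_3 + b_2\right)
\]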
where \(W_3\) represents the weights of the linear layer and \(b_2\) is the bias. Let \(y \in \{0, 1\}^{C}\) denote the true label vector of a document, where \(y_i \in \{0, 1\}\) indicates whether label i is present in the document. The proposed model is trained using the multilabel cross-entropy loss as follows:
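In its standard form (shown here without a normalizing constant, which is an assumption of this sketch), the multilabel cross-entropy loss is:

\[
\mathcal{L} = -\sum_{i=1}^{N} \sum_{j=1}^{C} \Big[\, y_{ij} \log \hat{y}_{ij} + \left(1 - y_{ij}\right) \log\!\left(1 - \hat{y}_{ij}\right) \Big] \tag{10}
\]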
In (10), N represents the number of documents, C represents the number of labels, and \(y_{ij}\) and \(\hat{y}_{ij}\) denote the true and predicted values, respectively, for the jth label of the ith document.
4 Experiment
4.1 Datasets and evaluation metrics
We evaluate the proposed model on three benchmark multilabel text classification datasets:
RCV1: RCV1 [18] was collected and manually categorized by Reuters and contains more than 80k news texts with their corresponding labels from 1996 to 1997. Moreover, the test set contains significantly more examples than the training set, which allows a comprehensive evaluation of the generalization capability of the proposed model.
AAPD: AAPD [19] was constructed by gathering the abstracts and corresponding subjects of 55,840 papers from a computer science academic website.
EURLex: EURLex [41] is an extreme multilabel text classification dataset comprising documents related to European Union law across 3956 subjects. The public version includes 11585 instances for training and 3865 instances for testing.
These datasets were meticulously chosen due to their widespread usage and large scale, allowing us to validate the efficiency of the proposed model. Additionally, to maintain consistency with prior research, we employed the same dataset partitioning as those in earlier studies [25, 34]. These partitions were the original ones provided by the publishers of the datasets. Detailed statistics for the datasets are presented in Table 1.
Following the established conventions of previous studies [24, 25, 33, 34], we use precision at top k (P@k) and normalized discounted cumulative gain at top k (nDCG@k) as performance evaluation metrics for all three datasets.
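For reference, both metrics can be computed per document as sketched below, using the definitions that are standard in the multilabel ranking literature. The function names and the toy scores are hypothetical, and the sketch assumes k does not exceed the number of labels.

```python
import numpy as np

def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored labels that are relevant."""
    top = np.argsort(-scores)[:k]
    return y_true[top].sum() / k

def ndcg_at_k(y_true, scores, k):
    """DCG of the top-k ranking divided by the best achievable DCG."""
    top = np.argsort(-scores)[:k]
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # 1 / log2(rank + 1)
    dcg = (y_true[top] * discounts).sum()
    ideal_hits = min(k, int(y_true.sum()))           # relevant labels that fit in top k
    idcg = discounts[:ideal_hits].sum()
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical document with 5 candidate labels, 2 of them relevant.
y_true = np.array([1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.3, 0.2, 0.1])
print(precision_at_k(y_true, scores, 3))
print(ndcg_at_k(y_true, scores, 3))
```

Reported values are typically averaged over all test documents. The base of the logarithm does not affect nDCG, since it cancels between the numerator and the denominator.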
The word embeddings in our model were initialized with 300-dimensional GloVe [42] word vectors that were trained on the dataset using the Skip-gram [43] algorithm. The hidden sizes of the BiLSTM and GCN layers were set to 300 and 512, respectively. For AAPD, we set q = 2 and t = 400; for RCV1, q = 3 and t = 450; and for EURLex, q = 3 and t = 600. We used the Adam optimizer to minimize the cross-entropy loss. The learning rate was initialized to 1e-3, and a cosine-annealing schedule gradually reduced it during training. To ensure a fair comparison with baselines that use a large language model (LLM), we also implemented an LLM-based version of our model, in which the token sequence representations from RoBERTa [44] are used as the output of the label-specific attention network module. The model was trained for 15 epochs with a batch size of 64. The best parameter configuration was selected based on performance on the validation set and evaluated on the test set.
4.2 Baselines
To demonstrate the effectiveness of the proposed model, we compared it with models that achieved state-of-the-art results on the selected datasets. For a fair comparison, we reused the results reported for the baselines rather than reimplementing them, so as to preserve their recommended optimal settings. For models that had not been evaluated on a specific dataset, we ran their released source code on that dataset.
Document-label-based methods

XMLCNN [19]: A sequence generation model that treats label correlations as an ordered label sequence.

AttentionXML [24]: A model that constructs the labelaware document representation solely based on the document content.

LSAN [25]: Labelaware attention framework based on selfattention and label attention mechanisms.

HTTN [7]: A head-to-tail network that transfers meta-knowledge from head labels to tail labels.

MLGN [26]: A multilabel guided network capable of guiding document representation with multilabel semantic information.
Label graphbased methods

DXML [31]: A deep embedding method that simultaneously models the feature and label space.

MAGNET [14]: A model based on graph attention networks that captures the attentive dependency structure among labels using a feature matrix and a correlation matrix. The model uses a BiLSTM to extract text features.

LAHA [6]: LAHA focuses on using hybrid attention to represent documents with labels. The model comprises three components: a multilabel selfattention mechanism that identifies each word’s association with labels, a depiction of label arrangement and document context, and an adaptive fusion method for classification.

LDGN [34]: A dual-graph convolution network that incorporates category information and models adaptive interactions among labels in a reconstructed graph.

LiGCN [32]: A label interpretable graph model that solves the MLTC problem by modeling tokens and labels as nodes in a heterogeneous graph and uses the pretrained language model BERT as a text encoder.

LAMLTC [39]: A label-aware network (which we refer to as LAMLTC) that builds a heterogeneous graph of words and labels and learns label and text representations with metapath2vec.

LRGCN [33]: A multilabel text classification model combining a pretrained language model and a GCN.
4.3 Performance comparison of different methods
The performance of the different models on the three datasets is reported in Tables 2, 3, and 4 in terms of P@k and nDCG@k. For each row, the best result is highlighted in bold, and the second-best result is underlined.
As shown in Tables 2, 3, and 4, the proposed DVMLTC model achieves the best or competitive results on all three datasets. Specifically, the RoBERTa-enhanced version of DVMLTC achieves better or competitive performance on all metrics and improves on the best previous scores among baselines with shared source code. For example, on EURLex, DVMLTC improves P@1 from 82.59% to 83.61% and nDCG@3 from 72.15% to 74.62%. Compared with LRGCN, the best baseline on RCV1 and AAPD, our model still performs better or is competitive on all metrics.
Furthermore, the results in Tables 2, 3, and 4 show that methods that do not incorporate label correlation into the learning of textual representations perform worse. Specifically, on AAPD, AttentionXML achieves a P@1 of 83.02% compared with 80.54% for DXML, a relative increase of approximately 3.08%. It is plausible that, while DXML seeks to represent information in the label space using deep embedding techniques, AttentionXML can concentrate on the document sections that are more semantically relevant to each label. Nevertheless, AttentionXML focuses solely on encoding text content in the representation layer without considering label information, which restricts its capacity to adjust contextual representations through interactions.
The better performance of LSAN compared to other previous approaches for exploring documentlabel relationships, such as HTTN and MLGN, may be attributed to its multiview learning space mechanism and the fact that LSAN considers semantic correlations between text and labels simultaneously. The multiview learning mechanism helps stabilize adaptive fusion through the attention mechanism, which learns the text representation specific to the labels.
We observed that LRGCN performed best on RCV1 in terms of nDCG@3. This can be explained by its initialization of text embeddings with the pretrained language model RoBERTa, which can efficiently extract fine-grained document information. In contrast, our model uses a simple BiLSTM architecture to represent the input text and still achieves optimal or near-optimal results. In addition, we built a variant that uses RoBERTa word embeddings, the same embeddings as LRGCN. The results on AAPD and EURLex demonstrate the effectiveness of our dual-view graph convolutional network module, with DVMLTC\(_{RoBERTa}\) achieving the best results among the competing models.
LDGN [34] demonstrated competitive performance on all datasets, which may be attributed to its adaptive interaction component, which benefits from a large number of adaptive parameters. Inspired by LDGN, we propose an adaptive reconstruction of the graph based on random walks. However, the LDGN adaptive module operates as a black box, and its parameter guidance lacks explicit transparency. By contrast, our dual-graph module allows parameter sharing and provides natural interpretability. This allows us to conduct further research on our model, particularly on parameter tuning and its implications.
We also observed that the methods that use label graphs outperformed the document-label-based methods overall, which highlights the advantage of graph-based MLTC methods, as most of them incorporate rich interaction information to improve multilabel text prediction. The exception is LAHA, which is based on simple label cooccurrence; we hypothesize that it captures only the label representations from the label cooccurrence graph without further exploring the deeper relationships between labels.
4.4 Comparison on sparse dataset
To evaluate the performance of DVMLTC on long-tailed labels, we divided the labels in EURLex into three groups based on their frequency of occurrence, following the approach in [6, 25]. Figure 3 illustrates the distribution of label frequencies on EURLex, where f represents the label frequency. Approximately 55% of the labels appeared one to five times, constituting the first label collection (Collection1). The labels that appeared 5 to 37 times were assigned to Collection2, accounting for 35.35% of the entire label set. The remaining 10% of frequent labels formed the final collection (Collection3). Clearly, Collection1 is much more difficult than the other two collections owing to the lack of training data.
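The grouping described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the cut-offs `low` and `high` are parameters whose defaults (5 and 37) reflect our reading of the frequency ranges given in the text, and assigning the boundary value f = 5 to Collection1 is an assumption.

```python
from collections import Counter


def split_labels_by_frequency(label_sets, low=5, high=37):
    """Group labels into three collections by how many documents carry them."""
    # Count each label once per document.
    freq = Counter(label for labels in label_sets for label in set(labels))
    collections = {"Collection1": [], "Collection2": [], "Collection3": []}
    for label, f in sorted(freq.items()):
        if f <= low:            # tail labels (assumed to include f == low)
            collections["Collection1"].append(label)
        elif f <= high:         # medium-frequency labels
            collections["Collection2"].append(label)
        else:                   # head labels
            collections["Collection3"].append(label)
    return collections


# Toy corpus (made up): "rare" occurs 2 times, "medium" 10 times, "frequent" 42 times.
toy_label_sets = (
    [["rare", "frequent"]] * 2
    + [["medium", "frequent"]] * 10
    + [["frequent"]] * 30
)
groups = split_labels_by_frequency(toy_label_sets)
```

Metrics such as P@k can then be computed separately on each collection by restricting the candidate labels to that group.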
Figure 4 shows the prediction results in terms of P@1, P@3, and P@5 obtained using AttentionXML, DXML, and DVMLTC. All three methods improved from Collection1 to Collection3, which is reasonable because each label is associated with an increasing number of documents from Collection1 to Collection3. DVMLTC significantly improves the predictive performance on Collection1. In particular, DVMLTC achieved average gains of 55.83%, 96.22%, and 47.36% over AttentionXML on the three metrics on Collection1, and 63.41%, 121.73%, and 44.37% over DXML. This result demonstrates the superiority of the proposed model for multilabel text data with tail labels.
4.5 Ablation experiments
A series of ablation experiments was performed to assess the importance and necessity of each module. We performed ablation experiments on all three datasets and divided the experiments into two groups: Group1 and Group2.
Group1 experiments focus on the modules related to dual attention. The ablation components tested in this group were as follows:

1. \(w/o \ LW\): our model without label-word (LW) attention

2. \(w/o \ LS\): our model without label-semantic (LS) self-attention

3. \(w/o \ dual \ attn\): our model without dual attention
Group2 experiments focus on the modules related to dual-view graph convolution. The ablation modules tested in this group were as follows:

1. \(w/o \ GlobalConv\): our model without GlobalConv

2. \(w/o \ LocalConv\): our model without LocalConv

3. \(w/o \ DualConv\): our model without both GlobalConv and LocalConv

4. \(sharing \ para\): GlobalConv and LocalConv share the parameters of the GCN layer
From the results of the ablation experiments on AAPD and RCV1 shown in Fig. 5, several observations can be made about Group1. Dual attention flow module: \(w/o \ LW\) and \(w/o \ LS\) outperformed \(w/o \ dual \ attn\) by large margins of 3.03% and 2.21% on AAPD, respectively, indicating that both attention flows enhance the model and are indispensable. In other words, both the label-word attention and label-semantic self-attention modules contribute to the performance of the proposed model. Label-word attention considers the interaction between the label and word information and captures the contribution of words to labels. Self-attention, on the other hand, focuses on the semantic information of the labels themselves.
Conclusions about Group2: (1) Dual-view convolutional modules: \(w/o \ LocalConv\) and \(w/o \ GlobalConv\) both outperform \(w/o \ DualConv\); for example, on RCV1 their P@3 is better by 1.98% and 1.82%, respectively. This indicates that exploring either the global or the local consistency of labels can effectively capture the semantic interactions between label-specific components. The superiority of \(w/o \ LocalConv\) over \(w/o \ GlobalConv\) suggests that the global consistency convolution has the larger impact on classification, indicating its ability to capture semantic dependencies effectively. (2) \(w/o \ GlobalConv\) improves on \(w/o \ DualConv\): keeping only LocalConv still improves the model based on the dual attention flow, indicating that incorporating the new label cooccurrence relationships generated through random walks and mutual information benefits performance. (3) \(Sharing \ Parameters\): sharing parameters between the global and local convolutions yields slightly lower performance than the complete model. This suggests that the two GCNs, which model label correlation from different perspectives, benefit from separate parameters. (4) Overall model: the complete model, which combines the dual attention flow and dual-view convolutions with separate parameters, achieves the best performance. These results demonstrate the efficacy of the proposed modules and their contributions to capturing label dependencies and semantic interactions.
We visualized the label cooccurrence matrices \(A^G\) and \(A^L\) on AAPD, as shown in Fig. 6. The global label cooccurrence matrix \(A^G\), which is built from prior corpus statistics, exhibits a long-tail distribution: many edges have very few cooccurrences. These low-frequency edges may act as noise, which can lead to overfitting and degrade classification performance. The variant without the \(A^L\) matrix (\(w/o \ LocalConv\)) did not perform optimally because \(A^G\) alone, being based purely on statistical cooccurrence, cannot provide sufficient semantic confidence between label pairs. The dynamic edge adjustment performed by \(A^L\) through random walks yields a smoother matrix in the visualization. It assigns a certain edge weight to low-frequency cooccurring label pairs, thereby allowing them to overcome the influence of low-frequency noise. This adjustment benefits the GCN because it strengthens the interactions between node pairs in the network. However, \(A^L\) with 1000 iterations is noticeably smoother than \(A^L\) with 450 iterations, and oversmoothing makes it difficult to distinguish differences in label cooccurrence, potentially degrading the classification performance. Our proposed model integrates \(A^G\) and \(A^L\) through GlobalConv and LocalConv, respectively, and leverages both statistical cooccurrence information and dynamic edge adjustment based on random walks, leading to improved classification results.
Overall, the visualization of the weight matrices confirmed the effectiveness of incorporating both \(A^G\) and \(A^L\) in capturing label dependencies and enhancing the performance of the classification model.
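One common way to obtain a statistics-based matrix such as \(A^G\) is to count label pairs in the training set and normalize each row. The sketch below assumes this construction, which may differ from the exact formula used in DVMLTC; the function names and the toy label sets are hypothetical.

```python
from itertools import combinations


def cooccurrence_counts(label_sets, n):
    """Symmetric matrix counting how often two labels share a document."""
    counts = [[0] * n for _ in range(n)]
    for labels in label_sets:
        for a, b in combinations(sorted(set(labels)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def row_normalize(counts):
    """Turn each row of counts into weights that sum to 1 (or all zeros)."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Toy training labels (made up) over three label ids 0, 1, 2.
toy_label_sets = [[0, 1], [0, 1], [0, 2], [1]]
counts = cooccurrence_counts(toy_label_sets, n=3)
global_graph = row_normalize(counts)

# Edges supported by a single document illustrate the long tail discussed above.
weak_edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if counts[i][j] == 1]
```

In this toy case the pair (0, 2) is backed by only one document, which is the kind of low-frequency edge that a purely statistical graph cannot distinguish from noise.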
4.6 Parametric analysis
We performed the parametric analysis on AAPD using the base version of the model.
4.6.1 Effect of iteration number t on classification
We investigated the effect of the number of iterations, denoted as t, on the classification performance. The number of iterations determines the number of label paths generated by node resampling. By fixing the other parameters and varying the value of t, we analyzed the impact on the classification performance, as shown in Fig. 7. The experimental results show that when the number of iterations was small, the performance improvement of the model was insignificant. This is because the local label graph, which is meant to capture local label dependencies, fails to effectively capture the key and tail labels. Consequently, the role of the local label graph becomes similar to that of the global graph, leading to limited performance gains. As the number of iterations increased to a certain scale (e.g., 450), the local dynamics strongly enhanced the interaction between the key and tail labels. By leveraging the powerful information diffusion ability of the GCN, the model achieved improved classification performance. This indicates that a sufficient number of iterations allows the local dynamics to capture crucial label dependencies, resulting in enhanced classification accuracy.
Increasing the number of iterations beyond the optimal value did not significantly affect the model’s performance. This suggests that once the key and tail label nodes are effectively captured and the interaction between labels is strengthened, further increasing the number of iterations has little effect on the model. In summary, the experimental results demonstrate that the number of iterations, t, plays a crucial role in capturing label dependencies through local dynamics. Finding the optimal value of t allows the model to effectively enhance label interactions and improve classification performance.
4.6.2 Effect of path length q on classification
The path length parameter q plays a crucial role in the classification performance of our model, particularly in the LocalConv module. It determines the farthest distance that the random walk can traverse based on probability, with labels on the same path considered to belong to the same document. In our experiments on AAPD, we investigated the impact of q on the classification accuracy while maintaining t at an optimal value of 450. The results shown in Fig. 8 indicate that the choice of q significantly affects the performance of the model, which is consistent with our expectations. Within a reasonable range (e.g., 2 or 3), the model achieved the best classification results, suggesting that the label paths adaptively generated by the model have significant benefits. However, when q exceeds a certain threshold (e.g., 3), the performance of the model begins to decline slightly. We speculate that excessively long label paths result in excessively consistent cooccurrence relationships between nodes during the iterative process. This exacerbates the problem of oversmoothing, ultimately interfering with the discriminative power of the labels in the model. Nevertheless, by integrating the LocalConv and GlobalConv modules, our model maintains its robustness and achieves optimal performance. This highlights the effectiveness and resilience of our approach in capturing label dependencies and enhancing the classification outcomes.
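To make the roles of t and q concrete, the following Python sketch samples t label paths of at most q labels each by a weighted random walk and counts which labels share a path. It is a simplified illustration under stated assumptions, not the paper's Algorithm 1: the uniform choice of start label, the transition rule proportional to edge weight, the function names, and the toy adjacency matrix are all hypothetical.

```python
import random


def sample_label_paths(adj, t, q, seed=0):
    """Sample t label paths, each containing at most q labels."""
    rng = random.Random(seed)
    n = len(adj)
    paths = []
    for _ in range(t):
        node = rng.randrange(n)              # assumption: uniform start label
        path = [node]
        while len(path) < q:
            weights = adj[node]
            if sum(weights) <= 0:            # isolated label: the walk stops early
                break
            # assumption: move to a neighbor with probability proportional to edge weight
            node = rng.choices(range(n), weights=weights, k=1)[0]
            path.append(node)
        paths.append(path)
    return paths


def path_cooccurrence(paths, n):
    """Count how often two distinct labels appear on the same path."""
    counts = [[0] * n for _ in range(n)]
    for path in paths:
        labels = sorted(set(path))
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                counts[a][b] += 1
                counts[b][a] += 1
    return counts


# Toy symmetric label graph (made-up weights); labels 0 and 3 never co-occur directly.
toy_adj = [
    [0, 3, 1, 0],
    [3, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
paths = sample_label_paths(toy_adj, t=450, q=3)
local_counts = path_cooccurrence(paths, n=len(toy_adj))
```

Because labels on one path are treated as if they belonged to the same document, a pair such as (0, 3) can receive a nonzero count through the intermediate label 2. Larger q lets more distant labels share a path, which is consistent with the oversmoothing effect discussed above.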
4.6.3 Effect of labelratio
To assess the sensitivity and performance of the proposed model under different training data proportions, we conducted experiments using various ratios of training data. We also compared our model with other competitive approaches, namely XMLCNN [4], AttentionXML [24], LSAN [25], and LRGCN [33], while maintaining their respective settings, as described in their papers. In the case of LSAN, we used Word2vec for word embeddings because of the absence of pretrained embeddings in its source code. Figure 9 shows the evaluation results for different data scales with proportions of 0.05, 0.10, 0.25, 0.50, and 0.75. The results show that our model consistently achieves competitive performance compared to the baselines. Notably, our model outperformed the baseline models, particularly at low data proportions (< 0.25). We conjecture that this may be attributed to our dual-view convolution module, in which local convolutions yield richer label cooccurrence patterns, particularly when few labeled samples are available. This finding suggests that our model is robust and relatively insensitive to the training data ratio. Therefore, it can effectively handle scenarios where only a limited number of training samples are available, making it applicable to real-world situations.
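The training subsets for such an experiment can be drawn as in the sketch below. It assumes uniform random sampling without replacement and a fixed seed for reproducibility; the paper does not state how the subsets were drawn, so the function name and the sampling scheme are illustrative.

```python
import random


def subsample_training_set(train_examples, ratio, seed=0):
    """Return a random subset containing roughly `ratio` of the training examples."""
    rng = random.Random(seed)
    size = max(1, round(len(train_examples) * ratio))
    return rng.sample(train_examples, size)


# Toy training set of 200 document ids and the proportions used in Fig. 9.
toy_train = list(range(200))
ratios = [0.05, 0.10, 0.25, 0.50, 0.75]
subsets = {r: subsample_training_set(toy_train, r) for r in ratios}
subset_sizes = [len(subsets[r]) for r in ratios]
```

Each model would then be trained on every subset and evaluated on the same, unchanged test set.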
4.7 Complexity analysis
Note that the time complexity of the model arises primarily from F in Algorithm 1, which is \(O(ctq^2)\), where c is the number of nodes, t is the number of iterations, and q is the path length. Because t and q are set to small integers in the experiments, F can be computed rapidly. Additionally, the algorithm can be parallelized by conducting multiple random walks simultaneously on different parts of the graph. Therefore, the time complexity of the model is acceptable.
Compared with other graph-based models such as MAGNET, LDGN, and LRGCN, which have shown excellent results in comparative experiments, our model achieves a favorable balance between complexity and efficiency. One of the main contributors to the time complexity of MAGNET is its graph attention network. Assume that the number of nodes is c, the number of edges is e, and the dimensions before and after feature transformation are d and \(d'\), respectively. The time complexity of MAGNET can be expressed as \(O(cdd') + O(ed')\). Owing to the potentially large number of edges (e) and the relatively large dimensions (d and \(d'\)), the time complexity of MAGNET is relatively high. Similarly, for LDGN, the computational complexity arises primarily from the dynamic reconstruction graph, with a time complexity of \(O(cdd')\). In an experimental setting, where the dimensions d and \(d'\) are relatively large, the time complexity of LDGN is also high. In comparison, the second-best model, LRGCN, does not involve redundant multiplication calculations, resulting in a slightly better time complexity than our model. However, as mentioned previously, the time complexity of our model remains acceptable. In summary, although other graph-based models may outperform our model in certain comparative experiments, the advantageous balance between complexity and efficiency of our model makes it a valuable choice for practical applications.
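To give a rough feel for these expressions, the sketch below evaluates their leading terms for one set of sizes. All sizes except t = 450 and q = 3 (taken from Section 4.6) are hypothetical placeholders, and constant factors hidden by the big-O notation are ignored, so the numbers indicate orders of magnitude only and are not measured costs.

```python
def random_walk_cost(c, t, q):
    """Leading term of O(c * t * q^2) for the path generation F."""
    return c * t * q ** 2


def magnet_cost(c, e, d, d_out):
    """Leading terms of O(c * d * d') + O(e * d') for graph attention."""
    return c * d * d_out + e * d_out


def ldgn_cost(c, d, d_out):
    """Leading term of O(c * d * d') for dynamic graph reconstruction."""
    return c * d * d_out


# Hypothetical sizes: 54 label nodes, 500 edges, 300-dimensional features.
c, e, d, d_out = 54, 500, 300, 300
t, q = 450, 3  # values discussed in Section 4.6

costs = {
    "random walk (ours)": random_walk_cost(c, t, q),
    "MAGNET": magnet_cost(c, e, d, d_out),
    "LDGN": ldgn_cost(c, d, d_out),
}
for name, value in costs.items():
    print(f"{name}: {value:,} elementary operations (leading term)")
```

Under these assumed sizes the random-walk term is more than an order of magnitude smaller than the feature-transformation terms, mainly because q is tiny compared with d and \(d'\).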
4.8 Case studies and visualizations
To further verify the effectiveness of the label attention module and the dual graph neural networks in DVMLTC, we present a typical case and visualize the similarity scores between the attention weights of document words and label-specific components using t-SNE [45]. We show a test instance from the original AAPD dataset that belongs to three categories: “physics and society” (physics.soc-ph), “computers and society” (cs.CY), and “social and information networks” (cs.SI).
4.8.1 Label attention visualization
Figure 10 shows the label attention, revealing how different labels focus on specific parts of the document text. Each label assigns importance to its own set of words for classification. For instance, in the “physics.soc-ph” category, words such as “user behaviors” and “evolution over time” are highlighted, capturing key concepts in physics within a social context. In the “cs.CY” category, words such as “user conversations”, “dynamic model”, “growth dynamics and structural properties”, and “underlying rules” are emphasized, indicating a focus on computers and society. In the “cs.SI” category, attention is given to words such as “artificial factors”, “line conversations”, and “social media websites”. By examining the specific words that receive attention in each category, we gain insights into the semantics and distinguishing aspects of these categories. These visualizations intuitively demonstrate the effectiveness of the model in capturing relevant information in the document text for accurate labeling.
4.8.2 Label cooccurrence graph visualization
Figure 11 visualizes the label graph, showing the roles of GlobalConv and LocalConv in capturing the label cooccurrence patterns. The heat maps in Fig. 11 represent the label cooccurrence matrices \(A^G\) and \(A^L\). In Fig. 11(a), the heat map shows \(A^G\) based on GlobalConv. However, GlobalConv failed to accurately discern the relationships between the labels in this specific test case. Notably, the cooccurrence of “computers and society (cs.CY)” and “adaptation and self-organizing systems (nlin.AO)” was not considered significant. This limitation arises from relying solely on global statistical information, which may overlook label correlations in individual instances. Conversely, Fig. 11(b) displays \(A^L\) based on LocalConv, which highlights the crucial role of LocalConv in establishing local connections between the labels. Even for label pairs with low cooccurrence, such as “computers and society (cs.CY)” and “physics and society (physics.soc-ph)”, LocalConv assigns a label correlation. The multiple label paths generated by LocalConv generalize label relationships based on model sampling, independent of human influence. Consequently, LocalConv captures finer label associations, providing a more comprehensive understanding of label cooccurrence patterns. In summary, the visualization of the label graph demonstrates how LocalConv effectively supplements the label correlations that GlobalConv alone cannot capture.
5 Conclusion and future tasks
In this study, we proposed a novel dual-view graph convolutional network for multilabel text classification. Our approach systematically addresses label relationships within cooccurrences from the perspectives of global and local consistency. The global consistency convolution uses GCNs to model the statistical relationships among labels based on correlation. For the local consistency convolution, we generate label paths through random walks, reconstruct the local graph, and enrich the cooccurrence patterns. The initial word embeddings were generated via a dual attention flow. Extensive experiments revealed superior performance on RCV1 and EURLex and competitive results on AAPD, highlighting a favorable complexity-efficiency balance. Our approach is effective in enhancing classification performance and mitigating long-tailed issues. Future enhancements include constructing dynamics for sample subsets to reduce computational overhead and further exploring the use of additional label information for multilabel text classification.
Availability of Data and Materials
The datasets analyzed during the current study were all derived from the following public domain resources. [AAPD: https://git.uwaterloo.ca/jimmylin/Castordata/tree/master/datasets/AAPD/; RCV1: http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm; EURLex: http://nlp.cs.aueb.gr/software.html].
References
Huang B, Guo R, Zhu Y, Fang Z, Zeng G, Liu J, Wang Y, Fujita H, Shi Z (2022) Aspectlevel sentiment analysis with aspectspecific context position information. KnowlBased Syst 243:108473. https://doi.org/10.1016/j.knosys.2022.108473
Tang P, Jiang M, Xia BN, Pitera JW, Welser J, Chawla NV (2020) Multilabel patent categorization with nonlocal attentionbased graph convolutional network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 34, pp 9024–9031. https://ojs.aaai.org/index.php/AAAI/article/view/6435
Liu W, Wang H, Shen X, Tsang IW (2022) The emerging trends of multilabel learning. IEEE Trans Pattern Anal Mach Intell 44(11):7955–7974. https://doi.org/10.1109/TPAMI.2021.3119334
Liu J, Chang WC, Wu Y, Yang Y (2017) Deep learning for extreme multilabel text classification. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval. SIGIR ’17, pp 115–124. Association for computing machinery. https://doi.org/10.1145/3077136.3080834
Wu H, Qin S, Nie R, Cao J, Gorbachev S (2021) Effective collaborative representation learning for multilabel text categorization. IEEE Trans Neural Netw Learn Syst 33(10):5200–5214
Huang X, Chen B, Xiao L, Yu J, Jing L (2022) Labelaware document representation via hybrid attention for extreme multilabel text classification. Neural Process Lett 54(5):3601–3617
Xiao L, Zhang X, Jing L, Huang C, Song M (2021) Does head label help for longtailed multilabel text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp 14103–14111
Zong D, Sun S (2023) Bgnnxml: bilateral graph neural networks for extreme multilabel text classification. IEEE Trans Knowl Data Eng 35(7):6698–6709
Zhang QW, Zhang X, Yan Z, Liu R, Cao Y, Zhang ML (2021) Correlationguided representation for multilabel text classification. In: IJCAI, pp 3363–3369
Ionescu RT, Butnaru A (2019) Vector of locallyaggregated word embeddings (vlawe): a novel documentlevel representation. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers), pp 363–369. https://doi.org/10.18653/v1/N191033. https://aclanthology.org/N191033
Liu M, Liu L, Cao J, Du Q (2022) Coattention network with label embedding for text classification. Neurocomputing 471:61–69
Wang J, Chen Z, Qin Y, He D, Lin F (2023) Multiaspect coattentional collaborative filtering for extreme multilabel text classification. KnowlBased Syst 260:110110. https://doi.org/10.1016/j.knosys.2022.110110
Chen ZM, Wei XS, Wang P, Guo Y (2019) Multilabel image recognition with graph convolutional networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5177–5186
Pal A, Selvakumar M, Sankarasubbu M (2020) Magnet: multilabel text classification using attentionbased graph neural network. In: Proceedings of the 12th international conference on agents and artificial intelligence 1, vol 2, pp 494–505. https://doi.org/10.5220/0008940304940505
Vu H, Nguyen M, Nguyen V, Tien M, Nguyen V (2022) Label correlation based graph convolutional network for multilabel text classification. In: 2022 International joint conference on neural networks (IJCNN), pp 01–08. https://ieeexplore.ieee.org/abstract/document/9892542
Kipf TN, Welling M (2017) Semisupervised classification with graph convolutional networks. In: International conference on learning representations (ICLR)
Liang Z, Guo J, Qiu W, Huang Z, Li S (2024) When graph convolution meets double attention: online privacy disclosure detection with multilabel text classification. Data Min Knowl Discov 1–22
Lewis DD, Yang Y, RussellRose T, Li F (2004) Rcv1: a new benchmark collection for text categorization research. J Mach Learn Res 5:361–397
Yang P, Sun X, Li W, Ma S, Wu W, Wang H (2018) Sgm: sequence generation model for multilabel classification. In: Proceedings of the 27th international conference on computational linguistics, pp 3915–3926. https://aclanthology.org/C181330
Yang P, Luo F, Ma S, Lin J, Sun X (2019) A deep reinforced sequencetoset model for multilabel classification. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 5252–5258. https://aclanthology.org/P191518
Liao W, Wang Y, Yin Y, Zhang X, Ma P (2020) Improved sequence generation model for multilabel classification via cnn and initialized fully connection. Neurocomputing 382:188–195
Zhang X, Tan X, Luo Z, Zhao J (2023) Multilabel sequence generating model via label semantic attention mechanism. Int J Mach Learn Cybern 14(5):1711–1723
Wang R, Ridley R, Qu W, Dai X et al (2021) A novel reasoning mechanism for multilabel text classification. Inf Process Manage 58(2):102441
You R, Zhang Z, Wang Z, Dai S, Mamitsuka H, Zhu S (2019) Attentionxml: label treebased attentionaware deep model for highperformance extreme multilabel text classification. In: Advances in neural information processing systems, vol 32, pp 5820–5830
Xiao L, Huang X, Chen B, Jing L (2019) Labelspecific document representation for multilabel text classification. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLPIJCNLP), pp 466–475. Association for Computational Linguistics. https://aclanthology.org/D191044
Liu Q, Chen J, Chen F, Fang K, An P, Zhang Y, Du S (2023) Mlgn: a multilabel guided network for improving text classification. IEEE Access 11:80392–80402. https://doi.org/10.1109/ACCESS.2023.3299566
Qin S, Wu H, Zhou L, Li J, Du G (2023) Learning metric space with distillation for largescale multilabel text classification. Neural Comput Appl 35(15):11445–11458
Wang Q, Zhu J, Shu H, Asamoah KO, Shi J, Zhou C (2023) Gudn: a novel guide network with label reinforcement strategy for extreme multilabel text classification. J King Saud Univ Comput Inf Sci 35(4):161–171
Xu P, Xiao L, Liu B, Lu S, Jing L, Yu J (2023) Labelspecific feature augmentation for longtailed multilabel text classification. In: Proceedings of the AAAI conference on artificial intelligence, vol. 37, pp 10602–10610
Xiao L, Xu P, Song M, Liu H, Jing L, Zhang X (2023) Triple alliance prototype orthotist network for longtailed multilabel text classification. IEEE/ACM Trans Audio Speech Lang Process 31:2616–2628. https://doi.org/10.1109/TASLP.2023.3265860
Zhang W, Yan J, Wang X, Zha H (2018) Deep extreme multilabel learning. In: Proceedings of the 2018 ACM on international conference on multimedia retrieval, pp 100–107. https://doi.org/10.1145/3206025.3206030
Li I, Feng A, Wu H, Li T, Suzumura T, Dong R (2022) LiGCN: labelinterpretable graph convolutional networks for multilabel text classification. In: Proceedings of the 2nd workshop on deep learning on graphs for natural language processing (DLG4NLP 2022), pp 60–70. Association for Computational Linguistics. https://aclanthology.org/2022.dlg4nlp1.7
Vu H, Nguyen M, Nguyen V, Pham M, Nguyen V, Nguyen V (2023) Labelrepresentative graph convolutional network for multilabel text classification. Appl Intell 53(12):14759–14774. https://doi.org/10.1007/s1048902204106x
Ma Q, Yuan C, Zhou W, Hu S (2021) Labelspecific dual graph neural network for multilabel text classification. In: Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (vol 1: Long Papers), pp 3855–3864. Association for computational linguistics
Fan C, Chen W, Tian J, Li Y, He H, Jin Y (2023) Accurate use of label dependency in multilabel text classification through the lens of causality. Appl Intell 1–17
Zeng D, Zha E, Kuang J, Shen Y (2024) Multilabel text classification based on semanticsensitive graph convolutional network. KnowlBased Syst 284:111303
Zhao F, Ai Q, Li X, Wang W, Gao Q, Liu Y (2024) Tlcxml: transformer with label correlation for extreme multilabel text classification. Neural Process Lett 56(1):25
Huang Y, Giledereli B, Köksal A, Özgür A, Ozkirimli E (2021) Balancing methods for multilabel text classification with longtailed class distribution. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 8153–8161. Association for computational linguistics
Guo H, Li X, Zhang L, Liu J, Chen W (2021) Labelaware text representation for multilabel text classification. In: ICASSP 20212021 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 7728–7732. https://doi.org/10.1109/ICASSP39728.2021.9413921
Zhuang C, Ma Q (2018) Dual graph convolutional networks for graphbased semisupervised classification. In: Proceedings international world wide web conferences steering committee, pp 499–508. https://doi.org/10.1145/3178876.3186116
Loza Mencía E, Fürnkranz J (2008) Efficient pairwise multilabel classification for largescale problems in the legal domain. In: Joint European conference on machine learning and knowledge discovery in databases, pp 50–65
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach. arXiv:1907.11692
Maaten L, Hinton G (2008) Visualizing data using tsne. J Mach Learn Res 9(11)
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 61862058), the Natural Science Foundation of Gansu Province (Nos. 20JR5RA518 and 21JR7RA114), and the Industrial Support Project of Gansu Colleges (No. 2022CYZC11).
Contributions
X.L and B.Y: Conceptualization, Methodology, Formal analysis, Software, Investigation, Validation, Resources, Writing—original draft, review and editing, Visualization. Q.P and S.F: Resources, Writing—review and editing, Supervision.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Consent to Participate
The authors declare that they agree to participate.
Consent for Publication
The authors declare that they agree to publish.
About this article
Cite this article
Li, X., You, B., Peng, Q. et al. Dualview graph convolutional network for multilabel text classification. Appl Intell 54, 9363–9380 (2024). https://doi.org/10.1007/s1048902405666w
DOI: https://doi.org/10.1007/s1048902405666w