Background

The aging of the world’s population is becoming increasingly serious. According to data from the World Health Organization [1], one in six people in the world will be aged 60 years or over by 2030, and the world’s population aged 60 years and older will double to 2.1 billion by 2050. According to the International Classification of Functioning, Disability and Health (ICF), ‘disability is a decrement in functioning at the body, individual or social level that arises when an individual with a health condition encounters barriers in the environment’ [2]. Disability is typically assessed by whether a person can independently perform activities of daily living (ADLs). ADLs are divided into Basic Activities of Daily Living (BADLs) and Instrumental Activities of Daily Living (IADLs). BADLs are the essential, routinely repeated activities necessary for human survival and living, including bathing, dressing, toileting, transferring, continence, and eating [3]. IADLs are the activities necessary for an individual to maintain independent living, including cooking, housekeeping, shopping, medication management, using transportation, and managing finances [4]. With the acceleration of population aging and changes in the disease spectrum, disability has become an increasingly severe public health problem. In the United States, 41.7% of adults aged 65 years and above reported one or more disabilities [5]. The number of disabled older adults in China has exceeded 40 million, accounting for 18.3% of the total elderly population [6], and is expected to exceed 77.65 million by 2030 [7]. Disability reduces older adults’ independence and quality of life, increases mortality, and imposes a serious care and economic burden on families and society [8,9,10,11]. Therefore, early identification of older adults at high risk of disability, together with preventive measures, is of great significance for improving their quality of life and reducing their mortality rate.

Risk prediction models use multiple predictors to estimate the likelihood that an outcome event will occur for an individual, thereby identifying people at high risk of a disease [12]. In recent years, researchers have begun to develop risk prediction models for disability in the elderly [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. However, the prediction performance of these models varies greatly, and their quality remains unclear. To our knowledge, there has not yet been a systematic review of risk prediction models for disability in older adults. This study aims to provide such a systematic review and to systematically assess the risk of bias, applicability, and clinical value of each model, in order to provide a sound reference for subsequent related research and for the prevention of disability in older adults.

Methods

The review protocol was registered in PROSPERO (registration ID: CRD42023446657).

Search strategy

The databases searched included PubMed, Embase, Web of Science, Cochrane Library, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), China National Knowledge Infrastructure (CNKI), the China Science and Technology Journal Database (VIP), and the Wanfang Database, covering literature published up to June 30, 2023. The medical subject heading (MeSH) function of PubMed was used to obtain synonyms of the search terms, and a combination of subject headings and free-text words was used for logical retrieval. The search terms included ‘elder*/old/older/aging/aged’, ‘disabilit*/disabled/immobility/functional decline/confined activity/activity limitation/limitation of motion/activit* of daily living/ADL’, and ‘predict*/model/score/prognos*/nomogram/risk prediction/risk stratification/risk score/risk assessment’. Specific search strategies for each database are described in Supplementary Material 1.

The PICOTS system [29] was used to clarify the purpose, search strategy, and inclusion and exclusion criteria of this systematic review. The PICOTS for this systematic review are as follows:

  • P (Population): Older adults aged 60 years and older.

  • I (Intervention model): Prediction models for disability in older adults.

  • C (Comparator): No competing model.

  • O (Outcome): As introduced in the introduction, disability refers to a decline in function at the body, individual, or social level [2]. The occurrence and extent of disability are often assessed using ADL indices. Commonly used scales include Katz’s scale [3] for BADL and Lawton and Brody’s scale [4] for IADL.

  • T (Timing): Outcomes were predicted based on current (diagnostic model) or baseline (prognostic model) characteristics of the subjects.

  • S (Setting): The expected use of the prediction model is to predict the risk of elderly disability individually and accurately, and to provide a reference for preventive measures to reduce disability in the elderly.

Inclusion and exclusion criteria

The inclusion criteria were: (1) participants were older adults aged 60 years and older, who were free of disability at baseline (cohort studies only); (2) observational study designs, including cross-sectional, case-control, and cohort studies; (3) a risk prediction model was reported; and (4) the outcome was disability.

The exclusion criteria were: (1) no risk prediction model was developed; (2) the primary outcome measure was not disability; (3) in cohort studies, participants were disabled at baseline; (4) the article was not written in Chinese or English; and (5) conference abstracts or literature for which the full text was not available.

Study selection and screening

After removing the duplicate literature, according to the inclusion and exclusion criteria, we first screened the articles by title and abstract, and then read the full text for further screening. In addition, we also reviewed the references of included studies and, if necessary, conducted manual searches to identify other potential literature that met the criteria. Literature screening was performed independently by two researchers (JZ and DY), and any disagreements were resolved by discussion or consultation with a third researcher (QZ).

Data extraction

According to the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [30], two reviewers (JZ and YX) independently performed data extraction, and any disagreements were resolved by discussion or consultation with a third reviewer (SD). If necessary, the authors of included studies were contacted to obtain more data.

The extracted information was divided into two parts. (1) Basic information: author/year, country, research design, prediction model type, participants, data source, outcome, outcome definition, and cases/sample size (%). The prediction model types included diagnostic and prognostic prediction models; the key distinction between them lies in the temporal design of the study. Diagnostic model studies are typically cross-sectional, predicting the likelihood that an individual has a particular disease at a given point in time [31]. In contrast, prognostic model studies are generally longitudinal, such as retrospective and prospective cohort studies, predicting the probability that an individual will experience a certain outcome in the future [31]. (2) Detailed information: methods for handling missing data, continuous variable handling details, variable selection methods, model development methods, number and type of candidate predictors, final predictors, reported discrimination measures, reported calibration measures, validation method, and model presentation.

Risk of bias, applicability, and clinical value assessment

Two reviewers (JZ and YX) independently assessed the risk of bias and applicability of included studies using the Prediction Model Risk of Bias Assessment Tool (PROBAST) [31, 32], and a structured five-step approach was used to determine which models were considered clinically valuable for practitioners [33]. Any disagreements were resolved by discussion or consultation with a third reviewer (SD).

PROBAST is a tool for assessing the risk of bias in studies of individual prognostic or diagnostic multivariable prediction models. It covers four domains, namely participants (2 items), predictors (3 items), outcome (6 items), and analysis (9 items), for a total of 20 items. Each item can be rated as yes (Y), probably yes (PY), no (N), probably no (PN), or no information (NI). Risk of bias was assessed in all four domains. A domain was considered at low risk of bias only if every item was rated Y or PY; if any item was rated N or PN, the domain was rated as high risk of bias; and if at least one item was rated NI and the others Y or PY, the domain was rated as unclear risk of bias. The overall risk of bias was judged low if all four domains were at low risk, and high if one or more domains were at high risk. If at least one domain was judged unclear and all other domains were at low risk, the overall risk of bias was judged unclear. The applicability assessment involved the first three domains, with assessment rules similar to those for risk of bias.
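As an illustration, the domain and overall rating rules described above can be sketched as a small helper function. This is a hypothetical sketch for clarity only, not part of any official PROBAST tooling:

```python
# Illustrative sketch of the PROBAST rating rules described in the text.

def rate_domain(items):
    """items: list of item ratings 'Y', 'PY', 'N', 'PN', 'NI'.
    Returns the domain-level risk of bias."""
    if any(r in ("N", "PN") for r in items):
        return "high"      # any N or PN -> high risk of bias
    if any(r == "NI" for r in items):
        return "unclear"   # NI with the rest Y/PY -> unclear
    return "low"           # all items Y or PY -> low risk of bias

def overall_risk(domains):
    """domains: list of per-domain ratings -> overall risk of bias."""
    if any(d == "high" for d in domains):
        return "high"
    if any(d == "unclear" for d in domains):
        return "unclear"
    return "low"
```

For example, a study with one domain rated unclear and the other three rated low would receive an overall rating of unclear.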

PROBAST can assess the risk of bias and applicability of studies, but it has limited value in selecting effective tools for clinical practice [33]. Therefore, we conducted a clinical value assessment of the included studies. According to the relevant literature [33], a model must meet the following five criteria to demonstrate clinical value:

  1. Low applicability concerns.

  2. Complete reporting of model performance measures.

  3. Acceptable calibration (Hosmer-Lemeshow (HL) test: P > 0.05, calibration slope = 1, calibration intercept = 0, and/or calibration plot considered acceptable) [34, 35].

  4. Acceptable discrimination (C statistic or area under the curve (AUC) between 0.61 and 0.75, preferably greater than 0.75) [36].

  5. Minor impact of risk of bias on the model [37].
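As a sketch, the five criteria can be combined into a simple checklist. The function and field names below are illustrative assumptions, not drawn from the cited literature:

```python
# Hypothetical checklist applying the five clinical-value criteria above.
# Field names are made up for illustration; thresholds follow the cited criteria.

def clinically_valuable(model):
    return all([
        model["applicability_concerns"] == "low",  # 1. low applicability concerns
        model["performance_fully_reported"],       # 2. complete performance reporting
        model["calibration_acceptable"],           # 3. acceptable calibration
        model["c_statistic"] > 0.61,               # 4. acceptable discrimination
        model["bias_impact"] == "minor",           # 5. minor impact of risk of bias
    ])
```

A model failing any one criterion, for example a C statistic of 0.55, would not be considered clinically valuable under this screen.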

Data synthesis

Due to the large heterogeneity of included studies, meta-analysis was not appropriate. Therefore, this study adopted a narrative synthesis to analyze, summarize and compare the included studies. We synthesized the key features of included studies and prediction models in a tabular form, including two parts: (1) Basic information: author/year, country, research design, prediction model type, participants, data source, outcome, outcome definition and cases/sample size (%). (2) Detailed information: methods for handling missing data, continuous variable handling details, variable selection methods, model development methods, number and type of candidate predictors, final predictors, reported discrimination measures, reported calibration measures, validation method and model presentation. The risk of bias and applicability assessment results were presented in tabular form. In addition, the above information was further elaborated and analyzed in the text.

Results

Search and study selection results

According to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 [38], Fig. 1 shows the flow chart of literature screening. A total of 5722 articles were initially retrieved from databases, and 4465 articles were screened after 1257 duplicates were removed. After reading the title and abstract for the initial screening, 71 articles remained. After reading the full text, 55 studies were excluded for the following reasons: 26 studies did not develop prediction models, 14 studies did not involve older adults aged 60 years and older, four studies did not have an outcome measure of disability, 10 cohort studies had participants with disability at baseline, and one study was not an observational study. Finally, a total of 16 studies were included in this systematic review.

Fig. 1 Flowchart of study selection according to PRISMA

Study characteristics

Overall characteristics

The overall characteristics of the included studies are shown in Table 1. The 16 studies were published between 2012 and 2023: nine [13, 14, 20, 22, 24,25,26,27,28] were conducted in China, three [15, 18, 19] in the United States, three [17, 21, 23] in Japan, and one [16] in Germany, the United Kingdom, Italy, and the Netherlands. Nine studies [15, 16, 18, 19, 21, 23,24,25,26] were retrospective cohort designs, two [17, 28] were prospective cohort designs, and five [13, 14, 20, 22, 27] were cross-sectional studies. In terms of data sources, 13 studies [14,15,16,17,18,19, 21,22,23,24,25,26, 28] used national or international large-scale data, two [13, 27] used multi-center data from multiple communities, and one [20] used single-center data from a hospital. In terms of model types, 11 models [15,16,17,18,19, 21, 23,24,25,26, 28] were prognostic prediction models and six [13, 14, 20, 22, 27] were diagnostic prediction models. In terms of subjects, seven studies [13, 14, 17, 22, 25,26,27] included older adults aged 60 years and older, five [15, 18, 21, 23, 28] included older adults aged 65 years and older, one [19] included older adults aged 70 years and older, one [16] included older adults aged 65 to 75 years, one [20] included inpatients aged 60 years and older, and one [24] included hypertensive patients aged 65 years and older. Among the 16 included studies, seven [14, 15, 18, 19, 24, 25, 28] evaluated disability using BADL, one [26] using IADL, and three [13, 16, 22] using a combination of BADL and IADL; the specific evaluation methods are shown in Table 1. Three studies [17, 21, 23] conducted in Japan used Japanese long-term care insurance (LTCI) certification to determine whether participants were disabled. One study [27] evaluated disability using a self-reported health survey scale developed by the WHO [39]. Another study [20] evaluated disability using the ability assessment standards for older adults issued by the Ministry of Civil Affairs of the People’s Republic of China [40]. The sample sizes of the studies ranged from 420 to 90,889, and incidence rates ranged from 4.14% to 65.95%.

Table 1 The overall characteristics of included studies

Details of the prediction models

Details of the prediction models are presented in Table 2. If a study developed more than one prediction model using different approaches, only the model with the best prediction performance was included; however, models developed separately for different subgroups were all included. Shao and Wu [22] developed separate prediction models for urban and rural older adults. Therefore, we finally included 17 prediction models.

Table 2 Details of included prediction models

a) Model development method

Ten models [13, 15, 16, 21, 22, 24,25,26,27] were developed using logistic regression analysis, three models [18, 23, 28] were developed using Cox proportional hazards regression, one model [19] was developed using Fine and Gray competing risk regression, and three models were developed using machine learning methods (decision tree [17], Bayesian network model [14], and random forest model [20]).

b) Predictors

Age (present in 16 models), chronic disease (16), gender (9), self-rated health (5), body mass index (BMI) (5), drinking (5), smoking (4), and education level (4) were the predictors most frequently included in the prediction models.

c) Model validation

Five models [13, 14, 22, 23] were not validated. Eleven models were only internally validated: four [15, 17, 18, 20] were validated using a random splitting method, four [19, 25, 27, 28] using the bootstrap method, one [16] using internal-external cross-validation, and two [21, 24] using three-fold cross-validation. One model [26] underwent both internal and external validation.

d) Model performance

The C statistic/AUC of the 17 included models ranged from 0.650 to 0.853, and nine models [17, 20,21,22,23,24, 27, 28] had a C statistic/AUC above 0.75, indicating acceptable discrimination [36]. Calibration was not evaluated in seven models [14, 15, 17, 20, 21, 23, 27]; calibration plots were reported for six models [13, 18, 19, 25, 26, 28], HL test results for four models [13, 18, 22], intercept and slope for one model [16], and the Brier score for one model [24]. The calibration of the models that reported relevant measures was generally good.
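For readers unfamiliar with the C statistic, it equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, which for a binary outcome coincides with the AUC. A minimal pure-Python illustration (the data values are made up for demonstration):

```python
# Illustrative computation of the C statistic (concordance probability).

def c_statistic(scores, outcomes):
    """Proportion of case/non-case pairs in which the case has the higher
    predicted score; ties count as 0.5. outcomes: 1 = case, 0 = non-case."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0
            elif c == d:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Perfect separation of cases from non-cases yields a C statistic of 1.0:
# c_statistic([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]) -> 1.0
```

A value of 0.5 corresponds to chance-level discrimination, which is why thresholds such as 0.61 and 0.75 are used to grade acceptability.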

e) Model presentation

Four models [13, 22, 25] were presented as logistic regression results, four [15, 16, 18, 21] as risk scores, and three [26,27,28] as nomograms. Other presentation forms included a web-based evaluation tool [19], a decision tree model [17], a risk assessment scale [23], a summary plot of the optimal model by Shapley additive explanations (SHAP) [24], a Bayesian network model [14], and a random forest model [20].

Risk of bias, applicability, and clinical value of included studies

Table 3 shows the risk of bias and applicability assessment results of included studies. All 16 studies were rated as high risk of bias, 10 studies [15,16,17,18,19,20,21, 23, 24, 28] were rated as high concerns for applicability, and six studies [13, 14, 22, 25,26,27] were rated as low concerns for applicability.

Table 3 PROBAST results of the included studies

In the domain of participants, one study [17] was at high risk of bias because the three regions included different age groups of elderly people, so the participants could not adequately represent any one age group. Ten studies were rated as of high concern for applicability for the following reasons: five studies [15, 18, 21, 23, 28] included people aged 65 years and older, one [16] included people aged 65 to 75 years, one [19] included people aged 70 years and older, one [17] did not include all people aged 60 years and older, one [24] included hypertensive patients aged 65 years and older, and one [20] included hospitalized patients; none of these fully reflected the target population of this systematic review (older adults aged 60 years and older).

In the domain of predictors, one study [16] had a high risk of bias because it drew data from four different cohorts that may have used different methods to define and measure predictors. The risk of bias in five studies [13, 14, 22] was unclear because they did not state that predictors were assessed without knowledge of the outcome data. One study [18] was rated as of high concern for applicability because of concerns about the timing of predictor measurement.

In the domain of outcome, the risk of bias mainly came from three aspects: predictors were not excluded from the definition of the outcome in some studies [14, 17, 19,20,21, 23, 28]; some studies [13,14,15,16, 18, 19, 22, 24] did not indicate that outcomes were judged without knowledge of the predictors; and the time interval between the measurement of predictors and outcomes was unclear in one study [18]. All studies were of low concern for applicability in this domain.

In the domain of analysis, all 16 studies were at high risk of bias for the following reasons. The sample size of nine studies [13, 16,17,18, 20,21,22, 24, 26] was insufficient. In nine studies [13,14,15, 19, 22, 23, 25,26,27], continuous variables were converted into categorical variables without a reasonable basis. Five studies [13, 14, 22, 23, 27] did not mention or did not appropriately address missing values. Four studies [13, 14, 22, 27] screened predictors based on univariate analysis. Some studies failed to properly consider and deal with complexities in the data, such as censoring. In seven studies [15, 17, 19, 20, 25, 26, 28], participants with missing data were simply excluded. Seven studies [14, 15, 17, 20, 21, 23, 27] did not evaluate calibration, while three [16, 22, 24] did not use calibration plots to evaluate it. Seven studies [13,14,15, 17, 18, 20, 22] did not properly account for overfitting and optimism in model performance. In one study [28], the predictors and their weights in the final model did not match the results of the multivariate analysis, while another study [20] could not be judged because only the top five predictors, rather than all of them, were presented.

Table 4 presents the clinical value assessment results of the included studies. Among the 17 models, seven [13, 14, 22, 25,26,27] had low applicability concerns, and 10 [13, 16, 18, 19, 22, 24,25,26, 28] provided complete reporting of model performance measures. Calibration was acceptable for 10 models [13, 16, 18, 19, 22, 24,25,26, 28], while seven models [14, 15, 17, 20, 21, 23, 27] did not report calibration measures, making their calibration impossible to assess. All 17 models had a C statistic/AUC greater than 0.61. For seven models [13,14,15, 18, 19, 26, 27], the risk of bias had a minor impact. Based on the five-step evaluation method, the two models with the highest clinical value were:

Table 4 Clinical value assessment results of the included studies

Ai et al. (2021) [13], predicting ADL dependence in individuals aged 60 and above, with acceptable discrimination (AUC = 0.742) and calibration (HL test: χ² = 6.746, P = 0.564; calibration plot close to the 45-degree line).

Zhang et al. (2021) [26], predicting IADL dependence in individuals aged 60 and above, with acceptable discrimination (C index = 0.737) and calibration (calibration plot close to the 45-degree line).

Discussion

Early identification of groups at high risk of disability plays a very important role in preventing the occurrence and progression of disability in the elderly. At present, systematic reviews in this field mainly cover predictors [43, 46, 47] and prognostic factors [48], the relationship between frailty and disability and its predictive effect [49,50,51], and the predictive performance of morbidity [52] and physical performance measures [53]. As far as we know, this is the first systematic review of risk prediction models for disability in older adults, and its results may provide a relatively comprehensive and valuable reference for the prediction and prevention of disability in this population. A total of 16 studies and 17 prediction models were included, comprising 11 prognostic prediction models and six diagnostic prediction models. Five models were not validated, 11 were only validated internally, and only one was validated both internally and externally. The C statistic/AUC of the 17 included models ranged from 0.650 to 0.853, and nine models had a C statistic/AUC above 0.75, indicating acceptable discrimination [36]. However, the evaluation of calibration was insufficient: seven models did not evaluate calibration at all, and 11 did not use calibration plots. In terms of research quality, all 16 studies were rated as at high risk of bias, and 10 were rated as of high concern for applicability. In addition, only two studies [16, 24] reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [41]. After evaluation using the five-step method, only two models met the clinical value criteria. This indicates that the quality and clinical value of current studies are generally poor, and that there is great room for improvement in research methodology and transparency of reporting.

Through PROBAST, we can draw lessons from current studies. In the domain of participants, multicenter studies should apply the same inclusion criteria at all sites to improve the representativeness of the research subjects. In the domain of predictors, the definition, measurement method, and measurement time should be clarified, and predictor data should be collected without knowledge of the outcome data. In the domain of outcome, the definition, measurement method, and measurement time likewise need to be clearly defined; predictors should be excluded from the definition of the outcome, and the outcome should be measured without knowledge of the predictors. In the domain of analysis, the sample size should be calculated based on the number of events per variable (EPV). Generally, the EPV needs to be at least 10–20, and prediction models established using machine learning techniques typically require a much higher EPV (often greater than 200) [32]. When validating a model, at least 100 subjects with outcome events should be included [32]. Several studies [17, 20,21,22, 24] had markedly insufficient sample sizes. Furthermore, a prominent issue is that many studies converted continuous variables into categorical variables, which can lead to information loss and may significantly lower predictive power [32]. The handling of missing values should also be taken seriously: some studies addressed missing values inappropriately or did not mention how they were handled. Some studies used univariate analysis to screen variables, which may result in inappropriate selection because variables are screened on the statistical significance of a single factor without taking other predictors into account [32].
In terms of considering and addressing complexities in the data, some studies directly excluded subjects with missing values without properly handling censored data, which can produce a selective dataset and bias the predicted risks [32]. Some studies failed to properly address overfitting and optimism in model performance owing to insufficient sample size, internal validation using randomly split data, or no validation at all. Another issue to note is that the prevalence varied greatly among studies (4.14–65.95%), possibly due to the small sample sizes of some studies, inconsistent criteria for outcome indicators, and differences in participants (age, region, disease, etc.).
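The EPV rule of thumb mentioned above translates into a quick back-of-envelope check: the required number of events is the number of candidate predictor variables times the EPV, and the sample size follows from the expected event rate. A minimal sketch, with illustrative numbers that are not drawn from any included study:

```python
import math

# Back-of-envelope minimum sample size from the EPV rule of thumb.
def min_sample_size(n_predictors, event_rate, epv=10):
    """n_predictors: candidate predictor variables;
    event_rate: expected proportion of participants with the outcome;
    epv: required events per variable (at least 10-20 for regression)."""
    events_needed = n_predictors * epv  # required number of outcome events
    return math.ceil(events_needed / event_rate)

# e.g. 15 candidate predictors at an expected 20% disability incidence
# with EPV = 10 requires 150 events, hence roughly 750 participants.
```

For machine learning models, substituting epv=200 in the same arithmetic shows why such methods demand far larger cohorts.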

There are also many aspects of the included studies worth learning from. In diagnostic prediction model research, prediction is aimed at an existing clinical outcome, so the most appropriate study design is a cross-sectional study [32]; the six included diagnostic prediction models were indeed all based on cross-sectional studies. Correspondingly, all included prognostic prediction models were based on cohort studies. Many studies used large-scale national or international data, which benefits the representativeness of the research subjects. In terms of model development methods, machine learning has become popular in recent years, but it remains unclear which machine learning method has better predictive performance and whether such methods improve on traditional logistic regression. Among the included studies, two [21, 24] compared the predictive performance of different machine learning methods (such as ridge regression, gradient boosting, deep neural networks, random forests, support vector machines, and eXtreme gradient boosting), and one study [22] contrasted the predictive performance of machine learning with that of traditional logistic regression. The results showed that ridge regression and logistic regression had better predictive performance, which offers useful guidance. For handling missing values, two studies [16, 18] used multiple imputation, which is considered the most appropriate approach [32]. Finally, in addition to logistic regression results, the included models were presented in various other forms, including nomograms, risk scores, web-based assessment tools, and decision trees, to make practical use more convenient.

In terms of the clinical value of the models, besides applicability and risk of bias addressed in PROBAST, current research also needs to improve the complete reporting of model performance measures, particularly calibration. Among the 17 models, seven did not report model calibration, which limits the evaluation of their clinical value and makes it difficult to determine if the models are suitable for clinical use.

We can also draw insights from the predictors incorporated into the models. From the results of the included studies, it can be concluded that older age, female gender, low educational level, poor self-rated health, higher BMI, smoking, and chronic diseases such as heart disease, stroke, and diabetes increased the risk of disability in older adults, while alcohol consumption appeared to be a protective factor. The promoting effects of aging, poor self-rated health, and chronic disease on disability are easy to understand: as health declines with age or illness, physical function deteriorates, and with it the ability to complete various daily activities, resulting in varying degrees of disability. Women tend to have lower decision-making power, family economic status, and socio-economic status than men, which leads to poorer access to health services, particularly in rural areas, and is a possible reason why women are more prone to disability [54]. The promoting effect of low educational level on disability can be explained in two ways: on the one hand, elderly people with low educational levels often have weaker health concepts and lower disease awareness; on the other hand, they adapt less readily to social development, making complex daily activities such as using transportation and managing their finances more difficult. A higher BMI can directly affect body dynamics and postural control, thereby limiting bodily function [45]. In addition, obesity induced by a high BMI can affect disability indirectly through its well-known association with chronic diseases [42, 45]. One study [55] showed that not smoking can delay the deterioration of health status in the elderly, which may explain the promoting effect of smoking on disability in older adults.
Research has found that, compared with non-drinkers, elderly people consuming low to moderate amounts of alcohol had better functional status, possibly because moderate drinking can reduce the risk of cardiovascular disease [44]. Nevertheless, the relationship between alcohol consumption and health needs further study in terms of the frequency and quantity of consumption. Based on the above, people should attend to their health early in life, improve disease awareness, adopt a healthy lifestyle, and maintain a normal weight to prevent and delay the onset of disability.

Limitations

This systematic review has the following limitations. First, the vast majority of included studies were conducted in China, the United States, and Japan, so the findings may not be applicable to other countries or regions. Notably, half of the included studies were conducted in China, which may be related to the use of three Chinese literature databases and represents one of the main sources of bias in this review. Second, because of the great heterogeneity of the included studies in terms of participants, predictors, and outcome indicators, meta-analysis could not be performed, so quantitative synthesis results could not be presented. Finally, the inclusion of only Chinese- and English-language studies may have left out eligible studies in other languages.

Conclusion

This systematic review included a total of 16 studies and 17 prediction models. The C statistic/AUC ranged from 0.650 to 0.853, and most of the included prediction models had acceptable discrimination. However, according to PROBAST, all included studies were at high risk of bias, and 10 studies raised high concerns about applicability; the overall quality of current studies is poor. After evaluation, only two models met the standard of clinical value. In the future, therefore, researchers should conduct related studies with larger sample sizes, more rigorous study designs (such as using the same inclusion criteria across sites in multi-center studies, clearly defining and measuring predictors and outcome indicators, and ensuring that predictors and outcomes are each measured without knowledge of the other), and more scientific analysis methods (such as avoiding arbitrary conversion of continuous variables into categorical variables, using multiple imputation to handle missing data, and not relying on univariate analysis for variable selection), while following the TRIPOD statement and the PROBAST checklist, in order to improve the predictive performance and application value of prediction models.