Abstract
Background
The number of prediction models for disability in older adults is increasing, but their predictive performance varies greatly and their quality remains unclear.
Objectives
To systematically review and critically appraise the studies on risk prediction models for disability in older adults.
Methods
A systematic literature search was conducted on PubMed, Embase, Web of Science, Cochrane Library, Cumulative Index to Nursing and Allied Health Literature (CINAHL), China National Knowledge Infrastructure (CNKI), China Science and Technology Journal Database (VIP), and Wanfang Database, published up until June 30, 2023. Data were extracted according to the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess the risk of bias and applicability of the included studies. In addition, all included studies were evaluated for clinical value.
Results
A total of 5722 articles were initially retrieved from databases; after screening, 16 studies describing 17 prediction models were included. The sample sizes of studies ranged from 420 to 90,889. Model development methods mainly included logistic regression analysis, Cox proportional hazards regression, and machine learning methods. The C statistic or area under the curve (AUC) of models ranged from 0.650 to 0.853, and nine models had C statistic/AUC higher than 0.75. Age, chronic disease, gender, self-rated health, body mass index (BMI), drinking, smoking and education level were the most common predictors. According to the PROBAST, all included studies were at high risk of bias, and 10 studies raised high concerns for applicability. Only two studies reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement. After evaluation, only two models reached the standard of clinical value.
Conclusion
Although most of the included prediction models had acceptable discrimination, the overall quality and clinical value of the current studies were poor. In the future, researchers should follow the TRIPOD statement and PROBAST checklist to develop prediction models with larger sample sizes, more reasonable study designs, and more scientific analysis methods, to improve the predictive performance and application value.
Trial registration
The review protocol was registered in PROSPERO (registration ID: CRD42023446657).
Background
The world’s population is aging rapidly. According to data from the World Health Organization [1], one sixth of the world’s population will be aged 60 years or above by 2030, and the world’s population aged 60 years and older will double to 2.1 billion by 2050. According to the International Classification of Functioning, Disability and Health (ICF), ‘disability is a decrement in functioning at the body, individual or social level that arises when an individual with a health condition encounters barriers in the environment’ [2]. Disability is typically assessed by whether a person can independently perform activities of daily living (ADLs). ADLs are divided into Basic Activities of Daily Living (BADLs) and Instrumental Activities of Daily Living (IADLs). BADLs refer to essential, repetitive daily activities necessary for survival and living, including bathing, dressing, toileting, transferring, continence, and eating [3]. IADLs refer to activities necessary for an individual to maintain independent living, including cooking, housekeeping, shopping, medication management, using transportation, and managing finances [4]. Due to the acceleration of aging and changes in the disease spectrum, disability has become an increasingly severe public health problem. In the United States, 41.7% of people aged 65 years and above reported one or more disabilities [5]. The number of disabled older people in China has exceeded 40 million, accounting for 18.3% of the total elderly population [6], and is expected to exceed 77.65 million by 2030 [7]. Disability in older adults can reduce independence and quality of life, increase mortality, and place a serious care and economic burden on families and society [8,9,10,11]. Therefore, identifying older adults at high risk of disability early and taking preventive measures is of great significance for improving their quality of life and reducing their mortality.
Risk prediction models use multiple predictors to calculate the likelihood that an outcome event will occur for an individual, thereby identifying people at high risk of a disease [12]. In recent years, researchers have begun to develop risk prediction models for disability in older adults [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. However, the predictive performance of these models varies greatly, and their quality is still unclear. To our knowledge, there has not yet been a systematic review of risk prediction models for disability in older adults. This study aims to provide such a review and to systematically assess the risk of bias, applicability, and clinical value of each model, in order to provide a reference for subsequent research and for the prevention of disability in older adults.
Methods
The review protocol was registered in PROSPERO (registration ID: CRD42023446657).
Search strategy
The databases searched included PubMed, Embase, Web of Science, Cochrane Library, Cumulative Index to Nursing and Allied Health Literature (CINAHL), China National Knowledge Infrastructure (CNKI), China Science and Technology Journal Database (VIP), and Wanfang Database, published up until June 30, 2023. The medical subject heading (MeSH) function of PubMed was used to obtain the synonyms of the search terms, and the combination of subject headings and free words was used for logical retrieval. The search terms included ‘elder*/old/older/aging/aged’, ‘disabilit*/disabled/immobility/functional decline/confined activity/activity limitation/limitation of motion/activit* of daily living/ADL’, and ‘predict*/model/score/prognos*/nomogram/risk prediction/risk stratification/risk score/risk assessment’. Specific search strategies for each database were described in Supplementary Material 1.
The PICOTS system [29] was used to clarify the purpose, search strategy, and inclusion and exclusion criteria of this systematic review. The PICOTS for this systematic review are as follows:
- P (Population): Older adults aged 60 years and older.
- I (Intervention model): Prediction models for disability in older adults.
- C (Comparator): No competing model.
- O (Outcome): As introduced in the introduction, disability refers to a decline in function at the body, individual, or social level [2]. The occurrence and extent of disability are often assessed using ADL indices. Commonly used scales include Katz’s scale [3] for BADL and Lawton and Brody’s scale [4] for IADL.
- T (Timing): Outcomes were predicted based on current (diagnostic model) or baseline (prognostic model) characteristics of the subjects.
- S (Setting): The expected use of the prediction model is to predict the risk of elderly disability individually and accurately, and to provide a reference for preventive measures to reduce disability in the elderly.
Inclusion and exclusion criteria
The inclusion criteria were: (1) participants were older adults aged 60 years and older who were without disability at baseline (the latter applying only to cohort studies); (2) observational study designs, including cross-sectional, case-control and cohort studies; (3) a risk prediction model was reported; (4) the outcome was disability.
The exclusion criteria were: (1) no risk prediction model was developed; (2) the primary outcome measure was not disability; (3) in cohort studies, participants were disabled at baseline; (4) not written in Chinese or English; (5) conference abstracts or literature for which the full text was not available.
Study selection and screening
After removing the duplicate literature, according to the inclusion and exclusion criteria, we first screened the articles by title and abstract, and then read the full text for further screening. In addition, we also reviewed the references of included studies and, if necessary, conducted manual searches to identify other potential literature that met the criteria. Literature screening was performed independently by two researchers (JZ and DY), and any disagreements were resolved by discussion or consultation with a third researcher (QZ).
Data extraction
According to the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [30], two reviewers (JZ and YX) independently performed data extraction, and any disagreements were resolved by discussion or consultation with a third reviewer (SD). If necessary, the authors of included studies were contacted to obtain more data.
The extracted information was divided into two parts: (1) Basic information: author/year, country, research design, prediction model type, participants, data source, outcome, outcome definition and cases/sample size (%). The prediction model types included diagnostic prediction models and prognostic prediction models. The key distinction between the two lies in the timing of the outcome relative to the prediction. Diagnostic model studies are typically cross-sectional, predicting the likelihood of an individual having a particular disease at a given point in time [31]. In contrast, prognostic model studies are generally longitudinal, such as retrospective and prospective cohort studies, predicting the probability of an individual experiencing a certain outcome in the future [31]. (2) Detailed information: methods for handling missing data, continuous variable handling details, variable selection methods, model development methods, number and type of candidate predictors, final predictors, reported discrimination measures, reported calibration measures, validation method and model presentation.
Risk of bias, applicability, and clinical value assessment
Two reviewers (JZ and YX) independently assessed the risk of bias and applicability of included studies using the Prediction Model Risk of Bias Assessment Tool (PROBAST) [31, 32], and a structured five-step approach was used to determine which models were considered clinically valuable for practitioners [33]. Any disagreements were resolved by discussion or consultation with a third reviewer (SD).
PROBAST is a tool for assessing the risk of bias in studies of individual prognostic or diagnostic multivariable prediction models. It comprises four domains, namely participants (2 items), predictors (3 items), outcome (6 items), and analysis (9 items), with a total of 20 items. Each item can be rated as yes (Y), probably yes (PY), no (N), probably no (PN), or no information (NI). Risk of bias was assessed in all four domains. A domain was considered low risk of bias only if every item was rated as Y or PY. If any item was rated as N or PN, the domain was rated as high risk of bias. If at least one item was rated as NI and the others as Y or PY, the domain was rated as unclear risk of bias. The overall risk of bias was judged to be low if all four domains were low risk of bias, and high if one or more of the four domains were high risk of bias. If at least one domain was judged to be unclear risk of bias and all other domains were low risk of bias, the overall risk of bias was judged to be unclear. The applicability assessment involved the first three domains, and the assessment rules were similar to those for risk of bias.
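The rating rules described above can be written down as a short Python sketch. It encodes only the rules stated in this section; the function names and the example answers are hypothetical and chosen for illustration, and the sketch does not replace reviewer judgement.

```python
from typing import Iterable

VALID_ANSWERS = {"Y", "PY", "N", "PN", "NI"}


def rate_domain(answers: Iterable[str]) -> str:
    """Rate one PROBAST domain from its item answers (Y, PY, N, PN, NI)."""
    answers = [a.strip().upper() for a in answers]
    unknown = set(answers) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"Unrecognised answers: {sorted(unknown)}")
    if any(a in {"N", "PN"} for a in answers):
        return "high"      # a single N or PN makes the domain high risk
    if "NI" in answers:
        return "unclear"   # no N/PN, but at least one NI
    return "low"           # every item is Y or PY


def rate_overall(domain_ratings: Iterable[str]) -> str:
    """Combine the four domain ratings into an overall judgement."""
    ratings = list(domain_ratings)
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


# Hypothetical study: the answers below are invented for illustration.
study = {
    "participants": ["Y", "PY"],
    "predictors": ["Y", "NI", "Y"],
    "outcome": ["Y", "Y", "PY", "Y", "Y", "Y"],
    "analysis": ["N", "Y", "Y", "PN", "Y", "Y", "Y", "NI", "Y"],
}
domains = {name: rate_domain(items) for name, items in study.items()}
overall = rate_overall(domains.values())
print(domains, overall)
```

In this invented example the analysis domain contains N and PN answers, so the overall judgement is high risk of bias regardless of the other domains.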
PROBAST can assess the risk of bias and applicability of studies, but it has limited value in selecting effective tools for clinical practice [33]. Therefore, we conducted a clinical value assessment of the included studies. According to the relevant literature [33], a model must meet the following five criteria to demonstrate clinical value:
1. Low applicability concerns.
2. Complete reporting of model performance measures.
3. Acceptable calibration (Hosmer-Lemeshow (HL) test: P > 0.05, calibration slope = 1, calibration intercept = 0, and/or calibration plot considered acceptable) [34, 35].
4. Acceptable discrimination (C statistic or area under the curve (AUC) between 0.61 and 0.75, preferably greater than 0.75) [36].
5. Minor impact of bias risk on the model [37].
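The five criteria can be read as a simple all-or-nothing checklist. The following Python sketch illustrates that reading; the class name, field names and example values are hypothetical, and the discrimination threshold of 0.61 is taken as the lower bound of the acceptable range stated in criterion 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelAssessment:
    low_applicability_concern: bool
    complete_performance_reporting: bool
    acceptable_calibration: bool
    c_statistic: float
    minor_bias_impact: bool


def failed_criteria(m: ModelAssessment, min_c: float = 0.61) -> List[str]:
    """Return the names of the criteria that the model does not meet."""
    checks = [
        ("low applicability concerns", m.low_applicability_concern),
        ("complete performance reporting", m.complete_performance_reporting),
        ("acceptable calibration", m.acceptable_calibration),
        ("acceptable discrimination", m.c_statistic >= min_c),
        ("minor impact of bias risk", m.minor_bias_impact),
    ]
    return [name for name, ok in checks if not ok]


def has_clinical_value(m: ModelAssessment, min_c: float = 0.61) -> bool:
    """A model shows clinical value only if all five criteria are met."""
    return not failed_criteria(m, min_c)


# Two invented models, for illustration only.
model_a = ModelAssessment(True, True, True, 0.74, True)
model_b = ModelAssessment(False, True, False, 0.80, True)
print(has_clinical_value(model_a), failed_criteria(model_b))
```

Note that a high C statistic alone is not sufficient under this scheme: the second invented model discriminates well but fails on applicability and calibration.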
Data synthesis
Due to the large heterogeneity of included studies, meta-analysis was not appropriate. Therefore, this study adopted a narrative synthesis to analyze, summarize and compare the included studies. We synthesized the key features of included studies and prediction models in a tabular form, including two parts: (1) Basic information: author/year, country, research design, prediction model type, participants, data source, outcome, outcome definition and cases/sample size (%). (2) Detailed information: methods for handling missing data, continuous variable handling details, variable selection methods, model development methods, number and type of candidate predictors, final predictors, reported discrimination measures, reported calibration measures, validation method and model presentation. The risk of bias and applicability assessment results were presented in tabular form. In addition, the above information was further elaborated and analyzed in the text.
Results
Search and study selection results
According to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 [38], Fig. 1 shows the flow chart of literature screening. A total of 5722 articles were initially retrieved from databases, and 4465 articles were screened after 1257 duplicates were removed. After reading the title and abstract for the initial screening, 71 articles remained. After reading the full text, 55 studies were excluded for the following reasons: 26 studies did not develop prediction models, 14 studies did not involve older adults aged 60 years and older, four studies did not have an outcome measure of disability, 10 cohort studies had participants with disability at baseline, and one study was not an observational study. Finally, a total of 16 studies were included in this systematic review.
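The screening counts reported above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in this paragraph; the variable names are illustrative.

```python
retrieved = 5722
duplicates_removed = 1257
remaining_after_title_abstract = 71

# Reasons for exclusion at the full-text stage, as reported in the text.
full_text_exclusions = {
    "no prediction model developed": 26,
    "participants not aged 60 years and older": 14,
    "outcome was not disability": 4,
    "cohort participants disabled at baseline": 10,
    "not an observational study": 1,
}

screened = retrieved - duplicates_removed
excluded_full_text = sum(full_text_exclusions.values())
included = remaining_after_title_abstract - excluded_full_text
print(screened, excluded_full_text, included)
```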
Study characteristics
Overall characteristics
The overall characteristics of included studies are shown in Table 1. The 16 studies were published between 2012 and 2023, of which nine studies [13, 14, 20, 22, 24,25,26,27,28] were conducted in China, three studies [15, 18, 19] were conducted in the United States, three studies [17, 21, 23] were conducted in Japan, and one study [16] was conducted in Germany, the United Kingdom, Italy, and the Netherlands. Nine studies [15, 16, 18, 19, 21, 23,24,25,26] were retrospective cohort designs, two studies [17, 28] were prospective cohort designs, and five studies [13, 14, 20, 22, 27] were cross-sectional studies. In terms of data sources, 13 studies [14,15,16,17,18,19, 21,22,23,24,25,26, 28] used national or international large-scale data, two studies [13, 27] used multi-center data from multiple communities, and one study [20] used single-center data from a hospital. In terms of model types, 11 models [15,16,17,18,19, 21, 23,24,25,26, 28] were prognostic prediction models and six models [13, 14, 20, 22, 27] were diagnostic prediction models. In terms of subjects, seven studies [13, 14, 17, 22, 25,26,27] included older adults aged 60 years and older, five studies [15, 18, 21, 23, 28] included older adults aged 65 years and older, one study [19] included older adults aged 70 years and older, one study [16] included older adults aged 65 to 75 years, one study [20] included inpatients aged 60 years and older, and one study [24] included hypertensive patients aged 65 years and older. Among the 16 studies included, seven studies [14, 15, 18, 19, 24, 25, 28] evaluated disability using BADL, one [26] evaluated disability using IADL, and three [13, 16, 22] evaluated disability using a combination of BADL and IADL; the specific evaluation methods are shown in Table 1. Three studies [17, 21, 23] conducted in Japan used Japanese long-term care insurance (LTCI) certification to determine whether participants were disabled.
One study [27] evaluated disability using a self-reported health survey scale developed by WHO [39]. Another study [20] evaluated disability using the ability assessment for older adults standards issued by the Ministry of Civil Affairs of the People’s Republic of China [40]. The sample sizes of studies ranged from 420 to 90,889, and incidence rates ranged from 4.14% to 65.95%.
Details of the prediction models
Details of the prediction models are presented in Table 2. If a study developed more than one prediction model using different approaches, the model with the best prediction performance was included. However, different prediction models developed for different subgroups were included. Shao and Wu [22] developed prediction models separately for urban and rural older adults. Therefore, we finally included 17 prediction models.
a) Model development method
Ten models [13, 15, 16, 21, 22, 24,25,26,27] were developed using logistic regression analysis, three models [18, 23, 28] were developed using Cox proportional hazards regression, one model [19] was developed using Fine and Gray competing risk regression, and three models were developed using machine learning methods (decision tree [17], Bayesian network model [14], and random forest model [20]).
b) Predictors
Age (n = 16), chronic disease (n = 16), gender (n = 9), self-rated health (n = 5), body mass index (BMI) (n = 5), drinking (n = 5), smoking (n = 4) and education level (n = 4) were the predictors most frequently present in the prediction models.
c) Model validation
Five models [13, 14, 22, 23] were not validated. Eleven models were only internally validated, of which four models [15, 17, 18, 20] were validated using a random splitting method, four models [19, 25, 27, 28] were validated by the bootstrap method, one model [16] underwent internal-external cross-validation, and two models [21, 24] underwent three-fold cross-validation. One model [26] underwent internal and external validation.
d) Model performance
The C statistic/AUC of the 17 included models ranged from 0.650 to 0.853, and nine models [17, 20,21,22,23,24, 27, 28] had a C statistic/AUC above 0.75, indicating that discrimination was acceptable [36]. Calibration was not evaluated in seven models [14, 15, 17, 20, 21, 23, 27], calibration plots were reported in six models [13, 18, 19, 25, 26, 28], HL test results were reported in four models [13, 18, 22], intercept and slope were reported in one model [16], and the Brier score was reported in one model [24]. The calibration of models that reported relevant measures was generally good.
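For readers less familiar with these measures, the following Python sketch shows how a C statistic (for a binary outcome, equal to the AUC) and a Brier score can be computed from observed outcomes and predicted probabilities. It is a minimal illustration on invented data, not the procedure used by any included study, and the function names are hypothetical.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Proportion of (event, non-event) pairs in which the event has the
    higher predicted risk; ties count as half a concordant pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("Both outcome classes must be present")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Invented data: 1 = disabled, 0 = not disabled.
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(c_statistic(y, p), brier_score(y, p))
```

The C statistic only reflects how well the model ranks individuals, whereas the Brier score also depends on how close the predicted probabilities are to the observed outcomes, which is one reason discrimination and calibration are reported separately.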
e) Model presentation
Four models [13, 22, 25] were presented as logistic regression results, four models [15, 16, 18, 21] were presented as risk scores, and three models [26,27,28] were presented as nomograms. Other presentation forms included a web evaluation tool [19], a decision tree model [17], a risk assessment scale [23], a summary plot of the optimal model based on Shapley additive explanations (SHAP) [24], a Bayesian network model [14] and a random forest model [20].
Risk of bias, applicability, and clinical value of included studies
Table 3 shows the risk of bias and applicability assessment results of included studies. All 16 studies were rated as high risk of bias, 10 studies [15,16,17,18,19,20,21, 23, 24, 28] were rated as high concerns for applicability, and six studies [13, 14, 22, 25,26,27] were rated as low concerns for applicability.
In the domain of participants, one study [17] was at high risk of bias because the age groups of older adults included differed across its three regions, so the participants could not adequately represent any single age group. Ten studies were rated as high concern for applicability for the following reasons: five studies [15, 18, 21, 23, 28] included people aged 65 years and older, one study [16] included people aged 65 to 75 years, one study [19] included people aged 70 years and older, one study [17] did not include all people aged 60 years and older, one study [24] included hypertensive patients aged 65 years and older, and one study [20] included hospitalized patients; none of these fully reflected the target population of this systematic review (older adults aged 60 years and older).
In the domain of predictors, one study [16] had a high risk of bias because it drew data from four different cohorts, which may have used different methods to define and measure predictors. The risk of bias in five studies [13, 14, 22] was unclear because they did not state that predictors were assessed without knowledge of the outcome data. One study [18] was rated as high concern for applicability because of concerns about the timing of predictor measurement.
In the domain of outcome, the risk of bias mainly came from three sources: predictors were not excluded from the outcome definition [14, 17, 19,20,21, 23, 28], studies did not indicate that outcomes were determined without knowledge of predictor information [13,14,15,16, 18, 19, 22, 24], and the time interval between the measurement of predictors and outcomes was unclear in one study [18]. All studies were at low concern for applicability in this domain.
In the domain of analysis, all 16 studies were at high risk of bias for the following reasons. The sample size of nine studies [13, 16,17,18, 20,21,22, 24, 26] was insufficient. In nine studies [13,14,15, 19, 22, 23, 25,26,27], continuous variables were converted into categorical variables without a reasonable basis. Five studies [13, 14, 22, 23, 27] did not mention missing values or did not address them appropriately. Four studies [13, 14, 22, 27] screened predictors based on univariate analysis. Some studies failed to properly consider and deal with complexities in the data, such as censored data. Seven studies [15, 17, 19, 20, 25, 26, 28] directly excluded participants with missing data. Seven studies [14, 15, 17, 20, 21, 23, 27] did not evaluate calibration, while three studies [16, 22, 24] did not use calibration plots to evaluate calibration. Seven studies [13,14,15, 17, 18, 20, 22] did not properly consider overfitting and optimism in model performance. The predictors and their assigned weights in the final model of one study [28] did not match the results of the multivariate analysis, while another study [20] could not be judged because only the top five predictors, rather than all predictors, were presented.
Table 4 presents the clinical value assessment results of the included studies. Among the 17 models, seven [13, 14, 22, 25,26,27] had low applicability concerns, and 10 [13, 16, 18, 19, 22, 24,25,26, 28] provided complete reporting of model performance measures. Calibration was acceptable for 10 models [2, 13, 16, 18, 19, 21, 22, 24,25,26, 28, 41,42,43,44,45], while seven models [14, 15, 17, 20, 21, 23, 27] did not report calibration measures, making it impossible to assess. All 17 models had C statistic/AUC greater than 0.61. For seven models [13,14,15, 18, 19, 26, 27], risk of bias had minor impact. Based on the five-step evaluation method, the two models with the highest clinical value were:
Ai et al. (2021) [13], predicting ADL dependence in individuals aged 60 and above, with acceptable discrimination (AUC = 0.742) and calibration (HL test: χ² statistic = 6.746 (P = 0.564); calibration plot close to the 45-degree line).
Zhang et al. (2021) [26], predicting IADL dependence in individuals aged 60 and above, with acceptable discrimination (C index = 0.737) and calibration (calibration plot close to the 45-degree line).
Discussion
Early identification of groups at high risk of disability plays a very important role in preventing the occurrence and progression of disability in older adults. At present, the topics of systematic reviews in this field mainly include the predictors [43, 46, 47] and prognostic factors [48], the relationship between frailty and disability and its predictive effect [49,50,51], and the predictive performance of morbidity [52] and physical performance measures [53]. As far as we know, this is the first systematic review of risk prediction models for disability in older adults. The results of this study may provide a relatively comprehensive and valuable reference for the prediction and prevention of disability in older adults. A total of 16 studies and 17 prediction models were included in this study, including 11 prognostic prediction models and six diagnostic prediction models. Five models were not validated, 11 models were only validated internally, and only one model was validated both internally and externally. The C statistic/AUC of the 17 included models ranged from 0.650 to 0.853, and nine models had a C statistic/AUC above 0.75, indicating that discrimination was acceptable [36]. However, the evaluation of calibration was insufficient: seven models did not evaluate calibration, and 11 models did not use calibration plots to evaluate it. In terms of research quality, all 16 studies were rated as high risk of bias, and 10 studies were rated as high concern for applicability. In addition, only two studies [16, 24] reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [41]. After evaluation with the five-step method, only two models met the clinical value standards. This indicates that the quality and clinical value of current studies are generally poor, and there is great room for improvement in research methodology and transparency of reporting.
Through PROBAST, we can draw lessons and experience from current studies. In the domain of participants, for multicenter studies, the same criteria should be used to include participants in order to improve the representation of research subjects. In the domain of predictors, the definition, measurement method and measurement time should be clarified, and it should be noted that the data of predictors should be collected without knowing the outcome data. In the domain of outcome, the definition, measurement method, and measurement time need to be clearly defined, and attention should be paid to exclude predictors from the definition of outcome, and outcome measurement should be conducted without knowing the information of predictors. In the domain of analysis, the sample size should be calculated based on the number of events per variable (EPV). Generally, EPV needs to be at least 10–20, and prediction models established using machine learning techniques typically require a relatively high EPV (often greater than 200) [32]. When validating the model, at least 100 research subjects with outcome events should be included [32]. Several studies [17, 20,21,22, 24] have significantly insufficient sample sizes. Furthermore, a prominent issue is that many studies convert continuous variables into categorical variables, which can lead to information loss and may significantly lower predictive power [32]. The handling of missing values should also be taken seriously. Some studies inappropriately addressed or did not mention how to deal with missing values. Some studies used univariate analysis to screen variables, which may result in inappropriate selection because they are screened based on the statistical significance of a single independent factor without taking into account other predictors [32]. 
In terms of considering and addressing complex issues that arose in the data, some studies directly excluded research subjects with missing values without properly handling the censored data, which can generate a selective dataset and lead to bias in predicting risk [32]. Some studies failed to properly address overfitting and optimistic biases of model performance due to insufficient sample size, internal validation using randomly split data, or not validating models. Another issue to note is that the prevalence varied greatly among different studies (4.14–65.95%), possibly due to the small sample sizes of some studies, inconsistent criteria for outcome indicators, and differences in participants (age, region, disease, etc.).
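The EPV rule of thumb discussed above can be illustrated with a short Python sketch. The thresholds are the ones quoted in the text (at least 10 for regression-based models, more than 200 for machine learning methods); the function names and the example numbers are hypothetical, and the denominator is assumed to be the number of candidate predictor parameters rather than the number of predictors retained in the final model.

```python
def events_per_variable(n_events: int, n_candidate_parameters: int) -> float:
    """EPV = number of outcome events / number of candidate predictor parameters."""
    if n_candidate_parameters <= 0:
        raise ValueError("At least one candidate predictor parameter is required")
    return n_events / n_candidate_parameters


def epv_is_adequate(epv: float, machine_learning: bool = False) -> bool:
    """Apply the rule-of-thumb thresholds quoted in the text."""
    if machine_learning:
        return epv > 200   # machine learning: EPV often needs to exceed 200
    return epv >= 10       # regression: lower bound of the 10-20 range


# Invented example: 90 participants with disability, 12 candidate parameters.
epv = events_per_variable(n_events=90, n_candidate_parameters=12)
print(epv, epv_is_adequate(epv), epv_is_adequate(epv, machine_learning=True))
```

In this invented example the EPV is 7.5, which falls below the lower bound for regression-based models and far below the level suggested for machine learning methods.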
The included studies also offer many points worth learning from. In diagnostic prediction model research, prediction is aimed at an existing clinical outcome, so the most appropriate study design is a cross-sectional study [32]. The six included diagnostic prediction models were all based on cross-sectional studies. Correspondingly, all included prognostic prediction models were based on cohort studies. Many studies used large-scale national or international data, which benefits the representativeness of research subjects. In terms of model development methods, machine learning methods have become popular in recent years, but it is still unclear which machine learning method has better predictive performance and whether these methods improve predictive performance compared with traditional logistic regression. Among the included studies, two studies [21, 24] compared the predictive performance of different machine learning methods (such as ridge regression, gradient boosting, deep neural network, random forest, support vector machine, eXtreme gradient boosting, etc.), and one study [22] compared the predictive performance of machine learning with that of traditional logistic regression. The results showed that ridge regression and logistic regression had better predictive performance, which is instructive for future model development. For handling missing values, two studies [16, 18] used multiple imputation, which is considered the most appropriate approach [32]. Finally, in addition to logistic regression results, the included models were presented in various other forms, including nomograms, risk scores, web assessment tools, and decision trees, to make practical use more convenient.
In terms of the clinical value of the models, besides applicability and risk of bias addressed in PROBAST, current research also needs to improve the complete reporting of model performance measures, particularly calibration. Among the 17 models, seven did not report model calibration, which limits the evaluation of their clinical value and makes it difficult to determine if the models are suitable for clinical use.
We can also learn some insights from the predictors incorporated into the models. From the results of included studies, it can be concluded that older age, female gender, low educational level, poor self-rated health, higher BMI, smoking, and chronic diseases such as heart disease, stroke, and diabetes increased the risk of disability in older adults, while alcohol consumption appeared to be a protective factor. It is easy to understand the promoting effect of aging, poor self-rated health, and various chronic diseases on disability. When the human body is aging or ill and causes its own health level to decline, the body’s function will decline, and the ability to complete various daily activities will also decline, resulting in varying degrees of disability. Women tend to have lower decision-making power, family economic status and socio-economic status than men, which leads to poorer access to health services, particularly in rural areas, and is a possible reason why women are more prone to disability [54]. The promoting effect of low educational level on disability can be explained from two aspects. On the one hand, elderly people with low educational level often have weaker health concepts and lower disease awareness. On the other hand, their adaptability to social development is relatively weaker and it is more difficult for them to complete complex daily activities such as taking transportation and taking care of their own economy. Higher BMI can directly affect human body dynamics and posture control, thereby limiting bodily function [45]. In addition, obesity induced by a high BMI can affect disability indirectly through its well-known association with chronic diseases [42, 45]. The study [55] showed that not smoking can delay the deterioration of health status in the elderly, which may explain the promoting effect of smoking on disability in older adults. 
Low to moderate alcohol consumption was found to be protective of functional status in older people compared with not drinking, possibly because moderate drinking can reduce the risk of cardiovascular diseases [44]. Nevertheless, the relationship between alcohol consumption and health needs to be further studied in terms of the frequency and quantity of consumption. Based on the above, people should pay attention to health early in life, improve disease awareness, adopt a healthy lifestyle, and maintain a normal weight to prevent and delay the occurrence of disability.
Limitations
This systematic review has the following limitations. Firstly, the vast majority of included studies were conducted in China, the United States, and Japan, so the findings may not be applicable to other countries or regions of the world. Notably, half of the included studies were conducted in China, which may be related to the use of three Chinese literature search databases, representing one of the main sources of bias in this study. Secondly, due to the great heterogeneity of included studies in terms of participants, predictors, outcome indicators, etc., meta-analysis could not be performed, and it is regrettable that quantitative synthesis results could not be presented. Finally, the inclusion of only studies in Chinese and English may have left out some studies in other languages that met the criteria.
Conclusion
This systematic review included a total of 16 studies and 17 prediction models. The C statistic/AUC ranged from 0.650 to 0.853, and most of the included prediction models had acceptable discrimination. However, according to the PROBAST, all included studies were at high risk of bias, and 10 studies raised high concerns for applicability. The overall quality of current studies is poor. After evaluation, only two models reached the standard of clinical value. Future studies should therefore use larger sample sizes, more rigorous study designs (such as applying the same inclusion criteria across centers in multi-center studies, clearly defining and measuring predictors and outcomes, and assessing predictors and outcomes blind to each other), and more appropriate analysis methods (such as avoiding arbitrary conversion of continuous variables to categorical variables, using multiple imputation to handle missing data, and not relying on univariate analysis for variable selection). They should also follow the TRIPOD statement and be consistent with the PROBAST checklist, in order to improve the predictive performance and application value of prediction models.
Availability of data and materials
Data is provided within the manuscript or supplementary information files.
References
World Health Organization. Ageing and health. 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/ageing-and-health.
Svestkova O. International classification of functioning, disability and health of World Health Organization (ICF). Prague Med Rep. 2008;109(4):268–74.
Katz S, Downs TD, Cash HR, Grotz RC. Progress in development of index of ADL. Gerontologist. 1970;10(1):20.
Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist. 1969;9(3P1):179.
Okoro CA, Hollis ND, Cyrus AC, Griffin-Blake S. Prevalence of disabilities and health care access by disability status and type among adults - United States, 2016. MMWR-Morb Mortal Wkly Rep. 2018;67(32):882–7.
The National Working Commission on Aging. The results of the fourth sample survey on the living conditions of the elderly in China's urban and rural areas. 2016. Available from: http://www.cmw-gov.cn/news.view-794-1.html.
Luo YA, Su BB, Zheng XY. Trends and challenges for population and health during population aging - China, 2015–2050. China CDC Wkly. 2021;3(28):593.
Fried LP, Ferrucci L, Darer J, Williamson JD, Anderson G. Untangling the concepts of disability, frailty, and comorbidity: implications for improved targeting and care. J Gerontol Ser A-Biol Sci Med Sci. 2004;59(3):255–63.
Gill TM, Robison JT, Tinetti ME. Difficulty and dependence: two components of the disability continuum among community-living older persons. Ann Intern Med. 1998;128(2):96.
Gobbens RJ. Associations of ADL and IADL disability with physical and mental dimensions of quality of life in people aged 75 years and older. PeerJ. 2018;6:17.
Nascimento CD, de Oliveira C, Firmo JOA, Lima-Costa MF, Peixoto SV. Prognostic value of disability on mortality: 15-year follow-up of the Bambui cohort study of aging. Arch Gerontol Geriatr. 2018;74:112–7.
Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? BMJ-British Med J. 2009;338:7.
Ai Y, Hu H, Wang Y, Wang L, Gao X, Wang Z, et al. Prediction model of activities of daily living ability among the community elderly in Wuhan. J Nurs Sci. 2021;36(24):94–7.
Chen L. Research on Influencing Factors of Elderly Disability Based on Harlem Model. 2021.
Clark DO, Stump TE, Tu WZ, Miller DK. A comparison and cross-validation of models to Predict Basic Activity of Daily Living Dependency in older adults. Med Care. 2012;50(6):534–9.
Jonkman NH, Colpo M, Klenk J, Todd C, Hoekstra T, Del Panta V, et al. Development of a clinical prediction model for the onset of functional decline in people aged 65–75 years: pooled analysis of four European cohort studies. BMC Geriatr. 2019;19:12.
Katayama O, Lee S, Bae S, Makino K, Chiba I, Harada K, et al. A simple algorithm to predict disability in community-dwelling older Japanese adults. Arch Gerontol Geriatr. 2022;103:8.
Kim DH, Newman AB, Lipsitz LA. Prediction of severe, persistent activity-of-daily-living disability in older adults. Am J Epidemiol. 2013;178(7):1085–93.
Lee AK, Diaz-Ramirez LG, Boscardin WJ, Smith AK, Lee SJ. A comprehensive prognostic tool for older adults: Predicting death, ADL disability, and walking disability simultaneously. J Am Geriatr Soc. 2022;70(10):2884–94.
Lu X, Jiang Z, Yuan X, Yang X, Xu L. Exploring the construction of a risk prediction model for elderly disability based on random forest. Med Health. 2022;11:0263–7.
Lu YJ, Sato K, Nagai M, Miyatake H, Kondo K, Kondo N. Machine learning-based prediction of functional disability: a Cohort Study of Japanese older adults in 2013–2019. J Gen Intern Med. 2023;38(11):2486–93.
Shao X, Wu Z. Differences in influencing factors for self-care ability between urban and rural elderly in China. Chin Rural Health Service Adm. 2022;42(6):418–26.
Tsuji T, Kondo K, Kondo N, Aida J, Takagi D. Development of a risk assessment scale predicting incident functional disability among older people: Japan gerontological evaluation study. Geriatr Gerontol Int. 2018;18(10):1433–8.
Xiang CY, Wu YF, Jia MN, Fang Y. Machine learning-based prediction of disability risk in geriatric patients with hypertension for different time intervals. Arch Gerontol Geriatr. 2023;105:7.
Zhang L, Chen YQ, Liu J, Yu YF, Cui HJ, Chen QZ, et al. Novel physical performance-based models for activities of daily living disability prediction among Chinese older community population: a nationally representative survey in China. BMC Geriatr. 2022;22(1):13.
Zhang L, Cui HJ, Chen QZ, Li Y, Yang CX, Yang YF. A web-based dynamic Nomogram for predicting instrumental activities of daily living disability in older adults: a nationally representative survey in China. BMC Geriatr. 2021;21(1):12.
Zhang Y, Yuan H, Jin Y, Yu H. Disability status and its influencing factors of the elderly in Jiading District, Shanghai. Chin J Disease Control Prev. 2022;26(7):784–9.
Zhou JH, Lyu YB, Wei Y, Wang JN, Ye LL, Wu B, et al. Prediction of 6-year risk of activities of daily living disability in elderly aged 65 years and older in China. Zhonghua Yi Xue Za Zhi. 2022;102(2):94–100.
Debray TPA, Damen J, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ-British Med J. 2017;356:11.
Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):12.
Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51.
Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: A tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170(1):W1-33.
Naye F, Décary S, Houle C, LeBlanc A, Cook C, Dugas M, et al. Six externally validated prognostic models have potential clinical value to predict patient health outcomes in the rehabilitation of musculoskeletal conditions: a systematic review. Phys Ther. 2023;103(5):pzad021.
Austin PC, van Klaveren D, Vergouwe Y, Nieboer D, Lee DS, Steyerberg EW. Geographic and temporal validity of prediction models: different approaches were useful to examine model performance. J Clin Epidemiol. 2016;79:76–85.
Crowson CS, Atkinson EJ, Therneau TM. Assessing calibration of prognostic risk scores. Stat Methods Med Res. 2016;25(4):1692–706.
Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, et al. Discrimination and calibration of clinical prediction models users’ guides to the Medical Literature. JAMA-J Am Med Assoc. 2017;318(14):1377–84.
Venema E, Wessler BS, Paulus JK, Salah R, Raman G, Leung LY, et al. Large-scale validation of the prediction model risk of bias assessment tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol. 2021;138:32–9.
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. J Clin Epidemiol. 2021;134:178–89.
Ustun T, Chatterji S, Villanueva M, Bendib L, Murray C. The WHO Multicountry Household Survey Study on Health and Responsiveness 2000–2001. 2003.
Ministry of Civil Affairs of the People's Republic of China. Ability assessment for older adults. 2013.
Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. Ann Intern Med. 2015;162(1):55-U103.
Mehta NK, Patel SA, Ali MK, Narayan KMV. Preventing disability: the influence of modifiable risk factors on state and national disability prevalence. Health Aff. 2017;36(4):626–35.
Moreno-Martin P, Jerez-Roig J, Rierola-Fochs S, Oliveira VR, Farrés-Godayol P, de Souza DLB, et al. Incidence and predictive factors of functional decline in older people living in nursing homes: a systematic review. J Am Med Dir Assoc. 2022;23(11):1815.
Stuck AE, Walthert JM, Nikolaus T, Büla CJ, Hohmann C, Beck JC. Risk factors for functional status decline in community-living elderly people: a systematic literature review. Soc Sci Med. 1999;48(4):445–69.
Yin A. A Nomogram prediction model for the risk of disability incidence in the middle-aged and older adults. 2021.
McCusker J, Kakuma R, Abrahamowicz M. Predictors of functional decline in hospitalized elderly patients: a systematic review. J Gerontol Ser A-Biol Sci Med Sci. 2002;57(9):M569-577.
Hoogerduijn JG, Schuurmans MJ, Duijnstee MSH, de Rooij SE, Grypdonck MFH. A systematic review of predictors and screening instruments to identify older hospitalized patients at risk for functional decline. J Clin Nurs. 2007;16(1):46–57.
Tas U, Verhagen AP, Bierma-Zeinstra SMA, Odding E, Koes BW. Prognostic factors of disability in older people: a systematic review. Br J Gen Pract. 2007;57(537):319–23.
Chang SF, Cheng CL, Lin HC. Frail phenotype and disability prediction in community-dwelling older people: a systematic review and meta-analysis of prospective cohort studies. J Nurs Res. 2019;27(3):10.
Kojima G. Frailty as a predictor of disabilities among community-dwelling older people: a systematic review and meta-analysis. Disabil Rehabil. 2017;39(19):1897–908.
Vermeulen J, Neyens JCL, van Rossum E, Spreeuwenberg MD, de Witte LP. Predicting ADL disability in community-dwelling elderly people using physical frailty indicators: a systematic review. BMC Geriatr. 2011;11:11.
Soh CH, Ul Hassan SW, Sacre J, Lim WK, Maier AB. Do morbidity measures predict the decline of activities of daily living and instrumental activities of daily living amongst older inpatients? A systematic review. Int J Clin Pract. 2021;75(4):8.
Cavanaugh EJ, Richardson J, McCallum CA, Wilhelm M. The predictive validity of physical performance measures in determining markers of preclinical disability in community-dwelling middle-aged and older adults: a systematic review. Phys Ther. 2018;98(12):1010–21.
Liu E, Zhang Q. Study on gender differences of rural disabled elderly and its influence mechanism - based on the 2014 CLHLS Data. Social Secur Stud. 2019;02:49–58.
Haveman-Nies A, de Groot L, van Staveren WA. Dietary quality, lifestyle factors and healthy ageing in Europe: the SENECA study. Age Ageing. 2003;32(4):427–34.
Acknowledgements
Not applicable.
Funding
This research was funded by the Zhejiang Province Major Social Welfare Program Project (2023C03191), the Zhejiang Province Major Social Welfare Program Project (2022C03134), and the Science and Technology Innovation 2030- “New Generation of Artificial Intelligence” Major Project (2022ZD0160703).
Author information
Contributions
JZ: Conceptualization, Methodology, Writing - Original Draft, Writing - Review & Editing. YX: Methodology, Formal analysis, Data Curation. DY: Formal analysis, Data Curation. QZ: Data Curation, Supervision. SD: Methodology, Supervision. HP: Writing - Review & Editing, Supervision. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhou, J., Xu, Y., Yang, D. et al. Risk prediction models for disability in older adults: a systematic review and critical appraisal. BMC Geriatr 24, 806 (2024). https://doi.org/10.1186/s12877-024-05409-z
DOI: https://doi.org/10.1186/s12877-024-05409-z