Abstract
Background
Accurate assessment of a patient's functional status is crucial for determining the need for treatment and evaluating outcomes. Objective functional impairment (OFI) measures, alongside patient-reported outcome measures (PROMs), have been proposed for spine diseases. The Timed-Up and Go (TUG) test, typically administered by healthcare professionals, is a well-studied OFI measure. This study investigates whether patient self-measurement of TUG is similarly reliable.
Methods
In a prospective, observational study, patients with spinal diseases underwent two TUG assessments: one measured by a healthcare professional and one self-measured by the patient. Interrater reliability was assessed using the intraclass correlation coefficient (ICC) with a two-way random-effects model, considered excellent between 0.75 – 1.00. Paired t-tests directly compared both measurements. The impact of variables such as age, sex, disease type, symptom severity (via PROMs), comorbidities, and frailty on reliability was also analysed.
Results
Seventy-four patients were included, with a mean age of 62.9 years (SD 17.8); 29 (39.2%) were female. The majority (64.9%) were treated for degenerative disc disease. The lumbo-sacral region was most affected (71.6%), and 47.3% had previous surgeries. Patient self-measurement reliability was excellent (ICC 0.8740, p < 0.001), and the difference between healthcare professional (19.3 ± 9.4 s) and patient measurements (18.4 ± 9.7 s) was not statistically significant (p = 0.116). Interrater reliability remained high in patients > 65 years (ICC 0.8584, p < 0.001), patients with ASA grades 3 and 4 (ICC 0.7066, p < 0.001), patients considered frail (ICC 0.8799, p < 0.001), and in patients not using any walking aid (ICC 0.8012, p < 0.001). Patients with high symptom severity still showed strong reliability (ICC 0.8279, p < 0.001 for Oswestry Disability Index > 40; ICC 0.8607, p = 0.011 for Neck Disability Index > 40).
Conclusions
Patients with spine diseases can reliably self-measure OFI using the TUG test. The interrater reliability between self-measurements and those by healthcare professionals was excellent across all conditions. These findings could optimize patient assessments, especially in resource-limited settings.
Introduction
Patient-reported outcome measures (PROMs) are now considered standard assessments in the care of patients with spinal diseases, as they help to quantify pain and disability. PROMs such as the Core Outcome Measures Index (COMI) or the Oswestry Disability Index (ODI) are subjective measures that help to estimate how patients perceive their current condition. The benefits of additional objective assessments, with measures such as the 6-min walking test (6WT) or the Timed Up and Go (TUG) test, have recently been highlighted. [2, 5, 8, 9, 12, 13, 15, 17, 18, 20, 21] These measures quantify symptom severity precisely and are hence valuable tools to detect progression or improvement of symptoms with both conservative and surgical treatment. Previous studies have shown that the TUG test is quick, easy to use and highly reliable. [7] Although this type of objective evaluation does not replace PROMs, it provides additional information and is less subject to bias. [14] With normal population reference values available, a free smartphone app simplifies the measurement and interpretation of TUG raw values by transforming them into normalized, age- and sex-adjusted objective functional impairment (OFI) T-scores.
To date, TUG tests have been conducted by healthcare professionals, which is resource-demanding and does not allow a patient’s condition to be determined outside a fixed appointment. If patient self-measurement by means of the TUG test were to prove sufficiently reliable, there would be many useful applications. For example, patients with severe spinal stenosis undergoing conservative treatment could monitor themselves closely and report to their spine surgeon in case of progressive functional deterioration. A similar but potentially even more clinically relevant scenario applies to patients with mild degenerative cervical myelopathy (DCM; modified Japanese Orthopaedic Association (mJOA) score of 15–17). As close observation and timely surgical care in case of functional deterioration are recommended in this patient population, [3, 4] patient self-measurement with the TUG test could prevent undetected and potentially irreversible functional decline between fixed appointments. Similarly, the post-operative healing process could be monitored more closely, and any worsening, possibly an early sign of adverse events (AEs), could trigger communication with the spine surgeon, so that diagnostic measures and treatment can be initiated at an earlier time point.
The prerequisite for this, however, is that the TUG test can be reliably applied by patients themselves, on which there is currently no data. Hence, the aim of this study was to analyse the reliability of TUG test self-assessments by patients.
Methods
Study design
The study was designed as a single-centre clinical validation study, conducted at the Spine Centre of Eastern Switzerland (OSWZ) of the Kantonsspital St.Gallen. The study population consisted of adult patients who underwent inpatient surgical or non-operative treatment for a spinal disease or pathology between 2022 and 2023. These patients were screened for inclusion and exclusion criteria. Sufficient mobility to perform the TUG test and a signed general informed consent sheet were required for inclusion. Patients were excluded if they were under 18 years of age, did not sign the informed consent form, or refused to participate. The study was approved by the Ethics Committee of Eastern Switzerland (EKOS 23/179).
TUG test and data collection
The TUG test is the most commonly used objective functional test for the evaluation of a patient with degenerative disease of the lumbar spine. [19] It assesses simple but important functions such as getting up, walking, changing direction, walking again, and sitting down. These basic functions are essential for performing activities of daily living (ADLs) and regaining quality of life (QoL). [1, 6, 8, 13] It is a clinically validated test that is regularly used in clinical practice. [8, 20, 21].
The TUG test requires only a chair and a walking distance of 3 m. To perform the test, patients were asked to sit in a chair with their arms resting on the back of the chair. At the request of the examiner, or at their own starting signal if they were measuring themselves, patients stood up and walked as quickly as possible (without running) behind a line marked on the floor at three metres. When they reached the line, they turned 180 degrees and returned to the chair as quickly as possible to sit down again. Timing was started when patients stood up and stopped when they sat down again. Patients were allowed to wear their normal shoes and, if necessary, use a walking aid such as crutches, walkers or a trolley. The walking aid used was then recorded. All TUG tests were conducted using standardized chairs (47 cm seat height) and on the same non-slip hospital floor surface to ensure consistency across measurements and to allow for direct comparison of results. A self-designed, freely available smartphone app for both Apple and Android smartphones (TUG app) was used to standardize the measurements. This app offers the function of a stopwatch and simultaneously calculates additional parameters such as the normalized T-score (adjusted for age and sex) and OFI (Fig. 1).
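The timing rule described above (start when the patient stands up, stop when the patient sits down again) can be illustrated with a minimal stopwatch sketch. This is not the TUG app's implementation, which is not described in this article; the class name `TugStopwatch`, its methods and the rounding to one decimal place are assumptions made purely for illustration.

```python
import time


class TugStopwatch:
    """Hypothetical stopwatch for one TUG trial (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the class can be tested without waiting.
        self._clock = clock
        self._start = None

    def start(self):
        """Call at the moment the patient stands up from the chair."""
        self._start = self._clock()

    def stop(self):
        """Call when the patient sits down again; returns elapsed seconds."""
        if self._start is None:
            raise RuntimeError("start() must be called before stop()")
        elapsed = self._clock() - self._start
        self._start = None
        # Rounding to 0.1 s is an assumption, matching how times are reported here.
        return round(elapsed, 1)
```

In self-measurement the patient would trigger `start()` and `stop()` personally, whereas in the supervised setting the examiner would do so.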
Only patients who completed the TUG test twice were included in the analysis. The first measurement was performed by a healthcare professional, and the second measurement was performed by the patients themselves. The order of the tests was kept constant throughout the study to avoid potential differences in the time measured by the patients due to insufficient understanding of the TUG test and how the time is measured. The interval between the two measurements was kept between one and two hours to avoid patients measuring a higher time due to fatigue, while also minimizing the risk of a difference due to fluctuations in their clinical condition. Other required data (age, sex, walking aid, American Society of Anesthesiologists (ASA) risk scale, smoking status, Charlson Comorbidity Index (CCI), Body Mass Index (BMI), underlying disease type, affected spinal region) were part of standard patient care and documentation and were stored in a pseudo-anonymised manner. Between both measurements, patients were asked to fill out PROM questionnaires for a self-assessment of their current condition, including the COMI back/neck as well as the ODI/NDI. Depending on the Canadian Frailty Index (CFI), patients were grouped into "Very Fit", "Well", "Managing Well", "Vulnerable", "Mildly Frail", "Moderately Frail", "Severely Frail", "Very Severely Frail" and "Terminally Ill". [16] The disease types were subdivided into "degenerative", "trauma", "tumour/neoplastic", "infectious" and "deformity".
Statistical analysis
Our null hypothesis was that there would be no significant difference between the times, in seconds, measured by patients and by healthcare professionals, meaning that patients are able to measure themselves reliably. For this evaluation, test–retest reliability was calculated using intraclass correlation coefficients (ICCs). A value close to 1.00 is considered perfect correlation, while a value close to 0.00 indicates poor or no correlation between the two groups. Subgroup analyses based on demographic data and functional status were conducted to evaluate whether self-assessments with the TUG test are reliable for all patients in general or only for certain subgroups. Furthermore, paired, two-sided t-tests were performed on the TUG test raw values to determine whether patients over- or underrate themselves, in case any difference existed.
All statistical analyses and generation of graphs were performed using StataSE 18.0 (StataCorp. 2023. Stata Statistical Software: Release 18. College Station, TX: StataCorp LLC). Descriptive statistics were employed, describing the sample as count (percent) and mean (standard deviation (SD)). Graphical illustrations of results were used to explore relationships. A p-value below 0.05 was considered statistically significant.
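For readers who wish to reproduce this type of reliability analysis outside Stata, the following Python sketch computes an ICC from a two-way random-effects ANOVA decomposition. It assumes the single-measurement, absolute-agreement form, often written ICC(2,1); the article does not state which form was used, so this choice, the function name `icc_2_1` and the example times are assumptions for illustration, not the study's code or data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one row per patient, one column per rater
    (e.g. [healthcare professional time, self-measured time]).
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 patients and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-patient mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up TUG times in seconds, for illustration only (not study data)
hcp_times = [12.1, 18.4, 25.0, 9.8, 31.2]
self_times = [11.8, 17.9, 26.1, 10.2, 29.5]
pairs = [list(p) for p in zip(hcp_times, self_times)]
print(f"ICC(2,1) = {icc_2_1(pairs):.4f}")
```

The absolute-agreement form penalizes a systematic offset between the two raters, which seems appropriate when the question is whether self-measured times can stand in for professionally measured ones.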
Results
Study sample and demographics
During the data collection period, 83 patients were included, of whom nine were excluded because they did not have a second measurement and were therefore treated as “lost to follow-up” (Table 1).
The mean age of the study population was 62.9 (SD 17.8) years, with 29 patients being female, representing 39.2% of the study population. Most patients had an ASA surgical risk scale of II (n = 40, 54.1%), followed by III (n = 19, 25.7%), I (n = 12, 16.2%) and IV (n = 3, 4.1%). The Charlson comorbidity index was > 1 in the largest group of patients (n = 33, 44.6%), 0 in 29 patients (39.2%) and 1 in 12 patients (16.2%). Just under two thirds of patients had a “very fit” to “managing well” Canadian frailty index (n = 48, 64.9%). The disease type was divided into degenerative disc disease (n = 48, 64.9%), trauma (n = 13, 17.6%), infection (n = 12, 16.2%) and deformity (n = 1, 1.4%). The spinal region affected by the underlying disease was lumbosacral in most patients (n = 53, 71.6%), followed by cervical (n = 12, 16.2%) and thoracic (n = 9, 12.2%). Of the patients included, 22 required a walking aid to perform the TUG test, representing 29.7% of the total. The aids used remained unchanged from the first to the second measurement.
Patient Reported Outcome Measures
The mean ODI of the cohort with thoracolumbar pathologies was 46.2%, whereas the mean NDI of the cohort with cervical pathologies was 38.7%. The Visual analog scale (VAS), measured for the intensity of pain in the back and the extremities, ranged on average from 4.4 to 5.5 (Table 2).
Test–retest reliability
The ICC over the total cohort was 0.8740 with a p-value of < 0.001. Further subgroup analyses using the ICC for comparisons based on demographic parameters such as age, sex, ASA surgical risk scale, BMI, frailty index, ODI, NDI and use of walking aids all had high coefficients above 0.7065, with p-values < 0.05 (Table 3, Fig. 2).
Mean TUG test times measured by the healthcare professional were 19.3 s (SD 9.4) and by patients themselves were 18.4 s (SD 9.7; p = 0.116; Fig. 3).
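The paired comparison reported above can be sketched as follows. The function computes the mean paired difference, the t statistic and the degrees of freedom using only the Python standard library; the name `paired_t` and the example times are assumptions for illustration and are not study data. The two-sided p-value additionally requires the t distribution, which the standard library does not provide (SciPy's `scipy.stats.ttest_rel` returns it directly).

```python
import math
import statistics


def paired_t(first, second):
    """Paired t-test statistic for two equally long lists of measurements.

    Returns (mean difference, t statistic, degrees of freedom).
    A positive mean difference means the first measurement was longer.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # sample SD of the paired differences
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t_stat, n - 1


# Made-up TUG times in seconds, for illustration only (not study data)
hcp_times = [12.1, 18.4, 25.0, 9.8, 31.2]
self_times = [11.8, 17.9, 26.1, 10.2, 29.5]
mean_d, t_stat, df = paired_t(hcp_times, self_times)
print(f"mean difference = {mean_d:.2f} s, t = {t_stat:.2f}, df = {df}")
```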
Discussion
For certain spinal pathologies, close clinical follow-up plays an important role in deciding whether to treat conservatively or surgically to prevent neurological deterioration. This is particularly the case for cervical spinal stenosis with mild cervical myelopathy. For other spinal pathologies, serial clinical follow-ups are important to better monitor the healing process. A more detailed description of the clinical course with high granularity likely allows for a better estimation of the prognosis and expected healing process. The TUG test provides information regarding OFI and can be carried out quickly without major effort or specialized equipment. [7] This test would be suitable for self-administration by the patient in the home environment. With this test as a tool, it would be possible to monitor the patient's objective functional capacity much more closely and to intervene in a timely manner in the event of a relevant functional deterioration. However, for this application to be admissible, the reliability of the test needs to be examined when it is administered by the patients themselves. The present study examined the reliability of the TUG test when measured by the patients themselves.
Our study cohort of 74 evaluated patients reflects a representative, broad collective of patients with spinal diseases managed surgically or non-operatively in terms of age and comorbidity. [10] We included spinal pathologies of the cervical spine, the thoracic spine and the lumbar spine that induced mobility restrictions either by mechanical or radicular pain, or by neurological deficits. The required mobility, albeit limited, was present in part without (68.9%) and in part with walking aids (29.7%).
Essentially, our study found that the test–retest reliability of measurements performed by healthcare professionals or patients themselves was excellent. A direct comparison of both measurements showed a mean difference of 0.9 s, which was not statistically significant (p = 0.116). Considering that the minimum clinically important difference (MCID) of the TUG test for spinal pathologies ranges between 2.1 and 3.4 s, the difference between both measurements can also be considered clinically irrelevant, as it is smaller than the smallest clinically important difference in functional impairment. The ICC of 0.8740 indicates excellent test–retest reliability for the entire cohort, according to the recommendations by Koo et al. [11]. Influencing factors that limit the reliability of the self-measured TUG test are conceivable, however. To rule out lower reliability in certain settings and under specific conditions, variables including age and sex as demographic factors as well as ASA, BMI, frailty, the use of walking support and PROMs (NDI/ODI) as functional factors were analysed in subgroup analyses (Table 3). The highest ICCs were found in younger patients (under 65 years; ICC 0.9047, p < 0.001), irrespective of sex, and in patients with a lower ASA score, lower frailty and a lower NDI or ODI. Interrater reliability was slightly inferior in patients > 65 years (ICC 0.8584, p < 0.001), patients with ASA grades 3 and 4 (ICC 0.7066, p < 0.001), patients considered vulnerable or frail (ICC 0.8799, p < 0.001), and in patients not using any type of walking aid (ICC 0.8070, p < 0.001). The higher ICC observed in younger patients (≤ 65 years) compared to those over 65 years of age may be partially attributed to greater familiarity with smartphone technology among younger individuals. This technological proficiency could lead to more accurate self-measurements in this demographic.
Symptom severity, determined by an ODI of > 40 points for patients with thoracolumbar disease, did not influence interrater reliability, but patients with cervical diseases and an NDI of > 40 points scored slightly worse (ICC 0.8607, p = 0.011). These results are in line with expectations, as certain comorbidities and symptoms may have a negative impact on the understanding of the exact performance of the test and correct timekeeping. The ICC remained acceptably high even under these conditions, however, indicating that overall, self-measured TUG test results can be considered sufficiently reliable in patients with spinal pathologies.
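The comparison of the observed mean difference with the MCID range discussed above amounts to simple arithmetic, sketched here with the values reported in this article (19.3 s and 18.4 s; MCID 2.1 to 3.4 s). The function name `below_mcid` and the decision to compare against the lower MCID bound are assumptions made for illustration.

```python
# MCID range for the TUG test in spinal pathologies, as quoted in the text (seconds)
MCID_RANGE_S = (2.1, 3.4)


def below_mcid(mean_a, mean_b, mcid_lower=MCID_RANGE_S[0]):
    """Return the absolute mean difference and whether it is below the MCID.

    Comparing against the lower bound is the stricter choice: a difference
    below it is also below every other value in the MCID range.
    """
    diff = abs(mean_a - mean_b)
    return diff, diff < mcid_lower


# Group means reported in the Results section
diff, negligible = below_mcid(19.3, 18.4)
print(f"difference = {diff:.1f} s, below MCID: {negligible}")
```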
Strengths and Weaknesses.
The prospective study design incorporating a defined set of clinically relevant variables and PROM scores can be considered a strength of this study. Moreover, robust statistical methods were applied in a reasonably large cohort without missing data for the main outcome variable.
The period between both measurements was intentionally kept short in order to reduce possible bias from fluctuations in the patient's underlying clinical condition. At the same time, this can potentially lead to a poorer result in the second measurement due to fatigue in severely deconditioned or impaired patients. Moreover, the study setting differs somewhat from a self-measurement in the home environment, where there is naturally a longer time interval between the last instruction on how to perform the test and the self-measurement. Self-measurements at home may therefore be influenced by incorrect performance, although this risk can be considered low for this simple test. It should be noted that our study used standardized chairs and surfaces for all measurements, which may not be entirely replicable in home environments. Consequently, future studies should assess the impact of potential variability in home furniture and flooring on the reliability of the self-administered TUG test. Compliance with correct test performance might be improved by image-based, video-based and/or written instructions. No statement can be made about this based on our data, however. Lastly, randomizing the sequence of testing (either healthcare personnel or patient self-measurement first) would have been ideal to rule out systematic differences in test performance resulting from repetitive testing. The test results in the second TUG test, which were lower by 0.9 s on average, may correspond to a minor “learning effect”, which could have been eliminated by randomizing the sequence. Overall, however, we felt that having healthcare personnel instruct the patient during the first TUG test would ensure correct test performance and outweigh the disadvantages.
Implications for clinical practice.
Considering the excellent test–retest reliability of the TUG test, determination of OFI may be “outsourced” from healthcare personnel to patients, which helps to save time and resources in daily patient care. Even though our findings were made exclusively in an inpatient setting, extrapolating the results to an outpatient setting can be considered. Here, patient self-examination would open the door towards a more thorough serial patient assessment with higher granularity. This is particularly helpful in spine conditions where functional decline is relevant and timely (surgical) treatment may be required even between planned follow-up visits, e.g., mild DCM, spinal cord cavernomas, syringomyelia, thoracic disc herniation and lumbar spinal stenosis, among others. The application of the self-measured TUG test extends even beyond spinal applications, considering conditions such as normal pressure hydrocephalus that frequently present with gait difficulties and mobility restrictions. Even if our results do not yet permit the unfiltered application of the self-measured TUG test in the outpatient setting and repeating a similar study with patient home-measurement would be ideal, it is questionable whether such a study will ever be conducted. Although our findings indicate a high degree of reliability in self-administered TUG tests in an inpatient setting, it is advisable to exercise caution when extrapolating these results to home environments. The controlled hospital setting differs from home conditions in terms of the availability of standardized equipment, the surrounding environment, and the level of immediate professional oversight. While our results are promising, further research is needed to specifically evaluate the reliability of self-administered TUG tests in home settings. Variables such as furniture, flooring, and potential distractions may influence test performance, and thus require further investigation.
Future studies should focus on validating the reliability of home-based, self-administered TUG tests to fully assess their potential for widespread clinical application in outpatient monitoring and care.
Conclusion
This study provides evidence for a high reliability of self-testing by means of the TUG test. Our findings suggest that patients can perform the TUG test without supervision by trained healthcare personnel, which helps to save time and resources. Although further research would be ideal for evaluating the reliability of TUG test self-measurement in the outpatient setting, it seems reasonable to expand its use for serial self-examinations in the home environment.
Data availability
No datasets were generated or analysed during the current study.
References
1. Barry E, Galvin R, Keogh C, Horgan F, Fahey T (2014) Is the Timed Up and Go test a useful predictor of risk of falls in community dwelling older adults: a systematic review and meta-analysis. BMC Geriatr 14:14
2. Corniola M-V, Stienen MN, Joswig H, Smoll NR, Schaller K, Hildebrandt G, Gautschi OP (2016) Correlation of pain, functional impairment, and health-related quality of life with radiological grading scales of lumbar degenerative disc disease. Acta Neurochir (Wien) 158(3):499–505
3. Davies BM, Mowforth OD, Smith EK, Kotter MR (2018) Degenerative cervical myelopathy. BMJ 360:k186
4. Fehlings MG, Tetreault LA, Riew KD et al (2017) A Clinical Practice Guideline for the Management of Patients With Degenerative Cervical Myelopathy: Recommendations for Patients With Mild, Moderate, and Severe Disease and Nonmyelopathic Patients With Evidence of Cord Compression. Global Spine J 7(3 Suppl):70S-83S
5. Gautschi OP, Corniola MV, Joswig H, Smoll NR, Chau I, Jucker D, Stienen MN (2015) The timed up and go test for lumbar degenerative disc disease. J Clin Neurosci 22(12):1943–1948
6. Gautschi OP, Corniola MV, Smoll NR, Joswig H, Schaller K, Hildebrandt G, Stienen MN (2016) Sex differences in subjective and objective measures of pain, functional impairment, and health-related quality of life in patients with lumbar degenerative disc disease. Pain 157(5):1065–1071
7. Gautschi OP, Joswig H, Corniola MV, Smoll NR, Schaller K, Hildebrandt G, Stienen MN (2016) Pre- and postoperative correlation of patient-reported outcome measures with standardized Timed Up and Go (TUG) test results in lumbar degenerative disc disease. Acta Neurochir (Wien) 158(10):1875–1881
8. Gautschi OP, Smoll NR, Corniola MV, Joswig H, Chau I, Hildebrandt G, Schaller K, Stienen MN (2016) Validity and Reliability of a Measurement of Objective Functional Impairment in Lumbar Degenerative Disc Disease: The Timed Up and Go (TUG) Test. Neurosurgery 79(2):270–278
9. Gautschi OP, Stienen MN, Corniola MV, Joswig H, Schaller K, Hildebrandt G, Smoll NR (2017) Assessment of the Minimum Clinically Important Difference in the Timed Up and Go Test After Surgery for Lumbar Degenerative Disc Disease. Neurosurgery 80(3):380–385
10. Issa TZ, Lambrechts MJ, Canseco JA, Hilibrand AS, Kepler CK, Vaccaro AR, Schroeder GD (2023) Reporting demographics in randomized control trials in spine surgery - we must do better. Spine J 23(5):642–650
11. Koo TK, Li MY (2016) A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med 15(2):155–163
12. Maldaner N, Sosnova M, Zeitlberger AM, Ziga M, Gautschi OP, Regli L, Weyerbrock A, Stienen MN (2020) Evaluation of the 6-minute walking test as a smartphone app-based self-measurement of objective functional impairment in patients with lumbar degenerative disc disease. J Neurosurg Spine 33(6):779–788
13. Maldaner N, Sosnova M, Ziga M, Zeitlberger AM, Bozinov O, Gautschi OP, Weyerbrock A, Regli L, Stienen MN (2022) External Validation of the Minimum Clinically Important Difference in the Timed-up-and-go Test After Surgery for Lumbar Degenerative Disc Disease. Spine (Phila Pa 1976) 47(4):337–342
14. Maldaner N, Stienen MN (2020) Subjective and Objective Measures of Symptoms, Function, and Outcome in Patients With Degenerative Spine Disease. Arthritis Care Res (Hoboken) 72(Suppl 10):183–199
15. Podsiadlo D, Richardson S (1991) The timed “Up & Go”: a test of basic functional mobility for frail elderly persons. J Am Geriatr Soc 39(2):142–148
16. Rockwood K, Song X, MacKnight C, Bergman H, Hogan DB, McDowell I, Mitnitski A (2005) A global clinical measure of fitness and frailty in elderly people. CMAJ 173(5):489–495
17. Smith-Turchyn J, Adams SC, Sabiston CM (2021) Testing of a Self-administered 6-Minute Walk Test Using Technology: Usability, Reliability and Validity Study. JMIR Rehabil Assist Technol 8(3):e22818
18. Stienen MN, Gautschi OP, Staartjes VE, Maldaner N, Sosnova M, Ho AL, Veeravagu A, Desai A, Zygourakis CC, Park J, Regli L, Ratliff JK (2019) Reliability of the 6-minute walking test smartphone application. J Neurosurg Spine 31(6):786–793. https://doi.org/10.3171/2019.6.SPINE19559
19. Stienen MN, Ho AL, Staartjes VE et al (2019) Objective measures of functional impairment for degenerative diseases of the lumbar spine: a systematic review of the literature. Spine J 19(7):1276–1293
20. Stienen MN, Maldaner N, Joswig H, Corniola MV, Bellut D, Prömmel P, Regli L, Weyerbrock A, Schaller K, Gautschi OP (2019) Objective functional assessment using the “Timed Up and Go” test in patients with lumbar spinal stenosis. Neurosurg Focus 46(5):E4
21. Stienen MN, Maldaner N, Sosnova M, Zeitlberger AM, Ziga M, Weyerbrock A, Bozinov O, Gautschi OP (2021) External Validation of the Timed Up and Go Test as Measure of Objective Functional Impairment in Patients With Lumbar Degenerative Disc Disease. Neurosurgery 88(2):E142–E149
Funding
No funding was received for this research.
Author information
Contributions
Each author made substantial contributions to this article. Conception and design: all authors. Acquisition of data: M.L. Statistical analysis: F.S. and M.S. Analysis and interpretation of data: M.L., F.S., M.S. Drafting the article: M.L., F.S., M.S. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript: all authors. Administrative/technical/material support: all authors. Study supervision: F.S. and M.S.
Ethics declarations
Ethics
The study was approved by the Ethics Committee of Eastern Switzerland (EKOS 23/179). Patients were screened for inclusion and exclusion criteria. Sufficient mobility to perform the TUG test and the signing of a general informed consent sheet were required for inclusion.
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Lüssi, M.R., Fischer, G., Bertulli, L. et al. Reliability of self-measured objective functional impairment using the timed up and go test in patients with diseases of the spine. Acta Neurochir 166, 391 (2024). https://doi.org/10.1007/s00701-024-06293-7
DOI: https://doi.org/10.1007/s00701-024-06293-7