A patient might ask the clinician: "How long will it take me to get back to sport?" or "How long until I'm feeling back to myself again?". These questions ask the clinician to make a prognosis - to predict the future.
Often we rely on our clinical experience or intuition to answer with a prognosis. Sometimes we might know some prognostic factors, which can give us some big-picture ideas, but they're rarely enough to give the full picture.
Today physiotherapists and researchers Daniel Feller and Dr Alessandro Chiarotto (Erasmus Medical Centre, Rotterdam, The Netherlands) explain prognostic prediction models: what they are, how they might help in practice, and what to look for when you're deciding whether a tool like the STarT Back is suitable for your practice.
------------------------------
RESOURCES
When is a prognostic prediction model ready for clinical use?: https://www.jospt.org/doi/10.2519/jospt.2026.13868
[00:00:55] I think this situation is actually very common, Claire, at least in rehabilitation and musculoskeletal care. It is rare to have models at low risk of bias, with proper validation, and really applicable to the patient in front of us. Nonetheless, and I think this is a very important point, prediction models are tools to support decisions and not to replace clinical reasoning.
[00:01:25] Hello, and welcome to JOSPT Insights, the podcast that aims to help you translate quality research to quality practice. I'm Claire Ardern, the Editor-in-Chief of the Journal of Orthopaedic & Sports Physical Therapy. It's great to have you listening today. One question a patient might ask you is, how long is it going to take me to get back to sport? Or, how long until I'm feeling back to myself again?
[00:01:52] These questions ask you to make a prognosis, to predict the future. Often we rely on our clinical experience or our intuition to answer with a prognosis. Sometimes we might know some prognostic factors which can give us big picture ideas, but they're rarely enough to give the full picture. My guests today have devoted many research hours to developing and testing prognostic prediction models that are helpful for clinicians,
[00:02:17] and they're here to explain what to look for when you're deciding whether a tool like the STarT Back is suitable for your practice. Daniel Feller is a physiotherapist and biostatistician who is completing his PhD at Erasmus Medical Centre in Rotterdam, the Netherlands. Dr Alessandro Chiarotto is an assistant professor in the Department of General Practice, also at the Erasmus Medical Centre.
[00:02:41] They both do research that focuses on improving diagnosis, prognosis and treatment for patients with spine pain. Dr Alessandro Chiarotto, Daniel Feller, welcome to JOSPT Insights. Thank you, Claire, for having me. Thank you, Claire. Very much looking forward to having this discussion with you. Me too. And thank you both for taking the time. Predicting the future is hard, and clinicians are often asked to make predictions about the future.
[00:03:09] You've both done some lovely and very helpful work here to help us get our heads around the prediction tools that are available for clinicians to help in practice. Let's start, Alessandro, by exploring some examples of prognostic prediction models that our listeners might have heard of or used before, and how they're typically used in practice. There are different types of prognostic prediction models that can be used in practice.
[00:03:37] Our listeners might be mainly familiar with tools like the STarT Back tool, the Örebro Musculoskeletal Pain Questionnaire or the OSPRO questionnaire. These are tools that have been mainly used in patients with low back pain. These tools usually give a score from zero to something, and they are completed by patients. And usually the higher the score, the higher the risk of a poor prognosis.
[00:04:07] These are patient-reported; I would call them questionnaires. And these questionnaires are not the only form of prediction model that can be used in practice. Sometimes prediction models come as online calculators, where clinicians enter some patient characteristics to estimate the probability of a given outcome at a certain point in time.
[00:04:34] These tools are usually based on logistic regression if they give a probability. Other tools can be based on linear regression models, in which case they give the score that the patient might have on a given outcome at a given point in the future. So there are different forms that prediction tools can take. Also, there are different ways in which prediction tools can be used in practice.
[00:05:02] For example, sometimes there are just paper versions of the tools, so the clinicians, together with the patients, can fill in the tool. Most often nowadays, there are electronic forms of these tools, integrated into electronic health records. Sometimes tools are also available as online calculators.
[00:05:26] So they are available on a website and the clinicians can go to that website and ask the patient to fill in the questions to give a predicted probability of the outcome. Thanks, Alessandro. And you've given us a couple of really good examples there that will sound familiar, I'm sure, to our listeners, in STarT Back and OSPRO. So, Daniel, it brings me to you. What are the hallmarks of a trustworthy prediction model?
[00:05:53] What should we, and the people listening to us today, look for when reading these papers and looking for these tools? What should the researchers have done for us to feel confident that this is a prediction model or a tool that's helpful for me in clinical practice? Well, I have to say that in general, developing a trustworthy prediction model is actually very difficult. And researchers need to do several things correctly.
[00:06:20] And these are obviously also the same things that clinicians should pay attention to when reading prediction model papers. Among these various aspects, the first key one is, at least in my opinion, validation. Because researchers start with a dataset and they use this dataset to create the model. The problem is that frequently, the performance of the model is tested only in the development dataset.
[00:06:49] It is a problem because a prediction model's performance always looks too optimistic when it is tested in the same dataset used to develop it. To put it in a simple way, this happens because the model partially learns some random noise specific to the dataset used to create the model. And so, the results do not generalize to other cohorts of patients.
[00:07:15] And for this reason, models need to be validated. This means that their performance should always be tested beyond the original development process. And to do that, there are essentially two main ways, so two main types of validation. The first one is called internal validation. And the second one is called external validation.
[00:07:41] In internal validation, the dataset is technically still the same one used for the development of the model. But we use special statistical techniques to correct for this optimistic performance estimate computed in the development dataset. Some methods listeners may have heard of are, for example, bootstrapping or cross-validation.
[00:08:06] In the second case, so in external validation, the model is truly tested in a completely different cohort of patients, often from another setting or another hospital, for example. So, the first thing a researcher should do, and readers should consequently check, is whether the model has actually been validated, that is, tested outside the development dataset.
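As an illustration of the internal validation idea described above, here is a minimal sketch of bootstrap optimism correction in plain Python. Everything in it is an assumption made for illustration: the simulated cohort, the tiny gradient-descent logistic regression, and the function names are hypothetical and are not taken from the guests' work. The sketch fits a model, measures its apparent C statistic in the development data, and then subtracts the average optimism estimated from bootstrap resamples.

```python
import math
import random

def predict(w, x):
    """Predicted probability from logistic weights w = [intercept, *coefficients]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=150, lr=0.5):
    """Fit a logistic regression by plain gradient descent (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, yi in zip(X, y):
            err = predict(w, x) - yi
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def c_statistic(probs, y):
    """Share of (event, non-event) pairs where the event has the higher prediction."""
    events = [p for p, yi in zip(probs, y) if yi == 1]
    non_events = [p for p, yi in zip(probs, y) if yi == 0]
    score = sum(1.0 if e > ne else 0.5 if e == ne else 0.0
                for e in events for ne in non_events)
    return score / (len(events) * len(non_events))

def performance(w, X, y):
    return c_statistic([predict(w, x) for x in X], y)

def optimism_corrected_c(X, y, n_boot=20, seed=0):
    """Return (apparent C, optimism-corrected C) using bootstrap resampling."""
    rng = random.Random(seed)
    apparent = performance(fit_logistic(X, y), X, y)
    optimism = []
    n = len(X)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples without both outcomes
        wb = fit_logistic(Xb, yb)
        # Optimism: performance in the resample minus performance in the original data
        optimism.append(performance(wb, Xb, yb) - performance(wb, X, y))
    mean_optimism = sum(optimism) / len(optimism) if optimism else 0.0
    return apparent, apparent - mean_optimism

def simulate_cohort(n=60, n_noise=5, seed=1):
    """Hypothetical cohort: one real predictor plus several pure-noise predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        signal = rng.gauss(0, 1)
        X.append([signal] + [rng.gauss(0, 1) for _ in range(n_noise)])
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-signal)) else 0)
    return X, y

X, y = simulate_cohort()
apparent, corrected = optimism_corrected_c(X, y)
print(f"apparent C: {apparent:.2f}, optimism-corrected C: {corrected:.2f}")
```

In small samples with several noise predictors, the corrected value is typically lower than the apparent one, which is the over-optimism described above. External validation would instead call `performance` on data from a genuinely different cohort.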
[00:08:32] Another important aspect in judging the trustworthiness of a prognostic model is for sure how the model performance was evaluated. Because in general, there are two major classes, two major types of performance metrics. On one side, we have discrimination. And on the other side, we have calibration. These two metrics assess two completely different aspects.
[00:09:01] And therefore, both should be reported in prognostic model papers. In fact, discrimination tells us whether the model can distinguish between patients with a good prognosis and those with a poor prognosis. On the other hand, calibration tells us whether the predicted probabilities, so the output of the model, are actually accurate.
[00:09:26] Let's say that a model predicts a 30% risk for a given outcome. So, it provides a 30% probability of a given prognosis. The real question is, does that risk truly correspond to what happens in reality? And this question is answered through calibration and not through discrimination. The big problem, unfortunately, is that many studies report only discrimination.
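To make the calibration question concrete, here is a minimal sketch in plain Python. The function names and the toy numbers are hypothetical, chosen only for illustration. It compares predicted risks with what was actually observed, first overall (the observed/expected ratio) and then within groups of patients ordered by predicted risk, which is the information a calibration plot displays.

```python
def observed_expected_ratio(probs, outcomes):
    """Observed events divided by expected events (sum of predicted risks).
    A value near 1 suggests predictions are right on average."""
    return sum(outcomes) / sum(probs)

def calibration_table(probs, outcomes, n_groups=5):
    """Order patients by predicted risk, split them into groups, and return
    (mean predicted risk, observed event rate, group size) for each group."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs) / n_groups
    rows = []
    for g in range(n_groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows

# Ten hypothetical patients, three of whom had the poor outcome.
outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

# Model A predicts a 30% risk for everyone: right on average.
well_calibrated = [0.3] * 10
# Model B ranks patients perfectly but overestimates everyone's risk.
overestimating = [0.5] * 7 + [0.9] * 3

print(observed_expected_ratio(well_calibrated, outcomes))
print(observed_expected_ratio(overestimating, outcomes))
for row in calibration_table(overestimating, outcomes, n_groups=2):
    print(row)
```

Model B in this toy example separates patients with and without the outcome perfectly, yet its predicted risks are roughly twice what was observed. That is the sense in which a discrimination measure on its own cannot show whether a predicted 30% really means 30%.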
[00:09:53] But reporting discrimination alone does not give us the full picture; we also need a calibration measure. It is no coincidence that some authors have called calibration the Achilles heel of prognostic modeling. In reality, even discrimination and calibration are not enough, because a model may perform well from a statistical point of view, in terms of discrimination and calibration, but still fail to improve patient care.
[00:10:20] Usually, we test this through a randomized controlled trial in which one group of clinicians uses the model and another does not. Then we follow up the patients, and at follow-up we can measure whether the patients managed with the model had better outcomes than those managed without it. But unfortunately, these studies are still very rare in the literature.
[00:10:44] And for this reason, prediction model studies are usually expected to include at least a surrogate measure of clinical effectiveness, which in statistical terms is called clinical utility. Clinical utility can be assessed in different ways, and one of the most common approaches is called decision curve analysis.
[00:11:05] To explain it in a very easy and simple way, clinical utility essentially evaluates whether using the model is more beneficial than strategies such as treating everyone or treating no one. To summarize, in my opinion, the hallmarks of trustworthy prediction models are the presence of validation, the reporting of discrimination, calibration, and clinical utility, and also studies that are methodologically robust.
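The comparison with "treating everyone" and "treating no one" can be sketched with the standard net benefit calculation that underlies decision curve analysis. This is a minimal illustration in plain Python; the ten patients and their predicted risks are made up, and "treat" here stands for whatever action a high predicted risk would trigger (for example, a more intensive care pathway).

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk is at or above
    the threshold: true positives minus weighted false positives, per patient."""
    n = len(outcomes)
    tp = sum(1 for p, o in zip(probs, outcomes) if p >= threshold and o == 1)
    fp = sum(1 for p, o in zip(probs, outcomes) if p >= threshold and o == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of acting on every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical cohort: 1 = poor outcome, 0 = good outcome.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]

# A decision curve repeats this comparison across a range of thresholds.
for threshold in (0.2, 0.3, 0.5):
    model = net_benefit(probs, outcomes, threshold)
    treat_all = net_benefit_treat_all(outcomes, threshold)
    treat_none = 0.0  # acting on no one has zero net benefit by definition
    print(f"threshold {threshold:.1f}: model {model:.2f}, "
          f"treat all {treat_all:.2f}, treat none {treat_none:.2f}")
```

At a threshold of 0.5 in this toy cohort, the model's net benefit is 0.1, treating everyone gives -0.4, and treating no one gives 0, so using the model would be the better strategy at that threshold. A full decision curve analysis plots these values across all thresholds that are clinically reasonable.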
[00:11:35] Excellent, Daniel. Great summary. Thank you. And I think the point here is that you're not trying to turn listeners into statisticians. You're giving us really helpful guiding questions to ask and things to look for. So let me summarize what I think I heard. When we're reading these papers or even looking at a potential tool to incorporate into our clinical practice, we want to see, has the model been validated? How have the researchers tested that model?
[00:12:03] Have they used a randomized controlled trial to do some testing? And then can we find a measure of clinical utility? So there's quite a few things to look for, but at least some really concrete things to look for. And what I was hearing from you is that not all of this is necessarily going to happen in one paper. So there's a whole process to develop a clinical prediction tool. Yeah, I think you summarized it perfectly.
[00:12:31] So Alessandro, then how can listeners decide whether a prediction model is even ready to use in clinical practice? So we've heard from Daniel about what it is we're looking for to see whether this model has been developed well. What's the next step? The first question that clinicians should ask is whether the model has been validated. It doesn't really matter if it's internal or external.
[00:12:56] In the past, it was thought that external validation absolutely needed to be present. But nowadays, an internal validation can also be sufficient to implement a model in clinical practice. The really important question that clinicians need to answer is whether the validation, whether internal or external, was performed in a sample that was representative of the clinical population that is managed in clinical practice.
[00:13:23] So if the study population is not the same as the patients that the clinician manages in practice, or the inclusion and exclusion criteria of a study don't match the patients that clinicians see in practice, then the sample is probably not representative of the target population.
[00:13:44] In that case, clinicians should not use the model, because it has been validated in a sample that was not representative of their practice. Let's assume that validation has been performed. And let's assume also that the validation sample was representative of the clinical population. Then clinicians should look at the different performance metrics that Daniel mentioned before.
[00:14:12] So they should look at the results for discrimination, calibration and clinical utility. Again, I think the aim here is not to transform clinicians into statisticians. And luckily, there are some thresholds for evaluating these performance metrics and judging whether the model is satisfactory. For example, regarding discrimination, it is possible to look at the value of the area under the curve, or C statistic,
[00:14:41] which can range from 0.5 to 1. And usually a value of 0.7 or higher is considered to indicate satisfactory discrimination. The higher the value, the better it is. It means that the model can better discriminate people who have the outcome from people who do not have the outcome. And there are similar thresholds, or different ways, to interpret calibration and clinical utility as well.
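For readers who want to see what the C statistic measures, here is a minimal sketch in plain Python. The patient data and the helper name `is_satisfactory` are hypothetical; the 0.7 cut-off is the rule of thumb mentioned in the conversation, not a universal standard. The C statistic is the proportion of all (patient with outcome, patient without outcome) pairs in which the patient with the outcome received the higher predicted risk. A value of 0.5 corresponds to chance; values below 0.5 are mathematically possible and would mean the model ranks patients worse than chance.

```python
def c_statistic(probs, outcomes):
    """Proportion of (event, non-event) pairs ranked correctly; ties count as half."""
    events = [p for p, o in zip(probs, outcomes) if o == 1]
    non_events = [p for p, o in zip(probs, outcomes) if o == 0]
    concordant = sum(1.0 if e > ne else 0.5 if e == ne else 0.0
                     for e in events for ne in non_events)
    return concordant / (len(events) * len(non_events))

def is_satisfactory(c, threshold=0.7):
    """Apply the rule-of-thumb cut-off for satisfactory discrimination."""
    return c >= threshold

# Hypothetical cohort: 1 = poor outcome, 0 = good outcome.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
informative = [0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
uninformative = [0.3] * 10  # same prediction for everyone

for name, probs in (("informative", informative), ("uninformative", uninformative)):
    c = c_statistic(probs, outcomes)
    print(f"{name}: C = {c:.2f}, satisfactory: {is_satisfactory(c)}")
```

In the informative example, 20 of the 21 possible pairs are ranked correctly, so the C statistic is about 0.95. The uninformative model gives the same risk to everyone and lands at exactly 0.5.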
[00:15:07] Calibration and clinical utility might be difficult to explain in a podcast. So my suggestion would be to look at what we have written in the paper, in the clinical commentary published in JOSPT on prognostic prediction models, a primer for clinicians, so that you can also see what values should be looked at when looking at specific calibration values, and also how to interpret, for example, a calibration plot or a decision curve analysis.
[00:15:36] I think that would greatly help clinicians to better understand and interpret the results of existing papers. Another very important question is whether the model is suitable for use within the clinical workflow. In practice, it means whether the model is feasible to use in a busy setting. There might be models that perform very well in terms of discrimination and calibration,
[00:16:03] but they are very lengthy to use in practice because, for example, their items include some full questionnaires. This is the case, for example, for some models we developed some years ago, the BACE models for older adults with back pain, where full questionnaires are included as single items. And by full questionnaires, I mean questionnaires that listeners might be familiar with, like the Roland-Morris Disability Questionnaire or the Pain Catastrophizing Scale.
[00:16:31] After having checked whether the model is applicable in practice, the last question, and I go back to what Daniel has already mentioned here, is to ask whether the model is at low risk of bias. Clinicians don't have to assess the risk of bias of a prediction model by themselves, but luckily they can refer to systematic reviews in which the results of different models are usually presented
[00:16:55] and where clinicians can check whether the model is at low or high risk of bias. Just making a parallel with randomized controlled trials, we don't want people to use interventions that are based on randomized controlled trials that are at high risk of bias. And the same thing applies to prediction models. Alessandro, I'm really pleased that you make the point about researchers needing to do a good job for clinicians.
[00:17:24] That's something that is really important. There is a lot of trust that clinicians can place in researchers, and researchers need to do the work, the hard work of validating these models and setting them up in a way that is appropriate for clinical practice. Which brings me to my last question, Daniel. What should I do if there isn't a prediction model available for my clinical question? I think this situation is actually very common, Claire.
[00:17:54] At least in rehabilitation and musculoskeletal care in general. It is rare to have models at low risk of bias, with proper validation, and really applicable to the patient in front of us. Nonetheless, and I think this is a very important point, prediction models are tools to support decisions and not to replace clinical reasoning. So if there is no trustworthy model available, we should simply stick to our clinical reasoning
[00:18:23] and to the patient assessment. So history taking, appropriate clinical examination, and so on. And in addition to that, and this is also very important, in my opinion, clinicians can look during the assessment of the patient for the so-called prognostic factors. Prognostic factors are variables, usually in the form of patient characteristics,
[00:18:47] such as age, gender, pain intensity at baseline, et cetera, associated with better or worse outcomes over time. A classical example of a prognostic factor in spinal pain might be having a high pain intensity, because we know that patients with a high pain intensity at baseline have, on average, a worse prognosis when compared to patients with low levels of pain intensity.
[00:19:16] I must admit that there is also a big problem related to prognostic factors, and it is the reason why we should use prognostic models if available: prognostic factors provide information at a group level, not at the level of the single patient, of the single individual. So they essentially tell us that, on average, patients with certain characteristics, so with certain prognostic factors, may have a higher risk of a poor outcome.
[00:19:46] But they do not provide a personalized probability for a specific patient in the same way a prediction model does. Also, many of the principles that we explained earlier for judging prediction models apply to prognostic factor studies too. So, for example, we still need studies at low risk of bias that assess prognostic factors. We need studies with appropriate models, methods, and so on. So if a good prediction model is not available,
[00:20:15] my suggestion for clinicians would be to try to integrate the presence or absence of prognostic factors within their clinical reasoning to try to estimate their patients' likely prognosis. Clinicians, however, should remember that prognostic factor research is not immune to methodological limitations, I would say. Not all reported prognostic factors are equally trustworthy.
[00:20:44] A deeper discussion of how to evaluate prognostic factor studies is beyond the scope of this discussion, but I don't know. It may perhaps be a good topic for a future clinical commentary for JOSPT. I think so too, Daniel. Nice way to work in another paper. Excellent. So I like that you're empowering clinicians. You're giving clinicians license to trust their clinical judgment. That's really important. And then the other part of this is to look for the prognostic factors
[00:21:14] within the patient population you're working with. You've both given us loads of really helpful pointers and tips and guidance on what to look for when reading the studies about these prognostic prediction models. Daniel Feller, Dr. Alessandro Chiarotto, thanks for joining me today on JOSPT Insights. Thank you, Claire. Thank you, Claire, very much. Thanks for listening to this episode of JOSPT Insights.
[00:21:44] For more discussion of the issues in musculoskeletal rehabilitation that are relevant to your practice, you can subscribe to JOSPT Insights on Apple Podcasts, Spotify, TuneIn, Stitcher, Google, or your favourite podcast app. If you like JOSPT Insights, help others find us. Tell your friends and colleagues and rate and review us. To keep up to date with all the latest JOSPT content, be sure to follow us on Twitter, we're at JOSPT,
[00:22:11] and Facebook, we're JOSPT Official. Talk with you next time.

