From Model Outputs to Real-World Claims
So far, we have:
- built baseline models
- evaluated model performance
- assessed stability using cross-validation
- interpreted how models use features
Now we reach a critical step:
How do we translate model outputs into real-world statements without overstating what the model proves?
This lesson connects model evaluation and interpretation to one of the most important applied data science skills: making claims that match the evidence.
A model can produce predictions.
A model can show associations.
A model can reveal patterns in a dataset.
But a model does not automatically prove why those patterns exist.
How to Run This Lesson
Run the supporting script from the project root:
python scripts/python/14a_translate_model_outputs_to_claims.py
This creates the expected outputs in the reports/ directory:
reports/diabetes-claim-review-table.csv
reports/diabetes-model-claim-summary.txt
reports/figures/diabetes-observed-vs-predicted-error-highlighted.png
Then render the Quarto site:
quarto render
You can also run the code blocks inside this chapter interactively.
The script-based workflow is preferred for reproducibility because it leaves behind files that can be inspected, compared, committed, or reused in later chapters.
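After running the script, it can help to confirm that the three files listed above were actually written. The following is a minimal sketch using only the standard library. The helper name `missing_outputs` is hypothetical and is not part of the project's script.

```python
from pathlib import Path

# Output paths listed in this lesson, relative to the project root.
EXPECTED_OUTPUTS = [
    "reports/diabetes-claim-review-table.csv",
    "reports/diabetes-model-claim-summary.txt",
    "reports/figures/diabetes-observed-vs-predicted-error-highlighted.png",
]


def missing_outputs(project_root="."):
    """Return the expected output paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_OUTPUTS if not (root / p).is_file()]


# Hypothetical usage: run from the project root after the script finishes.
missing = missing_outputs()
if missing:
    print("Missing outputs:")
    for path in missing:
        print(f"  {path}")
else:
    print("All expected outputs are present.")
```

An empty list means every expected artifact is in place. Anything else points to a step that did not run.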
Load the Dataset
We continue using the saved diabetes dataset.
import pandas as pd

# Load the saved dataset and separate the features from the outcome.
df = pd.read_csv("data/diabetes.csv")
X = df.drop(columns=["disease_progression"])
y = df["disease_progression"]

df.head()

The dataset gives us features and an outcome.
The model will learn predictive patterns from these data.
The claims we make later must stay within that scope.
Fit a Baseline Model
We fit the same type of baseline model used in earlier lessons.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; the fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)

# Fit on the training rows only, then predict the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

This model provides predicted disease progression values for the test set.
Those predictions can be evaluated and interpreted.
But they are not automatically clinical conclusions.
Visualize Observed vs Predicted Values
An observed-versus-predicted plot helps us see where the model performs well and where it makes larger errors.
import matplotlib.pyplot as plt
import numpy as np

# Absolute error for each test row, used to color the points.
error = np.abs(y_test - y_pred)

plt.figure(figsize=(7, 5))
scatter = plt.scatter(
    y_test,
    y_pred,
    c=error,
    cmap="viridis",
    edgecolors="black",
    linewidth=0.4,
    alpha=0.9,
)

# Diagonal reference line: a perfect prediction would fall exactly on it.
min_val = min(y_test.min(), y_pred.min())
max_val = max(y_test.max(), y_pred.max())
plt.plot([min_val, max_val], [min_val, max_val], linestyle="--", linewidth=1)

plt.xlabel("Observed disease progression")
plt.ylabel("Predicted disease progression")
plt.title("Observed vs Predicted Values")
plt.colorbar(scatter, label="Absolute error")
plt.grid(alpha=0.2)
plt.show()

Points close to the diagonal line represent more accurate predictions.
Points farther from the diagonal indicate larger errors.
This plot supports a statement about predictive performance.
It does not support a causal statement.
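A plot gives a visual impression, and a statement about predictive performance is easier to defend when it also cites numbers. The sketch below computes three common error summaries (MAE, RMSE, and R²) with NumPy. The function name `regression_metrics` is hypothetical, and the toy values are illustrative only; they are not results from the diabetes model.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values for illustration only; not output from the lesson's model.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 260.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

In the lesson's workflow, the same function could be called as `regression_metrics(y_test, y_pred)`. The resulting numbers describe prediction error on the held-out rows and say nothing about why the errors occur.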
Save the Plot as a Project Output
In an applied workflow, important figures should be saved as reusable artifacts.
from pathlib import Path

# Create the output folder if it does not exist yet.
Path("reports/figures").mkdir(parents=True, exist_ok=True)

# Redraw the figure, reusing error, min_val, and max_val from the previous block.
plt.figure(figsize=(7, 5))
scatter = plt.scatter(
    y_test,
    y_pred,
    c=error,
    cmap="viridis",
    edgecolors="black",
    linewidth=0.4,
    alpha=0.9,
)
plt.plot([min_val, max_val], [min_val, max_val], linestyle="--", linewidth=1)
plt.xlabel("Observed disease progression")
plt.ylabel("Predicted disease progression")
plt.title("Observed vs Predicted Values")
plt.colorbar(scatter, label="Absolute error")
plt.grid(alpha=0.2)
plt.tight_layout()

# Save before showing, so the written file contains the full figure.
plt.savefig(
    "reports/figures/diabetes-observed-vs-predicted-error-highlighted.png",
    dpi=300,
)
plt.show()

The figure is now part of the project record.
What the Model Actually Provides
The model provides:
- predicted outcomes
- estimated relationships between features and predictions
- patterns learned from the dataset
- evidence about model behavior under this setup
It does not directly provide:
- causal proof
- clinical recommendations
- biological mechanisms
- universal truths beyond the data
This distinction is central to responsible analysis.
A model output is not the same as a real-world explanation.
From Output to Claim
A common mistake is to move too quickly from:
model output → real-world conclusion
For example, suppose BMI appears as an important predictor.
A weak statement would be:
BMI is included in the model.
A stronger but still defensible statement would be:
BMI is an important predictor of disease progression in this fitted model.
An unsupported statement would be:
BMI causes disease progression.
The model may show that BMI helps predict disease progression in this dataset.
It does not prove that BMI is the direct cause of disease progression.
A Simple Analogy
Umbrella sales may strongly predict wet streets.
But umbrellas do not cause streets to become wet.
Both are related to another factor: rain.
In the same way, a feature can be strongly associated with an outcome because it reflects:
- underlying processes
- correlated variables
- shared patterns in the data
- measurement structure
- unobserved factors
A strong feature can support prediction.
It does not automatically establish causation.
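The umbrella analogy can be made concrete with a small simulation. In the sketch below, all data are synthetic and the variable names are illustrative: rain drives both umbrella sales and street wetness, and umbrella sales never influence the streets. The two downstream variables are still strongly correlated, and the association largely disappears once rain is accounted for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data: rain drives both variables; umbrellas never affect streets.
rain = rng.normal(size=n)
umbrella_sales = rain + 0.5 * rng.normal(size=n)
street_wetness = rain + 0.5 * rng.normal(size=n)

# Raw association between the two downstream variables.
raw_corr = np.corrcoef(umbrella_sales, street_wetness)[0, 1]


def residual_after(x, z):
    """Remove the linear dependence of x on z using a least-squares line."""
    slope, intercept = np.polyfit(z, x, deg=1)
    return x - (slope * z + intercept)


# Association that remains once rain is accounted for.
adjusted_corr = np.corrcoef(
    residual_after(umbrella_sales, rain),
    residual_after(street_wetness, rain),
)[0, 1]

print(f"raw correlation:   {raw_corr:.2f}")
print(f"adjusted for rain: {adjusted_corr:.2f}")
```

By construction, the raw correlation is close to 0.8 and the adjusted one is close to zero. A predictive model that saw only umbrella sales would predict wet streets well, and that would still tell us nothing about what umbrellas do to streets.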
Build a Claim Review Table
One useful applied practice is to explicitly classify claims by how strongly they are supported.
claim_review_df = pd.DataFrame({
    "claim": [
        "BMI is included as a feature in the model.",
        "BMI contributes to predictions in this fitted model.",
        "Higher BMI is associated with higher predicted disease progression in this model.",
        "BMI causes disease progression.",
        "Reducing BMI will reduce disease progression for a patient.",
    ],
    "claim_strength": [
        "descriptive",
        "model_based",
        "model_based_association",
        "causal",
        "intervention_effect",
    ],
    "supported_by_this_workflow": [
        "yes",
        "yes",
        "yes_with_scope_limits",
        "no",
        "no",
    ],
    "reason": [
        "The feature table contains BMI.",
        "The fitted model uses BMI when forming predictions.",
        "The statement is limited to model behavior and association.",
        "The workflow is predictive, not causal.",
        "The workflow does not estimate intervention effects.",
    ],
})

claim_review_df

This table turns interpretation into a reviewable artifact.
Instead of relying on vague wording, it forces us to ask whether each claim is supported by the workflow.
Save the Claim Review Table
Path("reports").mkdir(exist_ok=True)
claim_review_df.to_csv(
    "reports/diabetes-claim-review-table.csv",
    index=False,
)

The saved table can be reused in reports, presentations, or decision memos.
Correct Interpretation
A defensible statement is:
Higher BMI is associated with higher predicted disease progression in this fitted model, using this dataset and this modeling setup.
This statement is:
- accurate
- grounded in the model
- limited to the analysis performed
- careful about causation
- clear about scope
The phrase “in this fitted model” matters.
It reminds the reader that we are describing model behavior, not proving a biological mechanism.
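The direction of the association in this statement can be checked from the fitted coefficients. The sketch below makes one loud assumption: it uses scikit-learn's bundled diabetes data (`load_diabetes`) as a stand-in for `data/diabetes.csv`, which may not match the saved file exactly. It reads only the sign of the BMI coefficient, because the bundled features are centered and scaled and the magnitude is not in clinical units.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled diabetes data stands in for data/diabetes.csv.
data = load_diabetes()
X, y = data.data, data.target
feature_names = list(data.feature_names)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Pair each coefficient with its feature name and look up BMI.
coefficients = dict(zip(feature_names, model.coef_))
bmi_coef = coefficients["bmi"]

direction = "higher" if bmi_coef > 0 else "lower"
print(
    f"In this fitted model, higher BMI is associated with {direction} "
    "predicted disease progression, holding the other features fixed."
)
```

The printed sentence is worded as a statement about the fitted model. The coefficient describes how predictions move with BMI when the other inputs are held fixed, which is a property of the model and not a measured effect on patients.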
Why Scope Matters
Model outputs depend on:
- the dataset
- the features included
- preprocessing choices
- train/test split strategy
- model type
- evaluation method
- assumptions behind the analysis
Changing any of these can change the result.
This is why a claim should not sound broader than the workflow that produced it.
A narrow, accurate claim is stronger than a broad, unsupported one.
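One way to see this dependence is to change a single item in the list, the train/test split, and watch the score move. The sketch below uses synthetic data and a plain least-squares fit in NumPy, so the numbers are illustrative and are not results from the diabetes model. The helper `holdout_r2` is a hypothetical name.

```python
import numpy as np


def holdout_r2(X, y, seed, test_fraction=0.2):
    """Fit ordinary least squares on a random train split; return test R^2."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Add an intercept column, then solve the least-squares problem.
    A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
    A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
    beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

    residuals = y[test_idx] - A_test @ beta
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data for illustration only (not the diabetes dataset).
data_rng = np.random.default_rng(123)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + data_rng.normal(scale=2.0, size=200)

# Same data, same model, five different splits.
scores = [holdout_r2(X, y, seed) for seed in range(5)]
for seed, score in enumerate(scores):
    print(f"split seed {seed}: test R^2 = {score:.3f}")
print(f"range across splits: {max(scores) - min(scores):.3f}")
```

Nothing about the data or the model changes between runs, yet the reported score differs. A claim that quotes one of these numbers should say which setup produced it.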
Levels of Claim Strength
We can think of claims at different levels.
Descriptive Claims
These describe what is present in the data or workflow.
Examples:
- BMI is included as a feature.
- The model was trained on the diabetes dataset.
- The test set predictions were compared with observed values.
These are usually easy to support.
Model-Based Claims
These describe how the fitted model behaves.
Examples:
- BMI contributes to the model predictions.
- The model assigns a positive coefficient to BMI.
- The model has moderate predictive performance under this evaluation setup.
These are valid when tied to the model and dataset.
Generalized Claims
These extend beyond the immediate dataset.
Examples:
- The model will perform similarly in another population.
- The same features will matter in another setting.
These require stronger external validation.
Causal Claims
These describe cause-and-effect relationships.
Examples:
- BMI causes disease progression.
- Changing BMI will change disease progression.
These are not supported by a standard predictive modeling workflow.
They require causal study design, assumptions, and evidence beyond what we have done here.
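The four levels can be written down as a small lookup that ties each level to the evidence it needs. This is a sketch of the reasoning in this section, not a formal standard: the names `ClaimLevel` and `is_supported` and the three evidence flags are hypothetical simplifications.

```python
from enum import Enum


class ClaimLevel(Enum):
    """The four claim levels described in this lesson."""
    DESCRIPTIVE = "descriptive"
    MODEL_BASED = "model_based"
    GENERALIZED = "generalized"
    CAUSAL = "causal"


def is_supported(level, has_fitted_model=True,
                 has_external_validation=False, has_causal_design=False):
    """Return True when the available evidence matches the claim level."""
    if level is ClaimLevel.DESCRIPTIVE:
        return True
    if level is ClaimLevel.MODEL_BASED:
        return has_fitted_model
    if level is ClaimLevel.GENERALIZED:
        return has_fitted_model and has_external_validation
    # Causal claims need a causal design, not just a predictive model.
    return has_causal_design


# Evidence available in this lesson: a fitted model and nothing more.
for level in ClaimLevel:
    print(f"{level.name:<12} supported: {is_supported(level)}")
```

With the default flags, only descriptive and model-based claims pass, which matches what this workflow can support.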
Connecting Back to Stability
In previous lessons, we used repeated evaluation and cross-validation.
That strengthens our confidence that the model behavior is not entirely driven by one lucky split.
However, stability does not establish causation.
A stable predictive pattern is still a predictive pattern.
It should be reported as such.
Cross-validation can support claims like:
- the model shows reasonably stable predictive performance
- performance varies moderately across folds
- the model is not being judged from a single split alone
It cannot support claims like:
- this feature causes the outcome
- changing this feature will change the outcome
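A stability claim of the first kind can be backed by fold-level scores. The sketch below again assumes scikit-learn's bundled diabetes data as a stand-in for `data/diabetes.csv`; the five-fold shuffled split and the R² metric are illustrative choices and may differ from the setup used in the earlier lessons.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Assumption: scikit-learn's bundled diabetes data stands in for data/diabetes.csv.
X, y = load_diabetes(return_X_y=True)

# Five shuffled folds with a fixed seed so the result is reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R^2 per fold:", [round(float(s), 3) for s in scores])
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")

# A claim template that stays within what cross-validation can show.
claim = (
    f"Across {len(scores)} folds, the model's R^2 averaged {scores.mean():.2f} "
    f"(standard deviation {scores.std():.2f}) on this dataset."
)
print(claim)
```

The generated sentence reports performance and its spread across folds. It makes no statement about any feature causing the outcome, because nothing in the procedure tests that.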
Write a Short Model Claim Summary
We can write a short summary that respects the evidence.
summary_text = """
The fitted linear regression model shows moderate predictive ability for disease progression
using the diabetes dataset. Observed-versus-predicted values indicate that some predictions
are close to the observed outcome, while others show meaningful error. Feature-level patterns
should be interpreted as model-based associations, not causal mechanisms. Claims from this
workflow should therefore focus on prediction, model behavior, and association within the
current dataset and modeling setup.
""".strip()

Path("reports/diabetes-model-claim-summary.txt").write_text(summary_text)
print(summary_text)

This creates a reusable plain-language interpretation.
What This Means in Practice
When reporting model results, focus on:
- what the model predicts
- how well it predicts
- how stable the performance is
- which features the model uses
- what the results do and do not support
Avoid:
- causal language without causal evidence
- clinical or policy recommendations unsupported by the workflow
- claims beyond the dataset
- claims that ignore uncertainty or error
- presenting model interpretation as objective truth
Better and Worse Example Statements
Appropriate
- The model uses BMI as an important predictor of disease progression.
- Higher BMI is associated with higher predicted progression in this fitted model.
- The model shows moderate predictive performance under the current evaluation setup.
- These findings describe model behavior and should not be interpreted as causal evidence.
Not Appropriate
- BMI is the main cause of disease progression.
- Reducing BMI will reduce disease progression.
- The model explains the biological mechanism of disease progression.
- The same findings will apply to all future populations.
The difference is not just wording.
The difference is scientific and analytical honesty.
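Draft statements can be screened for wording that tends to overreach before a person reviews them. The sketch below is a crude keyword heuristic using the standard library. The pattern list and the function name `flag_overreach` are hypothetical, and the check cannot understand context or negation, so it supports a human review rather than replacing it.

```python
import re

# Hypothetical screening patterns; a crude heuristic, not a validated tool.
OVERREACH_PATTERNS = {
    "causal wording": r"\bcauses?\b",
    "intervention effect": r"\bwill (reduce|increase|change)\b",
    "mechanism claim": r"\bexplains the\b.*\bmechanism\b",
    "universal generalization": r"\ball (future )?populations\b",
}


def flag_overreach(statement):
    """Return the names of the overreach patterns found in a statement."""
    text = statement.lower()
    return [name for name, pattern in OVERREACH_PATTERNS.items()
            if re.search(pattern, text)]


# The example statements from this section.
appropriate = [
    "The model uses BMI as an important predictor of disease progression.",
    "Higher BMI is associated with higher predicted progression in this fitted model.",
    "The model shows moderate predictive performance under the current evaluation setup.",
    "These findings describe model behavior and should not be interpreted as causal evidence.",
]
not_appropriate = [
    "BMI is the main cause of disease progression.",
    "Reducing BMI will reduce disease progression.",
    "The model explains the biological mechanism of disease progression.",
    "The same findings will apply to all future populations.",
]

for statement in appropriate + not_appropriate:
    flags = flag_overreach(statement)
    status = "REVIEW" if flags else "ok"
    print(f"{status:<7}{statement} {flags}")
```

On these eight examples, the four appropriate statements pass and the four others are flagged for review. On real drafts a pattern list like this will miss some problems and flag some careful sentences, so the final judgment stays with the analyst.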
CDI Insight
The strength of a claim must match the strength of the evidence.
Predictive models can support predictive claims.
Interpretable models can support model-behavior claims.
They do not automatically support causal claims.
A defensible data science system protects this boundary.
Summary
In this lesson, we moved from model outputs to real-world claims.
We:
- fitted a baseline model
- visualized observed versus predicted values
- distinguished predictions from explanations
- separated association from causation
- created a claim review table
- saved a plain-language claim summary
- practiced wording claims at the correct strength
The key idea is:
Good analysis is not only about producing outputs. It is about making claims that the outputs can actually support.
What Comes Next
Once claims are appropriately scoped, they still need to be communicated clearly.
In the next lesson, we will focus on:
- communicating model results clearly
- writing responsible summaries
- explaining uncertainty
- presenting findings to decision-makers without overstating the evidence
→ Communicating results clearly and responsibly