The hubVis
package contains a function called
plot_step_ahead_model_output()
that can be used to plot
model output that is in the format of forecasts or projects that look
multiple horizons into the future.
This function plots forecasts/scenario projections and optional target data. Faceted plots can be created for multiple scenarios, locations, forecast dates, models, etc. Currently, the function can plot only quantile data, with the possibility to add “median” information from the model projections.
For more information about the Hubverse standard format, please refer to the HubDocs website.
The following vignette describes the principal usage of the
plot_step_ahead_model_output()
function.
Plots are available in two output formats:
interactive
parameter to FALSE. See
end of the document for examples.To demonstrate the functionality of the
plot_step_ahead_model_output()
function, we will use the
examples data from the hubExamples
package.
scenario_outputs
: example scenario projection data
that represents model outputs and an ensemble (generated with
hubEnsemble
) from a scenario hub with predictions for one
target (inc hosp
) in one location ("US"
), one
round (“2021-03-07”) and four scenarios.
scenario_target_ts
: contains time series target data
associated with the scenario projection data.
forecast_outputs
: example forecast data that
represents model outputs from a forecast hub with predictions for three
influenza-related targets (wk inc flu hosp, wk flu hops rate category,
and wk flu hosp rate) for two reference dates in 2022.
forecast_target_ts
: contains time series target data
associated with the forecast projection data.
library(hubExamples)
# Scenario examples
head(scenario_outputs)
#> # A tibble: 6 × 9
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <date> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>
head(scenario_target_ts)
#> # A tibble: 6 × 4
#> location date observation target
#> <chr> <chr> <int> <chr>
#> 1 US 2020-10-03 300678 inc case
#> 2 US 2020-10-10 334493 inc case
#> 3 US 2020-10-17 388282 inc case
#> 4 US 2020-10-24 484422 inc case
#> 5 US 2020-10-31 571389 inc case
#> 6 US 2020-11-07 776479 inc case
# Forecast examples
head(forecast_outputs)
#> # A tibble: 6 × 9
#> model_id location reference_date horizon target_end_date target output_type
#> <chr> <chr> <date> <int> <date> <chr> <chr>
#> 1 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> 2 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> 3 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> 4 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> 5 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> 6 Flusight-b… 48 2022-12-17 0 2022-12-17 wk fl… cdf
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
head(forecast_target_ts)
#> date location observation
#> 1 2020-01-11 01 0
#> 2 2020-01-11 15 0
#> 3 2020-01-11 18 0
#> 4 2020-01-11 27 0
#> 5 2020-01-11 30 0
#> 6 2020-01-11 37 0
The forecast and scenario output should be a
model_out_tbl
. In addition to the standard requirements for
this class, the plot_step_ahead_model_output()
function in
hubVis
has other requirement.
"target_date"
column,
although this could be over-ridden by specifying a different column
using the x_col_name
argument.quantile
and median
are the only accepted
output type
# Add a `target_date` column in the scenario example
projection_data <- dplyr::mutate(scenario_outputs,
target_date = as.Date(origin_date) +
(horizon * 7) - 1)
head(projection_data)
#> # A tibble: 6 × 10
#> model_id origin_date scenario_id location target horizon output_type
#> <chr> <date> <chr> <chr> <chr> <int> <chr>
#> 1 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 2 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 3 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 4 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 5 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> 6 HUBuni-simexamp 2021-03-07 A-2021-03-05 US inc case 1 quantile
#> # ℹ 3 more variables: output_type_id <dbl>, value <dbl>, target_date <date>
# Filter only `quantile` output type in the forecast example
forecast_quantile <- dplyr::filter(forecast_outputs, output_type == "quantile")
head(forecast_quantile)
#> # A tibble: 6 × 9
#> model_id location reference_date horizon target_end_date target output_type
#> <chr> <chr> <date> <int> <date> <chr> <chr>
#> 1 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> 2 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> 3 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> 4 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> 5 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> 6 Flusight-b… 25 2022-12-17 0 2022-12-17 wk in… quantile
#> # ℹ 2 more variables: output_type_id <chr>, value <dbl>
The plotting function requires only 2 parameters:
model_output_data
: a
model_out_tbl
object containing all the Hubverse
standard columns, including "target_date"
and
"model_id"
columns. As all model_output in
model_output_data will be plotted, any filtering needs to happen outside
this function.
target_data
: a data.frame
object
containing the target data, including the columns: "date"
and "observation"
.
The projection_data
and target_data
contain
information for multiple locations, and scenarios.
To plot the model projections for the US, Scenario A:
# Pre-filtering
projection_data_a_us <- dplyr::filter(projection_data,
scenario_id == "A-2021-03-05",
location == "US")
# Limit date for layout reason
target_data_us <-
dplyr::filter(scenario_target_ts, location == "US",
date < min(projection_data$target_date) + 21,
date > "2020-10-01")
plot_step_ahead_model_output(projection_data_a_us, target_data_us)
By default, the 50%, 80% and 95% intervals are plotted, with a
specific color palette per model_id
.
In general, it is hard to see multiple intervals when multiple models are plotted, so specifying only one interval can be useful:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
intervals = 0.8)
It is also possible to add a median line on the plot with the
use_median_as_point
parameter:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
intervals = 0.8,
use_median_as_point = TRUE)
By default plots are interactive, but that can be easily switched to static:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
intervals = 0.8,
use_median_as_point = TRUE,
interactive = FALSE)
To plot the forecast projections for one reference dates (2022-11-19) for Massachusetts (25).
# Pre-filtering
forecast_quantile <- dplyr::mutate(forecast_quantile,
output_type_id = as.numeric(output_type_id))
forecast_quantile_1 <- dplyr::filter(forecast_quantile,
reference_date == "2022-11-19",
location == 25)
# Limit date for layout reason
forecast_target_ma <- dplyr::filter(forecast_target_ts, location == 25,
date >= "2022-11-01",
date <= "2023-01-01")
As the forecast projections used the column
target_end_date
and contains the quantiles:
"0.05"
, "0.1"
, "0.25"
,
"0.5"
, "0.75"
, "0.9"
,
"0.95"
, the parameters in the
plot_step_ahead_model_output()
need to be ajusted:
plot_step_ahead_model_output(forecast_quantile_1, forecast_target_ma,
intervals = c(0.9, 0.5),
use_median_as_point = TRUE,
x_col_name = "target_end_date")
A “facet” (or subplot) plot can also be created for each scenario
# Pre-filtering
projection_data_us <- dplyr::filter(projection_data,
location == "US")
plot_step_ahead_model_output(projection_data_us, target_data_us,
facet = "scenario_id")
The layout of the “facets” can be adjusted, with the different
facet_
parameters.
plot_step_ahead_model_output(projection_data_us, target_data_us,
use_median_as_point = TRUE,
facet = "scenario_id", facet_scales = "free_x",
facet_nrow = 2, facet_title = "bottom left")
Or with the additional facet_ncol
parameter for the
statics plot
plot_step_ahead_model_output(projection_data_us, target_data_us,
use_median_as_point = TRUE, interactive = FALSE,
facet = "scenario_id", facet_scales = "free_x",
facet_ncol = 4, facet_title = "bottom left")
A “facet” (or subplot) plot can also be created for each model. In
this case, the legend will be adapted to return the
model_id
value.
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
facet = "model_id")
The legend can be removed with the parameter
show_legend = FALSE
.
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
facet = "model_id", show_legend = FALSE)
A “facet” (or subplot) plot can also be created for each location
forecast_quantile_1 <- dplyr::filter(forecast_quantile,
reference_date == "2022-11-19")
forecast_target <- dplyr::filter(forecast_target_ts, date >= "2022-11-01",
date <= "2023-01-01",
location %in% forecast_quantile_1$location)
plot_step_ahead_model_output(forecast_quantile_1, forecast_target,
intervals = c(0.9, 0.5),
use_median_as_point = TRUE,
x_col_name = "target_end_date",
facet = "location")
By default, the 50%, 80% and 95% intervals are plotted. However, it is possible to also plot the 90% intervals or a subset of these intervals. When plotting 6 or more models, the plot will be reduced to show the widest intervals provided (95% by default).
To illustrate this we will use the projections for only one model in the scenario example
# Pre-filtering
projection_data_mod <- dplyr::filter(projection_data,
location == "US",
model_id == "hub-ensemble")
plot_step_ahead_model_output(projection_data_mod, target_data_us,
use_median_as_point = TRUE, facet = "scenario_id",
facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95))
The opacity of the intervals can be adjusted:
plot_step_ahead_model_output(projection_data_mod, target_data_us,
use_median_as_point = TRUE, facet = "scenario_id",
facet_nrow = 2, intervals = c(0.5, 0.8, 0.9, 0.95),
fill_transparency = 0.15)
Plots without intervals are also possible:
plot_step_ahead_model_output(projection_data_mod, target_data_us,
use_median_as_point = TRUE, facet = "scenario_id",
facet_nrow = 2, intervals = NULL)
Several other parameters are available to update the plot output. Here is some examples of some parameters.
It is possible to assign a specific color and behavior to a specific
model_id
. Typically, this is done to highlight an ensemble,
so the name for these arguments are ens_name
and
end_color
. The model specified by ens_name
will be the top layer of the resulting plot.
plot_step_ahead_model_output(projection_data_us, target_data_us,
use_median_as_point = TRUE,
facet = "scenario_id", facet_nrow = 2,
ens_name = "hub-ensemble", ens_color = "black",
intervals = 0.8)
An optional parameter group
in the
plot_step_ahead_model_output()
function to allow to group
or partition the input data in the plot according to a specific column.
This feature is currently only available for “static” plot. Please refer
to ggplot2::aes_group_order
for more information.
To illustrate this we will use the forecast example for both reference date:
plot_step_ahead_model_output(forecast_quantile, forecast_target,
intervals = c(0.9, 0.5),
use_median_as_point = TRUE,
x_col_name = "target_end_date",
facet = "location", interactive = FALSE,
group = "reference_date")
Multiple layout update are possible:
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
plot_target = FALSE)
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
top_layer = "target")
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
title = "Incident Cases in the US")
RColorBrewer::display.brewer.all()
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
pal_color = "Dark2")
The fill_by
parameter can be change to another valid
column names to change the legend and colors attributes to this new
column.
plot_step_ahead_model_output(projection_data_us, target_data_us,
facet = "model_id", fill_by = "scenario_id")
It is possible to use only blues for all models, by setting the
pal_color
parameter to NULL
. This might be
especially useful when used for many models in conjunction with
highlighting the ensemble forecast using the ens_name
and
ens_color
argument.
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
intervals = 0.8,
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)
The default blue color can be changed with the one_color
parameter
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
intervals = 0.8, one_color = "orange",
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
interactive = FALSE)
The input data frames can have different column names for the date
information. In this case, the two x_col_name
and
x_target_col_name
parameters can be used to indicate the
variables that should be mapped to the x-axis.
names(target_data_us)[names(target_data_us) == "date"] <- "time"
names(projection_data_a_us)[names(projection_data_a_us) == "target_date"] <-
"date"
plot_step_ahead_model_output(projection_data_a_us, target_data_us,
x_col_name = "date", x_target_col_name = "time")
For other parameters, please consult the documentation associated
with the function: ?plot_step_ahead_model_output