Package 'prettyglm' reference manual

Title:	Pretty Summaries of Generalized Linear Model Coefficients
Description:	One of the main advantages of using Generalised Linear Models is their interpretability. The goal of 'prettyglm' is to provide a set of functions which easily create beautiful coefficient summaries which can readily be shared and explained. 'prettyglm' helps users create coefficient summaries which include categorical base levels, variable importance and type III p.values. 'prettyglm' also creates beautiful relativity plots for categorical, continuous and splined coefficients.
Authors:	Jared Fowler [cre, aut]
Maintainer:	Jared Fowler <[email protected]>
License:	GPL-3
Version:	1.0.1
Built:	2025-03-31 05:57:16 UTC
Source:	https://github.com/jared-fowler/prettyglm

actual_expected_bucketed

Description

Provides a rank plot of actual and predicted values.

Usage

actual_expected_bucketed(
  target_variable,
  model_object,
  data_set = NULL,
  number_of_buckets = 25,
  ylab = "Target",
  width = 800,
  height = 500,
  first_colour = "black",
  second_colour = "#cc4678",
  facetby = NULL,
  prediction_type = "response",
  predict_function = NULL,
  return_data = F
)

Arguments

`target_variable`	String of target variable name.
`model_object`	GLM model object.
`data_set`	Data to score the model on. This can be training or test data, as long as the data is in a form where the model object can make predictions. The ability to provide custom prediction functions is still in development; the current implementation defaults to 'stats::predict'.
`number_of_buckets`	Number of percentile buckets.
`ylab`	Y-axis label.
`width`	plotly plot width in pixels.
`height`	plotly plot height in pixels.
`first_colour`	First colour to plot, usually the colour of actual.
`second_colour`	Second colour to plot, usually the colour of predicted.
`facetby`	Variable to facet by.
`prediction_type`	Prediction type to be passed to predict.glm if predict_function is NULL. Defaults to "response".
`predict_function`	Prediction function to use. Still in development.
`return_data`	Logical to return cleaned data set instead of plot.

Value

Plotly plot by default. ggplot if plotlyplot = F. Tibble if return_data = T.
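
The bucketing idea behind a rank plot can be sketched in base R. This is a minimal illustration and not the package's implementation: it assumes rows are ranked by predicted value and split into equally sized buckets, and it uses the built-in `mtcars` data with a small logistic model chosen purely for demonstration.

```r
# Fit a small logistic model on a built-in data set (illustrative only).
model <- stats::glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

scored <- data.frame(
  Actual_Values    = mtcars$am,
  Predicted_Values = as.numeric(stats::predict(model, mtcars, type = "response"))
)

# Rank rows by prediction and assign them to equally sized buckets.
# Assumption: ranking by predicted value; the package may bucket differently.
number_of_buckets <- 4
n <- nrow(scored)
scored$bucket <- ceiling(
  rank(scored$Predicted_Values, ties.method = "first") / n * number_of_buckets
)

# Average actual and predicted values within each bucket.
bucketed <- stats::aggregate(
  cbind(Actual_Values, Predicted_Values) ~ bucket,
  data = scored,
  FUN = mean
)
print(bucketed)
```

Plotting the two averaged columns against `bucket` gives the kind of comparison the function draws: a well calibrated model keeps the two lines close together across buckets.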

Examples


library(dplyr)
library(prettyglm)

data('titanic')

columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
meanage <- base::mean(titanic$Age, na.rm=TRUE)

titanic  <- titanic  %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age =base::ifelse(is.na(Age)==TRUE,meanage,Age)) %>%
  dplyr::mutate(Age_0_25 = prettyglm::splineit(Age,0,25),
                Age_25_50 = prettyglm::splineit(Age,25,50),
                Age_50_120 = prettyglm::splineit(Age,50,120)) %>%
  dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare,0,250),
                Fare_250_600 = prettyglm::splineit(Fare,250,600))

survival_model <- stats::glm(Survived ~
                               Sex:Age +
                               Fare +
                               Embarked +
                               SibSp +
                               Parch +
                               Cabintype,
                             data = titanic,
                             family = binomial(link = 'logit'))

prettyglm::actual_expected_bucketed(target_variable = 'Survived',
                                    model_object = survival_model,
                                    data_set = titanic)

Bank marketing campaigns data set analysis

Description

This dataset describes the results of marketing campaigns run by a Portuguese bank. The campaigns were based mostly on direct phone calls offering bank clients a term deposit. If, after all marketing efforts, the client agreed to place a deposit, the target variable is marked 'yes', otherwise 'no'.

Usage

data(bank)

Format

An object of class "data.frame"

job: Type of job
marital: marital status
education: education
default: has credit in default?
housing: has housing loan?
loan: has personal loan?
age: age
y: has the client subscribed a term deposit? (binary: "yes","no")

Details

Source of the data: https://archive.ics.uci.edu/ml/datasets/bank+marketing

References

This dataset is publicly available for research. The details are described in S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.

Examples


data(bank)
head(bank)

clean_coefficients

Description

Processing to split out base levels and add variable importance to each term. Inspired by 'tidycat::tidy_categorical()', modified for use in prettyglm.

Usage

clean_coefficients(
  d = NULL,
  m = NULL,
  vimethod = "model",
  spline_seperator = NULL,
  ...
)

Arguments

`d`	Data frame `tibble` output from `tidy.lm`; with one row for each term in the regression, including column 'term'
`m`	Model object `glm`
`vimethod`	Variable importance method. Still in development
`spline_seperator`	String of the spline separator. For example AGE_0_25 would be "_"
`...`	Any additional parameters to be passed to `vi`

Value

Expanded tibble from the version passed to 'd' including additional columns:

`variable`	The name of the variable that the regression term belongs to.
`level`	The level of the categorical variable that the regression term belongs to. Will be the term name for numeric variables.
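
To make "splitting out base levels" concrete, here is a minimal base R sketch. It does not call `clean_coefficients` and is not its implementation. It assumes default treatment contrasts, so each factor's first level is absorbed into the intercept and has no coefficient of its own; the sketch adds that level back with an estimate of 0. The column name `base_level` is hypothetical.

```r
# Illustrative model with one categorical predictor (cyl) and one numeric (wt).
cars <- transform(mtcars, cyl = factor(cyl))
model <- stats::lm(mpg ~ cyl + wt, data = cars)
estimates <- stats::coef(model)

# model$xlevels lists the levels of every factor used in the fit.
rows <- lapply(names(model$xlevels), function(variable) {
  levels_of_variable <- model$xlevels[[variable]]
  # Under treatment contrasts, term names are <variable><level>.
  term <- paste0(variable, levels_of_variable)
  # A level with no matching coefficient is the base level.
  is_base <- !(term %in% names(estimates))
  data.frame(
    term       = term,
    variable   = variable,
    level      = levels_of_variable,
    estimate   = ifelse(is_base, 0, unname(estimates[term])),
    base_level = is_base
  )
})
categorical_terms <- do.call(rbind, rows)
print(categorical_terms)
```

The resulting table has one row per factor level, including the base level, which is the shape the `variable` and `level` columns described above are meant to support.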

Author(s)

Jared Fowler, Guy J. Abel

cut3

Description

The bones of Hmisc::cut2, repackaged to avoid errors when importing Hmisc.

Usage

cut3(
  x,
  cuts,
  m = 150,
  g,
  digits,
  minmax = TRUE,
  oneval = TRUE,
  onlycuts = FALSE,
  formatfun = format,
  ...
)

Arguments

`x`	numeric vector to classify into intervals.
`cuts`	cut points.
`m`	desired minimum number of observations in a group. The algorithm does not guarantee that all groups will have at least m observations.
`g`	number of quantile groups
`digits`	number of significant digits to use in constructing levels.
`minmax`	if cuts is specified but min(x)<min(cuts) or max(x)>max(cuts), augments cuts to include min and max x
`oneval`	if an interval contains only one unique value, the interval will be labeled with the formatted version of that value instead of the interval endpoints, unless oneval=FALSE
`onlycuts`	set to TRUE to only return the vector of computed cuts. This consists of the interior values plus outer ranges.
`formatfun`	format function
`...`	additional arguments passed to formatfun

Value

vector of cut
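
For readers unfamiliar with quantile grouping, the effect of the `g` argument can be approximated in base R. This sketch does not call `cut3` and does not reproduce its labels or its handling of `m`, `minmax` or `oneval`; it only shows how a numeric vector is classified into quantile-based intervals.

```r
x <- mtcars$mpg
g <- 4  # number of quantile groups

# Cut points at evenly spaced quantiles; unique() guards against tied breaks.
breaks <- unique(stats::quantile(x, probs = seq(0, 1, length.out = g + 1)))

# include.lowest = TRUE keeps the minimum value inside the first interval.
groups <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(groups))
```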

one_way_ave

Description

Creates a pretty html plot of one way actual vs expected by specified predictor.

Usage

one_way_ave(
  feature_to_plot,
  model_object,
  target_variable,
  data_set,
  plot_type = "predictions",
  plot_factor_as_numeric = FALSE,
  ordering = NULL,
  width = 800,
  height = 500,
  number_of_buckets = 30,
  first_colour = "black",
  second_colour = "#cc4678",
  facetby = NULL,
  prediction_type = "response",
  predict_function = NULL,
  upper_percentile_to_cut = 0.01,
  lower_percentile_to_cut = 0
)

Arguments

`feature_to_plot`	A string of the variable to plot.
`model_object`	Model object to create coefficient table for. Must be of type: glm, lm
`target_variable`	String of target variable name in dataset.
`data_set`	Data set to calculate the actual vs expected for. If no input default is to try and extract training data from model object.
`plot_type`	One of "Residual", "predictions" or "actuals". Defaults to "predictions".
`plot_factor_as_numeric`	Set to TRUE to return data.frame instead of creating kable.
`ordering`	Option to change the ordering of categories on the x axis, only for discrete categories. Default to the ordering of the factor. Other options are: 'alphabetical', 'Number of records', 'Average Value'
`width`	Width of plot
`height`	Height of plot
`number_of_buckets`	Number of buckets for continuous variable plots
`first_colour`	First colour to plot, usually the colour of actual.
`second_colour`	Second colour to plot, usually the colour of predicted.
`facetby`	Variable to facet the actual vs expect plots by.
`prediction_type`	Prediction type to be passed to predict.glm if predict_function is NULL. Defaults to "response".
`predict_function`	A custom prediction function can be provided here. It must return a data.frame with an "Actual_Values" column, and a "Predicted_Values" column.
`upper_percentile_to_cut`	For continuous variables this is what percentile to exclude from the upper end of the distribution. Defaults to 0.01, so the maximum percentile of the variable in the plot will be 0.99. Cutting off some of the distribution can help the views if outliers are present in the data.
`lower_percentile_to_cut`	For continuous variables this is what percentile to exclude from the lower end of the distribution. Defaults to 0, so nothing is excluded from the lower end by default. Cutting off some of the distribution can help the views if outliers are present in the data.

Value

plotly plot of one way actual vs expected.
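
The effect of `upper_percentile_to_cut` and `lower_percentile_to_cut` can be illustrated with a short base R sketch. The exact trimming rule used inside `one_way_ave` is an assumption here; the sketch simply keeps values between the requested lower and upper quantiles.

```r
x <- mtcars$hp
lower_percentile_to_cut <- 0.1
upper_percentile_to_cut <- 0.1

# Quantile limits implied by the two arguments.
limits <- stats::quantile(
  x,
  probs = c(lower_percentile_to_cut, 1 - upper_percentile_to_cut)
)

# Keep only the values inside the limits (assumed inclusive).
kept <- x[x >= limits[1] & x <= limits[2]]
print(range(x))
print(range(kept))
```

Trimming this way narrows the x axis so that a few extreme values do not stretch the plot.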

Examples

library(dplyr)
library(prettyglm)
data('titanic')
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
meanage <- base::mean(titanic$Age, na.rm=TRUE)

titanic  <- titanic  %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age =base::ifelse(is.na(Age)==TRUE,meanage,Age)) %>%
  dplyr::mutate(Age_0_25 = prettyglm::splineit(Age,0,25),
                Age_25_50 = prettyglm::splineit(Age,25,50),
                Age_50_120 = prettyglm::splineit(Age,50,120)) %>%
  dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare,0,250),
                Fare_250_600 = prettyglm::splineit(Fare,250,600))

survival_model <- stats::glm(Survived ~
                               Sex:Age +
                               Fare +
                               Embarked +
                               SibSp +
                               Parch +
                               Cabintype,
                             data = titanic,
                             family = binomial(link = 'logit'))

# Continuous Variable Example
one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model,
            target_variable = 'Survived',
            data_set = titanic,
            number_of_buckets = 20,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1)

# Discrete Variable Example
one_way_ave(feature_to_plot = 'Pclass',
            model_object = survival_model,
            target_variable = 'Survived',
            data_set = titanic)

# Custom Predict Function and facet
a_custom_predict_function <- function(target, model_object, dataset){
  dataset <- base::as.data.frame(dataset)
  Actual_Values <- dplyr::pull(dplyr::select(dataset, tidyselect::all_of(c(target))))
  if(is.factor(Actual_Values)){
    Actual_Values <- base::as.numeric(as.character(Actual_Values))
  }
  Predicted_Values <- base::as.numeric(stats::predict(model_object, dataset, type='response'))

  to_return <-  base::data.frame(Actual_Values = Actual_Values,
                                 Predicted_Values = Predicted_Values)

  to_return <- to_return %>%
    dplyr::mutate(Predicted_Values = base::ifelse(Predicted_Values > 0.3,0.3,Predicted_Values))
  return(to_return)
}

one_way_ave(feature_to_plot = 'Age',
            model_object = survival_model,
            target_variable = 'Survived',
            data_set = titanic,
            number_of_buckets = 20,
            upper_percentile_to_cut = 0.1,
            lower_percentile_to_cut = 0.1,
            predict_function = a_custom_predict_function,
            facetby = 'Pclass')


predict_outcome

Description

Processing to predict response for various actual vs expected plots

Usage

predict_outcome(
  target,
  model_object,
  dataset,
  prediction_type = NULL,
  weights = NULL
)

Arguments

`target`	String of target variable name.
`model_object`	Model object. prettyglm currently supports
`dataset`	This is used to plot the number in each class as a barchart if plotly is TRUE.
`prediction_type`	type of prediction to be passed to the model object. For ...GLM defaults to ....
`weights`	weightings to be provided to predictions if required.

Value

dataframe

Returns a dataframe of Actual and Predicted Values
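
A minimal sketch of what such a function might look like is shown below. `sketch_predict_outcome` is a hypothetical helper, not the package's code, and the column names `Actual_Values` and `Predicted_Values` are borrowed from the `predict_function` contract documented for `one_way_ave`. The `weights` argument is not covered.

```r
# Hypothetical helper: returns actual and predicted values side by side.
sketch_predict_outcome <- function(target, model_object, dataset,
                                   prediction_type = "response") {
  dataset <- as.data.frame(dataset)
  actual <- dataset[[target]]
  # A factor target such as "0"/"1" is converted to numeric for averaging.
  if (is.factor(actual)) {
    actual <- as.numeric(as.character(actual))
  }
  predicted <- as.numeric(
    stats::predict(model_object, newdata = dataset, type = prediction_type)
  )
  data.frame(Actual_Values = actual, Predicted_Values = predicted)
}

# Illustrative usage on a built-in data set with a factor target.
cars <- transform(mtcars, am = factor(am))
model <- stats::glm(am ~ wt, data = cars, family = binomial(link = "logit"))
result <- sketch_predict_outcome("am", model, cars)
print(head(result))
```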

Author(s)

Jared Fowler

pretty_coefficients

Description

Creates a pretty kable of model coefficients including coefficient base levels, type III P.values, and variable importance.

Usage

pretty_coefficients(
  model_object,
  relativity_transform = NULL,
  relativity_label = "relativity",
  type_iii = NULL,
  conf.int = FALSE,
  vimethod = "model",
  spline_seperator = NULL,
  significance_level = 0.05,
  return_data = FALSE,
  ...
)

Arguments

`model_object`	Model object to create coefficient table for. Must be of type: `glm`, `lm`.
`relativity_transform`	String of the function to be applied to the model estimate to calculate the relativity, for example: 'exp(estimate)-1'. Default is for relativity to be excluded from output.
`relativity_label`	String of label to give to relativity column if you want to change the title to your use case.
`type_iii`	Type III statistical test to perform. Default is none. Options are 'Wald' or 'LR'. Warning 'LR' can be computationally expensive. Test performed via `Anova`
`conf.int`	Set to TRUE to include confidence intervals in summary table. Warning, can be computationally expensive.
`vimethod`	Variable importance method to pass to method of `vi`. Defaults to "model". Currently supports "permute" and "firm", pass any additional arguments to `vi` in ...
`spline_seperator`	Separator to look for to identify a spline. If this input is not null, it is assumed any features with this separator are spline columns. For example an age spline from 0 to 25 you could use: AGE_0_25 and "_".
`significance_level`	Significance level to compare P-values against in the kable. Defaults to 0.05.
`return_data`	Set to TRUE to return `data.frame` instead of creating `kable`.
`...`	Any additional parameters to be passed to `vi`

Value

kable if return_data = FALSE. data.frame if return_data = TRUE.
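
Because `relativity_transform` is given as a string, it may help to see how such a string can be applied to a column of estimates. The sketch below uses `eval(parse())` against a small coefficient table; whether the package evaluates the string in exactly this way is an assumption.

```r
model <- stats::glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# A small coefficient table with the 'estimate' column the transform refers to.
coefficient_table <- data.frame(
  term     = names(stats::coef(model)),
  estimate = unname(stats::coef(model))
)

# Evaluate the transform string with the table's columns in scope.
relativity_transform <- "exp(estimate)-1"
coefficient_table$relativity <- eval(
  parse(text = relativity_transform),
  envir = coefficient_table
)
print(coefficient_table)
```

For a logit link, `exp(estimate)` is the odds ratio for a one unit increase in the predictor, so `exp(estimate)-1` reads as the proportional change in the odds.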

Examples


library(dplyr)
library(prettyglm)
data('titanic')
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
meanage <- base::mean(titanic$Age, na.rm=TRUE)

titanic  <- titanic  %>%
 dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
 dplyr::mutate(Age =base::ifelse(is.na(Age)==TRUE,meanage,Age)) %>%
 dplyr::mutate(Age_0_25 = prettyglm::splineit(Age,0,25),
               Age_25_50 = prettyglm::splineit(Age,25,50),
               Age_50_120 = prettyglm::splineit(Age,50,120)) %>%
 dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare,0,250),
               Fare_250_600 = prettyglm::splineit(Fare,250,600))

# A simple example
survival_model <- stats::glm(Survived ~
                              Pclass +
                              Sex +
                              Age +
                              Fare +
                              Embarked +
                              SibSp +
                              Parch +
                              Cabintype,
                             data = titanic,
                             family = binomial(link = 'logit'))
pretty_coefficients(survival_model)

# A more complicated example with a spline and different importance method
survival_model3 <- stats::glm(Survived ~
                                        Pclass +
                                        Age_0_25 +
                                        Age_25_50 +
                                        Age_50_120 +
                                        Sex:Fare_0_250 +
                                        Sex:Fare_250_600 +
                                        Embarked +
                                        SibSp +
                                        Parch +
                                        Cabintype,
                              data = titanic,
                              family = binomial(link = 'logit'))
pretty_coefficients(survival_model3,
                    relativity_transform = 'exp(estimate)-1',
                    spline_seperator = '_',
                    vimethod = 'permute',
                    target = 'Survived',
                    metric = "roc_auc",
                    event_level = 'second',
                    pred_wrapper = predict.glm,
                    smaller_is_better = FALSE,
                    train = survival_model3$data, # need to supply training data for vip importance
                    reference_class = 0)


pretty_relativities

Description

Creates a pretty html plot of model relativities including base levels.

Usage

pretty_relativities(
  feature_to_plot,
  model_object,
  plot_approx_ci = TRUE,
  relativity_transform = "exp(estimate)-1",
  relativity_label = "Relativity",
  ordering = NULL,
  plot_factor_as_numeric = FALSE,
  width = 800,
  height = 500,
  iteractionplottype = NULL,
  facetorcolourby = NULL,
  upper_percentile_to_cut = 0.01,
  lower_percentile_to_cut = 0,
  spline_seperator = NULL
)

Arguments

`feature_to_plot`	A string of the variable to plot.
`model_object`	Model object to create coefficient table for. Must be of type: glm, lm
`plot_approx_ci`	Set to TRUE to include confidence intervals in summary table. Warning, can be computationally expensive.
`relativity_transform`	String of the function to be applied to the model estimate to calculate the relativity, for example: 'exp(estimate)'. Default is for relativity to be 'exp(estimate)-1'.
`relativity_label`	String of label to give to relativity column if you want to change the title to your use case, some users may prefer to refer to this as odds ratio.
`ordering`	Option to change the ordering of categories on the x axis, only for discrete categories. Default to the ordering of the fitted factor. Other options are: 'alphabetical', 'Number of records', 'Average Value'
`plot_factor_as_numeric`	Set to TRUE to return data.frame instead of creating kable.
`width`	Width of plot
`height`	Height of plot
`iteractionplottype`	If plotting the relativity for an interaction variable you can "facet" or "colour" by one of the interaction variables. Defaults to NULL.
`facetorcolourby`	If iteractionplottype is not NULL, then this is the variable in the interaction you want to colour or facet by.
`upper_percentile_to_cut`	For continuous variables this is what percentile to exclude from the upper end of the distribution. Defaults to 0.01, so the maximum percentile of the variable in the plot will be 0.99. Cutting off some of the distribution can help the views if outliers are present in the data.
`lower_percentile_to_cut`	For continuous variables this is what percentile to exclude from the lower end of the distribution. Defaults to 0, so nothing is excluded from the lower end by default. Cutting off some of the distribution can help the views if outliers are present in the data.
`spline_seperator`	String of the spline separator. For example AGE_0_25 would be "_".

Value

plotly plot of fitted relativities.

Examples

library(dplyr)
library(prettyglm)
data('titanic')

columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
meanage <- base::mean(titanic$Age, na.rm=TRUE)

titanic  <- titanic  %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age =base::ifelse(is.na(Age)==TRUE,meanage,Age)) %>%
  dplyr::mutate(Age_0_25 = prettyglm::splineit(Age,0,25),
                Age_25_50 = prettyglm::splineit(Age,25,50),
                Age_50_120 = prettyglm::splineit(Age,50,120)) %>%
  dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare,0,250),
                Fare_250_600 = prettyglm::splineit(Fare,250,600))

survival_model3 <- stats::glm(Survived ~
                                Pclass:Embarked +
                                Age_0_25  +
                                Age_25_50 +
                                Age_50_120  +
                                Sex:Fare_0_250 +
                                Sex:Fare_250_600 +
                                SibSp +
                                Parch +
                                Cabintype,
                              data = titanic,
                              family = binomial(link = 'logit'))

# categorical factor
pretty_relativities(feature_to_plot = 'Cabintype',
                    model_object = survival_model3)

# continuous factor
pretty_relativities(feature_to_plot = 'Parch',
                    model_object = survival_model3)

# splined continuous factor
pretty_relativities(feature_to_plot = 'Age',
                    model_object = survival_model3,
                    spline_seperator = '_',
                    upper_percentile_to_cut = 0.01,
                    lower_percentile_to_cut = 0.01)

# factor factor interaction
pretty_relativities(feature_to_plot = 'Pclass:Embarked',
                    model_object = survival_model3,
                    iteractionplottype = 'colour',
                    facetorcolourby = 'Pclass')

# Continuous spline and categorical by colour
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model3,
                    spline_seperator = '_')

# Continuous spline and categorical by facet
pretty_relativities(feature_to_plot = 'Sex:Fare',
                    model_object = survival_model3,
                    spline_seperator = '_',
                    iteractionplottype = 'facet')

splineit

Description

Splines a continuous variable

Usage

splineit(var, min, max)

Arguments

`var`	Continuous vector to spline.
`min`	Lower bound of the spline interval.
`max`	Upper bound of the spline interval.

Value

Splined Column
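
This manual does not state the formula behind the splined column. The following is a minimal sketch, assuming each splined column is a clamped linear segment: 0 below `min`, `var - min` inside the interval, and `max - min` at or above `max`. The helper name `linear_segment` is hypothetical and only stands in for `splineit` so that the sketch runs without the package installed.

```r
# Hypothetical stand-in for splineit(); ASSUMES a clamped linear segment,
# which this manual does not confirm.
linear_segment <- function(var, min, max) {
  base::pmin(base::pmax(var - min, 0), max - min)
}

age <- c(10, 25, 40, 60)

# Same interval boundaries as the Age splines used in the package examples.
segments <- data.frame(
  Age        = age,
  Age_0_25   = linear_segment(age, 0, 25),
  Age_25_50  = linear_segment(age, 25, 50),
  Age_50_120 = linear_segment(age, 50, 120)
)

# For Age = 40 the three segments are 25, 15 and 0.
print(segments)
```

Under this assumption, adjacent segments sum back to the original value for any input inside the overall range, so a GLM fitted on them estimates a separate slope per interval while the fitted curve stays continuous at the knots.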

Examples

library(dplyr)
library(prettyglm)
data('titanic')

# Columns to treat as categorical
columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
meanage <- base::mean(titanic$Age, na.rm = TRUE)

titanic <- titanic %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  # Impute missing ages with the mean age
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age)) %>%
  # One splined column per Age interval
  dplyr::mutate(Age_0_25 = prettyglm::splineit(Age, 0, 25),
                Age_25_50 = prettyglm::splineit(Age, 25, 50),
                Age_50_120 = prettyglm::splineit(Age, 50, 120)) %>%
  # One splined column per Fare interval
  dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare, 0, 250),
                Fare_250_600 = prettyglm::splineit(Fare, 250, 600))

Titanic Data

Description

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the RMS Titanic, widely considered “unsinkable”, sank after colliding with an iceberg. There were not enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).

Usage

data(titanic)

Format

An object of class "data.frame"

survival: Survival
pclass: Ticket class
sex: Sex
Age: Age in years
sibsp: number of siblings / spouses
parch: number of parents / children
ticket: Ticket number
fare: Passenger fare
cabin: Cabin Number
cabintype: Type of cabin
embarked: Port of Embarkation

References

This data set was sourced from https://www.kaggle.com/c/titanic/data?select=train.csv

Examples

data(titanic)
head(titanic)

Help Index

actual_expected_bucketed
Bank marketing campaigns data set analysis
clean_coefficients
cut3
one_way_ave
predict_outcome
pretty_coefficients
pretty_relativities
splineit
Titanic Data