Title: | Tools for Binning Data |
---|---|
Description: | Manually bin data using weight of evidence and information value. Includes other binning methods such as equal length, quantile and winsorized. Options for combining levels of categorical data are also available. Dummy variables can be generated based on the bins created using any of the available binning methods. References: Siddiqi, N. (2006) <doi:10.1002/9781119201731.biblio>. |
Authors: | Aravind Hebbali [aut, cre] |
Maintainer: | Aravind Hebbali <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.1 |
Built: | 2024-11-05 12:24:14 UTC |
Source: | https://github.com/rsquaredacademy/rbin |
The data relate to the direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').
mbank
A tibble with 4521 rows and 17 variables:
age of the client
type of job
marital status
education level of the client
has credit in default?
has housing loan?
has personal loan?
contact communication type
last contact month of year
last contact day of the week
last contact duration, in seconds
number of contacts performed during this campaign and for this client
number of days that passed by after the client was last contacted from a previous campaign
number of contacts performed before this campaign and for this client
outcome of the previous marketing campaign
has the client subscribed a term deposit?
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
Create dummy variables from bins.
rbin_create(data, predictor, bins)
data |
A data.frame or tibble. |
predictor |
Variable for which dummy variables must be created. |
bins |
An object of class |
data with dummy variables.
k <- rbin_manual(mbank, y, age, c(29, 39, 56))
rbin_create(mbank, age, k)
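To make the idea of dummy variables derived from bins concrete, here is a minimal base R sketch. It does not call the package and is not its implementation: the toy `age` vector and the `age_bin_` column names are assumptions for illustration, and the package may name columns differently or drop a reference bin.

```r
# Hypothetical toy data; the cut points mirror the example above.
age <- c(22, 29, 35, 39, 47, 56, 63)
cut_points <- c(29, 39, 56)

# Left-closed, right-open bins: [-Inf, 29), [29, 39), [39, 56), [56, Inf)
bin <- cut(age, breaks = c(-Inf, cut_points, Inf), right = FALSE)

# One indicator column per bin; "- 1" drops the intercept so that
# every bin gets its own column.
dummies <- model.matrix(~ bin - 1)
colnames(dummies) <- paste0("age_bin_", seq_len(ncol(dummies)))

# Attach the indicator columns to the original data
out <- cbind(data.frame(age = age), as.data.frame(dummies))
out
```

Each row has exactly one indicator set to 1, namely the one for the bin that contains its `age` value.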
Bin continuous data using the equal frequency binning method.
rbin_equal_freq(data = NULL, response = NULL, predictor = NULL, bins = 10)

## S3 method for class 'rbin_equal_freq'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
bins |
Number of bins. |
x |
An object of class rbin_equal_freq. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
A tibble.
bins <- rbin_equal_freq(mbank, y, age, 10)
bins

# plot
plot(bins)
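Equal frequency binning aims for roughly the same number of observations in every bin. The following base R sketch shows one way to do that with ranks; it is an illustration under assumed toy data, not the package's own algorithm, which may treat ties and remainders differently.

```r
# Hypothetical values of a continuous predictor
x <- c(18, 21, 25, 30, 34, 41, 47, 52, 58, 66, 71, 80)
n_bins <- 4

# Rank the observations, then split the ranks into n_bins groups of
# (nearly) equal size.
ranks <- rank(x, ties.method = "first")
bin <- ceiling(ranks * n_bins / length(x))

# Number of observations per bin
table(bin)
```

With 12 observations and 4 bins, every bin receives 3 observations.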
Bin continuous data using the equal length binning method.
rbin_equal_length(
  data = NULL,
  response = NULL,
  predictor = NULL,
  bins = 10,
  include_na = TRUE
)

## S3 method for class 'rbin_equal_length'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
bins |
Number of bins. |
include_na |
logical; if |
x |
An object of class rbin_equal_length. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
A tibble.
bins <- rbin_equal_length(mbank, y, age, 10)
bins

# plot
plot(bins)
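Equal length binning divides the range of the predictor into intervals of identical width, regardless of how many observations fall into each. A minimal base R sketch, assuming toy data and not reproducing the package internals:

```r
# Hypothetical values of a continuous predictor
x <- c(18, 21, 25, 30, 34, 41, 47, 52, 58, 66, 71, 78)
n_bins <- 4

# n_bins intervals of identical width between min(x) and max(x)
breaks <- seq(min(x), max(x), length.out = n_bins + 1)
widths <- diff(breaks)

# include.lowest = TRUE keeps max(x) in the last bin even though the
# intervals are otherwise right-open.
bin <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
table(bin)
```

Unlike equal frequency binning, the counts per bin generally differ; only the interval widths are equal.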
Weight of evidence and information value for categorical data.
rbin_factor(data = NULL, response = NULL, predictor = NULL, include_na = TRUE)

## S3 method for class 'rbin_factor'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
include_na |
logical; if |
x |
An object of class rbin_factor. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
bins <- rbin_factor(mbank, y, education)
bins

# plot
plot(bins)
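For readers unfamiliar with weight of evidence (WoE) and information value (IV), the base R sketch below computes both for a categorical predictor. It assumes one common convention, WoE = log(share of non-events / share of events), and uses hypothetical toy data; the package may use a different sign convention or handle empty cells differently.

```r
# Hypothetical toy data: binary response (1 = event) and a categorical predictor
y   <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 0)
edu <- c("primary", "primary", "primary", "primary",
         "secondary", "secondary", "secondary",
         "tertiary", "tertiary", "tertiary")

# Counts of events and non-events per level
events     <- tapply(y == 1, edu, sum)
non_events <- tapply(y == 0, edu, sum)

# Share of all events / non-events that falls into each level
event_dist     <- events / sum(events)
non_event_dist <- non_events / sum(non_events)

# Weight of evidence per level and the total information value
woe <- log(non_event_dist / event_dist)
iv  <- sum((non_event_dist - event_dist) * woe)

round(woe, 4)
round(iv, 4)
```

A level with a positive WoE holds a larger share of non-events than of events under this convention; the IV sums the per-level contributions into a single measure of the predictor's strength.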
Manually combine levels of categorical data.
rbin_factor_combine(data, var, new_var, new_name)
data |
A data.frame or tibble. |
var |
An object of class |
new_var |
A character vector; it should include the names of the levels to be combined. |
new_name |
Name of the combined level. |
A tibble.
upper <- c("secondary", "tertiary")
out <- rbin_factor_combine(mbank, education, upper, "upper")
table(out$education)

out <- rbin_factor_combine(mbank, education, c("secondary", "tertiary"), "upper")
table(out$education)
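The effect of combining levels can be sketched in base R without the package. The factor below is hypothetical, and relabelling through `levels()` is just one way to merge levels; it is not necessarily how the package does it.

```r
# Hypothetical factor with four levels
edu <- factor(c("primary", "secondary", "tertiary", "secondary", "unknown"))

combine <- c("secondary", "tertiary")

# Assigning to levels() relabels in place; levels that end up with the
# same label are merged into one.
levels(edu)[levels(edu) %in% combine] <- "upper"

table(edu)
```

After the merge, the factor has three levels, and the observations from both original levels are counted under `upper`.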
Create dummy variables for categorical data.
rbin_factor_create(data, predictor)
data |
A data.frame or tibble. |
predictor |
Variable for which dummy variables must be created. |
A tibble with dummy variables.
upper <- c("secondary", "tertiary")
out <- rbin_factor_combine(mbank, education, upper, "upper")
rbin_factor_create(out, education)
Bin continuous data manually.
rbin_manual(
  data = NULL,
  response = NULL,
  predictor = NULL,
  cut_points = NULL,
  include_na = TRUE
)

## S3 method for class 'rbin_manual'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
cut_points |
Cut points for binning. |
include_na |
logical; if |
x |
An object of class rbin_manual. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
Specify the open upper bound of each bin. 'rbin' uses left-closed, right-open intervals. If you want to create 10 bins, the app will show you only 9 input boxes; the interval for the 10th bin is computed automatically. For example, if you want the first bin to contain all values from the minimum up to and including 36, enter the value 37.
A tibble.
bins <- rbin_manual(mbank, y, age, c(29, 31, 34, 36, 39, 42, 46, 51, 56))
bins

# plot
plot(bins)
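The left-closed, right-open convention described above can be checked with base R's `findInterval()`, which uses the same interval type by default. This is a sketch with hypothetical `age` values, reusing the cut points from the example; it is not the package's own code.

```r
# A cut point of 37 puts 36 in the first bin and 37 in the second
first_two <- findInterval(c(36, 37), 37) + 1

# Nine cut points define ten bins
cut_points <- c(29, 31, 34, 36, 39, 42, 46, 51, 56)
n_bins <- length(cut_points) + 1

# Hypothetical ages; a value equal to a cut point opens the next bin
age <- c(18, 28, 29, 36, 55, 56, 90)
bin <- findInterval(age, cut_points) + 1
bin
```

Values below the first cut point fall into bin 1, and values at or above the last cut point fall into the final bin, which is why only nine boundaries are needed for ten bins.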
Bin continuous data using quantiles.
rbin_quantiles(
  data = NULL,
  response = NULL,
  predictor = NULL,
  bins = 10,
  include_na = TRUE
)

## S3 method for class 'rbin_quantiles'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
bins |
Number of bins. |
include_na |
logical; if |
x |
An object of class rbin_quantiles. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
A tibble.
bins <- rbin_quantiles(mbank, y, age, 10)
bins

# plot
plot(bins)
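Quantile binning derives the cut points from the empirical quantiles of the predictor. The base R sketch below assumes evenly spaced probabilities and toy data; how the package chooses probabilities and treats ties is not shown here and may differ.

```r
# Hypothetical predictor values
x <- 1:100
n_bins <- 4

# Evenly spaced probabilities give the bin boundaries
probs  <- seq(0, 1, length.out = n_bins + 1)
breaks <- quantile(x, probs = probs, names = FALSE)

# Left-closed, right-open bins; include.lowest keeps max(x) in the last bin
bin <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
table(bin)
```

With distinct values, the bins end up with equal counts; heavy ties in real data can make the counts uneven or produce duplicate boundaries.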
Bin continuous data using the winsorized method.
rbin_winsorize(
  data = NULL,
  response = NULL,
  predictor = NULL,
  bins = 10,
  include_na = TRUE,
  winsor_rate = 0.05,
  min_val = NULL,
  max_val = NULL,
  type = 7,
  remove_na = TRUE
)

## S3 method for class 'rbin_winsorize'
plot(x, print_plot = TRUE, ...)
data |
A data.frame or tibble. |
response |
Response variable. |
predictor |
Predictor variable. |
bins |
Number of bins. |
include_na |
logical; if |
winsor_rate |
A value from 0.0 to 0.5. |
min_val |
the low border, all values being lower than this will be replaced by this value. The default is set to the 5 percent quantile of predictor. |
max_val |
the high border, all values being larger than this will be replaced by this value. The default is set to the 95 percent quantile of predictor. |
type |
an integer between 1 and 9 selecting one of the nine quantile algorithms detailed in quantile(). |
remove_na |
logical; if |
x |
An object of class rbin_winsorize. |
print_plot |
logical; if |
... |
further arguments passed to or from other methods. |
A tibble.
bins <- rbin_winsorize(mbank, y, age, 10, winsor_rate = 0.05)
bins

# plot
plot(bins)
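The winsorizing step implied by `winsor_rate`, `min_val` and `max_val` can be sketched in base R: values below the lower border are replaced by it, and values above the upper border are replaced by that border. The toy vector is hypothetical, and the sketch covers only the clamping, not the subsequent binning or the package's exact code.

```r
# Hypothetical predictor values with one extreme outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
winsor_rate <- 0.05

# Borders at the 5 and 95 percent quantiles (quantile algorithm type 7)
lo <- quantile(x, probs = winsor_rate, type = 7, names = FALSE)
hi <- quantile(x, probs = 1 - winsor_rate, type = 7, names = FALSE)

# Clamp every value into [lo, hi]
x_w <- pmin(pmax(x, lo), hi)
range(x_w)
```

The clamped values can then be binned with any of the other methods; because the outlier no longer stretches the range, equal-width bins are less likely to end up nearly empty.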
Manually bin continuous data using weight of evidence.
rbinAddin(data = NULL)
data |
A data.frame or tibble. |
## Not run:
rbinAddin(data = mbank)

## End(Not run)
Manually combine categorical variables using weight of evidence.
rbinFactorAddin(data = NULL)
data |
A data.frame or tibble. |
## Not run:
rbinFactorAddin(data = mbank)

## End(Not run)