Title: Estimate the Shattering Coefficient for a Particular Dataset
Description: The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping f:X -> Y, given f is selected from its search space bias F. This formal result depends on the Shattering coefficient function N(F,2n) to upper bound the empirical risk minimization principle, from which one can estimate the necessary training sample size to ensure probabilistic learning convergence and, most importantly, characterize the capacity of F, including its underfitting and overfitting abilities while addressing specific target problems. In this context, we propose a new approach to estimate the maximal number of hyperplanes required to shatter a given sample, i.e., to separate every pair of points from one another, based on the recent contributions by Har-Peled and Jones in the dataset partitioning scenario, and use that foundation to analytically compute the Shattering coefficient function for both binary and multi-class problems. As main contributions, one can use our approach to study the complexity of the search space bias F, estimate training sample sizes, and parametrize the number of hyperplanes a learning algorithm needs to address a supervised task, which is especially appealing to deep neural networks. Reference: de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>; de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory".
Authors: Rodrigo F. de Mello [aut, cre]
Maintainer: Rodrigo F. de Mello <[email protected]>
License: GPL-3
Version: 1.0.7
Built: 2024-11-13 04:29:01 UTC
Source: https://github.com/cran/shattering
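For reference, the shattering coefficient enters the classical SLT generalization bound (a standard statement of Vapnik's result, paraphrased here rather than taken from the package itself):

P(sup_{f in F} |R(f) - R_emp(f)| > epsilon) <= 2 N(F, 2n) exp(-n epsilon^2 / 4)

Bounding the right-hand side by a probability delta and solving for n yields the training sample size mentioned above; this is precisely the role of the my.delta and my.epsilon arguments of complexity_analysis below.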
This function applies the set of SVM classifiers produced by build_classifier to perform the supervised learning task based on topological data analysis
apply_classifier(model, X, only.best.classifiers = FALSE)
model | model built using function build_classifier
X | matrix defining the input space of your test set
only.best.classifiers | if TRUE, only the best-performing classification functions are considered
prediction results
de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>
de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory"
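A minimal usage sketch (not from the package documentation; the two-Gaussian data and parameter values are illustrative assumptions):

set.seed(1)
X = rbind(cbind(rnorm(50, mean=-1), rnorm(50, mean=-1)),
          cbind(rnorm(50, mean=1), rnorm(50, mean=1)))
Y = c(rep(-1, 50), rep(+1, 50))
model = build_classifier(X, Y, train.size=0.7)   # see build_classifier below
result = apply_classifier(model, X)              # full ensemble of classifiers
best = apply_classifier(model, X, only.best.classifiers=TRUE)
table(Y, result$classification.ensembled)        # field name as in the examples below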
This function builds a set of SVM classifiers to perform the supervised learning task based on topological data analysis
build_classifier(X, Y, train.size = 0.7, quantile.percentage = 1, min.points = 3, gamma.length = 50, cost = 10000, weights = c(0.25, 0.75), best.stdev.purity = 0)
X | matrix defining the input space of your dataset
Y | numerical vector defining the output space (labels/classes) of your dataset
train.size | fraction of examples used for training
quantile.percentage | real number defining the quantile of distances to be considered (e.g., 0.1 means 10%)
min.points | minimal number of examples per classification region of the input space
gamma.length | number of gamma parameters to test for the radial kernel of the SVM
cost | cost parameter for the SVM optimization
weights | class weights used in the SVM optimization
best.stdev.purity | standard deviation used to compute data purity
A list of classifiers composing the final classification model
de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>
de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory"
# require(NMF)
#
# X = cbind(rnorm(mean=-1, sd=1, n=200), rnorm(mean=-1, sd=1, n=200))
# X = rbind(X, cbind(rnorm(mean=1, sd=1, n=200), rnorm(mean=1, sd=1, n=200)))
# Y = c(rep(-1,200), rep(+1,200))
# plot(X, col=Y+2, pch=20, cex=3, cex.axis=2)
#
# model = build_classifier(X, Y, train.size=0.5, quantile.percentage=1,
#                          min.points=10, gamma.length=15, cost=10000)
# result = apply_classifier(model, X)
# points(X, col=as.numeric(result$classification.ensembled)+2, pch=20, cex=1.5)
#
# x = seq(min(X), max(X), length=100)
# z = outer(x, x, function(x,y) {
#       apply_classifier(model, as.matrix(cbind(x,y)))$classification.ensembled })
# filled.contour(x,x,z)
#
# x = seq(min(X), max(X), length=100)
# z = outer(x, x, function(x,y) {
#       apply_classifier(model, as.matrix(cbind(x,y)),
#                        only.best.classifiers=TRUE)$classification.ensembled })
# locator(1)
# filled.contour(x,x,z)
Full analysis of the lower and upper shattering coefficient functions for a given supervised dataset
complexity_analysis(X = NULL, Y = NULL, my.delta = 0.05, my.epsilon = 0.05, directory = tempdir(), file = "myreport", length = 10, quantile.percentage = 0.5, epsilon = 1e-07)
X | matrix defining the input space of your dataset
Y | numerical vector defining the output space (labels/classes) of your dataset
my.delta | upper bound for the probability of the empirical risk minimization principle (in range (0,1))
my.epsilon | acceptable divergence between the empirical and (expected) risks (in range (0,1))
directory | directory used to generate the report for your dataset
file | name of the PDF file to be generated (without extension)
length | number of points to divide the sample while computing the shattering coefficient
quantile.percentage | real number defining the quantile of distances to be considered (e.g., 0.1 means 10%)
epsilon | real threshold to be subtracted from distances in order to measure the open balls in the underlying topology
A list including the number of hyperplanes and the shattering coefficient function. A report is generated in the user-defined directory.
de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>
de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory"
# Analyzing the complexity of the shattering coefficient functions
# (lower and upper bounds) for the Iris dataset
# require(datasets)
# complexity_analysis(X=as.matrix(iris[,1:4]), Y=as.numeric(iris[,5]))
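The my.delta and my.epsilon arguments instantiate the generalization bound shown at the top of this page. As a hypothetical sketch (the helper below is not part of the package), one can solve 2 N(2n) exp(-n epsilon^2 / 4) <= delta for n once a polynomial shattering estimate N(n) = a * n^b is available, e.g., from the generated report:

# Hypothetical helper (an illustration, not a package function): smallest n
# satisfying 2 * N(2n) * exp(-n * epsilon^2 / 4) <= delta for N(n) = a * n^b
sample_size_for = function(a, b, delta = 0.05, epsilon = 0.05, n.max = 1e7) {
  for (n in seq(100, n.max, by = 100)) {
    if (2 * a * (2 * n)^b * exp(-n * epsilon^2 / 4) <= delta) return(n)
  }
  NA # the bound was not met up to n.max
}
# Example with an assumed quadratic shattering coefficient N(n) = n^2
sample_size_for(a = 1, b = 2, delta = 0.05, epsilon = 0.05)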
This function compresses the input space according to the equivalence relations, i.e., it merges an example with every other element inside its open ball that shares the same class label as the ball-centered instance.
compress_space(M, Y)
M | sparse matrix representing all equivalence relations
Y | numerical vector identifying the output space of variables
A list containing sparse vectors (from package slam) identifying the equivalence relations
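A speculative usage sketch, not taken from the package documentation: here M is assumed to be the equivalence-relation structure returned by equivalence_relation(), documented next.

# M = equivalence_relation(X, Y)   # X and Y as in the other examples
# compressed = compress_space(M, Y)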
This function computes the largest possible open ball connecting a given input example to every other example under the same class label, thus homogenizing space regions.
equivalence_relation(X, Y, quantile.percentage = 1, epsilon = 0.001, chunk = 250)
X | matrix identifying the input space of variables
Y | numerical vector identifying the output space of variables
quantile.percentage | real number defining the quantile of distances to be considered (e.g., 0.1 means 10%)
epsilon | real threshold to be subtracted from distances in order to measure the open balls in the underlying topology
chunk | number of elements whose Euclidean distances are computed at once (a large value may exceed memory limits)
A list containing the equivalence relations
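An illustrative call on the Iris dataset (kept commented, as with the other examples; the argument values are assumptions rather than package recommendations):

# require(datasets)
# X = as.matrix(iris[,1:4])
# Y = as.numeric(iris[,5])
# relations = equivalence_relation(X, Y, quantile.percentage=0.1,
#                                  epsilon=0.001, chunk=100)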
This function estimates the lower and upper bounds for the number of hyperplanes required to shatter the input space
estimate_number_hyperplanes(X, Y, length = 20, quantile.percentage = 0.05, epsilon = 1e-07)
X | matrix identifying the input space of variables
Y | numerical vector identifying the output space of variables
length | number of data points used to estimate the shattering coefficient
quantile.percentage | real number defining the quantile of distances to be considered (e.g., 0.1 means 10%)
epsilon | real threshold to be subtracted from distances in order to measure the open balls in the underlying topology
A data frame whose columns are: (1) the original sample size; (2) the reduced sample size after connecting homogeneous space regions; (3) the lower bound for the number of hyperplanes required to shatter the input space; and (4) the upper bound for the number of hyperplanes required to shatter the input space
# Generating some random dataset with 2 classes:
# 50 examples in class 1 and 50 in class 2 (last column)
data = cbind(rnorm(mean=1, sd=1, n=50), rnorm(mean=1, sd=1, n=50), rep(1, 50))
data = rbind(data, cbind(rnorm(mean=-1, sd=1, n=50),
                         rnorm(mean=-1, sd=1, n=50), rep(2, 50)))

# Building up the input and output sets
X = data[,1:2]
Y = data[,3]

# Plotting our dataset using classes as colors
plot(X, col=Y, main="Original dataset", xlab="Attribute 1", ylab="Attribute 2")

# Here we estimate the number of hyperplanes required to shatter (divide)
# the given sample in all possible ways according to the organization of
# points in the input space
Hyperplanes = estimate_number_hyperplanes(X, Y, length=10,
                                          quantile.percentage=0.1, epsilon=1e-7)
This function computes the maximal number of regions into which m hyperplanes can divide the space R^n
number_regions(m, n)
m | number of hyperplanes
n | space dimensionality
Maximal number of space regions
de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>
de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory"
https://onionesquereality.wordpress.com/2012/11/23/maximum-number-of-regions-in-arrangement-of-hyperplanes/
number_regions(m=2, n=2)
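As a cross-check, the maximal number of regions induced by m hyperplanes in general position in R^n is the classical arrangement count sum(choose(m, 0:n)); for the call above this gives 4:

sum(choose(2, 0:2))  # 1 + 2 + 1 = 4 regions for m=2 hyperplanes in R^2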
de Mello, R.F. (2019) "On the Shattering Coefficient of Supervised Learning Algorithms" <arXiv:1911.05461>
de Mello, R.F., Ponti, M.A. (2018, ISBN: 978-3319949888) "Machine Learning: A Practical Approach on the Statistical Learning Theory"
This package provides functions to estimate the shattering coefficient.