Title: | Model Averaging for Multivariate GLMM with Null Models |
---|---|
Description: | Tools for univariate and multivariate generalized linear mixed models with model averaging and null model technique. |
Authors: | Masatoshi Katabuchi and Akihiro Nakamura |
Maintainer: | Masatoshi Katabuchi <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2024-11-07 03:42:05 UTC |
Source: | https://github.com/mattocci27/mglmn |
Returns variables for the best model based on AIC
best.vars(x)
x |
A list of results of 'maglm' or 'mamglm' |
A vector of terms of the best model.
    #load species composition and environmental data
    data(capcay)
    adj.sr <- capcay$adj.sr
    env_sp <- capcay$env_sp

    #to fit a gaussian regression model:
    res <- maglm(data = env_sp, y = "adj.sr", family = "gaussian")
    best.vars(res)
Species composition and environmental data from Capricornia Cays
data(capcay)
A list containing the elements
abund |
A data frame with 14 observations of abundance of 13 ant species |
adj.sr |
A vector of adjusted species richness of ants based on sample-based rarefaction curves to standardise sampling intensity across sites (see Nakamura et al. 2015 for more details). |
env_sp |
A data frame of 10 environmental variables, which best explained the variation in the matrix of similarity values. |
env_assem |
A data frame of 10 environmental variables, which best explained the variation in the matrix of similarity values. |
The data frame abund has the following variables:
(numeric) relative abundance of Camponotus mackayensis
(numeric) relative abundance of Cardiocondyla nuda
(numeric) relative abundance of Hypoponera spA
(numeric) relative abundance of Hypoponera spB
(numeric) relative abundance of Iridomyrmex spA
(numeric) relative abundance of Monomorium leave
(numeric) relative abundance of Ochetellus spA
(numeric) relative abundance of Paratrechina longicornis
(numeric) relative abundance of Paratrechina spA
(numeric) relative abundance of Tapinoma spA
(numeric) relative abundance of Tetramorium bicarinatum
The data frame env_sp has the following variables:
(numeric) native plant species richness
(numeric) log-transformed relative abundance of Pheidole megacephala
(numeric) presence/absence of Pheidole megacephala
(numeric) presence/absence of frequent human visitation
(numeric) mean daily maximum temperature (degrees Celsius)
(numeric) total rainfall in the past 4 weeks (mm)
(numeric) distance to the nearest continent (km)
(numeric) log-transformed distance to the nearest island (km)
(numeric) Y coordinate
(numeric) X coordinate * Y coordinate
The data frame env_assem has the following variables:
(numeric) log-transformed island size (ha)
(numeric) log-transformed exotic plant species richness
(numeric) native plant species richness
(numeric) presence/absence of Pheidole megacephala
(numeric) presence/absence of frequent human visitation
(numeric) log-transformed total rainfall during sampling (mm)
(numeric) distance to the nearest continent (km)
(numeric) log-transformed distance to the nearest island (km)
(numeric) Y coordinate
(numeric) X coordinate * Y coordinate
Nakamura A., Burwell C.J., Lambkin C.L., Katabuchi M., McDougall A., Raven R.J. and Neldner V.J. (2015), The role of human disturbance in island biogeography of arthropods and plants: an information theoretic approach, Journal of Biogeography, DOI: 10.1111/jbi.12520
Model averaging for GLM based on information theory.
maglm(data, y, family, scale = TRUE, AIC.restricted = FALSE)
data |
Data frame, typically of environmental variables. Rows are sites and columns are environmental variables. |
y |
Name of the dependent (response) variable, given as a character string (e.g. "adj.sr"). |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = FALSE). |
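The AIC.restricted switch chooses between AIC and its small-sample correction, AICc. For reference, the standard definitions from Burnham & Anderson (2002) are given below, where \hat{L} is the maximized likelihood, k the number of estimated parameters, and n the number of observations (sites). That the package uses exactly this form for every family is an assumption.

```latex
\mathrm{AIC} = -2\log\hat{L} + 2k, \qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}
```

The correction term grows as k approaches n, so AICc penalizes complex models more heavily when there are few sites.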
A list of results
res.table |
Data frame with columns "AIC" (AIC of the model), "log.L" (log-likelihood of the model), "delta.aic" (AIC difference from the best model), "wAIC" (Akaike weight of the model), "n.vars" (number of variables in the model), and one column per term. |
importance |
Vector of the relative importance value of each term, calculated as the sum of the Akaike weights over all models in which the term appears. |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = FALSE). |
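The "delta.aic", "wAIC", and importance values follow the usual information-theoretic definitions. The base R sketch below shows how they relate to each other. The AIC values, model names, and term names are made up for illustration, and the code is not taken from the package.

```r
# Hypothetical AIC values for four candidate models (illustrative numbers only)
aic <- c(m1 = 100.0, m2 = 101.5, m3 = 104.0, m4 = 110.0)

# Difference from the best (lowest-AIC) model
delta_aic <- aic - min(aic)

# Akaike weights: relative likelihoods normalised to sum to 1
w_aic <- exp(-0.5 * delta_aic) / sum(exp(-0.5 * delta_aic))

# Indicator matrix: which term (column) appears in which model (row)
terms_in_model <- rbind(
  m1 = c(x1 = 1, x2 = 0),
  m2 = c(x1 = 1, x2 = 1),
  m3 = c(x1 = 0, x2 = 1),
  m4 = c(x1 = 0, x2 = 0)
)

# Relative importance: sum of the weights of the models containing each term
importance <- colSums(terms_in_model * w_aic)
```

A term that appears only in poorly supported models ends up with an importance near 0, while a term present in every well-supported model approaches 1.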
Dobson, A. J. (1990) An Introduction to Generalized Linear Models. London: Chapman and Hall.
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multi-model inference: a practical information-theoretic approach. Springer Verlag, New York.
Nakamura, A., C. J. Burwell, C. L. Lambkin, M. Katabuchi, A. McDougall, R. J. Raven, and V. J. Neldner. (2015) The role of human disturbance in island biogeography of arthropods and plants: an information theoretic approach. Journal of Biogeography 42:1406-1417.
    #load species composition and environmental data
    data(capcay)
    adj.sr <- capcay$adj.sr
    env_sp <- capcay$env_sp

    #to fit a regression model:
    maglm(data = env_sp, y = "adj.sr", family = "gaussian", AIC.restricted = TRUE)
Utility function for data manipulation, used internally by maglm and mamglm.
make.formula(lhs, vars.vec, rand.vec = NULL)
lhs |
Numeric vector of dependent variables. |
vars.vec |
Character vector of independent variables. |
rand.vec |
Character vector of random variables (default = NULL). |
an object of class '"formula"'
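To illustrate what such a helper might do, the base R sketch below assembles a formula from variable names. The function `build_formula` is a hypothetical stand-in, not the package's code. It assumes `lhs` is supplied as the name of the response and that random terms are written as lme4-style random intercepts; neither detail is confirmed by this page.

```r
# Hypothetical stand-in for make.formula(); not the package's implementation
build_formula <- function(lhs, vars.vec, rand.vec = NULL) {
  # Fall back to an intercept-only model when no fixed terms are supplied
  fixed <- if (length(vars.vec) > 0) vars.vec else "1"
  # Assumption: each random variable becomes a random intercept, e.g. (1 | site)
  random <- if (!is.null(rand.vec)) paste0("(1 | ", rand.vec, ")") else character(0)
  rhs <- paste(c(fixed, random), collapse = " + ")
  as.formula(paste(lhs, "~", rhs))
}

f1 <- build_formula("adj.sr", c("x1", "x2"))
f2 <- build_formula("adj.sr", "x1", rand.vec = "site")
```

Both results are objects of class "formula" and can be passed to a model-fitting function that understands the corresponding syntax.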
Model averaging for multivariate GLLVM based on information theory.
mamgllvm(data, y, family, scale = TRUE, AIC.restricted = FALSE)
data |
Data frame, typically of environmental variables. Rows are sites and columns are environmental variables. |
y |
Name of 'mvabund' object (character) |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = FALSE). |
A list of results
res.table |
Data frame with columns "AIC" (AIC of the model), "log.L" (log-likelihood of the model), "delta.aic" (AIC difference from the best model), "wAIC" (Akaike weight of the model), "n.vars" (number of variables in the model), and one column per term. |
importance |
Vector of the relative importance value of each term, calculated as the sum of the Akaike weights over all models in which the term appears. |
family |
the 'family' object used. |
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multi-model inference: a practical information-theoretic approach. Springer Verlag, New York.
Niku, J., Warton, D. I., Hui, F. K. C., and Taskinen, S. (2017). Generalized linear latent variable models for multivariate count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 22:498-522.
Niku, J., Brooks, W., Herliansyah, R., Hui, F. K. C., Taskinen, S., and Warton, D. I. (2018). Efficient estimation of generalized linear latent variable models. PLoS One, 14(5):1-20.
Warton, D. I., Guillaume Blanchet, F., O'Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C. and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766-779.
    #load species composition and environmental data
    library(mvabund)
    data(capcay)

    #use a subset of data in this example to reduce run time
    env_assem <- capcay$env_assem[, 1:2]
    freq.abs <- mvabund(log(capcay$abund + 1))

    #to fit a gaussian regression model to frequency data:
    mamgllvm(data = env_assem, y = "freq.abs", family = "gaussian")

    #to fit a binomial regression model to presence/absence data:
    pre.abs0 <- capcay$abund
    pre.abs0[pre.abs0 > 0] <- 1
    pre.abs <- mvabund(pre.abs0)
    mamgllvm(data = env_assem, y = "pre.abs", family = "binomial")
Model averaging for multivariate GLM based on information theory.
mamglm(data, y, family, scale = TRUE, AIC.restricted = FALSE)
data |
Data frame, typically of environmental variables. Rows are sites and columns are environmental variables. |
y |
Name of 'mvabund' object (character) |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = FALSE). |
A list of results
res.table |
Data frame with columns "AIC" (AIC of the model), "log.L" (log-likelihood of the model), "delta.aic" (AIC difference from the best model), "wAIC" (Akaike weight of the model), "n.vars" (number of variables in the model), and one column per term. |
importance |
Vector of the relative importance value of each term, calculated as the sum of the Akaike weights over all models in which the term appears. |
family |
the 'family' object used. |
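The result table has one row per candidate model, with a column for each term. One plausible way to build such a candidate set is to enumerate every non-empty subset of the predictors, as in the base R sketch below. The predictor names are hypothetical, and whether the package enumerates models exactly this way (or also includes an intercept-only model) is an assumption.

```r
vars <- c("x1", "x2", "x3")  # hypothetical predictor names

# All non-empty subsets of the predictors, one character vector per model
subsets <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# One right-hand side per candidate model, e.g. "x1 + x3"
rhs <- vapply(subsets, paste, character(1), collapse = " + ")
```

With p predictors this yields 2^p - 1 models, which is why the examples on this page restrict the environmental data to a few columns to keep run time short.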
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multi-model inference: a practical information-theoretic approach. Springer Verlag, New York.
Wang, Y., Naumann, U., Wright, S.T. & Warton, D.I. (2012) mvabund- an R package for model-based analysis of multivariate abundance data. Methods in Ecology and Evolution, 3, 471-474.
Warton, D.I., Wright, S.T. & Wang, Y. (2012) Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3, 89-101.
Nakamura, A., C. J. Burwell, C. L. Lambkin, M. Katabuchi, A. McDougall, R. J. Raven, and V. J. Neldner. (2015) The role of human disturbance in island biogeography of arthropods and plants: an information theoretic approach. Journal of Biogeography 42:1406-1417.
    #load species composition and environmental data
    library(mvabund)
    data(capcay)

    #use a subset of data in this example to reduce run time
    env_assem <- capcay$env_assem[, 1:5]
    freq.abs <- mvabund(log(capcay$abund + 1))

    #to fit a gaussian regression model to frequency data:
    mamglm(data = env_assem, y = "freq.abs", family = "gaussian")

    #to fit a binomial regression model to presence/absence data:
    pre.abs0 <- capcay$abund
    pre.abs0[pre.abs0 > 0] <- 1
    pre.abs <- mvabund(pre.abs0)
    mamglm(data = env_assem, y = "pre.abs", family = "binomial")
Standardized effect size of relative importance values for model averaging GLM.
ses.maglm(data, y, family, scale = TRUE, AIC.restricted = TRUE, par = FALSE, runs = 999)
data |
Data frame, typically of environmental variables. Rows are sites and columns are environmental variables. |
y |
Name of the dependent (response) variable, given as a character string (e.g. "adj.sr"). |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = TRUE). |
par |
Whether to use parallel computing (default = FALSE) |
runs |
Number of randomizations. |
The currently implemented null model shuffles the set of environmental variables across sites while maintaining species composition. Note that the function can take considerable time to execute.
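The shuffling step can be pictured as a permutation of whole rows of the environmental data frame. The base R sketch below illustrates the idea with made-up data; the variable names and values are hypothetical, and the package's internal code may differ.

```r
set.seed(1)  # for a reproducible illustration

# Hypothetical environmental data: 6 sites, 2 variables
env <- data.frame(rain = c(10, 20, 30, 40, 50, 60),
                  temp = c(25, 26, 27, 28, 29, 30))

# Permute whole rows, so each site receives another site's full set of
# environmental values; the response data are left untouched
shuffled <- env[sample(nrow(env)), , drop = FALSE]
rownames(shuffled) <- NULL
```

Because rows are moved as a unit, correlations among the environmental variables are preserved while their link to the response at each site is broken.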
A data frame of results for each term
res.obs |
Observed importance of terms |
res.rand.mean |
Mean importance of terms in null communities |
res.rand.sd |
Standard deviation of importance of terms in null communities |
SES |
Standardized effect size of importance of terms (= (res.obs - res.rand.mean) / res.rand.sd) |
res.obs.rank |
Rank of observed importance of terms vs. null communities |
runs |
Number of randomizations |
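As a worked illustration of the SES definition above, the base R sketch below computes the standardized effect size for a single term. The observed and null importance values are invented for the example.

```r
# Hypothetical importance values for one term
obs <- 0.85                                   # observed importance
null_vals <- c(0.30, 0.45, 0.25, 0.50, 0.40)  # importance in 5 null runs

rand_mean <- mean(null_vals)  # corresponds to res.rand.mean
rand_sd <- sd(null_vals)      # corresponds to res.rand.sd

# Standardized effect size: (res.obs - res.rand.mean) / res.rand.sd
ses <- (obs - rand_mean) / rand_sd
```

A large positive SES indicates that the term is more important in the observed data than expected under the null model.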
Dobson, A. J. (1990) An Introduction to Generalized Linear Models. London: Chapman and Hall.
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multi-model inference: a practical information-theoretic approach. Springer Verlag, New York.
Nakamura, A., C. J. Burwell, C. L. Lambkin, M. Katabuchi, A. McDougall, R. J. Raven, and V. J. Neldner. (2015) The role of human disturbance in island biogeography of arthropods and plants: an information theoretic approach. Journal of Biogeography 42:1406-1417.
    library(mvabund)

    #load species composition and environmental data
    data(capcay)
    adj.sr <- capcay$adj.sr

    #use a subset of data in this example to reduce run time
    env_sp <- capcay$env_sp[, 1:5]

    #to execute calculations on a single core:
    ses.maglm(data = env_sp, y = "adj.sr", par = FALSE, family = "gaussian", runs = 4)

    ## Not run:
    #to execute parallel calculations (sfInit and sfExportAll come from the snowfall package):
    library(snowfall)
    sfInit(parallel = TRUE, cpus = 4)
    sfExportAll()
    ses.maglm(data = env_sp, y = "adj.sr", par = TRUE, family = "gaussian", runs = 4)
    ## End(Not run)
Standardized effect size of relative importance values for model averaging multivariate GLM.
ses.mamglm(data, y, family, scale = TRUE, AIC.restricted = TRUE, par = FALSE, runs = 999)
data |
Data frame, typically of environmental variables. Rows are sites and columns are environmental variables. |
y |
Name of 'mvabund' object (character) |
family |
the 'family' object used. |
scale |
Whether to scale independent variables (default = TRUE) |
AIC.restricted |
Whether to use AICc (TRUE) or AIC (FALSE) (default = TRUE). |
par |
Whether to use parallel computing (default = FALSE) |
runs |
Number of randomizations. |
The currently implemented null model shuffles the set of environmental variables across sites while maintaining species composition. Note that the function can take considerable time to execute.
A data frame of results for each term
res.obs |
Observed importance of terms |
res.rand.mean |
Mean importance of terms in null communities |
res.rand.sd |
Standard deviation of importance of terms in null communities |
SES |
Standardized effect size of importance of terms (= (res.obs - res.rand.mean) / res.rand.sd) |
res.obs.rank |
Rank of observed importance of terms vs. null communities |
runs |
Number of randomizations |
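The rank column can be read as the position of the observed importance within the pooled set of observed and null values. The base R sketch below shows one common way to compute such a rank; the numbers are invented, and the exact convention used by the package (for example, how ties are handled) is an assumption.

```r
# Hypothetical importance values for one term
obs <- 0.85
null_vals <- c(0.30, 0.45, 0.25, 0.50, 0.40)  # importance in 5 null runs

# Rank of the observed value among the observed + null values
# (1 = smallest; length(null_vals) + 1 = largest)
obs_rank <- rank(c(obs, null_vals))[1]

runs <- length(null_vals)
```

Here the observed value exceeds every null value, so its rank equals runs + 1, the highest possible position.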
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multi-model inference: a practical information-theoretic approach. Springer Verlag, New York.
Wang, Y., Naumann, U., Wright, S.T. & Warton, D.I. (2012) mvabund- an R package for model-based analysis of multivariate abundance data. Methods in Ecology and Evolution, 3, 471-474.
Warton, D.I., Wright, S.T. & Wang, Y. (2012) Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3, 89-101.
Nakamura, A., C. J. Burwell, C. L. Lambkin, M. Katabuchi, A. McDougall, R. J. Raven, and V. J. Neldner. (2015) The role of human disturbance in island biogeography of arthropods and plants: an information theoretic approach. Journal of Biogeography 42:1406-1417.
    library(mvabund)

    #load species composition and environmental data
    data(capcay)

    #use a subset of data in this example to reduce run time
    env_assem <- capcay$env_assem[, 1:5]
    pre.abs0 <- capcay$abund
    pre.abs0[pre.abs0 > 0] <- 1
    pre.abs <- mvabund(pre.abs0)

    #to execute calculations on a single core:
    ses.mamglm(data = env_assem, y = "pre.abs", par = FALSE, family = "binomial", AIC.restricted = FALSE, runs = 4)

    ## Not run:
    #to execute parallel calculations (sfInit and sfExportAll come from the snowfall package):
    library(snowfall)
    sfInit(parallel = TRUE, cpus = 4)
    sfExportAll()
    ses.mamglm(data = env_assem, y = "pre.abs", par = TRUE, family = "binomial", AIC.restricted = FALSE, runs = 4)
    ## End(Not run)