Compute the coefficient estimate from a regression model and its standard error, given the z-statistic, the sample size, and the standard deviations of the response and covariate
Arguments
- z
z-statistic
- n
sample size
- sd_x
standard deviation of covariate
- sd_y
standard deviation of response
- phi
case ratio in the response for logistic regression. If specified,
sd_y is set to phi and the coefficient is transformed to the logistic scale
- p
allele frequency of baseline allele
- useMethod
numeric. 1: first order method from Pirinen, et al. 2: second order method from Pirinen, et al. 3: method from Lloyd-Jones, et al.
Details
Given a z-statistic, we want to obtain the coefficient value from a linear regression. Adapting the approach from Zhu, et al. (2016, Methods eqn 6), we estimate the coefficient as $$\beta_{linear} = \frac{z \cdot sd_y}{sd_x \sqrt{n + z^2}},$$ where \(sd_x\) is the standard deviation of the covariate, \(sd_y\) is the standard deviation of the response, and \(n\) is the sample size. For a model with no covariates, this transformation recovers the coefficient estimate essentially exactly. With covariates, it is approximate.
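The linear-scale formula above can be sketched directly in R. This is a minimal illustration and not the package's implementation: `beta_linear_from_z` is a hypothetical helper name, and the standard error is assumed to be recovered as `beta / z`, which follows from the definition z = beta / se.

```r
# Hypothetical helper (not part of the package): linear-scale coefficient
# and standard error from a z-statistic, using the formula above.
beta_linear_from_z <- function(z, n, sd_x, sd_y) {
  beta <- z * sd_y / (sd_x * sqrt(n + z^2))
  # Assumption: se is recovered from the definition z = beta / se
  data.frame(coef = beta, se = beta / z)
}

# Simulate a simple linear regression with no additional covariates
set.seed(1)
n <- 100
x <- rnorm(n, 0, 7)
y <- 3 * x + rnorm(n)
fit <- lm(y ~ x)
z <- coef(summary(fit))[2, "t value"]

res <- beta_linear_from_z(z, n, sd(x), sd(y))

# Replacing n by the residual degrees of freedom (n - 2) reproduces the
# least-squares slope of a simple regression with an intercept
beta_df <- z * sd(y) / (sd(x) * sqrt(n - 2 + z^2))
```

In this simulation `res$coef` should agree closely with `coef(fit)[["x"]]`. The small remaining gap reflects the use of n rather than n - 2 under the square root, which is why `beta_df` matches the fitted slope.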
The coefficient estimate from linear regression can be converted to the logistic scale using the approach of Pirinen, et al. (2013) according to $$\beta_{logistic} = \frac{\beta_{linear}}{\phi(1-\phi)},$$ where \(\phi\) is the case ratio in the logistic regression. This approximates the coefficient as if the model had been fit with logistic regression. An alternative approximation is given by Lloyd-Jones, et al. (2018).
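As a rough check of the conversion above, the sketch below fits a linear model to a simulated binary response, rescales the slope by \(\phi(1-\phi)\), and compares it with a logistic fit. The helper name `beta_logistic_from_linear` and the simulation settings are illustrative assumptions, not part of the package.

```r
# Hypothetical helper: first-order conversion of a linear-scale
# coefficient to the logistic scale, given the case ratio phi.
beta_logistic_from_linear <- function(beta_linear, phi) {
  stopifnot(phi > 0, phi < 1)
  beta_linear / (phi * (1 - phi))
}

# Simulate a binary response with a small effect of x
set.seed(1)
n <- 1000
x <- rbinom(n, 2, 0.3)
y <- rbinom(n, size = 1, prob = plogis(0.1 * x))
phi <- mean(y)  # case ratio

# Slope from a linear model fit to the 0/1 response
beta_lin <- coef(lm(y ~ x))[["x"]]

# Approximate logistic-scale coefficient versus the fitted one
beta_log_approx <- beta_logistic_from_linear(beta_lin, phi)
beta_log_fit <- coef(glm(y ~ x, family = binomial))[["x"]]
```

For small effects such as this one, `beta_log_approx` and `beta_log_fit` should be close. The approximation is expected to degrade as the effect size grows or the case ratio moves toward 0 or 1.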
References
Zhu, et al. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics. 48:481–487 doi:10.1038/ng.3538
Pirinen, Donnelly and Spencer. (2013). Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Ann. Appl. Stat. 7(1): 369-390 doi:10.1214/12-AOAS586
Lloyd-Jones, Robinson, Yang and Visscher. (2018). Transformation of summary statistics from linear mixed model association on all-or-none traits to odds ratio. Genetics, 208(4), pp.1397-1408. doi:10.1534/genetics.117.300360
Examples
# Linear regression
#------------------
# simulate data
set.seed(1)
n = 100
x = rnorm(n, 0, 7)
y = x*3 + rnorm(n)
data = data.frame(x, y)
# fit regression model
fit <- lm(y ~ x, data=data)
# get z-statistic
z = coef(summary(fit))[2,'t value']
# coef and se from regression model
coef(summary(fit))[2,-4]
#> Estimate Std. Error t value
#> 2.99984852 0.01538958 194.92730329
# coef and se from summary statistics
coef_from_z(z, n, sd(x), sd(y))
#> coef se method
#> 1 2.99977 0.01538917 linear
# Logistic regression
#--------------------
# simulate data
n = 1000
p = .3
x = rbinom(n, 2, p)
eta = x*.1
y = rbinom(n, size=1, prob=plogis(eta))
data = data.frame(x, y)
# fit regression model
fit <- glm(y ~ x, data=data, family=binomial)
# get z-statistic
z = coef(summary(fit))[2,3]
# get case ratio
phi = sum(y) / length(y)
# coef and se from regression model
coef(summary(fit))[2,]
#> Estimate Std. Error z value Pr(>|z|)
#> 0.14614232 0.09590596 1.52380859 0.12755653
# when phi is given, coef is transformed to logistic scale
coef_from_z(z, n, sd(x), phi=phi, p=p)
#> coef se method
#> 1 0.1472442 0.0966291 logistic