
Compute the coefficient estimate from a regression model and its standard error, given the z-statistic, sample size, and standard deviations of the response and covariate

Usage

coef_from_z(z, n, sd_x, sd_y, phi)

Arguments

z

z-statistic

n

sample size

sd_x

standard deviation of covariate

sd_y

standard deviation of response

phi

case ratio in the response for logistic regression. If specified, sd_y is set to phi and the coefficient is transformed to the logistic scale

Value

data.frame storing coef, se, and method

Details

Given a z-statistic, we want to obtain the coefficient value from a linear regression. Adapting the approach from Zhu, et al. (2016, Methods eqn 6), we estimate the coefficient as $$\beta_{linear} = \frac{z \cdot sd_y}{sd_x \sqrt{n + z^2}},$$ where \(sd_x\) is the standard deviation of the covariate, \(sd_y\) is the standard deviation of the response, and \(n\) is the sample size. For a model with no additional covariates, this transformation gives the exact coefficient estimate. With covariates, it is approximate.
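As an illustration of the formula above, here is a minimal base R sketch. The helper name `beta_linear_sketch` is hypothetical and is not the package's implementation; it only applies the stated expression and recovers the standard error from the definition z = coef / se.

```r
# Hypothetical helper (an assumption for illustration, not package code):
# applies the formula from Details directly.
beta_linear_sketch <- function(z, n, sd_x, sd_y) {
  beta <- z * sd_y / (sd_x * sqrt(n + z^2))
  # by definition z = beta / se, so se follows from beta and z
  se <- beta / z
  c(coef = beta, se = se)
}

# simulate data with a single covariate
set.seed(1)
n <- 100
x <- rnorm(n, 0, 7)
y <- x * 3 + rnorm(n)

# fit the model and extract the test statistic for x
fit <- lm(y ~ x)
z <- coef(summary(fit))[2, "t value"]

# coefficient and standard error recovered from summary statistics
res <- beta_linear_sketch(z, n, sd(x), sd(y))
res

# fitted slope, for comparison
coef(fit)["x"]
```

The two slopes should agree closely here because the model has a single covariate.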

The coefficient estimate from linear regression can be converted to the logistic scale using the approach of Pirinen, et al. (2013) according to $$\beta_{logistic} = \frac{\beta_{linear}}{\phi(1-\phi)},$$ where \(\phi\) is the case ratio in the logistic regression. This approximates the coefficient as if the model had been fit with logistic regression.
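The rescaling to the logistic scale can be sketched in base R as follows. The helper `beta_to_logistic_sketch` is a hypothetical name used only for illustration; the comparison fits a linear regression on the 0/1 response and rescales its slope, which is one way to see the approximation at work, not necessarily how the package computes it.

```r
# Hypothetical helper (an assumption for illustration, not package code):
# rescales a linear-scale coefficient by phi * (1 - phi).
beta_to_logistic_sketch <- function(beta_linear, phi) {
  stopifnot(phi > 0, phi < 1)
  beta_linear / (phi * (1 - phi))
}

# simulate a binary response with a small effect of x
set.seed(1)
n <- 1000
x <- rbinom(n, 2, 0.3)
y <- rbinom(n, size = 1, prob = plogis(0.1 * x))

# case ratio: fraction of cases in the response
phi <- mean(y)

# slope from linear regression on the 0/1 response
beta_lin <- coef(lm(y ~ x))["x"]

# slope from logistic regression, for comparison
beta_glm <- coef(glm(y ~ x, family = binomial))["x"]

# rescaled linear slope versus the logistic slope
beta_to_logistic_sketch(beta_lin, phi)
beta_glm
```

The approximation is expected to be closest when the effect size is small, as in this simulation.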

References

Zhu, et al. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics. 48:481–487 doi:10.1038/ng.3538

Pirinen, Donnelly and Spencer. (2013). Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Ann. Appl. Stat. 7(1):369–390 doi:10.1214/12-AOAS586

Examples

# Linear regression
#------------------

# simulate data
set.seed(1)
n = 100
x = rnorm(n, 0, 7)
y = x*3 + rnorm(n)
data = data.frame(x, y)

# fit regression model
fit <- lm(y ~ x, data=data)

# get test statistic (the t value from lm() is used as the z-statistic)
z = coef(summary(fit))[2,'t value']

# coef and se from regression model
coef(summary(fit))[2,-4]
#>     Estimate   Std. Error      t value 
#>   2.99984852   0.01538958 194.92730329 

# coef and se from summary statistics
coef_from_z(z, n, sd(x), sd(y))
#>      coef         se method
#> 1 2.99977 0.01538917 linear

# Logistic regression
#--------------------

# simulate data
n = 1000
p = .3
x = rbinom(n, 2, p)
eta = x*.1 
y = rbinom(n, size=1, prob=plogis(eta))
data = data.frame(x, y)

# fit regression model
fit <- glm(y  ~ x, data=data, family=binomial)

# get z-statistic
z = coef(summary(fit))[2,3]

# get case ratio 
phi = sum(y) / length(y)

# coef and se from regression model
coef(summary(fit))[2,]
#>   Estimate Std. Error    z value   Pr(>|z|) 
#> 0.14614232 0.09590596 1.52380859 0.12755653 

# when phi is given, coef is transformed to logistic scale
coef_from_z(z, n, sd(x), phi=phi)
#>        coef         se   method
#> 1 0.1471922 0.09659496 logistic