
Compute a regression coefficient estimate and its standard error given a z-statistic, the sample size, and the standard deviations of the response and covariate

Usage

coef_from_z(z, n, sd_x, sd_y, phi, p, useMethod = 2)

Arguments

z

z-statistic

n

sample size

sd_x

standard deviation of covariate

sd_y

standard deviation of response

phi

case ratio in the response for logistic regression. If specified, sd_y is set to phi and the coefficient is transformed to the logistic scale

p

frequency of the baseline allele

useMethod

numeric. 1: first-order method of Pirinen et al. (2013); 2: second-order method of Pirinen et al. (2013); 3: method of Lloyd-Jones et al. (2018).

Value

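data.frame with columns coef (coefficient estimate), se (its standard error) and method (scale of the returned estimate, e.g. linear or logistic)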

Details

Given a z-statistic, we want to obtain the coefficient value from a linear regression. Adapting the approach of Zhu et al. (2016, Methods, eqn 6), we estimate the coefficient as $$\beta_{linear} = \frac{z \, sd_y}{sd_x \sqrt{n + z^2}},$$ where \(sd_x\) is the standard deviation of the covariate and \(sd_y\) is the standard deviation of the response. The standard error is then \(\beta_{linear} / z\). For a model with no covariates, this transformation is nearly exact: replacing \(n\) with the residual degrees of freedom \(n - 2\) recovers the least-squares estimate exactly, and the small difference is visible in the linear example below. With covariates, it is approximate.
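As a minimal sketch of the formula above, the hypothetical helper `coef_linear_sketch()` below (written for illustration; it is not part of the package and not necessarily how `coef_from_z()` is implemented) computes the coefficient and its standard error from a z-statistic, using the same simulated data as the linear example. It also substitutes \(n - 2\) for \(n\) to show that, with no covariates, the result then matches `lm()` exactly.

```r
# Hypothetical helper (NOT the package's API): Zhu et al.-style conversion
# of a z-statistic into a coefficient estimate and standard error
coef_linear_sketch <- function(z, n, sd_x, sd_y) {
  b <- z * sd_y / (sd_x * sqrt(n + z^2))
  # the standard error follows from z = coef / se
  data.frame(coef = b, se = b / z)
}

# same simulation as the linear example
set.seed(1)
n <- 100
x <- rnorm(n, 0, 7)
y <- x * 3 + rnorm(n)
fit <- lm(y ~ x)
z <- coef(summary(fit))[2, "t value"]

# formula as written in Details
approx <- coef_linear_sketch(z, n, sd(x), sd(y))

# using the residual degrees of freedom (n - 2) instead of n reproduces
# the lm() estimate and standard error for a model with no covariates
exact <- coef_linear_sketch(z, n - 2, sd(x), sd(y))

print(approx)
print(exact)
print(coef(summary(fit))[2, 1:2])
```

The ratio `sd_y / sd_x` does not depend on whether `sd()` divides by \(n\) or \(n - 1\), so only the \(n\) inside the square root drives the small gap between the two results.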

The coefficient estimate from a linear regression on the 0/1 response can be converted to the logistic scale using the approach of Pirinen et al. (2013) according to $$\beta_{logistic} = \frac{\beta_{linear}}{\phi(1-\phi)},$$ where \(\phi\) is the case ratio (fraction of cases) in the response. This approximates the coefficient that would have been obtained by fitting a logistic regression. An alternative approximation is given by Lloyd-Jones et al. (2018).
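The conversion can likewise be sketched in base R. The hypothetical helper `to_logistic_sketch()` (an assumption for illustration, not the package's API) divides the coefficient and standard error from a linear probability model fit to a 0/1 response by \(\phi(1-\phi)\); the simulation settings are arbitrary. It covers only the formula stated above, not the other methods selectable through `useMethod`.

```r
# Hypothetical helper (NOT the package's API): rescale a linear-regression
# coefficient on a 0/1 response to the logistic scale
to_logistic_sketch <- function(coef_linear, se_linear, phi) {
  scale <- phi * (1 - phi)
  # dividing coef and se by the same factor leaves z = coef / se unchanged
  data.frame(coef = coef_linear / scale, se = se_linear / scale)
}

# arbitrary simulation of a binary trait and an allele count
set.seed(2)
n <- 1000
x <- rbinom(n, 2, 0.3)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.3 * x))
phi <- mean(y)  # case ratio

# linear probability model fit to the 0/1 response
lin <- coef(summary(lm(y ~ x)))[2, 1:2]
conv <- to_logistic_sketch(lin[["Estimate"]], lin[["Std. Error"]], phi)

# estimate from an actual logistic regression, for comparison
logit <- coef(summary(glm(y ~ x, family = binomial)))[2, 1:2]

print(conv)
print(logit)
```

Because the formula is an approximation, the converted and directly fitted logistic coefficients are expected to be similar for modest effect sizes, but not identical.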

References

Zhu et al. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics, 48: 481–487. doi:10.1038/ng.3538

Pirinen, Donnelly and Spencer (2013). Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. The Annals of Applied Statistics, 7(1): 369–390. doi:10.1214/12-AOAS586

Lloyd-Jones, Robinson, Yang and Visscher (2018). Transformation of summary statistics from linear mixed model association on all-or-none traits to odds ratio. Genetics, 208(4): 1397–1408. doi:10.1534/genetics.117.300360

Examples

# Linear regression
#------------------

# simulate data
set.seed(1)
n = 100
x = rnorm(n, 0, 7)
y = x*3 + rnorm(n)
data = data.frame(x, y)

# fit regression model
fit <- lm(y ~ x, data=data)

# get z-statistic
z = coef(summary(fit))[2,'t value']

# coef and se from regression model
coef(summary(fit))[2,-4]
#>     Estimate   Std. Error      t value 
#>   2.99984852   0.01538958 194.92730329 

# coef and se from summary statistics
coef_from_z(z, n, sd(x), sd(y))
#>      coef         se method
#> 1 2.99977 0.01538917 linear

# Logistic regression
#--------------------

# simulate data
n = 1000
p = .3
x = rbinom(n, 2, p)
eta = x*.1 
y = rbinom(n, size=1, prob=plogis(eta))
data = data.frame(x, y)

# fit regression model
fit <- glm(y  ~ x, data=data, family=binomial)

# get z-statistic
z = coef(summary(fit))[2,3]

# get case ratio 
phi = sum(y) / length(y)

# coef and se from regression model
coef(summary(fit))[2,]
#>   Estimate Std. Error    z value   Pr(>|z|) 
#> 0.14614232 0.09590596 1.52380859 0.12755653 

# when phi is given, coef is transformed to logistic scale
coef_from_z(z, n, sd(x), phi=phi, p=p)
#>        coef        se   method
#> 1 0.1472442 0.0966291 logistic