Joscha Krause, Jan Pablo Burgard (Department of Economic and Social Statistics, Trier University) and Domingo Morales (Operations Research Center, University Miguel Hernández of Elche)
Abstract: Regional prevalence estimation requires suitable statistical methods applied to epidemiologic data with substantial local detail. Small area estimation with medical treatment records as covariates is a promising combination for this purpose. However, routine medical data often exhibit strong internal correlation due to diagnosis-related grouping in the records. Depending on the strength of this correlation, the covariate space can become rank-deficient. In that case, prevalence estimates suffer from unacceptable uncertainty because the individual contributions of the covariates to the model cannot be properly identified. To solve this problem, we propose an area-level logit mixed model for regional prevalence estimation together with a new fitting algorithm. We augment the Laplace approximation to the log-likelihood with an ℓ2-penalty in order to stabilize the estimation process in the presence of covariate rank-deficiency. Empirical best predictors under the model and a parametric bootstrap for mean squared error estimation are presented. A Monte Carlo simulation study evaluates the properties of our methodology in a controlled environment. We further provide an empirical application in which the district-level prevalence of multiple sclerosis in Germany is estimated using health insurance records.
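The idea of stabilizing a Laplace-approximated log-likelihood with an ℓ2-penalty can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' fitting algorithm: we assume a binomial area-level model y_d | u_d ~ Bin(ν_d, p_d) with logit(p_d) = x_d'β + φu_d and u_d ~ N(0, 1), approximate each area's integral over u_d by Laplace's method at the mode û_d, subtract λ‖β‖² from the resulting log-likelihood, and maximize with BFGS. The function names, the penalty weight λ = 1, the Newton step cap, the choice to penalize all of β, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def random_effect_modes(eta, phi, y, n, iters=100):
    """Newton iterations for the mode of g_d(u) = y*lin - n*log(1+e^lin) - u^2/2,
    where lin = eta_d + phi*u. The step cap of 1 is an ad hoc safeguard (assumption)."""
    u = np.zeros_like(eta)
    for _ in range(iters):
        p = expit(eta + phi * u)
        grad = phi * (y - n * p) - u
        hess = -(phi ** 2) * n * p * (1.0 - p) - 1.0  # always < 0, so g_d is concave
        u = u - np.clip(grad / hess, -1.0, 1.0)
    return u


def penalized_laplace_nll(theta, X, y, n, lam):
    """Negative Laplace-approximated log-likelihood plus lam * ||beta||^2."""
    beta, phi = theta[:-1], np.exp(theta[-1])  # log-parametrization keeps phi > 0
    eta = X @ beta
    u = random_effect_modes(eta, phi, y, n)
    lin = eta + phi * u
    p = expit(lin)
    # log of the integrand at the mode (binomial kernel times N(0,1) kernel)
    g = y * lin - n * np.logaddexp(0.0, lin) - 0.5 * u ** 2
    neg_hess = (phi ** 2) * n * p * (1.0 - p) + 1.0
    log_binom = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
    # Laplace: log L_d ~ log C(n, y) + g_d(u_hat) - 0.5 * log(-g_d''(u_hat))
    loglik = np.sum(log_binom + g - 0.5 * np.log(neg_hess))
    return -loglik + lam * (beta @ beta)


def fit_penalized(X, y, n, lam=1.0):
    """Minimize the penalized objective with BFGS (our choice of optimizer)."""
    theta0 = np.zeros(X.shape[1] + 1)
    res = minimize(penalized_laplace_nll, theta0, args=(X, y, n, lam), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])


# Synthetic example with an exactly duplicated covariate (rank-deficient X).
rng = np.random.default_rng(0)
D = 60
z = rng.normal(size=D)
X = np.column_stack([np.ones(D), z, z])
n = rng.integers(50, 200, size=D)
y = rng.binomial(n, expit(-2.0 + 0.8 * z + 0.3 * rng.normal(size=D)))
beta_hat, phi_hat = fit_penalized(X, y, n, lam=1.0)
print(beta_hat, phi_hat)
```

Because the two identical columns enter the likelihood only through the sum of their coefficients, the unpenalized objective cannot tell how to split that sum between them. The ℓ2-penalty resolves the ambiguity: for a fixed sum it is smallest when the coefficients are equal, so the penalized fit returns a unique, finite estimate.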