This page provides an overview of the chair's current research. The working papers can be sent on request.
- Iterative kernel density estimation applied to grouped data: Estimating poverty and inequality indicators from the German Microcensus
Walter, P.; Groß, M.; Schmid, T.; Weimer, K.
Abstract: The estimation of poverty and inequality indicators based on survey data is trivial as long as the variable of interest (e.g. income or consumption) is measured on a metric scale. However, estimation is not directly possible, using standard formulas, when the income variable is grouped due to confidentiality constraints or in order to decrease item non-response. We propose an iterative kernel density algorithm that generates metric pseudo samples from the grouped variable for the estimation of indicators. The corresponding standard errors are estimated by a non-parametric bootstrap that accounts for the additional uncertainty due to the grouping. The algorithm enables the use of survey weights and household equivalence scales. The proposed method is applied to the German Microcensus for estimating the regional distribution of poverty and inequality in Germany.
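The idea of generating metric pseudo samples from grouped data can be illustrated with a heavily simplified sketch. It is not the authors' implementation: the function names, the rule-of-thumb bandwidth, the fixed number of iterations, the finite upper bound of the top class and the example income classes are all assumptions made for illustration, and survey weights, equivalence scales and the bootstrap are left out.

```python
import math
import random


def kde_at(points, sample, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` at `points`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for x in sample)
        for p in points
    ]


def rule_of_thumb_bandwidth(sample):
    """Normal-reference rule of thumb (an assumption, not taken from the paper)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def pseudo_sample(bounds, counts, iterations=5, cells=50, seed=1):
    """Draw a metric pseudo sample that respects the observed class counts.

    bounds: list of (lower, upper) class limits, all finite (assumption)
    counts: number of observations reported in each class
    """
    rng = random.Random(seed)
    # Start: spread each class's observations uniformly over its interval.
    sample = [
        rng.uniform(lo, hi)
        for (lo, hi), n_k in zip(bounds, counts)
        for _ in range(n_k)
    ]
    for _ in range(iterations):
        # Re-estimate the density from the current pseudo sample.
        bandwidth = rule_of_thumb_bandwidth(sample)
        new_sample = []
        for (lo, hi), n_k in zip(bounds, counts):
            if n_k == 0:
                continue
            width = (hi - lo) / cells
            midpoints = [lo + (j + 0.5) * width for j in range(cells)]
            density = kde_at(midpoints, sample, bandwidth)
            # Redraw the class's observations from the density restricted
            # to the class interval, then jitter within the chosen cell.
            for m in rng.choices(midpoints, weights=density, k=n_k):
                x = m + rng.uniform(-width / 2.0, width / 2.0)
                new_sample.append(min(max(x, lo), hi))
        sample = new_sample
    return sample


# Hypothetical income classes and class counts.
bounds = [(0, 1000), (1000, 2000), (2000, 3500), (3500, 6000)]
counts = [30, 80, 60, 30]
incomes = pseudo_sample(bounds, counts)
print(len(incomes), round(sum(incomes) / len(incomes), 1))
```

Once such a pseudo sample exists, indicators like the mean, quantiles or a poverty rate can be computed with the usual formulas for metric data.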
- Experimental UK regional consumer price inflation with model-based expenditure weights
Dawber, J.; Würz, N.; Smith, P.; Flower, T.; Heledd, T.; Schmid, T.; Tzavidis, N.
Abstract: Like many other countries, the UK produces a national consumer price index (CPI) to measure inflation rates. Presently, CPI measures are not produced for regions within the UK. It is believed that, using only available data sources, a regional CPI would not be precise or reliable enough as an official statistic, primarily because the regional partitioning of the data makes sample sizes too small. We investigate this claim by producing an experimental regional CPI measure using publicly available price data, and deriving expenditure weights from the Living Costs and Food survey. We detail the methods and challenges of developing a regional CPI and evaluate its reliability. We then assess whether model-based methods such as smoothing and small area estimation significantly improve the measures. We find that a regional CPI can be produced with available data sources; however, it appears to be excessively volatile over time, mainly due to the weights. Smoothing and small area estimation improve the reliability of the regional CPI series to some extent, but there remain reliability issues that would benefit from further investigation. This research provides a valuable framework for the development of a more viable regional CPI measure for the UK in the future.
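To make the role of the expenditure weights concrete, the following minimal sketch aggregates price relatives into an index as an expenditure-weighted average. The function name, the item categories and all numbers are hypothetical, and the actual aggregation structure used for the CPI is more detailed than this.

```python
def weighted_price_index(price_relatives, expenditure_weights):
    """Expenditure-weighted average of price relatives, scaled to base 100.

    price_relatives: dict mapping item -> current price / base-period price
    expenditure_weights: dict mapping item -> expenditure (any common unit)
    """
    total = sum(expenditure_weights[item] for item in price_relatives)
    return 100.0 * sum(
        expenditure_weights[item] / total * relative
        for item, relative in price_relatives.items()
    )


# Hypothetical regional data: three item groups.
relatives = {"food": 1.04, "housing": 1.02, "transport": 1.10}
weights = {"food": 30.0, "housing": 50.0, "transport": 20.0}
print(weighted_price_index(relatives, weights))
```

Because the weights enter the index directly, noisy regional weight estimates translate into a volatile index, which is where smoothing or small area estimation of the weights comes in.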
- Estimating regional income indicators under transformations and access to limited population auxiliary information
Würz, N.; Schmid, T.; Tzavidis, N.
Abstract: Spatially disaggregated income indicators are typically estimated by using model-based methods that assume access to auxiliary information from population micro-data. In many countries, like Germany and the UK, population micro-data are not publicly available. In this work we propose small area methodology for the case when only aggregate population-level auxiliary information is available. We use data-driven transformations of the response to satisfy the parametric assumptions of the models used. In the absence of population micro-data, appropriate bias corrections for small area prediction are needed. The approach we propose in this paper uses aggregate statistics (means and covariances) and kernel density estimation to resolve the issue of not having access to population micro-data. We further explore the estimation of the mean squared error using a parametric bootstrap. Extensive model-based and design-based simulations are used to compare the proposed method to alternative methods. Finally, the proposed methodology is applied to the 2011 Socio-Economic Panel and aggregate census information from the same year to estimate the average income for 96 regional planning regions in Germany.
- Domain prediction with grouped income data
Walter, P.; Groß, M.; Schmid, T.; Tzavidis, N.
Abstract: One popular small area estimation method for estimating poverty and inequality indicators is the empirical best predictor under the unit-level nested error regression model with a continuous dependent variable. However, parameter estimation is more challenging when the response variable is grouped due to data confidentiality concerns or concerns about survey response burden. The work in this paper proposes methodology that enables fitting a nested error regression model when the dependent variable is grouped. Model parameters are then used for small area prediction of finite population parameters of interest. Model fitting in the case of a grouped response variable is based on the use of a stochastic expectation-maximization algorithm. Since the stochastic expectation-maximization algorithm relies on the Gaussian assumptions of the unit-level error terms, adaptive transformations are incorporated for handling departures from normality. The estimation of the MSE of the small area parameters is facilitated by a parametric bootstrap that captures the additional uncertainty due to the grouping mechanism and the possible use of adaptive transformations. The empirical properties of the proposed methodology are assessed by using model-based simulations and its relevance is illustrated by estimating deprivation indicators for municipalities in the Mexican state of Chiapas.
- Intercensal updating using structure-preserving methods and satellite imagery
Koebe, T.; Arias-Salazar, A.; Rojas-Perilla, N.; Schmid, T.
Abstract: Censuses are fundamental building blocks of most modern-day societies, yet they are collected every ten years at best. We propose an extension of the widely popular census updating technique Structure Preserving Estimation by incorporating auxiliary information in order to take ongoing subnational population shifts into account. We apply our method by incorporating satellite imagery as an additional source to derive annual small area updates of multidimensional poverty indicators from 2013 to 2020 for a population at risk: female-headed households in Senegal. We evaluate the performance of our proposal using data from two different census periods.
- A framework for producing small area estimates based on area-level models in R
Harmening, S.; Kreutzmann, A.-K.; Pannier, S.; Salvati, N.; Schmid, T.
Abstract: The R package emdi facilitates the estimation of regionally disaggregated indicators using small area estimation methods and provides tools for model building, diagnostics, presenting, and exporting the results. The package version 1.1.7 includes unit-level small area models that rely on access to micro data, which may be challenging due to confidentiality constraints. In contrast, area-level models are less demanding with respect to (a) data requirements, as only aggregates are needed for estimating regional indicators, and (b) computational resources, and they enable the incorporation of design-based properties. Therefore, the area-level model (Fay and Herriot 1979) and various extensions have been added to version 2.0.2 of the package emdi. These extensions include, amongst others, (a) transformed area-level models with back-transformations, (b) spatial and robust extensions, (c) adjusted variance estimation methods, and (d) area-level models that account for measurement errors. Corresponding mean squared error estimators are implemented for assessing the uncertainty. User-friendly tools like a stepwise variable selection function, model diagnostics, benchmarking options, high quality maps and export options for the results provide the user with a complete analysis procedure, from model building to diagnostics. The functionality of the package is demonstrated by illustrative examples based on synthetic data for Austrian districts.
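The core of the area-level model can be sketched in a few lines: each area's estimate is a weighted combination of its direct survey estimate and a regression-synthetic prediction, with more weight on the direct estimate when its sampling variance is small. The sketch below is written in Python for illustration only and does not reproduce the interface of emdi; the function name and the numbers are hypothetical, and the model variance and synthetic predictions are assumed to be given, whereas in practice they are estimated from the data.

```python
def fay_herriot_predict(direct, sampling_var, synthetic, model_var):
    """Shrink direct estimates towards synthetic predictions, area by area.

    direct:       direct survey estimates per area
    sampling_var: known sampling variances of the direct estimates
    synthetic:    regression-synthetic predictions x_i' beta (assumed given)
    model_var:    variance of the area-level random effect (assumed given)
    """
    predictions = []
    for y_i, d_i, s_i in zip(direct, sampling_var, synthetic):
        # Shrinkage factor: close to 1 for precise direct estimates.
        gamma_i = model_var / (model_var + d_i)
        predictions.append(gamma_i * y_i + (1.0 - gamma_i) * s_i)
    return predictions


# Hypothetical two-area example: the second direct estimate is noisy,
# so its prediction is pulled strongly towards the synthetic value.
print(fay_herriot_predict(
    direct=[10.0, 20.0],
    sampling_var=[1.0, 9.0],
    synthetic=[12.0, 12.0],
    model_var=1.0,
))
```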
- Estimating regional unemployment with mobile network data for functional urban areas in Germany
Hadam, S.; Würz, N.; Kreutzmann, A.-K.; Schmid, T.
Abstract: The ongoing growth of cities due to better job opportunities is leading to increased labour-related commuter flows in several countries. On the one hand, an increasing number of people commute and move to the cities, but on the other hand, the labour market indicates higher unemployment rates in urban areas than in the surrounding areas. We investigate this phenomenon at the regional level using an alternative definition of unemployment rates in which commuting behaviour is integrated. We combine data from the labour force survey with dynamic mobile network data by means of small area models for the federal state of North Rhine-Westphalia in Germany. From a methodological perspective, we use a transformed Fay-Herriot model with bias correction for the estimation of unemployment rates and propose a parametric bootstrap for the mean squared error estimation that includes the bias correction. The performance of the proposed methodology is evaluated in a case study based on official data and in model-based simulations. The results in the application show that unemployment rates (adjusted by commuters) in German cities are lower than traditional official unemployment rates indicate.
- Small area estimation with multiple imputed survey data
Runge, M.; Schmid, T.
Abstract: Many statistical surveys suffer from a) high non-response rates due to sensitive questions and response burden and b) too small sample sizes to allow for reliable estimates on disaggregated levels due to budget constraints. One way to deal with missing values is to replace them by several plausible values based on a model. Small area estimation is used to estimate regionally disaggregated indicators when direct estimates are imprecise due to small sample sizes. In this paper we propose a framework that tackles both problems at the same time. In particular, we extend the general class of transformed Fay-Herriot models to account for the additional uncertainty from multiple imputation. We derive three subcases of the Fay-Herriot model with particular transformations and provide point and mean squared error estimators. Depending on the subcase, the mean squared error is estimated by analytic solutions or resampling methods. Comprehensive model-based simulations in a controlled environment and design-based simulations based on European income and wealth data show that the proposed methodology leads to reliable and precise results in terms of bias and mean squared error.
- Flexible domain prediction using mixed effects random forests
Krennmair, P.; Schmid, T.
Abstract: This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators in the presence of small area-specific sample sizes. Small area estimators are predominantly conceptualized within the regression setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives, combining excellent predictive performance and a reduced risk of model misspecification. Mixed effects random forests combine advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate advantages of our proposed methodology using Mexican income data from the state of Nuevo León. Finally, the methodology is evaluated in extensive model-based and design-based simulations comparing the proposed methodology to traditional regression-based approaches for estimating small area averages.
- Scale estimation and data-driven tuning constant selection for M-quantile regression
Dawber, J.; Salvati, N.; Schmid, T.; Tzavidis, N.
Abstract: M-quantile regression is a general form of quantile-like regression which usually utilises the Huber influence function and corresponding tuning constant. Estimation requires a nuisance scale parameter to ensure the M-quantile estimates are scale invariant, with several scale estimators having previously been proposed. In this paper we assess these scale estimators and evaluate their suitability, and we propose a new scale estimator based on the method of moments. Further, we present two approaches for data-driven tuning constant selection for M-quantile regression. The tuning constants are obtained by i) minimising the estimated asymptotic variance of the regression parameters and ii) utilising an inverse M-quantile function to reduce the effect of outlying observations. We investigate whether data-driven tuning constants, as opposed to the usual fixed constant, for instance, at c=1.345, can improve the efficiency of the estimators of M-quantile regression parameters. The performance of the data-driven tuning constant is investigated in different scenarios using model-based simulations. Finally, we illustrate the proposed methods using a European Union Statistics on Income and Living Conditions data set.
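For readers unfamiliar with the ingredients, the sketch below shows the Huber influence function with tuning constant c and its asymmetric M-quantile version in the standard textbook form. The function names are hypothetical, the argument u is assumed to be a residual already divided by a scale estimate, and neither the scale estimators nor the data-driven selection of c from the paper are implemented here.

```python
def huber_psi(u, c=1.345):
    """Huber influence function: identity on [-c, c], clipped outside."""
    return max(-c, min(c, u))


def m_quantile_psi(u, q, c=1.345):
    """Asymmetric Huber influence function for the M-quantile of order q.

    Positive scaled residuals are weighted by q and non-positive ones
    by 1 - q; for q = 0.5 this reduces to the ordinary Huber function.
    """
    weight = q if u > 0 else 1.0 - q
    return 2.0 * weight * huber_psi(u, c)


# Small residuals pass through unchanged, large ones are capped at c.
print(huber_psi(0.5), huber_psi(3.0), m_quantile_psi(-1.0, q=0.25))
```

A smaller c caps more residuals and makes the fit more robust, while a larger c moves the estimator towards expectile (least squares type) regression, which is why the choice of c affects efficiency.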
- Asymptotic distribution of regression quantiles in a mixed effects model
Hensel, S.; Pannier, S.; Schmid, T.; Tzavidis, N.
Abstract: Linear quantile models allow for a robust analysis of the conditional distribution of the variable of interest. The introduction of a random effects term extended their range of application to data with complex dependency structures, as they occur in many studies. This paper contributes to a deeper theoretical understanding of linear quantile mixed models by analysing the asymptotic behaviour of the corresponding maximum likelihood estimator. We prove that the estimator is consistent and show that it is asymptotically normally distributed. Additionally, a plug-in variance estimator is derived, and its finite sample behaviour is demonstrated in a simulation study.