New paper in Computational Statistics & Data Analysis

Nicolas Frink and Timo Schmid apply generalized tree-based machine learning to analyze educational count data.

Small area prediction of counts under machine learning-type mixed models

Frink, N.; Schmid, T. 

Abstract: Small area estimation methods are proposed that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas using discrete survey data. Specifically, two existing approaches based on random forests, the Generalized Mixed Effects Random Forest (GMERF) and a Mixed Effects Random Forest (MERF), are extended to accommodate count outcomes, addressing key challenges such as overdispersion. Additionally, three bootstrap methodologies designed to assess the reliability of point estimators for area-level means are evaluated. The numerical analysis shows that the MERF, which does not assume a Poisson distribution to model the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best under conditions where Poisson distribution assumptions are moderately met. In a case study using real-world data from the state of Guerrero, Mexico, the proposed methods effectively estimate area-level means while capturing the uncertainty inherent in overdispersed count data. These findings highlight their practical applicability for small area estimation.
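The abstract's distinction between "severe overdispersion" and settings where "Poisson distribution assumptions are moderately met" can be made concrete with the dispersion index, the ratio of sample variance to sample mean. For Poisson counts this ratio is close to 1. Values well above 1 signal overdispersion. The sketch below is an illustration only and is not the authors' method: it simulates Poisson counts and gamma-Poisson (negative binomial) counts with the same mean and compares their dispersion indices. The helper names `dispersion_index`, `poisson_draw`, and `gamma_poisson_draw` are hypothetical, and the parameter values are arbitrary.

```python
import math
import random
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 for Poisson data."""
    return statistics.variance(counts) / statistics.fmean(counts)


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def gamma_poisson_draw(rng, lam, shape):
    """Overdispersed count: Poisson with a gamma-distributed rate.

    The rate has mean lam, so the counts have mean lam and
    variance lam + lam**2 / shape (a negative binomial).
    """
    rate = rng.gammavariate(shape, lam / shape)  # gammavariate(alpha, beta): beta is the scale
    return poisson_draw(rng, rate)


if __name__ == "__main__":
    rng = random.Random(0)
    pois = [poisson_draw(rng, 4.0) for _ in range(5000)]
    nb = [gamma_poisson_draw(rng, 4.0, 1.0) for _ in range(5000)]
    print("Poisson dispersion index:", round(dispersion_index(pois), 2))
    print("Gamma-Poisson dispersion index:", round(dispersion_index(nb), 2))
```

With a mean of 4 and shape 1, the gamma-Poisson variance is about 4 + 16 = 20, so its dispersion index should sit near 5, whereas the Poisson sample should stay near 1.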

Nicolas Frink & Timo Schmid (2025) Small area prediction of counts under machine learning-type mixed models, Computational Statistics & Data Analysis, DOI: https://doi.org/10.1016/j.csda.2025.108218