Diet-related chronic disease in the northeastern United States: a model-based clustering approach
Obesity and diabetes are global public health concerns. Studies indicate a relationship between socioeconomic, demographic and environmental variables and the spatial patterns of diet-related chronic disease. In this paper, we propose a methodology using model-based clustering and variable selection to predict rates of obesity and diabetes. We test this method through an application in the northeastern United States.
We use model-based clustering, an unsupervised learning approach, to find latent clusters of similar US counties based on a set of socioeconomic, demographic, and environmental variables chosen through variable selection. We then use analysis of variance (ANOVA) and post hoc Tukey comparisons to examine differences in rates of obesity and diabetes across the resulting clusters.
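The two-stage procedure described above can be illustrated with a minimal Python sketch. It is not the software used in the study: scikit-learn's `GaussianMixture` stands in for the model-based clustering step, the number of clusters is chosen by BIC, the variable-selection step is omitted, and the data are synthetic. The names `cluster_counties`, `compare_outcome`, and `obesity_rate` are hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd
from sklearn.mixture import GaussianMixture


def cluster_counties(X, max_components=6, seed=0):
    """Fit Gaussian mixtures with 1..max_components clusters; keep the lowest-BIC fit."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=5, random_state=seed).fit(X)
        bic = gmm.bic(X)  # scikit-learn convention: lower BIC is better
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model, best_model.predict(X)


def compare_outcome(outcome, labels):
    """One-way ANOVA and Tukey HSD on an outcome (e.g. obesity rate) across clusters."""
    groups = [outcome[labels == c] for c in np.unique(labels)]
    if len(groups) < 2:
        raise ValueError("need at least two clusters to compare")
    return f_oneway(*groups), tukey_hsd(*groups)


# Synthetic stand-in data (NOT the study's data): 120 "counties", two covariates,
# drawn from three well-separated groups with different mean outcome rates.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 1.0, size=(40, 2)) for c in centers])
obesity_rate = np.concatenate(
    [rng.normal(m, 1.0, size=40) for m in (24.0, 28.0, 32.0)]
)

model, labels = cluster_counties(X)
anova, tukey = compare_outcome(obesity_rate, labels)
print("clusters chosen by BIC:", model.n_components)
print("ANOVA p-value:", anova.pvalue)
print(tukey)  # pairwise Tukey HSD comparisons between clusters
```

The clustering step never sees the outcome variable; disease rates are compared across clusters only afterwards, which mirrors the unsupervised design described in the text.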
We find access to supermarkets, median household income, population density and socioeconomic status to be important in clustering the counties of two northeastern states. The cluster analysis identifies two sets of counties with significantly lower rates of diet-related chronic disease than those observed in the other clusters. These relatively healthy clusters are distinguished by the large central and large fringe metropolitan areas contained in their component counties. However, the relationship between sociodemographic factors and diet-related chronic disease is more complicated than previous research would suggest. Additionally, we find evidence of low food access in two clusters of counties adjacent to large central and fringe metropolitan areas. While low food access has previously been seen as a problem of inner-city or remote rural areas, this study offers preliminary evidence of declining food access in suburban areas.
Model-based clustering with variable selection offers a new approach to the analysis of socioeconomic, demographic, and environmental data for diet-related chronic disease prediction. In a test application to two northeastern states, this method allows us to identify two sets of metropolitan counties with significantly lower diet-related chronic disease rates than those observed in most rural and suburban areas. The method could be applied to larger geographic areas or to other countries with comparable data sets, making it a promising tool for researchers interested in the global increase in diet-related chronic disease.