Empirical Mining of Large Data Sets Helps to Solve Practical Large-Scale Forest Management and Monitoring Problems
We present a panoply of examples in which empirical mining and statistical analysis of large data sets have already proven useful for handling vexing problems in large-scale forest ecology. Some researchers may be prejudiced against empirical approaches and favor more process-oriented analytical methods. Because a full understanding and appreciation of particular ecological phenomena are possible only after process-driven, hypothesis-directed research, some forest ecologists may regard purely empirical data harvesting as a less-than-satisfactory approach. Restricting ourselves exclusively to process-driven approaches, however, may substantially slow progress, and we may not be able to afford the delays that such narrowly focused approaches cause.
Empirical methods allow trends, relationships and associations to emerge freely from the data themselves, unencumbered by preconceived theories, ideas and prejudices. They can be extremely efficient at uncovering strong correlations with intermediate "linking" variables. Once identified, these correlative structures can directly supply enough prognostic skill and predictive power to be harnessed by tools such as Bayesian Belief Nets, which bias ecological management decisions made with incomplete information toward favorable outcomes. Empirical data harvesting also generates a myriad of testable hypotheses about underlying processes, some of which may even be correct.
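To make the decision-support idea concrete, here is a minimal sketch of inference in a tiny, hypothetical Bayesian Belief Net. It assumes a three-node chain, Drought -> Outbreak -> GreennessDrop, in which drought plays the role of an intermediate linking variable; the node names, the "treat"/"none" actions, and every probability and utility are invented illustrative numbers, not results from this work. Exact inference by enumeration is used only because the network is so small.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_DROUGHT = {1: 0.3, 0: 0.7}                   # P(D = d)
P_OUTBREAK_GIVEN_DROUGHT = {1: 0.6, 0: 0.1}    # P(O = 1 | D = d)
P_DROP_GIVEN_OUTBREAK = {1: 0.8, 0: 0.2}       # P(G = 1 | O = o)

# Hypothetical utilities for each (management action, outbreak state) pair.
UTILITY = {
    ("treat", 1): -20.0, ("treat", 0): -10.0,
    ("none", 1): -100.0, ("none", 0): 0.0,
}


def bernoulli(p_one, value):
    """Probability of a binary outcome, given P(value == 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(d, o, g):
    """P(D=d, O=o, G=g) for the chain Drought -> Outbreak -> GreennessDrop."""
    return (P_DROUGHT[d]
            * bernoulli(P_OUTBREAK_GIVEN_DROUGHT[d], o)
            * bernoulli(P_DROP_GIVEN_OUTBREAK[o], g))


def posterior_outbreak(evidence):
    """P(O=1 | evidence) by enumeration; evidence may fix 'D' and/or 'G' to 0 or 1."""
    weight = {0: 0.0, 1: 0.0}
    for d, o, g in product((0, 1), repeat=3):
        # Skip assignments that contradict the evidence; unobserved
        # variables are summed out.
        if evidence.get("D", d) != d or evidence.get("G", g) != g:
            continue
        weight[o] += joint(d, o, g)
    return weight[1] / (weight[0] + weight[1])


def best_action(evidence):
    """Return the action with the highest expected utility, plus P(O=1 | evidence)."""
    p = posterior_outbreak(evidence)
    expected = {a: p * UTILITY[(a, 1)] + (1.0 - p) * UTILITY[(a, 0)]
                for a in ("treat", "none")}
    return max(expected, key=expected.get), p


if __name__ == "__main__":
    for ev in ({"G": 1}, {"G": 0}, {}):
        action, p = best_action(ev)
        print(ev, action, round(p, 3))
```

With a greenness drop observed, `best_action({"G": 1})` favors treatment (posterior outbreak probability about 0.57); with no drop observed, `best_action({"G": 0})` favors doing nothing (about 0.077). The unobserved drought node is simply summed out, which is how the net still steers the decision despite incomplete information.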
Quantitative statistical regionalizations using Multivariate Geographic Clustering have lent insights into the direction and magnitude of carbon eddy fluxes; the biophysical conditions associated with wildfire; phenological ecoregions useful for vegetation-type mapping and monitoring; areas potentially susceptible to sudden oak death, a disease caused by an invasive pathogen; global aquatic ecoregions and their susceptibility to aquatic invasive species; and, using extensive LiDAR data sets, forest vertical-structure ecoregions. Multivariate Spatio-Temporal Clustering, which quantitatively places alternative future conditions on a common footing with present conditions, allows prediction of present and future shifts in tree species ranges under alternative climate-change forecasts.
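The core idea of Multivariate Spatio-Temporal Clustering can be sketched under stated assumptions. Here the clustering step is approximated by ordinary k-means on z-scored variables with deterministic farthest-point seeding, and present and forecast conditions are placed on a common footing by pooling both sets of cell vectors before standardizing and clustering. A species' projected range is then the set of cells whose future cluster is one the species occupies today. The function names, the two variables (mean annual temperature and precipitation), and all values are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so variables measured in large units do not dominate."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


def project_suitable_cells(present, future, occupied, k):
    """Cells whose future cluster matches a cluster the species occupies today.

    present/future map cell id -> [variables]; both periods are pooled before
    standardizing and clustering so their cluster labels are directly comparable.
    """
    cells = sorted(present)
    rows = [present[c] for c in cells] + [future[c] for c in cells]
    labels, _ = kmeans(standardize(rows), k)
    now = dict(zip(cells, labels[:len(cells)]))
    later = dict(zip(cells, labels[len(cells):]))
    occupied_clusters = {now[c] for c in occupied}
    return sorted(c for c in cells if later[c] in occupied_clusters)


# Hypothetical cells: [mean annual temperature (deg C), annual precipitation (mm)].
PRESENT = {"A": [5, 800], "B": [6, 820], "C": [15, 400], "D": [16, 380]}
FUTURE = {"A": [15.5, 390], "B": [7, 810], "C": [20, 200], "D": [21, 210]}

if __name__ == "__main__":
    print(project_suitable_cells(PRESENT, FUTURE, occupied={"A", "B"}, k=3))
```

With this toy data, cells A and B share a cool, wet cluster today; under the forecast only B remains in that cluster, so the call returns `["B"]`, a projected range contraction. Pooling before standardizing is the key design choice: clustering each period separately would yield cluster labels that cannot be compared across time.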
Unsupervised statistical multivariate clustering of smoothed phenology data, sampled every 8 days over a 14-year period, produces a detailed set of annual maps of national vegetation types, including major disturbances. Examining how constant these phenological classifications remain at a given location from year to year yields a national map of vegetation persistence, regardless of vegetation type. Temporal unmixing methods can produce national maps of evergreen and deciduous vegetation. A by-cell regression trend line can be used to analyze decadal trends in forest health nationally. Forest-decline maps are a composite of the insect, disease, and anthropogenic factors that cause chronic decreases in the satellite greenness of these forests, including hemlock woolly adelgid, aspen decline, mountain pine beetle, wildfire, tree harvest, and urbanization. Because the trend in phenological change tracks the response of the vegetation itself, disturbance and recovery events are detected and mapped through vegetation behavior, whatever their cause. As ecological changes occur with increasing rapidity, these empirical data-mining approaches may be the quickest means of identifying the most actionable ecological policies and directions.
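Two of the per-cell calculations described above, persistence and the by-cell trend line, are simple enough to sketch. This is a minimal illustration under assumptions: each year is reduced to one class label and one greenness value per map cell (the described workflow uses smoothed 8-day phenology), persistence is taken as the fraction of years a cell keeps its modal class, the trend is an ordinary least-squares slope, and the decline threshold of -0.01 per year is a hypothetical choice. Temporal unmixing is not shown.

```python
import statistics
from collections import Counter


def persistence(labels_by_year):
    """Fraction of years in which each cell keeps its most common (modal) class.

    labels_by_year is year-major: one list of per-cell class labels per year.
    """
    n_years = len(labels_by_year)
    return [Counter(cell_series).most_common(1)[0][1] / n_years
            for cell_series in zip(*labels_by_year)]


def trend_slopes(years, values_by_year):
    """Ordinary least-squares slope of greenness against year, cell by cell."""
    x_mean = statistics.fmean(years)
    sxx = sum((x - x_mean) ** 2 for x in years)
    slopes = []
    for cell_series in zip(*values_by_year):
        y_mean = statistics.fmean(cell_series)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, cell_series))
        slopes.append(sxy / sxx)
    return slopes


def decline_mask(slopes, threshold=-0.01):
    """Flag cells whose greenness trend falls below a (hypothetical) threshold."""
    return [s < threshold for s in slopes]


# Hypothetical two-cell example; each row is one year, each column one cell.
YEARS = [2001, 2002, 2003, 2004, 2005]
CLASSES = [[3, 3], [3, 3], [3, 7], [3, 7], [3, 7]]
GREENNESS = [[0.80, 0.80], [0.80, 0.75], [0.80, 0.70], [0.80, 0.65], [0.80, 0.60]]

if __name__ == "__main__":
    slopes = trend_slopes(YEARS, GREENNESS)
    print(persistence(CLASSES), [round(s, 3) for s in slopes], decline_mask(slopes))
```

In this toy example, cell 0 keeps the same class every year (persistence 1.0, slope 0), while cell 1 switches class and loses 0.05 greenness units per year, so only cell 1 is flagged as declining. Note that persistence ignores which class a cell belongs to, matching the "regardless of vegetation type" framing.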