Please join us in welcoming Dr. Paul Roebber, Distinguished Professor in the Department of Mathematical Sciences, College of Letters and Science, at the University of Wisconsin-Milwaukee, for a seminar titled "The Application of Data Analytics to Atmospheric Science." Coffee will be served.
The Application of Data Analytics to Atmospheric Science
Even my 87-year-old mother has heard of “Big Data” and “Data Analytics.” But stripping away the marketing noise, what is it, really, and why should physical scientists such as us care about it? The ability to leverage data to improve understanding has always been important, but it is becoming increasingly so as data become more readily available and the need grows to extract some measure of value from their rising volume. Data analytics provides the methodology. A practitioner in this field needs application-oriented math and statistics knowledge allied with substantive domain expertise. Since the software tools needed to perform the necessary analyses are not mature, and often must be custom-designed, programming skills are also important.
Multiple linear regression (MLR) has seen wide use in economics and affiliated fields, as it is a useful technique for assessing the relationships between variables and thereby developing understanding from data. MLR represents an early, simple application of data analytics to weather prediction in the form of Model Output Statistics (MOS), which seeks to map numerical weather prediction model output to observations. More sophisticated techniques, like artificial neural networks (ANNs), including their extension to Deep Learning, or various machine-learning approaches such as Evolutionary Programming, are now gaining currency in many fields and have excellent potential for use in the atmospheric sciences.
A straightforward example of an atmospheric science question that can be answered with data analytics is “Can we forecast daily peak electricity load given available atmospheric inputs?” Rather than build a comprehensive numerical model that encompasses both the meteorology and the resulting built-environment energy usage, with data analytics we would start by collecting relevant data and building a data model using MLR or an ANN.
Given the curse of dimensionality, which requires an exponential increase in the length of time-series data as the number of variables considered increases, we would need to know something about energy usage to guide our choice of data to collect. The built data model would confirm that the most predictive variable by far is temperature, and in the warm season, apparent temperature (the combination of temperature and humidity), but that other information such as time of day, wind speed and direction, cloud cover, and snow on the ground is also relevant in some situations, and likewise, that changing energy usage patterns over time need to be accounted for in the analysis.
A question of interest to a fan of American football might be “What is the contribution of penalty calls to NFL home field advantage?” Rather than simply arguing about it over a beer, we can use data analytics to provide an answer. One would collect play-by-play data (available online) and build a model that estimates the win probability for any situation from factors like position on the field, time remaining in the game, down-and-distance, score, and so on. Using that model, we would find that the answer to our original question is approximately 18%.
Data analytics methods are similar in each example, but the specifics in each are guided by an understanding of the domain under study. In this seminar, I will provide specific examples in the meteorological domain using MLR, multiple logistic regression, ANNs, and Evolutionary Programs. I will also present some future directions I am developing, including Deep Learning applications, which are highly suited to the ubiquitous pattern recognition problems of weather prediction and are likely to gain increasing importance in meteorology.
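For readers who would like a concrete picture of the peak-load example in the abstract, the following is a minimal sketch of an MLR data model, not the speaker's actual method. Everything in it is an assumption made for illustration: the helper names `fit_mlr` and `predict`, the choice of three predictors (apparent temperature, wind speed, cloud cover), and the synthetic data, whose coefficients are invented and carry no physical meaning. It fits the regression by solving the normal equations using only the Python standard library.

```python
import random


def fit_mlr(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting. Returns [b0, b1, ..., bk].
    """
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(p)]
        + [sum(d[i] * t for d, t in zip(design, targets))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coef, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


# Synthetic data only: the "true" coefficients below are invented.
rng = random.Random(0)
true_coef = [200.0, 12.0, -1.5, -20.0]  # intercept, apparent temp, wind, cloud
rows, loads = [], []
for _ in range(500):
    row = (
        rng.uniform(15, 40),  # hypothetical apparent temperature (deg C)
        rng.uniform(0, 15),   # hypothetical wind speed (m/s)
        rng.uniform(0, 1),    # hypothetical cloud-cover fraction
    )
    rows.append(row)
    loads.append(predict(true_coef, row) + rng.gauss(0, 5))  # add noise

coef = fit_mlr(rows, loads)
rmse = (
    sum((predict(coef, r) - y) ** 2 for r, y in zip(rows, loads)) / len(rows)
) ** 0.5
print("fitted coefficients:", [round(c, 2) for c in coef])
print("in-sample RMSE:", round(rmse, 2))
```

In a real application the predictors would be chosen using domain knowledge, as the abstract argues, and the model would be evaluated on data held out from the fit rather than on the training sample.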