Talk: The Application of Data Analytics to Atmospheric Science

Burnside Hall Room 934, 805 rue Sherbrooke Ouest, Montreal, QC, H3A 0B9, CA

Invited Speaker: Prof. Paul J. Roebber, Distinguished Professor, Atmospheric Science Group, Department of Mathematical Sciences, University of Wisconsin at Milwaukee 

Where: Burnside Hall, Room 934

Abstract:

Even my 87-year-old mother has heard of “Big Data” and “Data Analytics.” But stripping away the marketing noise, what is it, really, and why should physical scientists such as us care about it? The ability to leverage data to improve understanding has always been important, but it is becoming increasingly so as data becomes more readily available and the need increases to extract some measure of value from its rising volume. Data analytics provides the methodology. The requirements for a practitioner in this field are application-oriented math and statistics knowledge allied with substantive domain expertise. Since the software tools needed to perform the necessary analyses are not mature, and often must be custom-designed, programming skills are also important.

Multiple linear regression (MLR) has seen wide use in economics and affiliated fields, as it is a useful technique for assessing the relationships between variables and thereby developing understanding from data. MLR represents an early, simple application of data analytics to weather prediction in the form of Model Output Statistics (MOS), which seeks to map numerical weather prediction model output to observations. More sophisticated techniques, like artificial neural networks (ANN), including their extension to Deep Learning, or various machine-learning approaches such as Evolutionary Programming, are now gaining currency in many fields, and have excellent potential for use in atmospheric sciences.

A straightforward example of an atmospheric science question that can be answered with data analytics is “Can we forecast daily peak electricity load given available atmospheric inputs?” Rather than build a comprehensive numerical model that encompasses both the meteorology and the resulting built-environment energy usage, with data analytics we would start by collecting relevant data and building a data model using MLR or an ANN. Given the curse of dimensionality, which requires an exponential increase in the length of time-series data as the number of variables considered increases, we would need to know something about energy usage to guide our choice of data to collect. The built data model would confirm that the most predictive variable by far is temperature, and in the warm season, apparent temperature (the combination of temperature and humidity), but that other information such as time-of-day, wind speed and direction, cloud cover, and snow on the ground is also relevant in some situations, and likewise, that changing energy usage patterns over time need to be accounted for in the analysis.
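As a rough illustration of the MLR approach to the peak-load question, the sketch below fits an ordinary least-squares model to synthetic data. Everything in it is an assumption made for illustration: the predictor set (apparent temperature, wind speed, cloud cover), the coefficient values, the noise level, and the helper names `fit_mlr` and `predict` are hypothetical and are not taken from the speaker's work.

```python
import random


def fit_mlr(X, y):
    """Fit y ~ b0 + b1*x1 + ... by ordinary least squares (normal equations)."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, features):
    """Evaluate the fitted linear model for one set of predictor values."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], features))


# Synthetic warm-season data: all numbers below are made up for illustration.
rng = random.Random(0)
TRUE_COEFFS = [500.0, 12.0, -3.0, -40.0]  # intercept, apparent temp, wind, cloud
X, y = [], []
for _ in range(200):
    apparent_temp = rng.uniform(15, 40)  # degrees C
    wind_speed = rng.uniform(0, 15)      # m/s
    cloud_cover = rng.uniform(0, 1)      # fraction of sky covered
    features = [apparent_temp, wind_speed, cloud_cover]
    load = predict(TRUE_COEFFS, features) + rng.gauss(0, 5)  # noisy peak load
    X.append(features)
    y.append(load)

coeffs = fit_mlr(X, y)
print("fitted coefficients:", [round(c, 2) for c in coeffs])
print("forecast for a 35 C, calm, clear day:", round(predict(coeffs, [35, 1, 0]), 1))
```

With real data, the predictors would be chosen using knowledge of energy usage, as the abstract notes, and the fit would be checked on data held out from training.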

A question of interest to a fan of American football might be “What is the contribution of penalty calls to NFL home field advantage?” Rather than simply argue about it over a beer, data analytics can provide an answer. One would collect play-by-play data (available online) and build a model of how factors like position on the field, time remaining in the game, down-and-distance, score, and so on contribute to the win probability in any situation. Using that model, we would find that the answer to our original question is approximately 18%. Data analytics methods are similar in each example, but the specifics in each are guided by an understanding of the domain under study.

In this seminar, I will provide specific examples in the meteorological domain using MLR, multiple logistic regression, ANN, and Evolutionary Programs. I will present some future directions I am developing, including Deep Learning applications, which are highly suited to the ubiquitous pattern recognition problems of weather prediction and are likely to gain increasing importance in meteorology.

Additional note: Prof. Roebber is also available to meet with interested people on Friday afternoon from 2:30 onwards. If you are interested, please contact John Gyakum (john.gyakum [at] mcgill.ca).