My ARIA project was about an algorithm called AutoIV. Instrumental variables are an econometric tool used to estimate causal relationships between variables in the presence of a confounding variable. However, valid instruments are usually very hard to find because they must satisfy both an exclusion criterion and an association criterion, and the first of these cannot be verified from data. The AutoIV method was developed by Yuan et al. (2022) and uses a series of statistical tools to artificially create an instrumental variable from other observed data.
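To make the idea of an instrumental variable concrete, here is a minimal sketch of the classical two-stage least squares estimator on simulated data. This is not the AutoIV algorithm from Yuan et al. (2022); it assumes a valid instrument is already known, which is exactly the situation AutoIV is meant to avoid needing. All function names, coefficients, and the data-generating process are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Simulate data where an unobserved confounder u affects both x and y.

    The instrument z affects x (association) but reaches y only
    through x (exclusion). All coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                    # unobserved confounder
    z = rng.normal(size=n)                    # instrument
    x = z + u + rng.normal(size=n)            # treatment depends on z and u
    y = beta * x + 2.0 * u + rng.normal(size=n)  # outcome depends on x and u
    return z, x, y


def ols_slope(x, y):
    """Slope from an ordinary least squares regression of y on x with intercept."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted values of x."""
    design = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(design, x, rcond=None)
    x_hat = design @ gamma                    # part of x explained by z only
    return ols_slope(x_hat, y)


if __name__ == "__main__":
    z, x, y = simulate()
    print("true effect: 2.0")
    print("naive OLS estimate:", ols_slope(x, y))
    print("2SLS estimate:", two_stage_least_squares(z, x, y))
```

In this simulated setting the naive regression is biased upward because the confounder pushes the treatment and the outcome in the same direction, while the two-stage estimate should land close to the true effect. With real data the exclusion criterion cannot be checked this way, which is why finding an instrument is the hard part.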
I was very interested in this project because it combines econometrics, statistics, and machine learning. When I graduate, I would like to work in economic research, which usually involves a lot of statistical analysis that can be complemented with machine learning techniques. So I believe this project was a great way to combine all my interests and get a glimpse of what my work could look like in the future. In fact, I believe this project would be great to talk about during job interviews.
The learning objective of the project was to understand the method used in Yuan et al. (2022). To do so, I needed to understand the basics of counterfactual prediction and instrumental variables. Moreover, we wanted to implement the AutoIV algorithm to see whether we would get results similar to those in the paper.
The highlight of the ARIA project occurred when I finally managed to clean up all the code and make everything run smoothly, so that I could simply evaluate the results and compare them to the paper's results. This was a hurray moment. Another highlight of the project was writing the report for the math department. This is a seven-page report that consolidates the theory I learned at the beginning of the project, the main ideas of Yuan et al. (2022), the results I got, and a short discussion of possible extensions and caveats.
Now, to get there, I went through the greatest debugging experience of my life. I know how to code fairly well, but I had never inherited someone else's code to work with. For this project, however, I started with the code provided in the GitHub repository from Yuan et al. (2022), which I initially assumed would run smoothly. However, because it relied on an older version of a machine learning package called TensorFlow, the code could no longer run. To fix this, Professor Steele and I tried different things, like installing the older version of the package and setting up a virtual machine. However, none of this worked. What I ended up doing was manually debugging the code by trying to run it, finding where it failed, and Googling the updated version of that specific line of code. Another challenge I had to overcome was realizing that I was not going to understand 100% of the AutoIV technique. The statistical tools they use to create the IV involve a lot of advanced statistics that I would probably only learn about if I did a PhD in statistics. Being used to covering topics extensively in class, I faced a learning curve in accepting that I would satisfy the requirements of the project by understanding the big picture of the procedure.
This project has made me consider a master's in statistics at McGill. I chose a math major in undergrad because I thought it would complement my interest in economics well. But after taking statistics-related classes and working on this project, I think I should have chosen a statistics major instead. In particular, instrumental variables, which are the most important concept of this project, are commonly used to answer causal questions in economics. Even if I don't end up pursuing a master's in statistics, this experience has exposed me to how research works. In particular, I learned that it's not a linear process and that things take time, and that is okay.
Finally, I would like to thank my donor Mark W. Gallop for giving me the opportunity to dedicate the summer to learning about statistics, machine learning, and econometrics. All the best.