Somewhere between being a lab rat and writing software to make your online search results better, I've ended up here as a second-year Ph.D. student in McGill's Quantitative Life Sciences programme. It's been a wild ride, and I'd love to set the stage for how and why I've become excited about cancer research and probabilistic models, the sort working behind the scenes of your online searches for pet food.
From my first year as an undergraduate up until the end of my master's degree, I felt lucky to work at the bench in molecular biology laboratories that studied cancer. If I learned one thing during those five years running experiments and poking at cells in a dish, it's that biological data is messy and hard to get. Experiments that take weeks to plan and months to perform have countless factors to consider (did you know your gene expression changes with the seasons and weather, or that lab mice behave differently with men than they do with women?). The only thing you can guarantee is that you've failed to control for something.
Statistics can help us a bit when it comes to messiness. When noise and factors outside our control keep us from pinpointing an answer directly, we can often use statistics to get in the ballpark and better understand what's going on. It's always exciting when you can estimate, with a stated confidence, seemingly impossible things like the chance of rain tomorrow or which genes interact with one another.
The last obstacle is that biological data is hard to collect. Luckily, labs around the world conducting experiments have been sharing their findings online. More than 300,000 experiments get added each year to the Gene Expression Omnibus, an online database storing data from experiments that measure which genes appear to be making proteins by quantifying the intermediate step: RNA transcription.
I've been working with Prof. Yue Li to better understand how we can leverage and wrangle the increasing amount of biomedical data we as a community have been collecting, and to create tools that derive meaning from the otherwise dizzying complexity of it all. We're also interested in using information already collected to help us learn more from less. An example of that is "tumour deconvolution", whereby scientists try to infer cell populations (difficult and expensive to measure) from the average expression of genes across a large number of cells (much easier and less expensive to measure).
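To make the idea of deconvolution a little more concrete, here is a minimal toy sketch (not any lab's actual method) of the underlying math: if we already know a "signature" of average gene expression for each cell type, a bulk sample's expression is roughly a proportion-weighted mixture of those signatures, and we can estimate the proportions by solving a linear system. All numbers below are made up for illustration; real tools add noise models and constraints such as non-negativity.

```python
import numpy as np

# Hypothetical toy problem: 5 genes, 3 cell types.
rng = np.random.default_rng(0)
signature = rng.uniform(0.0, 10.0, size=(5, 3))  # mean expression of each gene per cell type
true_props = np.array([0.5, 0.3, 0.2])           # the (unknown in practice) cell-type proportions

# Bulk expression is approximately the proportion-weighted mix of signatures.
bulk = signature @ true_props

# Recover the proportions by least squares; with no noise this is exact.
est, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
print(np.round(est, 3))
```

In this noise-free toy the recovered proportions match the true ones; with real sequencing data the measurements are noisy and the signatures imperfect, which is where probabilistic modelling earns its keep.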
I'm excited to start this next chapter with my fellow QLS students, all of whom have similarly worn many hats and aren't afraid to wear them all at once.