Structural Bioinformatics and Citizen Science
Prof. Waldispühl leads the Structural Bioinformatics and Citizen Science research group. One of the group's research lines involves developing crowd-computing games to analyze genomic data. In Phylo, the first game they created, the player moves coloured blocks in a Tetris-like environment. In doing so, the player is actually searching for a solution to a difficult DNA multiple sequence alignment problem. The player needs no knowledge of genomics, just the simple rules of the game. The website also features a researcher portal that allows scientists to upload their own sequences, and an educational site for instructors.
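To make the alignment problem behind the coloured blocks more concrete, here is a minimal sketch of one common way to score a multiple sequence alignment, the sum-of-pairs score. The score values (MATCH, MISMATCH, GAP) and the toy sequences are assumptions chosen for illustration; they are not taken from Phylo, whose actual scoring may differ.

```python
from itertools import combinations

# Illustrative score values (assumptions, not Phylo's actual scoring).
MATCH, MISMATCH, GAP = 1, -1, -2


def column_score(column):
    """Sum the pairwise scores of all characters in one alignment column."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # two gaps facing each other contribute nothing
        if a == "-" or b == "-":
            score += GAP
        elif a == b:
            score += MATCH
        else:
            score += MISMATCH
    return score


def sum_of_pairs(alignment):
    """Score an alignment given as equal-length strings, with '-' for gaps."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(column) for column in zip(*alignment))


# Toy sequences (hypothetical). Sliding the last T of the third row one
# position to the left lines it up with the T's above it.
before = ["ACGT-", "A-GTT", "ACG-T"]
after = ["ACGT-", "A-GTT", "ACGT-"]
print(sum_of_pairs(before), sum_of_pairs(after))
```

A player's move corresponds to shifting characters and gaps within a row; a better arrangement is one that raises the score.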
Prof. Waldispühl also leads the DNA puzzles project, which brings together computer scientists and bioinformaticians at the School of Computer Science. They develop artificial intelligence systems that combine the intuition of humans with the speed of computers to solve complex problems in biology and genomics. Their technology aims to tap into the vast reservoir of problem-solving skills acquired by video gamers while keeping the experience entertaining. Their data and software are freely accessible. Their current projects include Project Discovery and Borderlands Science.
Data Mining and Security
Introducing Kam1n0, a MapReduce-based Assembly Clone Search for Reverse Engineering
Detecting malware is a difficult problem, largely because the purpose of malicious code can only be known once its assembly code has been reverse engineered into a human-readable form and analyzed by a software engineer. The volume of new assembly code introduced every day would make such analysis impossible if each program had to be analyzed from scratch. However, many pieces of assembly code are reused from one program to another. Ph.D. candidate Steven H.H. Ding and Prof. Benjamin Fung, in collaboration with Defence Research and Development Canada (DRDC), have developed Kam1n0, a tool that allows engineers to search through a repository of already-analyzed assembly code. A deployed demo system is publicly available. Extensive experimental results suggest that Kam1n0 is accurate, efficient, and scalable when handling large volumes of assembly code. The software won second prize in the Hex-Rays Plug-In Contest 2015.
More information can be obtained on the Data Mining and Security Lab's web page.
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: MapReduce-based Assembly Clone Search for Reverse Engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 461-470, San Francisco, CA: ACM Press, August 2016.
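The general idea of searching a repository of already-analyzed assembly code can be illustrated with a toy similarity search. The sketch below is not Kam1n0's algorithm: it swaps in a much simpler technique (Jaccard similarity over n-grams of normalized instructions), and the function names, the normalization rules and the sample assembly are all hypothetical.

```python
def normalize(instruction):
    """Replace concrete operands with placeholders so that a change of
    register or constant does not hide an otherwise identical instruction."""
    mnemonic, _, operands = instruction.strip().partition(" ")
    kinds = []
    for operand in operands.split(","):
        operand = operand.strip()
        if not operand:
            continue
        if operand.startswith("["):
            kinds.append("MEM")  # memory reference such as [esi]
        elif operand[0].isdigit() or operand.startswith("-"):
            kinds.append("IMM")  # numeric constant
        else:
            kinds.append("REG")  # simplification: everything else, labels included
    return (mnemonic.lower() + " " + ",".join(kinds)).strip()


def shingles(function, n=2):
    """Return the set of n-grams of normalized instructions."""
    tokens = [normalize(line) for line in function]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query, repository, n=2):
    """Rank repository functions by similarity to the query, best first."""
    query_shingles = shingles(query, n)
    scored = [(jaccard(query_shingles, shingles(body, n)), name)
              for name, body in repository.items()]
    return sorted(scored, reverse=True)


# Hypothetical repository of previously analyzed functions.
repository = {
    "copy_loop": ["mov eax, [esi]", "mov [edi], eax", "add esi, 4",
                  "add edi, 4", "dec ecx", "jnz copy"],
    "sum_loop": ["mov eax, [esi]", "add ebx, eax", "add esi, 4",
                 "dec ecx", "jnz sum"],
}
# Same logic as copy_loop, compiled with different registers and constants.
query = ["mov edx, [ebx]", "mov [eax], edx", "add ebx, 8",
         "add eax, 8", "dec esi", "jnz again"]
results = search(query, repository)
print(results)
```

A linear scan like this one compares the query against every stored function, which is exactly what stops scaling on a large repository; that scalability problem is the one the paper addresses.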
An Online Journal on the Computational Study of Culture
The Journal of Cultural Analytics, edited by Prof. Andrew Piper, is an open-access journal dedicated to the computational study of culture. The aim of this online journal is to promote high-quality scholarship that applies computational and quantitative methods to the study of cultural objects (sound, image, text), cultural processes (reading, listening, searching, sorting, hierarchizing) and cultural agents (artists, editors, producers, composers).
Prof. Piper also directs the lab .txtLAB, which explores the use of computational and quantitative approaches towards understanding literature and culture in both the past and present. Their aim is to use the tools of data science, network analysis and machine learning to promote a more inclusive understanding of culture and creativity.
Hermeneutica: text analysis using a new and innovative set of tools
Prof. Stéfan Sinclair's book Hermeneutica introduces text analysis using modern research methods. The book presents the theory and offers a set of analytical tools called Voyant. Using Voyant, users can integrate interpretation into texts by creating hermeneutica: small “toys” that can be embedded into blogs, wikis or other online writing environments. The book’s companion website, Hermeneuti.ca, offers the example essays with both text and embedded interactive panels. The tools themselves can be used online at voyant-tools.org.
Machine Learning and Reasoning
A System that Participates in Dialogue
A team of researchers, including McGill Ph.D. student Ryan Lowe, Prof. Joelle Pineau and partners from Université de Montréal and Maluuba, derived a new neural network model that can generate the next sentence in a dialogue, given the context of the conversation. Their approach uses latent variables to model the sentence generation process as a sequence of decisions: first, the model decides on a high-level topic by sampling the latent variable, and then it turns this idea into words using a recurrent neural network (RNN). They show that their model is preferred by humans over several baselines and generates longer sentences that are more on-topic. Their method even learns to reply in different languages according to the context of the conversation. You can read their paper here.
How Not to Evaluate a Dialogue System
Finding methods to automatically evaluate the quality of a dialogue system is an open problem, especially for chatbots, where there is no specific task to accomplish. The research team showed that the methods currently used to automatically evaluate the quality of a response from a dialogue system in this setting are almost entirely useless. In particular, they show that these methods correlate very weakly or not at all with how humans would rate the quality of the responses. More research is needed to derive effective automatic evaluation methods. Read more about their findings here.
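Checking whether an automatic metric "correlates with how humans would rate" responses comes down to computing a correlation coefficient between two lists of scores. The sketch below shows Pearson and Spearman correlation in plain Python. The numbers are made up purely for illustration and are not results from the study; the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: linear association between two score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(var_x * var_y)


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up scores for six responses (illustration only, not study data).
metric_scores = [0.12, 0.40, 0.05, 0.33, 0.27, 0.18]  # automatic metric
human_ratings = [4, 2, 3, 5, 1, 3]                    # 1-5 human scale
print(round(pearson(metric_scores, human_ratings), 3))
print(round(spearman(metric_scores, human_ratings), 3))
```

A coefficient near 1 would mean the metric ranks responses much as humans do; a coefficient near 0 means the metric says little about human judgement.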
About 4.3 million people in the U.S. alone use powered wheelchairs. A smart wheelchair would greatly improve the lives of these individuals. The SmartWheeler project's goal is to increase the autonomy and safety of individuals with severe mobility impairments by developing a robotic wheelchair that is adapted to their needs. The project tackles a range of challenging issues, focusing in particular on tasks pertaining to human-robot interaction and on robust control of the intelligent wheelchair. The platform also serves as a test-bed for validating novel concepts and algorithms for automated decision-making for socially assistive robots. You can read more about the SmartWheeler at the project's website or in these papers:
- B. Kim and J. Pineau. Socially Adaptive Path Planning in Dynamic Environments Using Inverse Reinforcement Learning. International Journal of Social Robotics (SORO), 2015.
Mathematics and Statistics
Insurance Premium Prediction
What is a fair insurance premium? This question is an important one for the insurance industry, because setting premiums too high can drive good customers away, leaving only the riskiest ones. A team of researchers including Prof. Yi Yang has developed a gradient tree-boosting algorithm, TDboost, that forgoes linear assumptions and provides accurate premium predictions for complex data sets. Their algorithm has applications in other domains, including ecology, meteorology and political science. You can read their paper here. Supplementary material is available as well.
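For readers unfamiliar with gradient tree boosting, the following is a minimal sketch of the generic technique: start from a constant prediction, then repeatedly fit a small tree to the current errors and add a damped version of it to the model. It is not TDboost. It uses a squared-error loss and one-split trees (stumps) on a single feature for brevity, and the data, function names and parameters are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that best splits the residuals
    (lowest squared error) and return (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: no split is possible
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, rounds=100, learning_rate=0.1):
    """Fit an additive model of stumps by gradient boosting."""
    base = sum(ys) / len(ys)  # initial constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical data: a risk score (x) and an observed cost (y) that jumps
# abruptly, a pattern a straight line cannot capture.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 50, 50, 50]
model = fit_boosting(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Because each tree only has to correct what the previous ones got wrong, the model can follow jumps and interactions in the data without assuming a linear relationship.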
Network Dynamics Lab
The Network Dynamics Lab investigates questions related to the robustness of life, the propagation of behavior and knowledge through social networks, and the life cycle of human communities.
Living complex systems range in size, function, and composition from microscopic cells to international financial markets. The behavior of these systems emerges from the actions and interactions of the many individual agents that comprise them.
- What is the interplay between system structure and dynamics that gives rise to each system’s overarching behavior?
- Can we use knowledge of a system’s design and composition to predict its behavior?
- Can we measure and characterize the life cycles of complex systems?