Research

Computer Networks Research

The Friendship Paradox

Your friends probably have more friends than you do. That is, in most social networks, a given person is likely to have fewer friends than that person's friends have on average. This phenomenon is known as the Friendship Paradox. McGill researchers Naghmeh Momeni and Prof. Michael Rabbat explored the paradox by looking at Twitter networks and creating six new measures to quantify social influence within those networks. Their research shows that Twitter networks have a hierarchical rather than a star-like structure. That is, a user is more likely to follow someone who has more followers than they do. As a result, influence becomes concentrated in a relatively small number of users. You can read more details of their findings in their paper.
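The paradox is easy to check by hand on a toy network. The following is a minimal Python sketch and not one of the six measures from the paper: the graph `friends` and all function names are made up for illustration, and every person is assumed to have at least one friend.

```python
from statistics import mean

# Hypothetical undirected friendship graph (adjacency sets).
# This is illustrative data, not data from the paper.
friends = {
    "ann": {"bob", "cat", "dan", "eve"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
    "eve": {"ann"},
}


def degree(graph, person):
    """Number of friends a person has."""
    return len(graph[person])


def mean_friend_degree(graph, person):
    """Average number of friends held by this person's friends."""
    return mean(degree(graph, friend) for friend in graph[person])


def paradox_summary(graph):
    """Return (mean degree, mean of friends' mean degree,
    fraction of people whose friends have more friends on average)."""
    own = mean(degree(graph, p) for p in graph)
    of_friends = mean(mean_friend_degree(graph, p) for p in graph)
    fraction = mean(
        1 if mean_friend_degree(graph, p) > degree(graph, p) else 0
        for p in graph
    )
    return own, of_friends, fraction


own, of_friends, fraction = paradox_summary(friends)
print(own, of_friends, fraction)
```

In this toy graph the average person has 2 friends, while the friends of an average person have 3.1 friends, and four of the five people have fewer friends than their friends do on average. The one well-connected person ("ann") appears in almost everyone's friend list, which is what drives the effect.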

More Data Science Publications from the Computer Networks Research Group:

S. Lawlor, T. Sider, N. Eluru, M. Hatzopoulou, and M.G. Rabbat, Detecting convoys using license plate recognition data, IEEE Transactions on Signal and Information Processing Over Networks, accepted May 2016. [paper]

S. Magnusson, P.C. Weeraddana, M.G. Rabbat, and C. Fischione, On the convergence of alternating direction Lagrangian methods for nonconvex structured optimization problems, IEEE Transactions on Control of Networked Systems, accepted July 2015. [paper]

A. Iscen, M. Rabbat and T. Furon, Efficient large-scale similarity search using matrix factorization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, June 2016. [paper] [code] [pre-computed models]

Computer Science and Biology

Prof. Waldispühl and his teams of researchers have created online games that allow them to crowdsource computationally challenging problems. Their first such game was Phylo, in which the player moves coloured blocks in a Tetris-like environment. What the player is actually doing is finding a solution to a difficult DNA multiple sequence alignment problem. The player needs no knowledge of genomics to play, just the simple rules of the game.
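To give a feel for what "a solution" to a multiple sequence alignment problem means, here is a minimal Python sketch that scores two candidate alignments of the same three short sequences. It uses a generic sum-of-pairs score with assumed values for matches, mismatches and gaps. It is not the scoring scheme used by Phylo, and the sequences and function names are hypothetical.

```python
from itertools import combinations


def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column.

    The match/mismatch/gap values are illustrative assumptions.
    """
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # two gaps facing each other are ignored
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def alignment_score(rows, **scores):
    """Total score of an alignment given as equal-length strings ('-' = gap)."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(column_score(column, **scores) for column in zip(*rows))


# Two ways of aligning the sequences ACGT, AGT and ACT.
gaps_inside = ["ACGT", "A-GT", "AC-T"]
gaps_at_end = ["ACGT", "AGT-", "ACT-"]

print(alignment_score(gaps_inside))  # 0
print(alignment_score(gaps_at_end))  # -3
```

Placing the gaps inside the sequences lines up more identical letters and scores higher. Sliding blocks left and right in the game corresponds to moving gaps in this way, in search of a better-scoring arrangement.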

Encouraged by the results obtained from this game, Prof. Waldispühl and a second team created Ribo, which helps solve multiple sequence alignment problems for RNA.

You can play either of these games by following the links above and then read about the insights they helped obtain in these papers:

D. Kwak, A. Kam, D. Becerra, Q. Zhou, A. Hops, E. Zarour, A. Kam, L. Sarmenta, M. Blanchette, and J. Waldispühl. Open-phylo: a customizable crowd-computing platform for multiple sequence alignment. Genome Biology, 14(10):R116, October 2013. [paper]

A. Kawrykow, G. Roumanis, A. Kam, D. Kwak, C. Leung, C. Wu, E. Zarour, Phylo players, L. Sarmenta, M. Blanchette, and J. Waldispühl. Phylo: a citizen science approach for improving multiple sequence alignment. PLoS One, 7(3):e31362, March 2012. [paper]

J. Waldispühl, A. Kam, and P. P. Gardner. Crowdsourcing RNA structural alignments with an online computer game. In Proceedings of the 20th Pacific Symposium on Biocomputing, 2015. [paper]

Data Mining and Security

Introducing Kam1n0, a MapReduce-based Assembly Clone Search for Reverse Engineering

Detecting malware is a difficult problem, largely because the purpose of malicious code can only be known once its assembly code is reverse engineered into a human-readable form and then analyzed by a software engineer. The volume of new assembly code introduced every day would make such analysis impossible if engineers had to start from scratch each time. However, many pieces of assembly code are reused from one program to another. Ph.D. candidate Steven H.H. Ding and Prof. Benjamin Fung, in collaboration with Defence Research and Development Canada (DRDC), have developed Kam1n0, a tool that allows engineers to search through a repository of already-analyzed assembly code. A deployed demo system is publicly available. Extensive experimental results suggest that Kam1n0 is accurate, efficient, and scalable for handling large volumes of assembly code. This software won second prize in the Hex-Rays Plug-In Contest 2015.

More information, including the paper and demo software, can be obtained on the Data Mining and Security Lab's web page.

More Data Science Publications from the Data Mining and Security Lab:

J. Liu, K. Wang, and B. C. M. Fung. Mining high utility patterns in one phase without generating candidates. IEEE Transactions on Knowledge and Data Engineering (TKDE), 28(5):1245-1257, May 2016. IEEE Computer Society. [paper]

S. H. H. Ding, B. C. M. Fung, and M. Debbabi. A visualizable evidence-driven approach for authorship attribution. ACM Transactions on Information and System Security (TISSEC), 17(3):12.1-12.30, March 2015. ACM Press. [paper]

K. Al-Hussaeni, B. C. M. Fung, and W. K. Cheung. Privacy-preserving trajectory stream publishing. Data & Knowledge Engineering (DKE), 94(A):89-109, November 2014. Elsevier. [paper]

S. Goryczka, L. Xiong, and B. C. M. Fung. m-privacy for collaborative data publishing. IEEE Transactions on Knowledge and Data Engineering (TKDE), 26(10):2520-2533, October 2014. IEEE Computer Society. [paper]

R. Chen, B. C. M. Fung, P. S. Yu, and B. C. Desai. Correlated network data publication via differential privacy. Very Large Data Bases Journal (VLDBJ), 23(4):653-676, August 2014. Springer. [paper]

R. H. Khokhar, R. Chen, B. C. M. Fung, and S. M. Lui. Quantifying the costs and benefits of privacy-preserving health data publishing. Journal of Biomedical Informatics (JBI): Special Issue on Informatics Methods in Medical Privacy, 50:107-121, August 2014. Elsevier. [paper]

A. Basher and B. C. M. Fung. Analyzing topics and authors in chat logs for crime investigation. Knowledge and Information Systems (KAIS): An International Journal, 39(2):351-381, May 2014. Springer. [paper]

N. Mohammed, D. Alhadidi, B. C. M. Fung, and M. Debbabi. Secure two-party differentially private data release for vertically-partitioned data. IEEE Transactions on Dependable and Secure Computing (TDSC), 11(1):59-71, January/February 2014. IEEE Computer Society. [paper]

M. Ghasemzadeh, B. C. M. Fung, R. Chen, and A. Awasthi. Anonymizing trajectory data for passenger flow analysis. Transportation Research Part C: Emerging Technologies (TRC): An International Journal, 39:63-79, February 2014. Elsevier. [paper]

Digital Humanities

An Online Journal on the Computational Study of Culture

The Journal of Cultural Analytics is a new online journal edited by Prof. Andrew Piper. CA is "a new open-access journal dedicated to the computational study of culture. Its aim is to promote high quality scholarship that intervenes in contemporary debates about the study of culture using computational and quantitative methods." 

Prof. Piper directs the multi-university initiative NovelTM: Text Mining the Novel, a partnership that "seeks to produce the first large-scale cross-cultural study of the novel according to quantitative methods." Prof. Piper also directs the lab .txtLAB, whose publications are listed here.

Hermeneutica: text analysis using a new and innovative set of tools

Prof. Stéfan Sinclair's book Hermeneutica introduces text analysis using modern research methods. The book presents the theory and offers a set of analytical tools called Voyant. Using Voyant, users can integrate interpretation into texts by creating hermeneutica: small “toys” that can be embedded into blogs, wikis or other online writing environments. The book’s companion website, Hermeneuti.ca, offers the example essays with both text and embedded interactive panels. The tools themselves can be used online at voyant-tools.org.

More Data Science Publications from Digital Humanities:

A. Piper and E. Portelance, “How Cultural Capital Works: Prizewinning Novels, Bestsellers, and the Time of Reading.” Post45 (2016). [paper]

H. Vala, A. Piper, and D. Ruths. “The More Antecedents the Merrier: Tackling Multiple Antecedents in Anaphor Resolution.” Association for Computational Linguistics (ACL-2016). [paper]

H. Vala, S. Dimitrov, D. Jurgens, A. Piper, and D. Ruths. “Annotating Characters in Literary Corpora: A Scheme, the Charles Tool, and an Annotated Novel.” Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC-2016). [paper]

H. Vala, D. Jurgens, A. Piper, and D. Ruths, “Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts.” Conference on Empirical Methods in Natural Language Processing (EMNLP-2015). [paper]

S. Dimitrov, F. Zamal, A. Piper, and D. Ruths, “Goodreads vs Amazon: The Effect Of Decoupling Book Reviewing And Book Selling.” Proceedings of the Ninth International Conference on Web and Social Media (ICWSM-15) 2015. [paper]

A. Piper, “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel.” New Literary History 46.1 (2015): 63-98. [paper]

Machine Learning and Reasoning

A System that Participates in Dialogue

A team of researchers, including McGill Ph.D. student Ryan Lowe, Prof. Joelle Pineau and partners from Université de Montréal and Maluuba, developed a new neural network model that can generate the next sentence in a dialogue, given the context of the conversation. Their approach uses latent variables to model the sentence generation process as a sequence of decisions: first, the model decides on a high-level topic by sampling the latent variable, and then it turns this idea into words using a recurrent neural network (RNN). They show that their model is preferred by humans over several baselines and generates longer sentences that are more on-topic. Their method even learns to reply in different languages according to the context of the conversation. You can read their paper here.

How Not to Evaluate a Dialogue System

Finding methods to automatically evaluate the quality of a dialogue system is an open problem, especially for chatbots, where there is no specific task to accomplish. The research team showed that current methods used to automatically evaluate the quality of a response from a dialogue system in this setting are almost entirely useless. In particular, they showed that these methods correlate very weakly or not at all with how humans would rate the quality of the responses. More research is needed to derive effective automatic evaluation methods. Read more about their findings here.
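Checking whether an automatic metric agrees with human judgment usually comes down to computing a correlation between two lists of scores for the same responses. The following is a minimal Python sketch of a rank (Spearman) correlation written from scratch. The scores are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up scores for five chatbot responses (not data from the paper).
metric_scores = [0.10, 0.40, 0.35, 0.80, 0.25]  # automatic metric
human_ratings = [4, 2, 5, 3, 1]                 # human score on a 1-5 scale

print(spearman(metric_scores, human_ratings))
```

A value near 1 or -1 would mean the metric orders responses much as humans do (or in reverse), while a value near 0 means the metric says little about human-perceived quality.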

The SmartWheeler

About 4.3 million people in the U.S. alone use powered wheelchairs. A smart wheelchair would greatly improve the lives of these individuals. The SmartWheeler project's goal is to increase the autonomy and safety of individuals with severe mobility impairments by developing a robotic wheelchair that is adapted to their needs. The project tackles a range of challenging issues, focusing in particular on tasks pertaining to human-robot interaction and on robust control of the intelligent wheelchair. The platform also serves as a test-bed for validating novel concepts and algorithms for automated decision-making for socially assistive robots. You can read more about the SmartWheeler at the project's website or in these papers:

B. Kim, J. Pineau. "Socially Adaptive Path Planning in Dynamic Environments Using Inverse Reinforcement Learning". International Journal of Social Robotics (SORO). 2015. [paper]

A. Leigh, J. Pineau, N. Olmedo, H. Zhang. "Person Tracking and Following with 2D Laser Scanners". International Conference on Robotics and Automation (ICRA). 2015. [paper], [code], [datasets]

Mathematics and Statistics

Insurance Premium Prediction

What is a fair insurance premium? This question is an important one for the insurance industry, because setting premiums too high can make good customers leave while only the riskiest ones stay, the worst of both worlds. A team of researchers including Prof. Yi Yang has developed a gradient tree-boosting algorithm, TDboost, that forgoes linear assumptions and provides accurate premium predictions for complex data sets. Their algorithm has applications in other domains, including ecology, meteorology and political science. You can read their paper here. Supplementary material is available as well.

More publications from Mathematics and Statistics:

W. Qian, Y. Yang, and H. Zou (2016). Tweedie's Compound Poisson Model With Grouped Elastic Net. Journal of Computational and Graphical Statistics. 25(2), 606-625. [paper] [appendix]

Y. Yang, and H. Zou (2014). A Fast Unified Algorithm for Solving Group-Lasso Penalized Learning Problems. Statistics and Computing. 25(6), 1129–1141. [paper]

Y. Yang, and H. Zou (2013). An Efficient Algorithm for Computing The HHSVM and Its Generalizations. Journal of Computational and Graphical Statistics. 22(2), 396–415. [paper]

Y. Yang, and H. Zou (2012). A Cocktail Algorithm for Solving The Elastic Net Penalized Cox’s Regression in High Dimensions. Statistics and Its Interface. 6(2), 167-173. [paper]

Network Dynamics Lab

D. Jurgens, T. Finethy, J. McCorriston, Y.T. Xu, D. Ruths (2015). Geolocation Prediction in Twitter Using Social Networks: A Critical Analysis and Review of Current Practice. In Proceedings of the Ninth International AAAI Conference on Web and Social Media, 2015. [paper]

D. Jurgens, J. McCorriston, D. Ruths (2015). An Analysis of Exercising Behavior in Online Populations. In Proceedings of the Ninth International AAAI Conference on Web and Social Media, 2015. [paper]

J. McCorriston, D. Jurgens, D. Ruths (2015). Organizations are Users Too: Characterizing and Detecting the Presence of Organizations on Twitter. In Proceedings of the Ninth International AAAI Conference on Web and Social Media, 2015. [paper]
 
D. Ruths and J. Pfeffer (2014). Social media for large studies of behavior. Science 346(6213):1063-1064. [paper]
 
J. Ruths and D. Ruths (2014). Control Profiles of Complex Networks. Science 343(6177): 1373-1376. [paper]

Operations Management

S. Samiedaluie, B. Kucukyazici, V. Verter and D. Zhang (2016). Patient Admission Policies in a Neurology Ward. Forthcoming in Operations Research (accepted September 2016). [paper]

A. Graber, M. Carter, V. Verter (2016). Restructuring the Education System for Improving the Equity of Access to Primary Care. Forthcoming in European Journal of Operational Research (accepted September 2016). [paper]

R. Ibrahim, B. Kucukyazici, V. Verter, M. Gendreau, and M. Blostein (2016). Designing Personalized Treatment: An Application to Anticoagulation Therapy. Production and Operations Management 25(5), 902–918. [paper]

W. Klement, S. Wilk, W. Michalowski, K. Farion, M.H. Osmond, and V. Verter (2012). Predicting Need for CT Imaging of Children with Minor Head Injury using an Ensemble of Naive Bayes Prediction Models. Artificial Intelligence in Medicine 54(3), 163-170. [paper]

B. Kucukyazici, V. Verter, and N. Mayo (2011). An Analytical Framework for Designing Community-based Care for Chronic Diseases. Production and Operations Management 20(3), 474-488. [paper]