The award from the Association for Computing Machinery recognizes outstanding work in the areas of data science, machine learning and data mining.
Ren (MS ’16, PhD ’18), who is now an assistant professor of computer science at USC, points out that many real-world applications rely on being able to quickly understand and analyze text data – from news, medical texts, and any number of other sources – and at volumes he says are “almost impossible for human to digest and curate.” But different types of data about different subjects are often expressed in ways specific to that subject, or in language unique to the individual author – in other words, they’re messy.
“The key is leveraging existing knowledge-base facts, which are already curated by human crowds, to automatically generate labeled data at a large scale, and train noise-robust machine-learning models with such automatically labeled data,” he said.
Being able to do that depends on two key factors. Ren says the methodology he proposed in his dissertation has led to his current work, with his students at USC and with collaborators at Illinois, Stanford, and elsewhere, on a much broader set of techniques for information extraction and text mining.
Ren is only the most recent student from Abel Bliss Professor Jiawei Han’s Data Mining Group to receive an ACM SIGKDD Dissertation Award.
Others include Xiaoxin Yin in 2009, Yizhou Sun in 2013, and Chi Wang in 2015.
“Xiang did brilliant work during his PhD study and his work has been well cited and well recognized,” Han said.
Ren called studying under Han “an amazing experience” that prepared him for an academic career by including him in grant-proposal writing, paper reviewing, principal investigator meetings, and guest lectures.
Perozzi is the first PhD student in the Department of Computer Science, which is part of the College of Engineering and Applied Sciences at Stony Brook University, to receive this award.
About the Association for Computing Machinery
Founded in 1947, the ACM is the largest and oldest scientific and industrial computing society.