Automating Predictive Modeling and Knowledge Discovery
Ioannis Tsamardinos, Ph.D., is a Professor in the Computer Science Department of the University of Crete (UoC). He obtained his Ph.D. in 2001 from the Intelligent Systems Program of the University of Pittsburgh. He then served on the faculty of the Department of Biomedical Informatics at Vanderbilt University until 2006, when he returned to Greece. His research interests lie in the fields of Machine Learning, Data Science, and Bioinformatics, particularly variable selection, causal discovery, and the automation of machine learning. He has mostly applied these methods to Bioinformatics and Biomedical Informatics. Ioannis Tsamardinos has over 100 international refereed publications in journals, conferences, and edited volumes, more than 6000 citations in Google Scholar, and 2 US patents. He has been awarded the ERC Consolidator Grant and the Greek national grant for research excellence, ARISTEIA II.
There is an enormous and constantly increasing need for data analytics (collectively meaning machine learning, statistical modeling, pattern recognition, and data mining) across a wide range of domains, including biological, biomedical, and business applications. The primary bottleneck in applying machine learning is the scarcity of expert analyst time; hence there is a pressing need to automate machine learning and, specifically, predictive and diagnostic modeling. In this talk, we present the scientific and algorithmic problems that arise when trying to automate this process: choosing an appropriate combination of algorithms for preprocessing, transformation, imputation of missing values, and predictive modeling; tuning the hyper-parameter values of these algorithms; and estimating predictive performance together with confidence intervals. In addition, we present the problem of feature selection and how it fits within an automated analysis pipeline, arguing that feature selection is the main tool for knowledge discovery in this context.
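As a rough illustration of the configuration search and performance estimation described above, the following minimal Python sketch may help. It is not the speaker's method or software. The search space (mean or median imputation combined with a k-nearest-neighbour classifier), the toy data, and all function names are assumptions made for this example, and feature selection and confidence intervals are left out. The point it shows is that tuning is repeated inside every outer training fold (nested cross-validation), so the reported accuracy is not inflated by the selection step.

```python
import itertools
import random
import statistics

# Hypothetical search space: every combination of an imputation strategy
# and a neighbourhood size k for a k-nearest-neighbour classifier.
GRID = [{"impute": s, "k": k}
        for s, k in itertools.product(("mean", "median"), (1, 3, 5))]


def make_folds(n, n_folds, seed):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def split(X, y, test_idx):
    """Return (train_X, train_y, test_X, test_y) for one fold."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in held_out]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def count_correct(train_X, train_y, test_X, test_y, config):
    """Fit imputer and k-NN on the training part only; count test hits."""
    centre = statistics.mean if config["impute"] == "mean" else statistics.median
    fill = [centre([v for v in col if v is not None]) for col in zip(*train_X)]

    def impute(rows):
        return [[fill[j] if v is None else v for j, v in enumerate(r)]
                for r in rows]

    train_X, test_X = impute(train_X), impute(test_X)
    hits = 0
    for x, label in zip(test_X, test_y):
        # Rank training rows by squared Euclidean distance to x.
        order = sorted(range(len(train_X)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(train_X[i], x)))
        votes = [train_y[i] for i in order[:config["k"]]]
        hits += max(sorted(set(votes)), key=votes.count) == label
    return hits


def cv_accuracy(X, y, config, n_folds=5, seed=0):
    """Cross-validated accuracy of a single configuration."""
    hits = sum(count_correct(*split(X, y, fold), config)
               for fold in make_folds(len(X), n_folds, seed))
    return hits / len(X)


def select_config(X, y, grid):
    """Tuning step: pick the configuration with the best CV accuracy."""
    return max(grid, key=lambda c: cv_accuracy(X, y, c))


def nested_cv_accuracy(X, y, grid, n_folds=5, seed=1):
    """Estimate the accuracy of the whole select-then-fit procedure.

    Tuning is repeated inside each outer training set, so the outer test
    rows never influence which configuration is chosen.
    """
    hits = 0
    for fold in make_folds(len(X), n_folds, seed):
        train_X, train_y, test_X, test_y = split(X, y, fold)
        best = select_config(train_X, train_y, grid)
        hits += count_correct(train_X, train_y, test_X, test_y, best)
    return hits / len(X)


def make_toy_data(n=60, missing_rate=0.1, seed=42):
    """Two Gaussian classes in 2-D with some values knocked out (toy data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(4.0 * label, 1.0) for _ in range(2)]
        X.append([None if rng.random() < missing_rate else v for v in row])
        y.append(label)
    return X, y


X, y = make_toy_data()
print("selected configuration:", select_config(X, y, GRID))
print("nested CV accuracy estimate:", nested_cv_accuracy(X, y, GRID))
```

The imputation values are learned from the training rows of each fold only. Computing them on the full data set before splitting would leak information from the test rows into the model.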
Emojis, Sentiment and Stance in Social Media
Dr. Petra Kralj Novak is a researcher at the Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia. Her research belongs to the broad area of knowledge discovery from databases. Currently, as a postdoctoral researcher, she analyses social and mainstream media, focusing on mediated sentiment. She publishes in major machine learning and interdisciplinary journals and conferences. Her pioneering research on the role of emojis in conveying sentiment was published in P. Kralj Novak et al., "Sentiment of emojis", and is the main reference for current research on emoji use. Dr. Petra Kralj Novak is also an assistant professor at the Jožef Stefan International Postgraduate School (Ljubljana, Slovenia) and at the Faculty of Information Studies in Novo Mesto (Slovenia). She has given seminars to academic audiences (e.g., Georgia State University, Fudan University, University of Ljubljana) and industrial audiences (Career Builder, LLC [USA]). She has also been an invited speaker at international conferences (CMC Corpora 2016, SCSC 2018).
Social media are computer-based technologies for sharing information and ideas, as well as for entertainment and engagement, readily available as mobile applications and websites to both private users and businesses. As social media communication is mostly informal, it is an ideal environment for the use of emojis. We have collected Twitter data and engaged 83 human annotators to label over 1.6 million tweets in 13 European languages with sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. We have computed the sentiment of the emojis from the sentiment of the tweets in which they occur. We observe no significant differences in the emoji rankings between the 13 languages. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. In this talk, several emoji, sentiment, and stance analysis applications will be presented, varying in data source, topic, language, and approach.
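To make the idea of deriving emoji sentiment from tweet labels concrete, here is a minimal Python sketch. The abstract does not state the exact formula, so the score used here (share of positive occurrences minus share of negative occurrences, giving a value between -1 and 1) is an assumption, as are the function name, the two tracked emojis, and the six made-up tweets.

```python
from collections import Counter, defaultdict

# Two emojis written as escapes so the source stays ASCII-only.
SMILE = "\U0001F60A"   # smiling face with smiling eyes
ANGRY = "\U0001F621"   # pouting face


def emoji_sentiment(tweets, emojis):
    """Map each tracked emoji to (occurrences, score), with score in [-1, 1].

    tweets: iterable of (text, label) pairs, where label is one of
            "negative", "neutral", "positive"
    emojis: set of single-code-point emojis to track (hand-picked here)

    Assumed score: (positive - negative) / total occurrences.
    """
    counts = defaultdict(Counter)
    for text, label in tweets:
        for ch in text:
            if ch in emojis:
                # Every occurrence inherits the label of its tweet.
                counts[ch][label] += 1
    result = {}
    for emoji, c in counts.items():
        total = sum(c.values())
        result[emoji] = (total, (c["positive"] - c["negative"]) / total)
    return result


# Made-up annotated tweets, for illustration only.
toy_tweets = [
    (f"great match {SMILE}", "positive"),
    (f"thanks {SMILE}{SMILE}", "positive"),
    (f"monday again {SMILE}", "neutral"),
    (f"train cancelled {ANGRY}", "negative"),
    (f"well, ok {ANGRY}", "neutral"),
    ("no emoji here", "negative"),
]

scores = emoji_sentiment(toy_tweets, {SMILE, ANGRY})

# Rank emojis from most positive to most negative.
ranking = sorted(scores, key=lambda e: scores[e][1], reverse=True)
for emoji in ranking:
    occurrences, score = scores[emoji]
    print(f"U+{ord(emoji):X}  occurrences={occurrences}  score={score:+.2f}")
```

On real data, emojis with very few occurrences would give unreliable scores, so a minimum occurrence threshold or smoothed counts would be a sensible addition.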