Prof. Cédric Heuchenne - HEC, Liège, Belgium
INVITED SPEAKER
Cédric Heuchenne holds a Master’s degree in Applied Sciences and a Ph.D. in Statistics from the Catholic University of Louvain and has more than 14 years of experience as a professor of Statistics/Data Science at the University of Liège.
Since 2017, he has been the Scientific Advisor at the International Research Institute for Artificial Intelligence and Data Science (IAD), Dong A University, Vietnam.
He has published more than 40 articles in prestigious peer-reviewed scientific journals (with impact factor and indexed by Scopus), presented more than 30 articles at international conferences, and been invited by more than 30 universities for seminars, workshops and research stays. He has also contributed chapters to statistics/finance/engineering books, served as associate editor for two scientific journals, and supervised 6 completed Ph.D. theses. In addition, his work has been supported by five major projects funded by Belgian/European funds (more than 400,000 euros each), for which he supervised the whole process, from the research content to the management of the research group (he hired more than 20 people, both doctoral students and post-docs) and the communication with partners, including other universities and companies.
Research interests:
- Survival or duration data analysis,
- Nonparametric statistical inference,
- Machine learning,
- Statistical process control,
- Quality management,
- Risk modeling.
Title: Imputation techniques for data fusion and anonymization of survey data
Abstract:
When dealing with administrative/survey data to solve complex public economics problems, we often use a large number of variables available in different databases. We thus need to analyze data from different sources; the observations, which only share a subset of the variables, cannot always be paired to detect common individuals. This is the case, for example, when the information required to study a certain phenomenon comes from different sample surveys. Statistical matching is a common practice to combine these data sets. In this talk, we investigate and extend to statistical matching three methods based on Kernel Canonical Correlation Analysis (KCCA; [2]), Super-Organizing Map (Super-OM; [1]) and Autoencoders-Canonical Correlation Analysis (A-CCA; [3]). These methods are designed to deal with various variable types, sample weights and incompatibilities among categorical variables. In our context, data privacy and anonymization are also important, which creates a need for synthetic databases that replicate the characteristics of the population while preserving privacy. In this presentation, we investigate how Wasserstein Generative Adversarial Networks (WGANs), introduced by Arjovsky et al. (2017) [4] in the context of image synthesis, can be used to create administrative databases, and we adapt them to take sample weights into account. Administrative data have the specificity of mixing continuous and categorical variables, which should be taken into account in the architecture of the WGANs.
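As a rough illustration of what statistical matching means in its simplest form, the sketch below imputes a variable observed only in a donor file into a recipient file by nearest-neighbour matching on the shared variables. This is a basic distance hot-deck baseline, not the KCCA, Super-OM or A-CCA methods discussed in the talk, and it ignores sample weights. The function name, variable names and toy data are all hypothetical.

```python
import numpy as np

def nearest_neighbor_match(x_recipient, x_donor, z_donor):
    """Impute z for each recipient from its nearest donor on the shared variables.

    Hypothetical helper: a plain distance hot-deck baseline, unweighted.
    """
    x_recipient = np.asarray(x_recipient, dtype=float)
    x_donor = np.asarray(x_donor, dtype=float)
    z_donor = np.asarray(z_donor)

    # Standardize the shared variables with pooled mean and standard deviation
    # so that no single variable dominates the distance.
    pooled = np.vstack([x_recipient, x_donor])
    mean = pooled.mean(axis=0)
    std = pooled.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns

    r = (x_recipient - mean) / std
    d = (x_donor - mean) / std

    # Squared Euclidean distances, shape (n_recipient, n_donor).
    dist = ((r[:, None, :] - d[None, :, :]) ** 2).sum(axis=2)
    donor_idx = dist.argmin(axis=1)
    return z_donor[donor_idx], donor_idx

# Toy data (made up): both files share (age, income); only the donor file has z.
x_a = [[25, 1200], [60, 3100]]              # recipient file
x_b = [[24, 1250], [58, 3000], [40, 2000]]  # donor file
z_b = ["rent", "own", "rent"]               # variable observed only in the donor file

imputed, idx = nearest_neighbor_match(x_a, x_b, z_b)
print(imputed, idx)
```

Each recipient record simply borrows the value of its closest donor; handling mixed variable types, sample weights and incompatible categories is where the methods in the talk go beyond this baseline.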
ResearchGate: https://www.researchgate.net/profile/Cedric-Heuchenne