Study Artificial Intelligence and Data Science at Dong A University

1. What is AI?

In a broad sense, AI is a set of algorithms that enable computers to simulate intellectual activities and perform “intelligent” tasks as humans do. AI algorithms include Machine Learning, Deep Learning, and Reinforcement Learning, and they build on the rapid development of computational and programming tools in computer science. Today, AI is applied to many practical problems, such as virtual assistants, customer recommendation systems, self-driving cars, face recognition software, and game-playing programs (for example, AlphaGo).

2. What is Data Science?

Data science is an interdisciplinary field that combines mathematical, statistical, and machine learning skills to analyze data. It helps users better understand what the data mean, validate hypotheses, simulate past or ongoing situations, and anticipate future events.

Data science aims to collect, analyze, and apply information about customers, clients, needs, demands, context, behaviours, and so on, and then to provide a scientific approach that helps businesses transform large amounts of available data into usable information.

3. Objectives of education

The “Artificial Intelligence and Data Science” major is built on practices from industrial projects and on the recruitment needs of companies. It aims to train highly qualified engineers who can also study and carry out research independently, so that they work effectively and meet the development needs of society in the era of the 4th Industrial Revolution.

4. Career prospects

With the 4th Industrial Revolution under way, engineers specializing in AI and DS have extremely promising careers in the 21st century. In the US market, AI and DS engineers are among the top 10 best earners, with an average salary of about USD 110,000 per year. In Vietnam, this field is expected to become a dream job in the near future.

5. Career opportunities

Companies in Vietnam and around the world are still hunting for engineers in this field. According to NY Tech Search, demand for AI and DS engineers has increased by 300% since 2013. After graduation, many job opportunities are available at companies, corporations, and large manufacturing enterprises, such as:

  • Prediction Analysis and Business Strategies Development Specialist at manufacturing companies.
  • Risk Analysis and Insurance Management Specialist at banks, financial companies.
  • Chatbot development, product recommendation system engineers at businesses, advertising companies.
  • Data scientist at big companies, businesses, organizations.
  • Data analyst at public organizations, providing technical support for data processing, statistics, and policy and decision making support.
  • Disease analysis and diagnosis engineer.
  • Agricultural disease analysis and diagnosis engineer.
  • Automatic robot system, self-driving car, intelligent system engineer.
  • AI algorithms development engineer at software solution companies.

6. The major Artificial Intelligence and Data Science at Dong A University

The topic of unusual findings using AI in Data Science has been studied by UDA’s lecturers since 2016, and these studies have been published in leading journals in the field of data science applied to manufacturing and business. To launch the Data Science major at UDA, a group of scientists (professors and researchers) from many countries, headed by Professor Cedric Heuchenne (Liege University, Belgium), will join hands with UDA to build and teach the Artificial Intelligence and Data Science major, with the goal of bringing the curriculum up to the level of developed countries in the region. The curriculum also aims to open up new horizons for businesses that want to use AI and DS to optimize their activities, operations, and production.

The curriculum will cover the core content of applied mathematics and statistics, the background and characteristics of Big Data, and AI algorithms with real-world applications such as text mining, visual computing, web development, and automation system design. These subjects will be taught by UDA data science experts who graduated from prestigious European universities. In addition, some courses and projects will be taught and supervised by data science professors from Belgium, France, the UK, the USA, Canada, Italy, and other countries.

In particular, Dong A University, and especially its International Research Institute for Artificial Intelligence and Data Science, has links with many universities in Europe and the US. An engineering student in AI and DS can follow part of the program in Paris and earn a master's degree in “systèmes décisionnels et big data”:

http://eisti.fr/fr/international/master-international-en-systemes-decisionnelsbig-data.  

We have strong ties with industry, the service sector, public administration, and institutions in both Vietnam and Europe. Thanks to this network, we offer many practical internships in which our future graduates apply their skills to current real-world needs.

First-year

| First semester course | Credits | Second semester course | Credits |
| --- | --- | --- | --- |
| Mathematics for AI 1 | 5 (45+60) | Mathematics for AI 2 | 5 (45+60) |
| Introduction to R and descriptive statistics | 3 (30+30) | Data analysis with R | 3 (30+30) |
| Introduction to programming | 4 (45+30) | English 2 | 3 (30+30) |
| English 1 | 3 (30+30) | First year project | 3 (0+90) |

Second-year

| First semester course | Credits | Second semester course | Credits |
| --- | --- | --- | --- |
| Mathematics for AI 3 | 5 (45+60) | ML1 - Classification techniques in Big Data | 4 (30+30) |
| Introduction to Python | 3 (30+30) | Data Structures and Algorithms with Python | 3 (30+30) |
| ML1 - Classification techniques in Big Data | 4 (30+30) | ML2 - Regression techniques in Big Data | 3 (30+30) |
| English 3 | 3 (30+30) | Second-year project | 3 (0+90) |

Third-year

| First semester course | Credits | Second semester course | Credits |
| --- | --- | --- | --- |
| Visualization and Visual Analytics for Data Science | 3 (30+30) | Deep Learning 2 | 4 (30+30) |
| Deep Learning 1 | 4 (45+30) | Data Management | 3 (30+30) |
| System and Network Security | 3 (30+30) | Design a Dashboard | 3 (15+60) |
| Databases | 3 (30+30) | Third-year project | 3 (30+30) |

Fourth-year

| First semester course | Credits | Second semester course | Credits |
| --- | --- | --- | --- |
| Systems for Big Data Analytics | 3 (30+30) | Engineer Project | 15 |
| Reinforcement learning | 4 (30+30) | | |
| Network Analysis | 3 (30+30) | | |
| Text Mining | 4 (30+30) | | |
| Data Mining for Business and Society | 3 (30+30) | | |

Description of the modules

  1. Mathematics for Machine Learning (ML) 1

This module provides students with the basic mathematics they need to gain deeper, more advanced knowledge of Machine Learning, Deep Learning, and Data Mining. The first part introduces linear algebra, with the concepts of matrices, linear systems, and vector spaces. The second part presents Calculus 1, which focuses on differential and integral calculus for univariate functions. The third part covers fundamental probability: random variables, Bayes' theorem, the cumulative distribution function, the density function, the central limit theorem, and some probability distributions.

  • Linear algebra.
  • Calculus 1.
  • Probability.
  2. Mathematics for ML 2

This module focuses on advanced linear algebra (Matrix Decompositions, Matrix Approximation), Analytic Geometry (Inner Products, Orthogonal Complement, Inner Product of Functions, Orthogonal Projections), integration techniques (integration by parts, substitution, rational fractions, Euler integration), and Calculus 2 (Partial Differentiation and Gradients, Gradients of Matrices, Higher-order Derivatives, Linearization and Multivariate Taylor Series, convex and nonconvex optimization under constraints), all of which are important for understanding the theoretical results in Machine Learning and Deep Learning.

  • Advanced Linear Algebra and Analytic Geometry.
  • Calculus 2.
  • Integration.
  • Optimization under constraints.
  3. Mathematics for ML 3
  • Statistical Inference.
  • Advanced Statistics.

This module provides both the basic principles of statistics and advanced statistics. The second part introduces students to Bayesian methods for machine learning. It presents the main ingredients of Bayesian thinking and illustrates typical situations where a Bayesian treatment of the learning task is useful. It also introduces the computational challenges of Bayesian analysis and the major approximation methods, such as Markov chain Monte Carlo (MCMC) sampling and sequential sampling schemes. The following content is included:

  • Bayesian model, prior-posterior, exponential family.
  • Bayesian modeling and decision theory (Naïve Bayes, Bayesian Linear Regression, Bayesian decision theory).
  • Approximation methods (Monte-Carlo sampling methods, MCMC).
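
As an illustration of the prior-posterior idea and of Monte Carlo approximation listed above, here is a minimal Python sketch. It assumes a Beta prior with binomial data (a standard conjugate pair); the function names and the data are hypothetical, and it uses plain Monte Carlo sampling rather than MCMC.

```python
import random

def beta_binomial_posterior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures

def monte_carlo_mean(alpha, beta, n_samples=20000, seed=0):
    """Estimate the mean of Beta(alpha, beta) by averaging random draws."""
    rng = random.Random(seed)
    draws = [rng.betavariate(alpha, beta) for _ in range(n_samples)]
    return sum(draws) / n_samples

# Hypothetical data: 7 successes and 3 failures, uniform Beta(1, 1) prior.
post_alpha, post_beta = beta_binomial_posterior(1, 1, successes=7, failures=3)
exact_mean = post_alpha / (post_alpha + post_beta)  # closed form: 8 / 12
approx_mean = monte_carlo_mean(post_alpha, post_beta)
print(post_alpha, post_beta, round(exact_mean, 3), round(approx_mean, 3))
```

The closed-form mean is available here only because the prior is conjugate; sampling becomes the practical route when no closed form exists.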
  4. Introduction to Data Science and Big Data

For this module, students need some programming skills and a knowledge of statistics: linear regression and the interpretation of a hypothesis test (see Mathematics for ML). The course gives an overview of:

  • Data formats, storage.
  • Data exploration and visualization: relations between the two types of variables, quantitative and qualitative, and the operations suited to each.
  • Statistical methods, machine learning.
  • Big data frameworks, deep learning.

This module is limited to an introduction to Data Science; it gives students an overview, a first “picture” of the field.

  5. Introduction to R and descriptive statistics

The objective of this course is to use descriptive statistical analysis techniques to summarize and analyze data with the R software. It consists of:

  • Computing measures of the center of the data, measures of spread, measures of relative positions.
  • Computing the correlation coefficient and making interpretations of the data.
  • Understanding the concepts of the sample, population, and the different methods of sampling.
  • Distinguishing the different types of variables (categorical, continuous…).
  • Constructing graphs that describe the distribution of a random variable (histogram, boxplot…).
  • Learning more advanced graphs that exhibit the main characteristics of a sample according to the data under study.
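
The measures listed above can be sketched in a few lines. The course itself uses R; the example below is written in Python with only the standard library, and the function names and sample data are hypothetical.

```python
import statistics

def describe(values):
    """Return measures of center, spread, and relative position for a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 70]  # hypothetical exam scores
summary = describe(scores)
r = pearson(hours, scores)
print(summary, round(r, 3))
```

A correlation close to 1 indicates a strong positive linear relation between the two variables; it says nothing about causation.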
  6. Data analysis with R

This second R training consists of:

  • Conditional statements, loops, and functions to power R scripts.
  • Making R code more efficient and readable using apply functions.
  • Speeding up data structure manipulations in R with regular expressions.
  • Importing data into R to start analyses of CSV and text files.
  • Efficiently importing flat file data (using the readr and data.table packages) and reading XLS files in R (using readxl and gdata).
  • Importing data from relational databases, from the web, or from statistical software such as SAS, STATA, and SPSS.
  • Cleaning data in R using the tidyr, dplyr, and stringr packages.
  7. Introduction to programming

The first chapter of the course addresses the construction of algorithms. Primitive objects are given with primitive operations upon them. It is shown how arbitrarily complex algorithms acting on the objects can be built using sequential, conditional, and iterative compositions of algorithms. The emphasis is put on specifications and on the use of assertions to derive the correct code. The Java programming language is used in a controlled way as a tool to make the algorithms amenable to execution by a computer.

The second chapter is concerned with the representation of data.  The two's complement representations for negative numbers are explained as well as the representation of floating-point numbers. ASCII and Unicode representations for characters are described and discussed.
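
A minimal sketch of the two's complement representation mentioned above. The course uses Java; Python is used here for brevity, and the function names and the 8-bit default width are illustrative assumptions.

```python
def to_twos_complement(value, bits=8):
    """Return the two's complement bit string of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps negative values onto their unsigned pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(bit_string):
    """Decode a two's complement bit string back into a signed integer."""
    raw = int(bit_string, 2)
    if bit_string[0] == "1":  # sign bit set: subtract 2**bits
        raw -= 1 << len(bit_string)
    return raw

print(to_twos_complement(5))             # 00000101
print(to_twos_complement(-5))            # 11111011
print(from_twos_complement("11111011"))  # -5
```

With 8 bits the representable range is -128 to 127, which is why the encoder rejects values outside it.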

In the third chapter, the CPU of a simple computer is described together with its machine language. Students are taught how to write programs in this simple machine language. The design of subprograms with standard conventions for subprogram calls and returns as well as parameter passing is discussed in detail as it allows the student to better understand procedure calls and parameter passing methods in higher-level programming languages.

The fourth chapter addresses programming language concepts in a more systematic way. The Java programming language is used to illustrate the concepts but no attempt is made to provide a global overview of Java. In contrast, a limited number of topics are studied with care and in detail: primitive data types, literals, variables, arrays, the String class, static methods, expressions, a small subset of composed statements, primitive type conversions, method overloading. Classical algorithms for searching and sorting are built with this subset of Java.

  8. Databases

This module hinges on three phases:

  • Understand: the historical context as well as recent challenges and developments in the database field; relational theory, why it was invented, and how it fits in practice; implementation techniques and major algorithms for data organization, query processing, and transaction processing.
  • Design: From conceptual modeling (e.g., Entity-Relationship, UML) down to physical database tuning (e.g., indexes, query plans), through logical database design (e.g., functional dependencies, normal forms, normalization algorithms) and reasoning (relational algebra, views and constraints).
  • Use: Installing and configuring database management systems, creating and tuning databases, using query languages in practice (e.g., SQL), connecting to databases (e.g., call interfaces, ORMs), integrating database systems in software designs.
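
The "Use" phase above can be illustrated with a short sketch that connects to a database through a call interface and runs SQL. It assumes Python's built-in `sqlite3` module and an in-memory database; the table names, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE enrollment ("
    " student_id INTEGER REFERENCES student(id),"
    " course TEXT NOT NULL,"
    " credits INTEGER NOT NULL)"
)
# Physical tuning step: an index to speed up lookups by student.
conn.execute("CREATE INDEX idx_enrollment_student ON enrollment(student_id)")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "An"), (2, "Binh")])
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(1, "Databases", 3), (1, "Deep Learning 1", 4), (2, "Databases", 3)],
)

# Join and aggregate: total credits per student.
rows = conn.execute(
    "SELECT s.name, SUM(e.credits) FROM student s"
    " JOIN enrollment e ON e.student_id = s.id"
    " GROUP BY s.id ORDER BY s.name"
).fetchall()
print(rows)  # [('An', 7), ('Binh', 3)]
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.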
  9. Introduction to Python

Python is a general-purpose programming language that is popular for data science. This course consists of:

  • Using Python interactively and through a script.
  • Briefly introducing the basic data types.
  • Learning how to store, access, and manipulate data in lists.
  • Learning how to work with functions, methods, and packages.
  • Using the Python package NumPy, a smart tool to construct arrays and explore data.
  10. Data Structures and Algorithms with Python

This course introduces a collection of fundamental data structures and algorithms. Although the material can be taught in any programming language, this course uses Python because of its benefits over languages such as C++ or Java. The module presents the basic concepts related to abstract data types, data structures, and algorithms:

  • Abstract Data Types (ADT): some general definitions, use, and implementation.
  • Arrays: one-dimensional and two-dimensional arrays are presented and implemented.
  • Sets and maps.
  • Algorithm analysis: the basic concepts and importance of complexity analysis, illustrated by evaluating the operations of Python’s list structure and of the set ADT implemented in the previous chapter.
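
To make the link between an ADT implementation and its complexity concrete, here is a minimal sketch of a set ADT stored in a Python list. The class name `ListSet` and the `comparisons` counter are illustrative assumptions, not the course's actual implementation.

```python
class ListSet:
    """A set ADT stored in a Python list; membership is a linear scan."""

    def __init__(self, items=()):
        self._items = []
        self.comparisons = 0  # counts element comparisons, for analysis
        for item in items:
            self.add(item)

    def __len__(self):
        return len(self._items)

    def __contains__(self, target):
        for item in self._items:
            self.comparisons += 1
            if item == target:
                return True
        return False

    def add(self, item):
        if item not in self:  # O(n) check keeps elements unique
            self._items.append(item)

s = ListSet(range(100))
s.comparisons = 0
found = 99 in s      # worst case for a hit: scans all 100 elements
missing = 500 in s   # a miss also scans all 100 elements
print(len(s), found, missing, s.comparisons)  # 100 True False 200
```

Because every `add` performs a linear membership check, building a set of n elements this way costs O(n²) comparisons in total, which is the kind of result complexity analysis is meant to expose.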
  11. Classification and Regression techniques in Big Data

These modules aim to provide the students with the most popular algorithms in machine learning. The students will be equipped with both theoretical results and practical algorithms that are necessary for a data scientist. Moreover, the related optimization problems are also introduced. In particular, the following algorithms are included:

  • Regression algorithms.
  • Classification algorithms.
  • Instance-based algorithms.
  • Regularization algorithms.
  • Bayesian algorithms.
  • Clustering algorithms.
  • Artificial neural network algorithms.
  • Dimensionality reduction algorithms.
  • Ensemble algorithms.
  12. First and second-year projects

During the first and second years, the student is invited to solve a real-life database problem that draws on the content of courses studied previously or concurrently in the program. Using the methodologies developed during the training, the student analyzes and critically assesses the results, provides solutions that can be used in practice, and writes a report that explains the conclusions concisely but clearly. The project requires a workload equivalent to one course and prepares the student for the final engineer project in the fourth year.

  13. Deep Learning 1 & 2

These two modules cover practical techniques for optimizing deep neural networks. They prepare students to study and implement advanced learning models on complex data, using the following techniques and tools:

  • Libraries: NumPy, Scikit-Learn, TensorFlow, Keras.
  • Optimization, transfer, and regularization techniques.
  • Knowledge of classical and state-of-the-art architectures.

In particular, students will implement these methods for the following applications:

  • Image analysis through deep convolutional networks.
  • Language analysis through unsupervised learning of word representations and recurrent networks.
  14. Data Management

The main goal of this module is to present the basic concepts of data management systems. The first part introduces the main aspects of relational database systems, including basic functionalities, file and index organizations, and query processing. The second part presents the main non-relational approaches to data management: multidimensional data management, large-scale data management, and open data management. In particular, the module includes:

  • Introduction to relational databases: The relational data model, SQL.
  • The structure of a database management system.
  • Physical structures for data.
  • Multidimensional data management.
  • Large-scale data management.
  • Distributed query evaluation, NoSQL databases, graph databases.
  • Open data management.
  15. Data Mining in Business and Society

This module can be considered an extension of the module Introduction to Data Science and Big Data, in which students encountered real problems in specific areas of business, governance, media, or health. Upon completion of the course, students will have acquired the skills necessary to apply data mining to extract information from large data sets and transform it into an understandable structure. In particular, different patterns will be detected, such as groups (cluster analysis), unusual behaviors (anomaly detection), associations, and sequential patterns. These patterns will be treated as summaries (including visualizations) that can be further used in machine learning techniques. Finally, the role of data mining in the whole Knowledge Discovery in Databases (KDD) process will be shown in relation to the other courses in the curriculum. The course is application-oriented and provides students with knowledge and practice in line with the current demand for skilled data scientists.

  16. Systems for Big Data Analytics

The module covers the fundamental design principles of influential software systems for Big Data Analytics. First, it reviews the design principles of relational database systems for business data processing, including declarative querying, algorithm design, and query optimization, as well as the extensions to online analytic processing. Then, it examines the fundamental architectural changes needed to scale data processing beyond the limit of a single server, including parallel databases, MapReduce, distributed key-value stores, iterative analytics and Machine Learning using Spark, and scalable stream analytics.

  17. Network Analysis

By using Python to glean value from raw data, this module gives students methods that simplify the complex journey from data to value. Students will learn how to use Python for data preparation, cleaning, reformatting, data munging, data visualization, and predictive analytics. The module will help students gain a deeper understanding of machine learning algorithms, as well as outlier analysis and cluster analysis. It also covers how to create web-based data visualizations with Plot.ly and how to use Python to scrape the web and capture data sets. Topics include:

  • Getting started with Jupyter Notebooks.
  • Visualizing data: advanced charts, time series, and statistical plots.
  • Preparing for analysis: treating missing values and data transformation.
  • Data analysis: high dimensional summary statistics and correlation analysis.
  • Outlier analysis: univariate, multivariate, and linear projection methods.
  • Reducing data set dimensionality with PCA.
  • Simulating a social network with NetworkX.
  • Creating Plot.ly charts.
  18. Design a Dashboard

A dashboard is a powerful tool for displaying information and arranging multiple data visualizations so that people have enough context to consistently make good decisions. This module helps students understand the process of designing a dashboard from the beginning, which involves four steps: define, prototype, build, and deploy. The following important skills are targeted:

  • Identifying key roles.
  • Determining the metrics to monitor.
  • Finding the best visualizations for the metrics.
  • Arranging the charts as a dashboard.
  • Prototyping the dashboard and gathering feedback.
  • Finding the data that builds the metrics.
  • Building the metrics.
  • Sharing the dashboard: distribution strategies.
  • Scaling the dashboard.
  19. System and Network Security

This module provides an introduction to practical security concepts. The goal is to understand common attacks and countermeasures across a range of topics. The course is practice-oriented: it describes real attacks and countermeasures. The different steps are:

  • Identify some of the factors driving the need for network security.
  • Identify and classify particular examples of attacks.
  • Define the terms vulnerability, threat, and attack.
  • Identify physical points of vulnerability in simple networks.
  • Compare and contrast symmetric and asymmetric encryption systems and their vulnerability to attack, and explain the characteristics of hybrid systems.

  20. Visualization and Visual Analytics for Data Science

This module presents the principles underlying visualization, related to perception and cognition. It is designed for the construction of effective visual representations and interaction techniques that allow the students to give meaning to complex data, extract insights, and sometimes discover unexpected information from the data.

In particular, the issue of how visualization can be used in combination with large databases and machine learning will be shown to help students to understand higher-level phenomena, such as debugging complex machine learning models, understanding the behavior of deep learning systems, and exploring data at scale.

  21. Text Mining

This course covers basic algorithms and techniques in natural language processing (NLP) and text mining. Among others, it will include some basic NLP concepts, such as word tokenization and subsequent methods to extract topics in a text, identify them based on text frequencies and build classifiers. Some basic libraries such as NLTK will be used, alongside libraries that utilize deep learning to solve common NLP problems. Methodologically, the topics will include general machine learning (decision trees, Bayesian learning, neural networks) as well as specific tools for NLP (language modeling, neural language models, deep learning).
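
Word tokenization and frequency counting, the first steps described above, can be sketched as follows. This example uses only the Python standard library rather than NLTK; the stop-word list, function names, and sample text are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}  # tiny illustrative list

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(text, k=3):
    """Return the k most frequent non-stop-word tokens with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(k)

doc = "Data mining is the analysis of data. Text mining applies data mining to text."
print(tokenize("Text mining, in short."))  # ['text', 'mining', 'in', 'short']
print(top_terms(doc, k=3))
```

Frequency counts like these are a common starting point for topic extraction and for the feature vectors used by text classifiers.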

  22. Reinforcement Learning

Reinforcement learning is a powerful paradigm for realizing the dreams and impact of AI, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare. This course provides a solid introduction to the field of reinforcement learning; students will learn about the core challenges and approaches, including generalization and exploration. Assignments will cover the basics of reinforcement learning as well as deep reinforcement learning, an extremely promising new area that combines deep learning techniques with reinforcement learning. By the end of the course, students should be able to:

  • Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
  • Given an application problem (e.g., from computer vision, robotics, etc.), decide if it should be formulated as an RL problem.
  • Implement in code common RL algorithms (as assessed by the homework).
  • Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on metrics: e.g., regret, sample complexity, computational complexity, empirical performance, convergence, etc.
  • Describe the exploration vs exploitation challenge and compare and contrast at least two approaches to address this challenge.
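
The exploration vs exploitation trade-off can be illustrated with an epsilon-greedy agent on a multi-armed bandit, one common textbook approach. The sketch below is a minimal example under assumed settings: the reward probabilities, `epsilon`, and the function name are hypothetical.

```python
import random

def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))  # explore: random arm
        else:
            arm = max(range(len(true_probs)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

counts, values = epsilon_greedy([0.2, 0.5, 0.8])  # hypothetical reward probabilities
best_arm = max(range(len(values)), key=values.__getitem__)
print(best_arm, counts)
```

With `epsilon = 0` the agent never explores and can lock onto a poor arm; with `epsilon = 1` it never exploits what it has learned. Values in between trade one against the other.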

  23. Engineer Project

During the final year, the student prepares the Engineer data science project, a piece of work that completes the whole curriculum and summarizes the training and knowledge developed during the studies. It can be research-oriented or developed in partnership with a firm. The best students have the opportunity to carry out their work abroad, in companies or universities in Europe or the US, and some grants can be awarded to fund these stays. The project represents a workload of half a year, concludes the training, and opens the door to the international academic, business, and engineering worlds.
