DAT 303 - Data Science III

Credits: 3
Lecture Hours: 2
Lab Hours: 2
Practicum Hours: 0
Work Experience: 0
Course Type: Voc/Tech

In this course, students are introduced to a variety of machine learning model types, how these models work, and when they should be applied. Students also use tools, including desktop applications and the Python development environment, to apply these models to datasets.
Prerequisite: DAT 202 and CIS 289 and (MAT 157 OR MAT 162)
Competencies

Compare and contrast general machine learning concepts
1. Describe the advantages/disadvantages of each model class
2. Explain when a supervised learning model should be used vs. an unsupervised one
3. Discuss problems related to algorithmic and data bias, as well as privacy and integrity of data
Evaluate unsupervised learning models and their concepts
1. Explain when and why a hierarchical clustering model is the appropriate tool for analyzing a dataset
2. Discuss when and why k-Means clustering is the appropriate tool for analyzing a dataset
Analyze neural networks and their concepts
1. Explain the advantages and disadvantages of using a neural network and when it is appropriate for making predictions
2. Demonstrate understanding of backward propagation and how it applies to neural networks
Demonstrate an understanding of supervised learning models and the advantages/disadvantages of each
1. Explain when and why a regression model is the appropriate tool for analyzing a dataset
2. Discuss when and why a Naïve Bayes model is the appropriate tool for analyzing a dataset
3. Describe when and why a decision tree model is the appropriate tool for analyzing a dataset
Build models to analyze datasets utilizing open source desktop machine learning tools
1. Construct an unsupervised learning model to analyze a dataset
2. Develop a supervised learning model to analyze a dataset
3. Create a neural network model to analyze a dataset
Explore topics in data sampling
1. Describe the different types of biases in data sampling
2. Demonstrate the danger of overfitting
3. Explain the purpose of training, validation and test datasets
4. Use k-fold cross-validation to evaluate the performance of a model
Evaluate modeling results and interpret the meaning/value of the results
1. Define true/false positive/negative
2. Give examples of recall, precision and accuracy
3. Generate and use a ROC curve to evaluate prediction performance
4. Interpret model quality by applying performance metrics such as root mean squared error (RMSE), confusion matrices, gain charts and silhouette scores
5. Demonstrate an understanding of overfitting and underfitting and their causes
Construct machine learning models in Python to do data analysis on datasets
1. Design a project that utilizes an unsupervised machine learning model to analyze a dataset
2. Develop a project that utilizes a supervised machine learning model to analyze a dataset
Create an ensemble learning model
1. Build an ensemble learning model that utilizes multiple machine learning model techniques to analyze a dataset
2. Demonstrate an understanding of why and when ensemble learning is most effective

Competencies Revised Date: 2020
