Dec 14, 2024  
2024-2025 Course Catalog 
    

DAT 303 - Data Science III

Credits: 3
Lecture Hours: 2
Lab Hours: 2
Practicum Hours: 0
Work Experience: 0
Course Type: Voc/Tech
In this course, students are introduced to a variety of machine learning model styles, how these models work, and when they should be applied. Students also use tools, including desktop applications and the Python development environment, to apply these models to datasets.
Prerequisite: DAT 202 with a minimum grade of C- and CIS 289 with a minimum grade of C- and (MAT 156 OR MAT 162)
Competencies
  1. Compare and contrast general machine learning concepts
    1. Describe the advantages/disadvantages of each model class
    2. Explain when a supervised learning model should be used vs. an unsupervised one
    3. Discuss problems related to algorithmic and data bias, as well as privacy and integrity of data
  2. Evaluate unsupervised learning models and their concepts
    1. Explain when and why a hierarchical clustering model is the appropriate tool for analyzing a dataset
    2. Discuss when and why k-Means clustering is the appropriate tool for analyzing a dataset
  3. Analyze neural networks and their concepts
    1. Explain the advantages and disadvantages of using a neural network and when it is appropriate for making predictions
    2. Demonstrate understanding of backward propagation and how it applies to neural networks
  4. Demonstrate an understanding of supervised learning models and the advantages/disadvantages of each
    1. Explain when and why a regression model is the appropriate tool for analyzing a dataset
    2. Discuss when and why a Naïve Bayes model is the appropriate tool for analyzing a dataset
    3. Describe when and why a decision tree model is the appropriate tool for analyzing a dataset
  5. Build models to analyze datasets utilizing open source desktop machine learning tools
    1. Construct an unsupervised learning model to analyze a dataset
    2. Develop a supervised learning model to analyze a dataset
    3. Create a neural network model to analyze a dataset
  6. Explore topics in data sampling
    1. Describe the different types of biases in data sampling
    2. Demonstrate the danger of overfitting
    3. Explain the purpose of training, validation and test datasets
    4. Use k-fold cross-validation to evaluate the performance of a model
  7. Evaluate modeling results and interpret the meaning/value of the results
    1. Define true positive, false positive, true negative and false negative
    2. Give examples of recall, precision and accuracy
    3. Generate and use a ROC curve to evaluate prediction performance
    4. Interpret model quality by applying performance metrics such as root mean squared error (RMSE), confusion matrices, gain charts and silhouette scores
    5. Demonstrate an understanding of overfitting and underfitting and their causes
  8. Construct machine learning models in Python to do data analysis on datasets
    1. Design a project that utilizes an unsupervised machine learning model to analyze a dataset
    2. Develop a project that utilizes a supervised machine learning model to analyze a dataset
  9. Create an ensemble learning model
    1. Build an ensemble learning model that utilizes multiple machine learning model techniques to analyze a dataset
    2. Demonstrate an understanding of why and when you would most effectively utilize ensemble learning

Competencies Revised Date: AY2023


