Josh Innovations

Data science

By: professor

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is closely related to data mining, machine learning, and big data.

Course Content

Data science

Fundamentals of Python

  • Advantages of Python
  • Python compiler and PVM
  • Python installation and environment

Datatypes in Python

  • strings
  • characters (single-character strings)
  • lists
  • tuples
  • range
  • sets
  • dictionaries

Operators in Python

Input / Output

Control Statements

  • if statement
  • if...else statement
  • if...elif...else statement
  • while loop
  • for loop
  • break statement
  • continue statement
  • pass statement

NumPy Arrays

  • Array creation
  • Array attributes
  • 1D and 2D arrays
  • Matrices

Functions in Python

  • Built-in and user-defined functions
  • Writing our own functions
  • Importing functions

Modules and Packages

Data Analysis in Python using pandas

  • Series
  • DataFrames
  • Creating DataFrames from different sources
  • Viewing data in a DataFrame
  • Operations on a DataFrame
  • Handling missing data

Data visualization using matplotlib

  • Line plot
  • Bar graph
  • Pie chart
  • Subplots
  • Histogram

Data visualization using seaborn

  • Distribution plot
  • KDE plot
  • Count plot
  • Box plot
  • Scatter plot
  • Subplots
  • Lmplot
  • Pair plots

Introduction to Statistics

  • What is statistics?
  • Types of statistics
  • Descriptive statistics
  • Inferential statistics

Statistical terms

  • Population
  • Sample
  • Variable (discrete and continuous)
  • Data and types of data
  • Qualitative (nominal and ordinal)
  • Quantitative (interval scale and ratio scale)

Measures of Central Tendency

  • Mean
  • Median
  • Mode

Probability

  • Probability with replacement
  • Probability without replacement
  • Probability Mass Function (PMF)
  • Probability Density Function (PDF)

Measures of Shape

  • Skewness
  • Kurtosis

Measures of Dispersion or Variability

  • Variance
  • Standard deviation
  • Percentile
  • Quartile
  • Range
  • Interquartile range (IQR)

Applications of Variance and Standard Deviation

  • Empirical Rule
  • Problems on Empirical Rule
  • Chebyshev’s Theorem

Probability Distributions

  • Normal distribution
  • Standard normal distribution
  • Sampling distribution of sample means
  • Central limit theorem
  • t-distribution
  • Student's t-test
  • Chi-square test (goodness of fit)
  • Binomial distribution
  • Bernoulli distribution
  • Geometric distribution
  • Hypergeometric distribution
  • Poisson distribution

Hypothesis Testing

  • Upper-tailed test
  • Lower-tailed test
  • Two-tailed test

ANOVA

Introduction to Tableau

  • Tableau tools
  • Datatypes in Tableau
  • Viewing data

Creating Pivot Tables

Data blending

Cross-Database Joins

Calculations on data

  • Aggregate functions

Data visualizations in Tableau

  • Symbol maps
  • Bar chart
  • Stacked bar chart
  • Line chart
  • Pareto chart
  • Heat map
  • Pie chart
  • Scatter plot
  • Area chart
  • Dual-axis chart
  • Histogram
  • Bubble chart

Dashboard Creation

Exploratory data analysis (EDA)

Outliers and their treatment

Supervised Learning vs Unsupervised Learning

Feature extraction and conversion

  • One-hot encoding using dummy variables
  • One-hot encoding using OneHotEncoder

Regression Models

  • Simple Linear regression
  • Multiple Linear regression
  • Polynomial regression
  • Ridge regression
  • Bias-variance tradeoff
  • Lasso regression
  • ElasticNet regression

Classification Models

  • Logistic regression
  • Naïve Bayes (Gaussian NB and Multinomial NB)
  • KNN Classifier
  • SVM
  • Regularization
  • Kernel Trick
  • Decision Tree
  • Entropy
  • Gini Index
  • Random forest
  • Confusion Matrix
  • Bootstrapping, Bagging and Boosting

Unsupervised Learning

  • K-Means clustering
  • Elbow technique

Association Rule Learning

  • Apriori Algorithm

Model selection

  • Selecting appropriate model for our data

Introduction to Deep Learning

  • Biological Neural Network
  • Artificial Neural Network
  • Perceptrons
  • Layers of a Network

Activation functions

  • Identity function
  • Binary step function or Threshold function
  • Logistic function or Sigmoid function
  • ReLU function
  • Hyperbolic Tangent function
  • Softmax

Creating Neural Network in Python

  • ANN
  • ANN with Activation functions

TensorFlow and Keras

  • Variables
  • Constants
  • Placeholders
  • Graph / Tensor / Session

ANN in TensorFlow and Keras

Convolutional Neural Network

Recurrent Neural Network

NLP Concepts

  • Tokenization
  • Stemming
  • Lemmatization
  • Stop words
  • Part-of-speech (POS) tagging

Feature Extraction

  • CountVectorizer
  • TfidfVectorizer

Text Classification using NLP

Object detection by Computer


Copyright © Josh Innovations 2021. All rights reserved. Created by Starsite