Course duration
- 5 days
Course Benefits
- Translate everyday business questions as well as more complex problems into Machine Learning tasks in order to make truly data-driven decisions
- Use the Python Pandas, Matplotlib & Seaborn libraries to Explore, Analyze & Visualize data from varied sources (the Web, Word documents, Email, Twitter, NoSQL stores, Databases, Data Warehouses & more) for patterns and trends relevant to your business
- Train a Machine Learning Classifier using different algorithmic techniques from the Scikit-Learn library (e.g., Decision Trees, Logistic Regression, Neural Networks)
- Re-segment your customer market using K-Means & Hierarchical algorithms for better alignment of products & services to customer needs
- Discover hidden customer behaviors from Association Rules and build a Recommendation Engine based on behavioral patterns
- Investigate relationships & flows between people and business-relevant entities using Social Network Analysis
- Build predictive models of revenue and other numeric variables using Linear Regression
Course Outline
- Lesson 1
- What is the required Skill-set of a Data Scientist
- Combining the Technical and Non-technical roles of a Data Scientist
- The difference between a Data Scientist and a Data Engineer
- Explore the full lifecycle of Data Science efforts within the organization
- Discuss how to turn business questions into Machine Learning (ML) and Artificial Intelligence (AI) models
- Explore diverse and wide-ranging data sources, internal and external to the organization, that can be used to answer business questions
- Lesson 2
- Introduce the features of Python that make it an ideal tool for Data Scientists and Data Engineers alike
- Viewing Data Sets using Python’s Pandas library
- Importing, Exporting and working with all forms of Data, from Relational Databases to Google Images, using Python
- Selecting, Filtering, Combining, Grouping and Applying Functions using Python’s Pandas library
- Dealing with Duplicates, Missing Values, Rescaling, Standardizing and Normalizing Data
- Visualizing Data for both Exploration and Communication with the Pandas, Matplotlib and Seaborn Python libraries
- Lesson 3
- Preprocess Unstructured Data, such as web adverts, emails and blog posts, in order to use it in our AI/ML models
- Explore the most popular approaches to Natural Language Processing (NLP), such as stemming and “stop” words
- Prepare a term-document matrix (TDM) of unstructured documents in preparation for analysis
- Lesson 4
- Express a business problem such as customer revenue prediction as a linear regression task
- Assess variables as potential Predictors of the required Target, e.g., Education as a predictor of Salary
- Build, Interpret and Evaluate a Linear Regression model in Python using measures such as RMSE
- Explore the Feature Engineering possibilities to improve the Linear Regression model
- Lesson 5
- Learn how AI/ML Classifiers are built and used to make predictions such as Customer Churn
- Explore how AI/ML Classification models are built using Training, Test and Validation Datasets
- Build, Apply and Evaluate the strength of a Decision Tree Classifier
- Lesson 6
- Examine some alternative approaches to classification
- Consider how Activation Functions are integral to Logistic Regression Classifiers
- Investigate how Neural Networks and Deep Learning are used to build self-driving cars
- Explore the probability foundations of Naive Bayes classifiers
- Review different approaches to measuring the performance of AI/ML Classification Models: ROC curves, AUC measures, Precision, Recall, Confusion Matrix
- Lesson 7
- Uncover new ways of segmenting your customers, products or services through the use of clustering algorithms
- Explore what the concept of similarity means to humans and how it can be implemented programmatically through distance measures on descriptive variables
- Perform top-down clustering with Python’s Scikit-Learn K-Means algorithm
- Perform bottom-up clustering with Scikit-Learn’s hierarchical clustering algorithm
- Examine clustering techniques on unstructured data (e.g., Tweets, Emails, Documents, etc.)
- Lesson 8
- Build models of customer behaviors or business events from logged data using Association Rules
- Evaluate the strength of these models through probability measures of support, confidence, and lift
- Employ feature engineering approaches to improve the models
- Build a recommender for your customers that is unique to your product/service offering
- Lesson 9
- Analyze your organization, its people and environment as a network of inter-relationships
- Visualize these relationships to uncover previously unseen business insights
- Explore ego-centric and socio-centric methods of analyzing connections important to your organization
- Lesson 10
- Examine Cloud (Microsoft, Amazon, Google) approaches to handling Big Data analytics
- Explore the communications and ethics aspects of being a Data Scientist
- Survey the paths of continual learning for a Data Scientist
Class Materials
Each student will receive a comprehensive set of materials, including course notes and all the class examples.
Experience in the following is required for this Python class:
- An interest in gaining foundational knowledge of data science. This data scientist training course is designed for technical and non-technical beginners.
Instructor-led courses are offered via a live Web connection, at client sites throughout Europe, and at our Geneva Training Center.