Pandas for Everyone: Python Data Analysis
Sharing practical insights into solving real-world data science problems using the Pandas library and the Python programming language.
(PYTHON-PANDAS.AP1) / ISBN: 978-1-64459-413-1
About This Course
This course, Pandas for Everyone: Python Data Analysis, teaches how to tackle real-world data analysis problems using the popular Pandas library. You'll begin with the fundamentals, learning how to load data sets, explore their structure, and create basic visualizations. As you progress, you'll explore data manipulation techniques and be introduced to powerful data cleaning and transformation tools. Finally, the course will briefly introduce you to the broader Python data science ecosystem, touching on tools like scikit-learn for machine learning and visualization libraries like Seaborn.
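The first steps described above (loading a data set and exploring its structure) might look something like the minimal sketch below. It assumes a small, made-up tab-separated table that is read from an in-memory string with `io.StringIO` instead of from one of the course's data files. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated data; in practice you would pass a file path instead
raw = (
    "city\tyear\tpopulation\n"
    "A\t2020\t100\n"
    "A\t2021\t110\n"
    "B\t2020\t250\n"
    "B\t2021\t240\n"
)

# Load the data set into a DataFrame
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Explore its structure
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column
print(df.head())  # first few rows

# Look at a column, a row, and a single cell
print(df["population"])
print(df.loc[2])
print(df.loc[2, "city"])
```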
Skills You’ll Get
- Load, explore, and manipulate data using Pandas DataFrames
- Create basic data visualizations with Pandas
- Combine and clean messy datasets
- Handle missing values and work with different data types
- Perform groupby operations and data normalization
- Apply functions and regular expressions for data transformation
- Conduct statistical modeling using techniques like linear regression and logistic regression
- Gain exposure to the broader Python data science ecosystem
Get the support you need. Enroll in our Instructor-Led Course.
Interactive Lessons
47+ Interactive Lessons | 100+ Exercises | 90+ Quizzes | 109+ Flashcards | 109+ Glossary of terms
Gamified TestPrep
50+ Pre-Assessment Questions | 50+ Post-Assessment Questions
Hands-On Labs
30+ LiveLabs | 20+ Video Tutorials | 43+ Minutes
Preface
- Breakdown of the Course
- How to Read This Course
- Setup
Pandas DataFrame Basics
- Introduction
- Load Your First Data Set
- Look at Columns, Rows, and Cells
- Grouped and Aggregated Calculations
- Basic Plot
- Conclusion
Pandas Data Structures Basics
- Create Your Own Data
- The Series
- The DataFrame
- Making Changes to Series and DataFrames
- Exporting and Importing Data
- Conclusion
Plotting Basics
- Why Visualize Data?
- Matplotlib Basics
- Statistical Graphics Using matplotlib
- Seaborn
- Pandas Plotting Method
- Conclusion
Tidy Data
- Columns Contain Values, Not Variables
- Columns Contain Multiple Variables
- Variables in Both Rows and Columns
- Conclusion
Apply Functions
- Primer on Functions
- Apply (Basics)
- Vectorized Functions
- Lambda Functions (Anonymous Functions)
- Conclusion
Data Assembly
- Combine Data Sets
- Concatenation
- Observational Units Across Multiple Tables
- Merge Multiple Data Sets
- Conclusion
Data Normalization
- Multiple Observational Units in a Table (Normalization)
- Conclusion
Groupby Operations: Split-Apply-Combine
- Aggregate
- Transform
- Filter
- The pandas.core.groupby.DataFrameGroupBy Object
- Working With a MultiIndex
- Conclusion
Missing Data
- What Is a NaN Value?
- Where Do Missing Values Come From?
- Working With Missing Data
- Pandas Built-In NA Missing
- Conclusion
Data Types
- Data Types
- Converting Types
- Categorical Data
- Conclusion
Strings and Text Data
- Introduction
- Strings
- String Methods
- More String Methods
- String Formatting (F-Strings)
- Regular Expressions (RegEx)
- The regex Library
- Conclusion
Dates and Times
- Python's datetime Object
- Converting to datetime
- Loading Data That Include Dates
- Extracting Date Components
- Date Calculations and Timedeltas
- Datetime Methods
- Getting Stock Data
- Subsetting Data Based on Dates
- Date Ranges
- Shifting Values
- Resampling
- Time Zones
- Arrow for Better Dates and Times
- Conclusion
Linear Regression (Continuous Outcome Variable)
- Simple Linear Regression
- Multiple Regression
- Models with Categorical Variables
- One-Hot Encoding in scikit-learn with Transformer Pipelines
- Conclusion
Generalized Linear Models
- About This Lesson
- Logistic Regression (Binary Outcome Variable)
- Poisson Regression (Count Outcome Variable)
- More Generalized Linear Models
- Conclusion
Survival Analysis
- Survival Data
- Kaplan-Meier Curves
- Cox Proportional Hazards Model
- Conclusion
Model Diagnostics
- Residuals
- Comparing Multiple Models
- k-Fold Cross-Validation
- Conclusion
Regularization
- Why Regularize?
- LASSO Regression
- Ridge Regression
- Elastic Net
- Cross-Validation
- Conclusion
Clustering
- k-Means
- Hierarchical Clustering
- Conclusion
Life Outside of Pandas
- The (Scientific) Computing Stack
- Performance
- Dask
- Siuba
- Ibis
- Polars
- PyJanitor
- Pandera
- Machine Learning
- Publishing
- Dashboards
- Conclusion
It’s Dangerous To Go Alone!
- Local Meetups
- Conferences
- The Carpentries
- Podcasts
- Other Resources
- Conclusion
Appendix A: Concept Maps
Appendix B: Installation and Setup
- B.1 Install Python
- B.2 Install Python Packages
- B.3 Download Book Data
Appendix C: Command Line
- C.1 Installation
- C.2 Basics
Appendix D: Project Templates
Appendix E: Using Python
- E.1 Command Line and Text Editor
- E.2 Python and IPython
- E.3 Jupyter
- E.4 Integrated Development Environments (IDEs)
Appendix F: Working Directories
Appendix G: Environments
- G.1 Conda Environments
- G.2 Pyenv + Pipenv
Appendix H: Install Packages
- H.1 Updating Packages
Appendix I: Importing Libraries
Appendix J: Code Style
- J.1 Line Breaks in Code
Appendix K: Containers: Lists, Tuples, and Dictionaries
- K.1 Lists
- K.2 Tuples
- K.3 Dictionaries
Appendix L: Slice Values
Appendix M: Loops
Appendix N: Comprehensions
Appendix O: Functions
- O.1 Default Parameters
- O.2 Arbitrary Parameters
Appendix P: Ranges and Generators
Appendix Q: Multiple Assignment
Appendix R: Numpy ndarray
Appendix S: Classes
Appendix T: SettingWithCopyWarning
- T.1 Modifying a Subset of Data
- T.2 Replacing a Value
- T.3 More Resources
Appendix U: Method Chaining
Appendix V: Timing Code
Appendix W: String Formatting
- W.1 C-Style
- W.2 String Formatting: .format() Method
- W.3 Formatting Numbers
Appendix X: Conditionals (if-elif-else)
Appendix Y: New York ACS Logistic Regression Example
Appendix Z: Replicating Results in R
- Z.1 Linear Regression
- Z.2 Logistic Regression
- Z.3 Poisson Regression
Hands-On Labs
Pandas DataFrame Basics
- Performing Grouped and Aggregated Calculations Using the .groupby() Method
Pandas Data Structures Basics
- Creating a DataFrame and Making Changes to it
Plotting Basics
- Creating a Scatter Plot Using Multivariate Data
- Creating a Density Plot Using Bivariate Data
Tidy Data
- Using Functions and Methods to Process and Tidy Data
Apply Functions
- Performing Calculations Across DataFrames
- Vectorizing Functions
Data Assembly
- Performing Concatenation Using the concat() Function
- Merging Multiple Data Sets Using the .merge() Method
Data Normalization
- Understanding Multiple Observational Units in a Data Set
Groupby Operations: Split-Apply-Combine
- Performing Data Summarization Using Group-by Operations
- Performing Boolean Subsetting on the Data
- Performing Operations on Grouped Objects
Missing Data
- Finding and Cleaning Missing Data
Data Types
- Performing Data Type Conversion
Strings and Text Data
- Finding and Substituting a Pattern
Dates and Times
- Converting an Object Type into a datetime Type
- Extracting Date Components from the Data
- Getting Stock Data and Subsetting it Based on Dates
- Resampling Dates Using the .resample() Method
Linear Regression (Continuous Outcome Variable)
- Performing Linear Regression
- Performing Multiple Regression
Generalized Linear Models
- Performing Logistic Regression
- Performing Poisson Regression Using the poisson() Function
Survival Analysis
- Performing Survival Analysis Using the KaplanMeierFitter() Function
Model Diagnostics
- Comparing Models Using Cross-Validation
Regularization
- Performing L1 Regularization Using the Lasso() Function
- Performing L2 Regularization Using the Ridge() Function
Clustering
- Performing k-Means Clustering
- Using Hierarchical Clustering Algorithms
Any questions? Check out the FAQs
Still have unanswered questions and need to get in touch?
Contact Us Now
Pandas is a powerful open-source Python library for data analysis. It offers data structures such as DataFrames, along with tools to manipulate, clean, and visualize data.
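The small sketch below shows what "manipulate and clean" can mean in practice. It uses a hypothetical four-row DataFrame, with made-up column names `team` and `score`. The code counts and fills a missing value, adds a derived column, and summarizes the result with a groupby. Filling with the column mean is only one possible choice for missing data, and it is used here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing score
df = pd.DataFrame(
    {
        "team": ["x", "x", "y", "y"],
        "score": [10.0, np.nan, 7.0, 9.0],
    }
)

# Clean: count missing values, then fill them with the column mean
print(df["score"].isna().sum())
df["score"] = df["score"].fillna(df["score"].mean())

# Manipulate: add a column derived from an existing one
df["score_pct"] = df["score"] / df["score"].max()

# Summarize: split by team, apply a mean, combine into one Series
summary = df.groupby("team")["score"].mean()
print(summary)
```

Calling `df.plot()` or `summary.plot()` would add a quick visualization, but that step requires matplotlib to be installed.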
Yes, Python is excellent for data analysis. It's easy to learn, has versatile libraries (like Pandas), and a large, supportive community. Python's flexibility makes it useful for various data science tasks.
While some basic programming experience can be helpful, this course is designed to be accessible for beginners. We'll start with the fundamentals of Python and Pandas, gradually building your skills throughout the course.