37 Understanding Python

37.1 Why Python for Business Analytics?

Python is one of the most widely used programming languages in business analytics and data science due to its simplicity, flexibility, and extensive ecosystem of libraries. In this book, we will use Python to implement various statistical techniques, data analysis methods, and visualization tools to support business decision-making.

37.1.1 Key Advantages of Python in Business Analytics

Easy to Learn and Use → Python has a simple syntax that makes it easy for business analysts to perform statistical analysis and automation.
Rich Ecosystem → Python has powerful libraries like Pandas, NumPy, SciPy, Statsmodels, and Scikit-learn, which provide built-in functions for statistical analysis.
Scalability and Efficiency → Python handles large datasets through vectorized libraries such as NumPy and Pandas, scales further with tools like Spark and Dask, and integrates well with databases, cloud computing, and machine learning models.
Extensive Visualization Support → Libraries like Matplotlib and Seaborn make it easy to create meaningful visualizations.
Automation and Integration → Python can automate repetitive tasks, streamline workflows, and integrate with tools like Excel, SQL databases, and web applications.

37.1.2 Differences Between Python and R

Python and R are two of the most widely used programming languages in business analytics, data science, and statistical computing. Both languages have their strengths, but they cater to slightly different needs. Below is a comparison of Python vs. R in various aspects.

Purpose and Usage

Aspect	Python	R
Primary Use	General-purpose programming, data science, automation, web development	Statistical computing, data visualization, academic research
Best For	Machine Learning, Automation, Data Science	Statistical Analysis, Data Visualization, Research

Ease of Learning

Aspect	Python	R
Syntax	Simple, readable, similar to English	More complex syntax, designed for statisticians
Learning Curve	Easier for beginners, widely used in software development	Steeper learning curve, but powerful for statistical analysis

Libraries and Packages

Aspect	Python	R
Data Manipulation	Pandas, NumPy	dplyr, data.table
Statistical Analysis	Statsmodels, SciPy	Base R, car, lme4
Machine Learning	Scikit-learn, TensorFlow, PyTorch	caret, randomForest, xgboost
Data Visualization	Matplotlib, Seaborn, Plotly	ggplot2, lattice, plotly

Data Handling and Performance

Aspect	Python	R
Data Handling	Handles structured and unstructured data well	Primarily designed for structured data
Big Data Support	Integrates with Spark, Dask for large datasets	Not optimized for big data but integrates with Hadoop and Spark
Speed & Efficiency	Generally faster for ML and large datasets	Slower for big data but optimized for statistical tasks

Business & Industry Use Cases

Aspect	Python	R
Used In	Finance, AI, Web Development, Automation, ML	Academic Research, Healthcare, Pharma, Government
Common Applications	AI-driven analytics, automated reporting, cloud computing	Statistical modeling, survey analysis, experimental research

Community and Industry Support

Aspect	Python	R
Community	Large, growing community in AI & ML	Strong academic and research community
Industry Adoption	Used by companies like Google, Netflix, Tesla	Preferred by universities, research institutions

Integration and Flexibility

Aspect	Python	R
Integration	Works well with APIs, web apps, databases	Strong integration with statistical packages
Flexibility	More versatile, can be used in different fields	Specialized for data analysis and statistics

📌 Which One Should You Use?

✅ Use Python if: You need machine learning, automation, web development, or large-scale data processing.
✅ Use R if: You need advanced statistical analysis, data visualization, or academic research tools.
✅ Use Both if: Your work involves both statistical analysis and machine learning.

37.2 Installing Python and Anaconda Navigator

37.2.1 Installing Python

Download Python

Visit the download page of the official Python website (python.org).
Go to the Downloads section. The website typically suggests the best version for your operating system.
Click on the download link for your operating system (Windows, macOS, Linux/UNIX).

Install Python

After downloading, run the installer.
For Windows: Ensure you check the box that says “Add Python to PATH” before you click “Install Now”.
Follow the prompts in the Python Install Wizard.

Verify Installation

Open your command line (Command Prompt on Windows, Terminal on macOS and Linux).
Type python --version and press Enter. This should display the version of Python that you just installed. On macOS and Linux, the command may be python3 --version instead.
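
The same check can also be made from inside Python, which helps when more than one Python version is installed and it is unclear which one is running. The following is a minimal sketch that uses only the standard library; the variable names are illustrative.

```python
import sys

# sys.version_info describes the interpreter that is running this code
major, minor, micro = sys.version_info[:3]
version_text = f"{major}.{minor}.{micro}"

print("Python version:", version_text)
# sys.executable is the full path of the Python program in use
print("Interpreter path:", sys.executable)
```

If the interpreter path is not the one you expected, another Python installation is probably listed earlier in your PATH.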

37.2.2 Installing Anaconda Navigator

Download Anaconda

Visit the Anaconda download page
Scroll down to the Anaconda Installers section.
Download the appropriate version for your operating system.

Install Anaconda

Run the downloaded installer.
Follow the prompts in the Anaconda Install Wizard. Accept the default settings unless you have specific preferences.
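On Windows, the installer offers an option to add Anaconda to your PATH environment variable and marks it as not recommended. Leaving it unchecked avoids conflicts with other Python installations; you can still start Anaconda from Anaconda Navigator or the Anaconda Prompt.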

Verify Installation

Open Anaconda Navigator
For Windows: Search for Anaconda Navigator in the Start menu.
For macOS/Linux: Use the terminal or search in your applications folder.
If Anaconda Navigator opens successfully, the installation is complete.
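
To confirm that a Python session is actually using the Anaconda installation, one common heuristic is to look for a conda-meta folder under the environment's root directory. This is only a sketch of that heuristic, not an official Anaconda check, and the variable names are illustrative.

```python
import os
import sys

# Conda environments keep their package records in a "conda-meta" folder
# located under the environment's root directory (sys.prefix).
conda_meta_dir = os.path.join(sys.prefix, "conda-meta")
running_in_conda = os.path.isdir(conda_meta_dir)

print("Environment root:", sys.prefix)
print("Looks like a conda environment:", running_in_conda)
```

Run it from a notebook or prompt launched through Anaconda Navigator; a result of False suggests that a different Python installation is being used.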

37.2.3 Import the following packages

NumPy – A Python library for numerical computation, built around the multidimensional array type ndarray. It also provides a large collection of mathematical functions that operate on these arrays.
Pandas – A Python library built on top of NumPy for working with tabular data in DataFrames. It is used for data cleaning, merging, reshaping, and aggregation.
Matplotlib – A plotting library for 2D and 3D visualizations. It can save figures in a variety of output formats, such as PNG, PDF, and SVG.
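
By convention, NumPy is imported as np and Pandas as pd (Matplotlib's plotting module is usually imported as plt, as in the sample plot later in this section). The sketch below assumes both packages are already installed; the sales figures and column names are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a small array and an element-wise calculation (illustrative values)
units_sold = np.array([10, 20, 30])
unit_price = 2.5
revenue = units_sold * unit_price

# Pandas: the same values arranged as a labelled table
sales = pd.DataFrame({"units_sold": units_sold, "revenue": revenue})

print(revenue)
print(sales)
print("Total revenue:", sales["revenue"].sum())
```

The multiplication applies to every element of the array at once, which is why no explicit loop is needed.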

Sample Python code

print('Hello, Python!')

Hello, Python!

37.2.4 Install pandas and matplotlib packages

Run the following commands in a Jupyter notebook cell. The leading ! passes the command to the system shell; when typing the commands in a terminal, omit the !. NumPy is installed automatically as a dependency of both packages.

!pip3 install pandas

!pip3 install matplotlib
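
After installing, it can be useful to confirm which versions are present. The sketch below uses the standard library's importlib.metadata module (available in Python 3.8 and later); the helper name installed_version is hypothetical and exists only for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ["pandas", "matplotlib"]:
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```

A package reported as not installed usually means that pip installed it into a different Python environment than the one running this code.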

Sample plot using Python

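import numpy as np
import matplotlib.pyplot as plt

# Radius values from 0 up to (but not including) 2, in steps of 0.01
r = np.arange(0, 2, 0.01)
# The angle grows with the radius: one full turn (2*pi) per unit of r
theta = 2 * np.pi * r

# Create a figure whose axes use polar coordinates
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)                # draw the spiral
ax.set_rticks([0.5, 1, 1.5, 2])  # positions of the radial tick marks
ax.grid(True)
plt.show()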