🎯 Why Data Visualization Matters
Data visualization is the bridge between raw data and human understanding.
When done right, it helps us:
- ✨ Reveal patterns, trends, and correlations in the data
- 💬 Communicate insights clearly to stakeholders
- ⚡ Speed up decision-making by simplifying complex datasets
- 📖 Make data storytelling engaging and accessible to all
💡 John Tukey's Wisdom: "The greatest value of a picture is when it forces us to
notice what we never expected to see."
Exploratory vs Explanatory Visualizations
| Aspect | Exploratory | Explanatory |
|---|---|---|
| Goal | Find insights | Communicate insights |
| Audience | Analyst / Data Scientist | Stakeholders / Public |
| Style | Raw, fast, flexible | Polished, focused, clean |
| Examples | Pair plots, correlation heatmaps | Bar charts in presentations |
🎨 5 Basic Principles of Good Visualizations
- Clarity: Avoid clutter. Use labels, legends, and proper axis scales
- Context: What is being measured? Over what time frame? In what units?
- Focus: Highlight the key insight using colors and annotations
- Storytelling: Don't just show data — tell a story. Guide the viewer
- Accessibility: Use color palettes that enhance readability for all viewers
Pro Tip: Always ask yourself: "What is the ONE thing I want the viewer to
understand from this visual?"
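Here is a rough sketch of how those principles might look in code. The revenue figures, the month labels and the 2024 date are made up for illustration only; the point is where each principle shows up in the plotting calls.

```python
import matplotlib.pyplot as plt

# Illustrative monthly revenue (invented numbers, in thousand USD).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [12.0, 13.5, 13.1, 18.2, 14.0, 14.4]

fig, ax = plt.subplots(figsize=(7, 4))

# Focus: mute every bar except the one that carries the message.
peak = revenue.index(max(revenue))
bar_colors = ["lightgray"] * len(months)
bar_colors[peak] = "tab:blue"
ax.bar(months, revenue, color=bar_colors)

# Context: say what is measured, in which units, over which period.
ax.set_xlabel("Month (2024)")
ax.set_ylabel("Revenue (thousand USD)")

# Storytelling: state the takeaway in the title and point at the evidence.
ax.set_title("April was the strongest month")
ax.annotate("Peak: 18.2k",
            xy=(peak, revenue[peak]),
            xytext=(peak + 0.6, revenue[peak] + 1),
            arrowprops={"arrowstyle": "->"})

# Clarity: leave headroom for the annotation and tidy the layout.
ax.set_ylim(0, 22)
fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```

The title answers the "ONE thing" question directly, and the gray bars keep the other months visible without competing for attention.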
📈 Introduction to Matplotlib
What is matplotlib.pyplot?
matplotlib.pyplot is Matplotlib's plotting module. Think of it as a paintbrush for your data.
By convention it is imported as plt to save typing!
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
🎮 Interacting with Plots
When a plot appears, you can:
- 🔍 Zoom In/Out
- ✋ Pan around
- ⬅️ Use arrows to navigate history
- 🏠 Reset to home view
- 💾 Save as PNG using the disk icon
📊 Real Example: Cricket Player Runs Over Time
years = [1990, 1992, 1994, 1996, 1998, 2000, 2003, 2005, 2007, 2010]
runs = [500, 700, 1100, 1500, 1800, 1200, 1700, 1300, 900, 1500]
plt.plot(years, runs)
plt.xlabel("Year")
plt.ylabel("Runs Scored")
plt.title("Sachin Tendulkar's Yearly Runs")
plt.show()
🎨 Customization Options
Format Strings
plt.plot(years, runs, 'ro--')  # red, circle markers, dashed line
plt.plot(years, runs, 'g^:')   # green, triangle markers, dotted line
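A format string is shorthand for three separate settings: color, marker and line style. The sketch below draws one line with the shorthand and a second with explicit keyword arguments, then checks that both end up with the same styling. The second line is shifted up by 200 only so the two do not overlap.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

years = [1990, 1992, 1994, 1996, 1998]
runs = [500, 700, 1100, 1500, 1800]

fig, ax = plt.subplots()

# Format string: 'r' = red, 'o' = circle markers, '--' = dashed line.
(short_form,) = ax.plot(years, runs, "ro--")

# The same styling spelled out with keyword arguments.
(long_form,) = ax.plot(years, [r + 200 for r in runs],
                       color="red", marker="o", linestyle="--")

# Both lines resolve to the same color, marker and line style.
same_color = to_rgba(short_form.get_color()) == to_rgba(long_form.get_color())
same_marker = short_form.get_marker() == long_form.get_marker()
same_style = short_form.get_linestyle() == long_form.get_linestyle()
print(same_color, same_marker, same_style)
```

Format strings are quick to type while exploring; keyword arguments tend to be easier to read in code that other people will maintain.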
Color and Line Styles
plt.plot(years, runs,
         color='orange',
         linestyle='--',
         linewidth=3,
         label="Player 1")
plt.legend()
plt.grid(True)
plt.tight_layout()
🎭 Plot Styles
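print(plt.style.available)  # list the style names your Matplotlib version provides
plt.style.use("ggplot")
plt.style.use("seaborn-v0_8-bright")  # seaborn styles carry the "v0_8" prefix since Matplotlib 3.6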
with plt.xkcd():
    plt.plot(years, runs)
    plt.title("Epic Battle!")
💡 Pro Tips:
- Always start with simple plots
- Add labels and legends early
- Use plt.grid() and plt.tight_layout() for readability
- Try different styles to find what works best
📊 Bar Charts
Bar charts are perfect for comparing quantities across categories. They're easy to read and
powerful for visual analysis.
Basic Vertical Bar Chart
years = [1990, 1992, 1994, 1996, 1998, 2000]
runs = [500, 700, 1100, 1500, 1800, 1200]
plt.bar(years, runs, edgecolor='black')
plt.xlabel("Year")
plt.ylabel("Runs Scored")
plt.title("Yearly Performance")
plt.show()
🎯 Side-by-Side Comparison
import numpy as np
years = [1990, 1992, 1994, 1996, 1998]  # one label per value in the lists below
sachin = [500, 700, 1100, 1500, 1800]
kohli = [0, 500, 800, 1100, 1300]
sehwag = [0, 200, 900, 1400, 1600]
x = np.arange(len(years))  # group positions: 0, 1, 2, 3, 4
width = 0.25
plt.bar(x - width, sachin, width, label="Sachin")  # left bar of each group
plt.bar(x, sehwag, width, label="Sehwag")          # middle bar
plt.bar(x + width, kohli, width, label="Kohli")    # right bar
plt.xticks(x, years)
plt.legend()
plt.show()
🔍 Why use xticks()?
The bars are drawn at the numeric positions in x (0, 1, 2, ...), so by default the axis would show
those index numbers. plt.xticks(x, years) replaces them with the real category labels, such as years or names.
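The same idea can be written with the object-oriented API and a loop, which avoids hand-writing one plt.bar() call per player. This is a sketch, not the only way to do it: the dictionary name runs_by_player and the choice to let each group fill 80% of its slot are assumptions made for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

years = [1990, 1992, 1994, 1996, 1998]
runs_by_player = {
    "Sachin": [500, 700, 1100, 1500, 1800],
    "Sehwag": [0, 200, 900, 1400, 1600],
    "Kohli": [0, 500, 800, 1100, 1300],
}

x = np.arange(len(years))            # one slot per year: 0, 1, 2, 3, 4
n_players = len(runs_by_player)
width = 0.8 / n_players              # the group fills 80% of each slot

fig, ax = plt.subplots()
for i, (player, runs) in enumerate(runs_by_player.items()):
    # Shift each player's bars so the whole group stays centred on its slot.
    offset = (i - (n_players - 1) / 2) * width
    ax.bar(x + offset, runs, width, label=player)

ax.set_xticks(x)                                # ticks at the slot centres
ax.set_xticklabels([str(y) for y in years])     # show years instead of 0, 1, 2, ...
ax.set_xlabel("Year")
ax.set_ylabel("Runs Scored")
ax.legend()

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

Adding a fourth player then only means adding one dictionary entry; the bar width and offsets adjust on their own.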
↔️ Horizontal Bar Charts
players = ["Sachin", "Sehwag", "Kohli", "Yuvraj"]
total_runs = [5600, 4100, 2400, 3700]
plt.barh(players, total_runs, color="skyblue")
plt.xlabel("Total Runs in First 5 Years")
plt.title("Performance Comparison")
plt.show()
📝 Adding Value Labels
players = ["Sachin", "Sehwag", "Kohli"]
runs = [1500, 1200, 1800]
plt.bar(players, runs, color="skyblue")
for i in range(len(players)):
    # Place each value 50 units above the top of its bar, centred horizontally
    plt.text(i, runs[i] + 50, str(runs[i]), ha='center')
plt.show()
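If your Matplotlib is version 3.4 or newer, the Axes.bar_label() helper can place these labels for you. A minimal sketch, assuming that version is available; the padding value of 3 points is an arbitrary choice.

```python
import matplotlib.pyplot as plt

players = ["Sachin", "Sehwag", "Kohli"]
runs = [1500, 1200, 1800]

fig, ax = plt.subplots()
bars = ax.bar(players, runs, color="skyblue")

# bar_label() reads each bar's height and writes it just above the bar.
value_labels = ax.bar_label(bars, padding=3)

ax.set_ylabel("Runs Scored")
label_texts = [t.get_text() for t in value_labels]
```

Compared with the manual loop, there is no offset such as "+ 50" to tune when the data scale changes.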
📋 Quick Reference
| Feature | Use |
|---|---|
| plt.bar() | Vertical bars for categorical comparison |
| plt.barh() | Horizontal bars (great for long labels) |
| width= | Control thickness/spacing of bars |
| edgecolor= | Add borders to bars |
| plt.xticks() | Replace index numbers with real labels |
🥧 Pie Charts
Pie charts show part-to-whole relationships. They're visually appealing but best used with
fewer categories (3-6 slices ideal).
Basic Pie Chart
labels = ["Sachin", "Sehwag", "Kohli", "Yuvraj"]
runs = [18000, 8000, 12000, 9500]
plt.pie(runs, labels=labels, autopct='%1.1f%%')
plt.title("Career Runs Distribution")
plt.show()
🎨 Customization Options
colors = ['#ff9999','#66b3ff','#99ff99','#ffcc99']
explode = [0.1, 0, 0, 0]
plt.pie(
    runs,
    labels=labels,
    colors=colors,
    explode=explode,
    autopct='%1.1f%%',
    shadow=True,
    startangle=140,
    wedgeprops={'edgecolor': 'black'}
)
plt.show()
📊 Key Parameters
| Parameter | Description |
|---|---|
| labels | Label each slice |
| colors | Customize slice colors |
| explode | Pull out slices for emphasis |
| autopct | Show percentage text ('%1.1f%%') |
| shadow | Add 3D-like depth |
| startangle | Rotate pie chart |
⚠️ When to Avoid Pie Charts:
- Too many categories (>6 slices)
- When precise comparison is needed
- When values are similar in size
- Better alternatives: Bar charts or horizontal bar charts
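As one possible alternative, the same career-runs data can be shown as a sorted horizontal bar chart of percentage shares. This is only a sketch: sorting ascending (so the largest bar lands at the top) is a design choice, not a rule.

```python
import matplotlib.pyplot as plt

labels = ["Sachin", "Sehwag", "Kohli", "Yuvraj"]
runs = [18000, 8000, 12000, 9500]

# Convert raw totals into percentage shares, as autopct does for a pie.
total = sum(runs)
shares = [100 * r / total for r in runs]

# Sort ascending: barh draws the first item at the bottom, so the largest ends on top.
pairs = sorted(zip(shares, labels))
sorted_shares, sorted_labels = zip(*pairs)

fig, ax = plt.subplots()
ax.barh(sorted_labels, sorted_shares, color="skyblue", edgecolor="black")
ax.set_xlabel("Share of combined career runs (%)")
ax.set_title("Career Runs Distribution")
fig.tight_layout()
```

Bars share a common baseline, so the gap between Sehwag and Yuvraj is easier to judge here than as two similar pie slices.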
📚 Stack Plots
Stack plots show how multiple quantities change over time, stacked on top of each other. Perfect
for tracking composition over time!
Use Cases
- ⏱️ Time spent on different activities over days
- 👥 Distribution of tasks by team members
- 📈 Website traffic sources over time
- 💰 Budget allocation across departments
Stack Plot Example
days = [1, 2, 3, 4, 5, 6, 7]
studying = [3, 4, 3, 5, 4, 3, 4]
playing = [2, 2, 1, 1, 2, 3, 2]
watching_tv = [2, 1, 2, 2, 1, 1, 1]
sleeping = [5, 5, 6, 5, 6, 5, 5]
labels = ['Studying', 'Playing', 'Watching TV', 'Sleeping']
colors = ['skyblue', 'lightgreen', 'gold', 'lightcoral']
plt.stackplot(days, studying, playing, watching_tv, sleeping,
              labels=labels, colors=colors, alpha=0.8)
plt.legend(loc='upper left')
plt.title('Weekly Activity Tracker')
plt.xlabel('Day')
plt.ylabel('Hours')
plt.show()
💡 Stack Plot vs Pie Chart:
• Use pie charts for a snapshot in time
• Use stack plots to see how data changes over time
📊 Histograms
Histograms show the distribution of numerical data. They're essential for
understanding data spread, detecting outliers, and seeing patterns.
When to Use Histograms
- 📈 Understand distribution of numerical data (age, salary, test scores)
- 🔍 Detect skewness and outliers
- 📐 Check if data is normally distributed
- 🎯 Analyze frequency within specific ranges
Understanding Bins
The bins argument controls how data is grouped:
- Integer: Number of equal-width bins
- List: Custom bin edges for specific ranges
plt.hist(ages, bins=10, edgecolor='black')  # integer: 10 equal-width bins
plt.hist(ages, bins=[10, 20, 30, 40, 60, 100], edgecolor='black')  # list: custom bin edges
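To see what the two forms of bins actually do, you can ask NumPy for the counts directly. plt.hist() uses np.histogram() internally, so the numbers below are what the bars would show. The ten ages are a small sample chosen for illustration.

```python
import numpy as np

# Small illustrative sample of ages.
ages = [22, 25, 47, 52, 46, 56, 55, 60, 34, 43]

# Integer: NumPy splits the range min(ages)..max(ages) into equal-width bins.
counts_equal, edges_equal = np.histogram(ages, bins=5)

# List: you choose the edges yourself, and the bins may have unequal widths.
custom_edges = [10, 20, 30, 40, 60, 100]
counts_custom, _ = np.histogram(ages, bins=custom_edges)

print("Equal-width edges:", edges_equal)
for left, right, n in zip(custom_edges[:-1], custom_edges[1:], counts_custom):
    # Each bin includes its left edge; only the last bin also includes its right edge.
    print(f"{left:>3} to {right:<3}: {n}")
# counts_custom is [0, 2, 1, 6, 1]
```

Note how the wide 40 to 60 bin collects six values. Unequal bin widths can make a bar look tall simply because its range is wider, so use them deliberately.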
Adding Reference Lines
import numpy as np
ages = [22, 25, 47, 52, 46, 56, 55, 60, 34, 43, ...]  # truncated list: replace ... with the remaining ages before running
bins = [10, 20, 30, 40, 50, 60, 70]
plt.hist(ages, bins=bins, edgecolor='black')
# Dashed red vertical line at the mean age
plt.axvline(np.mean(ages), color='red',
            linestyle='--', linewidth=2,
            label='Average Age')
plt.legend()
plt.title('Age Distribution with Mean')
plt.show()
📝 Key Parameters:
• bins: Number of bins or a list of custom bin edges
• edgecolor: Border color for bars
• plt.axvline(): Draws a vertical reference line (a separate function, not a hist() argument)
🎯 Scatter Plots
Scatter plots reveal relationships between two variables. They're perfect for
finding correlations, patterns, and outliers.
Basic Scatter Plot
study_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [40, 45, 50, 55, 60, 65, 75, 85, 90]
plt.scatter(study_hours, exam_scores)
plt.title('Study Hours vs Exam Score')
plt.xlabel('Study Hours')
plt.ylabel('Exam Score')
plt.grid(True)
plt.show()
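A scatter plot suggests a relationship; a correlation coefficient puts a number on it. The sketch below computes Pearson's r with NumPy and overlays a least-squares trend line on the same data. The trend line is an addition for illustration and assumes a straight-line relationship is a reasonable summary.

```python
import numpy as np
import matplotlib.pyplot as plt

study_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [40, 45, 50, 55, 60, 65, 75, 85, 90]

# Pearson's r: +1 is a perfect rising line, 0 means no linear trend, -1 a perfect falling line.
r = np.corrcoef(study_hours, exam_scores)[0, 1]

# Least-squares straight line: score is roughly slope * hours + intercept.
slope, intercept = np.polyfit(study_hours, exam_scores, 1)
trend = [slope * h + intercept for h in study_hours]

fig, ax = plt.subplots()
ax.scatter(study_hours, exam_scores, label="Students")
ax.plot(study_hours, trend, color="red", linestyle="--",
        label=f"Trend line (r = {r:.2f})")
ax.set_xlabel("Study Hours")
ax.set_ylabel("Exam Score")
ax.set_title("Study Hours vs Exam Score")
ax.legend()
```

For this data r is about 0.99, a very strong positive linear relationship. Keep in mind that correlation alone does not show that extra study hours cause the higher scores.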
🎨 Adding Color & Size
sizes = [score * 2 for score in exam_scores]
colors = ['red' if score < 60 else 'green'
          for score in exam_scores]
plt.scatter(study_hours, exam_scores, s=sizes, c=colors)
plt.title('Colored & Sized Scatter Plot')
plt.show()
🌈 Using Colormaps
plt.scatter(study_hours, exam_scores,
            c=exam_scores, cmap='viridis')
plt.colorbar(label='Score')
plt.title('Scatter with Gradient Colors')
plt.show()
📝 Adding Annotations
plt.scatter(study_hours, exam_scores)
for i in range(len(study_hours)):
    plt.annotate(f'Student {i+1}',
                 (study_hours[i], exam_scores[i]))
plt.title('Scatter with Labels')
plt.show()
👥 Multiple Groups
class_a_hours = [2, 4, 6, 8]
class_a_scores = [45, 55, 65, 85]
class_b_hours = [1, 3, 5, 7, 9]
class_b_scores = [40, 50, 60, 70, 90]
plt.scatter(class_a_hours, class_a_scores,
            label='Class A', color='blue')
plt.scatter(class_b_hours, class_b_scores,
            label='Class B', color='orange')
plt.legend()
plt.show()
🎛️ Subplots - Multiple Plots in One Figure
Subplots allow you to display multiple plots side-by-side or in a grid. Perfect for comparing
datasets or showing different aspects of your data!
Method 1: Using plt.subplot()
x = [1, 2, 3, 4, 5]
y1 = [i * 2 for i in x]
y2 = [i ** 2 for i in x]
plt.subplot(1, 2, 1)
plt.plot(x, y1)
plt.title('Double of x')
plt.subplot(1, 2, 2)
plt.plot(x, y2)
plt.title('Square of x')
plt.tight_layout()
plt.show()
2×2 Grid of Subplots
y3 = [i ** 0.5 for i in x]
y4 = [10 - i for i in x]
plt.figure(figsize=(8, 6))
plt.subplot(2, 2, 1)
plt.plot(x, y1)
plt.title('x * 2')
plt.subplot(2, 2, 2)
plt.plot(x, y2)
plt.title('x squared')
plt.subplot(2, 2, 3)
plt.plot(x, y3)
plt.title('sqrt(x)')
plt.subplot(2, 2, 4)
plt.plot(x, y4)
plt.title('10 - x')
plt.tight_layout()
plt.show()
Method 2: Using plt.subplots() (Recommended)
This method is cleaner and more flexible. It returns a Figure object and an array of Axes objects, one per subplot.
fig, axs = plt.subplots(1, 2,
                        figsize=(10, 4))
axs[0].plot(x, y1)
axs[0].set_title('x * 2')
axs[1].plot(x, y2)
axs[1].set_title('x squared')
fig.suptitle('Comparison Plots',
             fontsize=14)
fig.tight_layout()
fig.subplots_adjust(top=0.85)
fig.savefig('my_plots.png')
plt.show()
🎯 Key Differences:
• axs - for working on individual plots
• fig - for settings that apply to the whole figure
🔄 Looping Over Subplots
fig, axs = plt.subplots(2, 2,
                        figsize=(8, 6))
ys = [y1, y2, y3, y4]
titles = ['x * 2', 'x squared', 'sqrt(x)', '10 - x']
for i in range(2):
    for j in range(2):
        idx = i * 2 + j  # convert (row, column) to a position in ys and titles
        axs[i, j].plot(x, ys[idx])
        axs[i, j].set_title(titles[idx])
plt.tight_layout()
plt.show()
This approach is ideal for:
- Dynamic or repetitive data series
- Creating dashboards
- Comparing multiple datasets efficiently
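The nested loop can often be flattened. Because axs is a NumPy array, axs.flat walks every subplot row by row, so it can be zipped with the data. A sketch, with the series stored in a dictionary (an assumption made here to keep each title next to its data):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
series = {
    "x * 2": [i * 2 for i in x],
    "x squared": [i ** 2 for i in x],
    "sqrt(x)": [i ** 0.5 for i in x],
    "10 - x": [10 - i for i in x],
}

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

# axs.flat yields the four Axes in order: top-left, top-right, bottom-left, bottom-right.
for ax, (title, y) in zip(axs.flat, series.items()):
    ax.plot(x, y)
    ax.set_title(title)

fig.tight_layout()
subplot_titles = [ax.get_title() for ax in axs.flat]
```

This version needs no index arithmetic, and changing the grid shape only requires editing the plt.subplots() call.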
🎨 Introduction to Seaborn
What is Seaborn?
Seaborn is a Python library built on top of Matplotlib that makes it easier to create beautiful,
complex visualizations.
Why Choose Seaborn?
- ✨ Less Code: High-level interface for complex plots
- 🎨 Better Looking: Automatic styling and themes
- 📊 DataFrame Ready: Works seamlessly with pandas
- 🔧 Built-in Features: Statistical plots, color palettes, themes
- 📦 Built-in Datasets: Practice data included
Setup & Installation
!pip install seaborn  # notebook cell syntax; in a terminal, run: pip install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
🎭 Seaborn Themes
sns.set_theme(style="darkgrid")
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
sns.lineplot(x=x, y=y)
plt.title('Beautiful Line Plot')
plt.show()
📦 Built-in Datasets
Seaborn includes real-world datasets for practice and learning!
print(sns.get_dataset_names())
tips = sns.load_dataset('tips')
print(tips.head())
Common Datasets:
tips - Restaurant bills and tips
iris - Iris flower measurements
titanic - Titanic passenger data
flights - Flight passenger counts
penguins - Penguin species data
📊 Basic Plot Types
1. Line Plot
tips = sns.load_dataset('tips')
sns.lineplot(x="total_bill", y="tip", data=tips)
plt.title('Line Plot Example')
plt.show()
2. Scatter Plot with Color
sns.scatterplot(x="total_bill", y="tip",
                hue="time", data=tips)
plt.title('Scatter Plot with Color by Time')
plt.show()
3. Bar Plot
sns.barplot(x="day", y="total_bill", data=tips)
plt.title('Average Bill per Day')
plt.show()
4. Box Plot (Distribution Analysis)
sns.boxplot(x="day", y="total_bill", data=tips)
plt.title('Boxplot of Total Bill per Day')
plt.show()
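📊 Boxplots show:
• Median (line inside the box)
• Quartiles (box edges: Q1 at the bottom, Q3 at the top)
• Whiskers (range of the non-outlier data, by default up to 1.5 × IQR beyond the box)
• Outliers (dots beyond the whiskers)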
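The numbers behind a box plot are easy to compute by hand. The sketch below uses ten made-up bill amounts and the default 1.5 × IQR rule that Matplotlib and Seaborn use to decide which points are drawn as outliers.

```python
import numpy as np

# Ten invented bill amounts; the last one is deliberately extreme.
bills = [8, 10, 12, 13, 15, 16, 18, 20, 22, 48]

# Box edges and middle line.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are drawn as individual dots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [b for b in bills if b < lower_fence or b > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print("Outliers:", outliers)  # [48]
```

The whiskers themselves stop at the most extreme data points that still lie inside the fences, here 8 and 22.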
5. Heatmap (Pivot Table)
flights = sns.load_dataset('flights')
# pivot() requires keyword arguments in pandas 2.0 and later
pivot_table = flights.pivot(index="month", columns="year", values="passengers")
sns.heatmap(pivot_table, annot=True,
            fmt="d", cmap="YlGnBu")
plt.title('Heatmap of Passengers')
plt.show()
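Heatmaps are also a common way to display a correlation matrix. The sketch below builds one from a small DataFrame. The years_experience column is hypothetical, invented so the matrix has more than two variables. Fixing vmin and vmax at -1 and 1 keeps the color scale centred on zero.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small illustrative dataset; years_experience is a made-up column.
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 34, 43],
    "salary": [25000, 27000, 52000, 60000, 58000,
               62000, 61000, 65000, 38000, 45000],
    "years_experience": [1, 3, 20, 25, 22, 30, 28, 35, 10, 18],
})

# Pairwise Pearson correlations between the numeric columns (values from -1 to 1).
corr = df.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f",
            vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation Matrix")
fig.tight_layout()
```

The diagonal is always 1.00, because every variable correlates perfectly with itself, and the matrix is symmetric about that diagonal.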
🐼 Working with Pandas DataFrames
import pandas as pd
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 34, 43],
    "salary": [25000, 27000, 52000, 60000, 58000,
               62000, 61000, 65000, 38000, 45000],
    "gender": ["M", "F", "M", "F", "F",
               "M", "M", "F", "F", "M"]
})
sns.scatterplot(x="age", y="salary", hue="gender", data=df)
plt.title('Salary vs Age by Gender')
plt.show()
📋 Matplotlib vs Seaborn Comparison
| Feature | Matplotlib | Seaborn |
|---|---|---|
| Default Styles | Basic | Beautiful ✨ |
| Syntax Level | Low-level | High-level |
| DataFrame Support | Manual | Native & Easy |
| Complex Plots | Tedious | Very Easy |
| Statistical Plots | Manual calculation | Built-in |
| Customization | Full control | Smart defaults + customizable |
💡 Best Practice:
• Start with Seaborn for quick, beautiful plots
• Use Matplotlib for fine-tuning and customization
• Combine both for maximum power!
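One way to combine the two, sketched below: Seaborn draws the plot onto an Axes object that you create, and Matplotlib methods then fine-tune that same object. The DataFrame is a small made-up sample, and the whitegrid theme is an arbitrary choice.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small illustrative dataset.
df = pd.DataFrame({
    "age": [22, 25, 47, 52, 46, 56, 55, 60, 34, 43],
    "salary": [25000, 27000, 52000, 60000, 58000,
               62000, 61000, 65000, 38000, 45000],
    "gender": ["M", "F", "M", "F", "F",
               "M", "M", "F", "F", "M"],
})

sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(7, 4))

# Step 1: Seaborn builds the scatter plot and its legend from column names.
sns.scatterplot(data=df, x="age", y="salary", hue="gender", ax=ax)

# Step 2: Matplotlib adjusts the same Axes: reference line, labels, title.
mean_salary = df["salary"].mean()
ax.axhline(mean_salary, color="gray", linestyle="--")  # dashed line at the mean salary
ax.set_xlabel("Age (years)")
ax.set_ylabel("Salary")
ax.set_title("Salary vs Age by Gender")
fig.tight_layout()
```

Passing ax=ax is what ties the two libraries together: every Seaborn axes-level function accepts it and draws onto the Axes you supply.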
🎯 Final Recommendations
- 📚 Learn Matplotlib basics - Foundation for customization
- 🚀 Use Seaborn daily - Faster development, prettier results
- 🔧 Combine both - Best of both worlds
- 📊 Practice with real data - Use built-in datasets
- 🎨 Experiment with styles - Find what works for you
🎓 Key Takeaway:
In real-world data science projects, Seaborn's high-level functions and sensible defaults can save
a lot of manual plotting work. Start with Seaborn, customize with Matplotlib!