Master NumPy with Interactive Visualizations

NumPy is the fundamental package for scientific computing in Python. Learn how to handle large, multi-dimensional arrays with speed and efficiency.

01

Why Use NumPy?

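Python lists are flexible but slow for numerical computing: every element is a separate Python object, and loops over them run in the interpreter. NumPy stores values in typed, contiguous arrays and runs array operations in compiled C code.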

Performance Comparison

Python List (Loop) ~120ms
NumPy Array (Vectorized) ~1.2ms

Adding two arrays of 1,000,000 elements

Python List Memory

Scattered pointers: High overhead

NumPy Array Memory

Contiguous block: Cache efficient

import numpy as np
import time

size = 1_000_000

# Python list approach
list1 = list(range(size))
list2 = list(range(size))
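start = time.perf_counter()  # perf_counter is the higher-resolution clock for timing
result = [x + y for x, y in zip(list1, list2)]
python_time = time.perf_counter() - start

# NumPy approach (array creation is excluded from the timing)
arr1 = np.array(list1)
arr2 = np.array(list2)
start = time.perf_counter()
result = arr1 + arr2  # Vectorized operation
numpy_time = time.perf_counter() - start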

print(f"Speedup: {python_time/numpy_time:.0f}x faster")
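
The memory contrast described above (scattered pointers versus one contiguous block) can be estimated with a short sketch. It assumes 64-bit CPython; the `sys.getsizeof` total is only a rough estimate of the list's footprint, and the variable names are illustrative.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores n pointers, and each integer is a separate Python object.
# Summing getsizeof over the elements gives a rough upper estimate.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores n raw 8-byte integers in a single buffer.
array_bytes = np_arr.nbytes

print(f"list : roughly {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
print("contiguous:", np_arr.flags["C_CONTIGUOUS"])
```

The exact list figure depends on the interpreter version, but it should come out several times larger than the array's 8 bytes per element.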
02

Creating Arrays

From Python Lists

import numpy as np

# 1D array
arr1 = np.array([1, 2, 3, 4, 5])

# 2D array
arr2 = np.array([[1, 2, 3], 
                 [4, 5, 6]])

print(arr2.shape)  # (2, 3)

Built-in Methods

np.zeros((3, 3))        # 3x3 zeros
np.ones((2, 4))         # 2x4 ones
np.full((2, 2), 7)      # Fill with 7
np.eye(4)               # Identity matrix
np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)    # 5 values from 0 to 1
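
Two details of these constructors are easy to miss: `np.arange` excludes its stop value while `np.linspace` includes it, and most constructors default to `float64`. A small sketch to check both (variable names are illustrative):

```python
import numpy as np

stepped = np.arange(0, 10, 2)    # stop value 10 is excluded
spaced = np.linspace(0, 1, 5)    # stop value 1 is included

print(stepped.tolist())   # [0, 2, 4, 6, 8]
print(spaced.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]

# zeros, ones and eye produce float64 unless a dtype is given
print(np.zeros((3, 3)).dtype)             # float64
print(np.zeros((3, 3), dtype=int).dtype)  # the platform's default integer type
```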

03

Indexing & Slicing

Important: Views vs Copies

Basic slicing returns a view, not a copy: modifying the slice modifies the original array. Use .copy() if you need an independent array. Boolean masking and indexing with integer arrays, by contrast, return copies.

Basic Indexing

arr = np.array([10, 20, 30, 40, 50])

arr[0]       # 10 (first)
arr[-1]      # 50 (last)
arr[1:4]     # [20, 30, 40]
arr[::2]     # [10, 30, 50]
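
The view behavior of slices can be checked directly. This is a minimal sketch; `view` and `independent` are illustrative names, and `np.shares_memory` is used only to confirm which arrays share a buffer.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]                 # basic slice: shares memory with arr
view[0] = 99                    # writes through to arr
print(arr.tolist())             # [10, 99, 30, 40, 50]

independent = arr[1:4].copy()   # explicit copy: owns its own data
independent[0] = -1             # arr is not affected
print(arr.tolist())             # [10, 99, 30, 40, 50]

print(np.shares_memory(arr, view))          # True
print(np.shares_memory(arr, independent))   # False
```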

Boolean Masking

arr = np.array([10, 20, 30, 40, 50])

# Filter values > 25
mask = arr > 25
arr[mask]    # [30, 40, 50]

# Direct filtering
arr[arr > 25]  # Same result
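
Masks can also be combined and used for assignment. The sketch below assumes the same sample values; note that conditions are joined with `&`, `|` and `~` (not `and`/`or`), and each condition needs its own parentheses.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine two conditions element-wise
in_range = arr[(arr > 15) & (arr < 45)]
print(in_range.tolist())    # [20, 30, 40]

# Reading through a mask gives a copy, so editing it leaves arr alone
selected = arr[arr > 25]
selected[0] = 0
print(arr.tolist())         # [10, 20, 30, 40, 50]

# Assigning through a mask modifies arr in place
arr[arr > 25] = -1
print(arr.tolist())         # [10, 20, -1, -1, -1]
```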

04

Multidimensional Arrays

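Understanding axes is crucial. In a 2D array, axis 0 runs down the rows (vertical) and axis 1 runs across the columns (horizontal). Reducing along an axis collapses it: summing along axis 0 yields one value per column.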

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Access element
arr[1, 2]      # 6 (row 1, col 2)

# Slice rows 0-1 and columns 1-2
arr[0:2, 1:3]  # [[2, 3],
               #  [5, 6]]

# Operations along axes
np.sum(arr, axis=0)  # Column sums (collapses axis 0): [12, 15, 18]
np.sum(arr, axis=1)  # Row sums (collapses axis 1): [6, 15, 24]

# Modify column
arr[:, 1] = 0  # Set column 1 to 0

3D Array Structure

Sheet 0 (index 0 along axis 0):
[[1, 2, 3],
 [4, 5, 6]]

Sheet 1 (index 1 along axis 0):
[[ 7,  8,  9],
 [10, 11, 12]]

arr3D[0, 1, 2] → 6 (Sheet 0, Row 1, Col 2)
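
The two-sheet structure can be reproduced as a runnable sketch. Building `arr3D` with `np.arange(...).reshape(...)` is an assumption made here for brevity; any array of shape (2, 2, 3) with the same values would behave the same way.

```python
import numpy as np

# 2 sheets, each with 2 rows and 3 columns, holding the values 1..12
arr3D = np.arange(1, 13).reshape(2, 2, 3)

print(arr3D.shape)        # (2, 2, 3)
print(arr3D[0, 1, 2])     # 6  (sheet 0, row 1, col 2)
print(arr3D[1].tolist())  # [[7, 8, 9], [10, 11, 12]]

# Summing along axis 0 adds the two sheets element by element
sheet_sum = arr3D.sum(axis=0)
print(sheet_sum.tolist()) # [[8, 10, 12], [14, 16, 18]]
```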
05

Data Types

NumPy arrays are homogeneous: every element has the same dtype. Choosing a smaller dtype reduces memory use and can speed up computation, at the cost of range or precision.

Type      Description                            Memory
int32     Integer                                4 bytes
int64     Integer (default on most platforms)    8 bytes
float32   Single precision                       4 bytes
float64   Double precision (default)             8 bytes
bool      Boolean                                1 byte

# Specify dtype
arr = np.array([1, 2, 3], dtype=np.float32)

# Convert dtype
arr_int = arr.astype(np.int32)

# Check memory with .nbytes
arr_int64 = np.array([1, 2, 3], dtype=np.int64)
arr_int32 = np.array([1, 2, 3], dtype=np.int32)
arr_int64.nbytes   # 24 bytes
arr_int32.nbytes   # 12 bytes
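
Two consequences of dtype handling are worth seeing in action: converting floats to integers with `astype` discards the fractional part rather than rounding, and mixing integers with floats in one array promotes everything to a float type. A short sketch with illustrative values:

```python
import numpy as np

floats = np.array([1.9, -1.9, 2.5])
truncated = floats.astype(np.int32)   # fraction is dropped, not rounded
print(truncated.tolist())             # [1, -1, 2]

small = np.array([1, 2, 3], dtype=np.int32)
print(small.itemsize, small.nbytes)   # 4 12

mixed = np.array([1, 2.5, 3])         # ints and a float in one array
print(mixed.dtype)                    # float64
```

If rounding is what you want, applying `np.round` before `astype` is one option.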
06

Broadcasting

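Broadcasting allows operations on arrays of different shapes without creating copies of the smaller array. Shapes are compared from the trailing dimension backward: two dimensions are compatible when they are equal or when one of them is 1, and a size-1 (or missing) dimension is virtually "stretched" to match the larger one.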

Array 1 (2x3):
[[1, 2, 3],
 [4, 5, 6]]
+
Array 2 (1x3) → broadcast to (2x3):
[[10, 20, 30],
 [10, 20, 30]]

arr1 = np.array([[1, 2, 3], [4, 5, 6]])      # Shape: (2, 3)
arr2 = np.array([10, 20, 30])                 # Shape: (3,)

result = arr1 + arr2  # Broadcasting adds arr2 to each row
# [[11, 22, 33],
#  [14, 25, 36]]
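
Whether two shapes are compatible can be tested before doing any arithmetic. The sketch below uses `np.broadcast_shapes`, which assumes NumPy 1.20 or later; `col` and `error_message` are illustrative names.

```python
import numpy as np

# Shapes are aligned from the right; a size-1 or missing dimension stretches
print(np.broadcast_shapes((2, 3), (3,)))     # (2, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))   # (3, 4)

arr1 = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
col = np.array([[100], [200]])               # shape (2, 1): one value per row
col_result = arr1 + col
print(col_result.tolist())   # [[101, 102, 103], [204, 205, 206]]

# Trailing dimensions 3 and 2 do not match and neither is 1, so this fails
error_message = None
try:
    arr1 + np.array([10, 20])                # shapes (2, 3) and (2,)
except ValueError as exc:
    error_message = str(exc)
print("incompatible:", error_message)
```

To add one value per row, the second array has to be a column of shape (2, 1), as `col` is here.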

Real-world: Data Normalization

data = np.array([[10, 20, 30],
                 [15, 25, 35],
                 [20, 30, 40]])

mean = data.mean(axis=0)    # [15, 25, 35]
std = data.std(axis=0)      # [4.08, 4.08, 4.08]

normalized = (data - mean) / std  # Broadcasting!
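
A quick sanity check on the normalization is that every column ends up with mean 0 and standard deviation 1. The sketch below also shows a row-wise variant using `keepdims=True`, which is an addition for illustration and not part of the example above.

```python
import numpy as np

data = np.array([[10, 20, 30],
                 [15, 25, 35],
                 [20, 30, 40]])

# Column-wise: mean and std have shape (3,) and broadcast across rows
normalized = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.allclose(normalized.mean(axis=0), 0.0))   # True
print(np.allclose(normalized.std(axis=0), 1.0))    # True

# Row-wise: keepdims=True keeps shape (3, 1) so it broadcasts across columns
row_mean = data.mean(axis=1, keepdims=True)
row_centered = data - row_mean
print(row_mean.shape)          # (3, 1)
print(row_centered.tolist())   # each row becomes [-10.0, 0.0, 10.0]
```

Without `keepdims=True`, the row means would have shape (3,) and be subtracted per column instead, silently producing the wrong result for a square array like this one.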
07

Mathematical Functions

np.mean()

Average of elements

np.std()

Standard deviation

np.sum()

Sum of elements

np.min() / np.max()

Minimum / Maximum

np.argmin() / np.argmax()

Index of min/max

np.cumsum()

Cumulative sum

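arr = np.array([10, 20, 30, 40, 50])
other = np.array([12, 18, 33, 41, 48])  # second sample, values chosen for illustration

np.mean(arr)        # 30.0
np.std(arr)         # 14.14... (population std, ddof=0)
np.percentile(arr, 50)  # 30.0 (median)
np.corrcoef(arr, other) # 2x2 correlation matrix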
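
The remaining functions listed above, and the effect of the `axis` argument on them, can be sketched as follows. The `grid` array is a made-up example.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(np.min(arr), np.max(arr))         # 10 50
print(np.argmin(arr), np.argmax(arr))   # 0 4
print(np.cumsum(arr).tolist())          # [10, 30, 60, 100, 150]

grid = np.array([[3, 9, 1],
                 [7, 2, 8]])
print(np.argmax(grid))                  # 1 (position in the flattened array)
print(np.argmax(grid, axis=1).tolist()) # [1, 2] (column of the max in each row)
print(np.max(grid, axis=0).tolist())    # [7, 9, 8] (max of each column)
```

Without an `axis` argument these functions treat the array as flat; with one, they reduce along that axis only.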