Master NumPy with Interactive Visualizations
NumPy is the fundamental package for scientific computing in Python. Learn how to handle large, multi-dimensional arrays with speed and efficiency.
Why Use NumPy?
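Python lists are flexible but slow for numerical computing: every element is a separate object, and loops run in the interpreter. NumPy addresses both issues by storing values in a contiguous, typed buffer and running array operations in compiled C code.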
Performance Comparison
Adding two arrays of 1,000,000 elements
Python List Memory
Scattered pointers: High overhead
NumPy Array Memory
Contiguous block: Cache efficient
import numpy as np
import time
size = 1_000_000
# Python list approach
list1 = list(range(size))
list2 = list(range(size))
start = time.perf_counter() # perf_counter has higher resolution than time.time
result = [x + y for x, y in zip(list1, list2)]
python_time = time.perf_counter() - start
# NumPy approach (array creation is excluded from the timing)
arr1 = np.array(list1)
arr2 = np.array(list2)
start = time.perf_counter()
result = arr1 + arr2 # Vectorized operation
numpy_time = time.perf_counter() - start
print(f"Speedup: {python_time/numpy_time:.0f}x faster")
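The memory difference between scattered pointers and a contiguous block can be inspected directly. This is a rough sketch, assuming CPython (where `sys.getsizeof` reports per-object sizes); the names `py_list`, `np_arr`, `list_bytes`, and `array_bytes` are illustrative and not part of the original example.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds pointers to separate int objects, so count the pointer
# table plus every object it points to (approximate, CPython-specific).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores the raw values back to back in a single buffer.
array_bytes = np_arr.nbytes

print(np_arr.flags["C_CONTIGUOUS"])  # True
print(np_arr.itemsize)               # 8 (bytes per int64 element)
print(array_bytes)                   # 8000
print(list_bytes > array_bytes)      # True
```

The exact list figure varies by Python version, so only the comparison is shown rather than a specific byte count.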
Creating Arrays
From Python Lists
import numpy as np
# 1D array
arr1 = np.array([1, 2, 3, 4, 5])
# 2D array
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(arr2.shape) # (2, 3)
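Beyond `shape`, a few other attributes describe an array built from a list. The sketch below is illustrative; the `mixed` array is an added example, and the integer dtype printed for `arr2` depends on the platform.

```python
import numpy as np

arr2 = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(arr2.ndim)   # 2 (number of axes)
print(arr2.shape)  # (2, 3) (rows, columns)
print(arr2.size)   # 6 (total number of elements)
print(arr2.dtype)  # platform-dependent integer type, often int64

# Mixing ints and floats upcasts every element to float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```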
Built-in Methods
np.zeros((3, 3)) # 3x3 zeros
np.ones((2, 4)) # 2x4 ones
np.full((2, 2), 7) # Fill with 7
np.eye(4) # Identity matrix
np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # 5 values from 0 to 1
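Two details of these constructors are easy to miss: `np.arange` excludes the stop value while `np.linspace` includes it, and most constructors default to float64. A short sketch with illustrative variable names:

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default.
steps = np.arange(0, 10, 2)
points = np.linspace(0, 1, 5)
print(steps.tolist())   # [0, 2, 4, 6, 8]
print(points.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# zeros, ones, and eye return float64 unless a dtype is given.
zeros = np.zeros((3, 3))
identity = np.eye(4, dtype=np.int32)
print(zeros.dtype)      # float64
print(identity.dtype)   # int32

# full takes the shape first, then the fill value.
filled = np.full((2, 2), 7)
print(filled.tolist())  # [[7, 7], [7, 7]]
```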
Indexing & Slicing
Important: Views vs Copies
Basic slicing returns a view, not a copy.
Modifying the slice modifies the original array. Use .copy() if you need
independence. Boolean masking and indexing with integer arrays return copies instead.
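The difference between a view and a copy is easiest to see by modifying each and checking the original. This is a minimal sketch; the names `view`, `independent`, and `masked` are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]                # basic slice: shares memory with arr
view[0] = 99
print(arr.tolist())            # [10, 99, 30, 40, 50]

independent = arr[1:4].copy()  # explicit copy: separate memory
independent[0] = -1
print(arr.tolist())            # [10, 99, 30, 40, 50] (unchanged)

masked = arr[arr > 25]         # boolean indexing returns a copy
masked[0] = 0
print(arr.tolist())            # [10, 99, 30, 40, 50] (unchanged)

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```

`np.shares_memory` is a convenient check when it is unclear whether an operation returned a view.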
Basic Indexing
arr = np.array([10, 20, 30, 40, 50])
arr[0] # 10 (first)
arr[-1] # 50 (last)
arr[1:4] # [20, 30, 40]
arr[::2] # [10, 30, 50]
Boolean Masking
arr = np.array([10, 20, 30, 40, 50])
# Filter values > 25
mask = arr > 25
arr[mask] # [30, 40, 50]
# Direct filtering
arr[arr > 25] # Same result
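Masks can be combined and used for assignment as well as filtering. The sketch below extends the example above; the names `middle`, `outer`, `capped`, and `labels` are added for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and), | (or), ~ (not). The parentheses are
# required because these operators bind tighter than comparisons.
middle = arr[(arr > 15) & (arr < 45)]
print(middle.tolist())   # [20, 30, 40]

outer = arr[(arr < 20) | (arr > 40)]
print(outer.tolist())    # [10, 50]

# Assigning through a mask modifies the indexed array in place.
capped = arr.copy()
capped[capped > 25] = 25
print(capped.tolist())   # [10, 20, 25, 25, 25]

# np.where picks between two values element by element.
labels = np.where(arr > 25, "high", "low")
print(labels.tolist())   # ['low', 'low', 'high', 'high', 'high']
```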
Multidimensional Arrays
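Understanding axes is crucial. In a 2D array, axis 0 runs down the rows (vertical) and axis 1 runs across the columns (horizontal). Reducing along an axis collapses it: np.sum(arr, axis=0) adds the rows together and returns one value per column.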
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
# Access element
arr[1, 2] # 6 (row 1, col 2)
# Slice rows 0-1 and columns 1-2
arr[0:2, 1:3] # [[2, 3],
              #  [5, 6]]
# Operations along axes
np.sum(arr, axis=0) # Sum columns: [12, 15, 18]
np.sum(arr, axis=1) # Sum rows: [6, 15, 24]
# Modify column
arr[:, 1] = 0 # Set column 1 to 0
3D Array Structure
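A 3D array adds one more axis in front of the rows and columns. The sketch below uses an illustrative array named `cube`, which is an assumption and not part of the original page.

```python
import numpy as np

# Shape (2, 3, 4): 2 blocks, each with 3 rows and 4 columns.
cube = np.arange(24).reshape(2, 3, 4)

print(cube.shape)        # (2, 3, 4)
print(cube[1, 2, 3])     # 23 (block 1, row 2, column 3)
print(cube[0].shape)     # (3, 4) (the first block)

# Summing along an axis removes that axis from the result's shape.
print(cube.sum(axis=0).shape)  # (3, 4)
print(cube.sum(axis=1).shape)  # (2, 4)
print(cube.sum(axis=2).shape)  # (2, 3)
```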
Data Types
NumPy arrays are homogeneous (same type). Choosing the right dtype optimizes memory and performance.
| Type | Description | Memory |
|---|---|---|
| int32 | Integer | 4 bytes |
| int64 | Integer (default on most platforms) | 8 bytes |
| float32 | Single precision | 4 bytes |
| float64 | Double precision (default) | 8 bytes |
| bool | Boolean | 1 byte |
# Specify dtype
arr = np.array([1, 2, 3], dtype=np.float32)
# Convert dtype
arr_int = arr.astype(np.int32)
# Check memory with .nbytes
arr_int64 = np.array([1, 2, 3], dtype=np.int64)
arr_int64.nbytes # 24 bytes
arr_int32 = np.array([1, 2, 3], dtype=np.int32)
arr_int32.nbytes # 12 bytes
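The trade-off behind choosing a dtype shows up more clearly at scale. This is a small sketch with illustrative names (`values`, `as_f64`, `as_f32`, `truncated`), showing the memory saving and two side effects of narrower types.

```python
import numpy as np

values = np.arange(1_000_000)

as_f64 = values.astype(np.float64)
as_f32 = values.astype(np.float32)
print(as_f64.nbytes)  # 8000000
print(as_f32.nbytes)  # 4000000

# Casting floats to integers truncates toward zero; it does not round.
truncated = np.array([1.7, -1.7, 2.5]).astype(np.int32)
print(truncated.tolist())  # [1, -1, 2]

# float32 has a 24-bit significand, so integers above 2**24 can collide.
same = np.float32(16_777_217) == np.float32(16_777_216)
print(bool(same))  # True
```

Halving the memory is often worthwhile for large arrays, provided the reduced precision is acceptable for the data.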
Broadcasting
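Broadcasting allows operations on arrays of different but compatible shapes without copying the smaller array. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array is then virtually "stretched" to match the larger one.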
arr1 = np.array([[1, 2, 3], [4, 5, 6]]) # Shape: (2, 3)
arr2 = np.array([10, 20, 30]) # Shape: (3,)
result = arr1 + arr2 # Broadcasting adds arr2 to each row
# [[11, 22, 33],
# [14, 25, 36]]
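Whether two shapes can be combined depends on comparing them from the right. The sketch below shows one compatible pair and one incompatible pair; the names `col`, `row`, `grid`, and `compatible` are illustrative.

```python
import numpy as np

col = np.array([[0], [10]])   # shape (2, 1)
row = np.array([1, 2, 3])     # shape (3,)

# Compared from the right: (2, 1) and (3,) broadcast to (2, 3).
grid = col + row
print(grid.shape)     # (2, 3)
print(grid.tolist())  # [[1, 2, 3], [11, 12, 13]]

# Dimensions must be equal or 1; (2, 3) and (2,) fail because 3 != 2.
try:
    np.ones((2, 3)) + np.ones(2)
    compatible = True
except ValueError:
    compatible = False
print(compatible)     # False
```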
Real-world: Data Normalization
data = np.array([[10, 20, 30],
                 [15, 25, 35],
                 [20, 30, 40]])
mean = data.mean(axis=0) # [15, 25, 35]
std = data.std(axis=0) # [4.08, 4.08, 4.08]
normalized = (data - mean) / std # Broadcasting!
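A quick way to check the normalization is to confirm that each column ends up with mean 0 and standard deviation 1. The sketch below also shows a row-wise variant, which is an added illustration and assumes `keepdims=True` is used to keep the statistics as a column.

```python
import numpy as np

data = np.array([[10, 20, 30],
                 [15, 25, 35],
                 [20, 30, 40]], dtype=np.float64)

normalized = (data - data.mean(axis=0)) / data.std(axis=0)

# Each column should now have mean ~0 and standard deviation ~1.
print(np.allclose(normalized.mean(axis=0), 0.0))  # True
print(np.allclose(normalized.std(axis=0), 1.0))   # True

# Row-wise normalization needs keepdims=True so the (3, 1) statistics
# broadcast across each row instead of down each column.
row_mean = data.mean(axis=1, keepdims=True)
row_std = data.std(axis=1, keepdims=True)
row_normalized = (data - row_mean) / row_std
print(row_mean.shape)                                 # (3, 1)
print(np.allclose(row_normalized.mean(axis=1), 0.0))  # True
```

With a square array like this one, omitting `keepdims=True` would not raise an error; the (3,) statistics would silently broadcast along the wrong axis.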
Mathematical Functions
| Function | Description |
|---|---|
| np.mean() | Average of elements |
| np.std() | Standard deviation |
| np.sum() | Sum of elements |
| np.min() / np.max() | Minimum / Maximum |
| np.argmin() / np.argmax() | Index of min/max |
| np.cumsum() | Cumulative sum |
arr = np.array([10, 20, 30, 40, 50])
np.mean(arr) # 30.0
np.std(arr) # 14.14...
np.percentile(arr, 50) # 30.0 (median)
np.corrcoef(arr, arr ** 2) # 2x2 correlation matrix of two 1D arrays
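The remaining functions from the list above can be sketched the same way. The arrays `arr` and `scores` below are illustrative values chosen so that positions and values differ.

```python
import numpy as np

arr = np.array([30, 10, 50, 20, 40])

print(int(arr.min()), int(arr.max()))              # 10 50
print(int(np.argmin(arr)), int(np.argmax(arr)))    # 1 2
print(np.cumsum(arr).tolist())                     # [30, 40, 90, 110, 150]

# Most of these functions accept an axis argument for 2D input.
scores = np.array([[1, 9, 3],
                   [7, 2, 8]])
print(np.argmax(scores, axis=1).tolist())  # [1, 2] (best column per row)
print(np.max(scores, axis=0).tolist())     # [7, 9, 8] (max of each column)
```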