ML by Learning Paradigm¶

A complementary lens to the data-domain view: the same dataset can be approached with different learning paradigms depending on what labels you have, what you're optimizing for, and how feedback is structured.

There are three fundamental paradigms (supervised, unsupervised, and reinforcement learning), plus modern variants that blur the lines.

Supervised Learning¶

Learn from labeled examples.

You provide both the input data and the correct answers. The algorithm learns to map inputs to outputs by minimizing its error on the provided examples. At inference time, it generalizes this mapping to unseen inputs.

Requires: labeled data (often expensive and time-consuming to collect)
Feedback signal: the difference between predicted output and ground truth label (loss function)

Structure¶

Input (X) + Label (Y)  →  Train model  →  Predict Y for new X
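
The train-then-predict flow above can be sketched with one-feature least-squares regression. This is a minimal illustration, not a recommended modeling approach: the `fit_line` and `predict` helpers and the square-footage data are hypothetical, and the prices were chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: input X (square footage), label Y (sale price).
sqft = [800, 1000, 1200, 1500, 2000]
price = [160_000, 200_000, 240_000, 300_000, 400_000]

model = fit_line(sqft, price)      # train on (X, Y)
print(predict(model, 1700))        # predict Y for a new X
```

Here the "loss" being minimized is squared error between predicted and true prices; other supervised methods swap in a different model and loss but keep the same fit/predict shape.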

Examples¶

Task	Input	Label
Email spam detection	Email text	Spam / not spam
X-ray diagnosis	Chest X-ray image	Healthy / pneumonia
House price prediction	Square footage, location, etc.	Sale price
Speech recognition	Audio waveform	Transcript

Sub-tasks¶

Classification — output is a discrete class (binary or multiclass)
Regression — output is a continuous value
Structured prediction — output is a structured object (sequence, tree, bounding box)
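
For the classification case, a rough sketch of logistic regression trained by gradient descent is shown below. The data (hours studied versus pass/fail), the learning rate, and the epoch count are all illustrative assumptions; a real project would normally use a library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical labeled data: hours studied (X) and pass/fail (Y).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(hours, passed)


def classify(x):
    """Discrete output: threshold the predicted probability at 0.5."""
    return int(sigmoid(w * x + b) >= 0.5)


print(classify(1.0), classify(4.0))
```

The only difference from regression is the output: the model produces a probability that is thresholded into a class, and the loss is log loss rather than squared error.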

Common Algorithms¶

Logistic / Linear Regression
Decision Trees, Random Forest, Gradient Boosting (XGBoost)
Support Vector Machines
Neural networks (especially when inputs are images, text, or audio)

Unsupervised Learning¶

Find structure without a teacher.

No labels — only raw input data. There is no "right answer" to optimize against. The algorithm discovers patterns, groupings, or compressed representations inherent in the data.

Requires: unlabeled data (cheap and abundant)
Feedback signal: none external; optimization targets internal criteria (cluster cohesion, reconstruction error, etc.)

Structure¶

Input (X) only  →  Train model  →  Discover structure in X

Core Tasks¶

Clustering¶

Group similar data points together without predefined categories.

Example: segment customers by purchasing behavior; the model discovers natural groups without being told what they are
Algorithms: k-Means, DBSCAN, Hierarchical clustering, Gaussian Mixture Models
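
As a sketch of how clustering works without labels, here is a bare-bones k-Means (Lloyd's algorithm) in plain Python. The 2D points stand in for hypothetical customer features, and the `kmeans` helper, its seeded random initialization, and the iteration cap are illustrative choices rather than a production implementation.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical customers: (orders per month, average basket size).
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

No group names are supplied anywhere; the two clusters emerge only from distances between points, and the cluster indices themselves are arbitrary.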

Dimensionality Reduction¶

Compress high-dimensional data into fewer dimensions while preserving structure. Used for visualization, denoising, and as preprocessing.

Example: compress 10,000-gene expression profiles to 2D for visualization
Algorithms: PCA, t-SNE, UMAP, Autoencoders
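
To illustrate the idea behind PCA, the sketch below finds the first principal component with power iteration and projects 2D points onto it. It assumes small, dense data and a starting vector that is not orthogonal to the top component; the helper names and sample points are hypothetical, and library PCA implementations use more robust decompositions.

```python
import math


def first_principal_component(rows, iters=100):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, v):
    """Reduce each row to a single coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]


# Hypothetical 2D data lying close to the line y = x.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
means, direction = first_principal_component(data)
coords = project(data, means, direction)
print(direction, coords)
```

Each point is now described by one number instead of two, and because the data is nearly one-dimensional, little information is lost.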

Anomaly Detection¶

Identify data points that don't fit the learned distribution of "normal."

Example: flag unusual network traffic without labeled attack examples
Algorithms: Isolation Forest, One-Class SVM, Autoencoders
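
The listed algorithms are too large to show inline, so the sketch below substitutes a much simpler statistical baseline: a robust z-score based on the median and median absolute deviation (MAD). The traffic numbers, the helper names, and the 3.5 cutoff (a common rule of thumb) are assumptions for illustration. The pattern is the same, though: learn a profile of "normal" from unlabeled data, then score how far each point deviates from it.

```python
import statistics


def fit_normal_profile(values):
    """Summarize 'normal' with a median and median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


def anomaly_score(profile, value):
    """Modified z-score: larger means further from the learned normal."""
    med, mad = profile
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return 0.6745 * abs(value - med) / mad


# Hypothetical requests-per-minute readings; no attack labels are provided.
traffic = [52, 48, 50, 51, 49, 50, 53, 47, 250, 50]
profile = fit_normal_profile(traffic)
flagged = [v for v in traffic if anomaly_score(profile, v) > 3.5]
print(flagged)
```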

Generative Modeling¶

Learn the data distribution and sample new data points from it.

Example: generate synthetic medical images for training data augmentation
Algorithms: VAEs, GANs, Diffusion models
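
In its simplest form, "learn the distribution and sample from it" can be shown by fitting a single Gaussian and drawing new values. This is a deliberately tiny stand-in for the models listed above, not a substitute for them; the height data and function names are hypothetical.

```python
import random
import statistics


def fit_gaussian(data):
    """Estimate the parameters of a 1D Gaussian from unlabeled samples."""
    return statistics.mean(data), statistics.stdev(data)


def sample(model, n, seed=0):
    """Draw n new points from the fitted distribution."""
    mu, sigma = model
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical observations (heights in cm).
heights_cm = [158.0, 162.5, 165.0, 167.5, 170.0, 171.5, 174.0, 178.0, 181.5]
model = fit_gaussian(heights_cm)
synthetic = sample(model, 1000)
print(model, synthetic[:3])
```

VAEs, GANs, and diffusion models follow the same two steps, but represent far more complex distributions (such as images) with neural networks.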

Reinforcement Learning¶

Learn through trial and error.

An agent takes actions in an environment and receives rewards for good actions and penalties for bad ones. Learning happens through repeated interaction rather than from a labeled dataset or static training set. The agent improves its policy (a mapping from states to actions) to maximize cumulative reward over time.

Requires: a defined environment with a reward signal
Feedback signal: scalar reward (often delayed, sparse, or noisy)

Structure¶

Agent observes state  →  Takes action  →  Environment returns reward + new state  →  Agent updates policy  →  Repeat

Key Concepts¶

Policy — the agent's strategy; maps states to actions
Reward — scalar signal indicating how good an action was
Value function — estimated cumulative future reward from a given state
Exploration vs. exploitation — balance between trying new actions and using what already works
Delayed reward — the consequences of an action may only become clear many steps later (e.g., a bad move in chess)
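
The concepts above can be seen together in a small tabular Q-learning sketch. The environment is a hypothetical five-state corridor where only reaching the rightmost state pays a reward, so the reward is delayed; the hyperparameters (`alpha`, `gamma`, `epsilon`) and episode count are illustrative assumptions.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # action-value table
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: random action
            else:
                best = max(q[state])  # exploit: best known action, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q = train()
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

The learned table `q` plays the role of the value function, the greedy choice over it is the policy, and `epsilon` controls the exploration vs. exploitation balance. The reward at the goal propagates backward through the discounted update, which is how early states learn that moving right pays off later.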

Examples¶

Domain	Agent	Actions	Reward
Games	Game-playing AI (AlphaGo, OpenAI Five)	Board moves / controller inputs	Win/loss, score
Robotics	Robot arm	Joint torques	Task completion, efficiency
LLM fine-tuning (RLHF)	Language model	Token generation	Score from a reward model trained on human preferences
Ad bidding	Bidding system	Bid amount	Revenue from click/conversion
Drug discovery	Molecule generator	Add/remove atoms	Predicted binding affinity

Common Algorithms¶

Algorithm	Notes
Q-Learning / DQN	Learn value of each action in each state; Deep Q-Network extends with neural nets
Policy Gradient (REINFORCE)	Directly optimize the policy; high variance
PPO (Proximal Policy Optimization)	Clipped policy-gradient updates for stability; widely used, including for RLHF
SAC (Soft Actor-Critic)	Off-policy; sample efficient; strong in continuous action spaces (robotics)
AlphaZero / MuZero	Combine RL with Monte Carlo tree search; superhuman game-playing

Modern Variants¶

The three paradigms above are foundational but increasingly blended in practice.

Semi-Supervised Learning¶

A small amount of labeled data + a large amount of unlabeled data.

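Labeling is expensive; unlabeled data is cheap. The model trains on both: the small labeled set supplies the supervised signal, while the unlabeled data shapes the representation or decision boundary (for example through pseudo-labels or consistency constraints).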

Example: you have 200 labeled X-rays and 50,000 unlabeled scans — semi-supervised learning uses all 50,200 images
Techniques: pseudo-labeling, consistency regularization, label propagation
Tools: FixMatch, MixMatch, graph-based methods
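
Pseudo-labeling, the first technique listed, can be sketched as a loop: train on the labeled set, label the unlabeled points the model is confident about, add them to the training set, and retrain. The nearest-centroid "model", the distance-ratio confidence measure, the 0.6 threshold, and the data below are all simplifying assumptions made for illustration; FixMatch and MixMatch are considerably more involved.

```python
import math


def fit_centroids(points, labels):
    """Nearest-centroid 'model': one mean point per class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def predict_with_confidence(model, p):
    """Return (label, confidence), with confidence = 1 - d_nearest / d_second."""
    ranked = sorted((math.dist(p, c), y) for y, c in model.items())
    (d1, y1), (d2, _) = ranked[0], ranked[1]
    return y1, (1.0 - d1 / d2 if d2 > 0 else 0.0)


def pseudo_label(labeled, labels, unlabeled, threshold=0.6, rounds=3):
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        model = fit_centroids(points, ys)
        remaining = []
        for p in pool:
            y, confidence = predict_with_confidence(model, p)
            if confidence >= threshold:
                points.append(p)  # accept the model's own prediction as a label
                ys.append(y)
            else:
                remaining.append(p)  # too ambiguous; leave unlabeled
        if len(remaining) == len(pool):
            break
        pool = remaining
    return fit_centroids(points, ys), pool


# Hypothetical data: two labeled points, five unlabeled ones.
labeled, labels = [(1, 1), (9, 9)], ["a", "b"]
unlabeled = [(1.5, 1.2), (2, 2), (8, 8.5), (9.5, 9), (5, 5.2)]
model, still_unlabeled = pseudo_label(labeled, labels, unlabeled)
print(model, still_unlabeled)
```

The confidence threshold matters: without it, early mistakes would be fed back in as training labels and reinforced.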

Self-Supervised Learning¶

The model creates its own labels from the structure of the data.

No human annotation required. The model is given a pretext task derived from the data itself — predicting a masked portion, predicting the next token, predicting if two image patches are from the same image. Solving this forces the model to learn rich, generalizable representations.

This is how modern LLMs are pretrained: given a sequence of text, predict the next token. The "label" is just the next token in the corpus, so no human annotation is needed.
Vision foundation models are trained in a similar way: mask patches of an image and predict the missing pixels (MAE), or learn that two augmented views of the same image should have similar embeddings (SimCLR, DINO).

Model	Pretext task	Learned representation used for
GPT / LLaMA	Predict next token	Text generation, reasoning, classification
BERT	Predict masked tokens (MLM)	Text classification, NER, QA
MAE	Predict masked image patches	Image classification, detection, segmentation
SimCLR / DINO	Agreement between augmented views (contrastive in SimCLR, self-distillation in DINO)	Visual features without labels
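
The next-token pretext task can be made concrete with a few lines of Python that turn raw text into (context, target) pairs, then fit a toy bigram counter on them. Whitespace splitting stands in for a real tokenizer, and the bigram counter stands in for a neural language model; both are assumptions made to keep the sketch short.

```python
from collections import Counter, defaultdict


def next_token_pairs(tokens, context_size=3):
    """Build (context, target) pairs; the target is simply the next token."""
    return [
        (tokens[i - context_size:i], tokens[i])
        for i in range(context_size, len(tokens))
    ]


def train_bigram(tokens):
    """Count which token follows which: a tiny stand-in for a language model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    followers = counts.get(token)
    return followers.most_common(1)[0][0] if followers else None


corpus = "the cat sat on the mat and the cat slept"
tokens = corpus.split()  # naive whitespace tokenization (assumption)

pairs = next_token_pairs(tokens)
model = train_bigram(tokens)
print(pairs[0])
print(predict_next(model, "the"))
```

Every training label here was read directly off the corpus, which is what lets this style of training scale to very large unlabeled text collections.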

Transfer Learning¶

Not a paradigm on its own, but the dominant practical workflow: pretrain a model on a large task with abundant data (often self-supervised), then fine-tune on a small task-specific labeled dataset.

Self-supervised pretraining → supervised fine-tuning
Underlies most modern NLP and computer vision systems

Summary¶

Paradigm	Labeled data	Feedback signal	Typical use
Supervised	Yes (required)	Ground truth label	Classification, regression, detection
Unsupervised	No	Internal criteria	Clustering, dimensionality reduction, generation
Reinforcement	No (uses rewards)	Environment reward	Games, robotics, LLM alignment (RLHF)
Semi-supervised	Small amount	Labels + unlabeled structure	Low-label medical, vision tasks
Self-supervised	No (self-generated)	Pretext task error	LLM pretraining, vision foundation models