
ML by Data Domain

The type of data you're working with largely determines which ML techniques are appropriate. This is one of the most practical mental models for choosing an approach.


Tabular Data

Rows and columns — spreadsheets, relational databases, CSVs.

Tree-based ensemble methods dominate and routinely outperform deep learning on tabular benchmarks.

Algorithms

Method | Notes
XGBoost / LightGBM / CatBoost | Go-to for most tabular tasks; handles mixed types, missing values, and non-linearities well
Random Forest | Strong baseline; robust to overfitting via bagging
Linear models (Ridge, Lasso, Logistic Regression) | Fast, interpretable; good when relationships are approximately linear
SVMs | Effective on smaller datasets with clear decision boundaries
k-NN | Distance-based; sensitive to scale and dimensionality
Neural nets (TabNet, FT-Transformer) | Narrowing the gap with tree methods but still rarely win outright
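The note that k-NN is sensitive to scale can be made concrete with a small sketch. It uses plain Python rather than a library, and the rows (income, age) and helper names are hypothetical. On raw values the income column dominates the distance; after standardization the age column counts equally.

```python
import math

def nearest(query, points):
    """Return the index of the point closest to query (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical rows: (annual income in dollars, age in years)
rows = [
    [50_000, 25],  # query
    [50_500, 60],  # similar income, very different age
    [54_000, 26],  # similar age, somewhat different income
    [90_000, 45],
    [30_000, 35],
]
query, candidates = rows[0], rows[1:]
raw_match = nearest(query, candidates)            # income dominates the distance

scaled = standardize(rows)
scaled_match = nearest(scaled[0], scaled[1:])     # both columns now comparable
```

Here the raw nearest neighbor is the 60-year-old with a similar income, while the standardized nearest neighbor is the 26-year-old. Tree-based methods split on one feature at a time, so they do not need this rescaling step.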

Common Tasks

  • Classification — predict a category (churn, fraud, diagnosis)
  • Regression — predict a continuous value (price, demand, risk score)
  • Ranking — order items by relevance (search, recommendations)
  • Anomaly detection — flag outliers (fraud, sensor faults)

Image Data & Computer Vision

Teaching machines to understand and interpret images and video.

Pixel grids — photos, medical scans, satellite imagery, video frames.

Convolutional Neural Networks (CNNs) have been the dominant architecture since 2012; Vision Transformers (ViTs) are now competitive at scale. Computer vision is almost exclusively a deep learning domain.

Core Tasks

Classification

Assign a label to an entire image.

  • Input: image → Output: class label
  • Example: X-ray image → "healthy" or "pneumonia" — model learns visual patterns associated with pathology
  • Tools: ResNet, EfficientNet, ViT

Object Detection

Locate and classify multiple objects within an image. Returns bounding boxes + labels.

  • Input: image → Output: bounding boxes + class + confidence score
  • Example: detect all lesions in a scan and mark their locations
  • Tools: YOLO (real-time, single-stage), DETR (transformer-based), Faster R-CNN (two-stage; slower, historically more accurate than early single-stage detectors)
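A detector's output, as described above, is a list of boxes with a class and a confidence score. The sketch below shows one common post-processing step, dropping low-confidence detections. The `Detection` class, the box coordinates, and the 0.5 threshold are illustrative assumptions, not the output format of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels
    label: str
    score: float  # confidence in [0, 1]

def keep_confident(detections, threshold=0.5):
    """Drop detections below the threshold and sort the rest by confidence."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)

# Hypothetical raw detector output for one scan
raw = [
    Detection((12, 30, 48, 70), "lesion", 0.91),
    Detection((100, 40, 130, 66), "lesion", 0.35),
    Detection((60, 15, 85, 44), "lesion", 0.67),
]
confident = keep_confident(raw, threshold=0.5)
```

The threshold trades missed objects against false alarms, so in practice it is tuned per application rather than fixed at 0.5.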

Segmentation

Classify at the pixel level — the most granular form of understanding.

Two variants:

  • Semantic segmentation — label every pixel with a class (no distinction between instances)
  • Instance segmentation — label every pixel and distinguish separate instances of the same class

  • Example: MRI scan → mask showing exactly which pixels (voxels, for 3D volumes) are tumor tissue
  • Tools: U-Net (dominant in medical imaging), SAM (Segment Anything Model, zero-shot), Mask R-CNN
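The difference between a semantic mask and an instance mask can be shown on a toy image. The sketch below derives instance ids from a semantic mask by labeling connected blobs. This is only an illustration of the two output formats, assuming instances do not touch; models such as Mask R-CNN predict instances directly and can separate touching objects. The mask values and the `label_instances` helper are hypothetical.

```python
from collections import deque

def label_instances(semantic_mask, target_class):
    """Assign a distinct positive id to each 4-connected blob of target_class pixels."""
    height, width = len(semantic_mask), len(semantic_mask[0])
    instance_mask = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic_mask[r][c] != target_class or instance_mask[r][c]:
                continue
            next_id += 1
            instance_mask[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over one blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_mask[ny][nx] == target_class
                            and instance_mask[ny][nx] == 0):
                        instance_mask[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_mask, next_id

# Semantic mask: 0 = background, 1 = tumor (hypothetical 4x6 image)
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
instances, count = label_instances(semantic, target_class=1)
```

The semantic mask says only "tumor" or "background" per pixel; the instance mask additionally records that there are two separate regions.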

Algorithms & Tools

Method | Type | Notes
ResNet / EfficientNet / ConvNeXt | Classification | CNN backbones; pretrain on ImageNet, fine-tune on domain data
ViT / Swin Transformer | Classification | Patch-based attention; strong at scale
YOLOv8 / YOLOv9 | Detection | Fast enough for real-time; good balance of speed and accuracy
Faster R-CNN | Detection | Two-stage; higher accuracy, slower
DETR | Detection | Transformer-based; no hand-crafted anchors
U-Net | Segmentation | Encoder-decoder with skip connections; standard in medical imaging
SAM (Segment Anything) | Segmentation | Foundation model; prompt-based, zero-shot capable
Mask R-CNN | Instance seg | Extends Faster R-CNN with pixel masks per instance

Frameworks

  • PyTorch — dominant in research and medical imaging
  • TorchVision — pretrained models, datasets, transforms
  • Hugging Face Transformers — ViT, DETR, SAM
  • Ultralytics (YOLO) — detection and segmentation pipelines
  • MONAI — PyTorch-based framework specifically for medical image analysis

Key Concepts

  • Transfer learning — pretrain on large dataset (ImageNet), fine-tune on small domain-specific dataset (e.g., 500 X-rays); critical when labeled medical data is scarce
  • Data augmentation — random flips, rotations, brightness shifts to artificially expand training data
  • Backbone vs. head — backbone extracts features (ResNet, ViT); task-specific head performs classification/detection/segmentation on top
  • Anchor boxes — predefined bounding box shapes used in older detection models (early YOLO versions, Faster R-CNN); DETR and anchor-free detectors such as YOLOv8 eliminate these
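Data augmentation, as described in the list above, can be sketched on a tiny grayscale image stored as nested lists. The function names, the 50% flip probability, and the brightness range of ±30 are arbitrary choices for illustration; real pipelines usually rely on a library such as TorchVision's transforms.

```python
import random

def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def shift_brightness(image, delta):
    """Add delta to every pixel and clamp to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def augment(image, rng):
    """Apply a random flip and a random brightness shift."""
    if rng.random() < 0.5:
        image = hflip(image)
    return shift_brightness(image, rng.randint(-30, 30))

image = [
    [10, 50, 200],
    [20, 60, 250],
]
rng = random.Random(0)  # seeded so the run is repeatable
variants = [augment(image, rng) for _ in range(4)]
```

Each call yields a slightly different version of the same labeled image, which is how a small dataset is stretched during training.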

Text / NLP

Teaching machines to understand, process, and generate human language.

Sequences of tokens — documents, records, conversations, code.

Transformers are now the dominant architecture for NLP. The core idea is self-attention: the model learns which words in a sequence are relevant to each other, regardless of distance.
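A minimal sketch of scaled dot-product self-attention follows. It is deliberately simplified: queries, keys, and values are all the raw token vectors, whereas a real transformer applies learned projections, multiple heads, and positional information. The token vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens."""
    dim = len(tokens[0])
    weights, outputs = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)])
    return weights, outputs

# Hypothetical 2-d vectors for a four-token sequence; tokens 0 and 3 are similar.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.9, 0.1]]
weights, outputs = self_attention(tokens)
```

In `weights[0]`, token 0 gives more weight to the distant but similar token 3 than to its immediate neighbor token 1, which is the "regardless of distance" property in miniature.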

Real-World Examples

  • Google Translate — sequence-to-sequence transformer maps tokens in one language to another
  • Medical records analysis — extract diagnoses, medications, dosages, and patient history from unstructured clinical notes; enables downstream analytics without manual chart review
  • Email spam filtering — text classification
  • Legal document review — NER + classification to flag clauses, entities, obligations

Core Tasks

Task | What it does | Example
Classification | Assign label to text | Sentiment, spam, ICD code from notes
NER | Extract named entities | Pull drug names, dosages, dates from records
Translation | Map sequence → sequence in another language | Google Translate
Summarization | Compress long text | Summarize a patient's discharge notes
Question answering | Extract or generate answer given context | "What medications is this patient on?"
Embeddings / retrieval | Represent text as dense vectors for similarity search | Semantic search, RAG pipelines

Algorithms & Tools

Method | Notes
BERT / RoBERTa | Encoder-only transformer; strong for classification, NER, extractive QA
GPT / LLaMA | Decoder-only; generative tasks — summarization, translation, chat
T5 / BART | Encoder-decoder; seq2seq tasks — translation, summarization
Sentence-BERT (SBERT) | Fine-tuned for sentence embeddings; semantic similarity and retrieval
TF-IDF + logistic regression | Fast baseline; works well when labeled data is limited
RNNs / LSTMs | Pre-transformer sequential models; largely superseded

Frameworks

  • Hugging Face Transformers — standard library; access to thousands of pretrained models
  • spaCy — production NLP: tokenization, NER, dependency parsing
  • LangChain / LlamaIndex — LLM orchestration, RAG pipelines
  • NLTK — classical NLP utilities; preprocessing

Key Concepts

  • Tokenization — text is split into subword tokens before being fed to the model
  • Pretraining + fine-tuning — pretrain on massive corpus (Wikipedia, PubMed); fine-tune on small labeled dataset; essential when domain-specific labeled data is scarce (e.g., clinical NLP)
  • Embeddings — words and sentences mapped to vectors in high-dimensional space; similar meanings → nearby vectors
  • RAG (Retrieval-Augmented Generation) — retrieve relevant documents at inference time and feed them as context to the model; grounds outputs in real data
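The embedding and RAG entries above can be tied together in a small retrieval sketch. As a loud simplification, the "embedding" here is a bag-of-words count vector, which only matches exact words; a real system would use a learned model such as SBERT. The documents, the query, and the prompt layout are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical note snippets
documents = [
    "patient prescribed metformin 500 mg twice daily",
    "follow-up appointment scheduled in six weeks",
    "no known drug allergies reported",
]
question = "which drug was the patient prescribed"
context = retrieve(question, documents, k=1)
# The retrieved text is placed in front of the question before calling a model.
prompt = f"Context: {context[0]}\nQuestion: {question}"
```

Only the retrieval half of RAG is shown; the generation step would pass `prompt` to a language model.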

Time Series

Data where time and order matter — you can't shuffle the rows without destroying information.

Ordered numerical sequences — stock prices, website traffic, sensor readings, vital signs, energy consumption.

A mixed domain: classical statistical models remain competitive; deep learning wins when patterns are complex or data is abundant.

Real-World Examples

  • Stock prices — forecast next-day closing price; detect anomalous trading activity
  • Website traffic — predict hourly sessions to auto-scale infrastructure
  • Patient vitals — detect early deterioration from continuous ICU monitor readings
  • Energy demand — forecast grid load to optimize power generation

The Time Ordering Problem

Standard ML assumes i.i.d. (independent, identically distributed) samples — time series violates this. Key challenges:

  • Temporal leakage — accidentally training on future data; train/test splits must be chronological, never random
  • Non-stationarity — statistical properties (mean, variance) shift over time
  • Seasonality — patterns that repeat at fixed intervals (daily, weekly, annual)
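A chronological split that avoids temporal leakage might look like the following sketch. The function name, the `(date, value)` row layout, and the 40% test fraction are assumptions for illustration.

```python
def chronological_split(rows, test_fraction=0.2):
    """Sort by timestamp and hold out the most recent rows as the test set."""
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical (ISO date, value) observations, deliberately out of order
rows = [
    ("2024-01-03", 12.0), ("2024-01-01", 10.0), ("2024-01-05", 15.0),
    ("2024-01-02", 11.0), ("2024-01-04", 13.0),
]
train, test = chronological_split(rows, test_fraction=0.4)
```

Every training row precedes every test row, so the model is never evaluated on data older than what it was trained on. ISO-formatted dates sort correctly as strings, which is why no date parsing is needed here.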

Approaches

Classical Statistics

Best when data is limited, interpretability matters, or patterns are well-understood.

Method | Notes
ARIMA | Models trend + autocorrelation; requires stationarity
SARIMA | Extends ARIMA with explicit seasonality term
Exponential Smoothing (ETS) | Weighted average of past observations; simple and robust
Prophet (Facebook) | Additive model for business time series; handles holidays and seasonality
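As an illustration of the "weighted average of past observations" idea, here is simple exponential smoothing, the most basic member of the ETS family (no trend or seasonality terms). The series and the smoothing factor `alpha=0.5` are made up.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level = alpha * value + (1 - alpha) * previous level."""
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

series = [10.0, 12.0, 11.0, 13.0]
levels = exponential_smoothing(series, alpha=0.5)
forecast = levels[-1]  # flat forecast for every future step
```

A larger `alpha` reacts faster to recent values; a smaller one smooths more heavily. Older observations receive geometrically decaying weight.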

Tree Methods + Feature Engineering

Often the best-performing approach on benchmarks where datasets are small to medium-sized.

  • Compute lag features (value at t-1, t-7, t-30), rolling statistics (7-day mean, std), and calendar features (day of week, month)
  • Feed into XGBoost / LightGBM as standard tabular ML
  • Interpretable, fast, handles missing data well
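The feature-engineering step listed above can be sketched as follows. The function name, the chosen lags, and the assumption that the series is daily and starts on day-of-week 0 are illustrative; the resulting rows would then be passed to a gradient-boosting library.

```python
def make_features(values, lags=(1, 7), window=7):
    """Build one feature row per time step that has enough history."""
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(values)):
        row = {f"lag_{lag}": values[t - lag] for lag in lags}
        history = values[t - window:t]  # strictly before t, so no leakage
        row[f"rolling_mean_{window}"] = sum(history) / window
        row["day_of_week"] = t % 7  # assumes daily data starting on day 0
        row["target"] = values[t]
        rows.append(row)
    return rows

values = [float(v) for v in range(1, 11)]  # hypothetical daily series 1..10
rows = make_features(values)
```

Note that the rolling window ends at `t - 1`; including the value at `t` itself would leak the target into the features.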

Deep Learning

Preferred when patterns are complex, multivariate, or you have large volumes of data.

Method | Notes
LSTMs / GRUs | Recurrent nets; capture sequential dependencies; slower to train
Temporal CNNs (TCN) | Dilated convolutions over time; parallelizable, faster than RNNs
N-BEATS / N-HiTS | Purpose-built neural forecasters; strong on univariate benchmarks
Temporal Fusion Transformer (TFT) | Attention + LSTM; strong on multivariate, handles static metadata
PatchTST | Treats time series patches as tokens; strong long-horizon performance

Common Tasks

  • Forecasting — predict future values (next hour, day, quarter)
  • Anomaly detection — flag unusual patterns in logs, sensor streams, transactions
  • Classification — label segments (normal vs. arrhythmia in ECG)
  • Imputation — fill missing values in incomplete series

Audio

Waveforms or spectrograms — speech, music, environmental sound.

Raw waveforms can be processed directly or converted to 2D spectrograms and treated like images.
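The waveform-to-spectrogram conversion can be sketched with a naive short-time Fourier transform. This version is intentionally minimal: it uses a slow direct DFT, no window function, and no mel filter bank, all of which a real audio pipeline would handle differently. The test signal (a 1 kHz tone at an 8 kHz sample rate) and the frame sizes are assumptions.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the discrete Fourier transform for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(waveform, frame_size=64, hop=32):
    """Slice the waveform into overlapping frames; one spectrum per frame."""
    return [
        magnitude_spectrum(waveform[start:start + frame_size])
        for start in range(0, len(waveform) - frame_size + 1, hop)
    ]

# Hypothetical signal: a 1 kHz tone sampled at 8 kHz
sample_rate, tone_hz, frame_size = 8000, 1000, 64
waveform = [math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(256)]
spec = spectrogram(waveform, frame_size=frame_size)  # rows = time frames, columns = frequency bins
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_size
```

The result is a 2D grid (time by frequency), which is what allows image-style models to be applied to audio.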

Algorithms

Method | Notes
CNNs on mel-spectrograms | Convert audio → 2D image, apply image techniques
Wav2Vec 2.0 / HuBERT | Self-supervised pretraining on raw waveforms; strong for ASR
Whisper | Transformer-based ASR; robust across languages and accents
RNNs / CTC | Older ASR pipelines

Common Tasks

  • Speech recognition (ASR) — speech to text
  • Speaker identification — who is speaking
  • Sound classification — music genre, environmental sound
  • Audio generation — text-to-speech, music synthesis

Graph Data

Nodes connected by edges — social networks, molecules, knowledge graphs.

Graph Neural Networks (GNNs) are purpose-built for this structure; they propagate information across edges.
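Propagating information across edges can be illustrated with one round of mean aggregation. This is a stripped-down sketch: actual GNN layers also apply learned weight matrices and a nonlinearity after aggregating. The three-node graph and feature values are hypothetical.

```python
def propagate(features, edges):
    """One round of mean aggregation over each node and its neighbors."""
    neighbors = {node: {node} for node in features}  # include a self-loop
    for a, b in edges:  # undirected edges
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][d] for n in group) / len(group) for d in range(dim)]
        for node, group in neighbors.items()
    }

# Hypothetical path graph A - B - C with one feature per node
features = {"A": [1.0], "B": [0.0], "C": [0.0]}
edges = [("A", "B"), ("B", "C")]
after_one = propagate(features, edges)
after_two = propagate(after_one, edges)
```

After one round, node C still sees nothing of A's signal because they are two hops apart; after two rounds it does. Stacking layers is how a GNN widens each node's view of the graph.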

Algorithms

Method | Notes
GCN (Graph Convolutional Network) | Spectral-based convolution over graph
GraphSAGE | Inductive; samples and aggregates neighbor features
GAT (Graph Attention Network) | Attention weights over neighbors
GIN (Graph Isomorphism Network) | Theoretically expressive for graph classification

Common Tasks

  • Node classification — label each node (user type, protein function)
  • Link prediction — will two nodes connect (friend recommendation, drug interaction)
  • Graph classification — label whole graphs (molecule property prediction)

Video

Frames over time — combines spatial (image) and temporal structure.

Algorithms

Method | Notes
3D CNNs (C3D, I3D) | Extend 2D convolutions into the time dimension
Two-stream networks | Separate spatial and optical-flow streams
Video Transformers (TimeSformer, VideoMAE) | Self-attention across space and time
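To show what extending a convolution into the time dimension means, here is a naive 3D convolution over a tiny video stored as nested lists (time, height, width). As in deep learning frameworks, it is technically a cross-correlation with no padding. The video and the hand-written temporal-difference kernel are illustrative; a 3D CNN would learn its kernels from data.

```python
def conv3d(video, kernel):
    """Valid (no padding) 3D convolution over time, height, and width."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(video) - kt + 1
    out_h = len(video[0]) - kh + 1
    out_w = len(video[0][0]) - kw + 1
    return [[[
        sum(
            video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
            for dt in range(kt) for dy in range(kh) for dx in range(kw)
        )
        for x in range(out_w)] for y in range(out_h)] for t in range(out_t)]

# Hypothetical 3-frame, 2x2 video: a bright pixel moves right, then stays put.
video = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 1], [0, 0]],
]
# Kernel spanning 2 frames and 1x1 pixels: next frame minus current frame.
kernel = [[[-1]], [[1]]]
motion = conv3d(video, kernel)
```

The first output frame is non-zero where the pixel moved and the second is all zeros because nothing changed, so a kernel that spans time can respond to motion, which a per-frame 2D convolution cannot.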

Common Tasks

  • Action recognition — classify what's happening in a clip
  • Object tracking — follow entities across frames
  • Video generation — text-to-video (Sora, Runway)

Quick Reference

Domain | Primary architecture | Classic baseline
Tabular | Gradient boosting (XGBoost) | Linear model
Images | CNN / ViT | Logistic regression on pixels
Text | Transformer (BERT/GPT) | TF-IDF + logistic regression
Time series | Tree methods + lag features | ARIMA
Audio | CNN on spectrogram / Wav2Vec | MFCC + SVM
Graphs | GNN (GAT, GraphSAGE) | Node features + tree methods
Video | Video Transformer / 3D CNN | Frame-level CNN + pooling