ML by Data Domain¶
The type of data you're working with largely determines which ML techniques are appropriate. This is one of the most practical mental models for choosing an approach.
Tabular Data¶
Rows and columns — spreadsheets, relational databases, CSVs.
Tree-based ensemble methods dominate and routinely outperform deep learning on tabular benchmarks.
Algorithms¶
| Method | Notes |
|---|---|
| XGBoost / LightGBM / CatBoost | Go-to for most tabular tasks; handles mixed types, missing values, and non-linearities well |
| Random Forest | Strong baseline; robust to overfitting via bagging |
| Linear models (Ridge, Lasso, Logistic Regression) | Fast, interpretable; good when relationships are approximately linear |
| SVMs | Effective on smaller datasets with clear decision boundaries |
| k-NN | Distance-based; sensitive to scale and dimensionality |
| Neural nets (TabNet, FT-Transformer) | Narrowing the gap with tree methods but still rarely win outright |
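The k-NN row's note about scale sensitivity can be made concrete with a small sketch. The customer rows, the feature choice (age, income), and the helper names `standardize` and `nearest_neighbor` are illustrative assumptions, not taken from any dataset or library. Without rescaling, the income column dominates the Euclidean distance simply because its numbers are larger.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def nearest_neighbor(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    query = rows[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: math.dist(query, rows[i]))

# Hypothetical customers: [age in years, annual income in dollars]
customers = [
    [25, 50_000],  # 0: the query row
    [26, 58_000],  # 1: similar age, somewhat different income
    [60, 50_500],  # 2: very different age, nearly identical income
    [45, 90_000],  # 3
]

raw_match = nearest_neighbor(customers, 0)                  # driven by income alone
scaled_match = nearest_neighbor(standardize(customers), 0)  # both features count
```

On the raw values the 60-year-old (row 2) is the nearest neighbor; after standardization the 26-year-old (row 1) is. Tree-based methods split on one feature at a time, so they do not need this rescaling step.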
Common Tasks¶
- Classification — predict a category (churn, fraud, diagnosis)
- Regression — predict a continuous value (price, demand, risk score)
- Ranking — order items by relevance (search, recommendations)
- Anomaly detection — flag outliers (fraud, sensor faults)
Image Data & Computer Vision¶
Teaching machines to understand and interpret images and video.
Pixel grids — photos, medical scans, satellite imagery, video frames.
Convolutional Neural Networks (CNNs) have been the dominant architecture since 2012; Vision Transformers (ViTs) are now competitive at scale. Computer vision is almost exclusively a deep learning domain.
Core Tasks¶
Classification¶
Assign a label to an entire image.
- Input: image → Output: class label
- Example: X-ray image → "healthy" or "pneumonia" — model learns visual patterns associated with pathology
- Tools: ResNet, EfficientNet, ViT
Object Detection¶
Locate and classify multiple objects within an image. Returns bounding boxes + labels.
- Input: image → Output: bounding boxes + class + confidence score
- Example: detect all lesions in a scan and mark their locations
- Tools: YOLO (real-time, single-stage), DETR (transformer-based), Faster R-CNN (two-stage; slower, but often more accurate)
Segmentation¶
Classify at the pixel level — the most granular form of understanding.
Two variants:
- Semantic segmentation — label every pixel with a class (no distinction between instances)
- Instance segmentation — label every pixel and distinguish separate instances of the same class
- Example: MRI scan → mask showing exactly which pixels (voxels, in a 3D volume) are tumor tissue
- Tools: U-Net (dominant in medical imaging), SAM (Segment Anything Model, zero-shot), Mask R-CNN
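To illustrate the difference between the two variants, the sketch below takes a tiny semantic mask (one class id per pixel) and derives an instance mask by giving each connected blob its own id. The function name `instances_from_semantic` and the toy mask are assumptions for illustration. Connected-component labeling is only a stand-in here: learned instance segmentation models such as Mask R-CNN can also separate objects that touch, which this approach cannot.

```python
def instances_from_semantic(mask, target_class):
    """Give each 4-connected region of target_class a distinct id (1, 2, ...).

    Returns (instance_mask, region_count). Pixels of other classes stay 0.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != target_class or out[r][c]:
                continue  # wrong class, or already assigned to a region
            next_id += 1
            out[r][c] = next_id
            stack = [(r, c)]
            while stack:  # iterative flood fill over the region
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == target_class and not out[ny][nx]):
                        out[ny][nx] = next_id
                        stack.append((ny, nx))
    return out, next_id

# Toy semantic mask: 0 = background, 1 = lesion (hypothetical labels)
semantic = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
instance_mask, lesion_count = instances_from_semantic(semantic, target_class=1)
```

The semantic mask only says "these pixels are lesion"; the instance mask additionally says there are two separate lesions and which pixels belong to each.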
Algorithms & Tools¶
| Method | Type | Notes |
|---|---|---|
| ResNet / EfficientNet / ConvNeXt | Classification | CNN backbones; pretrain on ImageNet, fine-tune on domain data |
| ViT / Swin Transformer | Classification | Patch-based attention; strong at scale |
| YOLOv8 / YOLOv9 | Detection | Fast enough for real-time; good balance of speed and accuracy |
| Faster R-CNN | Detection | Two-stage; higher accuracy, slower |
| DETR | Detection | Transformer-based; no hand-crafted anchors |
| U-Net | Segmentation | Encoder-decoder with skip connections; standard in medical imaging |
| SAM (Segment Anything) | Segmentation | Foundation model; prompt-based, zero-shot capable |
| Mask R-CNN | Instance seg | Extends Faster R-CNN with pixel masks per instance |
Frameworks¶
- PyTorch — dominant in research and medical imaging
- TorchVision — pretrained models, datasets, transforms
- Hugging Face Transformers — ViT, DETR, SAM
- Ultralytics (YOLO) — detection and segmentation pipelines
- MONAI — PyTorch-based framework specifically for medical image analysis
Key Concepts¶
- Transfer learning — pretrain on large dataset (ImageNet), fine-tune on small domain-specific dataset (e.g., 500 X-rays); critical when labeled medical data is scarce
- Data augmentation — random flips, rotations, brightness shifts to artificially expand training data
- Backbone vs. head — backbone extracts features (ResNet, ViT); task-specific head performs classification/detection/segmentation on top
- Anchor boxes — predefined bounding box shapes used by many earlier detectors (Faster R-CNN, YOLOv2 through YOLOv5); DETR and newer anchor-free detectors such as YOLOv8 eliminate these
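The data augmentation idea from the list above can be sketched on a grayscale image stored as a nested list of 0 to 255 pixel values. The function names and the parameter ranges (flip probability 0.5, brightness shift of up to 30) are arbitrary assumptions for illustration. In practice a library such as TorchVision would do this on tensors.

```python
import random

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def shift_brightness(img, delta):
    """Add delta to every pixel, clamping to the valid 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def augment(img, rng):
    """Return a randomly transformed copy; the label of the image is unchanged."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        img = rotate90(img)
    return shift_brightness(img, rng.randint(-30, 30))

rng = random.Random(0)           # seeded so runs are repeatable
image = [[10, 20, 30],
         [40, 50, 60]]           # toy 2x3 "image"
variants = [augment(image, rng) for _ in range(4)]
```

Each call yields a slightly different training example from the same labeled image. Which transforms are safe depends on the domain: a horizontal flip is harmless for many photos but can be misleading where left and right carry meaning, as in some medical scans.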
Text / NLP¶
Teaching machines to understand, process, and generate human language.
Sequences of tokens — documents, records, conversations, code.
Transformers are now the dominant architecture for NLP. The core idea is self-attention: for each token, the model learns how much weight to give every other token in the sequence, regardless of how far apart they are.
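A minimal sketch of scaled dot-product self-attention follows, under a strong simplifying assumption: queries, keys, and values are all the raw token vectors, with no learned projection matrices, no multiple heads, and no positional encoding. The toy 2-dimensional token vectors are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """x is a list of token vectors. Returns (outputs, weights).

    weights[i][j] is how much token i attends to token j.
    Simplification: Q = K = V = x (a real layer learns separate projections).
    """
    d = len(x[0])
    weights = []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        weights.append(softmax(scores))
    outputs = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
               for row in weights]
    return outputs, weights

tokens = [
    [1.0, 0.0],  # position 0
    [0.0, 1.0],  # position 1
    [0.0, 1.0],  # position 2
    [1.0, 0.0],  # position 3: same content as position 0
]
outputs, weights = self_attention(tokens)
```

Position 0 gives position 3 the same weight as itself and more than its immediate neighbor at position 1, because the weights depend on content similarity and not on distance in the sequence.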
Real-World Examples¶
- Google Translate — sequence-to-sequence transformer maps tokens in one language to another
- Medical records analysis — extract diagnoses, medications, dosages, and patient history from unstructured clinical notes; enables downstream analytics without manual chart review
- Email spam filtering — text classification
- Legal document review — NER + classification to flag clauses, entities, obligations
Core Tasks¶
| Task | What it does | Example |
|---|---|---|
| Classification | Assign label to text | Sentiment, spam, ICD code from notes |
| NER | Extract named entities | Pull drug names, dosages, dates from records |
| Translation | Map sequence → sequence in another language | Google Translate |
| Summarization | Compress long text | Summarize a patient's discharge notes |
| Question answering | Extract or generate answer given context | "What medications is this patient on?" |
| Embeddings / retrieval | Represent text as dense vectors for similarity search | Semantic search, RAG pipelines |
Algorithms & Tools¶
| Method | Notes |
|---|---|
| BERT / RoBERTa | Encoder-only transformer; strong for classification, NER, extractive QA |
| GPT / LLaMA | Decoder-only; generative tasks — summarization, translation, chat |
| T5 / BART | Encoder-decoder; seq2seq tasks — translation, summarization |
| Sentence-BERT (SBERT) | Fine-tuned for sentence embeddings; semantic similarity and retrieval |
| TF-IDF + logistic regression | Fast baseline; works well when labeled data is limited |
| RNNs / LSTMs | Pre-transformer sequential models; largely superseded |
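The TF-IDF half of the baseline in the table can be sketched in a few lines. This uses one common textbook variant (term frequency times `log(N / df)`), whitespace tokenization, and three made-up documents, all of which are assumptions; library implementations typically add smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * log(N / docs containing term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical mini-corpus
docs = [
    "the prize is free",
    "the meeting is today",
    "claim the free prize",
]
vectors = tfidf(docs)
```

A word that appears in every document ("the") gets weight zero, while a word unique to one document ("meeting") gets the highest weight. The resulting sparse vectors are what a linear classifier such as logistic regression would be trained on.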
Frameworks¶
- Hugging Face Transformers — standard library; access to thousands of pretrained models
- spaCy — production NLP: tokenization, NER, dependency parsing
- LangChain / LlamaIndex — LLM orchestration, RAG pipelines
- NLTK — classical NLP utilities; preprocessing
Key Concepts¶
- Tokenization — text is split into subword tokens before being fed to the model
- Pretraining + fine-tuning — pretrain on massive corpus (Wikipedia, PubMed); fine-tune on small labeled dataset; essential when domain-specific labeled data is scarce (e.g., clinical NLP)
- Embeddings — words and sentences mapped to vectors in high-dimensional space; similar meanings → nearby vectors
- RAG (Retrieval-Augmented Generation) — retrieve relevant documents at inference time and feed them as context to the model; grounds outputs in real data
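The embeddings and RAG items above fit together as "embed, rank by similarity, paste the top hits into the prompt". The sketch below shows only the retrieval and prompt-assembly steps. The 3-dimensional vectors are hand-written stand-ins for what an embedding model such as SBERT would produce, and the passage texts, `retrieve`, and `build_prompt` are hypothetical names for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, doc_vecs, k=2):
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda text: cosine(query_vec, doc_vecs[text]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Place the retrieved passages ahead of the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Made-up embeddings; a real system would compute these with a model.
doc_vecs = {
    "Patient takes metformin 500 mg twice daily.": [0.9, 0.1, 0.0],
    "Discharge summary: stable, follow up in two weeks.": [0.5, 0.5, 0.1],
    "Visitor parking is available on level 2.": [0.0, 0.1, 0.9],
}
question = "What medications is this patient on?"
query_vec = [1.0, 0.2, 0.0]  # assumed embedding of the question

top_passages = retrieve(query_vec, doc_vecs, k=2)
prompt = build_prompt(question, top_passages)
```

The assembled `prompt` is what would be sent to a generative model. The parking notice never reaches the model because its vector points in a different direction from the query.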
Time Series¶
Data where time and order matter — you can't shuffle the rows without destroying information.
Ordered numerical sequences — stock prices, website traffic, sensor readings, vital signs, energy consumption.
A mixed domain: classical statistical models remain competitive; deep learning wins when patterns are complex or data is abundant.
Real-World Examples¶
- Stock prices — forecast next-day closing price; detect anomalous trading activity
- Website traffic — predict hourly sessions to auto-scale infrastructure
- Patient vitals — detect early deterioration from continuous ICU monitor readings
- Energy demand — forecast grid load to optimize power generation
The Time Ordering Problem¶
Standard ML assumes i.i.d. (independent, identically distributed) samples — time series violates this. Key challenges:
- Temporal leakage — accidentally training on future data; train/test splits must be chronological, never random
- Non-stationarity — statistical properties (mean, variance) shift over time
- Seasonality — patterns that repeat at fixed intervals (daily, weekly, annual)
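The rule that splits must be chronological can be sketched as follows. The function name, the `(timestamp, value)` row layout, and the 20% test fraction are assumptions for illustration.

```python
def chronological_split(rows, test_fraction=0.2, time_key=lambda row: row[0]):
    """Sort rows by time, then hold out the most recent slice as the test set.

    Every training row is older than every test row, so the model never
    sees the future during training.
    """
    ordered = sorted(rows, key=time_key)
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical (day_index, value) rows, deliberately out of order
rows = [(3, 13.0), (0, 10.0), (7, 17.0), (1, 11.0), (9, 19.0),
        (4, 14.0), (2, 12.0), (8, 18.0), (5, 15.0), (6, 16.0)]
train, test = chronological_split(rows, test_fraction=0.2)
```

A random shuffle would scatter late observations into the training set, which usually makes test scores look better than the model will perform on genuinely unseen future data.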
Approaches¶
Classical Statistics¶
Best when data is limited, interpretability matters, or patterns are well-understood.
| Method | Notes |
|---|---|
| ARIMA | Models trend (via differencing) + autocorrelation; the differenced series must be stationary |
| SARIMA | Extends ARIMA with explicit seasonality term |
| Exponential Smoothing (ETS) | Weighted average of past observations; simple and robust |
| Prophet (Facebook) | Additive model for business time series; handles holidays and seasonality |
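The "weighted average of past observations" behind exponential smoothing can be shown with its simplest form, simple exponential smoothing, which has a level but no trend or seasonal component. The function name and the sample numbers are assumptions for illustration; the full ETS family adds trend and seasonality terms on top of this recurrence.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1)
    Higher alpha weights recent observations more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialize the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

demand = [10.0, 12.0, 14.0]  # made-up observations
forecast = simple_exponential_smoothing(demand, alpha=0.5)  # 12.5
```

With `alpha=1.0` the forecast is just the last observation; as `alpha` shrinks, older values carry more of the weight and the forecast reacts more slowly.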
Tree Methods + Feature Engineering¶
Often the best-performing approach on forecasting benchmarks when the available history is modest in size.
- Compute lag features (value at t-1, t-7, t-30), rolling statistics (7-day mean, std), and calendar features (day of week, month)
- Feed into XGBoost / LightGBM as standard tabular ML
- Interpretable, fast, handles missing data well
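The feature-engineering step described above can be sketched for a daily series. The function name, the choice of lags 1 and 7, the 7-day window, and the start date are assumptions for illustration. The rolling window deliberately stops before day `t`, so no feature contains the value being predicted.

```python
from datetime import date, timedelta

def make_features(values, start_date, lags=(1, 7), window=7):
    """Turn a daily series into tabular rows of lag, rolling, and calendar features."""
    first = max(max(lags), window)  # earliest day with a full history
    rows = []
    for t in range(first, len(values)):
        past = values[t - window:t]  # strictly before day t: no leakage
        mean = sum(past) / window
        std = (sum((v - mean) ** 2 for v in past) / window) ** 0.5
        row = {f"lag_{k}": values[t - k] for k in lags}
        row.update(
            roll_mean=mean,
            roll_std=std,
            day_of_week=(start_date + timedelta(days=t)).weekday(),  # 0 = Monday
            target=values[t],
        )
        rows.append(row)
    return rows

series = [float(v) for v in range(10)]            # toy series: 0, 1, ..., 9
rows = make_features(series, start_date=date(2024, 1, 1))
```

Each dictionary is one training row; everything except `target` is an input column that could be passed to a gradient-boosting model like any other tabular data.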
Deep Learning¶
Preferred when patterns are complex, multivariate, or you have large volumes of data.
| Method | Notes |
|---|---|
| LSTMs / GRUs | Recurrent nets; capture sequential dependencies; slower to train |
| Temporal CNNs (TCN) | Dilated convolutions over time; parallelizable, faster than RNNs |
| N-BEATS / N-HiTS | Purpose-built neural forecasters; strong on univariate benchmarks |
| Temporal Fusion Transformer (TFT) | Attention + LSTM; strong on multivariate, handles static metadata |
| PatchTST | Treats time series patches as tokens; strong long-horizon performance |
Common Tasks¶
- Forecasting — predict future values (next hour, day, quarter)
- Anomaly detection — flag unusual patterns in logs, sensor streams, transactions
- Classification — label segments (normal vs. arrhythmia in ECG)
- Imputation — fill missing values in incomplete series
Audio¶
Waveforms or spectrograms — speech, music, environmental sound.
Raw waveforms can be processed directly or converted to 2D spectrograms and treated like images.
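A rough sketch of the waveform-to-spectrogram conversion follows: slice the signal into overlapping frames and take the magnitude of each frame's discrete Fourier transform. It uses a naive DFT for clarity and omits the window function and mel filterbank that a real pipeline would apply; the sample rate, frame size, hop, and test tone are all assumptions for illustration.

```python
import cmath
import math

def spectrogram(signal, frame_size, hop):
    """Return a list of frames, each a list of magnitudes per frequency bin.

    Bin k corresponds to frequency k * sample_rate / frame_size.
    Naive O(n^2) DFT per frame; real code would use an FFT library.
    """
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames

sample_rate = 64   # samples per second (tiny, to keep the example fast)
tone_hz = 8        # a pure 8 Hz sine wave, two seconds long
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(128)]

spec = spectrogram(signal, frame_size=32, hop=16)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

The result is a 2D grid (time frames by frequency bins), which is why image models apply to it. Here each bin spans 2 Hz, so the 8 Hz tone shows up as a bright line at bin 4 in every frame.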
Algorithms¶
| Method | Notes |
|---|---|
| CNNs on mel-spectrograms | Convert audio → 2D image, apply image techniques |
| Wav2Vec 2.0 / HuBERT | Self-supervised pretraining on raw waveforms; strong for ASR |
| Whisper | Transformer-based ASR; robust across languages and accents |
| RNNs / CTC | Older ASR pipelines |
Common Tasks¶
- Speech recognition (ASR) — speech to text
- Speaker identification — who is speaking
- Sound classification — music genre, environmental sound
- Audio generation — text-to-speech, music synthesis
Graph Data¶
Nodes connected by edges — social networks, molecules, knowledge graphs.
Graph Neural Networks (GNNs) are purpose-built for this structure; they propagate information across edges.
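The phrase "propagate information across edges" can be sketched as one round of neighbor averaging. This leaves out the learned weight matrices and nonlinearities that a real GNN layer applies, and the three-node graph and the function name `mean_aggregate` are assumptions for illustration.

```python
def mean_aggregate(features, edges):
    """One message-passing round: each node averages itself with its neighbors.

    features: {node: feature vector}; edges: undirected (u, v) pairs.
    """
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated

# Path graph a - b - c; only node a starts with a non-zero signal.
features = {"a": [1.0], "b": [0.0], "c": [0.0]}
edges = [("a", "b"), ("b", "c")]

after_one = mean_aggregate(features, edges)
after_two = mean_aggregate(after_one, edges)
```

After one round node `c` is still zero because it is two hops from `a`; after two rounds some of `a`'s signal has reached it. Stacking k layers lets each node see its k-hop neighborhood.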
Algorithms¶
| Method | Notes |
|---|---|
| GCN (Graph Convolutional Network) | Convolution derived from spectral graph theory; averages normalized neighbor features |
| GraphSAGE | Inductive; samples and aggregates neighbor features |
| GAT (Graph Attention Network) | Attention weights over neighbors |
| GIN (Graph Isomorphism Network) | As expressive as the Weisfeiler-Lehman isomorphism test; well suited to graph classification |
Common Tasks¶
- Node classification — label each node (user type, protein function)
- Link prediction — will two nodes connect (friend recommendation, drug interaction)
- Graph classification — label whole graphs (molecule property prediction)
Video¶
Frames over time — combines spatial (image) and temporal structure.
Algorithms¶
| Method | Notes |
|---|---|
| 3D CNNs (C3D, I3D) | Extend 2D convolutions into the time dimension |
| Two-stream networks | Separate spatial and optical-flow streams |
| Video Transformers (TimeSformer, VideoMAE) | Self-attention across space and time |
Common Tasks¶
- Action recognition — classify what's happening in a clip
- Object tracking — follow entities across frames
- Video generation — text-to-video (Sora, Runway)
Quick Reference¶
| Domain | Primary architecture | Classic baseline |
|---|---|---|
| Tabular | Gradient boosting (XGBoost) | Linear model |
| Images | CNN / ViT | Logistic regression on pixels |
| Text | Transformer (BERT/GPT) | TF-IDF + logistic regression |
| Time series | Tree methods + lag features | ARIMA |
| Audio | CNN on spectrogram / Wav2Vec | MFCC + SVM |
| Graphs | GNN (GAT, GraphSAGE) | Node features + tree methods |
| Video | Video Transformer / 3D CNN | Frame-level CNN + pooling |