Machine Learning & Deep Learning

Algorithms, architectures, and workflows for supervised, unsupervised, and deep learning in scientific applications

Overview

Machine learning (ML) encompasses algorithms that learn patterns from data without explicit programming. Supervised learning (regression, classification) maps inputs to known outputs; unsupervised learning (clustering, dimensionality reduction) discovers hidden structure; deep learning uses multi-layer neural networks to learn hierarchical representations. In ocean science, ML powers satellite image classification, species distribution modeling, ocean state forecasting, and anomaly detection. Key Python frameworks include scikit-learn, TensorFlow, PyTorch, and XGBoost.

Category                    | Methods                                   | Ocean Science Application
Supervised – Regression     | Linear, SVR, RF, Gradient Boosting, MLP   | SST prediction, Chl-a retrieval
Supervised – Classification | SVM, RF, XGBoost, CNN                     | Habitat mapping, species ID
Unsupervised – Clustering   | K-Means, DBSCAN, GMM, SOM                 | Water mass classification, regime detection
Deep Learning – CNN         | ResNet, U-Net, VGG                        | Semantic segmentation of satellite imagery
Deep Learning – RNN/LSTM    | LSTM, GRU, ConvLSTM                       | Time series forecasting, surge prediction
Deep Learning – Transformer | Vision Transformer, GPT-style             | Foundation models for Earth observation
Physics-Informed            | PINNs, Neural ODEs                        | Constrained ocean dynamics learning

Core Concepts

Bias-Variance Tradeoff

Model error = Bias² + Variance + Irreducible noise. Simple models (high bias) underfit; complex models (high variance) overfit. Cross-validation, regularization (L1/L2), and ensemble methods balance this tradeoff.

MSE = Bias²(f̂) + Var(f̂) + σ²
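The tradeoff can be seen directly by fitting models of increasing complexity to noisy data. Below is a minimal sketch using NumPy polynomial fits to a noisy sine signal; the degrees, noise level, and sample size are illustrative choices, not prescribed by any particular study.

```python
# Sketch: bias-variance tradeoff via polynomial fits to a noisy sine.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y_true = np.sin(2 * np.pi * x)                  # underlying signal
y = y_true + rng.normal(0.0, 0.3, size=x.size)  # noisy observations

def fit_mse(degree):
    """Fit a degree-d polynomial to noisy y; score against the clean signal."""
    coefs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coefs, x)
    return np.mean((y_hat - y_true) ** 2)

errs = {d: fit_mse(d) for d in (1, 5, 15)}
# Degree 1 underfits (high bias); degree 15 chases noise (high variance);
# an intermediate degree tracks the signal best.
```

In practice, cross-validated error rather than error against a known clean signal is used to locate the sweet spot, since the true function is unavailable.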

Neural Network Architecture

A neural network is a composition of linear transformations with nonlinear activations. Each layer computes a linear transformation z = Wx + b followed by a nonlinear activation a = σ(z). Backpropagation computes gradients for weight updates via the chain rule. Dropout, batch normalization, and skip connections improve training.
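A forward pass through this composition takes only a few lines of NumPy. The sketch below uses illustrative layer sizes (3 inputs, 8 hidden units, 1 output) and tanh/identity activations; the random weights stand in for trained parameters.

```python
# Minimal sketch of a forward pass through a two-layer MLP in NumPy.
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)   # hidden layer: 3 -> 8
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # output layer: 8 -> 1

def forward(x):
    z1 = W1 @ x + b1      # linear transformation
    a1 = np.tanh(z1)      # nonlinear activation
    z2 = W2 @ a1 + b2     # output layer (identity activation)
    return z2

y = forward(np.array([0.1, -0.5, 2.0]))
```

Backpropagation would then apply the chain rule through these same operations in reverse to obtain gradients of a loss with respect to W1, b1, W2, b2.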

Convolutional Neural Networks

CNNs use learnable spatial filters (kernels) to extract local features from images. Pooling reduces spatial dimensions; stacked layers learn hierarchical features. U-Net is popular for semantic segmentation of satellite/sonar imagery.
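The core operation is straightforward to write out. The sketch below applies one fixed filter (a Sobel edge detector, standing in for a learned kernel) to a tiny synthetic image; in a real CNN the kernel values are learned and many filters are stacked.

```python
# Sketch: one convolutional filter extracting a local spatial feature.
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                        # vertical edge at column 3
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
fmap = conv2d(image, sobel_x)             # strong response along the edge
```

The feature map responds only where the window straddles the edge, which is exactly the locality that makes stacked convolutions build hierarchical features.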

Recurrent & Sequence Models

LSTMs and GRUs process sequential data with gating mechanisms that control information flow, mitigating the vanishing gradient problem that plagues plain RNNs. ConvLSTM combines spatial and temporal learning for spatiotemporal forecasting (e.g., SST fields).
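The gating idea can be sketched with a simplified GRU-style update: a gate z in (0, 1) blends the previous hidden state with a candidate, so information (and gradients) can pass through largely unchanged when z is small. This is a minimal illustration, assuming random untrained weights and omitting the GRU's reset gate.

```python
# Sketch: simplified GRU-style gated update (reset gate omitted).
import numpy as np

rng = np.random.default_rng(1)
H, D = 4, 3                                # hidden size, input size
Wz, Uz = rng.normal(scale=0.1, size=(H, D)), rng.normal(scale=0.1, size=(H, H))
Wh, Uh = rng.normal(scale=0.1, size=(H, D)), rng.normal(scale=0.1, size=(H, H))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x):
    z = sigmoid(Wz @ x + Uz @ h)           # update gate in (0, 1)
    h_cand = np.tanh(Wh @ x + Uh @ h)      # candidate state
    return (1 - z) * h + z * h_cand        # gated blend of old and new

h = np.zeros(H)
for x in rng.normal(size=(5, D)):          # run over a short sequence
    h = gru_step(h, x)
```

Because the update is a convex combination, the state stays bounded and the gate learns when to overwrite versus preserve memory.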

Ensemble Methods

Random Forest (bagging decision trees) and Gradient Boosting (XGBoost, LightGBM) combine many weak learners into a powerful predictor. Feature importance measures aid interpretability.
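A short scikit-learn sketch shows both pieces: an ensemble fit and its feature importances. The synthetic regression task below (one dominant predictor, one weak one, two irrelevant ones) is an illustrative stand-in for something like SST prediction from environmental covariates.

```python
# Sketch: Random Forest regression with feature importances (scikit-learn).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_   # sum to 1; higher = more used
```

Impurity-based importances like these are convenient but can be biased toward high-cardinality features; permutation importance is a common cross-check.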

Hyperparameter Tuning

Grid search, random search, and Bayesian optimization (Optuna, Hyperopt) find optimal hyperparameters. Nested cross-validation prevents information leakage between tuning and evaluation.
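The nested scheme is simple to express in scikit-learn: an inner GridSearchCV handles tuning, and an outer cross_val_score evaluates the tuned estimator on folds the search never saw. The grid values, fold counts, and Ridge model below are illustrative choices.

```python
# Sketch: nested cross-validation with scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Inner loop: tune the regularization strength on each training split.
inner = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=3)

# Outer loop: score the whole tune-then-fit procedure on held-out folds.
outer_scores = cross_val_score(inner, X, y, cv=5)
mean_r2 = outer_scores.mean()
```

Reporting the inner search's best score instead of the outer scores is the leakage the paragraph warns about: the same folds would then be used for both tuning and evaluation.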

Interactive Visualizations

[Interactive figures omitted: learning curves (bias vs. variance); confusion matrix for habitat classification; training and validation loss over epochs.]

Key References

  1. Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. MIT Press.
  2. Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
  3. LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.
  4. Reichstein, M. et al. (2019). Deep learning and process understanding for data-driven Earth system science. Nature, 566, 195–204.
  5. Chen, T. & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. KDD, 785–794.