Classical Machine Learning in Oceanography

Traditional ML algorithms remain powerful workhorses for oceanographic data — especially when interpretability and efficiency matter. I routinely apply these methods to satellite-derived and in situ datasets.

Ensemble Methods

  • Random Forest (RF) — water quality parameter estimation, land cover classification, species distribution modeling
  • XGBoost / Gradient Boosting — hypoxia prediction in the Gulf of Mexico, Chl-a estimation
  • Extra Trees, AdaBoost — feature importance ranking in multi-sensor datasets
Random Forest XGBoost Gradient Boosting
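
Feature importance ranking does not depend on the model family. The sketch below shows permutation importance in plain NumPy: shuffle one predictor at a time and measure how much the prediction error grows. To stay dependency-light it swaps the tree ensemble for an ordinary least-squares fit. The data and function names are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in mean squared error when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link with y but keeps its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((predict(X_perm) - y) ** 2) - base_mse
    return scores / n_repeats


# Hypothetical data: y depends strongly on feature 0, weakly on 1, and not at all on 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Stand-in model: ordinary least squares with an intercept (NOT a Random Forest).
design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)


def predict(Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef


importance = permutation_importance(predict, X, y)
ranking = np.argsort(importance)[::-1]  # most important feature first
print(importance.round(3), ranking)
```

With scikit-learn installed, `sklearn.inspection.permutation_importance` applies the same procedure to a fitted `RandomForestRegressor` or gradient boosting model.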

Regression & Classification

  • Support Vector Machine (SVM) — remote sensing image classification, benthic habitat mapping
  • Gaussian Process Regression (GPR) — spatiotemporal interpolation of ocean variables
  • Ridge / Lasso / Elastic Net — multivariate water quality analysis
  • K-Nearest Neighbors, Naive Bayes — baseline comparisons for classification tasks
SVM GPR Regression
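
GPR interpolates by conditioning a Gaussian prior over smooth fields on the observed samples, and it returns an uncertainty estimate along with each prediction. Below is a minimal NumPy sketch with an RBF kernel and fixed, assumed hyperparameters. The station layout, the field, and the helper names are synthetic and illustrative.

```python
# Illustrative sketch: helper names, hyperparameters, and data are hypothetical.
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between point sets a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and pointwise variance of a zero-mean GP with fixed hyperparameters."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # diagonal of the posterior covariance
    return mean, var


def smooth_field(points):
    """Hypothetical smooth 'ocean variable' used only to generate synthetic samples."""
    return np.sin(points[:, 0]) + np.cos(points[:, 1])


rng = np.random.default_rng(0)
stations = rng.uniform(0.0, 5.0, size=(40, 2))  # hypothetical sampling locations
obs = smooth_field(stations)
targets = np.array([[2.5, 2.5], [50.0, 50.0]])  # inside vs. far outside the sampled area
mean, var = gp_predict(stations, obs, targets)
print(mean.round(3), var.round(3))
```

Far from the stations, the prediction falls back to the prior (mean 0, full variance). In practice the length scale, variance, and noise would be fitted by maximizing the marginal likelihood; scikit-learn's `GaussianProcessRegressor` does this. Spatiotemporal data usually also needs separate spatial and temporal length scales.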

Clustering & Dimensionality Reduction

  • K-Means, DBSCAN — oceanographic regime identification, eddy detection
  • PCA / t-SNE / UMAP — high-dimensional satellite data exploration
  • Hierarchical Clustering — ecological community analysis (PRIMER methods)
PCA K-Means UMAP
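
Regime identification with K-Means groups standardized variables into clusters around centroids. The sketch below implements Lloyd's algorithm in NumPy. It uses farthest-point initialization, a simple deterministic alternative to k-means++. The two "water masses" and the function names are hypothetical.

```python
# Illustrative sketch: function names and data are hypothetical.
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Next initial centroid: the point farthest from all centroids chosen so far.
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


rng = np.random.default_rng(1)
warm_fresh = rng.normal([29.0, 20.0], [0.5, 1.0], size=(100, 2))  # hypothetical SST (°C), salinity
cool_salty = rng.normal([22.0, 35.0], [0.5, 0.5], size=(100, 2))
X = np.vstack([warm_fresh, cool_salty])
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so neither variable dominates distance
labels, centroids = kmeans(Z, k=2)
print(np.bincount(labels))
```

Standardizing first matters here. Without it, salinity's larger numeric spread would dominate the Euclidean distance.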

Key Applications

  • Chlorophyll-a & water quality estimation from Sentinel-2 / Landsat imagery
  • Hypoxia prediction in the Gulf of Mexico using multi-variable ocean data
  • Land use / land cover change modeling (CA-ANN, RF, SVM)
  • Species distribution modeling — MaxEnt, BRT, RF
  • Discharge–Chl-a relationship analysis in coastal systems
Water Quality Land Cover SDM
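
One widely used spectral predictor for Chl-a in turbid coastal water is the Normalized Difference Chlorophyll Index (NDCI). It contrasts red-edge reflectance (about 705 nm, Sentinel-2 band B5) with red reflectance (about 665 nm, band B4). The sketch below builds a small feature matrix that a regressor such as RF or XGBoost could consume. A few assumptions apply:

- The reflectances are made up.
- The feature choice is illustrative, not the pipeline used in the projects.
- B5 (20 m) is assumed to be already resampled to the 10 m grid of B3 and B4.

```python
# Illustrative sketch: function names, feature choice, and reflectances are hypothetical.
import numpy as np


def ndci(red_edge, red, eps=1e-9):
    """NDCI = (R_rededge - R_red) / (R_rededge + R_red); NaN where the sum is ~0."""
    red_edge = np.asarray(red_edge, dtype=float)
    red = np.asarray(red, dtype=float)
    total = red_edge + red
    out = np.full(total.shape, np.nan)
    valid = np.abs(total) > eps
    out[valid] = (red_edge[valid] - red[valid]) / total[valid]
    return out


def feature_matrix(b3, b4, b5):
    """Stack per-pixel predictors for a regressor; drop pixels with an undefined NDCI."""
    b3, b4, b5 = (np.asarray(b, dtype=float).ravel() for b in (b3, b4, b5))
    features = np.column_stack([b3, b4, b5, ndci(b5, b4)])
    return features[~np.isnan(features).any(axis=1)]


# Made-up surface reflectances for three pixels (B3 green, B4 red, B5 red edge).
b3 = [0.040, 0.050, 0.0]
b4 = [0.020, 0.030, 0.0]
b5 = [0.030, 0.030, 0.0]
print(ndci(b5, b4))
X = feature_matrix(b3, b4, b5)
print(X.shape)
```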

Deep Learning

Deep neural networks learn features directly from raw inputs instead of relying on hand-crafted predictors. This makes them well suited to the scale and complexity of satellite imagery, time series, and autonomous sensor data in oceanographic research.

CNN

Convolutional Neural Networks (CNN)

Image classification, object detection, semantic segmentation

I have designed and trained custom CNNs and fine-tuned pre-trained architectures for oceanographic and ecological tasks. Key work includes:

  • Imaging FlowCytobot (IFC) plankton classification — CNN pipeline for automated phytoplankton species identification in the Mississippi Sound
  • Marine mammal & shark detection — VGG-19 + custom CNN on static imagery
  • Cattle behavior classification — VGG-19, custom CNN on HPC/supercomputer (MSU GRI × USDA internship, 2023)
  • Satellite image segmentation — land cover, surface water dynamics, delta mapping
VGG-19 ResNet Custom CNN PyTorch
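
The core CNN operations are small enough to write by hand. This NumPy sketch applies three of them to a toy image with a vertical edge:

- a valid 2-D cross-correlation (what deep learning frameworks call convolution),
- a ReLU,
- 2×2 max pooling.

The function names are illustrative, and the kernel is fixed rather than learned.

```python
# Illustrative sketch: function names and the toy image are hypothetical.
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN layers actually compute."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (a trailing odd row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.zeros((6, 6))
image[:, 3:] = 1.0                           # dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)    # responds to left-to-right brightness increases
feature_map = relu(conv2d(image, kernel))    # strongest where the window straddles the edge
pooled = max_pool2x2(feature_map)
print(feature_map)
print(pooled)
```

In PyTorch the vectorized equivalents are `torch.nn.Conv2d`, `torch.nn.ReLU`, and `torch.nn.MaxPool2d`, with kernel weights learned by backpropagation.
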
RNN / LSTM

Recurrent Networks & Temporal Modeling

Time series forecasting, sequential ocean data

LSTMs and GRUs are well suited to oceanographic time series because their gated memory can capture long-range temporal dependencies in water quality records, tide gauge data, and climate indices.

  • Long-term water quality trend modeling in the Mississippi Sound
  • Sea surface temperature and chlorophyll-a forecasting
  • Discharge–productivity lag analysis in estuarine systems
LSTM GRU Time Series
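
The long-range memory comes from the LSTM's gated cell state. At each step the forget gate decides how much of the previous state to keep, and the input gate decides how much new information to write. Below is a NumPy sketch of a single-layer forward pass with random, untrained weights. Gates are stacked in the same [input, forget, cell, output] order that `torch.nn.LSTM` uses for its weights. The input series and function names are hypothetical.

```python
# Illustrative sketch: function names, weights, and the input series are hypothetical.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,), gates stacked [i, f, g, o]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate cell content
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def lstm_forward(xs, W, U, b, hidden_size):
    """Run the cell over a sequence xs of shape (T, D); return the final (h, c)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c


rng = np.random.default_rng(0)
D, H, T = 1, 8, 30
W = rng.normal(scale=0.3, size=(4 * H, D))
U = rng.normal(scale=0.3, size=(4 * H, H))
b = np.zeros(4 * H)
b[H:2 * H] = 1.0  # positive forget-gate bias, a common initialization heuristic
xs = np.sin(np.linspace(0, 4 * np.pi, T))[:, None]  # hypothetical daily anomaly series
h, c = lstm_forward(xs, W, U, b, H)
print(h.round(3))
```

For forecasting, the final hidden state would feed a linear layer that predicts the next value, and the weights would be trained on sliding windows of past observations.
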
Transformer

Transformers & Attention Mechanisms

Vision Transformers, multi-modal learning, sequence modeling

Transformers have redefined the state of the art in both NLP and computer vision. For Earth observation, Vision Transformers (ViT) and their derivatives support multi-scale, multi-temporal analysis of satellite imagery.

  • Vision Transformer (ViT) for satellite scene classification
  • Swin Transformer for high-resolution land cover mapping
  • Multi-temporal attention for change detection in coastal zones
  • Cross-modal fusion — combining optical + SAR + LiDAR data
ViT Swin Transformer Hugging Face
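
Two pieces make a Vision Transformer work on imagery:

1. Cutting the image into patch tokens.
2. Mixing those tokens with attention, softmax(QKᵀ/√d_k)V.

The NumPy sketch below shows both for a hypothetical 4-band image chip with random projection weights. The function names are illustrative. A real ViT also adds learned position embeddings, multiple heads, MLP blocks, and layer normalization.

```python
# Illustrative sketch: function names, image, and weights are hypothetical.
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens of shape (N, patch*patch*C)."""
    H, W, C = image.shape
    if H % patch or W % patch:
        raise ValueError("image size must be divisible by the patch size")
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V; also returns the attention weights."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 4))                 # hypothetical 4-band image chip
tokens = patchify(image, patch=8)               # 16 tokens, each 8*8*4 = 256 values
W_embed = rng.normal(scale=0.05, size=(tokens.shape[1], 32))
x = tokens @ W_embed                            # linear patch embedding to width 32
W_q, W_k, W_v = (rng.normal(scale=0.2, size=(32, 32)) for _ in range(3))
out, weights = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(tokens.shape, out.shape)
```

Each output token is a weighted mix of all patch tokens. This is how attention relates distant parts of a scene in a single layer.
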
GAN / VAE

Generative Models

Data augmentation, super-resolution, synthetic data

  • GANs — synthetic satellite image generation for data-scarce regions, super-resolution downscaling of ocean model outputs
  • Variational Autoencoders (VAE) — anomaly detection in ocean time series (HABs, hypoxia events)
  • Diffusion Models — exploratory use for climate downscaling applications
GAN VAE Super-Resolution
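
VAE-based anomaly detection trains the model on normal conditions and flags samples it reconstructs or explains poorly. The sketch below covers the parts specific to VAEs, namely the reparameterization trick and the KL term of the loss. It also adds a quantile threshold on anomaly scores. The encoder and decoder networks are omitted. The function names, the squared-error likelihood, and the 99th-percentile threshold are illustrative assumptions.

```python
# Illustrative sketch: function names, threshold, and numbers are hypothetical.
import numpy as np


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); keeps sampling differentiable in mu, log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def negative_elbo(x, x_recon, mu, log_var):
    """Loss = squared reconstruction error (fixed-variance Gaussian, up to scale) + KL."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)


def flag_anomalies(scores, reference_scores, quantile=0.99):
    """Flag samples scoring above a high quantile of scores seen on normal data."""
    threshold = np.quantile(reference_scores, quantile)
    return scores > threshold


rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0], [1.0, 0.0]])
log_var = np.zeros((2, 2))
kl = kl_to_standard_normal(mu, log_var)
z = reparameterize(mu, np.full((2, 2), -50.0), rng)  # tiny variance: z is essentially mu
normal_scores = np.arange(100.0)                     # hypothetical scores on non-event data
flags = flag_anomalies(np.array([50.0, 99.5]), normal_scores)
print(kl, flags)
```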

Earth & Geospatial Foundation Models

Foundation models, large models pre-trained on massive datasets and then fine-tuned for downstream tasks, are reshaping Earth observation and geoscience. They can support few-shot learning, zero-shot generalization, and general-purpose feature extraction across diverse remote sensing data.

Prithvi (NASA × IBM)

A geospatial foundation model pre-trained on 6 years of global Harmonized Landsat Sentinel-2 (HLS) data. Prithvi uses a masked autoencoder (MAE) architecture and supports multi-temporal, multi-spectral inputs.

  • Flood mapping, burn scar detection, crop segmentation
  • Fine-tuning for coastal land cover change in Bangladesh and the Gulf Coast
  • Multi-temporal change detection in estuarine systems
NASA / IBM HLS Data MAE Architecture HuggingFace
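
The masked autoencoder objective behind Prithvi hides most patch tokens and trains the network to reconstruct them. This NumPy sketch shows only the random masking step, following the original MAE recipe with a 75% mask ratio, and treats tokens as a flat list. Prithvi applies the idea to multi-temporal, multi-spectral patches. The function name and token sizes here are illustrative.

```python
# Illustrative sketch: function name and token sizes are hypothetical.
import numpy as np


def random_masking(tokens, mask_ratio=0.75, rng=None):
    """Keep a random subset of patch tokens for the encoder.

    Returns kept tokens, a mask (0 = visible, 1 = masked), and the inverse
    permutation that lets the decoder put tokens back in their original order.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    ids_shuffle = np.argsort(rng.random(n))  # a random permutation of token positions
    ids_restore = np.argsort(ids_shuffle)    # its inverse
    ids_keep = ids_shuffle[:n_keep]
    mask = np.ones(n)
    mask[ids_keep] = 0.0
    return tokens[ids_keep], mask, ids_restore


# Hypothetical 14 x 14 grid of patch tokens (196 tokens, 16 features each);
# every row holds its own index so it is easy to track.
tokens = np.repeat(np.arange(196.0)[:, None], 16, axis=1)
kept, mask, ids_restore = random_masking(tokens, mask_ratio=0.75, rng=np.random.default_rng(0))

# Decoder side: append placeholder mask tokens, then undo the shuffle.
mask_tokens = np.full((len(tokens) - len(kept), tokens.shape[1]), -1.0)
restored = np.concatenate([kept, mask_tokens])[ids_restore]
print(kept.shape, int(mask.sum()))
```

Only a quarter of the tokens pass through the encoder, which is a large part of why MAE-style pre-training is cheap enough for large satellite archives.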

SatMAE & Scale-MAE

Masked Autoencoders adapted for satellite imagery. SatMAE handles temporal and multi-spectral sequences; Scale-MAE leverages ground sampling distance (GSD) to achieve scale-aware feature learning.

  • Pre-training on Sentinel-1/2, Landsat, and Planet imagery
  • Transfer of pre-trained representations to ocean colour and water quality tasks via linear probing or fine-tuning
  • Scale-aware feature extraction across different sensor resolutions
SatMAE Scale-MAE Multi-Spectral
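
Scale-MAE's use of ground sampling distance can be illustrated with a sinusoidal positional encoding whose positions are scaled by the ratio of the image GSD to a reference GSD. Patches that cover the same ground distance then receive the same encoding across sensors. This is a simplified 1-D sketch of that idea, not a reproduction of the Scale-MAE implementation; the paper gives the exact 2-D form. The function name and the 10 m reference GSD are assumptions.

```python
# Illustrative sketch: function name and reference GSD are hypothetical.
import numpy as np


def gsd_positional_encoding(n_positions, dim, gsd, reference_gsd):
    """1-D sinusoidal encoding with positions scaled by gsd / reference_gsd."""
    if dim % 2:
        raise ValueError("dim must be even")
    # Coarser pixels cover more ground, so their positions advance faster.
    pos = np.arange(n_positions)[:, None] * (gsd / reference_gsd)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = pos * freqs[None, :]
    enc = np.zeros((n_positions, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


# Sentinel-2 10 m bands vs. Landsat 30 m multispectral bands, reference GSD 10 m.
enc_s2 = gsd_positional_encoding(16, 64, gsd=10.0, reference_gsd=10.0)
enc_ls = gsd_positional_encoding(16, 64, gsd=30.0, reference_gsd=10.0)
# Position 3 at 10 m and position 1 at 30 m both lie 30 m from the origin.
print(np.allclose(enc_s2[3], enc_ls[1]))
```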

Segment Anything Model (SAM)

Meta's SAM provides promptable, zero-shot image segmentation. Adapted for remote sensing (GeoSAM, SAM-Geo), it can greatly speed up annotation and segmentation of features in satellite imagery.

  • Automated delineation of water bodies, wetlands, and coastal features
  • Rapid mangrove and seagrass bed extraction from high-res imagery
  • Semi-automated training data generation for supervised models
  • SAM 2 — video/temporal segmentation for surface water dynamics time series
SAM / SAM 2 GeoSAM Zero-Shot

Clay Foundation Model

An open-source Earth observation foundation model trained on multi-sensor satellite data (Sentinel, Landsat, NAIP, LINZ). Clay uses a Vision Transformer backbone with metadata embeddings for time, location, and sensor.

  • Embeddings for downstream water quality and ocean colour tasks
  • Coastal change detection with minimal labeled data
  • Semantic similarity search across large satellite archives
Clay ViT Backbone Open Source
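
Similarity search over a satellite archive reduces to nearest-neighbor lookup on embedding vectors. Below is a NumPy sketch using cosine similarity over a random, hypothetical embedding matrix with one row per image chip. It assumes the embeddings have already been exported from a model such as Clay. The function name is illustrative.

```python
# Illustrative sketch: function name and embeddings are hypothetical.
import numpy as np


def cosine_top_k(query, archive, k=3):
    """Indices and scores of the k archive rows most similar to query (cosine similarity)."""
    q = query / np.linalg.norm(query)
    A = archive / np.linalg.norm(archive, axis=1, keepdims=True)
    sims = A @ q
    top = np.argsort(sims)[::-1][:k]  # highest similarity first
    return top, sims[top]


rng = np.random.default_rng(0)
archive = rng.normal(size=(1000, 64))               # hypothetical chip embeddings
query = archive[42] + rng.normal(scale=0.01, size=64)  # a slightly perturbed copy of chip 42
top, scores = cosine_top_k(query, archive, k=3)
print(top, scores.round(3))
```

For archives with millions of chips, an approximate nearest-neighbor index (for example FAISS) would replace the brute-force matrix product.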

Aurora & Pangu-Weather (Atmospheric FMs)

Large-scale atmospheric and climate foundation models that challenge traditional numerical weather prediction (NWP) models in medium-range forecasting. They are relevant to oceanography through ocean–atmosphere coupling.

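  • Aurora (Microsoft): 1.3B-parameter atmospheric foundation model trained on over a million hours of diverse weather and climate data (reanalysis, forecasts, climate simulations); reported to outperform leading numerical and deep learning baselines on 10-day high-resolution weather and 5-day global air pollution forecasts
  • Pangu-Weather (Huawei): 3D Earth-Specific Transformer for global medium-range weather forecasting
  • FourCastNet (NVIDIA): built on Adaptive Fourier Neural Operators (AFNO) for high-resolution global forecasting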
  • Application: coupled ocean–atmosphere boundary layer analysis
Aurora Pangu-Weather FourCastNet
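
FourCastNet's AFNO layers mix information in the Fourier domain, building on the Fourier Neural Operator. The central step is a spectral convolution:

1. Transform the signal.
2. Keep and reweight the lowest modes.
3. Transform back.

The sketch below shows this in NumPy on a periodic 1-D signal. The weights are fixed to one, so the layer acts as a low-pass filter, instead of being learned. A real layer works on multiple channels over a 2-D latitude-longitude grid. The function name is illustrative.

```python
# Illustrative sketch: function name, weights, and signal are hypothetical.
import numpy as np


def spectral_conv1d(u, weights, n_modes):
    """FNO-style spectral convolution on a periodic 1-D real signal.

    Multiplies the lowest n_modes Fourier coefficients by complex weights and
    zeroes the rest, then transforms back to physical space.
    """
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * weights
    return np.fft.irfft(out_hat, n=u.shape[0])


n = 128
x = np.arange(n) / n
u = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 20 * x)  # smooth signal + fine-scale noise
out = spectral_conv1d(u, weights=np.ones(4, dtype=complex), n_modes=4)
print(np.abs(out - np.sin(2 * np.pi * x)).max())
```

Because the multiplication happens on global Fourier modes, one layer couples every grid point with every other. This is the property that makes these operators attractive for planetary-scale fields.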

LLMs for Geoscience & Oceanography

Large Language Models and vision-language models (VLMs) are being adapted for scientific applications — from automated literature synthesis to multimodal satellite image Q&A.

  • GeoGPT / OceanGPT — domain-adapted LLMs for geoscience question answering
  • GPT-4o / Claude / Gemini — code generation, data analysis assistance, report drafting
  • CLIP / RemoteCLIP — zero-shot satellite image–text retrieval
  • LLaVA / InternVL — multimodal VLM for satellite image captioning and analysis
  • Retrieval-Augmented Generation (RAG) over scientific literature
OceanGPT RemoteCLIP RAG VLM
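
The retrieval half of RAG can be sketched with the standard library: rank passages against the question, then paste the best ones into the prompt sent to an LLM. This version uses TF-IDF with cosine similarity as a stand-in for learned text embeddings and a vector store. The toy passages, prompt wording, and function names are hypothetical, and the LLM call itself is omitted.

```python
# Illustrative sketch: function names, passages, and prompt wording are hypothetical.
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs):
    """Per-document TF-IDF vectors (as dicts) plus the smoothed IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, vectors, idf, k=2):
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(question, passages):
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the numbered sources below.\n\n{context}\n\nQuestion: {question}"


docs = [  # toy corpus, not real citations
    "Seasonal hypoxia in the northern Gulf of Mexico is driven by riverine nutrient loading and stratification.",
    "Sentinel-2 red-edge bands improve chlorophyll-a retrieval in turbid coastal waters.",
    "Mangrove loss in deltaic systems is linked to erosion and land use change.",
]
vectors, idf = build_tfidf(docs)
question = "What drives hypoxia in the Gulf of Mexico?"
prompt = build_prompt(question, retrieve(question, docs, vectors, idf, k=1))
print(prompt)
```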

Tools & Frameworks

Core

Scientific Python Stack

NumPy, Pandas, and SciPy for core numerical work; Xarray and Dask for large n-dimensional ocean datasets; Matplotlib, Seaborn, Plotly, and Cartopy for publication-quality visualization.

NumPy Xarray Dask Cartopy
ML/DL

Machine & Deep Learning Libraries

Scikit-learn, XGBoost, LightGBM for classical ML. PyTorch and TensorFlow/Keras for deep learning. Hugging Face transformers and timm for pre-trained model access and fine-tuning.

PyTorch TensorFlow Scikit-learn Hugging Face
Geo AI

Geospatial AI & Remote Sensing Tools

TorchGeo, Segment Geospatial (samgeo), GDAL/Rasterio, Google Earth Engine Python API, PySTAC for satellite data discovery and processing pipelines.

TorchGeo samgeo GEE Python PySTAC
HPC

High-Performance Computing

Experience running ML workloads on MSU's HPC cluster (Orion / Atlas supercomputers). SLURM job scheduling, multi-GPU training with PyTorch DDP, parallelization with Dask and Joblib.

SLURM Multi-GPU Dask PyTorch DDP
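
Multi-GPU training with PyTorch DDP runs one process per GPU, and each process must see a different slice of every epoch. The stdlib sketch below follows the partitioning scheme of `torch.utils.data.distributed.DistributedSampler` with `drop_last=False`:

- all ranks shuffle with a shared seed,
- indices are padded so every rank gets an equal count,
- the list is split in interleaved fashion.

It uses Python's `random` rather than PyTorch's generator, so the exact order differs. The function name is illustrative.

```python
# Illustrative sketch: function name is hypothetical; mirrors DistributedSampler's scheme.
import math
import random


def shard_indices(n_samples, world_size, rank, epoch=0, seed=0, shuffle=True):
    """Sample indices assigned to one data-parallel process for one epoch."""
    indices = list(range(n_samples))
    if shuffle:
        # Every rank uses the same seed, so all ranks agree on one global permutation.
        random.Random(seed + epoch).shuffle(indices)
    per_rank = math.ceil(n_samples / world_size)
    total = per_rank * world_size
    padding = total - n_samples
    # Repeat indices so every rank gets the same number of samples (keeps ranks in lockstep).
    indices += (indices * math.ceil(padding / n_samples))[:padding]
    # Interleaved split: rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total:world_size]


for r in range(4):
    print(r, shard_indices(10, world_size=4, rank=r, shuffle=False))
```

In real DDP code, calling `sampler.set_epoch(epoch)` at the start of each epoch plays the role of the `epoch` argument here. It changes the shuffle while keeping all ranks in agreement.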

Selected Projects & Publications

2025–2026

Ensemble ML for Water Quality in Mississippi Sound

GRI, Mississippi State University

Developed ensemble ML models (RF, XGBoost, SVR) fused with Landsat and Sentinel-2 imagery to map seasonal water quality dynamics in the Western Mississippi Sound. Achieved state-of-the-art accuracy for Chl-a, turbidity, and CDOM estimation.

Ensemble ML Landsat Water Quality
2025

CNN Pipeline for Imaging FlowCytobot Plankton Classification

GRI, Mississippi State University

Designed a full CNN-based workflow for automated classification of phytoplankton and harmful algal bloom species from the Imaging FlowCytobot (IFC) deployed in the Mississippi Sound. Integrated with real-time data pipelines.

CNN IFC HABs PyTorch
2025

Machine Learning for Hypoxia Prediction — Gulf of Mexico

Published in Regional Studies in Marine Science

Applied and compared multiple ML algorithms (RF, XGBoost, ANN, SVM) for spatial and temporal prediction of hypoxic zones in the northern Gulf of Mexico using satellite, buoy, and cruise data.

Hypoxia Gulf of Mexico XGBoost
2024

CA-ANN Modeling — Sundarbans Delta Change

Published in IEEE JSTARS

Used Cellular Automata coupled with Artificial Neural Networks to model and project land use / land cover changes and delta dynamics in the Sundarbans, Bangladesh — one of the world's largest mangrove systems.

CA-ANN Sundarbans IEEE
2023

Cattle Behavior Classification — HPC Internship

GRI × USDA, MSU Supercomputer

Classified cattle behavior from video and sensor data using VGG-19 and a custom CNN, trained on MSU's High-Performance Computing (HPC) cluster. Selected as one of eight interns from universities across the USA.

VGG-19 Custom CNN HPC

Useful Resources & Links

Learning & Courses