utils module

Utility functions for consistent and robust handling of satellite and oceanographic data.

This module provides a collection of helper routines used across the VisuSat package. These utilities include:

  • Safe opening of NetCDF files using several possible backends (safe_open_dataset),

  • Conversion of date-like objects and timestamp strings into ISO 8601 format (parse_isodate),

  • Detection of velocity component variable names in Copernicus Marine datasets (detect_velocity_vars),

  • Escaping of LaTeX-sensitive characters for Matplotlib labels (escape_latex),

  • Computation and display of basic statistical histograms for a geospatial dataset (plot_dataset_stats),

  • General-purpose functions used by plotting and data-access routines.

The goal of this module is to centralize small but essential operations to keep the rest of the codebase clean, consistent, and resilient across various data sources (EUMETSAT, CMEMS, CDSAPI, etc.).

visusat.utils.detect_velocity_vars(ds)

Detect available ocean velocity components in a Copernicus Marine dataset.

The function inspects the dataset to determine whether a valid pair of horizontal velocity variables is present. Several common CMEMS conventions are checked, including:

  • ("ugos", "vgos") : geostrophic currents from altimetry

  • ("uo", "vo") : total ocean currents from reanalyses or models

  • ("eastward_velocity", "northward_velocity") : alternative naming

The first matching pair is returned. If no valid pair is found, a KeyError is raised with a list of available variables.

Parameters:

ds (xarray.Dataset) – Dataset in which to search for velocity component variables.

Returns:

A tuple (u_var, v_var) giving the names of the detected eastward and northward velocity variables.

Return type:

(str, str)

Raises:

KeyError – If no known velocity variable pair is present in ds.

Notes

This helper function is mainly used by plotting routines to ensure that the correct velocity fields are extracted regardless of dataset naming conventions.
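
As an illustration of the lookup described above, here is a minimal sketch. It is not the package's implementation: the names find_velocity_pair and KNOWN_PAIRS are hypothetical, and the search order is assumed to follow the order in which the conventions are listed. It accepts any container of variable names that supports the "in" operator, such as a dict or ds.data_vars.

```python
# Hypothetical constant: the pairs named in the documentation, in the
# order they are listed (the real search order is an assumption).
KNOWN_PAIRS = (
    ("ugos", "vgos"),
    ("uo", "vo"),
    ("eastward_velocity", "northward_velocity"),
)


def find_velocity_pair(variables):
    """Return the first (u, v) pair whose two names are both present."""
    for u_var, v_var in KNOWN_PAIRS:
        if u_var in variables and v_var in variables:
            return u_var, v_var
    # Include the available names so the caller can see what was found.
    raise KeyError(
        f"No known velocity pair found. Available variables: {sorted(variables)}"
    )


# A plain dict stands in for an xarray.Dataset here.
fake_dataset = {"uo": None, "vo": None, "thetao": None}
print(find_velocity_pair(fake_dataset))
```

A pair only counts as a match when both components are present, so a dataset that contains "uo" without "vo" falls through to the KeyError.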

visusat.utils.escape_latex(text)

Escape LaTeX-sensitive characters in a string.

Parameters:

text (str) – Input string to sanitize for LaTeX compatibility.

Returns:

Escaped string, safe to use in LaTeX environments.

Return type:

str

Notes

  • Currently only escapes the percent symbol (%).

  • The function can be extended to support more LaTeX-sensitive characters.
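
The two notes above can be sketched as follows. This is an illustration only: escape_chars, PERCENT_ONLY, and EXTENDED are hypothetical names, and the extra characters in EXTENDED are one possible extension, not something the package is documented to support.

```python
# Current documented behaviour: only the percent sign is escaped.
PERCENT_ONLY = {"%": r"\%"}

# Hypothetical extension with a few more LaTeX special characters.
EXTENDED = {**PERCENT_ONLY, "&": r"\&", "#": r"\#", "_": r"\_"}


def escape_chars(text, table=PERCENT_ONLY):
    """Replace every character listed in `table` with its escaped form."""
    return text.translate(str.maketrans(table))


print(escape_chars("Cloud cover (%)"))
print(escape_chars("sst_anomaly & 5%", EXTENDED))
```

Driving the replacement from a mapping keeps the extension a matter of adding entries rather than chaining further replace calls.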

visusat.utils.parse_isodate(date)

Convert a date-like object into an ISO 8601 string.

Accepted inputs:
  • numpy.datetime64

  • pandas.Timestamp

  • datetime.datetime

  • str (ISO 8601 or compact YYYYMMDDhhmmss)

Parameters:

date (Any) – Date or datetime-like object to be converted.

Returns:

The corresponding ISO 8601 date-time string, e.g. "2025-01-13T12:30:00".

Return type:

str

Raises:

ValueError – If the input value cannot be interpreted as a valid date.

Notes

This helper function is typically used to format metadata extracted from satellite or model filenames.
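
A minimal, standard-library sketch of this kind of conversion is shown below. It is not the package's code: to_iso8601 is a hypothetical name, and the sketch covers only datetime and str inputs (numpy.datetime64 is left out so the example has no third-party dependency).

```python
from datetime import datetime


def to_iso8601(date):
    """Hypothetical helper: format `date` as YYYY-MM-DDThh:mm:ss."""
    if isinstance(date, datetime):
        # pandas.Timestamp subclasses datetime, so it also takes this branch.
        parsed = date
    elif isinstance(date, str):
        text = date.strip()
        if len(text) == 14 and text.isdigit():
            # Compact form, e.g. "20250113123000".
            parsed = datetime.strptime(text, "%Y%m%d%H%M%S")
        else:
            # Otherwise the string must already be ISO 8601;
            # fromisoformat raises ValueError if it is not.
            parsed = datetime.fromisoformat(text)
    else:
        raise ValueError(f"Cannot interpret {date!r} as a date")
    return parsed.strftime("%Y-%m-%dT%H:%M:%S")


print(to_iso8601("20250113123000"))
```

Both parsing calls raise ValueError on malformed input, which matches the exception documented above.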

visusat.utils.plot_dataset_stats(data, cmap='viridis')

Compute and display basic statistical histograms for a geospatial dataset.

This function generates three diagnostic plots for a given xarray.DataArray:

  1. A 1D histogram of the data values, filtered between the 1st and 99th percentiles to reduce the influence of outliers.

  2. A 2D histogram of longitude vs. data values, useful for identifying longitudinal biases or zonal structures.

  3. A 2D histogram of data values vs. latitude, useful for detecting latitudinal patterns.

Values outside the 1st to 99th percentile range are removed before plotting, and NaN values are masked automatically.

Parameters:
  • data (xarray.DataArray) – Input geospatial field. Must contain coordinates x (longitude) and y (latitude), and ideally attributes long_name and unit for axis labeling.

  • cmap (str, optional) – Colormap used for the 2D histograms. Defaults to "viridis".

Returns:

The function produces diagnostic figures but does not return any object.

Return type:

None

Notes

  • Requires Matplotlib.

  • Does not call plt.show() (left to the user).

  • Outlier filtering uses the 1st and 99th percentiles of the dataset.

  • The 2D histograms use flattened grids and ignore missing values.

  • Useful as an initial visual inspection or quality check of CMEMS or model fields.
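
The outlier filtering step can be sketched in plain Python as follows. This is an illustration under stated assumptions: percentile_filter is a hypothetical name, the percentile bounds are treated as inclusive, and the actual function presumably operates on NumPy or xarray arrays rather than lists.

```python
import math
import statistics


def percentile_filter(values, lower=1, upper=99):
    """Drop NaNs, then keep values inside the [lower, upper] percentile range."""
    finite = [v for v in values if not math.isnan(v)]
    # With n=100, quantiles() returns the 99 cut points P1..P99, so the
    # p-th percentile sits at index p - 1. The "inclusive" method
    # interpolates linearly between the sorted data points.
    cuts = statistics.quantiles(finite, n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [v for v in finite if low <= v <= high]


sample = [float(i) for i in range(101)] + [float("nan")]
kept = percentile_filter(sample)
print(len(kept), min(kept), max(kept))
```

Removing NaNs before computing the percentiles matters, because a NaN in the input would otherwise make the sort order, and therefore the cut points, unreliable.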

visusat.utils.safe_open_dataset(path)

Open a NetCDF file using the first available compatible backend.

This function attempts to open the dataset sequentially using several xarray-compatible NetCDF engines. This is useful because different datasets may require different backends depending on how the file was encoded (NetCDF3, NetCDF4/HDF5, CF conventions, etc.).

The engines are tested in the following order:
  1. h5netcdf

  2. netcdf4

  3. scipy

The first successful engine is used to load and return the dataset. If none of the engines works, a RuntimeError is raised.

Parameters:

path (str or Path) – Path to the input NetCDF file.

Returns:

The opened dataset.

Return type:

xarray.Dataset

Raises:

RuntimeError – If none of the available backends can open the file.

Notes

  • h5netcdf is often the fastest backend and works well with modern NetCDF4/HDF5 files.

  • scipy can only read classic NetCDF3 files.

  • This function logs which backend succeeded or failed.
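
The fallback strategy described above can be sketched as follows. This is not the package's implementation: open_with_fallback and ENGINES are hypothetical names, and the opener is passed in as an argument so that the example runs without xarray installed. A small fake opener stands in for the real one.

```python
import logging

logger = logging.getLogger(__name__)

# Engine names in the order given in the documentation.
ENGINES = ("h5netcdf", "netcdf4", "scipy")


def open_with_fallback(path, opener, engines=ENGINES):
    """Try each engine in turn and return the first dataset that opens."""
    failed = []
    for engine in engines:
        try:
            dataset = opener(path, engine=engine)
        except Exception as exc:  # missing backend, unsupported format, ...
            logger.warning("Engine %s failed for %s: %s", engine, path, exc)
            failed.append(engine)
            continue
        logger.info("Opened %s with engine %s", path, engine)
        return dataset
    raise RuntimeError(f"No engine could open {path}; tried {failed}")


def fake_opener(path, engine):
    """Stand-in for a real opener: pretend only netcdf4 can read the file."""
    if engine != "netcdf4":
        raise ValueError(f"{engine} cannot read {path}")
    return {"path": path, "engine": engine}


print(open_with_fallback("example.nc", fake_opener))
```

With xarray available, the opener would be xarray.open_dataset, which accepts an engine keyword argument, for example open_with_fallback("example.nc", xarray.open_dataset).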