Examples

This page provides comprehensive examples covering all features of geoextent using datasets under 100MB for fast execution.

Remote Repository Examples

Geoextent supports extracting geospatial extent from multiple research data repositories.

Zenodo Example

Extract extent from a Zenodo atmospheric data repository (~50MB):

python -m geoextent -b -t https://doi.org/10.5281/zenodo.4593540

Or using just the DOI:

python -m geoextent -b -t 10.5281/zenodo.4593540

ZIP File Support

Geoextent automatically detects and extracts ZIP files from remote repositories, including nested archives. Extract from a Zenodo repository containing a single ZIP file (~1MB):

python -m geoextent -b https://doi.org/10.5281/zenodo.3446746

This will download the ZIP file, extract all geospatial data inside (GeoPackage, Shapefile, etc.), and calculate the spatial extent. Works with all supported repository providers.

PANGAEA Example

Extract extent from PANGAEA Arctic Ocean dataset (~1MB):

python -m geoextent -b -t https://doi.org/10.1594/PANGAEA.734969

PANGAEA datasets often include rich geospatial metadata. Both tabular and non-tabular data are supported, including datasets with downloadable files (GeoJSON, GeoTIFF, Shapefile, etc.):

# Non-tabular dataset with GeoTIFF files
python -m geoextent -b https://doi.org/10.1594/PANGAEA.913496

# Non-tabular dataset with GeoJSON files
python -m geoextent -b https://doi.org/10.1594/PANGAEA.858767

OSF Example

Extract extent from OSF geographic research data (~5MB):

python -m geoextent -b -t https://doi.org/10.17605/OSF.IO/4XE6Z

Multiple OSF identifier formats are supported:

python -m geoextent -b -t OSF.IO/4XE6Z
python -m geoextent -b -t https://osf.io/4xe6z/

GFZ Data Services Example

Extract extent from GFZ geothermal resources dataset (~30MB):

python -m geoextent -b -t 10.5880/GFZ.4.8.2023.004

Dryad Example

Extract extent from Dryad dataset:

python -m geoextent -b -t https://doi.org/10.5061/dryad.0k6djhb7x

TU Dresden Opara Example

Extract extent from TU Dresden Opara repository (DSpace 7.x):

python -m geoextent -b -t https://opara.zih.tu-dresden.de/items/4cdf08d6-2738-4c9e-9d27-345a0647ff7c

Multiple URL variants are supported:

python -m geoextent -b -t 10.25532/OPARA-581
python -m geoextent -b -t https://doi.org/10.25532/OPARA-581
python -m geoextent -b -t https://opara.zih.tu-dresden.de/handle/123456789/821

This example dataset contains glacier calving front locations with ZIP files containing nested directories and multiple shapefiles.

UKCEH (EIDC) Example

Extract extent from a UKCEH Environmental Information Data Centre dataset (metadata-only):

python -m geoextent -b -t --no-download-data 10.5285/dd35316a-cecc-4f6d-9a21-74a0f6599e9e

Download data and extract extent:

python -m geoextent -b -t 10.5285/dd35316a-cecc-4f6d-9a21-74a0f6599e9e

UKCEH supports both Apache datastore directory listings and data-package ZIP downloads. The provider tries the datastore first (enabling selective file download) and falls back to the ZIP if needed.

Convex hull from a multi-region UKCEH dataset (3 bounding boxes across Africa):

python -m geoextent -b --convex-hull --no-download-data 10.5285/3de48cb6-d1c2-446e-a652-57d329849361

DEIMS-SDR Example

Extract extent from a DEIMS-SDR ecological research dataset (metadata-only):

python -m geoextent -b -t https://deims.org/dataset/3d87da8b-2b07-41c7-bf05-417832de4fa2

Extract spatial extent from a DEIMS-SDR research site:

python -m geoextent -b https://deims.org/8eda49e9-1f4e-4f3e-b58e-e0bb25dc32a6

DEIMS-SDR is a metadata-only provider. It extracts geospatial boundaries (POINT, POLYGON, MULTIPOLYGON) and temporal ranges from the DEIMS-SDR REST API for long-term ecological research sites and datasets.

NFDI4Earth Knowledge Hub Example

Extract extent from the NFDI4Earth Knowledge Hub (metadata-only, via SPARQL):

# Schiffsdichte 2013 — North Sea shipping density (spatial only)
python -m geoextent -b https://onestop4all.nfdi4earth.de/result/dthb-82b6552d-2b8e-4800-b955-ea495efc28af/

# ESA Antarctic Ice Sheet — spatial + temporal extent (1994–2021)
python -m geoextent -b -t https://onestop4all.nfdi4earth.de/result/dthb-7b3bddd5af4945c2ac508a6d25537f0a/

NFDI4Earth is a metadata-only provider. It extracts WKT geometry and temporal ranges from the SPARQL endpoint. When a dataset has a landingPage URL that matches another supported provider, geoextent automatically follows it. Use --no-follow to stay with NFDI4Earth metadata:

python -m geoextent -b -t --no-follow https://onestop4all.nfdi4earth.de/result/dthb-82b6552d-2b8e-4800-b955-ea495efc28af/

STAC Catalog Example

Extract extent from any STAC (SpatioTemporal Asset Catalog) Collection. STAC Collections contain pre-computed bounding boxes and temporal intervals, so extraction is instant (metadata-only, no file downloads).

# US National Agriculture Imagery (Element84 Earth Search)
python -m geoextent -b -t https://earth-search.aws.element84.com/v1/collections/naip

# German forest structure (DLR EOC) — open-ended temporal range
python -m geoextent -b -t https://geoservice.dlr.de/eoc/ogc/stac/v1/collections/FOREST_STRUCTURE_DE_COVER_P1Y

# Switzerland population data (WorldPop)
python -m geoextent -b -t https://api.stac.worldpop.org/collections/CHE

Any URL pointing to a STAC Collection is supported — geoextent recognizes known STAC API hosts, /stac/ URL path patterns, and falls back to JSON content inspection. See Content Providers for the full list of known hosts.

CKAN Example

Extract extent from any CKAN open data portal. The generic CKAN provider works with all CKAN instances.

Metadata-only extraction (fast, no file downloads):

# GeoKur TU Dresden — global cropland extent with temporal range
python -m geoextent -b -t --no-download-data https://geokur-dmp.geo.tu-dresden.de/dataset/cropland-extent

# German GovData — Rhine surface water sampling dataset
python -m geoextent -b -t --no-download-data https://ckan.govdata.de/dataset/a-spatially-distributed-sampling-of-rhine-surface-water-for-non-target-screening

Data download (downloads files and extracts extent from contents):

# Ireland — downloads Shapefile of Dublin library locations
python -m geoextent -b https://data.gov.ie/dataset/libraries-dlr

# Australia — downloads GeoJSON of Gisborne neighbourhood precincts
python -m geoextent -b https://data.gov.au/dataset/gisborne-neighbourhood-character-precincts

Recommended: metadata-first strategy for CKAN datasets, which tries catalogue metadata first and falls back to data download if needed:

python -m geoextent -b -t --metadata-first https://ckan.govdata.de/dataset/a-spatially-distributed-sampling-of-rhine-surface-water-for-non-target-screening

The CKAN provider supports known hosts (instant matching) and unknown CKAN instances (verified via API probe). See Content Providers for the full list of known hosts.

GitHub Example

Extract extent from public GitHub repositories. The GitHub provider downloads geospatial files and extracts their spatial and temporal extent.

Repository root (all geospatial files):

python -m geoextent -b https://github.com/fraxen/tectonicplates

Specific subdirectory (only files under the given path):

python -m geoextent -b https://github.com/Nowosad/spDataLarge/tree/master/inst/raster

Skip non-geospatial files (recommended for repositories with many non-geospatial files):

python -m geoextent -b --download-skip-nogeo https://github.com/fraxen/tectonicplates

The GitHub provider preserves directory structure when downloading, which is essential for shapefile components and world files. Set the GITHUB_TOKEN environment variable for higher API rate limits (5000/hour vs 60/hour unauthenticated).

GitLab Example

Extract extent from public GitLab repositories on gitlab.com and self-hosted instances. The GitLab provider downloads geospatial files and extracts their spatial and temporal extent.

Repository root (all geospatial files):

python -m geoextent -b -t https://gitlab.com/bazylizon/seismicity

Specific subdirectory (only files under the given path):

python -m geoextent -b https://gitlab.com/eaws/eaws-regions/-/tree/master/public/outline

Self-hosted GitLab instance (RWTH Aachen):

python -m geoextent -b https://git.rwth-aachen.de/nfdi4earth/crosstopics/knowledgehub-maps/-/tree/main/maps/200_datasets/data

Skip non-geospatial files (recommended for repositories with many non-geospatial files):

python -m geoextent -b --download-skip-nogeo https://gitlab.com/bazylizon/seismicity

The GitLab provider supports nested namespace paths (group/subgroup/project), /-/tree/{ref}/{path} branch/subdirectory URLs, and .git suffixes. Self-hosted instances are detected via known hosts, hostname heuristic (contains “gitlab”), or API probe. Set the GITLAB_TOKEN environment variable for higher API rate limits.

Software Heritage Example

Extract extent from software artifacts archived by Software Heritage using persistent SWHIDs or browse URLs.

Browse origin URL with subdirectory (targeted extraction, minimal download):

python -m geoextent -b --download-skip-nogeo "https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/AWMC/geodata&path=Cultural-Data/political_shading/hasmonean"

Direct directory SWHID:

python -m geoextent -b --download-skip-nogeo swh:1:dir:92890dbe77bbe36ccba724673bc62c2764df4f5a

Software Heritage assigns persistent identifiers (SWHIDs) to every software artifact, providing long-term reproducibility. Set the SWH_TOKEN environment variable for higher API rate limits (1200/hour vs 120/hour anonymous).

Advanced Features

Convex Hull Extraction

Calculate the convex hull instead of just the bounding box for vector files:

python -m geoextent -b --convex-hull https://doi.org/10.5281/zenodo.4593540

This provides a more accurate representation of the actual spatial extent.

Placename Lookup

Add geographic context to your extracts:

# Using default GeoNames gazetteer (requires API key in .env)
python -m geoextent -b --placename https://doi.org/10.5281/zenodo.4593540

# Using Nominatim (no API key needed)
python -m geoextent -b --placename nominatim https://doi.org/10.1594/PANGAEA.734969

# Using Photon (no API key needed)
python -m geoextent -b --placename photon https://osf.io/4xe6z/

Size Limiting

Control download size when processing large repositories:

# Limit to 10MB total download (ordered method - files as returned by provider)
python -m geoextent -b --max-download-size 10MB https://doi.org/10.5281/zenodo.4593540

# Select smallest files first to maximize file coverage within size limit
python -m geoextent -b --max-download-size 100MB --max-download-method smallest https://doi.org/10.25532/OPARA-703

# Select largest files first (useful when you want the most substantial data)
python -m geoextent -b --max-download-size 500MB --max-download-method largest https://doi.org/10.5281/zenodo.4593540

# Random sampling with seed for reproducibility
python -m geoextent -b --max-download-size 50MB --max-download-method random --max-download-method-seed 42 https://doi.org/10.5281/zenodo.4593540

File Filtering

Skip non-geospatial files to save time:

python -m geoextent -b --download-skip-nogeo https://doi.org/10.5281/zenodo.4593540

Combine with size limits:

python -m geoextent -b --download-skip-nogeo --max-download-size 50MB https://osf.io/4xe6z/

Output Formats

Choose different output formats:

# Default GeoJSON format
python -m geoextent -b https://doi.org/10.1594/PANGAEA.734969

# Well-Known Text (WKT) format
python -m geoextent -b --format wkt https://doi.org/10.1594/PANGAEA.734969

# Well-Known Binary (WKB) hex format
python -m geoextent -b --format wkb https://doi.org/10.1594/PANGAEA.734969

Visualization

Generate geojson.io URLs for interactive visualization:

python -m geoextent -b --geojsonio https://doi.org/10.5281/zenodo.4593540

Open the visualization directly in your browser (without printing URL):

python -m geoextent -b --browse https://doi.org/10.5281/zenodo.4593540

Print URL and open in browser (use both options):

python -m geoextent -b --geojsonio --browse https://doi.org/10.5281/zenodo.4593540

Combine with other options:

python -m geoextent -b --convex-hull --geojsonio --browse https://doi.org/10.1594/PANGAEA.734969

Quiet Mode

Suppress progress bars and warnings for scripting:

python -m geoextent -b --quiet https://doi.org/10.1594/PANGAEA.734969

Perfect for shell scripts:

BBOX=$(python -m geoextent -b --format wkt --quiet https://doi.org/10.1594/PANGAEA.734969)
echo "Bounding box: $BBOX"

Local File Examples

All GeoJSON outputs automatically include extraction metadata with version information, input sources, and processing statistics. See the Extraction Metadata section in the Advanced Features documentation for details.

Single File Processing

Extract from GeoJSON:

python -m geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson

Extract from CSV:

python -m geoextent -b -t tests/testdata/csv/cities_NL.csv

Extract from Shapefile:

python -m geoextent -b -t tests/testdata/shapefile/muenster_ring.shp

Directory Processing

Process all files in a directory:

python -m geoextent -b -t tests/testdata/geojson/

With convex hull:

python -m geoextent -b --convex-hull tests/testdata/geojson/

Multiple Files

Process specific files together:

python -m geoextent -b -t tests/testdata/shapefile/muenster_ring.shp tests/testdata/csv/cities_NL.csv

Mix files and directories:

python -m geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files

Use --details for per-file breakdown:

python -m geoextent -b -t --details tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/csv/cities_NL.csv tests/testdata/geopackage/nc.gpkg

Convex hull from multiple files:

python -m geoextent -b --convex-hull tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files/districtes.geojson tests/testdata/csv/cities_NL.csv

Combined Examples

All Features Together

Extract with all features enabled:

python -m geoextent -b -t \
  --convex-hull \
  --placename nominatim \
  --max-download-size 50MB \
  --download-skip-nogeo \
  --format wkt \
  --geojsonio \
  https://doi.org/10.5281/zenodo.4593540

Multiple Repositories

Process multiple repositories together:

python -m geoextent -b \
  --max-download-size 20MB \
  https://doi.org/10.5281/zenodo.4593540 \
  https://doi.org/10.1594/PANGAEA.734969 \
  https://osf.io/4xe6z/

Docker Examples

Basic Docker Usage

Using Docker for remote repositories:

docker run --rm geoextent -b https://doi.org/10.5281/zenodo.4593540

With placename lookup:

docker run --rm --env-file .env geoextent -b --placename https://doi.org/10.5281/zenodo.4593540

Local files with Docker:

docker run --rm -v ${PWD}/tests/testdata:/data geoextent -b -t /data/geojson/

Performance Options

Parallel Downloads

Control download workers:

# Use 8 parallel workers
python -m geoextent -b --max-download-workers 8 https://doi.org/10.5281/zenodo.4593540

# Sequential downloads (slower but safer)
python -m geoextent -b --max-download-workers 1 https://doi.org/10.5281/zenodo.4593540

Testing Examples

Quick Repository Exploration

Explore a repository with minimal download:

python -m geoextent -b \
  --max-download-size 5MB \
  --max-download-method random \
  --quiet \
  https://doi.org/10.5281/zenodo.4593540

Format Coverage Examples

These examples demonstrate all supported formats using small datasets:

GeoJSON

python -m geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson

CSV

python -m geoextent -b -t tests/testdata/csv/cities_NL.csv

Shapefile

python -m geoextent -b tests/testdata/shapefile/muenster_ring.shp

GeoTIFF

python -m geoextent -b tests/testdata/tif/wf_100m_klas.tif

GeoPackage

python -m geoextent -b tests/testdata/geopackage/nc.gpkg

GPX

python -m geoextent -b -t tests/testdata/gpx/gpx1.1_with_all_fields.gpx

KML

python -m geoextent -b tests/testdata/kml/aasee.kml

GML

python -m geoextent -b tests/testdata/gml/clc_1000_PT.gml

FlatGeobuf

python -m geoextent -b tests/testdata/flatgeobuf/sample.fgb

Repository Provider Coverage

All Supported Providers

Examples for each repository provider:

Zenodo:

python -m geoextent -b https://doi.org/10.5281/zenodo.4593540

Figshare:

python -m geoextent -b https://figshare.com/articles/dataset/London_boroughs/11373984

Figshare (institutional portal - ICES Library):

python -m geoextent -b https://ices-library.figshare.com/articles/dataset/HELCOM_request_2022_for_spatial_data_layers_on_effort_fishing_intensity_and_fishing_footprint_for_the_years_2016-2021/20310255

Figshare (metadata-only - USDA Ag Data Commons with geospatial coverage):

python -m geoextent -b --no-download-data https://api.figshare.com/v2/articles/30753383

Dryad:

python -m geoextent -b https://datadryad.org/stash/dataset/doi:10.5061/dryad.0k6djhb7x

PANGAEA (tabular data):

python -m geoextent -b https://doi.org/10.1594/PANGAEA.734969

PANGAEA (non-tabular data - GeoTIFF files):

python -m geoextent -b https://doi.org/10.1594/PANGAEA.913496

OSF:

python -m geoextent -b https://doi.org/10.17605/OSF.IO/4XE6Z

GFZ Data Services:

python -m geoextent -b 10.5880/GFZ.4.8.2023.004

Dataverse:

python -m geoextent -b https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/12345

Pensoft:

python -m geoextent -b https://doi.org/10.3897/BDJ.2.e1068

TU Dresden Opara:

python -m geoextent -b 10.25532/OPARA-581

NFDI4Earth Knowledge Hub (OneStop4All URL):

python -m geoextent -b -t https://onestop4all.nfdi4earth.de/result/dthb-7b3bddd5af4945c2ac508a6d25537f0a/

STAC Catalog (any STAC Collection URL):

python -m geoextent -b -t https://earth-search.aws.element84.com/v1/collections/naip

GitHub:

python -m geoextent -b https://github.com/fraxen/tectonicplates

GitLab:

python -m geoextent -b -t https://gitlab.com/bazylizon/seismicity

GitLab (self-hosted):

python -m geoextent -b https://git.rwth-aachen.de/nfdi4earth/crosstopics/knowledgehub-maps/-/tree/main/maps/200_datasets/data

Interactive Showcase Notebooks

Explore geoextent’s capabilities through interactive Jupyter notebooks that demonstrate real-world usage with research data repositories.

Running Showcase Notebooks

Launch Binder

Click the Binder badge above to run the showcase notebooks in your browser without installation.

Local Setup

To run the showcase notebooks locally, install JupyterLab or the classic Jupyter Notebook. We recommend using a virtual environment:

cd showcase
pip install -r requirements.txt
pip install -r showcase/requirements.txt
pip install -e .

# Trust the notebook for full functionality
jupyter trust showcase/SG_01_Exploring_Research_Data_Repositories_with_geoextent.ipynb

# Start Jupyter
jupyter lab

Then open the local Jupyter Notebook server using the displayed link and open the notebook files (*.ipynb) in the showcase/ directory.

Note

The notebook must be trusted and the python-markdown extension should be installed so that variables within Markdown text can be shown.

Note

Some notebooks use paired notebooks based on Jupytext. Consult the Jupytext documentation before editing these notebooks.