Command-Line Interface (CLI)
Basics
geoextent can be called on the command line with the following command:
usage: geoextent [-h] [--formats] [--list-features] [--version] [--debug] [--details] [--output] [output file] [--join] [-b] [-t] [--convex-hull] [--no-download-data] [--no-metadata-fallback] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--browse] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random,smallest,largest}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] [--keep-files] [--assume-wgs84] input1 [input2 ...]
- files
input file, directory, DOI, or repository URL (supports multiple inputs including mixed types)
- -h, --help
show help message and exit
- --formats
show supported formats
- --list-features
output machine-readable JSON with all supported file formats and content providers
- --version
show installed version
- --debug
turn on debug logging; alternatively, set the environment variable GEOEXTENT_DEBUG=1
- --details
return extraction details for the individual files inside folders or ZIP files
- --output <output>
export results to a file. Format is auto-detected from extension: .gpkg (GeoPackage), .geojson/.json (GeoJSON), .csv (CSV). Works with single files, directories, and remote sources.
- -b, --bounding-box
extract spatial extent (bounding box)
- -t, --time-box
extract temporal extent (%Y-%m-%d)
- --time-format <format>
output format for temporal extents. Presets: 'date' (%Y-%m-%d, default), 'iso8601' (%Y-%m-%dT%H:%M:%SZ). Also accepts strftime format strings (e.g. '%Y/%m/%d %H:%M').
- --convex-hull
extract convex hull instead of bounding box for vector geometries
- --no-download-data
for repositories: disable downloading data files and use metadata only (not recommended for most providers)
- --metadata-first
try metadata-only extraction first, fall back to data download if metadata yields no results (mutually exclusive with --no-download-data)
- --no-metadata-fallback
disable automatic metadata fallback when data download yields no files (by default, geoextent falls back to metadata-only extraction if data files are unavailable and the provider supports metadata)
- --no-follow
disable following external DOIs/URLs to other providers (e.g., DEIMS-SDR datasets referencing Zenodo). By default, geoextent follows these references to extract actual data extents.
- --no-progress
disable progress bars during download and extraction
- --quiet
suppress all console messages including warnings, progress bars, map preview messages, and terminal display (--map FILE still saves the image silently)
- --format {geojson,wkt,wkb}
output format for spatial extents (default: geojson)
- --no-subdirs
only process files in the top-level directory, ignore subdirectories
- --geojsonio
generate and print a clickable geojson.io URL for the extracted spatial extent
- --browse
open the geojson.io URL in the default web browser (use with --geojsonio to also print the URL)
- --map <file>
save a map preview image of the spatial extent as PNG. If FILE is given, saves to that path; otherwise saves to a temporary file. (requires: pip install geoextent[preview])
- --preview
display a map preview of the spatial extent in the terminal (requires: pip install geoextent[preview])
- --map-dim <wxh>
dimensions of the map preview image in pixels (default: 600x400)
- --no-metadata
exclude extraction metadata and statistics from GeoJSON output
- --max-download-size <size>
maximum download size limit (e.g. '100MB', '2GB'). Uses filesizelib for parsing.
- --max-download-method {ordered,random,smallest,largest}
method for selecting files when the size limit is exceeded: 'ordered' (as returned by provider), 'random', 'smallest' (smallest files first), 'largest' (largest files first) (default: ordered)
- --max-download-method-seed <seed>
seed for random file selection when using --max-download-method random (default: 42)
- --placename
enable placename lookup using the default gazetteer (geonames). Use --placename-service to specify a different gazetteer
- --placename-service {geonames,nominatim,photon}
specify gazetteer service for placename lookup (requires --placename)
- --placename-escape
escape Unicode characters in placename output (requires --placename)
- --ext-metadata
retrieve external metadata for DOIs (title, authors, publisher, publication year, URL, license) from CrossRef and DataCite
- --ext-metadata-method {auto,all,crossref,datacite}
method for retrieving external metadata: 'auto' (try CrossRef first, then DataCite), 'all' (query all sources), 'crossref' (CrossRef only), 'datacite' (DataCite only) (default: auto)
- --download-skip-nogeo
skip downloading files that don't appear to contain geospatial data (e.g., PDFs, images, plain text)
- --download-skip-nogeo-exts <exts>
comma-separated list of additional file extensions to consider as geospatial (e.g., '.xyz,.las,.ply')
- --max-download-workers <workers>
maximum number of parallel downloads (default: 4; set to 1 to disable parallel downloads)
- --keep-files
keep downloaded and extracted files instead of cleaning them up (for debugging purposes)
- --legacy
use traditional GIS coordinate order (longitude, latitude) instead of EPSG:4326 native order (latitude, longitude)
- --assume-wgs84
assume WGS84 (EPSG:4326) for raster files without projection information (e.g., world files without .prj). By default, ungeoreferenced rasters are skipped.
- -p <workers>, --parallel <workers>
enable parallel file extraction within directories. Without a number, uses all available CPU cores. Specify a number (e.g., -p 4) to set the worker count. Default: sequential processing.
- --join
join multiple exported files (from --output) into a single file. Requires --output to specify the destination.
Examples
Note
Depending on the local configuration, geoextent might need to be called with the python interpreter prepended:
python -m geoextent …
Show help message
geoextent -h
geoextent is a Python library for extracting geospatial and temporal extents of a file
or a directory of multiple geospatial data formats.
usage: geoextent [-h] [--formats] [--list-features] [--version] [--debug] [--details] [--output] [output file] [--join] [-b] [-t] [--convex-hull] [--no-download-data] [--no-metadata-fallback] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--browse] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random,smallest,largest}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] [--keep-files] [--assume-wgs84] input1 [input2 ...]
positional arguments:
files input file, directory, DOI, or repository URL
(supports multiple inputs including mixed types)
options:
-h, --help show help message and exit
--formats show supported formats
--list-features output machine-readable JSON with all supported file
formats and content providers
--version show installed version
--debug turn on debug logging, alternatively set environment
variable GEOEXTENT_DEBUG=1
--details Returns details of folder/zipFiles geoextent
extraction
--output OUTPUT Export results to a file. Format is auto-detected from
extension: .gpkg (GeoPackage), .geojson/.json
(GeoJSON), .csv (CSV). Works with single files,
directories, and remote sources.
-b, --bounding-box extract spatial extent (bounding box)
-t, --time-box extract temporal extent (%Y-%m-%d)
--time-format FORMAT output format for temporal extents. Presets: 'date'
(%Y-%m-%d, default), 'iso8601' (%Y-%m-%dT%H:%M:%SZ).
Also accepts strftime format strings (e.g. '%Y/%m/%d
%H:%M').
--convex-hull extract convex hull instead of bounding box for vector
geometries
--no-download-data for repositories: disable downloading data files and
use metadata only (not recommended for most providers)
--metadata-first try metadata-only extraction first, fall back to data
download if metadata yields no results (mutually
exclusive with --no-download-data)
--no-metadata-fallback
disable automatic metadata fallback when data download
yields no files (by default, geoextent falls back to
metadata-only extraction if data files are unavailable
and the provider supports metadata)
--no-follow disable following external DOIs/URLs to other
providers (e.g., DEIMS-SDR datasets referencing
Zenodo). By default, geoextent follows these
references to extract actual data extents.
--no-progress disable progress bars during download and extraction
--quiet suppress all console messages including warnings,
progress bars, map preview messages, and terminal
display (--map FILE still saves the image silently)
--format {geojson,wkt,wkb}
output format for spatial extents (default: geojson)
--no-subdirs only process files in the top-level directory, ignore
subdirectories
--geojsonio generate and print a clickable geojson.io URL for the
extracted spatial extent
--browse open the geojson.io URL in the default web browser
(use with --geojsonio to also print URL)
--map [FILE] save a map preview image of the spatial extent as PNG.
If FILE is given, saves to that path; otherwise saves
to a temporary file. (requires: pip install
geoextent[preview])
--preview display a map preview of the spatial extent in the
terminal (requires: pip install geoextent[preview])
--map-dim WxH dimensions of the map preview image in pixels
(default: 600x400)
--no-metadata exclude extraction metadata and statistics from
GeoJSON output
--max-download-size MAX_DOWNLOAD_SIZE
maximum download size limit (e.g. '100MB', '2GB').
Uses filesizelib for parsing.
--max-download-method {ordered,random,smallest,largest}
method for selecting files when size limit is
exceeded: 'ordered' (as returned by provider),
'random', 'smallest' (smallest files first), 'largest'
(largest files first) (default: ordered)
--max-download-method-seed MAX_DOWNLOAD_METHOD_SEED
seed for random file selection when using --max-
download-method random (default: 42)
--placename enable placename lookup using default gazetteer
(geonames). Use --placename-service to specify a
different gazetteer
--placename-service GAZETTEER
specify gazetteer service for placename lookup
(requires --placename)
--placename-escape escape Unicode characters in placename output
(requires --placename)
--ext-metadata retrieve external metadata for DOIs (title, authors,
publisher, publication year, URL, license) from
CrossRef and DataCite
--ext-metadata-method {auto,all,crossref,datacite}
method for retrieving external metadata: 'auto' (try
CrossRef first, then DataCite), 'all' (query all
sources), 'crossref' (CrossRef only), 'datacite'
(DataCite only) (default: auto)
--download-skip-nogeo
skip downloading files that don't appear to contain
geospatial data (e.g., PDFs, images, plain text)
--download-skip-nogeo-exts DOWNLOAD_SKIP_NOGEO_EXTS
comma-separated list of additional file extensions to
consider as geospatial (e.g., '.xyz,.las,.ply')
--max-download-workers MAX_DOWNLOAD_WORKERS
maximum number of parallel downloads (default: 4, set
to 1 to disable parallel downloads)
--keep-files keep downloaded and extracted files instead of
cleaning them up (for debugging purposes)
--legacy use traditional GIS coordinate order (longitude,
latitude) instead of EPSG:4326 native order (latitude,
longitude)
--assume-wgs84 assume WGS84 (EPSG:4326) for raster files without
projection information (e.g., world files without
.prj). By default, ungeoreferenced rasters are
skipped.
-p [WORKERS], --parallel [WORKERS]
enable parallel file extraction within directories.
Without a number, uses all available CPU cores.
Specify a number (e.g., -p 4) to set worker count.
Default: sequential processing.
--join Join multiple exported files (from --output) into a
single file. Requires --output to specify the
destination.
Examples:
geoextent -b path/to/directory_with_geospatial_data
geoextent -t path/to/file_with_temporal_extent
geoextent -b -t path/to/geospatial_files
geoextent -b -t --details path/to/zipfile_with_geospatial_data
geoextent -b -t file1.shp file2.csv file3.geopkg
geoextent -b -t --geojsonio --no-download-data 10.25928/HK1000
geoextent -t *.geojson
geoextent -b -t https://doi.org/10.1594/PANGAEA.918707 https://doi.pangaea.de/10.1594/PANGAEA.858767
geoextent -b --convex-hull https://zenodo.org/record/4567890 10.1594/PANGAEA.123456
geoextent -b --placename file.geojson
geoextent -b --placename --placename-service nominatim https://zenodo.org/record/123456
geoextent -b --placename --placename-service photon --placename-escape https://doi.org/10.3897/BDJ.13.e159973
Supported formats:
- CSV (comma-separated values) (.csv, .txt)
- Vector data (.shp, .shx, .dbf, .prj, .geojson, .json, .gpkg, .gdb, .gpx, .kml, .kmz, .gml, .fgb)
- Raster data (.tif, .tiff, .geotiff, .nc, .netcdf, .asc, .wld, .jgw, .pgw, .pngw, .tfw, .tifw, .bpw, .gfw)
- Point cloud data (.las, .laz)
Supported data repositories:
- Wikidata (wikidata.org)
- Dryad (datadryad.org)
- 4TU.ResearchData (data.4tu.nl)
- Figshare (figshare.com)
- Zenodo (zenodo.org)
- InvenioRDM (inveniosoftware.org/products/rdm)
- Pangaea (pangaea.de)
- OSF (osf.io)
- Dataverse (dataverse.org)
- GFZ (dataservices.gfz-potsdam.de)
- RADAR (radar-service.eu)
- Arctic Data Center (arcticdata.io)
- DataONE (dataone.org)
- GBIF (gbif.org)
- Pensoft (pensoft.net)
- BGR (geoportal.bgr.de)
- BAW (datenrepository.baw.de)
- MDI-DE (mdi-de.org)
- GDI-DE (geoportal.de)
- Opara (opara.zih.tu-dresden.de)
- Senckenberg (dataportal.senckenberg.de)
- CKAN (ckan.org)
- Mendeley Data (data.mendeley.com)
- DEIMS-SDR (deims.org)
- NFDI4Earth (onestop4all.nfdi4earth.de)
- HALO DB (halo-db.pa.op.dlr.de)
- SEANOE (seanoe.org)
- GeoScienceWorld (pubs.geoscienceworld.org)
- UKCEH (catalogue.ceh.ac.uk)
- STAC (stacspec.org)
- GitHub (github.com)
- GitLab (gitlab.com)
- Forgejo (codeberg.org)
- Software Heritage (softwareheritage.org)
- Remote Raster (COG) (cogeo.org)
Extract bounding box from a single file
Note
The file used in the examples of this section is available at muenster_ring_zeit. To display a rendering of the file contents, see rendered blob.
geoextent -b muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s, Spatial extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'bbox': [51.94881477206191,
7.6016807556152335,
51.974624029877454,
7.647256851196289],
'crs': '4326',
'file_size_bytes': 1695}
Extract time interval from a single file
Note
The file used in the examples of this section is available at muenster_ring_zeit. To display a rendering of the file contents, see rendered blob.
geoextent -t muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'tbox': ['2018-11-14', '2018-11-14'],
'file_size_bytes': 1695}
Extract both bounding box and time interval from a single file
Note
The file used in the examples of this section is available at muenster_ring_zeit. To display a rendering of the file contents, see rendered blob.
geoextent -b -t muenster_ring_zeit.geojson
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'bbox': [51.94881477206191,
7.6016807556152335,
51.974624029877454,
7.647256851196289],
'crs': '4326',
'tbox': ['2018-11-14', '2018-11-14'],
'file_size_bytes': 1695}
Folders or ZIP file(s)
Geoextent also supports extraction from multiple files inside folders or ZIP file(s).
Extract both bounding box and time interval from a folder or ZIP file
geoextent -b -t folder_two_files
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s]
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s, Processing districtes.geojson]
Processing directory: folder_two_files: 50%|█████ | 1/2 [00:00<00:00, 121.81item/s, Processing muenster_ring_zeit.geojson]
Merging results: 0it [00:00, ?it/s]
Merging results: 0it [00:00, ?it/s, folder_two_files]
{'format': 'folder',
'crs': '4326',
'bbox': [41.31703852240476,
2.052333387639205,
51.974624029877454,
7.647256851196289],
'tbox': ['2018-11-14', '2019-09-11']}
The output is the combined bbox or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the folder or ZIP file. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.
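The merge across files is a simple union of per-file bounding boxes: the elementwise minimum of the lower corners and maximum of the upper corners. A minimal sketch (the function name is illustrative, not geoextent's internal API), using the [minlat, minlon, maxlat, maxlon] ordering shown in the output above:

```python
def merge_bboxes(bboxes):
    """Union of axis-aligned bounding boxes given as [minlat, minlon, maxlat, maxlon]."""
    minlats, minlons, maxlats, maxlons = zip(*bboxes)
    return [min(minlats), min(minlons), max(maxlats), max(maxlons)]

# Per-file boxes similar to the folder_two_files example above (values illustrative)
districtes = [41.3170, 2.0523, 41.4679, 2.2280]
muenster   = [51.9488, 7.6017, 51.9746, 7.6473]
print(merge_bboxes([districtes, muenster]))
# -> [41.317, 2.0523, 51.9746, 7.6473]
```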
Multiple Inputs
Geoextent supports processing multiple files and/or directories in a single command. Results are merged into a single spatial and temporal extent.
Extract merged bounding box from multiple files
geoextent -b file1.geojson file2.csv file3.gpkg
Extract merged extent from files and directories
geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files
Extract convex hull from multiple files
geoextent -b --convex-hull tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files/districtes.geojson tests/testdata/csv/cities_NL.csv
Use --details to see per-file results alongside the merged extent:
geoextent -b -t --details tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/csv/cities_NL.csv
Remote Repositories
Geoextent supports extracting geospatial extent from multiple research data repositories including Zenodo, PANGAEA, OSF, Figshare, Dryad, GFZ Data Services, RADAR, Arctic Data Center, 4TU.ResearchData, B2SHARE, BAW, MDI-DE, GDI-DE, DEIMS-SDR, NFDI4Earth, GBIF, Dataverse, Pensoft, and GitHub repositories.
Extract from Zenodo
geoextent -b -t https://doi.org/10.5281/zenodo.4593540
Extract from PANGAEA
geoextent -b -t https://doi.org/10.1594/PANGAEA.734969
Extract from OSF
geoextent -b -t https://doi.org/10.17605/OSF.IO/4XE6Z
geoextent -b -t OSF.IO/4XE6Z
Extract from GFZ Data Services
geoextent -b -t 10.5880/GFZ.4.8.2023.004
Extract from RADAR
geoextent -b -t 10.35097/tvn5vujqfvf99f32
Extract from Arctic Data Center
geoextent -b -t 10.18739/A2Z892H2J
Extract from Arctic Data Center (metadata only)
geoextent -b --no-download-data 10.18739/A2Z892H2J
Extract from 4TU.ResearchData
geoextent -b -t https://data.4tu.nl/articles/_/12707150/1
Extract from 4TU.ResearchData (metadata only)
geoextent -b --no-download-data https://data.4tu.nl/articles/_/12707150/1
Extract from BAW-Datenrepository (landing page URL)
geoextent -b -t --no-download-data https://datenrepository.baw.de/trefferanzeige?docuuid=40936F66-3DD8-43D0-99AE-7CA5EF2E1287
Extract from BAW-Datenrepository (DOI, small measurement site)
geoextent -b -t --no-download-data 10.48437/02.2023.K.0601.0001
Extract from BAW-Datenrepository (DOI, sedimentology dataset)
geoextent -b -t --no-download-data 10.48437/929835b7fca4
Extract from MDI-DE (metadata only)
geoextent -b -t --no-download-data https://nokis.mdi-de-dienste.org/trefferanzeige?docuuid=00100e9d-7838-4563-9dd7-2570b0d932cb
Extract from MDI-DE (direct download)
geoextent -b -t https://nokis.mdi-de-dienste.org/trefferanzeige?docuuid=00100e9d-7838-4563-9dd7-2570b0d932cb
Extract from MDI-DE (WFS download, bare UUID)
geoextent -b -t c7d748c9-e12f-4038-a556-b1698eb4033e
Extract from GDI-DE (metadata only, geoportal.de URL)
geoextent -b -t --no-download-data https://www.geoportal.de/Metadata/75987CE0-AA66-4445-AC44-068B98390E89
Extract from GDI-DE (metadata only, bare UUID)
geoextent -b -t --no-download-data cdb2c209-7e08-4f4c-b500-69de926e3023
Extract from DEIMS-SDR (dataset)
geoextent -b -t https://deims.org/dataset/3d87da8b-2b07-41c7-bf05-417832de4fa2
Extract from DEIMS-SDR (site)
geoextent -b https://deims.org/8eda49e9-1f4e-4f3e-b58e-e0bb25dc32a6
Extract from GBIF (metadata only, by DOI)
geoextent -b -t --no-download-data 10.15468/6bleia
Extract from GBIF (metadata only, by dataset URL)
geoextent -b --no-download-data https://www.gbif.org/dataset/378651d7-c235-4205-a617-2939d6faa434
Extract from GBIF (DwC-A data download)
geoextent -b -t 10.15468/6bleia
Extract from GBIF with geojson.io preview
geoextent -b --geojsonio --no-download-data 10.15472/lavgys
Extract from SEANOE (metadata only, French Mediterranean CTD)
geoextent -b -t --no-download-data 10.17882/105467
Extract from SEANOE (data download, Ireland coastline)
geoextent -b 10.17882/109463
Extract from SEANOE (whale biologging with geojson.io preview)
geoextent -b -t --geojsonio --no-download-data 10.17882/112127
Extract from DEIMS-SDR without following external references
By default, DEIMS-SDR datasets that reference external repositories (e.g., Zenodo, PANGAEA) are followed for actual data extent extraction. Use --no-follow to disable this and use DEIMS metadata only:
geoextent -b -t --no-follow https://deims.org/dataset/3d87da8b-2b07-41c7-bf05-417832de4fa2
Extract from NFDI4Earth Knowledge Hub (OneStop4All URL)
geoextent -b -t https://onestop4all.nfdi4earth.de/result/dthb-7b3bddd5af4945c2ac508a6d25537f0a/
Extract from NFDI4Earth Knowledge Hub (Cordra URL)
geoextent -b https://cordra.knowledgehub.nfdi4earth.de/objects/n4e/dthb-82b6552d-2b8e-4800-b955-ea495efc28af
Extract from NFDI4Earth without following landing page
By default, NFDI4Earth datasets with a landingPage URL are followed to other supported providers. Use --no-follow to disable this and use NFDI4Earth SPARQL metadata only:
geoextent -b -t --no-follow https://onestop4all.nfdi4earth.de/result/dthb-82b6552d-2b8e-4800-b955-ea495efc28af/
Extract from GitHub
geoextent -b https://github.com/fraxen/tectonicplates
Extract from a specific subdirectory:
geoextent -b https://github.com/Nowosad/spDataLarge/tree/master/inst/raster
Skip non-geospatial files (recommended for repos with many non-geo files):
geoextent -b --download-skip-nogeo https://github.com/fraxen/tectonicplates
Extract from Software Heritage
geoextent -b --download-skip-nogeo "https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/AWMC/geodata&path=Cultural-Data/political_shading/hasmonean"
Extract from a directory SWHID:
geoextent -b --download-skip-nogeo swh:1:dir:92890dbe77bbe36ccba724673bc62c2764df4f5a
Extract from a Remote GeoTIFF (COG)
Extract extent directly from a remote Cloud Optimized GeoTIFF (COG) URL — only the file header is downloaded:
geoextent -b https://raw.githubusercontent.com/GeoTIFF/test-data/main/files/gfw-azores.tif
Extract with temporal extent:
geoextent -b -t https://zenodo.org/records/14711942/files/FSM_1-km_MED-epsg.4326_v01.tif
Smart metadata-first extraction
Use --metadata-first to try metadata-only extraction first, falling back to data download if the provider has no metadata or the metadata didn’t yield results. This is useful for batch extractions across multiple providers:
geoextent -b --metadata-first 10.12761/sgn.2018.10225
geoextent -b --metadata-first Q64
Extract from GEO Knowledge Hub (automatic metadata fallback)
Some providers (e.g., GEO Knowledge Hub packages) have data files disabled. Geoextent automatically falls back to metadata-only extraction when this happens:
geoextent -b https://gkhub.earthobservations.org/packages/msaw9-hzd25
To disable the automatic fallback, use --no-metadata-fallback:
geoextent -b --no-metadata-fallback https://gkhub.earthobservations.org/packages/msaw9-hzd25
Extract from three German regional datasets with a convex hull — Wikidata (Berlin), 4TU (Dresden), and Senckenberg all use fast metadata extraction, producing a compact convex hull over central Germany:
geoextent -b --convex-hull --metadata-first Q64 https://data.4tu.nl/datasets/3035126d-ee51-4dbd-a187-5f6b0be85e9f/1 10.12761/sgn.2018.10225
Download size limits
Use --max-download-size to cap how much data geoextent will download from a repository. The value accepts human-friendly size strings (parsed by filesizelib):
| Format | Meaning |
|---|---|
| 100MB | 100 megabytes (decimal) |
| 2GB | 2 gigabytes (decimal) |
| 500KB | 500 kilobytes (decimal) |
| 10MiB | 10 mebibytes (binary) |
| 0.5GiB | 0.5 gibibytes (binary) |
| 1.5TB | 1.5 terabytes (decimal) |
When the total download exceeds the limit, the CLI prompts for confirmation instead of silently truncating the file list. This works for all providers whose APIs report file sizes before download:
Zenodo: the download is approximately 45.2 MB (limit is 20 MB).
Proceed with download? [y/N]
Answering y retries with the actual size as the new limit. In non-interactive contexts (scripts, CI pipelines), geoextent exits with an error. To avoid the prompt entirely, use --no-download-data for metadata-only extraction or set a sufficiently large --max-download-size.
Note
The interactive prompt relies on providers reporting file sizes in their API metadata before download. Metadata-only providers (DEIMS-SDR, NFDI4Earth, HALO DB, Wikidata, Pensoft) do not download data files, so the size limit does not apply to them.
# Download at most 20 MB of data
geoextent -b -t --max-download-size 20MB 10.23728/b2share.26jnj-a4x24
# Limit GBIF DwC-A download to 500 MB
geoextent -b -t --max-download-size 500MB 10.15468/6bleia
# Use binary units
geoextent -b --max-download-size 0.5GiB 10.5281/zenodo.4593540
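geoextent delegates size parsing to filesizelib; the behavior can be approximated with a small helper (illustrative only, not filesizelib's API), where decimal units use powers of 1000 and binary units powers of 1024:

```python
import re

# Decimal (SI) units use powers of 1000; binary (IEC) units use powers of 1024.
_UNITS = {
    "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4,
}

def parse_size(text: str) -> int:
    """Parse a human-friendly size string like '100MB' or '0.5GiB' into bytes."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([KMGT]i?B)\s*", text)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = m.groups()
    return int(float(value) * _UNITS[unit])

print(parse_size("100MB"))   # -> 100000000
print(parse_size("0.5GiB"))  # -> 536870912
```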
For GBIF datasets, Darwin Core Archive (DwC-A) downloads have an additional built-in soft limit of 1 GB. When a DwC-A archive exceeds this limit (or the --max-download-size value, whichever is smaller), the CLI also prompts interactively.
You can trigger this prompt intentionally by setting a very small limit:
$ geoextent -b --max-download-size 1KB 10.5281/zenodo.820562
Zenodo: the download is approximately 2.3 MB (limit is 0 MB).
Proceed with download? [y/N] N
Answering N (or pressing Enter) cancels the download and produces no output.
Comparing extraction modes: metadata, download, and convex hull
The following three calls on an Arctic Data Center dataset of ice wedge thermokarst polygons at Point Lay, Alaska illustrate how --no-download-data and --convex-hull affect the output geometry.
1. Metadata-only extraction (--no-download-data): Uses the bounding box stored in the repository metadata — fast, no file downloads. The bbox is slightly larger because it comes from the dataset-level metadata rather than the actual geometries:
geoextent -b -t --no-download-data 10.18739/A2Z892H2J
Output bbox: [-163.049, 69.721, -162.935, 69.760] with tbox [1949-01-01, 2020-01-01].
View on geojson.io
2. Full download extraction (default): Downloads the 2 GeoJSON files (1.6 MB) and computes the merged bounding box from the actual feature geometries — tighter than metadata:
geoextent -b -t 10.18739/A2Z892H2J
Output bbox: [-163.027, 69.723, -162.931, 69.751].
View on geojson.io
3. Convex hull extraction (--convex-hull): Downloads the same files but computes a convex hull around all feature vertices instead of an axis-aligned bounding box — most precise representation of the data footprint:
geoextent -b -t --convex-hull 10.18739/A2Z892H2J
The three modes yield progressively tighter representations: metadata bbox > download bbox > convex hull. Use --no-download-data for speed when approximate extents suffice, or --convex-hull for the most faithful footprint of the actual data.
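The "progressively tighter" ordering can be seen directly: a convex hull never covers more area than the bounding box of the same points. A small sketch using Andrew's monotone chain (illustrative, not geoextent's implementation):

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for seq, out in ((pts, lower), (reversed(pts), upper)):
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    return abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                   - poly[(i + 1) % len(poly)][0] * poly[i][1]
                   for i in range(len(poly)))) / 2

points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # interior point drops out of the hull
hull = convex_hull(points)
xs, ys = zip(*points)
bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
assert polygon_area(hull) <= bbox_area  # hull is never larger than the bbox
```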
The output is the combined bbox or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the repository. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.
For comprehensive examples including all supported repositories and advanced features, see Examples.
Parallel extraction
Use -p / --parallel to extract extents from files within a directory in parallel using multiple threads. This speeds up processing of directories with many geodata files:
# Auto-detect CPU count
geoextent -p -b -t path/to/directory
# Use 4 workers
geoextent -p 4 -b -t path/to/directory
# Parallel extraction from a remote repository
geoextent -p -b -t https://doi.org/10.5281/zenodo.4593540
Without -p, files are processed sequentially (the default).
Note
geoextent extracts spatial extents by reading file headers, which is very fast (a few milliseconds per file regardless of file size). Parallel extraction helps most when a directory contains many files (tens or more), where the per-file I/O latency adds up. For directories with only a few files, sequential processing is already fast and -p provides little benefit.
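The -p behavior can be approximated with a thread pool mapped over per-file extraction. A sketch under stated assumptions: `extract_extent` below is a stand-in for geoextent's per-file handler, not its real API:

```python
from concurrent.futures import ThreadPoolExecutor
import os

def extract_extent(path):
    """Stand-in for a per-file header read (hypothetical placeholder;
    geoextent's real handlers parse vector/raster headers)."""
    return {"file": path, "bbox": [0.0, 0.0, 1.0, 1.0]}

def extract_directory(paths, workers=None):
    """Sequential when workers == 1 (the CLI default); threaded otherwise.
    workers=None mirrors -p with no number: use all available CPU cores."""
    if workers == 1:
        return [extract_extent(p) for p in paths]
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(extract_extent, paths))  # preserves input order

results = extract_directory(["a.geojson", "b.tif"], workers=2)
print([r["file"] for r in results])  # -> ['a.geojson', 'b.tif']
```

Threads (rather than processes) fit here because per-file extraction is I/O-bound header reading, so the GIL is not the bottleneck.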
Debugging
You can enable detailed logs by passing the --debug option, or by setting the environment variable GEOEXTENT_DEBUG=1.
geoextent --debug -b -t muenster_ring_zeit.geojson
GEOEXTENT_DEBUG=1 geoextent -b -t muenster_ring_zeit.geojson
Details
You can enable details for folders and ZIP files by passing the --details option. It gives you access to the extents of the individual files used to compute the aggregated bounding box (bbox) or time box (tbox).
geoextent --details -b -t folder_one_file
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s]
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s, Processing muenster_ring_zeit.geojson]
Merging results: 0it [00:00, ?it/s]
Merging results: 0it [00:00, ?it/s, folder_one_file]
{'format': 'folder',
'crs': '4326',
'bbox': {'type': 'Polygon',
'coordinates': [[[51.94881477206191, 7.608118057250977],
[51.953258408047034, 7.602796554565429],
[51.96537036973145, 7.6016807556152335],
[51.97361943924433, 7.606401443481445],
[51.974624029877454, 7.62125015258789],
[51.97240332571046, 7.636871337890624],
[51.96817310852836, 7.645368576049805],
[51.96780294552556, 7.645540237426757],
[51.96330786509095, 7.6471710205078125],
[51.95807185013927, 7.647256851196289],
[51.953258408047034, 7.643308639526367],
[51.94881477206191, 7.608118057250977]]]},
'convex_hull': True,
'tbox': ['2018-11-14', '2018-11-14']}
Map preview
Generate a map preview to a temporary file (requires pip install geoextent[preview]):
geoextent --map -b muenster_ring_zeit.geojson
Save the map to a specific file:
geoextent -b --map extent.png muenster_ring_zeit.geojson
Display the map directly in the terminal:
geoextent -b --preview muenster_ring_zeit.geojson
Save to a specific file and display in the terminal:
geoextent -b --map extent.png --preview muenster_ring_zeit.geojson
Customize the image dimensions (default: 600x400):
geoextent -b --map extent.png --map-dim 800x600 muenster_ring_zeit.geojson
The path of the saved map is always printed to stderr (suppressed by --quiet).
For more details on map preview options, see Core Features.
Export to file
Export extraction results to a file. The format is auto-detected from the file extension:
Single file to GeoPackage:
geoextent -b -t --output result.gpkg tests/testdata/geojson/muenster_ring_zeit.geojson
Directory to GeoJSON:
geoextent -b -t --output result.geojson tests/testdata/folders/folder_two_files
Multiple files to CSV:
geoextent -b -t --output result.csv file1.shp file2.geojson
Convex hull geometry:
geoextent -b --convex-hull --output hull.gpkg tests/testdata/folders/folder_two_files
CSV with WKB geometry (via --format):
geoextent -b --format wkb --output result.csv tests/testdata/folders/folder_two_files
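The extension-based auto-detection used by --output can be sketched as follows (the extension-to-format mapping is taken from the option description; the helper name is illustrative):

```python
from pathlib import Path

# Extension -> export format, as documented for --output
_FORMATS = {".gpkg": "GeoPackage", ".geojson": "GeoJSON", ".json": "GeoJSON", ".csv": "CSV"}

def detect_export_format(path: str) -> str:
    """Map an output filename to its export format by extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    try:
        return _FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported export extension: {ext!r}") from None

print(detect_export_format("result.gpkg"))     # -> GeoPackage
print(detect_export_format("result.geojson"))  # -> GeoJSON
```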
For more details on export options, see Core Features.
Join export files
Merge multiple exported files into a single file. Summary rows are excluded — only individual-file features are kept. Input files can be any supported format; the output format is auto-detected from the extension:
geoextent --join --output merged.gpkg run1.gpkg run2.gpkg
Cross-format join (GeoJSON + GPKG -> CSV):
geoextent --join --output combined.csv run1.geojson run2.gpkg
For more details, see Core Features.