Command-Line Interface (CLI)¶
Basics¶
geoextent can be called on the command line with this command:
usage: geoextent [-h] [--formats] [--list-features] [--version] [--debug] [--details] [--output] [output file] [--join] [-b] [-t] [--convex-hull] [--no-download-data] [--no-metadata-fallback] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--browse] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random,smallest,largest}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] [--keep-files] [--assume-wgs84] input1 [input2 ...]
- files¶
input file, directory, DOI, or repository URL (supports multiple inputs including mixed types). Use ‘-’ to read text from stdin (requires --text-method). May be empty when --text STRING is used.
- -h, --help¶
show help message and exit
- --formats¶
show supported formats
- --list-features¶
output machine-readable JSON with all supported file formats and content providers
- --list-periods¶
output the bundled named-time-period gazetteer (ICS GTS2020) with licensing and provenance metadata. Useful for downstream UIs and autocomplete widgets. Default format is JSON; use --list-periods-format to switch to a plain-text table.
- --list-periods-format {json,text}¶
output format for --list-periods (default: json)
- --list-periods-filter <substr>¶
case-insensitive substring match on period name or alias; only matching periods are listed
- --version¶
show installed version
- --debug¶
turn on debug logging, alternatively set environment variable GEOEXTENT_DEBUG=1
- --details¶
Returns details of folder/zipFiles geoextent extraction
- --output <output>¶
Export results to a file. Format is auto-detected from extension: .gpkg (GeoPackage), .geojson/.json (GeoJSON), .csv (CSV). Works with single files, directories, and remote sources.
- -b, --bounding-box¶
extract spatial extent (bounding box)
- -t, --time-box¶
extract temporal extent (%Y-%m-%d)
- --time-format <format>¶
output format for temporal extents. Presets: ‘date’ (%Y-%m-%d, default), ‘iso8601’ (%Y-%m-%dT%H:%M:%SZ). Also accepts strftime format strings (e.g. ‘%Y/%m/%d %H:%M’).
- --convex-hull¶
extract convex hull instead of bounding box for vector geometries
- --no-download-data¶
for repositories: disable downloading data files and use metadata only (not recommended for most providers)
- --metadata-first¶
try metadata-only extraction first, fall back to data download if metadata yields no results (mutually exclusive with --no-download-data)
- --no-metadata-fallback¶
disable automatic metadata fallback when data download yields no files (by default, geoextent falls back to metadata-only extraction if data files are unavailable and the provider supports metadata)
- --no-follow¶
disable following external DOIs/URLs to other providers (e.g., DEIMS-SDR datasets referencing Zenodo). By default, geoextent follows these references to extract actual data extents.
- --no-progress¶
disable progress bars during download and extraction
- --quiet¶
suppress all console messages including warnings, progress bars, map preview messages, and terminal display (--map FILE still saves the image silently)
- --format {geojson,wkt,wkb}¶
output format for spatial extents (default: geojson)
- --no-subdirs¶
only process files in the top-level directory, ignore subdirectories
- --geojsonio¶
generate and print a clickable geojson.io URL for the extracted spatial extent. GeoJSON payloads up to ~150 KB are embedded in the URL fragment; larger payloads attempt an anonymous GitHub Gist upload which now requires auth (geojsonio limitation, not geojson.io). Pair with --convex-hull to keep the payload small.
- --browse¶
open the geojson.io URL in the default web browser (use with --geojsonio to also print URL)
- --map <file>¶
save a map preview image of the spatial extent as PNG. If FILE is given, saves to that path; otherwise saves to a temporary file. (requires: pip install geoextent[preview])
- --preview¶
display a map preview of the spatial extent in the terminal (requires: pip install geoextent[preview])
- --map-dim <wxh>¶
dimensions of the map preview image in pixels (default: 600x400)
- --no-metadata¶
exclude extraction metadata and statistics from GeoJSON output
- --max-download-size <max_download_size>¶
maximum download size limit (e.g. ‘100MB’, ‘2GB’). Uses filesizelib for parsing.
- --max-download-method {ordered,random,smallest,largest}¶
method for selecting files when size limit is exceeded: ‘ordered’ (as returned by provider), ‘random’, ‘smallest’ (smallest files first), ‘largest’ (largest files first) (default: ordered)
- --max-download-method-seed <max_download_method_seed>¶
seed for random file selection when using --max-download-method random (default: 42)
- --placename¶
enable placename lookup using default gazetteer (nominatim, no API key required). Use --placename-service to specify a different gazetteer (geonames requires GEONAMES_USERNAME)
- --placename-service {geonames,nominatim,photon}¶
specify gazetteer service for placename lookup (default: nominatim; requires --placename)
- --placename-escape¶
escape Unicode characters in placename output (requires --placename)
- --text-method {ner,none}¶
text-extraction method for plain-text files (.txt, .md, …) and the --text/stdin inputs. ‘ner’ (default) uses spaCy NER to detect place names, calendar dates, and named time periods, resolving them via the configured place and period gazetteers. ‘none’ disables text extraction (text files fall back to other handlers or are skipped). Requires: pip install geoextent[nlp] for any value other than ‘none’; if spaCy is not installed, the text handler silently declines so existing workflows that happen to include text files keep working.
- --text <string>¶
run text extraction on this literal string (requires --text-method)
- --ner-model <model>¶
spaCy model name for NER (default: en_core_web_sm). The model is auto-downloaded on first use unless --no-auto-download is set.
- --ner-labels <labels>¶
comma-separated entity labels to keep as places (default: LOC,GPE)
- --ner-score-threshold <float>¶
drop NER mentions with score below FLOAT (only used if the model emits per-entity scores; otherwise ignored)
- --ner-gazetteer {geonames,nominatim,photon}¶
gazetteer used to forward-geocode detected place names (default: same as --placename-service if set, else nominatim, which works without an API key or login). Use ‘geonames’ for the GeoNames service (requires GEONAMES_USERNAME env var or .env).
- --ner-ambiguity {drop,top}¶
how to handle ambiguous gazetteer hits: ‘drop’ (skip mentions with multiple candidates, default, defensive) or ‘top’ (keep the highest-ranked candidate)
- --no-auto-download¶
disable automatic spaCy model download on first use
- --period-gazetteer {bundled,none}¶
gazetteer used to resolve named time periods (e.g. ‘Holocene’, ‘Mesozoic Era’) to signed ISO date ranges. ‘bundled’ uses the ICS GTS2020 chronostratigraphic chart shipped with geoextent (default). ‘none’ disables period matching.
- --period-ambiguity {drop,top}¶
how to handle ambiguous period gazetteer hits: ‘drop’ (default, defensive) or ‘top’ (keep the highest-ranked candidate)
- --no-period-resolution¶
disable named time-period matching entirely (still parses DATE/TIME entities via dateutil)
- --no-source-text¶
omit the NFC-normalised source string from text/NER results (opt-out for privacy or to shrink output size; offsets in place_names and date_entities still index into the normalised source the extractor used internally)
- --place-geometry {auto,boundary,point}¶
how to use the gazetteer geometry for matched place names in the spatial extent. ‘auto’ (default) uses the administrative boundary or other areal polygon when the gazetteer provides one (Nominatim does for administrative regions; GeoNames and Photon are point-only), and falls back to the centroid point otherwise. ‘boundary’ is the same but logs a debug message on point-only fallback. ‘point’ forces the centroid lat/lon even when a boundary is available.
- --annotate {auto,ansi,brackets,off}¶
when text/NER inputs are processed, print the source text after the JSON result with matched place names and dates highlighted. ‘ansi’ uses terminal colour, ‘brackets’ uses textual markers ([[Berlin|place]]), ‘auto’ picks based on TTY, ‘off’ disables. Default: auto.
- --annotate-classes <map>¶
comma-separated overrides for --annotate colours/markers. Example: ‘place=cyan,date=yellow,period=magenta’. Recognised ANSI names: black, red, green, yellow, blue, magenta, cyan, white (bright_* prefix for bold variants).
- --ext-metadata¶
retrieve external metadata for DOIs (title, authors, publisher, publication year, URL, license) from CrossRef and DataCite
- --ext-metadata-method {auto,all,crossref,datacite}¶
method for retrieving external metadata: ‘auto’ (try CrossRef first, then DataCite), ‘all’ (query all sources), ‘crossref’ (CrossRef only), ‘datacite’ (DataCite only) (default: auto)
- --download-skip-nogeo¶
skip downloading files that don’t appear to contain geospatial data (e.g., PDFs, images, plain text)
- --download-skip-nogeo-exts <download_skip_nogeo_exts>¶
comma-separated list of additional file extensions to consider as geospatial (e.g., ‘.xyz,.las,.ply’)
- --max-download-workers <max_download_workers>¶
maximum number of parallel downloads (default: 4, set to 1 to disable parallel downloads)
- --keep-files¶
keep downloaded and extracted files instead of cleaning them up (for debugging purposes)
- --legacy¶
use traditional GIS coordinate order (longitude, latitude) instead of EPSG:4326 native order (latitude, longitude)
- --assume-wgs84¶
assume WGS84 (EPSG:4326) for raster files without projection information (e.g., world files without .prj). By default, ungeoreferenced rasters are skipped.
- -p <workers>, --parallel <workers>¶
enable parallel file extraction within directories. Without a number, uses all available CPU cores. Specify a number (e.g., -p 4) to set worker count. Default: sequential processing.
- --join¶
Join multiple exported files (from --output) into a single file. Requires --output to specify the destination.
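The interplay of --max-download-size, --max-download-method and --max-download-method-seed described above can be pictured with a small Python sketch. This is an illustration only, not geoextent's implementation: the helper name `select_files`, the `(name, size)` tuples and the rule of skipping files that do not fit are all assumptions made for the example.

```python
import random


def select_files(files, max_bytes, method="ordered", seed=42):
    """Pick files from (name, size_in_bytes) tuples without exceeding max_bytes.

    Hypothetical helper: the candidate order follows the documented methods,
    the budget handling is an assumption of this sketch.
    """
    if method == "ordered":
        candidates = list(files)  # keep the order returned by the provider
    elif method == "random":
        candidates = list(files)
        random.Random(seed).shuffle(candidates)  # seeded, so reproducible
    elif method == "smallest":
        candidates = sorted(files, key=lambda item: item[1])
    elif method == "largest":
        candidates = sorted(files, key=lambda item: item[1], reverse=True)
    else:
        raise ValueError(f"unknown method: {method}")

    selected, total = [], 0
    for name, size in candidates:
        if total + size > max_bytes:
            continue  # assumption: skip a file that does not fit, try the next
        selected.append((name, size))
        total += size
    return selected


# Made-up file listing, sizes in bytes.
files = [("a.tif", 60), ("b.csv", 30), ("c.gpkg", 50)]
ordered = select_files(files, 100, "ordered")
smallest = select_files(files, 100, "smallest")
largest = select_files(files, 100, "largest")
```

A fixed seed makes the `random` method repeatable, which is presumably why the option has a default value of 42.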
Examples¶
Note
Depending on the local configuration, geoextent might need to be called with the python interpreter prepended:
python -m geoextent …
Show help message¶
geoextent -h
geoextent is a Python library for extracting geospatial and temporal extents of a file
or a directory of multiple geospatial data formats.
usage: geoextent [-h] [--formats] [--list-features] [--version] [--debug] [--details] [--output] [output file] [--join] [-b] [-t] [--convex-hull] [--no-download-data] [--no-metadata-fallback] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--browse] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random,smallest,largest}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] [--keep-files] [--assume-wgs84] input1 [input2 ...]
positional arguments:
files input file, directory, DOI, or repository URL
(supports multiple inputs including mixed types). Use
'-' to read text from stdin (requires --text-method).
May be empty when --text STRING is used.
options:
-h, --help show help message and exit
--formats show supported formats
--list-features output machine-readable JSON with all supported file
formats and content providers
--list-periods output the bundled named-time-period gazetteer (ICS
GTS2020) with licensing and provenance metadata.
Useful for downstream UIs and autocomplete widgets.
Default format is JSON; use --list-periods-format to
switch to a plain-text table.
--list-periods-format {json,text}
output format for --list-periods (default: json)
--list-periods-filter SUBSTR
case-insensitive substring match on period name or
alias; only matching periods are listed
--version show installed version
--debug turn on debug logging, alternatively set environment
variable GEOEXTENT_DEBUG=1
--details Returns details of folder/zipFiles geoextent
extraction
--output OUTPUT Export results to a file. Format is auto-detected from
extension: .gpkg (GeoPackage), .geojson/.json
(GeoJSON), .csv (CSV). Works with single files,
directories, and remote sources.
-b, --bounding-box extract spatial extent (bounding box)
-t, --time-box extract temporal extent (%Y-%m-%d)
--time-format FORMAT output format for temporal extents. Presets: 'date'
(%Y-%m-%d, default), 'iso8601' (%Y-%m-%dT%H:%M:%SZ).
Also accepts strftime format strings (e.g. '%Y/%m/%d
%H:%M').
--convex-hull extract convex hull instead of bounding box for vector
geometries
--no-download-data for repositories: disable downloading data files and
use metadata only (not recommended for most providers)
--metadata-first try metadata-only extraction first, fall back to data
download if metadata yields no results (mutually
exclusive with --no-download-data)
--no-metadata-fallback
disable automatic metadata fallback when data download
yields no files (by default, geoextent falls back to
metadata-only extraction if data files are unavailable
and the provider supports metadata)
--no-follow disable following external DOIs/URLs to other
providers (e.g., DEIMS-SDR datasets referencing
Zenodo). By default, geoextent follows these
references to extract actual data extents.
--no-progress disable progress bars during download and extraction
--quiet suppress all console messages including warnings,
progress bars, map preview messages, and terminal
display (--map FILE still saves the image silently)
--format {geojson,wkt,wkb}
output format for spatial extents (default: geojson)
--no-subdirs only process files in the top-level directory, ignore
subdirectories
--geojsonio generate and print a clickable geojson.io URL for the
extracted spatial extent. GeoJSON payloads up to ~150
KB are embedded in the URL fragment; larger payloads
attempt an anonymous GitHub Gist upload which now
requires auth (geojsonio limitation, not geojson.io).
Pair with --convex-hull to keep the payload small.
--browse open the geojson.io URL in the default web browser
(use with --geojsonio to also print URL)
--map [FILE] save a map preview image of the spatial extent as PNG.
If FILE is given, saves to that path; otherwise saves
to a temporary file. (requires: pip install
geoextent[preview])
--preview display a map preview of the spatial extent in the
terminal (requires: pip install geoextent[preview])
--map-dim WxH dimensions of the map preview image in pixels
(default: 600x400)
--no-metadata exclude extraction metadata and statistics from
GeoJSON output
--max-download-size MAX_DOWNLOAD_SIZE
maximum download size limit (e.g. '100MB', '2GB').
Uses filesizelib for parsing.
--max-download-method {ordered,random,smallest,largest}
method for selecting files when size limit is
exceeded: 'ordered' (as returned by provider),
'random', 'smallest' (smallest files first), 'largest'
(largest files first) (default: ordered)
--max-download-method-seed MAX_DOWNLOAD_METHOD_SEED
seed for random file selection when using --max-
download-method random (default: 42)
--placename enable placename lookup using default gazetteer
(nominatim — no API key required). Use --placename-
service to specify a different gazetteer (geonames
requires GEONAMES_USERNAME)
--placename-service GAZETTEER
specify gazetteer service for placename lookup
(default: nominatim; requires --placename)
--placename-escape escape Unicode characters in placename output
(requires --placename)
--text-method {ner,none}
text-extraction method for plain-text files (.txt,
.md, ...) and the --text/stdin inputs. 'ner' (default)
uses spaCy NER to detect place names, calendar dates,
and named time periods, resolving them via the
configured place and period gazetteers. 'none'
disables text extraction (text files fall back to
other handlers or are skipped). Requires: pip install
geoextent[nlp] for any value other than 'none'; if
spaCy is not installed, the text handler silently
declines so existing workflows that happen to include
text files keep working.
--text STRING run text extraction on this literal string (requires
--text-method)
--ner-model MODEL spaCy model name for NER (default: en_core_web_sm).
The model is auto-downloaded on first use unless --no-
auto-download is set.
--ner-labels LABELS comma-separated entity labels to keep as places
(default: LOC,GPE)
--ner-score-threshold FLOAT
drop NER mentions with score below FLOAT (only used if
the model emits per-entity scores; otherwise ignored)
--ner-gazetteer GAZETTEER
gazetteer used to forward-geocode detected place names
(default: same as --placename-service if set, else
nominatim, which works without an API key or login).
Use 'geonames' for the GeoNames service (requires
GEONAMES_USERNAME env var or .env).
--ner-ambiguity {drop,top}
how to handle ambiguous gazetteer hits: 'drop' (skip
mentions with multiple candidates, default, defensive)
or 'top' (keep the highest-ranked candidate)
--no-auto-download disable automatic spaCy model download on first use
--period-gazetteer {bundled,none}
gazetteer used to resolve named time periods (e.g.
'Holocene', 'Mesozoic Era') to signed ISO date ranges.
'bundled' uses the ICS GTS2020 chronostratigraphic
chart shipped with geoextent (default). 'none'
disables period matching.
--period-ambiguity {drop,top}
how to handle ambiguous period gazetteer hits: 'drop'
(default, defensive) or 'top' (keep the highest-ranked
candidate)
--no-period-resolution
disable named time-period matching entirely (still
parses DATE/TIME entities via dateutil)
--no-source-text omit the NFC-normalised source string from text/NER
results (opt-out for privacy or to shrink output size;
offsets in place_names and date_entities still index
into the normalised source the extractor used
internally)
--place-geometry {auto,boundary,point}
how to use the gazetteer geometry for matched place
names in the spatial extent. 'auto' (default) uses the
administrative boundary or other areal polygon when
the gazetteer provides one (Nominatim does for
administrative regions; GeoNames and Photon are point-
only), and falls back to the centroid point otherwise.
'boundary' is the same but logs a debug message on
point-only fallback. 'point' forces the centroid
lat/lon even when a boundary is available.
--annotate {auto,ansi,brackets,off}
when text/NER inputs are processed, print the source
text after the JSON result with matched place names
and dates highlighted. 'ansi' uses terminal colour,
'brackets' uses textual markers
(``[[Berlin|place]]``), 'auto' picks based on TTY,
'off' disables. Default: auto.
--annotate-classes MAP
comma-separated overrides for --annotate
colours/markers. Example:
'place=cyan,date=yellow,period=magenta'. Recognised
ANSI names: black, red, green, yellow, blue, magenta,
cyan, white (bright_* prefix for bold variants).
--ext-metadata retrieve external metadata for DOIs (title, authors,
publisher, publication year, URL, license) from
CrossRef and DataCite
--ext-metadata-method {auto,all,crossref,datacite}
method for retrieving external metadata: 'auto' (try
CrossRef first, then DataCite), 'all' (query all
sources), 'crossref' (CrossRef only), 'datacite'
(DataCite only) (default: auto)
--download-skip-nogeo
skip downloading files that don't appear to contain
geospatial data (e.g., PDFs, images, plain text)
--download-skip-nogeo-exts DOWNLOAD_SKIP_NOGEO_EXTS
comma-separated list of additional file extensions to
consider as geospatial (e.g., '.xyz,.las,.ply')
--max-download-workers MAX_DOWNLOAD_WORKERS
maximum number of parallel downloads (default: 4, set
to 1 to disable parallel downloads)
--keep-files keep downloaded and extracted files instead of
cleaning them up (for debugging purposes)
--legacy use traditional GIS coordinate order (longitude,
latitude) instead of EPSG:4326 native order (latitude,
longitude)
--assume-wgs84 assume WGS84 (EPSG:4326) for raster files without
projection information (e.g., world files without
.prj). By default, ungeoreferenced rasters are
skipped.
-p [WORKERS], --parallel [WORKERS]
enable parallel file extraction within directories.
Without a number, uses all available CPU cores.
Specify a number (e.g., -p 4) to set worker count.
Default: sequential processing.
--join Join multiple exported files (from --output) into a
single file. Requires --output to specify the
destination.
Examples:
geoextent -b path/to/directory_with_geospatial_data
geoextent -t path/to/file_with_temporal_extent
geoextent -b -t path/to/geospatial_files
geoextent -b -t --details path/to/zipfile_with_geospatial_data
geoextent -b -t file1.shp file2.csv file3.geopkg
geoextent -b -t --geojsonio --no-download-data 10.25928/HK1000
geoextent -t *.geojson
geoextent -b -t https://doi.org/10.1594/PANGAEA.918707 https://doi.pangaea.de/10.1594/PANGAEA.858767
geoextent -b --convex-hull https://zenodo.org/record/4567890 10.1594/PANGAEA.123456
geoextent -b --placename file.geojson
geoextent -b --placename --placename-service nominatim https://zenodo.org/record/123456
geoextent -b --placename --placename-service photon --placename-escape https://doi.org/10.3897/BDJ.13.e159973
Supported formats:
- CSV (comma-separated values) (.csv, .txt)
- Vector data (.shp, .shx, .dbf, .prj, .geojson, .json, .gpkg, .gdb, .gpx, .kml, .kmz, .gml, .fgb)
- Raster data (.tif, .tiff, .geotiff, .nc, .netcdf, .asc, .wld, .jgw, .pgw, .pngw, .tfw, .tifw, .bpw, .gfw)
- Point cloud data (.las, .laz)
- Text (NER) (.markdown, .md, .rst, .text, .txt)
Supported data repositories:
- Wikidata (wikidata.org)
- Dryad (datadryad.org)
- 4TU.ResearchData (data.4tu.nl)
- Figshare (figshare.com)
- Zenodo (zenodo.org)
- InvenioRDM (inveniosoftware.org/products/rdm)
- Pangaea (pangaea.de)
- OSF (osf.io)
- Dataverse (dataverse.org)
- GFZ (dataservices.gfz-potsdam.de)
- RADAR (radar-service.eu)
- Arctic Data Center (arcticdata.io)
- DataONE (dataone.org)
- GBIF (gbif.org)
- Pensoft (pensoft.net)
- BGR (geoportal.bgr.de)
- BAW (datenrepository.baw.de)
- MDI-DE (mdi-de.org)
- GDI-DE (geoportal.de)
- Opara (opara.zih.tu-dresden.de)
- Senckenberg (dataportal.senckenberg.de)
- CKAN (ckan.org)
- Mendeley Data (data.mendeley.com)
- DEIMS-SDR (deims.org)
- NFDI4Earth (onestop4all.nfdi4earth.de)
- HALO DB (halo-db.pa.op.dlr.de)
- SEANOE (seanoe.org)
- GeoScienceWorld (pubs.geoscienceworld.org)
- OJS (pkp.sfu.ca/ojs)
- Janeway (janeway.systems)
- UKCEH (catalogue.ceh.ac.uk)
- STAC (stacspec.org)
- GitHub (github.com)
- GitLab (gitlab.com)
- Forgejo (codeberg.org)
- Software Heritage (softwareheritage.org)
- Remote Raster (COG) (cogeo.org)
Extract bounding box from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendered view of the file contents, see rendered blob.
geoextent -b muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s, Spatial extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'envelope': [51.94881477206191,
7.6016807556152335,
51.974624029877454,
7.647256851196289],
'geometry': {'type': 'Polygon',
'coordinates': [[[51.94881477206191, 7.6016807556152335],
[51.94881477206191, 7.647256851196289],
[51.974624029877454, 7.647256851196289],
[51.974624029877454, 7.6016807556152335],
[51.94881477206191, 7.6016807556152335]]]},
'convex_hull': False,
'file_size_bytes': 1695}
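In the output above, the `geometry` polygon is the rectangle spanned by the `envelope`. A minimal Python sketch of that relationship, assuming the envelope order [min latitude, min longitude, max latitude, max longitude] seen in the output; `envelope_to_ring` is a hypothetical helper and not part of geoextent:

```python
def envelope_to_ring(envelope):
    """Return the closed rectangle ring for [min_lat, min_lon, max_lat, max_lon]."""
    min_lat, min_lon, max_lat, max_lon = envelope
    return [
        [min_lat, min_lon],
        [min_lat, max_lon],
        [max_lat, max_lon],
        [max_lat, min_lon],
        [min_lat, min_lon],  # repeat the first vertex to close the ring
    ]


# Envelope copied from the output above.
envelope = [51.94881477206191, 7.6016807556152335,
            51.974624029877454, 7.647256851196289]
ring = envelope_to_ring(envelope)
```

The resulting `ring` has the same five vertices, in the same order, as the `coordinates` entry in the output.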
Extract time interval from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendered view of the file contents, see rendered blob.
geoextent -t muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?it/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'tbox': ['2018-11-14', '2018-11-14'],
'file_size_bytes': 1695}
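The `tbox` values above use the default `date` preset of --time-format. The following sketch uses Python's standard `datetime.strftime` to show how the format strings quoted in the option description render the same date. It illustrates the format strings only; `format_time` is a hypothetical helper, not geoextent code.

```python
from datetime import datetime, timezone

# Format strings quoted from the --time-format description.
PRESETS = {
    "date": "%Y-%m-%d",
    "iso8601": "%Y-%m-%dT%H:%M:%SZ",
}


def format_time(moment, time_format="date"):
    """Render a datetime with a preset name or a raw strftime string."""
    pattern = PRESETS.get(time_format, time_format)
    return moment.strftime(pattern)


moment = datetime(2018, 11, 14, tzinfo=timezone.utc)
print(format_time(moment))                    # 2018-11-14
print(format_time(moment, "iso8601"))         # 2018-11-14T00:00:00Z
print(format_time(moment, "%Y/%m/%d %H:%M"))  # 2018/11/14 00:00
```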
Extract both bounding box and time interval from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendered view of the file contents, see rendered blob.
geoextent -b -t muenster_ring_zeit.geojson
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s, ../tests/testdata/geojson/muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?it/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handle_vector',
'envelope': [51.94881477206191,
7.6016807556152335,
51.974624029877454,
7.647256851196289],
'geometry': {'type': 'Polygon',
'coordinates': [[[51.94881477206191, 7.6016807556152335],
[51.94881477206191, 7.647256851196289],
[51.974624029877454, 7.647256851196289],
[51.974624029877454, 7.6016807556152335],
[51.94881477206191, 7.6016807556152335]]]},
'convex_hull': False,
'tbox': ['2018-11-14', '2018-11-14'],
'file_size_bytes': 1695}
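The coordinates above are in EPSG:4326 native order (latitude, longitude). The --legacy option is described as switching to the traditional GIS order (longitude, latitude). As a sketch of what that reordering means for an envelope, assuming the four-value layout shown above (the helper `to_lon_lat` is hypothetical, and the exact layout of the --legacy output is not confirmed here):

```python
def to_lon_lat(envelope):
    """Reorder [min_lat, min_lon, max_lat, max_lon] to [min_lon, min_lat, max_lon, max_lat]."""
    min_lat, min_lon, max_lat, max_lon = envelope
    return [min_lon, min_lat, max_lon, max_lat]


# Envelope copied from the output above (latitude first).
envelope = [51.94881477206191, 7.6016807556152335,
            51.974624029877454, 7.647256851196289]
legacy_order = to_lon_lat(envelope)
```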
Folders or ZIP file(s)¶
Geoextent also supports queries for multiple files inside folders or ZIP file(s).
Extract both bounding box and time interval from a folder or zipfile¶
geoextent -b -t folder_two_files
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s]
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s, Processing muenster_ring_zeit.geojson]
Processing directory: folder_two_files: 50%|█████ | 1/2 [00:00<00:00, 19.04item/s, Processing districtes.geojson]
Merging results: 0it [00:00, ?it/s]
Merging results: 0it [00:00, ?it/s, folder_two_files]
{'format': 'folder',
'envelope': [41.31703852240476,
2.052333387639205,
51.974624029877454,
7.647256851196289],
'geometry': {'type': 'Polygon',
'coordinates': [[[41.31703852240476, 2.052333387639205],
[41.31703852240476, 7.647256851196289],
[51.974624029877454, 7.647256851196289],
[51.974624029877454, 2.052333387639205],
[41.31703852240476, 2.052333387639205]]]},
'convex_hull': False,
'tbox': ['2018-11-14', '2019-09-11']}
The output of this command is the combined extent (envelope + geometry) and/or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the folder or ZIP file. All output is in EPSG:4326; no crs field is emitted.
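Merging per-file results into one extent can be sketched in a few lines of Python: take the minimum and maximum of the envelope corners, and the earliest start and latest end of the time boxes. This is an illustration, not geoextent's code. The first box is the muenster_ring_zeit.geojson result shown earlier; the second box's maximum corner and its time box are made-up placeholder values, because the individual result for the second file is not shown on this page.

```python
def merge_envelopes(envelopes):
    """Smallest [min_lat, min_lon, max_lat, max_lon] box covering all inputs."""
    return [
        min(env[0] for env in envelopes),
        min(env[1] for env in envelopes),
        max(env[2] for env in envelopes),
        max(env[3] for env in envelopes),
    ]


def merge_tboxes(tboxes):
    """Earliest start and latest end; %Y-%m-%d strings sort chronologically."""
    return [min(t[0] for t in tboxes), max(t[1] for t in tboxes)]


muenster = [51.94881477206191, 7.6016807556152335,
            51.974624029877454, 7.647256851196289]
# Placeholder: only the minimum corner is taken from the merged output above.
second_file = [41.31703852240476, 2.052333387639205, 41.47, 2.23]

merged_envelope = merge_envelopes([muenster, second_file])
merged_tbox = merge_tboxes([["2018-11-14", "2018-11-14"],
                            ["2019-09-11", "2019-09-11"]])  # second tbox assumed
```

With these inputs the merged envelope and time box equal the `envelope` and `tbox` values in the folder output.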
Multiple Inputs¶
Geoextent supports processing multiple files and/or directories in a single command. Results are merged into a single spatial and temporal extent.
Extract merged bounding box from multiple files¶
geoextent -b file1.geojson file2.csv file3.gpkg
Extract merged extent from files and directories¶
geoextent -b -t tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files
Extract convex hull from multiple files¶
geoextent -b --convex-hull tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/folders/folder_two_files/districtes.geojson tests/testdata/csv/cities_NL.csv
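To see why a convex hull can be tighter than a bounding box, the sketch below computes a planar convex hull with Andrew's monotone chain algorithm. It is a generic textbook method applied to plain (x, y) pairs, used here only for illustration. It is not taken from geoextent and it ignores geodesy.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (0, 3), (1, 1)]
hull = convex_hull(points)
bbox = [min(x for x, _ in points), min(y for _, y in points),
        max(x for x, _ in points), max(y for _, y in points)]
```

Here the hull is the triangle (0, 0), (4, 0), (0, 3), while the bounding box also covers the empty corner at (4, 3).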
Use --details to see per-file results alongside the merged extent:
geoextent -b -t --details tests/testdata/geojson/muenster_ring_zeit.geojson tests/testdata/csv/cities_NL.csv
Remote Repositories¶
Geoextent supports extracting geospatial extent from multiple research data repositories including Zenodo, PANGAEA, OSF, Figshare, Dryad, GFZ Data Services, RADAR, Arctic Data Center, 4TU.ResearchData, B2SHARE, BAW, MDI-DE, GDI-DE, DEIMS-SDR, NFDI4Earth, GBIF, Dataverse, Pensoft, and GitHub repositories.
Extract from Zenodo¶
geoextent -b -t https://doi.org/10.5281/zenodo.4593540
Extract from PANGAEA¶
geoextent -b -t https://doi.org/10.1594/PANGAEA.734969
Extract from OSF¶
geoextent -b -t https://doi.org/10.17605/OSF.IO/4XE6Z
geoextent -b -t OSF.IO/4XE6Z
Extract from GFZ Data Services¶
geoextent -b -t 10.5880/GFZ.4.8.2023.004
Extract from RADAR¶
geoextent -b -t 10.35097/tvn5vujqfvf99f32
Extract from Arctic Data Center¶
geoextent -b -t 10.18739/A2Z892H2J
Extract from Arctic Data Center (metadata only)¶
geoextent -b --no-download-data 10.18739/A2Z892H2J
Extract from 4TU.ResearchData¶
geoextent -b -t https://data.4tu.nl/articles/_/12707150/1
Extract from 4TU.ResearchData (metadata only)¶
geoextent -b --no-download-data https://data.4tu.nl/articles/_/12707150/1
Extract from BAW-Datenrepository (landing page URL)¶
geoextent -b -t --no-download-data https://datenrepository.baw.de/trefferanzeige?docuuid=40936F66-3DD8-43D0-99AE-7CA5EF2E1287
Extract from BAW-Datenrepository (DOI, small measurement site)¶
geoextent -b -t --no-download-data 10.48437/02.2023.K.0601.0001
Extract from BAW-Datenrepository (DOI, sedimentology dataset)¶
geoextent -b -t --no-download-data 10.48437/929835b7fca4
Extract from MDI-DE (metadata only)¶
geoextent -b -t --no-download-data https://nokis.mdi-de-dienste.org/trefferanzeige?docuuid=00100e9d-7838-4563-9dd7-2570b0d932cb
Extract from MDI-DE (direct download)¶
geoextent -b -t https://nokis.mdi-de-dienste.org/trefferanzeige?docuuid=00100e9d-7838-4563-9dd7-2570b0d932cb
Extract from MDI-DE (WFS download, bare UUID)¶
geoextent -b -t c7d748c9-e12f-4038-a556-b1698eb4033e
Extract from GDI-DE (metadata only, geoportal.de URL)¶
geoextent -b -t --no-download-data https://www.geoportal.de/Metadata/75987CE0-AA66-4445-AC44-068B98390E89
Extract from GDI-DE (metadata only, bare UUID)¶
geoextent -b -t --no-download-data cdb2c209-7e08-4f4c-b500-69de926e3023
Extract from DEIMS-SDR (dataset)¶
geoextent -b -t https://deims.org/dataset/3d87da8b-2b07-41c7-bf05-417832de4fa2
Extract from DEIMS-SDR (site)¶
geoextent -b https://deims.org/8eda49e9-1f4e-4f3e-b58e-e0bb25dc32a6
Extract from GBIF (metadata only, by DOI)¶
geoextent -b -t --no-download-data 10.15468/6bleia
Extract from GBIF (metadata only, by dataset URL)¶
geoextent -b --no-download-data https://www.gbif.org/dataset/378651d7-c235-4205-a617-2939d6faa434
Extract from GBIF (DwC-A data download)¶
geoextent -b -t 10.15468/6bleia
Extract from GBIF with geojson.io preview¶
geoextent -b --geojsonio --no-download-data 10.15472/lavgys
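The --geojsonio description says that payloads up to roughly 150 KB are embedded in the URL fragment. The sketch below shows one way such a URL can be built in Python. The `#data=data:application/json,` layout follows geojson.io's URL scheme as assumed here, and is not taken from geoextent's source. `geojsonio_url` is a hypothetical helper. GeoJSON itself uses longitude, latitude order, so the example polygon is written that way.

```python
import json
from urllib.parse import quote, unquote

GEOJSONIO_PREFIX = "https://geojson.io/#data=data:application/json,"
MAX_INLINE_BYTES = 150 * 1024  # rough limit quoted in the --geojsonio description


def geojsonio_url(geojson):
    """Embed a GeoJSON object in a geojson.io URL fragment."""
    payload = json.dumps(geojson, separators=(",", ":"))
    if len(payload.encode("utf-8")) > MAX_INLINE_BYTES:
        raise ValueError("payload too large to embed in a URL fragment")
    return GEOJSONIO_PREFIX + quote(payload, safe="")


# Rectangle around the Münster example extent, in GeoJSON lon/lat order.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [7.6016807556152335, 51.94881477206191],
        [7.647256851196289, 51.94881477206191],
        [7.647256851196289, 51.974624029877454],
        [7.6016807556152335, 51.974624029877454],
        [7.6016807556152335, 51.94881477206191],
    ]],
}
url = geojsonio_url(polygon)
decoded = json.loads(unquote(url[len(GEOJSONIO_PREFIX):]))
```

A convex hull usually has far fewer vertices than the source geometries, which is why the option description suggests pairing --geojsonio with --convex-hull.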
Extract from SEANOE (metadata only, French Mediterranean CTD)¶
geoextent -b -t --no-download-data 10.17882/105467
Extract from SEANOE (data download, Ireland coastline)¶
geoextent -b 10.17882/109463
Extract from SEANOE (whale biologging with geojson.io preview)¶
geoextent -b -t --geojsonio --no-download-data 10.17882/112127
Extract from DEIMS-SDR without following external references¶
By default, DEIMS-SDR datasets that reference external repositories (e.g., Zenodo, PANGAEA) are followed for actual data extent extraction. Use --no-follow to disable this and use DEIMS metadata only:
geoextent -b -t --no-follow https://deims.org/dataset/3d87da8b-2b07-41c7-bf05-417832de4fa2
Extract from NFDI4Earth Knowledge Hub (OneStop4All URL)¶
geoextent -b -t https://onestop4all.nfdi4earth.de/result/dthb-7b3bddd5af4945c2ac508a6d25537f0a/
Extract from NFDI4Earth Knowledge Hub (Cordra URL)¶
geoextent -b https://cordra.knowledgehub.nfdi4earth.de/objects/n4e/dthb-82b6552d-2b8e-4800-b955-ea495efc28af
Extract from NFDI4Earth without following landing page¶
By default, NFDI4Earth datasets with a landingPage URL are followed to other supported providers. Use --no-follow to disable this and use NFDI4Earth SPARQL metadata only:
geoextent -b -t --no-follow https://onestop4all.nfdi4earth.de/result/dthb-82b6552d-2b8e-4800-b955-ea495efc28af/
Extract from GitHub¶
geoextent -b https://github.com/fraxen/tectonicplates
Extract from a specific subdirectory:
geoextent -b https://github.com/Nowosad/spDataLarge/tree/master/inst/raster
Skip non-geospatial files (recommended for repos with many non-geo files):
geoextent -b --download-skip-nogeo https://github.com/fraxen/tectonicplates
Extract from Software Heritage¶
geoextent -b --download-skip-nogeo "https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/AWMC/geodata&path=Cultural-Data/political_shading/hasmonean"
Extract from a directory SWHID:
geoextent -b --download-skip-nogeo swh:1:dir:92890dbe77bbe36ccba724673bc62c2764df4f5a
Extract from a Remote GeoTIFF (COG)¶
Extract extent directly from a remote Cloud Optimized GeoTIFF (COG) URL — only the file header is downloaded:
geoextent -b https://raw.githubusercontent.com/GeoTIFF/test-data/main/files/gfw-azores.tif
Extract with temporal extent:
geoextent -b -t https://zenodo.org/records/14711942/files/FSM_1-km_MED-epsg.4326_v01.tif
Smart metadata-first extraction¶
Use --metadata-first to try metadata-only extraction first, falling back to data download if the provider has no metadata or the metadata didn’t yield results. This is useful for batch extractions across multiple providers:
geoextent -b --metadata-first 10.12761/sgn.2018.10225
geoextent -b --metadata-first Q64
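The fallback order described above can be sketched in Python. This only illustrates the control flow and is not geoextent's implementation: `extract_metadata_first`, `from_metadata`, `from_download`, and the two stub functions are hypothetical names, and the shape of the result dictionary is an assumption.

```python
from typing import Callable, Optional

Extent = dict  # assumed shape, e.g. {"bbox": [...]}; not geoextent's schema


def extract_metadata_first(
    identifier: str,
    from_metadata: Callable[[str], Optional[Extent]],
    from_download: Callable[[str], Optional[Extent]],
) -> Optional[Extent]:
    """Try the cheap metadata path first, then fall back to downloading data."""
    result = from_metadata(identifier)
    if result:  # None or an empty dict both count as "no result"
        return result
    return from_download(identifier)


# Hypothetical stand-ins for provider behaviour, for demonstration only.
def metadata_stub(identifier: str) -> Optional[Extent]:
    return {"bbox": [0.0, 0.0, 1.0, 1.0]} if identifier == "has-metadata" else None


def download_stub(identifier: str) -> Optional[Extent]:
    return {"bbox": [0.1, 0.1, 0.9, 0.9]}


fast = extract_metadata_first("has-metadata", metadata_stub, download_stub)
slow = extract_metadata_first("no-metadata", metadata_stub, download_stub)
print(fast)
print(slow)
```

In a batch run over many providers, this ordering means a download is only paid for when the metadata path returns nothing.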
Extract from GEO Knowledge Hub (automatic metadata fallback)¶
Some providers (e.g., GEO Knowledge Hub packages) have data files disabled. Geoextent automatically falls back to metadata-only extraction when this happens:
geoextent -b https://gkhub.earthobservations.org/packages/msaw9-hzd25
To disable the automatic fallback, use --no-metadata-fallback:
geoextent -b --no-metadata-fallback https://gkhub.earthobservations.org/packages/msaw9-hzd25
Extract from three German regional datasets with a convex hull — Wikidata (Berlin), 4TU (Dresden), and Senckenberg all use fast metadata extraction, producing a compact convex hull over central Germany:
geoextent -b --convex-hull --metadata-first Q64 https://data.4tu.nl/datasets/3035126d-ee51-4dbd-a187-5f6b0be85e9f/1 10.12761/sgn.2018.10225
Download size limits¶
Use --max-download-size to cap how much data geoextent will download from a repository. The value accepts human-friendly size strings (parsed by filesizelib):
| Format | Meaning |
|---|---|
| 100MB | 100 megabytes (decimal) |
| 2GB | 2 gigabytes (decimal) |
| 500KB | 500 kilobytes (decimal) |
| 10MiB | 10 mebibytes (binary) |
| 0.5GiB | 0.5 gibibytes (binary) |
| 1.5TB | 1.5 terabytes (decimal) |
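The difference between decimal and binary units can be made concrete with a small Python sketch. It is a hand-rolled illustration, not the filesizelib API that geoextent uses: `parse_size` and the unit tables are hypothetical, and the real parser may accept more spellings than this one does.

```python
import re

# Decimal units are powers of 1000, binary units are powers of 1024.
DECIMAL = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
UNITS = {**DECIMAL, **BINARY}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_size(text: str) -> int:
    """Convert a string such as '0.5GiB' into a whole number of bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"not a size string: {text!r}")
    number, unit = match.groups()
    if unit not in UNITS:  # unit matching is case-sensitive in this sketch
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(number) * UNITS[unit])


for value in ("100MB", "10MiB", "0.5GiB", "1.5TB"):
    print(value, "=", parse_size(value), "bytes")
```

Note that `10MiB` is about 4.9 % larger than `10MB`, which matters when a limit sits close to the actual download size.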
When the total download exceeds the limit, the CLI prompts for confirmation instead of silently truncating the file list. This works for all providers whose APIs report file sizes before download:
Zenodo: the download is approximately 45.2 MB (limit is 20 MB).
Proceed with download? [y/N]
Answering y retries with the actual size as the new limit. In non-interactive contexts (scripts, CI pipelines), geoextent exits with an error. To avoid the prompt entirely, use --no-download-data for metadata-only extraction or set a sufficiently large --max-download-size.
Note
The interactive prompt relies on providers reporting file sizes in their API metadata before download. Metadata-only providers (DEIMS-SDR, NFDI4Earth, HALO DB, Wikidata, Pensoft) do not download data files, so the size limit does not apply to them.
# Download at most 20 MB of data
geoextent -b -t --max-download-size 20MB 10.23728/b2share.26jnj-a4x24
# Limit GBIF DwC-A download to 500 MB
geoextent -b -t --max-download-size 500MB 10.15468/6bleia
# Use binary units
geoextent -b --max-download-size 0.5GiB 10.5281/zenodo.4593540
For GBIF datasets, Darwin Core Archive (DwC-A) downloads have an additional built-in soft limit of 1 GB. When a DwC-A archive exceeds this limit (or the --max-download-size value, whichever is smaller), the CLI also prompts interactively.
You can trigger this prompt intentionally by setting a very small limit:
$ geoextent -b --max-download-size 1KB 10.5281/zenodo.820562
Zenodo: the download is approximately 2.3 MB (limit is 0 MB).
Proceed with download? [y/N] N
Answering N (or pressing Enter) cancels the download and produces no output.
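For scripts and CI pipelines, where the prompt cannot be answered, the two ways of avoiding it can be wrapped in a small helper. This is a sketch under stated assumptions: `build_command` and `run` are hypothetical helper names, only flags documented on this page are used, and a non-zero exit status on error is assumed rather than confirmed.

```python
import subprocess
from typing import List, Optional


def build_command(
    identifier: str,
    max_download_size: Optional[str] = None,
    metadata_only: bool = False,
) -> List[str]:
    """Assemble a geoextent call that should not need the interactive prompt."""
    cmd = ["geoextent", "-b", "-t"]
    if metadata_only:
        # Metadata-only extraction never downloads data files.
        cmd.append("--no-download-data")
    elif max_download_size is not None:
        # Otherwise pass an explicit, sufficiently large limit.
        cmd += ["--max-download-size", max_download_size]
    cmd.append(identifier)
    return cmd


def run(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run without a terminal on stdin, as a CI job would."""
    # Assumption: geoextent signals the error case with a non-zero return code.
    return subprocess.run(cmd, stdin=subprocess.DEVNULL, capture_output=True, text=True)


print(build_command("10.15468/6bleia", metadata_only=True))
print(build_command("10.15468/6bleia", max_download_size="500MB"))
```

`run` is defined but not called here, so the sketch can be read and tested without geoextent installed.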
Comparing extraction modes: metadata, download, and convex hull¶
The following three calls on an Arctic Data Center dataset of ice wedge thermokarst polygons at Point Lay, Alaska illustrate how --no-download-data and --convex-hull affect the output geometry.
1. Metadata-only extraction (--no-download-data): Uses the bounding box stored in the repository metadata — fast, no file downloads. The bbox is slightly larger because it comes from the dataset-level metadata rather than the actual geometries:
geoextent -b -t --no-download-data 10.18739/A2Z892H2J
Output bbox: [-163.049, 69.721, -162.935, 69.760] with tbox [1949-01-01, 2020-01-01].
2. Full download extraction (default): Downloads the 2 GeoJSON files (1.6 MB) and computes the merged bounding box from the actual feature geometries — tighter than metadata:
geoextent -b -t 10.18739/A2Z892H2J
Output bbox: [-163.027, 69.723, -162.931, 69.751].
3. Convex hull extraction (--convex-hull): Downloads the same files but computes a convex hull around all feature vertices instead of an axis-aligned bounding box — most precise representation of the data footprint:
geoextent -b -t --convex-hull 10.18739/A2Z892H2J
The three modes yield progressively tighter representations: metadata envelope > download envelope > convex hull. Use --no-download-data for speed when approximate extents suffice, or --convex-hull for the most faithful footprint of the actual data.
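Why a convex hull is tighter than a bounding box can be seen with a few points in pure Python. The sketch below uses Andrew's monotone chain algorithm and the shoelace formula; it is an illustration only and not the routine geoextent uses. The coordinates are made up and do not come from the Point Lay dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated end points


def polygon_area(ring: List[Point]) -> float:
    """Shoelace formula for a simple polygon given without a closing vertex."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


points = [(0, 0), (4, 1), (5, 4), (2, 5), (1, 3), (3, 2)]
xs, ys = [p[0] for p in points], [p[1] for p in points]
bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
hull = convex_hull(points)
hull_area = polygon_area(hull)
print("bbox area:", bbox_area, "hull area:", hull_area)
```

The hull never covers more area than the bounding box of the same points, because the box is itself a convex shape containing all of them.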
The output is the combined envelope/geometry and/or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the repository. All output is in EPSG:4326; no crs field is emitted.
For comprehensive examples including all supported repositories and advanced features, see Examples.
Parallel extraction¶
Use -p / --parallel to extract extents from files within a directory in parallel using multiple threads. This speeds up processing of directories with many geodata files:
# Auto-detect CPU count
geoextent -p -b -t path/to/directory
# Use 4 workers
geoextent -p 4 -b -t path/to/directory
# Parallel extraction from a remote repository
geoextent -p -b -t https://doi.org/10.5281/zenodo.4593540
Without -p, files are processed sequentially (the default).
Note
geoextent extracts spatial extents by reading file headers, which is very fast (a few milliseconds per file regardless of file size). Parallel extraction helps most when a directory contains many files (tens or more), where the per-file I/O latency adds up. For directories with only a few files, sequential processing is already fast and -p provides little benefit.
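The pattern behind threaded extraction, reading each file independently and then merging the results, can be sketched with the standard library. This is not geoextent's code: `read_extent`, `merge_bboxes`, `extract_parallel`, and the `FAKE_EXTENTS` lookup table are hypothetical, and the lookup stands in for real file I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical per-file results; a real reader would open each file instead.
FAKE_EXTENTS: Dict[str, BBox] = {
    "a.geojson": (7.60, 51.94, 7.65, 51.97),
    "b.shp": (7.55, 51.90, 7.62, 51.95),
    "c.tif": (7.61, 51.96, 7.70, 52.00),
}


def read_extent(path: str) -> BBox:
    """Placeholder for the per-file header read."""
    return FAKE_EXTENTS[path]


def merge_bboxes(bboxes: Iterable[BBox]) -> BBox:
    """Smallest axis-aligned box that contains every input box."""
    boxes: List[BBox] = list(bboxes)
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def extract_parallel(paths: List[str], workers: Optional[int] = None) -> BBox:
    """Read all files with a thread pool, then merge the results."""
    # workers=None lets ThreadPoolExecutor choose a default based on CPU count.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge_bboxes(pool.map(read_extent, paths))


merged = extract_parallel(list(FAKE_EXTENTS), workers=4)
print(merged)
```

Because merging is a min/max reduction, the result does not depend on the order in which workers finish.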
Debugging¶
You can enable detailed logs by passing the --debug option, or by setting the environment variable GEOEXTENT_DEBUG=1.
geoextent --debug -b -t muenster_ring_zeit.geojson
GEOEXTENT_DEBUG=1 geoextent -b -t muenster_ring_zeit.geojson
Details¶
You can enable details for folders and ZIP files by passing the --details option. It gives you access to the geoextent of the individual files inside the folder or ZIP file that were used to compute the aggregated bounding box (bbox) or time box (tbox).
geoextent --details -b -t folder_one_file
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s]
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s, Processing muenster_ring_zeit.geojson]
Merging results: 0it [00:00, ?it/s]
Merging results: 0it [00:00, ?it/s, folder_one_file]
{'format': 'folder',
 'envelope': [51.94881477206191,
              7.6016807556152335,
              51.974624029877454,
              7.647256851196289],
 'geometry': {'type': 'Polygon',
              'coordinates': [[[51.94881477206191, 7.608118057250977],
                               [51.953258408047034, 7.602796554565429],
                               [51.96537036973145, 7.6016807556152335],
                               [51.97361943924433, 7.606401443481445],
                               [51.974624029877454, 7.62125015258789],
                               [51.97240332571046, 7.636871337890624],
                               [51.96817310852836, 7.645368576049805],
                               [51.96780294552556, 7.645540237426757],
                               [51.96330786509095, 7.6471710205078125],
                               [51.95807185013927, 7.647256851196289],
                               [51.953258408047034, 7.643308639526367],
                               [51.94881477206191, 7.608118057250977]]]},
 'convex_hull': True,
 'tbox': ['2018-11-14', '2018-11-14']}
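The relationship between the envelope and the geometry in this output can be checked by hand: the envelope is the minimum and maximum of each coordinate component of the polygon ring. The Python sketch below recomputes it from the coordinates printed above; `envelope_of` is a hypothetical helper, not part of geoextent, and it keeps the component order exactly as printed.

```python
from typing import List

# Ring copied from the 'geometry' field of the output above.
ring: List[List[float]] = [
    [51.94881477206191, 7.608118057250977],
    [51.953258408047034, 7.602796554565429],
    [51.96537036973145, 7.6016807556152335],
    [51.97361943924433, 7.606401443481445],
    [51.974624029877454, 7.62125015258789],
    [51.97240332571046, 7.636871337890624],
    [51.96817310852836, 7.645368576049805],
    [51.96780294552556, 7.645540237426757],
    [51.96330786509095, 7.6471710205078125],
    [51.95807185013927, 7.647256851196289],
    [51.953258408047034, 7.643308639526367],
    [51.94881477206191, 7.608118057250977],
]


def envelope_of(coords: List[List[float]]) -> List[float]:
    """Return [min_first, min_second, max_first, max_second] for a ring."""
    first = [c[0] for c in coords]
    second = [c[1] for c in coords]
    return [min(first), min(second), max(first), max(second)]


envelope = envelope_of(ring)
print(envelope)
```

The result equals the 'envelope' list shown in the output, which confirms that the envelope is the axis-aligned box around the hull polygon.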
Map preview¶
Generate a map preview to a temporary file (requires pip install geoextent[preview]):
geoextent --map -b muenster_ring_zeit.geojson
Save the map to a specific file:
geoextent -b --map extent.png muenster_ring_zeit.geojson
Display the map directly in the terminal:
geoextent -b --preview muenster_ring_zeit.geojson
Save to a specific file and display in the terminal:
geoextent -b --map extent.png --preview muenster_ring_zeit.geojson
Customize the image dimensions (default: 600x400):
geoextent -b --map extent.png --map-dim 800x600 muenster_ring_zeit.geojson
The path of the saved map is always printed to stderr (suppressed by --quiet).
For more details on map preview options, see Core Features.
Export to file¶
Export extraction results to a file. The format is auto-detected from the file extension:
Single file to GeoPackage:
geoextent -b -t --output result.gpkg tests/testdata/geojson/muenster_ring_zeit.geojson
Directory to GeoJSON:
geoextent -b -t --output result.geojson tests/testdata/folders/folder_two_files
Multiple files to CSV:
geoextent -b -t --output result.csv file1.shp file2.geojson
Convex hull geometry:
geoextent -b --convex-hull --output hull.gpkg tests/testdata/folders/folder_two_files
CSV with WKB geometry (via --format):
geoextent -b --format wkb --output result.csv tests/testdata/folders/folder_two_files
For more details on export options, see Core Features.
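Extension-based format detection of the kind used by --output can be sketched as a simple lookup. The mapping follows the extensions listed in the --output help text (.gpkg, .geojson/.json, .csv); the function name `detect_output_format`, the case-insensitive matching, and the error raised for unknown extensions are assumptions for illustration, not geoextent's actual behaviour.

```python
from pathlib import Path

# Extensions as documented for --output.
OUTPUT_FORMATS = {
    ".gpkg": "GeoPackage",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
    ".csv": "CSV",
}


def detect_output_format(path: str) -> str:
    """Map an output file name to a format name based on its extension."""
    suffix = Path(path).suffix.lower()  # assumption: matching ignores case
    try:
        return OUTPUT_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None


for name in ("result.gpkg", "result.geojson", "result.csv"):
    print(name, "->", detect_output_format(name))
```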
Join export files¶
Merge multiple exported files into a single file. Summary rows are excluded — only individual-file features are kept. Input files can be any supported format; the output format is auto-detected from the extension:
geoextent --join --output merged.gpkg run1.gpkg run2.gpkg
Cross-format join (GeoJSON + GPKG -> CSV):
geoextent --join --output combined.csv run1.geojson run2.gpkg
For more details, see Core Features.