Command-Line Interface (CLI)

Basics

geoextent can be called on the command line as follows:

usage: geoextent [-h] [--formats] [--version] [--debug] [--details] [--output] [output file] [-b] [-t] [--convex-hull] [--no-download-data] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] input1 [input2 ...]
files

input file, directory, DOI, or repository URL (supports multiple inputs including mixed types)

-h, --help

show help message and exit

--formats

show supported formats

--version

show installed version

--debug

turn on debug logging, alternatively set environment variable GEOEXTENT_DEBUG=1

--details

Returns details of folder/zipFiles geoextent extraction

--output <output>

Creates geopackage with geoextent output

-b, --bounding-box

extract spatial extent (bounding box)

-t, --time-box

extract temporal extent (%Y-%m-%d)

--convex-hull

extract convex hull instead of bounding box for vector geometries

--no-download-data

for repositories: disable downloading data files and use metadata only (not recommended for most providers)

--no-progress

disable progress bars during download and extraction

--quiet

suppress all console messages including warnings and progress bars

--format {geojson,wkt,wkb}

output format for spatial extents (default: geojson)

--no-subdirs

only process files in the top-level directory, ignore subdirectories

--geojsonio

generate and print a clickable geojson.io URL for the extracted spatial extent

--max-download-size <max_download_size>

maximum download size limit (e.g. ‘100MB’, ‘2GB’). Uses filesizelib for parsing.

--max-download-method {ordered,random}

method for selecting files when size limit is exceeded (default: ordered)

--max-download-method-seed <max_download_method_seed>

seed for random file selection when using --max-download-method random (default: 42)

--placename

enable placename lookup using default gazetteer (geonames). Use --placename-service to specify a different gazetteer

--placename-service {geonames,nominatim,photon}

specify gazetteer service for placename lookup (requires --placename)

--placename-escape

escape Unicode characters in placename output (requires --placename)

--download-skip-nogeo

skip downloading files that don’t appear to contain geospatial data (e.g., PDFs, images, plain text)

--download-skip-nogeo-exts <download_skip_nogeo_exts>

comma-separated list of additional file extensions to consider as geospatial (e.g., ‘.xyz,.las,.ply’)

--max-download-workers <max_download_workers>

maximum number of parallel downloads (default: 4, set to 1 to disable parallel downloads)

Examples

Note

Depending on the local configuration, geoextent might need to be called with the python interpreter prepended:

python -m geoextent …

Show help message

geoextent -h

geoextent is a Python library for extracting geospatial and temporal extents of a file
 or a directory of multiple geospatial data formats.

usage: geoextent [-h] [--formats] [--version] [--debug] [--details] [--output] [output file] [-b] [-t] [--convex-hull] [--no-download-data] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] input1 [input2 ...]

positional arguments:
  files                 input file, directory, DOI, or repository URL
                        (supports multiple inputs including mixed types)

options:
  -h, --help            show help message and exit
  --formats             show supported formats
  --version             show installed version
  --debug               turn on debug logging, alternatively set environment
                        variable GEOEXTENT_DEBUG=1
  --details             Returns details of folder/zipFiles geoextent
                        extraction
  --output OUTPUT       Creates geopackage with geoextent output
  -b, --bounding-box    extract spatial extent (bounding box)
  -t, --time-box        extract temporal extent (%Y-%m-%d)
  --convex-hull         extract convex hull instead of bounding box for vector
                        geometries
  --no-download-data    for repositories: disable downloading data files and
                        use metadata only (not recommended for most providers)
  --no-progress         disable progress bars during download and extraction
  --quiet               suppress all console messages including warnings and
                        progress bars
  --format {geojson,wkt,wkb}
                        output format for spatial extents (default: geojson)
  --no-subdirs          only process files in the top-level directory, ignore
                        subdirectories
  --geojsonio           generate and print a clickable geojson.io URL for the
                        extracted spatial extent
  --max-download-size MAX_DOWNLOAD_SIZE
                        maximum download size limit (e.g. '100MB', '2GB').
                        Uses filesizelib for parsing.
  --max-download-method {ordered,random}
                        method for selecting files when size limit is exceeded
                        (default: ordered)
  --max-download-method-seed MAX_DOWNLOAD_METHOD_SEED
                        seed for random file selection when using --max-
                        download-method random (default: 42)
  --placename           enable placename lookup using default gazetteer
                        (geonames). Use --placename-service to specify a
                        different gazetteer
  --placename-service GAZETTEER
                        specify gazetteer service for placename lookup
                        (requires --placename)
  --placename-escape    escape Unicode characters in placename output
                        (requires --placename)
  --download-skip-nogeo
                        skip downloading files that don't appear to contain
                        geospatial data (e.g., PDFs, images, plain text)
  --download-skip-nogeo-exts DOWNLOAD_SKIP_NOGEO_EXTS
                        comma-separated list of additional file extensions to
                        consider as geospatial (e.g., '.xyz,.las,.ply')
  --max-download-workers MAX_DOWNLOAD_WORKERS
                        maximum number of parallel downloads (default: 4, set
                        to 1 to disable parallel downloads)


Examples:

geoextent -b path/to/directory_with_geospatial_data
geoextent -t path/to/file_with_temporal_extent
geoextent -b -t path/to/geospatial_files
geoextent -b -t --details path/to/zipfile_with_geospatial_data
geoextent -b -t file1.shp file2.csv file3.geopkg
geoextent -t *.geojson
geoextent -b -t https://doi.org/10.1594/PANGAEA.918707 https://doi.pangaea.de/10.1594/PANGAEA.858767
geoextent -b --convex-hull https://zenodo.org/record/4567890 10.1594/PANGAEA.123456
geoextent -b --placename file.geojson
geoextent -b --placename --placename-service nominatim https://zenodo.org/record/123456
geoextent -b --placename --placename-service photon --placename-escape https://doi.org/10.3897/BDJ.13.e159973


Supported formats:
- GeoJSON (.geojson)
- Tabular data (.csv)
- GeoTIFF (.geotiff, .tif)
- Shapefile (.shp)
- GeoPackage (.gpkg)
- GPS Exchange Format (.gpx)
- Geography Markup Language (.gml)
- Keyhole Markup Language (.kml)
- FlatGeobuf (.fgb)

Supported data repositories:
- Zenodo (zenodo.org)
- Dryad (datadryad.org)
- Figshare (figshare.com)
- PANGAEA (pangaea.de)
- OSF (osf.io)
- GFZ Data Services (dataservices.gfz-potsdam.de)
- Pensoft Journals (e.g., bdj.pensoft.net)


Extract bounding box from a single file

Note

The file used in the examples in this section is available at muenster_ring_zeit. For a rendering of the file contents, see rendered blob.

geoextent -b muenster_ring_zeit.geojson

Output:

Processing muenster_ring_zeit.geojson:   0%|          | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson:   0%|          | 0/1 [00:00<?, ?task/s, Spatial extent extracted]
                                                                                                        

{'format': 'geojson',
 'geoextent_handler': 'handleVector',
 'bbox': [7.6016807556152335,
  51.94881477206191,
  7.647256851196289,
  51.974624029877454],
 'crs': '4326'}
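
The result is printed as a Python dict literal (single-quoted strings), so a JSON parser will not accept it as-is. The following is a minimal sketch of reading it back in Python and turning the bbox into a GeoJSON Polygon. It assumes the captured text contains only the dict (progress lines already removed) and that the bbox order is [min x, min y, max x, max y], which is inferred from the values shown above rather than stated in this section. The helper name `bbox_to_polygon` is hypothetical and not part of geoextent.

```python
import ast

# Result text as printed in the example above (progress lines omitted).
printed = """{'format': 'geojson',
 'geoextent_handler': 'handleVector',
 'bbox': [7.6016807556152335,
  51.94881477206191,
  7.647256851196289,
  51.974624029877454],
 'crs': '4326'}"""


def bbox_to_polygon(bbox):
    """Turn [min_x, min_y, max_x, max_y] into a closed GeoJSON Polygon."""
    min_x, min_y, max_x, max_y = bbox
    ring = [
        [min_x, min_y],
        [max_x, min_y],
        [max_x, max_y],
        [min_x, max_y],
        [min_x, min_y],  # GeoJSON rings repeat the first position at the end
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# The text uses Python literal syntax, so json.loads would reject it;
# ast.literal_eval parses literals without executing any code.
result = ast.literal_eval(printed)
polygon = bbox_to_polygon(result["bbox"])
print(polygon["coordinates"][0][2])  # upper-right corner of the box
```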

Extract time interval from a single file

Note

The file used in the examples in this section is available at muenster_ring_zeit. For a rendering of the file contents, see rendered blob.

geoextent -t muenster_ring_zeit.geojson

Output:

Processing muenster_ring_zeit.geojson:   0%|          | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson:   0%|          | 0/1 [00:00<?, ?task/s, Temporal extent extracted]
                                                                                                         

{'format': 'geojson',
 'geoextent_handler': 'handleVector',
 'tbox': ['2018-11-14', '2018-11-14']}
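
The tbox values are plain strings in the %Y-%m-%d format named in the -t option description. As a small sketch, they can be converted to dates with the standard library to compute the length of the interval. Treating both endpoints as inclusive is an assumption made here for illustration.

```python
from datetime import datetime

# tbox as printed in the example above: [start, end] in %Y-%m-%d format.
tbox = ['2018-11-14', '2018-11-14']

start, end = (datetime.strptime(value, "%Y-%m-%d").date() for value in tbox)

# Number of calendar days covered, counting both endpoints (assumption).
days_covered = (end - start).days + 1
print(start.isoformat(), end.isoformat(), days_covered)
```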

Extract both bounding box and time interval from a single file

Note

The file used in the examples in this section is available at muenster_ring_zeit. For a rendering of the file contents, see rendered blob.

geoextent -b -t muenster_ring_zeit.geojson

Output:

Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson:  50%|█████     | 1/2 [00:00<00:00, 134.34task/s, Temporal extent extracted]
                                                                                                                  

{'format': 'geojson',
 'geoextent_handler': 'handleVector',
 'bbox': [7.6016807556152335,
  51.94881477206191,
  7.647256851196289,
  51.974624029877454],
 'crs': '4326',
 'tbox': ['2018-11-14', '2018-11-14']}

Folders or ZIP file(s)

Geoextent also supports queries for multiple files inside folders or ZIP file(s).

Extract both bounding box and time interval from a folder or zipfile

geoextent -b -t folder_two_files

Output:

Processing directory: folder_two_files:   0%|          | 0/2 [00:00<?, ?item/s]
Processing directory: folder_two_files:   0%|          | 0/2 [00:00<?, ?item/s, Processing districtes.geojson]
Processing districtes.geojson:   0%|          | 0/2 [00:00<?, ?task/s]
Processing districtes.geojson:   0%|          | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing districtes.geojson:  50%|█████     | 1/2 [00:00<00:00, 21.76task/s, Temporal extent extracted]
Processing directory: folder_two_files:  50%|█████     | 1/2 [00:00<00:00, 11.43item/s, Processing muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson:  50%|█████     | 1/2 [00:00<00:00, 109.89task/s, Temporal extent extracted]
Processing directory: folder_two_files: 100%|██████████| 2/2 [00:00<00:00, 18.90item/s, Processing muenster_ring_zeit.geojson]
Processing directory: folder_two_files: 100%|██████████| 2/2 [00:00<00:00, 18.79item/s, Processing muenster_ring_zeit.geojson]

{'format': 'folder',
 'crs': '4326',
 'bbox': [2.052333387639205,
  41.31703852240476,
  7.647256851196289,
  51.974624029877454],
 'tbox': ['2018-11-14', '2019-09-11']}

The output is the combined bbox or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the folder or ZIP file. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.
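
The merging step can be pictured as taking the smallest box and the widest date range that enclose every per-file result. The sketch below illustrates that idea only; it is not geoextent's implementation, and the helper names are hypothetical. It assumes every bbox is already in EPSG:4326 with the order [min x, min y, max x, max y]. The values for the second file are partly made up: only its minimum corner and latest date can be read off the combined output above.

```python
def merge_bboxes(bboxes):
    """Smallest [min_x, min_y, max_x, max_y] box enclosing every input box."""
    min_xs, min_ys, max_xs, max_ys = zip(*bboxes)
    return [min(min_xs), min(min_ys), max(max_xs), max(max_ys)]


def merge_tboxes(tboxes):
    """Earliest start and latest end; ISO dates sort correctly as strings."""
    starts, ends = zip(*tboxes)
    return [min(starts), max(ends)]


# Values for muenster_ring_zeit.geojson, taken from the single-file examples.
muenster_bbox = [7.6016807556152335, 51.94881477206191,
                 7.647256851196289, 51.974624029877454]
muenster_tbox = ["2018-11-14", "2018-11-14"]

# Hypothetical values for the second file: the maximum corner (2.23, 41.47)
# and the start date are invented for illustration.
other_bbox = [2.052333387639205, 41.31703852240476, 2.23, 41.47]
other_tbox = ["2019-09-11", "2019-09-11"]

combined_bbox = merge_bboxes([muenster_bbox, other_bbox])
combined_tbox = merge_tboxes([muenster_tbox, other_tbox])
print(combined_bbox)
print(combined_tbox)
```

With these inputs the sketch yields the same combined bbox and tbox as the folder example above.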

Remote Repositories

Geoextent supports extracting geospatial extent from multiple research data repositories including Zenodo, PANGAEA, OSF, Figshare, Dryad, GFZ Data Services, Dataverse, and Pensoft.

Extract from Zenodo

geoextent -b -t https://doi.org/10.5281/zenodo.4593540

Extract from PANGAEA

geoextent -b -t https://doi.org/10.1594/PANGAEA.734969

Extract from OSF

geoextent -b -t https://doi.org/10.17605/OSF.IO/4XE6Z
geoextent -b -t OSF.IO/4XE6Z

Extract from GFZ Data Services

geoextent -b -t 10.5880/GFZ.4.8.2023.004

The output is the combined bbox or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the repository. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.

For comprehensive examples including all supported repositories and advanced features, see Examples.

Debugging

You can enable detailed logs by passing the --debug option, or by setting the environment variable GEOEXTENT_DEBUG=1.

geoextent --debug -b -t muenster_ring_zeit.geojson

GEOEXTENT_DEBUG=1 geoextent -b -t muenster_ring_zeit.geojson
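
When geoextent is driven from a script, the same two ways of enabling debug logging apply. The following sketch builds the argument list and environment for either variant using only the flags and the environment variable named above. The function names `build_invocation` and `run` are hypothetical helpers, and `run` assumes the geoextent executable is on the PATH.

```python
import os
import subprocess


def build_invocation(paths, debug=False, use_env=False):
    """Return (argv, env) for a geoextent call with -b and -t.

    Debug logging is requested either with --debug or, when use_env is
    true, through GEOEXTENT_DEBUG=1 in the child environment.
    """
    argv = ["geoextent"]
    env = dict(os.environ)  # copy, so the parent environment is untouched
    if debug and use_env:
        env["GEOEXTENT_DEBUG"] = "1"
    elif debug:
        argv.append("--debug")
    argv += ["-b", "-t", *paths]
    return argv, env


def run(argv, env):
    """Run the command and return its captured standard output."""
    completed = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


argv, env = build_invocation(
    ["muenster_ring_zeit.geojson"], debug=True, use_env=True
)
print(argv)
```

Calling `run(argv, env)` on a machine where geoextent is installed would then execute the command.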

Details

You can enable details for folders and ZIP files by passing the --details option. This option gives you access to the geoextent of the individual files inside the folder or ZIP file that were used to compute the aggregated bounding box (bbox) or time box (tbox).

geoextent --details -b -t folder_one_file

Output:

Processing directory: folder_one_file:   0%|          | 0/1 [00:00<?, ?item/s]
Processing directory: folder_one_file:   0%|          | 0/1 [00:00<?, ?item/s, Processing muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson:   0%|          | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson:  50%|█████     | 1/2 [00:00<00:00, 126.22task/s, Temporal extent extracted]
Processing directory: folder_one_file: 100%|██████████| 1/1 [00:00<00:00, 57.89item/s, Processing muenster_ring_zeit.geojson]

{'format': 'folder',
 'crs': '4326',
 'bbox': {'type': 'Polygon',
  'coordinates': [[[7.608118057250977, 51.94881477206191],
    [7.602796554565429, 51.953258408047034],
    [7.6016807556152335, 51.96537036973145],
    [7.606401443481445, 51.97361943924433],
    [7.62125015258789, 51.974624029877454],
    [7.636871337890624, 51.97240332571046],
    [7.645368576049805, 51.96817310852836],
    [7.645540237426757, 51.96780294552556],
    [7.6471710205078125, 51.96330786509095],
    [7.647256851196289, 51.95807185013927],
    [7.643308639526367, 51.953258408047034],
    [7.608118057250977, 51.94881477206191]]]},
 'convex_hull': True,
 'tbox': ['2018-11-14', '2018-11-14']}
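
In this output the bbox key holds a GeoJSON-style Polygon (flagged by 'convex_hull': True) rather than a four-number list. If a rectangular extent is needed, it can be derived from the polygon's outer ring, as in this small sketch. The [min x, min y, max x, max y] order is an assumption carried over from the single-file examples.

```python
# Outer ring of the polygon printed in the example above.
ring = [
    [7.608118057250977, 51.94881477206191],
    [7.602796554565429, 51.953258408047034],
    [7.6016807556152335, 51.96537036973145],
    [7.606401443481445, 51.97361943924433],
    [7.62125015258789, 51.974624029877454],
    [7.636871337890624, 51.97240332571046],
    [7.645368576049805, 51.96817310852836],
    [7.645540237426757, 51.96780294552556],
    [7.6471710205078125, 51.96330786509095],
    [7.647256851196289, 51.95807185013927],
    [7.643308639526367, 51.953258408047034],
    [7.608118057250977, 51.94881477206191],
]

# Split the positions into x and y sequences and take the extremes.
xs, ys = zip(*ring)
envelope = [min(xs), min(ys), max(xs), max(ys)]
print(envelope)
```

The resulting envelope equals the bbox reported for muenster_ring_zeit.geojson in the single-file examples.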

Export function

You can export the result of geoextent to a GeoPackage file. This file contains the output for all files within the folder or repository.

geoextent -b -t --output path/to/output/geopackage_file.gpkg folder_path
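
A GeoPackage is an SQLite database, and the GeoPackage standard requires a gpkg_contents table that lists the layers it holds. The sketch below uses that fact to inspect an exported file with the Python standard library only. Which layer names geoextent writes is not stated in this section, so the function simply reports whatever is registered. The name `list_gpkg_layers` is a hypothetical helper.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) rows from the GeoPackage contents table."""
    connection = sqlite3.connect(path)
    try:
        rows = connection.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        connection.close()
    return rows
```

For example, `list_gpkg_layers("path/to/output/geopackage_file.gpkg")` would return one tuple per registered layer. Note that `sqlite3.connect` creates an empty file if the path does not exist, in which case the query fails because the table is missing.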