Command-Line Interface (CLI)¶
Basics¶
geoextent can be called on the command line with this command:
usage: geoextent [-h] [--formats] [--version] [--debug] [--details] [--output] [output file] [-b] [-t] [--convex-hull] [--no-download-data] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] input1 [input2 ...]
- files¶
input file, directory, DOI, or repository URL (supports multiple inputs including mixed types)
- -h, --help¶
show help message and exit
- --formats¶
show supported formats
- --version¶
show installed version
- --debug¶
turn on debug logging, alternatively set environment variable GEOEXTENT_DEBUG=1
- --details¶
Returns details of folder/zipFiles geoextent extraction
- --output <output>¶
Creates geopackage with geoextent output
- -b, --bounding-box¶
extract spatial extent (bounding box)
- -t, --time-box¶
extract temporal extent (%Y-%m-%d)
- --convex-hull¶
extract convex hull instead of bounding box for vector geometries
- --no-download-data¶
for repositories: disable downloading data files and use metadata only (not recommended for most providers)
- --no-progress¶
disable progress bars during download and extraction
- --quiet¶
suppress all console messages including warnings and progress bars
- --format {geojson,wkt,wkb}¶
output format for spatial extents (default: geojson)
- --no-subdirs¶
only process files in the top-level directory, ignore subdirectories
- --geojsonio¶
generate and print a clickable geojson.io URL for the extracted spatial extent
- --max-download-size <max_download_size>¶
maximum download size limit (e.g. ‘100MB’, ‘2GB’). Uses filesizelib for parsing.
- --max-download-method {ordered,random}¶
method for selecting files when size limit is exceeded (default: ordered)
- --max-download-method-seed <max_download_method_seed>¶
seed for random file selection when using --max-download-method random (default: 42)
- --placename¶
enable placename lookup using default gazetteer (geonames). Use --placename-service to specify a different gazetteer
- --placename-service {geonames,nominatim,photon}¶
specify gazetteer service for placename lookup (requires --placename)
- --placename-escape¶
escape Unicode characters in placename output (requires --placename)
- --download-skip-nogeo¶
skip downloading files that don’t appear to contain geospatial data (e.g., PDFs, images, plain text)
- --download-skip-nogeo-exts <download_skip_nogeo_exts>¶
comma-separated list of additional file extensions to consider as geospatial (e.g., ‘.xyz,.las,.ply’)
- --max-download-workers <max_download_workers>¶
maximum number of parallel downloads (default: 4, set to 1 to disable parallel downloads)
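The interplay of --max-download-size, --max-download-method and --max-download-method-seed can be pictured with a small sketch. This is not geoextent's implementation: the function name `select_files`, the `(name, size)` input shape and the greedy skipping rule are assumptions made for illustration only. It shows why a fixed seed makes a random selection reproducible.

```python
import random


def select_files(files, max_bytes, method="ordered", seed=42):
    """Greedily pick files whose cumulative size stays within max_bytes.

    files: list of (name, size_in_bytes) tuples (hypothetical input shape).
    Returns (selected_names, total_bytes).
    """
    candidates = list(files)
    if method == "random":
        # A fixed seed yields the same shuffled order on every run.
        random.Random(seed).shuffle(candidates)
    elif method != "ordered":
        raise ValueError(f"unknown method: {method}")

    selected, total = [], 0
    for name, size in candidates:
        if total + size > max_bytes:
            continue  # this file does not fit; later, smaller ones still may
        selected.append(name)
        total += size
    return selected, total


# Hypothetical repository listing with sizes in bytes.
listing = [("a.tif", 60), ("b.csv", 30), ("c.shp", 50)]
print(select_files(listing, 100))                   # listing order
print(select_files(listing, 100, method="random"))  # seeded shuffle
```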
Examples¶
Note
Depending on the local configuration, geoextent might need to be called through the Python interpreter:
python -m geoextent …
Show help message¶
geoextent -h
geoextent is a Python library for extracting geospatial and temporal extents of a file
or a directory of multiple geospatial data formats.
usage: geoextent [-h] [--formats] [--version] [--debug] [--details] [--output] [output file] [-b] [-t] [--convex-hull] [--no-download-data] [--no-progress] [--quiet] [--format {geojson,wkt,wkb}] [--no-subdirs] [--geojsonio] [--placename] [--placename-service GAZETTEER] [--placename-escape] [--max-download-size SIZE] [--max-download-method {ordered,random}] [--max-download-method-seed SEED] [--download-skip-nogeo] [--download-skip-nogeo-exts EXTS] [--max-download-workers WORKERS] input1 [input2 ...]
positional arguments:
files input file, directory, DOI, or repository URL
(supports multiple inputs including mixed types)
options:
-h, --help show help message and exit
--formats show supported formats
--version show installed version
--debug turn on debug logging, alternatively set environment
variable GEOEXTENT_DEBUG=1
--details Returns details of folder/zipFiles geoextent
extraction
--output OUTPUT Creates geopackage with geoextent output
-b, --bounding-box extract spatial extent (bounding box)
-t, --time-box extract temporal extent (%Y-%m-%d)
--convex-hull extract convex hull instead of bounding box for vector
geometries
--no-download-data for repositories: disable downloading data files and
use metadata only (not recommended for most providers)
--no-progress disable progress bars during download and extraction
--quiet suppress all console messages including warnings and
progress bars
--format {geojson,wkt,wkb}
output format for spatial extents (default: geojson)
--no-subdirs only process files in the top-level directory, ignore
subdirectories
--geojsonio generate and print a clickable geojson.io URL for the
extracted spatial extent
--max-download-size MAX_DOWNLOAD_SIZE
maximum download size limit (e.g. '100MB', '2GB').
Uses filesizelib for parsing.
--max-download-method {ordered,random}
method for selecting files when size limit is exceeded
(default: ordered)
--max-download-method-seed MAX_DOWNLOAD_METHOD_SEED
seed for random file selection when using --max-
download-method random (default: 42)
--placename enable placename lookup using default gazetteer
(geonames). Use --placename-service to specify a
different gazetteer
--placename-service GAZETTEER
specify gazetteer service for placename lookup
(requires --placename)
--placename-escape escape Unicode characters in placename output
(requires --placename)
--download-skip-nogeo
skip downloading files that don't appear to contain
geospatial data (e.g., PDFs, images, plain text)
--download-skip-nogeo-exts DOWNLOAD_SKIP_NOGEO_EXTS
comma-separated list of additional file extensions to
consider as geospatial (e.g., '.xyz,.las,.ply')
--max-download-workers MAX_DOWNLOAD_WORKERS
maximum number of parallel downloads (default: 4, set
to 1 to disable parallel downloads)
Examples:
geoextent -b path/to/directory_with_geospatial_data
geoextent -t path/to/file_with_temporal_extent
geoextent -b -t path/to/geospatial_files
geoextent -b -t --details path/to/zipfile_with_geospatial_data
geoextent -b -t file1.shp file2.csv file3.geopkg
geoextent -t *.geojson
geoextent -b -t https://doi.org/10.1594/PANGAEA.918707 https://doi.pangaea.de/10.1594/PANGAEA.858767
geoextent -b --convex-hull https://zenodo.org/record/4567890 10.1594/PANGAEA.123456
geoextent -b --placename file.geojson
geoextent -b --placename --placename-service nominatim https://zenodo.org/record/123456
geoextent -b --placename --placename-service photon --placename-escape https://doi.org/10.3897/BDJ.13.e159973
Supported formats:
- GeoJSON (.geojson)
- Tabular data (.csv)
- GeoTIFF (.geotiff, .tif)
- Shapefile (.shp)
- GeoPackage (.gpkg)
- GPS Exchange Format (.gpx)
- Geography Markup Language (.gml)
- Keyhole Markup Language (.kml)
- FlatGeobuf (.fgb)
Supported data repositories:
- Zenodo (zenodo.org)
- Dryad (datadryad.org)
- Figshare (figshare.com)
- PANGAEA (pangaea.de)
- OSF (osf.io)
- GFZ Data Services (dataservices.gfz-potsdam.de)
- Pensoft Journals (e.g., bdj.pensoft.net)
Extract bounding box from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendering of the file contents, see rendered blob.
geoextent -b muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, Spatial extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handleVector',
'bbox': [7.6016807556152335,
51.94881477206191,
7.647256851196289,
51.974624029877454],
'crs': '4326'}
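The four bbox values can be turned into a polygon by hand, for example to paste into another tool. The sketch below assumes the order is [min x, min y, max x, max y] (longitude first), which is inferred from the sample values above rather than from a specification; `bbox_to_wkt` is a hypothetical helper, not part of geoextent. The CLI's own --format wkt option remains the direct route.

```python
def bbox_to_wkt(bbox):
    """Build a closed WKT polygon from [min_x, min_y, max_x, max_y]."""
    min_x, min_y, max_x, max_y = bbox
    # Walk the four corners and repeat the first one to close the ring.
    ring = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


bbox = [7.6016807556152335, 51.94881477206191,
        7.647256851196289, 51.974624029877454]
print(bbox_to_wkt(bbox))
```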
Extract time interval from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendering of the file contents, see rendered blob.
geoextent -t muenster_ring_zeit.geojson
Output:
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handleVector',
'tbox': ['2018-11-14', '2018-11-14']}
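The tbox values are strings in the %Y-%m-%d format named by the -t option. A minimal sketch for turning them into date objects, for instance to compute the length of the interval; `tbox_to_dates` is a hypothetical helper name.

```python
from datetime import datetime


def tbox_to_dates(tbox):
    """Parse a [start, end] pair of %Y-%m-%d strings into date objects."""
    start, end = (datetime.strptime(value, "%Y-%m-%d").date() for value in tbox)
    return start, end


start, end = tbox_to_dates(['2018-11-14', '2018-11-14'])
span_days = (end - start).days  # 0 when start and end fall on the same day
print(start, end, span_days)
```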
Extract both bounding box and time interval from a single file¶
Note
The file used in the examples of this section is muenster_ring_zeit. For a rendering of the file contents, see rendered blob.
geoextent -b -t muenster_ring_zeit.geojson
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson: 50%|█████ | 1/2 [00:00<00:00, 134.34task/s, Temporal extent extracted]
{'format': 'geojson',
'geoextent_handler': 'handleVector',
'bbox': [7.6016807556152335,
51.94881477206191,
7.647256851196289,
51.974624029877454],
'crs': '4326',
'tbox': ['2018-11-14', '2018-11-14']}
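When the command is called from a script, the printed result can be read back into a dictionary. The sketch below assumes the result is printed as a Python literal exactly as in the output above, and that captured text may still contain progress lines before it; `parse_cli_output` is a hypothetical helper, and other versions or --format settings may print something different.

```python
import ast


def parse_cli_output(text):
    """Extract the printed result dictionary from captured CLI text."""
    # Progress lines contain no braces, so the outermost braces
    # delimit the result.
    start = text.index("{")
    end = text.rindex("}") + 1
    return ast.literal_eval(text[start:end])


sample = """Processing muenster_ring_zeit.geojson: 0%| | 0/1 [00:00<?, ?task/s]
{'format': 'geojson',
 'geoextent_handler': 'handleVector',
 'tbox': ['2018-11-14', '2018-11-14']}
"""
result = parse_cli_output(sample)
print(result["tbox"])
```

`ast.literal_eval` only accepts literals, so it is safer here than `eval`.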
Folders or ZIP file(s)¶
Geoextent also supports queries for multiple files inside folders or ZIP file(s).
Extract both bounding box and time interval from a folder or zipfile¶
geoextent -b -t folder_two_files
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s]
Processing directory: folder_two_files: 0%| | 0/2 [00:00<?, ?item/s, Processing districtes.geojson]
Processing districtes.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing districtes.geojson: 0%| | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing districtes.geojson: 50%|█████ | 1/2 [00:00<00:00, 21.76task/s, Temporal extent extracted]
Processing directory: folder_two_files: 50%|█████ | 1/2 [00:00<00:00, 11.43item/s, Processing muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson: 50%|█████ | 1/2 [00:00<00:00, 109.89task/s, Temporal extent extracted]
Processing directory: folder_two_files: 100%|██████████| 2/2 [00:00<00:00, 18.90item/s, Processing muenster_ring_zeit.geojson]
Processing directory: folder_two_files: 100%|██████████| 2/2 [00:00<00:00, 18.79item/s, Processing muenster_ring_zeit.geojson]
{'format': 'folder',
'crs': '4326',
'bbox': [2.052333387639205,
41.31703852240476,
7.647256851196289,
51.974624029877454],
'tbox': ['2018-11-14', '2019-09-11']}
The output is the combined bbox and/or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the folder or ZIP file. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.
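Merging can be pictured as taking the outermost value on each side. The sketch below is not geoextent's code; it assumes all inputs are already in EPSG:4326, ordered [min x, min y, max x, max y], and do not cross the antimeridian. The upper corner of the second bbox and its tbox start are made-up values, since only the merged result is shown above.

```python
def merge_bboxes(bboxes):
    """Union of [min_x, min_y, max_x, max_y] boxes sharing one CRS."""
    min_xs, min_ys, max_xs, max_ys = zip(*bboxes)
    return [min(min_xs), min(min_ys), max(max_xs), max(max_ys)]


def merge_tboxes(tboxes):
    """Union of [start, end] pairs; ISO dates sort correctly as strings."""
    starts, ends = zip(*tboxes)
    return [min(starts), max(ends)]


muenster_bbox = [7.6016807556152335, 51.94881477206191,
                 7.647256851196289, 51.974624029877454]
# Lower corner taken from the merged output; upper corner is hypothetical.
districts_bbox = [2.052333387639205, 41.31703852240476, 2.23, 41.47]

merged_bbox = merge_bboxes([muenster_bbox, districts_bbox])
merged_tbox = merge_tboxes([['2018-11-14', '2018-11-14'],
                            ['2019-09-11', '2019-09-11']])  # second start assumed
print(merged_bbox, merged_tbox)
```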
Remote Repositories¶
Geoextent supports extracting geospatial extent from multiple research data repositories including Zenodo, PANGAEA, OSF, Figshare, Dryad, GFZ Data Services, Dataverse, and Pensoft.
Extract from Zenodo¶
geoextent -b -t https://doi.org/10.5281/zenodo.4593540
Extract from PANGAEA¶
geoextent -b -t https://doi.org/10.1594/PANGAEA.734969
Extract from OSF¶
geoextent -b -t https://doi.org/10.17605/OSF.IO/4XE6Z
geoextent -b -t OSF.IO/4XE6Z
Extract from GFZ Data Services¶
geoextent -b -t 10.5880/GFZ.4.8.2023.004
The output is the combined bbox and/or tbox obtained by merging the results of all individual files (see: Supported file formats) inside the repository. The coordinate reference system (CRS) of the combined bbox is always EPSG:4326.
For comprehensive examples including all supported repositories and advanced features, see Examples.
Debugging¶
You can enable detailed logs by passing the --debug option, or by setting the environment variable GEOEXTENT_DEBUG=1.
geoextent --debug -b -t muenster_ring_zeit.geojson
GEOEXTENT_DEBUG=1 geoextent -b -t muenster_ring_zeit.geojson
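The same two variants can be prepared from a Python script. This is a sketch only: `build_debug_call` is a hypothetical helper, and the command is executed only if a geoextent executable is found on the PATH.

```python
import os
import shutil
import subprocess


def build_debug_call(inputs, use_env=True):
    """Return (argv, env) for a debug run, via env variable or --debug flag."""
    argv = ["geoextent", "-b", "-t", *inputs]
    env = dict(os.environ)  # copy, so the current process stays unchanged
    if use_env:
        env["GEOEXTENT_DEBUG"] = "1"
    else:
        argv.insert(1, "--debug")
    return argv, env


argv, env = build_debug_call(["muenster_ring_zeit.geojson"])
if shutil.which("geoextent"):  # skip silently when the CLI is not installed
    subprocess.run(argv, env=env, check=False)
```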
Details¶
You can enable details for folders and ZIP files by passing the --details option. It gives you access to the geoextent of each individual file inside the folder or ZIP file that was used to compute the aggregated bounding box (bbox) or time box (tbox).
geoextent --details -b -t folder_one_file
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s]
Processing directory: folder_one_file: 0%| | 0/1 [00:00<?, ?item/s, Processing muenster_ring_zeit.geojson]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s]
Processing muenster_ring_zeit.geojson: 0%| | 0/2 [00:00<?, ?task/s, Spatial extent extracted]
Processing muenster_ring_zeit.geojson: 50%|█████ | 1/2 [00:00<00:00, 126.22task/s, Temporal extent extracted]
Processing directory: folder_one_file: 100%|██████████| 1/1 [00:00<00:00, 57.89item/s, Processing muenster_ring_zeit.geojson]
{'format': 'folder',
'crs': '4326',
'bbox': {'type': 'Polygon',
'coordinates': [[[7.608118057250977, 51.94881477206191],
[7.602796554565429, 51.953258408047034],
[7.6016807556152335, 51.96537036973145],
[7.606401443481445, 51.97361943924433],
[7.62125015258789, 51.974624029877454],
[7.636871337890624, 51.97240332571046],
[7.645368576049805, 51.96817310852836],
[7.645540237426757, 51.96780294552556],
[7.6471710205078125, 51.96330786509095],
[7.647256851196289, 51.95807185013927],
[7.643308639526367, 51.953258408047034],
[7.608118057250977, 51.94881477206191]]]},
'convex_hull': True,
'tbox': ['2018-11-14', '2018-11-14']}
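When bbox is reported as a Polygon, as in the output above, a plain rectangular extent can be recovered from the ring's coordinates. A minimal sketch, assuming a single outer ring with x (longitude) first; `ring_to_bbox` is a hypothetical helper.

```python
def ring_to_bbox(polygon):
    """Reduce a GeoJSON-style Polygon to [min_x, min_y, max_x, max_y]."""
    xs, ys = zip(*polygon["coordinates"][0])  # outer ring only
    return [min(xs), min(ys), max(xs), max(ys)]


hull = {'type': 'Polygon',
        'coordinates': [[[7.608118057250977, 51.94881477206191],
                         [7.602796554565429, 51.953258408047034],
                         [7.6016807556152335, 51.96537036973145],
                         [7.606401443481445, 51.97361943924433],
                         [7.62125015258789, 51.974624029877454],
                         [7.636871337890624, 51.97240332571046],
                         [7.645368576049805, 51.96817310852836],
                         [7.645540237426757, 51.96780294552556],
                         [7.6471710205078125, 51.96330786509095],
                         [7.647256851196289, 51.95807185013927],
                         [7.643308639526367, 51.953258408047034],
                         [7.608118057250977, 51.94881477206191]]]}
hull_bbox = ring_to_bbox(hull)
print(hull_bbox)
```

For this file the result equals the bbox printed by geoextent -b for muenster_ring_zeit.geojson.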
Export function¶
You can export the result of geoextent to a GeoPackage file. This file contains the output for all files within the folder or repository.
geoextent -b -t --output path/to/output/geopackage_file.gpkg folder_path
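A GeoPackage is an SQLite database, and the GeoPackage standard requires a gpkg_contents table that lists its layers. The exported file can therefore be inspected with Python's standard library alone. The sketch makes no assumption about the layer names geoextent writes; `list_gpkg_layers` is a hypothetical helper.

```python
import os
import sqlite3
from contextlib import closing


def list_gpkg_layers(path):
    """Return (table_name, data_type, srs_id) rows from gpkg_contents."""
    with closing(sqlite3.connect(path)) as conn:
        query = "SELECT table_name, data_type, srs_id FROM gpkg_contents"
        return conn.execute(query).fetchall()


gpkg_path = "path/to/output/geopackage_file.gpkg"
if os.path.exists(gpkg_path):  # only inspect a file that was actually exported
    for table_name, data_type, srs_id in list_gpkg_layers(gpkg_path):
        print(table_name, data_type, srs_id)
```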