Apply policies to large data streams from external sources or from analytics output.
Automatically react to “environmental” changes in data.
Distributed Task Queueing
Use cases: Defer long-running analytics and route them to HPC resources. Ingest regularly updated external datasets. Sunset expiring data.
Schedule tasks for regular execution.
Defer long-running tasks for batch processing.
Tasks are developed per application and are automatically discovered and scheduled by Geoanalytics, or they can be called from server-side programs as functions with deferrable execution.
Example tasks include:
Sunsetting ADCIRC and sensor data.
Running batch analytics on an offline dataset.
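A sunsetting task might look roughly like the sketch below, assuming the task queue is Celery (the queueing system these notes name). Everything specific here is hypothetical and chosen for illustration: the `SENSOR_STORE` list standing in for a database table, the task name `sketch.sunset_sensor_data`, the 30-day window, and the 02:00 schedule. The Celery import is guarded so the plain function can be tried without Celery installed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory stand-in for a sensor table in MongoDB or PostGIS.
SENSOR_STORE = []


def sunset_expired(max_age_days, now=None):
    """Remove records older than max_age_days and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    expired = [r for r in SENSOR_STORE if r["timestamp"] < cutoff]
    # Keep only the records that are still inside the retention window.
    SENSOR_STORE[:] = [r for r in SENSOR_STORE if r["timestamp"] >= cutoff]
    return len(expired)


try:
    from celery import Celery
    from celery.schedules import crontab
except ImportError:
    # Assumption for this sketch: Celery is optional, the function above works alone.
    Celery = None

if Celery is not None:
    # The in-memory broker is only a placeholder; a real deployment would
    # point at its own message broker.
    app = Celery("geoanalytics_sketch", broker="memory://")

    @app.task(name="sketch.sunset_sensor_data")
    def sunset_sensor_data(max_age_days=30):
        """Celery task wrapper so the sunset can be deferred or scheduled."""
        return sunset_expired(max_age_days)

    # Regular execution: Celery beat enqueues the task every night at 02:00.
    app.conf.beat_schedule = {
        "sunset-sensor-data-nightly": {
            "task": "sketch.sunset_sensor_data",
            "schedule": crontab(hour=2, minute=0),
            "args": (30,),
        }
    }
```

In server-side code, `sunset_sensor_data.delay(30)` would queue the call for a worker and return immediately, while `sunset_sensor_data(30)` runs it synchronously in the calling process. That is the "function with deferrable execution" pattern described above.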
iRODS
Use cases: Manage data retention and access policy. Maintain a data grid.
Celery handles getting data into the databases; iRODS manages the data once it is on disk.
Policy-based storage and retrieval of data, which can then be loaded into active data stores such as MongoDB or PostGIS.
Data federation layer across Geoanalytics instances.
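To make "retention and access policy" concrete, here is a minimal sketch of policy expressed as data and checked per file. It is not iRODS code: iRODS enforces policy through its own rule engine, and the collection paths, group names, and retention periods below are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RetentionPolicy:
    path_prefix: str      # hypothetical collection prefix the policy covers
    keep_for: timedelta   # how long data under the prefix is retained
    readers: frozenset    # groups allowed to read data under the prefix


# Hypothetical policy table; real policies would live in the data grid itself.
POLICIES = (
    RetentionPolicy("/zone/adcirc/", timedelta(days=90), frozenset({"modelers"})),
    RetentionPolicy("/zone/sensors/", timedelta(days=30),
                    frozenset({"modelers", "public"})),
)


def policy_for(path):
    """Return the policy with the longest matching prefix, or None."""
    matches = [p for p in POLICIES if path.startswith(p.path_prefix)]
    return max(matches, key=lambda p: len(p.path_prefix), default=None)


def is_expired(path, created, now):
    """True if a policy covers the path and its retention period has passed."""
    policy = policy_for(path)
    return policy is not None and now - created > policy.keep_for


def may_read(path, group):
    """True if a policy covers the path and lists the group as a reader."""
    policy = policy_for(path)
    return policy is not None and group in policy.readers
```

Keeping the policy table separate from the checks means a retention period or reader list can change without touching the code that applies it.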