Geographic data has become Big Data
Big Data requires new approaches to data storage, management, processing, interaction, and understanding. Geographic Information Systems (GIS) address the problem of performing geographic analysis on datasets, but they do not address the other problems inherent to the kinds of datasets available in 2011. At the same time that the amount of available data rapidly outstrips the ability of the average GIS to process it, multiple disciplines are becoming more aware of their own geographic needs within these data.
How do we give these disciplines the tools they need?
RENCI’s Geoanalytics program will transform the software and the solution development cycle for working with geographic data. Current GIS software gives users the kitchen sink, making all problems doable but common problems hard. Such systems require extensive training to use and are usually the domain of experts in geography. Additionally, by not integrating network awareness and data management into the development cycle, they encourage data and logic silos that isolate researchers from one another. No single system sufficiently enables cross-disciplinary science using geography-intensive Big Data. Open-source systems that could be integrated instead exist separately, with differing levels of compatibility for sharing, analyzing, and managing data and for visualizing results. The Geoanalytics cyberinfrastructure developed by RENCI will:
- Scale horizontally to handle Big Data, including its update frequency, access patterns, and management requirements.
- Integrate sensible data management solutions that scale.
- Vet, recommend, and federate open source geography tools to reduce the barrier of entry to using big geographic data for science.
- Provide pathways to accomplish common tasks, reducing the complexity of getting things done.
- Enable rapid development and deployment of prototypes and solutions.
To provide this functionality, Geoanalytics will include:
- A data management and analytics layer incorporating iRODS, open source GIS software, our supercomputing resources, and a distributed task queue.
- A set of formalized, managed data models that encompass most common data patterns for RENCI stakeholders.
- A standards-based web service layer that integrates open-source GIS and provides for data interoperability and rapid application development.
- A federated set of client-side software that can be used to rapidly develop browser-based or mobile web applications.
The above architecture diagram gives a high-level overview of the Geoanalytics system as a layered architecture. The layers consist of a data / computational grid, a data model layer, a web-service architecture, and a client-application toolset. These components are described more fully in the following pages.
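To make the layering concrete, here is a minimal sketch of how a request might pass through three of the layers. It is not the Geoanalytics implementation: every name in it (PointFeature, InMemoryStore, feature_collection, handle_bbox_request) is hypothetical, the data grid is stubbed as an in-memory list rather than iRODS, and GeoJSON is assumed as the interchange format purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PointFeature:
    """Data model layer: a hypothetical managed model for a named point."""
    name: str
    lon: float
    lat: float


class InMemoryStore:
    """Data / computational grid layer, stubbed as an in-memory list.

    A real deployment would sit on top of managed storage; this stand-in
    only illustrates the boundary between layers.
    """

    def __init__(self, features):
        self._features = list(features)

    def within_bbox(self, min_lon, min_lat, max_lon, max_lat):
        # Keep only the features that fall inside the bounding box.
        return [
            f for f in self._features
            if min_lon <= f.lon <= max_lon and min_lat <= f.lat <= max_lat
        ]


def feature_collection(features):
    """Web-service layer: convert model objects to a GeoJSON-style dict."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [f.lon, f.lat]},
                "properties": {"name": f.name},
            }
            for f in features
        ],
    }


def handle_bbox_request(store, bbox):
    """Return the JSON text a client application would receive."""
    return json.dumps(feature_collection(store.within_bbox(*bbox)))
```

A client-side application would then only need to parse the returned JSON, for example by calling handle_bbox_request(store, (-80.0, 35.0, -78.0, 37.0)) on a store holding a few PointFeature objects. Keeping each layer behind a small interface like this is one way the storage backend could be swapped without changing client code.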
Open Source Software
Geoanalytics is intended to be open-source software. However, while it is under heavy initial development, access to the source is granted on a per-request basis. For more information, email Jeff Heard, jeff at renci dot org.