Prepping a Geoanalytics install on CentOS 6

This is the first revision of this tutorial.  There are likely to be mistakes, omissions, errors, and gaffes. If you get to the end and nothing works, comment on the post and I will endeavour to fix the problems.

Getting Geoanalytics up and running on CentOS-6 is still a bit involved, but has been much simplified with the addition of an installer.  The installer, ga_prep, can be downloaded from GitHub at https://github.com/JeffHeard/ga_prep

1. Set up a fresh CentOS-6 or RHEL VM.

This doesn’t have to be anything fancy.  In fact, I would recommend setting up the most basic, minimal CentOS, as the installer will take care of installing all the packages it needs.  You don’t want an existing webserver or database server on there, as this will only make for confusing errors down the road.

2. Download ga_prep and start the installer

$ wget -O JeffHeard.zip https://github.com/JeffHeard/ga_prep/zipball/master
$ unzip JeffHeard.zip
$ mv JeffHeard-* ga_prep
$ cd ga_prep
$ sudo ./install_ga-centos6.sh

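Make sure to run the installer as root (via sudo, as above).  It adds repositories, installs RPMs, and creates a user, and none of that will work without root privileges.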

3. Installing geoanalytics

Now things will begin installing.  The installer first adds a couple of repositories for you, including the ELGIS repository, which contains the RPMs for most of the OSGeo toolchain, and the 10gen repository, which contains the latest stable version of MongoDB.  Then it prompts you to install RPMs.  If you already have a custom system set up with GDAL, PostGIS, GEOS, HDF5, NetCDF, and so forth, you may not want to let the installer perform this part.

From there, the installer installs an updated version of GDAL that includes support for building Python extensions.  The ELGIS GDAL package is incompatible with the latest GDAL Python bindings, so it needs to be replaced.

Then the installer updates /etc/profile with some extra paths that are necessary, including those for GRASS and for locally installed libraries.

Next the installer begins installing Django in a Python virtual environment. The installer creates a django user (and prompts you for a password) whose home is /opt/django.  The virtual environment for geoanalytics will be /opt/django/apps/ga/current.  To operate within this environment later, type:

$ sudo su django

Once the virtual environment is set up, the installer installs the basic Geoanalytics codebase.  Then it will ask you if you want to install PostGIS.  If you are running everything on the local machine, either because you’re just experimenting with Geoanalytics or because you have a small installation, then you will want to answer “y” to this.  If you answer “n”, either because you want to set up PostGIS yourself or because you already have a PostGIS installation running, make sure the following preconditions are true for your installation of PostGIS:

  • The DATABASES parameter in /opt/django/apps/ga/current/ga/settings.py is set up to point to your PostGIS installation
  • A PostGIS template database is loaded into the database (this is important for test running)
  • Your PostGIS’s pg_hba.conf is configured to allow connections coming from the geoanalytics machine (this may be obvious, but I’ve forgotten it enough times that it bears repeating).
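For the first precondition, a DATABASES entry pointing at an external PostGIS server might look roughly like the sketch below.  The ENGINE path is GeoDjango's standard PostGIS backend; the database name, user, password, and host are placeholders made up for illustration, not values the installer generates, so substitute your own.

```python
# Sketch of the DATABASES block in /opt/django/apps/ga/current/ga/settings.py.
# Everything except ENGINE is a hypothetical placeholder (an assumption).
DATABASES = {
    "default": {
        # GeoDjango's PostGIS database backend
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "geoanalytics",         # hypothetical database name
        "USER": "geoanalytics",         # hypothetical database role
        "PASSWORD": "change-me",        # placeholder password
        "HOST": "postgis.example.com",  # hypothetical PostGIS host
        "PORT": "5432",                 # PostgreSQL's default port
    }
}
```

Whatever host you put here is the machine whose pg_hba.conf has to allow connections from the geoanalytics machine.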

Then the installer will ask you if you want to install MongoDB.  MongoDB can be quite complex to set up if you are creating a sharded, clustered instance.  This installer assumes that you are creating the most basic installation of MongoDB possible.  If you are interested in a more robust solution, answer “n” to this question and go to the MongoDB website for more information on a clustered configuration.  The basic configuration will work; however, if you will be serving significant web application loads, you will want to move MongoDB off the machine that PostGIS is on.

When you have your MongoDB server figured out, add the following lines to your settings.py file:

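# Replace {server} with your MongoDB hostname (a string) and {port} with its
# port (an integer; MongoDB listens on 27017 by default).
import mongoengine
mongoengine.connect('geoanalytics', host={server}, port={port})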

Finally, the installer will ask you if you want to set up the task broker on this machine.  If everything is running on the same machine, then the answer to this is “y”.  The task broker is relatively lightweight.  It handles apportioning Celery tasks among nodes of the Geoanalytics cluster.  There need be only one task broker, so if you are running this installer on multiple machines, you only need to answer “y” to this once.  Just be sure to add the following line to your settings.py file once you have your broker figured out, replacing {hostname} with the broker machine's hostname.  Note that it is possible to use the task queue “unbrokered”, in which case it uses the Django ORM, but this is much slower and has reduced functionality:

BROKER_URL = "amqp://geoanalytics:geoanalytics@{hostname}:5672//"
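If you would rather assemble that URL from its parts, for example because the password contains characters that need escaping, something like the sketch below should do.  The broker_url helper is hypothetical (it is not part of Geoanalytics or Celery), and "localhost" stands in for your broker's hostname.

```python
# Sketch: build the Celery BROKER_URL from its components.
# broker_url is a hypothetical helper, not part of Geoanalytics or Celery.
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def broker_url(user, password, host, port=5672, vhost="/"):
    """Return an AMQP URL of the form amqp://user:password@host:port/vhost."""
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),      # percent-escape reserved characters
        quote(password, safe=""),
        host,
        port,                      # 5672 is the standard AMQP port
        vhost,                     # the default vhost "/" yields the trailing "//"
    )


# "localhost" is an assumption; use the machine where you answered "y".
BROKER_URL = broker_url("geoanalytics", "geoanalytics", "localhost")
```

With the default virtual host this produces the same shape as the line above, including the double slash at the end.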

And that’s it!

4. Post-install instructions

There are a number of things that should be done after the installer has finished.

  • Set up Celery the way you want it.
If you are using Celery, you should add autostart=true and autorestart=true to the [celerycam] and [celerybeatd] entries in the file /opt/django/configs/supervisord/myapp.conf on exactly one machine of your Geoanalytics cluster.  Usually this will be the “headnode”.
  • Set up nginx the way you want it.
If you are using multiple machines for Geoanalytics and you want to load balance among the servers, there are a number of ways to go about it.  You may want to set up nginx to round-robin between servers.  This can be handled via the standard nginx configuration file, which is described on nginx.org.
  • Set up MongoDB the way you want it.
If you want a clustered instance of MongoDB, because you need more space than you have on your main machine, or you want better resilience or throughput, go to mongodb.org and follow their instructions.
  • Switch to the django user (sudo su django) and do the following:
$ cd $VIRTUAL_ENV/ga
$ python manage.py collectstatic
$ python manage.py syncdb
$ supervisorctl restart ga

Finally, RedHat and CentOS generally firewall everything but ssh by default.  You will want to add rules to iptables to open ports 80 and 443 on your machine.  Also, if you’re running a Geoanalytics cluster, you’ll want to make sure that all appropriate database ports are visible between the machines of your cluster.  To open just port 80, you need rules like these, run as root.  The last command saves the rules so that they survive a reboot:

$ sudo iptables --new-chain Bills-Chain
$ sudo iptables --insert INPUT 1 --jump Bills-Chain
$ sudo iptables -A Bills-Chain -p tcp --dport 80 -j ACCEPT
$ sudo service iptables save
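
To confirm that the firewall rules do what you expect, a small script run from another machine can try a TCP connection to each port.  This is only a sketch: the hostnames are invented, and the ports are the defaults for each service (80 for the web server, 5432 for PostgreSQL, 27017 for MongoDB, 5672 for the AMQP broker), which your setup may not use.

```python
# Sketch: check which service ports are reachable from this machine.
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered, timed out, or the hostname did not resolve.
        return False
    conn.close()
    return True


# Hypothetical hosts with each service's default port; edit for your cluster.
CHECKS = [
    ("web", "ga-head.example.com", 80),
    ("PostgreSQL", "ga-db.example.com", 5432),
    ("MongoDB", "ga-db.example.com", 27017),
    ("RabbitMQ", "ga-head.example.com", 5672),
]


def report(checks):
    """Print one line per service saying whether its port is reachable."""
    for name, host, port in checks:
        state = "open" if port_open(host, port) else "unreachable"
        print("%-10s %s:%d %s" % (name, host, port, state))
```

Calling report(CHECKS) prints one line per service.  A port that shows as unreachable is either blocked by iptables or has nothing listening on it; the script cannot tell those two cases apart.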

Once you do all this, you should be able to surf to your host and get a 404 page that gives you a list of valid URLs, like “^/admin”.  If you get that, you’re done with this tutorial!  From here you can write your own GeoDjango apps using the Geoanalytics core libraries.  As I publish new source code on GitHub, I will go into detail about how to use the libraries.
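
If you would rather check from a script than from a browser, the sketch below fetches a URL and returns the HTTP status code.  The http_status helper is hypothetical, and the URL in the usage note is a placeholder for your own host.

```python
# Sketch: fetch a URL and return its HTTP status code, treating error
# statuses such as 404 as a normal result rather than an exception.
try:
    from urllib.request import urlopen  # Python 3
    from urllib.error import HTTPError
except ImportError:
    from urllib2 import urlopen, HTTPError  # Python 2


def http_status(url, timeout=5.0):
    """Return the HTTP status code that url responds with."""
    try:
        response = urlopen(url, timeout=timeout)
    except HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    try:
        return response.getcode()
    finally:
        response.close()
```

For example, http_status("http://ga-head.example.com/") returning 404 means the web server answered, which matches the page described above.  If the call raises a URLError instead, nothing answered at all, and the firewall rules or nginx are the first things to look at.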