Thursday, September 24, 2015

Importing OSM Data into PostGIS - Part 1

The PV Viability Map project seems to really be picking up some steam!  Thanks to our good fortune in having David Winslow join us, we now have someone with the skills and experience to move us forward -- thanks, David!  In our meet up last Thursday we had a first discussion of the requirements for the project, which I'll do my best to summarize here.

PV Viability Map Requirements

  • Display map in web page.
  • Search an address and display the building there.
  • Standard map mouse navigation (drag to pan, mouse wheel to zoom, etc.)
  • Click on building to identify.
I have been tasked with seeing how far I can get before next Thursday's meet up with the first task - displaying a map in a web page.  Actually, I've been tasked with the first part of this process - getting OpenStreetMap (OSM) data, trimming for Arlington, and loading it into PostGIS.

Process Overview

Before diving into looking for a solution, a pause to consider the process is in order.  From the Wikipedia page for OpenStreetMap we learn:
  1. The main copy of the OSM data is stored in OSM's main database. The main database is a PostgreSQL database, which has one table for each data primitive, with individual objects stored as rows. All edits happen in this database, and all other formats are created from it.
  2. For data transfer, several database dumps are created, which are available for download. The complete dump is called planet.osm. These dumps exist in two formats, one using XML and one using the Protocol Buffer Binary Format (PBF).
From what I can gather (and the notes I took at our last Code for NoVA meet up), I will be using a combination of two tools to grab the data from OSM's server and then push it into my local database:
  1. osmosis
  2. imposm
I was able to install both of those with:
$ sudo aptitude install osmosis imposm
I'm not clear when one would use imposm and when one would use osm2psql, which seems to be for the same purpose. I installed the latter as well with:
$ sudo aptitude install osm2psql
To be honest, I find the OSM wiki documentation to be very challenging to read. It appears thorough and is well written, but it assumes a wealth of background I lack.  Hopefully with time I'll be able to make better use of it.  For now I'll wait for our meet up this evening to see if David can help me figure out how to use osmosis to get data for Arlington County.

Tuesday, September 22, 2015

Creating a Lubuntu Custom Install Disk for the School Lab

I'm a high school teacher in a lab full of Windows 7 machines, and it's just not
working for me.  I'm teaching Web Page Design I, which is a course in HTML and CSS, and by using WinSCP and Notepad++, the Windows workstations are adequate for the task.  But for my Computer Science class, where we will be exploring mathematics with Python and will need to install lots of Python libraries and tools, I need Ubuntu.  Actually, I'll be using Lubuntu, since it is light weight, supports the same software, and works much better with the old NVIDIA GeForce 200 graphics cards that are in the machines (the Unity desktop wants to make use of the 3D graphics features of the card, which looks awful and crashes with both the free nvidia driver and the proprietary ones from nvidia.  Lubuntu is perfectly happy in 2D, and the free driver works very well with it.

The Plan

  1. Create a VirtualBox VM with Lubuntu and all the software I plan to use in the lab installed on it.
  2. Install  Remastersys on this VM and use it to create a custom installation disk.
  3. Use the custom installation disk thus created to install Lubuntu along side Windows on the lab workstations.

Step 1: The VM

I created a new VirtualBox VM and installed Lubuntu 14.04 (64 bit) on it.  After running all the updates, I did the following:
  • Installed virtualbox-guest-utils. This package will not be needed on the lab workstations, but without it screen resolution on the VM is limited to 640x480, making it too difficult to work with.
  • Added this PPA and then installed remastersys-gtk. I figured better to test this early, since if it doesn't work there is not point in doing the rest of the setup.
  • Installed gnome-screenshot and gimp. I added these two now to be able to take screenshots of the remastersys screens and edit them.
  • From Synaptic package manager I enabled the "Canonical Partners" repository and installed adobe-flashplugin.
  • Added the ubuntugis-unstable PPA.
After this I tried using remastersys, since if that doesn't work, there is no point in continuing.  I'm glad I did this early in the process, since I encountered a problem after launching System Tools -> Remastersys and selecting dist:
I found an Ubuntu forum post here, which suggested creating an empty lightdm.conf file in /etc/lightdm would fix the problem. It did! The process completed and I found a custom-dist.iso file in the /home/remastersys/remastersys directory.  I installed successfully with the custom-dist.iso, with one caveat -- when I tried to select "Encrypt my home folder" during the install, it crashed.  Encrypting the home directory does not appear to work with the Remastersys created iso, so I'll just make sure not to select that option, and to use the process described here to encrypt home directories later.

The last thing I needed by way of infrastructure on this VM is Grub Customizer, which will make it easy to edit the bootloader menu to have it boot to Windows by default (it pains me to do that, but I'm sharing the lab this year so I have to play nice ;-)


Step 2: Adding Software

I know I'll continue to add other software as the year goes on and I find other things I need, but for the first go round, here is a list of packages I know I'll want, that I have just installed on the VM:
  • python3-pip
  • idle3
  • python3-matplotlib
  • python3-sphinx
  • python3-pep8
  • python3-bs4
  • python3-w3lib
  • python3-scipy
  • python3-pyqt5
  • ipython3
  • ipython3-notebook
  • python3-termcolor
  • python3-cairo
  • python3-paste
  • python3-cherrypy3
  • python3-flask
  • python3-bottle
  • spyder3
  • inkscape
  • gftp
  • vim
  • vim-gtk
  • most
  • openjdk-7-jdk
  • sqlite3
  • spatialite-bin
  • spatialite-gui
  • qgis
  • grass
  • pgadmin3
  • postgresql-client
I also installed SymPy, using the command: 'sudo pip3 install sympy', since sympy is not in the package repository. Finally, I installed Google Chrome using the installer from here and Opera using the installer here.

With all this software installed, I ran Remastersys again to make an installation iso disk image.  Tomorrow I'll try it out in the lab...

Sunday, September 20, 2015

Web-based GIS Assignment 2

With pgAdmin running on my Ubuntu 14.04 desktop, and PostGIS setup on an Internet virtual machine (running Ubuntu 14.04 server), I connected pgAdmin to the server by double clicking on the server name (NYC) in the list.  It connected fine, but not without giving me a warning:
A quick Google search brought me to this Ask Ubuntu link:
which says I need to install postgresql-contrib.  Here goes:
$ sudo aptitude install postgresql-contrib
It installed a single package without incident.  After clicking the "Fix It!" button, I can now connect to the remote server without warning.

To install pgAdmin on my desktop at home, I ran:
$ sudo aptitude install pgadmin3
it returned that:
The following NEW packages will be installed:
  pgadmin3 pgadmin3-data{a} pgagent{a} postgresql-client{a}
  postgresql-client-9.3{a} postgresql-client-common{a}
While it is no surprise that pgAdmin would depend on postgresql-client (which has the command line psql program), it is convenient. For one thing, it makes it easier to test out the connection to the remote database server, since the psql command:
$ psql -h [server name] [database name]
Is easy to remember.  I still don't know off the bat how to fill in the fields of the pgAdmin connection screen without looking things up.  Here is a screen shot of the New Server Registration screen:
I know what to put in the Host, Username, and Password fields, but I'm not sure about Name and Maintenance DB.  Is Name the name of the database, or is that Maintenance DB?

This documentation page proved most helpful:
I put "nyc" (the name of the database we are using for class) in both the Name and Maintenance DB fields, and it connected to the server without incident.

Setting Up the Local Virtual Database Server for Testing

My next task, assigned to me at our last Code for NoVA meet up, is to look into loading OpenStreetMap (OSM) data into a PostGIS database. As cool as it is having a remote server out on the web running a database server 24/7 that I can connect to whenever I want, I'm not going run new "experiments" on that machine.  That's where local VirtualBox VMs come in handy.

I described my first attempt at setting up a PostGIS server in a previous post. As I've learned since, starting out with the command:
$ sudo aptitude install postgresql-9.3-postgis-2.1
is the best way to get going, since it installs postgresql itself and most everything else you need to get started. I still have the VM I made back then, so now I'm going to copy over the nyc database to it and configure it for remote access.

I'll repeat the process I used to move the nyc database from the VirtualBox VM we were given in our Web-based GIS class, described in my previous post, only this time I'll export it from my database server, so that I can skip the steps where I had to rename the owner of the database.

Here is what I did:
username@local_machine:~$ ssh [database server name]
username@dbserver:~$ pg_dump -c nyc > nyc.sql
username@dbserver:~$ exit
username@local_machine:~$ scp [database server name]:nyc.sql .
I could export the database as me, since I've been setup as a database user. I don't remember whether I did that on the VirtualBox VM, so now is a good time to learn some more PostgreSQL administration.  Here is a documentation page which has what I need:
Trying to start psql on the VirtualBox server shows that I didn't add my user as a database user:
$ psql
psql: FATAL:  role "[username]" does not exist
Time to fix that:
username@postgis:~$ sudo -i
[sudo] password for username:
root@postgis:~# su - postgres
postgres@postgis:~$ createuser --superuser username
postgres@postgis:~$ exit
root@postgis:~# exit

username@postgis:~$ createdb nyc
username@postgis:~$ psql nyc < nyc.sql
It successfully ran the script and populated the nyc database. I could then run:
username@postgis:~$ psql nyc
and connect to the nyc database and run queries.

Now to enable remote connections:
username@postgis:~$ sudo vi /etc/postgresql/9.3/main/pg_hba.conf
and changed this line:
host  all  all  md5
host  all  all     md5
username@postgis:~$ sudo vi /etc/postgresql/9.3/main/postgresql.conf
and changed this line:
#listen_addresses = 'localhost'
listen_addresses = '*'
username@postgis:~$ psql nyc
nyc=# alter user [user] with password '[password]';
This is mostly a repetition of information in my previous post, but since I want to learn it, it bares repeating.

Finally, I'll connect from my desktop machine to the "remote" VirtualBox server, which has IP address on my home network:
username@localmachine:~$ psql -h nyc
psql (9.3.9)
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.

Excellent!  I'm all set to explore importing OSM data into my database.

Saturday, September 19, 2015

Two Old Draft Posts from My LIDAR Study

Note: I had two draft posts from my Lidar study last Summer that are incomplete, but contain useful information I will want to refer to in the future.  Since time does not allow me to complete them now, I'll just publish them both "as is" here...

Visualizing the Loudoun County Lidar Data


For the PV Viability Map project, we are using Lidar data from Loudoun County, Virginia.  Loudoun County was chosen because the goal of the initiating organization, Northern Virginia Regional Commision (NVRC), is to produce a map for all the NVRC members, but complete Lidar data is currently only available for Loudoun County.  Currently available data can be obtained from the USGS EarthExplorer website (see previous post, Getting Started with LIDAR Data), but it is available in small distribution units from the VirginiaLidar site.  A few other websites I came across in searching that might be useful later include:
To get the data we'll be using, I went to this folder on Virginia Lidar's Google drive.  I downloaded the PDF, Shapefile zip, and KMZ files.  This is the first time I've encountered KMZ files, which are zipped Keyhole Markup Language files.  KML is an open standard for expressing annotations and markers on web based 2D and 3D maps.  It appears you can work with them in OpenStreetMap.

The Lidar data on the Virginia Lidar site is divided into blocks.  The region covered by each block is described in the LoudounCo_Ref.pdf PDF file in the Virginia Lidar Google drive location linked above.  Here is what it looks like:
To begin learning to work with the data, I set myself the task of downloading a single block of the data and then exploring tools to visualize it.  I choose 18STJ7733.laz since it contains the intersection of Routes 15 and 7 in downtown Leesburg (see this Google map), and would thus I hoped have recognizable features that would aid in testing the visualizations. Since the data is provided in compressed LAZ format, and many of the tools require the uncompressed LAS format, I had to first uncompress the file. I described how I did this in an earlier post.


After spending a reasonable amount of time searching on the web for Lidar visualization tools, it became clear to me that I will have to overcome a number of obstacles to be able to work with this data.  I'm going to need to understand the LAS file format in some detail to use the lower level tools that are available for GNU/Linux systems.  I may explore some of the free software tools that are only available for Windows, but I'll only do that after I've exhausted what I can do a free software platform.

I found a website,, that offers through the web visualization of LAS files.  This screenshot shows the website with the default data being visualized:


Manual of Airborne Topographic Lidar - Chapter 3 Notes

Chapter 3 of the Manual of Airborne Topographic Lidar is titled Enabling Technologies, and discusses the Global Navigation Satellite Systems (GNSS) and inertial navigation systems (INS) that together enable airborne laser scanning (ALS).  Both these technologies were mentioned frequently in the chapter 2 discussion of elements of ALS technologies, and it is clear from the discussion why ALS could not emerge as a viable commercial technology until the 1990s, when the GPS system become available.

Global Navigation Satellite Systems

There are currently two operational GNSS systems, the US Global Positioning System (GPS), and the Russian GLObal NAvigation Satellite System (GLONASS), and two systems under development, the EU Galileo system, and the Chinese BeiDou Navigation Satellite System (BDS), both expected to be completed by 2020.

How Does a GNSS Work? 

The text describes GNSS as "a constellation of satellites carrying atomic clocks that broadcast time and an arbitrary number of receivers each of which computes its own position on the Earth from measured signal propagation times from the visible satellites to the receiver." (p. 99).  What I didn't understand was how the receiver could compute the propagation time of the signal (and thus find the distance from the satellite) without having a clock that was synchronized with the atomic clock on the satellite.  A post titled How does GPS receiver synchronize time with GPS satellites? provides a nice explaination:
"The time value isn't used to tell the receiver what time it is (at least not directly, although that is helpful later). It's used so that the receiver can tell relatively what the distance is to each satellite.
If you hear Sat A say that the time is 0.00000 and Sat B says the time is 0.00010, then if they are in sync, you must be closer to B than to A. You can tell exactly how much closer you are by the specific time difference.
Repeat the calculations with a few other satellites and you will find that there is only one place (and time) that the receiver can be located.
The GPS receiver computes a solution that simultaneously provides Position, Velocity, and Time (PVT). It's not that one is calculated first, then the other is. They all fall out simultaneously."
A bit later in the post the following equation is listed:

Looking at the 4 unknowns, x, y, z, and t, it makes sense why 4 satellites are need to provide a location (and time).


Tuesday, September 15, 2015

First Assignment in Web-based GIS Class

My first assignment for the Web-based GIS class has two parts: 1. Reading the first chapter of Spatial Databases: A Tour, by Shashi Shekhar and Sanjay Chawla, answering several conceptual questions related to the reading, and 2. A "lab" project designed to get us familiar with our VirtualBox VM, pgAdmin, and QGIS.

The questions involved understanding what spatial data is, comparing file systems to databases, looking at a brief history of spatial database management systems (SDMSs), defining abstract data types (ADTs) and understanding an SDMS as a specific instance of example of an object-relational database management system (ORDBMS).

The lab portion of the assignment involved running pgAdmin and QGIS on a VirtualBox VM to connect to a PostGIS database. I've been using VirtualBox for years, so I wanted use this assignment to push my PostGIS administration knowledge by setting up a PostGIS database on the web and migrating the database from our class VM to this server, and then connecting to it from pgAdmin and QGIS clients running on local machines.

Moving a Database

Logging into the VirtualBox VM given to us in class, I ran the following commands:
ggs $ sudo -i
# su - postgres
postgres $ pg_dump -c nyc > nyc.sql
(the file can be downloaded from here).  I then made the following substitutions in the nyc.sql file, since I want the database owner to be my username and not the postgres user:
  1. change ' TO postgres' to ' TO [user]'
  2. change ' FROM postgres' to ' FROM [user]'
I then used scp to copy the nyc.sql file to my on-line database server.

Picking up on the database server where I left off in my previous post, I now need to make my regular user a database admin who can create databases.
$ sudo su - postgres
$ createuser --superuser [user]
$ exit
Now I can create the database and import the data:
$ createdb nyc
$ psql nyc < nyc.sql
Let me run a query from the lesson to make sure it works:
$ psql nyc
psql (9.3.9)
Type "help" for help.

nyc=#  select name from nyc_subway_stations where name like 'Broad%';
 Broadway Jct
 Broadway Jct
 Broad St
 Broadway-Lafayette St
 Broadway-Nassau St
 Broadway Jct
 Broad Channel
(9 rows)

To get the postgres server to except outside connections, I needed to make the following changes to config files:
$ sudo vi /etc/postgresql/9.3/main/pg_hba.conf
and changed this line:
host  all  all  md5
host  all  all     md5
$ sudo vi /etc/postgresql/9.3/main/postgresql.conf
and changed this line:
#listen_addresses = 'localhost'
listen_addresses = '*'
I still couldn't connect remotely, since I got a password error the user I created for the database, so I had to set that by running:
nyc=#  alter user [user] with password '[password]';
After that I could connect remotely to the server!

Monday, September 7, 2015

GGS 692: Web-based GIS

I've just started my Fall semester graduate GIS course at George Mason University, GGS 692: Web-based GIS. This program continues to be extremely rewarding and just what I was looking for in a graduate program, since I am learning real skills that allow me to apply my previous background in mathematics and computer science to solving "real world problems".

According the syllabus, this course will:
[P]rovide the students with the knowledge to curate, store, manage and query geospatial data by means of powerful database management systems. Moreover, to communicate the data, the students will learn how to build Web mapping applications on top of a database and so communicate and interact with the data using nothing more than a Web browser. The course will cover a variety of open source software packages for web mapping and will provide pointers to commercial solutions where appropriate.
The specific goals are
  • To enable students to develop a good understanding of the principles and techniques of spatial databases.
  • To design and build a spatial database.
  • To perform common various types of queries and spatial analyses.
  • To design, develop, and implement custom web mapping applications using open standards and open source software.
The course involves a large final project, which I hope to use to develop the Photovoltaic Viability Map web application that will allow Northern Virginia residents to look at their homes on a map and get information about the cost and benefit of putting solar panels on their roof.

The specific technologies we will be learning about include:
All of these are free software GIS tools, so I am delighted at the opportunity to be compelled to learn about them.  I will be adding Mapnik and GeoDjango to the list, since my goal is to learn to be a Python GIS web application developer.

Getting Started

I've already been learning some of these technologies as part of my previous two courses, so this semester the goal is to really begin to master them.  Since a geospatial database is something I'm going to need on a regular basis, I'm going to install PostGIS on an Internet VM that I already have available, so that I'll be able to connect to it whenever I need to.

Referring back to the post I made on July 8, PostGIS Installation, I ran:
$ sudo aptitude install postgresql-9.3-postgis-2.1
My July 9 post, Adminning a PostGIS Server, has details for setting up remote connectivity and creating a database, but I think before I do that I'll go through this tutorial:
to get a broader overview of PostreSQL administration.