Improve performance making the difference between two layers with PostGIS

24/08/2018Por iCarto

Recently one of our clients asked us for help because a geoprocess they were developing in PostGIS took too long to complete.

We can not share the data or the real code, but we can share some simplified queries that can be used as a generic example. We will do it step by step and through the CREATE TABLE statements you can visualize the process. The final algorithm is somewhat different, but we have tried to keep the example simple and the basic idea is the same.

We identified the central problem in a query that attempted to generate the difference between two layers of the database as follows:

WITH
    l1 AS (
        SELECT ST_Union(geom) AS geom FROM boundary_l1
    ),  
    l2 AS (
        SELECT ST_Union(geom) AS geom FROM boundary_l2
    )                  
SELECT ST_Difference(l1.geom, l2.geom) FROM l1, l2;

In other words, we group each of the layers into a single geometry and then “subtract” them. The approach is simple and clear, but operations such as ST_Difference or ST_Intersects are computationally expensive. In general, they will be faster when working on simple polygons, especially if we are able to filter adequately on which pairs of geometries to carry out the operation.

Knowing the data in these cases is useful, both layers are multi-polygons of administrative regions, with l2 being lower level (let’s say municipalities) than l1 (let’s say provinces). In this case, the geometries of each layer could be grouped by independent “islands” on which to make the difference.

So the first thing is to generate simple polygons for each layer, which we will then group dissolving the borders and then break again into simple polygons.

-- l1_simple already contains simple dissolved polygon
-- cast to geometry(polygon, 4326) simplifies visualize results in GIS, and
-- checks the types of the generated geometries
CREATE TABLE l1_simple AS
    SELECT (ST_Dump(geom)).geom::geometry(polygon, 4326) geom FROM boundary_l1;

CREATE TABLE l2 AS
    SELECT (ST_Dump(geom)).geom::geometry(polygon, 4326) geom FROM boundary_l2;
CREATE TABLE l2_union AS
    SELECT ST_Union(geom)::geometry(multipolygon, 4326) geom FROM l2;
CREATE TABLE l2_simple AS
    SELECT (ST_Dump(geom)).geom::geometry(polygon, 4326) geom FROM l2_union;

Examining the results shows that l2_simple has many more geometries tha

Examining the results shows that l2_simple has many more geometries than expected. It could be from poorly digitized geometries that touch us, but

SELECT ST_Area(geom) FROM l2_simple ORDER BY 1;

It allows us to immediately detect a huge amount of slivers> that can be removed.

DELETE FROM l2_simple WHERE ST_Area(geom) < 0.0001;

Now that we have simple dissolved polygons for each zone of interest, all that is left to do is to make the difference between each pair of geometries that touche. But first we’ll keep the geometries of l1 for which there’s no equivalent in l2.

CREATE TABLE RESULT AS
    SELECT ST_Multi(l1.geom)::geometry(multipolygon, 4326) AS geom
    FROM l1_simple l1
    WHERE l1.ctid NOT IN (
        SELECT l1.ctid
        FROM a JOIN l2_simple l2
        ON ST_Intersects (l1.geom, l2.geom)
    );
INSERT INTO RESULT
    SELECT ST_Difference(l1_simple.geom, l2_simple.geom)
    FROM l1_simple JOIN l2_simple
    ON ST_Intersects(l1_simple.geom, l2_simple.geom);

We stopped the initial query in the first test when it had been running for more than 20 minutes. The modified algorithm runs in less than a minute.

Remember, simple is usually faster than multi. And if you need help with your PostGIS geoprocesses write to us.