I've been spending my free time working through Real World Haskell by O'Sullivan, Stewart, and Goerzen. I appreciate how many of the exercises end up being very mathy. For instance, the Chapter 3 exercises culminate in an implementation of Graham's scan, an algorithm for finding the convex hull of a finite set of points in the plane. Here's my implementation.

Input: A finite list of points $$(x,y)$$ in the plane.

Sample Input: (0,0), (0.5, 1), (1,0), (1,1), (0,1), (-10,-10)

Output: The (ordered) list of vertices of the smallest convex polygon containing all of the input points.

Sample Output: (-10,-10), (1,0), (1,1), (0,1)

By ordered we mean that traversing the list of vertices should be analogous to taking a counterclockwise walk around the outside edge of the polygon, without cutting through the interior.

Strategy: By extremal point we mean a point in the input set that is a vertex of the final convex polygon. Our task is to identify all of the extremal points and order them.

We first find an obviously-extremal point, then we use a method not unlike ray tracing to order the remaining points. Once the remaining points are ordered, we can determine which points are extremal by taking a walk along the points, making sure we only ever take left turns (by throwing away points when we don’t make a left).

1. Remove any duplicates from the input list.

2. Find the leftmost point on the input list (using lowest $$y$$-coordinate to break ties). Call it b. The description of b makes it clear that b is extremal.

3. For each of the remaining points p, find the slope and distance between b and p. If several points form the same slope with b, then obviously the ones closer to b can't be extremal, so we keep only the furthest.

At this step, we might worry about points directly above b, since the slope there is undefined. Haskell's Double follows IEEE 754 arithmetic and treats these slopes as Infinity.

4. Order the remaining points by increasing slope. Then append b to the front of the ordered list. We now have a list ps of points that starts with b, is ordered by increasing slope, and contains at most one point for each slope.

A quick check in GHCi shows (1.0 / 0.0) > 1.0 evaluates to True, so if there were any points directly above b, the furthest such point will be the last entry of ps.

5. Now, we take a walk along our list, looking at three consecutive points at a time. For each triple of points p1, p2, p3 we imagine that we travel from p1 through p2 to p3, and we calculate the direction we turn at p2.

Since our list is ordered by increasing slope, anything other than a left turn at p2 indicates that p2 can't be extremal, so in that case we throw it away and back up one step, since discarding p2 can turn what used to be a left turn at p1 into a non-left turn. If we do make a left turn at p2, we keep it for now and move on to considering the trip from p2 through p3 to p4, and so on.

Let’s take a look at the code. The whole file is 98 lines, and you can see it in its entirety on GitHub, but we dissect it below.

First, imports and a general convenience function.

removeDuplicates does exactly what it sounds like it does.

Edit: The function I’m looking for is nub from Data.List.
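The listing itself isn't reproduced here, so as a sketch, a hand-rolled version might look like the following (as the edit above notes, nub from Data.List is the library version):

```haskell
-- Keep only the first occurrence of each element.
-- This is exactly what nub from Data.List provides.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates []     = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (/= x) xs)
```

For example, removeDuplicates [1,2,1,3,2] gives [1,2,3].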

Next we introduce a datatype for points in the plane and a few functions that we’ll need later.

Edit: Using a type alias, such as type Point = (Double, Double), instead of a data declaration allows us to use the default Eq, Ord, Show, and Read instances, greatly simplifying the code. These changes are reflected in the GitHub version of exB12.hs.

We could implement this algorithm using angles instead of slopes, but slope is easier to calculate and ends up being equivalent: the angle that each point forms with the ray emanating from a fixed point in the positive $$x$$ direction is just atan of the slope, and since atan is monotonic, sorting on atan . slope gives the same order as sorting on slope alone. So why compute atan? Likewise, we chose the taxicab metric instead of the usual distance formula, since the taxicab metric is easier to calculate (we avoid sqrt) and, along any fixed ray, produces the same sort.
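As a sketch of what those helpers might look like, using the tuple alias from the edit above (the names here are my guesses, not necessarily the ones in exB12.hs):

```haskell
type Point = (Double, Double)

-- Slope of the segment from the first point to the second. For a
-- point directly above the first, the denominator is zero and
-- Double gives us Infinity, which the algorithm relies on.
slope :: Point -> Point -> Double
slope (x1, y1) (x2, y2) = (y2 - y1) / (x2 - x1)

-- Taxicab distance: no sqrt, and along a fixed ray from a point
-- it sorts other points the same way Euclidean distance would.
taxicabDist :: Point -> Point -> Double
taxicabDist (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)
```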

Next we define a datatype for relative directions and a function that discerns the direction we need to turn at p2 in order to proceed to p3 from p1.

Lines 17 through 23 perform a coordinate transformation on p3. We could pull the coordinate transformation out of direction and make it a distinct top-level function, but I don’t see any reason to at this point.

Wikipedia suggests embedding our points in three-space and using the $$z$$-coordinate of the cross product of the vector $$\vec{p_1 p_2}$$ with the vector $$\vec{p_2 p_3}$$ to determine direction. That approach works, but it misses a good number of edge cases, and taking care of those edge cases results in a function that's no simpler than the one at hand.
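For reference, here's a minimal sketch of that cross-product formulation. It lumps every collinear triple into a single Straight case, which is part of the edge-case handling the coordinate-transformation version treats more carefully:

```haskell
type Point = (Double, Double)

data Direction = LeftTurn | RightTurn | Straight
  deriving (Eq, Show)

-- Sign of the z-coordinate of (p2 - p1) x (p3 - p2): positive
-- means we turn left at p2, negative means right, and zero means
-- the three points are collinear.
direction :: Point -> Point -> Point -> Direction
direction (x1, y1) (x2, y2) (x3, y3)
  | cross > 0 = LeftTurn
  | cross < 0 = RightTurn
  | otherwise = Straight
  where
    cross = (x2 - x1) * (y3 - y2) - (y2 - y1) * (x3 - x2)
```

For example, direction (0,0) (1,0) (1,1) is LeftTurn, while direction (0,0) (1,0) (2,0) is Straight.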

We’re ready for the algorithm itself:

The algorithm flows through two phases: sort massages our input data into a form that is easy to recurse on, and walkPerimeter processes the sorted data, three points at a time.

I've annotated sort and walkPerimeter, and you'll see they implement the five-step strategy described at the start of this post. One thing to remember while reading the annotations, though: read from bottom to top. This is a functional language, and sort is a composition of a bunch of functions; the rightmost (here, lowest) function is the first to touch our input data.

walkPerimeter assumes that its first argument is a point that we intend to keep (it has no way of deleting the first point it’s fed), and this is okay since we know b is an extremal point.
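Since the full listing lives on GitHub rather than here, below is a self-contained sketch of the whole pipeline, written to the same five-step plan. The names and the exact shape of walkPerimeter are my own guesses (I thread an explicit stack of accepted points), and this version backs up a step after discarding a point, because removing p2 can expose a non-left turn at the point before it:

```haskell
import Data.List (nub, sortOn)

type Point = (Double, Double)

data Direction = LeftTurn | RightTurn | Straight
  deriving (Eq, Show)

slope, taxicabDist :: Point -> Point -> Double
slope       (x1, y1) (x2, y2) = (y2 - y1) / (x2 - x1)
taxicabDist (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

-- Which way do we turn at p2 on the trip p1 -> p2 -> p3?
direction :: Point -> Point -> Point -> Direction
direction (x1, y1) (x2, y2) (x3, y3)
  | cross > 0 = LeftTurn
  | cross < 0 = RightTurn
  | otherwise = Straight
  where
    cross = (x2 - x1) * (y3 - y2) - (y2 - y1) * (x3 - x2)

-- Steps 1-4: dedupe, pick the base point b (leftmost, lowest y;
-- the tuple Ord instance gives us exactly that), keep only the
-- furthest point on each ray from b, and sort by slope.
prepare :: [Point] -> [Point]
prepare points = b : sortOn (slope b) survivors
  where
    cleaned = nub points
    b       = minimum cleaned
    rest    = filter (/= b) cleaned
    survivors =
      [ p | p <- rest
          , taxicabDist b p ==
              maximum [taxicabDist b q | q <- rest, slope b q == slope b p] ]

-- Step 5: walk the sorted list, keeping a stack of accepted
-- points. A non-left turn pops the offending point; popping can
-- expose another bad turn, which the recursion then re-examines.
walkPerimeter :: [Point] -> [Point] -> [Point]
walkPerimeter stack@(p2:p1:hull) (p3:rest)
  | direction p1 p2 p3 == LeftTurn = walkPerimeter (p3:stack) rest
  | otherwise                      = walkPerimeter (p1:hull) (p3:rest)
walkPerimeter stack (p:rest) = walkPerimeter (p:stack) rest
walkPerimeter stack []       = reverse stack

convexHull :: [Point] -> [Point]
convexHull = walkPerimeter [] . prepare
```

Run on the sample input from the top of the post, this sketch returns [(-10.0,-10.0),(1.0,0.0),(1.0,1.0),(0.0,1.0)], matching the expected output.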

Let’s give it a test drive in GHCi.

Let’s noise-up our data a little and see what happens.

I'd like to write some robust test suites, but I don't really know how yet. Anyway, that's enough for tonight.