My ideas for redistricting with a sandboxwalls program involve a lot of data sorting. First, the census blocks of a state are sorted by latitude. However, the states are two dimensional, not one-dimensional. Consider a television picture. The picture is made up of horizontal stripes. Using that concept, the latitude of the State is then divided up into stripes. The census blocks within each stripe are then sorted by longitude. A box is placed on the picture. The center of the box is where a district will grow outward in all directions from a dot. The box then contains segments from a number of horizontal stripes. All of the census blocks within the box must be sorted by distance from the center dot. That way, the district can grow by acquiring the closest census blocks in order by distance. See my last blog on the square root. To keep all of the districts at about the same population, a tally records the current population for each and every growing district. The district with the lowest population grows by acquiring a census block. To know which district has the lowest population, the tally must be kept sorted by population. To correlate PL94-171 data with Shapefiles data, they can both be sorted to match.
To get this far has involved a lot of sorting. If the census blocks are delivered by the Census Bureau in some kind of order, that can either be a help or even severely hinder depending on the order versus the chosen sorting algorithm. I call the order seemingly being used as “thoughtlessly random”, and chose accordingly. How to implement all this sorting is a job for a software engineer. Unfortunately, I have never found a software engineer interested in playing with my sandboxwalls.
After some thought, I realized that the classic bubble sort algorithm should work well for sorting the tally each time. But bubble sort would be horribly time consuming at sorting a state’s amount of census blocks by latitude, and a different sorting algorithm is clearly needed there. I use a binary search insertion sort algorithm for that.
Actually, there isn’t exactly a best sorting algorithm for a given sorting problem. My analysis of the sorting the box problem resulted in a solution that is a combination of algorithms. Each segment in the box is split into 2 pieces. Each piece becomes a list which is then bubble sorted. The 2 lists are then merge sorted into a single list. Each merge sorted list is then binary search insertion sorted as a batch of census blocks into the final result. This requires more than the usual amount of code of course, but it is the quickest way I know how to get the result. I hope it will work.
The public library eventually got me the book I put on hold. The title is “The Art of Computer Programming volume 3” by Donald E. Knuth. I learned several things about sorting from this book. Much of what was discussed is over my head, as the subject of sorting has depth. I already knew that I needed help, but the book is the best help I have been able to get. Also, it seems that some version of the merge sort algorithm is more appropriate for quickly sorting the census blocks by latitude than what I have now.
The question is: Do I have enough time to go back and rewrite the FORTRAN, or would it be better to spend the time attempting to debug what I have now? The reality is that I still have no idea how to extract the data needed for redistricting from the big knotted clump of data supplied by the Census Bureau for gerrymandering. I may be limited to rectangular states if I am unable to obtain the data or figure out how to use the TIGER Shapefiles to limit the length of horizontal stripes to within a state border.
The Holidays are almost over, and soon I will be back to earning a living instead of playing with FORTRAN programming for redistricting. Happy New Year seems like a wish, not a reality in this economy. When I get a chance, the next blog will be about accuracy.