This repository has been archived by the owner on Mar 25, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathmj-msc.tex
1740 lines (1371 loc) · 68.6 KB
/
mj-msc.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage[american]{babel}
\usepackage[utf8]{inputenc}
\usepackage{fvextra}
\usepackage[autostyle,english=american]{csquotes}
\MakeOuterQuote{"}
\usepackage[
maxbibnames=99,
style=numeric,
sorting=none,
alldates=iso,
seconds=true
]{biblatex}
\addbibresource{bib.bib}
\usepackage[
pdfusetitle,
pdfkeywords={Line Generalization,Line Simplification,Wang--Mueller},
pdfborderstyle={/S/U/W 0} % /S/U/W 1 to enable reasonable decorations
]{hyperref}
\usepackage{enumitem}
\usepackage[toc,page,title]{appendix}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{dcolumn}
\usepackage{gensymb}
\usepackage{units}
\usepackage{varwidth}
\usepackage{tabularx}
\usepackage{float}
\usepackage{numprint}
\usepackage{tikz}
\usetikzlibrary{shapes.geometric,arrows,positioning}
\usepackage{fancyvrb}
\usepackage{layouts}
\usepackage{minted}
%\usepackage{charter}
%\usepackage{setspace}
%\doublespacing
\input{version.inc}
\input{vars.inc}
\newcommand{\onpage}[1]{\ref{#1} on page~\pageref{#1}}
\newcommand{\titlecite}[1]{\citetitle{#1}\cite{#1}}
\newcommand{\titleciteauthor}[1]{\citetitle{#1} by \citeauthor{#1}\cite{#1}}
\newcommand{\DP}{Douglas \& Peucker}
\newcommand{\VW}{Visvalingam--Whyatt}
\newcommand{\WM}{Wang--M{\"u}ller}
\newcommand{\WnM}{Wang and M{\"u}ller}
\newcommand{\WirM}{Wang ir M{\"u}ller}
% {\WM} algoritmo realizacija kartografinei upių generalizacijai
\newcommand{\MYTITLE}{{\WM} algorithm realization for cartographic line generalization}
\newcommand{\MYTITLENOCAPS}{wang--m{\"u}ller algorithm realization for cartographic line generalization}
\newcommand{\MYAUTHOR}{Motiejus Jakštys}
\newcommand{\inputcode}[2]{\inputminted[fontsize=\small]{#1}{#2}}
\newenvironment{longlisting}{\captionsetup{type=listing}}{}
\title{\MYTITLE}
\author{\MYAUTHOR}
\date{\VCDescribe}
\begin{document}
\begin{titlepage}
\begin{center}
\includegraphics[width=0.2\textwidth]{vu.pdf} \\[4ex]
\large
\textbf{\textsc{
vilnius university \\
faculty of chemistry and geosciences \\
department of cartography and geoinformatics
}} \\[8ex]
\textbf{\MYAUTHOR} \\[8ex]
\normalsize
A Thesis Presented for the Degree of Master in Cartography \\[8ex]
\LARGE
\textbf{\textsc{\MYTITLENOCAPS}}
\vfill
\normalsize
Supervisor Dr. Andrius Balčiūnas \\[16ex]
\VCDescribe
\end{center}
\end{titlepage}
\begin{abstract}
\label{sec:abstract}
Currently available line simplification algorithms are rooted in
mathematics and geometry, and are unfit for bendy map features like rivers
and coastlines. {\WnM} observed how cartographers simplify these natural
features and created an algorithm. We implemented this algorithm and
documented it in great detail. Our implementation makes {\WM} algorithm
freely available in PostGIS, and this paper explains it.
\vfill
Šiuo metu esami linijų supaprastinimo algoritmai yra kilę iš matematikos ir
geometrijos, bet nėra tinkami lankstiems geografiniams objektams, tokiems
kaip upės ir pakrantės, atvaizduoti. {\WirM} ištyrė, kaip kartografai
atlieka upių generalizaciją, ir sukūrė algoritmą. Mes realizavome šį
algoritmą ir išsamiai jį dokumentavome. Mūsų {\WM} realizacija ir
dokumentacija yra nemokamos ir laisvai prieinamos, naudojant PostGIS
platformą.
\end{abstract}
\clearpage
\tableofcontents
\listoftables
\listoflistings
\newpage
\section{Introduction}
\label{sec:introduction}
\iffalse
NOTICE: this value should be copied to layer2img.py:TEXTWIDTH, so dimensions
of inline images are reasonable.
Textwidth in cm: {\printinunitsof{cm}\prntlen{\textwidth}}
\fi
When creating small-scale maps, often the detail of the data source is greater
than desired for the map. While many features can be removed or simplified, it
is more tricky with natural features that have many bends, like coastlines,
rivers, or forest boundaries.
To create a small-scale map from a large-scale data source, features need to be
simplified, i.e., detail should be reduced. While performing the
simplification, it is important to retain the "defining" shape of the original
feature. Otherwise, if the simplified feature looks too different from the
original, the result will look unrealistic. Simplification problem for some
objects can often be solved by non-geometric means:
\begin{itemize}
\item Towns and cities can be filtered by the number of inhabitants.
\item Roads can be eliminated by the road length, number of lanes, or
classification of the road (local, regional, international).
\end{itemize}
However, things are not as simple for natural features like rivers or
coastlines. If a river is nearly straight, it should remain such after
simplification. An overly straightened river will look like a canal, and the
other way around --- too curvy would not reflect the natural shape. Conversely,
if the river originally is highly wiggly, the number of bends should be
reduced, but not removed altogether. Natural line simplification problem can be
viewed as a task of finding a delicate balance between two competing goals:
\begin{itemize}
\item Reduce detail by removing or simplifying "less important" features.
\item Retain enough detail, so the original is still recognizable.
\end{itemize}
Given the discussed complexities with natural features, a fine line between
under-simplification (leaving an object as-is) and over-simplification (making a
straight line) needs to be found. Therein lies the complexity of simplification
algorithms: all have different trade-offs.
The purpose of the thesis is to implement a cartographic line generalization
algorithm on the basis of {\WM} algorithm, using open-source software. Tasks:
\begin{itemize}
\item Evaluate existing line simplification algorithms.
\item Identify main river generalization problems, using classical line
simplification algorithms.
\item Define the method of the {\WM} technical implementation.
\item Realize {\WM} algorithm technically, explaining the geometric
transformations in detail.
\item Apply the created algorithm for different datasets and compare
the results with national datasets.
\end{itemize}
Scientific relevance of this work --- the simplification processes (steps)
described by the {\WM} algorithm --- are analyzed in detail, practically
implemented, and the implementation is described. That expands the knowledge of
cartographic theory about the generalization of natural objects' boundaries
after their natural defining properties.
In the original {\WM} article introducing the algorithm, the steps are not
detailed in a way that can be put into practice for specific data; the steps are
specified in this work. Practically, this work makes it possible to use open-source software to perform cartographic line generalization. The developed
specialized cartographic line simplification algorithm can be applied by
cartographers to implement automatic data generalization solutions. Given the
open-source nature of this work, the algorithm implementation can be modified
freely.
\section{Literature Review And Problematic}
\label{sec:literature-review-problematic}
\subsection{Available Algorithms}
This section reviews the classical line simplification algorithms, which,
besides being around for a long time, offer easily accessible implementations,
as well as more modern ones, which only theorize, but do not provide an
implementation.
\subsubsection{{\DP}, {\VW} and Chaikin's}
\label{sec:dp-vwchaikin}
{\DP}\cite{douglas1973algorithms} and {\VW}\cite{visvalingam1993line} are
"classical" line simplification computer graphics algorithms. They are
relatively simple to implement and require few runtime resources. Both of them
accept a single parameter based on desired scale of the map, which makes them
straightforward to adjust for different scales.
Both algorithms are available in PostGIS, a free-software GIS suite:
\begin{itemize}
\item {\DP} via
\href{https://postgis.net/docs/ST_Simplify.html}{PostGIS \textsc{st\_simplify}}.
\item {\VW} via
\href{https://postgis.net/docs/ST_SimplifyVW.html}{PostGIS
\textsc{st\_simplifyvw}}.
\end{itemize}
It may be worthwhile to post-process those through Chaikin's line smoothing
algorithm\cite{chaikin1974algorithm} via
\href{https://postgis.net/docs/ST_ChaikinSmoothing.html}{PostGIS
\textsc{st\_chaikinsmoothing}}.
In generalization examples, we will use two rivers: Šalčia and Visinčia.
These rivers were chosen because they have both large and small bends, and
thus are convenient to analyze for both small- and large-scale generalization.
Figure~\onpage{fig:salvis-25} illustrates the original two rivers without any
simplification.
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{salvis-25k}
\caption{Example rivers for visual tests (1:{\numprint{25000}}).}
\label{fig:salvis-25}
\end{figure}
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-2x50k}
\caption{Example scaled 1:\numprint{50000}.}
\label{fig:salvis-2x50k}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\centering
\includegraphics[width=.2\textwidth]{salvis-250k-10x}
\caption{Example scaled 1:\numprint{250000}.}
\end{subfigure}
\caption{Down-scaled original river.}
\label{fig:salvis-50-250}
\end{figure}
Same rivers, unprocessed but in higher scales (1:\numprint{50000} and
1:\numprint{250000}), are depicted in Figure~\ref{fig:salvis-50-250}. Some
river features are so compact that a reasonably thin line depicting the river
is touching itself, creating a thicker line. We can assume that some
simplification for scale 1:\numprint{50000} and especially for
1:\numprint{250000} is worthwhile.
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-dp64-2x50k}
\caption{Using {\DP}.}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-vw64-2x50k}
\caption{Using {\VW}.}
\end{subfigure}
\caption{Simplified using classical algorithms (1:\numprint{50000}).}
\label{fig:salvis-generalized-1x50k}
\end{figure}
Figure~\ref{fig:salvis-generalized-1x50k} illustrates the same river bend, but
simplified using {\DP} and {\VW} algorithms. The resulting lines are jagged,
and thus the resulting line looks unlike a real river. To smoothen the jaggedness,
traditionally, Chaikin's\cite{chaikin1974algorithm} is applied after
generalization, illustrated in Figure~\ref{fig:salvis-generalized-chaikin-1x50k}.
\begin{figure}[ht!]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-dpchaikin64-2x50k}
\caption{{\DP} and Chaikin's.}
\label{fig:salvis-dpchaikin64-2x50k}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-vwchaikin64-2x50k}
\caption{{\VW} and Chaikin's.}
\label{fig:salvis-vwchaikin64-2x50k}
\end{subfigure}
\caption{Simplified and smoothened river (1:\numprint{50000}).}
\label{fig:salvis-generalized-chaikin-1x50k}
\end{figure}
\begin{figure}[ht!]
\centering
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-overlaid-dpchaikin64-2x50k}
\caption{Original (fig.~\ref{fig:salvis-2x50k}) and simplified
(fig.~\ref{fig:salvis-dpchaikin64-2x50k}).}
\end{subfigure}
\hfill
\begin{subfigure}[b]{.49\textwidth}
\includegraphics[width=\textwidth]{salvis-overlaid-vwchaikin64-2x50k}
\caption{Original (fig.~\ref{fig:salvis-2x50k}) and simplified
(fig.~\ref{fig:salvis-vwchaikin64-2x50k}.)}
\end{subfigure}
\caption{Zoomed-in simplified and smoothened river and original.}
\label{fig:salvis-overlaid-generalized-chaikin-1x50k}
\end{figure}
\begin{figure}[b]
\centering
\includegraphics[width=\textwidth]{amalgamate1}
\caption{Narrow bends amalgamating into thick unintelligible blobs.}
\label{fig:pixel-amalgamation}
\end{figure}
The resulting simplified and smoothened example
(Figure~\onpage{fig:salvis-generalized-chaikin-1x50k}) yields a more
aesthetically pleasing result; however, it obscures natural river features.
Given the absence of rocks, the only natural features that influence the river
direction are topographic:
\begin{itemize}
\item Relatively straight river (completely straight or with small-angled
bends over a relatively long distance) implies greater slope, more
water, and/or faster flow.
\item Bendy river, on the contrary, implies slower flow, slighter slope,
and/or less water.
\end{itemize}
Both {\VW} and {\DP} have a tendency to remove the small bends altogether,
removing a valuable characterization of the river.
Sometimes low-water rivers in slender slopes have many bends next to each
other. In low resolutions (either in small-DPI screens or paper, or when the
river is sufficiently zoomed out, or both), the small bends will amalgamate to
a unintelligible blob. Figure~\ref{fig:pixel-amalgamation} illustrates a
real-world example where a bendy river, normally 1 or 2 pixels wide, creates a
wide area, of which the shapes of the bend become unintelligible. In this
example, classical algorithms would remove these bends altogether. A
cartographer would retain a few of those distinctive bends, but would increase
the distance between the bends, remove some of the bends, or both.
% TODO: figues shouldn't split the sentence.
For the reasons discussed in this section, the "classical" {\DP} and {\VW} are
not well-suited for natural river generalization, and a more robust line
generalization algorithm is worthwhile to look for.
\clearpage
\subsubsection{Modern Approaches}
Due to their simplicity and ubiquity, {\DP} and {\VW} have been established as
go-to algorithms for line generalization. During recent years, alternatives
have emerged. These modern replacements fall into roughly two categories:
\begin{itemize}
\item Cartographic knowledge was encoded to an algorithm (bottom-up
approach). One among these are \titlecite{wang1998line}, also known
as {\WM}'s algorithm.
\item Mathematical shape transformation which yields a more cartographic
result. E.g., \titlecite{jiang2003line},
\titlecite{dyken2009simultaneous}, \titlecite{mustafa2006dynamic},
\titlecite{nollenburg2008morphing}, \titlecite{devangleserrorbends}.
\end{itemize}
Authors of most of the aforementioned articles have implemented the
generalization algorithm, at least to generate the illustrations in the
articles. However, code is not available for evaluation with a desired dataset, much less for use as a basis for creating new maps. To the author's knowledge,
{\WM}\cite{wang1998line} is available in a commercial product, but requires a
purchase of the commercial product suite, without a way to license the
standalone algorithm.
{\WM} algorithm was created by encoding professional cartographers' knowledge
into a computer algorithm. It has a few main properties which make it
especially suitable for generalization of natural linear features:
\begin{figure}[h!]
\centering
\includegraphics[width=.8\textwidth]{wang125}
\caption{Figure 12.5 in \cite{wang1998line}: example of cartographic line
generalization.}
\label{fig:wang125}
\end{figure}
\begin{itemize}
\item Small bends are not always removed, but either combined (e.g.,
3 bends into 2), exaggerated, or removed, depending on the neighboring
bends.
\item Long and gentle bends are not straightened, but kept as-is.
\end{itemize}
As a result of these properties, {\WM} algorithm retains the defining
properties of the natural features: high-current rivers keep their appearance
as such, instead of becoming canals; low-stream bendy rivers retain their
frequent small bends.
Figure~\ref{fig:wang125}, sub-figure labeled "proposed method" (from the
original \titlecite{wang1998line}) illustrates the {\WM} algorithm.
\subsection{Problematic with Generalization of Rivers}
This section introduces the reader to simplification and generalization, and
discusses two main problems with current-day automatic cartographic line
generalization:
\begin{itemize}
\item Currently available line simplification algorithms were created
to simplify geometries, but do not encode cartographic knowledge.
\item Existing cartographic line generalization algorithms are not freely
accessible.
\end{itemize}
\subsubsection{Simplification versus Generalization}
It is important to note the distinction between simplification, line
generalization, and cartographic generalization.
Simplification reduces an object's detail in isolation, not taking the object's
natural properties or surrounding objects into account. For example, if a
river is simplified, it may have an approximate shape of the original river,
but lose some shapes that define it. For example:
\begin{itemize}
\item Low-water rivers in slender slopes have many small bends next to each
other. A non-cartographic line simplification may remove all of them,
thus losing an important river's characteristic feature: after such
simplification, it will be hard to tell that the original river was
low-water in a slender slope.
\item Low-angle river bend river over a long distance differs significantly
from a completely straight canal. Non-cartographic line simplification
may replace that bend with a straight line, making the river more
similar to a canal than a river.
\end{itemize}
In other words, simplification processes the line, ignoring its geographic
features. It works well when the features are human-made (e.g., roads,
administrative boundaries, buildings). There is a number of freely available
non-cartographic line simplification algorithms, which this paper will review.
Contrary to line simplification, cartographic generalization does not focus
into a single feature class (e.g., rivers), but the whole map. For example,
line simplification may change river bends in a way that bridges (and roads to
the bridges) become misplaced. While line simplification is limited to a single
feature class, cartographic generalization is not. Fully automatic cartographic
generalization is not yet a solved problem. % <TODO: Reference needed>.
Cartographic line generalization falls in between the two: it does more than
line simplification, and less than cartographic generalization. Cartographic
line generalization deals with a single feature class, takes into account its
geographic properties, but ignores other features. This paper examines {\WM}'s
\titlecite{wang1998line}, a cartographic line generalization algorithm.
\subsubsection{Availability of Generalization Algorithms}
Lack of robust openly available generalization algorithm implementations poses
a problem for map creation with free software: there is no high-quality
simplification algorithm to create down-scaled maps, so any cartographic work,
which uses line generalization as part of its processing, will be of sub-par
quality. We believe that the availability of high-quality open-source tools is an
important foundation for future cartographic experimentation and development,
thus it benefits the cartographic society as a whole.
{\WM}'s commercial availability signals something about the value of the
algorithm: at least the authors of the commercial software suite deemed it
worthwhile to include it. However, not everyone has access to the commercial
software suite, access to funds to buy the commercial suite, or access to the
operating system required to run the commercial suite. PostGIS, in contrast, is
free itself, and runs on free platforms. Therefore, algorithm
implementations that run on PostGIS or other free platforms are useful to a
wider cartographic society than proprietary ones.
\subsubsection{Unfitness of Line Simplification Algorithms}
Section~\ref{sec:dp-vwchaikin} illustrates the current gaps with line
simplification algorithms for real rivers. To sum up, we highlight the
following cartographic problems from our examples:
\begin{description}
\item[Long bends] should remain as long bends, instead of becoming fully
straight lines.
\item[Many small bends] should not be removed. To retain a river's character,
the algorithm should retain some small bends, and, when they are too
small to be visible, they should be combined or exaggerated.
\end{description}
We are limiting the problem to cartographic line generalization. That is, full
cartographic generalization, which takes topology and other feature classes
into account, is out of scope.
Figure~\onpage{fig:wang125} illustrates {\WM} algorithm from their original
paper. Note how the long bends retain curvy, and how some small bends get
exaggerated.
\section{Methodology}
\label{sec:methodology}
The original {\WM}'s algorithm \cite{wang1998line} leaves something to be
desired for a practical implementation: it is not straightforward to implement
the algorithm from the paper alone.
Explanations in this document are meant to expand, rather than substitute, the
original description in {\WM}. Therefore, familiarity with the original paper is
assumed, and, for some sections, having the original close-by is necessary to
meaningfully follow this document.
This paper describes {\WM} in detail that is more useful for anyone who wishes
to follow the algorithm implementation more closely: each section is expanded
with additional commentary, and illustrations added for non-obvious steps. Corner
cases are discussed, too.
\subsection{Main Geometry Elements Used by Algorithm}
\label{sec:vocab}
This section defines and explains the geometry elements that are used
throughout this paper and the implementation. Assume Euclidean geometry
throughout this document, unless noted otherwise.
\begin{description}
\item[\normalfont\textsc{vertex}] is a point on a plane, can be expressed
by a pair of $(x,y)$ coordinates.
\item[\normalfont\textsc{line segment}] or \textsc{segment} joins two
vertices by a straight line. A segment can be expressed by two
coordinate pairs: $(x_1, y_1)$ and $(x_2, y_2)$. Line segment and
segment are used interchangeably.
\item[\normalfont\textsc{line}] or \textsc{linestring} represents a single
linear feature. For example, a river or a coastline.
Geometrically, a line is a series of connected line segments, or,
equivalently, a series of connected vertices. Each vertex connects to
two other vertices, with the exception of the vertices at either ends of the line:
these two connect to a single other vertex.
\item[\normalfont\textsc{multiline}] or \textsc{multilinestring} is a
collection of linear features. Throughout this implementation, this is
used rarely (normally, a river is a single line) but can be valid
when, for example, a river has an island.
\item[\normalfont\textsc{bend}] is a subset of a line that humans perceive
as a curve. The geometric definition is complex and is discussed in
section~\ref{sec:definition-of-a-bend}.
\item[\normalfont\textsc{baseline}] is a line between the bend's first and last
vertices.
\item[\normalfont\textsc{sum of inner angles}] is a measure of how "curved"
the bend is. Assume that first and last bend vertices are vectors. Then sum
of inner angles will be the angular difference of those two vectors.
\item[\normalfont\textsc{algorithmic complexity}] measured in \textsc{big o
notation}, is a relative measure that helps explain how
long\footnote{the upper bound, i.e., the worst case.} the
algorithm will run depending on its input. It is widely used in computing
science when discussing the efficiency of a given algorithm.
For example, given $n$ objects and time complexity of $O(log(n))$, the
time it takes to execute the algorithm is logarithmic to $n$.
Conversely, if complexity is $O(n^2)$, then the time it takes to
execute the algorithm grows quadratically with input. Importantly, if
the input size doubles, the time it takes to run the algorithm
quadruples.
\textsc{big o notation} was first suggested by
Bachmann\cite{bachmann1894analytische} and Landau\cite{landau1911} in
late \textsc{xix} century, and clarified and popularized for computing
science by Donald Knuth\cite{knuth1976big} in the 1970s.
\end{description}
\clearpage
\subsection{Algorithm Implementation Process}
\label{sec:algorithm-implementation-process}
\tikzset{
startstop/.style={trapezium,text centered,minimum height=2em,
trapezium left angle=70,trapezium right angle=110,draw=black,fill=red!20},
proc/.style={rectangle,minimum height=2em,text centered,draw=black,
fill=orange!20},
decision/.style={diamond,minimum height=2em,text centered,aspect=3,
draw=black,fill=green!20},
arrow/.style={thick,->,>=stealth},
}
\begin{figure}[!ht]
\centering
\begin{tikzpicture}[node distance=1.5cm,auto]
\node (start) [startstop] {Read \textsc{linestring}};
\node (detect) [proc,below of=start] {Detect bends};
\node (inflections) [proc,below of=detect] {Fix gentle inflections};
\node (selfcrossing) [proc,below of=inflections] {Eliminate self-crossing};
\node (mutated1) [decision,below of=selfcrossing] {Mutated?};
\node (bendattrs) [proc,below of=mutated1] {Compute bend attributes};
\node (exaggeration) [proc,below of=bendattrs] {Exaggeration};
\node (mutated2) [decision,below of=exaggeration] {Mutated?};
\node (elimination) [proc,below of=mutated2] {Elimination};
\node (mutated3) [decision,below of=elimination] {Mutated?};
\node (stop) [startstop,below of=mutated3] {Stop};
\coordinate [right of=mutated1,node distance=5cm] (mutated1y) {};
\coordinate [right of=mutated2,node distance=5cm] (mutated2y) {};
\coordinate [right of=mutated3,node distance=5cm] (mutated3y) {};
\draw [arrow] (start) -- (detect);
\draw [arrow] (detect) -- (inflections);
\draw [arrow] (inflections) -- (selfcrossing);
\draw [arrow] (selfcrossing) -- (mutated1);
\draw [arrow] (mutated1) -| node [near start] {Yes} (mutated1y) |- (detect);
\draw [arrow] (mutated1) -- node[anchor=west] {No} (bendattrs);
\draw [arrow] (bendattrs) -- (exaggeration);
\draw [arrow] (exaggeration) -- (mutated2);
\draw [arrow] (mutated2) -| node [near start] {Yes} (mutated2y) |- (detect);
\draw [arrow] (mutated2) -- node[anchor=west] {No} (elimination);
\draw [arrow] (mutated3) -| node [near start] {Yes} (mutated3y) |- (detect);
\draw [arrow] (mutated3) -- node[anchor=west] {No} (stop);
\draw [arrow] (elimination) -- (mutated3);
\end{tikzpicture}
\caption{Flow chart of the implementation workflow.}
\label{fig:flow-chart}
\end{figure}
Figure~\ref{fig:flow-chart} visualizes the algorithm steps for each line.
\textsc{multilinestring} features are split to \textsc{linestring} features and
executed in order.
Judging from {\WM} prototype flow chart (depicted in figure 11 of the original
paper), their approach is iterative for the line: it will process the line in
sequence, doing all steps, before moving on to the next step. We will call this
approach "streaming", because it does not require to have the full line to
process it.
We have taken a different approach: process each step fully for the line,
before moving to the next step. This way provides the following advantages:
\begin{itemize}
\item For \textsc{eliminate self-crossing} stage, when it finds a bend with
the right sum of inflection angles, it checks the whole line for
self-crossings. This is impossible with streaming because it requires
having the full line in memory. It could be optimized by, for example,
looking for a fixed number of neighboring bends (say, 10), but that
would complicate the implementation.
\item \textsc{fix gentle inflections} is iterating the same line twice from
opposite directions. That could be re-written to streaming fashion, but
it complicates the implementation, too.
\end{itemize}
On the other hand, comparing to the {\WM} prototype flow chart, our
implementation uses more memory (because it needs to have the full line before
processing), and some steps are unnecessarily repeated, like re-computing the
bend's attributes during repeated iterations.
\subsection{Technical Implementation}
\label{sec:technical-implementation}
Technical algorithm realization was created in \titlecite{postgis311}. PostGIS
is a PostgreSQL extension for working with spatial data.
PostgreSQL is an open-source relational database, widely used in industry and
academia. PostgreSQL can be interfaced from nearly any programming language;
therefore, solutions written in PostgreSQL (and their extensions) are usable in
many environments. On top of that, PostGIS implements a rich set of
functions\cite{postgisref} for working with geometric and geographic objects.
Due to its wide applicability and rich library of spatial functions, PostGIS is
the implementation language of the {\WM} algorithm. The implementation exposes
the entrypoint function \textsc{st\_simplifywm}, in
listing~\ref{lst:st-simplifywm}.
\begin{listing}
\begin{minted}[fontsize=\small]{sql}
create function ST_SimplifyWM(
geom geometry,
dhalfcircle float,
intersect_patience integer default 10,
dbgname text default null
) returns geometry
\end{minted}
\caption{Function \textsc{st\_simplifywm}.}
\label{lst:st-simplifywm}
\end{listing}
This function accepts the following parameters:
\begin{description}
\item[\normalfont\textsc{geom}] is the input geometry. Either
\textsc{linestring} or \textsc{multilinestring}.
\item[\normalfont\textsc{dhalfcircle}] is the diameter of the half-circle.
Explained in section~\ref{sec:bend-scaling-and-dimensions}.
\item[\normalfont\textsc{intersect\_patience}] is an optional parameter to
exaggeration operator, explained in
section~\ref{sec:exaggeration-operator}.
\item[\normalfont\textsc{dbgname}] is an optional human-readable name of
the figure. Explained in section~\ref{sec:debugging}.
\end{description}
The function \textsc{st\_simplifywm} calls into helper functions, which detect,
transform, or remove bends. These helper functions are also defined in the
implementation and are part of the algorithm technical realization. All
supporting functions use spatial manipulation functions provided by PostGIS.
\subsection{Automated Tests}
\label{sec:automated-tests}
As part of the algorithm realization, an automated test suite has been
developed. Shapes to test each function have been hand-crafted, and expected
results have been manually calculated. The test suite executes parts of the
algorithm against a predefined set of geometries, and asserts that the output
matches the resulting hand-calculated geometries.
The full set of test geometries is visualized in Figure~\ref{fig:test-figures}.
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{test-figures}
\caption{Geometries for automated test cases.}
\label{fig:test-figures}
\end{figure}
Test suite can be executed with a single command and completes in about a
second. Having an easily accessible test suite boosts confidence that no
unexpected bugs have snug in while modifying the algorithm.
We will explain two instances when automated tests were very useful during
the implementation:
\begin{itemize}
\item Created a function \textsc{wm\_exaggeration}, which exaggerates bends
following the rules. It worked well over simple geometries but, due to
a subtle bug, created a self-crossing bend in Visinčia. The offending
bend was copied to the automated test suite, which helped fix the bug.
Now the test suite contains the same bend (a hook-like bend on the
right-hand side of Figure~\ref{fig:test-figures}) and code to verify
that it was correctly exaggerated.
\item During algorithm development, automated tests run about once a
minute. They quickly find logical and syntax errors. In contrast,
running the algorithm with real rivers takes a few minutes, which
increases the feedback loop, and takes longer to fix the "simple"
errors.
\end{itemize}
Whenever we find and fix a bug, we aim to create an automated test case for it,
so the same bug is not re-introduced by whoever works next on the same piece of
code.
Besides testing for specific cases, an automated test suite ensures future
stability and longevity of the implementation itself: when new contributors
start changing code, they have higher assurance they have not broken
an already-working code.
\subsection{Reproducibility}
\label{sec:reproducing-the-paper}
It is widely believed that the ability to reproduce the results of a published
study is important to the scientific community. In practice, however, it is
often hard or impossible: research methodologies, as well as algorithms
themselves, are explained in prose, which, due to the nature of the non-machine
language, lends itself to inexact interpretations.
This article, besides explaining the algorithm in prose, includes the program
of the algorithm in a way that can be executed on reader's workstation. On top
of it, all the illustrations in this paper are generated using that algorithm
from a predefined list of test geometries (see
section~\ref{sec:automated-tests}).
This article and accompanying code are accessible on GitHub as of 2021-05-19
\cite{wmsql}.
Instructions how to re-generate all the visualizations are in
appendix~\ref{sec:code-regenerate}. The visualization code serves as a good
example reference for anyone willing to start using the algorithm.
\section{Algorithm Implementation}
As alluded in section~\ref{sec:introduction}, {\WM} paper skims over
certain details which are important to implement the algorithm. This section
goes through each algorithm stage, illustrating the intermediate steps and
explaining the author's desiderata for a more detailed description.
Illustrations of the following sections are extracted from the automated test
cases which were written during the algorithm implementation (as discussed in
section~\ref{sec:automated-tests}).
\subsection{Debugging}
\label{sec:debugging}
This implementation includes debugging facilities in a form of a table
\textsc{wm\_debug}. The table's schema is written in
listing~\ref{lst:wm-debug-sql}.
When debug mode is active, implementation steps will store their results, which
can be useful to manually inspect the results of intermediate actions. Besides
manual inspection, most of the figure illustrations in this article are
visualized from the \textsc{wm\_debug} table. Debugging mode can be activated
by passing a non-empty \textsc{dbgname} string to the function
\textsc{st\_simplifywm} (this function was described in
section~\ref{sec:technical-implementation}). By convention, \textsc{dbgname} is
the name of the geometry that is being simplified, e.g., \textsc{šalčia}. The
purpose of each column in \textsc{wm\_debug} is described below:
\begin{description}
\item[\normalfont\textsc{id}] is a unique identifier for each feature.
Generated automatically by PostgreSQL. Useful when it is necessary to
copy one or more features to a separate table for unit tests, as
described in section~\ref{sec:automated-tests}.
\item[\normalfont\textsc{stage}] is the stage of the algorithm. As of
writing, there are a few:
\begin{description}
\item[\normalfont\textsc{afigures}] at the beginning of the loop.
\item[\normalfont\textsc{bbends}] after bends are detected.
\item[\normalfont\textsc{cinflections}] after gentle inflections
are fixed.
\item[\normalfont\textsc{dcrossings}] after self-crossings are
eliminated.
\item[\normalfont\textsc{ebendattrs}] after bend attributes are
calculated.
\item[\normalfont\textsc{gexaggeration}] after bends have been
exaggerated.
\item[\normalfont\textsc{helimination}] after bends have been
eliminated.
\end{description}
Some of these have sub-stages which are encoded by a dash and a
sub-stage name, e.g., \textsc{bbends-polygon} creates polygon
geometries after polygons have been detected; this particular example
is used to generate colored polygons in
Figure~\ref{fig:fig8-definition-of-a-bend}.
\item[\normalfont\textsc{name}] is the name of the geometry, which comes from
parameter~\textsc{dbgname}.
\item[\normalfont\textsc{gen}] is the top-level iteration number. In other
words, the number of times the execution flow passes through
\textsc{detect bends} phase as depicted in
Figure~\onpage{fig:flow-chart}.
\item[\normalfont\textsc{nbend}] is the bend's index in its \textsc{line}.
\item[\normalfont\textsc{way}] is the geometry column.
\item[\normalfont\textsc{props}] is a free-form JSON object to store
miscellaneous values. For example, \textsc{ebendattrs} phase stores a
boolean property \textsc{isolated}, which signifies whether the bend is
isolated or not (explained in section~\ref{sec:isolated-bend}).
\end{description}
When debug mode is turned off (that is, \textsc{dbgname} is left unspecified),
\textsc{wm\_debug} is empty and the algorithm runs slightly faster.
\begin{listing}[h!]
\begin{minted}[fontsize=\small]{sql}
drop table if exists wm_debug;
create table wm_debug(
id serial,
stage text not null,
name text not null,
gen bigint not null,
nbend bigint,
way geometry,
props jsonb
);
\end{minted}
\caption{\textsc{wm\_debug} table definition}
\label{lst:wm-debug-sql}
\end{listing}
\subsection{Merging Pieces of a River into One}
Example river geometries were sourced from OpenStreetMap\cite{openstreetmap}
and NŽT\cite{nzt}. Rivers in both data sources are stored in shorter line
segments, and multiple segments (usually hundreds or thousands for significant
rivers) define one full river. While it is convenient to store and edit, these
segments are not explicitly related to each other. This poses a problem for
simplification algorithms which manipulate on full linear features at a time:
full river geometries, but not their parts.
Since these rivers do not have an explicit relationship to connect them
together, they were connected using heuristics: if two line segments share a
name and are within 500 meters from each other, then they form a single river.
For all line simplification algorithms, all rivers need to be combined and
this way proved to be reasonably effective. Source code for this operation can
be found in listing~\onpage{lst:aggregate-rivers.sql}.
\subsection{Bend Scaling And Dimensions}
\label{sec:bend-scaling-and-dimensions}
{\WM} accepts a single input parameter: the diameter of a half-circle. If the
bend's adjusted size (explained in detail in section~\ref{sec:shape-of-a-bend})
is greater than the area of the half-circle, then the bend will be left
untouched. If the bend's adjusted size is smaller than the area of the provided
half-circle, the bend will be simplified: either exaggerated, combined, or
eliminated.
The extent of line simplification, as well as the half-circle's diameter,
depends on the desired target scale. Simplification should be more aggressive
for smaller target scales and less aggressive for larger scales. This section
goes through the process of finding the correct variable to {\WM} algorithm.
What is the minimal, but still eligible, figure that should be displayed on
the map?
According to \titlecite{cartoucheMinimalDimensions}, the map is typically held
at a distance of 30 cm. Recommended minimum symbol size, given viewing distance
of 45 cm (1.5 feet), is 1.5 mm, as analyzed in \titlecite{mappingunits}.
In our case, our target is line bend, rather than a symbol. Assume 1.5 mm is a
diameter of the bend. A semi-circle of 1.5 mm diameter is depicted in
Figure~\ref{fig:half-circle}. A bend of this size or larger, when adjusted to
scale, will not be simplified.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[x=1mm,y=1mm]
\draw[] (-10, 0) -- (-.75,0) arc (225:-45:.75) -- (10, 0);
\end{tikzpicture}
\caption{Smallest feature that will be not simplified (to scale).}
\label{fig:half-circle}
\end{figure}
\begin{table}[h!]
\centering
\begin{tabular}{ c D{.}{.}{1} }
Scale & \multicolumn{1}{c}{$D(m)$} \\ \hline
1:\numprint{10000} & 15 \\
1:\numprint{15000} & 22.5 \\
1:\numprint{25000} & 37.5 \\
1:\numprint{50000} & 75 \\
1:\numprint{250000} & 220 \\
\end{tabular}
\caption{{\WM} half-circle diameter $D$ for popular scales.}
\label{table:scale-halfcirlce-diameter}
\end{table}
{\WM} algorithm does not have a notion of scale, but it does have a notion of
distance: it accepts a single parameter $D$, the half-circle's diameter.
Assuming measurement units in projected coordinate system are meters (for
example, \titlecite{epsg3857}), some popular scales are highlighted in
table~\ref{table:scale-halfcirlce-diameter}.
\subsection{Definition of a Bend}
\label{sec:definition-of-a-bend}
The original article describes a bend as follows:
\begin{displaycquote}{wang1998line}
A bend can be defined as that part of a line which contains a number of
subsequent vertices, with the inflection angles on all vertices included in
the bend being either positive or negative and the inflection of the bend's