<?xml version="1.0" encoding="utf-8"?>
<Documentation>
<DataInput />
<Module>
<Title>FieldData</Title>
<Description>The FieldData module allows a user to add presence/absence points or count data recorded across a landscape for the phenomenon being modeled (e.g., plant sightings, evidence of animal presence, etc.). The input data for this module must be in the form of a .csv file that follows one of two formats:
Format 1:
A .csv file with the following column headings, in order: "X", "Y", and "responseBinary". The "X" field should contain the horizontal (longitudinal) position of each sample point, and the "Y" field should contain the vertical (latitudinal) position. These values must be in the same coordinate system and units as the template layer used in the workflow. The "responseBinary" column should contain either '0' (indicating absence at the point) or '1' (indicating presence at the point).
Format 2:
A .csv file with the following column headings, in order: "X", "Y", and "responseCount". The "X" field should contain the horizontal (longitudinal) position of each sample point, and the "Y" field should contain the vertical (latitudinal) position. These values must be in the same coordinate system and units as the template layer used in the workflow. The "responseCount" column should contain either '-9999' (indicating that the point is a background point) or a numerical value (either '0' or a positive integer) indicating the number of incidences of the phenomenon recorded at that point.</Description>
<OutputPorts>
<Port>
<PortName>value</PortName>
<Definition>This is the actual file object that is being passed to other modules in the workflow.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The 'fieldData_file' input port of the FieldDataQuery Module if the field data needs subsetting or aggregation.</Connection>
<Connection>The 'fieldData' input port of the FieldDataAggregateAndWeight Module if the field data needs to be aggregated or weighted to match the spatial resolution of the template layer.</Connection>
<Connection>The 'fieldData' input port of the MDSBuilder Module if the field data needs no further pre-processing prior to modeling.</Connection>
</Connections>
</Port>
<Port>
<PortName>value_as_string</PortName>
<Definition>This is a VisTrails port that is not used in general SAHM workflows.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>This does not commonly connect to other SAHM modules.</Connection>
</Connections>
</Port>
</OutputPorts>
<InputPorts />
</Module>
<Module>
<Title>Predictor</Title>
<Description>The Predictor module allows a user to select a single raster layer for consideration in the modeled analysis. Besides selecting the file, the user also specifies the resampling method, the aggregation method, and whether the data is categorical.</Description>
<InputPorts>
<Port>
<PortName>categorical</PortName>
<Definition>This parameter allows a user to indicate the type of data represented. The distinction between continuous and categorical data will be maintained through a workflow by appending the word '_categorical' to categorical layer names in the resulting MDS file. It is also important to select the nearest neighbor resampling option for categorical layers.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options><Option>True (Checked) - The data contained in the raster layer is categorical (e.g., landcover categories).</Option><Option>False (Unchecked) - The data contained in the raster is continuous (e.g., a DEM layer).</Option></Options>
<Connections>Does not generally connect to any other module.</Connections>
</Port>
<Port>
<PortName>ResampleMethod</PortName>
<Definition>The resample method employed to interpolate new cell values when transforming the raster layer to the coordinate space or cell size of the template layer. </Definition>
<Mandatory>TRUE</Mandatory>
<Default>NA</Default>
<Options>
<Option>near: nearest neighbor resampling. Fastest algorithm and worst interpolation quality, but the best choice for categorical data.</Option>
<Option>bilinear: bilinear resampling, good choice for continuous data.</Option>
<Option>cubic: cubic resampling.</Option>
<Option>cubicspline: cubic spline resampling.</Option>
<Option>lanczos: Lanczos windowed sinc resampling.</Option>
<Option>see: http://www.gdal.org/gdalwarp.html for context</Option>
</Options>
<Connections>Does not generally connect to any other module.</Connections>
</Port>
<Port>
<PortName>AggregationMethod</PortName>
<Definition>The aggregation method to be used in the event that the raster layer must be up-scaled to match the template layer (e.g., generalizing a 10 m input layer to a 100 m output layer). Care should be taken to ensure that the aggregation method that best preserves the integrity of the data is used. See the PARC module documentation for more information on how resampling and aggregation are performed.</Definition>
<Mandatory>TRUE</Mandatory>
<Default>NA</Default>
<Options>
<Option>Mean: Average value of all constituent pixels used.</Option>
<Option>Max: Maximum value of all constituent pixels used.</Option>
<Option>Min: Minimum value of all constituent pixels used.</Option>
<Option>Majority: The value occurring most frequently in constituent pixels used.</Option>
<Option>None: No Aggregation used.</Option>
</Options>
<Connections>Does not generally connect to any other module.</Connections>
</Port>
<Port>
<PortName>file</PortName>
<Definition>The location of the raster file. A user can navigate to the location on their file system. When selecting an ESRI grid raster, the user should navigate to the 'hdr.adf' file contained within the grid folder.</Definition>
<Mandatory>TRUE</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not generally connect to any other module.</Connections>
</Port>
</InputPorts>
<OutputPorts>
<Port>
<PortName>value</PortName>
<Definition />
<Mandatory>TRUE</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The output from this port only connects to the PARC input port 'predictor'.</Connection>
</Connections>
</Port>
<Port>
<PortName>value_as_string</PortName>
<Definition>This is a VisTrails port that is not used in general SAHM workflows.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections><Connection>Does not generally connect to other SAHM modules.</Connection></Connections>
</Port>
</OutputPorts>
<References></References>
<SeeAlso />
</Module>
<Module>
<Title>TemplateLayer</Title>
<Description>The second fundamental input in an analysis is the template layer. It is used to define the extent and resolution that will be used in all subsequent analysis. The TemplateLayer is a raster data layer with a defined coordinate system, a known cell size, and an extent that defines the study area. The data type and values in this raster are not important. All additional raster layers used in the analysis will be resampled and reprojected as needed to match the template, snapped to the template, and clipped to have an extent that matches the template. Users should ensure that additional covariates considered in the analysis have complete coverage of the template layer used.</Description>
<InputPorts />
<OutputPorts>
<Port>
<PortName>value</PortName>
<Definition>This is the actual file object that is being passed to other modules in the workflow.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The 'TemplateLayer' input port of the FieldDataAggregateAndWeight Module.</Connection>
<Connection>The 'TemplateLayer' input port of the PARC Module.</Connection>
</Connections>
</Port>
<Port>
<PortName>value_as_string</PortName>
<Definition>This is a VisTrails port that is not used in general SAHM workflows.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>This does not commonly connect to other SAHM modules.</Connection>
</Connections>
</Port>
</OutputPorts>
</Module>
<Module>
<Title>PredictorListFile</Title>
<Description>The PredictorListFile module allows a user to load a .csv file containing a list of rasters for consideration in the modeled analysis. The .csv file should contain a header row and four columns containing the following information, in order, for each raster input.
Column 1: The full file path to the input raster layer.
Column 2: A binary value indicating whether the input layer is categorical or not. A value of "0" indicates that an input raster is non-categorical data (continuous), while a value of "1" indicates that an input raster is categorical data.
Column 3: The resampling method employed to interpolate new cell values when transforming the raster layer to the coordinate space or cell size of the template layer, if necessary. The resampling type should be specified using one of the following values: "nearestneighbor", "bilinear", "cubic", or "lanczos".
Column 4: The aggregation method to be used in the event that the raster layer must be up-scaled to match the template layer (e.g., generalizing a 10 m input layer to a 100 m output layer). Care should be taken to ensure that the aggregation method that best preserves the integrity of the data is used. The aggregation should be specified using one of the following values: "Min", "Mean", "Max", "Majority", or "None".
In formatting the list of predictor files, the titles assigned to each of the columns are unimportant because the module retrieves the information based on the order of the values in the .csv file. The ordering of the information and the permissible values in the file, however, are strictly enforced. The module also anticipates a header row and will ignore the first row in the .csv file.
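As a rough illustration of the four-column layout described above, the following Python sketch builds the text of a predictor list file and checks each row against the permissible values listed for columns 2 to 4. The function name, the header titles, and the raster paths are hypothetical and are not part of SAHM; only the column order and the allowed values come from this description.

```python
import csv
import io

# Permissible values as listed in the column descriptions above.
RESAMPLING = {"nearestneighbor", "bilinear", "cubic", "lanczos"}
AGGREGATION = {"Min", "Mean", "Max", "Majority", "None"}


def predictor_list_csv(rows):
    """Return CSV text: one header row, then one four-column row per raster."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header titles are arbitrary; the first row is skipped by the module.
    writer.writerow(["file", "categorical", "resampling", "aggregation"])
    for path, categorical, resampling, aggregation in rows:
        if categorical not in (0, 1):
            raise ValueError(f"categorical flag must be 0 or 1: {categorical!r}")
        if resampling not in RESAMPLING:
            raise ValueError(f"unknown resampling method: {resampling!r}")
        if aggregation not in AGGREGATION:
            raise ValueError(f"unknown aggregation method: {aggregation!r}")
        writer.writerow([path, categorical, resampling, aggregation])
    return out.getvalue()


# Hypothetical raster paths, for illustration only.
example = predictor_list_csv([
    ("C:/data/elevation.tif", 0, "bilinear", "Mean"),
    ("C:/data/landcover.tif", 1, "nearestneighbor", "Majority"),
])
print(example)
```

The categorical layer in this example uses nearest neighbor resampling and majority aggregation, which avoids interpolating between class codes.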
</Description>
<InputPorts>
<Port>
<PortName>csvFileList</PortName>
<Definition>This is the CSV file on the file system. While not strictly mandatory, this port will almost always have an input.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>predictor</PortName>
<Definition>Allows a user to add individual Predictor modules to a PredictorListFile.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The output port 'value' of a Predictor module.</Connection>
</Connections>
</Port>
</InputPorts>
<OutputPorts>
<Port>
<PortName>RastersWithPARCInfoCSV</PortName>
<Definition>This port generally connects to the input port 'RastersWithPARCInfoCSV' on the PARC module.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
</OutputPorts>
<References></References>
<SeeAlso />
</Module>
<Models />
<Module>
<Title>BoostedRegressionTree</Title>
<Description>BRT uses decision trees to partition the parameter space into the most homogeneous groups in terms of the response. BRT starts with a single decision tree, then adds a tree that best explains the error in the first tree, and so on. Like random forest, BRT models automatically model interactions and nonlinear relationships and are robust to missing observations. Our implementation makes approximately 1,000 trees. It incorporates advanced algorithms for tuning the model settings, simplifying the model using a cross-validation technique, and detecting important interactions between covariates. If more than 500 presence or absence records are found, a random subset will be used for learning rate estimation and model simplification, but all data will be used in the final model fitting step. The cross-validation step within BRT should not be confused with that provided by the Model Selection cross-validation step. The former is used to optimize parameter values when defaults are not provided, while the latter is used to select models based on between-model comparisons of evaluation metrics. All discussion of cross-validation related to setting parameters in the BRT argument documentation refers to the algorithm used for parameter optimization and does not affect the cross-validation split selected by Model Selection and cross-validation.
Several options are available for fitting BRTs. When run using VisTrails, special attention is required before moving away from the defaults because selection of certain parameters will disallow selection of others. Optional parameters are described briefly here, but a more in-depth description can be found in Elith and Leathwick 2008.</Description>
<InputPorts>
<Port>
<PortName>mdsFile</PortName>
<Definition>The input data set, consisting of locational data for each sample point and the values of each predictor variable at those points. This input file is almost always generated by the upstream steps.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The mdsFile can be produced by any of MDSBuilder, ModelEvaluationSplit, ModelSelectionCrossValidation, ModelSelectionSplit, or CovariateCorrelationAndSelection.</Connection>
</Connections>
</Port>
<Port>
<PortName>makeBinMap</PortName>
<Definition>Indicate whether to discretize the continuous prediction map into presence/absence. See the ThresholdOptimizationMethod for how this is done. If time is a concern and many models are to be fit and assessed, maps can be produced after model selection for only the best models using the Select and Test the Final Model tool. Options are available for producing Probability, Binary and MESS maps there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeProbabilityMap</PortName>
<Definition>Indicate whether a map of predicted values is to be produced for the model fit.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeMESMap</PortName>
<Definition>Indicate whether to produce a multivariate environmental similarity surface (MESS) and a map of which factor is limiting at each point (see Elith et al. 2010 for more details). If time is a concern and many models are to be fit and assessed, maps can be produced after model selection for only the best models using the Select and Test the Final Model tool. Options are available for producing Probability, Binary and MESS maps there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ThresholdOptimizationMethod</PortName>
<Definition>Determines how the threshold is optimized in order to discretize continuous predictions into binary predictions. The threshold is used for evaluation metrics calculated from the confusion matrix as well as for the binary map. The value calculated for the training portion of the data is applied to the test portion. If cross-validation was specified, the threshold is calculated separately for each fold from the training data and applied to the test data of the held-out fold. These options come from the R package PresenceAbsence; more details can be found in the associated manual (see Freeman 2007).</Definition>
<Mandatory>False</Mandatory>
<Default>2</Default>
<Options>
<Option>1: Threshold=0.5</Option>
<Option>2: Sensitivity=Specificity</Option>
<Option>3: Maximizes (sensitivity+specificity)/2</Option>
<Option>4: Maximizes Cohen's Kappa</Option>
<Option>5: Maximizes PCC (percent correctly classified)</Option>
<Option>6: Predicted prevalence=observed prevalence</Option>
<Option>7: Threshold=observed prevalence</Option>
<Option>8: Mean predicted probability</Option>
<Option>9: Minimizes distance between ROC plot and (0,1)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Seed</PortName>
<Definition>The random number seed used by BRT. There is a default seed specified in the SAHM configuration. If you want to use a different value it can be entered here.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>Randomly Generated</Default>
<Options>
<Option>Any integer between -2147483647 and 2147483647</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>TreeComplexity</PortName>
<Definition>Sets the level of interactions fitted in the model. A tree complexity of 1 fits no interactions, 2 will fit up to, but not necessarily all, two-way interactions, and so on.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>If not set, tree complexity will be selected based on the number of observations and what produces the best model</Default>
<Options>
<Option>any positive integer (generally not greater than 3)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>BagFraction</PortName>
<Definition>Controls the proportion of the data that is used to fit the model at each step. Using a bag fraction of 1 will give a fully deterministic model but this is usually not preferable as stochasticity generally improves model performance (Elith and Leathwick 2008).</Definition>
<Mandatory>FALSE</Mandatory>
<Default>.75</Default>
<Options>
<Option>Any positive number greater than 0 and less than or equal to 1</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>NumberOfFolds</PortName>
<Definition>If cross-validation is used for model simplification, this sets the number of folds used for cross-validation.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>3</Default>
<Options>
<Option>A positive integer (generally between 2 and 10) </Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Alpha</PortName>
<Definition>Controls when the algorithm stops in the model simplification step. The change in deviance is calculated between the previous and current iteration in model simplification and if the average change in deviance per observation is less than the standard error of the original deviance multiplied by alpha then the simplification step is accepted as long as we have not reached the maximum number of drops allowed. </Definition>
<Mandatory>FALSE</Mandatory>
<Default>1</Default>
<Options>
<Option>Any positive floating point value is valid</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>PrevalenceStratify</PortName>
<Definition>This specifies whether cross-validation samples should be stratified to match the overall prevalence. This is currently only valid for presence absence data and is only used in model simplification.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>True (Checked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ToleranceMethod</PortName>
<Definition>Method used in determining when to stop model simplification.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>"auto"</Default>
<Options>
<Option>Either "auto" or "fixed"</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Tolerance</PortName>
<Definition>Can be set to control the stopping rule in model simplification. If ToleranceMethod is set to “auto” this value will be multiplied by the mean total deviance of the null model. Change in deviance is compared to the tolerance to determine when to stop model simplification.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>.001</Default>
<Options>
<Option>Any positive floating point value is valid</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>LearningRate</PortName>
<Definition>Controls the amount each tree contributes to the model. A small learning rate restricts individual tree contributions to the overall model. </Definition>
<Mandatory>FALSE</Mandatory>
<Default>If not specified, learning rate will be determined based on the number of trees and the tree complexity</Default>
<Options>
<Option>Any positive number greater than 0 and less than 1</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MaximumTrees</PortName>
<Definition>The absolute upper limit on the total number of trees to fit. Setting this below 5,000 will result in an error.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>10,000</Default>
<Options>
<Option>Any positive integer greater than 5,000</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>SelectBestPredSubset</PortName>
<Definition>Boolean. If true, model selection will occur and the predictors that do not contribute significantly will be dropped from the final model. If false, all predictors selected at the covariate correlation filter will be used to create the final model.</Definition>
<Mandatory>False</Mandatory>
<Default>False</Default>
<Options>True (Checked)</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>run_name_info</PortName>
<Definition>Used to specify a meaningful tag and subfolder for output file naming/organization. See documentation for OutputName module for more information.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>None</Options>
<Connections>Connects to an OutputName module</Connections>
</Port>
</InputPorts>
<OutputPorts>
<Port>
<PortName>modelWorkspace</PortName>
<Definition>The R workspace where all internal details regarding the fitted model are stored. This is used by the Select and Test the Final Model module.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>'modelWorkspace' port of SAHMModelOutputViewerCell for viewing the aspatial model output.</Connection>
<Connection>'modelWorkspace' port of SAHMSpatialOutputViewerCell for viewing the spatial model output in a mini GIS.</Connection>
</Connections>
</Port>
<Port>
<PortName>BinaryMap</PortName>
<Definition>If specified using makeBinMap=True, a surface of binary predictions is produced by discretizing the prediction map based on the selected threshold. This map indicates whether one could expect each site to be occupied or unoccupied based on the model.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ProbabilityMap</PortName>
<Definition>If specified using makeProbabilityMap=True, a surface of predicted values is produced based on the tiffs in the input .mds file and the fitted model. These values can, but do not always, indicate the probability of finding the species at a given site. For example, if model calibration is poor, they will not agree well with the true probabilities, though discrimination between presences and absences might still be good.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResidualsMap</PortName>
<Definition>Model residual plots show the spatial relationship between the model deviance residuals. Most models assume residuals will be independent, so a spatial pattern in the deviance residuals can indicate a problem with the model fit and with inference based on the fit. It can, for example, indicate that important predictors were not included in the model, and it can be compared with the spatial pattern of predictors that were not included in the model.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MessMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MESS surface is the multivariate environmental similarity surface and shows how well each point fits into the univariate ranges of the points for which the model was fit. Negative values in this map indicate that the point is out of the range of the training data.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MoDMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MoD map is related to the MESS map and indicates which variable was furthest from the range over which the model was fit for each spatial location. See Elith et al. 2010 for details on how the MESS map calculations are performed.
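The relationship between the two surfaces can be sketched for a single location as follows. This is a minimal Python illustration of the similarity measure as defined by Elith et al. 2010, not the code SAHM runs; the function names, variable names, and example values are hypothetical, and the guard for a constant predictor is an added assumption.

```python
def similarity(p, ref):
    """Similarity of value p to the training (reference) values of one variable."""
    lo, hi = min(ref), max(ref)
    span = (hi - lo) or 1.0  # assumption: avoid dividing by zero for a constant predictor
    # Percent of reference values that are smaller than p.
    f = 100.0 * sum(1 for r in ref if p > r) / len(ref)
    if f == 0:
        return 100.0 * (p - lo) / span  # at or below the training minimum
    if f == 100:
        return 100.0 * (hi - p) / span  # above the training maximum
    if f > 50:
        return 2.0 * (100.0 - f)
    return 2.0 * f


def mess_and_mod(point, reference):
    """Return (MESS value, name of the most dissimilar variable) for one location."""
    scores = {name: similarity(point[name], reference[name]) for name in reference}
    mod = min(scores, key=scores.get)  # the variable with the lowest similarity
    return scores[mod], mod


# Hypothetical training values for two predictors.
reference = {"elev": [100, 200, 300, 400], "precip": [10, 20, 30, 40]}
inside = mess_and_mod({"elev": 250, "precip": 25}, reference)
outside = mess_and_mod({"elev": 250, "precip": 55}, reference)
print(inside, outside)
```

In this example the second location has a precipitation value beyond the training maximum, so its MESS value is negative and "precip" is the variable the MoD map would report there.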
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>modelEvalPlot</PortName>
<Definition>For binary data this will be a receiver operating characteristic (ROC) curve, which shows the relationship between sensitivity and specificity as the threshold for discretizing continuous predictions into presence/absence is varied. The threshold selected using the specified ThresholdOptimizationMethod is shown. If a model selection test/training split was specified, the ROC curve for the test split will be shown in red, and if a cross-validation split was specified, ROC curves for each cross-validation fold will be overlaid with box plots summarizing cross-validation results. For count data this display will show several standard plots for assessment of model residuals.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResponseCurves</PortName>
<Definition>Model response curves show the relationship between each predictor included in the model, while holding all other predictors constant at their means, and the fitted values. MARS response curves are shown on a logit scale thus the response axis will not necessarily be bounded on the 0 to 1 interval. BRT response curves will show response surfaces for any interaction terms included in the final model along with the percent relative influence.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Text_Output</PortName>
<Definition>This file contains a summary of the model fit. The information contained here includes the number of presence observations (counts equal to or greater than 1 for count models), the number of absence points, and the number of covariates that were considered by the model selection algorithm. Note that all of these can differ from the numbers in the original .mds file because incomplete records are deleted and predictors with only one unique value are removed. Evaluation statistics are reported for the data used to fit the model as well as for the test or cross-validation split, if applicable. References for how to interpret most of these are ubiquitous in the literature, but it is worth mentioning that interpretation of the calibration statistics is described by Pearce and Ferrier 2000 as well as Miller et al. 1991. Most metrics reported here can also be found in related graphical displays.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>modelCalibrationPlot</PortName>
<Definition>The calibration plot shows the predicted probability of occurrence plotted against the observed proportion of occurrence for each of 5 bins along the probability axis. A logistic regression model is fit to the logits of the predicted probabilities of occurrence and is shown on the plot. These plots are used to determine how reliably a model will predict whether a site is occupied or unoccupied (Pearce and Ferrier 2000).</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
</OutputPorts>
<SeeAlso />
<References>
<Reference>Bivand, R.S., Pebesma, E.J., and Gomez-Rubio, V. (2008). Applied Spatial Data Analysis with R. Springer New York, NY. </Reference>
<Reference>Dormann, C.F., McPherson, J.M., Araujo, M.B., Bivand, R., Bolliger, J., et al. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30:609–28. </Reference>
<Reference>Elith, J., Kearney, M., Phillips, S. (2010). The art of modeling range-shifting species. Methods Ecol Evol 1:330–342</Reference>
<Reference>Elith, J., Leathwick, J.R. and Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802–813. </Reference>
<Reference>Miller, M.E., Hui, S.L., Tierney, W.M. (1991). Validation techniques for logistic regression models. Statistics in Medicine 10: 1213-26</Reference>
<Reference>Pearce, J., and S. Ferrier. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling 133:225–245.</Reference>
<Reference>R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. </Reference>
<Reference>Freeman, E. (2007). PresenceAbsence: An R Package for Presence-Absence Model Evaluation. USDA Forest Service, Rocky Mountain Research Station, 507 25th Street, Ogden, UT, USA.</Reference>
</References>
</Module>
<Module>
<Title>RandomForest</Title>
<Description />
<InputPorts>
<Port>
<PortName>mdsFile</PortName>
<Definition>The input data set, consisting of locational data for each sample point and the values of each predictor variable at those points. This input file is almost always generated by the upstream steps.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The mdsFile can be produced by any of MDSBuilder, ModelEvaluationSplit, ModelSelectionCrossValidation, ModelSelectionSplit, or CovariateCorrelationAndSelection. </Connection>
</Connections>
</Port>
<Port>
<PortName>makeBinMap</PortName>
<Definition>Indicates whether to discretize the continuous prediction map into presence/absence. See ThresholdOptimizationMethod for how this is done. If time is a concern and many models are to be fit and assessed, maps can instead be produced after model selection, for only the best models, using the Select and Test the Final Model tool. Options for producing Probability, Binary and MESS maps are available there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeProbabilityMap</PortName>
<Definition>Indicate whether a map of predicted values is to be produced for the model fit.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeMESMap</PortName>
<Definition>Indicates whether to produce a multivariate environmental similarity surface (MESS) and a map of which factor is limiting at each point; see Elith et al. 2010 for more details. If time is a concern and many models are to be fit and assessed, maps can instead be produced after model selection, for only the best models, using the Select and Test the Final Model tool. Options for producing Probability, Binary and MESS maps are available there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Seed</PortName>
<Definition>The random number seed used when fitting the random forest. A default seed is specified in the SAHM configuration; to use a different value, enter it here.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>Randomly Generated</Default>
<Options>
<Option>Any integer between -2147483647 and 2147483647</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ThresholdOptimizationMethod</PortName>
<Definition>Determines how the threshold used to discretize continuous predictions into binary predictions is optimized. The threshold is used for evaluation metrics calculated from the confusion matrix as well as for the binary map. The value calculated for the training portion of the data is applied to the test portion. If cross-validation was specified, the threshold is calculated separately for each fold from that fold's training data and applied to the held-out test data. These options come from the R package PresenceAbsence; more details can be found in the associated manual (see Freeman 2007).</Definition>
<Mandatory>False</Mandatory>
<Default>2</Default>
<Options>
<Option>1: Threshold=0.5</Option>
<Option>2: Sensitivity=Specificity</Option>
<Option>3: Maximizes (sensitivity+specificity)/2</Option>
<Option>4: Maximizes Cohen's Kappa</Option>
<Option>5: Maximizes PCC (percent correctly classified)</Option>
<Option>6: Predicted prevalence=observed prevalence</Option>
<Option>7: Threshold=observed prevalence</Option>
<Option>8: Mean predicted probability</Option>
<Option>9: Minimizes distance between ROC plot and (0,1)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>mTry</PortName>
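<Definition>In randomForest, mtry is the number of variables randomly sampled as candidates at each split. By default this is optimized using the tuneRF function so that out-of-bag (OOB) error is minimized. See the randomForest documentation on the CRAN website for more details.</Definition>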
<Mandatory>FALSE</Mandatory>
<Default>Optimized using the tuneRF function so that out-of-bag error is minimized.</Default>
<Options>
<Option>An integer between 1 and the total number of valid predictors used in model fitting</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>nTrees</PortName>
<Definition>In randomForest, the ntree argument sets the number of trees to grow. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>nodesize</PortName>
<Definition>In randomForest, nodesize is the minimum size of terminal nodes. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>replace</PortName>
<Definition>In randomForest, replace controls whether cases are sampled with or without replacement. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>maxnodes</PortName>
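<Definition>In randomForest, maxnodes is the maximum number of terminal nodes that trees in the forest can have. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>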
<Mandatory>False</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>importance</PortName>
<Definition>In randomForest, importance controls whether the importance of predictors is assessed. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>localImp</PortName>
<Definition>In randomForest, localImp controls whether casewise importance measures are computed. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>proximity</PortName>
<Definition>In randomForest, proximity controls whether a proximity measure among the rows is calculated. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>oobProx</PortName>
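<Definition>In randomForest, the oob.prox argument controls whether proximity is calculated only on out-of-bag data. See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>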
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>normVotes</PortName>
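<Definition>In randomForest, the norm.votes argument controls whether the final votes are expressed as fractions (TRUE) or as raw counts (FALSE). See the randomForest documentation on the CRAN website (http://cran.r-project.org/web/packages/randomForest/index.html) for details.</Definition>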
<Mandatory>FALSE</Mandatory>
<Default>randomForest function default</Default>
<Options>
<Option>See randomForest documentation for valid input</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>outputFolderName</PortName>
<Definition>Adds an identifier to the output folder name for the purpose of data organization. The folder name is still prefixed with 'ApplyModel_' and suffixed with an auto-incremented counter.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>run_name_info</PortName>
<Definition>Used to specify a meaningful tag and subfolder for output file naming/organization. See documentation for OutputName module for more information.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>None</Options>
<Connections>Connects to an OutputName module</Connections>
</Port>
</InputPorts>
<OutputPorts>
<Port>
<PortName>modelWorkspace</PortName>
<Definition>The R workspace where all internal details regarding the fitted model are stored. This is used by the Select and Test the Final Model module.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>'modelWorkspace' port of SAHMModelOutputViewerCell for viewing the aspatial model output.</Connection>
<Connection>'modelWorkspace' port of SAHMSpatialOutpuViewerCell for viewing the spatial model output in a mini GIS.</Connection>
</Connections>
</Port>
<Port>
<PortName>BinaryMap</PortName>
<Definition>If specified using makeBinMap=True, a surface of binary predictions is produced by discretizing the prediction map based on the selected threshold. This map indicates whether one could expect each site to be occupied or unoccupied based on the model.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ProbabilityMap</PortName>
<Definition>If specified using makeProbabilityMap=True, a surface of predicted values is produced based on the tiffs in the input .mds file and the fitted model. These values can, but do not always, indicate the probability of finding the species at a given site. For example, if model calibration is poor, they will not agree well with the true probabilities, though discrimination between presences and absences might still be good.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResidualsMap</PortName>
<Definition>The residuals map shows the spatial pattern of the model deviance residuals. Most models assume that residuals are independent, so spatial pattern in the deviance residuals can indicate a problem with the model fit and with inference based on the fit. For example, it can indicate that important predictors were not included in the model, and the map can be compared with the spatial pattern of predictors that were left out.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MessMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MESS surface is the multivariate environmental similarity surface and shows how well each point fits into the univariate ranges of the points on which the model was fit. Negative values in this map indicate that the point is outside the range of the training data.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MoDMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MoD map is related to the MESS map and indicates, for each spatial location, which variable was furthest from the range over which the model was fit. See Elith et al. 2010 for details on how the MESS map calculations are performed.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>modelEvalPlot</PortName>
<Definition>For binary data this is a receiver operating characteristic (ROC) curve, which shows the relationship between sensitivity and specificity as the threshold for discretizing continuous predictions into presence/absence is varied. The threshold selected using the specified ThresholdOptimizationMethod is shown. If a model selection test/training split was specified, the ROC curve for that split is shown in red; if a cross-validation split was specified, ROC curves for each cross-validation fold are overlaid, with box plots summarizing the cross-validation results. For count data this display shows several standard plots for assessing model residuals.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResponseCurves</PortName>
<Definition>Model response curves show the relationship between the fitted values and each predictor included in the model, with all other predictors held constant at their means. MARS response curves are shown on a logit scale, so the response axis is not necessarily bounded on the 0 to 1 interval. BRT response curves also show response surfaces for any interaction terms included in the final model, along with the percent relative influence.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Text_Output</PortName>
<Definition>This file contains a summary of the model fit. It includes the number of presence observations (counts equal to or greater than 1 for count models), the number of absence points, and the number of covariates that were considered by the model selection algorithm. Note that all of these can differ from the numbers in the original .mds because incomplete records are deleted and predictors with only one unique value are removed. The random number seed is recorded if applicable, which allows completely reproducible results, along with a summary of the model fit. Evaluation statistics are reported for the data used to fit the model as well as for the test or cross-validation split, if applicable. References on how to interpret most of these are ubiquitous in the literature, but it is worth mentioning that interpretation of the calibration statistics is described by Pearce and Ferrier 2000 as well as Miller et al. 1991. Most metrics reported here can also be found in the related graphical displays.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>modelCalibrationPlot</PortName>
<Definition>The calibration plot shows the predicted probability of occurrence plotted against the observed proportion of occurrence for each of 5 bins along the probability axis. A logistic regression model is fit to the logits of the predicted probabilities of occurrence and is shown on the plot. These plots are used to determine how reliably a model will predict whether a site is occupied or unoccupied (Pearce and Ferrier 2000).</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
</OutputPorts>
<References>
<Reference>Bivand, R.S., Pebesma, E.J., and Gomez-Rubio, V. (2008). Applied Spatial Data Analysis with R. Springer New York, NY. </Reference>
<Reference>Dormann, C.F., McPherson, J.M., Araujo, M.B., Bivand, R., Bolliger, J., et al. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30:609–28. </Reference>
<Reference>Elith, J., Kearney, M., Phillips, S. (2010). The art of modeling range-shifting species. Methods Ecol Evol 1:330–342</Reference>
<Reference>Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18–22.</Reference>
<Reference>Miller, M.E., Hui, S.L., Tierney, W.M. (1991). Validation techniques for logistic regression models. Statistics in Medicine 10: 1213-26</Reference>
<Reference>Pearce, J., and S. Ferrier. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling 133:225–245.</Reference>
<Reference>R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. </Reference>
<Reference>Freeman, E. (2007). PresenceAbsence: An R Package for Presence-Absence Model Evaluation. USDA Forest Service, Rocky Mountain Research Station, 507 25th Street, Ogden, UT, USA.</Reference>
</References>
<SeeAlso />
</Module>
<Module>
<Title>MAXENT</Title>
<Description />
<InputPorts />
</Module>
<Module>
<Title>MARS</Title>
<Description>MARS is a non-parametric technique that builds flexible models by fitting piecewise logistic regressions. In effect, it is similar to GLM except that, rather than fitting a straight-line response to each predictor, it fits piecewise functions of each predictor. This allows MARS to better accommodate nonlinear responses to predictors and also reduces the risk that outlying observations have high leverage. The model is deliberately over-fit and then pruned back. The original code was developed from that provided in the supporting material of Leathwick et al. 2006, which contains more details on how model fitting occurs.</Description>
<InputPorts>
<Port>
<PortName>mdsFile</PortName>
<Definition>The input data set, consisting of locational data for each sample point and the values of each predictor variable at those points. This input file is almost always generated by the upstream steps.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The mdsFile can be produced by any of MDSBuilder, ModelEvaluationSplit, ModelSelectionCrossValidation, ModelSelectionSplit, or CovariateCorrelationAndSelection. </Connection>
</Connections>
</Port>
<Port>
<PortName>makeBinMap</PortName>
<Definition>Indicates whether to discretize the continuous prediction map into presence/absence. See ThresholdOptimizationMethod for how this is done. If time is a concern and many models are to be fit and assessed, maps can instead be produced after model selection, for only the best models, using the Select and Test the Final Model tool. Options for producing Probability, Binary and MESS maps are available there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeProbabilityMap</PortName>
<Definition>Indicate whether a map of predicted values is to be produced for the model fit.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeMESMap</PortName>
<Definition>Indicates whether to produce a multivariate environmental similarity surface (MESS) and a map of which factor is limiting at each point; see Elith et al. 2010 for more details. If time is a concern and many models are to be fit and assessed, maps can instead be produced after model selection, for only the best models, using the Select and Test the Final Model tool. Options for producing Probability, Binary and MESS maps are available there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ThresholdOptimizationMethod</PortName>
<Definition>Determines how the threshold used to discretize continuous predictions into binary predictions is optimized. The threshold is used for evaluation metrics calculated from the confusion matrix as well as for the binary map. The value calculated for the training portion of the data is applied to the test portion. If cross-validation was specified, the threshold is calculated separately for each fold from that fold's training data and applied to the held-out test data. These options come from the R package PresenceAbsence; more details can be found in the associated manual (see Freeman 2007).</Definition>
<Mandatory>False</Mandatory>
<Default>2</Default>
<Options>
<Option>1: Threshold=0.5</Option>
<Option>2: Sensitivity=Specificity</Option>
<Option>3: Maximizes (sensitivity+specificity)/2</Option>
<Option>4: Maximizes Cohen's Kappa</Option>
<Option>5: Maximizes PCC (percent correctly classified)</Option>
<Option>6: Predicted prevalence=observed prevalence</Option>
<Option>7: Threshold=observed prevalence</Option>
<Option>8: Mean predicted probability</Option>
<Option>9: Minimizes distance between ROC plot and (0,1)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MarsDegree</PortName>
<Definition>The level of interaction allowed:
1 = no interaction terms are allowed in the model (default)
2 = 1st order interactions
3 = 2nd order interactions, and so on.</Definition>
<Mandatory>FALSE</Mandatory>
<Default>1</Default>
<Options>
<Option>A positive integer, generally no greater than 3 or possibly 4</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MarsPenalty</PortName>
<Definition>The cost charged per degree of freedom when fitting the MARS model (from the mda library).</Definition>
<Mandatory>FALSE</Mandatory>
<Default>2</Default>
<Options>
<Option>A positive float</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>outputFolderName</PortName>
<Definition>Adds an identifier to the output folder name for the purpose of data organization. The folder name is still prefixed with 'ApplyModel_' and suffixed with an auto-incremented counter.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>run_name_info</PortName>
<Definition>Used to specify a meaningful tag and subfolder for output file naming/organization. See documentation for OutputName module for more information.</Definition>
<Mandatory>False</Mandatory>
<Default>NA</Default>
<Options>None</Options>
<Connections>Connects to an OutputName module</Connections>
</Port>
</InputPorts>
<OutputPorts>
<Port>
<PortName>modelWorkspace</PortName>
<Definition>The R workspace where all internal details regarding the fitted model are stored. This is used by the Select and Test the Final Model module.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>'modelWorkspace' port of SAHMModelOutputViewerCell for viewing the aspatial model output.</Connection>
<Connection>'modelWorkspace' port of SAHMSpatialOutpuViewerCell for viewing the spatial model output in a mini GIS.</Connection>
</Connections>
</Port>
<Port>
<PortName>BinaryMap</PortName>
<Definition>If specified using makeBinMap=True, a surface of binary predictions is produced by discretizing the prediction map based on the selected threshold. This map indicates whether one could expect each site to be occupied or unoccupied based on the model.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>ProbabilityMap</PortName>
<Definition>If specified using makeProbabilityMap=True, a surface of predicted values is produced based on the tiffs in the input .mds file and the fitted model. These values can, but do not always, indicate the probability of finding the species at a given site. For example, if model calibration is poor, they will not agree well with the true probabilities, though discrimination between presences and absences might still be good.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResidualsMap</PortName>
<Definition>The residuals map shows the spatial pattern of the model deviance residuals. Most models assume that residuals are independent, so spatial pattern in the deviance residuals can indicate a problem with the model fit and with inference based on the fit. For example, it can indicate that important predictors were not included in the model, and the map can be compared with the spatial pattern of predictors that were left out.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MessMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MESS surface is the multivariate environmental similarity surface and shows how well each point fits into the univariate ranges of the points on which the model was fit. Negative values in this map indicate that the point is outside the range of the training data.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>MoDMap</PortName>
<Definition>If specified by selecting makeMESMap=True, the MESS and MoD surfaces will be produced. The MoD map is related to the MESS map and indicates, for each spatial location, which variable was furthest from the range over which the model was fit. See Elith et al. 2010 for details on how the MESS map calculations are performed.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>modelEvalPlot</PortName>
<Definition>For binary data this is a receiver operating characteristic (ROC) curve, which shows the relationship between sensitivity and specificity as the threshold for discretizing continuous predictions into presence/absence is varied. The threshold selected using the specified ThresholdOptimizationMethod is shown. If a model selection test/training split was specified, the ROC curve for that split is shown in red; if a cross-validation split was specified, ROC curves for each cross-validation fold are overlaid, with box plots summarizing the cross-validation results. For count data this display shows several standard plots for assessing model residuals.
</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>NA</Connections>
</Port>
<Port>
<PortName>ResponseCurves</PortName>
<Definition>Model response curves show the relationship between the fitted values and each predictor included in the model, with all other predictors held constant at their means. MARS response curves are shown on a logit scale, so the response axis is not necessarily bounded on the 0 to 1 interval. BRT response curves also show response surfaces for any interaction terms included in the final model, along with the percent relative influence.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>Text_Output</PortName>
<Definition>This file contains a summary of the model fit. It includes the number of presence observations (counts equal to or greater than 1 for count models), the number of absence points, and the number of covariates that were considered by the model selection algorithm. Note that all of these can differ from the numbers in the original .mds because incomplete records are deleted and predictors with only one unique value are removed. The random number seed is recorded if applicable, which allows completely reproducible results, along with a summary of the model fit. Evaluation statistics are reported for the data used to fit the model as well as for the test or cross-validation split, if applicable. References on how to interpret most of these are ubiquitous in the literature, but it is worth mentioning that interpretation of the calibration statistics is described by Pearce and Ferrier 2000 as well as Miller et al. 1991. Most metrics reported here can also be found in the related graphical displays.</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>modelCalibrationPlot</PortName>
<Definition>The calibration plot shows the predicted probability of occurrence plotted against the observed proportion of occurrence for each of 5 bins along the probability axis. A logistic regression model is fit to the logits of the predicted probabilities of occurrence and is shown on the plot. These plots are used to determine how reliably a model will predict whether a site is occupied or unoccupied (Pearce and Ferrier 2000).</Definition>
<Mandatory>NA</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
</OutputPorts>
<References>
<Reference>Bivand, R.S., Pebesma, E.J., and Gomez-Rubio, V. (2008). Applied Spatial Data Analysis with R. Springer New York, NY. </Reference>
<Reference>Dormann, C.F., McPherson, J.M., Araujo, M.B., Bivand, R., Bolliger, J., et al. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30:609–28. </Reference>
<Reference>Elith, J., Kearney, M., Phillips, S. (2010). The art of modeling range-shifting species. Methods Ecol Evol 1:330–342.</Reference>
<Reference>Hastie, T. and Tibshirani, R. mda: Mixture and flexible discriminant analysis. Ported to R by Leisch, F., Hornik, K. and Ripley, B.D. (2011). R package version 0.4-2.</Reference>
<Reference>Leathwick, J.R., Elith, J., Hastie, T. (2006). Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecological Modelling 199: 188-96.</Reference>
<Reference>Miller, M.E., Hui, S.L., Tierney, W.M. (1991). Validation techniques for logistic regression models. Statistics in Medicine 10: 1213-26.</Reference>
<Reference>Pearce, J., and S. Ferrier. (2000). Evaluating the predictive performance of habitat models developed using logistic regression. Ecological Modelling 133:225–245.</Reference>
<Reference>R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. </Reference>
<Reference>Freeman, E. (2007). PresenceAbsence: An R Package for Presence-Absence Model Evaluation. USDA Forest Service, Rocky Mountain Research Station, 507 25th Street, Ogden, UT, USA.</Reference>
</References>
</Module>
<Module>
<Title>UserDefinedCurve</Title>
<Description>This model allows the user to specify the response curves manually, using empirical or expert knowledge about the species' response to environmental covariates. When it is run, the workflow pauses while an interactive widget opens so that the user can specify the curves.</Description>
<InputPorts>
<Port>
<PortName>mdsFile</PortName>
<Definition>The input data set, consisting of locational data for each sample point and the values of each predictor variable at those points. This input file is almost always generated by the upstream steps.</Definition>
<Mandatory>True</Mandatory>
<Default>NA</Default>
<Options>NA</Options>
<Connections>
<Connection>The mdsFile can be produced by any of MDSBuilder, ModelEvaluationSplit, ModelSelectionCrossValidation, ModelSelectionSplit, or CovariateCorrelationAndSelection. </Connection>
</Connections>
</Port>
<Port>
<PortName>makeBinMap</PortName>
<Definition>Indicate whether to discretize the continuous prediction map into presence/absence. See the ThresholdOptimizationMethod for how this is done. If time is a concern and many models are to be fit and assessed, maps can be produced after model selection for only the best models using the Select and Test the Final Model tool. Options for producing Probability, Binary, and MESS maps are available there as well.</Definition>
<Mandatory>False</Mandatory>
<Default>False (Unchecked)</Default>
<Options>
<Option>True (Checked)</Option>
<Option>False (Unchecked)</Option>
</Options>
<Connections>Does not connect to any other module.</Connections>
</Port>
<Port>
<PortName>makeProbabilityMap</PortName>