-
Notifications
You must be signed in to change notification settings - Fork 1
/
settings-example
232 lines (190 loc) · 9.38 KB
/
settings-example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
###
# Binary paths
###
# What is the path to the ghostscript binary? Usually you can just leave this
# set to "gs", but may need to be tweaked on some systems.
GHOSTSCRIPT="gs"
# What is the path to the opj binaries? This may be opj_compress /
# opj_decompress, opj2_*, or a full path, depending on a variety of factors....
OPJ_COMPRESS="opj2_compress"
OPJ_DECOMPRESS="opj2_decompress"
###
# Web configuration
###
# Full URL to the NCA web app; though the hostname isn't currently used,
# it may be necessary for things like sending email notifications in the future
WEBROOT="https://internal.somewhere.edu/nca"
# The bind address is what the server listens on, often just a port string
# prefixed with a colon. Apache would use this for reverse-proxying.
BIND_ADDRESS=":8080"
# Full URL to the IIIF server's base path - this is used to display issues'
# pages during metadata entry and review
IIIF_BASE_URL="https://my.server.com/iiif"
# Full URL to the live news site, for pulling information about issues which
# are in production, and may be used in the future to link from NCA to
# live batches' / issues' detail pages
NEWS_WEBROOT="https://news.somewhere.edu"
# Full URL to the staging news site for directing admins to batches which are
# ready for review. In theory this could point toward production, allowing a
# more loose "test it live" approach.
STAGING_NEWS_WEBROOT="https://news-staging.somewhere.edu"
# Locations (usually a URL) to pull marc records when a new newspaper title is
# added. If a location begins with http/https, an HTTP request is made,
# otherwise it is treated as a path to a file. The string "{{lccn}}" is
# replaced with the LCCN being looked up. The second location is only
# considered if the first has no XML.
MARC_LOCATION_1="/var/local/marc/{{lccn}}/marc.xml"
MARC_LOCATION_2="https://chroniclingamerica.loc.gov/lccn/{{lccn}}/marc.xml"
###
# Database settings
###
DB_HOST="127.0.0.1"
DB_PORT=3306
DB_USER="nca"
DB_PASSWORD="nca"
DB_DATABASE="nca"
###
# SFTPGo settings. These determine how NCA interacts with SFTPGo, not how
# SFTPGo behaves more generally; see
# https://github.com/drakkan/sftpgo/blob/main/docs/full-configuration.md for
# configuring the SFTPGo daemon.
#
# This section is optional. If SFTP_API_URL is set to either blank or "-",
# connections to SFTPGo will not be initiated. This will however mean that you
# have to manage publisher uploads yourself, and NCA will not have any
# integrated upload support.
###
# SFTPGo's API URL as NCA will use it; Set this to "-" to skip SFTPGo
# integration. This must be the "internal" URL. e.g., the default value below
# ("http://sftpgo:8080/api/v2") is how NCA sees SFTPGo in Docker. This won't
# be the same URL you'd use in a browser for web-based administration.
SFTPGO_API_URL="http://sftpgo:8080/api/v2"
# The SFTPGo API key for API requests
# Reset to 'sftpgo_admin_api_key' surrounded by exclamation points
# and run sftpgo/get_admin_api_key.sh to get a new API key
SFTPGO_ADMIN_API_KEY=!sftpgo_admin_api_key!
# How much disk each new user is allotted for uploads. Note that this is NOT
# retroactive. This is simply the default used when a *new* title is sent to
# SFTPGo. Changing quotas for existing titles requires manual editing of their
# title data in NCA.
SFTPGO_NEW_USER_QUOTA=5gb
###
# ONI Agent settings. These tell NCA what to connect to in order to start and
# monitor batch loading and purging tasks on staging and production servers.
###
STAGING_AGENT="oni-agent-staging:22"
PRODUCTION_AGENT="oni-agent-prod:22"
###
# Paths for PDFs, derivatives, etc.
###
# Uploaded PDFs are deposited here - these should be exactly what the publisher
# gives us. There must be one subdirectory per title, named after the title's
# "SFTP Directory" value in the admin app, and each title must contain one
# subdirectory per issue, named by the issue's date, in YYYY-MM-DD format.
PDF_UPLOAD_PATH="/mnt/news/sftp"
# In-house scans are deposited here. The folder structure must be precisely as follows:
#
# <scan upload root>/<MARC org code>/<lccn>/<issue date with optional edition>
#
# For example, "/mnt/news/scans/oru/sn12345678/2018-01-01_02"
#
# The TIFF files should be at least 300dpi, and the PDFs should contain a
# 150dpi JPEG image encoded at about a quality of 40 (or "medium"), per the
# NDNP spec. The PDF also needs to have the OCR text embedded for the
# derivative processing to create the proper page XML.
SCAN_UPLOAD_PATH="/mnt/news/scans"
# When an uploaded PDF has been split, this is where we back it up. Originals
# are stored in the batches, so if you back up the batch, this can be a short-
# term backup location.
ORIGINAL_PDF_BACKUP_PATH="/mnt/news/backup/originals"
# This is where split PDFs are moved for manual reordering / renaming (e.g.,
# using Adobe Bridge)
PDF_PAGE_REVIEW_PATH="/mnt/news/page-review"
# Once processing happens, full batches are put here. They will be moved to the
# archive path upon QC approval and successful push to production.
BATCH_OUTPUT_PATH="/mnt/news/outgoing"
# When a batch is built, this location is where the ONI-required files live
# (basically everything but TIFFs and born-digital original file backups).
# Basically the final home for the always-online files. Both your production
# and staging Open ONI servers must be able to read files from this location.
BATCH_PRODUCTION_PATH="/mnt/news/production-batches"
# When a batch is live, its source files, including TIFF and backups, move to
# an archival location. This can be a dark archive, a "transfer" location for
# prepping bulk DA moves, or a location you manually manage in some way.
BATCH_ARCHIVE_PATH="/mnt/news/batch-archive"
# This is where scanned issues and SFTPed issues go after all manual processing
# is done. Issues moved here shouldn't be accessible to anybody for manual
# modification, and all metadata will live in the database.
WORKFLOW_PATH="/mnt/news/workflow"
# This is where you want issues to be dropped off when they're reported as
# unfixable and then an "Issue Manager" user chooses to remove them permanently
# from NCA. This location should be someplace curators / reviewers can access
# so rescanning or other manual fixes can take place.
ERRORED_ISSUES_PATH="/mnt/news/errors"
# This is where cached files are stored for the issue data. This is extremely
# important for issues which live on the live website, as those are very
# expensive to pull each time a given process is run.
ISSUE_CACHE_PATH="/var/local/news/nca/cache"
# This is where the core application lives, and is important for finding the
# HTML template files as well as static files like JS and CSS
APP_ROOT="/usr/local/nca"
# Where is the template for building issue XML? You can point this to the
# XML in the repo (templates/xml/mets.go.html) for ease, but some XML values
# may be wrong.
METS_XML_TEMPLATE_PATH="/usr/local/nca/templates/xml/mets.go.html"
# Where is the template for building the batch XML? This should be safe to use
# as-is, but it could be changed if necessary.
BATCH_XML_TEMPLATE_PATH="/usr/local/nca/templates/xml/batch.go.html"
###
# General rules for issue processing and batch creation
###
# How many pages must an issue have? Setting this to 2 can avoid processing
# obviously unfinished uploads.
MINIMUM_ISSUE_PAGES=2
# MARC organization code you want to use in chronam to represent batches from
# PDF sources. e.g., "oru" results in batch names like
# "batch_oru_20150101120000", attributed to the UO Knight Library. It is best
# to choose a code that chronam already knows about.
PDF_BATCH_MARC_ORG_CODE="oru"
# How many PDFs do we allow in a single batch? This limit is a lot less
# necessary now that fixing a batch is a simpler process. Historically, it was
# a lot easier to deal with smaller batches when they had to be rebuilt each
# time an issue had an error.
#
# NOTE: this limit will *not* split an issue. i.e., this is the maximum number
# of PDFs, not necessarily the precise number a batch will contain.
MAX_BATCH_SIZE=10000
# What is the minimum size of a batch? A batch won't be queued up until this number of pages is reached.
#
# NOTE: this limit will be ignored if issues have been ready to go live for too
# long in order to avoid issues being "stranded"
MIN_BATCH_SIZE=5000
# How long we require an issue to be untouched prior to anybody queueing it. If
# set to zero, people can queue an issue *immediately* after upload. This is
# not recommended if uploads aren't tightly restricted, but can make sense if
# NCA curators are also the ones doing the uploading.
#
# Duration format follows the Go ["ParseDuration" function][1] To summarize
# what you'd likely need, though: "1h" is one hour, "24h" would be one day,
# "30m" would be 30 minutes, "1s" would be one second, etc. The default is 48h,
# or two days.
#
# [1]: <https://pkg.go.dev/time#ParseDuration>
DURATION_ISSUE_CONSIDERED_DANGEROUS=48h
# How long we warn users that the issue is new - but it can be queued before
# that warning goes away so long as DAYS_ISSUE_CONSIDERED_DANGEROUS has elapsed
#
# Duration format follows the Go "ParseDuration" function - see above. The
# default of 336h is two weeks.
DURATION_ISSUE_CONSIDERED_NEW=336h
###
# Derivative settings
###
# DPI for ghostscript to use on the PDF-to-PNG conversion
DPI=200
# JP2 quality value for graphicsmagick
QUALITY=62.5
# Set this to the value your scanned images use to embed JPGs. This should be
# 150 per the NDNP spec, but could be changed if scanned images aren't under
# your control.
SCANNED_PDF_DPI=150