-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnftables-router.nft
704 lines (661 loc) · 35 KB
/
nftables-router.nft
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
#!/usr/sbin/nft --file
#### This -*-nftables-*- ruleset is my /etc/nftables.conf for new hosts.
#### Ref. http://jengelh.medozas.de/documents/Perfect_Ruleset.pdf
#### Ref. https://wiki.nftables.org/
####
#### GOTCHA: This is written for nft 0.9.1.
#### Several options have changed, even since 0.9.0!
####
#### GOTCHA: "nft --file" does not flush by default.
#### That is, it acts like "iptables-restore --noflush".
####
#### GOTCHA: If your ruleset does "flush; add; flush; add",
#### it will actually do "flush; flush; add; add".
#### That is, all the flushes move to the top.
####
#### This makes it impossible to do my old trick of
#### having the ruleset start with a "deny all" policy,
#### which was useful to make sure that if the firewall failed,
#### the OS would lock down the whole system.
#### (See "iptab" for notes about that.)
####
#### To achieve that under nft, you instead need to patch
#### nftables.service to have
#### OnFailure=nftables-denyall.service, and then write that
#### unit to load a SEPARATE, DIFFERENT nftables file that
#### blocks everything. And even then, it won't behave quite
#### the same.
####
#### This also means it is no longer safe to refer to
#### hostnames in this ruleset, safe in the knowledge that
#### they can only be resolved via local /etc/hosts. Because
#### the "deny all" ruleset can't be prepended here, we CANNOT
#### be sure the real ruleset will only resolve hostnames via
#### local sources -- so you might add ones that are only in
#### DNS, and then have the firewall fail to load during early
#### boot -- leaving you with a "working" host, with no
#### firewall!
####
#### GOTCHA: "list ruleset" here will print the ruleset BEFORE it goes through the kernel;
#### "nft list ruleset" later will print the ruleset AFTER it goes through the kernel.
#### The difference is usually like "iptables -p tcp" getting an implied "-m tcp".
#### However, differences MIGHT indicate a bug! Watch out!
####
#### GOTCHA: "table filter" and "chain INPUT" or "chain input" just
#### conventions and have NO MEANING WHATSOEVER. The actual
#### meaning comes from the "type filter hook input priority
#### filter" line.
####
#### GOTCHA: you can't use globs inside a set or map/vmap:
####
#### iifname "ppp*" # works
#### iifname {"ppp*", "en*"} # fails
#### iifname vmap {"ppp*": drop} # fails
####
#### GOTCHA: Service names resolve via nss (/etc/services) only in nft 0.9.1+!
#### For example, "imap" is in /etc/services, but not in nft 0.9.0.
#### In nft 0.9.0, "nft describe tcp dport" will print the internal list.
####
#### GOTCHA: Using variable ("define x = y") is annoying:
####
#### * definition variable names aren't limited to C-style identifiers;
#### "define 4 = 0" is not allowed, but "define ::1 = 0" is.
####
#### * definitions ARE SCOPED to the file, table, or chain.
#### You can define $x differently in two chains, but
#### you can't define a default $x in a table, then
#### "override" it in a single chain.
####
#### * the definition (define x = y) MUST PRECEDE the usage ($x).
####
#### * a round-trip through the kernel will lose the variable
#### names (nft -f + nft list ruleset).
####
#### CORROLLARY: to change the value, you have to reload the ruleset.
####
#### A workaround for most of this is to use a named set, e.g.
####
#### define x = y
#### rule inet a b ip saddr $x log
####
#### becomes
####
#### set inet a b x { type ipv4_addr; elements={y} }
#### rule inet a b ip saddr @x log
####
#### HOWEVER, sets aren't allowed in some places, e.g.
####
#### rule ip a b tcp dport { http, https } dnat to @www
####
#### NOTE: Mixing nft and legacy xtables should MOSTLY Just Work, but
#### is discouraged because of confusion and kernel bugs.
#### In such case, you need to look at "nft list ruleset" **AND** "iptables-legacy-save".
####
#### NOTE: as at systemd v242, "machinectl start my-container" will
#### create a legacy xtables MASQUERADE rule by default.
####
#### NOTE: Only create a chain if you use it.
#### An empty chain is slightly slower than no chain at all.
#### e.g. most hosts don't need an output chain.
####
#### NOTE: iptables ALWAYS counts how many packets/bytes hit every chain and rule.
#### nftables makes this OPT IN, e.g. change "accept" to "counter accept".
#### iptables-save -c would print "[pkts:bytes] -A ...".
#### nftables list ruleset will print "... counter packets 12 bytes 34 ...".
####
#### Since counters are useful during debugging but not production,
#### I have left them out of this example.
####
#### NOTE: "table x" is implicitly "table ip x", which is IPv4 only.
#### If you want dual-stack, say "table inet x".
####
#### NOTE: "iif lo" is resolved at ruleset load time into an interface
#### NUMBER inside the kernel; whereas "iifname lo" remains a
#### string. This means that:
####
#### * "iifname" is ~10 times slower than "iif" for every
#### packet considered (strcmp versus ==).
####
#### * If you load a ruleset with "iif foo" before foo exists,
#### the load will fail, LEAVING YOU UNPROTECTED!
####
#### * If you load a ruleset with "iif foo" and then foo is
#### removed and readded (e.g. ppp0 for a flaky ADSL link),
#### what happens?
####
#### * Rule of thumb: always use "iifname" (not "iif").
####
#### NOTE: instead of using "define dmz = enp12s0f3",
#### give your interface a logical name directly.
#### Then you use the name EVERYWHERE, e.g. "ip -s l show dmz".
####
#### OLD METHOD:
#### $ cat /etc/udev/rules.d/70-persistent-net.rules
#### SUBSYSTEM=="net", DRIVERS=="?*", ATTR{address}=="de:ad:be:ef:ba:be", NAME="internet"
#### SUBSYSTEM=="net", DRIVERS=="?*", ATTR{address}=="de:af:be:ad:de:ed", NAME="dmz"
#### SUBSYSTEM=="net", DRIVERS=="?*", ATTR{address}=="da:bb:ed:fa:ca:de", NAME="byod"
####
#### NEW METHOD (DID NOT WORK WHEN I TRIED IT):
#### $ cat /etc/systemd/network/dmz.link
#### [Match]
#### MACAddress=dead.beef.babe
#### [Link]
#### Name=dmz
#### # Is this line needed?
#### #NamePolicy=
####
#### NOTE: a single rule can match "allow 53/tcp and 53/udp", but
#### it is ugly, so don't do it:
####
#### meta l4proto {tcp, udp} @th,16,16 53 accept comment "accept DNS on UDP/TCP"
####
#### The @th,16,16 goes to the transport header, skips 16 bits, then reads 16 bits.
#### If those bits are equal to 53, the rule matches.
####
#### This relies on (abuses) the fact that TCP dport and UDP dport have the same offset and length.
#### You cannot use "domain", because nft doesn't know we're matching a service number.
# NOTE: this will remove *ALL* tables --- including tables set up by
# other things (e.g. sshguard)!
#
# In theory you can flush just your own rules, e.g.
#
# flush table inet my_filter
#
# FIXME: I tried that, and I got locked out of SSH!
# What it did was remove all the rules, but NOT the chains, so
# the default-deny policy dropped EVERYTHING!!!
#flush ruleset
# This seems to be a viable workaround (NOTE: must do this for each table):
add table inet my_filter # idempotent
delete table inet my_filter # not idempotent
table inet my_filter {
chain my_input {
type filter hook input priority filter
policy drop
jump my_prologue comment "deal with boring conntrack/loopback/ICMP/ICMPv6"
# YOUR RULES HERE.
# NOTE: service names resolve via nss (/etc/hosts) only in nft 0.9.1+!
tcp dport ssh accept
tcp dport smtp reject comment "alpha is null-listed first MX for CCA (antispam measure)."
iifname {lan, dmz, byod} tcp dport domain accept
iifname {lan, dmz, byod} udp dport {domain, ntp, bootps, tftp} accept
jump my_epilogue
}
chain my_forward {
type filter hook forward priority filter
policy drop
jump my_prologue comment "deal with boring conntrack/loopback/ICMP/ICMPv6"
# YOUR RULES HERE.
# If a pwned devices spams the internet,
# your entire network will be shitlisted!
# To avoid this, block outbound SMTP (25/tcp) from non-MTA hosts.
# MSAs (e.g. Outlook) are not affected, because they use submission (587/tcp).
#
# NOTE: this must appear BEFORE "allow all to internet", obviously.
# NOTE: the LANs can still spam the DMZ.
# NOTE: the DMZ can still spam the internet, because
# occasionally someone adds an MTA to the DMZ without telling me.
iifname != dmz oifname internet reject comment "MSAs MUST use submission (not smtp)"
# Example of a router between four networks:
# internet (the internet),
# dmz (internet-facing servers),
# lan (internal servers & managed laptops).
# byod (BYOD phones and laptops)
#
# Due to prologue, we're only handling NEW FLOWS.
# The return traffic is implicitly allowed.
# We want to say something like:
#
# * full access to internet (except port 25?)
# * limited access to dmz ("typical" server ports)
# * everything else implicitly blocked
#
# Ignoring hairpin traffic (iifname = oifname),
# we have 4 P 2 = 12 permutations.
#
# We can write every combination out longhand:
#
# * BONUS: very clear
# * BONUS: can't get rules in "wrong" order (efficiency)
# * BONUS: can't accidentally have overlapping rules (correctness)
# * BONUS: if a new iface appears, it will default deny (correctness)
# * MALUS: too verbose with many networks (5P2 = 20; 6P2 = 30, 7P2 = 42)
# * MALUS: can't use globs for shorthand (e.g. "*" . dmz).
#
# NOTE: the "continue" lines could be omitted, but are harmless.
iifname . oifname vmap {
lan . internet : accept, # all can attack the internet
byod . internet : accept, # all can attack the internet
dmz . internet : accept, # all can attack the internet
lan . dmz : jump my_dmz, # all can attack DMZ (only specific ports)
byod . dmz : jump my_dmz, # all can attack DMZ (only specific ports)
internet . dmz : jump my_dmz, # all can attack DMZ (only specific ports)
lan . byod : accept, # only LAN can attack phones
dmz . byod : continue, # only LAN can attack phones
internet . byod : continue, # only LAN can attack phones
byod . lan : continue, # nobody can attack LAN
dmz . lan : continue, # nobody can attack LAN
internet . lan : continue, # nobody can attack LAN
}
# OR, we can try to be clever, and write individual rules.
# This is shorter, but harder to reason about!
# oifname internet accept
# oifname dmz jump my_dmz
# iifname lan accept
## Allow connections to protected (non DMZ) services.
## FIXME: this is all IPv4 only! We need equivalent rules for IPv6 as well!!!
iifname dmz ip daddr @ldap_servers tcp dport ldaps accept comment "Centralized authentication"
iifname {lan, byod} ip daddr @irc_servers tcp dport ircd accept comment "IRC from laptops"
iifname {dmz, lan, byod} ip daddr @apt_servers tcp dport {http, 3142} accept comment "APT mirror access (3142 = apt-cacher-ng)"
iifname {dmz, lan, byod} ip daddr @log_servers tcp dport 2514 accept comment "RELP (modern syslog)"
iifname {dmz, lan, byod} ip daddr @log_servers udp dport syslog accept comment "*legacy* syslog (inc. wifi APs on BYOD network)"
ip saddr @ssh_servers tcp dport ssh accept comment "SSH *FROM* login gateway to anything else"
ip saddr @www_servers tcp dport https accept comment "HTTPS *FROM* reverse proxy to backend web apps"
jump my_epilogue
}
# This is mostly for transition from omega (one IP per service) to new-omega (one IP for all services).
# When the transition is done, we can flatten all of this down.
set ldap_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.154} }
set irc_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.134} }
set apt_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.153} }
set log_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.157} }
set ssh_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.5} }
set www_servers { type ipv4_addr; elements={203.7.155.214, 203.7.155.8} }
# We want output to be "allow all", so we don't even create a chain.
#chain my_output {
# type filter hook output priority filter
# policy accept
#}
# In theory DMZ hosts must fend for themselves;
# in practice their competence is suspect.
# Thus, limit DMZ access to "typical" ports (plus some per-host exceptions).
# Within those typical ports, DMZ hosts still fend for themselves.
chain my_dmz {
tcp dport {domain,ssh,http,https,smtp,submission,imaps} accept
udp dport {domain} accept
# Allow additional special ports, but only to the server that serves them.
# UPDATE: sigh, apparently there is no conntrack helper for mosh, and bulk allowing a giant port range REALLY IS the recommended workaround. --twb, Aug 2019
define russm.example.com = 127.254.254.254
ip daddr $russm.example.com udp dport 60000-61000 accept comment "mosh (FIXME: write nf_conntrack_mosh.ko!)"
}
chain my_prologue {
# Typically 99%+ of packets are part of an already-established flow.
# Allow those first, so we're a fast, stateful firewall.
# After this only "ct state new" (or "ct state untracked") will remain.
# FIXME: is a vmap here better (more efficient) than two separate rules?
# FIXME: {established or related: accept} does not match correctly!
ct state vmap { established: accept, related: accept, invalid: drop }
# FIXME: I HAVE **NO IDEA** IF THIS IS THE RIGHT THING!
#
# NOTE: this rule is only useful if you use port forwarding
# ("dnat to" or "redirect" in a nat prerouting chain).
#
# In xtables, port forwards would look like this:
# -t nat -A PREROUTING -p tcp --dports http,https -j DNAT --dnat-to www
# -t filter -A FORWARD -p tcp --dports http,https -d www -j ACCEPT
# to avoid having to keep those two rules in sync, you could simply do this:
# -t nat -A PREROUTING -p tcp --dports http,https -j DNAT --dnat-to www
# -t filter -A FORWARD --ctstate DNAT -j ACCEPT
#
# I *THINK* the following rule is the nftables equivalent.
ct status dnat accept
# Loopback traffic is needed for e.g. NFS RPC, and for debugging.
# FIXME: is iiftype here better than iif/iifname?
iiftype loopback accept
# Allow *some* kinds of IPv4/ICMP and IPv6/ICMPv6.
# FIXME: are "ip protocol icmp" and "ip6 nexthdr icmpv6" needed?
#
# NOTE: see also "sysctl net.ipv4.icmp_ratelimit=1000".
#ip protocol icmp icmp type vmap @ICMP_policy
#ip6 nexthdr icmpv6 icmpv6 type vmap @ICMPv6_RFC4890_policy
# Simpler version that relies on "ct state" and is PROBABLY good enough.
icmp type echo-request accept
icmpv6 type { echo-request, nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept
# "ct state" doesn't recognize MDNS/DNS-SD & LLMNR return packets, because
# the query daddr (224.0.0.251) doesn't match the reponse saddr (e.g. 192.168.0.2).
# As a stopgap to FIX PRINTING (cups-browsed) accept (some) unsolicited multicast packets.
# https://en.wikipedia.org/wiki/Multicast_DNS#Packet_structure
# FIXME: I still do not fully understand this, but it was enough to fix printing at work.
# FIXME: can we combine "ip daddr" and "ip6 daddr" on the same line?
ip daddr 224.0.0.251 udp dport mdns accept comment "MDNS/DNS-SD (e.g. printer discovery)"
ip6 daddr ff02::fb udp dport mdns accept comment "MDNS/DNS-SD (e.g. printer discovery)"
# FIXME: is this needed to get "Connected to: <switch name>:<switch port>" in "networkctl stats --all"?
# I seem to get that WITHOUT explicitly adding this, so leaving as log-and-continue for now.
# https://en.wikipedia.org/wiki/Link-Local_Multicast_Name_Resolution
ip daddr 224.0.0.252 ether daddr 01:00:5E:00:00:FC udp dport 5355 log prefix "FIXME: allow this LLMNR packet? "
ip6 daddr ff02::1:3 ether daddr 33:33:00:01:00:03 udp dport 5355 log prefix "FIXME: allow this LLMNR packet? "
# DHCPv6 has the same problem. Example request & response:
# fe80::892d:3b35:6914:6a9f → ff02::1:2 DHCPv6 136 Solicit XID: 0xccffdd CID: 0004f596f2f14208a98737a353e097ba81fe
# fe80::c256:27ff:fe2f:d1e2 → fe80::892d:3b35:6914:6a9f DHCPv6 206 Reply XID: 0xccffdd CID: 0004f596f2f14208a98737a353e097ba81fe IAA: 2403:5804:c6::add IAA: fd2e:a1a5:b5e0::add
# For now, log-and-accept iff the response is link-local.
ip6 saddr fe80::/64 ip6 daddr fe80::/64 udp dport dhcpv6-client counter log level debug prefix "non-RELATED/ESTABLISHED DHCPv6-client " accept
jump my_IPS
}
chain my_epilogue {
# Finally, politely reject all other attempts.
# Omit to use the default policy ("policy drop", above) instead.
iifname internet drop # FIXME: why drop, not reject??
reject
}
## An automated SSH (et al) brute-force shitlist.
##
## The nominal goal is to nerf brute-force password guessing.
## Since I disable password auth, the REAL goal is to reduce the
## amount of spam in my SSH auth log.
##
## (Running SSH on a non-standard port would also work, but
## I want to benefit from ISPs giving preferential QOS to 22/tcp).
##
## 1. if you brute-force port X more than Y times/minute,
## you're shitlisted for Z minutes.
##
## 2. if you are shitlisted and make ANY connection,
## you're shitlisted for Z minutes (i.e. countdown resets).
##
## 3. if you are shitlisted, all your new flows are dropped.
## (We used to TARPIT, to tie up attacker resources.
## That used xtables-addons and isn't supported in nftables 0.9.1.)
##
## Compared to sshguard or fail2ban or DenyHosts:
##
## BONUS: installed on a gateway, protects the entire network.
##
## BONUS: works even when syslogd is down, or /var/log is full, or
## the syslog "access denied" log format changes.
##
## BONUS: works even when sshd (or whatever) is down.
## That is, if the host is off, the gateway will still trigger.
##
## BONUS: works even when sshd (or whatever) is unused.
## If you never even run FTP or RDP, trigger on them!
##
## MALUS: cannot ignore legitimate traffic.
##
## For SSH, you can mitigate this by forcing your users to
## use ControlMaster.
##
## For HTTPS and IMAPS, you're screwed --- those ALWAYS
## make 30+ connections at once (in IMAP's case, because
## IDLE extension sucks).
##
## You can also mitigate this by having a "backdoor" open
## while shitlisted, which adds you to a temporary
## goodlist if you port knock in the right sequence.
##
## The port knock sequence is a pre-shared key to your end
## users, with all the problems that a PSK involves!
##
## MALUS: easy for an attacker to spoof SYNs to block a legitimate user?
## (See port knock mitigation, above)
##
## MALUS: because we run this AFTER "ct state established accept",
## connections that are "in flight" when the ban hits
## are allowed to complete.
##
## This happens in the wild where the attacker makes 100
## SSH connections in 1 second.
##
## The alternative is to run this (relatively expensive)
## check on EVERY packet, instead of once per flow.
##
## You can see the current state of the list with:
##
## nft list set inet my_filter my_IPS_IPv4_shitlist
## nft list set inet my_filter my_IPS_IPv6_shitlist
##
## I recommend:
##
## * this IPS for low-rate (SSH w/ ControlMaster) and unused (FTP, RDP) services,
## on gateways, for flows originating from the internet / upstream.
##
## For a list of ports to (maybe) IPS guard, consider the first N lines of:
##
## sort -rnk3 /usr/share/nmap/nmap-services
##
## * a basic firewall, and sshguard, on every host that runs a relevant service.
## (This includes SSH, so basically everything.)
## This also covers legitimately bursty traffic on imaps.
## Does this cover submission 587/tcp (postfix)?
##
## * EXCEPT, sshguard doesn't do apache or nginx, so fail2ban on the www hosts?
##
## * postscreen covers smtp (25/tcp).
## FIXME: per https://wiki.dovecot.org/Authentication/Penalty, we
## should meter/block IPv6 sources by /48 instead of by single address (as we do for IPv4).
## Each corresponds to the typical allocation of a single ISP subscriber.
chain my_IPS {
ct state != new return comment "Operate per-flow, not per-packet (my_prologue guarantees this anyway)"
iiftype != ppp return comment "IPS only protects against attacks from the internet"
# Track the rate of new connections (my_IPS_IPvX_meter).
# If someone (ip saddr) connects to a service (ip daddr . tcp dport) too often,
# then shitlist them (my_IPS_IPvX_shitlist).
tcp dport @my_IPS_TCP_ports \
add @my_IPS_IPv4_meter { ip saddr . ip daddr . tcp dport limit rate over 1/minute burst 3 packets } \
add @my_IPS_IPv4_shitlist { ip saddr } \
log level info log prefix "Shitlist SRC: "
tcp dport @my_IPS_TCP_ports \
add @my_IPS_IPv6_meter { ip6 saddr . ip6 daddr . tcp dport limit rate over 1/minute burst 3 packets } \
add @my_IPS_IPv6_shitlist { ip6 saddr } \
log level info log prefix "Shitlist SRC: "
# If someone is NOT goodlisted, and IS shitlisted, then drop their connection, AND reset their countdown (hence "update" not "add").
# In other words, once shitlisted for brute-forcing SSH, you REMAIN shitlisted until you STFU for a while (on ALL ports).
ip saddr != @my_IPS_IPv4_goodlist ip saddr @my_IPS_IPv4_shitlist update @my_IPS_IPv4_shitlist { ip saddr } drop
ip6 saddr != @my_IPS_IPv6_goodlist ip6 saddr @my_IPS_IPv6_shitlist update @my_IPS_IPv6_shitlist { ip6 saddr } drop
}
set my_IPS_IPv4_meter { type ipv4_addr . ipv4_addr . inet_service; timeout 10m; flags dynamic; }
set my_IPS_IPv6_meter { type ipv6_addr . ipv6_addr . inet_service; timeout 10m; flags dynamic; }
set my_IPS_IPv4_shitlist { type ipv4_addr; timeout 10m; }
set my_IPS_IPv6_shitlist { type ipv6_addr; timeout 10m; }
set my_IPS_IPv4_goodlist { type ipv4_addr; timeout 10h; }
set my_IPS_IPv6_goodlist { type ipv6_addr; timeout 10h; }
set my_IPS_TCP_ports { type inet_service; elements={
ssh,
telnet, # we don't use it
ftp, ftps, # we don't use it
3389, 5900, # we don't use it (VNC & RDP)
pop3, pop3s, imap, # we don't use it
microsoft-ds, # we don't use it (SMB)
mysql, postgresql, ms-sql-s, # we don't use it (from the internet, without a VPN)
1723, # we don't use it (PPTP)
login, # we don't use it
}; }
# CONSIDERED AND REJECTED FOR my_IPS_TCP_ports
# ============================================
#
# * http, https:
#
# HTTP/0.9 and HTTP/1.1 is one TCP connect per request.
#
# HTTP/1.1 has workarounds that still suck due to head-of-line blocking.
# https://en.wikipedia.org/wiki/HTTP_persistent_connection
# https://en.wikipedia.org/wiki/HTTP_pipelining
#
# HTTP/2 solves this fully, but is /de facto/ never used on port 80.
#
# The end result is that as at August 2019,
# GUI browsers still routinely burst many HTTP connections to a single DST:DPT.
# This IPS only measures burstiness, so it can't work for HTTP/S.
#
# * imaps:
#
# If the server (and client) speak IMAP IDLE but not IMAP NOTIFY,
# the client will make ONE CONNECTION PER MAILBOX FOLDER.
# This looks very bursty, so the IPS can't do it's thing.
#
# See also:
# https://tools.ietf.org/html/rfc5465
# https://wiki2.dovecot.org/Plugins/PushNotification (??? -- different RFC)
# https://bugzilla.mozilla.org/show_bug.cgi?id=479133 (tbird)
# https://blog.jcea.es/posts/20141011-thunderbird_notify.html
# https://en.wikipedia.org/wiki/JMAP (just ditch IMAP entirely)
#
# * smtp, submission:
#
# For smtp (25/tcp), can't do shit because we have to talk to
# whatever the fuck crackhead MTAs are out there.
#
# For submission, we could limit connection rate IFF we knew
# ALL STAFF were running an MSA that batched up the messages.
# We know that at least msmtp does not, so this is a no-go.
#
# (Consider a manager sending 4+ one-liner "yes" or "do it!"
# emails in a single minute. We might be able to mitigate this
# by matching on submission with a more forgiving burst limit,
# e.g. 1/min burst 10? Otherwise, we have to rate-limit in the
# postfix->dovecot SASL backend, or the dovecot->ad LDAP
# backend. UGH.)
#
# * msrpc:
#
# FIXME: wtf even. I don't want to read enough about this to
# know if it's reasonable to IPS it.
#
# * openvpn:
#
# Normally UDP, and we currently only IPS TCP.
# Normally cert-based (but can use PSKs).
# Might be worth considering if we do this later.
#
# * ident:
#
# I think when you irssi -c irc.oftc.net,
# OFTC tries to ident back to you?
# I don't want to accidentally block OFTC/Freenode.
# Allow all ICMPv6 is wrong (insecure);
# Deny all ICMPv6 is wrong (breaks IPv6).
# The following vmap merges RFC 4890 4.43(for hosts) and 4.4 (for routers).
# Fortunately, the only verdict conflicts occur in
# "Traffic That Will Be Dropped Anyway" sections, so we can share this vmap
# between hook input (host) and hook forward (router).
#
# I *think* "dropped anyway" also means we also don't need these:
# ip6 hoplimit 1 # for LLMNR
# ip6 hoplimit 255 # for RA/RS/NA/NS
# ip6 saddr fe80::/10 # for LLMNR and MLD
#
# NOTE: I was going to use named types, but "nft describe icmpv6 type" doesn't have them all.
# Also, using bare numbers makes it possible to use intervals intuitively.
#
# FIXME: add "auto-merge" when possible
# (nft 0.9.1 has set auto-merge, but not map auto-merge).
map ICMPv6_RFC4890_policy {
type icmpv6_type : verdict
flags interval
elements = {
1 - 4: accept, # RFC 4890 4.3.1 & 4.4.1 essential errors
128 - 129: accept, # RFC 4890 4.3.1 & 4.4.1 Echo (ping)
144 - 147: accept, # RFC 4890 4.3.2 & 4.4.3 Mobile IPv6
133 - 136: accept, # RFC 4890 4.3.3 & 4.4.1 (replaces ARP and DHCPv4)
141 - 142: accept, # RFC 4890 4.3.3 & 4.4.1 (replaces ARP and DHCPv4)
130 - 132: accept, # RFC 4890 4.3.3 & 4.4.1 LLMNR
143: accept, # RFC 4890 4.3.3 & 4.4.1 LLMNR
148 - 149: accept, # RFC 4890 4.3.3 & 4.4.1 SEND
151 - 153: accept, # RFC 4890 4.3.3 & 4.4.1 Multicast Router
137: drop, # RFC 4890 4.3.3 & 4.4.4 Redirect
150: drop, # RFC 4890 4.3.4 & 4.4.3 Seamoby
5 - 99: drop, # RFC 4890 4.3.4 & 4.4.4 unallocated error messages
102 - 126: drop, # RFC 4890 4.3.4 & 4.4.4 unallocated error messages
154 - 199: drop, # RFC 4890 4.3.4 & 4.4.? unallocated informational messages
202 - 254: drop, # RFC 4890 4.3.4 & 4.4.? unallocated informational messages
138: drop, # RFC 4890 4.3.5 & 4.4.3 route renumbering
100 - 101: drop, # RFC 4890 4.3.5 & 4.4.5 experimental allocations
200 - 201: drop, # RFC 4890 4.3.5 & 4.4.5 experimental allocations
127: drop, # RFC 4890 4.3.5 & 4.4.5 extension type numbers
139 - 140: drop, # RFC 4890 4.3.5 & 4.4.4 Node Information
255: drop, # RFC 4890 4.3.5 & 4.4.5 extension type numbers
}
}
# NOTE: I couldn't find an RFC for ICMPv4 firewall, so
# I am adopting the following heuristic:
#
# 1. if there is an ICMPv6 equivalent, follow RFC4890.
# 2. if deprecated or experimental or reserved or unallocated, drop.
# 3. NOT rate-limiting ping for now, because ICBF.
# 4. NOT filtering by type.code (only type) for now, because ICBF.
#
# FIXME: duclicsic of #netfilter claims that "ct state related" implicitly handles
# everything except echo-request:
#
# <duclicsic> twb: if you accept related/established before dropping icmp it will work fine. destination unreachable messages are matched by related, echo replies are established, etc.
# <twb> duclicsic: interesting; I didn't realize that. That covers PMTUD at least
# <duclicsic> twb: the only icmpv4 message you need to explicitly accept in a sensible config is echo-request
# <twb> duclicsic: for ICMPv4, the only other thing I can see is RS/RA, which is probably not ACTUALLY needed
# <duclicsic> no I don't think I've ever seen them used, but you'd know to expect them if you were using them I'm sure.
# <twb> duclicsic: yeah except in the *even more* rare case where they're simply transiting through my network on the way somewhere else
# <duclicsic> i don't think you'd ever want to forward an icmp rotuer solicitation or advertisement
# <twb> duclicsic: oh yeah - DERP - RS/RA are dropped by the kernel automatically and do not transit a route (per RFC 4890)
#
# <twb> duclicsic: do you have related Opinions about ICMPv6, or have you simply not bothered with IPv6 yet?
# <duclicsic> twb: for v6 I accept echo-request, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert
# <twb> duclicsic: on gateways, or hosts, or both?
# <duclicsic> twb: that's an example from my desktop here
map ICMP_policy {
type icmp_type : verdict
flags interval
elements = {
destination-unreachable: accept, # RFC 4890 4.3.1 essential errors
time-exceeded: accept, # RFC 4890 4.3.1 essential errors
parameter-problem: accept, # RFC 4890 4.3.1 essential errors
echo-request: accept, # RFC 4890 4.3.1 echo (ping)
echo-reply: accept, # RFC 4890 4.3.1 echo (ping)
router-advertisement: accept, # RFC 4890 4.3.3 & 4.4.1 (IRDP - alternative to DHCPv4??)
router-solicitation: accept, # RFC 4890 4.3.3 & 4.4.1 (IRDP - alternative to DHCPv4??)
redirect: drop, # RFC 4890 4.3.3 & 4.4.4 Redirect
source-quench: drop, # deprecated
1 - 2: drop, # unassigned
6 - 7: drop, # deprecated / unassigned
15 - 39: drop, # deprecated / unassigned / reserved / experimental
41 - 255: drop, # deprecated / unassigned / reserved / experimental
13 - 14: continue, # FIXME Timestamp / Timestamp Reply???
40: continue, # FIXME Photuris???
}
}
}
# NOTE: dual-stack (IPv4/IPv6) NAT is annoying.
# IPv6 addresses are plentiful, so don't NAT IPv6.
#
# NOTE: in linux 5.2+ you *CAN* do inet (combined IPv4/IPv6) nat chains.
#
# table inet x { chain y { type nat hook postrouting priority srcnat; policy accept; } }
#
# GOTCHA: apparently you MUST hook BOTH prerouting AND postrouting.
# If you only hook one, it won't work.
#
# Ref. https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_%28NAT%29
#
# FIXME: discuss hostname resolution in DNS, /etc/hosts, versus hard-coding into this file.
#
# NOTE: I assume "the internet" iface is ADSL PPPoE/PPPoA (iiftype/oiftype ppp), and
# that's the ONLY ppp iface you have (cf. bullshit ppptp VPN for iPhone users).
# If you have decent internet, you will probably want to give the iface a logical name,
# then match by that name (iifname/oifname "internet").
#
# NOTE: see "nft flush ruleset" comment at top of file.
add table ip my_nat # idempotent
delete table ip my_nat # not idempotent
table ip my_nat {
chain my_postrouting {
type nat hook postrouting priority srcnat
policy accept
oiftype ppp masquerade
}
chain my_prerouting {
type nat hook prerouting priority dstnat
policy accept
iiftype != ppp return comment "port forwards are only relevant from the internet"
# YOUR PORT FORWARDS HERE.
# NOTE: Not using DNS because https://bugs.debian.org/933531, and
# because we can't use twb's trick ensure resolution
# ONLY via /etc/hosts (no DNS).
define www.example.com = 127.1.2.3
define mail.example.com = 127.254.253.252
tcp dport { http, https } dnat to $www.example.com
tcp dport { smtp, submission, imaps } dnat to $mail.example.com
}
}
# This is here to aid debugging.
# Note that its output WILL NOT MATCH a later "nft list ruleset".
# Also, it is buggy, e.g. the ICMPv6_RFC4890_policy it prints has gibberish in v0.9.1.
# UPDATE: in nftables=0.9.8-3 it sometimes core dumps! https://bugs.debian.org/982576
# Therefore comment out for now.
#list ruleset