WIP: OLSR v1 & v2 support #2418

Closed
wants to merge 2 commits into from

mkg20001
Member

@mkg20001 mkg20001 commented Mar 22, 2022

I've started to implement OLSR support for Gluon for the Funkfeuer mesh network in Graz.

Currently this is half working. The mesh itself mostly works, but client AP is basically broken.

One important note is that we're using static IPs for each node. An additional module to configure and apply them has been added.

It's a lot of hacks. I'd need some guidance on how to implement this further.

The commits are from https://github.com/mkg20001/gluon, squashed into one, with further commits being added as they land in the graz branch. I'll split it as required. I haven't copied the history as-is, since that would be a lot of work: other, unrelated modules are mixed into the history.

Current to-do list for the PRs that will emerge from this one:

  • OLSRv2 IPv6 Only
  • OLSRv1 IPv6 Only
  • Working Client AP support
  • respondd integration

This PR is basically a more general discussion now and olsr is my tracking branch for the full mess.

@github-actions github-actions bot added 3. topic: config-mode This is about the configuration mode 3. topic: docs Topic: Documentation 3. topic: package Topic: Gluon Packages 3. topic: respondd labels Mar 22, 2022
@mkg20001
Member Author

Note: Not sure if I missed some code from the other repo. I would have to test this myself first as well.

@T-X
Contributor

T-X commented Mar 22, 2022

Yaiy, I'm very excited about this! Would it make sense to adopt some of the babel site config options?
https://gluon.readthedocs.io/en/latest/user/site.html?highlight=babel#site-configuration

It would be great if, with default settings, one could simply swap between the gluon-mesh-babel and gluon-mesh-olsr packages and get a similar IPv6 mesh experience, if technically possible. It would also be great if gluon-mesh-olsr had reasonable default settings in the absence of the olsrd site section.

Current site config options used by and introduced for babel are/were:

  • node_prefix6
  • node_client_prefix6

Btw. do both the OLSRv1 and OLSRv2 implementations here use the UDP port 263 (manet port) for all protocol related traffic? Or is there anything else I should add to this list of pcap filter rules for OLSR protocol traffic counting?

@mkg20001
Member Author

mkg20001 commented Mar 22, 2022

Btw. do both the OLSRv1 and OLSRv2 implementations here use the UDP port 263 (manet port) for all protocol related traffic?

olsrv1 is 698/udp, olsr2 is 269/udp
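
For traffic counting, the two ports named in this reply can be matched directly. As a pcap filter that would be `udp port 698 or udp port 269`; the C sketch below performs the same check on an already-parsed UDP destination port. The enum and function names are made up for illustration and are not part of Gluon or of either OLSR daemon.

```c
#include <stdint.h>

/* Hypothetical classification result, for illustration only. */
enum olsr_proto {
    OLSR_NONE = 0,
    OLSR_V1,
    OLSR_V2
};

/* Map a UDP destination port to the OLSR version that uses it,
 * using the port numbers stated in the comment above. */
enum olsr_proto classify_olsr_port(uint16_t udp_dst_port) {
    switch (udp_dst_port) {
    case 698:
        return OLSR_V1; /* olsrv1: 698/udp */
    case 269:
        return OLSR_V2; /* olsr2: 269/udp */
    default:
        return OLSR_NONE;
    }
}
```

A counter per return value would then give separate totals for v1 and v2 protocol traffic.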

It would be great if with default settings one could simply swap between the gluon-mesh-babel and gluon-mesh-olsr packages and get a similar IPv6 mesh experience,

The current implementation uses randomly generated v6/v4 IP addresses, but ones that are consistent (MAC-based generation). If a community does not want to do any manual IP configuration, this can simply be left as-is (with 2 additional settings to define the dynamic address ranges). That does carry the possibility of collisions, though. (Edit: with a /64 prefix in v6 this shouldn't matter; it is only a concern for a v4 mesh.)
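
To illustrate what MAC-based generation could look like, and why collisions are mainly a v4 concern, here is a minimal sketch. It assumes a 32-bit FNV-1a hash over the MAC and a hypothetical 10.12.0.0/16 range; neither is taken from the PR, whose actual derivation may differ. Two nodes collide whenever their hashes map to the same host number, which is far more likely with about 65k IPv4 hosts than with the 2^64 interface identifiers of an IPv6 /64.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash (assumed here; the PR may use a different function). */
static uint32_t fnv1a32(const uint8_t *data, size_t len) {
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Derive a stable IPv4 host address from a MAC.
 * base:       network address in host byte order (e.g. 0x0A0C0000 for 10.12.0.0)
 * prefix_len: 1..30, so that at least two usable host addresses exist
 * The same MAC always yields the same address; different MACs may collide. */
uint32_t ipv4_from_mac(const uint8_t mac[6], uint32_t base, unsigned prefix_len) {
    uint32_t size = (uint32_t)1 << (32 - prefix_len);
    uint32_t usable = size - 2; /* skip network and broadcast address */
    return base + 1 + fnv1a32(mac, 6) % usable;
}
```

Calling `ipv4_from_mac(mac, 0x0A0C0000u, 16)` twice with the same MAC returns the same address inside 10.12.0.1 to 10.12.255.254, which is the "consistent" property described above.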

In the Graz mesh I use manman-sync, which pulls data about the IP addresses that are assigned to a node and applies those. https://github.com/mkg20001/gluon/tree/funkfeuer/package/gluon-manman-sync

Would it make sense to adopt some of the babel site config options?

Probably. More or less the same options already exist under different names (referring to node_prefix6). https://github.com/freifunk-gluon/gluon/pull/2418/files#diff-29cfa763ccd2ac824390943463f717de0009287e4d405c1556127539db789d5eR63-R71

package/features (outdated, resolved)

define Package/gluon-static-ip
TITLE:=Static IP assignment and configuration for gluon
DEPENDS:=+gluon-core +luci-lib-ip
Member Author

@mkg20001 mkg20001 Mar 22, 2022

we'd need luci-lib-ip from the luci repo. in the fork currently the entire luci repo is simply added. better solution, anyone?

target = 'ACCEPT',
})

uci:section('firewall', 'rule', 'allow_mesh_icmpv6_input', {
Member Author

random attempts at getting client wifi, node ping, etc to work. might not be needed. should be cleaned after client wifi works.

Member

@christf christf Mar 27, 2022

you might find some inspiration in the firewall rules for gluon-mesh-babel :)

It seems this could be a good time to move the mmfd firewall rules into the gluon-mmfd package to re-use them here.
Also the respondd-bits might require a bit of work still as I cannot find rules for that. Again this might be a good time to assign the rules from gluon-mesh-babel to a different location so we can get some re-use (in case these rules do actually work for you).

@@ -185,7 +185,7 @@ end
-- 6: owe1
-- 7: wan_radio1 (private WLAN); mesh VPN
function M.generate_mac(i)
if i > 7 or i < 0 then return nil end -- max allowed id (0b111)
Member Author

this is a hack because v4 mesh requires IPs for each interface. we'd either need something proper or just live with it.
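
The guard in the snippet caps the id at 0b111, i.e. three bits, so at most eight addresses can be derived per node. The C sketch below shows one way such a scheme can work: the id replaces the low three bits of the last octet of a base MAC. This is an illustration under that assumption, not a copy of Gluon's `generate_mac`; the function name and the choice of octet are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Derive the id-th virtual MAC from a base MAC.
 * Returns false if id does not fit into three bits (0..7). */
bool derive_virtual_mac(const uint8_t base[6], int id, uint8_t out[6]) {
    if (id < 0 || id > 7)
        return false; /* only 0b000..0b111 are available */

    memcpy(out, base, 6);
    out[0] |= 0x02; /* mark as locally administered */
    out[0] &= 0xFE; /* clear the multicast bit */
    out[5] = (uint8_t)((out[5] & 0xF8) | id); /* low three bits carry the id */
    return true;
}
```

With a base of `00:11:22:33:44:55` and id 3 this yields `02:11:22:33:44:53`. A mesh that needs one address per interface runs out as soon as a node has more than eight such interfaces, which is the limitation the comment above refers to.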

Contributor

We tried to rework the MAC address generation logic in #1983 to provide more addresses,

Member

@christf christf Mar 27, 2022

In the babel implementation we just assume that we are using ipv6-only. This could be a valid assumption for olsr as well.
This would mean that the ipv4-only bits of the internet can be reached by implementing xlat.
In our babel net we use jool and on the nodes the xlat helpers.
It would save a royal headache that would arise from handling ipv4 addresses in the mesh.
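
The translation step relies on mapping each IPv4 address into an IPv6 prefix. Here is a minimal sketch of that mapping, assuming the RFC 6052 well-known prefix 64:ff9b::/96. A deployment may use its own network-specific prefix instead, and nothing here is specific to jool or to the networks mentioned above; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Embed an IPv4 address (host byte order) into the well-known
 * NAT64 prefix 64:ff9b::/96, writing 16 bytes in network order. */
void nat64_embed(uint32_t ipv4, uint8_t out[16]) {
    static const uint8_t prefix[12] = {
        0x00, 0x64, 0xff, 0x9b, 0, 0, 0, 0, 0, 0, 0, 0
    };

    memcpy(out, prefix, sizeof(prefix));
    out[12] = (uint8_t)(ipv4 >> 24);
    out[13] = (uint8_t)(ipv4 >> 16);
    out[14] = (uint8_t)(ipv4 >> 8);
    out[15] = (uint8_t)ipv4;
}
```

For 192.0.2.33 (`0xC0000221`) this produces 64:ff9b::c000:221, so an IPv6-only client can address an IPv4-only host through the translator.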

In the previous years this has been good enough to run a network in Magdeburg - and we used that technology in Frankfurt as well.

What do you think?

@neocturne neocturne added the 2. status: rfc request for comments label Mar 23, 2022
@mweinelt
Contributor

If we consider accepting this contribution I think it should be conditional on having a way to test these changes with CI builds at least on x86_64.

@mkg20001
Member Author

@mweinelt how would I do that, and how would I go about adding automated tests? I can't find where the files from tests/ get executed.

@mweinelt
Contributor

mweinelt commented Mar 27, 2022

For one, we'd need a site.conf & site.mk that reflect a working OLSR setup. I think that would go in contrib/ci/olsr-site. Then we would need to look into adapting the workflow to build different site flavors.

Runtime testing is something we currently don't have, in part because nested virtualization isn't a thing on GitHub Actions.

@mkg20001
Member Author

We'd need some help with getting l3roamd to work with olsr (cc @5gbr) in order to get the client AP working

In olsr there's no such concept as next node (or I don't properly understand what it is for)
How would one go about making l3roamd work with olsr?
Currently it's trying to roam to 10.12.0.10 which is the first dns server. I don't understand why it's doing that.

@mweinelt
Contributor

In olsr there's no such concept as next node (or I don't properly understand what it is for)

The nextnode address is for stations connected to the router to have a consistent way of talking to it, e.g. for a stable resolver address, that is close to the station, or to reach its status page when not knowing its canonical address.

@mkg20001
Member Author

mkg20001 commented Mar 27, 2022 via email

@christf
Member

christf commented Mar 27, 2022

We'd need some help with getting l3roamd to work with olsr (cc @5gbr) in order to get the client AP working

it should, but I never tested this.

In olsr there's no such concept as next node (or I don't properly understand what it is for) How would one go about making l3roamd work with olsr? Currently it's trying to roam to 10.12.0.10 which is the first dns server. I don't understand why it's doing that.

The nextnode address is that address on which the node that is directly connected can be reached. It is not a feature of the mesh protocol but a feature of gluon.
Every client can connect to its directly connected upstream AP via the nextnode address. So, this should already work with l3roamd.

I had worked a bit on getting ipv4 working however the whole concept of routing is very different (we are using host-routes in the current ipv6 concept and are migrating these). In the end I didn't get roaming to work for ipv4 given the constraints of the network design but used roaming for ipv6 and declared ipv4 via xlat a stop-gap feature for ipv6-only networks.
In practice this worked well enough even though it would break ipv4 connections when roaming. My reasoning is that ipv6 is preferred often enough nowadays.

for (int i = 0; i < linkcount; i++) {
struct json_object *link = json_object_array_get_idx(links, i);
if (!link)
return NULL;
Member

out is still a new json object so I expect this return to leak memory. If out is returned instead this problem goes away.
Also in line 83, 94 and others..
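
The pattern behind this review comment can be shown without json-c. In the sketch below a small list type stands in for the JSON array, and a negative entry stands in for a missing link; all names are hypothetical. The early exit returns `out` rather than `NULL`, so the caller receives the partial result and remains responsible for freeing it. The alternative is to release `out` before returning `NULL`, which in json-c would be a `json_object_put(out)` call.

```c
#include <stdlib.h>

/* Stand-in for a json-c array: a fixed-capacity list of ints. */
struct int_list {
    size_t len;
    int *items;
};

int live_lists = 0; /* number of lists allocated and not yet freed */

struct int_list *int_list_new(size_t cap) {
    struct int_list *l = malloc(sizeof(*l));
    if (!l)
        return NULL;
    l->items = calloc(cap ? cap : 1, sizeof(int));
    if (!l->items) {
        free(l);
        return NULL;
    }
    l->len = 0;
    live_lists++;
    return l;
}

void int_list_free(struct int_list *l) {
    if (!l)
        return;
    free(l->items);
    free(l);
    live_lists--;
}

/* Copy entries until the first invalid (negative) one. */
struct int_list *collect_links(const int *links, size_t count) {
    struct int_list *out = int_list_new(count);
    if (!out)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        if (links[i] < 0)
            return out; /* hand over the partial result instead of leaking it */
        out->items[out->len++] = links[i];
    }
    return out;
}
```

With the input `{1, 2, -1, 4}` the caller gets a list of length 2, and after `int_list_free` the `live_lists` counter is back at zero. Returning `NULL` at the marked line without freeing would leave it at one.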

}

static struct json_object * olsr2_get_neigh(void) {
return NULL;
Member

I guess this shouldn't be empty. If it should, this could return a clean json object to avoid breakage.

Member Author

you're right, this is still WIP. I'll add olsr2 at a later point. as of now the respondd parts are entirely untested and there's likely a segfault somewhere.

Member

@christf christf Mar 28, 2022

I like what you are doing! Please do keep it up! :)
And thank you for raising this early.

@christf
Member

christf commented Mar 28, 2022

As much as I understand you, the thing is that this implementation is intended for a gradual rollout over an existing mesh, such that we can just use both old hand-installed and new Gluon setups side by side. As such I have no choice but to support olsr1 ipv4.

It seems to me, then, that the ipv4 support, which collides with a Gluon feature, is required as a compatibility measure for an existing network. In that case one of the two will have to be deactivated for the migration, and each option has risks and benefits:

  • either not build the ipv4 bits into gluon and instead use out-of-tree hacks to do the migration.
  • or build the migration code into gluon and find a way to deactivate the colliding next-node feature.
  • adjust the approach of the migration and treat it like an upgrade across breaking changes for the mesh protocol. This one is tricky but could be done with downloads and scheduled restarts. I am aware communities have done this in the past when changing from one batman version to an incompatible newer one.

At this point I am uncertain which is more desirable. Personally I would be leaning towards the last option even if it involves a bit of manual work.

@neocturne
Member

At the moment, this PR is using a significant part of our CI resources - the change to the modules file causes a rebuild of all targets for each push. To avoid that, and to make this easier to review, I'd like to limit the scope of this PR to feature parity with our Babel packages or less. In particular:

  • Remove all IPv4 mesh support
  • Remove the static IP package

As noted by @christf, these features are in conflict with Gluon's design, so it seems unlikely to me that we can accept them in the main repo; the community package repo is a more appropriate place.

I also don't know if the goal of a gradual rollout of Gluon in an existing mesh is feasible at all - we obviously can't change Gluon to be compatible with every existing non-Gluon network.

@mkg20001
Member Author

mkg20001 commented Mar 30, 2022

@NeoRaider my biggest concern is having olsrv1 support upstream. adding olsrdv1 ipv6 can also be done in case it's not wanted without it. we can have the ipv4 stuff downstream but I'd prefer to have most of it upstream if possible. I suppose that discussion will be for a later time once things become relevant, after merging basic olsr.

@T-X
Contributor

T-X commented Mar 30, 2022

If I understand correctly, both OLSRv1 and OLSRv2 support both IPv4+IPv6, but OLSRv1 has only IPv4 support currently implemented for Gluon in this PR, while OLSRv2 has both?

What would be the plans for the future for OLSRv1? Can OLSRv2 fully replace it in the future or does v1 have certain benefits over v2? Is one considered more stable than the other at the moment? Is OLSRv1 still actively maintained or is it considered deprecated by its maintainer(s)?

My question goes in the direction if OLSRv1 (+IPv4) support in Gluon would be more a temporary measure for a few releases for compatibility to get users on board, to be able to then migrate the whole mesh to OLSRv2 via the Gluon Autoupdater + Gluon scheduled domain switcher later, for instance. Or if there is some permanent need for / benefit from OLSRv1 (+IPv4?) here?

With batman-adv for instance we've done something similar in Gluon in the past: We had the batman-adv 2013.4 (compat 14, "batman-adv-legacy") and batman-adv > 2013.4 (compat 15) in Gluon for a while for people to migrate. And then removed support for batman-adv-legacy later. To my knowledge this step worked fine for everyone. On the other hand, we still have batman-adv with both the protocol variants BATMAN_IV + BATMAN_V, as the latter is still more experimental and also has different, incompatible metrics (but they are at least within the same kernel module).

@mkg20001
Member Author

OLSR v1, while intended as a migration, is in itself also pretty stable and it'll be good to have both around for some time. IPv6 support has also been added to olsr1. It might be worth considering but I don't know if it is worth it. Not sure if OLSRv1 is really deprecated. Most OLSR communities also don't have much automated setup; instead it's usually lots of organic growth that occurred over the years, so I'd say Graz is probably going to hang on to v1 for at least 5 more years.

@5gbr add your 5cents plz

@mkg20001
Member Author

The current state of this PR is that I'm finishing up the respondd code. Once that is done, I'll clean it up a bit and make a separate PR for just olsr2 with no client AP support and no respondd code (for now).

Note that that means I'll be doing work on 3 branches at the same time, so don't say anything about commits being a mess until it's done. Then I'll split the changes into clean init commits for each feature.

@github-actions github-actions bot removed the 3. topic: config-mode This is about the configuration mode label May 28, 2022
@mkg20001 mkg20001 marked this pull request as draft August 11, 2022 10:41
@AiyionPrime AiyionPrime added the 2. status: merge conflict The merge has a conflict and needs rebasing label Jan 21, 2023
@AiyionPrime AiyionPrime added the 2. status: waiting-on-author Waiting on some action from the author label Mar 22, 2023
@mkg20001
Member Author

mkg20001 commented May 7, 2023

this is being split into individual prs, closing this

Labels
2. status: merge conflict The merge has a conflict and needs rebasing 2. status: rfc request for comments 2. status: waiting-on-author Waiting on some action from the author 3. topic: continuous integration 3. topic: docs Topic: Documentation 3. topic: firewall 3. topic: hardware Topic: Hardware Support 3. topic: package Topic: Gluon Packages 3. topic: respondd
6 participants