Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Draft PR for improving connectivity testing to capture resolved IPs and all attempts #228

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

amircybersec
Copy link
Contributor

@amircybersec amircybersec commented May 2, 2024

Note: The code is still messy not ready for merge. I made this PR to discuss upcoming high-level changes to the testing approach and format for connectivity testing pursuant to #209 (comment)

I moved the loop that iterates over ip addresses outside of the intercept dialer. This way in each iteration I am generating a new intercept dialer that takes one of the resolved addresses and use that to continue the end-to-end test.

In this approach, I am allowing each attempt to run end-to-end. The test may fail earlier for example at connect attempt. In that case, it won't continue to the next stage. However each attempt could continue to the end and test for connect/read/write operations for each resolved IP.

We also need to decide if test for all resolved IPs or abort at success.

Format

I am proposing below format that we can use for both udp and tcp.

List of suggested format changes:

  • Each attempt should include a start-time and duration
  • The error captured for each attempt should be a ConnectivityError type that includes op, PosixError, Error fields.
  • I am uncertain about keeping selected_address in the main report. It is hard to tell which address this should be populated with.
  • I am uncertain what error field in the main report should be populated with. What if the test makes 3 attempts and all of them fail with different types of errors. Does 'error' then reflect? I guess when one of the attempts succeed error field should be nil.

TODO: The code is slow and I am planning to optimize it such that tests run in an independent treads in parallel.

TODO: Update the format in the code to match the proposal

{
    "resolver": "8.8.8.8:53",
    "proto": "udp",
    "transport": "ss://[email protected]:65496?prefix=",
    "endpoint": {
        "host": "pod2.exampledomain.com",
        "port": "65496"
    },
    "attempts": [
            {
                "address": {
                    "host": "2606:4700:3032::6815:xxxx",
                    "port": "65496"
                },
                "time": "2024-05-02T05:02:13Z",
                "duration_ms": 1000,
                "error": {
                    "op": "connect",
                    "msg": "i/o timeout"
                }
            },
            {
                "address": {
                    "host": "2606:4700:3032::ac43:xxxx",
                    "port": "65496"
                },
                "time": "2024-05-02T05:02:13Z",
                "duration_ms": 1000,
                "error": {
                    "op": "read",
                    "msg": "reset"
                }
            },
            {
                "address": {
                    "host": "172.67.138.xx",
                    "port": "65496"
                },
                "time": "2024-05-02T05:02:13Z",
                "duration_ms": 1000,
                "error": {
                    "op": "connect",
                    "msg": "i/o timeout"
                }
            },
            {
                "address": {
                    "host": "104.21.38.xx",
                    "port": "65496"
                },
                "time": "2024-05-02T05:02:13Z",
                "duration_ms": 1000,
                "error": {
                    "op": "connect",
                    "msg": "i/o timeout"
                }
            }
     ],
    // maybe we should remove this; what if both IPv4 and IPv6 addresses succeed, 
    // then which one is the selected address?
    "selected_address": { 
        "host": "104.21.38.199",
        "port": "65496"
    },
    // This would be the start-time of the main test and the duration for the whole test
    "time": "2024-05-02T05:02:13Z",
    "duration_ms": 20006,
    // maybe we should remove this one too or make it a summary of all errors in the all attempts. 
    // I am not sure what this field would reflect is we have two failed attempts. I guess this should be `nil`
   // if at least of the tests pass?
    "error": {
        "op": "connect",
        "msg": "all connect attempts failed. no more addresses to try"
    }
}

@amircybersec amircybersec changed the title WIP: Draft PR for improving connectivity testing to capture revolved IPs and all attempts WIP: Draft PR for improving connectivity testing to capture resolved IPs and all attempts May 2, 2024
@amircybersec
Copy link
Contributor Author

@fortuna I appreciate your feedback on the format and the proposed approach for testing. This is in reference to #209 (comment)

@@ -268,6 +268,11 @@ func NewUDPResolver(pd transport.PacketDialer, resolverAddr string) Resolver {
return nil, &nestedError{ErrDial, err}
}
defer conn.Close()
// force close connection is context is done/cancelled.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fortuna I have to add this here to close the connection for pending receives (in case of timeouts) when another attempt succeeds. I managed to do this with the following change. I am basically cancelling the passed context in connectivity.go and checking if context is done. This is the best I came up with but I am not sure if there's a more elegant way to accomplish this.

dns/resolver.go Outdated
@@ -391,3 +396,14 @@ func NewHTTPSResolver(sd transport.StreamDialer, resolverAddr string, url string
return &msg, nil
})
}

func wrappErrors(err1, err2 error) error {
Copy link
Contributor Author

@amircybersec amircybersec May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fortuna we need to use this function instead of errors.Join() since errors.Join() breaks the wrapped chain of errors. when we are unwrapping the errors with unwrapAll() in makeErrorRecord in test_connectivity.go (

record.Msg = unwrapAll(result.Err).Error()
), the errors don't get fully unwrapped and can leak unwanted error messages that can contain sensitive information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants