Allow user-specified HTTP_PROXYs for opportunistic resources #600

Open
khurtado opened this issue Aug 8, 2017 · 13 comments

Comments

@khurtado
Contributor

khurtado commented Aug 8, 2017

Currently, lobster tries to detect an HTTP_PROXY on the Worker and it also tries to detect a proxy on the Master machine as a fallback.

This works fine on:

  • Lobster running at Notre Dame, because it uses eddie.crc.nd.edu:3128 detected from the master machine and all WNs have access to it.
  • CMS Connect and OSG: Because lobster is smart enough to use GLIDEIN_Proxy_URL / OSG_SQUID_LOCATION / cvmfs local siteconf in the WN, which are set by GlideinWMS / OSG stuff

But it breaks if:

  • You are not running on GlideinWMS/OSG managed worker nodes and the HTTP_PROXY detected in the master is not accessible by these worker nodes.

A workaround is to run export HTTP_PROXY=something before starting your work_queue_factory, because the factory exports the submit environment to the worker nodes. However, this breaks work_queue, since the WQ catalog will try to connect through this proxy and that's not guaranteed to work.

My current workaround is exporting GLIDEIN_Proxy_URL before running the factory instead. This makes work queue connect to the catalog without proxies but parrot will use it for CVMFS.
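
The difference between the two workarounds can be sketched in Python. This is only an illustration under assumptions: `factory_environment` is a hypothetical helper, the proxy URL is a placeholder, and dropping `HTTP_PROXY` is one possible way to keep the catalog connection off the proxy. The only names taken from this thread are `HTTP_PROXY` and `GLIDEIN_Proxy_URL`.

```python
import os


def factory_environment(base_env, proxy_url):
    """Return a copy of base_env for launching the factory (hypothetical helper).

    Sets GLIDEIN_Proxy_URL, which per this thread is picked up for CVMFS,
    and removes HTTP_PROXY so the catalog connection is not routed
    through the proxy.
    """
    env = dict(base_env)
    env.pop("HTTP_PROXY", None)
    env["GLIDEIN_Proxy_URL"] = proxy_url
    return env


# Placeholder URL, not a real proxy server.
env = factory_environment(os.environ, "http://squid.example.org:3128")
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.Popen` when starting the factory, instead of exporting variables in the shell.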

We should probably let the user specify the fallback proxy as an advanced parameter, so that lobster only tries to detect a proxy on the master machine if this parameter is unset. That proxy would only be used in cases where the wrapper on the WN can't detect a valid proxy. Does that make sense? Is there a better approach to solve this?
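
A minimal sketch of the proposed order, assuming a hypothetical function `choose_proxy` that collapses the master-side and worker-side steps into one place for illustration (in practice they run on different machines). None of these names exist in lobster.

```python
def choose_proxy(worker_proxy, user_fallback, detect_on_master):
    """Pick a proxy following the order proposed above.

    worker_proxy:     proxy found by the wrapper on the worker, or None
    user_fallback:    value of the hypothetical advanced parameter, or None
    detect_on_master: callable returning the proxy seen by the master, or None
    """
    if worker_proxy:
        return worker_proxy
    if user_fallback:
        return user_fallback
    # Master-side detection runs only when no user fallback is configured.
    return detect_on_master()
```

With this ordering, a user-supplied value never overrides what the worker detects locally; it only replaces the master-detected proxy as the last resort.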

@klannon
Contributor

klannon commented Aug 8, 2017

This feels to me like something that WQ (or probably better yet, VC3) should be taking care of. The proxy used by the worker should be a function of where the worker runs. Specifying it in Lobster means that you can't have a proxy that is specific to the network each worker is on. I guess, as a fallback, it could be specified in Lobster, but I'll bet you that most users don't have access to a proxy server to which they can direct their traffic. Could this be made part of the WQ or VC3 environment?

@khurtado
Contributor Author

khurtado commented Aug 8, 2017

I guess we could have the VC3 glidein set GLIDEIN_Proxy_URL. Lobster wouldn't need any changes this way.

@klannon
Contributor

klannon commented Aug 8, 2017

In terms of VC3, I think the real question is how does the user of VC3 specify that a proxy is needed, and if so, how does VC3 ensure the site is providing one?

@khurtado
Contributor Author

khurtado commented Aug 8, 2017

There is a resource provider specification that somebody (system admin, etc) has to fill in at some point with head node information, resource management (condor, slurm, pbs...), etc... so the proxy (if available) could be another entry for this specification.

EDIT: Looking at the vc3 client, the user can currently specify special variables in the environment like the http proxy per target. The user then would need to know how this is used in its application. Like, I could specify in my vc3 request I want to set GLIDEIN_Proxy_URL = myproxy.uchicago.edu (or HTTP_PROXY?) for my UChicago target, and I know lobster will use it for CVMFS / parrot. CMS Sites just advertise that info in the cvmfs SITECONF and applications like CRAB know how to look for it.

@matz-e
Member

matz-e commented Aug 9, 2017

Have you tried setting LOBSTER_CVMFS_PROXY before running the factory? That is the last fallback that would not collide with any other use case and should be the proper variable, and I think I added that fallback with a situation like yours in mind.

If setting that works for you, we should add it to the documentation somewhere.

@khurtado
Contributor Author

khurtado commented Aug 9, 2017

@matz-e: Yeah, I tried that first, but I was still getting eddie, which is why I ended up using GLIDEIN_Proxy_URL instead. I think this is because the LOBSTER_CVMFS_PROXY and LOBSTER_FRONTIER_PROXY environment variables are overwritten by the master.

Changing the above would be easy, but I thought having an advanced parameter to avoid the user having to export environment variables prior to running their factories would be better.
For VC3, I think having the environment variables would be better though, since a vc3-user can set environment variables such as the proxies for each target site. How these environment variables are used depends on the user application.

@matz-e
Member

matz-e commented Aug 9, 2017

You're right, @khurtado, I forgot about that.

Yes, an advanced configuration parameter would probably be best, since it's easy to forget to export custom settings. In addition, we could add yet another environment variable, e.g., LOBSTER_USER_PROXY, that overrides all settings when running several factories? Possibly like VC3? I'd rather have a dedicated environment variable for that than rely on overwriting something that the submission infrastructure normally uses.

@khurtado
Contributor Author

khurtado commented Aug 9, 2017

I like that, it would cover both situations. @klannon, opinions?

@klannon
Contributor

klannon commented Aug 22, 2017

Sorry to let this sit for so long. I'm confused: why wouldn't we just change the master behavior so that it does not overwrite the existing LOBSTER_*PROXY variables if they're already defined in the worker environment? I think the proxy is really something that needs to be customized to each individual worker, so the master should only take care of providing some values if those values haven't already been supplied in some other way. Does that make sense?
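
The "only provide values that are not already set" behavior could look roughly like the following. This is a sketch, not lobster code: `apply_master_defaults` is a hypothetical name, and the worker-side value is a placeholder.

```python
def apply_master_defaults(worker_env, master_defaults):
    """Merge master-provided values without clobbering worker settings.

    A variable already present in worker_env is kept as-is (even if it is
    an empty string); only missing variables are filled in from the master.
    """
    env = dict(worker_env)
    for name, value in master_defaults.items():
        env.setdefault(name, value)
    return env


# Placeholder worker-side value; the master default is the proxy named in this thread.
worker_env = {"LOBSTER_CVMFS_PROXY": "http://squid.example.org:3128"}
master_defaults = {
    "LOBSTER_CVMFS_PROXY": "http://eddie.crc.nd.edu:3128",
    "LOBSTER_FRONTIER_PROXY": "http://eddie.crc.nd.edu:3128",
}
merged = apply_master_defaults(worker_env, master_defaults)
```

Here `merged` keeps the worker's own `LOBSTER_CVMFS_PROXY` and takes `LOBSTER_FRONTIER_PROXY` from the master, since the worker did not define it.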

@matz-e
Member

matz-e commented Aug 22, 2017

Overwritten is not the right word here… those variables are used to tell the worker what the master uses. This works via WQ, where the worker sets these environment variables to the master values before the wrapper is executed. Hence my suggestion to add a few dedicated variables to set the proxy for this case. I would suggest setting the proxy to LOBSTER_USER_PROXY if present, either overriding detection or as a fallback, and allowing users to customize what the master sends to the worker in terms of defaults.

@klannon
Contributor

klannon commented Aug 22, 2017

I guess I'm arguing for a bigger shift. Basically, take the responsibility for setting these values completely away from the master. In default Lobster usage (e.g. non-VC3) couldn't we just as easily include these variables as part of the factory config? (Maybe @btovar could weigh in?) In VC3 usage, providing a proxy server and communicating that to the task is part of specifying the necessary resources, I think, not something the task should be doing for itself.

So, basically, I guess what I'm arguing for is to have Lobster not set those values at all in the master, but instead make it part of what you need to do to set up the worker. Do you think that would work?

@matz-e
Member

matz-e commented Aug 23, 2017

I introduced those settings in #298, to remove having our T3 stuff hardcoded. In principle, these settings are worker-specific and should not be set on the master.

For user convenience, particularly for running at Notre Dame, the code as is makes sense. Minimal user effort to start an instance of Lobster that just works. If we can provide these values as factory configuration values, I'm OK with removing/reverting the settings. After all, that will shrink the code base, and make the master more robust.

@matz-e
Member

matz-e commented Aug 23, 2017

I don't see any options in the factory or worker to specify environment variables. @btovar, if we could have a factory setting environment that takes a dictionary of environment variables to explicitly set for each worker, that would be great!
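
To make the request concrete, here is a rough sketch of how such a setting might be applied. Everything in it is an assumption: the `environment` key does not exist in the factory today, and `worker_environment` is a hypothetical helper.

```python
import os


def worker_environment(factory_config, base_env=None):
    """Build a worker's environment from a factory config (hypothetical).

    factory_config may carry an "environment" dictionary whose entries are
    set explicitly for each worker, on top of the inherited environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in factory_config.get("environment", {}).items():
        # Environment variables are strings, so coerce other types.
        env[str(name)] = str(value)
    return env


# Placeholder proxy URL; the variable name comes from this thread.
config = {"environment": {"LOBSTER_CVMFS_PROXY": "http://squid.example.org:3128"}}
env = worker_environment(config, base_env={"PATH": "/usr/bin"})
```

Because the values live in the factory configuration rather than in the submitting shell, each factory could carry the proxy that matches the network its workers run on.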

3 participants