-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asset accessor that uses libcurl directly #634
Conversation
First step to possibly replacing HttpAssetAccessor
Very little of the code needs to know the concrete type of the asset accessor. This is preparation for swapping HttpAssetAccessor for UrlAssetAccessor.
The motivation for this change is to take advantage of reusing the CURL handles, rather than tearing them down and creating them anew for each request. This allows libcurl to keep a connection to a remote server open and reuse it for multiple requests. This avoids renegotating SSL among other things and can be a big performance win.
This appears to be unnecessary when using the C interface to libcurl, as opposed to cpr. The path to the certificate is still stored, but will go away at a later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but lacks support for proxies, which I suspect will be an issue for academic and government customers.
There is a great overview on everything.curl.dev regarding supporting proxies. Curl actually does most of the heavy lifting here, but the hard part is actually gathering the correct settings for the proxy.
I want to add, I'm not entirely sure this is required. I believe that libcpr
does give us proxy detection but I can't fully remember and if it doesn't then nothing is broken by us swapping to this.
} | ||
|
||
curl_slist* | ||
UrlAssetAccessor::setCommonOptions(CURL* curl, const std::string& url, const CesiumAsync::HttpHeaders& headers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below are a couple other easy opts that may be useful for performance. Admittedly two of these are taken from the Performance page on everything.curl.dev but in my experience these are the most impactful performance tweaks.
- CURLOPT_BUFFERSIZE controls the preferred size of the receive buffer. The default (usually CURLOPT_MAX_WRITE_SIZE which is usually 16 KiB) is good enough for 90% of use cases, but for larger data like ours it is probably helpful to have it larger. My gut says 512 KiB or 1 MiB, but we may need to do a little testing here. Beware, set it once per handle in the pool, and never set it again, especially if that handle is currently being used for a connection. Bad things will happen.
- CURLOPT_MAXCONNECTS defaults to 5 and controls the maximum number of connections in the connection pool. Raising this can be beneficial when you've got lots of parallel requests happening, but it can also hurt when you start considering HTTP2/3. It probably wouldn't hurt us to raise it slightly. Multi connects defaults to
CURLOPT_MAXCONNECTS * 4
so that might be a good starting place. - CURLOPT_DNS_CACHE_TIMEOUT controls how long before libcurl requests the domain again. The interface libcurl uses to get DNS info doesn't include TTL, so libcurl gives all DNS entries in the cache a TTL of 60 seconds. This is way too low. Should be at least 300 seconds minimum. Higher is better (most DNS servers report 3600 seconds) but comes with caveats that if the DNS server starts reporting something else, the cache could hurt us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below are a couple other easy opts that may be useful for performance. Admittedly two of these are taken from the Performance page on everything.curl.dev but in my experience these are the most impactful performance tweaks.
- CURLOPT_BUFFERSIZE controls the preferred size of the receive buffer. The default (usually CURLOPT_MAX_WRITE_SIZE which is usually 16 KiB) is good enough for 90% of use cases, but for larger data like ours it is probably helpful to have it larger. My gut says 512 KiB or 1 MiB, but we may need to do a little testing here. Beware, set it once per handle in the pool, and never set it again, especially if that handle is currently being used for a connection. Bad things will happen.
Do you know if that's the same as CPR_RESERVE_SIZE? That's set to 3Mb. I will set that.
I'll add the other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe CPR_RESERVE_SIZE
is specific to CPR. That said, I haven't checked but I wouldn't be surprised if under the covers CPR just passes the value to libcurl using this opt.
src/core/src/UrlAssetAccessor.cpp
Outdated
auto request = std::make_shared<UrlAssetRequest>(verb, url, headers); | ||
auto payloadCopy = std::make_shared<std::vector<std::byte>>(contentPayload.begin(), contentPayload.end()); | ||
asyncSystem.runInWorkerThread([promise, request, payloadCopy, this]() { | ||
VSGCS_ZONESCOPEDN("UrlAssetAccessor::request inner"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is leftover tracy stuff from vsgcs? Maybe remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, #defined out, but still cruft. These were good tracing points in vsgCs; I'll see about replacing them with cesium-native tracing calls.
So it turns out CPR doesn't autoresolve proxies like I thought it did, so my comments about the proxy support are moot. Probably needs to be fixed one day. |
Add back the buffersize option from HttpAssetAccessor and also options suggested in the review. Replace vsgCs tracing macro with Cesium Native tracing macro.
I'm seeing SSL errors on Windows:
|
I've seen reports of this on Windows with vsgCs, but it has always worked for me. I think it's sensitive to the exact version of curl+ssl that is downloaded. I will check if cpr has some explicit dependency on SSL / TLS that isn't being satisfied now. |
Test to see if fixes Windows problem. If it does, that is probably a symptom of other configuration issues.
Sean, can you try it on Windows with the latest commit? We thought that the local certificate was redundant, but maybe there are additional hoops to jump through to use OS provided certificates. I would try it myself, but I'm in the airport and don't have any Windows environment installed. |
The latest commit fixes it. It would still be good to know why it's needed for Windows. I was hoping we could get away without shipping certs. |
I wonder how that option translates from CLI to libcurl? |
Hopefully this will remove the dependency on a local certificate.
Another try, this time with the CURLSSOPT_NATIVE_CA option. @lilleyse can you give the latest a quick try? |
@timoore looks like there's a merge conflict now that #626 is merged. I think the strategy would be to take the changes from Also, could you remove |
I did some load time testing on this branch and observed some very positive results compared to current main, and also the version of main prior to this branch fork (to ensure the massive refactor didn't impact anything) Process:
Current main: f89fa4e This Branch: e602ddd Main pre-libcurl updates: d1b7ebc Unless I've made a mistake, that's a 35% load time reduction for a complex Google 3D tiles scene. |
1ec71a8
to
d206fa8
Compare
Change getHttpAssetAccessor() to getAssetAccessor(). Remove the HttpAssetAccessor source and header files and any references to them. Remove any references to the cacert.pem certificate as well as scripts and libraries used to fetch it.
const std::filesystem::path& Context::getCertificatePath() const { | ||
return _certificatePath; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove getCertificatePath
from Context.h
}; | ||
|
||
// Simple implementation of AssetAcessor that can make network and local requests | ||
class UrlAssetAccessor : public CesiumAsync::IAssetAccessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UrlAssetAccessor
, UrlAssetResponse
, UrlAssetRequest
could all be marked final
|
||
void tick() noexcept override; | ||
CurlCache curlCache; | ||
std::string userAgent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can userAgent
be private?
CacheEntry() | ||
: curl(nullptr) | ||
, free(false) {} | ||
CacheEntry(CURL* in_curl, bool in_free) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cesium-native I see the convention free_
a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that too and will make the change.
bool free; | ||
}; | ||
std::mutex cacheMutex; | ||
std::vector<CacheEntry> cache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the CURL
objects need to be explicitly freed somewhere? Like when CurlCache is destroyed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't need to be freed for the duration of the program. It's complicated to figure out the multi-threading implications of when a CURL object is in use or may be freed, and this seems to be a weakness in the curl design. I thought it was best to blow off freeing the CURL objects and just let them be deleted when the program stops.
#include <cstring> | ||
|
||
namespace cesium::omniverse { | ||
const auto CURL_BUFFERSIZE = 3145728L; // 3 MiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know when the buffer gets freed? Is it when curl_easy_reset
is called, or when the CURL
object is destroyed? Hopefully not the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think this could be shrunk to 0.5 MiB given the avg tile sizes in CWT and Google.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe curl_easy_reset() doesn't do much except to prepare for another request. For example, it doesn't tear down an existing connection. The buffer freeing should probably be verified, either by examining the code or with a debugger.
src/core/src/UrlAssetAccessor.cpp
Outdated
} | ||
|
||
const std::string& method() const override { | ||
return this->_method; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nitpick: convention in this repo is to omit this->
. Though looks like native is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to omit this->
too. I probably copied the code directly from cesium-unreal.
asyncSystem.runInWorkerThread([promise, request, this]() { | ||
CESIUM_TRACE("UrlAssetAccessor::get"); | ||
CurlHandle curl(this); | ||
curl_slist* list = setCommonOptions(curl(), request->url(), request->headers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the overhead of calling setCommonOptions
? Many of the options won't change per-request. Could some of them be set when acquiring a new handle?
If there's minimal overhead then no need to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that it is harmless to set the options each time and that it's not too obvious from the docs what options persist after the curl_easy_reset() call. Merits further examination.
Also, move CurlHandle out of public UrlAssetAccesor.h
@timoore could you update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just update CHANGES.md
, ThirdParty.extra.json
, and copyrights, and then you can go ahead and merge.
Update ThirdParty.extra.json.
This PR replaces the HttpAccessAccessor with the accessor from vsgCs, called UrlAssetAccessor. UrlAssetAccessor caches the CURL handle between requests, allowing libcurl to keep connections open to remote servers. This can be a big performance gain, especially for https connections. Apparently this was always problematic in the cpr library used by HttpAccessAccessor.
There was also some cleanup in the rest of the code to refer to CesiumAsync:IAssetAccessor instead of a concrete accessor type where possible.