Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cdnizer 2.0.0 appears to apply quotes randomly #29

Closed
jgonggrijp opened this issue Aug 9, 2018 · 14 comments · Fixed by #40
Closed

cdnizer 2.0.0 appears to apply quotes randomly #29

jgonggrijp opened this issue Aug 9, 2018 · 14 comments · Fixed by #40

Comments

@jgonggrijp
Copy link
Contributor

Input:

<script src="/node_modules/jquery/dist/jquery.js"></script>
<script src="/node_modules/lodash/lodash.js"></script>
<script src="/node_modules/backbone/backbone.js"></script>
<script src="/node_modules/handlebars/dist/handlebars.runtime.js"></script>

cdnizer call:

var cdnizer = require('gulp-cdnizer');

gulp.src('index.html')
    .pipe(cdnizer({ files: [{
        file: '**/jquery/**',
        package: 'jquery',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' 
    }, {
        file: '**/lodash/**',
        package: 'lodash',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/${filenameMin}'
    }, {
        file: '**/backbone/**',
        package: 'backbone',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
    }, {
        file: '**/handlebars/**',
        package: 'handlebars',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/dist/handlebars.runtime.min.js'
    }]}))
    .pipe(gulp.dest('dist'));

Output (note the incoherent presence and absence of quotes):

<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/lodash.min.js"></script>
<script src=https://cdnjs.cloudflare.com/ajax/libs/backbone/1.3.3/backbone.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/[email protected]/dist/handlebars.runtime.min.js></script>

Output after a second run:

<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/[email protected]/lodash.min.js"></script>
<script src=https://cdnjs.cloudflare.com/ajax/libs/backbone/1.3.3/backbone.min.js"></script>
<script src=https://cdn.jsdelivr.net/npm/[email protected]/dist/handlebars.runtime.min.js></script>

My browser (Safari 11) attempts to load https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js%22, which fails with a 403 status, so the uneven quoting really breaks things.

@OverZealous
Copy link
Owner

Hey there! Sorry you are having a problem here.

This issue is related to the leading **. I'm not sure why it happens, and I can't easily fix it right now (I don't have a lot of bandwidth for this project any more).

Luckily, the solution / workaround is easy: don't start with **. Either use /**/... or /node_modules/... for your matchers, and you'll have consistent output again. So simply change your file matchers to be:

files: {
        file: '/**/jquery/**',
        package: 'jquery',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' 
    }, {
        file: '/**/lodash/**',
        package: 'lodash',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/${filenameMin}'
    }, {
        file: '/**/backbone/**',
        package: 'backbone',
        cdn: 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}'
    }, {
        file: '/**/handlebars/**',
        package: 'handlebars',
        cdn: 'https://cdn.jsdelivr.net/npm/${package}@${version}/dist/handlebars.runtime.min.js'
    }

Though I recommend you use a more specific matcher like /node_modules/... to prevent accidentally matching something you don't want. In fact, you probably should use /node_modules/jquery/**/*.js, which ensures you are only matching JavaScript files. This also reduces the burden on the regular expression.

Thanks for the report! I'll leave this open for others to find until I have a chance to fix the issue, but it could be a long time.

OverZealous added a commit that referenced this issue Aug 9, 2018
- This doesn't actually fix the issue, but provides some basic code that can be used to verify the issue is fixed in the future.
@OverZealous
Copy link
Owner

Note to self: I think this is directly related to issue #20, which also featured a leading ** pattern.

jgonggrijp added a commit to UUDigitalHumanitieslab/readit-interface that referenced this issue Aug 9, 2018
@jgonggrijp
Copy link
Contributor Author

Thanks for you quick response and suggestions.

At first I didn't want to use the more specific prefixes you're suggesting, because the input html is generated from a template in which the node_modules prefix is taken from a configuration file. The prefix could be anything, for example /node_modules but also ../node_modules or vendor. '**/jquery/**' is the only glob pattern that matches all scenarios.

Still, I reached a compromise that kind of works. I changed the glob pattern to **/node_modules/jquery/**. This rules out the vendor prefix but not /node_modules or ../node_modules. This yields correctly quoted replacements with /node_modules while still giving me incorrect quotes with ../node_modules.

At this point, I discovered something. If I comment out the second pattern in lib/utils.js (script tag with src attribute without quotes) and use ../node_modules as prefix, cdnizer does not replace the links anymore. After having read #20, I understand that both script patterns in utils.js probably match my script tags (the second yielding a quoted url), but minimatch only matches my glob pattern to the relative prefix if it is quoted (so it matches "../node_modules/jquery/jquery.js" but not ../node_modules/jquery/jquery.js. The first quote is lost, so it doesn't reappear in the cdnized html, but the second quote is treated as the last character of the file name. Hence, a src attribute that starts without a quote but ends with a quote.

As it stands, I'm better off using /**/jquery/** as you suggested. Relative urls don't get replaced, but at least they remain properly quoted and I can work with absolute urls that don't contain node_modules.

I can think of a workaround for this scenario: after passing the url to minimatch, strip quotes from the ${filename} and the ${filenameMin} (they shouldn't be there anyway). You end with an unquoted src attribute while the input was quoted, but at least the html isn't broken. If you want, I can send a pull request that implements this.

@OverZealous
Copy link
Owner

I'm not sure I'd be OK always outputting a modification without quotes. The regexes a supposed to preserve the existing quotes as a best as they can. There's clearly some quirks with this library, it's definitely not a "smart" library (as it doesn't parse the HTML at all).

Honestly, I'd hate to see you put in the time, it could be weeks before I have any free time to dig into this further. So I don't want to promise I'll accept or even get time to review any PRs in the near future.

@jgonggrijp
Copy link
Contributor Author

If it puts your mind at ease: the workaround that I'm proposing doesn't always cause cdnizer to strip quotes in the replacement. Rather, it only does so in the pathological case that this issue ticket is about. I think that roughly the following is currently happening.

  1. The first pattern in utils.js matches my script tag: <script src="../node_modules/jquery/dist/jquery.js"></script> --> pre = '<script src="', url = '../node_modules/jquery/dist/jquery.js', post = '"></script>'.
  2. Minimatch doesn't match globs starting with ** to relative urls, so no replacement is done.
  3. The second pattern in utils.js matches my script tag, too: pre = '<script src=', url = '"../node_modules/jquery/dist/jquery.js"', post = '></script>'.
  4. For some mysterious reason, minimatch does match globs starting with ** to paths starting with a quote.
  5. Cdnizer splits the url on the last slash, obtaining jquery.js" as the ${filename} and jquery.min.js" as the ${filenameMin}.
  6. Cdnizer replaces pre + 'https://cdnjs.cloudflare.com/ajax/libs/${package}/${version}/${filenameMin}' + post, obtaining <script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>.

Now, html attribute values logically shouldn't ever contain quotes, since quotes are reserved for delimiting them. If you find a trailing quote in step 5, then the previous steps must have gone wrong as I described. So you can safely strip that trailing quote.

Another option would be to prevent step 3 from happening. You can stop the pattern from matching quoted urls by adjusting the middle group. Right now, that group is ([\s\S]+?), essentially consuming anything until the last group is matched. You could change that into ([^\s"'>]+), which I'd argue is more correct. This solution fixes the problem in a strictly more correct way, but prevents replacements in my use case with relative paths and might break some other use cases.

Either way, my pull request would likely affect only one or two lines of code. Reviewing that would take less of your time than this discussion has already taken. I'll wait for your green light before doing anything, though.

@OverZealous
Copy link
Owner

I like your solution to fixing the regex, which is definitely a bit too aggressive. Thanks for taking the time to look into this, I'd definitely accept the PR. The only thing left is I'm internally debating if this is a breaking change or not. Sadly, I think it is, even though it's really a bug fix, but someone could, theoretically, be relying on that aggressive regex.

I understand that the fix is simple, but I have to decide whether this would be a breaking change or not, and ensure that there's tests in place, etc. Then I have to deploy it. (It's usually rather quick, but I have a full time job, a family, etc. So time is always at a premium.)

Honestly, I'd rather it take longer to discuss a fix than review it! It means that the problem was better understood, and the solution wasn't rushed. Especially with a library like this, where I only touch it once in a while, so it always takes me some time to remember how everything works.

@jgonggrijp
Copy link
Contributor Author

I'll patiently wait for your internal debate to come to a conclusion.

In the meanwhile: do I understand correctly that you prefer the regex adjustment over the quote stripping?

@OverZealous
Copy link
Owner

Oh, sorry, I meant that I'll totally accept a PR for fixing the regex.

The only decision then is whether to deploy it as a breaking (major) change or bugfix (minor change), but I'll probably just roll it out as a breaking change on both repos (cdnizer/gulp-cdnizer) to avoid any headaches.

@jgonggrijp
Copy link
Contributor Author

I think some of the other regexes could be a bit stricter, too. Should I submit fixes for all regexes, given that you'll bring out a major release anyway?

@OverZealous
Copy link
Owner

Sure, that would be great, thanks!

@nemoDreamer
Copy link

Hi there,
has there been a resolution to this issue?
I'm currently using cdnizer via webpack-s3-plugin, and this issue is popping up in the generated HTML script/style tags.
Can't apply the workaround, since this is internal to the plug-in.
Will try to workaround by fiddling Webpack's output.publicPath.

@jgonggrijp
Copy link
Contributor Author

I think I forgot about this. I'll put it back on my to-do list, but I can't currently give any prediction on when I'll be able to submit the PR as agreed.

jgonggrijp added a commit to jgonggrijp/cdnizer that referenced this issue Jun 26, 2022
jgonggrijp added a commit to jgonggrijp/cdnizer that referenced this issue Jun 26, 2022
At first sight, the failing test seems to suggest that every odd
<script> has a quote stripped while every even <script> does not. This
order change reveals that it is not the order, but the content of the
src= attribute that influences the behavior. The URLs from cdnjs have
a quote stripped, the URLs from jsdelivr do not.
jgonggrijp added a commit to jgonggrijp/cdnizer that referenced this issue Jun 26, 2022
jgonggrijp added a commit to jgonggrijp/cdnizer that referenced this issue Jun 26, 2022
@jgonggrijp
Copy link
Contributor Author

I just debugged the issue in detail and found that the cause is a bit different from what I thought. In the index.js, an outer loop goes over all patterns. In the inner loop (contents.replace), all matches of the pattern are replaced. Since the patterns were not mutually exclusive (at least not until the fix I'm about to submit a PR for), sometimes the same URL is replaced twice with alternative patterns (first with the quoted version and then with the unquoted version). The second iteration, which shouldn't happen, causes the juggling with the quotes.

@OverZealous
Copy link
Owner

Closed by #40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants