Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The content type of page directive in jsp code is ignored. #22

Closed
mvcbot opened this issue Dec 7, 2018 · 11 comments · Fixed by #170
Closed

The content type of page directive in jsp code is ignored. #22

mvcbot opened this issue Dec 7, 2018 · 11 comments · Fixed by #170
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed
Milestone

Comments

@mvcbot
Copy link

mvcbot commented Dec 7, 2018

Issue by mvcbot
Tuesday Jan 05, 2016 at 02:06 GMT
Originally opened as mvc-spec/ozark#83


Original issue OZARK-67 created by kogawa88:

When fowarding the following jsp page via Ozark, the response always writes the body content with UTF-8 encoding.

<%@ page contentType="text/html; charset=Shift_JIS" pageEncoding="UTF-8" session="false" %>
<html>
..... (snip) .....
</html>

This jsp returns the response with Shift_JIS charset encoding.
The issue can be avoided using JAX-RS api, but it means duplicate declaration of the content type concurrently.
|| How to declare page uri || ||
| @view annotation | Declare @produces("text/html; charset=Shift_JIS"). |
| Return of uri string | Declare @produces("text/html; charset=Shift_JIS"). |
| Return of JAX-RS response | Set the MediaType.TEXTHTML_TYPE with ShiftJIS charset to the response. |

There is no problem if using servlet based MVC framework (like Struts).

@mvcbot mvcbot added the bug Something isn't working label Dec 7, 2018
@mvcbot
Copy link
Author

mvcbot commented Dec 7, 2018

Comment by mvcbot
Saturday Mar 12, 2016 at 15:23 GMT


Comment by lefloh:

Thanks for reporting. I started a discussion in the (https://java.net/projects/mvc-spec/lists/users/archive/2016-03/message/15).

@mvcbot
Copy link
Author

mvcbot commented Dec 7, 2018

Comment by chkal
Monday Feb 26, 2018 at 20:01 GMT


A few thoughts on this one:

The pull request mvc-spec/mvc-spec#132 will change how view engines write the result of processing the view to the client. So handling of content type and charset is much more explicit.

IMHO the content type and encoding negotiation should be done by JAX-RS. That's how it works for all other JAX-RS responses.

Perhaps we could simply prevent that the JSP directives do any harm by modifying HttpServletResponse.setContentType() in our wrapper so that calling the method actually doesn't modify the content type? Perhaps it could emit a warning?

@mvcbot
Copy link
Author

mvcbot commented Dec 7, 2018

Comment by chkal
Monday Feb 26, 2018 at 20:02 GMT


@kogawa88: Did you file this issue back then?

@chkal chkal added this to the 1.0.0-m06 milestone Jan 19, 2019
@chkal chkal added the help wanted Extra attention is needed label Jan 19, 2019
@lefloh lefloh self-assigned this Jan 20, 2019
@lefloh
Copy link
Contributor

lefloh commented Feb 3, 2019

To sum up the discussion as the old mailing list is gone: We have two ways to specify the response encoding:

  1. In a controller: @Produces("text/html; charset=utf-8")
  2. In a view: <%@ page contentType="text/html; charset=utf-8" %>

The spec only states:

Moreover, the default media type for a response is assumed to be text/html, but otherwise can be declared using @Produces just like in JAX-RS.

I'm with Christian and would also prefer to specify the encoding in a controller. But other developers might prefer the other way.

Instead of defining strict rules which way wins I'd propose that view engines should ensure encoding works or otherwise log a warning:

  • Everything is fine if a developer uses only one way or defines the same encoding in controller and view
  • If a developer defines conflicting encodings Krazo should log a warning
  • If the view engine cannot use the defined encoding Krazo should log a warning
  • If a developer defines nothing UTF-8 is used

This means that there could be view engines which support both ways but also some which support only one way.

I'm not too strong on this one if anyone comes up with a better solution.

@chkal
Copy link
Contributor

chkal commented Feb 10, 2019

Hi @lefloh,

thanks for the great summary. I just saw that the original issue is over 3 years old. So we should finally try to fix this. :-)

Let my try to explain what my understanding of the response media type handling in plain JAX-RS is.

  1. You can use @Produces to tell JAX-RS that your resource method is able to produce a certain media type. The primary role of this annotation is to help JAX-RS to find the resource method for the media type the client is requesting via the Accept HTTP header. A minor side effect of this annotation is that the specified media type is used as a default if none of the following components change it. BTW: I'm actually not sure if the media type parameters (like charset) are actually considered for the matching algorithm. AFAIK they don't play any role if media types are compared. So setting media type parameters in @Produces doesn't make much sense in plain JAX-RS.
  2. The resource can use response.type(MediaType) to explicitly set the media type for the response. This is important, because the media type is used to find the correct MessageBodyWriter (MBW).
  3. A ContainerResponseFilter could change the media type by calling context.setEntity(). But I guess that is a very rare case.
  4. JAX-RS will select a MBW for serializing the entity to the desired media type. A MBW will get the media type as a parameter for the writeTo() method AND (at least Jersey) will already prepare a Content-Type HTTP response header in the provided mutable header map. However, a MBW could use a different content type and/or encoding for the response. This will work if the MBW also updates the HTTP response header accordingly. In the worst case, the MBW could even produce a completely different content type which the client never requested, but of course this isn't something a MBW should do.

So for JAX-RS there are many places where the response media type can be set. And actually it is very well-defined who wins, as all these steps are executed in order and every stage can change the media type set by the previous stage.

What does this mean for MVC? Perhaps we should simply do it the same way. And actually I think that we are already doing it almost exactly like that.

  1. Users can set the media type (and the encoding) explicitly via @Produces. The spec states that text/html is the default, so most users won't need to do that. I'm not sure if users should set the response encoding via the annotation at all, because it would be a media type parameter (like text/html; charset=utf8) which is IMO not considered for the matching and should therefore be specified at a later stage.
  2. Controllers could return a Response with a custom media type (which would overwrite the default) and the view name as the entity, but I don't think that developers will want to do this. However, if they do it, it would work just like in the JAX-RS case.
  3. ContainerResponseFilter implementations could also change the media type just as in the JAX-RS case.
  4. Finally, there is the MBW part, which is implemented by Krazo and simply delegates the real work to the ViewEngine. A view engine can act exactly like an MBW in JAX-RS. It could change the media type and/or encoding if required, but the view engine MUST update the Content-Type header correctly. And I think it is fine to allow view engines to do it, if there is a good reason. If the view technology allows the developer to configure the encoding or the media type as part of the template, we should allow the view engine to set it accordingly.

I think Krazo is currently handling 1-3 perfectly fine. And that's great. The only thing we have to ensure is that 4 works correctly. And IMO there are two cases.

  • All the 3rd party view engines already handle this correctly. All of them use this method to get the encoding from the media type. If there is no encoding, it falls back to UTF-8 AND sets the Content-Type header correctly to reflect this.
  • The builtin support for JSP/Facelets uses Servlet forwards which makes things a bit more complicated. JSP for example allows to set the media type and the encoding via JSP tags. AFAIK the tags just delegate this to HttpServletResponse.setContentType() and HttpServletResponse.setCharacterEncoding(). But IIRC we are creating a custom response and pass this one down with the Servlet forward. Therefore, calling response.setContentType() will most likely be simply ignored (as it is not implemented by the wrapper). What we COULD do instead is to intercept calls to setContentType() and setCharacterEncoding() and do what a standard JAX-RS MBW would do: Change the header map provided to the writeTo() method to change the headers sent to the client. My guess is, that this would simply work and fix this issue.

To sum up. I don't think that we need to change anything on the specification level. And we don't need to define where setting the media type and encoding is allowed and where it is not. We just need to ensure that the last component in the chain (which is the MBW/ViewEngine) correctly applies the value provided by the previous components AND also allow to change it even in the case of Servlet forwards (which could be more or less).

Sorry for the long text. Any thoughts? :-)

@lefloh
Copy link
Contributor

lefloh commented Feb 17, 2019

Thanks for your detailed response @chkal! Thought about this for a while and it absolutely makes sense.
I'll try to fix it like described the next days.

@chkal chkal removed this from the 1.0.0-m06 milestone Oct 23, 2019
@lefloh
Copy link
Contributor

lefloh commented Apr 4, 2020

Hi @chkal,

Intercepting setContentType() and setCharacterEncoding() alone does not work because we've already initialised a PrintWriter using an OutputStreamWriter with the charset from the Content-Type header.
#170 initialises the PrintWriter lazy with the character encoding from the page directive and updates the Content-Type header. Not the prettiest solution but fixes the original issue.
But how should the ViewEngine know if there's a good reason to update the header?
I'm doing this only if the Content-Type is not text/html which is used if nothing is specified in the JSP. Otherwise @Produces would never work and some Tests like MediaTypeTest are broken. Therefore I also removed the conflicting page directive from hello.jsp to make ProducesIT work again.

What still does not work is using a JSP without a Content-Type attribute in the page directive. At least not without encoding issues. The reason is, that the JSP implementation assumes that a JSP is written in ISO-8859-1 when no Content-Type page attribute is found. So the encoding is broken if the file encoding is not ISO-8859-1. We could force UTF-8 here as it is our default but I don't know if it's worth it.

Thoughts?

@chkal
Copy link
Contributor

chkal commented Apr 11, 2020

Hey @lefloh. Sorry for the delayed response. Thank you so much for spending time on this issue. I'll try to find some time at the weekend to have a deeper look at this. I guess I need to dig a bit deeper into the topic to give qualified thoughts. :-)

@lefloh
Copy link
Contributor

lefloh commented Apr 11, 2020

You were waiting for more than one year for my response so no need to say sorry ;-)

@chkal
Copy link
Contributor

chkal commented Apr 13, 2020

Hey @lefloh,

I just had a deeper look at your changes. I think you are definitely on the right track. 🍻

A few thoughts regarding your questions:

But how should the ViewEngine know if there's a good reason to update the header?

Creating the PrintWriter lazily is definitely a great idea. This allows us to handle calls to setContentType() before getWriter() is called. But maybe we should not change the header map directly in setContentType() but later, just before we create the PrintWriter? Wouldn't this fix that problem?

Maybe we can store the "current" charset in a field in MvcHttpServletResponse and use it like this:

  • The field get initialized with the charset which we find in the Content-Type header in the JAX-RS header map. If we don't find a charset this way, we can either fallback to UTF-8 or ISO-8859-1.
  • If someone calls setContentType() and the content type contains a charset, we update the field.
  • When the JSP engine (or somebody else) calls getWriter(), we perform the following steps:
    • We build the correct content type using the "current" charset from the class field and update the JAX-RS header map.
    • Then we create a PrintWriter using this charset and return it to the caller.

If my understanding is correct and if the JSP engines always uses getWriter() this could perhaps work. Or am I missing something?

@lefloh
Copy link
Contributor

lefloh commented Apr 19, 2020

Hi @chkal,

thanks for your suggestions! I updated the PR. Is this how you expected it?
I'm using setCharacterEncoding to store the current charset or do you want to duplicate this?

ProducesIT is now working although hello.jsp defines a different content type. So now character encoding should work but it's not possible to set another content type via the page directive. I'm fine with that.

@chkal chkal added this to the 1.1.0 milestone May 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants