Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

saveWorbook() creates "corrupted" excel file when autoFilter is used #127

Closed
JMPivette opened this issue Dec 14, 2020 · 9 comments
Closed

Comments

@JMPivette
Copy link
Contributor

Same bug as:
awalker89/openxlsx#524

Describe the bug
If autoFilter is use inside an Excel file, saveWorkbook() will create a "corrupted" file.

To Reproduce

  1. Use loadWorkbook() on this file:
    auto_filter_errors.xlsx

  2. Save the workbook using saveWorkbook()

  3. Open the created file with Excel

  4. Excel gives the following error:
    We found a problem with some content in ’filter_output.xlsx’. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
    Looking at the logs:
    <removedParts summary="Following is a list of removed parts:"><removedPart>Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Xml parsing error Line 1, column 1518.</removedPart>

Desktop :

  • R: [4.0.2]
  • Version [4.2.3]
@JMPivette
Copy link
Contributor Author

I just compared the input and output xml files and there are missing end-tags at the end of the XML:

  • Input file:
	<autoFilter ref="A1:C1">
		<sortState ref="A2:C5">
			<sortCondition ref="B1"/>
		</sortState>
	</autoFilter>
        ...
  • Output file:
	<autoFilter ref="A1:C1">
		<sortState ref="A2:C5">
			<sortCondition ref="B1"/>
                         ....

@bsleik
Copy link

bsleik commented Dec 14, 2020

Thanks for reopening this issue under this repository and finding the root cause.

@bsleik
Copy link

bsleik commented Dec 14, 2020

Look like it happens (missing end tags) anytime there are additional tags added to the autofilter - not just the sort. Another case I just came across: "<autoFilter ref="A1:IZ2096"><filterColumn colId="0"><filter val="202065"/>"

JanMarvin added a commit to JanMarvin/openxlsx that referenced this issue Dec 20, 2020
@JanMarvin
Copy link
Collaborator

JanMarvin commented Dec 20, 2020

Since I was under the impression that this should not be fairly easy, I gave it a go.

I have added a addClosing(), this function checks for unclosed xml tags in a string and returns the closing tags in the correct order. As shown in my branch this should fix the issue at hand. (At least my output contains the correct closing tags 😄 .)

Even though it works, it is merely a hack. IMHO the correct fix should happen when the autoFilter content is read and/or validated. Also please note that I am not familiar with this area of openxlsx nor the entire workbook . If you need this, you might use this as a starting point.

@JanMarvin
Copy link
Collaborator

I'm not sure if multiple autofilters are possible. If this is possible, further tweaking might be required, e.g. splitting and inserting before a second <autoFilter.

@JanMarvin
Copy link
Collaborator

I assume that the issue is here: getChildlessNode() searches for and stops at />. Therefore in the example above it stops reading at <sortCondition ref="B1"/>. Therefore it works with the openxlsx implementation which writes something like <autoFilter ref="A1:C1" />, but fails with this specific excel file.

std::string tagEnd = "/>";
while(1){
pos = xml.find(tag, pos+1);
if(pos == std::string::npos)
break;
endPos = xml.find(tagEnd, pos+k);

@JMPivette
Copy link
Contributor Author

JMPivette commented Dec 21, 2020

Thanks @JanMarvin for the update.

I am not an expert in C++ or Excel but I tried to understand a bit more the logic behind load_workbook.cpp:

2 different functions are used to parse XML elements:

  • getNodes() is used when looking at classic start-tag and end-tag (example <section></section>)
  • getChildlessNode() is used when looking at empty-element-tag (example <section />)

Depending on the element name, one of this function is used.
For autoFilter, getChildlessNode() is used:

node_xml = getChildlessNode(xml_post, "<autoFilter ");
if(node_xml.size() > 0)
this_worksheet.field("autoFilter") = node_xml;

Which is OK if autoFilter is applied with no sorting or filtering:

   <autoFilter ref="A1:C1" xr:uid="{8A035462-3C98-164D-B762-20AECBF0D619}"/>

But not anymore if we sort or filter:

	<autoFilter ref="A1:C3" xr:uid="{8A035462-3C98-164D-B762-20AECBF0D619}">
		<filterColumn colId="0">
			<filters>
				<filter val="2"/>
			</filters>
		</filterColumn>
		<sortState ref="A2:C3" xmlns:xlrd2="http://schemas.microsoft.com/office/spreadsheetml/2017/richdata2">
			<sortCondition ref="A1:A3"/>
		</sortState>
	</autoFilter>

I don't know if other elements in the xml files can sometimes use empty-element-tag and sometimes start-tag and end-tag

@JMPivette
Copy link
Contributor Author

JMPivette commented Dec 21, 2020

I didn't see you were already working on the issue: #130 😃

@JanMarvin
Copy link
Collaborator

hehe, the linking did not work as expected :)

@ycphs ycphs closed this as completed in 7e0a8d2 Dec 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants