saveWorkbook() creates "corrupted" Excel file when autoFilter is used #127

JMPivette · 2020-12-14T11:19:13Z

Same bug as:
awalker89/openxlsx#524

Describe the bug
If autoFilter is used inside an Excel file, saveWorkbook() will create a "corrupted" file.

To Reproduce

1. Use loadWorkbook() on this file: auto_filter_errors.xlsx
2. Save the workbook using saveWorkbook()
3. Open the created file with Excel

Excel gives the following error:
We found a problem with some content in ’filter_output.xlsx’. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Looking at the logs:
<removedParts summary="Following is a list of removed parts:"><removedPart>Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Xml parsing error Line 1, column 1518.</removedPart>

Desktop :

R: [4.0.2]
Version [4.2.3]

JMPivette · 2020-12-14T11:29:17Z

I just compared the input and output xml files and there are missing end-tags at the end of the XML:

Input file:

	<autoFilter ref="A1:C1">
		<sortState ref="A2:C5">
			<sortCondition ref="B1"/>
		</sortState>
	</autoFilter>
        ...

Output file:

	<autoFilter ref="A1:C1">
		<sortState ref="A2:C5">
			<sortCondition ref="B1"/>
                         ....

bsleik · 2020-12-14T15:37:11Z

Thanks for reopening this issue under this repository and finding the root cause.

bsleik · 2020-12-14T17:35:32Z

Looks like the missing end tags occur any time additional tags are nested inside autoFilter, not just for sorting. Another case I just came across: <autoFilter ref="A1:IZ2096"><filterColumn colId="0"><filter val="202065"/>

JanMarvin · 2020-12-20T17:59:51Z

Since I was under the impression that this should ~~not~~ be fairly easy, I gave it a go.

I have added an addClosing() function that checks for unclosed XML tags in a string and returns the closing tags in the correct order. As shown in my branch, this should fix the issue at hand. (At least my output contains the correct closing tags 😄.)

Even though it works, it is merely a hack. IMHO the correct fix should happen when the autoFilter content is read and/or validated. Also please note that I am not familiar with this area of openxlsx nor the entire workbook. If you need this, you might use this as a starting point.
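
To make the idea concrete, here is a minimal C++ sketch of what "find unclosed tags and return the closing tags in the correct order" can look like. It is not the addClosing() from the branch: the name missing_closing_tags() is hypothetical, and the sketch assumes the input has no comments or CDATA sections and no '>' inside attribute values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not the addClosing() from the branch): returns the
// closing tags needed to balance `xml`, innermost element first.
// Assumes no comments/CDATA and no '>' inside attribute values.
std::string missing_closing_tags(const std::string& xml) {
  std::vector<std::string> open;  // stack of currently open element names
  std::size_t pos = 0;
  while ((pos = xml.find('<', pos)) != std::string::npos) {
    std::size_t end = xml.find('>', pos);
    if (end == std::string::npos) break;  // truncated tag: stop scanning

    const char first = xml[pos + 1];
    if (first == '/') {
      // End tag: it closes the most recently opened element.
      if (!open.empty()) open.pop_back();
    } else if (xml[end - 1] != '/' && first != '?' && first != '!') {
      // Start tag that is not an empty-element tag, declaration, or comment.
      std::size_t name_end = xml.find_first_of(" \t\r\n/>", pos + 1);
      // A delimiter must exist because '>' was found at `end`.
      assert(name_end != std::string::npos && name_end <= end);
      open.push_back(xml.substr(pos + 1, name_end - (pos + 1)));
    }
    pos = end + 1;
  }

  // Emit closing tags in reverse order of opening.
  std::string closing;
  for (auto it = open.rbegin(); it != open.rend(); ++it)
    closing += "</" + *it + ">";
  return closing;
}
```

For the truncated output reported above, `<autoFilter ref="A1:C1"><sortState ref="A2:C5"><sortCondition ref="B1"/>`, this returns `</sortState></autoFilter>`. Appending that string repairs the symptom only; it does not change how the element is read.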

JanMarvin · 2020-12-20T18:56:41Z

I'm not sure if multiple autofilters are possible. If this is possible, further tweaking might be required, e.g. splitting and inserting before a second <autoFilter.

JanMarvin · 2020-12-20T20:06:44Z

I assume that the issue is here: getChildlessNode() searches for "/>" and stops at the first one it finds. In the example above it therefore stops reading at <sortCondition ref="B1"/>. That works with what openxlsx itself writes, which is something like <autoFilter ref="A1:C1" />, but it fails with this specific Excel file.

openxlsx/src/load_workbook.cpp

Lines 900 to 908 in 7cb0ac8

    
    std::string tagEnd = "/>";
    while(1){
      pos = xml.find(tag, pos+1);
      if(pos == std::string::npos)
        break;
      endPos = xml.find(tagEnd, pos+k);
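
The truncation can be reproduced in isolation. The sketch below is a simplified stand-in for the search logic in the excerpt, not the real getChildlessNode(); the function name first_self_closing_span() is made up for illustration. It takes everything from the tag up to the first "/>" that follows it.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the quoted search logic (NOT the real
// getChildlessNode()): return the text from `tag` up to and including
// the first "/>" found after it, or "" if either is missing.
std::string first_self_closing_span(const std::string& xml,
                                    const std::string& tag) {
  assert(!tag.empty());
  std::size_t pos = xml.find(tag);
  if (pos == std::string::npos) return "";

  // The first "/>" after the tag may belong to a child element.
  std::size_t end_pos = xml.find("/>", pos + tag.size());
  if (end_pos == std::string::npos) return "";

  return xml.substr(pos, end_pos - pos + 2);  // +2 keeps the "/>" itself
}
```

With `<autoFilter ref="A1:C1"/>` the whole element comes back. With the sorted example from this issue, the result ends at `<sortCondition ref="B1"/>`, so `</sortState>` and `</autoFilter>` are lost, which matches the output file shown earlier.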

JMPivette · 2020-12-21T13:09:18Z

Thanks @JanMarvin for the update.

I am not an expert in C++ or Excel, but I tried to understand the logic in load_workbook.cpp a bit better.

Two different functions are used to parse XML elements:

getNodes() is used for an element with a start-tag and an end-tag (example: <section></section>)
getChildlessNode() is used for an empty-element tag (example: <section />)

Depending on the element name, one of these functions is used.
For autoFilter, getChildlessNode() is used:

openxlsx/src/load_workbook.cpp

Lines 220 to 222 in 7cb0ac8

    
    node_xml = getChildlessNode(xml_post, "<autoFilter ");
    if(node_xml.size() > 0)
      this_worksheet.field("autoFilter") = node_xml;

This is fine when autoFilter is applied without sorting or filtering:

   <autoFilter ref="A1:C1" xr:uid="{8A035462-3C98-164D-B762-20AECBF0D619}"/>

But it no longer works once we sort or filter:

	<autoFilter ref="A1:C3" xr:uid="{8A035462-3C98-164D-B762-20AECBF0D619}">
		<filterColumn colId="0">
			<filters>
				<filter val="2"/>
			</filters>
		</filterColumn>
		<sortState ref="A2:C3" xmlns:xlrd2="http://schemas.microsoft.com/office/spreadsheetml/2017/richdata2">
			<sortCondition ref="A1:A3"/>
		</sortState>
	</autoFilter>

I don't know whether other elements in the XML files can also appear sometimes as an empty-element tag and sometimes with a start-tag and an end-tag.
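
One way to handle both forms at read time is to look at how the start tag ends before deciding where the element stops. The following is only a sketch under assumptions, not what openxlsx does: read_element() is a hypothetical helper, and it assumes the element is never nested inside itself and that no '>' appears inside attribute values.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of openxlsx): return the first `name`
// element in `xml`, whether written as <name .../> or <name ...>...</name>.
// Assumes no self-nesting and no '>' inside attribute values.
std::string read_element(const std::string& xml, const std::string& name) {
  assert(!name.empty());
  const std::string open = "<" + name;
  const std::string delims = " \t\r\n/>";

  // Find "<name" followed by a delimiter, so "<nameX" is not matched.
  std::size_t pos = xml.find(open);
  while (pos != std::string::npos) {
    std::size_t after = pos + open.size();
    if (after < xml.size() && delims.find(xml[after]) != std::string::npos)
      break;
    pos = xml.find(open, pos + 1);
  }
  if (pos == std::string::npos) return "";

  std::size_t gt = xml.find('>', pos);  // end of the start tag
  if (gt == std::string::npos) return "";

  // Empty-element tag: the start tag is the whole element.
  if (xml[gt - 1] == '/') return xml.substr(pos, gt - pos + 1);

  // Otherwise read through the matching end tag.
  const std::string close = "</" + name + ">";
  std::size_t end = xml.find(close, gt);
  if (end == std::string::npos) return "";
  return xml.substr(pos, end - pos + close.size());
}
```

Called with "autoFilter", this returns the complete element for both the childless form and the filtered or sorted form shown above, so nothing needs to be patched in when the workbook is saved.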

JMPivette · 2020-12-21T13:22:51Z

I didn't see you were already working on the issue: #130 😃

JanMarvin · 2020-12-21T13:31:04Z

hehe, the linking did not work as expected :)

JanMarvin added a commit to JanMarvin/openxlsx that referenced this issue Dec 20, 2020

close autoFilter. fixes ycphs#127

1aa33e2

ycphs closed this as completed in 7e0a8d2 Dec 23, 2020
