Document 001: lists in paragraphs (in XML) #341

Open
strogonoff opened this issue Feb 7, 2025 · 23 comments · May be fixed by #344

@strogonoff
Contributor

strogonoff commented Feb 7, 2025

In document 001’s XML (can be found in this artifact), the paragraph with ID of _c0ef23ff-7174-e24b-2402-0b7829592fc7 contains a list within. It seems like it may be a bug according to @opoudjis.
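A minimal sketch of how the reported condition could be checked mechanically, assuming the XML is available as a string. The helper names `list_inside?` and `paragraphs_containing_lists` are hypothetical and not part of Metanorma or Firelight; the check compares local element names only.

```ruby
require "rexml/document"

# True if the element has an <ol> or <ul> at any depth below it.
def list_inside?(element)
  element.elements.to_a.any? do |child|
    %w[ol ul].include?(child.name) || list_inside?(child)
  end
end

# Returns the ids of every <p> that contains a list (hypothetical helper).
def paragraphs_containing_lists(xml)
  doc = REXML::Document.new(xml)
  offenders = []
  walk = lambda do |el|
    offenders << el.attributes["id"] if el.name == "p" && list_inside?(el)
    el.elements.each { |child| walk.call(child) }
  end
  walk.call(doc.root)
  offenders
end

sample = '<doc><p id="_a">text<ul><li><p id="_b">item</p></li></ul></p></doc>'
p paragraphs_containing_lists(sample) # => ["_a"]
```

Running it over the presentation XML would give a list of paragraph ids that could be pasted into a bug report.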

@strogonoff
Contributor Author

strogonoff commented Feb 7, 2025

(This causes a breakage in Firelight build, which does not allow non-flow content in paragraphs, and while it handles embedded notes and footnotes it does not handle lists [yet?]. Related: metanorma/firelight#46)

@strogonoff strogonoff changed the title 001: Lists in paragraphs Document 001: lists in paragraphs (in XML) Feb 7, 2025
@ReesePlews
Contributor

hello @strogonoff thank you for pointing this out. i am not familiar with the paragraph id. if you can tell me where it is (which clause and perhaps some text) i will check it. it seems like it's an anomaly so it can probably be corrected in the document content. thank you.

@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Feb 8, 2025
@opoudjis opoudjis added the bug Something isn't working label Feb 8, 2025
@ReesePlews
Contributor

ReesePlews commented Feb 8, 2025

hello @strogonoff and @opoudjis i looked in the document.presentation.xml for this id _c0ef23ff-7174-e24b-2402-0b7829592fc7 which shows up here based on a search of the blue highlighted text (copied from the .xml file). i believe this to be the location.

this is an autogenerated UML table. the source .adoc code for this is coming out of Enterprise Architect (EA) after being entered in the "Notes property" diagram. an .xmi file is exported from EA, lutaml reads it, and the contents are autogenerated into these tables.

Image

i don't have the current model. if you can describe what should be changed, i will ask the client to update the .adoc code in the model. i indicated to the client that complex adoc "structures" are not supported. if this is a complex adoc "structure" which mn does not support, the text in the EA model can be modified.

please let us know. thank you.

strogonoff added a commit to metanorma/firelight that referenced this issue Feb 10, 2025
@strogonoff
Contributor Author

@ReesePlews I worked around this problem. This is primarily a bug for Nick, because according to him Metanorma should handle lists in other ways. However, I see that you merged some changes that also fixed the build. Maybe you removed that use of lists? I’m not sure. Either way, checking the build now.

@ReesePlews
Contributor

thank you @strogonoff ; there could be an issue in the model, which is where the .adoc is stored for these types of "autogenerated tables" ; if changing that .adoc code is less of a problem for @opoudjis we should consider that, weighing a mn code fix and testing against a simple modification of the content .adoc code. let me know what you both feel is better in this case. thank you.

@strogonoff
Contributor Author

I believe we have no list-related issue now.

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Metanorma Feb 11, 2025
@strogonoff
Contributor Author

Not sure if @opoudjis wanted to keep it open for his purposes. Either way, Firelight doesn’t break on it.

@strogonoff strogonoff reopened this Feb 11, 2025
@github-project-automation github-project-automation bot moved this from ✅ Done to 🏗 In progress in Metanorma Feb 11, 2025
@ReesePlews
Contributor

hello @strogonoff i checked the output in a local build of doc01 (sources/001-v5) but the list from the PDF is still an ordered list of a, b, c and then an unordered list of bullets, identical to the first image.

i agree we should keep this open to see if this fix will be in the forthcoming metanorma release. thank you.

@opoudjis opoudjis moved this from 🏗 In progress to 🆕 New in Metanorma Feb 12, 2025
@opoudjis
Contributor

@strogonoff @ReesePlews

As I have REPEATEDLY said in the past, I cannot grep for screencaptures, and I do not have the time to work through the convoluted processes you have in place for generating Asciidoc. I expect you to be more helpful than this.

Based on the underspecified indication above, I gather the source in the XMI document is:

 <properties documentation="密集市街地整備法第32条第1項の規定による防災街区整備地区計画。&#xA;(都市計画法第12条の4第1>項第2号)&#xA;防災街区整備地区計画については、都市計画法第12条の4第2項に定める事項のほか、都市計画に、密集市街地法第32条第2項第1号及び第2号に掲げる事>項を定めるものとするとともに、第3号に掲げる事項を定めるよう努めるものとする。&#xA;&#xA;. 特定建築物地区整備計画&#xA;. 防災街区整備地区整備計画&#xA;. >当該防災街区整備地区計画の目標その他当該区域の整備に関する方針&#xA;&#xA;関連役割「地区整備計画」により、「特定建築物地区整備計画」及び「防災街区整備地
区整備計画」を保持する。&#xA;&#xA;* 属性gml:nameは、都市計画法第12条の4第2項で定める名称(地区計画を識別する名前)とする。多重度は任意となっているが、運用上必須とする。文字列とする。&#xA;* 属性urf:functionは、都市計画法第12条の4第2項に定める地区計画の種類とする。促進区を定める場合、当該地区計画は、>再開発等促進区又は開発整備促進区を定める地区計画となる。コードリスト(&amp;lt;&amp;lt;DistrictPlan_function.xml&amp;gt;&amp;gt;)より選択する。多重度>は任意となっているが、運用上必須とする。&#xA;* 属性urf:locationは、都市計画法第12条の4第2項に定める位置とする。町丁目又は字まで記載する。多重度は任意>となっているが、運用上必須とする。&#xA;* 属性urf:objectivesは、密集市街地整備法第32条第2項第3号に定める当該地区計画の目標とする。&#xA;* 属性urf:policyは、密集市街地整備法第32条第2項第3号に定める当該地区計画の方針とする。&#xA;* 関連役割urf:districtDevelopmentPlanは、防災街区整備地区計画に定められた特
定建築物地区整備計画及び防災&#xA;街区整備地区整備計画とする。&#xA;* 関連役割urf:promotionDistrictは、使用しない。"  ...

THIS should have been pasted into the ticket.

@opoudjis
Contributor

Translation:

|===
a|密集市街地整備法第32条第1項の規定による防災街区整備地区計画。
(都市計画法第12条の4第1>項第2号)
防災街区整備地区計画については、都市計画法第12条の4第2項に定める事項のほか、都市計画に、密集市街地法第32条第2項第1号及び第2号に掲げる事>項を定めるものとするとともに、第3号に掲げる事項を定めるよう努
めるものとする。

. 特定建築物地区整備計画
. 防災街区整備地区整備計画
. >当該防災街区整備地区計画の目標その他当該区域の整備に関する方針


関連役割「地区整備計画」により、「特定建築物地区整備計画」及び「防災街区整備地
区整備計画」を保持する。

* 属性gml:nameは、都市計画法第12条の4第2項で定める名称(地区計画を識別する名前)とする。多重度は任意となっているが、>運用上必須とする。文字列とする。
* 属性urf:functionは、都市計画法第12条の4第2項に定める地区計画の種類とする。促進区を定める場合、当該地区計画は、>>再開発等促進区又は開発整備促進区を定める地区計画となる。コードリスト(&amp;lt;&amp;lt;DistrictPlan_function.xml&amp;gt;&amp;gt;)より選択する。多重度>>は任意となっているが、運用上必須とする。
* 属性urf:locationは、都市計画法第12条の4第2項に定める位置とする。町丁目又は字まで記載する。多重度は任意>>となっているが、運用上必須とする。
* 属性urf:objectivesは、密集市街地整備法第32条第2項第3号に定める当該地区計画の目標とする。
* 属性urf:policy>は、密集市街地整備法第32条第2項第3号に定める当該地区計画の方針とする。
* 関連役割urf:districtDevelopmentPlanは、防災街区整備地区計画に定められた特
定建築物地区整備計画及び防災
街区整備地区整備計画とする。
* 関連役割urf:promotionDistrictは、使用しない。
|===

@opoudjis
Contributor

The XML that is being generated is:

<td valign="top" align="left"><p id="_7d7672c1-cf27-fdf4-a15c-7336a500d5bd">密集市街地整備法第32条第1項の規定による防災街区整備地区計画。 (都市計画法第12条>
の4第1&#x3e;項第2号) 防災街区整備地区計画については、都市計画法第12条の4第2項に定める事項のほか、都市計画に、密集市街地法第32条第2項第1号及び第2号に掲
げる事&#x3e;項を定めるものとするとともに、第3号に掲げる事項を定めるよう努めるものとする。</p>

<ol id="_d2cdd214-2977-d064-fedc-60c43aab53a2" type="alphabet"><li id="_a16bd473-dc88-4cdf-bdaa-d24b7d0d3f2a" label="a"><p id="_bcfcc41b-1ef3-e96c-e3ea-c0230bf80e10">特定建築物地区整備計画</p>
</li>
<li id="_f4c6a086-082d-4ab9-8615-de2775ad16b6" label="b"><p id="_b3353ea8-a2c0-1fe9-d18a-ae43c9a7bd10">防災街区整備地区整備計画</p>
</li>
<li id="_d075636b-de01-42db-9f48-a35d7e9c02a8" label="c"><p id="_2f9b41d1-6719-a59c-d8e0-55dc9406203b">&#x3e;当該防災街区整備地区計画の目標その他当該区域の整備に関する方針</p>
</li>
</ol>

<p id="_fcad7f8a-ebb8-c665-4fe3-08bdfa4c4d3b">関連役割「地区整備計画」により、「特定建築物地区整備計画」及び「防災街区整備地区整備計画」を保持する。</p>

<ul id="_c5e67d73-6263-36d5-96b8-edadb38db301"><li><p id="_c1959215-efc8-c3c7-2a13-170509c091ea">属性gml:nameは、都市計画法第12条の4第2項で定める名称(地区計画を識別する名前)とする。多重度は任意となっているが、&#x3e;運用上必須とする。文字列とする。</p>
</li>
<li><p id="_ec69b98a-4264-228c-1ac2-f39d649f40af">属性urf:functionは、都市計画法第12条の4第2項に定める地区計画の種類とする。促進区を定める場合、当該地>区計画は、&#x3e;&#x3e;再開発等促進区又は開発整備促進区を定める地区計画となる。コードリスト(&#x3c;&#x3c;DistrictPlan_function.xml&#x3e;&#x3e;)より選択する。多重度&#x3e;&#x3e;は任意となっているが、運用上必須とする。</p>
</li>
<li><p id="_3d71d340-0cf3-10a4-d487-11faa7ba1063">属性urf:locationは、都市計画法第12条の4第2項に定める位置とする。町丁目又は字まで記載する。多重度は任>意&#x3e;&#x3e;となっているが、運用上必須とする。</p>
</li>
<li><p id="_7e93bd46-2745-7282-85d2-143feae3647b">属性urf:objectivesは、密集市街地整備法第32条第2項第3号に定める当該地区計画の目標とする。</p>
</li>
<li><p id="_57b10d0e-ed72-c0bf-dcf1-8259a836ce32">属性urf:policy&#x3e;は、密集市街地整備法第32条第2項第3号に定める当該地区計画の方針とする。</p>
</li>
<li><p id="_9d604d2a-dd24-2e94-6f98-ebf1d8716d26">関連役割urf:districtDevelopmentPlanは、防災街区整備地区計画に定められた特定建築物地区整備計画及び防災街区整備地区整備計画とする。</p>
</li>
<li><p id="_f6fe5439-1852-c60f-619e-cc6fb937691d">関連役割urf:promotionDistrictは、使用しない。</p>
</li>
</ul>
</td>

which does not have the nesting claimed for paragraphs.

However, @ReesePlews,

the list from the PDF is still an ordered list of a, b, c and then an unordered list of bullets, identical to the first image

that is EXACTLY what the Asciidoc specifies:

. 特定建築物地区整備計画
. 防災街区整備地区整備計画
. >当該防災街区整備地区計画の目標その他当該区域の整備に関する方針


関連役割「地区整備計画」により、「特定建築物地区整備計画」及び「防災街区整備地
区整備計画」を保持する。

* 属性gml:nameは、都市計画法第12条の4第2項で定める名称(地区計画を識別する名前)とする。多重度は任意となっているが、>運用上必須とする。文字列とする。
* 属性urf:functionは、都市計画法第12条の4第2項に定める地区計画の種類とする。促進区を定める場合、当該地区計画は、>>再開発等促進区又は開発整備促進区を定める地区計画となる。コードリスト(&amp;lt;&amp;lt;DistrictPlan_function.xml&amp;gt;&amp;gt;)より選択する。多重度>>は任意となっているが、運用上必須とする。
* 属性urf:locationは、都市計画法第12条の4第2項に定める位置とする。町丁目又は字まで記載する。多重度は任意>>となっているが、運用上必須とする。
* 属性urf:objectivesは、密集市街地整備法第32条第2項第3号に定める当該地区計画の目標とする。
* 属性urf:policy>は、密集市街地整備法第32条第2項第3号に定める当該地区計画の方針とする。
* 関連役割urf:districtDevelopmentPlanは、防災街区整備地区計画に定められた特
定建築物地区整備計画及び防災
街区整備地区整備計画とする。
* 関連役割urf:promotionDistrictは、使用しない。

So Reese, this is not a bug, this is what was written in the XMI file; and Anton, you or whoever is maintaining metanorma-plugin-lutaml need to provide me with the Asciidoc that is generating whatever XML you are seeing, and for that matter what XML you are seeing. Without adequate data to action a bug report, I will not be actioning bug reports.

@opoudjis opoudjis moved this from 🆕 New to On hold in Metanorma Feb 12, 2025
@opoudjis
Contributor

In document 001’s XML (can be found in this artifact), the paragraph with ID of _c0ef23ff-7174-e24b-2402-0b7829592fc7 contains a list within. It seems like it may be a bug according to @opoudjis.

@strogonoff No idea what to do with that. I need XML I can access.

@ronaldtse
Contributor

@opoudjis I believe @strogonoff meant this "mn" artifact, not the "artifact page" which contains a failed Firelight build of 001:

This is where the XML is.

@opoudjis
Contributor

opoudjis commented Feb 12, 2025

I have confirmed that Anton is seeing the XML with ordered lists nested inside of paragraphs, and I cannot replicate that based on the source Asciidoc. I am generating the document locally, dumping the preprocessor output to disk, so I can see exactly what Asciidoc is being processed. My inclination is to do that universally: this keeps coming up as a debugging requirement as we keep using preprocessors like lutaml.
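A rough sketch of what a universal dump of intermediate Asciidoc might look like, assuming it is switched on by an environment variable. The helper `dump_stage` and the variable name `MN_DEBUG_DUMP` are hypothetical; nothing in this thread says Metanorma provides either, and where such a call would need to sit in the pipeline is not settled here.

```ruby
require "fileutils"

# Hypothetical helper: writes intermediate Asciidoc to <dir>/<stage>.adoc
# when a dump directory is configured (MN_DEBUG_DUMP is an assumed name).
# Returns the path written, or nil when dumping is disabled.
def dump_stage(stage, lines, dir: ENV["MN_DEBUG_DUMP"])
  return nil if dir.nil? || dir.empty?

  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{stage}.adoc")
  File.write(path, Array(lines).join("\n"))
  path
end

# Example call with an explicit directory instead of the environment variable.
dump_stage("after_preprocessing", ["A paragraph.", "", "* an item"], dir: "debug_dump")
```

Naming the file after the processing stage keeps successive dumps comparable with an ordinary diff.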

@strogonoff
Contributor Author

@opoudjis I believe @strogonoff meant this "mn" artifact, not the "artifact page" which contains a failed Firelight build of 001:

This is where the XML is.

There is the mn artifact in the build summary.

I worked around this issue, so it does not break Firelight.

@opoudjis
Contributor

The processing of those XMI tables happens in Lutaml Table processing, which is block processing, so I cannot get to it in preprocessing, as it hasn't happened there yet: I need to grab it at the start of Metanorma processing proper:

def document1(node)
  # Debugging aid: dump the document's source lines to disk.
  File.write("asciidoc.txt", node.document.source_lines.join("\n"))
  init(node)
  ret = to_xml(makexml(node))
  outputs(node, ret) unless node.attr("nodoc") || !node.attr("docfile")
  ret
end

@opoudjis
Contributor

No, that is not catching the results of Lutaml Table processing.

This is EXCEEDINGLY difficult to debug, and I needed this like a hole in the head when I'm in the middle of refactoring. It doesn't matter that Anton has a workaround, this is clearly contaminated Asciidoc, and there will be wrath doled out if I find out that it is not my fault.

One thing that is occurring to me is the monstrously dumb way Asciidoc has been shoved into Enterprise Architect as an XML attribute. An XML attribute, of all things. Its linebreaks are being done as &#xA;.

All it would take is for the code processing that Enterprise Architect XMI to mangle the linebreak, for the Asciidoc to turn out dodgy.
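To illustrate the failure mode suspected here, a small sketch under stated assumptions: `decode_char_refs` is a hand-rolled stand-in for what an XML parser does when it reads an attribute value, and `blocks` is a simplified model of blank-line block separation, not Asciidoctor's actual parser. Both names are hypothetical.

```ruby
# Decodes numeric character references such as &#xA; or &#10;.
# Illustration only; a real XML parser does this when reading an attribute.
def decode_char_refs(raw)
  raw.gsub(/&#x([0-9A-Fa-f]+);|&#([0-9]+);/) do
    code = $1 ? $1.to_i(16) : $2.to_i
    [code].pack("U")
  end
end

# Simplified model: blocks are runs of text separated by blank lines.
def blocks(text)
  text.split(/\n{2,}/)
end

raw     = "Intro.&#xA;&#xA;. first&#xA;. second"
intact  = decode_char_refs(raw)
mangled = decode_char_refs(raw.gsub("&#xA;", " ")) # line breaks lost upstream

puts blocks(intact).length  # => 2 (paragraph, then list)
puts blocks(mangled).length # => 1 (everything runs together)
```

If the line breaks survive decoding, the paragraph and the list stay separate blocks; if any step replaces or drops them, the whole cell collapses into one block.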

@opoudjis
Contributor

opoudjis commented Feb 13, 2025

I STILL cannot replicate this locally!!!

== Scope
lutaml_klass_table::../../sources/xmi/plateau_all_packages_export.xmi[name="DisasterPreventionBlockImprovementZonePlan",template="../../sources/liquid_templates/_klass_table.liquid"]

is giving me properly nested XML. And look at the HTML:

Image

It has spaces between blocks. What Reese is seeing does not.

&#xA; is the line feed, the Unix native line break, and it's not surprising that OSX is dealing with it fine. I have no idea what OS the docker job is being run on, and it's not my job to find out, but my suspicion is that this is being run on Windows in Docker, and the XMI preprocessor when running on Windows does not understand that &#xa; needs to be translated to &#xd; in Windows. Which means that any XMI with Asciidoc encoded this way as attributes is going to have dodgy output, treating the entire cell as one great big paragraph.

This looks to me like something that @kwkwan needs to investigate in lutaml_klass_table under metanorma-plugin-lutaml. Asciidoc with non-native linebreaks is incorrect Asciidoc, and that needs to be rectified at the source, it is too late by the time Metanorma gets to it.
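One way to rectify line breaks at the source would be to normalise them before the text is handed on. The sketch below assumes LF is the target convention and uses a hypothetical helper name; it is not taken from metanorma-plugin-lutaml.

```ruby
# Converts CRLF and lone CR line breaks to LF (assumed target convention).
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

mixed = "Paragraph.\r\n\r\n* item one\r* item two\n"
puts normalize_newlines(mixed).inspect
```

Applying this once, immediately after the XMI attribute is decoded, would mean every later stage sees a single convention regardless of the host OS.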

@ronaldtse
Contributor

ronaldtse commented Feb 13, 2025

  1. The Metanorma XML schema does not allow lists within paragraphs. So it is a bug that allows that to happen.
  2. @kwkwan please check that the LutaML plugin converts Enterprise Architect XMI descriptions properly. Note to all that EA XMI text is “HTML”, not normal text, and we have to parse it with Coradoc to generate AsciiDoc from it. Then that text goes to Metanorma AsciiDoc to convert into XML.
  3. Likely the workaround for now is for the user to manually insert a line break inside the XMI.
  4. This bug in Metanorma is deemed “too hard to fix” right now.

The difference is this:

Correct: (generates valid XML)

This is a paragraph. 

* This is a valid list because there is an empty line as delimiter. 

Incorrect: (generates invalid XML)

This is a paragraph. 
* This is an invalid list because there should be an empty line as delimiter. 

The weird part is that the “incorrect” example used to always merge the lines into the same paragraph, i.e. no list was detected and therefore no invalid XML was produced. I don’t understand why the behavior has changed.
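The manual workaround could in principle be automated. The following is a sketch only, assuming a simplified notion of a list item (a line starting with `*`, `.` or `-` followed by a space); it ignores tables, literal blocks and other Asciidoc constructs, and `separate_lists` is a hypothetical name.

```ruby
# Simplified list-item detector: "* x", "** x", ". x", "- x".
LIST_ITEM = /\A\s*(?:\*+|\.+|-)\s+\S/

# Inserts an empty line where a list item directly follows paragraph text.
# Wrapped continuation lines inside a list are left alone.
# Note: a trailing newline on the input is not preserved.
def separate_lists(text)
  out = []
  in_paragraph = false
  in_list = false
  text.each_line(chomp: true) do |line|
    if line.strip.empty?
      in_paragraph = in_list = false
    elsif line.match?(LIST_ITEM)
      out << "" if in_paragraph
      in_paragraph = false
      in_list = true
    elsif !in_list
      in_paragraph = true
    end
    out << line
  end
  out.join("\n")
end

puts separate_lists("This is a paragraph.\n* This is a list item.")
```

Text that already has the empty line passes through unchanged, so the function can be applied repeatedly without adding further blank lines.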

@ronaldtse
Contributor

We should have a place on our website about known issues and workarounds.

@ribose-jeffreylau

ribose-jeffreylau commented Feb 13, 2025

Correct: (generates valid XML)

This is a paragraph. 

* This is a valid list because there is an empty line as delimiter. 

Incorrect: (generates invalid XML)

This is a paragraph. 
* This is an invalid list because there should be an empty line as delimiter. 

Is this worthy of some sort of metanorma/coradoc linting tools, to detect "doc-smell" like this?

(existing implementation: https://github.com/docToolchain/asciidoc-linter)
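As a starting point for such a lint rule, here is a minimal sketch that reports list items placed directly after paragraph text. It is not based on asciidoc-linter or Coradoc; the `Smell` struct, the `list_smells` function and the simplified list-marker pattern are all assumptions made for illustration.

```ruby
# Simplified list-item detector: "* x", "** x", ". x", "- x".
LIST_MARKER = /\A\s*(?:\*+|\.+|-)\s+\S/

Smell = Struct.new(:line_number, :text)

# Reports every list item that follows paragraph text with no empty line between.
def list_smells(text)
  smells = []
  state = :blank
  text.each_line(chomp: true).with_index(1) do |line, number|
    if line.strip.empty?
      state = :blank
    elsif line.match?(LIST_MARKER)
      smells << Smell.new(number, line) if state == :paragraph
      state = :list
    elsif state != :list
      state = :paragraph
    end
  end
  smells
end

source = "This is a paragraph.\n* Missing delimiter.\n\nAnother paragraph.\n\n* Fine."
list_smells(source).each { |s| puts "line #{s.line_number}: #{s.text}" }
```

A check like this could run on the Asciidoc produced from the XMI before it reaches Metanorma, so the offending model text is reported by line number.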

@kwkwan kwkwan linked a pull request Feb 13, 2025 that will close this issue
4 tasks
@kwkwan
Contributor

kwkwan commented Feb 13, 2025

Hi @ReesePlews and @strogonoff, I have updated the liquid template of the class table to replace line feeds with carriage returns. Please check whether it solves your problem. Thanks.

@ReesePlews
Contributor

thank you @kwkwan i will check later today. if there are any other adjustments to make in the asciidoc code please suggest them and i will discuss with the client.

@opoudjis regarding the XML fragments. i am sorry i just don't understand enough about which XML to give you to track down the issue. i have been trying to add links to adoc files and other clarifications (based on earlier advice from you) but this was the first time dealing with checking the actual XML to find this issue. i am sorry it was not enough.
