The current `FormattedText` model, which is used as the intermediate format for every conversion (except conversions between two Paratext formats), has been there since the beginning of BibleMultiConverter. Since then, other Bible formats have evolved. Therefore, rework the internal model.
Some ideas:
- FormattingInstructionKind: add new constants
  - `PSALM_TITLE` (titles of Psalms, which are sometimes part of verse 1 and sometimes placed before it)
  - `ADDED_TEXT` (text added by the translator that is not linked to the original source, often conjunctions)
  - When exporting those to a format that does not support them, treat both as `ITALIC`.
- Add Speaker markup to mark text spoken by a person other than Jesus. Speakers can be identified by labels (e.g. "Moses") or Strongs numbers (e.g. "H4872").
- Rework LineBreakKind based on the `ExtendedLineBreakKind` used for Paratext export.
- GrammarInformation: add (optional) suffix letters for Strongs numbers, and a way to add arbitrary key-value pairs (like in OSIS or Paratext). Values need not be ASCII-only (e.g. a Greek lemma).
- Links: support
  - Anchors in the text (by id)
  - Links to those anchors
  - Links to external hyperlinks
  - Links to external images (which may be displayed inline if the format supports it)
- Footnotes: add a flag indicating whether a footnote contains text or cross references. Currently, this is done by adding XREF_MARKER to the beginning of the footnote text, but many newer formats make this distinction, and parsing for magic strings gets cumbersome.
- Cross References: support cross references that span more than one book, and cross references that point to whole chapters or books rather than individual verses.
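To make the `ITALIC` fallback for the two new constants concrete, here is a minimal Java sketch. `FormattingKind` and `exportAs` are hypothetical names; this is a simplified stand-in, not the project's actual FormattingInstructionKind, and `BOLD` is included only for illustration.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Simplified, hypothetical stand-in for FormattingInstructionKind:
// ITALIC, PSALM_TITLE and ADDED_TEXT come from the proposal, BOLD is illustrative.
enum FormattingKind {
    BOLD, ITALIC, PSALM_TITLE, ADDED_TEXT;

    /**
     * Chooses how an exporter should render this kind, given the kinds the
     * target format supports. An empty result means "drop the formatting
     * but keep the text".
     */
    Optional<FormattingKind> exportAs(Set<FormattingKind> supported) {
        if (supported.contains(this)) {
            return Optional.of(this);
        }
        // Per the proposal, both new kinds degrade to italics.
        boolean degradesToItalic = this == PSALM_TITLE || this == ADDED_TEXT;
        if (degradesToItalic && supported.contains(ITALIC)) {
            return Optional.of(ITALIC);
        }
        return Optional.empty();
    }
}
```

Keeping the fallback decision in one method means exporters that do not know the new kinds need no per-format special cases.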
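The GrammarInformation and Speaker ideas could share one Strongs parser. The sketch below assumes Strongs numbers look like `H1254a` (prefix G or H, up to five digits, at most one suffix letter) and keeps extra key-value pairs in an insertion-ordered map. `StrongsNumber`, `GrammarInfoSketch` and `Speaker` are hypothetical classes, not existing BibleMultiConverter API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Strongs number with an optional suffix letter, e.g. "H1254a".
final class StrongsNumber {
    // G or H, 1 to 5 digits, then at most one suffix letter.
    private static final Pattern PATTERN = Pattern.compile("([GH])([0-9]{1,5})([A-Za-z]?)");

    final char prefix;
    final int number;
    final String suffix; // "" when absent

    private StrongsNumber(char prefix, int number, String suffix) {
        this.prefix = prefix;
        this.number = number;
        this.suffix = suffix;
    }

    static Optional<StrongsNumber> tryParse(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.matches()) {
            return Optional.empty();
        }
        int number = Integer.parseInt(m.group(2)); // normalizes leading zeros
        if (number == 0) {
            return Optional.empty();
        }
        return Optional.of(new StrongsNumber(m.group(1).charAt(0), number, m.group(3)));
    }

    @Override
    public String toString() {
        return prefix + Integer.toString(number) + suffix;
    }
}

// Hypothetical extension of GrammarInformation: Strongs numbers plus arbitrary
// key-value pairs (as in OSIS or Paratext); values may be any Unicode text.
final class GrammarInfoSketch {
    final List<StrongsNumber> strongs;
    final Map<String, String> attributes;

    GrammarInfoSketch(List<StrongsNumber> strongs, Map<String, String> attributes) {
        this.strongs = Collections.unmodifiableList(new ArrayList<>(strongs));
        // LinkedHashMap keeps insertion order, so exports stay deterministic.
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }
}

// Hypothetical speaker markup: a Strongs number if the id parses as one,
// otherwise a free-text label such as "Moses".
final class Speaker {
    final String label;          // null when identified by Strongs number
    final StrongsNumber strongs; // null when identified by label

    private Speaker(String label, StrongsNumber strongs) {
        this.label = label;
        this.strongs = strongs;
    }

    static Speaker of(String id) {
        return StrongsNumber.tryParse(id)
                .map(s -> new Speaker(null, s))
                .orElseGet(() -> new Speaker(id, null));
    }
}
```

One caveat of this design: `Speaker.of` guesses from the string, so a label that happens to look like a Strongs number would be misclassified; separate factory methods for labels and Strongs numbers would avoid that.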
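For the footnote flag and the broader cross references, one possible shape is an explicit footnote kind plus reference endpoints whose chapter and verse are optional. All names here (`FootnoteKind`, `Granularity`, `RefPoint`, `XrefRange`, `LegacyFootnotes`) are hypothetical, and the actual value of XREF_MARKER is passed in as a parameter rather than assumed.

```java
import java.util.Objects;

// Hypothetical flag that would replace the XREF_MARKER prefix on footnote text.
enum FootnoteKind { TEXT, CROSS_REFERENCES }

// How much of the Bible one end of a cross reference covers.
enum Granularity { BOOK, CHAPTER, VERSE }

// Hypothetical reference endpoint; chapter and verse are optional so that
// whole books and whole chapters can be referenced.
final class RefPoint {
    final String book;     // book abbreviation, e.g. "Gen"
    final Integer chapter; // null for a whole book
    final String verse;    // null for a whole chapter; a String allows "1a"

    RefPoint(String book, Integer chapter, String verse) {
        if (chapter == null && verse != null) {
            throw new IllegalArgumentException("verse given without chapter");
        }
        this.book = Objects.requireNonNull(book, "book");
        this.chapter = chapter;
        this.verse = verse;
    }

    Granularity granularity() {
        if (chapter == null) return Granularity.BOOK;
        return verse == null ? Granularity.CHAPTER : Granularity.VERSE;
    }
}

// A cross reference from first to last (inclusive); the two ends may lie in
// different books, e.g. Gen 50 to Exod 2.
final class XrefRange {
    final RefPoint first;
    final RefPoint last;

    XrefRange(RefPoint first, RefPoint last) {
        this.first = Objects.requireNonNull(first);
        this.last = Objects.requireNonNull(last);
    }

    boolean spansBooks() {
        return !first.book.equals(last.book);
    }
}

// Transitional helper: exporters that still expect the old magic-string
// convention can re-create it from the flag.
final class LegacyFootnotes {
    static String legacyText(FootnoteKind kind, String text, String xrefMarker) {
        return kind == FootnoteKind.CROSS_REFERENCES ? xrefMarker + text : text;
    }
}
```

A helper like `legacyText` would fit the plan of first updating only the roundtrip formats: untouched exporters keep seeing the prefix they already understand.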
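The four link cases could be modelled as separate types that each know how to render themselves. This sketch assumes HTML as an example target; the class names and the rendering are illustrative only, and `escape` is a minimal text/attribute escaper, not a full HTML sanitizer.

```java
// Hypothetical base type for the link-related markup in the proposal.
abstract class LinkMarkup {
    // Renders this markup as HTML around the given visible text.
    abstract String toHtml(String text);

    // Minimal escaping for HTML text and double-quoted attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

// An anchor in the text, identified by id; it has no visible text.
final class Anchor extends LinkMarkup {
    final String id;
    Anchor(String id) { this.id = id; }
    @Override String toHtml(String text) {
        return "<a id=\"" + escape(id) + "\"></a>";
    }
}

// A link to an anchor defined elsewhere in the text.
final class AnchorLink extends LinkMarkup {
    final String anchorId;
    AnchorLink(String anchorId) { this.anchorId = anchorId; }
    @Override String toHtml(String text) {
        return "<a href=\"#" + escape(anchorId) + "\">" + escape(text) + "</a>";
    }
}

// A link to an external hyperlink.
final class ExternalLink extends LinkMarkup {
    final String url;
    ExternalLink(String url) { this.url = url; }
    @Override String toHtml(String text) {
        return "<a href=\"" + escape(url) + "\">" + escape(text) + "</a>";
    }
}

// An external image, shown inline here; the text becomes the alt text.
final class ExternalImage extends LinkMarkup {
    final String url;
    ExternalImage(String url) { this.url = url; }
    @Override String toHtml(String text) {
        return "<img src=\"" + escape(url) + "\" alt=\"" + escape(text) + "\">";
    }
}
```

A format without inline images could instead render `ExternalImage` as an ordinary link, which matches the "displayed inline if supported" wording of the proposal.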
As this is a major task (it needs to touch most of the modules), my plan is to first update only the roundtrip formats and to make the other formats "just" work again (using fallbacks or ignoring the new options). I will keep a list of the status of each module (e.g. compiles again, tested, compared against format spec), trying not to make any format worse than before at any point in the process.
When exporting other features from USFM to FormattedText, use ExtraAttributes wherever possible. This should also include custom tags and custom milestones. There should be an option to convert UBXF alignment milestones (for a single alignment source) to GrammarInformation instead of extra attributes.
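The proposed option for alignment milestones could be a simple switch in the USFM importer. This sketch assumes the milestones follow the unfoldingWord style (`\zaln-s` with `x-strong`, `x-lemma` and `x-morph` attributes); the class names, the `usfm-<tag>-<attribute>` naming of extra attributes and the grammar keys are hypothetical. It looks at one milestone at a time, so nested alignments are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical result of importing one USFM milestone or custom tag.
final class ImportedAttributes {
    final Map<String, String> grammar = new LinkedHashMap<>(); // for GrammarInformation
    final Map<String, String> extra = new LinkedHashMap<>();   // for ExtraAttributes
}

final class UsfmMilestoneImporter {
    private final boolean alignmentToGrammar; // the proposed option

    UsfmMilestoneImporter(boolean alignmentToGrammar) {
        this.alignmentToGrammar = alignmentToGrammar;
    }

    ImportedAttributes importMilestone(String tag, Map<String, String> attributes) {
        ImportedAttributes result = new ImportedAttributes();
        boolean toGrammar = alignmentToGrammar && tag.equals("zaln-s");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            String grammarKey = toGrammar ? grammarKeyFor(e.getKey()) : null;
            if (grammarKey != null) {
                result.grammar.put(grammarKey, e.getValue());
            } else {
                // Everything else is kept verbatim as an extra attribute.
                result.extra.put("usfm-" + tag + "-" + e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Maps alignment attributes to hypothetical grammar keys; null = no mapping.
    private static String grammarKeyFor(String attribute) {
        switch (attribute) {
            case "x-strong": return "strongs";
            case "x-lemma": return "lemma";
            case "x-morph": return "morph";
            default: return null; // e.g. x-occurrence stays an extra attribute
        }
    }
}
```

With the option off, nothing is lost: every attribute of custom tags and milestones still ends up in the extra attributes.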
Did I miss anything? A candidate feature should be present both in USFM3/USX3 and in more than one other format.
// cc @Rolf-Smit @Michahel @shadow-light @paul1149
Lately I have not been actively working with this tool; I mostly use it to convert from USX (2/3) to USFM (3) (which, surprisingly, my application can parse faster than XML).
From my perspective, I have no remarks about your plans to rework FormattedText. I think it would be good if the intermediate format supported as many features as possible, in a sensible and generic way.
@Rolf-Smit just a heads up: in a553d4b I changed the intermediate format used by Paratext formats by moving Figure, VerseStart and VerseEnd to be BookContent instead of CharacterContent (none of the Paratext formats supported so far allow those to be nested in character tags or footnotes anyway). This makes some parsing easier and removes some ugly workarounds that made extending the format harder.
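The effect of making verse starts book-level content can be shown with a tiny builder: a verse start ends the current run of character content instead of being nested in it. The interfaces and classes below are simplified, hypothetical stand-ins, not the project's actual Paratext model classes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the two content categories.
interface BookContent {}
interface CharacterContent {}

// Verse starts live at book level, next to runs of character content.
final class VerseStart implements BookContent {
    final String verse;
    VerseStart(String verse) { this.verse = verse; }
}

final class TextRun implements CharacterContent {
    final String text;
    TextRun(String text) { this.text = text; }
}

// A run of character content (text, character styles, footnotes, ...).
final class CharacterBlock implements BookContent {
    final List<CharacterContent> parts = new ArrayList<>();
}

final class BookBuilder {
    final List<BookContent> content = new ArrayList<>();
    private CharacterBlock open; // run that new text is appended to, if any

    void text(String text) {
        if (open == null) {
            open = new CharacterBlock();
            content.add(open);
        }
        open.parts.add(new TextRun(text));
    }

    // VerseStart is not CharacterContent, so the compiler rejects adding it
    // to a CharacterBlock; it simply closes the current run instead.
    void verseStart(String verse) {
        open = null;
        content.add(new VerseStart(verse));
    }
}
```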