Commit

version 4.0.1, fixed types, hr-closingtag, keepWhitespace option
TobiasNickel committed Jan 19, 2021
1 parent 470c663 commit 48637e2
Showing 7 changed files with 55 additions and 25 deletions.
14 changes: 10 additions & 4 deletions README.md
@@ -43,7 +43,7 @@ so, there are good reasons to give tXml.js a try.

## Try Online

Try without installing online: http://tnickel.de/2017/04/02/txml-online
Try without installing online: https://tnickel.de/2017/04/02/txml-online

## new in version 4
- improved support for CDATA
@@ -69,6 +69,8 @@ and then in your script you require it by `const txml = require('txml');` or in
- **filter** a method, to filter for interesting nodes, use it like Array.filter.
- **simplify** to simplify the object, to an easier access.
- **pos** where to start parsing.
- **keepComments** if you want to keep comments in your data (keeped as string including `<!-- -->`) (default false)
- **keepWhitespace** keep whitespaces like spaces, tabs and line breaks as string content (default false)
- **noChildNodes** array of nodes, that have no children and don't need to be closed. Default is working good for html. For example when parsing rss, the link tag is used to really provide an URL that the user can open. In html however a link text is used to bind css or other resource into the document. In HTML it does not need to get closed. so by default the noChildNodes containes the tagName 'link'. Same as 'img', 'br', 'input', 'meta', 'link'. That means: when parsing rss, it makes to set `noChildNodes` to [], an empty array.
```js
txml.parse(`<user is='great'>
@@ -178,13 +180,17 @@ for await(let element of xmlStream) {
// your logic here ...
}
```
The transform stream is great, because when your logic within the processing loop is slow, the file read stream will also run slower, and not fill up the RAM memory. For a more detailed explanation read [here](http://tnickel.de/2019/10/15/2019-10-for-async-on-nodejs-streams/)

The transform stream is great, because when your logic within the processing loop is slow, the file read stream will also run slower, and not fill up the RAM memory. For a more detailed explanation read [here](https://tnickel.de/2019/10/15/2019-10-for-async-on-nodejs-streams/)

## Changelog
- version 4.0.1
- fixed children type definition not to include number (issue #20)
- add `hr` to self closing tags
- new parser option `keepWhitespace` (issue #21)
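
The effect of `keepWhitespace` can be sketched in isolation. The helper below is hypothetical and not part of tXml; it only mirrors the condition `keepWhitespace || text.trim().length > 0` that this commit introduces in `parseChildren`, assuming the text segments have already been extracted:

```js
// Hypothetical helper, NOT part of the tXml API.
// It mirrors the check the parser applies before pushing a text child.
function collectTextNodes(texts, keepWhitespace) {
  const children = [];
  for (const text of texts) {
    // Without keepWhitespace, whitespace-only segments are dropped.
    if (keepWhitespace || text.trim().length > 0) {
      children.push(text);
    }
  }
  return children;
}

const segments = ['-', ' ', '\n\t'];
console.log(collectTextNodes(segments, false)); // only '-' survives
console.log(collectTextNodes(segments, true));  // all three segments are kept
```

This matters for formats such as docx, where an element like `<w:t xml:space="preserve"> </w:t>` carries a single space as meaningful content.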

## Developer

![Tobias Nickel](https://avatars1.githubusercontent.com/u/4189801?s=150)

[Tobias Nickel](http://tnickel.de/) German software developer in Shanghai.
[Tobias Nickel](https://tnickel.de/) German software developer in Shanghai.

3 changes: 2 additions & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "txml",
"version": "4.0.0",
"version": "4.0.1",
"description": "fastest XML DOM Parser for node/browser/worker",
"main": "tXml.js",
"scripts": {
@@ -26,6 +26,7 @@
"bugs": {
"url": "https://github.com/TobiasNickel/tXml/issues"
},
"types":"tXml.d.ts",
"homepage": "https://github.com/TobiasNickel/tXml#readme",
"dependencies": {
"through2": "^3.0.1"
10 changes: 6 additions & 4 deletions tXml.d.ts
@@ -1,13 +1,14 @@
export type tNode = {
tagName: string;
attributes: object;
children: tNode | string | number[];
children: (tNode | string)[];
};
export type TParseOptions = {
pos?: number;
noChildNodes?: string[];
setPos?: boolean;
keepComments?: boolean;
keepWhitespace?: boolean;
simplify?: boolean;
filter?: (a: tNode, b: tNode) => boolean;
};
@@ -20,24 +21,25 @@ export type TParseOptions = {
* @typedef tNode
* @property {string} tagName
* @property {object} attributes
* @property {tNode|string|number[]} children
* @property {(tNode|string)[]} children
**/
/**
* @typedef TParseOptions
* @property {number} [pos]
* @property {string[]} [noChildNodes]
* @property {boolean} [setPos]
* @property {boolean} [keepComments]
* @property {boolean} [keepWhitespace]
* @property {boolean} [simplify]
* @property {(a: tNode, b: tNode) => boolean} [filter]
*/
/**
* parseXML / html into a DOM Object. with no validation and some failur tolerance
* @param {string} S your XML to parse
* @param {TParseOptions} [options] all other options:
* @return {(tNode | string | number)[]}
* @return {(tNode | string)[]}
*/
export function parse(S: string, options?: TParseOptions): (tNode | string | number)[];
export function parse(S: string, options?: TParseOptions): (tNode | string)[];
/**
* transform the DomObject to an object that is like the object of PHP`s simple_xmp_load_*() methods.
* this format helps you to write that is more likely to keep your program working, even if there a small changes in the XML schema.
17 changes: 12 additions & 5 deletions tXml.js
@@ -24,7 +24,7 @@ module.exports = {
* @typedef tNode
* @property {string} tagName
* @property {object} attributes
* @property {tNode|string|number[]} children
* @property {(tNode|string)[]} children
**/

/**
@@ -33,6 +33,7 @@ module.exports = {
* @property {string[]} [noChildNodes]
* @property {boolean} [setPos]
* @property {boolean} [keepComments]
* @property {boolean} [keepWhitespace]
* @property {boolean} [simplify]
* @property {(a: tNode, b: tNode) => boolean} [filter]
*/
@@ -41,13 +42,15 @@ module.exports = {
* parseXML / html into a DOM Object. with no validation and some failur tolerance
* @param {string} S your XML to parse
* @param {TParseOptions} [options] all other options:
* @return {(tNode | string | number)[]}
* @return {(tNode | string)[]}
*/
function parse(S, options) {
"use strict";
options = options || {};

var pos = options.pos || 0;
var keepComments = !!options.keepComments;
var keepWhitespace = !!options.keepWhitespace

var openBracket = "<";
var openBracketCC = "<".charCodeAt(0);
@@ -96,7 +99,7 @@ function parse(S, options) {
if (pos === -1) {
pos = S.length
}
if (options.keepComments === true) {
if (keepComments) {
children.push(S.substring(startCommentPos, pos + 1));
}
} else if (
@@ -140,7 +143,7 @@ function parse(S, options) {
}
} else {
var text = parseText()
if (text.trim().length > 0)
if (keepWhitespace || text.trim().length > 0)
children.push(text);
pos++;
}
@@ -174,7 +177,7 @@ function parse(S, options) {
* is parsing a node, including tagName, Attributes and its children,
* to parse children it uses the parseChildren again, that makes the parsing recursive
*/
var NoChildNodes = options.noChildNodes || ['img', 'br', 'input', 'meta', 'link'];
var NoChildNodes = options.noChildNodes || ['img', 'br', 'input', 'meta', 'link', 'hr'];

function parseNode() {
pos++;
@@ -421,6 +424,10 @@ function stringify(O) {
out += ' ' + i + "='" + N.attributes[i].trim() + "'";
}
}
if(N.tagName[0]==='?'){
out += '?>';
return;
}
out += '>';
writeChildren(N.children);
out += '</' + N.tagName + '>';
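
The `?` branch added in this hunk closes processing instructions such as `<?xml ... ?>` with `?>` and skips both the children and the closing tag. A minimal sketch of that idea follows; `stringifyNode` is a hypothetical, simplified serializer written for illustration, not tXml's actual `stringify`, and it assumes nodes of the shape `{ tagName, attributes, children }`:

```js
// Hypothetical, simplified serializer for illustration only.
// tXml's real stringify handles more cases (quoting, null checks, arrays of roots).
function stringifyNode(node) {
  if (typeof node === 'string') {
    return node; // text children are emitted unchanged
  }
  let out = '<' + node.tagName;
  for (const name of Object.keys(node.attributes)) {
    const value = node.attributes[name];
    // An attribute without a value is written as a bare name.
    out += value === null ? ' ' + name : ' ' + name + "='" + value + "'";
  }
  if (node.tagName[0] === '?') {
    // Processing instruction: close with '?>' and emit no children or end tag.
    return out + '?>';
  }
  return out + '>' + node.children.map(stringifyNode).join('') + '</' + node.tagName + '>';
}

const declaration = { tagName: '?xml', attributes: { version: '1.0' }, children: [] };
const element = { tagName: 'a', attributes: {}, children: ['x'] };
console.log(stringifyNode(declaration)); // <?xml version='1.0'?>
console.log(stringifyNode(element));     // <a>x</a>
```

Without the early return, the declaration would be serialized with a `>` and a closing `</?xml>` tag, which is not valid XML.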
20 changes: 11 additions & 9 deletions tXml.min.js

Some generated files are not rendered by default.

15 changes: 13 additions & 2 deletions test.js
@@ -6,7 +6,8 @@ const files = {
commented: __dirname + '/test/examples/commented.svg',
commentOnly: __dirname + '/test/examples/commentOnly.svg',
twoComments: __dirname + '/test/examples/twocomments.svg',
tagesschauRSS: '/test/examples/tagesschau.rss',
tagesschauRSS: __dirname + '/test/examples/tagesschau.rss',
wordpadDocxDocument: __dirname+'/test/examples/wordpad.docx.document.xml',
};

assert(tXml, 'tXml is available');
@@ -132,7 +133,7 @@ assert.deepStrictEqual(x, xShould, 'find elements by class')

// re-stringify an attribute without value
var s = "<test><something flag></something></test>";
assert(tXml.stringify(tXml.parse(s)) === s, 'problem with attribute without value');
assert.deepStrictEqual(tXml.stringify(tXml.parse(s)), s, 'problem with attribute without value');
assert(tXml.stringify(undefined) === '', 'stringify ignore null values');

assert(tXml.toContentString(tXml.parse('<test>f<case number="2">f</case>f</test>')) === "f f f")
@@ -275,6 +276,16 @@ assert.deepStrictEqual(tXml.simplifyLostLess(['1',2]), {}, 'ignore non objects')

assert.deepStrictEqual(tXml.filter([{}],()=>true), [{}], 'allow nodes without children')

const wordpadDoc = fs.readFileSync(files.wordpadDocxDocument).toString();
assert.deepStrictEqual(
tXml.filter(
tXml.parse(wordpadDoc, { keepWhitespace: true }),
(n) => n.tagName === 'w:t'
)[1].children[0],
' '
);


// https://github.com/TobiasNickel/tXml/issues/14
testAsync().catch(err=>console.log(err));
async function testAsync(){
1 change: 1 addition & 0 deletions test/examples/wordpad.docx.document.xml
@@ -0,0 +1 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p><w:pPr><w:spacing w:before="0" w:after="200" w:line="276" /><w:ind w:right="0" w:left="0" w:firstLine="0" /><w:jc w:val="left" /><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:cs="Calibri" w:eastAsia="Calibri" /><w:color w:val="auto" /><w:spacing w:val="0" /><w:position w:val="0" /><w:sz w:val="22" /><w:shd w:fill="auto" w:val="clear" /></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:cs="Calibri" w:eastAsia="Calibri" /><w:color w:val="auto" /><w:spacing w:val="0" /><w:position w:val="0" /><w:sz w:val="22" /><w:shd w:fill="auto" w:val="clear" /></w:rPr><w:t xml:space="preserve">-</w:t></w:r></w:p><w:p><w:pPr><w:spacing w:before="0" w:after="200" w:line="276" /><w:ind w:right="0" w:left="0" w:firstLine="0" /><w:jc w:val="left" /><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:cs="Calibri" w:eastAsia="Calibri" /><w:color w:val="auto" /><w:spacing w:val="0" /><w:position w:val="0" /><w:sz w:val="22" /><w:shd w:fill="auto" w:val="clear" /></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Calibri" w:hAnsi="Calibri" w:cs="Calibri" w:eastAsia="Calibri" /><w:color w:val="auto" /><w:spacing w:val="0" /><w:position w:val="0" /><w:sz w:val="22" /><w:shd w:fill="auto" w:val="clear" /></w:rPr><w:t xml:space="preserve"> </w:t></w:r></w:p></w:body></w:document>
