Add dependency
Some checks failed
/ build_android (push) Failing after 21s

This commit is contained in:
Pieter Vander Vennet 2025-06-18 18:50:46 +02:00
parent 55470c090d
commit 6947a1adba
1260 changed files with 111297 additions and 0 deletions

30
@capacitor/assets/node_modules/node-html-parser/CHANGELOG.md generated vendored Executable file
View file

@ -0,0 +1,30 @@
# Changelog
All notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.
### [5.4.2](https://github.com/taoqf/node-fast-html-parser/compare/v5.4.2-0...v5.4.2) (2022-08-30)
## [5.1.0](https://github.com/taoqf/node-fast-html-parser/compare/v4.1.5...v5.1.0) (2021-10-28)
### Features
* Exposed `HTMLElement#rawAttrs` (made public) ([34f1595](https://github.com/taoqf/node-fast-html-parser/commit/34f1595756c0974b6ae7ef5755a615f09e421f32))
## [5.0.0](https://github.com/taoqf/node-fast-html-parser/compare/v4.1.5...v5.0.0) (2021-10-10)
### ⚠ BREAKING CHANGES
* Added esm named export support ([0d4b922](https://github.com/taoqf/node-fast-html-parser/commit/0d4b922eefd6210fe802991e464b21b0c69d5f63))
### Features
* Added esm named export support (closes [#160](https://github.com/taoqf/node-fast-html-parser/issues/160) closes [#139](https://github.com/taoqf/node-fast-html-parser/issues/139)) ([0d4b922](https://github.com/taoqf/node-fast-html-parser/commit/0d4b922eefd6210fe802991e464b21b0c69d5f63))
* Added HTMLElement#getElementsByTagName ([d462e44](https://github.com/taoqf/node-fast-html-parser/commit/d462e449e7ebb00a5a43fb574133681ad5a62475))
* Improved parsing performance + matching (closes [#164](https://github.com/taoqf/node-fast-html-parser/issues/164)) ([3c5b8e2](https://github.com/taoqf/node-fast-html-parser/commit/3c5b8e2a9104b01a8ca899a7970507463e42adaf))
### Bug Fixes
* Add null to return type for HTMLElement#querySelector (closes [#157](https://github.com/taoqf/node-fast-html-parser/issues/157)) ([2b65583](https://github.com/taoqf/node-fast-html-parser/commit/2b655839bd3868c41fb19cae5786ca097565bc7f))
* blockTextElements incorrectly matching partial tag (detail) (fixes [#156](https://github.com/taoqf/node-fast-html-parser/issues/156) fixes [#124](https://github.com/taoqf/node-fast-html-parser/issues/124)) ([6823349](https://github.com/taoqf/node-fast-html-parser/commit/6823349fdf1809c7484c70d948aa24930ef4983f))

View file

@ -0,0 +1,7 @@
Copyright 2019 Tao Qiufeng
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

View file

@ -0,0 +1,286 @@
# Fast HTML Parser [![NPM version](https://badge.fury.io/js/node-html-parser.png)](http://badge.fury.io/js/node-html-parser) [![Build Status](https://img.shields.io/endpoint.svg?url=https%3A%2F%2Factions-badge.atrox.dev%2Ftaoqf%2Fnode-html-parser%2Fbadge%3Fref%3Dmain&style=flat)](https://actions-badge.atrox.dev/taoqf/node-html-parser/goto?ref=main)
Fast HTML Parser is a _very fast_ HTML parser that generates a simplified
DOM tree with element query support.
By design, it aims to parse massive HTML files at the lowest possible cost, so
performance is the top priority. For this reason, some malformed HTML may not
be parsed correctly, but the most common errors are covered (e.g. HTML4-style
unclosed `<li>`, `<td>`, etc.).
## Install
```shell
npm install --save node-html-parser
```
> Note: when using Fast HTML Parser in a TypeScript project, the minimum supported TypeScript version is `^4.1.2`.
## Performance
-- 2022-08-10
```shell
html-parser :24.1595 ms/file ± 18.7667
htmljs-parser :4.72064 ms/file ± 5.67689
html-dom-parser :2.18055 ms/file ± 2.96136
html5parser :1.69639 ms/file ± 2.17111
cheerio :12.2122 ms/file ± 8.10916
parse5 :6.50626 ms/file ± 4.02352
htmlparser2 :2.38179 ms/file ± 3.42389
htmlparser :17.4820 ms/file ± 128.041
high5 :3.95188 ms/file ± 2.52313
node-html-parser:2.04288 ms/file ± 1.25203
node-html-parser (last release):2.00527 ms/file ± 1.21317
```
Tested with [htmlparser-benchmark](https://github.com/AndreasMadsen/htmlparser-benchmark).
## Usage
```ts
import { parse } from 'node-html-parser';
const root = parse('<ul id="list"><li>Hello World</li></ul>');
console.log(root.firstChild.structure);
// ul#list
// li
// #text
console.log(root.querySelector('#list'));
// { tagName: 'ul',
// rawAttrs: 'id="list"',
// childNodes:
// [ { tagName: 'li',
// rawAttrs: '',
// childNodes: [Object],
// classNames: [] } ],
// id: 'list',
// classNames: [] }
console.log(root.toString());
// <ul id="list"><li>Hello World</li></ul>
root.set_content('<li>Hello World</li>');
root.toString(); // <li>Hello World</li>
```
```js
var HTMLParser = require('node-html-parser');
var root = HTMLParser.parse('<ul id="list"><li>Hello World</li></ul>');
```
## Global Methods
### parse(data[, options])
Parse the data provided, and return the root of the generated DOM.
- **data**, data to parse
- **options**, parse options
```js
{
lowerCaseTagName: false, // convert tag name to lower case (hurts performance heavily)
comment: false, // retrieve comments (hurts performance slightly)
voidTag:{
tags: ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'], // optional and case insensitive, default value is ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']
closingSlash: true // optional, default false. void tag serialisation, add a final slash <br/>
},
blockTextElements: {
script: true, // keep text content when parsing
noscript: true, // keep text content when parsing
style: true, // keep text content when parsing
pre: true // keep text content when parsing
}
}
```
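To make the `voidTag` options more concrete, here is a minimal sketch of how a void element might be serialised with and without a closing slash. `formatElement` and `DEFAULT_VOID_TAGS` are hypothetical names made up for this illustration; the option name `closingSlash` follows the field declared in the package's `Options` typings. It is not the library's own implementation.

```js
// Hypothetical helper for illustration only; not part of node-html-parser.
const DEFAULT_VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'];

// Serialise one element. Tag names listed in options.tags (case-insensitive)
// are treated as void elements and get no closing tag.
function formatElement(tag, attrs, innerHTML, options = {}) {
  const voidTags = new Set((options.tags || DEFAULT_VOID_TAGS).map((t) => t.toLowerCase()));
  const attrText = attrs ? ` ${attrs}` : '';
  if (voidTags.has(tag.toLowerCase())) {
    // closingSlash switches between <br> and <br/>
    return options.closingSlash ? `<${tag}${attrText}/>` : `<${tag}${attrText}>`;
  }
  return `<${tag}${attrText}>${innerHTML}</${tag}>`;
}

console.log(formatElement('br', '', ''));
console.log(formatElement('br', '', '', { closingSlash: true }));
console.log(formatElement('p', 'class="a"', 'hi'));
```

Passing a custom `tags` array replaces the default list entirely, which matches how the option is described above.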
### valid(data[, options])
Parse the data provided and return `true` if it is valid, `false` otherwise.
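The vendored `valid.js` decides validity from the length of the element stack returned by `base_parse`. The sketch below illustrates the same stack idea with a deliberately simplified tag scanner. `looksBalanced` and `VOID_NAMES` are hypothetical names, and the check is stricter than the real parser (it does not tolerate HTML4-style unclosed tags).

```js
// Hypothetical, simplified stand-in for the parser's open-element stack.
const VOID_NAMES = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']);

function looksBalanced(html) {
  const stack = [];
  // Captures: optional leading "/", tag name, optional trailing "/"
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const closing = match[1];
    const name = match[2].toLowerCase();
    const selfClosing = match[3];
    if (closing) {
      // A closing tag must match the most recently opened element
      if (stack.pop() !== name) return false;
    } else if (!selfClosing && !VOID_NAMES.has(name)) {
      stack.push(name);
    }
  }
  // Every opened element must have been closed
  return stack.length === 0;
}

console.log(looksBalanced('<ul id="list"><li>Hello World</li></ul>'));
console.log(looksBalanced('<div><p>text</div>'));
```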
## HTMLElement Methods
### HTMLElement#trimRight()
Trim element from right (in block) after seeing pattern in a TextNode.
### HTMLElement#removeWhitespace()
Remove whitespaces in this sub tree.
### HTMLElement#querySelectorAll(selector)
Query CSS selector to find matching nodes.
Note: Full range of CSS3 selectors supported since v3.0.0.
### HTMLElement#querySelector(selector)
Query CSS Selector to find matching node.
### HTMLElement#getElementsByTagName(tagName)
Get all elements with the specified tagName.
Note: Use * for all elements.
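As a rough illustration of a tag-name lookup, the sketch below walks a tree of plain objects recursively, in the spirit of the `findAll` helper in the vendored `matcher.js`. `collectByTagName` and `sampleTree` are hypothetical; the node shape (`nodeType`, `rawTagName`, `childNodes`) mirrors the fields in the package typings.

```js
const ELEMENT_NODE = 1; // NodeType.ELEMENT_NODE in the package typings
const TEXT_NODE = 3;    // NodeType.TEXT_NODE

// Hypothetical helper: collect all descendant elements with the given tag
// name in document order. '*' matches every element.
function collectByTagName(nodes, tagName) {
  const wanted = tagName.toLowerCase();
  let result = [];
  for (const node of nodes) {
    if (node.nodeType !== ELEMENT_NODE) continue; // skip text and comments
    if (wanted === '*' || node.rawTagName.toLowerCase() === wanted) {
      result.push(node);
    }
    result = result.concat(collectByTagName(node.childNodes, tagName));
  }
  return result;
}

// Plain-object stand-in for a parsed <ul><li>a</li><li></li></ul>
const sampleTree = [{
  nodeType: ELEMENT_NODE, rawTagName: 'ul', childNodes: [
    { nodeType: ELEMENT_NODE, rawTagName: 'li', childNodes: [
      { nodeType: TEXT_NODE, rawText: 'a', childNodes: [] }
    ] },
    { nodeType: ELEMENT_NODE, rawTagName: 'li', childNodes: [] }
  ]
}];

console.log(collectByTagName(sampleTree, 'li').length);
```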
### HTMLElement#closest(selector)
Query closest element by css selector.
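The typings describe `closest` as walking from the element toward the document root until a match is found, returning the element itself, a matching ancestor, or `null`. A minimal sketch of that walk, using a predicate instead of a CSS selector (`closestMatching` and the sample nodes are hypothetical):

```js
// Hypothetical helper: return the node itself or its nearest ancestor that
// satisfies the predicate, or null when nothing matches.
function closestMatching(node, predicate) {
  let current = node;
  while (current) {
    if (predicate(current)) return current;
    current = current.parentNode;
  }
  return null;
}

// Plain-object stand-ins for <section><ul><li></li></ul></section>
const sectionNode = { rawTagName: 'section', parentNode: null };
const listNode = { rawTagName: 'ul', parentNode: sectionNode };
const itemNode = { rawTagName: 'li', parentNode: listNode };

console.log(closestMatching(itemNode, (n) => n.rawTagName === 'section') === sectionNode);
```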
### HTMLElement#appendChild(node)
Append a child node to childNodes
### HTMLElement#insertAdjacentHTML(where, html)
Parses the specified text as HTML and inserts the resulting nodes into the DOM tree at a specified position.
### HTMLElement#setAttribute(key: string, value: string)
Set `value` to `key` attribute.
### HTMLElement#setAttributes(attrs: Record<string, string>)
Set attributes of the element.
### HTMLElement#removeAttribute(key: string)
Remove `key` attribute.
### HTMLElement#getAttribute(key: string)
Get `key` attribute.
### HTMLElement#exchangeChild(oldNode: Node, newNode: Node)
Exchanges given child with new child.
### HTMLElement#removeChild(node: Node)
Remove child node.
### HTMLElement#toString()
Same as [outerHTML](#htmlelementouterhtml)
### HTMLElement#set_content(content: string | Node | Node[])
Set content. **Notice**: Do not set content of the **root** node.
### HTMLElement#remove()
Remove current element.
### HTMLElement#replaceWith(...nodes: (string | Node)[])
Replace current element with other node(s).
### HTMLElement#classList
#### HTMLElement#classList.add
Add class name.
#### HTMLElement#classList.replace(old: string, new: string)
Replace class name with another one.
#### HTMLElement#classList.remove()
Remove class name.
#### HTMLElement#classList.toggle(className: string):void
Toggle class. Remove it if it is already included, otherwise add.
#### HTMLElement#classList.contains(className: string): boolean
Returns true if the classname is already in the classList.
#### HTMLElement#classList.values()
Get class names.
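The `classList` operations above behave like operations on a set of tokens. The sketch below shows one way to model them with a `Set`; `TokenSet` is a hypothetical class written for this illustration and is not the package's `DOMTokenList`.

```js
// Hypothetical Set-backed token list mirroring the method names above.
class TokenSet {
  constructor(initial = []) {
    this.tokens = new Set(initial);
  }
  add(name) { this.tokens.add(name); }
  remove(name) { this.tokens.delete(name); }
  // Remove the token if present, otherwise add it
  toggle(name) {
    if (this.tokens.has(name)) this.tokens.delete(name);
    else this.tokens.add(name);
  }
  // In this sketch, replace always removes oldName and adds newName
  replace(oldName, newName) {
    this.tokens.delete(oldName);
    this.tokens.add(newName);
  }
  contains(name) { return this.tokens.has(name); }
  values() { return this.tokens.values(); }
  toString() { return Array.from(this.tokens).join(' '); }
}

const classes = new TokenSet(['a', 'b']);
classes.toggle('a'); // 'a' was present, so it is removed
classes.toggle('c'); // 'c' was absent, so it is added
console.log(classes.toString());
```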
#### Node#clone()
Clone a node.
#### Node#getElementById(id: string): HTMLElement;
Get element by its ID.
## HTMLElement Properties
### HTMLElement#text
Get unescaped text value of current node and its children. Like `innerText`.
(slow for the first time)
### HTMLElement#rawText
Get escaped (as-is) text value of current node and its children. May have
`&amp;` in it. (fast)
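The difference between `text` and `rawText` comes down to entity decoding: the vendored `text.js` passes `rawText` through the `decode` function of the `he` package. The sketch below shows the idea with a tiny hypothetical decoder, `decodeBasicEntities`, that handles only five entities; it is not a substitute for `he`.

```js
// Hypothetical, deliberately small entity table for illustration.
const BASIC_ENTITIES = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'"
};

// Decode the handful of entities listed above in a single pass.
function decodeBasicEntities(rawText) {
  return rawText.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => BASIC_ENTITIES[entity]);
}

const rawValue = 'Tom &amp; Jerry &lt;3'; // what rawText would hold
console.log(decodeBasicEntities(rawValue)); // what text would roughly yield
```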
### HTMLElement#tagName
Get or set the tag name of the HTMLElement. Note: the returned value is an uppercase string.
### HTMLElement#structuredText
Get structured Text.
### HTMLElement#structure
Get DOM structure.
### HTMLElement#firstChild
Get first child node.
### HTMLElement#lastChild
Get last child node.
### HTMLElement#innerHTML
Set or Get innerHTML.
### HTMLElement#outerHTML
Get outerHTML.
### HTMLElement#nextSibling
Returns a reference to the next child node of the current element's parent.
### HTMLElement#nextElementSibling
Returns a reference to the next child element of the current element's parent.
### HTMLElement#previousSibling
Returns a reference to the previous child node of the current element's parent.
### HTMLElement#previousElementSibling
Returns a reference to the previous child element of the current element's parent.
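The sibling properties differ in whether they skip non-element nodes. A minimal sketch of the element-only variant over a parent's `childNodes` array, assuming plain objects with `nodeType`, `parentNode` and `childNodes` fields (`nextElementSiblingOf` and the sample nodes are hypothetical):

```js
// Hypothetical helper: next sibling that is an element (nodeType 1),
// skipping text (3) and comment (8) nodes.
function nextElementSiblingOf(node) {
  if (!node.parentNode) return null;
  const siblings = node.parentNode.childNodes;
  const index = siblings.indexOf(node);
  if (index === -1) return null;
  for (let i = index + 1; i < siblings.length; i++) {
    if (siblings[i].nodeType === 1) return siblings[i];
  }
  return null;
}

// Plain-object stand-in for <div><a></a> text <b></b></div>
const parentDiv = { nodeType: 1, parentNode: null, childNodes: [] };
const firstElem = { nodeType: 1, parentNode: parentDiv, childNodes: [] };
const textBetween = { nodeType: 3, parentNode: parentDiv, childNodes: [] };
const secondElem = { nodeType: 1, parentNode: parentDiv, childNodes: [] };
parentDiv.childNodes = [firstElem, textBetween, secondElem];

console.log(nextElementSiblingOf(firstElem) === secondElem);
```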
### HTMLElement#textContent
Get or Set textContent of current element, more efficient than [set_content](#htmlelementset_contentcontent-string--node--node).
### HTMLElement#attributes
Get all attributes of current element. **Notice: do not try to change the returned value.**
### HTMLElement#classList
Get the class names of the current element as a token list. Use the `classList` methods to modify it.
### HTMLElement#range
Corresponding source code start and end indexes (ie [ 0, 40 ])
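One use of `range` is to recover the original source text of an element. The sketch below assumes the end index is exclusive, which this README does not state; `sourceOf` and the example range are hypothetical values chosen for illustration.

```js
const htmlSource = '<ul id="list"><li>Hello World</li></ul>';

// Hypothetical helper: slice the source text covered by a [start, end] range,
// assuming `end` is exclusive.
function sourceOf(html, range) {
  const [start, end] = range;
  return html.slice(start, end);
}

// Assumed range of the <li> element within htmlSource
const liRange = [14, 34];
console.log(sourceOf(htmlSource, liRange));
```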

View file

@ -0,0 +1 @@
export default function arr_back<T>(arr: T[]): T;

View file

@ -0,0 +1,6 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
function arr_back(arr) {
return arr[arr.length - 1];
}
exports.default = arr_back;

View file

@ -0,0 +1,20 @@
import CommentNode from './nodes/comment';
import HTMLElement, { Options } from './nodes/html';
import Node from './nodes/node';
import TextNode from './nodes/text';
import NodeType from './nodes/type';
import baseParse from './parse';
import valid from './valid';
export { Options } from './nodes/html';
export { parse, HTMLElement, CommentNode, valid, Node, TextNode, NodeType };
declare function parse(data: string, options?: Partial<Options>): HTMLElement;
declare namespace parse {
var parse: typeof baseParse;
var HTMLElement: typeof import("./nodes/html").default;
var CommentNode: typeof import("./nodes/comment").default;
var valid: typeof import("./valid").default;
var Node: typeof import("./nodes/node").default;
var TextNode: typeof import("./nodes/text").default;
var NodeType: typeof import("./nodes/type").default;
}
export default parse;

View file

@ -0,0 +1,35 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.NodeType = exports.TextNode = exports.Node = exports.valid = exports.CommentNode = exports.HTMLElement = exports.parse = void 0;
var comment_1 = __importDefault(require("./nodes/comment"));
exports.CommentNode = comment_1.default;
var html_1 = __importDefault(require("./nodes/html"));
exports.HTMLElement = html_1.default;
var node_1 = __importDefault(require("./nodes/node"));
exports.Node = node_1.default;
var text_1 = __importDefault(require("./nodes/text"));
exports.TextNode = text_1.default;
var type_1 = __importDefault(require("./nodes/type"));
exports.NodeType = type_1.default;
var parse_1 = __importDefault(require("./parse"));
var valid_1 = __importDefault(require("./valid"));
exports.valid = valid_1.default;
function parse(data, options) {
if (options === void 0) { options = {
lowerCaseTagName: false,
comment: false
}; }
return (0, parse_1.default)(data, options);
}
exports.default = parse;
exports.parse = parse;
parse.parse = parse_1.default;
parse.HTMLElement = html_1.default;
parse.CommentNode = comment_1.default;
parse.valid = valid_1.default;
parse.Node = node_1.default;
parse.TextNode = text_1.default;
parse.NodeType = type_1.default;

File diff suppressed because it is too large

View file

@ -0,0 +1,6 @@
import { Adapter } from 'css-select/lib/types';
import HTMLElement from './nodes/html';
import Node from './nodes/node';
export declare type Predicate = (node: Node) => node is HTMLElement;
declare const _default: Adapter<Node, HTMLElement>;
export default _default;

View file

@ -0,0 +1,106 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
var type_1 = __importDefault(require("./nodes/type"));
function isTag(node) {
return node && node.nodeType === type_1.default.ELEMENT_NODE;
}
function getAttributeValue(elem, name) {
return isTag(elem) ? elem.getAttribute(name) : undefined;
}
function getName(elem) {
return ((elem && elem.rawTagName) || '').toLowerCase();
}
function getChildren(node) {
return node && node.childNodes;
}
function getParent(node) {
return node ? node.parentNode : null;
}
function getText(node) {
return node.text;
}
function removeSubsets(nodes) {
var idx = nodes.length;
var node;
var ancestor;
var replace;
// Check if each node (or one of its ancestors) is already contained in the
// array.
while (--idx > -1) {
node = ancestor = nodes[idx];
// Temporarily remove the node under consideration
nodes[idx] = null;
replace = true;
while (ancestor) {
if (nodes.indexOf(ancestor) > -1) {
replace = false;
nodes.splice(idx, 1);
break;
}
ancestor = getParent(ancestor);
}
// If the node has been found to be unique, re-insert it.
if (replace) {
nodes[idx] = node;
}
}
return nodes;
}
function existsOne(test, elems) {
return elems.some(function (elem) {
return isTag(elem) ? test(elem) || existsOne(test, getChildren(elem)) : false;
});
}
function getSiblings(node) {
var parent = getParent(node);
return parent && getChildren(parent);
}
function hasAttrib(elem, name) {
return getAttributeValue(elem, name) !== undefined;
}
function findOne(test, elems) {
var elem = null;
for (var i = 0, l = elems.length; i < l && !elem; i++) {
var el = elems[i];
if (test(el)) {
elem = el;
}
else {
var childs = getChildren(el);
if (childs && childs.length > 0) {
elem = findOne(test, childs);
}
}
}
return elem;
}
function findAll(test, nodes) {
var result = [];
for (var i = 0, j = nodes.length; i < j; i++) {
if (!isTag(nodes[i]))
continue;
if (test(nodes[i]))
result.push(nodes[i]);
var childs = getChildren(nodes[i]);
if (childs)
result = result.concat(findAll(test, childs));
}
return result;
}
exports.default = {
isTag: isTag,
getAttributeValue: getAttributeValue,
getName: getName,
getChildren: getChildren,
getParent: getParent,
getText: getText,
removeSubsets: removeSubsets,
existsOne: existsOne,
getSiblings: getSiblings,
hasAttrib: hasAttrib,
findOne: findOne,
findAll: findAll
};

View file

@ -0,0 +1,19 @@
import HTMLElement from './html';
import Node from './node';
import NodeType from './type';
export default class CommentNode extends Node {
rawText: string;
clone(): CommentNode;
constructor(rawText: string, parentNode: HTMLElement, range?: [number, number]);
/**
* Node Type declaration.
* @type {Number}
*/
nodeType: NodeType;
/**
* Get unescaped text value of current node and its children.
* @return {string} text content
*/
get text(): string;
toString(): string;
}

View file

@ -0,0 +1,54 @@
"use strict";
var __extends = (this && this.__extends) || (function () {
var extendStatics = function (d, b) {
extendStatics = Object.setPrototypeOf ||
({ __proto__: [] } instanceof Array && function (d, b) { d.__proto__ = b; }) ||
function (d, b) { for (var p in b) if (Object.prototype.hasOwnProperty.call(b, p)) d[p] = b[p]; };
return extendStatics(d, b);
};
return function (d, b) {
if (typeof b !== "function" && b !== null)
throw new TypeError("Class extends value " + String(b) + " is not a constructor or null");
extendStatics(d, b);
function __() { this.constructor = d; }
d.prototype = b === null ? Object.create(b) : (__.prototype = b.prototype, new __());
};
})();
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
var node_1 = __importDefault(require("./node"));
var type_1 = __importDefault(require("./type"));
var CommentNode = /** @class */ (function (_super) {
__extends(CommentNode, _super);
function CommentNode(rawText, parentNode, range) {
var _this = _super.call(this, parentNode, range) || this;
_this.rawText = rawText;
/**
* Node Type declaration.
* @type {Number}
*/
_this.nodeType = type_1.default.COMMENT_NODE;
return _this;
}
CommentNode.prototype.clone = function () {
return new CommentNode(this.rawText, null);
};
Object.defineProperty(CommentNode.prototype, "text", {
/**
* Get unescaped text value of current node and its children.
* @return {string} text content
*/
get: function () {
return this.rawText;
},
enumerable: false,
configurable: true
});
CommentNode.prototype.toString = function () {
return "<!--".concat(this.rawText, "-->");
};
return CommentNode;
}(node_1.default));
exports.default = CommentNode;

View file

@ -0,0 +1,233 @@
import VoidTag from '../void-tag';
import Node from './node';
import NodeType from './type';
export interface KeyAttributes {
id?: string;
class?: string;
}
export interface Attributes {
[key: string]: string;
}
export interface RawAttributes {
[key: string]: string;
}
export declare type InsertPosition = 'beforebegin' | 'afterbegin' | 'beforeend' | 'afterend';
declare class DOMTokenList {
private _set;
private _afterUpdate;
private _validate;
constructor(valuesInit?: string[], afterUpdate?: (t: DOMTokenList) => void);
add(c: string): void;
replace(c1: string, c2: string): void;
remove(c: string): void;
toggle(c: string): void;
contains(c: string): boolean;
get length(): number;
values(): IterableIterator<string>;
get value(): string[];
toString(): string;
}
/**
* HTMLElement, which contains a set of children.
*
* Note: this is a minimalist implementation, no complete tree
* structure provided (no parentNode, nextSibling,
* previousSibling etc).
* @class HTMLElement
* @extends {Node}
*/
export default class HTMLElement extends Node {
rawAttrs: string;
private voidTag;
private _attrs;
private _rawAttrs;
rawTagName: string;
id: string;
classList: DOMTokenList;
/**
* Node Type declaration.
*/
nodeType: NodeType;
/**
* Quote attribute values
* @param attr attribute value
* @returns {string} quoted value
*/
private quoteAttribute;
/**
* Creates an instance of HTMLElement.
* @param keyAttrs id and class attribute
* @param [rawAttrs] attributes in string
*
* @memberof HTMLElement
*/
constructor(tagName: string, keyAttrs: KeyAttributes, rawAttrs: string, parentNode: HTMLElement | null, range: [number, number], voidTag?: VoidTag);
/**
* Remove Child element from childNodes array
* @param {HTMLElement} node node to remove
*/
removeChild(node: Node): this;
/**
* Exchanges given child with new child
* @param {HTMLElement} oldNode node to exchange
* @param {HTMLElement} newNode new node
*/
exchangeChild(oldNode: Node, newNode: Node): this;
get tagName(): string;
set tagName(newname: string);
get localName(): string;
get isVoidElement(): boolean;
/**
* Get escaped (as-is) text value of current node and its children.
* @return {string} text content
*/
get rawText(): string;
get textContent(): string;
set textContent(val: string);
/**
* Get unescaped text value of current node and its children.
* @return {string} text content
*/
get text(): string;
/**
* Get structured Text (with '\n' etc.)
* @return {string} structured text
*/
get structuredText(): string;
toString(): string;
get innerHTML(): string;
set innerHTML(content: string);
set_content(content: string | Node | Node[], options?: Partial<Options>): this;
replaceWith(...nodes: (string | Node)[]): void;
get outerHTML(): string;
/**
* Trim element from right (in block) after seeing pattern in a TextNode.
* @param {RegExp} pattern pattern to find
* @return {HTMLElement} reference to current node
*/
trimRight(pattern: RegExp): this;
/**
* Get DOM structure
* @return {string} structure
*/
get structure(): string;
/**
* Remove whitespaces in this sub tree.
* @return {HTMLElement} pointer to this
*/
removeWhitespace(): this;
/**
* Query CSS selector to find matching nodes.
* @param {string} selector Simplified CSS selector
* @return {HTMLElement[]} matching elements
*/
querySelectorAll(selector: string): HTMLElement[];
/**
* Query CSS Selector to find matching node.
* @param {string} selector Simplified CSS selector
* @return {(HTMLElement|null)} matching node
*/
querySelector(selector: string): HTMLElement | null;
/**
* find elements by their tagName
* @param {string} tagName the tagName of the elements to select
*/
getElementsByTagName(tagName: string): Array<HTMLElement>;
/**
* find element by its id
* @param {string} id the id of the element to select
*/
getElementById(id: string): HTMLElement;
/**
* traverses the Element and its parents (heading toward the document root) until it finds a node that matches the provided selector string. Will return itself or the matching ancestor. If no such element exists, it returns null.
* @param selector a DOMString containing a selector list
*/
closest(selector: string): Node;
/**
* Append a child node to childNodes
* @param {Node} node node to append
* @return {Node} node appended
*/
appendChild<T extends Node = Node>(node: T): T;
/**
* Get first child node
* @return {Node} first child node
*/
get firstChild(): Node;
/**
* Get last child node
* @return {Node} last child node
*/
get lastChild(): Node;
/**
* Get attributes
* @access private
* @return {Object} parsed and unescaped attributes
*/
get attrs(): Attributes;
get attributes(): Record<string, string>;
/**
* Get escaped (as-is) attributes
* @return {Object} parsed attributes
*/
get rawAttributes(): RawAttributes;
removeAttribute(key: string): this;
hasAttribute(key: string): boolean;
/**
* Get an attribute
* @return {string} value of the attribute
*/
getAttribute(key: string): string | undefined;
/**
* Set an attribute value to the HTMLElement
* @param {string} key The attribute name
* @param {string} value The value to set, or null / undefined to remove an attribute
*/
setAttribute(key: string, value: string): void;
/**
* Replace all the attributes of the HTMLElement by the provided attributes
* @param {Attributes} attributes the new attribute set
*/
setAttributes(attributes: Attributes): this;
insertAdjacentHTML(where: InsertPosition, html: string): this;
get nextSibling(): Node;
get nextElementSibling(): HTMLElement;
get previousSibling(): Node;
get previousElementSibling(): HTMLElement;
get classNames(): string;
/**
* Clone this Node
*/
clone(): Node;
}
export interface Options {
lowerCaseTagName: boolean;
comment: boolean;
parseNoneClosedTags?: boolean;
blockTextElements: {
[tag: string]: boolean;
};
voidTag?: {
/**
* options, default value is ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr']
*/
tags?: string[];
/**
* void tag serialisation, add a final slash <br/>
*/
closingSlash?: boolean;
};
}
/**
* Parses HTML and returns a root element
* Parse a chunk of HTML source.
* @param {string} data html
* @return {HTMLElement} root element
*/
export declare function base_parse(data: string, options?: Partial<Options>): HTMLElement[];
/**
* Parses HTML and returns a root element
* Parse a chunk of HTML source.
*/
export declare function parse(data: string, options?: Partial<Options>): HTMLElement;
export {};

File diff suppressed because it is too large

View file

@ -0,0 +1,23 @@
import NodeType from './type';
import HTMLElement from './html';
/**
* Node Class as base class for TextNode and HTMLElement.
*/
export default abstract class Node {
parentNode: HTMLElement;
abstract nodeType: NodeType;
childNodes: Node[];
range: readonly [number, number];
abstract text: string;
abstract rawText: string;
abstract toString(): string;
abstract clone(): Node;
constructor(parentNode?: HTMLElement, range?: [number, number]);
/**
* Remove current node
*/
remove(): this;
get innerText(): string;
get textContent(): string;
set textContent(val: string);
}

View file

@ -0,0 +1,52 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
var he_1 = require("he");
/**
* Node Class as base class for TextNode and HTMLElement.
*/
var Node = /** @class */ (function () {
function Node(parentNode, range) {
if (parentNode === void 0) { parentNode = null; }
this.parentNode = parentNode;
this.childNodes = [];
Object.defineProperty(this, 'range', {
enumerable: false,
writable: true,
configurable: true,
value: range !== null && range !== void 0 ? range : [-1, -1]
});
}
/**
* Remove current node
*/
Node.prototype.remove = function () {
var _this = this;
if (this.parentNode) {
var children = this.parentNode.childNodes;
this.parentNode.childNodes = children.filter(function (child) {
return _this !== child;
});
this.parentNode = null;
}
return this;
};
Object.defineProperty(Node.prototype, "innerText", {
get: function () {
return this.rawText;
},
enumerable: false,
configurable: true
});
Object.defineProperty(Node.prototype, "textContent", {
get: function () {
return (0, he_1.decode)(this.rawText);
},
set: function (val) {
this.rawText = (0, he_1.encode)(val);
},
enumerable: false,
configurable: true
});
return Node;
}());
exports.default = Node;

View file

@ -0,0 +1,43 @@
import HTMLElement from './html';
import Node from './node';
import NodeType from './type';
/**
* TextNode to contain a text element in DOM tree.
* @param {string} value [description]
*/
export default class TextNode extends Node {
clone(): TextNode;
constructor(rawText: string, parentNode: HTMLElement, range?: [number, number]);
/**
* Node Type declaration.
* @type {Number}
*/
nodeType: NodeType;
private _rawText;
private _trimmedRawText?;
private _trimmedText?;
get rawText(): string;
/**
* Set rawText and invalidate trimmed caches
*/
set rawText(text: string);
/**
* Returns raw text with all whitespace trimmed except single leading/trailing non-breaking space
*/
get trimmedRawText(): string;
/**
* Returns text with all whitespace trimmed except single leading/trailing non-breaking space
*/
get trimmedText(): string;
/**
* Get unescaped text value of current node and its children.
* @return {string} text content
*/
get text(): string;
/**
* Detect if the node contains only white space.
* @return {boolean}
*/
get isWhitespace(): boolean;
toString(): string;
}

View file

@ -0,0 +1,142 @@
"use strict";
var __extends = (this && this.__extends) || (function () {
var extendStatics = function (d, b) {
extendStatics = Object.setPrototypeOf ||
({ __proto__: [] } instanceof Array && function (d, b) { d.__proto__ = b; }) ||
function (d, b) { for (var p in b) if (Object.prototype.hasOwnProperty.call(b, p)) d[p] = b[p]; };
return extendStatics(d, b);
};
return function (d, b) {
if (typeof b !== "function" && b !== null)
throw new TypeError("Class extends value " + String(b) + " is not a constructor or null");
extendStatics(d, b);
function __() { this.constructor = d; }
d.prototype = b === null ? Object.create(b) : (__.prototype = b.prototype, new __());
};
})();
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
var he_1 = require("he");
var node_1 = __importDefault(require("./node"));
var type_1 = __importDefault(require("./type"));
/**
* TextNode to contain a text element in DOM tree.
* @param {string} value [description]
*/
var TextNode = /** @class */ (function (_super) {
__extends(TextNode, _super);
function TextNode(rawText, parentNode, range) {
var _this = _super.call(this, parentNode, range) || this;
/**
* Node Type declaration.
* @type {Number}
*/
_this.nodeType = type_1.default.TEXT_NODE;
_this._rawText = rawText;
return _this;
}
TextNode.prototype.clone = function () {
return new TextNode(this._rawText, null);
};
Object.defineProperty(TextNode.prototype, "rawText", {
get: function () {
return this._rawText;
},
/**
* Set rawText and invalidate trimmed caches
*/
set: function (text) {
this._rawText = text;
this._trimmedRawText = void 0;
this._trimmedText = void 0;
},
enumerable: false,
configurable: true
});
Object.defineProperty(TextNode.prototype, "trimmedRawText", {
/**
* Returns raw text with all whitespace trimmed except single leading/trailing non-breaking space
*/
get: function () {
if (this._trimmedRawText !== undefined)
return this._trimmedRawText;
this._trimmedRawText = trimText(this.rawText);
return this._trimmedRawText;
},
enumerable: false,
configurable: true
});
Object.defineProperty(TextNode.prototype, "trimmedText", {
/**
* Returns text with all whitespace trimmed except single leading/trailing non-breaking space
*/
get: function () {
if (this._trimmedText !== undefined)
return this._trimmedText;
this._trimmedText = trimText(this.text);
return this._trimmedText;
},
enumerable: false,
configurable: true
});
Object.defineProperty(TextNode.prototype, "text", {
/**
* Get unescaped text value of current node and its children.
* @return {string} text content
*/
get: function () {
return (0, he_1.decode)(this.rawText);
},
enumerable: false,
configurable: true
});
Object.defineProperty(TextNode.prototype, "isWhitespace", {
/**
* Detect if the node contains only white space.
* @return {boolean}
*/
get: function () {
return /^(\s|&nbsp;)*$/.test(this.rawText);
},
enumerable: false,
configurable: true
});
TextNode.prototype.toString = function () {
return this.rawText;
};
return TextNode;
}(node_1.default));
exports.default = TextNode;
/**
* Trim whitespace except single leading/trailing non-breaking space
*/
function trimText(text) {
var i = 0;
var startPos;
var endPos;
while (i >= 0 && i < text.length) {
if (/\S/.test(text[i])) {
if (startPos === undefined) {
startPos = i;
i = text.length;
}
else {
endPos = i;
i = void 0;
}
}
if (startPos === undefined)
i++;
else
i--;
}
if (startPos === undefined)
startPos = 0;
if (endPos === undefined)
endPos = text.length - 1;
var hasLeadingSpace = startPos > 0 && /[^\S\r\n]/.test(text[startPos - 1]);
var hasTrailingSpace = endPos < (text.length - 1) && /[^\S\r\n]/.test(text[endPos + 1]);
return (hasLeadingSpace ? ' ' : '') + text.slice(startPos, endPos + 1) + (hasTrailingSpace ? ' ' : '');
}

View file

@ -0,0 +1,6 @@
declare enum NodeType {
ELEMENT_NODE = 1,
TEXT_NODE = 3,
COMMENT_NODE = 8
}
export default NodeType;

View file

@ -0,0 +1,9 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
var NodeType;
(function (NodeType) {
NodeType[NodeType["ELEMENT_NODE"] = 1] = "ELEMENT_NODE";
NodeType[NodeType["TEXT_NODE"] = 3] = "TEXT_NODE";
NodeType[NodeType["COMMENT_NODE"] = 8] = "COMMENT_NODE";
})(NodeType || (NodeType = {}));
exports.default = NodeType;
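
The compiled enum maps in both directions: names to numbers and numbers back to names. A minimal sketch of how that could be used follows; the enum body is repeated so the snippet runs on its own, and `describeNode` and the `nodeType` property on its argument are assumptions made for illustration.

```javascript
// Same shape as the compiled enum above, repeated so the snippet runs on its own.
var NodeType;
(function (NodeType) {
  NodeType[NodeType["ELEMENT_NODE"] = 1] = "ELEMENT_NODE";
  NodeType[NodeType["TEXT_NODE"] = 3] = "TEXT_NODE";
  NodeType[NodeType["COMMENT_NODE"] = 8] = "COMMENT_NODE";
})(NodeType || (NodeType = {}));

// describeNode is a hypothetical helper, not part of the package.
// It uses the reverse mapping (number -> name) created by the enum.
function describeNode(node) {
  return NodeType[node.nodeType] || 'UNKNOWN';
}

console.log(NodeType.TEXT_NODE);            // 3
console.log(NodeType[3]);                   // TEXT_NODE
console.log(describeNode({ nodeType: 8 })); // COMMENT_NODE
console.log(describeNode({ nodeType: 2 })); // UNKNOWN
```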

@ -0,0 +1 @@
export { parse as default } from './nodes/html';

@ -0,0 +1,5 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.default = void 0;
var html_1 = require("./nodes/html");
Object.defineProperty(exports, "default", { enumerable: true, get: function () { return html_1.parse; } });

@ -0,0 +1,6 @@
import { Options } from './nodes/html';
/**
* Checks whether a chunk of HTML source is valid.
* @returns true if the source is considered valid, false otherwise
*/
export default function valid(data: string, options?: Partial<Options>): boolean;

@ -0,0 +1,13 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
var html_1 = require("./nodes/html");
/**
* Checks whether a chunk of HTML source is valid.
* Returns true when base_parse leaves exactly one entry on its stack.
*/
function valid(data, options) {
if (options === void 0) { options = { lowerCaseTagName: false, comment: false }; }
var stack = (0, html_1.base_parse)(data, options);
return stack.length === 1;
}
exports.default = valid;

@ -0,0 +1,7 @@
export default class VoidTag {
addClosingSlash: boolean;
private voidTags;
constructor(addClosingSlash?: boolean, tags?: string[]);
formatNode(tag: string, attrs: string, innerHTML: string): string;
isVoidElement(tag: string): boolean;
}

@ -0,0 +1,29 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
var VoidTag = /** @class */ (function () {
function VoidTag(addClosingSlash, tags) {
if (addClosingSlash === void 0) { addClosingSlash = false; }
this.addClosingSlash = addClosingSlash;
if (Array.isArray(tags)) {
this.voidTags = tags.reduce(function (set, tag) {
return set.add(tag.toLowerCase());
}, new Set());
}
else {
this.voidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'].reduce(function (set, tag) {
return set.add(tag);
}, new Set());
}
}
VoidTag.prototype.formatNode = function (tag, attrs, innerHTML) {
var addClosingSlash = this.addClosingSlash;
var closingSpace = (addClosingSlash && attrs && !attrs.endsWith(' ')) ? ' ' : '';
var closingSlash = addClosingSlash ? "".concat(closingSpace, "/") : '';
return this.isVoidElement(tag.toLowerCase()) ? "<".concat(tag).concat(attrs).concat(closingSlash, ">") : "<".concat(tag).concat(attrs, ">").concat(innerHTML, "</").concat(tag, ">");
};
VoidTag.prototype.isVoidElement = function (tag) {
return this.voidTags.has(tag);
};
return VoidTag;
}());
exports.default = VoidTag;
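
To make the formatting rules above concrete, here is a minimal sketch written as an ES2015 class. `VoidTagSketch` is a hypothetical stand-in that mirrors the compiled code as read from this file; it is not the package's export, and the sample tags and attributes are made up for illustration.

```javascript
// Hypothetical stand-in mirroring the compiled VoidTag above; for illustration only.
class VoidTagSketch {
  constructor(addClosingSlash = false, tags) {
    this.addClosingSlash = addClosingSlash;
    const defaults = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
      'link', 'meta', 'param', 'source', 'track', 'wbr'];
    // A custom list replaces the defaults and is stored lower-cased.
    this.voidTags = new Set(Array.isArray(tags) ? tags.map((t) => t.toLowerCase()) : defaults);
  }

  // Case-sensitive lookup: callers are expected to pass a lower-cased tag name.
  isVoidElement(tag) {
    return this.voidTags.has(tag);
  }

  formatNode(tag, attrs, innerHTML) {
    // Add a space before the slash only when attributes exist and do not already end in one.
    const space = this.addClosingSlash && attrs && !attrs.endsWith(' ') ? ' ' : '';
    const slash = this.addClosingSlash ? `${space}/` : '';
    return this.isVoidElement(tag.toLowerCase())
      ? `<${tag}${attrs}${slash}>`
      : `<${tag}${attrs}>${innerHTML}</${tag}>`;
  }
}

console.log(new VoidTagSketch().formatNode('br', '', ''));                    // <br>
console.log(new VoidTagSketch(true).formatNode('br', '', ''));                // <br/>
console.log(new VoidTagSketch(true).formatNode('img', ' src="a.png"', ''));   // <img src="a.png" />
console.log(new VoidTagSketch().formatNode('div', ' class="x"', 'hi'));       // <div class="x">hi</div>
console.log(new VoidTagSketch(false, ['Custom']).isVoidElement('custom'));    // true
```

Passing a custom tag list replaces the default set rather than extending it, so `br` is no longer treated as void in the last instance.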

@ -0,0 +1,111 @@
{
"name": "node-html-parser",
"version": "5.4.2",
"description": "A very fast HTML parser, generating a simplified DOM, with basic element query support.",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"scripts": {
"compile": "tsc",
"build": "npm run lint && npm run clean && npm run compile:cjs && npm run compile:amd",
"compile:cjs": "tsc -m commonjs",
"compile:amd": "tsc -t es5 -m amd -d false --outFile ./dist/main.js",
"lint": "eslint ./src/*.ts ./src/**/*.ts",
"---------------": "",
"test": "yarn run test:target",
"test:src": "cross-env TEST_TARGET=src yarn run test",
"test:dist": "cross-env TEST_TARGET=dist yarn run test",
"benchmark": "node ./test/benchmark/compare.mjs",
"--------------- ": "",
"clean": "npx rimraf ./dist/",
"clean:global": "yarn run clean && npx rimraf yarn.lock test/yarn.lock test/node_modules node_modules",
"reset": "yarn run clean:global && yarn install && yarn build",
"---------------  ": "",
"test:target": "mocha --recursive \"./test/tests\"",
"test:ci": "cross-env TEST_TARGET=dist yarn run test:target",
"posttest": "yarn run benchmark",
"prepare": "cd test && yarn install"
},
"keywords": [
"html",
"parser",
"nodejs",
"typescript"
],
"files": [
"dist",
"README.md",
"LICENSE",
"CHANGELOG.md"
],
"author": "Xiaoyi Shi <ashi009@gmail.com>",
"contributors": [
"taoqf <tao_qiufeng@126.com>",
"Ron S. <ron@nonara.com>"
],
"license": "MIT",
"publishConfig": {
"registry": "https://registry.npmjs.org"
},
"dependencies": {
"css-select": "^4.2.1",
"he": "1.2.0"
},
"devDependencies": {
"@types/entities": "latest",
"@types/he": "latest",
"@types/node": "latest",
"@typescript-eslint/eslint-plugin": "latest",
"@typescript-eslint/eslint-plugin-tslint": "latest",
"@typescript-eslint/parser": "latest",
"blanket": "latest",
"cheerio": "^1.0.0-rc.5",
"cross-env": "^7.0.3",
"eslint": "^7.32.0",
"eslint-config-prettier": "latest",
"eslint-plugin-import": "latest",
"high5": "^1.0.0",
"html-dom-parser": "^1.0.4",
"html-parser": "^0.11.0",
"html5parser": "^2.0.2",
"htmljs-parser": "^2.11.1",
"htmlparser": "^1.7.7",
"htmlparser-benchmark": "^1.1.3",
"htmlparser2": "^6.0.0",
"mocha": "latest",
"mocha-each": "^2.0.1",
"neutron-html5parser": "^0.2.0",
"np": "latest",
"parse5": "^6.0.1",
"rimraf": "^3.0.2",
"saxes": "^6.0.0",
"should": "latest",
"spec": "latest",
"standard-version": "^9.3.1",
"travis-cov": "latest",
"ts-node": "^10.2.1",
"typescript": "latest"
},
"config": {
"blanket": {
"pattern": "./dist/index.js",
"data-cover-never": [
"node_modules"
]
},
"travis-cov": {
"threshold": 70
}
},
"directories": {
"test": "test"
},
"repository": {
"type": "git",
"url": "https://github.com/taoqf/node-fast-html-parser.git"
},
"bugs": {
"url": "https://github.com/taoqf/node-fast-html-parser/issues"
},
"homepage": "https://github.com/taoqf/node-fast-html-parser",
"sideEffects": false
}