diff --git a/README.md b/README.md index 091996677..9d12c64be 100644 --- a/README.md +++ b/README.md @@ -7,23 +7,12 @@ Join us at our development [Zulip chat](https://chat.developer.gosub.io)! For more general information you can also join our [Discord server](https://chat.gosub.io). -If you are interested in contributing to Gosub, please checkout the [contribution guide](CONTRIBUTING.md)! +If you are interested in contributing to Gosub, please check out the [contribution guide](CONTRIBUTING.md)! -``` - _ - | | - __ _ ___ ___ _ _| |__ - / _` |/ _ \/ __| | | | '_ \ -| (_| | (_) \__ \ |_| | |_) | - \__, |\___/|___/\__,_|_.__/ - __/ | The Gateway to - |___/ Optimized Searching and - Unlimited Browsing -``` ## About -This repository is part of the Gosub browser project. This is the main engine that holds the following components: +This repository is part of the Gosub browser engine project. This is the main engine that holds the following components: - HTML5 tokenizer / parser - CSS3 tokenizer / parser @@ -35,24 +24,24 @@ This repository is part of the Gosub browser project. This is the main engine th - JS bridge - C Bindings -The idea is that this engine will receive some kind of stream of bytes (most likely from a socket or file) and parse -this into a valid HTML5 document tree. +More will follow as the engine grows. The idea is that this engine will receive some kind of stream of bytes (most likely +from a socket or file) and parse this into a valid HTML5 document tree and CSS stylesheets. From that point, it can be fed to a renderer engine that will render the document tree into a window, or it can be fed -to a more simplistic engine that will render it in a terminal. -JS can be executed on the document tree and the document tree can be modified by JS. +to a more simplistic engine that will render it in a terminal. JS can be executed on the document tree and the document +tree can be modified by JS. + ## Status > This project is in its infancy. 
There is no usable browser yet. However, you can look at simple html pages and parse > them into a document tree. -We can parse html5 and css3 files into a document tree or the respective css tree. -This tree can be shown in the terminal or be rendered in a very unfinished renderer. Our renderer cannot render -everything yet, but it can render simple html pages. +We can parse HTML5 and CSS3 files into a document tree or the respective css tree. This tree can be shown in the terminal +or be rendered in a very unfinished renderer. Our renderer cannot render everything yet, but it can render simple html +pages, sort of. We already implemented other parts of the engine, for a JS engine, networking stack, a configuration store and other -things however these aren't integrated yet. -You can try these out by running the respective binary. +things however these aren't integrated yet. You can try these out by running the respective binary. We can render a part for our own [site](https://gosub.io): @@ -61,6 +50,7 @@ We can render a part for our own [site](https://gosub.io): Note: the borders are broken because of an issue with taffy (the layout engine we use). This will be fixed in the future. + ## How to run
@@ -89,22 +79,19 @@ You can run the following binaries: | Command | Type | Description | |----------------------------------------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `cargo run -r --bin gosub-parser` | bin | The actual html5 parser/tokenizer that allows you to convert html5 into a document tree.
| -| `cargo run -r --bin parser-test` | test | A test suite for the parser that tests specific tests. This will be removed as soon as the parser is completely finished as this tool is for developement only. | -| `cargo run -r --bin html5-parser-test` | test | A test suite that tests all html5lib tests for the treebuilding | -| `cargo run -r --bin test-user-agent` | bin | A simple placeholder user agent for testing purposes | | `cargo run -r --bin config-store` | bin | A simple test application of the config store for testing purposes | | `cargo run -r --bin css3-parser` | bin | Show the parsed css tree | +| `cargo run -r --bin display-text-tree` | bin | A simple parser that will try and return a textual representation of the website | +| `cargo run -r --bin gosub-parser` | bin | The actual html5 parser/tokenizer that allows you to convert html5 into a document tree. | +| `cargo run -r --bin html5-parser-test` | test | A test suite that tests all html5lib tests for the treebuilding | +| `cargo run -r --bin parser-test` | test | A test suite for the parser that tests specific tests. This will be removed as soon as the parser is completely finished as this tool is for development only. 
| | `cargo run -r --bin renderer` | bin | Render a html page (WIP) | | `cargo run -r --bin run-js` | bin | Run a JS file (Note: console and event loop are not yet implemented) | -| `cargo run -r --bin style-parser` | bin | Display the html page's text with basic styles in the terminal | -You can then run the binaries like so: +For a quick introduction to running the binaries, see [/docs/binaries.md](/docs/binaries.md). -```bash -cargo run -r --bin renderer file://src/bin/resources/gosub.html -``` +## Benchmark and test suites To run the tests and benchmark suite, do: @@ -125,6 +112,7 @@ wasm-pack build ![Browser in browser](resources/images/browser-in-browser.png) + ## Contributing to the project We welcome contributions to this project but the current status makes that we are spending a lot of time researching, diff --git a/docs/css_properties.md b/crates/gosub_css3/docs/css_properties.md similarity index 100% rename from docs/css_properties.md rename to crates/gosub_css3/docs/css_properties.md diff --git a/docs/css_styles.md b/crates/gosub_css3/docs/css_styles.md similarity index 100% rename from docs/css_styles.md rename to crates/gosub_css3/docs/css_styles.md diff --git a/docs/parsing.md b/crates/gosub_html5/docs/parsing.md similarity index 100% rename from docs/parsing.md rename to crates/gosub_html5/docs/parsing.md diff --git a/docs/binaries.md b/docs/binaries.md new file mode 100644 index 000000000..4da6802b8 --- /dev/null +++ b/docs/binaries.md @@ -0,0 +1,169 @@ +# Gosub binaries + +The engine ships with a few binaries for trying out some of its components. They are +stand-alone binaries, but they are of little use beyond testing and experimenting with the engine. + + +## config-store + +The `config-store` binary allows you to view and modify the config-store system found in the engine. 
+ +```bash + +$ cargo run -r --bin config-store list + +dns.cache.max_entries : u:1000 +dns.cache.ttl.override.enabled : b:false +dns.cache.ttl.override.seconds : u:0 +dns.local.enabled : b:true +dns.local.table : m: '' +dns.remote.doh.enabled : b:false +dns.remote.dot.enabled : b:false +dns.remote.nameservers : m: '' +dns.remote.retries : u:3 +dns.remote.timeout : u:5 +dns.remote.use_hosts_file : b:true +useragent.default_page : s:about:blank +useragent.tab.close_button : m: left +useragent.tab.max_opened : i:-1 +renderer.opengl.enabled : b:true + + +$ cargo run -r --bin config-store search --key 'user*' +useragent.default_page : s:about:blank +useragent.tab.close_button : m: left +useragent.tab.max_opened : i:-1 + +``` + + +## css3-parser + +The `css3-parser` will try to parse a CSS stylesheet and either display any errors it finds or show the parsed CSS tree. + +```css +div, a { + color: white; + border: 1px solid black; +} +``` + +```bash +$ cargo run -r --bin css3-parser file://tests/data/css3-data/test.css + +[Stylesheet (1)] + [Rule] + [SelectorList (2)] + [Selector] + [Ident] div + [Selector] + [Combinator] + [Ident] a + [Block] + [Declaration] property: color important: false + [Ident] white + [Declaration] property: border important: false + [Dimension] 1px + [Ident] solid + [Ident] black +``` + +It does not check whether property values match their expected syntax, so it will parse `color: 1%` as a valid line. + + +## gosub-parser + +Fetches a URL, parses it and returns information about the process. 
It will return any information about stylesheets loaded, timings and displays the +body of the fetched page + +```bash + +$ cargo run -r --bin gosub-parser https://news.ycombinator.com + +Parsing url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("news.ycombinator.com")), port: None, path: "/", query: None, fragment: None } + +Found 1 stylesheets +Stylesheet location: "https://news.ycombinator.com/news.css?evaBHzX7ZyR20JbMfele" + +Parse Error: expected-doctype-but-got-start-tag +Parse Error: link element with rel attribute 'icon' is not supported in the body +Parse Error: link element with rel attribute 'alternate' is not supported in the body +Parse Error: anything else not allowed in after body insertion mode + +Namespace | Count | Total | Min | Max | Avg | 50% | 75% | 95% | 99% +---------------------------------------------------------------------------------------------------------------------------------------- +html5.parse | 1 | 605ms | 605ms | 605ms | 605ms | 605ms | 605ms | 605ms | 605ms + | 1 | 605ms | https://news.ycombinator.com/ +css3.parse | 1 | 613µs | 613µs | 613µs | 613µs | 613µs | 613µs | 613µs | 613µs + | 1 | 613µs | https://news.ycombinator.com/news.css?evaBHzX7ZyR20JbMfele +... 
+ +``` + + +```bash + +$ cargo run -r --bin gosub-parser https://gosub.io + +Parsing url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("gosub.io")), port: None, path: "/", query: None, fragment: None } + +Found 2 stylesheets +Stylesheet location: "https://gosub.io/#inline" +Stylesheet location: "https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/all.min.css" + +Parse Error: link element with rel attribute 'apple-touch-icon' is not supported in the body +Parse Error: link element with rel attribute 'icon' is not supported in the body +Parse Error: link element with rel attribute 'icon' is not supported in the body +Parse Error: link element with rel attribute 'manifest' is not supported in the body + +Namespace | Count | Total | Min | Max | Avg | 50% | 75% | 95% | 99% +---------------------------------------------------------------------------------------------------------------------------------------- +html5.parse | 1 | 117ms | 117ms | 117ms | 117ms | 117ms | 117ms | 117ms | 117ms + | 1 | 117ms | https://gosub.io/ +css3.parse | 2 | 7ms | 101µs | 7ms | 3ms | 7ms | 7ms | 7ms | 7ms + | 1 | 101µs | https://gosub.io/#inline + | 1 | 7ms | https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.2/css/all.min.css + +``` + + +## html5-parser-test + +Runs the html5 test suite from the command line. It might not work, because it may be unable to find the test suite files. See [this issue](https://github.com/gosub-io/gosub-engine/issues/521). + + +## parser-test + +Runs the html5 parser test suite from the command line. It might not work, because it may be unable to find the test suite files. See [this issue](https://github.com/gosub-io/gosub-engine/issues/521). + + +## renderer + +A simple (graphical) renderer that tries to render the given url. + + +## run-js + +Runs (simple) JavaScript files through the V8 engine. No APIs are connected yet, so `console.log` won't work. 
+ +```javascript +var a = 1 + 3 + +a +``` + +```bash + +$ cargo run -r --bin run-js tests/example1.js +Got Value: 4 +``` + + +## display-text-tree + +Generates a textual representation of a given website. Basically it will print all the text nodes from the page. + +```bash + +$ cargo run -r --bin display-text-tree https://gosub.io +``` \ No newline at end of file diff --git a/src/bin/test-user-agent.rs b/src/bin/display-text-tree.rs similarity index 55% rename from src/bin/test-user-agent.rs rename to src/bin/display-text-tree.rs index 826eeb584..4fdf13ac2 100644 --- a/src/bin/test-user-agent.rs +++ b/src/bin/display-text-tree.rs @@ -9,7 +9,7 @@ fn main() -> Result<()> { let url = std::env::args() .nth(1) .or_else(|| { - println!("Usage: gosub-browser "); + println!("Usage: display-text-tree "); exit(1); }) .unwrap(); @@ -29,55 +29,13 @@ fn main() -> Result<()> { let document = DocumentBuilder::new_document(None); let parse_errors = Html5Parser::parse_document(&mut stream, Document::clone(&document), None)?; - match get_node_by_path(&document.get(), vec!["html", "body"]) { - None => { - println!("[No Body Found]"); - } - Some(node) => display_node(&document.get(), node), - } - for e in parse_errors { println!("Parse Error: {}", e.message); } - Ok(()) -} + display_node(&document.get(), document.get().get_root()); -fn get_node<'a>(document: &'a Document, parent: &'a Node, name: &'a str) -> Option<&'a Node> { - for id in &parent.children { - match document.get_node_by_id(*id) { - None => {} - Some(node) => { - if node.name.eq(name) { - return Some(node); - } - } - } - } - None -} - -fn get_node_by_path<'a>(document: &'a Document, path: Vec<&'a str>) -> Option<&'a Node> { - let mut node = document.get_root(); - match document.get_node_by_id(node.children[0]) { - None => { - return None; - } - Some(child) => { - node = child; - } - } - for name in path { - match get_node(document, node, name) { - Some(new_node) => { - node = new_node; - } - None => { - return None; - } 
- } - } - Some(node) + Ok(()) } fn display_node(document: &Document, node: &Node) { diff --git a/src/bin/document-writer.rs b/src/bin/document-writer.rs deleted file mode 100644 index 6bcf5bb81..000000000 --- a/src/bin/document-writer.rs +++ /dev/null @@ -1,98 +0,0 @@ -use std::fs; -use std::process::exit; -use std::str::FromStr; - -use anyhow::bail; -use url::Url; - -use gosub_html5::node::NodeId; -use gosub_html5::parser::document::{Document, DocumentBuilder}; -use gosub_html5::parser::Html5Parser; -use gosub_shared::byte_stream::{ByteStream, Encoding}; -use gosub_shared::timing::Scale; -use gosub_shared::timing_display; -use gosub_shared::types::Result; - -fn bail(message: &str) -> ! { - println!("{message}"); - exit(1); -} - -fn main() -> Result<()> { - let matches = clap::Command::new("Gosub parser") - .version("0.1.0") - .arg( - clap::Arg::new("url") - .help("The url or file to parse") - .required(true) - .index(1), - ) - .get_matches(); - - let url = matches - .get_one::<String>("url") - .ok_or("Missing url") - .unwrap() - .to_string(); - - let url = Url::from_str(&url).unwrap_or_else(|_| bail("Invalid url")); - - println!("Parsing url: {:?}", url); - - let html = if url.scheme() == "http" || url.scheme() == "https" { - // Fetch the html from the url - let response = ureq::get(url.as_ref()).call()?; - if response.status() != 200 { - bail!("Could not get url. Status code {}", response.status()); - } - response.into_string()? - } else if url.scheme() == "file" { - // Get html from the file - fs::read_to_string(url.to_string().trim_start_matches("file://"))? 
- } else { - bail("Invalid url scheme"); - }; - - let mut stream = ByteStream::new(Encoding::UTF8, None); - stream.read_from_str(&html, Some(Encoding::UTF8)); - stream.close(); - - // SimpleLogger::new().init().unwrap(); - - // Create a new document that will be filled in by the parser - let handle = DocumentBuilder::new_document(Some(url)); - let parse_errors = Html5Parser::parse_document(&mut stream, Document::clone(&handle), None)?; - - println!("Found {} stylesheets", handle.get().stylesheets.len()); - for sheet in &handle.get().stylesheets { - println!("Stylesheet location: {:?}", sheet.location); - } - - // let mut handle_mut = handle.get_mut(); - // CssComputer::new(&mut *handle_mut).generate_style(); - // drop(handle_mut); - - // println!("Generated tree: \n\n {handle}"); - - for e in parse_errors { - println!("Parse Error: {}", e.message); - } - - timing_display!(true, Scale::Auto); - - let doc = handle.get(); - - let mut body = NodeId::root(); - - for (id, node) in doc.nodes() { - if node.name == "body" { - body = *id; - } - } - - let wrote = doc.write_from_node(body); - - println!("{wrote}"); - - Ok(()) -} diff --git a/src/bin/style-parser.rs b/src/bin/style-parser.rs deleted file mode 100644 index 76d1d578a..000000000 --- a/src/bin/style-parser.rs +++ /dev/null @@ -1,137 +0,0 @@ -use std::fs; - -use anyhow::{bail, Result}; -use url::Url; - -use gosub_html5::parser::document::Document; -use gosub_html5::parser::document::DocumentBuilder; -use gosub_html5::parser::Html5Parser; -use gosub_shared::byte_stream::{ByteStream, Encoding}; - -// struct TextVisitor { -// color: String, -// } -// -// impl TextVisitor { -// fn new() -> Self { -// Self { -// color: String::from(""), -// } -// } -// } -/* -impl TreeVisitor for TextVisitor { - fn document_enter(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &DocumentData) {} - - fn document_leave(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &DocumentData) {} - - fn doctype_enter(&mut 
self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &DocTypeData) {} - - fn doctype_leave(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &DocTypeData) {} - - fn text_enter(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, data: &TextData) { - // let re = Regex::new(r"\s{2,}").unwrap(); - // let s = re.replace_all(&data.value, " "); - let s = &data.value; - - if !self.color.is_empty() { - print!("\x1b[{}", self.color) - } - - if !s.is_empty() { - print!("{}", s) - } - - if !self.color.is_empty() { - print!("\x1b[0m") - } - } - - fn text_leave(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &TextData) {} - - fn comment_enter(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &CommentData) {} - - fn comment_leave(&mut self, _tree: &RenderTree, _node: &RenderTreeNode, _data: &CommentData) {} - - fn element_enter(&mut self, tree: &RenderTree, node: &RenderTreeNode, data: &ElementData) { - if let Some(mut prop) = tree.get_property(node.id, "color") { - if let Some(col) = prop.compute_value().to_color() { - self.color = format!("\x1b[38;2;{};{};{}m", col.r, col.g, col.b) - } - } - - if let Some(mut prop) = tree.get_property(node.id, "background-color") { - if let Some(col) = prop.compute_value().to_color() { - print!("\x1b[48;2;{};{};{}m", col.r, col.g, col.b) - } - } - - print!("<{} ({})>", data.name, data.node_id); - } - - fn element_leave(&mut self, tree: &RenderTree, node: &RenderTreeNode, data: &ElementData) { - if let Some(mut prop) = tree.get_property(node.id, "color") { - if let Some(col) = prop.compute_value().to_color() { - self.color = format!("\x1b[38;2;{};{};{}m", col.r, col.g, col.b) - } - } - - if let Some(mut prop) = tree.get_property(node.id, "background-color") { - if let Some(col) = prop.compute_value().to_color() { - print!("\x1b[48;2;{};{};{}m", col.r, col.g, col.b) - } - } - - print!("", data.name); - print!("\x1b[39;49m"); // default terminal color reset - } -} - */ - -fn main() -> Result<()> { - let 
matches = clap::Command::new("Gosub Style parser") - .version("0.1.0") - .arg( - clap::Arg::new("url") - .help("The url or file to parse") - .required(true) - .index(1), - ) - .get_matches(); - - let str_url: String = matches.get_one::<String>("url").expect("url").to_string(); - let url = Url::parse(&str_url)?; - - let html = if url.scheme() == "http" || url.scheme() == "https" { - // Fetch the html from the url - let response = ureq::get(url.as_ref()).call()?; - if response.status() != 200 { - bail!(format!( - "Could not get url. Status code {}", - response.status() - )); - } - response.into_string()? - } else if url.scheme() == "file" { - fs::read_to_string(str_url.trim_start_matches("file://"))? - } else { - bail!("Unsupported url scheme: {}", url.scheme()); - }; - - let mut stream = ByteStream::new(Encoding::UTF8, None); - stream.read_from_str(&html, Some(Encoding::UTF8)); - stream.close(); - - let doc_handle = DocumentBuilder::new_document(Some(url)); - let _parse_errors = - Html5Parser::parse_document(&mut stream, Document::clone(&doc_handle), None)?; - - // let _render_tree = generate_render_tree(Document::clone(&doc_handle))?; - - //TODO: what do we do with the TreeVisitor? - - // let mut visitor = Box::new(TextVisitor::new()) as Box>; - // walk_render_tree(&render_tree, &mut visitor); - - Ok(()) -} diff --git a/tests/data/css3-data/test.css b/tests/data/css3-data/test.css new file mode 100644 index 000000000..3249e0218 --- /dev/null +++ b/tests/data/css3-data/test.css @@ -0,0 +1,4 @@ +div, a { + color: white; + border: 1px solid black; +} diff --git a/tests/example1.js b/tests/example1.js new file mode 100644 index 000000000..5c0d671b7 --- /dev/null +++ b/tests/example1.js @@ -0,0 +1,3 @@ +var a = 1 + 3 + +a \ No newline at end of file
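
Reviewer note: `display-text-tree` now walks from the document root instead of looking up `html > body` first, and docs/binaries.md describes it as printing all the text nodes of a page. The body of `display_node` is not part of this diff, so the following is only a minimal sketch of that kind of walk. `Document`, `Node`, `NodeData`, `collect_text` and `sample_document` are hypothetical stand-ins written for illustration, not the `gosub_html5` types, and trimming whitespace-only text nodes is an assumption rather than confirmed behaviour of the binary.

```rust
/// Hypothetical stand-in for the payload of a node (not the gosub_html5 type).
#[derive(Debug)]
pub enum NodeData {
    Element { name: String },
    Text { value: String },
}

/// Hypothetical node: a payload plus the ids of its children.
#[derive(Debug)]
pub struct Node {
    pub data: NodeData,
    pub children: Vec<usize>, // indices into `Document::nodes`
}

/// Hypothetical arena-style document: nodes are stored flat and referenced by id.
#[derive(Debug, Default)]
pub struct Document {
    nodes: Vec<Node>,
}

impl Document {
    /// Adds a node, links it under `parent` (if any) and returns its id.
    pub fn add(&mut self, data: NodeData, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { data, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Looks a node up by id, returning `None` for unknown ids.
    pub fn get_node_by_id(&self, id: usize) -> Option<&Node> {
        self.nodes.get(id)
    }
}

/// Depth-first walk that collects the trimmed, non-empty text of every text node.
pub fn collect_text(document: &Document, id: usize, out: &mut Vec<String>) {
    let Some(node) = document.get_node_by_id(id) else { return };
    if let NodeData::Text { value } = &node.data {
        let trimmed = value.trim();
        if !trimmed.is_empty() {
            out.push(trimmed.to_string());
        }
    }
    for &child in &node.children {
        collect_text(document, child, out);
    }
}

/// Builds a tiny html > body > (h1, p) tree and returns it with the root id.
pub fn sample_document() -> (Document, usize) {
    let mut doc = Document::default();
    let html = doc.add(NodeData::Element { name: "html".into() }, None);
    let body = doc.add(NodeData::Element { name: "body".into() }, Some(html));
    let h1 = doc.add(NodeData::Element { name: "h1".into() }, Some(body));
    doc.add(NodeData::Text { value: "Gosub".into() }, Some(h1));
    doc.add(NodeData::Text { value: "\n  ".into() }, Some(body)); // whitespace only, skipped
    let p = doc.add(NodeData::Element { name: "p".into() }, Some(body));
    doc.add(NodeData::Text { value: "The Gateway to ".into() }, Some(p));
    (doc, html)
}
```

Calling `collect_text(&doc, root, &mut out)` on the pair returned by `sample_document()` fills `out` with the text in document order. Starting at the root rather than at `body` also means a page without a `body` element still produces output, which may be why the `[No Body Found]` branch could be dropped.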