preprocessor/lexer/parser: Massive refactor of parser pipeline, added import statement (#76)

* preprocessor/lexer/parser: Refactor of pattern language parsing pipeline (#74)

* lexer: update token creation and remove tokens dependency from ast nodes

* lexer: refactor lexer, add nan and inf and general overhaul

* parser: switch to folds over recursion

* parser: remove all unnecessary MATCHES macro expansions

* tests: add doc comments and strings test

* parser/lexer: fixes and stability

* preprocessor/lexer/errors: implement new error system and fragments of new parser system

* misc: remove compiler explorer plugin settings

* build: update libfmt

* tokens/errors/lexer: small refactor and lift of `Location`

* pattern language: make source required to be passed

* preprocessor: calculate location correctly

* preprocessor: fix include resolver not being passed correctly

* error: improved error collection capabilities

* all: new resolver model and partial error implementation

* preprocessor: refactor

* error: support printing compile errors

* pipeline: small fixes

* parser/preprocessor/lexer: small changes to api and fixes

* pattern_language: fixes to support new apis

* preprocessor/lexer: improve lexer and preprocessor and add length to locations

* token: make tokens comparable with literals

* resolver: rework resolvers & parser manager

* parser: simple cleanup & improvements

* build: Improve build times by removing std::chrono from headers

* fix: small parser bug

* fix: small parser fixes

* fix: small mistake

* parser: switch parser over to new error model

* ast: migrate ASTNode over to Location

* clean: remove old unused errors

* validator: refactor validator & migrate to new error system

* misc: small refactor, move evaluator over to new runtime errors

* build: update depends

* misc: remove unused code

* fix: remove from cmake lists

* misc: apply suggested changes & rework of source resolving

* doc: document functions

* misc: better name

* fix: add back old `bitfield_order` pragma

* fix: small bug

* impr: add printing of compile errors to tests

* misc: apply review suggestions

* misc: apply review suggestions

* style: adapted style

* lib: Add function to lex a string into tokens

* misc: fix bugs and correct error locations

* parser: Fix various crashes during parsing

* lib: Fix crash when error occurred during preprocessing or lexing

* lib: Fix integer underflow when formatting error messages

* parser: Fixed doc comment parsing

* lib: Make sure resolvers and preprocessor are reset correctly

* parse: Improve handling of infinite loops during error handling

* fix: solve iterator issues by adding safe iterator

* fix: incorrect column numbers

* fix: avoid implicit casts by using diff type

* fix: Compile and linking errors when using clang

* feat: added fuzzing utilities

* feat: add fuzzing dictionary

* fix: update fuzzing instructions

* parser: Use safe pointers

* misc: move safe pointer to own header

* misc: add `IteratorLike` concept to SafeIterator

* misc: update libwolv

* fix: fix a lot of crashes

* fix: match not check nullptr

* fix: try-catch not checking for null

* fix: more null-checks

* parser: Fix build errors on clang

* parser: Try to circumvent ICE

* misc: header optimization

* misc: rework error display

* feat: import statement

* misc: Fix merge conflicts

* tests: Fix compile issues

* fix: once import not accounting for different aliases

* fix: Source::Empty() function being constexpr

---------

Co-authored-by: WerWolv <[email protected]>
jumanji144 and WerWolv authored Feb 4, 2024
1 parent 82cce31 commit 188a808
Showing 76 changed files with 4,025 additions and 2,101 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
cmake-build-*/
build/
.idea/compilerexplorer.settings.xml
4 changes: 4 additions & 0 deletions CMakeLists.txt
@@ -67,4 +67,8 @@ if (LIBPL_ENABLE_CLI)
add_subdirectory(cli)
endif ()

if (LIBPL_ENABLE_FUZZING)
add_subdirectory(fuzz EXCLUDE_FROM_ALL)
endif ()

add_library(pl::libpl ALIAS libpl)
6 changes: 4 additions & 2 deletions cli/source/helpers/utils.cpp
@@ -3,6 +3,8 @@
#include <wolv/io/file.hpp>

#include <fmt/format.h>
#include <pl/helpers/utils.hpp>
#include <wolv/utils/string.hpp>

namespace pl::cli {

@@ -31,11 +33,11 @@ namespace pl::cli {
});

// Execute pattern file
- if (!runtime.executeString(patternFile.readString())) {
+ if (!runtime.executeString(patternFile.readString(), wolv::util::toUTF8String(patternFile.getPath()))) {
auto error = runtime.getError().value();
fmt::print("Pattern Error: {}:{} -> {}\n", error.line, error.column, error.message);
std::exit(EXIT_FAILURE);
}
}

}
}
2 changes: 1 addition & 1 deletion cli/source/subcommands/docs.cpp
@@ -146,7 +146,7 @@ namespace pl::cli::sub {
// Execute pattern file
wolv::io::File patternFile(patternFilePath, wolv::io::File::Mode::Read);

- auto ast = runtime.parseString(patternFile.readString());
+ auto ast = runtime.parseString(patternFile.readString(), wolv::util::toUTF8String(patternFile.getPath()));
if (!ast.has_value()) {
auto error = runtime.getError().value();
fmt::print("Pattern Error: {}:{} -> {}\n", error.line, error.column, error.message);
2 changes: 1 addition & 1 deletion cli/source/subcommands/info.cpp
@@ -96,7 +96,7 @@ namespace pl::cli::sub {
return true;
});

- auto ast = runtime.parseString(patternFile.readString());
+ auto ast = runtime.parseString(patternFile.readString(), wolv::util::toUTF8String(patternFile.getPath()));
if (!ast.has_value()) {
auto error = runtime.getError().value();
fmt::print("Pattern Error: {}:{} -> {}\n", error.line, error.column, error.message);
5 changes: 5 additions & 0 deletions fuzz/.gitignore
@@ -0,0 +1,5 @@
.gdb_history
plfuzz
sync/
output/
graph/
9 changes: 9 additions & 0 deletions fuzz/CMakeLists.txt
@@ -0,0 +1,9 @@
cmake_minimum_required(VERSION 3.16)
project(plfuzz)

set(CMAKE_CXX_STANDARD 23)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})

add_executable(plfuzz source/main.cpp)

target_link_libraries(plfuzz PRIVATE libpl libpl-gen fmt::fmt-header-only)
42 changes: 42 additions & 0 deletions fuzz/README.md
@@ -0,0 +1,42 @@
## Pattern Language Fuzzing
Small subproject in the pattern language to allow fuzzing the parser for crashes and other issues.

### Pre-requisites
To use the fuzzer, you must first have the [AFL++](https://github.com/AFLplusplus/AFLplusplus) fuzzer installed.
Follow the instructions on their repository on how to build the afl-fuzz and afl-cc binaries.
Keep in mind that if you are compiling from source, the afl-cc/afl-c++ binaries must be compiled with at least
clang-17 support.

### Building
To build the fuzzer you must set the compiler to afl-cc/afl-c++. To achieve this, you need to alter the cmake flags to
include the following:
```bash
-DCMAKE_C_COMPILER=afl-cc -DCMAKE_CXX_COMPILER=afl-c++ -DLIBPL_ENABLE_FUZZING=ON
```
After this, you can build the fuzzer as normal.
The binary will be in this source folder.

### Fuzzing
The plfuzz binary takes in a file to parse as an argument.

To fuzz you can now follow the AFL++ tutorials on how to effectively fuzz.
There are some simple inputs in the `inputs` folder that you can use to start fuzzing.
There is also a dictionary file in the `dict` folder that you can use to improve the quality of the fuzzing.

Here is an example of how to start fuzzing:
```bash
afl-fuzz -i inputs -o output -x ./dict/hexpat.dict -- ./plfuzz @@
```
This will run a simple fuzzing session with the inputs in the `inputs` folder and outputting to the `output` folder.

### Debugging
During the session, if the fuzzer finds crashes or hangs, it writes the offending input to the
`output/crashes` or `output/hangs` folder.
To debug such a case, run the plfuzz binary with the file as an argument:
```bash
./plfuzz output/crashes/<crash_file>
```
To run the crashing input under GDB instead, pass the program arguments through `--args`:
```bash
gdb --args ./plfuzz output/crashes/<crash_file>
```
81 changes: 81 additions & 0 deletions fuzz/dict/hexpat.dict
@@ -0,0 +1,81 @@
#
# AFL dictionary for pattern language hexpat format
#

keyword_if="if"
keyword_else="else"
keyword_while="while"
keyword_for="for"
keyword_match="match"
keyword_return="return"
keyword_break="break"
keyword_continue="continue"
keyword_struct="struct"
keyword_enum="enum"
keyword_union="union"
keyword_function="fn"
keyword_bitfield="bitfield"
keyword_unsigned="unsigned"
keyword_signed="signed"
keyword_little_endian="le"
keyword_big_endian="be"
keyword_parent="parent"
keyword_namespace="namespace"
keyword_using="using"
keyword_this="this"
keyword_in="in"
keyword_out="out"
keyword_reference="ref"
keyword_null="null"
keyword_const="const"
keyword_try="try"
keyword_catch="catch"
keyword_import="import"
keyword_as="as"
keyword_is="is"

misc_1=" 1"
misc_a="a"
misc_type="u8"
misc_decl="u8 a=1"
misc_assign=" a=1"
misc_code=" {}"
misc_string="\"a\""
misc_comment="//"
misc_comment2="/* */"
misc_comment3="/** */"
misc_comment4="/*! */"
misc_comment5="///"
misc_comment6="//!"
misc_comment7="/**"
misc_minus=" -"
misc_plus=" +"
misc_div=" /"
misc_mul=" *"
misc_mod=" %"
misc_and=" &"
misc_or=" |"
misc_xor=" ^"
misc_not=" !"
misc_lshift=" <<"
misc_rshift=" >>"
misc_eq=" ="
misc_neq=" !="
misc_lt=" <"
misc_gt=" >"
misc_leq=" <="
misc_geq=" >="
misc_and2=" &&"
misc_or2=" ||"
misc_unicode_char="'\\u0000'"
misc_hex_char="'\\x00'"
misc_member_access="."
misc_namespace_access="::"
misc_bitfield_size=": 1,"
misc_enum_entry="= 1,"
misc_function_call="()"
misc_function_paren="( )"
misc_struct_decl="struct a {}"
misc_enum_decl="enum a : u8 {}"
misc_union_decl="union a {}"
misc_bitfield_decl="bitfield a : u8 {}"
1 change: 1 addition & 0 deletions fuzz/inputs/a.hexpat
@@ -0,0 +1 @@
u8 a;
5 changes: 5 additions & 0 deletions fuzz/inputs/complex-struct.hexpat
@@ -0,0 +1,5 @@
struct A {
u8 a;
u8 b;
u8 c;
};
26 changes: 26 additions & 0 deletions fuzz/inputs/omni.hexpat
@@ -0,0 +1,26 @@
u8 a = 43 [[export]];
u8 c = 11 [[hidden]];
u8 d @ 0x00 [[export]];

u8 a[while(0)] @ 0x00 [[export]];

struct A {
u8 a [[export]];
u8 c [[hidden]];
} [[format("A")]];

enum B : u8 {
A = 43;
B = 432,
C = 62
};

bitfield C {
u8 a : 23 [[export]];
u8 c : 252 [[hidden]];
} [[format("A")]];

fn test() {
a = 3;
c = 4;
};
6 changes: 6 additions & 0 deletions fuzz/inputs/preprocessor.hexpat
@@ -0,0 +1,6 @@
#define A
#define B
#ifdef A
#define C
u8 a;
#endif
1 change: 1 addition & 0 deletions fuzz/inputs/struct.hexpat
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
struct A {};
47 changes: 47 additions & 0 deletions fuzz/source/main.cpp
@@ -0,0 +1,47 @@
#include <pl/pattern_language.hpp>
#include <wolv/io/file.hpp>

#include <iostream>
#include <pl/helpers/utils.hpp>
#include <wolv/utils/string.hpp>

int main(int argc, char **argv) {
    if(argc < 2) {
        std::cout << "Invalid number of arguments specified! " << argc << std::endl;
        return EXIT_FAILURE;
    }

    std::fs::path path;

    if(strncmp(argv[1], "-t", 2) == 0) {
        if(argc != 3 ) {
            std::cout << "Invalid number of arguments specified! " << argc << std::endl;
            return EXIT_FAILURE;
        }
        std::string base = argv[2]; // base path
        std::cout << "Number: " << std::endl;
        int number = 0;
        std::cin >> number;
        std::vector<std::fs::path> paths;
        for(auto &file : std::filesystem::directory_iterator(base)) {
            paths.push_back(file.path());
        }
        // sort by name
        std::ranges::sort(paths, [](const std::fs::path &a, const std::fs::path &b) {
            return a.filename().string() < b.filename().string();
        });
        path = paths[number];
        std::cout << "Executing: " << path << std::endl;
    } else {
        path = argv[1];
    }

    wolv::io::File patternFile(path, wolv::io::File::Mode::Read);

    pl::PatternLanguage runtime;

    auto result =
        runtime.parseString(patternFile.readString(), wolv::util::toUTF8String(path));

    return EXIT_SUCCESS;
}
4 changes: 4 additions & 0 deletions lib/CMakeLists.txt
@@ -55,7 +55,9 @@ add_library(libpl ${LIBRARY_TYPE}
source/pl/core/parser.cpp
source/pl/core/preprocessor.cpp
source/pl/core/validator.cpp
source/pl/core/parser_manager.cpp

source/pl/core/resolver.cpp
source/pl/core/error.cpp

source/pl/lib/std/pragmas.cpp
@@ -68,6 +70,8 @@ add_library(libpl ${LIBRARY_TYPE}
source/pl/lib/std/core.cpp
source/pl/lib/std/hash.cpp
source/pl/lib/std/random.cpp
source/pl/core/resolvers.cpp
source/pl/core/api.cpp
)

if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
34 changes: 33 additions & 1 deletion lib/include/pl/api.hpp
@@ -1,6 +1,7 @@
#pragma once

#include <pl/core/token.hpp>
#include <pl/core/errors/result.hpp>
#include <pl/helpers/types.hpp>

#include <cmath>
@@ -13,7 +14,10 @@ namespace pl {

class PatternLanguage;

- namespace core { class Evaluator; }
+ namespace core {
+     class Evaluator;
+     class Preprocessor;
+ }

}

@@ -24,6 +28,34 @@ namespace pl::api {
*/
using PragmaHandler = std::function<bool(PatternLanguage&, const std::string &)>;

using DirectiveHandler = std::function<void(core::Preprocessor*)>;

using Resolver = std::function<hlp::Result<Source*, std::string>(const std::string&)>;

struct Source {
    std::string content;
    std::string source;
    u32 id = 0;

    static u32 idCounter;

    Source(std::string content, std::string source = DefaultSource) :
        content(std::move(content)), source(std::move(source)), id(idCounter++) { }

    Source() = default;

    static constexpr auto DefaultSource = "<Source Code>";
    static constexpr Source* NoSource = nullptr;
    static Source Empty() {
        return { "", "" };
    }

    constexpr auto operator<=>(const Source& other) const {
        return this->id <=> other.id;
    }

};

/**
* @brief A type representing a custom section
*/