From 6609569e81396afa6e5257bb95f41f39c00202c9 Mon Sep 17 00:00:00 2001
From: toni-calvin
Date: Wed, 26 Jul 2023 19:53:47 +0200
Subject: [PATCH 1/5] First draft of parser documentation

---
 README.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ff66c75..bf4d1a3 100644
--- a/README.md
+++ b/README.md
@@ -125,7 +125,30 @@ TODO: Short explanation of Felts and the Cairo/Stark field we use through Lambda
 
 ### Program parsing
 
-Go through the main parts of a compiled program `Json` file. `data` field with instructions, identifiers, program entrypoint, etc.
+The input of the Virtual Machine is a compiled Cairo program in Json format.
+The main fields in the file are listed below:
+- data: List of hexadecimal values that represent the instructions defined in the cairo program.
+- debug_info: Useful information about how instructions are located inside the source code. For each instruction is defined:
+ The scopes that can acces to it.
+ The memory segment they are located.
+ The offset of the memory segment.
+ The instruction variables and values used.
+ Which `hints` have been used.
+ Information about the position of the instruction within the file and also of its parent instruction.
+- hints: All the hints used in the program.
+- identifiers: Identifiers of the functions of the cairo program. Each entry represents a block of code. For example for a concrete function we have the following blocks:
+ Starting identifier
+ Ending identifier
+ Arguments identifier
+ Body identifier
+ Return identifier
+ ...
+Each identifier is represented by the type of code block ['function', 'struct', 'label', 'reference', ...], the pc register value used to access that block and other useful information. Here we can find the entrypoint of the execution to the program to create the execution trace.
+- main_scope: Self explanatory, usually something like __main__.
+- prime: A prime number in hexadecimal format.
+- reference_manager: Information about the different references among functions of the cairo program. Here you can also find the memory position (segment and offset), pc integer value and the translated operation [memory_position, felt].
+
+In this project, we use a C++ library called [simdjson](https://github.com/simdjson/simdjson), the json is stored in a custom structure from which the vm can create the trace of the compiled program.
 
 ### Code walkthrough/Write your own Cairo VM
 
From 918d3a21d235b79902953b03a723ba154bc8c0b0 Mon Sep 17 00:00:00 2001
From: toni-calvin
Date: Thu, 27 Jul 2023 17:48:18 +0200
Subject: [PATCH 2/5] Documenting parser

---
 README.md | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index bf4d1a3..81fa5cf 100644
--- a/README.md
+++ b/README.md
@@ -125,30 +125,26 @@ TODO: Short explanation of Felts and the Cairo/Stark field we use through Lambda
 
 ### Program parsing
 
-The input of the Virtual Machine is a compiled Cairo program in Json format.
-The main fields in the file are listed below:
-- data: List of hexadecimal values that represent the instructions defined in the cairo program.
-- debug_info: Useful information about how instructions are located inside the source code. For each instruction is defined:
- The scopes that can acces to it.
- The memory segment they are located.
- The offset of the memory segment.
- The instruction variables and values used.
- Which `hints` have been used.
- Information about the position of the instruction within the file and also of its parent instruction.
-- hints: All the hints used in the program.
-- identifiers: Identifiers of the functions of the cairo program. Each entry represents a block of code. For example for a concrete function we have the following blocks:
- Starting identifier
- Ending identifier
- Arguments identifier
- Body identifier
- Return identifier
- ...
-Each identifier is represented by the type of code block ['function', 'struct', 'label', 'reference', ...], the pc register value used to access that block and other useful information. Here we can find the entrypoint of the execution to the program to create the execution trace.
-- main_scope: Self explanatory, usually something like __main__.
-- prime: A prime number in hexadecimal format.
-- reference_manager: Information about the different references among functions of the cairo program. Here you can also find the memory position (segment and offset), pc integer value and the translated operation [memory_position, felt].
-
-In this project, we use a C++ library called [simdjson](https://github.com/simdjson/simdjson), the json is stored in a custom structure from which the vm can create the trace of the compiled program.
+The input of the Virtual Machine is a compiled Cairo program in Json format. The main part of the file are listed below:
+
+- data: List of hexadecimal values that represent the instructions and immediate values defined in the cairo program. Each hexadecimal value is stored as a maybe_relocatable element in memory, but they can only be felts because the decoder has to be able to get the instruction fields in its bit representation.
+
+- debug_info: This field provides information about the instructions defined in the data list. Each one is identified with its index inside the data list. For each one is defined which scopes can access to that instruction, the memory allocation {segment, offset_segment} of the instruction; represented by the ap register, and other information about the instruction location inside the cairo program as the number line or column line. Other information is provided as which hints have been used in that instruction if any.
+
+- hints: All the hints used in the program, ordered by the pc offset at which they should be executed.
+
+- identifiers: User-defined symbols in the Cairo code representing variables, functions, classes, etc. with unique names. For each one is provided the expected pc offset, the type of identifier and other information depending on this type.
+
+ For example for the identifier representing the main function (usually the entrypoint of the program), has the pc offset, "function" as a type and a list of decorators wrappers if any.
+ Another example is a user defined struct, it provides "struct" as a type, its size, the members it contains (with its information) and more.
+
+- main_scope: Usually something like __main__. All the identifiers associated with main function will be identified as __main__.identifier_name. Useful to identify the entrypoint of the program.
+
+- prime: The cairo prime in hexadecimal format. As explained above, all arithmetic operations are done over a base field, modulo this primer number.
+
+- reference_manager: Contains information about cairo variables. This information is useful to access to variables when executing cairo hints.
+
+In this project, we use a C++ library called [simdjson](https://github.com/simdjson/simdjson), the json is stored in a custom structure which the vm can use to run the program and create a trace of its execution.
 
 ### Code walkthrough/Write your own Cairo VM
 
From add210c5ddb34741ea44866589c7096849a37442 Mon Sep 17 00:00:00 2001
From: toni-calvin
Date: Fri, 28 Jul 2023 09:57:25 +0200
Subject: [PATCH 3/5] Correcting and simplifying debug_info field.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bc891e7..271ed92 100644
--- a/README.md
+++ b/README.md
@@ -152,7 +152,7 @@ The input of the Virtual Machine is a compiled Cairo program in Json format. The
 
 - data: List of hexadecimal values that represent the instructions and immediate values defined in the cairo program. Each hexadecimal value is stored as a maybe_relocatable element in memory, but they can only be felts because the decoder has to be able to get the instruction fields in its bit representation.
 
-- debug_info: This field provides information about the instructions defined in the data list. Each one is identified with its index inside the data list. For each one is defined which scopes can access to that instruction, the memory allocation {segment, offset_segment} of the instruction; represented by the ap register, and other information about the instruction location inside the cairo program as the number line or column line. Other information is provided as which hints have been used in that instruction if any.
+- debug_info: This field provides information about the isnstructions defined in the data list. Each one is identified with its index inside the data list. For each one it contains information about the cairo variables in scope, the hints executed before that instruction if any, and its location inside the cairo program.
 
 - hints: All the hints used in the program, ordered by the pc offset at which they should be executed.
 
From 8203a55f1bd809944b49c8d2c3a0699312627466 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antonio=20Calv=C3=ADn=20Garc=C3=ADa?=
Date: Fri, 28 Jul 2023 16:58:35 +0200
Subject: [PATCH 4/5] Update README.md

Co-authored-by: fmoletta <99273364+fmoletta@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6cf1041..409fc2c 100644
--- a/README.md
+++ b/README.md
@@ -217,7 +217,7 @@ The input of the Virtual Machine is a compiled Cairo program in Json format. The
 
 - data: List of hexadecimal values that represent the instructions and immediate values defined in the cairo program. Each hexadecimal value is stored as a maybe_relocatable element in memory, but they can only be felts because the decoder has to be able to get the instruction fields in its bit representation.
 
-- debug_info: This field provides information about the isnstructions defined in the data list. Each one is identified with its index inside the data list. For each one it contains information about the cairo variables in scope, the hints executed before that instruction if any, and its location inside the cairo program.
+- debug_info: This field provides information about the instructions defined in the data list. Each one is identified with its index inside the data list. For each one it contains information about the cairo variables in scope, the hints executed before that instruction if any, and its location inside the cairo program.
 
 - hints: All the hints used in the program, ordered by the pc offset at which they should be executed.
 
From 37a86898e48fd55d8e4bed963fcb08df7a8d4c4f Mon Sep 17 00:00:00 2001
From: toni-calvin
Date: Tue, 1 Aug 2023 15:14:00 +0200
Subject: [PATCH 5/5] Fixing comments

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 17b7258..c41dfb1 100644
--- a/README.md
+++ b/README.md
@@ -230,10 +230,10 @@ The input of the Virtual Machine is a compiled Cairo program in Json format. The
 
 - hints: All the hints used in the program, ordered by the pc offset at which they should be executed.
 
-- identifiers: User-defined symbols in the Cairo code representing variables, functions, classes, etc. with unique names. For each one is provided the expected pc offset, the type of identifier and other information depending on this type.
+- identifiers: User-defined symbols in the Cairo code representing variables, functions, classes, etc. with unique names. The expected offset, type and its corresponding information is provided for each identifier
 
- For example for the identifier representing the main function (usually the entrypoint of the program), has the pc offset, "function" as a type and a list of decorators wrappers if any.
- Another example is a user defined struct, it provides "struct" as a type, its size, the members it contains (with its information) and more.
+ For example, the identifier representing the main function (usually the entrypoint of the program) is of `function` type, and a list of decorators wrappers (if there are any) are provided as additional information.
+ Another example is a user defined struct, is of `struct` type, it provides its size, the members it contains (with its information) and more.
 
 - main_scope: Usually something like __main__. All the identifiers associated with main function will be identified as __main__.identifier_name. Useful to identify the entrypoint of the program.
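
The fields documented in this series can be exercised with a short script. The following is a minimal Python sketch, not the project's C++/simdjson parser. It assumes that the entrypoint is stored under the identifier `<main_scope>.main` with a `pc` key, and that `hints` is an object keyed by pc offset written as a decimal string. `load_program`, `SAMPLE` and the sample values are hypothetical and only serve the illustration.

```python
import json

# The Cairo prime, 2**251 + 17 * 2**192 + 1.
CAIRO_PRIME = 2**251 + 17 * 2**192 + 1


def load_program(text):
    """Parse a compiled-program JSON string into a small summary dict."""
    raw = json.loads(text)

    # `prime` is stored as a hexadecimal string.
    prime = int(raw["prime"], 16)

    # Every entry of `data` is a hexadecimal string and must be a felt,
    # that is, a value strictly smaller than the prime.
    data = [int(word, 16) for word in raw["data"]]
    if any(word >= prime for word in data):
        raise ValueError("data contains a value outside the field")

    # Assumption: the entrypoint is the identifier "<main_scope>.main"
    # and its offset is stored under the "pc" key.
    main_scope = raw["main_scope"]
    entry = raw["identifiers"][f"{main_scope}.main"]
    if entry["type"] != "function":
        raise ValueError("entrypoint identifier is not a function")

    # Assumption: `hints` is keyed by pc offset as a decimal string.
    hint_pcs = sorted(int(pc) for pc in raw["hints"])

    return {
        "prime": prime,
        "data": data,
        "entrypoint": entry["pc"],
        "hint_pcs": hint_pcs,
    }


# Hypothetical, hand-written input; not the output of a real compilation.
SAMPLE = json.dumps({
    "prime": hex(CAIRO_PRIME),
    "data": ["0x480680017fff8000", "0x1", "0x208b7fff7fff7ffe"],
    "hints": {"2": [], "0": []},
    "identifiers": {
        "__main__.main": {"type": "function", "pc": 0, "decorators": []},
    },
    "main_scope": "__main__",
})

program = load_program(SAMPLE)
print(program["entrypoint"], program["hint_pcs"], len(program["data"]))
# prints: 0 [0, 2] 3
```

Checking every `data` word against the prime mirrors the statement in the patch that these values can only be felts. `debug_info` and `reference_manager` are left out because the sketch does not execute hints.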