Add CONTRIBUTING-development.md

citusdata · Nov 15, 2023 · 837d20c
1 parent a960799

Showing 2 changed files with 142 additions and 2 deletions.
diff --git a/CONTRIBUTING-development.md b/CONTRIBUTING-development.md
@@ -0,0 +1,134 @@
+# Coding style
+
+* We almost always use **CamelCase** when naming functions, variables, etc., **not snake_case**.
+
+* We **start functions with a comment**:
+
+  ```c
+  /*
+   * MyNiceFunction <something in present simple tense, e.g., processes / returns / checks / takes X as input / does Y> ..
+   * <some more nice words> ..
+   * <some more nice words> ..
+   */
+  <static?> <return type>
+  MyNiceFunction(..)
+  {
+    ..
+    ..
+  }
+  ```
+
+* `#include`s need to be grouped in the order below and sorted alphabetically within each group, and we should not include headers that a file doesn't need:
+
+  * System includes (e.g., `#include <...>`)
+  * `postgres.h` (e.g., `#include "postgres.h"`)
+  * Top-level Postgres includes, not contained in a directory (e.g., `#include "miscadmin.h"`)
+  * General Postgres includes (e.g., `#include "nodes/..."`)
+  * Top-level Citus includes, not contained in a directory (e.g., `#include "citus_version.h"`)
+  * Columnar includes (e.g., `#include "columnar/..."`)
+  * Distributed includes (e.g., `#include "distributed/..."`)
+
+* Comments:
+  ```c
+  /* single-line comments start with a lowercase letter */
+
+  /*
+   * We start multi-line comments with a capital letter
+   * and keep adding a star to the beginning of each line
+   * until we close the comment with a star and a slash.
+   */
+  ```
+
+* Order of function implementations and their declarations in a file:
+
+  We declare static functions at the top of the file and define them after the functions that call them, in the order in which they are called. For example:
+
+  ```c
+  #include <..>
+  #include <..>
+  ..
+  ..
+  typedef struct
+  {
+    ..
+    ..
+  } MyNiceStruct;
+  ..
+  ..
+  PG_FUNCTION_INFO_V1(my_nice_udf1);
+  PG_FUNCTION_INFO_V1(my_nice_udf2);
+  ..
+  ..
+  /* declare static functions somewhere at the top of the file */
+  static void MyNiceStaticallyDeclaredFunction1(..);
+  static void MyNiceStaticallyDeclaredFunction2(..);
+  ..
+  ..
+
+
+  void
+  MyNiceFunctionExternedViaHeaderFile(..)
+  {
+    ..
+    ..
+    MyNiceStaticallyDeclaredFunction1(..);
+    ..
+    ..
+    MyNiceStaticallyDeclaredFunction2(..);
+    ..
+  }
+
+  ..
+  ..
+
+  /*
+   * We define this first because MyNiceFunctionExternedViaHeaderFile() calls
+   * it before MyNiceStaticallyDeclaredFunction2().
+   */
+  static void
+  MyNiceStaticallyDeclaredFunction1(..)
+  {
+  }
+  ..
+  ..
+
+  /* then we define this one */
+  static void
+  MyNiceStaticallyDeclaredFunction2(..)
+  {
+  }
+  ```
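
A small, compilable sketch that applies the conventions above in a single file: CamelCase names, a comment at the start of every function, lowercase single-line comments, static declarations at the top, and static definitions placed after their callers in call order. The function names are hypothetical, and the sketch uses only the C standard library, so Postgres-specific parts such as `postgres.h` and `PG_FUNCTION_INFO_V1` are left out.

```c
#include <ctype.h>
#include <stdbool.h>

/* static functions are declared near the top of the file */
static int CountVowels(const char *text);
static int CountConsonants(const char *text);
static bool IsVowel(char character);


/*
 * CountLetters returns the number of ASCII letters in text by adding up its
 * vowels and consonants. In a real file, this function would be declared in
 * a header and called from other files.
 */
int
CountLetters(const char *text)
{
	int vowelCount = CountVowels(text);
	int consonantCount = CountConsonants(text);

	return vowelCount + consonantCount;
}


/*
 * CountVowels returns the number of ASCII vowels in text. We define it first
 * because CountLetters() calls it before CountConsonants().
 */
static int
CountVowels(const char *text)
{
	int vowelCount = 0;

	for (const char *cursor = text; *cursor != '\0'; cursor++)
	{
		if (IsVowel(*cursor))
		{
			vowelCount++;
		}
	}

	return vowelCount;
}


/*
 * CountConsonants returns the number of ASCII letters in text that are not
 * vowels.
 */
static int
CountConsonants(const char *text)
{
	int consonantCount = 0;

	for (const char *cursor = text; *cursor != '\0'; cursor++)
	{
		/* cast to unsigned char, since isalpha() is undefined for negative values */
		if (isalpha((unsigned char) *cursor) && !IsVowel(*cursor))
		{
			consonantCount++;
		}
	}

	return consonantCount;
}


/*
 * IsVowel returns true if character is an ASCII vowel, in either case. We
 * define it last because both CountVowels() and CountConsonants() call it.
 */
static bool
IsVowel(char character)
{
	switch (tolower((unsigned char) character))
	{
		case 'a':
		case 'e':
		case 'i':
		case 'o':
		case 'u':
			return true;

		default:
			return false;
	}
}
```

For example, `CountLetters("Citus 12!")` returns 5: two vowels from `CountVowels()` plus three consonants from `CountConsonants()`.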
+
+# Making a pull request ready for reviews
+
+Asking for help and asking for reviews are two different things. When you're asking for help, you're asking for someone to help you with something that you're not expected to know.
+
+But when you're asking for a review, you're asking someone to review your work and provide feedback, so you're expected to make sure that:
+
+* Your changes don't include **unnecessary line additions / deletions or style changes in unrelated files / lines**.
+
+* All CI jobs are **passing**, including **style checks** and **flaky test detection jobs**.
+
+* Your PR has the necessary **tests** and they're passing.
+
+* You've split as much of the work as possible into **separate PRs**, e.g., a prerequisite bugfix, a refactoring, etc.
+
+* Your PR doesn't introduce typos or other issues that you can easily fix yourself.
+
+* After all CI jobs pass, the code-coverage job (CodeCov, as of today) kicks in. That's why it's important to get the **tests passing** first. At that point, you're expected to check the **CodeCov annotations** shown in the **Files Changed** tab and make sure that CodeCov doesn't complain about any uncovered lines. For example, it's OK if CodeCov complains about an `ereport()` call that you added for an "unexpected-but-better-than-crashing" case, but it's not OK if it complains about an uncovered `if` branch that you added.
+
+* And finally, perform a **self-review** to make sure that:
+  * Code and code comments convey the idea **without requiring extra explanation** via a chat message / email / PR comment.
+    This is important because we don't expect developers to reach out to the author or read the whole PR discussion to understand the idea behind a commit merged into the `main` branch.
+  * The PR description is clear enough.
+  * If and only if you're **introducing a user-facing change / bugfix**, your PR has a line that starts with `DESCRIPTION: <Present simple tense word that starts with a capital letter, e.g., Adds support for / Fixes / Disallows>`.
+  * **Commit messages** are clear enough if the commits do logically different things.
+
+# Regression test best practices
+
+* Instead of connecting to different nodes to check catalog tables, tests should use `run_command_on_all_nodes()`, because it's faster than repeatedly disconnecting from and connecting to different nodes.
+
+* Tests should **define functions** for repetitive actions, e.g., by wrapping the queries commonly used to check catalog tables.
+  If other tests are likely to use the function in the future, it needs to be defined in `multi_test_helpers.sql`.
+
+* If you're adding a new file, consider using `src/test/regress/bin/create_test.py` to create it. If you'd rather create it manually, make sure that your test file creates a schema and drops that schema at the end of the test, so that it doesn't leave any objects behind. Check which lines `src/test/regress/bin/create_test.py` adds to a test file to see what you need to do.
+
+  For objects that are not bound to a schema, such as databases and roles, make sure to drop them at the end of the test too.
diff --git a/src/backend/distributed/README.md b/src/backend/distributed/README.md
@@ -1749,8 +1749,6 @@ The reason for handling dependencies and deparsing in post-process step is that
 
 Not all table DDL is currently deparsed. In that case, the original command sent by the client is used. That is a shortcoming in our DDL logic that causes user-facing issues and should be addressed. We do not directly construct a separate DDL command for each shard. Instead, we call the `worker_apply_shard_ddl_command(shardid bigint, ddl_command text)` function which parses the DDL command, replaces the table names with shard names in the parse tree according to the shard ID, and then executes the command. That also has some shortcomings, because we cannot support more complex DDL commands in this manner (e.g. adding multiple foreign keys). Ideally, all DDL would be deparsed, and for table DDL the deparsed query string would have shard names, similar to regular queries.
 
-`markDistributed` is used to indicate whether we add a record to `pg_dist_object` to mark the object as "distributed".
-
 ## Defining a new DDL command
 
 All commands that are propagated by Citus should be defined in a `DistributeObjectOps` struct. Below is a sample `DistributeObjectOps` for the ALTER DATABASE command, defined in the [distribute_object_ops.c](commands/distribute_object_ops.c) file.
@@ -1810,6 +1808,14 @@ GetDistributeObjectOps(Node *node)
 ...
 ```
 
+Finally, when adding support for propagation of a new DDL command, you also need to make sure that:
+* You use `quote_identifier()` for identifiers and `quote_literal_cstr()` for literals when deparsing a DDL command, so that fields containing special characters or embedded quotes are escaped properly.
+* The code tolerates nullable fields within the given `Stmt *` object, i.e., the ones that Postgres allows omitting entirely.
+* You register the object in `pg_dist_object` if it's a CREATE command and delete the object from `pg_dist_object` if it's a DROP command.
+* Node activation (e.g., `citus_add_node()`) properly propagates the object and its dependencies to new nodes.
+* You add test cases for all the scenarios noted above.
+* You add test cases for the different options that can be specified for the command. For example, `CREATE DATABASE .. IS_TEMPLATE = TRUE` and `CREATE DATABASE .. IS_TEMPLATE = FALSE` should be tested separately.
+
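A minimal standalone sketch of the first two points (escaping and nullable fields), assuming a made-up `HypotheticalCreateDbStmt` node rather than Postgres' real `CreatedbStmt`, which keeps options such as OWNER in a list. `QuoteIdentifierSketch()` is also hypothetical: it always quotes, whereas Postgres' `quote_identifier()` only adds quotes when an identifier needs them, and string literals would go through `quote_literal_cstr()` instead.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * HypotheticalCreateDbStmt is a made-up, simplified parse node for
 * "CREATE DATABASE name [OWNER owner]". ownerName is NULL when the user
 * didn't specify OWNER, like an optional field in a real Stmt.
 */
typedef struct HypotheticalCreateDbStmt
{
	const char *databaseName;
	const char *ownerName;
} HypotheticalCreateDbStmt;

static char *QuoteIdentifierSketch(const char *identifier);


/*
 * DeparseCreateDbSketch writes the command for stmt into buffer and returns
 * the value returned by snprintf(), or -1 if memory allocation fails.
 */
int
DeparseCreateDbSketch(const HypotheticalCreateDbStmt *stmt, char *buffer,
					  size_t bufferSize)
{
	char *quotedName = QuoteIdentifierSketch(stmt->databaseName);
	int written = -1;

	if (quotedName == NULL)
	{
		return -1;
	}

	/* OWNER is optional, so never dereference ownerName without checking it */
	if (stmt->ownerName != NULL)
	{
		char *quotedOwner = QuoteIdentifierSketch(stmt->ownerName);

		if (quotedOwner != NULL)
		{
			written = snprintf(buffer, bufferSize, "CREATE DATABASE %s OWNER %s",
							   quotedName, quotedOwner);
			free(quotedOwner);
		}
	}
	else
	{
		written = snprintf(buffer, bufferSize, "CREATE DATABASE %s", quotedName);
	}

	free(quotedName);
	return written;
}


/*
 * QuoteIdentifierSketch returns a malloc'd copy of identifier wrapped in
 * double quotes, with embedded double quotes doubled, or NULL if malloc()
 * fails. Unlike Postgres' quote_identifier(), it quotes unconditionally.
 */
static char *
QuoteIdentifierSketch(const char *identifier)
{
	size_t length = strlen(identifier);

	/* worst case: every character is a quote, plus two wrapping quotes and NUL */
	char *quoted = malloc(2 * length + 3);
	char *output = quoted;

	if (quoted == NULL)
	{
		return NULL;
	}

	*output++ = '"';
	for (const char *input = identifier; *input != '\0'; input++)
	{
		if (*input == '"')
		{
			*output++ = '"';
		}
		*output++ = *input;
	}
	*output++ = '"';
	*output = '\0';

	return quoted;
}
```

Quoting unconditionally is always valid SQL, but the deparsed text can then differ from what Postgres itself would produce, e.g., `"alice"` instead of `alice`.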
 ## Object & dependency propagation
 
 These two topics are closely related, so we'll discuss them together. You can start by reading [Nils' blog post](https://www.citusdata.com/blog/2020/06/25/using-custom-types-with-citus-and-postgres/) on the topic.