Charset fixes

boostorg · Nov 14, 2024 · aac17bb
1 parent d3a2981

Showing 2 changed files with 22 additions and 21 deletions.
diff --git a/doc/qbk/17_charsets.qbk b/doc/qbk/17_charsets.qbk
@@ -26,14 +26,16 @@ The [*connection's character set determines the encoding for character strings
 sent to and retrieved from the server].
 This includes SQL query strings, string fields and column names in metadata.
 The connection's collation is used for string literal comparison.
+The connection's character set and collation can be changed dynamically
+using SQL.
 
 By default, Boost.MySQL connections use `utf8mb4_general_ci`,
-thus [*using UTF-8 for all strings]. We recommend using the default,
-since MySQL character sets are easy to get wrong.
+thus [*using UTF-8 for all strings]. We recommend using this default,
+as MySQL character sets are easy to get wrong.
 
 The connection's character set is not linked to the character set
-specified for databases, tables and columns. For example,
-with the following declaration:
+specified for databases, tables and columns.
+Consider the following declaration:
 
 ```
 CREATE TABLE test_table(
@@ -62,16 +64,16 @@ of what's affected:
 
 * SQL query strings passed to [refmemunq any_connection async_execute] and
   [refmemunq any_connection async_prepare_statement] must be sent using
-  the connection's charset. Otherwise, syntax errors may happen.
+  the connection's character set. Otherwise, server-side parsing errors may happen.
 * SQL templates and string values passed to [reflink with_params]
-  and [reflink format_sql] must be encoded using the connection's charset.
+  and [reflink format_sql] must be encoded using the connection's character set.
   Otherwise, values will be rejected by Boost.MySQL when composing the query.
   Connections [link mysql.charsets.tracking track the character set in use] to detect these errors.
   If you bypass character set tracking (e.g. by using `SET NAMES` instead of
-  [refmemunq async_set_character_set]), you may run into vulnerabilities.
-* Statement string parameters passed to [refmem statement bind] should use the connection's charset.
+  [refmemunq any_connection async_set_character_set]), you may run into vulnerabilities.
+* Statement string parameters passed to [refmem statement bind] should use the connection's character set.
   Otherwise, MySQL may reject the values.
-* String values in rows and metadata retrieved from the server use the connection's charset.
+* String values in rows and metadata retrieved from the server use the connection's character set.
 * Server-supplied diagnostic messages ([refmem diagnostics server_message]) also
   use the connection's character set.
 
@@ -92,10 +94,10 @@ stick to the following advice:
   If you need to use a different encoding in your application, convert your data to/from UTF-8
   when interacting with the server. The default [reflink connect_params] ensure that UTF-8 is
   used, without the need to run any SQL.
-* [*Don't execute SET NAMES] statements or the `character_set_client` and 
+* [*Don't execute SET NAMES] statements or change the `character_set_client` and 
   `character_set_results` session variables using `async_execute`.
   This breaks character set tracking, which can lead to vulnerabilities.
-* Don't use [refmemunq async_reset_connection] unless you know what you're doing.
+* Don't use [refmemunq any_connection async_reset_connection] unless you know what you're doing.
   If you need to reuse connections, use [reflink connection_pool], instead.
 * Connections obtained from a [reflink connection_pool] always use `utf8mb4`.
   When connections are returned to the pool, their character set is reset to `utf8mb4`.
@@ -113,7 +115,6 @@ There is a number of actions that can change the connection's character set:
   The [include_file boost/mysql/mysql_collations.hpp] and 
   [include_file boost/mysql/mariadb_collations.hpp] headers contain
   available collation IDs.
-
   If the server recognizes the passed collation, the connection's character set
   will be the one associated to the collation. If it doesn't, the connection
   [*will silently fall back to the server's default character set] (usually `latin1`, which is not Unicode).
@@ -159,11 +160,11 @@ Following the above points, this is how tracking works:
   sets the current character set to the passed one.
   The same applies for a successful set character set pipeline stage.
 * Calling [refmemunq any_connection async_reset_connection]
-  makes the current character set to unknown.
+  makes the current character set unknown.
 
 [warning
     [*Do not execute `SET NAMES`], `SET CHARACTER SET` or any other SQL statement
-    that modifies `character_set_client` using `execute`. This will make character set
+    that modifies `character_set_client` using `async_execute`. This will make character set
     information stored in the client invalid.
 ]
 
@@ -206,23 +207,21 @@ for a full implementation.
 Setting the connection's character set during connection establishment
 or using [refmemunq any_connection async_set_character_set] has the ultimate
 effect of changing some session variables. This section lists them as
-a reference. We [*strongly encourage not modifying them manually],
+a reference. We [*strongly encourage you not to modify them manually],
 as this will confuse character set tracking.
 
 * [mysqllink server-system-variables.html#sysvar_character_set_client character_set_client]
   determines the encoding that SQL statements sent to the server should have. This includes
   the SQL strings passed to [refmemunq any_connection async_execute] and
   [refmemunq any_connection async_prepare_statement], and
   string parameters passed to [refmem statement bind].
-
-  Not all character sets are permissible in `character_set_client`. The server will accept setting
-  this variable to any UTF-8 character set, but won't accept UTF-16.
+  Not all character sets are permissible in `character_set_client`.
+  For example, UTF-16 and UTF-32 based character sets won't be accepted.
 * [mysqllink server-system-variables.html#sysvar_character_set_results character_set_results]
   determines the encoding that the server will use to send any kind of result, including
   string fields retrieved by [refmem connection execute], metadata
   like [refmem metadata column_name] and error messages.
-
-  Note that [refmem metadata column_collation] reflects the charset and collation the server
+  Note that [refmem metadata column_collation] reflects the character set and collation the server
   has converted the column to before sending it to the client. In the above example, `metadata::column_collation`
   will be the default collation for UTF16, rather than `latin1_swedish_ci`.
 

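The tracking rules touched by this commit can be pictured with a small standalone model. In that model, a successful `async_set_character_set` stores the new character set, and `async_reset_connection` makes it unknown. This is a hypothetical sketch, not Boost.MySQL code. `tracked_connection` and its member functions are illustrative names, and the way `on_connect` handles unrecognized collations is an assumption of the model. It shows why running `SET NAMES` through `async_execute` is dangerous. The client never observes that statement, so the character set it has stored silently goes stale.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of client-side character set tracking.
// tracked_connection and its members are illustrative names, NOT the Boost.MySQL API.
struct tracked_connection
{
    // std::nullopt means "the client doesn't know which character set is in use"
    std::optional<std::string> charset;

    // Assumption for this model: a successful connect stores the character set
    // if the client recognizes the requested collation, or nullopt otherwise
    void on_connect(std::optional<std::string> known_charset) { charset = std::move(known_charset); }

    // Successful async_set_character_set (or set character set pipeline stage)
    void on_set_character_set(const std::string& name) { charset = name; }

    // async_reset_connection makes the current character set unknown
    void on_reset_connection() { charset.reset(); }

    // SET NAMES run through async_execute is invisible to the client:
    // charset keeps its previous value, even though the server changed it
    void on_execute_set_names(const std::string& /*new_charset*/) {}

    // Modeled behavior: client-side query composition (with_params, format_sql)
    // needs a known character set to validate and escape string values
    bool can_compose_queries() const { return charset.has_value(); }
};
```

With this model, calling `on_execute_set_names("latin1")` after `on_set_character_set("utf8mb4")` leaves `charset` reporting `utf8mb4`. This is the mismatch between client and server that the documentation warns about.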
diff --git a/test/integration/test/snippets/charsets.cpp b/test/integration/test/snippets/charsets.cpp
@@ -13,6 +13,8 @@
 #include <cassert>
 #include <cstddef>
 
+namespace mysql = boost::mysql;
+
 namespace {
 
 //[charsets_next_char
@@ -67,7 +69,7 @@ BOOST_AUTO_TEST_CASE(section_charsets)
 {
     {
         // Verify that utf8mb4_next_char can be used in a character_set
-        boost::mysql::character_set charset{"utf8mb4", utf8mb4_next_char};
+        mysql::character_set charset{"utf8mb4", utf8mb4_next_char};
 
         // It works for valid input
         unsigned char buff_valid[] = {0xc3, 0xb1, 0x50};
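
The test above plugs a `utf8mb4_next_char` function into a `character_set`. That function's body is not part of this diff, so the following is a hypothetical, simplified stand-in. It returns the byte length of the first UTF-8 character in the input, or 0 when the input does not start with a valid character. It takes a pointer and a length instead of the span-based signature that `character_set::next_char` uses, so it compiles with the standard library alone. The name `utf8mb4_next_char_sketch` is illustrative. The sketch also does not reject overlong encodings or surrogate code points, which a production implementation should.

```cpp
#include <cstddef>

// Hypothetical, simplified utf8mb4 "next character" function (not Boost.MySQL's code).
// Returns the number of bytes spanned by the first character, or 0 on invalid input.
// Limitation: overlong encodings and surrogates are NOT rejected.
std::size_t utf8mb4_next_char_sketch(const unsigned char* data, std::size_t size)
{
    if (size == 0)
        return 0;

    const unsigned char first = data[0];
    std::size_t len = 0;
    if (first < 0x80)
        return 1;  // ASCII: single byte
    else if ((first & 0xe0) == 0xc0)
        len = 2;   // 110xxxxx lead byte
    else if ((first & 0xf0) == 0xe0)
        len = 3;   // 1110xxxx lead byte
    else if ((first & 0xf8) == 0xf0)
        len = 4;   // 11110xxx lead byte (utf8mb4 allows 4-byte characters)
    else
        return 0;  // stray continuation byte or invalid lead byte

    if (size < len)
        return 0;  // truncated multi-byte sequence

    // Every byte after the lead byte must be a 10xxxxxx continuation byte
    for (std::size_t i = 1; i < len; ++i)
    {
        if ((data[i] & 0xc0) != 0x80)
            return 0;
    }
    return len;
}
```

With the bytes from the test, `0xc3 0xb1` (`ñ`) is reported as a 2-byte character, and the trailing `0x50` (`P`) is reported as a 1-byte one.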