Skip to content

Commit

Permalink
Merge 'Add affinity() function to Column' from Jussi Saurio
Browse files Browse the repository at this point in the history
This is a first step to supporting [sqlite column
affinity](https://www.sqlite.org/datatype3.html) properly in limbo --
just adds an `affinity()` function to `Column` that can then be used
elsewhere

Closes #1030
  • Loading branch information
penberg committed Feb 18, 2025
2 parents 28a85ed + 89e48a1 commit 9801051
Show file tree
Hide file tree
Showing 4 changed files with 105 additions and 45 deletions.
102 changes: 101 additions & 1 deletion core/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -191,7 +191,7 @@ impl PseudoTable {
self.columns.push(Column {
name: Some(normalize_ident(name)),
ty,
ty_str: ty.to_string(),
ty_str: ty.to_string().to_uppercase(),
primary_key,
is_rowid_alias: false,
notnull: false,
Expand Down Expand Up @@ -386,6 +386,52 @@ pub struct Column {
pub default: Option<Expr>,
}

impl Column {
pub fn affinity(&self) -> Affinity {
affinity(&self.ty_str)
}
}

/// 3.1. Determination Of Column Affinity
/// For tables not declared as STRICT, the affinity of a column is determined by the declared type of the column, according to the following rules in the order shown:
///
/// If the declared type contains the string "INT" then it is assigned INTEGER affinity.
///
/// If the declared type of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.
///
/// If the declared type for a column contains the string "BLOB" or if no type is specified then the column has affinity BLOB.
///
/// If the declared type for a column contains any of the strings "REAL", "FLOA", or "DOUB" then the column has REAL affinity.
///
/// Otherwise, the affinity is NUMERIC.
///
/// Note that the order of the rules for determining column affinity is important. A column whose declared type is "CHARINT" will match both rules 1 and 2 but the first rule takes precedence and so the column affinity will be INTEGER.
pub fn affinity(datatype: &str) -> Affinity {
// Note: callers of this function must ensure that the datatype is uppercase.
// Rule 1: INT -> INTEGER affinity
if datatype.contains("INT") {
return Affinity::Integer;
}

// Rule 2: CHAR/CLOB/TEXT -> TEXT affinity
if datatype.contains("CHAR") || datatype.contains("CLOB") || datatype.contains("TEXT") {
return Affinity::Text;
}

// Rule 3: BLOB or empty -> BLOB affinity (historically called NONE)
if datatype.contains("BLOB") || datatype.is_empty() {
return Affinity::Blob;
}

// Rule 4: REAL/FLOA/DOUB -> REAL affinity
if datatype.contains("REAL") || datatype.contains("FLOA") || datatype.contains("DOUB") {
return Affinity::Real;
}

// Rule 5: Otherwise -> NUMERIC affinity
Affinity::Numeric
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
Null,
Expand All @@ -396,6 +442,60 @@ pub enum Type {
Blob,
}

/// Each column in an SQLite 3 database is assigned one of the following type affinities:
///
/// TEXT
/// NUMERIC
/// INTEGER
/// REAL
/// BLOB
/// (Historical note: The "BLOB" type affinity used to be called "NONE". But that term was easy to confuse with "no affinity" and so it was renamed.)
///
/// A column with TEXT affinity stores all data using storage classes NULL, TEXT or BLOB. If numerical data is inserted into a column with TEXT affinity it is converted into text form before being stored.
///
/// A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively. If the TEXT value is a well-formed integer literal that is too large to fit in a 64-bit signed integer, it is converted to REAL. For conversions between TEXT and REAL storage classes, only the first 15 significant decimal digits of the number are preserved. If the TEXT value is not a well-formed integer or real literal, then the value is stored as TEXT. For the purposes of this paragraph, hexadecimal integer literals are not considered well-formed and are stored as TEXT. (This is done for historical compatibility with versions of SQLite prior to version 3.8.6 2014-08-15 where hexadecimal integer literals were first introduced into SQLite.) If a floating point value that can be represented exactly as an integer is inserted into a column with NUMERIC affinity, the value is converted into an integer. No attempt is made to convert NULL or BLOB values.
///
/// A string might look like a floating-point literal with a decimal point and/or exponent notation but as long as the value can be expressed as an integer, the NUMERIC affinity will convert it into an integer. Hence, the string '3.0e+5' is stored in a column with NUMERIC affinity as the integer 300000, not as the floating point value 300000.0.
///
/// A column that uses INTEGER affinity behaves the same as a column with NUMERIC affinity. The difference between INTEGER and NUMERIC affinity is only evident in a CAST expression: The expression "CAST(4.0 AS INT)" returns an integer 4, whereas "CAST(4.0 AS NUMERIC)" leaves the value as a floating-point 4.0.
///
/// A column with REAL affinity behaves like a column with NUMERIC affinity except that it forces integer values into floating point representation. (As an internal optimization, small floating point values with no fractional component and stored in columns with REAL affinity are written to disk as integers in order to take up less space and are automatically converted back into floating point as the value is read out. This optimization is completely invisible at the SQL level and can only be detected by examining the raw bits of the database file.)
///
/// A column with affinity BLOB does not prefer one storage class over another and no attempt is made to coerce data from one storage class into another.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Affinity {
Integer,
Text,
Blob,
Real,
Numeric,
}

pub const SQLITE_AFF_TEXT: char = 'a';
pub const SQLITE_AFF_NONE: char = 'b'; // Historically called NONE, but it's the same as BLOB
pub const SQLITE_AFF_NUMERIC: char = 'c';
pub const SQLITE_AFF_INTEGER: char = 'd';
pub const SQLITE_AFF_REAL: char = 'e';

impl Affinity {
/// This is meant to be used in opcodes like Eq, which state:
///
/// "The SQLITE_AFF_MASK portion of P5 must be an affinity character - SQLITE_AFF_TEXT, SQLITE_AFF_INTEGER, and so forth.
/// An attempt is made to coerce both inputs according to this affinity before the comparison is made.
/// If the SQLITE_AFF_MASK is 0x00, then numeric affinity is used.
/// Note that the affinity conversions are stored back into the input registers P1 and P3.
/// So this opcode can cause persistent changes to registers P1 and P3.""
pub fn aff_mask(&self) -> char {
match self {
Affinity::Integer => SQLITE_AFF_INTEGER,
Affinity::Text => SQLITE_AFF_TEXT,
Affinity::Blob => SQLITE_AFF_NONE,
Affinity::Real => SQLITE_AFF_REAL,
Affinity::Numeric => SQLITE_AFF_NUMERIC,
}
}
}

impl fmt::Display for Type {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let s = match self {
Expand Down
2 changes: 1 addition & 1 deletion core/translate/group_by.rs
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ pub fn emit_group_by<'a>(
name: None,
primary_key: false,
ty,
ty_str: ty.to_string(),
ty_str: ty.to_string().to_uppercase(),
is_rowid_alias: false,
notnull: false,
default: None,
Expand Down
4 changes: 2 additions & 2 deletions core/translate/order_by.rs
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ pub fn emit_order_by(
name: None,
primary_key: false,
ty,
ty_str: ty.to_string(),
ty_str: ty.to_string().to_uppercase(),
is_rowid_alias: false,
notnull: false,
default: None,
Expand All @@ -91,7 +91,7 @@ pub fn emit_order_by(
name: None,
primary_key: false,
ty,
ty_str: ty.to_string(),
ty_str: ty.to_string().to_uppercase(),
is_rowid_alias: false,
notnull: false,
default: None,
Expand Down
42 changes: 1 addition & 41 deletions core/vdbe/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ use crate::functions::printf::exec_printf;
use crate::info;
use crate::pseudo::PseudoCursor;
use crate::result::LimboResult;
use crate::schema::{affinity, Affinity};
use crate::storage::sqlite3_ondisk::DatabaseHeader;
use crate::storage::wal::CheckpointResult;
use crate::storage::{btree::BTreeCursor, pager::Pager};
Expand Down Expand Up @@ -3620,47 +3621,6 @@ fn exec_replace(source: &OwnedValue, pattern: &OwnedValue, replacement: &OwnedVa
}
}

enum Affinity {
Integer,
Text,
Blob,
Real,
Numeric,
}

/// For tables not declared as STRICT, the affinity of a column is determined by the declared type of the column, according to the following rules in the order shown:
/// If the declared type contains the string "INT" then it is assigned INTEGER affinity.
/// If the declared type of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.
/// If the declared type for a column contains the string "BLOB" or if no type is specified then the column has affinity BLOB.
/// If the declared type for a column contains any of the strings "REAL", "FLOA", or "DOUB" then the column has REAL affinity.
/// Otherwise, the affinity is NUMERIC.
/// Note that the order of the rules for determining column affinity is important. A column whose declared type is "CHARINT" will match both rules 1 and 2 but the first rule takes precedence and so the column affinity will be INTEGER.
fn affinity(datatype: &str) -> Affinity {
// Note: callers of this function must ensure that the datatype is uppercase.
// Rule 1: INT -> INTEGER affinity
if datatype.contains("INT") {
return Affinity::Integer;
}

// Rule 2: CHAR/CLOB/TEXT -> TEXT affinity
if datatype.contains("CHAR") || datatype.contains("CLOB") || datatype.contains("TEXT") {
return Affinity::Text;
}

// Rule 3: BLOB or empty -> BLOB affinity (historically called NONE)
if datatype.contains("BLOB") || datatype.is_empty() {
return Affinity::Blob;
}

// Rule 4: REAL/FLOA/DOUB -> REAL affinity
if datatype.contains("REAL") || datatype.contains("FLOA") || datatype.contains("DOUB") {
return Affinity::Real;
}

// Rule 5: Otherwise -> NUMERIC affinity
Affinity::Numeric
}

/// When casting a TEXT value to INTEGER, the longest possible prefix of the value that can be interpreted as an integer number
/// is extracted from the TEXT value and the remainder ignored. Any leading spaces in the TEXT value when converting from TEXT to INTEGER are ignored.
/// If there is no prefix that can be interpreted as an integer number, the result of the conversion is 0.
Expand Down

0 comments on commit 9801051

Please sign in to comment.