doc: add information about character set encoding compatibility
Based on a suggestion by Михаил (@mkgrgis) in PR #28, but substantially
rewritten, also to take into account the changes in commit 96e1625.
ibarwick committed Dec 28, 2022
1 parent b7ec886 commit 9dd98ff
Showing 2 changed files with 104 additions and 5 deletions.
25 changes: 20 additions & 5 deletions README.md
@@ -39,11 +39,12 @@ Contents
5. [Functions](#functions)
6. [Identifier case handling](#identifier-case-handling)
7. [Generated columns](#generated-columns)
-8. [Examples](#examples)
-9. [Limitations](#limitations)
-10. [TAP tests](#tap-tests)
-11. [Development roadmap](#development-roadmap)
-12. [Useful links](#useful-links)
+8. [Character set handling](#character-set-handling)
+9. [Examples](#examples)
+10. [Limitations](#limitations)
+11. [TAP tests](#tap-tests)
+12. [Development roadmap](#development-roadmap)
+13. [Useful links](#useful-links)

Features
--------
@@ -497,6 +498,20 @@ For more details on generated columns see:
- [Generated Columns](https://www.postgresql.org/docs/current/ddl-generated-columns.html)
- [CREATE FOREIGN TABLE](https://www.postgresql.org/docs/current/sql-createforeigntable.html)


Character set handling
----------------------

When `firebird_fdw` connects to a Firebird database, it sets the client
encoding to the PostgreSQL database's server encoding. As there is a broad
overlap between PostgreSQL and Firebird character set encodings, this will
usually succeed, particularly with the more common encodings such as
`UTF8` and `LATIN1`. For a small subset of PostgreSQL encodings, Firebird
provides a corresponding encoding but no matching name or alias;
`firebird_fdw` rewrites these names transparently. For more details see the
file [PostgreSQL and Firebird character set encoding compatibility](doc/ENCODINGS.md).


Examples
--------

84 changes: 84 additions & 0 deletions doc/ENCODINGS.md
@@ -0,0 +1,84 @@
PostgreSQL and Firebird character set encoding compatibility
============================================================

Character set mappings
----------------------

The following table provides an overview of available PostgreSQL server
character set encodings and the matching Firebird ones.

Encodings marked with `-` in the Firebird column are not available in Firebird.

| PostgreSQL | Firebird | Notes
|---------------|-----------|--------------------------------------------
| EUC_CN | - | Extended UNIX Code-CN
| EUC_JP | EUCJ_0208 | Compatibility likely but not tested
| EUC_JIS_2004 | - | Extended UNIX Code-JP, JIS X 0213
| EUC_KR | - | Extended UNIX Code-KR
| EUC_TW | - | Extended UNIX Code-TW
| ISO_8859_5 | ISO8859_5 |
| ISO_8859_6 | ISO8859_6 |
| ISO_8859_7 | ISO8859_7 |
| ISO_8859_8 | ISO8859_8 |
| KOI8R | KOI8R |
| KOI8U | KOI8U |
| LATIN1 | LATIN1 |
| LATIN2 | LATIN2 |
| LATIN3 | LATIN3 |
| LATIN4 | LATIN4 |
| LATIN5 | LATIN5 |
| LATIN6 | - | ISO 8859-10 / ECMA 144 "Nordic"
| LATIN7 | LATIN7 |
| LATIN8 | - | ISO 8859-14 "Celtic"
| LATIN9 | - | ISO 8859-15 "LATIN1 with Euro and accents"
| LATIN10 | - | ISO 8859-16 "Romanian"
| MULE_INTERNAL | - | "Multilingual Emacs"
| SQL_ASCII | NONE |
| UTF8 | UTF8 |
| WIN866 | DOS866 |
| WIN874 | - | Windows CP874 "Thai"
| WIN1250 | WIN1250 |
| WIN1251 | WIN1251 |
| WIN1252 | WIN1252 |
| WIN1253 | WIN1253 |
| WIN1254 | WIN1254 |
| WIN1255 | WIN1255 |
| WIN1256 | WIN1256 |
| WIN1257 | WIN1257 |
| WIN1258 | WIN1258 |

See also:

- https://www.postgresql.org/docs/current/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
- https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref40/firebird-40-language-reference.html#fblangref40-appx07-charsets
- https://firebirdsql.org/refdocs/langrefupd25-charsets.html
- https://firebirdsql.org/en/firebird-1-5-character-sets-collations/
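The table above can also be read as a simple lookup. The following Python sketch is purely illustrative: the names `PG_TO_FIREBIRD` and `firebird_charset` are hypothetical, the data is copied from the table only, and it is not the code `firebird_fdw` uses internally. It does not distinguish between names Firebird accepts as aliases and names which `firebird_fdw` rewrites.

```python
# Illustrative lookup derived from the table above; not firebird_fdw's
# actual implementation. All names here are hypothetical.

# Encodings with the same name in PostgreSQL and Firebird.
_SAME = ["KOI8R", "KOI8U", "LATIN1", "LATIN2", "LATIN3", "LATIN4",
         "LATIN5", "LATIN7", "UTF8"] + [f"WIN{n}" for n in range(1250, 1259)]

# Encodings available in both, but under a different name.
_RENAMED = {
    "EUC_JP": "EUCJ_0208",  # compatibility likely but not tested
    "ISO_8859_5": "ISO8859_5",
    "ISO_8859_6": "ISO8859_6",
    "ISO_8859_7": "ISO8859_7",
    "ISO_8859_8": "ISO8859_8",
    "SQL_ASCII": "NONE",
    "WIN866": "DOS866",
}

# PostgreSQL encodings marked "-" in the Firebird column.
_UNAVAILABLE = ["EUC_CN", "EUC_JIS_2004", "EUC_KR", "EUC_TW", "LATIN6",
                "LATIN8", "LATIN9", "LATIN10", "MULE_INTERNAL", "WIN874"]

PG_TO_FIREBIRD = {
    **{name: name for name in _SAME},
    **_RENAMED,
    **{name: None for name in _UNAVAILABLE},
}


def firebird_charset(pg_encoding):
    """Return the Firebird character set for a PostgreSQL encoding name.

    Returns None if the table lists no Firebird equivalent; raises
    KeyError for names which do not appear in the table at all.
    """
    return PG_TO_FIREBIRD[pg_encoding.upper()]


print(firebird_charset("WIN866"))
print(firebird_charset("LATIN9"))
```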

Databases with the Firebird "NONE" character set
------------------------------------------------

The `NONE` character set is pretty much the equivalent of PostgreSQL's
`SQL_ASCII`, i.e. a pseudo-character set/encoding which enables the user to
store pretty much any data they care to input without any kind of validation.
This means that it's perfectly possible to insert a mix of data encoded in
(for example) `ISO-8859-1` and `UTF8`.

This does however mean that Firebird can't know what encoding the data is
supposed to be in, so it can't convert the data to whatever encoding the client
is requesting. Consequently, when `firebird_fdw` connects to a Firebird
database configured with the `NONE` character set, the PostgreSQL database's
server encoding has no meaning, and the raw data will be transmitted to
PostgreSQL.
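To illustrate why no conversion is possible, the following Python sketch (not part of `firebird_fdw`; the sample data and the `decode_all` helper are made up for this example) treats two byte strings as rows of one column which were written by clients using different encodings. No single encoding decodes both rows correctly.

```python
# Two rows of a hypothetical column, written by clients using different
# encodings; a NONE database would store both byte strings unchanged.
rows = ["café".encode("latin-1"), "café".encode("utf-8")]


def decode_all(raw_rows, encoding):
    """Decode every row with one encoding; undecodable rows become None."""
    decoded = []
    for raw in raw_rows:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            decoded.append(None)
    return decoded


# The LATIN1 row is not valid UTF-8, so it cannot be decoded at all.
print(decode_all(rows, "utf-8"))    # [None, 'café']

# Every byte is valid LATIN1, so the UTF-8 row decodes silently to garbage.
print(decode_all(rows, "latin-1"))  # ['café', 'cafÃ©']
```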

If the data happens to be in the same encoding as the PostgreSQL database's
server encoding, this is normally not an issue. However, if the data is in a
different encoding, it will need to be treated as `bytea` values which must be
explicitly converted, for example with PostgreSQL's `convert_from()` function:

    SELECT convert_from(some_column_name, 'LATIN1')
      FROM firebird_table

where `firebird_table` is a foreign table in a PostgreSQL database with `UTF8`
server encoding which references a table in a Firebird database with `NONE`
pseudo-encoding containing data in `LATIN1` encoding.
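As a rough analogue of what this conversion does, the following Python sketch reinterprets raw bytes in a known source encoding and returns them in `UTF8`. The function below merely borrows the name `convert_from` for illustration; it is not PostgreSQL's implementation, and the sample value is made up.

```python
def convert_from(raw: bytes, source_encoding: str) -> bytes:
    """Interpret raw bytes in source_encoding and return them as UTF-8.

    Rough analogue only: in a UTF8 database, PostgreSQL's convert_from()
    returns text, which is represented here by its UTF-8 bytes.
    """
    return raw.decode(source_encoding).encode("utf-8")


# "café" as it would arrive unconverted from a NONE database holding
# LATIN1 data (hypothetical sample value).
raw_from_firebird = b"caf\xe9"

print(convert_from(raw_from_firebird, "LATIN1"))  # b'caf\xc3\xa9'
```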
