diff --git a/encoding/README.md b/encoding/README.md
new file mode 100644
index 0000000..6b4b186
--- /dev/null
+++ b/encoding/README.md
@@ -0,0 +1,42 @@
+Encoding Tests
+==============
+
+Each file containing encoding tests has any number of tests separated by
+two newlines (LF) and a single newline before the end of the file:
+
+    [TEST] LF
+    LF
+    [TEST] LF
+    LF
+    [TEST] LF
+
+...where [TEST] is the format documented below.
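A minimal Python sketch of splitting a test file into individual tests. It assumes each test starts with a `#data` line and works on raw bytes, because encoding test data need not be valid UTF-8. `split_tests` is a hypothetical helper, not part of any existing harness. It splits only where an empty line is followed by `#data`, so empty lines inside test data are not mistaken for separators.

```python
def split_tests(raw: bytes) -> list[bytes]:
    """Split the bytes of an encoding .dat file into one chunk per test.

    Hypothetical helper. Assumes each test starts with a "#data" line and
    that tests are separated by an empty line (two LFs in a row).
    """
    if not raw.startswith(b"#data\n"):
        raise ValueError('file must start with a "#data" line')
    # Drop the single newline that precedes the end of the file.
    if raw.endswith(b"\n"):
        raw = raw[:-1]
    # Split only where an empty line is followed by "#data", so empty lines
    # inside test data are kept intact.
    parts = raw.split(b"\n\n#data\n")
    # split() consumed the "#data" header of every test after the first.
    return [parts[0]] + [b"#data\n" + part for part in parts[1:]]
```

For example, `split_tests(Path("encoding/preparsed/tests1.dat").read_bytes())`, with `Path` imported from `pathlib`, yields one byte string per test.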
+
+Encoding test format
+--------------------
+
+Each test must begin with the string "\#data", followed by a newline (LF).
+All subsequent lines up to the first line that says "\#encoding" are the
+test data and must be passed to the system being tested unchanged, except
+with the final newline (on the last line) removed.
+
+Then there must be a line that says "\#encoding", followed by a newline
+(LF), followed by a string indicating an encoding name, followed by a
+newline (LF). The encoding name indicated is the expected character
+encoding for the output with the given test data as input.
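The format above can be read with a short Python sketch that returns the test data and the expected encoding name for a single test. `parse_test` is a hypothetical name; the sketch assumes the test is given as raw bytes and that the encoding name is ASCII.

```python
def parse_test(test: bytes) -> tuple[bytes, str]:
    """Return (data, expected_encoding) for one test in the format above."""
    lines = test.split(b"\n")
    if lines[0] != b"#data":
        raise ValueError('test must begin with a "#data" line')
    try:
        # The first "#encoding" line after "#data" ends the test data.
        enc_at = lines.index(b"#encoding", 1)
    except ValueError:
        raise ValueError('missing "#encoding" line') from None
    if enc_at + 1 >= len(lines):
        raise ValueError('missing encoding name after "#encoding"')
    # Re-joining the data lines with LF reproduces them unchanged, except that
    # the newline ending the last data line is dropped, as the format requires.
    data = b"\n".join(lines[1:enc_at])
    return data, lines[enc_at + 1].decode("ascii")
```

Splitting on LF and re-joining is what removes the final newline: the LF before `#encoding` is the only one that does not survive.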
+
+For the tests in the `preparsed` subdirectory, the encoding name indicated
+is the expected result of running the *encoding sniffing algorithm* at
+https://html.spec.whatwg.org/#encoding-sniffing-algorithm with the given
+test data as input. That is, it's the expected result of running *only*
+the *encoding sniffing algorithm* (without also running the tokenization
+state machine and tree-construction stage defined in the spec), and
+specifically of running the *prescan the byte stream to determine its
+encoding* algorithm on only the first 1024 bytes of the test data.
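For the `preparsed` tests, a harness only needs to hand the first 1024 bytes to its sniffer. The Python sketch below assumes a hypothetical `sniff` callable that implements the prescan algorithm and returns an encoding label (or `None`); comparing labels case-insensitively is also an assumption. `toy_sniff` is *not* the spec's prescan algorithm: it is a regex stand-in included only to show the effect of the 1024-byte cutoff.

```python
import re
from typing import Callable, Optional

PRESCAN_LIMIT = 1024  # the prescan looks at only this many leading bytes


def check_preparsed(data: bytes, expected: str,
                    sniff: Callable[[bytes], Optional[str]]) -> bool:
    """Compare the sniffer's answer on the first 1024 bytes with `expected`."""
    actual = sniff(data[:PRESCAN_LIMIT])
    # Case-insensitive comparison of encoding labels is an assumption.
    return actual is not None and actual.lower() == expected.lower()


def toy_sniff(prefix: bytes) -> Optional[str]:
    """NOT the spec's prescan: just finds the first charset=... in the bytes."""
    match = re.search(rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
                      prefix, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else None
```

For instance, `check_preparsed(b" " * 1024 + b'<meta charset="utf-8">', "utf-8", toy_sniff)` returns `False`, because the declaration lies entirely outside the first 1024 bytes.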
+
+For all tests outside the subdirectory named `preparsed`, the encoding name
+indicated is instead the expected character encoding for the output after
+fully parsing the given test data; that is, it's the expected character
+encoding for the output after running the tokenization state machine and
+tree-construction stage.
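Since the meaning of the `#encoding` line depends on where the test file lives, a harness has to pick the stage to run per file. A Python sketch, assuming two hypothetical caller-supplied callables: `sniff`, which runs the prescan step of the encoding sniffing algorithm, and `full_parse`, which parses the whole input and reports the encoding the parser ended up using.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

Detector = Callable[[bytes], Optional[str]]


def actual_encoding(dat_path: str, data: bytes,
                    sniff: Detector, full_parse: Detector) -> Optional[str]:
    """Run the stage the README prescribes for the test file at `dat_path`."""
    # Directory components only, so a file merely named "preparsed.dat"
    # does not count.
    dirs = PurePosixPath(dat_path).parts[:-1]
    if "preparsed" in dirs:
        # Encoding sniffing only: prescan the first 1024 bytes.
        return sniff(data[:1024])
    # Every other file: the encoding after tokenization and tree construction.
    return full_parse(data)
```

The result can then be compared with the name from the test's `#encoding` line.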
diff --git a/encoding/preparsed/tests1.dat b/encoding/preparsed/tests1.dat
new file mode 100644
index 0000000..2dd4801
--- /dev/null
+++ b/encoding/preparsed/tests1.dat
@@ -0,0 +1,51 @@