-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
90161ff
commit 976a2c8
Showing
32 changed files
with
59,698 additions
and
529 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,69 @@ | ||
# dhammapada | ||
Text of the Dhammapadi (Pali language) with Latin translation | ||
# Dhammapada latine | ||
|
||
[![SWH](https://archive.softwareheritage.org/badge/origin/https://github.com/ETCBC/bhsa/)](https://archive.softwareheritage.org/browse/origin/https://github.com/ETCBC/bhsa/) | ||
[![DOI](https://zenodo.org/badge/104559294.svg)](https://zenodo.org/badge/latestdoi/104559294) | ||
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) | ||
|
||
[![etcbc](programs/images/etcbc.png)](http://www.etcbc.nl) | ||
![logo](programs/images/logo.png) | ||
[![dans](programs/images/dans.png)](https://dans.knaw.nl/en) | ||
[![tf](programs/images/tf-small.png)](https://annotation.github.io/text-fabric/tf) | ||
|
||
|
||
## About | ||
|
||
This is the | ||
[text-fabric](https://github.com/Dans-labs/text-fabric/wiki) | ||
representation of the Dhammapada in the edition with Latin translation by V. Fausböll, 1900. | ||
|
||
See [about](docs/about.md) for more information about this textual source. | ||
|
||
The conversion to Text-Fabric is joint work of | ||
|
||
* [prof. dr. Bee Scherer](https://research.vu.nl/en/persons/bee-scherer), | ||
Text and Traditions, | ||
VU-University Amsterdam; | ||
* [prof. dr. Willem van Peursen](https://research.vu.nl/en/persons/willem-van-peursen), | ||
[ETCBC](http://www.etcbc.nl), | ||
VU-University Amsterdam; | ||
* Yvonne Mataar, | ||
transcription and correction | ||
* [dr. Dirk Roorda](https://pure.knaw.nl/portal/en/persons/dirk-roorda), | ||
[DANS](https://www.dans.knaw.nl), | ||
conversion to the Text-Fabric-sphere. | ||
|
||
There is more information on the | ||
[transcription](https://github.com/etcbc/blob/master/docs/transcription.md) | ||
|
||
## How to use | ||
|
||
This data can be processed by | ||
[Text-Fabric](https://annotation.github.io/text-fabric/tf). | ||
|
||
Text-Fabric will automatically download the BHSA data. | ||
|
||
After installing Text-Fabric, you can start the Text-Fabric browser by this command | ||
|
||
```sh | ||
text-fabric dhammapada | ||
``` | ||
|
||
Alternatively, you can work in a Jupyter notebook and say | ||
|
||
```python | ||
from tf.app import use | ||
|
||
A = use('dhammapada') | ||
``` | ||
|
||
In both cases the data is downloaded and ends up in your home directory, | ||
under `text-fabric-data`. | ||
|
||
See also | ||
[start](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/dhammapada/start.ipynb) | ||
and | ||
[search](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/dhammapada/search.ipynb). | ||
|
||
# Author | ||
|
||
[Dirk Roorda](https://github.com/dirkroorda) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# About the Dhammapada | ||
|
||
The Dhammapada is a collection of sayings of the Buddha in verse form | ||
and one of the most widely read and best known Buddhist scriptures. | ||
The first written source dates from 300 BCE. | ||
|
||
It is written in the Pāli language, which is a close relative of Sanskrit. | ||
|
||
## Additional resources | ||
|
||
* [wikipedia Dhammapada](https://en.wikipedia.org/wiki/Dhammapada) | ||
* [wikipedia Pāli](https://en.wikipedia.org/wiki/Pali), ISO-codes `pli`, `pi` | ||
* [Pāli-English with comments, by stanza](https://www.tipitaka.net/tipitaka/dhp/) | ||
* [Interlinear Pāli-English (single pdf)](https://www.ancient-buddhist-texts.net/Texts-and-Translations/Dhammapada/Dhammapada.pdf) | ||
* [English translation online](http://www.buddhanet.net/e-learning/buddhism/dhamma.htm) | ||
* [English translation (single PDF)](http://www.buddhanet.net/pdf_file/scrndhamma.pdf) | ||
|
||
# About this text | ||
|
||
The text that is the source of this dataset rests on the work of | ||
[Viggo Fausböll](https://en.wikipedia.org/wiki/Viggo_Fausböll) who translated the | ||
Dhammapada into Latin in 1855. | ||
|
||
|
||
field | value | ||
--- | --- | ||
title | `The Dhammapada` | ||
subtitle | `being a collection of moral verses in Pāli` | ||
remark | `edited a second time with a literal latin translation and notes for the use of Pāli students` | ||
editor | `V. Fausboll` | ||
publisher | `Luzac & Co., Publishers to the India Office` | ||
publisher address | `46, Great Russell Street, W.C. London` | ||
published | `1900` | ||
|
||
The cover page of the edition that is the source of this dataset: | ||
|
||
![cover](images/cover.png) | ||
|
||
# The text | ||
|
||
The Dhammapada is divided in *vaggas* which are divided in *stanzas*. | ||
There are 26 vaggas and 423 stanzas, which are numbered consecutively throughout the whole | ||
work. | ||
|
||
The book uses the latin script for the Pāli text. | ||
|
||
As an example, here are the first 7 stanzas of the first vagga in Pāli: | ||
|
||
![pali7](images/pali7.png) | ||
|
||
and here the same stanzas in Latin: | ||
|
||
![pali7](images/pali7.png) | ||
|
||
## Additional resources | ||
|
||
* Fausbøll, Michael Viggo, The Dhammapada. Being a collection of moral verses in Pali. | ||
Edited a second time with a literal Latin translation and notes | ||
for the use of Pali students. | ||
[free fragment of an article by Burkhard Scherer (pdf)](https://link.springer.com/article/10.1023/A:1012252226747) | ||
|
||
# The conversion | ||
|
||
The conversion program in in [tfFromTxt.py](programs/tfFromTxt.py). | ||
It can be seen in action in a Jupyter notebook: | ||
[convert.ipynb](https://nbviewer.org/github/etcbc/dhammapada/blob/master/programs/convert.ipynb) |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,195 @@ | ||
<img src="images/logo.png" align="right" width="200"/> | ||
<img src="images/tf.png" align="right" width="200"/> | ||
|
||
# Feature documentation | ||
|
||
Here you find a description of the transcriptions of the Dhammapada, | ||
the | ||
[Text-Fabric model](https://annotation.github.io/text-fabric/tf/about/datamodel.html) | ||
in general, and the node types, features of the | ||
Dhammapada corpus in particular. | ||
|
||
See also | ||
|
||
* [about](about.md) for the provenance of the data; | ||
* [TF docs](https://annotation.github.io/text-fabric/tf) for documentation on Text-Fabric. | ||
|
||
## Transcription | ||
|
||
The corpus consists of a text in Pāli and a Latin translation of the text. | ||
The main subdivision is in 26 units named *vaggas*, which are themselves divided | ||
into stanzas. There are 423 stanzas in the whole work and they are numbered across the vaggas | ||
from 1 to 423. These numbers are coded in the feature `n`. | ||
|
||
The original text and its translation are linked stanza-wise. | ||
|
||
During conversion we have made a finer division in clauses and sentences. | ||
Sentences are terminated by `.` and `?`, clauses are terminated by `;`, `:`, and | ||
also by `-` when it is not attached to a word. | ||
|
||
Clauses are subdivided in words, and words consist of | ||
non-letters before, letters, and non-letters after. | ||
|
||
Sentence and clauses sometimes cross stanza boundaries boundaries, but never | ||
vagga boundaries. | ||
That is why we number sentences and clauses by their sequence number within their | ||
vaggas, again in feature `n`. | ||
|
||
Most words are separated by spaces, but we also make word divisions in strings like | ||
`(qui-)que`. | ||
|
||
In the Latin text we encounter `( )`: this is material added for clarity by author | ||
of the translation, Fausbøll. We code it in the feature `clarity`, see below. | ||
|
||
In the Pāli text we also encounter `[ ]`: this is material that is not completely certain. | ||
We code it in the feature `uncertain`, see below. | ||
|
||
In both text there is quoted material. We normalize the quotes to the ASCII double quote | ||
`"`, and we mark words in a quotation by means of the feature `quote`. | ||
|
||
There is (very little) material outside stanzas: one case of interstanza material, | ||
and several cases at the start and end of vaggas. | ||
We mark this material with the feature `extrastanza`. | ||
The stanza number for extra stanza material is the stanza number of the nearest stanza in the | ||
same vagga, increased by 1000. So a 4-digit stanza number is by definition not a real stanza. | ||
And a 3-digit stanza is always a real stanza. | ||
|
||
Sentences, clauses and words either belong to the Pāli original or to the Latin | ||
translation. The feature `trans` codes which is the case. | ||
|
||
!!! caution "Mind the twins" | ||
The fact that stanzas contain both the original and the translation has these consequences: | ||
|
||
* If you count the words inside a stanza, you add up the Pāli words and the | ||
Latin words. Likewise if you count sentences and clauses. | ||
* If you want to count only words, clauses, sentences of one text type, | ||
use the `trans` feature to distinguish between them. | ||
* If you count the words *within* sentences or clauses, you count the words of | ||
one text type only. | ||
|
||
|
||
## Text-Fabric model | ||
|
||
The Text-Fabric model views the text as a series of atomic units, called | ||
*slots*. In this corpus [*words*](#word) are the slots. | ||
|
||
On top of that, more complex textual objects can be represented as *nodes*. In | ||
this corpus we have node types for: | ||
|
||
[*word*](#word), | ||
[*clause*](#clause), | ||
[*sentence*](#sentence), | ||
[*stanza*](#stanza), | ||
[*vagga*](#vagga), | ||
|
||
The type of every node is given by the feature | ||
[**otype**](https://annotation.github.io/text-fabric/tf/cheatsheet.html#f-node-features). | ||
Every node is linked to a subset of slots by | ||
[**oslots**](https://annotation.github.io/text-fabric/tf/cheatsheet.html#special-edge-feature-oslots). | ||
|
||
Nodes can be annotated with features. | ||
Relations between nodes can be annotated with edge features. | ||
See the table below. | ||
|
||
Text-Fabric supports up to three customizable section levels. | ||
In this corpus we use only two: | ||
[*vagga*](#vagga) and [*stanza*](#stanza). | ||
|
||
# Reference table of features | ||
|
||
*(Keep this under your pillow)* | ||
|
||
## *absent* | ||
|
||
When we say that a feature is *absent* for a node, we mean that the node has no value | ||
for the feature. For example, if the feature `trans` is absent for node `n`, then | ||
`F.trans.v(n)` results in the Python value `None`, not the string `'None'`. | ||
|
||
In queries, you can test for absence by means of `#`: | ||
|
||
``` | ||
word trans# | ||
``` | ||
|
||
gives all lines where the feature `trans` is absent (these are all the Pāli words). | ||
|
||
See also | ||
[search templates](https://annotation.github.io/text-fabric/tf/about/searchusage.html) | ||
under **Value specifications**. | ||
|
||
## Node type [*word*](#word) | ||
|
||
Basic unit containing a word plus attached non-word stuff such as punctuation, | ||
or a text-critical sign like `( ) [ ]`. | ||
|
||
feature | values | description | ||
------- | ------ | ------ | ----------- | --- | --- | ||
**pali** | `manasā` | the real word letters of a Pāli word | ||
**latin** | `mente` | the real word letters of a Latin word | ||
**palipre** | `[` | immediately preceding non-word characters of a Pāli word | ||
**latinpre** | `[` | immediately preceding non-word characters of a Latin word | ||
**palipost** | `[` | non-word characters after of a Pāli word, including whitespace | ||
**latinpost** | `[` | non-word characters after of a Latin wor, including whitespaced | ||
**extrastanza** | `1` | indicates the word is outside a stanza | ||
**quote** | `1` | indicates the word is inside a quotation | ||
**uncertain** | `1` | **Pāli only**: indicates the word is uncertain (somewhere inside a `[ ]` pair | ||
**clarity** | `1` | **Latin only**: indicates the word is added for clarity (somewhere inside a `( )` pair | ||
**trans** | `1` | indicates the word belongs to the Latin translation | ||
|
||
## Node type [*clause*](#clause) | ||
|
||
Subdivision of a containing [*sentence*](#sentence). | ||
|
||
feature | values | description | ||
------- | ------ | ------ | ||
**n** | `1` `2` | sequence number of a clause within its vagga | ||
**trans** | `1` | indicates the clause belongs to the Latin translation | ||
|
||
## Node type [*sentence*](#sentence) | ||
|
||
Subdivision of a containing [*vagga*](#vagga). | ||
|
||
feature | values | description | ||
------- | ------ | ------ | ||
**n** | `1` `2` | sequence number of a sentence within its vagga | ||
**trans** | `1` | indicates the sentence belongs to the Latin translation | ||
|
||
## Node type [*stanza*](#stanza) | ||
|
||
Section level 2. | ||
|
||
Subdivision of a containing [*vagga*](#vagga). | ||
|
||
feature | values | description | ||
------- | ------ | ------ | ||
**n** | `1` `2` | sequence number of a stanza within the whole work | ||
|
||
## Node type [*vagga*](#vagga) | ||
|
||
Section level 1. | ||
|
||
Subdivision of the whole work. | ||
|
||
feature | values | description | ||
------- | ------ | ------ | ||
**n** | `1` `2` | sequence number of a vagga within the whole work | ||
|
||
# Text formats | ||
|
||
The following text formats are defined (you can also list them with `T.formats`). | ||
|
||
format | description | ||
--- | --- | --- | ||
`text-orig-full` | prints the text of all words, Pāli and Latin | ||
`text-pali-full` | prints the text of all Pāli words and leaves Latin words empty | ||
`text-latin-full` | prints the text of all Latin words and leaves Pāli words empty | ||
`layout-orig-full` | as `text-orig-full` but with special layout for quote, uncertain, clarity, etc. | ||
`layout-pali-full` | as `text-pali-full` but with special layout for quote, uncertain, clarity, etc. | ||
`layout-latin-full` | as `text-latin-full` but with special layout for quote, uncertain, clarity, etc. | ||
|
||
The formats with `text` result in strings that are plain text, without additional formatting. | ||
|
||
The formats with `layout` result in pieces html with css-styles; | ||
the richness of layout enables us to code more information | ||
in the plain representation, e.g. blurry characters when words are uncertain. | ||
We also use different colours for Pali and Latin. |
Oops, something went wrong.