# tt crud import

Welcome to the `tt crud import` wiki, which describes the current functionality and limitations of this implementation.
- `tt crud` - module for interacting with the CRUD module of Tarantool.
- `tt crud import` - subcommand for importing data into Tarantool via CRUD.
The following field types of the target space are supported:

- boolean (the values `true` and `t` in input data fields are interpreted as `true` if the format field requires a bool; `false` and `f` are interpreted as `false`)
- string
- integer
- unsigned
- double (unstable, see issue)
- number
- decimal
Before a record is imported into a space, its field values are converted according to the format of that space. If a field in the target space has a type that is not in the list above, an attempt will be made to insert a string value into it.
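The conversion rule above can be sketched as follows. This is an illustrative Python sketch, not tt's actual (Go) implementation; the function name and type labels are chosen for the example:

```python
# Sketch of the per-field conversion rule: cast each raw input field
# according to the target space format, falling back to a plain string
# for any type not in the supported list.
from decimal import Decimal

def convert_field(raw: str, field_type: str):
    if field_type == "boolean":
        if raw in ("true", "t"):
            return True
        if raw in ("false", "f"):
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if field_type in ("integer", "unsigned"):
        value = int(raw)
        if field_type == "unsigned" and value < 0:
            raise ValueError(f"negative value for unsigned field: {raw!r}")
        return value
    if field_type in ("double", "number"):
        return float(raw)
    if field_type == "decimal":
        return Decimal(raw)
    # "string" and any unlisted type: attempt to insert a string value.
    return raw
```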
At the moment, the thousands separators `` ` `` (backtick) and ` ` (space) are supported by default for numeric values, so `` -1`000`000 `` and `-1 000 000` both become `-1000000`. Exponential notation of numeric values in the input data is also acceptable (like `-2.7e+20`). The decimal separator is `.`.
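The separator handling above amounts to stripping backticks and spaces before parsing. A minimal sketch (illustrative only, not tt's implementation):

```python
# Normalize a numeric input value: drop the "`" and " " thousands
# separators, keep "." as the decimal separator, and let standard
# float parsing handle exponential notation like -2.7e+20.
def normalize_numeric(raw: str) -> float:
    cleaned = raw.replace("`", "").replace(" ", "")
    return float(cleaned)
```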
For the CSV format, only the standard delimiter (`,`), quote (`"`), and escape (a doubled quote character) are currently supported, as in RFC 4180. By default, a null value is written as a doubled delimiter character, so `valOfField1,,valOfField3` in the input file is interpreted as `valOfField1,null,valOfField3` at the stage of insertion into the database.
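Python's standard `csv` module follows the same RFC 4180 conventions (comma delimiter, double-quote quoting, doubled quote as the escape), so it can illustrate how an empty field between two delimiters becomes a null on insert:

```python
# Parse one RFC 4180 CSV line and map empty fields to None,
# mirroring the default null handling described above.
import csv
import io

line = 'valOfField1,,"say ""hi""",valOfField3\n'
row = next(csv.reader(io.StringIO(line)))
# An empty string is the default NULL marker (cf. the --null option).
tuple_to_insert = [field if field != "" else None for field in row]
# tuple_to_insert == ['valOfField1', None, 'say "hi"', 'valOfField3']
```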
There is no generation of unique keys for inserted data. There is no support for data types such as `uuid` or `timestamp`.
```
tt crud import URI FILE SPACE [flags]
```

- `URI` - address of the router.
- `FILE` - file with input data for import.
- `SPACE` - name of the target space to import data into.
- `--batch-size uint32` - CRUD batch size during import (default 100).
- `--error string` - name of the file with rows that were not imported; overwrites an existing file (default "error").
- `--format string` - format of the input data; currently only "csv" is supported (default "csv").
- `--header` - treat the first line as a header, not data.
- `-h`, `--help` - help for import.
- `--log string` - name of the log file with information about errors that occurred; appends to an existing file (default "import").
- `--match string` - use the correspondence between header fields in the input file and target space fields; requires the `--header` option. If the space format has fields that are not present in the header, an attempt will be made to insert null into them. Fields in the header that are not in the space format are ignored. Currently only the "header" value is supported; it is not yet possible to set a manual match like "spaceId=csvFoo,spaceName=csvBar,...".
- `--null string` - value to be interpreted as NULL when importing; by default, an empty value. Example for CSV: `field1val,,field3val`, where field2 will be taken as NULL.
- `--onerror string` - if an error occurs, either skip the problematic line and continue, or stop; allowed values are "stop" and "skip" (default "stop").
- `--operation string` - "insert" or "replace" (default "insert").
- `-p`, `--password string` - connection password.
- `--progress` - take the progress file from the last launch into account. The file stores the positions of lines that could not be imported during the last launch, as well as the stop position from that launch. As a result, import of the lines at the stored positions is retried, and then work continues from the stop position. On each launch, the contents of the progress.json file are completely overwritten.
- `--success string` - name of the file with rows that were imported; overwrites an existing file (default "success").
- `-u`, `--username string` - connection username.
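The `--batch-size` flag controls how many parsed records are grouped into a single CRUD call (`crud.insert_many(...)` or `crud.replace_many(...)`). An illustrative sketch of that grouping, not tt's actual code:

```python
# Group an iterable of parsed records into batches of at most
# batch_size, the way --batch-size bounds each *_many() call.
from itertools import islice

def batches(records, batch_size=100):
    it = iter(records)
    while chunk := list(islice(it, batch_size)):
        yield chunk

grouped = list(batches(range(7), batch_size=3))
# grouped == [[0, 1, 2], [3, 4, 5], [6]]
```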
```
$ ./tt crud import localhost:3301 ./Employees.csv mytestspace \
      --username=guest --header --match=header --onerror=skip --operation=replace
```
```
• Running crud import:
Crud init complete: [true]
Target space exist: [true]
In case of error: [skip]
PID of this process: [49148]
WARNING: Process is not sensitive to SIGINT (ctrl+c), use kill -9 49148
[ read/ignored : 30000/0 | parsed ok/err : 30000/0 | import ok/err : 30000/0 ]

IMPORT SUMMARY
total read:           30000
ignored (--progress): 0
parsed success:       30000
parsed error:         0
import success:       30000
import error:         0
speed (rec per sec):  6746

import logs file:     import.log
failed recs file:     error.csv
imported recs file:   success.csv
import progress file: progress.json
```
For demonstration, you can use the file `playground.lua` from the crud repository. Please note that `tt crud import` requires `crud` with batching support (`crud.insert_many(...)`, `crud.replace_many(...)`, etc.) and `vshard` (`vshard` is a requirement of `playground.lua`, not of `tt crud import`).
Start the playground:

```
$ git clone https://github.com/tarantool/crud.git
$ cd crud
$ tarantoolctl rocks make
$ ./doc/playground.lua
```
Check the playground via `tt connect`:

```
$ tt connect localhost:3301
# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
    {'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
    {'type': 'number', 'name': 'age', 'is_nullable': false}]
  rows:
  - [1, 477, 'Alexey', 'Adams', 20]
  - [2, 401, 'Sergey', 'Allred', 21]
  - [3, 2804, 'Pavel', 'Adams', 27]
  - [4, 1161, 'Mikhail', 'Liston', 51]
  - [5, 1172, 'Dmitry', 'Jacobi', 16]
  - [6, 1064, 'Alexey', 'Sidorov', 31]
- null
...
```
Input file (the 2nd record has a duplicate key problem; the 1st and 3rd are OK):

```
$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
7,900,Marge,Simpson,33
8,800,Homer,"Simpson",40
```
Run the import:

```
$ tt crud import localhost:3301 ./developers.csv developers \
      --username=guest --header --match=header --onerror=stop --batch-size=1
```
```
• Running crud import:
Crud init complete: [true]
Target space exist: [true]
In case of error: [stop]
PID of this process: [57709]
WARNING: Process is not sensitive to SIGINT (ctrl+c), use kill -9 57709
[ read/ignored : 1/0 | parsed ok/err : 1/0 | import ok/err : 1/0 ]
Onerror: An error has occurred, the import has been stopped. See the details in the log, error, success and progress files.

IMPORT SUMMARY
total read:           2
ignored (--progress): 0
parsed success:       2
parsed error:         0
import success:       1
import error:         1
speed (rec per sec):  89

import logs file:     import.log
failed recs file:     error.csv
imported recs file:   success.csv
import progress file: progress.json
```
Log file:

```
$ cat import.log
Crud import start work timestamp: 2022-06-20 11:53:09.333438 +0300 MSK m=+0.030275632
File description: this file contains logs information about occurred errors
Note: errors are indicated relative to the space format, not relative to the header
...
timestamp: 2022-06-20 11:53:09.341663 +0300 MSK m=+0.038500590
line position: 3
problem record: 7,900,Marge,Simpson,33
error description: CallError: Failed for 33cba7a0-8420-452d-b8b6-0f96fd63f4fa: Function returned an error: Duplicate key exists in unique index "primary_index" in space "developers" with old tuple - [7, 700, "Ned", "Flanders", 35] and new tuple - [7, 900, "Marge", "Simpson", 33]
...
```
Success file:

```
$ cat success.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
```
Error file:

```
$ cat error.csv
id,bucket_id,name,surname,age
7,900,Marge,Simpson,33
```
Progress file:

```
$ cat progress.json
{
  "startTimestamp":"2022-06-20 11:53:09.319323 +0300 MSK m=+0.016160728",
  "lastDumpTimestamp":"2022-06-20 11:53:09.341703 +0300 MSK m=+0.038540474",
  "endOfFileReached":false,
  "lastPosition":3,
  "retryPositions":[3]
}
```
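The restart rule implied by this file can be sketched in a few lines. This is an interpretation of the documented behavior, not tt's code: retry the positions listed in `retryPositions`, skip every other line up to `lastPosition`, and continue normally past the stop point.

```python
# Decide whether a line at a given 1-based position should be imported
# on a --progress restart, based on the progress.json fields shown above.
import json

progress = json.loads("""{
  "endOfFileReached": false,
  "lastPosition": 3,
  "retryPositions": [3]
}""")

def should_import(line_position: int) -> bool:
    if line_position in progress["retryPositions"]:
        return True  # retry a line that previously failed
    # Lines at or before the stop point were already handled; skip them.
    return line_position > progress["lastPosition"]
```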
Let's correct the input file. Now the 2nd record no longer has a duplicate key problem:

```
$ cat developers.csv
id,bucket_id,name,surname,age
7,700,"Ned",Flanders,35
9,900,Marge,Simpson,33
8,800,Homer,"Simpson",40
```
So let's continue working from the stop point, retrying the problem lines before it. Use the `--progress` option for this:

```
$ tt crud import localhost:3301 ./developers.csv developers \
      --username=guest --header --match=header --onerror=stop --batch-size=1 --progress
```
```
• Running crud import:
Crud init complete: [true]
Target space exist: [true]
In case of error: [stop]
PID of this process: [60611]
WARNING: Process is not sensitive to SIGINT (ctrl+c), use kill -9 60611
[ read/ignored : 3/1 | parsed ok/err : 3/0 | import ok/err : 2/0 ]

IMPORT SUMMARY
total read:           3
ignored (--progress): 1
parsed success:       3
parsed error:         0
import success:       2
import error:         0
speed (rec per sec):  142

import logs file:     import.log
failed recs file:     error.csv
imported recs file:   success.csv
import progress file: progress.json
```
Now `error.csv` contains no records. You can make sure that all records (see id = 7, 8, 9) were imported via `tt connect`:

```
# localhost:3301> crud.select('developers')
---
- metadata: [{'name': 'id', 'type': 'unsigned'}, {'name': 'bucket_id', 'type': 'unsigned'},
    {'name': 'name', 'type': 'string'}, {'type': 'string', 'name': 'surname', 'is_nullable': true},
    {'type': 'number', 'name': 'age', 'is_nullable': false}]
  rows:
  - [1, 477, 'Alexey', 'Adams', 20]
  - [2, 401, 'Sergey', 'Allred', 21]
  - [3, 2804, 'Pavel', 'Adams', 27]
  - [4, 1161, 'Mikhail', 'Liston', 51]
  - [5, 1172, 'Dmitry', 'Jacobi', 16]
  - [6, 1064, 'Alexey', 'Sidorov', 31]
  - [7, 700, 'Ned', 'Flanders', 35]
  - [8, 800, 'Homer', 'Simpson', 40]
  - [9, 900, 'Marge', 'Simpson', 33]
- null
...
```