Secor uploads (different) files with the same "name" into different "days". #2126

Open
glebsam opened this issue Jun 21, 2021 · 2 comments

glebsam commented Jun 21, 2021

Secor uploads files with different content but the same "name" into different day directories at the day boundary, when one day ends and the next begins.

In the example below, I have two files:

/topic-name/dt=2021-06-15/1_0_00000000004302033536.gz
/topic-name/dt=2021-06-16/1_0_00000000004302033536.gz

2021-06-16 00:01:05,444 [Thread-4] (com.pinterest.secor.uploader.S3UploadManager) INFO  uploading file /mnt/secor_data/message_logs/partition/9_13/topic-name/dt=2021-06-15/1_0_00000000004302033536.gz to s3://kafka-backup.s3.domain/dumps/topic-name/dt=2021-06-15/1_0_00000000004302033536.gz with no encryption
2021-06-16 00:01:05,444 [Thread-4] (com.pinterest.secor.uploader.S3UploadManager) INFO  uploading file /mnt/secor_data/message_logs/partition/9_13/topic-name/dt=2021-06-16/1_0_00000000004302033536.gz to s3://kafka-backup.s3.domain/dumps/topic-name/dt=2021-06-16/1_0_00000000004302033536.gz with no encryption
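
To see how often this happens, the uploaded keys can be grouped by file name. This is only a rough sketch, assuming the keys are available as plain strings (for example from a bucket listing); `find_cross_day_duplicates` is a hypothetical helper and not part of Secor:

```python
from collections import defaultdict
import posixpath


def find_cross_day_duplicates(keys):
    """Map each file basename to the sorted dt= directories it appears under.

    Only basenames seen under more than one dt= directory are returned.
    """
    days_by_name = defaultdict(set)
    for key in keys:
        directory, name = posixpath.split(key)
        day = posixpath.basename(directory)
        # Assumption: the day directory is the immediate parent, named dt=<date>.
        if day.startswith("dt="):
            days_by_name[name].add(day)
    return {
        name: sorted(days)
        for name, days in days_by_name.items()
        if len(days) > 1
    }


keys = [
    "/topic-name/dt=2021-06-15/1_0_00000000004302033536.gz",
    "/topic-name/dt=2021-06-16/1_0_00000000004302033536.gz",
]
print(find_cross_day_duplicates(keys))
```

For the two files above, this reports the single name `1_0_00000000004302033536.gz` under both `dt=2021-06-15` and `dt=2021-06-16`.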

Is this expected? Where can I read more about this behaviour?
At a minimum, I need to know which offsets each file contains, and whether there is a way to state this explicitly in the file name. (I expected the offset in the file name to be the first offset in the file, but in the case described that is not true.)
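
For reference, here is how I split the paths into fields. It is a minimal sketch, assuming the layout `<topic>/dt=<date>/<generation>_<kafka partition>_<offset>.gz`; the field names are my assumption, `parse_secor_path` is a hypothetical helper, and the code only extracts the number from the name without saying which message that offset refers to:

```python
import re
from typing import NamedTuple

# Assumed layout: <topic>/dt=<date>/<generation>_<kafka partition>_<offset>.gz
PATH_RE = re.compile(
    r"(?P<topic>[^/]+)/dt=(?P<date>\d{4}-\d{2}-\d{2})/"
    r"(?P<generation>\d+)_(?P<partition>\d+)_(?P<offset>\d+)\.gz$"
)


class SecorFile(NamedTuple):
    topic: str
    date: str
    generation: int
    partition: int
    offset: int


def parse_secor_path(path):
    """Split a local or S3 path into its assumed name components."""
    match = PATH_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognised path: {path}")
    return SecorFile(
        topic=match["topic"],
        date=match["date"],
        generation=int(match["generation"]),
        partition=int(match["partition"]),
        # The zero-padded offset is parsed as a plain integer.
        offset=int(match["offset"]),
    )


first = parse_secor_path("/topic-name/dt=2021-06-15/1_0_00000000004302033536.gz")
second = parse_secor_path("/topic-name/dt=2021-06-16/1_0_00000000004302033536.gz")
print(first.offset == second.offset, first.date, second.date)
```

Both paths yield the same offset field, 4302033536, and differ only in the `dt=` date.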

Contributor

HenryCaiHaiying commented Jun 21, 2021 via email

Author

glebsam commented Jun 22, 2021

@HenryCaiHaiying thank you for the answer, but I still don't understand why <previous-persisted-kafka-offset> is in the convention when I can see that this offset is the first offset the file contains, not the last offset of the previous file. By the way, the topic has a single partition and we use an idempotent producer for it.
