kvrocks.conf to optimize bulk loading #1910
-
It depends on where your data is. You can import data via redis-cli --csv manually if it's a local dataset, or use RedisShake <https://github.com/tair-opensource/RedisShake> if your data is in Redis.
We don't turn off compaction. Kvrocks uses RocksDB auto-compaction plus periodic compaction by default.
Yes, we suggest keeping the max DB size between 200-300 GiB per instance (you need to leave some disk space for compaction).
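One concrete way to mass-insert a local dataset, sketched here as one of several approaches: generate commands from the file and feed them through redis-cli in pipe mode. This assumes a two-column key,value CSV with text-safe values; truly binary values would need raw RESP protocol instead.

```sh
# Turn each CSV row into a SET command and mass-insert it into kvrocks
# (default port 6666). data.csv is a placeholder path.
awk -F, '{printf "SET %s %s\n", $1, $2}' data.csv | redis-cli -p 6666 --pipe
```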
-
Thanks for the quick response. I've restricted my indexing and inserts are now going in much faster. I also re-enabled automatic compaction, since disabling it didn't seem to make much of a difference.
Now that I have around 20G of data loaded, a single pipeline or MGET gets a decent response time (5-10 ms), but once I start a concurrency test (5-10 users), I see query times slow down significantly, some as slow as 3-4 seconds. I've raised the server from the default 8 workers to 100, but that doesn't seem to help. Any suggestions on how to improve query times with hundreds of users?
Thanks!
-
Setting workers to 100 is too many; you can simply set it to the number of CPU cores.
How many QPS do you get when running the concurrency test?
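For reference, the relevant directive in kvrocks.conf (the name matches the stock config file; verify against your version):

```
# Number of worker threads serving requests; one per CPU core is
# usually enough, and more mostly adds contention.
workers 8
```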
-
OK, I've changed the number of workers back to 8.
Here's an overview of my database along with some testing results.
First, I'm using binary data for both keys and values. Keys are 17 bytes, and values vary from a minimum of 16 bytes up to a maximum of roughly 200x that size. Each value is a serialized HashSet. My client is written in Rust.
I have about 30 million keys in the database currently.
For a query I am using an MGET with at most 1000 keys per call. I've experimented with a pipeline, but MGET seems to get better results.
With a single client I get around 40 QPS, and the average MGET response time is <20 ms. For this sample, a lot of the 1000 keys sent find matches. I'm also running the service that talks to the kvrocks DB on the same machine as the kvrocks server, just to eliminate any network latency. This is a good response time for my application.
With two clients, combined QPS drops to about 30-35 and the MGET response time spikes to around 120-150 ms. I'm running random queries, so results shouldn't be cached anywhere.
A 3-user test returns about the same results, but at 4 users the MGET response times double, and 5 users increases them further.
I really like the DB server, but I need to support thousands of concurrent queries with a reasonable response time (< 1 sec), and I'm trying to evaluate whether it's a suitable choice for me.
Any thoughts or suggestions would be greatly appreciated.
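Roughly what one query looks like in my client (sketched here with the redis crate; the crate choice and the key generation below are illustrative, not my exact code):

```rust
use std::time::Instant;

fn main() -> redis::RedisResult<()> {
    // kvrocks speaks the Redis protocol; 6666 is its default port.
    let client = redis::Client::open("redis://127.0.0.1:6666/")?;
    let mut con = client.get_connection()?;

    // Stand-in for the real 17-byte binary keys (illustrative only).
    let keys: Vec<[u8; 17]> = (0u32..1000)
        .map(|i| {
            let mut k = [0u8; 17];
            k[..4].copy_from_slice(&i.to_be_bytes());
            k
        })
        .collect();

    // One MGET of up to 1000 keys; missing keys come back as None.
    let mut cmd = redis::cmd("MGET");
    for k in &keys {
        cmd.arg(&k[..]);
    }
    let start = Instant::now();
    let values: Vec<Option<Vec<u8>>> = cmd.query(&mut con)?;
    println!("mget {} keys: {:?}", values.len(), start.elapsed());
    Ok(())
}
```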
-
On Mon, Nov 27, 2023, hulk wrote:
> Which type of disk device are you running on: HDD, SSD, or NVMe? The latency of MGET depends on the speed of your device, since you're doing random reads from the db.

I'm using an AWS instance with EBS (NVMe). I can attach another volume and move the data to that. It's currently running off the root drive.
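Since MGET here is bound by random reads, it's worth measuring the volume's random-read latency directly. A fio sketch (the file path, size, and queue depths are placeholders to adjust):

```sh
# 4 KiB random reads with O_DIRECT, roughly what point lookups look like to the disk.
fio --name=randread --filename=/data/fio.test --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=30 --time_based --group_reporting
```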
-
I can mount a provisioned-IOPS SSD (io2) volume and see how much it improves things.
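For anyone following along, creating such a volume looks roughly like this with the AWS CLI (a sketch; the zone, size, and IOPS values are placeholders):

```sh
# Provisioned-IOPS SSD volume to attach and move the kvrocks data dir onto.
aws ec2 create-volume --volume-type io2 --size 200 --iops 10000 \
    --availability-zone us-east-1a
```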
-
Things are running much better with an XFS filesystem on the io2 volume. Testing goes on to determine whether it can take the load.
Thanks a bunch!
BTW, would you recommend that I try the Speedb option?
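The filesystem setup was nothing exotic (a sketch; the device name is a placeholder for whatever the instance exposes):

```sh
# Format the attached io2 volume with XFS and mount it for the kvrocks data dir.
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /data/kvrocks
sudo mount -o noatime /dev/nvme1n1 /data/kvrocks
```

Then point kvrocks at it via the dir directive in kvrocks.conf.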
-
Hi, can you please give some suggestions on an optimal configuration for bulk loading millions of small rows for an initial DB load? It appears that RocksDB compaction is turned off by default and I should use cron to periodically compact. Is that correct?
One more thing: what's the largest amount of disk space you've encountered for a single node? I may need to go up to 100G, if that is possible. Thanks