kvrocks.conf to optimize bulk loading #1910
-
It depends on where your data is. You can import data via redis-cli --csv manually if it's a local dataset, or use RedisShake <https://github.com/tair-opensource/RedisShake> if your data is in Redis.
We don't turn off compaction. Kvrocks uses RocksDB auto-compaction plus periodic compaction by default.
Yes, we suggest keeping the max DB size between 200-300 GiB per instance (you need to leave some disk space for compaction).
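One concrete way to mass-insert a local dataset, sketched here as one of several approaches: generate commands from the file and feed them through redis-cli in pipe mode. This assumes a two-column key,value CSV with text-safe values; truly binary values would need raw RESP protocol instead.

```sh
# Turn each CSV row into a SET command and mass-insert it into kvrocks
# (default port 6666). data.csv is a placeholder path.
awk -F, '{printf "SET %s %s\n", $1, $2}' data.csv | redis-cli -p 6666 --pipe
```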
-
Thanks for the quick response. I've restricted my indexing and inserts are now going in much faster. I also re-enabled automatic compaction, since disabling it didn't seem to make much of a difference.
Now that I have around 20G of data loaded, a single pipeline or MGET gets a decent response time (5-10 ms), but once I start a concurrency test (5-10 users), I see query times slow down significantly, some as slow as 3-4 seconds. I've raised the server from the default 8 workers to 100, but that doesn't seem to help. Any suggestions on how to improve query times with hundreds of users?
Thanks!
-
Setting workers to 100 is too many; you can simply set it to the number of CPU cores.
How many QPS do you get when running the concurrency test?
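For reference, the relevant directive in kvrocks.conf (the name matches the stock config file; verify against your version):

```
# Number of worker threads serving requests; one per CPU core is
# usually enough, and more mostly adds contention.
workers 8
```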
-
OK, I've changed the number of workers back to 8.
Here's an overview of my database along with some testing results.
First, I'm using binary data for both keys and values. Keys are 17 bytes, and values vary from a minimum of 16 bytes up to a maximum of roughly 200x that size. Each value is a serialized HashSet. My client is written in Rust.
I have about 30 million keys in the database currently.
For a query I am using an MGET with at most 1000 keys per call. I've experimented with a pipeline, but MGET seems to get better results.
With a single client I get around 40 QPS, and the average MGET response time is <20 ms. For this sample, a lot of the 1000 keys sent find matches. I'm also running the service that talks to the kvrocks DB on the same machine as the kvrocks server, just to eliminate any network latency. This is a good response time for my application.
With two clients, combined QPS drops to about 30-35 and the MGET response time spikes to around 120-150 ms. I'm running random queries, so results shouldn't be cached anywhere.
A 3-user test returns about the same results, but at 4 users the MGET response times double, and 5 users increases them further.
I really like the DB server, but I need to support thousands of concurrent queries with a reasonable response time (< 1 sec), and I'm trying to evaluate whether it's a suitable choice for me.
Any thoughts or suggestions would be greatly appreciated.
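Roughly what one query looks like in my client (sketched here with the redis crate; the crate choice and the key generation below are illustrative, not my exact code):

```rust
use std::time::Instant;

fn main() -> redis::RedisResult<()> {
    // kvrocks speaks the Redis protocol; 6666 is its default port.
    let client = redis::Client::open("redis://127.0.0.1:6666/")?;
    let mut con = client.get_connection()?;

    // Stand-in for the real 17-byte binary keys (illustrative only).
    let keys: Vec<[u8; 17]> = (0u32..1000)
        .map(|i| {
            let mut k = [0u8; 17];
            k[..4].copy_from_slice(&i.to_be_bytes());
            k
        })
        .collect();

    // One MGET of up to 1000 keys; missing keys come back as None.
    let mut cmd = redis::cmd("MGET");
    for k in &keys {
        cmd.arg(&k[..]);
    }
    let start = Instant::now();
    let values: Vec<Option<Vec<u8>>> = cmd.query(&mut con)?;
    println!("mget {} keys: {:?}", values.len(), start.elapsed());
    Ok(())
}
```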
-
On Mon, Nov 27, 2023, hulk wrote:
> Which type of disk device are you running on: HDD, SSD, or NVMe? The latency of MGET depends on the speed of your device, since you're doing random reads from the db.

I'm using an AWS instance with EBS (NVMe). I can attach another volume and move the data to that. It's currently running off the root drive.
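Since MGET here is bound by random reads, it's worth measuring the volume's random-read latency directly. A fio sketch (the file path, size, and queue depths are placeholders to adjust):

```sh
# 4 KiB random reads with O_DIRECT, roughly what point lookups look like to the disk.
fio --name=randread --filename=/data/fio.test --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --numjobs=4 --runtime=30 --time_based --group_reporting
```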
-
I can mount a provisioned-IOPS SSD (io2) volume and see how much it improves things.
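For anyone following along, creating such a volume looks roughly like this with the AWS CLI (a sketch; the zone, size, and IOPS values are placeholders):

```sh
# Provisioned-IOPS SSD volume to attach and move the kvrocks data dir onto.
aws ec2 create-volume --volume-type io2 --size 200 --iops 10000 \
    --availability-zone us-east-1a
```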
-
Things are running much better with an XFS filesystem on the io2 volume. Testing goes on to determine whether it can take the load.
Thanks a bunch!
BTW, would you recommend that I try the Speedb option?
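The filesystem setup was nothing exotic (a sketch; the device name is a placeholder for whatever the instance exposes):

```sh
# Format the attached io2 volume with XFS and mount it for the kvrocks data dir.
sudo mkfs.xfs /dev/nvme1n1
sudo mkdir -p /data/kvrocks
sudo mount -o noatime /dev/nvme1n1 /data/kvrocks
```

Then point kvrocks at it via the dir directive in kvrocks.conf.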
-
Hi, can you please give some suggestions on an optimal configuration for bulk loading millions of small rows for an initial DB load? It appears that RocksDB compaction is turned off by default and I should use cron to periodically compact. Is that correct?
One more thing: what's the largest amount of disk space you've encountered for a single node? I may need to go up to 100G, if that is possible. Thanks