PostgreSQL- Performance deviation on run same test and workload on multiple trails #286

anilbommareddy · 2021-11-19T12:55:30Z

anilbommareddy
Nov 19, 2021

PostgreSQL v14 source code compiled with GCC v11, with the binaries installed in /usr/test/pgsqlv14-gcc.
Run TPC-C tests with HammerDB scripts.
OS: RHEL 8.4
HammerDBv4.2

Steps followed:

1.
umount /usr/local/pgsql/data
rm -rf /usr/local/pgsql/data
mkdir /usr/local/pgsql/data
mkfs.xfs -f /dev/nvme0n1
mount /dev/nvme0n1 /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
2.
su - postgres
Export the environment variables for pgsqlv14-gcc:

export PATH=/usr/test/pgsqlv14-gcc/bin:$PATH
export LD_LIBRARY_PATH=/usr/test/pgsqlv14-gcc/lib:$LD_LIBRARY_PATH

commands to run:

initdb -D /usr/local/pgsql/data
pg_ctl start -D /usr/local/pgsql/data

HammerDBv4.2 test:

(a) Build the schema with 500 warehouses and 50 virtual users.
(b) Once the schema build completes successfully, restart PostgreSQL.
(c) Run the test at virtual user counts such as 192 and 250.

Virtual Users	trial-1	trial-2	Difference (trial-1 vs trial-2)
192	1172413	1578301	>20%
250	828171	494046	>20%

After trial-1 was done:
Drop the tpc-c database in PostgreSQL.
Restart PostgreSQL.
Run build_schema; the schema build completed successfully.
Restart PostgreSQL.
Run the tests for trial-2.

As per the HammerDB documentation, the expected deviation across multiple test runs is 1-2%.
Here the observed deviation between trial-1 and trial-2 is more than 20% at both 192 and 250 VUs.

Reference links:
https://www.enterprisedb.com/how-to-benchmark-postgresql-using-hammerdb-open-source-tool (hammerdb scripts)

Answered by sm-shaw

Nov 23, 2021


sm-shaw · 2021-11-19T14:37:46Z

sm-shaw
Nov 19, 2021
Maintainer

Moved to Discussions as this is a PostgreSQL configuration issue rather than a HammerDB one.

To start with, you don't give any details about your hardware configuration: CPUs (which ones, how many cores?), memory and I/O. Secondly, you don't include your PostgreSQL configuration such as the postgresql.conf. You need to include a lot more information about your environment to be able to get any help. Thirdly and most importantly, you don't include any details about database and system performance and wait events during the test. So what are the top wait events in PostgreSQL when it is running? What is the CPU utilization, etc.?

You also don't give the actual NOPM/TPM values.

So, to start with, I would recommend stepping back a bit and watch the introductory talk linked here https://www.hammerdb.com/about.html and then read the documentation and blog eg https://www.hammerdb.com/blog/uncategorized/how-many-warehouses-for-the-hammerdb-tpc-c-test/ (250 vusers on 500 warehouses is too high). In the video in particular, pay close attention to performance profiles at about 45 minutes. Always start with 1VU then 2, 4 etc and plot the graph of your system's potential - you should see a smooth curve as the load increases.

If you have configured the database and system correctly and you run the test twice, your lines will overlay each other closely.

1 reply

sm-shaw Nov 20, 2021
Maintainer

As a follow up - v4.3 has been released - this includes a new PostgreSQL graphical metrics viewer to help you find your key wait events.
You should also use the transaction counter to make sure that the numbers you see really are consistent over time, i.e. a flat performance graph rather than one with peaks or troughs.

arjunshetty955 · 2021-11-22T11:04:21Z

arjunshetty955
Nov 22, 2021

@sm-shaw and @anilbommareddy Similar behavior was observed on PostgreSQL v14 with the environment below (benchmark and configuration).
Bare metal Intel.
PostgreSQLv14.
HammerDBv4.2
CPU(s): 256
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 8
RAM size: 512GB
SSD/HDD: 1TB

With perf, these were the top functions observed for most VU counts (example: 24 VU).
18.99% postgres postgres [.] LWLockAcquire
7.09% postgres postgres [.] _bt_compare
8.66% postgres postgres [.] LWLockRelease
2.28% postgres postgres [.] GetSnapshotData

build_schema.tcl.txt
test_run.tcl.txt

Virtual Users	Trial-1	Trial-2
1	82798	79692
8	529138	526546
12	792214	788268
16	1185747	1202157
20	1505292	1482111
36	1681780	1679249
44	1664852	1633192
52	1727104	1679525
60	1732666	1626505
72	1781263	1730332
192	1172313	1578401
256	828171	494046
For higher virtual user counts such as 192 and 256, the performance deviation between runs (trial-1 and trial-2) is more than 15%.
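The run-to-run deviation in the table above can be checked with a few lines of Python. This is only a sketch: the thread does not say how the percentages were calculated, so the formula here (absolute difference as a percentage of the mean of the two runs) is an assumption, and the 15% threshold is simply the figure quoted in the post.

```python
# Values copied from the table above: VU count -> (trial 1, trial 2).
RESULTS = {
    1: (82798, 79692),
    8: (529138, 526546),
    12: (792214, 788268),
    16: (1185747, 1202157),
    20: (1505292, 1482111),
    36: (1681780, 1679249),
    44: (1664852, 1633192),
    52: (1727104, 1679525),
    60: (1732666, 1626505),
    72: (1781263, 1730332),
    192: (1172313, 1578401),
    256: (828171, 494046),
}


def percent_deviation(a, b):
    """Absolute difference between two runs as a percentage of their mean.

    Assumed definition; the thread does not state which formula was used.
    """
    return abs(a - b) / ((a + b) / 2) * 100


def flag_unstable(results, threshold_pct=15.0):
    """Return the VU counts whose run-to-run deviation exceeds the threshold."""
    return [
        vu
        for vu, (t1, t2) in sorted(results.items())
        if percent_deviation(t1, t2) > threshold_pct
    ]


if __name__ == "__main__":
    for vu, (t1, t2) in sorted(RESULTS.items()):
        print(f"{vu:>4} VU  {percent_deviation(t1, t2):5.1f}%")
    print("above 15%:", flag_unstable(RESULTS))
```

With this definition only the 192 and 256 VU rows exceed the threshold; every row up to 72 VU stays in single digits.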


anilbommareddy · 2021-11-22T14:34:08Z

anilbommareddy
Nov 22, 2021
Author

@arjunshetty955 the attached files (the HammerDB scripts and postgres.conf) are similar to mine.
Maybe you can cross-check whether the performance deviation at higher VU counts still exists with:

autovacuum = off and rerun the tests
run HammerDB as the root user rather than as the postgres user.
performance and wait events during the test. So what are the top wait events in PostgreSQL when it is running? What is the CPU utilization etc. (the points @sm-shaw suggested).


sm-shaw · 2021-11-22T17:27:07Z

sm-shaw
Nov 22, 2021
Maintainer

There are a lot of introductory database benchmarking topics here, but they are useful for other people following the thread in future.

To start with, you are assuming linear core and cross-socket scalability from the combination of hardware and software you are using. Is this a valid assumption? Do you have empirical evidence to back this proposition up?

Let's take the above questions first:

autovacuum = off and rerun the tests

For long running sequences of tests - no you don't want to do this - please read a PostgreSQL manual to understand why.

run HammerDB as the root user rather than as the postgres user.

Will not make a difference.

performance and wait events during the test. So what are the top wait events in PostgreSQL when it is running? What is the CPU utilization etc(@sm-shaw suggested points).

perf will show you the top function but not the actual details. There is a blog post here about new HammerDB v4.3 functionality.

https://www.hammerdb.com/blog/uncategorized/hammerdb-v4-3-new-features-pt1-graphical-metrics-for-postgresql/

This will help you drill down on the actual PostgreSQL wait events rather than just the group and top functions.

So to step back on the empirical evidence side - you don't mention whether your data is TPM or NOPM. However, this aside, I have seen data for HammerDB v4.2 workloads on PostgreSQL 13 on x86-64 2-socket systems in the range of 2.5M to 3M NOPM. So we do know that HammerDB performance on an alternative configuration can exceed what you see here. Also, the data from a commercial database is again significantly higher, so we know that the bottleneck is not in HammerDB and this is not a HammerDB issue.

So next, let's graph your data. The data points from both runs are very close initially, then there is some minor variance at higher VU counts (your research on autovacuum can explain why this is the case), but then you sort of gave up after 72 and jumped straight to 192 and 256 Virtual Users.

Also, as noted previously, a NOPM value is an average over a period of time. A single run with a lot of variance with higher and lower points may produce the same average as something that is in-between and very consistent. Consistent is what you want - here is a laptop example:

[Image: laptop example]
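The point about averages can be illustrated with a small sketch. The two sample series below are invented for illustration only (they are not HammerDB output and not the laptop example referred to above); they share the same mean but differ widely in spread, which is what a single NOPM figure hides.

```python
from statistics import mean, pstdev

# Hypothetical per-interval throughput samples, invented for illustration.
steady = [1000, 1010, 990, 1005, 995]
spiky = [1600, 400, 1500, 500, 1000]


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (0 is perfectly flat)."""
    return pstdev(samples) / mean(samples)


if __name__ == "__main__":
    for name, samples in (("steady", steady), ("spiky", spiky)):
        cv = coefficient_of_variation(samples)
        print(f"{name:>6}: mean={mean(samples)} cv={cv:.3f}")
```

Both series average 1000, so an averaged result would report them as identical; the coefficient of variation separates the flat run from the one with peaks and troughs.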

So, it is clear that this is not a HammerDB issue and out of scope for further HammerDB support. In fact this is exactly what HammerDB is for: to explore the potential of your database hardware and software. In this example, you are seeing the scalability limitations of your configuration and should drill down into the statistics to understand where these are. There is a very clear hint in seeing 14.99% postgres postgres [.] LWLockAcquire as the top function so I would suggest starting there.


anilbommareddy · 2021-11-23T05:55:04Z

anilbommareddy
Nov 23, 2021
Author

@sm-shaw: Thank you for the information and the explanation on HammerDB.
In my environment I also noticed that PostgreSQL on a 2-socket system delivers more than 2M PostgreSQL TPM and 1M NOPM with the HammerDB TPC-C test.
And I think @arjunshetty955's values are NOPM.
Thanks also for the information that HammerDB v4.3 lets the user drill down on PostgreSQL database metrics in real time.
In my environment, I'll check PostgreSQL with the HammerDB v4.3 functionality. Hopefully it will help me work out the deviation issue.

Does HammerDB v4.3 show whether PostgreSQL threads are bound to NUMA nodes (i.e. the CPU affinity of threads to NUMA nodes)?


sm-shaw · 2021-11-23T11:59:29Z

sm-shaw
Nov 23, 2021
Maintainer

Hi, the question about threads is not entirely clear. HammerDB itself is not impacted by NUMA issues as the Virtual Users which are OS threads are independent of each other and there is no GIL in the language used https://www.hammerdb.com/blog/uncategorized/what-programming-languages-does-hammerdb-use-and-why-does-it-matter/. PostgreSQL is not thread based but is process based.
The new HammerDB statistics will report information from the active session history.

I would recommend repeating your test but with a gap of 4 or 8 virtual users and continue past 72 rather than jumping to high thread counts. For the minor variations in results after multiple tests related to autovacuum use a tool such as this https://bucardo.org/check_postgres/ to check for table bloat. Vacuum full or rebuild (which are basically the same thing) will remove the bloat.
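As a minimal sketch of the stepping advice above, the helper below generates the list of virtual user counts to test. The function name and its defaults are hypothetical and not part of HammerDB; each value would still have to be fed into the test script by hand or by your own automation.

```python
def vu_schedule(max_vu, step=8):
    """Return VU counts to test: 1 first, then every multiple of `step`.

    Hypothetical helper for planning runs; not a HammerDB function.
    """
    if max_vu < 1 or step < 1:
        raise ValueError("max_vu and step must be positive integers")
    counts = {1}
    counts.update(range(step, max_vu + 1, step))
    return sorted(counts)


if __name__ == "__main__":
    print(vu_schedule(256, step=8))
    print(vu_schedule(256, step=4))
```

A step of 8 up to 256 gives 33 test points and a step of 4 gives 65, so the smaller step roughly doubles the total run time in exchange for a smoother curve.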

As session counts increase and performance does not, this is related to overall system scalability (and not a HammerDB problem because as above we have already noted higher performance elsewhere - I do not give exact details as HammerDB never publishes any benchmarks directly, performance data is only used for reference or illustration). In particular, as you add Virtual Users you should look to see if the CPU utilisation is increasing or are particular wait events increasing instead and you can't increase the utilisation any higher (i.e. scalability). The new GUI metrics will show LWLock in pink.
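One rough way to see where added sessions stop adding performance is to compare each result with what perfect linear scaling from the 1 VU run would give. The sketch below uses the first column of results from the table posted earlier in the thread; the "efficiency" ratio is an ad hoc measure chosen for illustration, not a HammerDB metric, and it says nothing about why scaling stops.

```python
# First-run values from the table posted earlier in the thread: VU -> result.
FIRST_RUN = {
    1: 82798,
    8: 529138,
    12: 792214,
    16: 1185747,
    20: 1505292,
    36: 1681780,
    44: 1664852,
    52: 1727104,
    60: 1732666,
    72: 1781263,
    192: 1172313,
    256: 828171,
}


def scaling_efficiency(results, baseline_vu=1):
    """Ratio of each result to perfect linear scaling from the baseline run."""
    per_vu = results[baseline_vu] / baseline_vu
    return {vu: value / (vu * per_vu) for vu, value in sorted(results.items())}


def peak_vu(results):
    """VU count with the highest measured result."""
    return max(results, key=results.get)


if __name__ == "__main__":
    for vu, eff in scaling_efficiency(FIRST_RUN).items():
        print(f"{vu:>4} VU  {FIRST_RUN[vu]:>8}  efficiency {eff:.2f}")
    print("peak at", peak_vu(FIRST_RUN), "VU")
```

In this data the measured peak is at 72 VU, and the ratio falls steadily well before that, which is consistent with a contention limit rather than a shortage of CPU.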

6 replies

sm-shaw Dec 13, 2021
Maintainer

If you cannot use the GUI, you would need to use SQL to query pg_active_session_history directly.
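A hedged sketch of what such a query might look like is shown below, embedded in Python so the aggregation can be tried without a database. It assumes that pg_active_session_history is the view supplied by the pgsentinel extension and that it exposes ash_time, wait_event_type and wait_event columns; check the view definition on your own system before relying on it. The sample rows are made up, and treating an empty wait event as time on CPU is an interpretation, not something stated in the thread.

```python
from collections import Counter

# Assumed query text: view and column names are taken to be those of the
# pgsentinel extension and should be verified locally. Run it with psql or
# any PostgreSQL driver.
TOP_WAITS_SQL = """
SELECT wait_event_type, wait_event, count(*) AS samples
FROM pg_active_session_history
WHERE ash_time > now() - interval '5 minutes'
GROUP BY wait_event_type, wait_event
ORDER BY samples DESC
LIMIT 10;
"""


def top_wait_events(rows, limit=5):
    """Count sampled (wait_event_type, wait_event) pairs, most frequent first.

    Rows with no wait event are labelled "CPU" (assumed interpretation).
    """
    counts = Counter((wtype or "CPU", event or "CPU") for wtype, event in rows)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample rows standing in for un-aggregated query results.
    sample = [
        ("LWLock", "WALInsert"),
        ("LWLock", "WALInsert"),
        ("LWLock", "WALInsert"),
        ("Lock", "transactionid"),
        (None, None),
        (None, None),
    ]
    for (wtype, event), n in top_wait_events(sample):
        print(f"{wtype:>8} {event:<16} {n}")
```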

arjunshetty955 Dec 16, 2021

PostgreSQL v14 was built from source with GCC v11.1 and the binaries were run in two setups: on a single machine and in a client-server configuration.
Compared with the single Milan machine, the NOPM with the client-server method is more or less half.
I checked the network bandwidth between the client and server machines: transmit and receive bandwidth are similar, and the TCP/UDP ports show the same bandwidth.
The only differences in the client-server setup are RAM size and cache (L1/L2/L3). Could this cause the drop in NOPM?
Are there other recommended configurations or parameters to check via HammerDB v4.x?

In the client-server model (HammerDB v4.x runs on the client and PostgreSQL v14 runs on the server):
12 VU: NOPM 431811
On a single machine (both HammerDB v4.x and PostgreSQL v14 run on the same machine):
12 VU: NOPM 728825

sm-shaw Dec 16, 2021
Maintainer

Yes, different databases will show different performance profiles when run locally and client/server. The open source databases in particular will show different performance profiles. You will need to increase the VU count to increase the NOPM. Again, the performance data will show when the database is waiting on a client.

arjunshetty955 Mar 31, 2022

Observed that PostgreSQL does not utilize 100% CPU at higher virtual user counts.
Example:

PostgreSQL v14 TPC-C run with HammerDB v4.3 at VU counts 1, 2, 4, 8, 12, 16, ..., 128, 132, ..., 256 (in steps of 4 VUs).
The NOPM/TPM values increase linearly from 1 VU to 128 VU.
Maximum performance (NOPM/TPM) is observed at 128 VU (NOPM: 2014893 / TPM: 4660175); from 132 to 256 VU performance drops by 2-4% relative to the NOPM value at 128 VU.
Maximum CPU utilization is 50% at 128 VU; from 132 to 256 VU the CPU utilization is 30%.

pg_top and iostat were used to check CPU utilization for PostgreSQL.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
Online CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 4
NUMA node0 CPU(s): 0-31,128-159
NUMA node1 CPU(s): 32-63,160-191
NUMA node2 CPU(s): 64-95,192-223
NUMA node3 CPU(s): 96-127,224-255
Model name: Intel
RAM: 512GB
SSD 1TB
schema:

diset tpcc pg_count_ware 1024
diset tpcc pg_num_vu 256

(1) Tried tuning the postgres.conf parameters and varying the schema (pg_count_ware 2100 and pg_num_vu 512), but the maximum CPU utilization observed is still 50% at 128 VU.
(2) Started PostgreSQL with numactl --interleave=all: performance improves by 5% across all VU counts, but the maximum CPU utilization is still 50-53% at 128 VU.
Is PostgreSQL scalable up to 128 cores (0-127)?
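For reference, the NUMA CPU lists in the lscpu output above can be expanded to confirm how the 256 logical CPUs are spread over the four nodes. The sketch below only parses the strings from the post. That CPU IDs 128-255 are the second hardware thread of cores 0-127 is a common Linux numbering convention and an assumption here; it is not confirmed in the thread.

```python
# CPU lists copied from the lscpu output above: NUMA node -> CPU list string.
NUMA_CPUS = {
    0: "0-31,128-159",
    1: "32-63,160-191",
    2: "64-95,192-223",
    3: "96-127,224-255",
}


def expand_cpu_list(spec):
    """Expand an lscpu-style list such as "0-31,128-159" into a set of CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


if __name__ == "__main__":
    for node, spec in NUMA_CPUS.items():
        cpus = expand_cpu_list(spec)
        low = sum(1 for c in cpus if c < 128)
        print(f"node{node}: {len(cpus)} logical CPUs, {low} with ID below 128")
```

Each node holds 64 logical CPUs, 32 of them with IDs below 128, which matches 2 sockets x 64 cores x 2 threads per core.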

sm-shaw Apr 1, 2022
Maintainer

PostgreSQL scalability is dependent on the combination of both system and software. 2M NOPM is reasonable, I have observed performance in the range of 2M-3M on a 2 socket server so additional tuning may be possible, however you are probably close to the limits of maximum performance with this combination. If you review the previous answers, you will probably find that LWLockAcquire is the bottleneck. You will not be able to push CPU utilization higher, instead adding load adds to the bottleneck. The solution will be in the PostgreSQL software to improve scalability. (That is not to say that this is anything but a complex issue, as more often than not fixing one bottleneck moves it to another place.) As an example, run PostgreSQL 12 or earlier on the same system, you will see performance is much lower than v13 and above. Similarly, for now, the commercial databases will scale up to maximum CPU utilization.
This is exactly what HammerDB is for! - it has enabled you to quantify PostgreSQL scalability and identify that for this database release simply adding more CPU resource does not deliver OLTP scalability beyond a particular point and can in fact lower it.
