Service instable #243
Hi Harald, thanks again for persevering with this matter. After reading your earlier comments a few weeks ago I could only test the stability of the server while it was idle, and I didn't see any deterioration of heap or CPU usage in the JVM. Jetpad is rebooted frequently by its operators, so I can't reproduce the issue you are pointing out, although of course it does happen from time to time.
I really would like to help you tackle this issue. First, I would monitor the server's JVM using jconsole. Could you do that? I suspect there is a memory leak and the heap gets exhausted. In addition, it is very important to tune the server's thread configuration. Are you using the default values? See the "threads" section of the config/reference.conf file.
The latest commit on master has a lot of small fixes and improvements; it should be more stable, but it doesn't include any specific performance changes. The tag is: https://github.com/P2Pvalue/swellrt/releases/tag/2.0.0-beta
We could discuss and work on all of this together by chat or conference call if you like. If we find the cause I will be happy to patch the server quickly.
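For reference, these are the standard JVM flags to expose JMX so jconsole can attach remotely (the port number and the disabled auth/SSL are just placeholder values for a test setup):
# add to the JVM options used to launch the server (e.g. the run script or gradlew DEFAULT_JVM_OPTS)
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false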
|
Thanks for your prompt response and your willingness to help; now you've got me enthusiastic again too. Sure, we can get in more direct contact, just send me some details at [email protected]. I am currently checking out the suggested version and have added the JMX options to the gradlew.bat "DEFAULT_JVM_OPTS". I'm compiling the dev version right now and adapting my code to support the newer version. Does it matter for you whether I compile prod or dev?
|
Prod/Dev compile options only matter to the JS client library; they won't have any impact on the server issue.
|
I have created a new tag with some improvements regarding memory consumption:
https://github.com/SwellRT/swellrt/releases/tag/2.0.1-beta
• Fix a bug in the in-memory deltas collection
• Disable user presence tracking by default (the user presence feature can make heavy use of the transient wavelet)
• Make the user presence event rate configurable
• Safer rate control of caret update events (caret update events were making heavy use of the transient wavelet)
• Properly clean the deltas cache (cached deltas in memory were not flushed after being persisted)
• Store transient data in the database to reduce memory use
Repeating the previous test I got the following results (I shortened the test time because the results were already obvious):
|
Hey Pablo,
thanks a lot for your efforts!
As you know, I am not a Java JVM expert, but I'll try to interpret your data (thinking technically):
Analysis of "before": memory consumption was growing by about 250 MB in one hour, with GC frequency speeding up over time.
Analysis of "after": memory consumption was growing by about 20-30 MB in 20 minutes, with GC frequency speeding up over time.
Please correct me if I am wrong, but my gut feeling is that there might still be one or more memory leaks.
Should I perhaps run a test doing the same as you, then let the server idle for about a day, and afterwards check what is still held in memory compared to the start? Or something similar?
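For that check, assuming the JDK tools are available on the server, I could take a class histogram of live objects right after the test and again after the idle day, and diff the two (the PID and file names are just placeholders):
# histogram of live objects right after the test (forces a full GC first)
jmap -histo:live <server-pid> > histo-after-test.txt
# same thing again after roughly a day of idling
jmap -histo:live <server-pid> > histo-after-idle.txt
# classes whose instance counts keep growing are the leak suspects
diff histo-after-test.txt histo-after-idle.txt | head -n 40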
Cheers!
Harry
|
Sure, please run some tests; one idle and one with activity would be ideal.
Let's see how the heap behaves then. In any case, the wave server holds in-memory snapshots of each wave (Swell objects), so their size is bound to increase over time as users write text...
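To watch the heap during those runs you could log periodic GC samples with jstat, something like this (the 10-second interval and the log file name are just an example):
# sample heap occupancy and GC counts every 10 seconds for the whole run
jstat -gcutil <server-pid> 10000 >> heap-idle-run.log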
Anyway, working on this is a good exercise for optimizing the server.
Cheers
|
Hi! Using the dev build you configured not to send any annotations about a week ago, I observed the latest service dysfunction (clients disconnected, high Java memory and CPU usage) with this setup: 2 Swell instances were configured with JMX and 2 without, and 2 editors and 2 viewers were online on the same document concurrently.
[image: grafik]
https://user-images.githubusercontent.com/34220041/47446236-1c646d80-d7bb-11e8-8242-1eb118eba3cf.png
To be honest, I am a little confused by all the Java processes. In the screenshot you see 4 instances of the SwellRT server at the bottom (-Dorg.gradle.appname=gradlew), above them 2 processes running "Djava.security.auth...", and above those 2 processes running "Dcom.sun.management.jmx...". At the time the clients disconnect, it looks like only the Java process with JMX has high memory and CPU usage.
I am not sure why there isn't one jmx and one Djava.security... process for each of the Swell instances, since the config was copied. I am also not sure whether the actual problem was in the JMX process itself or something in the Swell process causing the misbehaviour visible via JMX. Anyway, I'll keep trying to collect evidence.
I'll also try to disable JMX so there is no extra Java instance running. It would be nice to save even more memory by dropping the Gradle instance as well.
|
I can't see the entire command literal strings in the screenshot, so it is a bit hard to interpret. But how many server instances are you running? Do you verify that the JVM process is really killed every time you stop (Ctrl+C) the ./gradlew run command? The OS could leave the process in a zombie state.
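A quick way to check which JVMs are actually alive, and with which options they were launched, assuming the JDK tools are on the PATH:
# list running JVMs with main class, JVM arguments and program arguments
jps -lvm
# and look for leftover gradle daemons or zombie JVMs
ps aux | grep -i java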
If you want to avoid using Gradle to run the server, you have to build a tar/zip distribution file with the command
./gradlew createDistBinTar
The generated file is placed in the distributions/ folder. Extract the file and use the run-server.sh or run-server.bat scripts to start the server. In these scripts you can enable or disable JMX monitoring and/or remote debugging.
Remember to edit the configuration in config/wave.conf based on config/reference.conf.
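Putting the steps together, roughly (the archive and extracted directory names depend on the version, so take them as placeholders):
# build the binary distribution
./gradlew createDistBinTar
# the archive is generated under the distributions/ output folder; extract it
cd distributions/
tar -xf swellrt-*.tar*    # archive name is a placeholder
cd swellrt-*/             # extracted directory name is a placeholder too
# adjust config/wave.conf (based on config/reference.conf), then start the server
./run-server.sh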
|
Hey,
in my application, there are 2 editors and 3 viewers connected to the same document simultaneously. It looks like after the 2 editors did their work for about 2 hours, entering about 500 lines, the server application stops responding a few hours later.
I have been working on this for some months now and got a really performant server: 8 cores, 12 GB RAM and an SSD. With the new server, stability improved from about 1-2 hours of concurrent usage to 3 hours of concurrent usage plus 6 hours running idle after the usage is over.
What I wonder is: how do you run this for jetpad.net? Do you periodically restart the server, or use a special version? ...I just checked out the latest version that was tagged with "version bump" (as there is no "release", AFAIK).
Cheers!
Harald