Investigating the cause of open sessions and bad site performance! | Community
New Participant
October 5, 2017
Solved

Investigating the cause of open sessions and bad site performance!

  • October 5, 2017
  • 13 replies
  • 4883 views

Hi everyone,

My AEM production instance is experiencing very high garbage-collection CPU usage and poor performance. I suspect the cause is sessions being left open.

I am checking /system/console/jmx on one of the publishers experiencing heavy garbage collection (over 60% CPU usage for 10+ hours).

I know that by default there are ~85 SessionStatistics MBeans, and right now there are ~119. I think very large objects on the heap are being retained because these sessions stay open. Based on a heap dump I checked, the large objects are related to Search or QueryBuilder.

The start times of the sessions that stay open correlate directly with the spikes in garbage collection.

org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-434675@Oct 4, 2017 2:39:22 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-435491@Oct 4, 2017 2:40:34 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-442302@Oct 4, 2017 2:49:22 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-443169@Oct 4, 2017 2:50:34 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-467825@Oct 4, 2017 3:35:40 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-468804@Oct 4, 2017 3:38:06 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-471911@Oct 4, 2017 3:45:39 PM"
org.apache.jackrabbit.oak:SessionStatistics"repository-reader-service@session-472840@Oct 4, 2017 3:47:59 PM"

These are the sessions being started that seem to spike my GC CPU usage.

There are about three other sessions associated with each of these repository-reader-service sessions that stay open: two from admin@session and one from anonymous@session.
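For what it's worth, the MBean count can also be checked programmatically rather than eyeballing /system/console/jmx. A minimal sketch, assuming the Oak session MBeans are registered under the `org.apache.jackrabbit.oak` domain with `type=SessionStatistics` (run inside the AEM JVM it lists one entry per open session; in a plain JVM outside AEM the set is simply empty):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class SessionMBeanCount {
    public static void main(String[] args) throws Exception {
        // Query the local platform MBean server for Oak SessionStatistics MBeans.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern = new ObjectName("org.apache.jackrabbit.oak:type=SessionStatistics,*");
        Set<ObjectName> sessions = server.queryNames(pattern, null);
        System.out.println("Open Oak sessions: " + sessions.size());
        for (ObjectName name : sessions) {
            // e.g. ...name="repository-reader-service@session-...@Oct 4, 2017 ..."
            System.out.println(name);
        }
    }
}
```

Logging this count periodically makes it easy to see whether the number of open sessions only ever grows (a leak) or rises and falls with load.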

Is there anywhere I can find more information about why these repository-reader-service sessions are being started? What should my next steps be in solving this problem?
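Long-lived sessions like these usually come from service code that opens a JCR session (or a service resource resolver) and never releases it; in real AEM code the fix is `Session.logout()` or `ResourceResolver.close()` in a finally block. A minimal sketch of the leak and the fix, using a hypothetical `Session` stub in place of the real `javax.jcr.Session` so it runs standalone:

```java
// Hypothetical stand-in for javax.jcr.Session, just to make the sketch runnable.
class Session implements AutoCloseable {
    static int open = 0;
    Session() { open++; }
    public void close() { open--; } // in real JCR code: session.logout()
}

public class SessionLeakDemo {
    // Anti-pattern: the session is never released, so its SessionStatistics
    // MBean (and any objects it pins, e.g. query results) lives on.
    static void leaky() {
        Session s = new Session();
        // ... do work, return without closing
    }

    // Fix: always release the session via try-with-resources (or finally).
    static void safe() {
        try (Session s = new Session()) {
            // ... do work
        }
    }

    public static void main(String[] args) {
        leaky();
        safe();
        System.out.println("Sessions still open: " + Session.open); // 1: only the leaky one
    }
}
```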

Any advice is GREATLY appreciated!

Thank you!

Ian

Best answer by joerghoh

Hi,

JCR queries need memory to execute. When a search takes a long time and more and more queries come in, memory usage increases with each query.

If your gc.log doesn't show any increase in garbage collection and you aren't getting OutOfMemoryError exceptions, there is no memory issue at all. Focus instead on improving the searches to speed them up.

Jörg

13 replies

jimt82977416
New Participant
July 17, 2019

GCeasy is a universal GC log analyser whose machine-learning algorithms save the time and hassle of dealing with cryptic GC logs. You can quickly detect memory leaks, long GC pauses, premature object promotion, and many other performance-impacting problems. The Key Performance Indicators and GC statistics sections of its report help you tune your application's memory and GC settings.

iannovic (Author)
New Participant
October 6, 2017

OK, I will focus on search speed then.

Thank you for your insight and time

joerghoh (Accepted solution)
Employee
October 6, 2017

Hi,

JCR queries need memory to execute. When a search takes a long time and more and more queries come in, memory usage increases with each query.

If your gc.log doesn't show any increase in garbage collection and you aren't getting OutOfMemoryError exceptions, there is no memory issue at all. Focus instead on improving the searches to speed them up.

Jörg

iannovic (Author)
New Participant
October 6, 2017

Hi smacdonald2008,

I am speaking to Daycare about adjusting the GC parameters and trying this.

Thank you for reaching out to support!

Ian

iannovic (Author)
New Participant
October 6, 2017

Hi Jörg,

The search.html component runs against a very big index on /content/sites/global/en/*. I imagine this needs to be broken up into smaller indices. If these JCR queries are causing lots of load on the system, is the memory usage constantly at capacity? Wouldn't we see some up-and-down trend reflecting these requests?

Do you think any of these JCR queries are persisting on the heap?

Here are some graphs of the publisher's heap:

Why are the JCR queries persisting in memory for such a long period of time? Wouldn't that be because they are unreclaimable (held by an open session)?

Thank you again for all of your help

Ian

joerghoh
Employee
October 6, 2017

Your system is executing a lot of queries. Search the thread dump for the string "/content/sites/global/en/search.html" and you will see the system executing JCR queries, which can take considerable time and cause load on the system.

$ grep /content/sites/global/en/search.html jstack.14721.190924.299822273 
"10.43.32.115 [1507244970665] GET /content/sites/global/en/search.html HTTP/1.1" #13759 prio=5 os_prio=0 tid=0x00007fe5640e5000 nid=0x53ba runnable [0x00007fe5136a7000]
"10.43.32.115 [1507146486819] GET /content/sites/global/en/search.html HTTP/1.1" #4987 prio=5 os_prio=0 tid=0x00007fe56ccfd800 nid=0x6466 runnable [0x00007fe5242de000]
"10.43.32.115 [1507145885754] GET /content/sites/global/en/search.html HTTP/1.1" #4882 prio=5 os_prio=0 tid=0x00007fe578510000 nid=0x536c runnable [0x00007fe51d8b8000]
"10.43.32.115 [1507145746704] GET /content/sites/global/en/search.html HTTP/1.1" #4879 prio=5 os_prio=0 tid=0x00007fe564046800 nid=0x5369 runnable [0x00007fe51d1af000]
"10.43.32.115 [1507146346446] GET /content/sites/global/en/search.html HTTP/1.1" #4799 prio=5 os_prio=0 tid=0x00007fe5904ae000 nid=0x4932 runnable [0x00007fe5137a4000]
"10.43.32.115 [1507143034783] GET /content/sites/global/en/search.html HTTP/1.1" #4608 prio=5 os_prio=0 tid=0x00007fe5700f4800 nid=0x2f35 runnable [0x00007fe50f8fe000]
"10.43.32.115 [1507142962835] GET /content/sites/global/en/search.html HTTP/1.1" #4597 prio=5 os_prio=0 tid=0x00007fe58c12f800 nid=0x2ef4 runnable [0x00007fe522089000]
"10.43.32.115 [1507142434757] GET /content/sites/global/en/search.html HTTP/1.1" #4366 prio=5 os_prio=0 tid=0x00007fe568250000 nid=0x348 runnable [0x00007fe51c775000]
"10.43.32.115 [1507142362789] GET /content/sites/global/en/search.html HTTP/1.1" #3542 prio=5 os_prio=0 tid=0x00007fe588024800 nid=0x675a runnable [0x00007fe51ce72000]
$
iannovic (Author)
New Participant
October 6, 2017

Hi,

joerghoh
Employee
October 6, 2017

Hi,

that's not an issue, as this is a ThreadPool worker waiting for something to do. It isn't impacting your performance and isn't eating up CPU either. If you can provide a complete thread dump, we can take a better look at it.

Jörg
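To illustrate Jörg's point: a pool worker parked in `DelayedWorkQueue.take()` consumes no CPU, and jstack just happens to sample it as RUNNABLE while it manipulates the condition queue. A self-contained sketch (thread name and timings are made up) that recreates an idle `ScheduledThreadPoolExecutor` worker and reports its state:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicReference;

public class IdleWorkerDemo {
    // Start a one-thread scheduled pool with nothing to do and report the
    // worker's state once it has parked waiting for tasks.
    static Thread.State idleWorkerState() throws InterruptedException {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1, r -> {
            Thread t = new Thread(r, "idle-worker-demo"); // hypothetical name
            worker.set(t);
            return t;
        });
        pool.prestartCoreThread();  // create the core worker; the queue stays empty
        Thread.sleep(200);          // give it time to park in DelayedWorkQueue.take()
        Thread.State state = worker.get().getState();
        pool.shutdownNow();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Typically WAITING: parked on a condition, burning no CPU.
        System.out.println(idleWorkerState());
    }
}
```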

iannovic (Author)
New Participant
October 5, 2017

Hi Jörg, thank you for your response.

My version is AEM 6.2, and we do not have AEM Forms deployed at the moment. I will check the thread dumps, and I have to see whether GC logs are being recorded (I don't think they are; I haven't looked at them before).
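If GC logging does turn out to be disabled, it can be enabled with the Java 8-era HotSpot flags below (a sketch; the log path and rotation values are assumptions to adapt to your start script):

```
# Added to the publisher's JVM options in the start script
-Xloggc:crx-quickstart/logs/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20M
```

The resulting gc.log is what makes it possible to confirm (or rule out) the memory pressure suggested by the 60% GC CPU figure.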

In the thread dump analyzer, this stack trace below was blocking 8 threads:

oak-repository-executor-1 - priority:5 - threadId:0x00007fe5a0ff1000 - nativeId:0x3a80 - state:RUNNABLE

stackTrace:
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1855)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2068)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x0000000741632470> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

Do you make anything of this?

iannovic (Author)
New Participant
October 5, 2017

Hi,

Thank you for the reply and for checking in with support; I will try this and update this thread.