java.lang.OutOfMemoryError: Java heap space I met this error #54

Open
WuDiDaBinGe opened this issue Jul 25, 2021 · 4 comments

@WuDiDaBinGe

WuDiDaBinGe commented Jul 25, 2021

java.lang.OutOfMemoryError: Java heap space
	at com.carrotsearch.hppc.Internals.newArray(Internals.java:37)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.allocateBuffers(IntObjectOpenHashMap.java:364)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.expandAndPut(IntObjectOpenHashMap.java:318)
	at com.carrotsearch.hppc.IntObjectOpenHashMap.put(IntObjectOpenHashMap.java:194)
	at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestDocumentsWithWord(WindowSupportingLuceneCorpusAdapter.java:124)
	at org.aksw.palmetto.corpus.lucene.WindowSupportingLuceneCorpusAdapter.requestWordPositionsInDocuments(WindowSupportingLuceneCorpusAdapter.java:102)
	at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:54)
	at org.aksw.palmetto.prob.window.BooleanSlidingWindowFrequencyDeterminer.determineCounts(BooleanSlidingWindowFrequencyDeterminer.java:45)
	at org.aksw.palmetto.prob.AbstractProbabilitySupplier.getProbabilities(AbstractProbabilitySupplier.java:37)
	at org.aksw.palmetto.DirectConfirmationBasedCoherence.calculateCoherences(DirectConfirmationBasedCoherence.java:87)
	at org.aksw.palmetto.webapp.PalmettoApplication.calculate(PalmettoApplication.java:198)
	at org.aksw.palmetto.webapp.PalmettoApplication.npmiService(PalmettoApplication.java:111)
	at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
	at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:440)
	at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:428)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:842)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:106)

I hit this error when I use multiple threads to compute topic coherence.
My machine has 16 GB of RAM and an Intel i9 CPU.

@MichaelRoeder
Member

In general, this behavior is expected if you try to use many threads that evaluate different topics in parallel.

The problem is that window-based coherence measures need to know the positions of the single words within documents. If you have words that occur often, the program has to handle many positions at the same time. If you do that in parallel with different topics that have different words, it is not very surprising that the program runs out of memory 😉
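To illustrate why word positions drive memory use, here is a minimal sketch of boolean sliding-window counting. It is not Palmetto's implementation; the function names and the toy in-memory corpus are assumptions made for illustration. The point is that every position of every requested word has to be held in memory before the windows can be checked, so frequent words produce large position maps.

```python
def word_positions(docs, word):
    """Map document index -> list of positions of `word` in that document."""
    positions = {}
    for doc_id, tokens in enumerate(docs):
        hits = [i for i, tok in enumerate(tokens) if tok == word]
        if hits:
            positions[doc_id] = hits
    return positions


def count_cooccurrence_windows(docs, word_a, word_b, window_size):
    """Count sliding windows that contain both words at least once."""
    # Both position maps are fully materialized here; for frequent words
    # these maps are what consumes the heap.
    pos_a = word_positions(docs, word_a)
    pos_b = word_positions(docs, word_b)
    count = 0
    # Only documents that contain both words can contribute a window.
    for doc_id in pos_a.keys() & pos_b.keys():
        a, b = set(pos_a[doc_id]), set(pos_b[doc_id])
        last_start = max(len(docs[doc_id]) - window_size, 0)
        for start in range(last_start + 1):
            window = range(start, start + window_size)
            if any(i in a for i in window) and any(i in b for i in window):
                count += 1
    return count
```

Several such requests running at once, each for different words, hold their position maps at the same time, which matches the behavior described above.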

It is hard to give you a hint without more information.

  • How have you parallelized the workflow (i.e., what is the task of a single thread)?
  • How many threads do you use?
  • How many topics do you try to evaluate?
  • How many top words does one of your topics have?

@WuDiDaBinGe
Author

Thanks for your reply.
I use three threads to calculate c_a, c_p and npmi respectively, and I send the same data to all three threads. There are 100 topics, and each topic has its top 10 words to evaluate. topic_words is a topic-words matrix; in my case, its size is (100, 10).

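import threading

# `palmetto` (the Palmetto client), `topic_words` (the 100 x 10 topic-words matrix)
# and `ret` (a shared dict for the results) are defined elsewhere.
def calculate_coherence(word_list, ret, coherence_type):
    # Request one coherence value per topic and store the list under its measure name.
    result = []
    for words in word_list:
        result.append(palmetto.get_coherence(words, coherence_type=coherence_type))
    ret[coherence_type] = result

# One thread per coherence measure; all three receive the same topics.
th_ca   = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'ca'], name='th_ca')
th_cp   = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'cp'], name='th_cp')
th_npmi = threading.Thread(target=calculate_coherence, args=[topic_words, ret, 'npmi'], name='th_npmi')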

I have mitigated this problem by running

    export CATALINA_OPTS="-Xms512m -Xmx3072m -XX:-UseGCOverheadLimit"

before

    mvn org.apache.tomcat.maven:tomcat7-maven-plugin:2.2:run -Dmaven.tomcat.port=7777

This works when the number of topics is 75. But when the number of topics is 100, I often hit the error "Aborted (core dumped)".
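Since parallel requests are what multiply the memory held on the server, one option is to cap how many coherence measures are requested at the same time. The following is a minimal sketch under stated assumptions, not something proposed in this thread: `compute_coherences` and its `max_workers` parameter are hypothetical names, and `get_coherence` is assumed to be any callable with the same call shape as `palmetto.get_coherence` in the snippet above.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_coherences(topic_words, get_coherence,
                       coherence_types=("ca", "cp", "npmi"), max_workers=1):
    """Return {coherence_type: [value per topic]} with bounded concurrency.

    max_workers=1 evaluates the measures one after another;
    max_workers=3 reproduces the one-thread-per-measure setup.
    """
    def run_one(coherence_type):
        # Same loop as calculate_coherence above, but returning the list
        # instead of writing into a shared dict.
        return [get_coherence(words, coherence_type=coherence_type)
                for words in topic_words]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of coherence_types.
        results = list(pool.map(run_one, coherence_types))
    return dict(zip(coherence_types, results))
```

With the names from the snippet above, this would be called as `ret = compute_coherences(topic_words, palmetto.get_coherence)`. Whether sequential requests are enough to avoid the heap error on a given server is not confirmed by this thread.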

@MichaelRoeder
Member

Your setup looks good and should work. I am just wondering why you have -Xmx3072m in the options, as it limits the server to no more than 3 GB of RAM. You may want to increase it and try again.

Another workaround would be to split up the list of documents and restart the server in-between. But that is a very bad solution 😉
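The splitting workaround can be sketched as follows. This is only an illustration under assumptions: it batches the list of topics (taking that to be the list being split), and `evaluate_batch` and `between_batches` are hypothetical callables. How the server is actually restarted is left to the caller, since it is not described here.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def evaluate_in_batches(topic_words, evaluate_batch, batch_size,
                        between_batches=None):
    """Evaluate topics batch by batch, calling a hook between batches.

    `between_batches` is where a server restart (or any other cleanup)
    would go; it is not called before the first batch.
    """
    results = []
    for index, batch in enumerate(batches(topic_words, batch_size)):
        if index > 0 and between_batches is not None:
            between_batches()
        results.extend(evaluate_batch(batch))
    return results
```

For 100 topics and a batch size of 25, this runs four batches and calls the hook three times.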

We are aware of the problem that the web service sometimes has issues in budgeting its memory. Until now, it is unclear which part of the server creates the problem since the Palmetto library runs without memory issues if it is executed as a plain Java program.

@WuDiDaBinGe
Author

OK, I will increase "-Xmx" again. I use the Python wrapper for Palmetto, so I have not tried the Palmetto Java library. Maybe I will try it next time. Thanks.
