Tuning for large code bases

JVM tuning

In general, it is recommended to run both the indexer and the web application with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/some/sensible/place/to/store/jvm/dumps. The JVM then writes a heap dump when an out-of-memory error occurs, and the dump can be analyzed with tools like jhat or Eclipse Memory Analyzer (http://www.eclipse.org/mat/).
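
As a minimal sketch, the two flags can be generated from a single dump directory so that a missing or read-only directory is caught before the JVM starts. The function name heap_dump_opts and the example directory are hypothetical and are not part of OpenGrok.

```shell
# Print the two heap dump flags for a given dump directory, one per line.
# heap_dump_opts is a hypothetical helper, not an OpenGrok command.
heap_dump_opts() {
    dump_dir=$1
    # A dump cannot be written to a missing or read-only directory.
    if [ ! -d "$dump_dir" ] || [ ! -w "$dump_dir" ]; then
        echo "heap dump directory is missing or not writable: $dump_dir" >&2
        return 1
    fi
    printf '%s\n' "-XX:+HeapDumpOnOutOfMemoryError" "-XX:HeapDumpPath=$dump_dir"
}

# Possible use for the web application (directory is an assumption):
#   JAVA_OPTS="$JAVA_OPTS $(heap_dump_opts /var/tmp/jvm-dumps | tr '\n' ' ')"
# For the indexer, each flag would be passed with the -J= prefix.
```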

Indexer

By default, the indexer.py script does not set the Java heap size, so the JVM default is used. This might not be enough for a large code base.

Lucene flush buffer size

Lucene 4.x sets indexer defaults:

```
DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB = 1945;
DEFAULT_MAX_THREAD_STATES = 8;
DEFAULT_RAM_BUFFER_SIZE_MB = 16.0;
```

With these defaults, the indexing buffers might grow as big as about 16GB (8 thread states at up to 1945 MB each). DEFAULT_RAM_BUFFER_SIZE_MB shouldn't really allow that, but aim to keep it around 1-2GB.

The Lucene RAM_BUFFER_SIZE_MB can now be tuned with the -m parameter. For example, to run the indexer on a 64-bit server JVM with an 8GB heap and a 256 MB flush buffer:
```
$ indexer.py -J=-Xmx8g -J=-d64 -J=-server -m 256 -s /source -d /data ...
```
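
The 16GB worst case mentioned above follows from multiplying the per-thread hard limit by the number of thread states. A small sketch of that arithmetic, where the function name worst_case_buffer_mb is hypothetical:

```shell
# Worst-case buffer memory in MB: thread states times per-thread hard limit.
# worst_case_buffer_mb is a hypothetical helper used only for illustration.
worst_case_buffer_mb() {
    # $1: maximum number of thread states, $2: per-thread hard limit in MB
    echo $(( $1 * $2 ))
}

# With the Lucene 4.x defaults: 8 * 1945 = 15560 MB, roughly 15.2 GB.
worst_case_buffer_mb 8 1945
```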

Open File and processes hard and soft limits

The initial index creation is resource intensive, and the error java.io.IOException: error=24, Too many open files often appears in the logs. To avoid it, raise the open files limit (ulimit -n).

Hard and soft open files limits of 10240 are known to work for mid-sized repositories, so 10240 is the recommended starting value.

If you get a similar error for threads, java.lang.OutOfMemoryError: unable to create new native thread, it might be caused by strict security limits on the number of processes, and those limits need to be increased as well.
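
Before starting the initial indexing, the current limits can be compared with the recommended value. The following is a sketch only: the function names are hypothetical, and 10240 is the starting value suggested above rather than a guaranteed minimum.

```shell
# Succeed if a limit value is "unlimited" or at least the wanted number.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Report the soft (S) and hard (H) open files limits of the current shell.
report_open_file_limits() {
    wanted=${1:-10240}
    for kind in S H; do
        value=$(ulimit -"$kind"n)
        if limit_ok "$value" "$wanted"; then
            echo "open files ($kind): $value ok"
        else
            echo "open files ($kind): $value is below $wanted"
        fi
    done
}

# Example: report_open_file_limits 10240
# The soft limit can be raised up to the hard limit with: ulimit -Sn 10240
```

The limits apply to the shell that launches the indexer, so the check is only meaningful when run from that same environment.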

Web application

The heap size limit for the web application should be derived from the size of the data generated by the indexer, and should also reflect the size of the WFST structures that the Suggester generates in the web application. The index data creates memory pressure especially for multi-project searches. For the Suggester data, it should be sufficient to compute the sum of the sizes of all *.wfst files under the data root and raise the heap limit by that value.
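
The sum for the Suggester data can be computed with standard tools. This is a minimal sketch: the function names are hypothetical, and the data root path in the usage comment is an assumption taken from the indexer example.

```shell
# Total size in bytes of all *.wfst files under the given directory.
wfst_total_bytes() {
    # cat concatenates every matching file and wc -c counts the bytes;
    # tr strips the padding that some wc implementations add.
    find "$1" -type f -name '*.wfst' -exec cat {} + | wc -c | tr -d ' '
}

# Convert bytes to megabytes, rounding up, for use with -Xmx.
bytes_to_mb() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Example (the /data path is an assumption):
#   extra_mb=$(bytes_to_mb "$(wfst_total_bytes /data)")
#   echo "raise the heap limit by ${extra_mb} MB"
```

Reading the files through cat is slower than asking the file system for their sizes, but it avoids options that differ between find and stat implementations.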

Tomcat

By default, Tomcat also supports only small deployments. For bigger ones you might need to increase its heap, which might require switching to 64-bit Java. The same most probably applies to other containers. For Tomcat, this is easily done by creating $CATALINA_BASE/bin/setenv.sh:

```
# cat $CATALINA_BASE/bin/setenv.sh
# 64-bit Java
JAVA_OPTS="$JAVA_OPTS -d64 -server"

# OpenGrok memory boost to cover all-project searches
# (7 MB * 247 projects + 300 MB for cache should be enough)
# 64-bit Java allows for more so let's use 8GB to be on the safe side.
# We might need to allow more for concurrent all-project searches.
JAVA_OPTS="$JAVA_OPTS -Xmx8g"

export JAVA_OPTS
```
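
The estimate in the comment above can be reproduced with shell arithmetic and adapted to another installation. The function name estimate_heap_mb is hypothetical; the 7 MB per project and 300 MB cache figures come from the comment and are specific to that deployment.

```shell
# Rough heap estimate in MB for all-project searches.
# estimate_heap_mb is a hypothetical helper used only for illustration.
estimate_heap_mb() {
    # $1: MB per project, $2: number of projects, $3: MB reserved for cache
    echo $(( $1 * $2 + $3 ))
}

# Figures from the setenv.sh comment: 7 * 247 + 300 = 2029 MB.
estimate_heap_mb 7 247 300
```

The result is about 2GB, which is why the 8GB setting leaves headroom for concurrent all-project searches.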

Tomcat/Apache tuning for HTTP headers

With Tomcat you might also hit the limit on HTTP header size, because the header is used to send the project list when requesting search results. To raise it, increase (or add) the maxHttpHeaderSize attribute of the Connector in conf/server.xml, for example:

```
  <Connector port="8888" protocol="HTTP/1.1"
             connectionTimeout="20000"
             maxHttpHeaderSize="65536"
             redirectPort="8443" />
```

Refer to the documentation of other containers for how to achieve the same.

Without this change, requests fail with HTTP 400 after the first query, and the error "Error parsing HTTP request header" is reported.

The same tuning can be applied to Apache (handy if you run Apache as a reverse proxy in front of Tomcat) with the LimitRequestLine and LimitRequestFieldSize directives:

LimitRequestLine 65536
LimitRequestFieldSize 65536

Multi-project search speed tip

If multi-project searches are performed frequently, it might help to warm up the file system cache after each reindex. This can be done e.g. with vmtouch (https://github.com/hoytech/vmtouch).
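
A possible warm-up step is sketched below. The function name warm_cache is hypothetical, and the directory passed to it (for example the index directory under the data root) is an assumption about the local layout. When vmtouch is not installed, the sketch falls back to reading the files once, which also pulls them into the page cache.

```shell
# Load all files under a directory into the file system cache.
# warm_cache is a hypothetical helper, not an OpenGrok command.
warm_cache() {
    target=$1
    if command -v vmtouch >/dev/null 2>&1; then
        # -t touches the pages of each file so that they are loaded into memory.
        vmtouch -t "$target"
    else
        # Fallback: read every file once and discard the output.
        find "$target" -type f -exec cat {} + >/dev/null
    fi
}

# Example, to be run after each reindex (path is an assumption):
#   warm_cache /data/index
```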
