LUBM_Cluster
This generates the LUBM U100 data set. You can generate much larger data sets in exactly the same way. The files are written to the specified directory on $NAS. By default this generates gzip'd RDF/XML files. The LUBM generator is single-threaded, so generating a large data set can take quite a while.
mkdir -p ${NAS}/data/U100
# This assumes that you are running as root. If running as a normal user, then set the corresponding group on this directory.
chgrp -R wheel ${NAS}/data
cd ${NAS}/data/U100
lubmGen.sh 100
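Because the generator can run for a long time, it can help to confirm that files were actually written before you start the bulk load. The following is a minimal sketch, not part of the bigdata distribution: `count_gz_files` is a hypothetical helper, and it assumes the gzip'd output files carry a `.gz` suffix.

```shell
# Hypothetical helper: count the gzip'd files under a directory.
# Assumption: the generator's compressed output ends in ".gz".
count_gz_files() {
    find "$1" -type f -name '*.gz' | wc -l | tr -d ' '
}

# Example usage (not executed here):
#   count_gz_files "${NAS}/data/U100"
```

A count of zero after the generator exits suggests that it wrote somewhere else or failed before producing output.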
Edit the main bigdata configuration file and specify the data set to bulk load in the RDFDataLoadMaster configuration section. With the federation running, start the bulk load using RDFDataLoadMaster.sh. The same approach works with any data set. The RDFDataLoadMaster is set up by default to load files from a shared volume, but the behavior is extensible and can be made to load from URLs, HDFS, etc.
nohup RDFDataLoadMaster.sh &
tail -f nohup.out
nohup is used because loading a large data set can run for hours. If you have set up the SSH tunnel, you can watch the progress using the Excel worksheets.
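Both the load step and the query step write to nohup.out in the current directory, so their output ends up in one file if they are started from the same place. One way to keep them apart is to give each job its own log. This is a sketch using only standard shell features: `run_logged` and the log file names are hypothetical and are not provided by bigdata.

```shell
# Hypothetical helper: run a command under nohup in the background,
# sending stdout and stderr to the named log file.
# Prints the PID of the background job.
run_logged() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# Example usage (not executed here; log names are illustrative):
#   run_logged load.log RDFDataLoadMaster.sh
#   tail -f load.log
```

The printed PID can be passed to `kill -0` to check whether the job is still running.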
Run the LUBM queries for the named KB instance.
nohup lubmQuery.sh U100 &
tail -f nohup.out