forked from MITHaystack/CorrelX

CXS: a high-performance VLBI correlator written in Python, based on Apache Spark

CXS

CXS (originally CXS338) is a fork of MIT Haystack's CorrelX VLBI correlator, which was developed by A.J. Vazquez Alvarez during a postdoctoral research position at MIT Haystack in 2015-2017. The original project's main objectives were "scalability, flexibility and simplicity". This project aims to add "performance" to that list.

CXS started in 2021 as a migration of CorrelX to Apache Spark, carried out by the same author as part of a Master's thesis on Big Data at UNED. It was conceived as a proof of concept with the following objectives:

  • Simplifying architecture and usage (simplicity).
  • Migrating from Python 2 to Python 3 (flexibility).
  • Migrating from Hadoop to Spark (performance).
  • Running a test correlation on a cloud computing service (scalability).

Versions

About the naming convention:

  • CXH227: CorrelX on Hadoop 2, Python 2.7 (CorrelX legacy).
  • CXPL38: CorrelX on Pipeline, Python 3.8.
  • CXS338: CorrelX on Spark 3, Python 3.8.
  • CXS3311: CorrelX on Spark 3, Python 3.11.
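
The naming pattern can be read mechanically. The sketch below is only an illustration inferred from the four names listed above, not code from the repository: the function `decode_version` and the `Version` tuple are hypothetical, and it assumes that Hadoop and Spark names carry a single-digit platform version while Pipeline names carry none.

```python
import re
from typing import NamedTuple, Optional

# Platform codes as they appear in the four names listed above.
PLATFORMS = {"H": "Hadoop", "PL": "Pipeline", "S": "Spark"}


class Version(NamedTuple):
    platform: str
    platform_major: Optional[int]  # None for Pipeline, which has no platform version
    python: str


def decode_version(name: str) -> Version:
    """Decode a name such as 'CXS3311' (hypothetical helper, inferred convention)."""
    match = re.fullmatch(r"CX(PL|H|S)(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized version name: {name!r}")
    code, digits = match.groups()

    platform_major = None
    if code != "PL":
        # Assumption: Hadoop and Spark names start with a one-digit platform version.
        platform_major, digits = int(digits[0]), digits[1:]

    if len(digits) < 2:
        raise ValueError(f"missing Python version in: {name!r}")
    # First digit is the Python major version, the remainder is the minor version.
    python = f"{digits[0]}.{digits[1:]}"
    return Version(PLATFORMS[code], platform_major, python)


if __name__ == "__main__":
    for name in ("CXH227", "CXPL38", "CXS338", "CXS3311"):
        print(name, decode_version(name))
```

Under these assumptions, CXS3311 decodes to Spark 3 with Python 3.11, matching the list above.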

Configuration

Download Apache Spark 3.5.1 pre-built for Apache Hadoop 3:

wget https://ftp.cixug.es/apache/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
tar -xvzf spark-3.5.1-bin-hadoop3.tgz 

Create environment and install requirements:

python3.11 -m venv venv3
source venv3/bin/activate
pip install -r requirements.pkg.txt
python cxs/tools/gen_symlinks.py

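Add the following lines to venv3/bin/activate, replacing the SPARK_HOME path with the location of the extracted Spark directory. The pwd substitutions are evaluated each time the script is sourced, so activate the environment from the repository root: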

export SPARK_HOME=/home/aj/spark-3.5.1-bin-hadoop3
export PYTHONPATH=$PYTHONPATH:`pwd`/src
export PYTHONPATH=$PYTHONPATH:`pwd`/cxs

Reactivate environment:

source venv3/bin/activate
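
After reactivating, it can help to confirm that the variables point where expected. The following is a minimal sketch, not part of CXS: `check_environment` is a hypothetical helper. It assumes a standard Spark binary distribution (which ships `bin/spark-submit`) and that the repository root contains the `src` and `cxs` directories named in the exports above.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_environment(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the given environment (hypothetical helper)."""
    problems = []

    # SPARK_HOME should point at an extracted Spark distribution.
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not (Path(spark_home) / "bin" / "spark-submit").is_file():
        problems.append(f"no bin/spark-submit under SPARK_HOME={spark_home}")

    # PYTHONPATH should contain existing directories ending in src and cxs.
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    for suffix in ("src", "cxs"):
        if not any(Path(p).name == suffix and Path(p).is_dir() for p in entries):
            problems.append(f"PYTHONPATH has no existing directory ending in /{suffix}")

    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print("WARNING:", problem)
```

Run from any directory, the script prints nothing when both variables resolve to existing paths. A warning about `src` or `cxs` usually means the environment was activated outside the repository root.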

Basic Correlation

Pipeline

bash examples/run_example_vgos.sh

Hadoop

bash sh/configure_hadoop_cx.sh
bash examples/run_example_vgos_hadoop.sh

Spark