Supported Technologies:
- Hadoop 3.3.6 (with JDK 8.0.352-zulu, Maven 3.6.3)
- ZooKeeper 3.9.2
- Kafka 3.7.1 (Scala 2.12 build)
- Clone the repository:
git clone https://github.com/mcddhub/mcdd-big-data-study.git --depth=1 && cd mcdd-big-data-study
- Build the Docker image:
cd docker
docker build -t caobaoqi1029/big-data-study:x.x.x .
Note: Replace x.x.x with the appropriate version number.
- Start the containers:
docker compose up -d
- Connect to the remote server via VS Code and attach to a running container.
- Install the Java Dev extension in VS Code.
- Restart the extension host to apply changes.
- Initialize the Hadoop environment (open a shell in the master container, then format the NameNode from inside it):
docker exec -it master bash
hdfs namenode -format
- Start Hadoop services:
start-all.sh
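To confirm that `start-all.sh` brought the daemons up, the process list from `jps` (shipped with the JDK) can be checked. The following is a minimal sketch, not part of the repository: `missing_daemons` is a hypothetical helper, and the daemon names passed to it are assumptions about which roles the `master` container runs, so adjust them to your cluster layout.

```shell
# Hypothetical helper (not part of this repository): reads `jps` output on
# stdin and prints every expected daemon name, given as arguments, that is
# not in the listing.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in "$@"; do
    # jps prints "<pid> <MainClass>"; compare the class name exactly so that
    # NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$listing" \
        | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      printf '%s\n' "$name"
    fi
  done
}
```

Inside the container this could be used as `jps | missing_daemons NameNode ResourceManager`; empty output means every listed daemon was found.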
- Use the following commands to interact with Hadoop:
vim input.txt
hdfs dfs -put -f ./input.txt /
hdfs dfs -ls /
- Build and run the Hadoop job:
mvn clean package
cd target/
hadoop jar big-data.jar
Tip: To run Java classes directly, add their directory to the CLASSPATH environment variable:
export CLASSPATH=$CLASSPATH:/tmp/ # Add this to .bashrc for persistence.
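When the job is run a second time, Hadoop's standard file output formats refuse to write into an output directory that already exists, so the previous results usually have to be removed first. A minimal sketch, assuming the job writes to `/output` (the path listed in the next step) and does not clean it up itself; `rerun_job` is a hypothetical helper, not part of the repository:

```shell
# Hypothetical helper: remove the previous output directory, then resubmit.
# $1 = HDFS output directory (assumed to be /output)
# $2 = path to the job JAR (assumed to be big-data.jar in the current directory)
rerun_job() {
  out_dir="${1:-/output}"
  jar="${2:-big-data.jar}"
  # -f keeps the command from failing when the directory does not exist yet.
  hdfs dfs -rm -r -f "$out_dir" || return 1
  hadoop jar "$jar"
}
```

From the `target/` directory this would be called as `rerun_job /output big-data.jar`.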
- View the output:
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000
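Reducer output written through Hadoop's default text output format has one tab-separated key/value pair per line. If the job is a word count (an assumption; this README does not say what `big-data.jar` computes), the most frequent keys can be listed with a small filter. `top_counts` is a hypothetical helper, shown only as a sketch:

```shell
# Hypothetical helper: reads "key<TAB>count" lines on stdin and prints the
# N lines with the largest counts (default 10), highest first.
top_counts() {
  tab="$(printf '\t')"
  # Sort numerically, in descending order, on the second tab-separated field.
  sort -t "$tab" -k2,2nr | head -n "${1:-10}"
}
```

For example: `hdfs dfs -cat /output/part-r-00000 | top_counts 5`.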
We welcome contributions! Feel free to submit a pull request. For more details, see the Contribution Guide.
This project is licensed under the MIT License. See the LICENSE file for details.
If you find this project helpful, consider giving it a ⭐️ on GitHub!