Welcome to our GitHub page! We are four students from Leipzig University.
As part of the seminar "Big Data and Language Technologies 2022", we decided to examine perceptions of the future of AI. To this end, we analyzed statements about the future related to several AI topics. We used the web archive of the Webis Group (https://webis.de/), from which we extracted AI statements with the WARC-DL pipeline (https://github.com/webis-de/WARC-DL).
In the following, we describe our approach and explain how to run our code.
The following chart shows the workflow of our project. First, AI statements are extracted from the WARC archive. Afterwards, the Model Pipeline is executed. Since the topic model initially contains only dummy topics, the output of the Model Pipeline is used for topic selection. After this step, the selected topics are passed to the topic assignment model, and the Model Pipeline is ready to use. Subsequently, we use the output of further executions for analysis and visualization.
All scripts in this step serve as preparation for the Model Pipeline; a combined shell sketch of these steps follows the list below.
- Navigate to the dataset directory: `the-future-tense/stage_2_1_models/future_model/dataset`
- Extract the dataset used to train the future model: `./extract.py`
- Navigate to the model training directory: `the-future-tense/stage_2_1_models/future_model/training/future_model_ft`
- Run the Jupyter notebook: `future_model_ft.ipynb`
- Navigate to the sentiment model directory: `the-future-tense/stage_2_1_models/sentiment_model`
- Run the sentiment model test: `./test_sentiment_model.py`
- Navigate to the topic model directory: `the-future-tense/stage_2_1_models/topic_model`
- Run the topic evaluation notebook: `topic_eval.ipynb`
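Taken together, the preparation steps can be run from a shell roughly as follows. This is a minimal sketch, assuming you start in the directory that contains the `the-future-tense` checkout and have Jupyter with `nbconvert` installed; the notebooks can equally be opened and run interactively.

```bash
# Preparation steps for the Model Pipeline (sketch; see the list above).
cd the-future-tense/stage_2_1_models

# 1. Extract the dataset used to train the future model
(cd future_model/dataset && ./extract.py)

# 2. Fine-tune the future model by executing its notebook headlessly
(cd future_model/training/future_model_ft && \
    jupyter nbconvert --to notebook --execute --inplace future_model_ft.ipynb)

# 3. Test the sentiment model
(cd sentiment_model && ./test_sentiment_model.py)

# 4. Evaluate the topic model
(cd topic_model && \
    jupyter nbconvert --to notebook --execute --inplace topic_eval.ipynb)
```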
The Model Pipeline can now be executed in order to create the final dataset.
- Navigate to the Model Pipeline directory: `the-future-tense/stage_2_2_model_pipeline`
- Execute the Model Pipeline: `sbatch run_main.job`
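Since the pipeline is started with `sbatch`, a Slurm cluster is assumed. A minimal sketch for submitting and monitoring the job (the name of the log file written by the job depends on the settings in `run_main.job`):

```bash
# Submit the Model Pipeline job to Slurm and check its status.
cd the-future-tense/stage_2_2_model_pipeline
sbatch run_main.job     # prints: Submitted batch job <jobid>
squeue -u "$USER"       # is the job still pending or running?
```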
The visualization for the analysis is generated at this stage.
- Navigate to the visualization directory: `the-future-tense/stage_3_visualization`
- Deposit your OpenAI API key in your `.env` file as `OPENAI_API_KEY`
- Execute the Jupyter notebook: `visualize.ipynb`
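A minimal sketch of this stage, assuming the notebook reads `OPENAI_API_KEY` from a `.env` file in this directory; the key value shown is a placeholder.

```bash
# Store the OpenAI API key and execute the visualization notebook.
cd the-future-tense/stage_3_visualization
echo 'OPENAI_API_KEY=<your-key>' >> .env   # placeholder; insert your own key
jupyter nbconvert --to notebook --execute --inplace visualize.ipynb
```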