Remove unwanted graph_text files from main branch. (#5) #9

thibaultprouteau · 2023-05-05T14:00:11Z

No description provided.

* Remove unwanted graph_text files from main branch.
* Remove unwanted graph_text files from main branch.
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)

* Fix files and add templates (#8) (#10)
* Remove unwanted graph_text files from main branch.
* Remove unwanted graph_text files from main branch.
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)
* Edit PR template.

* Remove unwanted graph_text files from main branch. (#5) (#9)
* Fix PR template not appearing.
* Fix PR template not appearing.
* Edit PR template.
* Fix files and add templates (#8) (#10)
* Remove unwanted graph_text files from main branch.
* Remove unwanted graph_text files from main branch.
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)
* Create pull request templates and fix issue template. (#7)
* Remove unwanted graph_text files from main branch. (#5) (#9)

* adding dictionary argument to avoid default initialization * Update issue templates * Add plotly as dev dependency. * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. (#5) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix files and add templates (#8) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix PR template not appearing. * Fix PR template not appearing. * Fix files and add templates (#8) (#10) (#11) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Dev (#12) * Fix files and add templates (#8) (#10) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Edit PR template. * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix PR template not appearing. * Fix PR template not appearing. * Edit PR template. 
* Fix files and add templates (#8) (#10) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix bug leading to deterministic Louvain as a result of setting networkit.setNumberOfThreads to 1. * Update documentation. * Fix miswritten string in graph_embeddings * Fix log string Fix log string in in for number of threads in graph-embeddings. * Fix preprocess for lowering words * fixing the obj_stereotypes/descriptors bug (#19) Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 17 similarity evaluation of embeddings (#21) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * remove pk file and add fetch oanc --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Nico <nicolas.dugue@univ-lemans.fr> * 24 workflow for tests (#25) * add workflow to run tests on push and pull request * workflow test on current branch * workflow test on current branch * add requirement.txt * remove tests from sinr folder * 20 allowing for lexical exceptions in text extraction function (#29) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * Add oanc model * allowing for exceptions in preprocess filtering + modifying named entity options to choose 
between chunking, tagging and deleting * rename function to match * Pushing correct refactored function to take into account exception list and lowering * Deleting deprecated tests * Deleting oanc model --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> * taking into account lower for similarity dataset and adding SCWS dataset (#30) * 31 missing exceptions dataset for similarity (#32) * Adds exceptions.txt which contains the words required for sim datasets: scws, ws, men * rename exception file * Adds concept categorization dataset exception file. (#34) * Adds concept categorization dataset exception file. * Adds concept categorization dataset exception file. * Adds single file for exceptions, combination of similarity and categorization datasets. Closes #33 * Adds workflow configuration to automatically build docs html pages. (#36) * Adds workflow configuration to automatically build docs html pages. * Update build-doc.yml * 37 workflow to build and publish sinr (#38) * Add poetry-bumpversion. * Workflow for PyPi deployment. * Workflow for PyPi deployment. * Update deploy.yml * Revert "Update deploy.yml" This reverts commit 6ccfbff. * Revert "Revert "Update deploy.yml"" This reverts commit 32b14e2. 
* 39 similarity simlex and file name depending on time (#40) * update notebook evaluate + add fetch simlex * evaluate notebook update + file name depending on the time of creation for fetch methods * SimLex 999 665 222 111 comments * correction of SIMLEX to SimLex * 41 dimensions filtering (#42) * add remove dimensions with nnz thresholds min and max + unit test * correction dataset fetch for similarity, name file with current time (milliseconds) * correction import time * detection of thresholds to filter dimensions based on nnz * Add tabulate in requirements.txt --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 43 update pyprojecttoml for documentation (#44) * Update pyproject.toml * Update pyproject.toml * Fix bug leading to deterministic Louvain as a result of setting networkit.setNumberOfThreads to 1. * Update documentation. * Fix miswritten string in graph_embeddings * Fix log string Fix log string in in for number of threads in graph-embeddings. * Fix preprocess for lowering words * fixing the obj_stereotypes/descriptors bug (#19) Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 17 similarity evaluation of embeddings (#21) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * remove pk file and add fetch oanc --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Nico <nicolas.dugue@univ-lemans.fr> * 24 workflow for tests (#25) * add workflow to run tests on push and pull request * workflow test on current branch * workflow test on current branch * add requirement.txt * remove tests from sinr folder * 20 allowing for lexical exceptions in text extraction function (#29) * 
similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * Add oanc model * allowing for exceptions in preprocess filtering + modifying named entity options to choose between chunking, tagging and deleting * rename function to match * Pushing correct refactored function to take into account exception list and lowering * Deleting deprecated tests * Deleting oanc model --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> * taking into account lower for similarity dataset and adding SCWS dataset (#30) * 31 missing exceptions dataset for similarity (#32) * Adds exceptions.txt which contains the words required for sim datasets: scws, ws, men * rename exception file * Adds concept categorization dataset exception file. (#34) * Adds concept categorization dataset exception file. * Adds concept categorization dataset exception file. * Adds single file for exceptions, combination of similarity and categorization datasets. Closes #33 * Adds workflow configuration to automatically build docs html pages. (#36) * Adds workflow configuration to automatically build docs html pages. * Update build-doc.yml * 37 workflow to build and publish sinr (#38) * Add poetry-bumpversion. * Workflow for PyPi deployment. * Workflow for PyPi deployment. * Update deploy.yml * Revert "Update deploy.yml" This reverts commit 6ccfbff. * Revert "Revert "Update deploy.yml"" This reverts commit 32b14e2. 
* 39 similarity simlex and file name depending on time (#40) * update notebook evaluate + add fetch simlex * evaluate notebook update + file name depending on the time of creation for fetch methods * SimLex 999 665 222 111 comments * correction of SIMLEX to SimLex * 41 dimensions filtering (#42) * add remove dimensions with nnz thresholds min and max + unit test * correction dataset fetch for similarity, name file with current time (milliseconds) * correction import time * detection of thresholds to filter dimensions based on nnz * Add tabulate in requirements.txt --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 43 update pyprojecttoml for documentation (#44) * Update pyproject.toml * Update pyproject.toml * [AUTO-COMMIT] Update release version to v1.2.0. Files changed: M pyproject.toml M sinr/__init__.py * Fix #50 by changing categorization and evaluation exception files. * Update README.rst (#89) * Update README.rst Update installation, example and publications * Update README.rst * Update README.rst * Delete notebooks/transfert.ipynb * Delete notebooks/preproc.ipynb * Update build-doc.yml (#92) Change checkout@master to checkout@v2 * Update pyproject.toml (#93) * Update pyproject.toml Add xgboost to dependencies * Update pyproject.toml * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. 
* 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * 
remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. (#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> --------- Co-authored-by: anthony <anthony.perez@univ-orleans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Thibault Prouteau <thibault.prouteau.etu@univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> Co-authored-by: Guillot Simon <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* 39 similarity simlex and file name depending on time (#40) * update notebook evaluate + add fetch simlex * evaluate notebook update + file name depending on the time of creation for fetch methods * SimLex 999 665 222 111 comments * correction of SIMLEX to SimLex * 41 dimensions filtering (#42) * add remove dimensions with nnz thresholds min and max + unit test * correction dataset fetch for similarity, name file with current time (milliseconds) * correction import time * detection of thresholds to filter dimensions based on nnz * Add tabulate in requirements.txt --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 43 update pyprojecttoml for documentation (#44) * Update pyproject.toml * Update pyproject.toml * [AUTO-COMMIT] Update release version to v1.2.0. Files changed: M pyproject.toml M sinr/__init__.py * Fix #50 by changing categorization and evaluation exception files. * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. 
* 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Update README.rst (#89) * Update README.rst Update installation, example and publications * Update README.rst * Update README.rst * Delete notebooks/transfert.ipynb * Delete notebooks/preproc.ipynb * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for 
load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * remove nb evaluate * add tqdm to sparsify method * Update build-doc.yml (#92) Change checkout@master to checkout@v2 * Update pyproject.toml (#93) * Update pyproject.toml Add xgboost to dependencies * Update pyproject.toml * 99 diachronic features (#100) * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. * 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * 
classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. 
(#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * Diachronic features * Update publications.rst --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> --------- Co-authored-by: anthony <anthony.perez@univ-orleans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Thibault Prouteau <thibault.prouteau.etu@univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> Co-authored-by: Guillot Simon <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Joss submission * Update joss * Proof read and modify some sentences. * Update paper.md * Joss paper updated with edits * Adding DOIs * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. * 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * 
path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * remove nb evaluate * add tqdm to sparsify method * Merging new version of SINr to joss branch (#101) * adding dictionary argument to avoid default initialization * Update issue templates * Add plotly as dev dependency. * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. (#5) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix files and add templates (#8) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix PR template not appearing. * Fix PR template not appearing. * Fix files and add templates (#8) (#10) (#11) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Dev (#12) * Fix files and add templates (#8) (#10) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. 
(#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Edit PR template. * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix PR template not appearing. * Fix PR template not appearing. * Edit PR template. * Fix files and add templates (#8) (#10) * Remove unwanted graph_text files from main branch. * Remove unwanted graph_text files from main branch. * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Create pull request templates and fix issue template. (#7) * Remove unwanted graph_text files from main branch. (#5) (#9) * Fix bug leading to deterministic Louvain as a result of setting networkit.setNumberOfThreads to 1. * Update documentation. * Fix miswritten string in graph_embeddings * Fix log string Fix log string in in for number of threads in graph-embeddings. 
* Fix preprocess for lowering words * fixing the obj_stereotypes/descriptors bug (#19) Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 17 similarity evaluation of embeddings (#21) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * remove pk file and add fetch oanc --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Nico <nicolas.dugue@univ-lemans.fr> * 24 workflow for tests (#25) * add workflow to run tests on push and pull request * workflow test on current branch * workflow test on current branch * add requirement.txt * remove tests from sinr folder * 20 allowing for lexical exceptions in text extraction function (#29) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * Add oanc model * allowing for exceptions in preprocess filtering + modifying named entity options to choose between chunking, tagging and deleting * rename function to match * Pushing correct refactored function to take into account exception list and lowering * Deleting deprecated tests * Deleting oanc model --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> * taking into account lower for similarity dataset and adding SCWS dataset (#30) * 31 missing exceptions dataset for similarity (#32) * Adds exceptions.txt which contains the words required for sim datasets: scws, ws, 
men * rename exception file * Adds concept categorization dataset exception file. (#34) * Adds concept categorization dataset exception file. * Adds concept categorization dataset exception file. * Adds single file for exceptions, combination of similarity and categorization datasets. Closes #33 * Adds workflow configuration to automatically build docs html pages. (#36) * Adds workflow configuration to automatically build docs html pages. * Update build-doc.yml * 37 workflow to build and publish sinr (#38) * Add poetry-bumpversion. * Workflow for PyPi deployment. * Workflow for PyPi deployment. * Update deploy.yml * Revert "Update deploy.yml" This reverts commit 6ccfbff. * Revert "Revert "Update deploy.yml"" This reverts commit 32b14e2. * 39 similarity simlex and file name depending on time (#40) * update notebook evaluate + add fetch simlex * evaluate notebook update + file name depending on the time of creation for fetch methods * SimLex 999 665 222 111 comments * correction of SIMLEX to SimLex * 41 dimensions filtering (#42) * add remove dimensions with nnz thresholds min and max + unit test * correction dataset fetch for similarity, name file with current time (milliseconds) * correction import time * detection of thresholds to filter dimensions based on nnz * Add tabulate in requirements.txt --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 43 update pyprojecttoml for documentation (#44) * Update pyproject.toml * Update pyproject.toml * Fix bug leading to deterministic Louvain as a result of setting networkit.setNumberOfThreads to 1. * Update documentation. * Fix miswritten string in graph_embeddings * Fix log string Fix log string in in for number of threads in graph-embeddings. 
* Fix preprocess for lowering words * fixing the obj_stereotypes/descriptors bug (#19) Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 17 similarity evaluation of embeddings (#21) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * remove pk file and add fetch oanc --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Nico <nicolas.dugue@univ-lemans.fr> * 24 workflow for tests (#25) * add workflow to run tests on push and pull request * workflow test on current branch * workflow test on current branch * add requirement.txt * remove tests from sinr folder * 20 allowing for lexical exceptions in text extraction function (#29) * similarity evaluation from txt file and dataset * rename evaluation in evaluate * similarity with MEN and WS353 datasets * similarity unit tests * sparsify and binarize SINrVectors * notebook sim + sparsify + binarize * add new path to oanc SINrVectors * Add oanc model * allowing for exceptions in preprocess filtering + modifying named entity options to choose between chunking, tagging and deleting * rename function to match * Pushing correct refactored function to take into account exception list and lowering * Deleting deprecated tests * Deleting oanc model --------- Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: simon.guillot@univ-lemans.fr <sguillot@lst.clusterlst.univ-lemans.fr> * taking into account lower for similarity dataset and adding SCWS dataset (#30) * 31 missing exceptions dataset for similarity (#32) * Adds exceptions.txt which contains the words required for sim datasets: scws, ws, 
men * rename exception file * Adds concept categorization dataset exception file. (#34) * Adds concept categorization dataset exception file. * Adds concept categorization dataset exception file. * Adds single file for exceptions, combination of similarity and categorization datasets. Closes #33 * Adds workflow configuration to automatically build docs html pages. (#36) * Adds workflow configuration to automatically build docs html pages. * Update build-doc.yml * 37 workflow to build and publish sinr (#38) * Add poetry-bumpversion. * Workflow for PyPi deployment. * Workflow for PyPi deployment. * Update deploy.yml * Revert "Update deploy.yml" This reverts commit 6ccfbff. * Revert "Revert "Update deploy.yml"" This reverts commit 32b14e2. * 39 similarity simlex and file name depending on time (#40) * update notebook evaluate + add fetch simlex * evaluate notebook update + file name depending on the time of creation for fetch methods * SimLex 999 665 222 111 comments * correction of SIMLEX to SimLex * 41 dimensions filtering (#42) * add remove dimensions with nnz thresholds min and max + unit test * correction dataset fetch for similarity, name file with current time (milliseconds) * correction import time * detection of thresholds to filter dimensions based on nnz * Add tabulate in requirements.txt --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 43 update pyprojecttoml for documentation (#44) * Update pyproject.toml * Update pyproject.toml * [AUTO-COMMIT] Update release version to v1.2.0. Files changed: M pyproject.toml M sinr/__init__.py * Fix #50 by changing categorization and evaluation exception files. 
* Update README.rst (#89) * Update README.rst Update installation, example and publications * Update README.rst * Update README.rst * Delete notebooks/transfert.ipynb * Delete notebooks/preproc.ipynb * Update build-doc.yml (#92) Change checkout@master to checkout@v2 * Update pyproject.toml (#93) * Update pyproject.toml Add xgboost to dependencies * Update pyproject.toml * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. * 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on 
removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. (#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> --------- Co-authored-by: anthony <anthony.perez@univ-orleans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Thibault Prouteau <thibault.prouteau.etu@univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> Co-authored-by: Guillot Simon <sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Simon Guillot 
<simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * 99 diachronic features (#100) * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. * 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to 
set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. (#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * Diachronic features * Update publications.rst --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * remove disconnected nodes from the graph (#105) * remove isolated nodes from the graph * test update * 98 multiple edges - temporary fix (#107) * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. 
* 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * 
remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. (#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * temporary fix multiple (i,j) entries issue in cooccurrence matrix --------- Co-authored-by: Nico <nicolas.dugue@univ-lemans.fr> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * 108 incompatibility numpy pandas (#109) * Dev (#94) * factory method to load embeddings at the w2v format (#53) * Revert "factory method to load embeddings at the w2v format (#53)" This reverts commit 7da0e99. 
* 52 loading vectors such as w2v or spine ones (#57) * factory method to load embeddings at the w2v format * Update graph_embeddings.py, small fix * Adding distRatio (#59) * moving dist ratio in sinr.text.evaluate, adding unit tests (#61) * commenting cosine_dist, pick_intruder, dist_ratio, dist_ratio_dim, intra_sim, inter_sim methods * moving distRatio from graph_embeddings to text/evaluate * tests unitaires distratio * cleaning comments * adding creation and deletion of w2v file for distRatio unit tests * fix with_value() argument (#65) Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * load_from_word2vec model s name bug fixed (#70) * missing word list bug fixed (#68) * 73 wrong community memberships update when filtering dimensions (#75) * update of community_membership when filtering dimensions * sinr filtered: removing dimensions and updating communities_sets * fixed code to pass tests * comments * 76 preprocessing multiple documents (#77) * preprocessing by documents * Tests : preprocessing by sentences and by documents * adding size indicator for spacy model and downloading spacy model in tests workflow' * downloading spacy * 78 classification (#80) * preprocess : minimal length of documents kept + tests * vectorizer + test * classification's methods + tests * xgboost interpretable dimensions * adding xgboost for test workflow * classification, fit and score test modification * get_dimension_stereotypes on removed community fixed (#82) * Filtering words using a dictionnary (#84) * Exceptions list, path to save / load SINrVectors (#86) * not removing words when in exceptions list * add path to method save * exceptions list to set + test exceptions list * path parameter method load * new exceptions list for similarity * optionnal parameter path for load and save methods * 90 notebooks (#91) * fix save, load, dim_nnz_thresholds + add obj_nnz_count * add notebook with gutenberg example * bnc model for notebook * notebook bnc * notebook frwac * 
remove nb evaluate * add tqdm to sparsify method --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> * Update deploy.yml * [AUTO-COMMIT] Update release version to v1.3.1. (#96) Files changed: M pyproject.toml M sinr/__init__.py Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> * doc update (#97) * doc update * links update * remove doc, quality, build * fix LICENSE link * Update conf.py --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> * Correcting conflict between pandas and numpy --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> Co-authored-by: nicolasdugue <dugue@skinner.clusteretu.univ-lemans.fr> * Removing conflict between numpy and karateclub, karateclub not being used in sinr * Solving depency problem with docutils --------- Co-authored-by: Thibault PROUTEAU <thibault.prouteau@univ-lemans.fr> Co-authored-by: Simon Guillot <47661058+SimonGuillot@users.noreply.github.com> Co-authored-by: Thibault PROUTEAU <thibault.prouteau@gmail.com> Co-authored-by: Anna B <72624798+aberanger@users.noreply.github.com> Co-authored-by: Simon Guillot <simon.guillot@univ-lemans.fr> Co-authored-by: anthony <anthony.perez@univ-orleans.fr> Co-authored-by: Thibault Prouteau <thibault.prouteau.etu@univ-lemans.fr> Co-authored-by: Guillot Simon 
<sguillot@lst.clusterlst.univ-lemans.fr> Co-authored-by: Beranger Anna <aberanger@lst.clusterlst.univ-lemans.fr> Co-authored-by: Anna Beranger <anbberanger@gmail.com> Co-authored-by: nicolasdugue <nicolasdugue@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: nicolasdugue <dugue@skinner.clusteretu.univ-lemans.fr>

Remove unwanted graph_text files from main branch. (#5)

bc4004c

thibaultprouteau merged commit d79c37e into dev May 5, 2023

thibaultprouteau added a commit that referenced this pull request May 5, 2023

Remove unwanted graph_text files from main branch. (#5) (#9)

6038f21

thibaultprouteau mentioned this pull request May 5, 2023

Fix files and add templates (#8) #10

Merged

thibaultprouteau mentioned this pull request May 5, 2023

Fix files and add templates #11

Merged

14 tasks

thibaultprouteau added a commit that referenced this pull request May 5, 2023

Remove unwanted graph_text files from main branch. (#5) (#9)

3f1f84d
