Commit

Merge pull request clearlydefined#586 from qtomlinson/qt/fix_pypi_license

Derive license from info.license over classifiers in pypi registry data
qtomlinson authored Sep 26, 2024
2 parents dc8d5a2 + 2c1f105 commit 85544f8
Showing 13 changed files with 701 additions and 68 deletions.
18 changes: 12 additions & 6 deletions providers/fetch/pypiFetch.js
@@ -79,18 +79,24 @@ class PyPiFetch extends AbstractFetch {
     for (const classifier in classifiers) {
       if (classifiers[classifier].includes('License :: OSI Approved ::')) {
         const lastColon = classifiers[classifier].lastIndexOf(':')
-        const rawLicense = classifiers[classifier].slice(lastColon + 1)
-        return spdxCorrect(rawLicense)
+        return classifiers[classifier].slice(lastColon + 1)
       }
     }
     return null
   }
 
   _extractDeclaredLicense(registryData) {
-    const licenseFromClassifiers = this._extractLicenseFromClassifiers(registryData)
-    if (licenseFromClassifiers) return licenseFromClassifiers
-    const license = get(registryData, 'info.license')
-    return license && spdxCorrect(license)
+    const licenseInMetadata = get(registryData, 'info.license')
+    const hasVersionInMeta = /\d+/.test(licenseInMetadata)
+    const licenseInClassifiers = this._extractLicenseFromClassifiers(registryData)
+    const hasVersionInClassifier = /\d+/.test(licenseInClassifiers)
+
+    let licenses = [licenseInMetadata, licenseInClassifiers]
+    if (hasVersionInClassifier && !hasVersionInMeta) licenses = [licenseInClassifiers, licenseInMetadata]
+    for (const rawLicense of licenses) {
+      const parsed = rawLicense && spdxCorrect(rawLicense)
+      if (parsed) return parsed
+    }
   }
 
   async _getPackage(spec, registryData, destination) {
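The new `_extractDeclaredLicense` tries `info.license` first and the classifier text second, swapping the order only when the classifier contains a digit (used as a hint that it names a version) and the metadata does not; the first candidate that normalizes wins. A minimal sketch of that rule as standalone functions, assuming a normalizer passed in as a parameter: `hasVersionHint`, `orderCandidates`, `pickDeclaredLicense`, `trimOnly`, and `rejectUnknown` are hypothetical names, and `normalize` stands in for the diff's `spdxCorrect` without reproducing its behavior.

```javascript
// Sketch of the ordering rule in the new _extractDeclaredLicense.
// All names here are hypothetical; `normalize` stands in for spdxCorrect.

// A digit anywhere in the string is treated as a hint that a specific
// license version is named (e.g. "BSD-3-Clause", "v2"). RegExp test()
// converts undefined/null to "undefined"/"null", which contain no digits.
function hasVersionHint(raw) {
  return /\d+/.test(raw)
}

// Metadata (info.license) comes first unless only the classifier has a version hint.
function orderCandidates(licenseInMetadata, licenseInClassifiers) {
  if (hasVersionHint(licenseInClassifiers) && !hasVersionHint(licenseInMetadata)) {
    return [licenseInClassifiers, licenseInMetadata]
  }
  return [licenseInMetadata, licenseInClassifiers]
}

// Returns the first candidate the normalizer accepts, or undefined if none do.
function pickDeclaredLicense(licenseInMetadata, licenseInClassifiers, normalize) {
  for (const raw of orderCandidates(licenseInMetadata, licenseInClassifiers)) {
    const parsed = raw && normalize(raw)
    if (parsed) return parsed
  }
  return undefined
}

// Toy normalizers for illustration only; they do not mimic spdxCorrect.
const trimOnly = (raw) => raw.trim() || null
const rejectUnknown = (raw) => (raw.trim() === 'UNKNOWN' ? null : raw.trim())

console.log(pickDeclaredLicense('BSD-3-Clause', ' BSD License', trimOnly)) // prints: BSD-3-Clause
console.log(pickDeclaredLicense('UNKNOWN', ' MIT License', rejectUnknown)) // prints: MIT License
console.log(pickDeclaredLicense(undefined, null, trimOnly)) // prints: undefined
```

Injecting the normalizer keeps the ordering rule testable without a real SPDX correction library. The second call shows the fallback: when the first candidate does not normalize, the later one is used, which mirrors the `for ... of licenses` loop in the diff.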
2 changes: 1 addition & 1 deletion providers/process/pypiExtract.js
@@ -13,7 +13,7 @@ class PyPiExtract extends AbstractClearlyDefinedProcessor {
   }
 
   get toolVersion() {
-    return '1.1.1'
+    return '1.2.1'
   }
 
   canHandle(request) {
43 changes: 43 additions & 0 deletions test/fixtures/pypi/registryData-info_bsd3.json
@@ -0,0 +1,43 @@
{
"info": {
"author": "Joel Nothman",
"author_email": "joel.nothman@gmail.com",
"bugtrack_url": null,
"classifiers": [
"Intended Audience :: Science/Research",
"License :: OSI Approved :: BSD License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.6",
"Topic :: Scientific/Engineering :: Visualization"
],
"description": "UpSetPlot documentation\n============================\n\n|version| |licence| |py-versions|\n\n|issues| |build| |docs| |coverage|\n\nThis is another Python implementation of UpSet plots by Lex et al. [Lex2014]_.\nUpSet plots are used to visualise set overlaps; like Venn diagrams but\nmore readable. Documentation is at https://upsetplot.readthedocs.io.\n\nThis ``upsetplot`` library tries to provide a simple interface backed by an\nextensible, object-oriented design.\n\nThere are many ways to represent the categorisation of data, as covered in\nour `Data Format Guide <https://upsetplot.readthedocs.io/en/stable/formats.html>`_.\n\nOur internal input format uses a `pandas.Series` containing counts\ncorresponding to subset sizes, where each subset is an intersection of named\ncategories. The index of the Series indicates which rows pertain to which\ncategories, by having multiple boolean indices, like ``example`` in the\nfollowing::\n\n >>> from upsetplot import generate_counts\n >>> example = generate_counts()\n >>> example\n cat0 cat1 cat2\n False False False 56\n True 283\n True False 1279\n True 5882\n True False False 24\n True 90\n True False 429\n True 1957\n Name: value, dtype: int64\n\nThen::\n\n >>> from upsetplot import plot\n >>> plot(example) # doctest: +SKIP\n >>> from matplotlib import pyplot\n >>> pyplot.show() # doctest: +SKIP\n\nmakes:\n\n.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_generated_001.png\n :target: ../auto_examples/plot_generated.html\n\nAnd you can save the image in various formats::\n\n >>> pyplot.savefig(\"/path/to/myplot.pdf\") # doctest: +SKIP\n >>> pyplot.savefig(\"/path/to/myplot.png\") # doctest: +SKIP\n\nThis plot shows the cardinality of every category combination seen in our data.\nThe leftmost column counts items absent from any category. 
The next three\ncolumns count items only in ``cat1``, ``cat2`` and ``cat3`` respectively, with\nfollowing columns showing cardinalities for items in each combination of\nexactly two named sets. The rightmost column counts items in all three sets.\n\nRotation\n........\n\nWe call the above plot style \"horizontal\" because the category intersections\nare presented from left to right. `Vertical plots\n<http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_vertical.html>`__\nare also supported!\n\n.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_vertical_001.png\n :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_vertical.html\n\nDistributions\n.............\n\nProviding a DataFrame rather than a Series as input allows us to expressively\n`plot the distribution of variables\n<http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_diabetes.html>`__\nin each subset.\n\n.. image:: http://upsetplot.readthedocs.io/en/latest/_images/sphx_glr_plot_diabetes_001.png\n :target: http://upsetplot.readthedocs.io/en/latest/auto_examples/plot_diabetes.html\n\nLoading datasets\n................\n\nWhile the dataset above is randomly generated, you can prepare your own dataset\nfor input to upsetplot. A helpful tool is `from_memberships`, which allows\nus to reconstruct the example above by indicating each data point's category\nmembership::\n\n >>> from upsetplot import from_memberships\n >>> example = from_memberships(\n ... [[],\n ... ['cat2'],\n ... ['cat1'],\n ... ['cat1', 'cat2'],\n ... ['cat0'],\n ... ['cat0', 'cat2'],\n ... ['cat0', 'cat1'],\n ... ['cat0', 'cat1', 'cat2'],\n ... ],\n ... data=[56, 283, 1279, 5882, 24, 90, 429, 1957]\n ... 
)\n >>> example\n cat0 cat1 cat2\n False False False 56\n True 283\n True False 1279\n True 5882\n True False False 24\n True 90\n True False 429\n True 1957\n dtype: int64\n\nSee also `from_contents`, another way to describe categorised data, and\n`from_indicators` which allows each category to be indicated by a column in\nthe data frame (or a function of the column's data such as whether it is a\nmissing value).\n\nInstallation\n------------\n\nTo install the library, you can use `pip`::\n\n $ pip install upsetplot\n\nInstallation requires:\n\n* pandas\n* matplotlib >= 2.0\n* seaborn to use `UpSet.add_catplot`\n\nIt should then be possible to::\n\n >>> import upsetplot\n\nin Python.\n\nWhy an alternative to py-upset?\n-------------------------------\n\nProbably for petty reasons. It appeared `py-upset\n<https://github.com/ImSoErgodic/py-upset>`_ was not being maintained. Its\ninput format was undocumented, inefficient and, IMO, inappropriate. It did not\nfacilitate showing plots of each subset's distribution as in Lex et al's work\nintroducing UpSet plots. Nor did it include the horizontal bar plots\nillustrated there. It did not support Python 2. I decided it would be easier to\nconstruct a cleaner version than to fix it.\n\nReferences\n----------\n\n.. [Lex2014] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister,\n *UpSet: Visualization of Intersecting Sets*,\n IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), vol. 20, no. 12, pp. 1983–1992, 2014.\n doi: `doi.org/10.1109/TVCG.2014.2346248 <https://doi.org/10.1109/TVCG.2014.2346248>`_\n\n\n.. |py-versions| image:: https://img.shields.io/pypi/pyversions/upsetplot.svg\n :alt: Python versions supported\n\n.. |version| image:: https://badge.fury.io/py/UpSetPlot.svg\n :alt: Latest version on PyPi\n :target: https://badge.fury.io/py/UpSetPlot\n\n.. 
|build| image:: https://github.com/jnothman/upsetplot/actions/workflows/test.yml/badge.svg\n :alt: Github Workflows CI build status\n :scale: 100%\n :target: https://github.com/jnothman/UpSetPlot/actions/workflows/test.yml\n\n.. |issues| image:: https://img.shields.io/github/issues/jnothman/UpSetPlot.svg\n :alt: Issue tracker\n :target: https://github.com/jnothman/UpSetPlot\n\n.. |coverage| image:: https://coveralls.io/repos/github/jnothman/UpSetPlot/badge.svg\n :alt: Test coverage\n :target: https://coveralls.io/github/jnothman/UpSetPlot\n\n.. |docs| image:: https://readthedocs.org/projects/upsetplot/badge/?version=latest\n :alt: Documentation Status\n :scale: 100%\n :target: https://upsetplot.readthedocs.io/en/latest/?badge=latest\n\n.. |licence| image:: https://img.shields.io/badge/Licence-BSD-blue.svg\n :target: https://opensource.org/licenses/BSD-3-Clause\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://upsetplot.readthedocs.io",
"keywords": "",
"license": "BSD-3-Clause",
"maintainer": "",
"maintainer_email": "",
"name": "UpSetPlot",
"package_url": "https://pypi.org/project/UpSetPlot/",
"platform": null,
"project_url": "https://pypi.org/project/UpSetPlot/",
"project_urls": {
"Homepage": "https://upsetplot.readthedocs.io"
},
"release_url": "https://pypi.org/project/UpSetPlot/0.9.0/",
"requires_dist": null,
"requires_python": "",
"summary": "Draw Lex et al.'s UpSet plots with Pandas and Matplotlib",
"version": "0.9.0",
"yanked": false,
"yanked_reason": null
}
}
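The UpSetPlot fixture pairs a versioned `info.license` (`BSD-3-Clause`) with an unversioned OSI classifier (`BSD License`), so it exercises the path where the metadata stays first. A minimal sketch of the classifier extraction on an abridged copy of that fixture, assuming `registryData` has the shape shown above; `extractLicenseFromClassifiers` and `bsd3` are hypothetical names, and the sketch uses `Array.prototype.find` in place of the diff's `for...in` loop (both return the first matching classifier).

```javascript
// Sketch of classifier extraction: take the first "License :: OSI Approved :: ..."
// trove classifier and keep everything after its last colon. Like the diff's
// slice(lastColon + 1), this keeps the leading space.
function extractLicenseFromClassifiers(registryData) {
  const classifiers = registryData?.info?.classifiers ?? []
  const match = classifiers.find((c) => c.includes('License :: OSI Approved ::'))
  return match ? match.slice(match.lastIndexOf(':') + 1) : null
}

// Abridged copy of the UpSetPlot fixture (registryData-info_bsd3.json).
const bsd3 = {
  info: {
    license: 'BSD-3-Clause',
    classifiers: [
      'Intended Audience :: Science/Research',
      'License :: OSI Approved :: BSD License',
      'Programming Language :: Python :: 3'
    ]
  }
}

const fromClassifiers = extractLicenseFromClassifiers(bsd3)
console.log(JSON.stringify(fromClassifiers)) // prints: " BSD License"
// Only the metadata value contains a digit, so info.license stays first.
console.log(/\d+/.test(fromClassifiers), /\d+/.test(bsd3.info.license)) // prints: false true
```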
55 changes: 55 additions & 0 deletions test/fixtures/pypi/registryData-info_chardet-5.1.0.json
@@ -0,0 +1,55 @@
{
"info": {
"author": "Mark Pilgrim",
"author_email": "mark@diveintomark.org",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Text Processing :: Linguistic"
],
"description": "Chardet: The Universal Character Encoding Detector\n--------------------------------------------------\n\n.. image:: https://img.shields.io/travis/chardet/chardet/stable.svg\n :alt: Build status\n :target: https://travis-ci.org/chardet/chardet\n\n.. image:: https://img.shields.io/coveralls/chardet/chardet/stable.svg\n :target: https://coveralls.io/r/chardet/chardet\n\n.. image:: https://img.shields.io/pypi/v/chardet.svg\n :target: https://warehouse.python.org/project/chardet/\n :alt: Latest version on PyPI\n\n.. image:: https://img.shields.io/pypi/l/chardet.svg\n :alt: License\n\n\nDetects\n - ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)\n - Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)\n - EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)\n - EUC-KR, ISO-2022-KR, Johab (Korean)\n - KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)\n - ISO-8859-5, windows-1251 (Bulgarian)\n - ISO-8859-1, windows-1252, MacRoman (Western European languages)\n - ISO-8859-7, windows-1253 (Greek)\n - ISO-8859-8, windows-1255 (Visual and Logical Hebrew)\n - TIS-620 (Thai)\n\n.. 
note::\n Our ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily\n disabled until we can retrain the models.\n\nRequires Python 3.7+.\n\nInstallation\n------------\n\nInstall from `PyPI <https://pypi.org/project/chardet/>`_::\n\n pip install chardet\n\nDocumentation\n-------------\n\nFor users, docs are now available at https://chardet.readthedocs.io/.\n\nCommand-line Tool\n-----------------\n\nchardet comes with a command-line script which reports on the encodings of one\nor more files::\n\n % chardetect somefile someotherfile\n somefile: windows-1252 with confidence 0.5\n someotherfile: ascii with confidence 1.0\n\nAbout\n-----\n\nThis is a continuation of Mark Pilgrim's excellent original chardet port from C, and `Ian Cordasco <https://github.com/sigmavirus24>`_'s\n`charade <https://github.com/sigmavirus24/charade>`_ Python 3-compatible fork.\n\n:maintainer: Dan Blanchard\n",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/chardet/chardet",
"keywords": "encoding,i18n,xml",
"license": "LGPL",
"maintainer": "Daniel Blanchard",
"maintainer_email": "dan.blanchard@gmail.com",
"name": "chardet",
"package_url": "https://pypi.org/project/chardet/",
"platform": null,
"project_url": "https://pypi.org/project/chardet/",
"project_urls": {
"Documentation": "https://chardet.readthedocs.io/",
"GitHub Project": "https://github.com/chardet/chardet",
"Homepage": "https://github.com/chardet/chardet",
"Issue Tracker": "https://github.com/chardet/chardet/issues"
},
"release_url": "https://pypi.org/project/chardet/5.1.0/",
"requires_dist": null,
"requires_python": ">=3.7",
"summary": "Universal encoding detector for Python 3",
"version": "5.1.0",
"yanked": false,
"yanked_reason": null
}
}
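The chardet 5.1.0 fixture is the opposite case: `info.license` is the bare `LGPL`, while the OSI classifier names `v2 or later`. A worked trace of the ordering lines from the diff on those two values, copied from the fixture (the classifier text is written as the diff's `slice` would leave it, with a leading space). It stops at ordering and does not call `spdxCorrect`, since this commit does not show what that function returns for these strings.

```javascript
// Values from the chardet 5.1.0 fixture: info.license, and the OSI classifier
// text after its last colon (leading space kept, as the diff's slice leaves it).
const licenseInMetadata = 'LGPL'
const licenseInClassifiers = ' GNU Lesser General Public License v2 or later (LGPLv2+)'

const hasVersionInMeta = /\d+/.test(licenseInMetadata) // false: "LGPL" has no digit
const hasVersionInClassifier = /\d+/.test(licenseInClassifiers) // true: "v2"

// Same swap as the diff: the classifier moves to the front.
let licenses = [licenseInMetadata, licenseInClassifiers]
if (hasVersionInClassifier && !hasVersionInMeta) licenses = [licenseInClassifiers, licenseInMetadata]

console.log(licenses[0].trim()) // prints: GNU Lesser General Public License v2 or later (LGPLv2+)

// RegExp.prototype.test converts its argument to a string, so a missing value
// ("undefined" / "null") counts as having no version hint instead of throwing.
console.log(/\d+/.test(undefined), /\d+/.test(null)) // prints: false false
```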