Namespaces defined in the .NET assemblies of 227,839 NuGet packages, extracted in late November 2019. The dataset covers 681,858 .NET namespaces overall and is distributed as an archived JSON file (145 MB uncompressed).
A list of JSON objects, each with exactly 4 properties:
- name: NuGet package name. It should correspond to the package URL https://www.nuget.org/packages/$(name)
- description: package description from .nuspec.
- tags: the list of tags from .nuspec.
- namespaces: namespaces in which types are defined in the package's DLLs. This is a dictionary: each key is a namespace name, and each value is the number of types defined in that namespace, summed over all the assemblies of the package.
Any of namespaces, tags, or description may be empty.
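Because the namespaces value is a per-package sum over assemblies, a record can be assembled as sketched below in Python. This assumes a hypothetical input holding one {namespace: type_count} dict per DLL; make_record is illustrative and not taken from consumer.py.

```python
from collections import Counter
from typing import Dict, Iterable, List


def make_record(name: str, description: str, tags: List[str],
                per_assembly: Iterable[Dict[str, int]]) -> dict:
    """Build one dataset record.

    `per_assembly` is a hypothetical input: one {namespace: type_count}
    dict per DLL found in the package.
    """
    totals = Counter()
    for counts in per_assembly:
        # Counter.update adds counts, so a namespace spread over several DLLs is summed.
        totals.update(counts)
    return {
        "name": name,
        "description": description,
        "tags": tags,
        "namespaces": dict(totals),
    }


record = make_record(
    "Example.Package", "", [],
    [{"Example": 3, "Example.Internal": 1}, {"Example": 2}],
)
print(record["namespaces"])  # {'Example': 5, 'Example.Internal': 1}
```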
Use case: build the mapping from namespaces to package names.
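A minimal Python sketch of that mapping, assuming the archive has already been decompressed to a plain JSON file with the top-level list described above; load_packages and namespace_index are hypothetical helper names.

```python
import json
from collections import defaultdict
from typing import Dict, List, Set


def load_packages(path: str) -> List[dict]:
    # Assumes the archive was already decompressed to a plain JSON file.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def namespace_index(packages: List[dict]) -> Dict[str, List[str]]:
    """Map each namespace to the sorted names of the packages that define types in it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pkg in packages:
        # The keys of "namespaces" are namespace names; the dict may be empty.
        for ns in pkg["namespaces"]:
            index[ns].add(pkg["name"])
    return {ns: sorted(names) for ns, names in index.items()}


packages = [
    {"name": "A", "description": "", "tags": [], "namespaces": {"Shared": 2, "A.Core": 5}},
    {"name": "B", "description": "", "tags": [], "namespaces": {"Shared": 1}},
]
print(namespace_index(packages)["Shared"])  # ['A', 'B']
```

Since the same namespace can be defined in several packages, the sketch maps each namespace to a list of package names rather than a single one. On the real data the call would be namespace_index(load_packages(path)) with path pointing to the decompressed file.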
We fetched the .nupkg and .nuspec files using emgarten/NuGet.CatalogReader. The exact command was:
nugetmirror nupkgs https://api.nuget.org/v3/index.json -o /path/to/packages --latest-only --max-threads 16 --ignore-errors
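A .nupkg is a zip archive with its .nuspec at the archive root, so the description and tags fields can be read with the Python standard library alone. A minimal sketch; read_nuspec is a hypothetical helper and not the parsing code used by the original pipeline:

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Tuple


def _local_name(tag: str) -> str:
    # Strip the "{namespace}" prefix: the nuspec XML namespace differs between schema versions.
    return tag.rsplit("}", 1)[-1]


def read_nuspec(nupkg_bytes: bytes) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (description, tags) from the .nuspec of a .nupkg."""
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as zf:
        # The .nuspec sits at the archive root (no "/" in its name).
        # next() raises StopIteration if the package has none.
        spec_name = next(n for n in zf.namelist()
                         if n.endswith(".nuspec") and "/" not in n)
        root = ET.fromstring(zf.read(spec_name))
    metadata = next(e for e in root if _local_name(e.tag) == "metadata")
    fields = {_local_name(e.tag): (e.text or "").strip() for e in metadata}
    # Tags are a space-delimited list in the .nuspec.
    return fields.get("description", ""), fields.get("tags", "").split()
```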
Then we processed each .nupkg using consumer.py, a beanstalkd-based namespace extractor that requires pystalk. We launched 5 processes with:
python3 consumer.py -x /path/to/Examples -t /tmp/nuget -o r$(index).json
Here /path/to/Examples is the path to a modified build of the Examples executable from 0xd4d/dnlib; see the patched Example1.cs and Program.cs. The tasks were put into beanstalkd using beanstool:
for pkg in $(ls /path/to/packages); do ./beanstool put -t default -b "/path/to/packages/$pkg"; done
The resulting JSON files were joined together and deduplicated by name.
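A minimal Python sketch of the join-and-deduplicate step. It assumes that each consumer output r$(index).json is a JSON list of records in the final format and that the first record seen for a given name is kept; both are assumptions, and the helper names are hypothetical.

```python
import json
from itertools import chain
from typing import Iterable, List


def dedupe_by_name(records: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each package name, preserving order."""
    by_name = {}
    for record in records:
        by_name.setdefault(record["name"], record)  # later duplicates are ignored
    return list(by_name.values())


def merge_results(paths: Iterable[str]) -> List[dict]:
    # Assumption: every consumer output file holds a JSON list of final-format records.
    lists = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            lists.append(json.load(f))
    return dedupe_by_name(chain.from_iterable(lists))
```

With five consumers whose indices are assumed to run from 0 to 4, merge_results(f"r{i}.json" for i in range(5)) returns the merged list, ready to be written out with json.dump.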
We could not process 492 out of 228,331 downloaded packages. According to a brief error analysis, most of the errors were due to corrupted or invalid nupkg (zip) files.
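A minimal Python sketch of how such corrupted archives could be flagged with the standard zipfile module; broken_nupkgs is a hypothetical triage helper, not the error analysis that was actually run.

```python
import zipfile
import zlib
from pathlib import Path
from typing import IO, Iterable, List, Union

Source = Union[str, Path, IO[bytes]]


def broken_nupkgs(sources: Iterable[Source]) -> List[Source]:
    """Hypothetical helper: return the .nupkg paths or file objects that are not readable zips."""
    broken: List[Source] = []
    for src in sources:
        try:
            with zipfile.ZipFile(src) as zf:
                # testzip() reads every member and returns the first one with a bad CRC, or None.
                if zf.testzip() is not None:
                    broken.append(src)
        except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
            # Not a zip at all, truncated, or containing undecompressable data.
            broken.append(src)
    return broken
```

For example, broken_nupkgs(Path("/path/to/packages").iterdir()) lists the unreadable files in the download directory.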
Code: MIT. Data: Open Data Commons Open Database License (ODbL).