feat: allow admins to disable scraping package managers
ewan-escience committed Aug 5, 2024
1 parent cae5193 commit f1402c5
Showing 9 changed files with 118 additions and 29 deletions.
2 changes: 2 additions & 0 deletions database/005-create-relations-for-software.sql
@@ -65,9 +65,11 @@ CREATE TABLE package_manager (
download_count BIGINT,
download_count_last_error VARCHAR(500),
download_count_scraped_at TIMESTAMPTZ,
download_count_scraping_disabled_reason VARCHAR(200),
reverse_dependency_count INTEGER,
reverse_dependency_count_last_error VARCHAR(500),
reverse_dependency_count_scraped_at TIMESTAMPTZ,
reverse_dependency_count_scraping_disabled_reason VARCHAR(200),
position INTEGER,
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
32 changes: 26 additions & 6 deletions documentation/docs/03-rsd-instance/03-administration.md
@@ -3,8 +3,8 @@
This section describes administration options available in the RSD.

:::tip
To be able to log in as RSD administrator you first need to define a list of rsd admin users in the .env file.
See [Login as rsd administrator in the getting started section](/rsd-instance/getting-started/#log-in-as-rsd-administrator).
To be able to log in as an RSD administrator, you first need to grant an existing user admin privileges in the database.
See [Log in as rsd administrator in the getting started section](/rsd-instance/getting-started/#log-in-as-rsd-administrator).
:::

## Public pages
@@ -65,7 +65,7 @@ You can add, search and delete ORCIDs from the RSD. Use the bulk import button t

## RSD users

This section shows all RSD users who logged in to RSD at least once. You can search for users, assign the administrator role (rsd_admin) or delete user accounts.
This section shows all RSD users who logged in to RSD at least once. You can search for users, assign the administrator role (`rsd_admin`) or delete user accounts.

:::danger

@@ -93,7 +93,7 @@ Use the search box to find organisations in the ROR database. This is the prefer

### Define organisation primary maintainer

The primary maintainer of an organisation is defined by an RSD administrator. You need to provide the user id in the general settings section. The user id is unique, and it is automatically created by RSD after a user is logged in for the first time.
The primary maintainer of an organisation is defined by an RSD administrator. You need to provide the user ID in the general settings section. The user ID is unique, and it is automatically created by RSD after a user is logged in for the first time.

![animation](img/organisation-maintainers-primary-invite.gif)

@@ -140,7 +140,7 @@ Only RSD administrators can create communities.

### Add community

To create new community use "Add" button. Provide name, short description and logo in the modal.
To create new community use "Add" button. Provide a name, short description and logo in the modal.

### Edit community

@@ -205,6 +205,26 @@ This section is used to show public announcements to all users of the RSD. It is

![animation](img/admin-announcement.gif)

## Software

### Slug

When editing a software page, the **slug** of the page (called **RSD path**) can be changed by admins under the **Description** tab.

### Disable Git harvesting

If you want to disable the harvesting of a Git repo, you can do so by providing a reason under the **Links & metadata** tab. Page maintainers will be able to see if and why the harvesting is disabled under the **Background services** tab.

### Disable package manager harvesting

If you want to disable the harvesting of a package manager, you can do so by providing a reason under the **Package managers** tab. Page maintainers will be able to see if and why the harvesting is disabled under the **Background services** tab.

## Project

### Slug

When editing a project page, the **slug** of the page (called **RSD path**) can be changed by admins under the **Project details** tab.

## News

RSD administrators are able to create news items. The additional option "Add news" will appear in the "+" menu at the top right of the page header.
@@ -232,7 +252,7 @@ After news item is created you will be redirected to edit news item page. Here y
- Publication date is shown in the header of the news title. It can be changed at any time. Note that changing the publication title also changes public url of the news item.
- First uploaded image is used in the news card.
- Using "Copy link" button you can copy the Markdown syntax to the clipboard and the paste the link at the desired location of the body.
- Using "Delete" button will delete image and the Markdown link syntax from the news body.
- Using "Delete" button will delete the image and the Markdown link syntax from the news body.

:::

@@ -12,6 +12,13 @@ import ListItemAvatar from '@mui/material/ListItemAvatar'
import Avatar from '@mui/material/Avatar'

import {PackageManager, packageManagerSettings} from './apiPackageManager'
import List from '@mui/material/List'
import ListItem from '@mui/material/ListItem'
import TextField from '@mui/material/TextField'
import {useSession} from '~/auth'
import {createJsonHeaders, getBaseUrl} from '~/utils/fetchHelpers'
import useSnackbar from '~/components/snackbar/useSnackbar'
import logger from '~/utils/logger'

type PackageManagerItemProps = {
pos: number,
@@ -33,15 +40,15 @@ function RsdScraperStatus({services,download_count,download_count_scraped_at,rev
if (services?.length===0) {
return <span>RSD scraper services not available</span>
}
if (services.includes('downloads')===true){
if (services.includes('downloads')){
if (download_count_scraped_at && Number.isInteger(download_count)){
html.push(<span key="downloads">Downloads: {download_count}</span>)

}else{
html.push(<span key="downloads">Downloads: no info</span>)
}
}
if (services.includes('dependents')===true){
if (services.includes('dependents')){
if (reverse_dependency_count_scraped_at && Number.isInteger(reverse_dependency_count)){
html.push(<span key="dependents">Dependents: {reverse_dependency_count}</span>)
}else{
@@ -51,11 +58,41 @@ function RsdScraperStatus({services,download_count,download_count_scraped_at,rev
return html
}


export default function PackageManagerItem({pos, item, onDelete, onEdit}: PackageManagerItemProps) {
const {showErrorMessage} = useSnackbar()
const {user, token} = useSession()
const isAdmin = user?.role === 'rsd_admin'
// get package manager info
const info = packageManagerSettings[item.package_manager ?? 'other']
const url = new URL(item.url)

async function saveReason(reason: string, field: 'download_count_scraping_disabled_reason' | 'reverse_dependency_count_scraping_disabled_reason') {
let sanitisedReason: string | null = reason.trim()

if (sanitisedReason.length === 0) {
sanitisedReason = null
}

const patchUrl = `${getBaseUrl()}/package_manager?id=eq.${item.id}`
fetch(patchUrl, {
method: 'PATCH',
headers: {
...createJsonHeaders(token)
},
body: JSON.stringify({[field]: sanitisedReason})
})
.then(async resp => {
if (!resp.ok) {
showErrorMessage('Failed to update the reason, please try again or contact us')
logger(`PackageManagerItem.tsx.saveReason: status ${resp.status}, body: ${await resp.text()}`, 'error')
}
})
.catch((e) => {
showErrorMessage('Failed to update the reason, please try again or contact us')
logger(`PackageManagerItem.tsx.saveReason: error when saving reason: ${e}`, 'error')
})
}

return (
<SortableListItem
key={item.id}
@@ -102,6 +139,26 @@ export default function PackageManagerItem({pos, item, onDelete, onEdit}: Packag
padding:'0rem 1rem'
}}
/>
{
isAdmin &&
<List>
<ListItem>
<TextField
label="Why scraping download count is disabled"
defaultValue={item.download_count_scraping_disabled_reason}
onBlur={e => saveReason(e.target.value, 'download_count_scraping_disabled_reason')}
/>
</ListItem>
<ListItem>
<TextField
label="Why scraping reverse dependency count is disabled"
defaultValue={item.reverse_dependency_count_scraping_disabled_reason}
onBlur={e => saveReason(e.target.value, 'reverse_dependency_count_scraping_disabled_reason')}
/>
</ListItem>
</List>
}

</SortableListItem>
)
}
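
Reviewer note: the normalisation step inside `saveReason` can be looked at in isolation. The following is a minimal sketch, not the committed code: the helper names `sanitiseReason` and `buildReasonPatch` and the base URL `/api/v1` are hypothetical, while the two field names and the `package_manager?id=eq.` URL shape are taken from the hunk above. It shows that a blank or whitespace-only input is sent as `null`, which clears the column.

```typescript
// Field names as used in the diff above.
type ReasonField =
  | 'download_count_scraping_disabled_reason'
  | 'reverse_dependency_count_scraping_disabled_reason'

// Hypothetical helper: trim the input and map an empty result to null,
// so that clearing the text field clears the database column.
function sanitiseReason(reason: string): string | null {
  const trimmed = reason.trim()
  return trimmed.length === 0 ? null : trimmed
}

// Hypothetical helper: build the PATCH URL and JSON body without sending anything.
function buildReasonPatch(baseUrl: string, id: string, field: ReasonField, reason: string) {
  return {
    url: `${baseUrl}/package_manager?id=eq.${id}`,
    body: JSON.stringify({[field]: sanitiseReason(reason)})
  }
}

// '/api/v1' and 'abc' are placeholder values for illustration only.
const patch = buildReasonPatch('/api/v1', 'abc', 'download_count_scraping_disabled_reason', '   ')
console.log(patch.url)
console.log(patch.body)
```

Keeping the normalisation in a pure function like this would make the "blank means enabled" rule testable without mocking `fetch`.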
@@ -9,10 +9,7 @@
// SPDX-License-Identifier: Apache-2.0

import logger from '~/utils/logger'
import {
createJsonHeaders, extractErrorMessages,
extractReturnMessage, getBaseUrl
} from '~/utils/fetchHelpers'
import {createJsonHeaders, extractErrorMessages, extractReturnMessage, getBaseUrl} from '~/utils/fetchHelpers'

export type PackageManagerSettings={
name: string,
@@ -126,7 +123,7 @@ export type NewPackageManager = {
id: string|null
software: string,
url: string,
package_manager: PackageManagerTypes|null,
package_manager: PackageManagerTypes | null,
position: number
}

@@ -138,11 +135,13 @@ export type PackageManager = NewPackageManager & {
id: string,
download_count: number | null,
download_count_scraped_at: string | null,
download_count_scraping_disabled_reason: string | null,
reverse_dependency_count: number | null,
reverse_dependency_count_scraped_at: string | null
reverse_dependency_count_scraped_at: string | null,
reverse_dependency_count_scraping_disabled_reason: string | null,
}

export async function getPackageManagers({software, token}: { software: string, token?: string }) {
export async function getPackageManagers({software, token}: { software: string, token?: string }): Promise<PackageManager[]> {
try {
const query = `software=eq.${software}&order=position.asc,package_manager.asc`
const url = `${getBaseUrl()}/package_manager?${query}`
@@ -156,8 +155,7 @@ export async function getPackageManagers({software, token}: { software: string,
})

if (resp.status === 200) {
const json:PackageManager[] = await resp.json()
return json
return await resp.json()
}
logger(`getPackageManagers...${resp.status} ${resp.statusText}`,'warn')
return []
@@ -1,5 +1,6 @@
// SPDX-FileCopyrightText: 2023 - 2024 Dusan Mijatovic (Netherlands eScience Center)
// SPDX-FileCopyrightText: 2023 - 2024 Netherlands eScience Center
// SPDX-FileCopyrightText: 2024 Ewan Cahen (Netherlands eScience Center) <e.cahen@esciencecenter.nl>
//
// SPDX-License-Identifier: Apache-2.0

@@ -35,6 +36,7 @@ export default function PackageManagerServices() {
last_error={service.download_count_last_error}
url={service.url}
platform={null}
scraping_disabled_reason={service.download_count_scraping_disabled_reason}
/>
: null
}
@@ -46,6 +48,7 @@ export default function PackageManagerServices() {
last_error={service.reverse_dependency_count_last_error}
url={service.url}
platform={null}
scraping_disabled_reason={service.reverse_dependency_count_scraping_disabled_reason}
/>
: null
}
19 changes: 13 additions & 6 deletions frontend/components/software/edit/services/ServiceInfoListItem.tsx
@@ -1,5 +1,6 @@
// SPDX-FileCopyrightText: 2023 - 2024 Dusan Mijatovic (Netherlands eScience Center)
// SPDX-FileCopyrightText: 2023 - 2024 Netherlands eScience Center
// SPDX-FileCopyrightText: 2024 Ewan Cahen (Netherlands eScience Center) <e.cahen@esciencecenter.nl>
//
// SPDX-License-Identifier: Apache-2.0

@@ -15,14 +16,15 @@ import DoDisturbOnIcon from '@mui/icons-material/DoDisturbOn'
import {CodePlatform} from '~/types/SoftwareTypes'

type ServiceInfoListItemProps={
title:string
scraped_at: string|null
last_error: string|null
url: string|null
platform: CodePlatform|null
readonly title:string
readonly scraped_at: string|null
readonly last_error: string|null
readonly url: string|null
readonly platform: CodePlatform|null
readonly scraping_disabled_reason: string|null
}

export function ServiceInfoListItem({title,scraped_at,last_error,url,platform}:ServiceInfoListItemProps){
export function ServiceInfoListItem({title,scraped_at,last_error,url,platform,scraping_disabled_reason}:ServiceInfoListItemProps){
let status:'error'|'success'|'not_active'|'scheduled'|'not_supported' = 'not_active'

// set service status
@@ -38,6 +40,7 @@ export function ServiceInfoListItem({title,scraped_at,last_error,url,platform}:S
if (status==='not_active') color='warning.main'

function getStatusIcon(){
if (scraping_disabled_reason !== null) return <DoDisturbOnIcon sx={{width:'2.5rem',height:'2.5rem'}} />
if (status === 'error') return <ErrorIcon sx={{width:'2.5rem',height:'2.5rem'}} />
if (status === 'success') return <CheckCircleIcon sx={{width:'2.5rem',height:'2.5rem'}} />
if (status === 'scheduled') return <ScheduleIcon sx={{width:'2.5rem',height:'2.5rem'}} />
@@ -46,6 +49,10 @@ export function ServiceInfoListItem({title,scraped_at,last_error,url,platform}:S
}

function getStatusMsg(){
if (scraping_disabled_reason !== null) {
return (<span className="text-error">{`This harvester was disabled by the admins for the following reason: ${scraping_disabled_reason}`}</span>)
}

if (last_error) return (
<span className="text-error">{last_error}</span>
)
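
Reviewer note: the two hunks above put the disabled check first in both `getStatusIcon` and `getStatusMsg`, so a disabled harvester hides any earlier error. Below is a simplified sketch of that precedence, assuming a hypothetical `statusMessage` function that returns plain text instead of JSX and covers only the two branches visible in the hunk; the real function has further branches outside the shown context.

```typescript
type ServiceState = {
  scraping_disabled_reason: string | null
  last_error: string | null
}

// Hypothetical, simplified stand-in for getStatusMsg.
function statusMessage({scraping_disabled_reason, last_error}: ServiceState): string | null {
  // The disabled reason wins over everything else.
  if (scraping_disabled_reason !== null) {
    return `This harvester was disabled by the admins for the following reason: ${scraping_disabled_reason}`
  }
  // Otherwise fall back to the last scraper error, if any.
  if (last_error) return last_error
  // Branches not shown in the hunk are omitted from this sketch.
  return null
}

console.log(statusMessage({scraping_disabled_reason: 'API is gone', last_error: 'HTTP 404'}))
console.log(statusMessage({scraping_disabled_reason: null, last_error: 'HTTP 404'}))
```

Because the check is `!== null`, an empty string would also count as "disabled". The edit form in this commit stores `null` for blank input, so that case should only arise if the column is written by some other path.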
@@ -33,7 +33,7 @@ export default function SoftwareRepoServices() {
platform: services ? services['code_platform'] : null
}
return (
<ServiceInfoListItem key={service.name} {...props} />
<ServiceInfoListItem key={service.name} scraping_disabled_reason={null} {...props} />
)
})}
</List>
@@ -31,8 +31,10 @@ export type PackageManagerService = {
package_manager: PackageManagerTypes,
download_count_scraped_at: string|null,
download_count_last_error: string|null,
download_count_scraping_disabled_reason: string|null,
reverse_dependency_count_scraped_at: string|null,
reverse_dependency_count_last_error: string|null
reverse_dependency_count_last_error: string|null,
reverse_dependency_count_scraping_disabled_reason: string|null,
}

async function getSoftwareServices(id:string,token:string){
@@ -61,7 +63,7 @@ async function getSoftwareServices(id:string,token:string){

async function getPackageManagerServices(id:string,token:string){
try{
const select='select=software,url,package_manager,download_count_scraped_at,download_count_last_error,reverse_dependency_count_scraped_at,reverse_dependency_count_last_error'
const select='select=software,url,package_manager,download_count_scraped_at,download_count_last_error,download_count_scraping_disabled_reason,reverse_dependency_count_scraped_at,reverse_dependency_count_last_error,reverse_dependency_count_scraping_disabled_reason'
const query = `${select}&software=eq.${id}&order=position`
const url = `${getBaseUrl()}/package_manager?${query}`

@@ -27,15 +27,15 @@ public PostgrestConnector(String backendUrl) {
}

public Collection<BasicPackageManagerData> oldestDownloadCounts(int limit) {
String filter = "or=(package_manager.eq.dockerhub)";
String filter = "download_count_scraping_disabled_reason=is.null&or=(package_manager.eq.dockerhub)";
String data = Utils.getAsAdmin(backendUrl + "?" + filter + "&select=id,url,package_manager&order=download_count_scraped_at.asc.nullsfirst&limit=" + limit
+ "&" + Utils.atLeastOneHourAgoFilter("download_count_scraped_at")
);
return parseBasicJsonData(data);
}

public Collection<BasicPackageManagerData> oldestReverseDependencyCounts(int limit) {
String filter = "or=(package_manager.eq.anaconda,package_manager.eq.cran,package_manager.eq.crates,package_manager.eq.golang,package_manager.eq.maven,package_manager.eq.npm,package_manager.eq.pypi,package_manager.eq.sonatype)";
String filter = "reverse_dependency_count_scraping_disabled_reason=is.null&or=(package_manager.eq.anaconda,package_manager.eq.cran,package_manager.eq.crates,package_manager.eq.golang,package_manager.eq.maven,package_manager.eq.npm,package_manager.eq.pypi,package_manager.eq.sonatype)";
String data = Utils.getAsAdmin(backendUrl + "?" + filter + "&select=id,url,package_manager&order=reverse_dependency_count_scraped_at.asc.nullsfirst&limit=" + limit + "&" + Utils.atLeastOneHourAgoFilter("reverse_dependency_count_scraped_at"));
return parseBasicJsonData(data);
}
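
Reviewer note: both scraper queries now prepend a `<column>=is.null` condition to the existing `or=(...)` group. In PostgREST, top-level query parameters joined with `&` are combined with AND, so a row is only selected when no disabled reason is set and the package manager is one of the supported types. The sketch below rebuilds the same filter string in TypeScript to make the pattern explicit; `scrapableFilter` is a hypothetical helper, not part of the commit.

```typescript
// Hypothetical helper mirroring the filter strings in PostgrestConnector.
function scrapableFilter(reasonColumn: string, packageManagers: string[]): string {
  // Each supported package manager becomes one alternative inside or=(...).
  const anyOf = packageManagers.map(pm => `package_manager.eq.${pm}`).join(',')
  // The is.null condition is ANDed with the or-group by PostgREST.
  return `${reasonColumn}=is.null&or=(${anyOf})`
}

console.log(scrapableFilter('download_count_scraping_disabled_reason', ['dockerhub']))
console.log(scrapableFilter('reverse_dependency_count_scraping_disabled_reason', ['npm', 'pypi']))
```

With `['dockerhub']` this reproduces the download-count filter from the hunk above character for character.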