
The BOINC test drive

David Anderson edited this page Apr 28, 2023 · 4 revisions

Suppose we've solved the supply side of the problem: BOINC has 10 million users, supplying many ExaFLOPS. How do we get more scientists to use it?

The major conference and trade show for scientific computing is Supercomputing (SC). Scientists who do high-throughput computing (HTC) go there. Suppose BOINC had a booth at SC 2022. Scientists walk up, and we give them a flyer. What should it say? What "test drive" experience do we want them to have?

Ideally, in 10 or 15 minutes they'd be running jobs on ~100 CPUs, and there'd be a clear path to scaling up to millions.

The test drive can't include:

  • reading any existing BOINC doc
  • writing any XML
  • doing sysadmin
  • creating a web site
  • recruiting volunteers
  • building apps on Windows, Mac, or Android
  • developing validators or assimilators

First, we create a "BOINC app library". It includes a number of widely-used apps (like Autodock, Charm, Rosetta, etc), compiled to run on BOINC (w/ the BOINC library). For each app, the library includes app versions for various platforms, CPU features, and GPUs. Each app version has an associated plan class specification. One of the apps is the VBox wrapper.

These apps are viewed as "secure": running them on a computer doesn't pose a security risk, regardless of the input files and cmdline parameters, even if the job was created by a malevolent hacker. That means we have to be careful about what we put in the library; we need to build the versions ourselves or vet the people who build them. And the apps themselves must not have - for example - the ability to run scripts in input files.

The app library exports a list of the app versions and their hashes. The BOINC client imports this list, so it can know if an app version is from the BOINC library.

In the BOINC client, an attachment to a project can be marked "restricted", in which case the client will only run apps for that project that are from the app library.
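As a rough sketch of how the client-side check could work: the library publishes a set of hashes, and a restricted attachment only runs files whose hash is in that set. The hash algorithm (SHA-256), the manifest layout, and the function names below are all assumptions made for illustration; none of them is an existing BOINC interface.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of an app version file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(app_versions):
    """Build the exported list as a set of hashes.

    app_versions: iterable of (app_name, version, platform, file_bytes).
    This tuple layout is hypothetical.
    """
    return {sha256_hex(blob) for (_name, _version, _platform, blob) in app_versions}


def is_library_app(file_bytes: bytes, manifest) -> bool:
    """True if the file's hash appears in the imported library list."""
    return sha256_hex(file_bytes) in manifest


def may_run(file_bytes: bytes, manifest, restricted: bool) -> bool:
    """Restricted attachments run only library apps; others run anything."""
    return (not restricted) or is_library_app(file_bytes, manifest)
```

A real design would also need the list itself to be signed, so that a project can't supply its own manifest.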

Notes:

  1. maintaining this library could be a lot of work!
  2. the library could be useful for other purposes; e.g. we could bundle Android app versions with the BOINC Android client

Second, we create a "Demo grid": a set of computers willing to run jobs for anyone, in restricted mode. Could be volunteers, or cluster nodes somewhere, or Amazon spot instances. The BOINC client running on these nodes is attached to an account manager which lets us dynamically attach them to projects. This may as well be an enhanced version of Science United.

Third, we create a BOINC project that I'll call BOINC Central (the name doesn't matter; no one sees it). Its job is to dispatch jobs for users who don't or can't run their own BOINC server. It has all the apps in the library, and all versions, with the plan classes set up (these are the only app versions it has).

Finally, we use Science United as a "switchboard" for dynamically attaching hosts to projects. It knows which hosts are part of the Demo grid. For each project, it knows whether it is

  • unvetted
  • vetted (partial or full; see below)

This info is used in deciding what projects to attach each host to.

Test-drive scenarios

Unvetted/central

    goal: quickly run batches of jobs on computers you don't own
    User experience:
    - create an account on BOINC Central, Recaptcha, verify email address
    2 variants:
    1) Command line interface (Condor-like)
        install a package
        make a "submit file" that specifies a batch of jobs
            - app
            - input files
            - cmdline params
            - possible resource usage estimates
        run "boinc_submit"
        other cmdline commands to
            - wait for completion of batch
                (or email notification)
            - show pending jobs (condor_q)
            - abort jobs
            - get resource usage of completed jobs
                (for use in later submissions)
            - get output files of completed jobs
    2) Web interface: go to BOINC Central
        pick an application
        specify (through a web interface) a set of cmdline args
        and/or a range of input files
        click submit
        email notification option
        web interfaces for showing status, aborting
        download output files as zip

    How to implement
    - Use BOINC Central for dispatching jobs
        use existing job-submission and file-management RPCs
    - Use the Demo grid;
        SU attaches all Demo nodes to BOINC Central
        (in restricted mode, though apps coming from there are secure).

    There are limits on
        - how much computing you get per week
        - size of input/output files

    possible variant:
        - you can pay to get more computing

    This is similar to Open Science Grid but
        - no vetting of job submitters.
        - has the BOINC "polymorphic app" concept

This is the "test drive" experience. It gives anyone - scientist or not - sporadic access to a few hundred computers. This may be all that some scientists need.
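To make the "submit file" idea from the command-line variant concrete, here is a minimal sketch of what a batch might look like once parsed: one app, and one job per input file, each with a command line and an optional resource estimate. The class names, the `{input}` placeholder, and the `--in` flag are hypothetical; the actual submit file format is not yet defined.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    app: str
    input_files: List[str]
    cmdline: str
    est_flops: Optional[float] = None  # optional resource usage estimate


@dataclass
class Batch:
    app: str
    jobs: List[Job] = field(default_factory=list)


def make_batch(app, input_files, cmdline_template, est_flops=None):
    """Create one job per input file.

    "{input}" in cmdline_template is replaced with the input file name.
    """
    batch = Batch(app=app)
    for name in input_files:
        batch.jobs.append(
            Job(
                app=app,
                input_files=[name],
                cmdline=cmdline_template.format(input=name),
                est_flops=est_flops,
            )
        )
    return batch


if __name__ == "__main__":
    # Hypothetical app name, file names, and flag.
    b = make_batch("autodock", ["lig1.dat", "lig2.dat"], "--in {input}")
    for job in b.jobs:
        print(job.app, job.cmdline)
```

A hypothetical "boinc_submit" command would turn a structure like this into calls to the existing job-submission RPCs.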

One of the apps in the library is the VBox wrapper, so you can bring your own apps, but they have to run in VMs. Use boinc2docker (and TACC's extensions) to automate converting any Linux/Intel app to a Docker image. We could also develop tools for managing a set of these images. (My earlier "tire-kicking" Google doc describes this.)

Notes:

  • no result validation is done; Demo grid nodes are assumed to be reliable.
  • you don't have to specify job sizes (CPU, RAM, disk). We could have a system that estimates these for you, based on past jobs
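One simple way to estimate job sizes from past jobs, sketched below, is to take the largest usage seen so far for each resource and pad it with a safety factor. The resource names, the factor of 1.5, and the use of the maximum (rather than, say, a percentile) are assumptions chosen for illustration.

```python
def estimate_usage(past_usages, safety_factor=1.5, default=None):
    """Estimate per-job resource needs from completed jobs.

    past_usages: list of dicts, all with the same keys,
        e.g. {"cpu_sec": ..., "ram_mb": ...} (hypothetical key names).
    Returns a dict with max observed value times safety_factor per key,
    or `default` if there is no history yet.
    """
    if not past_usages:
        return default
    keys = past_usages[0].keys()
    return {k: max(u[k] for u in past_usages) * safety_factor for k in keys}
```

With no history, the server would have to fall back on a conservative default for the first few jobs of a new user or app.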

Unvetted/distributed

    Similar, but user has their own BOINC server;
        avoids storage and bandwidth bottleneck of central server
        Also lets you attach your own computers directly.
    - get a Linux machine visible on Internet
        could be Cloud node
    - install BOINC server on that machine and create a project
        could be from a package
        could be BOINC server Docker
        could be from a VM image
    - BOINC server is a black box to user
    - run commands to install apps from library
    - submit jobs through same cmdline or web interface
    - register your BOINC server with SU
        no vetting
        server is registered with SU as "unvetted project"

    Implementation
        Uses Demo grid hosts
        Science United attaches Demo grid hosts to unvetted projects in restricted mode

Vetting: 
    partially vetted: we believe that
        - your identity and affiliation are true
        - you're doing the kind of computing you claim
            (science area, location)
        This gives you access to more computing but you can only use library apps
    fully vetted: partial vetting plus
        - we believe that your apps are not malware
        - we believe that you do code signing
        This lets you use your own non-VM apps

Partially vetted
    You can use either the central or distributed model.
    Your apps run on all Science United hosts (currently about 5,000).

Fully vetted
    Use with distributed model (your own server)
    You can add your own apps and app versions.
        May as well use the current BOINC tools for this;
        requires logging in to your project server,
        code-signing, maybe writing XML plan class specs
    Your project is registered on Science United,
        and it's attached to hosts based on science area
        and computing resources (that's how SU currently works)
    Your apps run on all Science United hosts in trusted mode
    Your project is listed on the BOINC web site,
        and in the project list in the client GUI,
        so volunteers can attach to it explicitly.

Notes:
- result validation becomes an issue,
    mostly because of possible credit cheating.
    Need to figure out how to do this in a way that doesn't require
    users to write validators.

    Or get rid of credit
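The attachment rules from the scenarios above can be summarized in a small decision function. This is a sketch only: the names are hypothetical, it assumes partially vetted projects are attached in restricted mode (since they can only use library apps), and it leaves out the science-area and resource matching that SU also does.

```python
from enum import Enum


class Vetting(Enum):
    UNVETTED = "unvetted"
    PARTIAL = "partial"
    FULL = "full"


def attachment_mode(vetting, host_in_demo_grid):
    """Decide how the switchboard attaches a host to a project.

    Returns None (don't attach), "restricted", or "trusted".
    """
    if vetting is Vetting.UNVETTED:
        # Unvetted projects only get Demo grid hosts, in restricted mode.
        return "restricted" if host_in_demo_grid else None
    if vetting is Vetting.PARTIAL:
        # All SU hosts, but library apps only (assumed to mean restricted mode).
        return "restricted"
    # Fully vetted: all SU hosts, in trusted mode.
    return "trusted"
```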

How hard is this to implement?

Things I can do:
    BOINC library framework
    BOINC Central
    Changes to SU
    Changes to BOINC client

Things I'd need help with:
    Job submission interfaces

Things others would have to do:
    build app versions for BOINC library