
UNIVERSE-HPC#199 - remove textual references to week-based schedule
steve-crouch committed Feb 6, 2025
1 parent a35bcc3 commit df9d777
Showing 9 changed files with 17 additions and 16 deletions.
@@ -252,7 +252,7 @@ It’s not hard to imagine that all of these may record and store their measurem

### Execution

-Quite often, especially on a large machines, once the simulation has been started it runs until the end or until a certain, significant point in a calculation (we call these checkpoints) has been reached, and only then the output is produced. We have mentioned the batch submission procedure in earlier weeks, but just to reiterate - once an user submits their executable along with required input files to the submission queue, the job gets scheduled by a job scheduler, and some time later it runs and generates its output. In other words, you do not really see what is happening in the simulation and cannot interact with it.
+Quite often, especially on large machines, once the simulation has been started it runs until the end, or until a certain significant point in the calculation (we call these checkpoints) has been reached, and only then is the output produced. With batch systems, once a user submits their executable along with the required input files to the submission queue, the job gets scheduled by a job scheduler and, some time later, it runs and generates its output. In other words, you do not really see what is happening in the simulation and cannot interact with it.
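
To make the hands-off workflow concrete, here is a minimal Python sketch of submitting a job, assuming a Slurm-based system and a hypothetical job script `job.sh` that wraps the executable and its input files (other schedulers use different commands):

```python
import subprocess

# Hand the job script to the scheduler's queue (Slurm's sbatch command).
# "job.sh" is a hypothetical script naming the executable and its inputs.
result = subprocess.run(["sbatch", "job.sh"], capture_output=True, text=True)
print(result.stdout)   # e.g. "Submitted batch job 123456"

# The job now waits in the queue and later runs unattended; we only see
# its output files once it has run - there is no interactive view of it.
```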

There are a number of reasons why supercomputing facilities use this approach but the main ones are:

@@ -25,7 +25,7 @@ In this discussion step we would like you to have a look at [ARCHER2’s case st
- is there any social or economic benefit to it?
- what are the main reasons for using supercomputers to study this problem?
- in your opinion, what is the most surprising or interesting aspect of this case study?
-- are there any similarities between the case study you picked and the three case studies we have discussed in this week? Why do you think that is?
+- are there any similarities between the case study you picked and the three case studies we have discussed in this module? Why do you think that is?

---

8 changes: 4 additions & 4 deletions high_performance_computing/parallel_computers/01_basics.md
@@ -17,7 +17,7 @@ attribution:

## Computer Basics

-Before we look at how supercomputers are built, it’s worth recapping what we learned last week about how a standard home computer or laptop works.
+Before we look at how supercomputers are built, it’s worth recapping what we learned previously about how a standard home computer or laptop works.

Things have become slightly more complicated in the past decade, so for a short while let’s pretend we are back in 2005 (notable events from 2005, at least from a UK point of view, include Microsoft founder Bill Gates receiving an honorary knighthood and the BBC relaunching Dr Who after a gap of more than a quarter of a century).

@@ -78,12 +78,12 @@ Think of owning one quad-core laptop compared to two dual-core laptops - which i

## Simple Parallel Calculation

-We can investigate a very simple example of how we might use multiple CPU-cores by returning to the calculation we encountered in the first week: computing the average income of the entire world’s population.
+We can investigate a very simple example of how we might use multiple CPU-cores by returning to the calculation we encountered in the first module: computing the average income of the entire world’s population.

If we’re a bit less ambitious and think about several hundred people rather than several billion, we can imagine that all the individual salaries are already written on the shared whiteboard. Let’s imagine that the whiteboard is just large enough to fit 80 individual salaries. Think about the following (a short code sketch after the list shows one possible division of labour):

- how could four workers cooperate to add up the salaries faster than a single worker?
-- using the estimates of how fast a human is from last week, how long would a single worker take to add up all the salaries?
+- using the estimates of how fast a human is from the previous module, how long would a single worker take to add up all the salaries?
- how long would 4 workers take for the same number of salaries?
- how long would 8 workers take (you can ignore the issue of overcrowding)?
- would you expect to get exactly the same answer as before?
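
Here is a minimal Python sketch of the four-worker sum, assuming 80 made-up salary values (the multiprocessing module stands in for the workers):

```python
from multiprocessing import Pool
import random

# 80 made-up salaries standing in for the whiteboard entries.
salaries = [random.randint(10_000, 80_000) for _ in range(80)]

def partial_sum(chunk):
    # Each worker adds up only its own share of the salaries.
    return sum(chunk)

if __name__ == "__main__":
    # Deal the 80 salaries out to 4 workers, 20 each.
    chunks = [salaries[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)
    # A final serial step combines the four partial sums.
    print(sum(partials) / len(salaries))
```

Because integer addition gives the same result whatever the order, the four workers and a single worker produce identical totals here; with floating-point salaries, the different order of additions could change the final rounding slightly.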
@@ -99,7 +99,7 @@ We’ll revisit this problem in much more detail later but you know enough alrea

We’ve motivated the need for many CPU-cores in terms of the need to build more powerful computers in an era when the CPU-cores themselves aren’t getting any faster. Although this argument makes sense for the world’s largest supercomputers, we now have multicore laptops and mobile phones - why do we need them?

-You might think the answer is obvious: surely two CPU-cores will run my computer program twice as fast as a single CPU-core? Well, it may not be apparent until we cover how to parallelise a calculation next week, but it turns out that this is not the case. It usually requires manual intervention to enable a computer program to take advantage of multiple CPU-cores. Although this is possible to do, it certainly wouldn’t have been the case back in 2005 when multicore CPUs first became commonplace.
+You might think the answer is obvious: surely two CPU-cores will run my computer program twice as fast as a single CPU-core? Well, it may not be apparent until we cover how to parallelise a calculation later, but it turns out that this is not the case. It usually requires manual intervention to enable a computer program to take advantage of multiple CPU-cores. Although this is possible to do, it certainly wouldn’t have been the case back in 2005 when multicore CPUs first became commonplace.

So what is the advantage for a normal user who is not running parallel programs? We call these serial programs.

@@ -79,7 +79,7 @@ To minimise the communication-related costs, try to make as few phone calls as p

## Case study of a real machine

-To help you understand the general concepts we have introduced this week, we’ll now look at a specific supercomputer. I’m going to use the UK National Supercomputer, ARCHER2, as a concrete example. As well as being a machine I’m very familiar with, it has a relatively straightforward construction and is therefore a good illustration of supercomputer hardware in general.
+To help you understand the general concepts we have introduced, we’ll now look at a specific supercomputer. I’m going to use the UK National Supercomputer, ARCHER2, as a concrete example. As well as being a machine I’m very familiar with, it has a relatively straightforward construction and is therefore a good illustration of supercomputer hardware in general.

### General

@@ -155,7 +155,7 @@ The ARCHER2 cabinets are kept cool by pumping water through cooling pipes. The w
5:41 - Here we only have 16 nodes, each of which has four CPU-cores, as opposed to thousands of nodes with tens of CPU-cores in them. And so Wee ARCHIE mirrors a real supercomputer such as ARCHER in almost every way, except for just the speed and the scale.
:::

-Finally, Wee ARCHIE makes its appearance again! This video uses Wee ARCHIE to explain the parallel computer architecture concepts introduced in this week.
+Finally, Wee ARCHIE makes its appearance again! This video uses Wee ARCHIE to explain the parallel computer architecture concepts we've introduced.

It is worth emphasising that the physical distance between the nodes does impact their communication time, i.e. the further apart they are, the longer it takes to send a message between them. Can you think of any reason why this behaviour may be problematic on large machines, and any possible workarounds?

@@ -147,7 +147,7 @@ ARCHER2 has a very high-performance network with the following characteristics:

A 2 microsecond latency corresponds to around 5000 instruction cycles, so even communicating over a network as fast as ARCHER2’s is a substantial overhead. Although the bandwidth is shared between all the CPU-cores on a node, there are thousands of separate network links so, overall, ARCHER2 can in principle transfer many TBytes/second.
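
As a sanity check on that figure, the arithmetic is simply latency multiplied by clock rate; a minimal sketch, assuming a nominal 2.5 GHz clock (the exact CPU-core frequency isn't stated here):

```python
# Cycles spent waiting for one message = latency x clock rate.
latency_s = 2e-6      # 2 microseconds of network latency
clock_hz = 2.5e9      # assumed 2.5 GHz CPU-core clock
print(latency_s * clock_hz)  # 5000.0 instruction cycles
```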

-We will see next week that if we are careful about how we split our calculation up amongst all the CPU-cores we can accommodate these overheads to a certain extent, enabling real programs to run effectively on tens of thousands of cores. Despite this, it is still true that:
+We will see in the next module that if we are careful about how we split our calculation up amongst all the CPU-cores we can accommodate these overheads to a certain extent, enabling real programs to run effectively on tens of thousands of cores. Despite this, it is still true that:

**The maximum useful size of a modern supercomputer is limited by the performance of the network.**

2 changes: 1 addition & 1 deletion high_performance_computing/parallel_computing/01_intro.md
@@ -125,7 +125,7 @@ In this second example, we use the chessboard to look at the traffic congestion.

Do you understand why only one car can move at each iteration?

-This is actually quite important not only in our toy traffic model but also in any kind of computer simulations. The process of transferring continuous functions, equations and models into small distinct bits to be executed in a sequence of steps is called discretisation. This process is one of the initial steps in creating computer simulations. We will talk more about this in Week 4.
+This is actually quite important, not only in our toy traffic model but also in any kind of computer simulation. The process of transferring continuous functions, equations and models into small distinct bits to be executed in a sequence of steps is called discretisation. This process is one of the initial steps in creating computer simulations.
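
As a rough sketch of what discretisation looks like in code, here is a minimal cell-based traffic model in Python, assuming a made-up road layout; it follows the classic "rule 184" cellular automaton, which may differ in detail from the exact update rule used in the videos:

```python
# Road discretised into cells: 1 = car, 0 = empty (made-up layout).
road = [1, 1, 0, 1, 0, 0, 1, 0]

def step(road):
    n = len(road)
    # A cell holds a car next step if its car is blocked by the car in
    # front, or if the car behind moves into it (synchronous update).
    return [1 if (road[i] and road[(i + 1) % n]) or
                 (not road[i] and road[(i - 1) % n]) else 0
            for i in range(n)]

for _ in range(4):   # a few discrete time steps; the road ends wrap around
    print(road)
    road = step(road)
```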

---

2 changes: 1 addition & 1 deletion high_performance_computing/supercomputing/01_intro.md
@@ -80,7 +80,7 @@ To do science in the real world we have to build complicated instruments for eac

However, some problems are actually too large, too distant or too dangerous to study directly: we cannot experiment on the earth’s weather to study climate change, we cannot travel thousands of light years into space to watch two galaxies collide, and we cannot dive into the centre of the sun to measure the nuclear reactions that generate its enormous heat. Fortunately, one supercomputer can run different computer programs that reproduce all of these experiments inside its own memory.

-This gives the modern scientist a powerful new tool to study the real world in a virtual environment. The process of running a virtual experiment is called computer simulation, and we will spend much of week 4 looking at this process in more detail. Compared to disciplines such as chemistry, biology and physics it is a relatively new area of research which has been around for a matter of decades rather than centuries. This new area, which many view as a third pillar of science which extends the two traditional approaches of theory and experiment, is called computational science and its practitioners are computational scientists.
+This gives the modern scientist a powerful new tool to study the real world in a virtual environment. The process of running a virtual experiment is called computer simulation and, compared to disciplines such as chemistry, biology and physics, it is a relatively new area of research which has been around for a matter of decades rather than centuries. This new area, which many view as a third pillar of science extending the two traditional approaches of theory and experiment, is called computational science, and its practitioners are computational scientists.

### Computational Science

@@ -19,15 +19,15 @@ attribution:

So what are supercomputers made of? Are the building components really so different from personal computers? And what determines how fast a supercomputer is?

-In this step, we start to outline the answers to these questions. We will go into a lot more detail next week, but for now we will cover enough of the basics for you to be able to understand the characteristics of the supercomputers in the [Top500](https://www.top500.org/lists/top500/2023/11/) list (the linked list is from November 2023).
+In this step, we start to outline the answers to these questions. We will go into a lot more detail in a future module, but for now we will cover enough of the basics for you to be able to understand the characteristics of the supercomputers in the [Top500](https://www.top500.org/lists/top500/2023/11/) list (the linked list is from November 2023).

It may surprise you to learn that supercomputers are built using the same basic elements that you normally find in your desktop, such as processors, memory and disk. The difference is largely a matter of scale. The reason is quite simple: the cost of developing new hardware is measured in billions of euros, and the market for consumer products is vastly larger than that for supercomputing, so the most advanced technology you can find is actually what you find in general-purpose computers.

When we talk about a processor, we mean the central processing unit (CPU) in a computer, which can be considered the computer’s brain. The CPU carries out the instructions of a computer program. The terms CPU and processor are generally used interchangeably. The slightly confusing thing is that a modern CPU actually contains several independent brains: it is a collection of several separate processing units, so we really need another term to avoid confusion. We will call each independent processing unit a CPU-core - some people just use the term core.

A modern domestic device (e.g. a laptop, mobile phone or iPad) will usually have a few CPU-cores (perhaps two or four), while a supercomputer has tens or hundreds of thousands of CPU-cores. As mentioned before, a supercomputer gets its power from all these CPU-cores working together at the same time - working in parallel. Conversely, the mode of operation you are familiar with from everyday computing, in which a single CPU-core is doing a single computation, is called serial computing.

-Interestingly, the same approach is used for computer graphics - the graphics processor (or GPU) in a home games console will have hundreds of cores. Special-purpose processors like GPUs are now being used to increase the power of supercomputers - in this context they are called accelerators. We will talk more about GPUs in Week 2.
+Interestingly, the same approach is used for computer graphics - the graphics processor (or GPU) in a home games console will have hundreds of cores. Special-purpose processors like GPUs are now being used to increase the power of supercomputers - in this context they are called accelerators.

![Image denoting the more powerful, fewer cores of a CPU versus the smaller, more numerous cores of a GPU](images/large_hero_8408f33c-87f5-4061-aec7-42ef976e83fd.webp)
*A typical CPU has a small number of powerful, general-purpose cores; a GPU has many more specialised cores. © NVIDIA*
@@ -147,7 +147,7 @@ It is the job of the batch scheduler to look at all the jobs in the queue and de

### Compute nodes

-The compute nodes are at the core of the system and the part that we’ve concentrated on for most of this week. They contain the resources to execute user jobs - the thousands of CPU-cores operating in parallel that give a supercomputer its power. They are connected by fast interconnect, so that the communication time between CPU-cores impacts program run times as little as possible.
+The compute nodes are at the core of the system and the part that we’ve concentrated on for most of this module. They contain the resources to execute user jobs - the thousands of CPU-cores operating in parallel that give a supercomputer its power. They are connected by fast interconnect, so that the communication time between CPU-cores impacts program run times as little as possible.

### Storage

@@ -447,7 +447,8 @@ However, you may have heard of ongoing developments that take more unconventiona

- Quantum Computers are built from hardware that is radically different from the mainstream.
- Artificial Intelligence tackles problems in a completely different way from the computer software we run in traditional computational science.
-We will touch on these alternative approaches in some of the final week’s articles. In the meantime, feel free to raise any questions you have about how they relate to supercomputing by commenting in any of the discussion steps.
+
+We will touch on these alternative approaches in some of the final foundational module’s articles. In the meantime, feel free to raise any questions you have about how they relate to supercomputing by commenting in any of the discussion steps.

---

@@ -119,7 +119,7 @@ Every two years the number of transistors that can fit onto an integrated circui

Every two years the number of CPU-cores in a processor now doubles.

-In this video David uses the income calculation example to illustrate what is the impact of the first point in practice. We will return to the second point next week.
+In this video David uses the income calculation example to illustrate the impact of the first point in practice.
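
For a feel of how quickly this doubling compounds, here is a minimal sketch (the starting count is arbitrary): ten doublings over twenty years already gives a factor of 2**10 = 1024.

```python
# Doubling every two years: ten doublings in twenty years = x1024.
transistors = 1_000_000   # arbitrary starting count
for year in range(0, 21, 2):
    print(f"year {year:2d}: {transistors:,} transistors")
    transistors *= 2
```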

---

Expand Down
