Country include description out of sync? #76

Open
corneliusroemer opened this issue Nov 29, 2023 · 2 comments · May be fixed by #77
Labels
bug Something isn't working

Comments

@corneliusroemer
Member

Current Behavior

The website currently states for clade frequencies:

Only locations with more than 100 sequences from samples collected in the previous 150 days are included.

image

We show the following countries:

  • Australia
  • Belgium
  • Canada
  • China
  • Denmark
  • Finland
  • France
  • Germany
  • Iceland
  • Ireland
  • Italy
  • Japan
  • Netherlands
  • Singapore
  • South Korea
  • Spain
  • Switzerland
  • Sweden
  • USA
  • UK

This doesn't seem to be correct, or is at least missing important context: when I look up countries with more than 100 sequences with a collection date less than 150 days ago on covSpectrum (https://cov-spectrum.org/explore/World/AllSamples/from%3D2023-07-02%26to%3D2023-11-22/variants/international-comparison?&), I get the following countries:

Country | Total Variant Sequences | First seq. found at | Last seq. found at
United States | 99114 | 2023-26 | 2023-47
Canada | 33981 | 2023-26 | 2023-46
United Kingdom | 22897 | 2023-26 | 2023-46
Japan | 22771 | 2023-26 | 2023-45
South Korea | 18858 | 2023-26 | 2023-45
France | 17394 | 2023-26 | 2023-46
Spain | 14246 | 2023-26 | 2023-46
China | 13271 | 2023-26 | 2023-46
Australia | 7386 | 2023-26 | 2023-46
Sweden | 6758 | 2023-26 | 2023-47
Italy | 5333 | 2023-26 | 2023-47
Denmark | 4696 | 2023-27 | 2023-46
Singapore | 4517 | 2023-26 | 2023-44
Germany | 3514 | 2023-26 | 2023-46
Netherlands | 3139 | 2023-26 | 2023-46
Belgium | 3077 | 2023-26 | 2023-47
Brazil | 2781 | 2023-26 | 2023-45
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Ireland | 2343 | 2023-26 | 2023-47
Russia | 1963 | 2023-26 | 2023-44
Switzerland | 1916 | 2023-27 | 2023-46
Finland | 1668 | 2023-26 | 2023-45
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Iceland | 676 | 2023-27 | 2023-46
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Expected behavior

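Going by the stated threshold, these countries from the covSpectrum list should also be shown, but currently are not:

Country | Total Variant Sequences | First seq. found at | Last seq. found at
Brazil | 2781 | 2023-26 | 2023-45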
New Zealand | 2668 | 2023-26 | 2023-43
Israel | 2617 | 2023-26 | 2023-45
Greece | 2469 | 2023-27 | 2023-40
Russia | 1963 | 2023-26 | 2023-44
Austria | 1411 | 2023-27 | 2023-46
Peru | 1254 | 2023-26 | 2023-43
Luxembourg | 1213 | 2023-27 | 2023-43
Portugal | 1198 | 2023-27 | 2023-45
Mexico | 1074 | 2023-26 | 2023-42
Croatia | 858 | 2023-27 | 2023-43
Chile | 787 | 2023-27 | 2023-43
Thailand | 773 | 2023-26 | 2023-43
Slovenia | 752 | 2023-26 | 2023-42
Colombia | 653 | 2023-26 | 2023-43
Ukraine | 652 | 2023-27 | 2023-44
Taiwan | 581 | 2023-26 | 2023-45
South Africa | 493 | 2023-27 | 2023-41
Turkey | 465 | 2023-28 | 2023-40
Poland | 459 | 2023-28 | 2023-45
Norway | 441 | 2023-26 | 2023-44
Romania | 364 | 2023-27 | 2023-40
Argentina | 359 | 2023-26 | 2023-38
Malaysia | 359 | 2023-26 | 2023-43
Costa Rica | 341 | 2023-26 | 2023-43
Guatemala | 321 | 2023-27 | 2023-40
India | 285 | 2023-26 | 2023-44
Georgia | 272 | 2023-27 | 2023-40
Mauritius | 270 | 2023-27 | 2023-44
Bulgaria | 254 | 2023-27 | 2023-43
Dominican Republic | 200 | 2023-27 | 2023-35

Notably, we include Iceland with only ~700 sequences but exclude Brazil with ~2,800.
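
The mismatch can be reproduced with a plain set difference. This is only a minimal sketch using a handful of rows from the covSpectrum table; the variable and function names are illustrative and not part of the project. It also assumes that names need normalising, since the website lists "USA" and "UK" where covSpectrum uses "United States" and "United Kingdom".

```python
# Countries currently shown on the website (subset of the list in this issue).
shown = {"Australia", "Canada", "Iceland", "USA", "UK"}

# covSpectrum counts for 2023-07-02 to 2023-11-22 (subset of the table in this issue).
covspectrum_counts = {
    "United States": 99114,
    "Canada": 33981,
    "United Kingdom": 22897,
    "Australia": 7386,
    "Brazil": 2781,
    "New Zealand": 2668,
    "Iceland": 676,
}

# The two sources spell some country names differently.
ALIASES = {"United States": "USA", "United Kingdom": "UK"}

MIN_SEQ = 100  # threshold as stated on the website ("more than 100")


def missing_countries(counts, shown, min_seq=MIN_SEQ):
    """Return countries that pass the threshold but are not shown, largest first."""
    passing = {ALIASES.get(name, name): n for name, n in counts.items() if n > min_seq}
    return sorted((c for c in passing if c not in shown), key=lambda c: -passing[c])


print(missing_countries(covspectrum_counts, shown))  # ['Brazil', 'New Zealand']
```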

@corneliusroemer corneliusroemer added the bug Something isn't working label Nov 29, 2023
@corneliusroemer
Member Author

I think the text is wrong, as the config shows:

        location_min_seq: 100
        location_min_seq_days: 30

So in reality, a location needs 100 sequences collected within the last 30 days to be included. I think it would be good to relax this. Recency is not the most important criterion: some countries just don't have recent data, and that doesn't mean they shouldn't be included if their data is slightly more delayed. So I think location_min_seq_days should be increased to at least something like 60 days.
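
To make the effect of the window concrete, here is a rough sketch of such a filter. It is not the project's actual code: the function name, the record format, the strict `>` comparison (taken from the website's "more than 100" wording), and measuring the window back from today are all assumptions. The example data is made up.

```python
from datetime import date, timedelta

# Values copied from the config quoted in this comment.
LOCATION_MIN_SEQ = 100
LOCATION_MIN_SEQ_DAYS = 30


def included_locations(records, today,
                       min_seq=LOCATION_MIN_SEQ,
                       min_seq_days=LOCATION_MIN_SEQ_DAYS):
    """records: iterable of (location, collection_date) pairs (hypothetical format)."""
    cutoff = today - timedelta(days=min_seq_days)
    counts = {}
    for location, collected in records:
        # Only sequences collected inside the window count towards the threshold.
        if collected >= cutoff:
            counts[location] = counts.get(location, 0) + 1
    return sorted(loc for loc, n in counts.items() if n > min_seq)


# Made-up data: "A" has recent samples, "B" has many samples that are ~7 weeks old.
records = [("A", date(2023, 11, 20))] * 150 + [("B", date(2023, 10, 10))] * 500
today = date(2023, 11, 29)

print(included_locations(records, today))                   # ['A']
print(included_locations(records, today, min_seq_days=60))  # ['A', 'B']
```

With a 30-day window, the location with delayed data drops out even though it has far more sequences overall; a 60-day window keeps it.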

In addition, the website/HTML should pull the description from the config file rather than hard-coding it, so that docs and code stay in sync automatically.
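
As a sketch of that idea, the description string could be built from the two config values instead of being typed into the HTML. The dictionary below is a hypothetical in-memory stand-in for the config file, and `location_description` is an illustrative name; how the site would actually load the config and render the text is not covered here.

```python
# Hypothetical in-memory copy of the two config values quoted in this comment;
# in practice they would be read from the config file itself.
config = {"location_min_seq": 100, "location_min_seq_days": 30}


def location_description(cfg):
    """Build the website text from the config so the two cannot drift apart."""
    return (
        f"Only locations with more than {cfg['location_min_seq']} sequences "
        f"from samples collected in the previous {cfg['location_min_seq_days']} days "
        "are included."
    )


print(location_description(config))
```

Changing `location_min_seq_days` in the config would then change the displayed text without a second manual edit.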

corneliusroemer added a commit that referenced this issue Nov 29, 2023
It would be better to sync these values with the config file, so that they don't
need to be changed in two places and can't get out of sync,
but that seems complicated to do with the current setup.

At least this partially fixes #76.
@corneliusroemer
Member Author

These are the force-excluded countries:

  • Austria
  • Czech Republic
  • Lithuania
  • Luxembourg
  • Slovakia

Not sure why we'd force-exclude Czechia, with ~10M people, but not Iceland, with fewer than 400k.
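
For illustration, a force-exclusion list layered on top of the sequence threshold might look like the sketch below. How the pipeline really applies the list is not shown in this thread, so the function and its return shape are assumptions; only the country names come from the list in this comment.

```python
# Force-excluded countries as listed in this comment.
FORCE_EXCLUDED = {"Austria", "Czech Republic", "Lithuania", "Luxembourg", "Slovakia"}


def apply_force_exclusion(candidates, excluded=FORCE_EXCLUDED):
    """Split candidates that passed the threshold into kept and force-excluded."""
    kept = [c for c in candidates if c not in excluded]
    dropped = [c for c in candidates if c in excluded]
    return kept, dropped


kept, dropped = apply_force_exclusion(["Austria", "Iceland", "Luxembourg", "Switzerland"])
print(kept)     # ['Iceland', 'Switzerland']
print(dropped)  # ['Austria', 'Luxembourg']
```

Returning the dropped names separately makes it easy to mention them in the website description as well, so readers can see which locations were removed by hand rather than by the threshold.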

@corneliusroemer corneliusroemer linked a pull request Nov 29, 2023 that will close this issue