ethereum · hwwhww · Jun 27, 2024 · May 28, 2024
diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md
@@ -22,6 +22,7 @@
     - [`compute_extended_matrix`](#compute_extended_matrix)
     - [`recover_matrix`](#recover_matrix)
     - [`get_data_column_sidecars`](#get_data_column_sidecars)
+    - [`get_extended_sample_count`](#get_extended_sample_count)
 - [Custody](#custody)
   - [Custody requirement](#custody-requirement)
   - [Public, deterministic selection](#public-deterministic-selection)
@@ -30,6 +31,8 @@
 - [Column gossip](#column-gossip)
   - [Parameters](#parameters)
 - [Peer sampling](#peer-sampling)
+  - [Sample selection](#sample-selection)
+  - [Sample queries](#sample-queries)
 - [Peer scoring](#peer-scoring)
 - [Reconstruction and cross-seeding](#reconstruction-and-cross-seeding)
 - [DAS providers](#das-providers)
@@ -197,6 +200,34 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock,
     return sidecars
 ```
 
+#### `get_extended_sample_count`
+
+```python
+def get_extended_sample_count(samples_per_slot: uint64, allowed_failures: uint64) -> uint64:
+    assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2
+
+    def math_comb(n, k):
+        if not 0 <= k <= n:
+            return 0
+        r = 1
+        for i in range(min(k, n - k)):
+            r = r * (n - i) // (i + 1)
+        return r
+
+    def hypergeom_cdf(k, M, n, N):
+        return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N)
+                    for i in range(k + 1)])
+
+    worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1
+    false_positive_threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS,
+                                             worst_case_missing, samples_per_slot)
+    for sample_count in range(samples_per_slot, NUMBER_OF_COLUMNS + 1):
+        if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS,
+                         worst_case_missing, sample_count) <= false_positive_threshold:
+            break
+    return sample_count
+```
+
 ## Custody
 
 ### Custody requirement
@@ -239,7 +270,29 @@ To custody a particular column, a node joins the respective gossip subnet. Verif
 
 ## Peer sampling
 
-A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. At each slot, a node makes `SAMPLES_PER_SLOT` queries for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) to request from. If a node has enough good/honest peers across all rows and columns, this has a high chance of success.
+### Sample selection
+
+At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns.
+
+Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some of them to be missing, provided it respects the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. A node can use the `get_extended_sample_count(samples_per_slot, allowed_failures) -> sample_count` helper function to determine the sample count for any chosen number of allowed failures. Sampling is then considered successful if at least `sample_count - allowed_failures` of the selected columns are retrieved successfully.
+
+For reference, the table below shows the number of samples and the number of allowed missing columns assuming `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16`.
+
+| Allowed missing | 0| 1| 2| 3| 4| 5| 6| 7| 8|
+|-----------------|--|--|--|--|--|--|--|--|--|
+| Sample count    |16|20|24|27|29|32|35|37|40|
+
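Annotation (not part of the diff): the table above can be reproduced with a standalone copy of the helper. This is a minimal sketch that uses plain `int` in place of `uint64` and hard-codes `NUMBER_OF_COLUMNS = 128` as an assumption taken from the table's caption. The logic otherwise follows `get_extended_sample_count` as added in this change.

```python
NUMBER_OF_COLUMNS = 128  # assumption: the value stated for the reference table


def math_comb(n: int, k: int) -> int:
    # Binomial coefficient; 0 when k is out of range, as in the spec helper.
    if not 0 <= k <= n:
        return 0
    r = 1
    for i in range(min(k, n - k)):
        r = r * (n - i) // (i + 1)
    return r


def hypergeom_cdf(k: int, M: int, n: int, N: int) -> float:
    # P(X <= k), where X counts the missing columns hit when drawing N samples
    # out of M columns, n of which are missing.
    return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N)
                for i in range(k + 1)])


def get_extended_sample_count(samples_per_slot: int, allowed_failures: int) -> int:
    assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2
    # Smallest number of missing columns that makes the data unrecoverable.
    worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1
    # Probability that plain sampling (no failures allowed) misses every gap.
    false_positive_threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS,
                                             worst_case_missing, samples_per_slot)
    for sample_count in range(samples_per_slot, NUMBER_OF_COLUMNS + 1):
        if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS,
                         worst_case_missing, sample_count) <= false_positive_threshold:
            break
    return sample_count


# Sample count for each number of allowed missing columns, 0 through 8.
table = {failures: get_extended_sample_count(16, failures) for failures in range(9)}
print(table)
```

With `SAMPLES_PER_SLOT = 16`, the resulting counts should match the "Sample count" row of the table.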
+### Sample queries
+
+A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries.
+
+A node SHOULD query for samples from selected peers via a `DataColumnSidecarsByRoot` request. A node uses the `get_custody_columns` helper to determine which peer(s) it could request each column from, producing a list of candidate peers for each selected column.
+
+If more than one candidate peer is found for a given column, a node SHOULD randomize its peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). If possible, it is also recommended to avoid requesting many columns from the same peer in order to avoid relying on and exposing the sample selection to a single peer.
+
+If a node already has a column because of custody, it is not required to send out queries for that column.
+
+If a node has enough good/honest peers across all columns, and the data is being made available, the above procedure has a high chance of success.
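Annotation (not part of the diff): one way to read the selection and query rules above, as a rough sketch. The names `select_sample_columns`, `assign_queries`, `sampling_successful`, and the `custody_by_peer` mapping are hypothetical and do not appear in the spec. In practice the per-peer custody sets would come from `get_custody_columns`, and a client would likely draw from a secure source such as `random.SystemRandom()` rather than a seeded generator.

```python
import random
from typing import Dict, List, Optional, Set


def select_sample_columns(number_of_columns: int, sample_count: int,
                          rng: random.Random) -> List[int]:
    # Uniform random selection without replacement, based on local randomness.
    return sorted(rng.sample(range(number_of_columns), sample_count))


def assign_queries(selected: List[int],
                   custody_by_peer: Dict[str, Set[int]],
                   local_custody: Set[int],
                   rng: random.Random) -> Dict[int, Optional[str]]:
    # Map each selected column to one peer, or None if no candidate is known.
    load = {peer: 0 for peer in custody_by_peer}
    plan: Dict[int, Optional[str]] = {}
    for column in selected:
        if column in local_custody:
            continue  # already held through custody, no query needed
        candidates = [p for p, cols in custody_by_peer.items() if column in cols]
        if not candidates:
            plan[column] = None
            continue
        # Randomize among the least-loaded candidates so that no single peer
        # serves, or learns, more of the sample selection than necessary.
        least = min(load[p] for p in candidates)
        peer = rng.choice([p for p in candidates if load[p] == least])
        load[peer] += 1
        plan[column] = peer
    return plan


def sampling_successful(retrieved_count: int, sample_count: int,
                        allowed_failures: int) -> bool:
    # Success criterion for the extended sampling variant.
    return retrieved_count >= sample_count - allowed_failures


rng = random.Random(1)  # seeded only to keep the example reproducible
columns = select_sample_columns(128, 16, rng)
custody = {"peerA": set(range(0, 64)), "peerB": set(range(32, 128))}
plan = assign_queries(columns, custody, local_custody={columns[0]}, rng=rng)
```

Picking among the least-loaded candidates is only one possible policy. Weighted selection by peer score or a score cut-off, as the text mentions, would slot into the same place.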
 
 ## Peer scoring