
fix writing of decimal sample frequency #159

Merged: 55 commits, Mar 3, 2023
Conversation

@skjerns (Collaborator) commented Jan 18, 2022

In this PR I'm fixing some mistakes that were made before, and including some clarifications w.r.t. sample_frequency and sample_rate.

Summary:

  • sample_frequency is now the main term being used; it denotes the sampling rate in Hz (samples per second)
  • we hide/discourage the use of sample_rate. To prevent confusion, for now we pretend towards the user that sample_frequency and sample_rate are the same term.
  • the use of record_duration was entirely wrong; this is fixed now.

Terms:
smp_per_record: how many samples are stored in each data record.
record_duration: the time window of one record, in seconds. If all sampling frequencies are integers, this can simply be 1.
sample_frequency: calculated as smp_per_record / record_duration. When writing a file we have to do the reverse and find a record_duration that can actually represent the desired sample_frequency.
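The relation between these three terms, sketched with hypothetical example values:

```python
# Hypothetical header values illustrating the terms above.
smp_per_record = 128     # int: samples stored in each data record
record_duration = 2.0    # float: seconds spanned by one data record
sample_frequency = smp_per_record / record_duration
print(sample_frequency)  # 64.0 (Hz)
```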

Problem:
Sampling frequency (samples per second, in Hz) is not stored explicitly in EDF. It is calculated as smp_per_record (samples, int) / record_duration (seconds, float). Previously it was just assumed that sample_rate is equivalent to sample_frequency, which is not the case: sample_rate is taken directly from smp_per_record, and when record_duration is not equal to 1, that is not the same as the sampling frequency.
Similarly, if we want to save a file with a decimal sampling frequency, e.g. 0.5 Hz, we need to make the equation 0.5 = smp_per_record / record_duration work such that smp_per_record is an int. That means we need to find a record_duration that yields an integer smp_per_record for all channels in the EDF. I posted a simplified solution that should work in most cases (just try record_duration from 1 s to 60 s in 1 s steps); we can think of a more mathematical solution later.
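A sketch of that simplified search (the function name and the use of fractions.Fraction are my own, not the PR's actual code):

```python
from fractions import Fraction

def find_record_duration(sample_frequencies, max_seconds=60):
    """Return the smallest whole-second record_duration (1..max_seconds)
    for which fs * record_duration is an integer for every channel,
    i.e. every channel gets an integer smp_per_record."""
    for duration in range(1, max_seconds + 1):
        if all((Fraction(str(fs)) * duration).denominator == 1
               for fs in sample_frequencies):
            return duration
    raise ValueError(f"no record_duration <= {max_seconds} s represents "
                     f"{sample_frequencies} exactly")

# 0.5 Hz needs a 2 s record: smp_per_record = 0.5 * 2 = 1
print(find_record_duration([0.5, 256]))  # 2
```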

Proposition
I think nobody cares how many samples are stored within one record_duration; these are technical details. People want to set a sampling frequency, and do not care how it is realized and stored within the file. Therefore I opt to remove any setting of record_duration.

closes #148 #111

This is a rather large commit, and I would appreciate it if someone took a close look at it @holgern

@codecov codecov bot commented Jan 18, 2022

Codecov Report

Patch coverage: 76.92%; project coverage change: +5.00% 🎉

Comparison is base (b92d2f9) 76.08% compared to head (ff82c93) 81.08%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #159      +/-   ##
==========================================
+ Coverage   76.08%   81.08%   +5.00%     
==========================================
  Files           4        4              
  Lines         828      809      -19     
  Branches      176      170       -6     
==========================================
+ Hits          630      656      +26     
+ Misses        156      112      -44     
+ Partials       42       41       -1     
Flag Coverage Δ
unittests 81.08% <76.92%> (+5.00%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pyedflib/highlevel.py 72.72% <0.00%> (-0.90%) ⬇️
pyedflib/edfwriter.py 86.50% <77.08%> (+4.49%) ⬆️
pyedflib/edfreader.py 80.95% <100.00%> (+13.13%) ⬆️


Comment on lines +50 to +66
nsigs = int(header['n_signals'])
label = [f.read(16).decode() for i in range(nsigs)]
transducer = [f.read(80).decode().strip() for i in range(nsigs)]
dimension = [f.read(8).decode().strip() for i in range(nsigs)]
pmin = [f.read(8).decode() for i in range(nsigs)]
pmax = [f.read(8).decode() for i in range(nsigs)]
dmin = [f.read(8).decode() for i in range(nsigs)]
dmax = [f.read(8).decode() for i in range(nsigs)]
prefilter = [f.read(80).decode().strip() for i in range(nsigs)]
n_samples = [f.read(8).decode() for i in range(nsigs)]
reserved = [f.read(32).decode() for i in range(nsigs)]
_ = zip(label, transducer, dimension, pmin, pmax, dmin, dmax, prefilter, n_samples, reserved)
values = locals().copy()
fields = ['label', 'transducer', 'dimension', 'pmin', 'pmax', 'dmin', 'dmax', 'prefilter', 'n_samples', 'reserved']
sheaders = [{field:values[field][i] for field in fields} for i in range(nsigs)]
print('\n##### Signal Headers')
print(json.dumps(sheaders, indent=2))
skjerns (author):

Unrelated, just ignore; this is only for debugging.

- from ._extensions._pyedflib import set_startdatetime, set_starttime_subsecond, set_samplefrequency, set_physical_minimum, set_label, set_physical_dimension
+ from ._extensions._pyedflib import set_startdatetime, set_starttime_subsecond, set_samples_per_record, set_physical_minimum, set_label, set_physical_dimension
skjerns (author):

Renamed to accurately reflect what the function actually does.

Comment on lines 469 to 470
record_duration : integer
Sets the datarecord duration in units of seconds
skjerns (author):

This was wrong. We pass the record_duration in seconds, not in units of 10 ms (that is how it is saved inside the EDF later, but not how we specify it here).

Comment on lines -423 to -427
block_size : int
set the block size for writing. Should be divisor of signal length
in seconds. Higher values mean faster writing speed, but if it
is not a divisor of the signal duration, it will append zeros.
Can be any value between 1=><=60, -1 will auto-infer the fastest value.
skjerns (author):

The use of block_size was entirely wrong here, my bad. It needs to be removed.

# get annotations, in format [[timepoint, duration, description], [...]]
annotations = header.get('annotations', [])

with pyedflib.EdfWriter(edf_file, n_channels=n_channels, file_type=file_type) as f:
f.setDatarecordDuration(int(100000 * block_size))
skjerns (author):

I completely misunderstood the use of record_duration.

Comment on lines +41 to +48

def tearDown(self):
# small hack to close handles in case of tests throwing an exception
for obj in gc.get_objects():
if isinstance(obj, (EdfWriter, EdfReader)):
obj.close()
del obj

skjerns (author):

If a test fails, it leaves the file open. If the same file is then accessed in a later test, that test fails as well, simply because the file is still open.

Added this to close all files that were left open by a failed test.

Comment on lines -101 to 112
- sample_frequencies = [1000, 800, 500, 975, 999]
+ sample_frequencies = [2000, 1600, 1000, 1950, 1998]

skjerns (author):

These were wrong before. Now they are correct and finally also match what EDFBrowser would display.

Comment on lines -128 to +138
- sample_frequencies = [1000, 800, 500, 975, 999]
+ sample_frequencies = [500, 400, 250, 487.5, 499.5]
skjerns (author):

These were wrong before. Now they are correct and finally also match what EDFBrowser would display.
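Both corrections follow directly from sample_frequency = smp_per_record / record_duration: the same samples-per-record counts map to different frequencies depending on the record duration. A hypothetical illustration, not code from the PR:

```python
# The same smp_per_record values interpreted under different record durations.
smp_per_record = [1000, 800, 500, 975, 999]

# record_duration = 0.5 s -> frequencies double
print([n / 0.5 for n in smp_per_record])  # [2000.0, 1600.0, 1000.0, 1950.0, 1998.0]

# record_duration = 2 s -> frequencies halve
print([n / 2 for n in smp_per_record])    # [500.0, 400.0, 250.0, 487.5, 499.5]
```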

@@ -156,11 +166,14 @@ def test_fortran_write(self):


def test_read_write_decimal_sample_frequencies(self):
- signals = np.random.randint(-2048, 2048, [3, 256*60])
+ # first test with digital signals
+ signals = np.random.randint(-2048, 2048, [3, 256*60+8])
skjerns (author):

+8, as otherwise the file gets padded with zeros.

Remove commented code and merged from develop

skjerns commented Mar 2, 2023

I changed @master to @v3 and @v4 in the GitHub Actions workflows, as generally suggested in actions/upload-artifact#41 (not specifically for the Python setup action, but it is probably generally better to use a released version instead of the master/main branch).

Also, Python 3.6 is removed from testing, as it's no longer available.


skjerns commented Mar 3, 2023

I went once more through all the changes and added some more comments. I think all changes are okay, and it's long overdue to merge this PR.

Thanks to @gcathelain for reviewing!

I'll merge it soon.

@skjerns skjerns merged commit f094d69 into master Mar 3, 2023
Successfully merging this pull request may close these issues.

"sample_frequency" is a misleading name for the number of values in a signal
5 participants