Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process dir #208

Merged
merged 18 commits into from
Jan 3, 2025
Merged

Process dir #208

merged 18 commits into from
Jan 3, 2025

Conversation

ZetrextJG
Copy link
Collaborator

Implements:

  • process_file - process a single file E2E
  • process_dir - process a dir files E2E

Copy link

codecov bot commented Dec 9, 2024

Codecov Report

Attention: Patch coverage is 91.97531% with 13 lines in your changes missing coverage. Please review.

Project coverage is 88.58%. Comparing base (8e4bac4) to head (0343a71).
Report is 7 commits behind head on dev.

Files with missing lines Patch % Lines
R/process-dir.R 91.97% 11 Missing ⚠️
R/process-file.R 85.71% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #208      +/-   ##
==========================================
+ Coverage   88.19%   88.58%   +0.38%     
==========================================
  Files          14       16       +2     
  Lines        1720     1874     +154     
==========================================
+ Hits         1517     1660     +143     
- Misses        203      214      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ZetrextJG ZetrextJG force-pushed the process_dir branch 2 times, most recently from fa23910 to f3e1935 Compare December 10, 2024 14:18
@ZetrextJG ZetrextJG marked this pull request as ready for review December 10, 2024 23:38
@ZetrextJG ZetrextJG requested a review from Fersoil December 10, 2024 23:40
Copy link
Collaborator

@Fersoil Fersoil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor changes required. In general I would also add a note about process_file to the general vignette and tell the difference between different types of this process_... functions.

R/process-dir.R Outdated Show resolved Hide resolved
R/process-dir.R Outdated Show resolved Hide resolved
R/process-dir.R Outdated Show resolved Hide resolved
R/process-dir.R Show resolved Hide resolved
R/process-dir.R Show resolved Hide resolved
R/process-dir.R Show resolved Hide resolved
R/process-file.R Show resolved Hide resolved
@@ -0,0 +1,53 @@
#' @title
#' Process a file to generate normalised data and reports
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a little more description please

R/process-file.R Outdated Show resolved Hide resolved
R/process-file.R Outdated Show resolved Hide resolved
R/process-dir.R Outdated Show resolved Hide resolved
R/process-dir.R Outdated
#'
#' @keywords internal
#'
is_mba_data_file <- function(filepath) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to run this on data from One drive and i get No files found in the input directory. Its because files there are named similar to IG7738~1.CSV
image

for this data layout is called: layout_S4_P5.xlsx

I think patterns set here are to restrective and will result in high ratio of no-match cases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue still persists. The format pattern that you require for the filenames is not present in our files. I guess the function is_mba_data_file should be less restrictive.

@nizwant
Copy link
Collaborator

nizwant commented Dec 21, 2024

Another interesting thing I found is that there is a case where for each plate there is different layout. For sake of convenience here is the link to the data I'm talking about: OneDrive

@nizwant
Copy link
Collaborator

nizwant commented Dec 21, 2024

I think it would be nice if list of plates returned if parametr return_plates = TRUE was sorted by date of conducting experiment - plate_datetime

This was referenced Dec 21, 2024
@ZetrextJG
Copy link
Collaborator Author

I have sorted the outputs of the process_dir function with the flag return_plates set to TRUE by date from oldest to newest.

As for the layout and input file naming convention I think we have to set some kind of standard for those files if we want to process those files accordingly without any misunderstandings

@ZetrextJG ZetrextJG requested a review from Fersoil December 30, 2024 23:21
@ZetrextJG ZetrextJG requested a review from nizwant December 30, 2024 23:21
Copy link
Collaborator

@Fersoil Fersoil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem I have noticed is too restrictive validation in is_mba_data_file, the rest looks good

R/process-dir.R Outdated
#'
#' @keywords internal
#'
is_mba_data_file <- function(filepath) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue still persists. The format pattern that you require for the filenames is not present in our files. I guess the function is_mba_data_file should be less restrictive.

@nizwant
Copy link
Collaborator

nizwant commented Jan 2, 2025

The only problem I have noticed is too restrictive validation in is_mba_data_file, the rest looks good

There is also the issue of multiple layouts for one directory, i.e., layout per plate, but I would leave it as is and discuss it at the next meeting with Michael. He will tell us how they want to work with the package, and then we can introduce changes.

@ZetrextJG
Copy link
Collaborator Author

ZetrextJG commented Jan 2, 2025

There is also the issue of multiple layouts for one directory, i.e., layout per plate

The layout per plate is supported. It will search for a layout file called {plate_csv_name}_layout.(xlsx|csv). This will activate if no layout is provided globally.

The format pattern that you require for the filenames is not present in our files

I will adopt the function so that if the format is globally provided then the pattern matching will be less restrictive

@Fersoil Fersoil self-requested a review January 3, 2025 09:33
Copy link
Collaborator

@Fersoil Fersoil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Fersoil Fersoil merged commit 35d1b27 into dev Jan 3, 2025
9 checks passed
@Fersoil Fersoil deleted the process_dir branch January 3, 2025 09:35
nizwant pushed a commit that referenced this pull request Jan 3, 2025
* Always return the absolute path for output path

* Add process file function

* Move path checking to helpers

* Add mock data from multiplate

* Implement process dir

* Normalization to normalisation

* Document process_dir function

* Fix typo

* removed typos, added commas

* fixed typos, renamed from mbr to mba

* Apply suggestions from code review

Fix typos

Co-authored-by: Tymoteusz Kwieciński <31191783+Fersoil@users.noreply.github.com>

* Create a global src config

* Add more documnetation and run check

* Add another example for process file

* Sort returned plates by plate datetime

* Always print if no plates found

* Less restrictive is_mba_data_file and more output to the user

* Improve test coverage

---------

Co-authored-by: Tymoteusz Kwieciński <31191783+Fersoil@users.noreply.github.com>
Co-authored-by: Fersoil <Fersoil>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants