Data Challenge #65
Replies: 20 comments 34 replies
-
Standardizing my data management practices and sticking to them once I have designed them and started working with them. So, in a way, consistency.
-
My main challenge is building a data management framework that works for me while also being consistent with what others in the same lab use.
-
During my work, we occasionally receive sensitive data from oil companies that they consider highly confidential, which prohibits us from sharing it elsewhere.
-
My biggest challenge related to data management is learning more modern tools than Excel, such as Python or R.
-
My main challenge is keeping all collected data organized and understandable (for me and for others). I often like to work and try things quickly, but later it is hard to remember all the details of the experiments I tried, so I need to keep good notes and descriptions of my work. However, organizing all the data I generate in a way that is sustainable long-term can be tricky.
-
It is very important for me to be able to trace all the inputs and outputs of the simulations I perform, and to keep track of the scenarios analyzed in an efficient way. It is challenging to structure a workflow that allows me to keep track of the correct inputs and outputs for each simulation, and I need to be disciplined and document all the changes. This would make my life much easier down the road, when I need to access results from months earlier and trace them efficiently to their source.
-
I will create a lot of different data, including lots of image files. Keeping the different files for the different samples together will be a challenge.
-
Keeping experimental data organised and easy to retrieve, combine and analyse. We collect a lot of multi-dimensional parameter sweeps, e.g. voltage output vs some frequency or some bias level, and the same sweeps are repeated ad hoc over different regions of interest. The data is currently saved in a data folder with a unique id (this is done by our measurement framework), but there's no easy way to retrieve the data by asking, for example: "I want a point cloud of all the measurables for when the setpoints are in these intervals."
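The kind of interval query described above could look roughly like the following. This is a minimal sketch only: it assumes the metadata of each run folder can be flattened into rows of setpoints and measured values. The field names (`run_id`, `frequency_hz`, `bias_v`, `voltage_out`), the example values and the `select_points` helper are all hypothetical and not part of any existing measurement framework.

```python
from typing import Dict, List, Tuple, Union

Row = Dict[str, Union[str, float]]

# Hypothetical flattened index: one row per measured point, tagged with the
# unique id of the data folder it came from. All values are made up.
ROWS: List[Row] = [
    {"run_id": "a01", "frequency_hz": 1.0e3, "bias_v": 0.1, "voltage_out": 0.52},
    {"run_id": "a01", "frequency_hz": 2.0e3, "bias_v": 0.1, "voltage_out": 0.61},
    {"run_id": "b07", "frequency_hz": 1.5e3, "bias_v": 0.3, "voltage_out": 0.75},
    {"run_id": "b07", "frequency_hz": 5.0e3, "bias_v": 0.3, "voltage_out": 0.90},
]


def select_points(
    rows: List[Row], intervals: Dict[str, Tuple[float, float]]
) -> List[Row]:
    """Return the rows whose setpoints all lie inside the given closed intervals."""
    selected = []
    for row in rows:
        # A row is kept only if every requested setpoint is within its interval.
        if all(
            low <= float(row[name]) <= high
            for name, (low, high) in intervals.items()
        ):
            selected.append(row)
    return selected


if __name__ == "__main__":
    # "All measurables where 1 kHz <= frequency <= 2 kHz and 0 V <= bias <= 0.2 V"
    cloud = select_points(
        ROWS, {"frequency_hz": (1.0e3, 2.0e3), "bias_v": (0.0, 0.2)}
    )
    for point in cloud:
        print(point["run_id"], point["frequency_hz"], point["voltage_out"])
```

In practice the rows would have to be built from whatever metadata is stored alongside each folder. A table in a database or a data frame library would support the same filter for larger collections.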
-
A challenge in RDM that I struggle with concerns physical data. When I work in the lab, any deviations from the protocol or unexpected observations have to be logged in my logbook, but as I am usually busy with the steps while this occurs, I log them late or forget (potentially losing important information). I have now implemented a change where I have to write in the logbook even if I had no deviations or observations, and so far that has helped.
-
Working with machine learning models, I need training data for the task I have to perform, which must be stored appropriately for easy retrieval and labelled according to its origin. This is a crucial point that I want to improve compared to what I have done so far.
-
Perhaps the most challenging aspect for me is data management. In the field of experimental physics, we work with a variety of samples, parameters, and conditions. As a result, I have to meticulously label each file with relevant information in the header. Failing to do this can lead to a disorganized dataset that becomes a headache to navigate later on. Additionally, we're generating new data practically every second, so the demands on storage space are substantial. If the computer crashes, it's essentially a disaster for me!
-
I am always scared of losing all my data, especially after hearing horror stories from some of my colleagues. I do have several ways of making backups, but there is still a chance that all of them fail.
-
The main challenge I think is to effectively organize all my created/collected data in a way that is still easy to find and recognizable in the coming few years. I will have data from a lot of different measurement techniques for different parts of the project, and if I don't organize this well in a structured manner, I'm afraid I (and people after me) might lose track of it all.
-
I have a challenge with storing data on many devices. Since I need to collect experimental data and analyze it, I store data on the lab PC, my external drive, my office laptop, and the TUD network drives. This is difficult to manage: for example, keeping the devices synchronized and avoiding overwriting data by mistake.
-
My biggest challenge regarding data management is the flow of data through my image processing pipeline. Several consecutive processing steps are applied to my data, and the output of one step needs to match the input format of the next. Also, I perform some steps on my laptop, some on my Linux PC, and some on the HPC cluster or DelftBlue. I am improving the pipeline along the way, so the old data sometimes no longer fits the format. I spend a lot of time putting the data in the correct format and moving it around.
-
My biggest challenge concerns data analysis. When using equations for calculations, I sometimes use a shortcut method to replace the constant value in the equation. However, when I want to recalculate the numbers some time later, I do not remember which numbers were calculated with the shortcuts.
-
I work with data from both myself and others. Keeping everything consistently organised (naming conventions, synchronising between devices, etc.) is then a challenge.
-
@EstherPlomp One other thing I find tricky sometimes is having a clear definition of what an experiment is. To illustrate, in my lab journal I create daily entries for the experiments I am conducting, but the end of one experiment and the start of another sometimes seems a bit arbitrary to me. Especially if I have several experiments running at once, or if I continue with an experiment a few weeks later, I do not yet have a clear rule to split them up consistently. E.g. I prepare an agar plate on day 1, but use the colonies from the plate a few weeks later for an inoculation. In principle you could see it all as part of one experiment, or you could see them as different experiments. Another example would be doing a synthesis reaction overnight, and then doing the workup with column chromatography a week later. Or a multi-step reaction where you purify the intermediate compound but use it later for the subsequent step, which can take place on the same day, etc. Do you have advice for this, or a clear distinction I can make?
-
My biggest challenge is data storage and quick access to it: working with machine learning models, I need both a lot of data and quick access to it during training. While the first issue can be solved with larger cloud storage, the second one is more subtle, as transferring files from one cluster to another is not always fast.
-
What is your main challenge regarding your data management? Please respond below with your data challenge!
Please read your peers’ responses and reply to any you find interesting.