tracykteal/scidatacon2016-presentation
SciDataCon 2016 Presentation

View slides: https://tracykteal.github.io/scidatacon2016-presentation/slides.html

License: CC-BY

Conference: SciDataCon 2016
Session: Growing a global education in Research Data Science
Location: Denver, CO
Date: September 12, 2016

Title: Addressing the bottleneck to data-driven discovery: scaling data skills training for researchers

Author: Tracy Teal

Abstract:

Summary

Our increasing capacity to generate data is changing research. This data has the potential to change the questions we can ask and the challenges we can address. However, a bottleneck between data production and scientific advances is that many researchers lack sufficient training in both the use of data and the best practices for working with it effectively. Researchers themselves are demanding this training, with over 60% of researchers in an EMBL-ABR Community Survey stating that training is the most useful thing the organization could provide. Researchers need training that is immediate, accessible, appropriate for their level and relevant to their domain. This training needs to include not only technical skills, but also ways of thinking about data, so that learners gain the knowledge of what is possible along with the confidence to continue self-guided learning. Additionally, we need to give researchers the opportunity to engage in deliberate practice as they learn these skills, starting with strong foundational skills and receiving feedback as they learn. How can we provide enough data training to meet this demand? We cannot do it alone, whether as a lab, university or organization. Instead, we can scale data training and data literacy by collaboratively developing portfolios of hands-on training resources and by building and supporting communities of teaching and practice. Data and Software Carpentry have been developing and using this approach for short workshops, training over 20,000 learners since 2014 on 6 continents with over 500 volunteer instructors.

Data and computational training for researchers

Although petabytes of data are now available, most scientific disciplines are failing to translate this sea of data into scientific advances. The missing step between data collection and research progress is training for scientists in the skills crucial for effectively managing and analyzing large amounts of data. However, good training resources for researchers looking to develop these skills are scarce, and it is difficult to determine where to start. Training in data and computing skills is still largely absent from undergraduate and graduate programs. Instead, most researchers learn what they know about programming and data management on their own or have it passed down within a lab, and as a result they are unfamiliar with the equivalent of good lab practices for data science. The hidden costs this creates are significant: researchers spend weeks or months doing things that could be done in hours or days, do not know how trustworthy their results are, and are often unable to reproduce their own work, much less that of their colleagues.

There are many challenges in providing effective training in data skills to researchers. One particular challenge is the substantial variation in the training offered across institutions. There are many reasons for this. The curriculum is already full, and there is no room to add specific courses or even lectures incorporating these topics. There may be no instructors at a given institution who are able to teach these courses, either through a lack of knowledge or because of commitments to other activities. Additionally, researchers are time-challenged: existing commitments to research, grants and service often leave little time to develop new skills. Moreover, there is currently no good model for community lesson development, so materials have to be developed independently at each institution or department, with no opportunity for community engagement on what would be best taught or for refinement as the lessons are taught multiple times. Ideally, training would be high quality, with materials vetted through teaching practice; be consistent across universities and locations; be deployable at multiple and disparate locations; allow researchers to interact with the materials and the instructors; and provide a relatively easy entry into learning new topics.

A hands-on workshop model with community-developed lessons addresses these needs. A set of materials can be developed by a community that shares perspectives on best practices, and then taught broadly. This not only produces more effective lessons, but because the same lessons are taught multiple times, there are opportunities for feedback and refinement that raise their quality. The hands-on nature gives researchers the chance to develop their computational skills in the course of the workshop, so they leave with practical examples and hands-on experience. Finally, workshops can be taught by instructors from outside a given institution, so the institution does not have to rely on local knowledge or availability of instructors.

Building communities of teaching and practice

Learning does not stop when a researcher leaves the classroom; instead it is a continual process. A researcher needs the skills and confidence to continue this learning, and both develop best within a community of practice that the researcher can participate in and look to for best practices and for help in overcoming obstacles. This means there needs to be a commitment not only to establishing training materials, but also to developing an instructor community that can effectively teach these skills and to supporting learners after workshops or courses. The train-the-trainer program that Software and Data Carpentry have developed focuses on pedagogy and live-coding for teaching, emphasizing hands-on, interactive learning. This program has trained over 700 instructors since 2014, who have trained over 10,000 learners on 6 continents. These instructors not only teach workshops, but also implement good data and software practices in their own work, often through the support and encouragement of other instructors. They also teach and support learners at their own institutions, self-organizing workshops on Software and Data Carpentry as well as other topics. Many instructors are involved in local user groups that support continued learning. We need to continue to develop approaches that can encourage and support these activities, scaling support for communities of learners after events. Scaling data training means continuing to support learners as they continue in their research, whether they are applying skills in a new context or going on to next steps. This support can often come from a community of peers that encourages not only this continued learning, but also the best practices that make our research more effective, reproducible and impactful.
