
Skills Development & Core Competences

Carlos Lizarraga-Celaya edited this page Aug 29, 2023 · 8 revisions

Data Scientists & Research Software Engineers Core Competencies


The US Bureau of Labor Statistics projects a 36% increase in Data Scientist positions nationwide over the next decade and a 25% increase for Software Engineers over the same period.

The increasing demand for professionals in data science and software engineering requires academic administrators to assess the opportunities and address the challenges of producing, attracting, and retaining students, faculty, and staff who work in these areas.

Data scientists use computational and mathematical tools to create workflows for analyzing data and generating knowledge in a specific research domain (e.g., medicine, wildlife biology, political science). Research software engineers design, develop, maintain, and extend software to support, enable, and accelerate research.

Data scientists and research software engineers share core competencies such as programming, data analysis, machine learning, communication, and problem-solving skills. They use these skills to develop and deploy data science and machine learning models, analyze data, and build and maintain software applications.

Research is about discovering new knowledge and inherently involves figuring out how to do things for the first time. Therefore, research software engineers and academic data scientists, who are part of an institution's research enterprise, differ significantly from the information technology professionals and data analysts who support its business functions.

The research environment often requires individuals with a broader knowledge base in software engineering or data science. These individuals must be comfortable using cutting-edge technologies and must be able to identify unexpected applications for these technologies. They should also be interested in learning about domain science to develop tools and pipelines that are fit for use and purpose in an environment with ambiguous requirements. These factors explain why many successful research software engineers and academic data scientists started as domain scientists who discovered a passion and mindset for these roles, which are focused on enabling scientific discovery.


Data scientists and research software engineers share the following similarities:

  • Both need strong programming skills, typically in languages such as Python and R.
  • Both utilize data analysis and machine learning to uncover insights.
  • Both must convey their findings to technical and non-technical audiences.
  • Both require strong problem-solving abilities.

However, their roles differ in the following ways:

  • Data scientists leverage data science and machine learning techniques to tackle real-world problems, while research software engineers design and maintain the software applications that support that work.
  • Data scientists emphasize statistics and machine learning, while research software engineers focus on computer science and software engineering.


Data Scientists

To become a data scientist, build a strong foundation in statistics and mathematics, learn programming languages such as Python, R, and SQL, build a portfolio of data science projects, get involved in the data science community, and pursue a relevant degree or certification.

Tips for scientists interested in becoming data scientists:

  • Use your existing skills and experience. Research, problem-solving, and critical thinking skills are valuable in data science.
  • Be patient and persistent. Learning data science takes time.
  • Keep up-to-date with the latest trends and technologies in data science.

Key skills for a data scientist:

  • Programming. Data scientists should be able to code to manipulate data, create data visualizations, and build machine learning models. Popular programming languages are Python, R, and SQL.
  • Statistics and probability. Data scientists should have a strong foundation in statistics and probability to analyze data and discover patterns and trends.
  • Data wrangling and mining. Data wrangling is the process of cleaning and organizing data for analysis, while data mining extracts patterns and insights from that data. Data scientists routinely do both.
  • Machine learning. Data scientists use machine learning, a branch of artificial intelligence, to solve problems by building and applying models.
  • Data visualization. Data scientists communicate their findings through data visualization, which presents information in a way that is easy to understand.
  • Communication and collaboration. Data scientists must communicate their findings and collaborate with other data scientists, engineers, and business stakeholders.
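As a rough illustration of how the wrangling and statistics skills listed above fit together, the sketch below cleans a handful of made-up records and then summarizes them, using only the Python standard library. The records, the field names, and the `clean_records` and `summarize` helpers are hypothetical and chosen for illustration only.

```python
import statistics

# Hypothetical raw records: inconsistent casing, stray whitespace, one missing value.
RAW_RECORDS = [
    {"site": " Tucson ", "temp_c": "31.5"},
    {"site": "tucson", "temp_c": "33.0"},
    {"site": "Phoenix", "temp_c": ""},
    {"site": "PHOENIX ", "temp_c": "36.0"},
    {"site": "phoenix", "temp_c": "38.0"},
]


def clean_records(records):
    """Wrangling step: normalize site names and drop rows without a reading."""
    cleaned = []
    for row in records:
        value = row["temp_c"].strip()
        if not value:
            continue  # skip rows with a missing measurement
        cleaned.append({"site": row["site"].strip().lower(), "temp_c": float(value)})
    return cleaned


def summarize(records):
    """Analysis step: mean temperature per site."""
    by_site = {}
    for row in records:
        by_site.setdefault(row["site"], []).append(row["temp_c"])
    return {site: statistics.mean(values) for site, values in by_site.items()}


if __name__ == "__main__":
    print(summarize(clean_records(RAW_RECORDS)))
```

In practice a library such as pandas would usually handle these steps; the standard library is used here only to keep the sketch self-contained.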

In addition to these core skills, data scientists may also need knowledge of specific domains, such as finance, healthcare, or marketing. They may also require experience with specific tools and technologies, such as version control, containers (Docker/Singularity), PyTorch, TensorFlow/Keras, and MLflow.

Research Software Engineers

To become a research software engineer, earn a bachelor's degree in computer science or a related field, gain experience as a software engineer, develop research skills, pursue a master's degree in computer science or a related field, and network with other research software engineers.

Tips for becoming a research software engineer:

  • Be passionate and curious about research.
  • Communicate ideas effectively to both technical and non-technical audiences.
  • Collaborate well with other engineers, scientists, and mathematicians.
  • Adapt to new technologies and methodologies as the field evolves.

Core skills for a research software engineer:

  • Programming: proficiency in at least one language (e.g., Python, Java, C++) with knowledge of data structures, algorithms, and software engineering principles.
  • Problem solving: identify and solve complex problems through creative and innovative thinking.
  • Communication: clear and concise communication with technical and non-technical audiences.
  • Collaboration: effective teamwork with engineers, scientists, and researchers towards common goals.
  • Self-learning: fast adaptation to new technologies and the ability to read technical documentation and research papers.
  • Adaptability: willingness to take on new challenges and flexibility in the face of changing requirements.

Additional skills helpful for research software engineers:

  • Domain knowledge: Knowing the specific domain they work in, such as healthcare, finance, or energy, helps research software engineers better understand the problems they are trying to solve.
  • Machine learning skills: Research software engineers who are skilled in machine learning will be in high demand.
  • Cloud computing skills: Research software engineers who are skilled in cloud computing will be able to build and deploy software more efficiently.

Types of Professional Development

Professional development is essential for Data Scientists (DS) and Research Software Engineers (RSE) because both work in extremely fast-moving fields where the state of the art is continuously changing.

  • Formal Continuous Education (technical and soft skills)
  • Technical Skills Training (projects, collaborations, co-learning)
  • Professional Skills (leadership, management, interpersonal communications)
  • Teaching and Mentoring Skills (teaching small workshops)
  • Mentorship or Apprenticeship (collaborative, evidence-based)
    • Traditional mentorship
    • Peer and Near-peer mentorship
    • Group mentoring

Teams

  • Project-based
  • Theme-based
  • Expertise-based

Project Introduction:

Introduce the need for skilled data scientists and research software engineers.

Emphasize the importance of computational and analytical skills in research domains.

Program Overview:

Describe the 12-week project-based workshop structure.

Highlight the focus on experiential learning for skill development.

Learning Goals:

  • Develop programming proficiency in Python.
  • Gain expertise in data analysis, including regression, classification, clustering, and time-series analysis [2][3].
  • Master machine learning techniques, emphasizing classification and sentiment analysis [2].
  • Cultivate communication and problem-solving skills.

Workshop Syllabus:

Weeks 1-4: Data Science Fundamentals

  • Introduction to data science and its applications [1][5].
  • Python programming basics and data manipulation.
  • Exploratory data analysis and visualization.
  • Regression and classification techniques.
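To give a sense of the level these first weeks might target, here is a minimal sketch of an ordinary least-squares line fit written in plain Python. It is an assumed example exercise, not part of the proposed syllabus itself; the `fit_line` function and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data lying exactly on the line y = 2x + 1.
    hours = [1.0, 2.0, 3.0, 4.0]
    scores = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(hours, scores)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Writing the fit by hand before moving to a library keeps the underlying arithmetic visible, which suits an introductory exercise.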

Weeks 5-8: Machine Learning and Analytics

  • Advanced Python programming for analytics.
  • Time-series analysis and forecasting.
  • Unsupervised learning: clustering and association analysis.
  • Introduction to natural language processing and sentiment analysis.

Weeks 9-12: Research Software Engineering

  • Software development principles and best practices.
  • Version control with Git and collaboration on code.
  • Building research software tools and libraries.
  • Final projects integrating data science and software engineering skills.

Assessment and Evaluation:

  • Regular quizzes and coding exercises to assess comprehension.
  • Mid-term and final projects demonstrating data science and software engineering integration.
  • Peer evaluations and self-assessment for collaborative skills.

Budget and Resources:

  • Itemize costs for faculty, guest speakers, software licenses, and project materials.

References:

  • Draw from the provided context and references to demonstrate alignment with industry standards and academic research [4][6].
  • Tailor the proposal to your institution's needs, leveraging the provided context and references to ensure a robust training program for future data scientists and research software engineers.

Created: 08/28/2023; Updated: 08/28/2023

Carlos Lizárraga

CC BY-NC-SA 4.0

Carlos Lizárraga, Data Lab, Data Science Institute, University of Arizona, 2023.
