This repository contains illustrations to explain concepts in data (science).

cosimameyer/illustrations

Illustrations

License: CC BY 4.0

This folder contains illustrations that I generated to explain concepts in #stats, #rstats, and/or #python.

I'm very happy if you find these resources useful. I created the illustrations to make (more or less) complex topics more understandable, and you're more than welcome to use them under the CC-BY license. Please attribute them by citing "Illustration by @cosima_meyer".

This work is licensed under a Creative Commons Attribution 4.0 International License.

General art

What I enjoy doing: Creativity, code, and puzzles

ALT Image showing two people holding two puzzle pieces to the sky (one piece says "Creativity", the other "Code"), with a subtitle below the two people saying "What I enjoy doing ♥️"

R and Python 💛💙

ALT Image showing a blue R with a pirate's hat and eye patch with a snake (Python) around R's leg

Hello Mastodon

ALT A blue mastodon/elephant holding up a sign with "hello" written on it in handwriting

R

ALT Image showing a blue R with a pirate's hat and eye patch

Writing functions in R

CheatSheet

ALT Image showing what a general function in R looks like (a function has arguments, a function statement, and usually a return statement). Good practices when writing functions are:
  • Use meaningful names for your functions. It's good to use verbs for functions.
  • Make your function short and simple - each function should do one thing at a time.
  • Use an explicit return statement.
  • Writing assertions, warnings, and stops is helpful.

Debugging in R

CheatSheet

ALT Image showing a mole as a comparison for the debugging process (a mole digs in using debug(), stops when there is a browser(), and leaves the tunnel when calling undebug()). It also shows how the flow package works and that you get a visual overview of the "flow" of your package.

Writing a package in R

CheatSheet

ALT A summary reiterating the basic structure in package development (DESCRIPTION, NAMESPACE, R/, man/, and tests/) as well as helpful packages (devtools, usethis, roxygen2, testthat, xpectr, covr, goodpractice, inteRgrate).

Shiny

UI and Server

ALT An image showing a pseudo UI: ui <- fluidPage(titlePanel("Your Title"), sidebarLayout(sidebarPanel(...Some content...), mainPanel(...place-your-plot...)))

ALT An image showing a pseudo server: server <- function(input, output) {output$first_plot <- renderPlot({...create-your-plot...})}

Visualization of reactivity (based on the excellent description by Garrett Grolemund)

ALT GIF showing a carrier pigeon flying to the server to update a visualization when it is relevant

CheatSheet

ALT A visual summary of ShinyApps

Left side: user interface (body) that defines the outer appearance of the app. An image showing a pseudo UI: ui <- fluidPage(titlePanel("Your Title"), sidebarLayout(sidebarPanel(...Some content...), mainPanel(...place-your-plot...)))

Right side: server (brain) where all the calculation happens. An image showing a pseudo server: server <- function(input, output) {output$first_plot <- renderPlot({...create-your-plot...})}

Git(Hub)

Workflow

ALT Image showing a git workflow from the working directory to the remote repo (working directory → staging area → local repo → remote repo) and common git commands (git add code.R, git commit -m "Update", git push, git pull, git checkout, git merge)

Branches

ALT GIF showing how a feature branch evolves from a main branch and is then guided back (merged) into the main branch

GitHub and RStudio

ALT Visualization showing a typical workflow when using GitHub in RStudio with a new project: 1) Create a new repository on GitHub, 2) Open .Rproj in RStudio, 3) Connect with GitHub - and now it's time to pull, commit and push :)

GitHub and VS Code

ALT Visualization showing a typical workflow when using GitHub in VS Code with a new project: 1) Create a new repository on GitHub, 2) Clone the repository in VS Code, 3) Connect with GitHub - and now it's time to pull, commit and push :)

CheatSheet

ALT Visual summary of how to use GitHub in and with RStudio.
Left side: Image showing a git workflow from the working directory to the remote repo (working directory → staging area → local repo → remote repo) and common git commands (git add code.R, git commit -m "Update", git push, git pull, git checkout, git merge).
Right side: Visualization showing a typical workflow when using GitHub in RStudio with a new project: 1) Create a new repository on GitHub, 2) Open .Rproj in RStudio, 3) Connect with GitHub - and now it's time to pull, commit and push :)

NLP

Terms and concepts

ALT Image showing a visual overview of terms and concepts explaining a corpus, tokens, tokenization, DFM, stemming, and lemmatization. The verbalized version is in the text below:
  • Corpus: When you have your text data ready, you have your corpus. It's a collection of documents.
  • Tokens: The units a text is split into. Usually each word is a token, but it could also be a sentence, paragraph, or character.
  • Tokenization: When you hear the word tokenization, it means that you are splitting up the sentences into single words (tokens) and turning them into a bag of words. You can take this quite literally - a bag of words does not really take the order of the words into account. There are ways to account for the order using n-grams (for instance, a bigram representation would turn the sentence "Rory lives in a world of books" into "Rory lives", "lives in", "in a", "a world", "world of", "of books"), but this is limited.
  • Document-feature matrix (DFM): To generate the DFM, you first split the text into its single terms (tokens), then count how frequently each token occurs in each document.
  • Stemming: With stemming, you reduce a word to its stem.
  • Lemmatization: With lemmatization, it's slightly different. Instead of "stud" (which is the stem of the study terms), you end up with a meaningful base form - "study".
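
The tokenization, bigram, and DFM ideas described above can be sketched in a few lines of plain Python. This is only a minimal illustration under simplifying assumptions: tokens are produced by lowercasing and splitting on whitespace, and the helper names (`tokenize`, `ngrams`, `document_feature_matrix`) as well as the second example document are made up for this sketch rather than taken from the illustration or from any NLP library.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens on whitespace."""
    return text.lower().split()


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_feature_matrix(corpus):
    """Count how often each token (feature) occurs in each document."""
    tokenized = [tokenize(doc) for doc in corpus]
    # The sorted vocabulary gives the matrix a stable column order.
    features = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[f] for f in features] for c in counts]
    return features, matrix


# A corpus is simply a collection of documents (the second one is invented).
corpus = ["Rory lives in a world of books", "Rory reads books"]

bigrams = ngrams(tokenize(corpus[0]))
features, dfm = document_feature_matrix(corpus)

print(bigrams)
print(features)
for row in dfm:
    print(row)
```

Each row of `dfm` is one document and each column one token, so word order is lost (the "bag of words"), while the bigrams keep a small amount of local order.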

BERT

ALT Image showing two different workflows (bag of words and BERT). The main difference is that with BERT you build upon a pre-trained model and tokenizer, while with BOW you often have to train a model from scratch.

ALT Image showing three important components to know when training a BERT model. First, with BERT, you identify the order of the input. You give the model information about different embedding layers: the tokens (BERT uses special tokens ([CLS] and [SEP]) to make sense of the sentence), the positional embedding (where each token is placed in the sentence), and the segment embedding (which gives you more info about the sentences to which the tokens belong). And then there is the training: The first part of the training involves masking words (masked language modeling, Masked LM). During training, words are masked and the model learns to predict each masked word from its context. In the second part, you train the model to predict the next sentence. This way, the model learns which sentences usually follow each other.
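
To make the three input components more concrete, here is a small sketch of how a sentence pair could be turned into tokens, segment IDs, and position IDs, and how one token could be masked for training. It is a simplification under stated assumptions: real BERT uses a WordPiece tokenizer rather than whitespace splitting, and the function names `build_bert_inputs` and `mask_token` and the second sentence are hypothetical, chosen only for this illustration.

```python
def build_bert_inputs(sentence_a, sentence_b):
    """Build tokens, segment IDs, and position IDs for a sentence pair."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    # [CLS] opens the input, [SEP] closes each sentence.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment IDs say which sentence a token belongs to (0 or 1).
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # Position IDs say where each token sits in the input.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


def mask_token(tokens, index):
    """Replace one token with [MASK]; the original token is the label."""
    masked = list(tokens)
    label = masked[index]
    masked[index] = "[MASK]"
    return masked, label


tokens, segment_ids, position_ids = build_bert_inputs(
    "Rory lives in a world of books", "She reads daily"
)
masked_tokens, label = mask_token(tokens, 2)

print(tokens)
print(segment_ids)
print(masked_tokens, "->", label)
```

During masked-word training the model would see `masked_tokens` and be asked to recover `label`; for next-sentence prediction it would be asked whether the second segment really follows the first.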

These visualizations are also available in blue:

Explainable AI/ML

ALT Visualization of six different model-agnostic approaches to explain machine learning models post hoc:
  • Feature importance: Feature importance is based on the idea of permutation where you shuffle the values of a feature. If this change increases the model error, the feature is perceived to be important.
  • Shapley value: Shapley values are based on a game-theoretical approach that calculates the average of a feature's marginal contributions across all possible coalitions of features.
  • LIME: LIME plots tell you locally around a data point what the most important feature is. While they may look similar to SHAP, they are only an approximation (calculated on a small set of features) and do not provide a guarantee of accuracy and consistency.
  • ICE: ICE plots show the individual conditional expectation where all other features are kept the same and the effects for one feature are calculated.
  • Partial dependence: Partial dependence plots visualize the average output of the model for each target feature value for the entire dataset.
  • Breakdown plot: Breakdown plots show the contribution of every variable to the final prediction.
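
The permutation idea behind feature importance can be sketched without any machine learning library. The example below is a minimal illustration, not a reference implementation: the toy model, the data, and the function names (`permutation_importance`, `mean_squared_error`, `toy_model`) are assumptions made up for this sketch. The toy model deliberately ignores the second feature, so shuffling that feature should not change the error.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in error after shuffling one feature at a time."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other features stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


X = [[float(i), float((7 * i) % 20)] for i in range(20)]
y = [2.0 * row[0] for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

A large value means the error grew a lot once the feature was shuffled, which is the sense in which the feature is "perceived to be important".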

ALT The visualization shows the logic of integrated gradients. You start with your baseline which does not have any effect on the model classification and continue stepwise using linear interpolation to get to the original input. On the way, you calculate the model's prediction, compare it to the baseline, and derive the integrated gradients for each input feature by summing up the results of these calculations.
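
The stepwise interpolation described above can be sketched for a tiny model whose gradient is known in closed form. This is a hedged illustration only: `toy_model`, `toy_gradient`, and `integrated_gradients` are hypothetical names, the model is an arbitrary function chosen for this example, and real applications would obtain gradients from an automatic differentiation framework instead of writing them by hand.

```python
def integrated_gradients(gradient, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        # Walk from the baseline (alpha near 0) to the input (alpha near 1).
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = gradient(point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(x):
    """Hypothetical model: f(x) = x0^2 + 3 * x1."""
    return x[0] ** 2 + 3.0 * x[1]


def toy_gradient(x):
    """Hand-written gradient of toy_model with respect to its inputs."""
    return [2.0 * x[0], 3.0]


x = [2.0, 1.0]
baseline = [0.0, 0.0]

attributions = integrated_gradients(toy_gradient, x, baseline)
print(attributions)
print(sum(attributions), toy_model(x) - toy_model(baseline))
```

The second print illustrates a useful sanity check: the attributions should add up (approximately) to the difference between the model output at the input and at the baseline.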

Amazing Women

The following illustrations are part of a larger project ("Amazing Women in Tech") in which I aim to make women more visible in the world of programming, statistics, and STEM in general. The illustrations are shared along with a short portrait on social media such as LinkedIn and Mastodon.

Ada Lovelace

✨ was a mathematician and the first computer programmer
✨ worked on early versions of a calculator (with Charles Babbage)
✨ imagined the machine following patterns and not only calculating numbers but also forming letters - the basic version of computer programming was described in the 1840s!

🔗 and much more
Alison Presmanes Hill

✨ is a Director of Product at Anaconda, Inc., having previously worked at Voltron Data, Posit PBC, and IBM
✨ holds a PhD in Psychology from Vanderbilt University
✨ is an avid #OSS contributor (from website themes to #rstats packages to #data 🐧)
✨ is a strong advocate for promoting gender diversity in the #rstats community

🔗 and much more
Allison Horst

✨ is a Developing Marketing Manager at Observable
✨ holds a PhD in Environmental Science and was a teaching faculty member at UC Santa Barbara for 10+ years
✨ is best known for her beautiful #Rtistry making data science and stats easily accessible
✨ is an avid contributor to #OSS - for instance the {palmerpenguins} package 🐧

🔗 and much more
Anita Borg

✨ was a computer scientist who started Systers and AnitaB.org to support women in tech
✨ founded the Grace Hopper Celebration of Women in Computing
✨ researched operating systems and memory at Digital Equipment Corporation
✨ was honored in the Women in Technology International Hall of Fame

🔗 and much more
Annie Easley

✨ was a pioneering computer scientist, mathematician, and rocket scientist at NASA, contributing to the development of software for the Centaur rocket program
✨ broke barriers as one of the first African-American women in her field, inspiring countless others
✨ used her expertise to develop energy conversion systems, including alternative power technologies
✨ advocated for diversity in STEM and supported others through educational outreach

🔗 and much more
Catherine Nelson

✨ is a freelance data scientist and writer who worked previously as a Principal Data Scientist at SAP Concur
✨ holds a PhD in geophysics from Durham University and a Master of Science in Earth Sciences from University of Oxford
✨ authored several hands-on books for data scientists to improve their daily workflows (Software Engineering for Data Scientists and Building Machine Learning Pipelines)

🔗 and much more
Crystal Ramjattan

✨ is a seasoned data leader with experience in leading data-driven transformations for startups and F500 companies
✨ has a proven track record of architecting and implementing data strategies
✨ founded a causal AI platform that helped companies detect and measure critical business changes, now supporting other startups
✨ has mentored over 30 women in technology, empowering them to navigate their careers and achieve their dreams in data 💫

🔗 and much more
Daliana Liu

✨ is the founder and coach of Data Science & ML Career Accelerator and has previously worked as a data scientist at Amazon
✨ hosts the podcast 'The Data Scientist Show' 🎙️ (https://www.youtube.com/c/thedatascientistshow)
✨ supports others in the field of #datascience with career insights, interviews, and her own personal journey

🔗 and much more
Daniela Rus

✨ is a roboticist and computer scientist, director and professor at MIT
✨ is a pioneer in robotics and focuses her work on how a new generation of smart machines can help humans
✨ has co-authored the book 'The Heart and the Chip: Our Bright Future With Robots' (with Gregory Mone), which gives you a better understanding of how humans and robots can coexist

🔗 and much more
Daniela Witten

✨ is a biostatistician and Professor at the University of Washington
✨ focuses her research on high-dimensional statistical learning
✨ co-authored the seminal book 'An Introduction to Statistical Learning' (both with #rstats and #python)
✨ has won multiple awards for her work

🔗 and much more
Dorothy Vaughan

✨ was a mathematician and human computer at NASA
✨ headed the West Area Computing unit at the National Advisory Committee for Aeronautics (NACA), the predecessor of NASA
✨ many of you may also know her story from the book/movie 'Hidden Figures' which tells the story of her life
✨ was a role model as NASA’s first African-American manager

🔗 and much more
Chelsea Finn

✨ is an Assistant Professor at Stanford University and was part of Google Brain
✨ pioneers in the field of deep robotic learning
✨ has won multiple awards for her work

🔗 and much more
Ellie King

✨ is co-founder of Equal IT where they support organizations to recruit inclusive teams globally
✨ hosts the #EqualInspired podcast where she invites speakers to share empowering stories, career journeys, lessons learnt and advice to inspire others - have a listen here: https://linktr.ee/equalinspired 🎙️
✨ is a frequent speaker at conferences and meetups, amplifying the voices of women and non-dominant groups

🔗 and much more
Frauke Kreuter

✨ is a sociologist and statistician and holds Professorships at the University of Munich and the University of Maryland
✨ co-authored the seminal book 'Data Analysis Using Stata'
✨ also co-hosts the German podcast #digdeep that discusses developments in digitalization
✨ has won multiple awards for her work

🔗 and much more
Gabriela de Queiroz

✨ is the Director of AI at Microsoft and previously worked in AI strategy and innovation at IBM
✨ is a strong advocate for diversity in the field, having founded #RLadies and #AI Inclusive
✨ has won several awards for her work (including being named one of the 100 Brilliant Women in AI Ethics™ in 2023)

🔗 and much more
Grace Hopper

✨ was a computer scientist, mathematician, and US Navy rear admiral
✨ helped to develop a compiler that led to #COBOL, a widely used programming language
✨ is credited with popularizing the terms 'bug' and 'debugging' for computer problems 🐞 (https://bit.ly/3V2mamK)
✨ broke more barriers by receiving the National Medal of Technology in 1991 (as the first female individual recipient; https://bit.ly/4bD7kdy)

🔗 and much more
Hanan Salam

✨ is Assistant Professor and Director of SMART Lab at New York University Abu Dhabi
✨ focuses her research on Artificial Social Intelligence - a form of intelligence that requires a machine to make sense of social cues when interacting with humans (https://bit.ly/3Seo82H)
✨ founded Women in AI in 2017 (together with Moojan Asghari and Caroline Lair) - a global non-profit organization focussing on creating #inclusiveAI for our common future

🔗 and much more
Hedy Lamarr

✨ was a Hollywood actress, inventor, and the 'Mother of Wi-Fi'
✨ came up with an improved stoplight and a tablet that dissolves in water and tastes like Coca-Cola
✨ together with George Antheil, also invented a new communication system that involves ‘frequency hopping’ and is now the basis for #WiFi, #Bluetooth, and #GPS
✨ received many awards for her work
🔗 and much more
Ida Rhodes

✨ was a pioneer in computer science (co-designed the C-10 programming language for the UNIVAC I, the Universal Automatic Computer I)
✨ developed an often-used algorithm for a Jewish calendar
✨ was referred to as the UNIVAC I Pioneer at the AFIPS National Computer Conference in Chicago

🔗 and much more
International Women's Day

🔗 ... learn more about it
Jacqueline Nolis

✨ is the Principal Data Scientist at Fanatics
✨ has a track record of helping businesses solve problems with data
✨ has co-authored the book 'Build Your Career in Data Science' (with Emily Robinson), which gives you tons of fantastic tips on how to build a career in #datascience (and they also have a #podcast on the topic)

🔗 and much more
Jessica Cherny

✨ is a Senior Data Analyst at Fivetran, founder of Data Angels, and speaker at tech events
✨ is actively involved in the Silicon Valley startup scene, including participation in Accel Partners' Accel Scholar program
✨ supports other women in #data by bringing together a community with more than 2,400 members, organizing fireplace talks and mentoring sessions

🔗 and much more
Joedian Reid

✨ is Technical Program Manager for #Go at Google
✨ has a track record of helping businesses solve problems with technology
✨ is a frequent speaker on topics such as career development, #womenintech, and tech in general
✨ is a strong advocate for promoting greater diversity in tech 💜

🔗 and much more
Katherine Johnson

✨ was a mathematician at NASA
✨ earned a reputation for mastering complex manual calculations that fueled NASA's missions
✨ many of you may know her story from the book/movie 'Hidden Figures' which tells the story of her life
✨ shattered barriers, breaking through as an African-American woman in STEM

🔗 and much more
Lynn Conway

✨ was a computer scientist and electrical engineer
✨ was a pioneer in microelectronics chip design - her inventions have influenced chip design worldwide (bit.ly/4bTlneH)
✨ was a transgender activist who paved the way for those who came after her (https://bit.ly/4bwpWvD)

🔗 and much more
Melanie Mitchell

✨ is the Davis Professor of Complexity at the Santa Fe Institute
✨ focuses her research on abstraction and reasoning tasks with #AI systems
✨ authored many books, including 'Artificial Intelligence: A Guide for Thinking Humans' (which gives you a very nice introduction to the world of artificial intelligence - covering the technical background, but also putting the developments in a historical perspective)
✨ has won multiple awards for her work

🔗 and much more
Mary Jackson

✨ was a mathematician and aerospace engineer at the NACA (the predecessor of NASA)
✨ many of you may know her story from the book/movie 'Hidden Figures' which tells the story of her life
✨ has won multiple awards for her work and supported women and other minorities to advance in their careers

🔗 and much more
Mary Lou Jepsen

✨ is a technical executive and inventor - with more than 250 patents published and inventions in the field of display, imaging, and computer hardware
✨ is a former executive at Facebook/Oculus, Google, and Intel Corporation as well as a former MIT Media Lab professor
✨ is a founder of One Laptop Per Child, a non-profit organization to transform education for children around the world

🔗 and much more
Mia Shah-Dand

✨ is the CEO of Lighthouse3
✨ founded Women in AI Ethics™ to increase reputation and recognition of women in the field of AI ethics
✨ works towards a human-centered view of AI and is an advocate for #responsibleAI

🔗 and much more
Naomi Ceder

✨ is a #Python instructor, speaker, and book author
✨ earned a PhD in Classics before switching to computer science and specifically #Python 🐍
✨ served as chair of the board of the Python Software Foundation
✨ received the PSF Distinguished Service Award 🏆
✨ is a strong advocate for increasing diversity in technology and founded Trans*Code Hackday

🔗 and much more
PyLadies kicked off with their very first "Intro to Python" workshop in May 2011

🔗 Blog post
R-Ladies were founded in October 2012

🔗 ... learn more about them
Reshama Shaikh

✨ is a statistician/data scientist consultant and Director of Data Umbrella - a global community for underrepresented persons in data science that organizes online speaker series and workshops
✨ holds an M.S. in statistics from Rutgers University and an M.B.A. from NYU Stern School of Business
✨ is a passionate open source contributor and advocate (for instance for scikit-learn and PyMC 🐍)
✨ was awarded the Community Leadership Award from NumFOCUS and is a Fellow of #PSF

🔗 and much more
Sasha Luccioni

✨ is a leading researcher in ethical artificial intelligence and AI and Climate Lead at Hugging Face
✨ is pioneering with her work and advocacy in the field of #sustainableAI and #ethicalAI
✨ contributes to creating CodeCarbon, a light-weight #Python package that helps to quantify the CO2 emissions produced during the training of AI algorithms (https://codecarbon.io/#about)

🔗 and much more
Susan Wojcicki

✨ was CEO of YouTube
✨ held degrees in history and literature, economics, and an MBA
✨ played a key role in shaping Google's advertising business
✨ advocated for diversity in tech
✨ was recognized as a top media executive and influential leader

🔗 and much more!
Timnit Gebru

✨ is a computer scientist as well as the founder and executive director of The Distributed AI Research Institute (DAIR)
✨ is pioneering with her work and advocacy in the field of #ethicalAI
✨ is co-founder of Black in AI

🔗 and much more
