Data and Visualization in R

A practical guide for social scientists

Author: Alfredo Hernandez Sanchez

Published: May 19, 2026

Welcome

All tidy datasets are alike; each messy dataset is messy in its own way. If you have worked with data in the wild, you have surely experienced the frustration of spending more time cleaning data than analyzing it. The purpose of this book is to accompany you through this unfortunate yet inevitable process and to offer some tricks and general guidelines that make the journey less strenuous.

Back in the dark ages, before large language models (LLMs) became part of our everyday life, I taught a course on data visualization at the Barcelona Institute of International Studies. This book started as course notes with some code for students to tweak and adapt to their needs. Today, memorizing the syntax of R code is a far less marketable skill, and the fun part of data analysis and visualization is ever more accessible. However, the grunt work of data cleaning endures.

A special thanks goes out to the many students whose efforts, questions, and feedback served as inspiration for this book!

Contact

This book is in open review. If you have any questions, comments, or suggestions, please report an issue on GitHub.

Build Information

This book was built using R version 4.6.0 (2026-04-24). For specific package versions, please see the session information below:

R version 4.6.0 (2026-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_4.6.0  fastmap_1.2.0   cli_3.6.6       tools_4.6.0    
 [5] htmltools_0.5.9 rmarkdown_2.31  knitr_1.51      jsonlite_2.0.0 
 [9] xfun_0.57       digest_0.6.39   rlang_1.2.0     evaluate_1.0.5 

License

Data Visualization for Social Science by Alfredo Hernandez Sanchez is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://github.com/alfredo-hs/dviz_book.