hanalee07 at gmail dot com

Summary

  • Scientist interested in exploring and analyzing big data to solve meaningful problems.
  • Formerly a genetics and genomics researcher, with expertise in analyzing next-generation sequencing data sets and in scientific computing for the life sciences.

Technical skills

  • Scientific computing and visualization
    • R: caret, ggplot2, shiny, knitr
    • Python: numpy, pandas, scikit-learn, seaborn, jupyter
  • Relational databases: SQLite, MySQL
  • Web frameworks: Django, web2py

Roles and experiences

  • Data analysis and visualization: I built data cleaning and processing pipelines in Python for terabyte-scale data sets, and I used R for exploratory data analysis and statistical testing, including nonparametric methods, generalized linear models, and multiple-testing correction.

  • Teaching and communication: I introduced undergraduate and doctoral students to programming in Python and R. I also co-taught a workshop for scientists on analyzing next-generation sequencing data in R. I am an expert at communicating complex information to nonspecialist audiences through data visualization and oral presentations.

  • Self-directed learning: As side projects, I built a classification model to predict exercise movements, created a web app that delivers gas-mileage predictions, developed a web interface to crowdsource media metadata, wrote the backend for an invitation-management website, and maintained a donations website. I am currently participating in the Yelp Dataset Challenge and in Kaggle competitions.

Education

  • 2012 Ph.D. Molecular Biology, University of California, Berkeley
  • 2007 A.B. Biochemical Sciences, Harvard University

