GSoC’21: New Plots Final Report

Rishabh Sanjay
Aug 20, 2021

GSoC’21 is coming to an end 😢, and I am very glad that I spent my entire summer with ArviZ, working with some brilliant people from around the world. This experience not only helped me learn new things but also motivated me to explore the field of statistics and Bayesian modelling. In this blog post, I would like to summarise my work on the project “New Plots”. The project was mainly about adding new plots to ArviZ for visualizing Bayesian models and data.

First, I would like to thank my mentors, Seth Axen and Osvaldo Martin, for their continuous support. They reviewed all my work and provided valuable feedback. Through their suggestions, I learned a lot about Bayesian statistics, visualization and open source. I would also like to thank Oriol Abril Pla: although he wasn’t my mentor, he still reviewed all my work and provided constructive suggestions.

ArviZ (NumFOCUS)

New Plots

My project focused on adding new plots to ArviZ for better visualization and analysis of Bayesian inference data. I had planned to add Dot Plots, Half-eye Plots and ECDF Plots to the package, but during the coding period we found that Half-eye Plots can be achieved by slightly tweaking the Violin Plots already present in the package. So my main focus shifted to implementing Dot Plots in the first half of GSoC and ECDF Plots in the second half.

Task 1: Implementing Dot Plots

My first task was to implement dot plots. Dot Plots represent individual observations in a batch of data using symbols, usually circular dots. They can be thought of as one-dimensional scatter plots in which dots that would otherwise overlap are stacked on top of each other.

Steps that I followed for this task:

  • Designing the high-level API: I discussed in detail with my mentors which arguments, data types and other details the function would use. This gist was used while discussing the design of the plot.
  • Opening a PR: Once the API design was finalized, I opened a PR for the initial prototype of Dot Plots. The PR implemented the plot for both backends that ArviZ supports, i.e., Matplotlib and Bokeh.
  • Incorporating Suggestions: I incorporated constructive suggestions and feedback from my mentors.
  • Added Tests: I added unit tests using pytest for both backends and for some utility functions used by the plot.
  • Added Examples and Documentation: Examples were added to the docstring as well as to the examples folder. I also added docstrings and comments so that users can easily understand the plot and how it works internally.
Dot Plots on a sample from Normal(0,1)
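
To make the stacking idea concrete, here is a minimal sketch of the binning step in the spirit of Wilkinson (1999), listed in the references. It uses only the Python standard library. The function names `wilkinson_stacks` and `dot_positions` and the `binwidth` parameter are hypothetical names chosen for illustration; this is not the code that went into ArviZ.

```python
import random


def wilkinson_stacks(values, binwidth):
    """Group sorted values into stacks, each spanning less than `binwidth`.

    Returns a list of (x_center, count) pairs. Hypothetical helper for
    illustration, not ArviZ's implementation.
    """
    if binwidth <= 0:
        raise ValueError("binwidth must be positive")
    ordered = sorted(values)
    stacks = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        j = i
        # Collect every value that falls within one bin width of the start.
        while j < len(ordered) and ordered[j] - start < binwidth:
            j += 1
        # Place the stack midway between its smallest and largest value.
        center = (ordered[i] + ordered[j - 1]) / 2
        stacks.append((center, j - i))
        i = j
    return stacks


def dot_positions(stacks, binwidth):
    """Turn stacks into (x, y) dot centers, assuming dot diameter == binwidth."""
    return [
        (x, (k + 0.5) * binwidth)
        for x, count in stacks
        for k in range(count)
    ]


# Example: lay out 50 draws from Normal(0, 1).
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(50)]
stacks = wilkinson_stacks(sample, binwidth=0.25)
dots = dot_positions(stacks, binwidth=0.25)
print(len(stacks), "stacks,", len(dots), "dots")
```

Each `(x, y)` pair can then be drawn as a circle with any plotting backend. Tying the dot diameter to the bin width is one simple design choice; a real implementation would usually expose the dot size and stacking ratio separately.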

Task 2: Implementing ECDF Plots

In my second task, my aim was to implement ECDF and ECDF-difference plots with confidence bands. The confidence bands were calculated using the algorithm from this paper (Säilynoja et al., 2021; see References).

The steps were similar to task 1.

  • Designing the function API: The gist can be accessed here.
  • Opening a PR: Once the API design was finalized, I opened a PR for the initial prototype of ECDF Plots. This PR contains the implementation for both backends, Matplotlib and Bokeh.
  • Incorporating Suggestions: I incorporated constructive suggestions and feedback from my mentors.
  • Added Tests: I added unit tests using pytest for both backends and for some utility functions used by the plot.
  • Added Examples and Documentation: Examples were added to the docstring as well as to the examples folder. I also added docstrings and comments.
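
As a rough illustration of what an ECDF-difference plot shows, the sketch below computes the ECDF of a set of PIT values, subtracts the uniform CDF, and builds pointwise binomial bands. Note the assumption: these are simple pointwise bands, not the simultaneous bands from the paper, and the helper names (`ecdf`, `binom_quantile`, `pointwise_band`) are hypothetical rather than part of ArviZ. Only the Python standard library is used.

```python
import bisect
import math
import random


def ecdf(sample, points):
    """Fraction of the sample less than or equal to each evaluation point."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect.bisect_right(ordered, z) / n for z in points]


def binom_quantile(q, n, p):
    """Smallest k such that P(Binomial(n, p) <= k) >= q."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        if cumulative >= q:
            return k
    return n


def pointwise_band(n, points, prob=0.95):
    """Pointwise band for the ECDF of n uniform draws at each point.

    If the values are uniform on [0, 1], n * ECDF(z) ~ Binomial(n, z).
    """
    alpha = 1 - prob
    lower = [binom_quantile(alpha / 2, n, z) / n for z in points]
    upper = [binom_quantile(1 - alpha / 2, n, z) / n for z in points]
    return lower, upper


# Example: PIT values that really are uniform, so the difference
# should mostly stay inside the band.
random.seed(0)
pit = [random.random() for _ in range(200)]
points = [i / 20 for i in range(21)]

values = ecdf(pit, points)
lower, upper = pointwise_band(len(pit), points)

# ECDF-difference: subtract the uniform CDF (which is just z) everywhere.
difference = [v - z for v, z in zip(values, points)]
lower_diff = [lo - z for lo, z in zip(lower, points)]
upper_diff = [up - z for up, z in zip(upper, points)]
```

Subtracting the uniform CDF flattens the diagonal, which makes small deviations much easier to see than in the plain ECDF plot.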

There are also some future prospects for this plot that could improve it, and interested people are welcome to contribute.

Future Work:

  • Implement the optimization-based approach for calculating confidence bands described in the paper.
  • Use Numba to speed up the computations behind the plot.
  • Use ECDF Plots for convergence diagnostics.
ECDF-difference on PIT of sample

Key Learnings

First, I got to know various Python packages such as ArviZ and PyMC3. I also learned the basics of R along the way, because the dot plots implementation I used as a reference was in the ggplot library of R. I also got to know pytest, which I used for writing unit tests. From the plotting perspective, I went deep into Matplotlib and learned about Bokeh.

Second, I learned the importance of planning and organizing tasks. When working on a big project, it is crucial that every task is properly separated and planned before the actual work begins.

List of PRs:

Main PRs:

New Features and bug fixes:

Thank You, Everyone❤️

References:

  • Leland Wilkinson (1999) Dot Plots, The American Statistician, 53:3, 276–281, DOI: 10.1080/00031305.1999.10474474
  • Matthew Kay, Tara Kola, Jessica R. Hullman, and Sean A. Munson (2016) When (ish) is My Bus? User-centred Visualizations of Uncertainty in Everyday, Mobile Predictive Systems, DOI: 10.1145/2858036.2858558
  • Säilynoja, T., Bürkner, P.C. and Vehtari, A., 2021. Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison. arXiv preprint arXiv:2103.10522.

Rishabh Sanjay

Maths and Computing at IIT Kanpur, Deep Learning | NLP | Bayesian Statistics enthusiast