An in-person collaborative event to contribute to base R.
In August of 2023 I traveled to the University of Warwick in Coventry, England to participate in the R Project Sprint, where I worked collaboratively on contributions to base R alongside novice and experienced contributors, as well as the R Core Team.
The source code for base R is a mixture of R and C code. The bug fixes, maintenance, and enhancement of this code base is upheld by volunteers. Those who make the executive decisions are the R Core Team; however, community members are still encouraged to contribute.
The aim of the R Project Sprint as I perceive it was to foster mentorship, collaborations, and personal relationships between the novice and experienced contributors, reducing the barrier to one’s first contribution.
The purpose of this post is describe my personal experience at this event; a comprehensive guide for how contribute to base R is documented on the R Contribution Working Group website.
Applications to participate in the event were due in March 2023, and later that month I was notified of my acceptance. The expectations for participants’ knowledge base coming into the event was nicely outlined in the application.
Leading up to the event there was communication about optional things we could do to prepare, most of which is available at the R Contribution Working Group website. This included:
Joining the R Contributors slack group.
Participating in online events such as a C book club for R Contributors and R Contributor office hours.
Watching recorded tutorials and demonstrations.
Submitting proposals for topics to tackle during the Sprint.
The R Project Sprint participants were impressively diverse both in terms of professional experience and geographic residence. A non-exhaustive list of countries represented at the in-person event includes Argentina, Brazil, Canada, England, India, Hungary, Nepal, Netherlands, New Zealand, Nigeria, Oman, Senegal, Switzerland, and United States.
Some individuals participated remotely via video conference, either by choice or circumstance. The combination of the United Kingdom’s air traffic control issues and train strikes caused delayed arrival for some.
The issues addressed at the event fell into four broad categories:
Translation of messages, errors, and warnings.
Improving documentation.
Addressing bug reports, such as those related to the stats
package.
Enhancements, such as logging visual outcomes from base graphics calls.
Depending on the task at hand, the technical requirements for contributions included:
Read R source code and understand function implementation (documentation and translation).
Read, modify, or write R source code (bugs and enhancements).
Read, modify, or write C source code (bugs and enhancements).
During the sprint, the progress on these items was tracked at https://github.com/r-devel/r-project-sprint-2023/issues. Either the initial report or the final resolution of these items may be seen in the R Bug Tracking System, known as Bugzilla.
Depending on the nature of the proposed change, the contributor may need to build R from source. The instructions regarding how to do so on your personal computing enviroment are outlined in the R Patched and Development Versions Chapter of the R Development Guide. Alternatively, one could also build R from source without touching their local computing environment using the GitHub Code Space R Dev Container (which I heard facilitated collaboration nicely as well).
Regardless of whether you needed to build R from source, proposed changes could then be tested on multiple computing platforms via a pull request to https://github.com/r-devel/r-svn, which mirrors the official base R server. After tests have passed, the contributor could extract the diff from the pull request to submit a proposed solution via Bugzilla; this process is more thoroughly documented in the Using a git mirror section of the R Development Guide.
Once a proposal is submitted, member(s) of the R core team review. If the proposed change modifies the code base, additional checks are run against all packages on CRAN (which there are currently ~20,000) to determine if any breaking changes are enacted, which takes >14 hours to complete. If breaking changes are found, a possible resolution could include re-writing the code to avoid breaking changes. If that is not possible, the R Core Team would then assess if the breaking changes are good breaking changes for packages (i.e., code could have been returning possibly incorrect results and should indeed break and be addressed) or if the breaking changes have too wide of a reach and minimal impact to be considered worth it. If it is decided that breaking changes should be enacted, the R Core Team notifies all authors of affected packages.
I spent most of my time at the Sprint on two bugs that genuinely intrigued me as there were related to functions I had used often over the years.
base::paste
documentation, discussed in bug 17933 and tested
in r-devel PR138. Opened in 2020, there
was already substantial nuanced discussion on this issue related to the clarity and
correctness of the documentation regarding the collapse
and recycle0
behaviors. It
took me a substantial amount of time to understand and evaluate the scenarios discussed and propose changes. After my initial proposal, I received several rounds of feedback both from fellow R contributors and the R core team that provided additional suggestions and context
for the documentation.
stats::t.test
bug, discussed in bug 14359 and tested in r-devel PR142. Opened in 2010,
there was again already substantial discussion on the both the implementation of the paired
t-test and the examples shown in the documentation. This was again a collaboration
among several individuals to both modify the source code and improve documentation that
had several rounds of iteration.
For both me personally and for the wider R community, I view the event as a huge success. Many friendships were made, many collaborations were born, and many bug fixes and enhancements were implemented throughout the week.
Time zone differences and personal availability among the R Core Team and contributors can lead to time lags in communications, losing momentum for initiated issues. Having contributors in person together facilitated live and immediate feedback, allowing for faster iteration and completion.
I can now confidently either submit a new bug report or address an existing bug report. Moreover, I understand workflows for contributions, resources for help, and how to interact with the community for help should I get stuck.
I will also begin attending the R Contribution Working Group (RCWG) as a representative of R-Ladies to communicate RCWG highlights to the broader R-Ladies community and engage in RCWG initiatives as able.
I learned from R Core Team members that have been contributing to base R since 1997, and who knows, maybe I just shared a dinner, chatted over coffee, or walked around campus with a future R Core Team member. 💙
Infinite thanks to Heather Turner for organizing the Sprint, as well as the sponsors who provided the funding to make this event possible. Thank you to R Core Team members and fellow R contributors who traveled across the world to attend, and who kindly and generously shared their knowledge, expertise, anecdotes, and experiences. It was truly a pleasure. Lastly, thanks to Hannah Frick (a fellow Sprint attendee) for providing feedback on this post.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Pileggi (2023, Sept. 7). PIPING HOT DATA: R Project Sprint.. Retrieved from https://www.pipinghotdata.com/posts/2023-09-07-r-project-sprint/
BibTeX citation
@misc{pileggi2023r, author = {Pileggi, Shannon}, title = {PIPING HOT DATA: R Project Sprint.}, url = {https://www.pipinghotdata.com/posts/2023-09-07-r-project-sprint/}, year = {2023} }