CppCon 2023 Trip Report
by Mark Hoemmen
The main body of CppCon, a C++ conference, took place October 02 – 06 in Aurora, Colorado. I participated all week and gave a talk. This was my first in-person conference in four years, and my first work travel in nearly as long. My previous CppCon was the virtual one in 2020. Many thanks to the CppCon staff for supporting my lodging and travel expenses.
The talks were exceptional. Several related to my daily work. Others had immediately applicable design tips, informed my review of pending WG21 proposals, and gave me a better grasp of C++ features like coroutines and customization methods. Two or three talks made the relationship between error handling and functional safety more clear for me — an important topic, given my employer NVIDIA’s increasing presence in automotive and robotics fields. Three other talks highlighted the social and cultural skills needed to work in teams on large software projects.
The so-called “hallway track” — in-person, unscheduled conversations — proved both informative and useful for recruiting interns and spreading the word about our C++ Standard proposals and libraries.
On Monday, October 02, CppCon student volunteer coordinator Daniel Hanson invited me to the student dinner. Daniel’s goal in inviting a handful of non-students was to seed the conversation and give students networking opportunities. I ended up staying an hour and a half later than anticipated and chatting with many different students. Topics included the students’ research projects, my experiences working at NVIDIA, and the linear algebra library we are proposing for the C++ Standard. Discussing the latter ended up helping me practice for my talk the next day.
Daniel has a background in computational finance. He and I had corresponded over many months about mdspan and linear algebra libraries. He taught me some surprising facts — for example, that software developers in finance find std::valarray useful, even though it gets hate from the C++ expert crowd. My second-hand understanding of the historical context (summarized in P1417) is that valarray was proposed before expression templates became a popular technique. It relied on a belief that C++ compilers needed a more restrictive array type in order to optimize array computations to match the performance of the Fortran compilers of the time. The proposal’s authors left the WG21 process before C++98 was finalized. WG21 discussed requiring that valarray use expression templates, but decided it was too late to impose this on implementations. The result was the C++98 wording (persisting through C++23) that permits but does not require expression templates. This led to slow implementations that gave the library a bad reputation. There are other reasons not to like the valarray design, but this discussion with Daniel taught me the value of listening to other developers’ cultures and suspending judgment — a lesson that came up several times in talks this week.
My talk “std::linalg: Linear algebra coming to standard C++”
On Tuesday, October 03, I gave an invited talk at CppCon about std::linalg, our proposal (P1673) to add a linear algebra library to the C++ Standard. Many thanks to NVIDIA colleagues Gonzalo Brito, Jeff Hammond, Jeff Larkin, Bryce Adelstein Lelbach, and Yu You for reviewing slides and offering performance data. The talk grounded std::linalg both in the history of the BLAS, and in the progression of the C++ Standard. The latter includes the C++ Standard’s parallel algorithms (“stdpar,” for which NVIDIA has an optimized implementation) and mdspan (a C++23 feature).
The most interesting feedback was about the brief mdspan overview I had included. People liked it! Several called for a stand-alone mdspan tutorial. This could be a lot of work, especially because it would need several benchmarks to demonstrate the value of layout and accessor customization. However, it would be a great opportunity to demonstrate how the C++ Standard is serving the needs of high-performance computing.
At least seven audience members came up to ask questions immediately after the talk. Two questions summarize their concerns.
- How much functionality does std::linalg provide beyond current C or Fortran BLAS functionality?

For example, does it support element types other than complex<double>? (It does. It even supports mixed-precision computations.) Does it support layouts other than row-major or column-major? (It does.) For functionality that std::linalg provides but the BLAS does not, does std::linalg promise to optimize for those use cases? (“Optimization” is a quality-of-implementation concern that is not in the Standard. std::linalg can be implemented in a high-performance way without a C or Fortran BLAS. This is why the proposal P1673 does not specify a “back-end” requirement or a way to swap in different C or Fortran BLAS libraries.)
- What are the advantages of std::linalg over (say) the Eigen library?
As explained in the “layers” part of my talk, std::linalg specifically targets the “performance primitives” layer of abstraction, while Eigen targets all the layers. One could imagine implementing Eigen or a library like it with std::linalg.
Social aspects of software development work
On Friday, the author Sherry Sontag spoke on her experiences developing a coding standard for Bloomberg and winning over software developers. Her hard-won lessons hit home for me, as a “software and algorithms person” still new to a group that develops fast computational kernels, with a work culture much different from my own. She pointed out the value of developing one-on-one communication and of surveying stakeholders before suggesting changes.
Another talk along those lines was Katherine Rocha’s “Finding your codebases’ C++ roots.” It drew an analogy between genealogy (researching one’s family history) and working in a large existing code base. The speaker focused on the importance of preserving and maintaining as much history as possible, in redundant ways; for example, git commit messages can supplement code comments. Major rewrites (for instance, to adopt new C++ versions or more modern programming idioms) may sometimes be necessary. However, they can destroy history and context, so they should be done with care.
The Wednesday plenary talk by Laura Savino (Adobe Photoshop) tied into this code-history theme by observing that long-lived code bases likely have contributors who are still your colleagues. She gave important lessons on how to protect other developers when updating a code base, and how to respect its history when doing so.
A brief aside at another point this week reminded me of Doug McIlroy’s dissent on exceptions. I’m grateful that Bjarne Stroustrup documented that dissent. Documenting discussions and alternate opinions is hard work, but the future benefits.
Large language models as work assistants
My NVIDIA colleague Andrei Alexandrescu gave the final keynote, on how he used ChatGPT to help improve binary search algorithms. This, along with hallway and dinner discussions, opened my mind to consider exploring the use of large language models (LLMs) to automate some tasks, like creating plots and visualizations. The main risk for software development is that LLMs don’t necessarily know how to respect software licenses.
Matthias Kretz (main author of the C++ SIMD proposal) pointed out that SIMD should be seen as a high-level programming model for exposing fine-grained data parallelism, rather than as a way to expose registers and other low-level hardware features. Matthias asked me later whether NVIDIA has considered exposing “threads in a warp” or “threads in a block” with a SIMD interface. While we do have some actual “SIMD types” in CUDA, we don’t normally expose thread parallelism that way. It was an interesting thought experiment.
Inbal Levi gave a talk Monday on “Customization methods: Connecting user and library code.” I found it an excellent overview and highly recommend it. Her brief examples were the best part. Even though I had studied this topic to help me understand the senders/receivers proposal P2300, I learned a lot.
Coroutines had been a bit mysterious to me: I knew in theory how they worked, but wasn’t sure how to apply them in practice. I watched several talks on coroutines this week and am starting to get a better sense of that. Perhaps the most directly relevant for NVIDIA was “Taro: Task graph-based Asynchronous Programming Using C++ Coroutine,” given by Dian-Lun Lin, a PhD candidate at the University of Wisconsin-Madison. The speaker constructed a task scheduler that could manage multiple CUDA streams, with dependent tasks consisting of both CPU-only tasks and GPU kernels launched from the host. It was interesting to hear about some of the infrastructure he had to build. I wonder whether better use of CUDA events could have simplified the scheduler.
Francesco Zoffoli (Meta) gave a talk “Coroutine Patterns and How to Use Them: Problems and Solutions Using Coroutines In a Modern Codebase.” I really appreciated the speaker’s examples on the lifetime of temporaries. He urged us always to join work before leaving a scope, in order to avoid lifetime issues. Exceptions are another way of leaving a scope before joining, so we have to watch out for them as well. A blocking wait in a destructor may cause deadlock. The “async cleanup pattern” addresses these issues. I note in particular that the speaker does not find it useful to hold resources for coroutines in a shared_ptr, and that doing so does not remove the need for manual cleanup.
Jeffrey Erickson, an FPGA expert, spoke in “Behavioral Modeling in HW/SW Co-design Using C++ Coroutines” about using coroutines to simulate hardware behavior. It made perfect sense to simulate something inherently concurrent (hardware) with a concurrent programming model. A common theme in this and other coroutine talks this week was the need to build infrastructure to make coroutines usable. Francesco Zoffoli uses Folly; this speaker uses a different framework. I’m reminded of Ranges in C++20 vs. C++23.
C++ exceptions and error handling
Peter Muldoon (Bloomberg) gave a talk “Exceptionally bad: The story on the misuse of exceptions and how to do better.” The OmegaException, the one exception “to rule them all,” made me twitch, but it was a good lesson that exceptions should be both rare and useful for debugging. Uncaught exceptions without a stack trace are at best useless, and at worst can cause hangs (e.g., with distributed-memory parallel programming models like MPI, the Message Passing Interface standard). Most exceptions can’t or shouldn’t be handled by users (how would I recover from std::out_of_range? Am I confident that I would know what to do with std::bad_alloc?). Too many caught exceptions are an anti-pattern (are you using them for control flow?). The logical conclusion is OmegaException: a single class, easy to catch, and full of debug information.
Another talk I appreciated was by Erik Tomusk (Codeplay) on “The absurdity of error handling.” In the context of functional safety, things that people consider “safe” may not be. For example, bounds checking and its resulting exception handling may increase run time nondeterministically, which may cause violation of a real-time constraint. Out-of-bounds errors probably indicate bugs, which mean the system is in an undefined (and therefore unsafe) state anyway. Handling the exception won’t help put it back into a known state.
Pablo Halpern and Timur Doumler’s talk “Noexcept? Enabling Testing of Contract Checks” had a quote that resonated.
Contract-checking annotations can themselves be a source of bugs and should be verified as part of unit testing.
It might be too expensive to do debug builds as part of pre-merge testing, but it’s still important to have builds that enable asserts: the assertions themselves might be wrong, or might not even compile.
The talk also clarified for me the reasoning behind the Lakos Rule. Unit tests that catch process termination are much harder to write portably than unit tests that catch exceptions. Papers on the Lakos Rule include P2861, P2831, and P2837.
So many other talks
Vincent Reverdy’s talk on a C++ symbolic algebra system made creative use of the fact that each lambda expression has a distinct type, even if two lambdas have the same parameters and captured state. Common Lisp has GENSYM; C++ has lambdas. I also appreciated how overloading operator= could change its meaning from assignment to binding a value to a symbol.
Benjamin Brock (Intel) gave a talk on distributed ranges. It was both relevant to distributed-memory parallel programming and a good illustration that it sometimes helps to expose implementation details as interface. The speaker particularly needed to reach inside zip_view to get at the underlying ranges. Since zip_view does not permit this, he had to write his own. I got to speak with Benjamin after the talk about sparse matrix iteration, a topic of interest to me for two decades.
C++ is very much a living and growing programming language. It was a joy to see colleagues and collaborators in person again. I very much encourage everyone to attend CppCon.