Portable Parallelism using Modern C++ and Threading Building Blocks

Portable Parallelism using Modern C++ and Threading Building Blocks is a two-day online training course with programming exercises, taught by Michael Voss and Pablo Reble. It runs from 11 AM to 3 PM Eastern Time (EDT) on Monday, September 21st and Tuesday, September 22nd, 2020 (after the conference).

Course Description

Threading Building Blocks (TBB) is a portable, open-source C++ library for threading that has been widely used since 2006. Over the years, the developers of TBB and its users have learned many lessons about implementing composable parallelism in real-world C++ applications, including what practices to use and what mistakes to avoid. This course will introduce the TBB library and discuss its relationship with modern C++. Attendees will become familiar with TBB’s generic parallel algorithms, flow graph, concurrent containers and scalable memory allocator. They will also learn about TBB’s ongoing relationship with the ISO C++ standard including the fundamental features introduced in C++11 (std::thread, std::mutex and std::atomic), the parallel execution policies introduced in C++17, the newest features introduced in C++20, as well as proposed features, like executors.
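As a rough illustration of the C++11 features named above, the following minimal sketch uses only the standard library (no TBB) to sum a vector with std::thread workers, a std::mutex protecting the shared total, and a std::atomic counter. The names `parallel_sum` and `SumResult` are hypothetical and chosen for this example only; they are not part of TBB or of the course materials.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch.
struct SumResult {
    long long total;               // sum of all elements
    std::size_t chunks_processed;  // number of workers that finished
};

// Hypothetical helper: splits `data` into one contiguous chunk per thread.
SumResult parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);

    long long total = 0;
    std::mutex total_mutex;                         // guards `total`
    std::atomic<std::size_t> chunks_processed{0};   // lock-free counter

    // Ceiling division so every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            // Accumulate locally first to keep the critical section short.
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            {
                std::lock_guard<std::mutex> lock(total_mutex);
                total += local;
            }
            chunks_processed.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // join also publishes the workers' writes

    return {total, chunks_processed.load()};
}
```

Calling `parallel_sum(values, 4)` on a vector holding 1 through 1000 should yield a total of 500500. Note that the thread count is fixed by the caller here; this kind of hard-coded resource assumption is what higher-level, composable abstractions aim to avoid.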

Threading Building Blocks is also part of the specification for oneAPI, a recently announced cross-industry, open, standards-based unified programming model for heterogeneous programming. Attendees will be introduced to oneAPI and learn about the role of oneAPI’s Threading Building Blocks (oneTBB) in this cross-industry effort.

Finally, the class will provide tips on how to architect applications and libraries to provide composable parallel performance, based on the experiences of the TBB development team and its customer support teams. The instructors will demonstrate these tips with examples using TBB and the parallel algorithms introduced in C++17.

This course will mix presentations with hands-on exercises. Course materials will be provided via GitHub and hands-on exercises will be supported through Intel® DevCloud for oneAPI.

Goals

  • Learn about the features of TBB
  • Learn about TBB’s complementary relationship with the parallelism features in C++
  • Learn about oneTBB’s role in oneAPI
  • Learn how to create applications with portable, composable parallelism

Prerequisites

  • Knowledge of C++11/14 (including templates)
  • As an online class:
    • A reliable internet connection is necessary
    • The exact conference-call software will be announced later
  • To participate in hands-on examples:
    • Option 1 (the primary delivery method): Using the Intel® DevCloud for oneAPI
      • A limited, but hopefully sufficient, number of instances will be pre-arranged for attendee use
      • All necessary software will be pre-installed on those systems
    • Option 2: On the attendee’s local system
      • A C++ compiler supporting C++14 or later
      • git
      • Due to time constraints, we will offer limited assistance for students who choose this option

Outline

Day 1 (3 total hours)

  • An introduction to Threading Building Blocks (~1.5 hrs)
    • A brief overview of TBB
    • Hands-on:
      • Setting up Intel® DevCloud for oneAPI
      • Or, installing TBB and the examples locally
    • The library’s components
      • The generic parallel algorithms
      • The flow graph API
      • Concurrent containers
      • Scalable memory allocation
    • Hands-on:
      • Using TBB features
  • Modern standard C++ parallelism features with TBB (~1.5 hrs)
    • TBB features that have been superseded by standard C++
      • tbb::thread, tbb::mutex and tbb::atomic
    • The C++17 parallel algorithms with TBB as an execution engine
      • Understanding execution policies
        • seq, par, par_unseq, and unseq (added in C++20)
    • TBB’s relationship to upcoming features: coroutines, executors and more
    • Hands-on:
      • Using C++17 parallel algorithms with oneTBB

Day 2 (3 total hours)

  • Composability (~1.5 hrs)
    • What is composability and why is it important?
      • The types of composability (nested, parallel and sequential)
    • Pitfalls when creating parallel applications and libraries
      • Tuning for an assumed set of resources
      • Oversubscription
      • Affinity and locality
      • Priorities
    • Hands-on:
      • Exploring oneTBB’s performance features
  • Techniques for creating composable applications and libraries (~1.5 hrs)
    • Data parallelism
    • Relaxed sequential semantics
    • Cache-oblivious algorithms
    • Work stealing and recursive subdivision
    • Avoiding oversubscription in nested parallelism
    • Is it ok to sacrifice composability for performance?
    • Demonstration
      • Composability case studies

Register Here

Course Instructors

Michael Voss is a Principal Engineer in the Intel Architecture, Graphics and Software Group at Intel. He has been a member of the TBB development team since before the 1.0 release in 2006 and was the initial architect of the TBB flow graph API. He was also one of the lead developers of Flow Graph Analyzer, a graphical tool for analyzing data flow applications targeted at both homogeneous and heterogeneous platforms. He is a co-author of the new book “Pro TBB: C++ Parallel Programming with Threading Building Blocks” and has over 40 published papers and articles on topics related to parallel programming. He frequently consults with customers across a wide range of domains to help them effectively use the threading libraries provided by Intel. Prior to joining Intel in 2006, he taught in the Edward S. Rogers Department of Electrical and Computer Engineering at the University of Toronto. He received his Ph.D. from the School of Electrical and Computer Engineering at Purdue University in 2001.

Pablo Reble is a Software Engineer in the Intel Architecture, Graphics and Software Group at Intel working on oneAPI and the DPC++ library. He has been working on TBB and related tools since 2015. Before joining Intel in 2016, he worked as a postdoctoral researcher at RWTH Aachen, where he received his PhD in Computer Engineering. He has more than 7 years of experience teaching parallel programming at the undergraduate and graduate levels. He has authored over 15 published papers and articles on parallel programming, system software, and runtime architectures.