Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework

Reading time: 6 minutes
...

📝 Original Info

  • Title: Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework
  • ArXiv ID: 0706.2069
  • Date: 2007-06-14
  • Authors: Samuel Thibault, Raymond Namyst, Pierre-André Wacrenier

📝 Abstract

Exploiting the full computational power of today's increasingly hierarchical multiprocessor machines requires a very careful distribution of threads and data over the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper, we showed that using a bubble-based thread scheduler can significantly improve applications' performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles over the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.
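A bubble recursively groups threads that work together, and a bubble scheduler spreads bubbles over the levels of the machine hierarchy. The minimal C sketch below illustrates that idea. It is not BubbleSched's actual API: the `bubble` and `level` types and the `distribute`, `least_loaded` and `pin_all` functions are hypothetical. The greedy policy, which sends each member of a bubble to the currently least-loaded sub-level and then recurses down to the processors, is just one simple strategy that a framework like this might let an expert express.

```c
#include <assert.h>

#define MAX_CHILDREN 8

/* Hypothetical bubble: a thread when tid >= 0, otherwise a group of
   sub-bubbles (threads or nested bubbles) that work together. */
typedef struct bubble {
    int tid;                               /* thread id, or -1 for a bubble */
    int nchildren;
    struct bubble *children[MAX_CHILDREN];
} bubble;

/* Hypothetical machine level: the whole machine, a NUMA node, or a
   processor (a leaf, identified by cpu >= 0). */
typedef struct level {
    int cpu;                               /* processor id, or -1 for inner levels */
    int load;                              /* threads placed at or below this level */
    int nchildren;
    struct level *children[MAX_CHILDREN];
} level;

/* Number of threads contained (recursively) in b. */
static int count_threads(const bubble *b) {
    if (b->tid >= 0)
        return 1;
    int n = 0;
    for (int i = 0; i < b->nchildren; i++)
        n += count_threads(b->children[i]);
    return n;
}

/* Record every thread contained in b as running on processor cpu. */
static void pin_all(const bubble *b, int cpu, int *placement) {
    if (b->tid >= 0) {
        placement[b->tid] = cpu;
        return;
    }
    for (int i = 0; i < b->nchildren; i++)
        pin_all(b->children[i], cpu, placement);
}

/* Sub-level of l currently holding the fewest threads (first one on ties). */
static level *least_loaded(level *l) {
    assert(l->nchildren > 0);
    level *best = l->children[0];
    for (int i = 1; i < l->nchildren; i++)
        if (l->children[i]->load < best->load)
            best = l->children[i];
    return best;
}

/* Greedy hierarchical placement: each member of a bubble goes to the
   least-loaded sub-level, so the members of a nested bubble are first
   kept together on one node and only then spread over its processors. */
void distribute(const bubble *b, level *l, int *placement) {
    l->load += count_threads(b);
    if (l->cpu >= 0) {                     /* reached a processor */
        pin_all(b, l->cpu, placement);
        return;
    }
    if (b->tid >= 0) {                     /* lone thread: keep descending */
        distribute(b, least_loaded(l), placement);
        return;
    }
    for (int i = 0; i < b->nchildren; i++)
        distribute(b->children[i], least_loaded(l), placement);
}
```

With a root bubble holding two sub-bubbles {t0, t1} and {t2, t3} on a machine made of two NUMA nodes with two processors each (processors 0 and 1 on the first node, 2 and 3 on the second), the sketch places t0 and t1 on processors 0 and 1 and t2 and t3 on processors 2 and 3. If the same four threads were put directly into one flat bubble, the same greedy policy would alternate between nodes and yield placements 0, 2, 1, 3, separating t0 from t1; this illustrates the grouping information that bubbles carry.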

💡 Deep Analysis

Figure 1

📄 Full Content

arXiv:0706.2069v1 [cs.DC] 14 Jun 2007

Building Portable Thread Schedulers for Hierarchical Multiprocessors: the BubbleSched Framework

Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier
INRIA Futurs - LaBRI, 351 cours de la Libération, 33405 Talence cedex, France
{thibault,namyst,wacrenier}@labri.fr

Abstract. Exploiting the full computational power of increasingly hierarchical multiprocessor machines requires a very careful distribution of threads and data over the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper [10], we showed that using a bubble-based thread scheduler can significantly improve applications' performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles over the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.

Keywords: Threads, Scheduling, Bubbles, NUMA, SMP, Multi-Core, SMT.

1 Introduction

Both AMD and Intel now provide quad-core chips and are heading for 100-core chips. In the low-end market, this is the emerging edge of a deep trend, already visible in the scientific computing market, toward increasingly complex machines (e.g. Sun WildFire, SGI Altix, Bull NovaScale). Such large shared-memory machines are typically based on Non-Uniform Memory Architectures (NUMA). Recent technologies such as Simultaneous Multi-Threading (SMT) and multi-core chips make these architectures even more hierarchical.

Exploiting these machines efficiently is a real challenge, and a thread scheduler faces dilemmas when trying to take both the memory hierarchy and CPU utilization into account. On NUMA machines, for instance, threads should generally be scheduled as close to their data as possible, but bandwidth-consuming threads should rather be distributed over different chips. The core scheduler of an operating system can often be influenced, but it lacks knowledge of the precise application behavior: adaptively-refined meshes, for instance, entail very irregular and unpredictable behavior. A good solution would be to let application programmers take control of the scheduling, but writing a whole scheduler for hierarchical machines is a very difficult task.

In a previous paper [10], we introduced the bubble scheduling concept, which helps express the inherent parallel structure of multithreaded applications in a way that the underlying thread scheduler can exploit efficiently. Bubbles are abstractions that recursively group threads which work together. The first proof-of-concept implementation of our bubble scheduler featured a generic, hard-coded scheduler. However, applications may have different scheduling requirements and may thus attach different semantics to bubbles, for instance enforcing memory affinity or emphasizing a high frequency of global synchronization operations. Obviously, no generic scheduler can meet all these needs. In this paper, we present BubbleSched, a framework designed to ease the development and evaluation of customized, high-level thread schedulers.

2 On the Design of Thread Schedulers

Designing a thread scheduler for hierarchical machines is complex because it means finding an application-specific compromise between many constraints: favoring affinities between threads and memory, taking advantage of all the computational power, reducing synchronization cost, etc.

2.1 What Input Can a Scheduler Expect?

To make appropriate decisions at execution time, a thread scheduler can combine a number of parameters to evaluate the goodness of each potential scheduling action. These parameters can be collected from several places, at different times.

At runtime, some useful knowledge about the target machine can be discovered.
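The goodness evaluation mentioned above can be pictured as a weighted score computed for each candidate processor. The C sketch below is an illustration only, not the paper's method: the `cpu_info` type, the `goodness` and `best_cpu` functions, the two inputs (whether the processor sits on the NUMA node holding the thread's data, and how many threads it already runs) and the linear weighting are all assumptions, chosen to show the locality-versus-utilization dilemma described in the introduction.

```c
#include <assert.h>

/* Hypothetical per-processor information a scheduler might gather. */
typedef struct {
    int node;      /* NUMA node the processor belongs to (topology discovery) */
    int load;      /* number of runnable threads already queued on it */
} cpu_info;

/* Hypothetical goodness of running a thread whose data lives on
   data_node on processor c: reward locality, penalize existing load. */
static double goodness(int data_node, const cpu_info *c,
                       double locality_weight, double load_weight) {
    double locality = (c->node == data_node) ? 1.0 : 0.0;
    return locality_weight * locality - load_weight * c->load;
}

/* Index of the processor with the highest goodness (first one on ties). */
int best_cpu(int data_node, const cpu_info *cpus, int ncpus,
             double locality_weight, double load_weight) {
    assert(ncpus > 0);
    int best = 0;
    double best_score = goodness(data_node, &cpus[0], locality_weight, load_weight);
    for (int i = 1; i < ncpus; i++) {
        double s = goodness(data_node, &cpus[i], locality_weight, load_weight);
        if (s > best_score) {
            best = i;
            best_score = s;
        }
    }
    return best;
}
```

Consider a thread whose data lives on node 0, with processors 0 and 1 on node 0 running 3 and 2 threads, and processors 2 and 3 on node 1 running 0 and 1. `best_cpu` picks processor 1 when locality is weighted ten times more than load, but the idle processor 2 when both weights are equal. The weights are arbitrary here; a real scheduler would have to tune them, or derive them from runtime measurements, per application.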
The scheduler can get not only the number of processors but also the architecture hierarchy: how processors and memory banks are connected, how cache levels are shared between processors, etc. Moreover, performance counters can indicate how well the threads are using the underlying processors. Some NUMA chips can also report the ratio of remote memory accesses. The scheduler can hence check whether threads and data are prop


Reference

This content is AI-processed based on open access ArXiv data.
