    • Beyond the socket: NUMA-aware GPUs

      Milic, Ugljesa; Villa, Oreste; Bolotin, Evgeny; Arunkumar, Akhil; Ebrahimi, Eiman; Jaleel, Aamer; Ramirez, Alex; Nellans, David (Association for Computing Machinery, 2017-10)
      Conference paper
      Open access
      GPUs achieve high throughput and power efficiency by employing many small single instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance variance they utilize a uniform memory system and ...
    • GMT: Enabling easy development and efficient execution of irregular applications on commodity clusters 

      Morari, Alessandro; Villa, Oreste; Tumeo, Antonino; Chavarria Miranda, Daniel; Valero Cortés, Mateo (Association for Computing Machinery (ACM), 2013)
      Conference paper
      Open access
      In this poster we introduce GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT only requires a cluster with x86 nodes ...
    • Scaling irregular applications through data aggregation and software multithreading 

      Morari, Alessandro; Tumeo, Antonino; Chavarria Miranda, Daniel; Villa, Oreste; Valero Cortés, Mateo (Institute of Electrical and Electronics Engineers (IEEE), 2014)
      Text in conference proceedings
      Access restricted by publisher policy
      Emerging applications in areas such as bioinformatics, data analytics, semantic databases and knowledge discovery employ datasets from tens to hundreds of terabytes. Currently, only distributed memory clusters have enough ...