• A co-designed HW/SW approach to general purpose program acceleration using a programmable functional unit 

    Deb, Abhishek; Codina Viñas, Josep M.; González Colás, Antonio María (IEEE Press. Institute of Electrical and Electronics Engineers, 2011)
    Text en actes de congrés
    Accés restringit per política de l'editorial
    In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor in a complexity-effective way. Code is transformed and ...
  • A unified modulo scheduling and register allocation technique for clustered processors 

    Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
    Text en actes de congrés
    Accés obert
    This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more ...
  • AGAMOS: A graph-based approach to modulo scheduling for clustered microarchitectures 

    Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María; Kaeli, D (2009-06)
    Article
    Accés obert
    This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the ...
  • Anaphase: a fine-grain thread decomposition scheme for speculative multithreading 

    Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez, Raul; González Colás, Antonio María (IEEE Computer Society, 2009)
    Text en actes de congrés
    Accés obert
    Industry is moving towards multi-core designs as we have hit the memory and power walls. Multi-core designs are very effective to exploit thread-level parallelism (TLP) but do not provide benefits when executing serial ...
  • Boosting single-thread performance in multi-core systems through fine-grain multi-threading 

    Madriles Gimeno, Carles; López Muñoz, Pedro; Codina Viñas, Josep M.; Gibert Codina, Enric; Latorre Salinas, Fernando; Martínez Vicente, Alejandro; Martinez Morais, Raul; González Colás, Antonio María (ACM Press. Association for Computing Machinery, 2009-06)
    Text en actes de congrés
    Accés restringit per política de l'editorial
    Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism ...
  • Exploiting pseudo-schedules to guide data dependence graph partitioning 

    Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María; David, Kaeli (Institute of Electrical and Electronics Engineers (IEEE), 2002)
    Text en actes de congrés
    Accés obert
    This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning ...
  • Graph-partitioning based instruction scheduling for clustered processors 

    Aleta Ortega, Alexandre; Codina Viñas, Josep M.; Sánchez Navarro, F. Jesús; González Colás, Antonio María (Institute of Electrical and Electronics Engineers (IEEE), 2001)
    Text en actes de congrés
    Accés obert
    This paper presents a novel scheme to schedule loops for clustered microarchitectures. The scheme is based on a preliminary cluster assignment phase implemented through graph partitioning techniques followed by a scheduling ...
  • Instruction replication for clustered microarchitectures 

    Aleta Ortega, Alexandre; Codina Viñas, Josep M.; González Colás, Antonio María; David, Kaeli (Institute of Electrical and Electronics Engineers (IEEE), 2003)
    Text en actes de congrés
    Accés obert
    This work presents a new compilation technique that uses instruction replication in order to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate ...