<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://hdl.handle.net/2117/3114">
    <title>DSpace Collection:</title>
    <link>http://hdl.handle.net/2117/3114</link>
    <description />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://hdl.handle.net/2117/19158" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18455" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18448" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18242" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18207" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18176" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/18025" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/16300" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/15703" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/15702" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/15652" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/15651" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/15648" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/14433" />
        <rdf:li rdf:resource="http://hdl.handle.net/2117/13872" />
      </rdf:Seq>
    </items>
    <dc:date>2013-05-21T09:32:43Z</dc:date>
  </channel>
  <item rdf:about="http://hdl.handle.net/2117/19158">
    <title>DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support</title>
    <link>http://hdl.handle.net/2117/19158</link>
    <description>Title: DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support
Authors: Pavlou, Demos; Gibert Codina, Enric; Latorre, Fernando; González Colás, Antonio María
Abstract: Dynamic Binary Translators (DBT) and Dynamic Binary Optimization (DBO) by software are used widely for several reasons including performance, design simplification and virtualization. However, the software layer in such systems introduces non-negligible overheads which affect performance and user experience. Hence, reducing DBT/DBO overheads is of paramount importance. In addition, reduced overheads have interesting collateral effects in the rest of the software layer, such as allowing optimizations to be applied earlier. A cost-effective solution to this problem is to provide hardware support to speed up the primitives of the software layer, paying special attention to automate DBT/DBO mechanisms and leave the heuristics to the software, which is more flexible.
In this work, we have characterized the overheads of a DBO system using DynamoRIO implementing several basic optimizations. We have seen that the computation of the Data Dependence Graph (DDG) accounts for 5%-10% of the execution time. For this reason, we propose to add hardware support for this task in the form of a new functional unit, called DDGacc, which is integrated in a conventional pipeline processor and is operated through new ISA instructions. Our evaluation shows that DDGacc reduces the cost of computing the DDG by 32x, which reduces overall execution time by 5%-10% on average and up to 18% for applications where the DBO optimizes large code footprints.</description>
    <dc:date>2013-05-10T12:30:37Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18455">
    <title>A novel variation-tolerant 4T-DRAM with enhanced soft-error tolerance</title>
    <link>http://hdl.handle.net/2117/18455</link>
    <description>Title: A novel variation-tolerant 4T-DRAM with enhanced soft-error tolerance
Authors: Ganapathy, Shrikanth; Canal Corretger, Ramon; Alexandrescu, Dan; Costenaro, Enrico; González Colás, Antonio María; Rubio Sola, Jose Antonio
Abstract: In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower when compared to a similarly sized eDRAM cell. The retention time is improved by 2.04X on average while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error (SE) rate analysis tool, we have confirmed that the cell sensitivity to SEs is reduced by 56% on average in a natural working environment.</description>
    <dc:date>2013-03-21T13:41:03Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18448">
    <title>Analysis of CPI variance for dynamic binary translators/optimizers modules</title>
    <link>http://hdl.handle.net/2117/18448</link>
    <description>Title: Analysis of CPI variance for dynamic binary translators/optimizers modules
Authors: Brankovic, Aleksandar; Stavrou, Kyriakos; Gibert Codina, Enric; González Colás, Antonio María
Abstract: Dynamic Binary Translators and Optimizers (DBTOs) have been established as a hot research topic. They are used in many different systems, such as emulation, instrumentation tools and innovative HW/SW co-designed microarchitectures. Although many researchers have worked on characterizing and reducing the emulation overhead, to the best of our knowledge, there are no published results that explain how the microarchitectural behavior of the emulation software is affected by the guest application being emulated. In this paper we study the DBTO as an independent application, which is divided into modules with specific functionality. We show the variance in microarchitectural behavior of the DBTO among 48 applications. Moreover, we locate and explain the sources of variance. The results show that the variance is caused by interaction with the code cache (emulated application) and non-uniform module execution characteristics. The insights presented in this paper can be exploited towards the design of more efficient DBTOs.</description>
    <dc:date>2013-03-20T19:11:15Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18242">
    <title>Setting an error detection infrastructure with low cost acoustic wave detectors</title>
    <link>http://hdl.handle.net/2117/18242</link>
    <description>Title: Setting an error detection infrastructure with low cost acoustic wave detectors
Authors: Upasani, Gaurang; Vera Rivera, Francisco Javier; González Colás, Antonio María
Abstract: The continuing decrease in the dimensions and operating voltage of transistors has increased their sensitivity to radiation phenomena, making soft errors an important challenge in future chip multiprocessors (CMPs). Hence, new techniques are required for detecting errors in the logic and memories that allow meeting the desired failures-in-time (FIT) budget in CMPs. This paper proposes a low-cost dynamic particle strike detection mechanism through acoustic wave detectors. Our results show that our mechanism can protect both the logic and the memory arrays. As a case study, we also show how this technique can be combined with error codes to protect the last-level cache at low cost.</description>
    <dc:date>2013-03-12T18:17:30Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18207">
    <title>Enhancing 3T DRAMs for SRAM replacement under 10nm tri-gate SOI FinFETs</title>
    <link>http://hdl.handle.net/2117/18207</link>
    <description>Title: Enhancing 3T DRAMs for SRAM replacement under 10nm tri-gate SOI FinFETs
Authors: Jaksic, Zoran; Canal Corretger, Ramon
Abstract: In this paper, we present the dynamic 3T memory cell for future 10nm tri-gate FinFETs as a potential replacement for the classical 6T SRAM cell for implementation in high speed cache memories. We investigate read access time, retention time, and static power consumption of the cell when it is exposed to the effects of process and environmental variations. Process variations are extracted from the ITRS predictions and they are modeled at device level. For simulation, we use the 10nm SOI tri-gate FinFET BSIM-CMG model card developed by the University of Glasgow, Device Modeling Group. When compared to the classical 6T SRAM, the 3T cell has a 40% smaller area, and leakage is reduced up to 14 times while access time is approximately the same. In order to achieve higher retention times, we propose several cell extensions which, at the same time, enable post-fabrication/run-time adaptability.</description>
    <dc:date>2013-03-12T13:17:05Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18176">
    <title>A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance</title>
    <link>http://hdl.handle.net/2117/18176</link>
    <description>Title: A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance
Authors: Ganapathy, Shrikanth; Canal Corretger, Ramon; Alexandrescu, Dan; Costenaro, Enrico; González Colás, Antonio María; Rubio Sola, Jose Antonio
Abstract: In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower when compared to a similarly sized eDRAM cell. The retention time is improved by 2.04X on average while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error (SE) rate analysis tool, we have confirmed that the cell sensitivity to SEs is reduced by 56% on average in a natural working environment.</description>
    <dc:date>2013-03-11T14:33:59Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/18025">
    <title>Reducing energy consumption in human-centric wireless sensor networks</title>
    <link>http://hdl.handle.net/2117/18025</link>
    <description>Title: Reducing energy consumption in human-centric wireless sensor networks
Authors: Meseguer Pallarès, Roc; Molina Clemente, Carlos; Ochoa, Sergio; Santos, Rodrigo
Abstract: Energy consumption is a major research issue in wireless sensor networks, particularly in those where nodes collaborate to reach a goal. This article explores the energy consumption in mobile devices participating in a human-based wireless sensor network. Specifically, the paper proposes the use of a message predictor to help detect and reduce the number of unnecessary control packets delivered by the nodes as a way to keep the network topology updated. In order to evaluate this proposal, the Optimized Link State Routing protocol was modified to add a message predictor between the routing and the network layers. Eleven simulations were performed using a particular setting. The preliminary results indicate that the use of the message predictor can help considerably reduce the nodes' energy consumption without affecting the routing capability of the protocol. Although these results are still preliminary, they are highly encouraging.</description>
    <dc:date>2013-02-28T17:25:11Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/16300">
    <title>A take-home exam to assess professional skills</title>
    <link>http://hdl.handle.net/2117/16300</link>
    <description>Title: A take-home exam to assess professional skills
Authors: López Álvarez, David; Cruz Díaz, Josep Llorenç; Sánchez Carracedo, Fermín; Fernández Jiménez, Agustín
Abstract: Professional skills, such as the ability to communicate effectively or the ability to gather and integrate information, are not easy to teach or to assess. A traditional exam is not the best way of assessing these skills because it is limited both by time and by the resources students are able to consult. Moreover, in a traditional exam it is difficult to assess whether professional skills have been acquired in depth. In this paper we propose to replace the traditional exam with a take-home exam in which students have more time to solve the questions and are not restricted in the sources they can consult, thereby providing a highly educational task in which students experience a deep learning process. We also analyze what kind of questions should be asked to evaluate professional skills, as well as the potential drawbacks of this kind of exam (such as inappropriate student behavior). Finally, we show the results of one subject at the Barcelona School of Informatics, in which the take-home exam replaced the traditional exam. This course has been taught over 11 terms with good results.</description>
    <dc:date>2012-07-19T10:15:35Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/15703">
    <title>Fast time-to-market with via-configurable transistor array regular fabric: A delay-locked loop design case study</title>
    <link>http://hdl.handle.net/2117/15703</link>
    <description>Title: Fast time-to-market with via-configurable transistor array regular fabric: A delay-locked loop design case study
Authors: González Colás, Antonio María; Pons Solé, Marc; Barajas Ojeda, Enrique; Mateo Peña, Diego; López González, Juan Miguel; Moll Echeto, Francisco de Borja; Rubio Sola, Jose Antonio; Abella Ferrer, Jaume; Vera Rivera, Francisco Javier
Abstract: Time-to-market is a critical issue for today's integrated circuit manufacturers. In this paper, the Via-Configurable Transistor Array (VCTA) regular layout fabric, which aims to minimize the time-to-market and its associated costs, is studied for a Delay-Locked Loop (DLL) design. The comparison with a full custom design demonstrates that VCTA can be used without loss of functionality while shortening the design time. Layout implementations in a 90 nm CMOS process, as well as electrical simulations of delay, energy and jitter, are provided.</description>
    <dc:date>2012-04-03T18:06:02Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/15702">
    <title>Design of complex circuits using the via-configurable transistor array regular layout fabric</title>
    <link>http://hdl.handle.net/2117/15702</link>
    <description>Title: Design of complex circuits using the via-configurable transistor array regular layout fabric
Authors: Pons Solé, Marc; Moll Echeto, Francisco de Borja; Rubio Sola, Jose Antonio; Abella Ferrer, Jaume; Vera Rivera, Francisco Javier; González Colás, Antonio María
Abstract: Layout regularity will be mandatory for future CMOS technologies to mitigate manufacturability issues. However, existing CAD tools do not meet the needs imposed by regularity constraints. In this paper we present a new method for regular layout generation with Via-Configurable Transistor Arrays, focusing on reducing the area overhead associated with regularity. Results for ISCAS85 benchmarks in the 45nm technology node are provided, showing that areas comparable to the standard cell approach can be obtained.</description>
    <dc:date>2012-04-03T15:06:22Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/15652">
    <title>Hardware/software-based diagnosis of load-store queues using expandable activity logs</title>
    <link>http://hdl.handle.net/2117/15652</link>
    <description>Title: Hardware/software-based diagnosis of load-store queues using expandable activity logs
Authors: Carretero Casado, Javier Sebastián; Vera Rivera, Francisco Javier; Abella Ferrer, Jaume; Ramírez García, Tanausu; Monchiero, Matteo; González Colás, Antonio María
Abstract: The increasing device count and design complexity are posing significant challenges to post-silicon validation. Bug diagnosis is the most difficult step during post-silicon validation. Limited reproducibility and low testing speeds are common limitations in current testing techniques. Moreover, low observability defies full-speed testing approaches. Modern solutions like on-chip trace buffers alleviate these issues, but are unable to store long activity traces. As a consequence, the cost of post-Si validation now represents a large fraction of the total design cost. This work describes a hybrid post-Si approach to validate a modern load-store queue. We use an effective error detection mechanism and an expandable logging mechanism to observe the microarchitectural activity for long periods of time, at full processor speed. Validation is performed by analyzing the activity log by means of a diagnosis algorithm. Correct memory ordering is checked to find the root cause of errors.</description>
    <dc:date>2012-03-22T14:35:06Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/15651">
    <title>A co-designed HW/SW approach to general purpose program acceleration using a programmable functional unit</title>
    <link>http://hdl.handle.net/2117/15651</link>
    <description>Title: A co-designed HW/SW approach to general purpose program acceleration using a programmable functional unit
Authors: Deb, Abhishek; Codina Viñas, Josep M.; González Colás, Antonio María
Abstract: In this paper, we propose a novel programmable functional unit (PFU) to accelerate general purpose application execution on a modern out-of-order x86 processor in a complexity-effective way. Code is transformed and instructions are generated that run on the PFU using a co-designed virtual machine (Cd-VM). Groups of frequently executed micro-operations (micro-ops) are identified and fused into a macro-op (MOP) by the Cd-VM. The MOPs are executed on the PFU. Results presented in this paper show that this HW/SW co-designed approach produces average speedups in performance of 17% in SPECFP and 10% in SPECINT, and up to 33%, over a modern out-of-order processor. Moreover, we also show that the proposed scheme outperforms not only dynamic vectorization using SIMD accelerators but also an 8-wide issue out-of-order processor.</description>
    <dc:date>2012-03-22T14:19:30Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/15648">
    <title>Fg-STP: fine-grain single thread partitioning on multicores</title>
    <link>http://hdl.handle.net/2117/15648</link>
    <description>Title: Fg-STP: fine-grain single thread partitioning on multicores
Authors: Ranjan, Rakesh; Latorre Salinas, Fernando; Marcuello Pascual, Pedro; González Colás, Antonio María
Abstract: Power and complexity issues have led the microprocessor industry to shift to Chip Multiprocessors (CMPs) in order to better utilize the additional transistors ensured by Moore's law. While parallel programs will be able to take the most advantage of these CMPs, single thread applications are not equipped to benefit from them. In this paper we propose Fine-Grain Single-Thread Partitioning (Fg-STP), a hardware-only scheme that takes advantage of CMP designs to speed up single-threaded applications. Our proposal improves single thread performance by reconfiguring two cores with the aim of collaborating on the fetching and execution of the instructions. These cores are basically conventional out-of-order cores in which execution is orchestrated using dedicated hardware that has minimum and localized impact on the original design of the cores. This approach partitions the code at instruction granularity and differs from previous proposals in its extensive use of dependence speculation, replication and communication. These features are combined with the ability to look for parallelism on large instruction windows without any software intervention (no re-compilation or profiling hints are needed). These characteristics allow Fg-STP to speed up single thread execution by 18% and 7% on average over similar hardware-only approaches like Core Fusion, on medium sized and small sized 2-core CMPs respectively, for Spec 2006 benchmarks.</description>
    <dc:date>2012-03-22T12:55:41Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/14433">
    <title>Impact of positive bias temperature instability (PBTI)</title>
    <link>http://hdl.handle.net/2117/14433</link>
    <description>Title: Impact of positive bias temperature instability (PBTI)
Authors: Aymerich Capdevila, Nivard; Ganapathy, Shrikanth; Rubio Sola, Jose Antonio; Canal Corretger, Ramon; González Colás, Antonio María
Abstract: Memory circuits are playing a key role in complex multicore systems, with both data and instruction storage and mailbox communication functions. There is a general concern that conventional SRAM cells based on the 6T structure could exhibit serious limitations in future CMOS technologies due to the instability caused by transistor mismatch as well as for leakage consumption reasons. For L1 data caches, the new 3T1D DRAM cell is considered a potential candidate to substitute 6T SRAMs. We first evaluate the impact of the positive bias temperature instability (PBTI) on the access and retention time of the 3T1D memory cell implemented with 45 nm technology. Then, we consider all sources of variations and the effect of the degradation caused by the aging of the device on the yield at system level.</description>
    <dc:date>2012-01-09T16:07:39Z</dc:date>
  </item>
  <item rdf:about="http://hdl.handle.net/2117/13872">
    <title>Global productiveness propagation: A code optimization technique to speculatively prune useless narrow computations</title>
    <link>http://hdl.handle.net/2117/13872</link>
    <description>Title: Global productiveness propagation: A code optimization technique to speculatively prune useless narrow computations
Authors: Bhagat, Indu; Gibert Codina, Enric; Sanchez, Jesus; González Colás, Antonio María
Abstract: This paper proposes a unique hardware-software collaborative strategy to remove useless work at 16-bit data-width granularity. The underlying motivation is to design a low power execution platform by exploiting ‘narrow’ computations. The proposal uses a strictly narrow bit-wide microarchitecture (16-bit integer datapath), which realizes the goal of a low cost, low hardware complexity, low power execution engine. Software dynamically maps the 64-bit computations by translating them into an equivalent 16-bit instruction stream and optimizing them.
In this paper, we propose an optimization technique, called Global Productiveness Propagation (GPP), which is a dynamic, profile-based optimization technique that infers the minimum required dataflow by pruning narrow computations that are most probably useless (non-productive). More precisely, GPP speculatively prunes the static backward slices of selected narrow computations: computations that result in the same value (in their respective storage location) as that at the input of the region. This speculative optimization technique is formulated around the concept of ‘narrow’ computations because these allow a finer granularity to distinguish between useful (productive) and useless (non-productive) work. GPP has been evaluated on an in-order narrow bit-wide execution core, achieving an average dynamic instruction stream reduction of 6.6%, while improving overall performance by 4.2%.</description>
    <dc:date>2011-11-13T10:48:24Z</dc:date>
  </item>
</rdf:RDF>

