DATE is pleased to present a special hybrid format for its 2022 event. While the situation related to COVID-19 is improving, safety measures and restrictions will remain uncertain in the upcoming months across Europe and worldwide. As a transition towards a future post-pandemic event, DATE 2022 will host a two-day live event in the city of Antwerp (just north of Brussels in Belgium) to bring the community together again, followed by further activities carried out entirely online in the subsequent days. This setup combines the in-person experience with the opportunities of online activities, fostering networking and social interaction around a programme of selected talks and panels on emerging topics that complement the traditional DATE high-quality scientific, technical and educational activities.

M02 Software-Defined Hardware: Digital Design in the 21st Century with Chisel

Date: Monday, 01 February 2021
Time: 07:00 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/mndgRXBrkGwf3D2fm

Organizer:
Martin Schoeberl, Technical University of Denmark, DK

Chisel is a hardware construction language implemented as a domain-specific language in Scala. Therefore, the full power of a modern programming language is available to describe hardware and, more importantly, hardware generators. Chisel was developed at UC Berkeley and has been successfully used for several RISC-V tape-outs. Google has developed a tensor processing unit for edge devices in Chisel. At the Technical University of Denmark, we use Chisel in the T-CREST project and in teaching digital electronics and advanced computer architecture.

In this tutorial I will give an overview of Chisel for describing circuits at the register-transfer level, show how to use Chisel's tester functionality to test and simulate digital circuits, demonstrate how to synthesize circuits for an FPGA, and present advanced Chisel features for describing circuit generators.

The aim of the course is to give a basic understanding of a modern hardware description language and the ability to describe simple circuits in Chisel. It also provides a basis for exploring more advanced concepts of circuit generators written in Chisel/Scala. The intended audience is hardware designers with some background in VHDL or Verilog, but Chisel is also a good entry language for software programmers moving into hardware design (e.g., porting software algorithms to FPGAs for speedup).

Besides the lectures, we will have lab sessions to describe small circuits, test them in Chisel simulation, and run them on an FPGA.

Knowledge of a hardware description language like VHDL or Verilog is beneficial, but Chisel is also approachable by software engineers with knowledge of an object-oriented language such as Java or C#.

Installation instructions and a VM with all tools installed are available for download here: https://github.com/schoeberl/chisel-lab/blob/master/Setup.md. As this is an online tutorial, you may use your own FPGA board for the experiments or use Chisel in simulation only.

 
The book 'Digital Design with Chisel' accompanies the tutorial. It is available in open access at https://github.com/schoeberl/chisel-book.


M03 How Emerging Memory Technology Will Reshape Future Computing

Date: Monday, 01 February 2021
Time: 07:00 CEST - 11:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/oS2aqSxJicHcBHYAk

Organizers:
Jian-Jia Chen, TU Dortmund University, DE
Hussam Amrouch, University of Stuttgart, DE
Joerg Henkel, KIT, DE

Speakers:
Jian-Jia Chen, TU Dortmund University, DE
Hussam Amrouch, University of Stuttgart, DE
Yuan-Hao Chang, Academia Sinica, TW

Motivation:

  • Due to low leakage power, high density, and low unit cost, emerging byte-addressable NVM architectures are being considered as main memory and storage in the near future.
  • This tutorial presents and discusses these emerging technologies and their impact, in order to outline potential research directions and collaborations.
  • It is unique and timely for the DATE community, since the presenters have expertise on both the system/architecture side and the technology side, allowing them to draw a vision of novel emerging techniques and their impact across different abstraction layers.

Goal: The goal of this tutorial is to present and discuss emerging byte-addressable non-volatile memory (NVM) technologies and their impact across the computing stack.

Technical Details:

In this tutorial we will discuss various non-volatile memories. One of them is the Ferroelectric Field-Effect Transistor (FeFET), a promising non-volatile, area-efficient, and low-power device that combines logic and memory and is compatible with the existing CMOS fabrication process. We will show how bit errors induced in FeFETs (due to temperature and process variation) can be modeled from the device level up to the system level, and then how the impact of such bit errors can be accurately quantified and mitigated in the context of Binary Neural Networks (BNNs). Other types of NVM will also be discussed, demonstrating how system-level management could be affected.
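To illustrate the kind of analysis involved (a toy sketch, not the tutorial's actual models): inject random sign flips into binarized weights, emulating FeFET bit errors, and observe how often a BNN-style neuron's decision changes as the raw bit-error rate grows.

```python
import random

def binarize(weights):
    """Binarize real-valued weights to {-1, +1}, as in a BNN layer."""
    return [1 if w >= 0 else -1 for w in weights]

def inject_bit_errors(weights, p_flip, rng):
    """Flip the sign of each binary weight with probability p_flip,
    emulating FeFET retention/read bit errors."""
    return [-w if rng.random() < p_flip else w for w in weights]

def bnn_neuron(weights, x):
    """Sign of the binary dot product: the basic BNN operation."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else -1

rng = random.Random(0)
w = binarize([rng.uniform(-1, 1) for _ in range(1024)])
x = [rng.choice([-1, 1]) for _ in range(1024)]
clean = bnn_neuron(w, x)

# Estimate how often the neuron's decision flips at each bit-error rate.
trials = 200
for p in (0.001, 0.05):
    flips = sum(
        bnn_neuron(inject_bit_errors(w, p, rng), x) != clean
        for _ in range(trials)
    )
    print(p, flips / trials)
```

The same pattern, scaled up to whole networks and physically derived flip probabilities, is what allows the error impact to be quantified and then mitigated at the system level.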

Schedule (Time Zone: GMT +1)

  • 07:00 - 07:10: Opening
  • 07:10 - 08:00: Emerging Devices (ReRam, FeFET, NCFET)
  • 08:00 - 08:05: coffee/breakfast break
  • 08:05 - 08:40: Neural Network Techniques for Systems with NVMs, from CPU Cache, Main Memory and Processing-in-Memory
  • 08:40 - 09:20: Random Forest Training Techniques for Systems with NVMs
  • 09:20 - 09:30: break
  • 09:30 - 10:00: Full system-level NVM simulation and optimization 
  • 10:00 - 10:25: Demo and Q&A
  • 10:25 - 10:30: break
  • 10:30 - 11:00: Vision of Integration of Future Technology

Necessary background: 

  • Computer Architecture

M04 Security in the Post-Quantum Era: Threats and Countermeasures

Date: Monday, 01 February 2021
Time: 07:00 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/h8fKGSqAXef62czt5

Organizers:
Anupam Chattopadhyay, Nanyang Technological University, SG
Swaroop Ghosh, Pennsylvania State University, US
Robert Wille, Johannes Kepler University Linz, AT
Francesco Regazzoni, ALaRI, CH

Speakers:
Koen Bertels, TU Delft, NL
Sujoy Sinha Roy, TU Graz, AT
Shivam Bhasin, Nanyang Technological University, SG

Following Feynman's idea of computing based on the intricate principles of quantum mechanics, the scientific community has embarked on a quest to tap the unprecedented potential of quantum computing. A concerted effort by industry and academia has produced commercial quantum computers and algorithms that offer a speed-up over their classical counterparts (at least in principle).

In spite of this promise, quantum computers are still at a nascent stage. On the device front, qubits are fragile and susceptible to noise and errors due to decoherence. New noise-tolerant qubits are being studied for this purpose. Another approach is to deploy quantum error correction (QEC), e.g., the Shor, Steane, and surface codes. Variational algorithms and hybrid classical-quantum approaches have shown promise in solving practical problems on NISQ-era quantum computers.

Quantum computers are prophesied to break conventional cryptosystems, most notably by leveraging Shor's factorization algorithm. However, practical quantum systems still need significant scaling and engineering effort to become a real threat. Anticipating that current public-key cryptosystems will become vulnerable, a new class of cryptographic algorithms known as post-quantum cryptography is being developed. Quantum systems also bring new security promises, such as quantum key distribution and quantum-enabled security primitives, e.g., TRNGs. These primitives are not foolproof either, and face a surge of new attacks.
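The quantum part of Shor's algorithm is period finding; the classical reduction from factoring to order finding can be illustrated without any quantum hardware (a sketch in which the order is found by brute force):

```python
import math

def order(a, n):
    """Smallest r > 0 with a^r ≡ 1 (mod n) -- the step a quantum
    computer accelerates; here found by brute force."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_classical(n, a):
    """Classical post-processing of Shor's algorithm: an even order r
    with a^(r/2) not ≡ -1 (mod n) yields a nontrivial factor of n."""
    assert math.gcd(a, n) == 1
    r = order(a, n)
    if r % 2 == 1:
        return None          # odd order: retry with another a
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None          # trivial square root: retry with another a
    return math.gcd(y - 1, n)

print(shor_classical(15, 7))  # a nontrivial factor of 15
```

The brute-force `order` loop is exponential in the bit length of n; replacing it with quantum period finding is exactly what makes the attack feasible at cryptographic sizes.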

The first phase of the tutorial will discuss the growth of scalable quantum computers, their challenges, and the latest research on solving practical problems using NISQ computers. This will be followed by a glue talk connecting and establishing the realistic threats posed by a quantum-enabled attacker. The third phase of the tutorial will discuss various post-quantum cryptographic primitives. The concluding talk will present new vulnerabilities in post-quantum cryptography, opening up a new research direction.


B.1 BarCamp

Date: Monday, 01 February 2021
Time: 09:00 CEST - 17:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/RqjaanRSLWQpMJYmj

Session chair:
Anton Klotz, Cadence, DE

Session co-chair:
Georg Gläser, IMMS, DE

Organizers:
Kim Gruttner, OFFIS, DE
Gregor Nitsche, OFFIS, DE

In order to present and discuss ideas and results of ongoing scientific work, we invite researchers, engineers and students from the areas of electronic design automation (EDA), microelectronics and (embedded) systems design to our open BarCamp event. More information on the format of the BarCamp event can be found at: https://www.date-conference.com/barcamp


M01 Industrial Control Systems Security

Date: Monday, 01 February 2021
Time: 15:00 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/KKvhsnTSuqhtFW8bg

Organizers:
Charalambos Konstantinou, Florida State University, US
Michail Maniatakos, NYUAD, AE
Fei Miao, University of Connecticut, US

This tutorial introduces basic and advanced topics on industrial control systems (ICS) security. It starts with operational security, providing guidance on recognizing weaknesses in everyday operations and information which can be valuable to attackers. A comparative analysis between traditional information technology (IT) and operational control system architectures is also presented, along with security vulnerabilities and mitigation strategies unique to the control system domain. Current trends, threats, and vulnerabilities will be discussed, as well as attacking and defending methodologies for ICS. Case studies on cyberattacks and defenses will be presented for two critical infrastructure sectors: the power grid and the chemical sector. The tutorial also discusses the need for an accurate assessment environment, achieved through the inclusion of hardware-in-the-loop (HIL) testbeds.

The participants of the tutorial will learn: (1) known vulnerabilities of ICS, (2) common attacks on ICS, their entry points, and their impact levels, (3) general strategies for the secure design of ICS and cyber-physical systems, (4) strategies for attack detection, (5) testing strategies for security objectives, and (6) economic aspects of secure design, such as the trade-off between security and usability and the maintenance of features.
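As a flavour of item (4), a common baseline is residual-based detection: compare each sensor reading against a model prediction and raise an alarm when the residual exceeds a threshold. A minimal sketch, with an invented constant-setpoint process (not material from the tutorial):

```python
def detect_attacks(readings, predicted, threshold):
    """Flag the indices of samples whose residual
    |measurement - model prediction| exceeds the threshold
    (a simple bad-data detector)."""
    return [
        i for i, (z, zhat) in enumerate(zip(readings, predicted))
        if abs(z - zhat) > threshold
    ]

# Invented example: a tank level regulated at 5.0; an attacker
# injects false readings at samples 3 and 4.
predicted = [5.0] * 6
readings = [5.1, 4.9, 5.0, 8.2, 8.3, 5.0]
print(detect_attacks(readings, predicted, threshold=0.5))  # → [3, 4]
```

Real detectors replace the constant prediction with a dynamic process model (e.g., a Kalman filter) and tune the threshold against sensor noise; the residual-comparison core stays the same.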

Agenda:

Part 1: Introduction and Security of ICS

  1. Introduction to ICS security
    - Motivation, Recent Incidents, Terminology, Common practices
  2. Testbeds and Security Studies
     

Break

Part 2: Requirements for ICS security studies

  1. Threat Modeling and Risk Assessment 
  2. Modeling, Resources, and Metrics for ICS studies
  3. Demos of Denial-of-Service and Time-Delay Attacks in a Co-Simulation Testbed 
     

Break

Part 3: Defense strategies for ICS

  1. Attack Detection and Secure Control of Cyber-Physical Systems 
  2. Defense Methodologies and Best Practices
  3. Future Challenges and Concluding Remarks 

 


M05 Automation goes both ways: ML for security and security for ML

Date: Monday, 01 February 2021
Time: 15:00 CEST - 18:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/xpPtvsj9zfKHFnbxw

Organizers:
Alexandra Dmitrienko, University of Würzburg, DE
Siddarth Garg, New York University, US
Farinaz Koushanfar, University of California San Diego, US

This tutorial focuses on state-of-the-art research at the intersection of AI and security. On the one hand, recent advances in Deep Learning (DL) have enabled a paradigm shift to include machine intelligence in a wide range of autonomous tasks. As a result, a largely unexplored attack surface has opened up, jeopardizing the integrity of DL models and hindering their ubiquitous deployment across various intelligent applications. On the other hand, DL-based algorithms are also being employed for identifying security vulnerabilities in long streams of multi-modal data and logs. In distributed, complex settings, often this is the only way to monitor and audit the security and robustness of a system. The tutorial integrates the views of three experts: Prof. Garg explores the emerging landscape of "adversarial ML" with the goal of answering basic questions about the trustworthiness and reliability of modern machine learning systems. Prof. Dmitrienko presents novel usages of federated and distributed learning for risk detection on mobile platforms, with proof-of-concept realization and evaluation on data from millions of users. Prof. Koushanfar discusses how end-to-end automated frameworks based on algorithm/hardware co-design help with both (1) realizing accelerated, low-overhead shields against DL attacks, and (2) enabling low-overhead, real-time intelligent security monitoring.
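A canonical example from the adversarial-ML landscape is the Fast Gradient Sign Method (FGSM); the sketch below applies it to a hand-made logistic-regression model (all numbers invented for illustration, not from the tutorial):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """FGSM for logistic regression: the gradient of the cross-entropy
    loss w.r.t. the input x is (p - y) * w, so moving each feature by
    eps in the sign of that gradient maximally increases the loss
    under an L-infinity budget."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]

# Invented toy model and point: a confidently classified input is
# pushed across the decision boundary by a bounded perturbation.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1           # predicted probability > 0.5: class 1
x_adv = fgsm(w, b, x, y, eps=0.8)
print(predict(w, b, x), predict(w, b, x_adv))
```

For deep networks the gradient is obtained by backpropagation instead of this closed form, but the one-step sign update is identical, which is why FGSM is the standard first benchmark for model robustness.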


M06 CAD for SoC Security

Date: Monday, 01 February 2021
Time: 15:00 CEST - 18:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/FNow6v5Kpb67Pdh4o

Speakers:
Mark Tehranipoor, University of Florida, US
Farimah Farahmandi, University of Florida, US

The growing complexity of system-on-chips (SoCs) and the ever-increasing cost of IC fabrication have forced the semiconductor industry to shift from a vertical business model to a horizontal one. In this model, time-to-market and manufacturing costs are lowered through outsourcing and design reuse. To be more specific, SoC designers obtain licenses for third-party intellectual property (3PIPs), design an SoC by integrating the 3PIPs with their in-house IPs, and then sometimes outsource the SoC design to contract design houses, foundries, and assembly facilities for synthesis, DFT insertion, GDSII development, fabrication, test, and packaging. With most of these entities involved in design, manufacturing, integration, and distribution located across the globe, SoC design houses no longer have the ability to monitor the entire process and ensure security and trust.

Further, designers are not aware of all vulnerabilities in a design, nor of the countermeasures to address them. Unfortunately, existing tools do little to alleviate the problem: they are developed to optimize designs for power, performance, and area, while security is ignored. In fact, in some cases, tools and designers unintentionally create vulnerabilities in a circuit through security-unaware design processes and practices. These issues, together with the lack of trust and control, have led to a large number of vulnerabilities. Hence, it is imperative to develop computer-aided design (CAD) tools with security in mind, to identify and address vulnerabilities throughout the design life-cycle.
To protect SoCs from such vulnerabilities, academic and industry researchers have proposed many design-for-security and security assessment techniques, e.g., information flow tracking, side-channel leakage analysis, IP encryption, logic obfuscation, and design-for-anti-counterfeit. Some of these techniques are currently being evaluated by industry and are expected to be adopted in the near future. However, recent literature has pointed out some limitations of these approaches. Therefore, it is crucial to have an in-depth understanding of the security provided by the different techniques, and to understand their limitations.

The goal of this tutorial is to present (i) the threat posed by each entity in the SoC supply chain, (ii) vulnerabilities introduced during the design process and life-cycle, (iii) CAD tools and methodologies for security assessment, (iv) countermeasure tools and methodologies for each vulnerability, and (v) challenges and the research roadmap ahead.
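As a taste of the security assessment techniques such CAD tools implement, information flow tracking can be sketched at the gate level: taint the secret inputs and propagate the taint through the netlist to check whether a secret can reach an observable point. A toy model (invented netlist, not a production CAD flow):

```python
def track_taint(netlist, tainted_inputs):
    """Propagate taint through a gate-level netlist given as
    {output_net: (gate_type, input_net, ...)}; a net becomes tainted
    if any of its fan-in nets is tainted. This is conservative IFT:
    the gate function is ignored for the reachability question."""
    tainted = set(tainted_inputs)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for out, (_gate, *ins) in netlist.items():
            if out not in tainted and any(i in tainted for i in ins):
                tainted.add(out)
                changed = True
    return tainted

# Invented example: 'key' feeds an AND gate whose result reaches the
# primary output 'out' -- the secret is potentially observable.
netlist = {
    "n1": ("and", "key", "data"),
    "n2": ("xor", "data", "ctrl"),
    "out": ("or", "n1", "n2"),
}
print("out" in track_taint(netlist, {"key"}))
```

Production IFT tools refine this with gate-precise propagation rules (e.g., an AND with a constant-0 input blocks taint) to cut false positives, but the fixed-point traversal is the same skeleton.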


FM01.1 PhD Forum

Date: Monday, 01 February 2021
Time: 17:00 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/FtLuDBwq5KDuvpHzd

Session Chair:
Robert Wille, Johannes Kepler University Linz, AT

All registered conference delegates and exhibition visitors are kindly invited to join the DATE 2021 PhD Forum, which will take place on Monday from 17:00 - 19:00 at the DATE 2021 venue.

The PhD Forum of the DATE Conference is a poster session hosted by the European Design Automation Association (EDAA), the ACM Special Interest Group on Design Automation (SIGDA), and the IEEE Council on Electronic Design Automation (CEDA). The purpose of the PhD Forum is to offer a forum for PhD students to discuss their thesis and research work with people of the design automation and system design community. It represents a good opportunity for students to get exposure on the job market and to receive valuable feedback on their work.

To this end, the forum takes place in two parts:

  • First, everybody is invited to an opening session of the PhD Forum, where all presenters will introduce their work by means of a one-minute pitch.
  • After that (at approx. 17:30), all presenters will present their work within a 1.5 hour "poster" presentation in separate rooms. Within this timeframe, everyone can enter and leave the respective rooms and engage in corresponding discussions.

Furthermore, for each presentation, a poster (in pdf) summarizing the presentation will be provided.

Time Label Presentation Title
Authors
17:00 CEST OPENING OF THE PHD FORUM
Speaker:
Robert Wille, Johannes Kepler University Linz, AT
17:00 CEST FM01.1.1 EXPLOITING ERROR RESILIENCE OF ITERATIVE AND ACCUMULATION BASED ALGORITHMS FOR HARDWARE EFFICIENCY
Speaker and Author:
Dr. G.A. Gillani, University of Twente, NL
Abstract
While the efficiency gains due to process technology improvements are reaching the fundamental limits of computing, emerging paradigms like approximate computing provide promising efficiency gains for error resilient applications. However, the state-of-the-art approximate computing methodologies do not sufficiently address the accelerator designs for iterative and accumulation based algorithms. Keeping in view a wide range of such algorithms in digital signal processing, this thesis investigates systematic approximation methodologies to design high-efficiency accelerator architectures for iterative and accumulation based algorithms. As a case study of such algorithms, we have applied our proposed approximate computing methodologies to a radio astronomy calibration application.
/sites/date21/files/phdforum/FM01.1.1_ExploitingErrorResilienceOfIterativeAndAccumulationBasedAlgorithmsForHardwareEfficiency_Gillani.pdf
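The error resilience of accumulation-based algorithms that the thesis exploits can be seen in a toy experiment (illustrative only, not the thesis's methodology): truncating the low-order bits of every operand, a typical hardware approximation that shrinks the adder, perturbs a long accumulation by only a small relative error.

```python
def truncate(x, drop_bits):
    """Approximate an integer operand by zeroing its low-order bits."""
    return (x >> drop_bits) << drop_bits

def accumulate(values, drop_bits=0):
    """Accumulate values, optionally truncating each operand first."""
    total = 0
    for v in values:
        total += truncate(v, drop_bits)
    return total

values = list(range(1, 1001))
exact = accumulate(values)
approx = accumulate(values, drop_bits=4)
rel_err = (exact - approx) / exact
print(exact, approx, rel_err)
```

Because each operand loses at most 15 while the running sum grows without bound, the relative error stays in the low percent range: the kind of resilience an approximate accelerator can trade for area and energy.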
17:00 CEST FM01.1.2 IMPROVING ENERGY EFFICIENCY OF NEURAL NETWORKS
Speaker and Author:
Seongsik Park, Seoul National University, KR
Abstract
Deep learning with neural networks has shown remarkable performance in many applications. However, this success comes at the cost of tremendous energy consumption, which is one of the major obstacles to deploying deep learning models on mobile devices. To address this issue, many researchers have studied various methods for improving the energy efficiency of neural networks to expand the applicability of deep learning. This dissertation is in line with those studies and contains three main approaches: quantization, an energy-efficient accelerator, and a neuromorphic approach.
/sites/date21/files/phdforum/FM01.1.2_improving.energy.efficiency.of.neural.networks_seongsik.park.pdf
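Of the three approaches, quantization is the simplest to illustrate. A minimal sketch (invented weights, not the dissertation's scheme) of uniform symmetric 8-bit weight quantization and its reconstruction error:

```python
def quantize(weights, bits=8):
    """Uniform symmetric quantization: map floats to signed integers
    sharing one scale factor, as in int8 inference."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from integers."""
    return [qi * scale for qi in q]

weights = [0.7, -0.31, 0.02, -1.0, 0.55]
q, scale = quantize(weights, bits=8)
recovered = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
print(q, scale, max_err)
```

The integer representation enables narrow, cheap arithmetic; the price is a reconstruction error bounded by half the scale step, which is why accuracy typically degrades only slightly at 8 bits.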
17:00 CEST FM01.1.3 DESIGN, IMPLEMENTATION AND ANALYSIS OF EFFICIENT HARDWARE-BASED SECURITY PRIMITIVES
Speaker and Author:
Nalla Anandakumar Nachimuthu, University of Florida, US
Abstract
Hardware-based security primitives play important roles in protecting and securing systems in Internet of Things (IoT) applications. The main primitives studied in this work are physical unclonable functions (PUFs) and true random number generators (TRNGs). Efficient FPGA implementations are proposed, along with a security analysis using prevalent metrics. Finally, an application of the designed TRNG and PUF is proposed for implementing an authenticated key agreement protocol.
/sites/date21/files/phdforum/FM01.1.3_Design,_Implementation_and_Analysis_of_Efficient_Hardware-based_Security_Primitives_N._Nalla_Anandakumar.pdf
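As background for the last point, a PUF can be viewed as a device-specific challenge-response function. The toy model below (purely illustrative, not the dissertation's FPGA design) derives responses from a hidden per-device fingerprint, giving the reproducible-yet-unclonable behaviour that authenticated key agreement builds on:

```python
import hashlib

class ToyPUF:
    """Ideal PUF model: responses derive from a hidden per-device
    'fingerprint', so they are reproducible on one device and
    uncorrelated across devices. Real PUFs are also noisy, which
    this model omits (real protocols add error correction)."""

    def __init__(self, device_fingerprint: bytes):
        self._fp = device_fingerprint   # stands in for manufacturing variation

    def response(self, challenge: bytes) -> bytes:
        return hashlib.sha256(self._fp + challenge).digest()[:16]

dev_a = ToyPUF(b"device-A-manufacturing-randomness")
dev_b = ToyPUF(b"device-B-manufacturing-randomness")
c = b"challenge-001"
print(dev_a.response(c) == dev_a.response(c))  # reproducible on one device
print(dev_a.response(c) == dev_b.response(c))  # differs across devices
```

A verifier that pre-recorded challenge-response pairs for device A can later authenticate it and derive a shared key from a fresh response, without storing any key on the device itself.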
17:00 CEST FM01.1.4 FORMAL ABSTRACTION AND VERIFICATION OF ANALOG CIRCUITS
Speaker and Author:
Ahmad Tarraf, Goethe University Frankfurt, DE
Abstract
The recently submitted dissertation examines the formal abstraction and verification of analog circuits. It aims to contribute to the formal verification of AMS circuits by generating accurate behavioral models that can be used for verification. As accurate behavioral models are often handwritten, the dissertation proposes an automatic abstraction method based on sampling a SPICE netlist at the transistor level with full SPICE BSIM accuracy. The approach generates a hybrid automaton (HA) that exhibits linear behavior described by a state-space representation in each of its locations, thereby modeling the nonlinear behavior of the netlist via multiple locations. Hence, due to the linearity of the obtained model, the approach is easily scalable. The HAs can be exported to various output languages: MATLAB, Verilog-A, and SystemC-AMS. Various extensions exist to enhance the models' exhibited behavior.
/sites/date21/files/phdforum/FM01.1.4_Formal_Abstraction_and_Verification_of_Analog_Circuits_Ahmad_Tarraf.pdf
17:00 CEST FM01.1.5 OPTIMIZATION TOOLS FOR CONVNETS ON THE EDGE
Speaker:
Valentino Peluso, Politecnico di Torino, IT
Authors:
Valentino Peluso, Enrico Macii and Andrea Calimera, Politecnico di Torino, IT
Abstract
The shift of Convolutional Neural Networks (ConvNets) onto low-power devices with limited compute and memory resources calls for cross-layer strategies spanning from hardware to software optimization. This work answers this need, presenting a collection of tools for the efficient deployment of ConvNets on the edge.
/sites/date21/files/phdforum/FM01.1.5_Optimization_Tools_for_ConvNets_on_the_Edge_Valentino_Peluso.pdf
17:00 CEST FM01.1.6 DESIGN SPACE EXPLORATION IN HIGH LEVEL SYNTHESIS
Speaker and Author:
Lorenzo Ferretti, Università della Svizzera italiana, CH
Abstract
High Level Synthesis (HLS) is a process which, starting from a high-level description of an application (C/C++), generates the corresponding RTL code describing the hardware implementation of the desired functionality. The HLS process is usually controlled by user-given directives (e.g., whether or not to unroll a loop) which influence the area and latency of the resulting implementation. By using HLS, designers are able to rapidly generate different hardware implementations of the same application, without the burden of directly specifying the low-level implementation in detail. Nonetheless, the correlation between directives and resulting performance is often difficult to foresee and to quantify, and the high number of available directives leads to an exponential explosion in the number of possible configurations. In addition, sampling the design space involves a time-consuming hardware synthesis, making a brute-force exploration infeasible beyond very simple cases. However, for a given application, only a few directive settings result in Pareto-optimal solutions (with respect to metrics such as area, run-time and power), while most are dominated. The design space exploration problem aims at identifying close-to-Pareto-optimal implementations while synthesising only a small portion of the possible configurations from the design space. In my Ph.D. dissertation I present an overview of the HLS design flow, followed by a discussion of existing strategies in the literature. Moreover, I present new exploration methodologies able to automatically generate optimised implementations of hardware accelerators. The proposed approaches are able to retrieve a close approximation of the real Pareto solutions while synthesising only a small fraction of the possible designs, either by smartly navigating the design space or by leveraging prior knowledge.
I also present a database of design space explorations whose goal is to push the research boundaries by offering researchers a tool for the standardisation of exploration evaluation, and a reliable source of knowledge for machine-learning-based approaches. Lastly, the stepping stones of a new approach relying on deep learning strategies with graph neural networks are presented.
/sites/date21/files/phdforum/FM01.1.6_DesignSpaceExplorationInHigh-LevelSynthesis_LorenzoFerretti.pdf
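The Pareto-dominance filtering at the heart of such explorations is compact to state. A sketch with invented (area, latency) results of a few directive configurations (not data from the dissertation):

```python
def pareto_front(points):
    """Keep the (area, latency) points not dominated by any other.
    A point dominates another if it is no worse in both metrics
    and is a distinct point (so strictly better in at least one)."""
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q
    return sorted(
        p for p in points
        if not any(dominates(q, p) for q in points)
    )

# Invented (area, latency) outcomes of synthesising a few directive
# configurations of the same kernel.
configs = [(100, 40), (120, 25), (150, 25), (90, 80), (200, 10)]
print(pareto_front(configs))
```

A DSE engine keeps exactly this front while choosing which configuration to synthesise next; the research question is how to approximate the true front after sampling only a handful of points.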
17:00 CEST FM01.1.7 RELIABILITY IMPROVEMENT OF STT-MRAM CACHE MEMORIES IN DATA STORAGE SYSTEMS
Speaker:
Elham Cheshmikhani, Sharif University of Technology, IR
Authors:
Elham Cheshmikhani1, Hamed Farbeh2 and Hossein Asadi1
1Sharif University of Technology, IR; 2Amirkabir University of Technology, IR
Abstract
Spin-Transfer Torque Magnetic RAM (STT-MRAM) is known as the most promising replacement for SRAM technology in cache memories. Despite its major advantages of high density, non-volatility, near-zero leakage power, and immunity to radiation-induced particle strikes, STT-MRAM-based cache memory suffers from high error rates, mainly due to retention failure, read disturbance, and write failure. These errors are the major reliability challenge in STT-MRAM caches. Existing studies are limited to estimating the rate of only one or two of these error types for STT-MRAM caches. However, the overall vulnerability of STT-MRAM caches, whose estimation is a must for designing cost-efficient reliable caches, has not been offered in any previous study. Meanwhile, all existing reliability improvement schemes for STT-MRAM caches are limited to overcoming one or two error types, and the majority of them have an adverse effect on the other error types. In this dissertation, we first propose a system-level framework for reliability exploration and characterization of error behavior in STT-MRAM caches. To this end, we formulate cache vulnerability considering the inter-correlation of the error types, including retention failure, read disturbance, and write failure, as well as the dependency of error rates on workload behavior and Process Variation (PV). Then, we investigate the effect of temperature on the STT-MRAM cache error rate and demonstrate that heat accumulation increases the error rate by 110.9 percent. We also illustrate that this heat accumulation is mainly due to the locality of committed write operations in the cache.
In addition, we demonstrate that a) extra read accesses to data and tag arrays, which are imposed to enhance the cache access time, significantly increase the read disturbance error rate; and b) the diversity in the number of `1's and of switchings in the codewords of a data block significantly degrades the protection capability of error-correcting codes. We also propose a new cache architecture, called Reliability-Optimized STT-MRAM Memory (ROSTAM), which customizes different parts of the cache structure for reliability enhancement. ROSTAM consists of four components: 1) a simple yet effective replacement policy, called TA-LRW, to prevent heat accumulation in the cache and reduce the rate of all three error types; 2) a novel tag array structure, called 3RSeT, to reduce the error rate by eliminating a significant portion of tag reads; 3) an effective scheme, called REAP-Cache, to prevent the accumulation of read disturbance in cache blocks and completely eliminate the adverse effect of concealed reads on cache reliability; and 4) a new ECC configuration, called ROBIN, to uniformly distribute the transitions between the codewords and maximize the ECC correction capability. We compare the proposed architecture with an 8-way L2 cache protected by SEC-DED(72,64) and using the LRU policy. The experimental results, using the gem5 full-system simulator and a comprehensive set of multi-programmed workloads from the SPEC CPU2006 benchmark suite on a quad-core processor, show that: 1) the read disturbance error rate is reduced by 4966.1x, achieved by integrating TA-LRW, 3RSeT, ROBIN, and REAP-Cache; 2) the write failure rate is reduced by 3.7x, the effect of TA-LRW and ROBIN; 3) the retention failure rate is reduced by 8.1x because of the TA-LRW and REAP-Cache operations; and 4) the total error rate considering all error types is reduced by 10x.
This significant reliability enhancement is achieved at the cost of less than a 2.7% increase in energy consumption, less than 1% area overhead, and an average of 2.3% performance degradation.
/sites/date21/files/phdforum/FM01.1.7_RELIABILITY_IMPROVEMENT_OF_STT-MRAM_CACHE_MEMORIES_IN_DATA_STORAGE_SYSTEMS_Cheshmikhani-DATEPhDForum.pdf
17:00 CEST FM01.1.8 ENABLING LOGIC-MEMORY SYNERGY USING INTEGRATED NON-VOLATILE TRANSISTOR TECHNOLOGIES FOR ENERGY-EFFICIENT COMPUTING
Speaker and Author:
Sandeep Krishna Thirumala, Purdue University, US
Abstract
Over the last decade, there has been an immense interest in the quest for emerging memory technologies which possess distinct advantages over traditional silicon-based memories. In the era of big-data, a key challenge is to achieve close integration of logic and memory sub-systems, to overcome the von-Neumann bottleneck associated with the long-distance data transmission between logic and memory. Moreover, brain-inspired deep neural networks which have transformed the field of machine learning in recent years, are not widely deployable in edge devices, mainly due to the aforementioned bottleneck. Therefore, there exists a need to explore solutions with tight logic-memory integration, in order to enable efficient computation for current and future generation of systems. Motivated by this, in this thesis, we harness the benefits offered by emerging technologies and propose devices, circuits, and systems which exhibit an amalgamation of logic and memory functionality. We propose two variants of memory devices: (a) Reconfigurable Ferroelectric transistors and (b) Valley-Coupled-Spin Hall effect-based magnetic random access memory, which exhibit unique logic-memory unification. Exploiting the intriguing features of the proposed devices, we carry out a cross-layer exploration from device-to-circuits-to-systems for energy-efficient computing. We investigate a wide spectrum of applications for the proposed devices including embedded memories, non-volatile logic, compute-in-memory fabrics and artificial intelligence systems. Overall, evaluation results of the proposed device-circuit-system techniques in this thesis, show significant reduction in energy consumption along with performance improvement of various systems when compared to conventional von Neumann-based approaches for several application workloads, addressing the critical need for logic-memory synergy in current/next-generation of computing.
/sites/date21/files/phdforum/FM01.1.8_Enabling_Logic-Memory_Synergy_using_Integrated_Non-Volatile_Transistor_Technologies_for_Energy-Efficient_Computing_Sandeep_Krishna_Thirumala.pdf
17:00 CEST FM01.1.9 HARDWARE SECURITY IN DRAMS AND PROCESSOR CACHES
Speaker and Author:
Wenjie Xiong, Facebook AI Research, US
Abstract
The cost reduction and performance improvement of silicon chips have made computing devices ubiquitous, from IoT nodes to cloud servers. These devices have been deployed to collect and process an unprecedented amount of data around us. Moreover, to make full use of resources, the system is often shared among different applications, which raises many security and privacy concerns. Meanwhile, memory and processor caches are essential components of modern computers, but they have mainly been designed for functionality and performance, not for security. Hardware components have potential positive uses that can improve security, but there are also attacks that exploit vulnerabilities in hardware. This dissertation consequently studies both the positive and negative security aspects of Dynamic Random Access Memories (DRAMs) and caches on commercial devices. The proposed DRAM Physically Unclonable Functions (PUFs) can be deployed today for higher security, especially in low-end IoT devices and embedded systems currently used in health care, home automation, transportation, or energy grids, which lack other security mechanisms. The discovered cache-LRU covert-channel attacks and DRAM temperature spying attacks reveal new types of vulnerabilities in today's systems, motivating new designs to protect applications in a shared system and to prevent malicious use of the physical features of the hardware.
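PUF quality, as studied in the dissertation, is conventionally judged by comparing response bit-strings: a low fractional Hamming distance between repeated readings of the same device indicates reliability, while a distance near 0.5 between different devices indicates uniqueness. The sketch below shows this generic metric only (it is not the thesis's specific DRAM PUF construction, and the bit-strings are made up):

```python
def frac_hamming(r1, r2):
    """Fraction of differing bits between two equal-length PUF responses."""
    if len(r1) != len(r2):
        raise ValueError("responses must have equal length")
    return sum(b1 != b2 for b1, b2 in zip(r1, r2)) / len(r1)

# Hypothetical 8-bit responses:
same_device = frac_hamming("10110010", "10110110")   # 1 of 8 bits differs -> 0.125
other_device = frac_hamming("10110010", "01001101")  # all 8 bits differ -> 1.0
```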
/sites/date21/files/phdforum/FM01.1.9_Hardware-security-in-DRAMs-and-caches_WenjieXiong.pdf
17:00 CEST FM01.1.11 LESS IS MORE: EFFICIENT HARDWARE DESIGN THROUGH APPROXIMATE LOGIC SYNTHESIS
Speaker and Author:
Ilaria Scarabottolo, USI Lugano, CH
Abstract
As energy efficiency becomes a crucial concern in almost every kind of digital application, Approximate Computing gains popularity as a potential answer to this ever-growing energy quest. Approximate Computing is a design paradigm particularly suited for error-resilient applications, where small losses in accuracy do not represent a significant reduction in the quality of the result. In these scenarios, energy consumption and resource usage (such as electric power or circuit area) can be significantly improved at the expense of a slight reduction in output accuracy. While Approximate Computing can be applied at different levels, my research focuses on the design of approximate hardware. In particular, my work explores Approximate Logic Synthesis (ALS), where the hardware functionality is automatically tuned to obtain more efficient counterparts while always controlling the entailed error. Functional modifications include, among others, the removal or substitution of gates and signals. A fundamental prerequisite for applying these modifications is an accurate error model of the circuit under examination. My Ph.D. research has concentrated on the derivation of accurate error models of a circuit. These can, in turn, guide Approximate Logic Synthesis algorithms to optimal solutions and avoid expensive, time-consuming simulations. A precise error model makes it possible to fully explore the design space and, potentially, to adjust the desired level of accuracy even at runtime. I have also contributed to the state of the art in ALS techniques by devising a circuit-pruning algorithm that produces efficient approximate circuits for given error constraints. The innovative aspect of my work is that it exploits circuit topology and graph partitioning to identify circuit portions that have a smaller impact on the final output. With this information, ALS algorithms can improve their efficiency by acting first on those less influential portions.
Indeed, this error characterisation proves to be very effective in guiding and modeling approximate synthesis.
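As a toy illustration of the topology-driven idea (not the dissertation's actual algorithm): one can rank each gate of a combinational circuit by the number of paths from it to the primary outputs, and offer the least influential gates to the pruning step first. The netlist and gate names below are made up:

```python
def paths_to_outputs(fanout, outputs):
    """Count paths from every gate to any primary output (circuit is a DAG)."""
    memo = {}
    def count(g):
        if g not in memo:
            memo[g] = 1 if g in outputs else sum(count(s) for s in fanout[g])
        return memo[g]
    return {g: count(g) for g in fanout}

# Hypothetical netlist: gate -> gates it drives; 'o1' and 'o2' are outputs.
fanout = {
    'a': ['c'], 'b': ['c', 'd'],
    'c': ['o1'], 'd': ['o1', 'o2'],
    'o1': [], 'o2': [],
}
influence = paths_to_outputs(fanout, {'o1', 'o2'})
pruning_order = sorted(influence, key=influence.get)  # least influential first
```

Here gate 'b' reaches the outputs along three paths and would be pruned last, while single-path gates are candidates for early removal.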
/sites/date21/files/phdforum/FM01.1.11_LESS_IS_MORE_EFFICIENT_HARDWARE_DESIGN_THROUGH_APPROXIMATE_LOGIC_SYNTHESIS.pdf
17:00 CEST FM01.1.12 LONGLIVENOC: WEAR LEVELLING, WRITE REDUCTION AND SELECTIVE VC ALLOCATION FOR LONG LASTING DARK SILICON AWARE NOC INTERCONNECTS
Speaker and Author:
Khushboo Rani, IIT Guwahati, IN
Abstract
With the continuing advancement of semiconductor technologies, more and more cores are integrated on the same die, leading to the concept of the Chip Multi-Processor. Communication across these cores is facilitated by the switch-based Network-on-Chip (NoC), which supports efficient and bursty on-chip communication. The power and performance of this interconnect are significant factors, as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. It has been observed that communication consumes almost 36% of the total chip power. With tighter power budgets, and to meet the thermal design power (TDP) of the system, components like the cores and caches undergo voltage and frequency scaling and are at times powered off. Powering off several components to stay within the TDP leads to the concept of dark silicon. Under dark silicon, although the cores and caches are off, the communication network is expected to remain available. To reduce the standby power of the network in such events, one looks for avenues in non-volatile memory (NVM) technologies. NVM technologies such as spin-transfer torque random access memory (STT-RAM) offer many advantages over conventional SRAM technology, including high density, good scalability, and low leakage power consumption. However, buffers made from these memory technologies suffer from costly write operations and low write endurance. Thus, in my PhD research, I proposed wear-levelling and write-reduction techniques to enhance the lifetime of NVM buffers and reduce the effect of their costly write operations in the dark-silicon scenario. We evaluate our proposed approaches on the multi-core full-system simulator gem5, with Garnet2.0 as the interconnection-network model for NoC performance, using the PARSEC and SPEC benchmark suites.
/sites/date21/files/phdforum/FM01.1.12_LongLiveNoC_rani_khushboo.pdf
17:00 CEST FM01.1.13 ENERGY EFFICIENT AND RUNTIME BASED APPROXIMATE COMPUTING TECHNIQUES FOR IMAGE COMPRESSION APPLICATION: AN INTEGRATED APPROACH COVERING CIRCUIT TO ALGORITHMIC LEVEL
Speaker:
Junqi Huang, University of Nottingham Malaysia, MY
Authors:
Junqi Huang1, Nandha kumar Thulasiraman2 and Haider Abbas Almurib1
1University of Nottingham Malaysia, MY; 2University of Nottingham, MY
Abstract
Approximate computing has been widely used in error-resilient designs to improve energy performance by reducing circuit complexity and allowing circuits to produce results with acceptable error (approximation). Generally, approximate computing techniques have been developed and implemented at the algorithmic, logic, or circuit level, with no possibility of on-the-fly or runtime change of the approximation. In contrast to existing methods, this thesis presents a novel energy-efficient integrated approach to implementing approximate computing techniques from the circuit level to the algorithmic level that incorporates runtime change of the approximation for a given circuit without incurring any extra hardware. The two new techniques are frequency upscaling (FUS) and voltage overscaling (VOS). These two techniques, developed at the logic/circuit level of abstraction, are integrated into a newly proposed algorithmic-level approximate computing technique known as the zigzag low-complexity approximate DCT (ZLCADCT), thus forming an integrated approach to runtime-based approximate computing from the circuit level to the algorithmic level for the image compression application.
/sites/date21/files/phdforum/FM01.1.13_Energy_Efficient_and_Runtime_based_Approximate_Computing_Techniques_for_Image_Compression_Application_An_Integrate.pdf
17:00 CEST FM01.1.14 THESIS: PERFORMANCE AND PHYSICAL ATTACK SECURITY OF LATTICE-BASED CRYPTOGRAPHY
Speaker and Author:
Felipe Valencia, Università della Svizzera Italiana, CH
Abstract
This thesis addresses two problems that limit the widespread adoption of lattice-based cryptography (LBC): 1) the physical security of real-world implementations, where this thesis focuses on fault attacks, and 2) the not always satisfactory performance of lattice-based cryptography, focusing on accelerators and instruction-set extensions.
/sites/date21/files/phdforum/FM01.1.14_PERFORMANCE_AND_PHYSICAL_ATTACK_SECURITY_OF_LATTICE-BASED_CRYPTOGRAPHY_M.Sc_Felipe_Valencia.pdf
17:00 CEST FM01.1.15 AMOEBA-INSPIRED SYSTEM CONTROLLER ON IOT EDGE
Speaker:
Anh Nguyen, Tokyo Institute of Technology, JP
Authors:
Anh Nguyen and Yuko Hara-Azumi, Tokyo Institute of Technology, JP
Abstract
This work aims at developing a lightweight yet efficient controller for IoT systems on edge devices. The controller is based on an emerging computing model, inspired by an amoeba, for solving the Boolean satisfiability (SAT) problem, which can represent various IoT applications. To realize the massive parallelism of this amoeba-inspired SAT solver, AmoebaSAT, we conducted FPGA-based hardware implementations through a hardware/software co-design approach. By extending the original algorithm to help the solver escape local minima more quickly, and by utilizing the community structure of different IoT applications, we developed a highly efficient IoT controller that captures the characteristics of different application domains and outperforms the state of the art.
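AmoebaSAT belongs to the family of stochastic local-search SAT solvers. As a minimal serial sketch of that family (this is a plain random walk, not AmoebaSAT itself, which additionally runs updates in parallel and adds bounceback control):

```python
import random

def walk_sat(clauses, n_vars, max_flips=100000, seed=1):
    """Random-walk local search; clauses are lists of signed 1-based literals."""
    rng = random.Random(seed)
    assign = [rng.choice([False, True]) for _ in range(n_vars)]
    def satisfied(clause):
        return any(assign[abs(lit) - 1] == (lit > 0) for lit in clause)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign                       # all clauses satisfied
        lit = rng.choice(rng.choice(unsat))     # literal from a failing clause
        assign[abs(lit) - 1] = not assign[abs(lit) - 1]
    return None                                 # gave up

# (x1 or x2) and (not x1 or x2) and (x1 or not x2): unique model x1=x2=True
model = walk_sat([[1, 2], [-1, 2], [1, -2]], 2)
```

The massive parallelism mentioned above comes from evaluating and flipping many variables concurrently in hardware, which is what makes the FPGA implementation attractive.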
/sites/date21/files/phdforum/FM01.1.15_Amoeba-inspired_System_Controller_for_IoT_Edge_Anh_Hoang_Ngoc_Nguyen.pdf
17:00 CEST FM01.1.16 MONITORING AND CONTROLLING INTERCONNECT CONTENTION IN CRITICAL REAL-TIME SYSTEMS
Speaker:
Jordi Cardona, Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, ES
Authors:
Jordi Cardona1, Carles Hernandez2, Enrico Mezzetti3, Jaume Abella4 and Francisco J Cazorla5
1Barcelona Supercomputing Center and Universitat Politecnica de Catalunya, ES; 2Universitat Politècnica de València, ES; 3Barcelona Supercomputing Center (BSC), ES; 4Barcelona Supercomputing Center (BSC-CNS), ES; 5Barcelona Supercomputing Center, ES
Abstract
Computing performance needs in critical real-time systems (CRTS) domains such as automotive, avionics, railway, and space are on the rise. This is fueled by the trend towards implementing an increasing number of product functionalities in software that ends up managing huge amounts of data and implementing complex artificial-intelligence functionalities such as Advanced Driver Assistance Systems. Manycores are able to satisfy, in a cost-efficient manner, the computing needs of the embedded real-time industry. In this line, building as much as possible on manycore solutions deployed in the high-performance (mainstream) market contributes to further reducing costs and increasing availability. However, commercial off-the-shelf (COTS) manycores bring several challenges for their adoption in the critical embedded market. One of those is deriving timing bounds on tasks' execution times as part of the overall timing validation and verification processes. In particular, the network-on-chip (NoC) has been shown to be the main resource in which contention arises, and hence it hampers deriving tight bounds on the timing of tasks. In this extended abstract, we will show our proposed hardware/software solutions to reduce the worst-case execution time (WCET) of applications by optimizing the NoC setup parameters, as well as the techniques we developed to measure and control contention (first in centralized NoCs and later in distributed NoC systems).
/sites/date21/files/phdforum/FM01.1.16_Monitoring_and_Controlling_Interconnect_Contention_in_Critical_Real-time_Systems_Jordi_Cardona.pdf
17:00 CEST FM01.1.17 RELIABILITY CONSIDERATIONS IN THE USE OF HIGH-PERFORMANCE PROCESSORS IN SAFETY-CRITICAL SYSTEMS
Speaker:
Sergi Alcaide, Universitat Politècnica de Catalunya - Barcelona Supercomputing Center (BSC), ES
Authors:
Sergi Alcaide1, Leonidas Kosmidis2, Carles Hernandez3 and Jaume Abella4
1Universitat Politècnica de Catalunya - Barcelona Supercomputing Center (BSC), ES; 2Barcelona Supercomputing Center (BSC), ES; 3Universitat Politècnica de València, ES; 4Barcelona Supercomputing Center (BSC-CNS), ES
Abstract
High-Performance Computing (HPC) platforms are a must in Autonomous Driving (AD) systems due to the tremendous jump in performance required. However, since HPC components are not designed following the development process used in the automotive domain, some safety requirements are not met by default on those platforms. The automotive functional safety standard, ISO 26262, stipulates that automotive platforms must avoid Common Cause Failures (CCFs), i.e. any single fault that can cause a failure despite safety measures in place. CCFs can be avoided by enforcing diverse redundancy (e.g. lockstep execution), so that a single fault affecting redundant elements (e.g. a voltage droop) does not produce the same error in those redundant elements. This thesis presents software and hardware techniques to achieve a diverse redundant execution in multiple HPC components to enable their usage in the automotive domain.
/sites/date21/files/phdforum/FM01.1.17_Reliability_considerations_in_the_use_of_high-performance_processors_in_safety-critical_systems_SergiAlcaide.pdf
17:00 CEST FM01.1.18 HARDWARE SECURITY EVALUATION OF IOT EMBEDDED APPLICATIONS
Speaker and Author:
Zahra Kazemi, PhD Candidate, FR
Abstract
In recent years, the broad adoption and accessibility of the Internet of Things (IoT) have created major concerns for manufacturers and enterprises in the hardware security domain. The importance of the software developers' role in evaluating a system's security has risen along with the demand for shortening time to market and development cost. However, embedded software developers often lack the knowledge to consider hardware-based threats and their effects on important assets. To overcome such challenges, it is essential for security specialists to provide embedded developers with the necessary practical tools and evaluation methods against hardware-based attacks. In this thesis work, we develop an evaluation methodology and an easy-to-use hardware security assessment framework against major physical attacks (e.g., side-channel and fault-injection attacks). It can assist software developers in detecting system vulnerabilities and protecting important assets. This work can also guide the implementation of software-level countermeasures, which can reduce the risks of physical attacks to an acceptable level. As a case study, we apply our approach to an IoT medical application named "SecPump" that models an infusion pump in hospitals. This study mimics a real experimental evaluation process and highlights the potential risks of ignoring physical attacks.
/sites/date21/files/phdforum/FM01.1.18_Hardware_Security_Evaluation_of_IoT_Embedded_Applications_zahrakazemi.pdf
17:00 CEST FM01.1.19 A COMPUTER-AIDED DESIGN SPACE EXPLORATION FOR DEPENDABLE CIRCUITS
Speaker and Author:
Stefan Scharoba, Brandenburg University of Technology, DE
Abstract
This thesis presents an automated toolset for exploring design choices which provide fault tolerance by means of hardware redundancy. Based on a given VHDL model, various fault tolerant implementations can be automatically created and evaluated regarding their overhead and reliability improvement.
/sites/date21/files/phdforum/FM01.1.19_A_Computer-Aided_Design_Space_Exploration_for_Dependable_Circuits_Stefan_Scharoba.pdf
17:00 CEST FM01.1.20 ROBUST AND ENERGY-EFFICIENT DEEP LEARNING SYSTEMS
Speaker and Author:
Muhammad Abdullah Hanif, Institute of Computer Engineering, Vienna University of Technology, AT
Abstract
Deep Learning (DL) has evolved to become the state-of-the-art machine learning algorithm for many AI applications such as image classification, object detection, object segmentation, voice recognition, and language translation. Due to the state-of-the-art accuracy of the models generated through DL, i.e., Deep Neural Networks (DNNs), they are also being adopted for safety-critical applications, e.g., autonomous driving, healthcare, and security & surveillance. Besides energy efficiency, for safety-critical applications, reliability against technology-induced faults (e.g., soft errors, device aging, and manufacturing defects) is one of the foremost concerns, as even a single neglected fault at a critical location can result in a significant drop in the application-level accuracy. This Ph.D. work aims at studying and exploiting the unique error-resilience characteristics of DNNs to improve their robustness against technology-induced reliability threats at low overhead cost. This work also improves the power/performance/energy efficiency of the systems through judicious approximations (i.e., carefully crafted designer-induced errors in less-sensitive neurons) that can be tolerated due to the error-resilience characteristics of DNNs and can be leveraged to compensate for the overheads of reliability features, or alternatively, be spent on enhancing reliability levels.
/sites/date21/files/phdforum/FM01.1.20_Robust_and_Energy-Efficient_Deep_Learning_Systems_Muhammad_Abdullah_Hanif.pdf
17:00 CEST FM01.1.21 AUTOMATED DESIGN OF APPROXIMATE ACCELERATORS
Speaker and Author:
Jorge Castro-Godínez, Karlsruhe Institute of Technology (KIT), DE
Abstract
Approximate computing has emerged as a design paradigm suitable for applications with inherent error resilience. This paradigm aims to reduce the computing costs of exact calculations by lowering the accuracy of their results. In the last decade, many approximate circuits, particularly approximate adders and multipliers, have been reported in the literature. For a growing number of such approximate circuits, selecting those that minimize the required resources when designing and generating an approximate accelerator from a high-level specification, while satisfying a previously defined accuracy constraint, is a joint design-space exploration and high-level synthesis challenge. This dissertation proposes automated methods for designing and implementing approximate accelerators built with approximate arithmetic circuits.
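A concrete example of the kind of approximate adder such accelerators are built from is the well-known lower-part-OR adder from the literature (shown here only as an illustration, not as a contribution of the dissertation): the k low bits are combined with a carry-free bitwise OR, shortening the carry chain at the cost of occasional error.

```python
def loa_add(a, b, k):
    """Lower-part-OR adder: OR the k low bits (no carry), add the high bits exactly."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)   # approximate low part: bitwise OR, carry dropped
    high = (a >> k) + (b >> k)      # exact high part; carry chain shortened by k bits
    return (high << k) | low

exact = 3 + 1               # 4
approx = loa_add(3, 1, 2)   # OR of the low bits loses the carry: result is 3
```

When the low bits of the operands do not overlap, the OR equals the exact sum, which is why the average error of such adders stays small for many workloads.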
/sites/date21/files/phdforum/FM01.1.21_Automated_Design_of_Approximate_Accelerators_Jorge_Castro-Godínez.pdf
17:00 CEST FM01.1.22 NEXT GENERATION DESIGN FOR TESTABILITY, DEBUG AND RELIABILITY USING FORMAL TECHNIQUES
Speaker and Author:
Sebastian Huhn, University of Bremen, DE
Abstract
Several improvements in the Electronic Design Automation (EDA) flow have enabled the design of highly complex Integrated Circuits (ICs). This complexity has been introduced to address challenging application scenarios, for instance in automotive systems, which typically require several heterogeneous functions to be jointly implemented on-chip. On the one hand, the complexity scales with the transistor count; on the other hand, further non-functional aspects have to be considered, which leads to new demanding tasks during state-of-the-art IC design and test. Thus, new measures are required to achieve the required level of testability, debug and reliability of the resulting circuit. This thesis proposes several novel approaches that, in the end, pave the way for the next generation of ICs, which can be successfully and reliably integrated even in safety-critical applications. In particular, this thesis combines formal techniques - like the Satisfiability (SAT) problem and Bounded Model Checking (BMC) - to address the arising challenges concerning the increase in Test Data Volume (TDV) as well as Test Application Time (TAT) and the required reliability. One contribution concerns the development of Test Vector Transmitting using enhanced compression-based TAP controllers (VecTHOR). VecTHOR proposes a newly designed compression architecture that combines codeword-based compression, a dynamically configurable dictionary, and a run-length encoding scheme. VecTHOR is lightweight and is seamlessly integrated within an IEEE 1149.1 Test Access Port (TAP) controller. It achieves a significant reduction of the TDV and the TAT by 50%, which directly reduces the resulting test costs. Another contribution concerns the design and implementation of a retargeting framework to process existing test data off-chip once, prior to the transfer, without the need for an expensive test regeneration.
Different techniques have been implemented to provide selectable trade-offs between the resulting TDV and TAT and the required run-time of the retargeting process. These techniques include a fast heuristic approach and a formal SAT-based optimization method invoking multiple objective functions. Besides this, one contribution concerns the development of a hybrid embedded compression architecture specifically designed for Low-Pin-Count Test (LPCT) in the field of safety-critical systems enforcing a zero-defect policy. This hybrid compression has been realized in close industrial cooperation with Infineon Germany. The approach reduces the resulting test time by a factor of approximately three. A further contribution is the development of a new methodology to significantly enhance the robustness of sequential circuits against transient faults while neither introducing a large hardware overhead nor measurably impacting the latency of the circuit. To achieve this, application-specific knowledge is gathered by applying SAT-based techniques as well as BMC, which yields the synthesis of a highly efficient fault-detection mechanism. The proposed techniques are presented in detail and evaluated extensively on industrially representative candidates, clearly demonstrating the approaches' efficacy.
/sites/date21/files/phdforum/FM01.1.22_NEXT_GENERATION_DESIGN_FOR_TESTABILITY,_DEBUG_AND_RELIABILITY_USING_FORMAL_TECHNIQUES-Huhn.pdf
17:00 CEST FM01.1.23 DESIGN AUTOMATION FOR FIELD-COUPLED NANOTECHNOLOGIES
Speaker:
Marcel Walter, University of Bremen, DE
Authors:
Marcel Walter1 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen/DFKI, DE
Abstract
Circuits based on complementary metal-oxide-semiconductors (CMOS) enabled the digital revolution and still provide the basis for almost all computational devices to this date. Nevertheless, the class of Field-coupled Nanocomputing (FCN) technologies is a promising candidate to outperform CMOS circuitry in various metrics. Not only does FCN process binary information inherently, it also allows for low-power in-memory computing with an energy dissipation that is orders of magnitude below that of CMOS. However, physical design for FCN technologies is still in its infancy. In this Student Research Forum proposal, a complete flow for the physical design of FCN circuitry is presented. This includes exact and heuristic techniques for placement, routing, clocking, and timing, as well as formal verification and debugging. All proposed algorithms have been made publicly available in a holistic framework called fiction.
/sites/date21/files/phdforum/FM01.1.23_Design_Automation_for_Field-coupled_Nanotechnologies_Marcel_Walter.pdf
17:00 CEST FM01.1.24 HARDWARE AND SOFTWARE TECHNIQUES FOR SECURING INTELLIGENT CYBER-PHYSICAL SYSTEMS
Speaker and Author:
Faiq Khalid, TU Wien, AT
Abstract
This Ph.D. work aims to design robust intelligent CPS against hardware-level security attacks (e.g., hardware Trojans, communication-network attacks on VANETs) and software-level security attacks (e.g., adversarial attacks on ML-based components in CPS). Towards this goal, this work studies and analyzes the security vulnerabilities at the hardware and software levels to identify the potentially vulnerable components, SoCs, or systems in CPS. Based on these analyses, this work improves the security of CPS by deploying efficient, low-overhead solutions. These solutions can either identify potential attacks at run-time or provide an efficient defense against them.
/sites/date21/files/phdforum/FM01.1.24_Hardware-and-Software-Techniques-for-Securing-Intelligent-Cyber-Physical-Systems_FaiqKhalid.pdf

O.1 Opening Session: Plenary and Awards Ceremony

Date: Tuesday, 02 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/P3YMvvxN7oEXc2FMt

Session chair:
Franco Fummi, University of Verona, IT

Session co-chair:
Ian O'Connor, Ecole Centrale de Lyon, FR

Time Label Presentation Title
Authors
07:00 CEST O.1.1 WELCOME ADDRESSES
Speakers:
Franco Fummi1 and Ian O'Connor2
1Universita' di Verona, IT; 2Lyon Institute of Nanotechnology, FR
Abstract
Welcome messages by the general chair and the program chair.
07:35 CEST O.1.2 PRESENTATION OF AWARDS
Speakers:
Franco Fummi1 and Ian O'Connor2
1Universita' di Verona, IT; 2Lyon Institute of Nanotechnology, FR
Abstract
Presentation of awards.

K.1 Opening Keynote: "Quantum supremacy using a programmable superconducting processor"

Date: Tuesday, 02 February 2021
Time: 07:50 CEST - 08:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/8xs6zHEtqREs2vvjp

Session chair:
Mathias Soeken, Microsoft, CH

Session co-chair:
Marco Casale Rossi, Synopsys, IT

The promise of quantum computers is that certain computational tasks might be executed exponentially faster on a quantum processor than on a classical processor. A fundamental challenge is to build a high-fidelity processor capable of running quantum algorithms in an exponentially large computational space. Here we report the use of a processor with programmable superconducting qubits to create quantum states on 53 qubits, corresponding to a computational state-space of dimension 2^53 (about 10^16). Measurements from repeated experiments sample the resulting probability distribution, which we verify using classical simulations. Our Sycamore processor takes about 200 seconds to sample one instance of a quantum circuit a million times—our benchmarks currently indicate that the equivalent task for a state-of-the-art classical supercomputer would take approximately 10,000 years. This dramatic increase in speed compared to all known classical algorithms is an experimental realization of quantum supremacy for this specific computational task, heralding a much-anticipated computing paradigm.

Bio: John Martinis did pioneering experiments on superconducting qubits in the mid-1980s for his PhD thesis. He has worked on a variety of low-temperature device physics during his career, focusing on quantum computation since the late 1990s. He was awarded the London Prize in Low Temperature Physics in 2014 for his work in this field. From 2014 to 2020 he worked at Google to build a useful quantum computer, culminating in a quantum supremacy experiment in 2019.

Time Label Presentation Title
Authors
07:50 CEST K.1.1 QUANTUM SUPREMACY USING A PROGRAMMABLE SUPERCONDUCTING PROCESSOR
Speaker and Author:
John Martinis, Google, UCSB and Quantala, US
Abstract
The promise of quantum computers is that certain computational tasks might be executed exponentially faster on a quantum processor than on a classical processor. A fundamental challenge is to build a high-fidelity processor capable of running quantum algorithms in an exponentially large computational space. Here we report the use of a processor with programmable superconducting qubits to create quantum states on 53 qubits, corresponding to a computational state-space of dimension 2^53 (about 10^16). Measurements from repeated experiments sample the resulting probability distribution, which we verify using classical simulations. Our Sycamore processor takes about 200 seconds to sample one instance of a quantum circuit a million times—our benchmarks currently indicate that the equivalent task for a state-of-the-art classical supercomputer would take approximately 10,000 years. This dramatic increase in speed compared to all known classical algorithms is an experimental realization of quantum supremacy for this specific computational task, heralding a much-anticipated computing paradigm.
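The quoted state-space dimension can be checked directly:

```python
# Sanity check of the figures quoted in the abstract: 53 qubits span a
# state space of dimension 2**53, which is indeed on the order of 10^16.
dim = 2 ** 53
print(f"2^53 = {dim} ≈ {dim:.3e}")
```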
08:30 CEST K.1.2 LIVE Q&A
Authors:
John Martinis1 and Mathias Soeken2
1Google, UCSB and Quantala, US; 2Microsoft, CH
Abstract
Live question-and-answer session for interaction between the speaker and the audience.

1.1 Innovative technologies & architectures for tomorrow’s compute platforms

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/3sgkBGBTn54n2YA7T

Session chair:
Adrià Armejach, BSC, ES

Session co-chair:
Nehir Sonmez, BSC, ES

Organizers:
Gilles Sassatelli, LIRMM, FR
Miquel Moreto, BSC, ES

From the optimized use of heterogeneous systems for emerging workloads to leveraging the benefits of emerging technologies in computer architecture, this session will demonstrate, through three papers, a range of possible novel hardware and software approaches for improving tomorrow's HPC and cloud-computing infrastructure and applications, both performance- and energy-wise.

Time Label Presentation Title
Authors
08:50 CEST 1.1.1 STORAGE CLASS MEMORY WITH COMPUTING ROW BUFFER: A DESIGN SPACE EXPLORATION
Speaker:
Valentin EGLOFF, CEA-List, FR
Authors:
Valentin Egloff1, Jean-Philippe Noel1, Maha Kooli1, Bastien Giraud1, Lorenzo Ciampolini1, Roman Gauchi1, Cesar Fuguet1, Eric Guthmuller1, Mathieu Moreau2 and Jean-Michel Portal2
1University Grenoble Alpes, CEA, List, FR; 2Aix Marseille Univ, Université de Toulon, CNRS, IM2NP, FR
Abstract
Today's compute-centric von Neumann architectures face strong limitations in the data-intensive context of numerous applications, such as deep learning. One of these limitations corresponds to the well-known von Neumann bottleneck. To overcome this bottleneck, the concepts of In-Memory Computing (IMC) and Near-Memory Computing (NMC) have been proposed. IMC solutions based on volatile memories such as SRAM and DRAM, with nearly infinite endurance, only partially solve the data-transfer problem from the Storage Class Memory (SCM). Computing in SCM is severely limited by the intrinsically poor endurance of Non-Volatile Memory (NVM) technologies. In this paper, we propose to take the best of both solutions by introducing a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) in the SCM. The principle is to keep operations on large vectors in the C-RB of the SCM, reducing data movement to the CPU and thus drastically saving energy in the overall system. To evaluate the proposed architecture, we use an instruction-accurate platform based on the Intel Pin software; Pin instruments run-time binaries to obtain the applications' full memory traces for our solution. We achieve an energy reduction of 7.9x on average (up to 45x in the best case), a speedup of 3.8x on average (up to 13x in the best case), and a reduction of write accesses to the SCM of up to 18%, compared to a 512-bit SIMD architecture.
09:10 CEST 1.1.2 FROM A FPGA PROTOTYPING PLATFORM TO A COMPUTING PLATFORM: THE MANGO EXPERIENCE
Speaker:
Jose Flich, Universitat Politècnica de València, ES
Authors:
Josè Flich1, Rafael Tornero2, David Rodriguez3, Jose Maria Martínez2, Davide Russo4 and Carles Hernández2
1Associate Professor, Universitat Politècnica de València, ES; 2TU Valencia, ES; 3Universitat Jaume I, ES; 4University Federico II - Naples, IT
Abstract
In this paper we describe the evolution of the FPGA-based cluster used in the MANGO project from a hardware-prototyping platform for HPC architectures to a computing platform targeting HPC and AI applications in different European projects such as RECIPE and DeepHealth. Our main goal is to reinvest in the MANGO cluster by providing duality in its use, for both large-scale hardware prototyping and high-performance computation. From our experience we draw several interesting conclusions about the complexities and hurdles that lie beneath FPGA technologies, shedding some light on the real complexities that hinder the adoption of FPGAs in either large-scale pure HPC systems or hybrid systems (HPC + Big Data/AI).
09:30 CEST 1.1.3 HETEROGENEOUS COMPUTING SYSTEMS FOR COMPLEX SCIENTIFIC DISCOVERY WORKFLOWS
Speaker:
Christoph Hagleitner, IBM, CH
Authors:
Christoph Hagleitner1, Dionysios Diamantopoulos2, Burkhard Ringlein3, Constantinos Evangelinos4, Edward Pyzer-Knapp5, Michael Johnston6, Charles Johns4, Rong A. Chang4, Bruce D'Amora4, James Kahle4 and James Sexton4
1IBM, CH; 2IBM Research, CH; 3IBM Research -- Zurich, CH; 4IBM, US; 5IBM, GB; 6IBM, IE
Abstract
With Moore's law progressively running out of steam, heterogeneous computing architectures have been powering the top supercomputers for several years and are now finding broader adoption. The trend towards sustainable computing also calls for domain-specific heterogeneous hardware architectures, which promise further gains in energy efficiency. At the same time, today's HPC applications have evolved from monolithic simulations in a single domain to complex workflows crossing multiple disciplines. In this paper, we explore how these trends affect system design decisions and what this means for future computing architectures.
1.1.4 LIVE JOINT Q&A
Authors:
Valentin Egloff1, Jose Flich2, Christoph Hagleitner3, Adrià Armejach4 and Nehir Sonmez4
1CEA-List, FR; 2Universitat Politècnica de València, ES; 3IBM, CH; 4BSC, ES
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

1.2 IT Sustainability (Embedded tutorials)

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/AQHEMYJraSy8Z8CLS

Session chair:
Gilles Sassatelli, LIRMM, FR

Session co-chair:
Miquel Moreto, BSC, ES

Organizers:
Gilles Sassatelli, LIRMM, FR
Miquel Moreto, BSC, ES

This embedded tutorial session is devoted to surveying fundamental considerations of IT sustainability through two tutorials. The first tutorial will give a broad vision of the sustainability challenges and the overall impact of IT on them, covering aspects such as resources and product lifecycle. The second tutorial will provide an in-depth analysis of the power consumption of contemporary and emerging workloads such as AI, and discuss disruptive approaches towards boldly lowering the carbon footprint of future-generation systems.

Time Label Presentation Title
Authors
08:50 CEST 1.2.1 MOORE’S LAW AND ICT INNOVATION IN THE ANTHROPOCENE
Speaker:
David Bol, Université catholique de Louvain, BE
Authors:
David Bol, Thibault Pirson and Rémi Dekimpe, Université catholique de Louvain, BE
Abstract
In information and communication technologies (ICTs), innovation is intrinsically linked to empirical laws of exponential efficiency improvement such as Moore's law. By following these laws, the industry achieved an amazing relative decoupling of the improvement of key performance indicators (KPIs), such as the number of transistors, from physical resource usage such as silicon wafers. Concurrently, digital ICTs went from almost zero greenhouse gas (GHG) emissions in the middle of the twentieth century to a direct annual carbon footprint of approximately 1400 Mt CO2e today. Given that we have to strongly reduce global GHG emissions to limit global warming below 2°C, it is not clear whether simply following these trends can put the direct GHG emissions of the ICT sector on a trajectory compatible with the Paris Agreement. In this paper, we analyze the recent evolution of the energy and carbon footprints of three ICT activity sub-sectors: semiconductor manufacturing, wireless Internet access and datacenter usage. By adopting a Kaya-like decomposition into technology affluence and efficiency factors, we find that the KPI increase failed to reach an absolute decoupling from total energy consumption because technology affluence increases faster than efficiency. The same conclusion holds for GHG emissions, except for datacenters, where recent investment in renewable energy sources has led to an absolute GHG reduction over the last years despite a moderate energy increase.
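The relative-versus-absolute decoupling distinction in this abstract can be made concrete with a Kaya-style factorization. The growth rates below are made up for illustration and are not the paper's data.

```python
# Kaya-like check (illustrative numbers): total energy factors into
# affluence (KPI produced) times intensity (energy per KPI). Absolute
# decoupling holds only if efficiency gains outpace affluence growth.
def energy(kpi: float, energy_per_kpi: float) -> float:
    return kpi * energy_per_kpi

# assumed year-over-year evolution: KPI grows 40%, efficiency improves 25%
# (i.e. energy per KPI shrinks by factor 0.8)
e0 = energy(kpi=1.0, energy_per_kpi=1.0)
e1 = energy(kpi=1.4, energy_per_kpi=0.8)

relative_decoupling = (e1 / e0) < 1.4   # energy grows slower than KPI
absolute_decoupling = e1 < e0           # energy actually shrinks
print(relative_decoupling, absolute_decoupling)  # True False
```

With these assumed rates the sector shows relative but not absolute decoupling, which is exactly the pattern the paper reports for total energy consumption.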
09:20 CEST 1.2.2 FEW HINTS TOWARDS MORE SUSTAINABLE ARTIFICIAL INTELLIGENCE
Speaker and Author:
Marc Duranton, CEA, FR
Abstract
Artificial Intelligence (AI) is now everywhere and its domains of application grow every day. But its demand for data and computing power is also growing at an exponential rate, faster than the historical pace of Moore's law. The largest models, like GPT-3, achieve impressive results but also raise questions about the resources required for their learning phase, on the order of hundreds of MWh. Once learning is done, the use of Deep Learning solutions (the "inference" phase) is far less energy demanding, but the systems are often duplicated in large quantities (e.g. for consumer applications) and reused many times, so the cumulative energy consumption is also significant. It is therefore of paramount importance to improve the efficiency of AI solutions across their whole lifetime. This can only be achieved by combining efforts in several domains: on the algorithmic side, in the co-design of application/algorithm/hardware, in the hardware architecture and in the (silicon) technology, for example. The aim of this short tutorial is to raise awareness of the energy consumption of AI and to show different tracks for mitigating it, from distributed and federated learning, to optimization of Neural Networks and their data representation (e.g. using "spikes" for information coding), to architectures specialized for AI loads, including systems where memory and computation are close together, and systems using emerging memories or 3D stacking.
09:50 CEST 1.2.3 LIVE JOINT Q&A
Authors:
Gilles Sassatelli1, Miquel Moreto2, David Bol3 and Marc Duranton4
1LIRMM, FR; 2BSC, ES; 3Université catholique de Louvain, BE; 4CEA, FR
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

1.3 The Road Towards Predictable Automotive High-Performance Platforms

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ME2W47nSPPcLFLXfp

Session chair:
Arne Hamann, Bosch, DE

Session co-chair:
Matteo Andreozzi, ARM, GB

Organizers:
Arne Hamann, Bosch, DE
Matteo Andreozzi, ARM, GB

Due to the trends of centralizing the E/E architecture and new computing-intensive applications, high-performance hardware platforms are currently finding their way into automotive systems. However, the Systems-on-Chip (SoCs) currently available on the market have significant weaknesses when it comes to providing predictable performance for time-critical applications. The main reason for this is that these platforms are optimized for average-case performance. This shortcoming represents one major risk in the development of current and future automotive systems. In this session we discuss how high-performance and predictability could (and should) be reconciled in future HW/SW platforms. We believe that this goal can only be reached via a close collaboration among system suppliers, IP providers, semiconductor companies, and OS/hypervisor vendors. Furthermore, academic input will be needed to solve remaining challenges and to further improve initial solutions.

Time Label Presentation Title
Authors
08:50 CEST 1.3.1 SOFTWARE MECHANISMS FOR CONTROLLING QOS
Speaker:
Jörg Seitter, Robert Bosch GmbH, DE
Authors:
Falk Rehm and Jörg Seitter, Bosch, DE
Abstract
Available commercial off-the-shelf (COTS) platforms have little support for configuring Quality of Service (QoS) for the various shared resources, for instance the interconnect or the DRAM. In order to achieve predictable performance, one thus has to resort to software-based methods for controlling interference and reducing shared-resource contention. Examples include memory bandwidth regulation and cache coloring, which can be implemented at the hypervisor or operating-system level. The Bosch talk will give insights into currently developed VIPs, including potential pitfalls and the different software-based mechanisms that are being investigated for increasing performance predictability.
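The memory-bandwidth-regulation idea mentioned above can be sketched as a per-core budget that is refilled every regulation period (in the style of MemGuard; the class name, parameters and policy below are illustrative, not Bosch's implementation).

```python
# Sketch of software memory-bandwidth regulation: each core receives a
# budget of memory accesses per regulation period; once the budget is
# spent, further accesses stall until the next period refills it.
class BandwidthRegulator:
    def __init__(self, budget_per_period: int):
        self.budget = budget_per_period
        self.remaining = budget_per_period
        self.stalled_accesses = 0

    def new_period(self) -> None:
        """Timer tick: refill this core's budget."""
        self.remaining = self.budget

    def access(self) -> bool:
        """Return True if the access may proceed in the current period."""
        if self.remaining > 0:
            self.remaining -= 1
            return True
        self.stalled_accesses += 1
        return False

reg = BandwidthRegulator(budget_per_period=4)
served = sum(reg.access() for _ in range(10))
print(served, reg.stalled_accesses)  # 4 6
```

A hypervisor-level implementation would count real DRAM accesses via performance counters and throttle the offending core, but the accounting logic is the same.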
09:05 CEST 1.3.2 RESOURCE CONTENTION AVOIDANCE MECHANISMS IN HIGH-PERFORMANCE ARM-BASED SYSTEMS
Speaker and Author:
Jan-Peter Larsson, Arm, GB
Abstract
There exist a variety of software techniques that reduce shared resource contention. The drawback of such techniques is that they often require detailed knowledge of the hardware platform and its underlying IP, involve workload porting, or impose performance overheads that reduce the overall efficiency of the system. Hardware can and should do more to assist software in this task: by providing identification, monitoring and control mechanisms that help system software observe the behavior of competing workloads and apportion the shared resources among them, hardware-based resource contention avoidance mechanisms can improve on the efficiency and efficacy of purely software-based approaches. In this talk, we provide an overview of two Arm technologies: L3 cache partitioning in the DynamIQ Shared Unit (DSU), and control and monitoring features in the Armv8.4-A Memory Partitioning and Monitoring (MPAM) architecture extension. The DSU provides an L3 cache partitioning scheme under software control that can limit cache contention between competing workloads in a DynamIQ processor cluster. MPAM is an example of an architectural approach to resource contention avoidance and provides workload identification and attribution of memory traffic throughout the system, enabling software-controlled apportioning of system resources like cache capacity and memory bandwidth as well as monitoring of the performance of individual workloads. Finally, we provide examples of how these two complementary Arm technologies can work in tandem with system software to reduce shared resource contention, and we present the principles we believe will increase the determinism and predictability of real-time workloads that execute on high-performance Arm-based platforms.
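The cache-partitioning mechanism described for the DSU can be sketched as restricting each workload to a subset of the ways in every set, so that one workload cannot evict another's lines. The model below is an illustrative toy, not Arm's implementation; the replacement policy in particular is a placeholder.

```python
# Toy model of way-partitioned cache sets: each workload may only hit in,
# and fill, the ways listed in its mask, bounding cross-workload eviction.
class PartitionedSet:
    def __init__(self, n_ways: int, way_mask: dict[str, set[int]]):
        self.ways = [None] * n_ways   # (workload, tag) stored per way
        self.mask = way_mask          # workload -> allowed way indices

    def access(self, workload: str, tag: int) -> bool:
        """Return True on hit; on miss, fill an allowed way."""
        allowed = self.mask[workload]
        for w in allowed:
            if self.ways[w] == (workload, tag):
                return True
        victim = min(allowed)         # placeholder replacement policy
        self.ways[victim] = (workload, tag)
        return False

s = PartitionedSet(4, {"rt": {0, 1}, "best_effort": {2, 3}})
s.access("rt", 0xA)                  # real-time workload fills way 0
s.access("best_effort", 0xB)         # cannot touch the real-time ways
print(s.access("rt", 0xA))           # True: the rt line survived
```

In the real DSU the partitioning is configured by system software, which is exactly the hardware/software cooperation the talk argues for.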
09:20 CEST 1.3.3 ADMISSION CONTROL FOR GUARANTEEING E2E QOS IN MPSOCS
Speaker and Author:
Selma Saidi, TU Dortmund, DE
Abstract
An application in an MPSoC must generally acquire several shared (interconnect and memory) resources with independent arbiters, often provided by different vendors. One major challenge is to control the effect of interference on shared resources in an end-to-end fashion. In this talk, we discuss an alternative solution for controlling accesses to shared resources using admission control mechanisms. The goal is to decouple the data layer, where transmission is performed, from the control layer responsible for allocation and arbitration of the available resources. Interference analysis can then account for applications' request arrivals at the resource management control unit instead of individual flit/packet arrivals at every (sub-)resource. The proposed approach simplifies system performance analysis by reducing the complexity of coupling the timing analyses of different resources, which usually leads to pessimistic formal guarantees or decreased performance and utilization.
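The admission-control idea can be sketched as a central unit that grants a request only if the aggregate admitted demand stays within the shared resource's capacity; interference analysis then reasons about arrivals at this unit rather than per-packet arrivals at every arbiter. The class and numbers below are illustrative, not the talk's exact scheme.

```python
# Minimal admission-control sketch: applications declare a demand (e.g.
# bandwidth) to the control layer, which admits or rejects the request;
# admitted flows then use the data layer without further arbitration.
class AdmissionController:
    def __init__(self, capacity: float):
        self.capacity = capacity
        self.admitted: dict[str, float] = {}

    def request(self, app: str, demand: float) -> bool:
        if sum(self.admitted.values()) + demand <= self.capacity:
            self.admitted[app] = demand
            return True
        return False

    def release(self, app: str) -> None:
        self.admitted.pop(app, None)

ac = AdmissionController(capacity=10.0)
print(ac.request("camera", 6.0))   # True
print(ac.request("lidar", 5.0))    # False: would exceed capacity
ac.release("camera")
print(ac.request("lidar", 5.0))    # True
```

Because admitted demands can never exceed capacity, a bound on each flow's service follows directly, without composing the timing analyses of every downstream sub-resource.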
09:35 CEST 1.3.4 SUPPORTING SYSTEM DESIGN WITH FORMAL PERFORMANCE ANALYSIS
Speaker:
Giovanni Stea, University of Pisa, IT
Authors:
Giovanni Stea1 and Raffaele Zippo2
1University of Pisa, IT; 2University of Florence and University of Pisa, IT
Abstract
From a software perspective, systems evolve over development time, which drives the need for early predictions of system behaviour. System performance analysis is an important ingredient of successful system design. The highly dynamic behaviour due to caches, memory controllers, etc. makes system analysis at certain abstraction levels very difficult. The talk from the University of Pisa will discuss how the network calculus formalism can be used to predict performance in the context of vehicle integration platforms. More precisely, it will present an approach for finding worst-case delay bounds on access to a shared DRAM controller using the First-Ready, First-Come-First-Served (FR-FCFS) arbitration policy. Furthermore, it will show that the algorithm to compute those bounds is simple (it runs in milliseconds) and adaptable to several DRAM models (all it takes is to plug in the DRAM timing parameters and constraints). Benchmark results will show that the distance between the lower and upper bounds obtained by the approach is immaterial for practical purposes (a few percentage points at most).
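The kind of bound network calculus produces can be sketched with the textbook rate-latency result (the talk's FR-FCFS-specific service curves are more involved; parameters below are illustrative): for a token-bucket arrival curve α(t) = b + r·t and a rate-latency service curve β(t) = R·(t − T)⁺, the worst-case delay is at most T + b/R whenever r ≤ R.

```python
# Classic network-calculus delay bound for a token-bucket source served
# by a rate-latency server.
def delay_bound(b: float, r: float, R: float, T: float) -> float:
    """b: burst [bytes], r: arrival rate, R: service rate, T: latency [s]."""
    assert r <= R, "system must be stable (arrival rate <= service rate)"
    return T + b / R

# illustrative parameters: 512 B burst, 100 MB/s source, 400 MB/s server,
# 2 us service latency
print(delay_bound(b=512, r=100e6, R=400e6, T=2e-6))  # worst-case delay [s]
```

The DRAM analysis in the talk follows the same pattern: derive a service curve from the controller's timing parameters, then read the delay bound off the horizontal deviation between arrival and service curves.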

1.4 HLS: from hardware optimization to security

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/bfzhsyqpYFSn5Az25

Session chair:
Philippe Coussy, Universite de Bretagne-Sud / Lab-STICC, FR

Session co-chair:
Fabrizio Ferrandi, Politecnico di Milano, IT

Hardware optimization and security are key questions to be answered during accelerator design with high-level synthesis. This session consists of three regular papers and three IP papers that address these challenges using novel techniques. The first paper replaces memory accesses in a loop with scalar values, handling multiple write accesses in a loop body. The second paper integrates obfuscation into the back-end HLS algorithms to apply a set of key-based obfuscations to control and data paths. The third paper trains a machine learning model to represent the design space of an HLS design, driving the exploration towards the most promising directions by considering estimates from HLS, logic synthesis and physical design, as well as both performance and resource usage.

Time Label Presentation Title
Authors
08:50 CEST 1.4.1 SCALAR REPLACEMENT IN THE PRESENCE OF MULTIPLE WRITE ACCESSES FOR ACCELERATOR DESIGN WITH HIGH-LEVEL SYNTHESIS
Speaker and Author:
Kenshu Seto, Tokyo City University, JP
Abstract
High-level synthesis (HLS) reduces the design time of domain-specific accelerators generated from loop nests. Naive usage of HLS usually leads to accelerators with insufficient performance, so very time-consuming manual optimization of the input programs is necessary in such cases. Scalar replacement is a promising automatic memory access optimization that removes redundant memory accesses. However, it cannot handle loops with multiple write accesses to the same array, which poses a severe limitation to its applicability. In this paper, we propose a new memory access optimization technique that breaks this limitation. Experimental results show that the proposed method achieves a 2.1x performance gain on average for benchmark programs that state-of-the-art memory optimization techniques cannot optimize.
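Basic scalar replacement (the classic transformation this paper extends to multiple-write loops) can be illustrated on a loop that reads each array element twice across consecutive iterations; the example is a generic sketch, not the paper's algorithm.

```python
# Scalar replacement illustration: in the original loop, a[i] is loaded in
# iteration i and re-loaded as a[i] in iteration i-1's successor; the
# rewritten loop carries the value in a scalar, halving the array reads an
# HLS tool must map to memory ports.
def smooth(a: list[int]) -> list[int]:
    out = [0] * (len(a) - 1)
    for i in range(len(a) - 1):
        out[i] = a[i] + a[i + 1]   # a[i+1] will be re-read as a[i] next time
    return out

def smooth_scalar_replaced(a: list[int]) -> list[int]:
    out = [0] * (len(a) - 1)
    prev = a[0]                    # scalar carries the value across iterations
    for i in range(len(a) - 1):
        cur = a[i + 1]             # the only array read per iteration
        out[i] = prev + cur
        prev = cur
    return out

data = [1, 2, 3, 4]
print(smooth(data) == smooth_scalar_replaced(data))  # True
```

The limitation the paper attacks arises when the loop also writes the same array at multiple indices, which makes it unsafe to assume the carried scalar still matches memory.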
09:05 CEST 1.4.2 HOST: HLS OBFUSCATIONS AGAINST SMT ATTACK
Speaker:
Chandan Karfa, IIT Guwahati, IN
Authors:
Chandan Karfa1, TM Abdul Khader2, Yom Nigam2, Ramanuj Chouksey2 and Ramesh Karri3
1Indian Institute of Technology Guwahati, IN; 2IIT Guwahati, IN; 3NYU, US
Abstract
The fab-less IC design industry is at risk of IC counterfeiting and Intellectual Property (IP) theft by untrusted third-party foundries. Logic obfuscation thwarts IP theft by locking the functions of gate-level netlists using a locking key. The complexity of circuit designs and the migration to high-level synthesis (HLS) expand the scope of logic locking to a higher abstraction level. Automated RTL locking during HLS integrates obfuscation into the back-end HLS algorithms. This is tedious and requires implementing it in the source code of the HLS tools. Furthermore, recent work proposed an SMT attack on HLS-based obfuscation. In this work, we propose the RTL locking tool HOST to thwart the SMT attack. The HOST approach is agnostic to the HLS tool. Experimental results show that the HOST obfuscations have low overhead and thwart SMT attacks.
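The key-based locking principle underlying such schemes can be sketched in a few lines; this toy models the idea only (HOST's actual obfuscations operate on RTL control and data paths, and the key constant here is purely illustrative).

```python
# Toy key-based locking: a key value is mixed into the datapath so the
# circuit computes the intended function only under the correct key; any
# other key corrupts the output.
CORRECT_KEY = 0b1011   # would be stored in tamper-proof memory after fab

def locked_adder(a: int, b: int, key: int) -> int:
    unlock = key ^ CORRECT_KEY     # zero only for the correct key
    return (a + b) ^ unlock        # wrong key flips result bits

print(locked_adder(3, 4, CORRECT_KEY))        # 7: correct key, correct sum
print(locked_adder(3, 4, 0b0000) == 7)        # False: wrong key
```

An SMT attack tries to solve for a key consistent with observed input/output pairs; hardening amounts to choosing obfuscations whose constraint formulas are expensive for the solver, which is what HOST targets.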
09:20 CEST IP1_1.1 PARAMETRIC THROUGHPUT ORIENTED LARGE INTEGER MULTIPLIERS FOR HIGH LEVEL SYNTHESIS
Speaker:
Emanuele Vitali, Politecnico di Milano, IT
Authors:
Emanuele Vitali, Davide Gadioli, Fabrizio Ferrandi and Gianluca Palermo, Politecnico di Milano, IT
Abstract
The multiplication of large integers represents a significant computational effort in some cryptographic techniques. The use of dedicated hardware is an appealing solution to improve performance or efficiency. We propose a methodology to generate throughput-oriented hardware accelerators for large-integer multiplication leveraging High-Level Synthesis. The proposed micro-architectural template is composed of a combination of different multiplication algorithms. It exploits the recursive splitting of Karatsuba, reuse strategies, and the efficiency of Comba to control the extra-functional properties of the generated multiplier. The goal is to enable end-users to explore a wide range of possibilities, in terms of performance and resource utilization, without requiring them to know implementation and synthesis details. Experimental results show the large flexibility of the generated architectures and that the generated Pareto set of multipliers can outperform some state-of-the-art RTL designs.
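The recursive Karatsuba splitting the template builds on can be sketched as follows (textbook algorithm; the base-case threshold is an illustrative stand-in for the template's Comba-sized leaf multipliers). Three half-width multiplications replace four, at the cost of extra additions.

```python
# Karatsuba multiplication: split x = xh*2^h + xl, y = yh*2^h + yl and
# compute x*y from three recursive products instead of four.
def karatsuba(x: int, y: int, threshold_bits: int = 64) -> int:
    n = max(x.bit_length(), y.bit_length())
    if n <= threshold_bits:
        return x * y                           # base case: leaf multiplier
    h = n // 2
    xh, xl = x >> h, x & ((1 << h) - 1)
    yh, yl = y >> h, y & ((1 << h) - 1)
    hi = karatsuba(xh, yh, threshold_bits)
    lo = karatsuba(xl, yl, threshold_bits)
    mid = karatsuba(xh + xl, yh + yl, threshold_bits) - hi - lo
    return (hi << (2 * h)) + (mid << h) + lo

a, b = (1 << 1024) - 3, (1 << 1000) + 7
print(karatsuba(a, b) == a * b)  # True
```

In the hardware template, the recursion depth and the leaf-multiplier size become the knobs that trade throughput against resource utilization, which is precisely the exploration space exposed to the end-user.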
09:21 CEST IP1_1.2 LOCKING THE RE-USABILITY OF BEHAVIORAL IPS: DISCRIMINATING THE SEARCH SPACE THROUGH PARTIAL ENCRYPTIONS
Speaker:
Zi Wang, University of Texas at Dallas, US
Authors:
Zi Wang and Benjamin Carrion Schaefer, University of Texas at Dallas, US
Abstract
Behavioral IPs (BIPs) have one salient advantage compared to traditional RTL IPs given in Verilog or VHDL, or even gate netlists: a BIP can be used to generate RTL implementations with very different characteristics by simply specifying different synthesis directives. These synthesis directives are typically specified in the source code in the form of pragmas (comments) and control how to synthesize arrays (e.g. as registers or RAM), loops (unroll or fold) and functions (inline or not). This allows a BIP consumer to purchase a BIP once and re-use it in future projects by simply specifying a different mix of these synthesis directives. This obviously does not benefit the BIP provider, as the BIP consumer would not need to purchase the BIP again for future projects, as opposed to IPs bought at the RT or gate-netlist level. To address this, this work presents a method that enables the BIP provider to lock the search space of the BIP such that the user can only generate micro-architectures within the specified search space. This leads to a significant benefit for both parties: the BIP provider can now discriminate the BIP price based on how much of the search space is made visible to the BIP consumer, while the BIP consumer benefits from a cheaper BIP, albeit one limited in its search space. This approach is made possible through partial encryption of the BIP. Thus, this work presents a method that selectively fixes some synthesis directives and allows the BIP user to modify the rest, such that the generated micro-architectures are guaranteed to lie within a given pre-defined search-space limit.
09:22 CEST 1.4.3 (Best Paper Award Candidate)
CORRELATED MULTI-OBJECTIVE MULTI-FIDELITY OPTIMIZATION FOR HLS DIRECTIVES DESIGN
Speaker:
Qi Sun, The Chinese University of Hong Kong, HK
Authors:
Qi Sun1, Tinghuan Chen1, Siting Liu1, Jin Miao2, Jianli Chen3, Hao Yu4 and Bei Yu1
1The Chinese University of Hong Kong, HK; 2Cadence Design Systems, US; 3Fudan University, CN; 4Southern University of Science and Technology, China, CN
Abstract
High-level synthesis (HLS) tools have gained great attention in recent years because they emancipate engineers from complicated and heavyweight hardware description language coding through high-level languages and HLS directives. However, previous works fall short, due to the time-consuming design processes, the contradictions among design objectives, and the accuracy differences between the three stages (fidelities). To find good HLS directives, in this paper a novel correlated multi-objective non-linear optimization algorithm is proposed to explore the Pareto solutions while making full use of data from different fidelities. A non-linear Gaussian process is proposed to model the relationships among the analysis reports from different fidelities for the same objective. For the first time, correlated multivariate Gaussian process models are introduced into this domain to characterize the complex relationships of multiple objectives in each design fidelity. A tree-based method is proposed to prune invalid and clearly non-optimal solutions. Experimental results show that our non-linear, pioneering correlated models can approximate the Pareto frontier of the directive design space in a shorter time with much better performance and good stability, compared with the state of the art.
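The Pareto-frontier target of such an exploration can be made concrete with a plain non-domination filter (a generic helper; the paper's contribution is the correlated multi-fidelity model that decides which directive settings to evaluate). Objectives here are (latency, resources), both minimized, with illustrative values.

```python
# Extract the Pareto frontier from a set of explored design points: a point
# is kept unless some other point is at least as good in both objectives
# and strictly different.
def pareto_frontier(points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    def dominated(p, q):
        return q[0] <= p[0] and q[1] <= p[1] and q != p
    return [p for p in points if not any(dominated(p, q) for q in points)]

designs = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]  # (latency, resources)
print(sorted(pareto_frontier(designs)))  # [(8, 6), (10, 5), (12, 4)]
```

The exploration loop proposes a directive setting, queries the cheapest fidelity first, and only promotes promising points to logic synthesis and physical design; the frontier above is what it ultimately reports to the designer.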
09:37 CEST IP1_4.1 OPPORTUNISTIC IP BIRTHMARKING USING SIDE EFFECTS OF CODE TRANSFORMATIONS ON HIGH-LEVEL SYNTHESIS
Speaker:
Christian Pilato, Politecnico di Milano, IT
Authors:
Hannah Badier1, Christian Pilato2, Jean-Christophe Le Lann3, Philippe Coussy4 and Guy Gogniat5
1ENSTA Bretagne, FR; 2Politecnico di Milano, IT; 3ENSTA-Bretagne, FR; 4Universite de Bretagne-Sud / Lab-STICC, FR; 5Université Bretagne Sud, FR
Abstract
Increasing design and manufacturing costs are driving the globalization of the semiconductor supply chain. However, a malicious attacker can resell a stolen Intellectual Property (IP) core, which demands methods to identify the relationship between a given IP and a potentially fraudulent copy. We propose a method to protect IP cores created with high-level synthesis (HLS): our method inserts a discrete birthmark into the HLS-generated designs that uses only intrinsic characteristics of the final RTL. The core of our process leverages the side effects of HLS due to specific source-code manipulations, although the method is HLS-tool agnostic. We propose two independent validation metrics, showing that our solution introduces minimal resource and delay overheads (<6% and <2%, respectively), and the accuracy in detecting illegal copies is above 96%.

1.5 Adaptive and Learning Systems

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/KnRTkpBoc8JDGhGmQ

Session chair:
Domenico Balsamo, Newcastle University, GB

Session co-chair:
Anuj Pathania, University of Amsterdam, NL

Recent advances in machine learning have pushed the boundaries of what is possible in self-adaptive and learning systems. This session explores this in two directions: the first investigates the state of the art in efficient online and adaptive learning and inference for embedded systems, while the second exploits machine learning for improving the efficiency of emerging applications.

Time Label Presentation Title
Authors
08:50 CEST 1.5.1 ONLINEHD: ROBUST, EFFICIENT, AND SINGLE-PASS ONLINE LEARNING USING HYPERDIMENSIONAL SYSTEM
Speaker:
Mohsen Imani, University of California Irvine, US
Authors:
Alejandro Hérnandez-Cano1, Namiko Matsumoto2, Eric Ping2 and Mohsen Imani3
1Universidad Nacional Autónoma de México, MX; 2University of California San Diego, US; 3University of California Irvine, US
Abstract
Hyper-Dimensional Computing (HDC) is a brain-inspired learning approach for efficient and robust learning on today's embedded devices. HDC supports single-pass learning, where it generates a classification model by looking at each training data point only once. However, the single-pass model provides weak classification accuracy due to model saturation caused by naively accumulating high-dimensional data. Although retraining the model for hundreds of iterations addresses the model saturation and boosts the accuracy, it comes with significant training costs. In this paper, we propose OnlineHD, an adaptive HDC training framework for accurate, efficient, and robust learning. During single-pass training, OnlineHD identifies common patterns and eliminates model saturation. For each data point, OnlineHD updates the model depending on how similar the point is to the existing model, instead of naively accumulating data. We extend the OnlineHD framework to support highly accurate iterative training. We also exploit the holographic distribution of patterns in high-dimensional space to make OnlineHD ultra-robust against possible noise and hardware failure. Our evaluations on a wide range of classification problems show that OnlineHD adaptive training provides classification accuracy comparable to the retrained model while keeping all the efficiency benefits of single-pass training. OnlineHD achieves, on average, 3.5× and 6.9× (3.7× and 5.8×) faster and more efficient training compared to state-of-the-art machine learning (HDC algorithms), while providing similar classification accuracy and 8.5× higher robustness to hardware errors.
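The similarity-weighted update described in this abstract can be sketched in a few lines; the 4-dimensional vectors are a toy stand-in for the thousands of dimensions a real HDC model uses, and the update rule shown is a plausible reading of the abstract rather than the paper's exact formula.

```python
# OnlineHD-style update sketch: instead of blindly accumulating every
# encoded sample into its class hypervector, add the sample scaled by
# (1 - similarity), so patterns the model already knows barely change it.
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def online_update(model: list[float], sample: list[float]) -> list[float]:
    rate = 1.0 - cosine(model, sample)   # familiar samples -> tiny update
    return [m + rate * s for m, s in zip(model, sample)]

model = [1.0, 0.0, 0.0, 0.0]
familiar = [1.0, 0.0, 0.0, 0.0]   # identical pattern: no change
novel = [0.0, 1.0, 0.0, 0.0]      # orthogonal pattern: full-strength update
print(online_update(model, familiar) == model)  # True
print(online_update(model, novel))
```

Damping the update for familiar patterns is what prevents the saturation that plain single-pass accumulation suffers from.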
09:05 CEST 1.5.2 ADAPTIVE GENERATIVE MODELING IN RESOURCE-CONSTRAINED ENVIRONMENTS
Speaker:
Jung-Eun Kim, Yale University, US
Authors:
Jung-Eun Kim1, Richard Bradford2, Max Del Giudice3 and Zhong Shao3
1Department of Computer Science, Yale University, US; 2Collins Aerospace, US; 3Yale University, US
Abstract
Modern generative techniques, which derive realistic data from incomplete or noisy inputs, require massive computation for rigorous results. These limitations hinder generative techniques from being incorporated into systems in resource-constrained environments, motivating methods that grant users control over the time-quality trade-offs for a reasonable "payoff" in execution cost. Hence, as a new paradigm for adaptively organizing and employing recurrent networks, we propose an architectural design for generative modeling that achieves flexible quality. We boost overall efficiency by introducing non-recurrent layers into stacked recurrent architectures. Accordingly, we design the architecture with no redundant recurrent cells, avoiding unnecessary overhead.
09:20 CEST IP1_2.1 OPERATING BEYOND FPGA TOOL LIMITATIONS: NERVOUS SYSTEMS FOR EMBEDDED RUNTIME MANAGEMENT
Speaker:
Martin Trefzer, University of York, GB
Authors:
Matthew Rowlings, Martin Albrecht Trefzer and Andy Tyrrell, University of York, GB
Abstract
Deep-submicron fabrication issues throttle VLSI designs with the pessimistic design constraints required to avoid failure of devices in the field. This imposes overly conservative design approaches, including worst-case corners and speed-grade device binning, resulting in systems performing far below their maximum possible performance. An alternative is to monitor a device's operating state in the field and manage key parameters autonomously at runtime. In a modern SoC consisting of millions of transistors there is a huge number of potential monitoring and actuation points. This makes the autonomous management task difficult when using centralised intelligence for parameter decisions, and is inherently non-scalable. An organism's decentralised control system, the nervous system, manages high degrees of scalability. Nervous systems use a hierarchy of neural circuitry to: a) integrate sensory data; b) manage local feedback paths between sensory inputs (nerve cells) and local actuators (muscle cells); c) combine many integrated local sensory pathways to form higher-level decisions that affect many actuators spread across the organism. This model maps well to VLSI designs: low-level sensors are formed of small sensory circuits (timing fault detectors, ring oscillators), low-level actuators map to configurable design elements (voltage islands, clock-tree delay elements), and high-level decision units manage global clock frequencies and device voltage rails that affect the whole chip. This paper motivates the adoption of a nervous-system-inspired approach. We explore the problem of device binning by presenting experimental results characterising an Artix-7 FPGA design. Our test circuit is overclocked to twice the maximum design-tool frequency and run at 50 degrees Celsius above its maximum operating temperature without error. Our Configurable Intelligence Array is then introduced as a low-overhead intelligence platform, ideal for implementing nervous-system signal pathways. It is used for a prototype neural circuit that closes the loop between a timing-fault detector and a programmable PLL.
09:21 CEST IP1_2.2 ADAPTIVE-LEARNING BASED BUILDING LOAD PREDICTION FOR MICROGRID ECONOMIC DISPATCH
Speaker:
Rumia Masburah, Student, IN
Authors:
Rumia Masburah1, Rajib Lochan Jana2, Ainuddin Khan2, Shichao Xu3, Shuyue Lan3, Soumyajit Dey1 and Qi Zhu3
1Indian Institute of Technology Kharagpur, IN; 2Indian institute of Technology Kharagpur, IN; 3Northwestern University, US
Abstract
Given that building loads consume roughly 40% of the energy produced in developed countries, smart buildings with local renewable resources offer a viable alternative towards achieving a greener future. Building temperature control strategies typically employ detailed physical models capturing building thermal dynamics. Creating such models requires a significant amount of time, information and finesse. Even then, due to unknown building parameters and related inaccuracies, future power demands of the building loads are difficult to estimate. This creates unique challenges in the domain of microgrid economic power dispatch for satisfying building power demands through efficient control and scheduling of renewable and non-renewable local resources in conjunction with supply from the main grid. In this work, we estimate the real-time uncertainties in building loads using Gaussian Process (GP) learning and establish the effectiveness of run-time model correction in the context of microgrid economic dispatch. Our system architecture employs a Deep Reinforcement Learning (DRL) framework that adaptively triggers the GP model learning and updating phase to consistently provide accurate power demand prediction of the building load. We employ a Model Predictive Control (MPC) based microgrid power dispatch scheme enabled with our demand prediction framework and co-simulate it with the EnergyPlus building load simulator to establish the efficacy of our approach.
09:22 CEST 1.5.3 PERFORMANCE ANALYSIS AND AUTO-TUNING FOR SPARK IN-MEMORY ANALYTICS
Speaker:
Dimosthenis Masouros, National TU Athens, GR
Authors:
Dimitra Nikitopoulou1, Dimosthenis Masouros1, Sotirios Xydis1 and Dimitrios Soudris2
1National TU Athens, GR; 2NTUA, GR
Abstract
Recently, the Apache Spark in-memory computing framework has gained a lot of attention due to its increased performance on large-scale data processing. Although Spark is highly configurable, manually tuning it is time-consuming due to the high-dimensional configuration space. Prior research has produced frameworks able to analyze and model the performance of Spark applications; however, they either rely on empirical selection of important parameters and/or follow a purely application-specific modeling approach. In this paper, we propose an end-to-end performance auto-tuning framework for Spark in-memory analytics. By adopting statistical hypothesis testing techniques, we manage to extract the higher-order effects among different parameters and their significance for performance optimization. In addition, we propose a new systematic meta-model-driven approach utilizing cluster-wise, rather than application-wise, performance modeling for traversing the configuration search space. We evaluate our approach using real-scale analytics benchmarks from the HiBench suite and show that the proposed framework achieves an average performance gain of 3.07x for known and 2.01x for unknown applications, compared to the default configuration.

1.6 Soft error vulnerability analysis and mitigation, and hotspot identification

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/DraY3EffZetwbDzh2

Session chair:
Said Hamdioui, TU Delft, NL

Session co-chair:
Luca Sterpone, Politecnico di Torino, IT

Soft errors are a growing concern. In this session, their impact on instruction sets is analyzed, along with how flip-flops can be hardened against multiple upsets. The session continues with how to improve the identification of lithography hotspots, which is essential in advanced technologies to avoid systematic defects.

Time Label Presentation Title
Authors
08:50 CEST 1.6.1 GLAIVE: GRAPH LEARNING ASSISTED INSTRUCTION VULNERABILITY ESTIMATION
Speaker:
Jiajia Jiao, Cornell University, USA/Shanghai Maritime University, China, CN
Authors:
Jiajia Jiao1, Debjit Pal2, Chenhui Deng2 and Zhiru Zhang3
1Shanghai Maritime University & Cornell University, CN; 2Computer Systems Laboratory, Cornell University, US; 3Cornell University, US
Abstract
Due to continuous technology scaling and the lowering of operating voltages, modern computer systems are highly vulnerable to soft errors induced by high-energy particles. Soft errors can corrupt program outputs, leading to silent data corruption or a crash. To protect computer systems against such failures, architects need to precisely and quickly identify vulnerable program instructions that need to be protected. Traditional techniques for program reliability estimation use either expensive and time-consuming fault injection or inaccurate analytical models to identify the program instructions that need to be protected against soft errors. In this work, we present GLAIVE, a graph learning-assisted model for fast, accurate, and transferable estimation of soft-error-induced instruction vulnerability. GLAIVE leverages a synergy between static analysis and data-driven statistical reasoning to automatically learn signatures of instruction-level vulnerabilities and their propagation to program outputs, using fine-grained error-propagation information from the bit-level program graphs of a set of realistic benchmarks. Our experiments show that the learned knowledge of instruction vulnerability is transferable to unseen programs. We further show that GLAIVE can achieve an average 221x speedup and up to 33.09% lower program vulnerability estimation error compared to a baseline fault-injection technique, up to 30.29% higher vulnerability estimation accuracy, and on average can cover up to 90.23% of vulnerable instructions for a given protection budget compared to a set of baseline machine learning algorithms.
09:05 CEST 1.6.2 TRIGON: A SINGLE-PHASE-CLOCKING LOW POWER HARDENED FLIP-FLOP WITH TOLERANCE TO DOUBLE-NODE-UPSET FOR HARSH ENVIRONMENTS APPLICATIONS
Speaker:
Yan Li, Karlsruhe Institute of Technology, DE
Authors:
Yan Li1, Jun Han2, Xiaoyang Zeng1 and Mehdi Tahoori3
1State Key Laboratory of ASIC and System, Fudan University, CN; 2Fudan University, CN; 3Karlsruhe Institute of Technology, DE
Abstract
Single Event Upset (SEU) is one of the most critical reliability issues for CMOS circuits in harsh environments, such as space or even at sea level. Especially at advanced nanoscale nodes, the phenomenon of multi-node upset (MNU) becomes more prominent. Although much work has been proposed to solve this problem, most of it ignores the need for low power consumption. In particular, most existing solutions are no longer effective when operating at low supply voltages. Therefore, this paper proposes a novel flip-flop called TRIGON, based on a single-phase-clocking structure, that achieves low power consumption while tolerating double-node upset (DNU), even when operating at lower supply voltages. The experimental results show that TRIGON significantly reduces area and power-delay-area product (PDAP). In particular, it achieves about 80% energy saving on average when the input is static, compared with state-of-the-art circuits.
09:20 CEST IP1_3.1 FORSETI: AN EFFICIENT BASIC-BLOCK-LEVEL SENSITIVITY ANALYSIS FRAMEWORK TOWARDS MULTI-BIT FAULTS
Speaker:
Jinting Ren, Chongqing University, CN
Authors:
Jinting Ren, Xianzhang Chen, Duo Liu, Moming Duan, Renping Liu and Chengliang Wang, Chongqing University, CN
Abstract
Per-instruction sensitivity analysis frameworks are developed to evaluate the resiliency of a program and identify the segments of the program that need protection. However, for multi-bit hardware faults, per-instruction sensitivity analysis frameworks incur large overhead from redundant analyses. In this paper, we propose a basic-block-level sensitivity analysis framework, Forseti, to reduce the overhead of analyzing the impact of modern microprocessors' multi-bit faults on programs. We implement Forseti in LLVM and evaluate it with five typical workloads. Extensive experimental results show that Forseti achieves more than 90% sensitivity classification accuracy and a 6.16x speedup over instruction-level analysis.
09:21 CEST IP1_3.2 MODELING SILICON-PHOTONIC NEURAL NETWORKS UNDER UNCERTAINTIES
Speaker:
Sanmitra Banerjee, Duke University, US
Authors:
Sanmitra Banerjee1, Mahdi Nikdast2 and Krishnendu Chakrabarty1
1Duke University, US; 2Colorado State University, US
Abstract
Silicon-photonic neural networks (SPNNs) offer substantial improvements in computing speed and energy efficiency compared to their digital electronic counterparts. However, the energy efficiency and accuracy of SPNNs are highly impacted by uncertainties that arise from fabrication-process and thermal variations. In this paper, we present the first comprehensive and hierarchical study on the impact of random uncertainties on the classification accuracy of a Mach--Zehnder Interferometer (MZI)-based SPNN. We show that such impact can vary based on both the location and characteristics (e.g., tuned phase angles) of a non-ideal silicon-photonic device. Simulation results show that in an SPNN with two hidden layers and 1374 tunable-thermal-phase shifters, random uncertainties even in mature fabrication processes can lead to a catastrophic 70% accuracy loss.
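The uncertainty model the abstract studies acts on the phase settings of MZI devices. As a hedged illustration (a common textbook 2x2 MZI parameterization, not the paper's exact model; all names are illustrative), phase errors can be injected into the device transfer matrix like this:

```python
import numpy as np

def mzi(theta, phi, dtheta=0.0, dphi=0.0):
    """2x2 transfer matrix of a Mach-Zehnder Interferometer with tuned
    phases (theta, phi) perturbed by errors (dtheta, dphi) standing in
    for fabrication-process and thermal variations."""
    t, p = theta + dtheta, phi + dphi
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50:50 beam splitter
    ps_inner = np.diag([np.exp(1j * t), 1.0])        # internal phase shifter
    ps_outer = np.diag([np.exp(1j * p), 1.0])        # external phase shifter
    return ps_outer @ bs @ ps_inner @ bs
```

Because the perturbed matrix remains unitary but rotates differently, the error of each non-ideal device propagates through the mesh, which is why the impact depends on both the device's location and its tuned phase angles.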
09:22 CEST 1.6.3 ENHANCEMENTS OF MODEL AND METHOD IN LITHOGRAPHY HOTSPOT IDENTIFICATION
Speaker:
Rui Zhang, HiSilicon Technologies Co., Ltd., CN
Authors:
Xuanyu Huang1, Rui Zhang2, Yu Huang2, Peiyao Wang2 and Mei Li2
1Center for Nano and Micro Mechanics, Tsinghua University, China, CN; 2HiSilicon Technologies Co., Ltd. Shenzhen, China, CN
Abstract
The manufacturing of integrated circuits has been continuously improved through the advancement of fabrication technology nodes. However, lithographic hotspots (HSs) caused by optical diffraction seriously affect the yield of ICs. Although lithography simulation can accurately capture HSs by physically simulating the lithography process, it requires substantial computing resources, usually more than 100 CPU-hours per mm². Due to the image-recognition nature of the problem, state-of-the-art HS identification algorithms based on deep learning have obvious run-time advantages over traditional algorithms. However, their accuracy still needs to be enhanced, since many false alarms on non-hotspots (NHSs) and escapes of real HSs make them difficult to use as a signoff technique. In this paper, we propose two enhancements to HS identification. First, a hybrid deep learning model is proposed for lithography HS identification, which includes a CNN model that incorporates physical features. Second, an ensemble learning method is proposed based on multiple sub-models. The proposed enhanced model and method achieve high HS identification accuracy on benchmarks 1-4 of the ICCAD 2012 dataset, with recall > 98.8%. In addition, they achieve 100% recall on benchmarks 1 and 3 while maintaining precision above 93%. Moreover, for the first time, they achieve not only 100% recall on benchmark 5 but also a high precision of 61.8%, which is, as far as we know, much higher than any published deep learning method for HS identification. The proposed enhanced model and methodology can be applied in industrial IC designs due to their effectiveness and efficiency.

1.7 Novel Compilation Flows for Performance and Memory Footprint Optimization

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ksxqLWzPahhaFbojK

Session chair:
Nicola Bombieri, Università di Verona, IT

Session co-chair:
Andrea Bartolini, University of Bologna, IT

In the era of heterogeneous embedded systems, the diverse nature of computing elements pushes more than ever the need for compilers to improve the performance, energy efficiency and memory consumption of embedded software. This session tackles these issues with solutions that span from compilation flows based on machine learning to data flow analysis and restructuring.

Time Label Presentation Title
Authors
08:50 CEST 1.7.1 MLCOMP: A METHODOLOGY FOR MACHINE LEARNING-BASED PERFORMANCE ESTIMATION AND ADAPTIVE SELECTION OF PARETO-OPTIMAL COMPILER OPTIMIZATION SEQUENCES
Speaker:
Alessio Colucci, TU Wien, AT
Authors:
Alessio Colucci1, Dávid Juhász2, Martin Mosbeck1, Alberto Marchisio3, Semeen Rehman2, Manfred Kreutzer4, Guenther Nadbath4, Axel Jantsch2 and Muhammad Shafique5
1Vienna University of Technology (TU Wien), AT; 2TU Wien, AT; 3TU Wien (TU Wien), AT; 4ABIX GmbH, AT; 5New York University Abu Dhabi (NYUAD), AE
Abstract
Embedded systems have proliferated in various consumer and industrial applications with the evolution of Cyber-Physical Systems and the Internet of Things. These systems are subject to stringent constraints, so embedded software must be optimized for multiple objectives simultaneously, namely reduced energy consumption, execution time, and code size. Compilers offer optimization phases to improve these metrics. However, their proper selection and ordering depends on multiple factors and typically requires expert knowledge. State-of-the-art optimizers target different platforms and applications case by case; they are limited to optimizing one metric at a time and require time-consuming adaptation to different targets through dynamic profiling. To address these problems, we propose the novel MLComp methodology, in which optimization phases are sequenced by a Reinforcement Learning-based policy. Training of the policy is supported by Machine Learning-based analytical models for quick performance estimation, thereby drastically reducing the time spent on dynamic profiling. In our framework, different Machine Learning models are automatically tested to choose the best-fitting one. The trained Performance Estimator model is leveraged to efficiently devise Reinforcement Learning-based multi-objective policies for creating quasi-optimal phase sequences. Compared to state-of-the-art estimation models, our Performance Estimator model achieves lower relative error (<2%) with up to 50x faster training time over multiple platforms and application domains. Our Phase Selection Policy improves execution time and energy consumption of a given code by up to 12% and 6%, respectively. The Performance Estimator and the Phase Selection Policy can be trained efficiently for any target platform and application domain.
09:05 CEST 1.7.2 DATAFLOW RESTRUCTURING FOR ACTIVE MEMORY REDUCTION IN DEEP NEURAL NETWORKS
Speaker:
Antonio Cipolletta, Politecnico di Torino, IT
Authors:
Antonio Cipolletta and Andrea Calimera, Politecnico di Torino, IT
Abstract
The volume reduction of the activation maps produced by the hidden layers of a Deep Neural Network (DNN) is a critical aspect in modern applications, as it affects on-chip memory utilization, the most limited and costly hardware resource. Despite the availability of many compression methods that leverage the statistical nature of deep learning to approximate and simplify the inference model, e.g., quantization and pruning, there is room for deterministic optimizations that instead tackle the problem from a computational standpoint. This work belongs to this latter category, as it introduces a novel method for minimizing the active memory footprint. The proposed technique, which is data-, model-, compiler-, and hardware-agnostic, implements a function-preserving, automated graph restructuring where the memory peaks are suppressed and distributed over time, leading to flatter profiles with less memory pressure. Results collected on a representative class of Convolutional DNNs with different topologies, from Vgg16 and SqueezeNetV1.1 to the recent MobileNetV2, ResNet18, and InceptionV3, provide clear evidence of applicability, showing remarkable memory savings (62.9% on average) with low computational overhead (8.6% on average).
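The memory peak this technique flattens can be made concrete with a minimal sketch (illustrative only, not the authors' method): for a straight-line DNN executed layer by layer, the input and output activation maps of the running layer are live simultaneously, so the peak is the largest such pair.

```python
def peak_activation_memory(layer_sizes):
    """Peak active memory (in arbitrary units) of a straight-line DNN:
    while layer i executes, both its input map (layer_sizes[i]) and its
    output map (layer_sizes[i+1]) must be resident on-chip."""
    return max(a + b for a, b in zip(layer_sizes, layer_sizes[1:]))
```

Restructuring the graph, e.g., computing a large map in tiles whose lifetimes do not overlap, lowers this maximum without changing the function computed, which is the deterministic optimization the abstract describes.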
09:20 CEST IP1_4.2 EFFICIENT TENSOR CORES SUPPORT IN TVM FOR LOW-LATENCY DEEP LEARNING
Speaker:
Wei Sun, Eindhoven University of Technology, NL
Authors:
Wei Sun1, Savvas Sioutas1, Sander Stuijk1, Andrew Nelson2 and Henk Corporaal3
1Eindhoven University of Technology, NL; 2TU Eindhoven, NL; 3TU/e (Eindhoven University of Technology), NL
Abstract
Deep learning algorithms are gaining popularity in autonomous systems. These systems typically have stringent latency constraints that are challenging to meet given the high computational demands of these algorithms. Nvidia introduced Tensor Cores (TCs) to speed up some of the most commonly used operations in deep learning algorithms. Compilers (e.g., TVM) and libraries (e.g., cuDNN) focus on the efficient usage of TCs when performing batch processing. Latency-sensitive applications, however, cannot exploit large batch processing. This paper presents an extension to the TVM compiler that generates low-latency TC implementations, particularly for batch size 1. Experimental results show that our solution reduces latency on average by 14% compared to the cuDNN library on a desktop RTX2070 GPU, and by 49% on an embedded Jetson Xavier GPU.
09:21 CEST 1.7.3 REDUCING MEMORY ACCESS CONFLICTS WITH LOOP TRANSFORMATION AND DATA REUSE ON COARSE-GRAINED RECONFIGURABLE ARCHITECTURE
Speaker:
Yuge Chen, Department of Micro-Nano Electronics, Shanghai Jiao Tong University, CN
Authors:
Yuge Chen1, Zhongyuan Zhao2, Jianfei Jiang3, Guanghui He1, Zhigang Mao1 and Weiguang Sheng1
1Department of Micro-Nano Electronics, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, CN; 2Shanghai JiaoTong University, CN; 3Shanghai Jiao Tong University, CN
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising accelerators with low power consumption and high energy efficiency. In recent years, many research works have focused on improving the programmability of CGRAs by enabling fast reconfiguration during execution. The performance of these CGRAs hinges critically upon the scheduling power of the compiler. One of the critical challenges is to reduce memory access conflicts using static compilation techniques. Memory access conflicts bring synchronization overhead, which causes pipeline stalls and reduces CGRA performance. Existing compilers usually tackle this challenge by orchestrating data placement in the on-chip global memory (OGM) of the CGRA so that parallel memory accesses avoid bank conflicts. However, we find that bank conflicts are not the only cause of memory access conflicts. In some CGRAs, the bandwidth of the data network between the OGM and the processing element array (PEA) is also limited due to low-power design principles, and unbalanced network bandwidth load is another cause of memory access conflicts. Furthermore, redundant data accesses across iterations are one of the primary causes of memory access conflicts. Based on these observations, we provide a comprehensive and generalized compilation flow to reduce memory conflicts. First, we develop a loop transformation model that maximizes inter-iteration data reuse to reduce memory access operations under the software pipelining scheme. Second, we enhance the bandwidth utilization of the network between the OGM and the PEA and avoid bank conflicts by providing a conflict-aware spatial mapping algorithm that can be easily integrated into existing CGRA modulo-scheduling compilation flows. Experimental results show our method improves performance by an average of 44% compared with a state-of-the-art CGRA compilation flow.

1.8 Industrial Design Methods and Tools: Future EDA Applications and Thermal Simulation for 3D

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 09:40 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/syEBmJaaLgL3SDPX9

Organizer:
Jürgen Haase, edacentrum GmbH, DE

This Exhibition Workshop features two talks on industrial design methods and tools. It is open to conference delegates as well as to exhibition visitors.

Time Label Presentation Title
Authors
08:50 CEST 1.8.1 ENABLING EARLY AND FAST THERMAL SIMULATION FOR 3D MULTI-DIE SYSTEM DESIGNS
Speaker:
Iyad Rayane, Zuken, FR
Co-Author:
Koga Kazunari, Zuken, FR
Abstract

As design complexity increases with 3D ICs and time-to-market becomes a critical component in the automotive, wearables and IoT segments, reducing design cycle time while maintaining accuracy of analysis has become all the more important. To address this, a system-level co-design approach in step with multi-physics analysis is presented. To mitigate errors due to the manual exchange of data between engineering teams spread across chip, package and board, with design and analysis adding a further level of exchange, a design flow incorporating simplification at the layout level is shown. The flow enables various levels of simplified models to be used, wherein the transfer of the complex 3D structures in the layout to the thermal analysis tool is automated. The efficacy of the model simplification is verified through a test case showing comparable results for the simplified and full models.

09:15 CEST 1.8.2 FUTURE VISION OF ALTAIR FOR EDA APPLICATIONS
Speaker:
Philippe Le Marrec, Altair, FR
Abstract

Nowadays, the design of EDA applications is not focused only on hardware/software parts and requires team collaboration. In many cases, as in mechatronics, powertrain and control systems, the environment has to be used with the design itself at different levels of abstraction. Altair provides environments that help users design these multi-physics environments and interact with dedicated solvers. Cloud solutions and data analytics can also be combined to find the best design tuning for powerful multi-physics simulations.


ET Exhibition Theatre

Date: Tuesday, 02 February 2021
Time: 08:50 CEST - 18:20 CEST
Location / Room: Exhibition Theatre

Organizer:
Jürgen Haase, edacentrum GmbH, DE

In addition to the conference programme, there will be Exhibition Workshops as part of the exhibition. These workshops will feature an Exhibition Keynote, technical presentations on the state-of-the-art in our industry, tutorials, and as a special highlight two sessions dedicated especially to a young audience.

The Exhibition Theatre sessions are open to conference delegates as well as to exhibition visitors.


IP1_1 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ehb3xpNP7hrKYpWKY

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP1_1.1 PARAMETRIC THROUGHPUT ORIENTED LARGE INTEGER MULTIPLIERS FOR HIGH LEVEL SYNTHESIS
Speaker:
Emanuele Vitali, Politecnico di Milano, IT
Authors:
Emanuele Vitali, Davide Gadioli, Fabrizio Ferrandi and Gianluca Palermo, Politecnico di Milano, IT
Abstract
The multiplication of large integers represents a significant computational effort in some cryptographic techniques. The use of dedicated hardware is an appealing solution to improve performance or efficiency. We propose a methodology to generate throughput-oriented hardware accelerators for large-integer multiplication leveraging High-Level Synthesis. The proposed micro-architectural template is composed of a combination of different multiplication algorithms. It exploits the recursive splitting of Karatsuba, reuse strategies, and the efficiency of Comba to control the extra-functional properties of the generated multiplier. The goal is to enable end-users to explore a wide range of possibilities, in terms of performance and resource utilization, without requiring them to know implementation and synthesis details. Experimental results show the large flexibility of the generated architectures and that the generated Pareto set of multipliers can outperform some state-of-the-art RTL designs.
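The recursive splitting of Karatsuba that the template exploits can be sketched in software. The following is a minimal illustrative Python version (not the paper's hardware template); the `threshold` cut-off stands in for the point where the template would hand off to an efficient base multiplier such as a Comba-style one.

```python
def karatsuba(x, y, threshold=64):
    """Multiply non-negative integers by Karatsuba's recursive splitting.
    Operands below `threshold` bits fall back to the builtin multiply,
    analogous to switching to a base (e.g., Comba-style) multiplier."""
    if x < (1 << threshold) or y < (1 << threshold):
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)   # split x into high/low halves
    yh, yl = y >> n, y & ((1 << n) - 1)
    a = karatsuba(xh, yh, threshold)                        # high product
    b = karatsuba(xl, yl, threshold)                        # low product
    c = karatsuba(xh + xl, yh + yl, threshold) - a - b      # cross terms
    return (a << (2 * n)) + (c << n) + b                    # recombine
```

Three recursive multiplications replace the four of the schoolbook scheme, which is precisely the knob that lets the generator trade recursion depth (resources) against throughput.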
IP1_1.2 LOCKING THE RE-USABILITY OF BEHAVIORAL IPS: DISCRIMINATING THE SEARCH SPACE THROUGH PARTIAL ENCRYPTIONS
Speaker:
Zi Wang, University of Texas at Dallas, US
Authors:
Zi Wang and Benjamin Carrion Schaefer, University of Texas at Dallas, US
Abstract
Behavioral IPs (BIPs) have one salient advantage compared to traditional RTL IPs given in Verilog or VHDL, or even gate netlists: a BIP can be used to generate RTL implementations with very different characteristics by simply specifying different synthesis directives. These synthesis directives are typically specified in the source code in the form of pragmas (comments) and control how to synthesize arrays (e.g., registers or RAM), loops (unroll or fold) and functions (inline or not). This allows a BIP consumer to purchase a BIP once and re-use it in future projects by simply specifying a different mix of these synthesis directives. This obviously does not benefit the BIP provider, as the BIP consumer does not need to purchase the BIP again for future projects, as opposed to IPs bought at the RT or gate-netlist level. To address this, this work presents a method that enables the BIP provider to lock the search space of the BIP such that the user can only generate micro-architectures within the specified search space. This leads to a significant benefit for both parties: the BIP provider can now discriminate the BIP price based on how much of the search space is made visible to the BIP consumer, while the BIP consumer benefits from a cheaper BIP, albeit one limited in its search space. This approach is made possible through partial encryption of the BIP. Thus, this work presents a method that selectively fixes some synthesis directives and allows the BIP user to modify the rest, such that the generated micro-architectures are guaranteed to stay within a given pre-defined search-space limit.

IP1_2 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/SzLa3CdcHoLXXd9TB

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP1_2.1 OPERATING BEYOND FPGA TOOL LIMITATIONS: NERVOUS SYSTEMS FOR EMBEDDED RUNTIME MANAGEMENT
Speaker:
Martin Trefzer, University of York, GB
Authors:
Matthew Rowlings, Martin Albrecht Trefzer and Andy Tyrrell, University of York, GB
Abstract
Deep submicron fabrication issues throttle VLSI designs with the pessimistic design constraints required to avoid failure of devices in the field. This imposes overly-conservative design approaches, including worst-case corners and speed-grade device binning, resulting in systems performing far below their maximum possible performance. An alternative is to monitor a device's operating state in the field and manage key parameters autonomously at runtime. In a modern SoC consisting of millions of transistors there are a huge number of potential monitoring and actuation points. This makes the autonomous management task difficult when using centralised intelligence for parameter decisions and is inherently non-scalable. An organism's decentralised control system, the Nervous System, handles high degrees of scalability. Nervous Systems use a hierarchy of neural circuitry to: a) integrate sensory data, b) manage local feedback paths between sensory inputs (nerve cells) and local actuators (muscle cells), c) combine many integrated local sensory pathways together to form higher-level decisions that affect many actuators spread across the organism. This model maps well to VLSI designs: low-level sensors are formed of small sensory circuits (timing fault detectors, ring oscillators), low-level actuators map to configurable design elements (voltage islands, clock-tree delay elements) and high-level decision units manage global clock frequencies and device voltage rails, which affect the whole chip. This paper motivates the adoption of a Nervous System-inspired approach. We explore the problem of device binning by presenting experimental results characterising an Artix-7 FPGA design. Our test circuit is overclocked at twice the maximum design tool frequency and run at 50 degrees Celsius above its maximum operating temperature without error. Our Configurable Intelligence Array is then introduced as a low-overhead intelligence platform, ideal for implementing nervous system signal pathways. This is used for a prototype neural circuit that closes the loop between a timing-fault detector and a programmable PLL.
IP1_2.2 ADAPTIVE-LEARNING BASED BUILDING LOAD PREDICTION FOR MICROGRID ECONOMIC DISPATCH
Speaker:
Rumia Masburah, Student, IN
Authors:
Rumia Masburah1, Rajib Lochan Jana2, Ainuddin Khan2, Shichao Xu3, Shuyue Lan3, Soumyajit Dey1 and Qi Zhu3
1Indian Institute of Technology Kharagpur, IN; 2Indian institute of Technology Kharagpur, IN; 3Northwestern University, US
Abstract
Given that building loads consume roughly 40% of the energy produced in developed countries, smart buildings with local renewable resources offer a viable alternative towards achieving a greener future. Building temperature control strategies typically employ detailed physical models capturing building thermal dynamics. Creating such models requires a significant amount of time, information and finesse. Even then, due to unknown building parameters and related inaccuracies, future power demands by the building loads are difficult to estimate. This creates unique challenges in the domain of microgrid economic power dispatch for satisfying building power demands through efficient control and scheduling of renewable and non-renewable local resources in conjunction with supply from the main grid. In this work, we estimate the real-time uncertainties in building loads using Gaussian Process (GP) learning and establish the effectiveness of run-time model correction in the context of microgrid economic dispatch. Our system architecture employs a Deep Reinforcement Learning (DRL) framework that adaptively triggers the GP model learning and updating phase for consistently providing accurate power demand prediction of the building load. We employ a Model Predictive Control (MPC)-based microgrid power dispatch scheme enabled with our demand prediction framework and co-simulate it with the EnergyPlus building load simulator to establish the efficacy of our approach.
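The GP-based demand prediction at the core of the abstract can be sketched with textbook GP regression. This is an illustrative NumPy implementation under standard assumptions (RBF kernel, fixed hyperparameters), not the authors' framework:

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential (RBF) kernel between 1-D input vectors."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and per-point standard deviation of a GP fit,
    e.g., predicting building power demand at future time points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = rbf(x_test, x_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

The predictive standard deviation is what makes GP learning attractive here: a run-time framework can use it to decide when the demand model has drifted and trigger a model-update phase.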

IP1_3 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/Q6vgHj9Zeeru2NetT

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP1_3.1 FORSETI: AN EFFICIENT BASIC-BLOCK-LEVEL SENSITIVITY ANALYSIS FRAMEWORK TOWARDS MULTI-BIT FAULTS
Speaker:
Jinting Ren, Chongqing University, CN
Authors:
Jinting Ren, Xianzhang Chen, Duo Liu, Moming Duan, Renping Liu and Chengliang Wang, Chongqing University, CN
Abstract
Per-instruction sensitivity analysis frameworks are developed to evaluate the resiliency of a program and identify the segments of the program that need protection. However, for multi-bit hardware faults, per-instruction sensitivity analysis frameworks incur large overhead from redundant analyses. In this paper, we propose a basic-block-level sensitivity analysis framework, Forseti, to reduce the overhead of analyzing the impact of modern microprocessors' multi-bit faults on programs. We implement Forseti in LLVM and evaluate it with five typical workloads. Extensive experimental results show that Forseti achieves more than 90% sensitivity classification accuracy and a 6.16x speedup over instruction-level analysis.
IP1_3.2 MODELING SILICON-PHOTONIC NEURAL NETWORKS UNDER UNCERTAINTIES
Speaker:
Sanmitra Banerjee, Duke University, US
Authors:
Sanmitra Banerjee1, Mahdi Nikdast2 and Krishnendu Chakrabarty1
1Duke University, US; 2Colorado State University, US
Abstract
Silicon-photonic neural networks (SPNNs) offer substantial improvements in computing speed and energy efficiency compared to their digital electronic counterparts. However, the energy efficiency and accuracy of SPNNs are highly impacted by uncertainties that arise from fabrication-process and thermal variations. In this paper, we present the first comprehensive and hierarchical study on the impact of random uncertainties on the classification accuracy of a Mach--Zehnder Interferometer (MZI)-based SPNN. We show that such impact can vary based on both the location and characteristics (e.g., tuned phase angles) of a non-ideal silicon-photonic device. Simulation results show that in an SPNN with two hidden layers and 1374 tunable-thermal-phase shifters, random uncertainties even in mature fabrication processes can lead to a catastrophic 70% accuracy loss.

IP1_4 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/drZX49ChBZw5F3Et8

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP1_4.1 OPPORTUNISTIC IP BIRTHMARKING USING SIDE EFFECTS OF CODE TRANSFORMATIONS ON HIGH-LEVEL SYNTHESIS
Speaker:
Christian Pilato, Politecnico di Milano, IT
Authors:
Hannah Badier1, Christian Pilato2, Jean-Christophe Le Lann3, Philippe Coussy4 and Guy Gogniat5
1ENSTA Bretagne, FR; 2Politecnico di Milano, IT; 3ENSTA-Bretagne, FR; 4Universite de Bretagne-Sud / Lab-STICC, FR; 5Université Bretagne Sud, FR
Abstract
Increasing design and manufacturing costs are driving the globalization of the semiconductor supply chain. However, a malicious attacker can resell a stolen Intellectual Property (IP) core, demanding methods to identify a relationship between a given IP and a potentially fraudulent copy. We propose a method to protect IP cores created with high-level synthesis (HLS): our method inserts a discreet birthmark into the HLS-generated designs that uses only intrinsic characteristics of the final RTL. The core of our process leverages the side effects of HLS due to specific source-code manipulations, although the method is HLS-tool agnostic. We propose two independent validation metrics, showing that our solution introduces minimal resource and delay overheads (<6% and <2%, respectively) and that the accuracy in detecting illegal copies is above 96%.
IP1_4.2 EFFICIENT TENSOR CORES SUPPORT IN TVM FOR LOW-LATENCY DEEP LEARNING
Speaker:
Wei Sun, Eindhoven University of Technology, NL
Authors:
Wei Sun1, Savvas Sioutas1, Sander Stuijk1, Andrew Nelson2 and Henk Corporaal3
1Eindhoven University of Technology, NL; 2TU Eindhoven, NL; 3TU/e (Eindhoven University of Technology), NL
Abstract
Deep learning algorithms are gaining popularity in autonomous systems. These systems typically have stringent latency constraints that are challenging to meet given the high computational demands of these algorithms. Nvidia introduced Tensor Cores (TCs) to speed up some of the most commonly used operations in deep learning algorithms. Compilers (e.g., TVM) and libraries (e.g., cuDNN) focus on the efficient usage of TCs when performing batch processing. Latency-sensitive applications, however, cannot exploit large batch processing. This paper presents an extension to the TVM compiler that generates low-latency TC implementations, particularly for batch size 1. Experimental results show that our solution reduces latency on average by 14% compared to the cuDNN library on a desktop RTX2070 GPU, and by 49% on an embedded Jetson Xavier GPU.

UB.01 University Booth

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/YLbHWvDXTm9zmr2eB

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.01 JOINTER: JOINING FLEXIBLE MONITORS WITH HETEROGENEOUS ARCHITECTURES
Speaker:
Giacomo Valente, Università degli Studi dell'Aquila, IT
Authors:
Giacomo Valente1, Tiziana Fanni2, Carlo Sau3, Claudio Rubattu2, Francesca Palumbo2 and Luigi Pomante1
1Università degli Studi dell'Aquila, IT; 2Università degli Studi di Sassari, IT; 3Università degli Studi di Cagliari, IT
Abstract
As embedded systems grow more complex and shift toward heterogeneous architectures, understanding workload performance characteristics becomes increasingly difficult. In this regard, run-time monitoring systems can help obtain the visibility needed to characterize a system.
This demo presents a framework for developing complex heterogeneous architectures, composed of programmable processors and dedicated accelerators on FPGA, together with customizable monitoring systems, while keeping the introduced overhead under control.
The demo will walk through the whole development flow (and the related prototype EDA tools), which starts with accelerator creation from a dataflow model, in parallel with monitoring-system customization from a library of elements, and ends with the joining of the two. Moreover, a comparison of different monitoring-system functionalities across architectures developed on the Zynq-7000 SoC will be illustrated.

UB.02 University Booth

Date: Tuesday, 02 February 2021
Time: 09:50 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/XNqYPPDzpFuNzt8Fo

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.02 TAPASCO: THE OPEN-SOURCE TASK-PARALLEL SYSTEM COMPOSER FRAMEWORK
Speaker:
Lukas Sommer, Embedded Systems & Applications, TU Darmstadt, DE
Authors:
Carsten Heinz, Lukas Sommer, Lukas Weber, Johannes Wirth and Andreas Koch, Embedded Systems & Applications, TU Darmstadt, DE
Abstract
Field-programmable gate arrays (FPGAs) are an established platform for highly specialized accelerators, but in a heterogeneous setup, the accelerator still needs to be integrated into the overall system. The open-source TaPaSCo (Task-Parallel System Composer) framework was created to serve this purpose: the fast integration of FPGA-based accelerators into compute platforms or systems-on-chip (SoC) and their connection to relevant components on the FPGA board. TaPaSCo supports developers in all steps of the development process: from cores resulting from high-level synthesis or cores written in an HDL, a complete FPGA design can be created. TaPaSCo automatically connects all processing elements to the memory and host interfaces and generates a complete bitstream. The TaPaSCo runtime API allows software to interface with the accelerators and supports operations such as transferring data to the FPGA memory, passing values, and controlling the execution of the accelerators.

K.3 Keynote - Special day on sustainable HPC

Date: Tuesday, 02 February 2021
Time: 10:20 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/xraBiZssFzyFz585E

Session chair:
Miquel Moreto, BSC, ES

Session co-chair:
Gilles Sassatelli, LIRMM, FR

Organizers:
Gilles Sassatelli, LIRMM, FR
Miquel Moreto, BSC, ES

High-performance computers require continued scaling of performance and efficiency to handle more demanding applications and scales. With the end of Moore’s Law and Dennard scaling, continued performance scaling will come primarily from specialization. Specialized hardware engines can achieve performance and efficiency from 10x to 10,000x that of a CPU through specialization, parallelism, and optimized memory access. Graphics processing units are an ideal platform on which to build domain-specific accelerators. They provide very efficient, high-performance communication and memory subsystems - which are needed by all domains. Specialization is provided via “cores”, such as tensor cores that accelerate deep learning or ray-tracing cores that accelerate specific applications.

Bio: Bill is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill is currently working on developing hardware and software to accelerate demanding applications including machine learning, bioinformatics, and logical inference. He has a history of designing innovative and efficient experimental computing systems. While at Bell Labs Bill contributed to the BELLMAC32 microprocessor and designed the MARS hardware accelerator. At Caltech he designed the MOSSIM Simulation Engine and the Torus Routing Chip which pioneered wormhole routing and virtual-channel flow control. At the Massachusetts Institute of Technology his group built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanisms from programming models and demonstrated very low overhead synchronization and communication mechanisms. At Stanford University his group developed the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations, the Merrimac supercomputer, which led to GPU computing, and the ELM low-power processor.
Bill is a Member of the National Academy of Engineering, a Fellow of the IEEE, a Fellow of the ACM, and a Fellow of the American Academy of Arts and Sciences. He has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, the ACM Maurice Wilkes Award, the IEEE-CS Charles Babbage Award, and the IPSJ FUNAI Achievement Award. He currently leads projects on computer architecture, network architecture, circuit design, and programming systems. He has published over 250 papers in these areas, holds over 160 issued patents, and is an author of the textbooks Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.

Time Label Presentation Title
Authors
10:20 CEST K.3.1 SUSTAINABLE HIGH-PERFORMANCE COMPUTING VIA DOMAIN-SPECIFIC ACCELERATORS
Speaker and Author:
William J. Dally, Stanford University and NVIDIA, US
Abstract
High-performance computers require continued scaling of performance and efficiency to handle more demanding applications and scales. With the end of Moore’s Law and Dennard scaling, continued performance scaling will come primarily from specialization. Specialized hardware engines can achieve performance and efficiency from 10x to 10,000x that of a CPU through specialization, parallelism, and optimized memory access. Graphics processing units are an ideal platform on which to build domain-specific accelerators. They provide very efficient, high-performance communication and memory subsystems - which are needed by all domains. Specialization is provided via “cores”, such as tensor cores that accelerate deep learning or ray-tracing cores that accelerate specific applications.
10:50 CEST K.3.2 LIVE Q&A
Author:
William J. Dally, Stanford University and NVIDIA, US
Abstract
Live question and answer session for interaction among speaker and audience

K.2 Opening Keynote: "Superconducting Quantum Materials and Systems (SQMS) – a new DOE National Quantum Information Science Research Center"

Date: Tuesday, 02 February 2021
Time: 15:00 CEST - 15:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/s49Rhjt8reC5GqbzE

Session chair:
Mathias Soeken, Microsoft, CH

Session co-chair:
Marco Casale Rossi, Synopsys, IT

In this talk I will describe the mission, goals, and partnership strengths of the new US National Quantum Information Research Center SQMS. SQMS brings the power of DOE laboratories, together with industry, academia and other federal entities, to achieve transformational advances in the major cross-cutting challenge of understanding and eliminating the decoherence mechanisms in superconducting 2D and 3D devices, with the final goal of enabling construction and deployment of superior quantum systems for computing and sensing. SQMS combines the strengths of an array of experts and world-class facilities towards these common goals. Materials science experts will work on understanding and mitigating the key limiting mechanisms of coherence in the quantum regime. Coherence time is the limit on how long a qubit can retain its quantum state before that state is ruined by noise. It is critical to advancing quantum computing, sensing and communication. SQMS is leading the way in extending the coherence time of superconducting quantum systems thanks to world-class materials science and world-leading expertise in superconducting RF cavities, which are integrated with industry-designed and -fabricated computer chips. Leveraging new understanding from the materials development, quantum device and quantum computing researchers will pursue device integration and quantum controls development for 2-D and 3-D superconducting architectures. One of the ambitious goals of SQMS is to build and deploy a beyond-state-of-the-art quantum computer based on superconducting technologies. Its unique high connectivity will provide an unprecedented opportunity to explore novel quantum algorithms. SQMS researchers will ultimately build quantum computer prototypes based on 2-D and 3-D architectures, enabling new quantum simulation for science applications.
Bio: Anna Grassellino is the Director of the National Quantum Information Science Superconducting Quantum Materials and Systems Center, a Fermilab Senior Scientist and the head of the Fermilab SQMS division. Her research focuses on radio frequency superconductivity, in particular on understanding and improving SRF cavity performance to enable new applications spanning from particle accelerators to detectors to quantum information science. Grassellino is a fellow of the American Physical Society and the recipient of numerous awards for her pioneering contributions to SRF technology, including the 2017 Presidential Early Career Award, the 2017 Frank Sacherer Prize of the European Physical Society, the 2016 IEEE PAST Award, the 2016 USPAS prize and a $2.5 million DOE Early Career Award. She holds a Ph.D. in physics from the University of Pennsylvania and a master’s degree in electronic engineering from the University of Pisa, Italy.

Time Label Presentation Title
Authors
15:00 CEST K.2.1 SUPERCONDUCTING QUANTUM MATERIALS AND SYSTEMS (SQMS) – A NEW DOE NATIONAL QUANTUM INFORMATION SCIENCE RESEARCH CENTER
Speaker and Author:
Anna Grassellino, National Quantum Information Science Superconducting Quantum Materials and Systems Center, Fermilab, US
Abstract
In this talk I will describe the mission, goals, and partnership strengths of the new US National Quantum Information Research Center SQMS. SQMS brings the power of DOE laboratories, together with industry, academia and other federal entities, to achieve transformational advances in the major cross-cutting challenge of understanding and eliminating the decoherence mechanisms in superconducting 2D and 3D devices, with the final goal of enabling construction and deployment of superior quantum systems for computing and sensing. SQMS combines the strengths of an array of experts and world-class facilities towards these common goals. Materials science experts will work on understanding and mitigating the key limiting mechanisms of coherence in the quantum regime. Coherence time is the limit on how long a qubit can retain its quantum state before that state is ruined by noise. It is critical to advancing quantum computing, sensing and communication. SQMS is leading the way in extending the coherence time of superconducting quantum systems thanks to world-class materials science and world-leading expertise in superconducting RF cavities, which are integrated with industry-designed and -fabricated computer chips. Leveraging new understanding from the materials development, quantum device and quantum computing researchers will pursue device integration and quantum controls development for 2-D and 3-D superconducting architectures. One of the ambitious goals of SQMS is to build and deploy a beyond-state-of-the-art quantum computer based on superconducting technologies. Its unique high connectivity will provide an unprecedented opportunity to explore novel quantum algorithms. SQMS researchers will ultimately build quantum computer prototypes based on 2-D and 3-D architectures, enabling new quantum simulation for science applications.
15:40 CEST K.2.2 LIVE Q&A
Authors:
Anna Grassellino1 and Mathias Soeken2
1National Quantum Information Science Superconducting Quantum Materials and Systems Center, Fermilab, US; 2Microsoft, CH
Abstract
Live question and answer session for interaction among speaker and audience

2.1 Emerging trends in the HPC industry landscape

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/y3tu535LC2J2NARwr

Session chair:
Nehir Sonmez, BSC, ES

Session co-chair:
Miquel Moreto, BSC, ES

Organizers:
Gilles Sassatelli, LIRMM, FR
Miquel Moreto, BSC, ES

The hardware HPC landscape is rapidly changing, with new players challenging architectures and technologies that have been installed for decades. This session will shed light on the emerging trends through three contributions discussing the momentum around the RISC-V ecosystem and the emergence of tools enabling its use in HPC environments, the European Mont-Blanc strategy, and the academia-industry partnership that is about to give birth to a European processor architecture.

Time Label Presentation Title
Authors
16:00 CEST 2.1.1 TRENDS IN HPC DRIVEN BY THE RACE TO EXASCALE
Speaker and Author:
Craig Prunty, SiPEARL, FR
Abstract
HPC in Europe (and worldwide) is pushing rapidly toward exascale, where exascale is defined by the double-precision Linpack score. This push for performance, while remaining within cost and power-consumption envelopes, together with the need to support a wide variety of workloads, is forcing a divergence in processing elements: CPUs and accelerators. CPUs provide general processing capability across the bulk of workloads, with a focus in HPC toward balanced compute and memory bandwidth, demonstrated by HPCG performance. Accelerators push the envelope on vector processing, with high Linpack scores, and also offer a platform for AI. This is leading to some interesting trends in HPC, including modular system architectures, tight coupling between accelerators and general-purpose processors, and the emergence of AI to address some of the exascale challenges.
16:20 CEST 2.1.2 COYOTE: AN OPEN SOURCE SIMULATION TOOL TO ENABLE RISC-V IN HPC
Speaker:
Borja Perez, Barcelona Supercomputing Center, ES
Authors:
Borja Perez, Alexander Fell and John Davis, Barcelona Supercomputing Center, ES
Abstract
The confluence of technology trends and economics has reincarnated computer architecture and, specifically, software-hardware co-design. We are entering a new era of a completely open ecosystem, from applications to chips and everything in between. The software-hardware co-design of the supercomputers of tomorrow requires flexible tools today that will take us to the exascale and beyond. The MareNostrum Experimental Exascale Platform (MEEP) addresses this by proposing a flexible FPGA-based emulation platform, designed to explore hardware-software co-designs for future RISC-V supercomputers. This platform is part of an open ecosystem, allowing its infrastructure to be reused in other projects. MEEP’s inaugural emulated system will be a RISC-V based self-hosted HPC vector and systolic array accelerator, with a special aim at efficient data movement. Early development stages for such an architecture require fast, scalable and easy-to-modify simulation tools, with the right granularity and fidelity, enabling rapid design space exploration. As part of MEEP, this paper introduces Coyote, a new open-source, execution-driven simulator based on the open RISC-V ISA, which can provide detailed results at various levels and granularities. Coyote focuses on data movement and the modelling of the memory hierarchy of the system, which is one of the main hurdles for high-performance sparse workloads, while omitting lower-level details. Performance evaluation shows that Coyote achieves an aggregate simulation speed of up to 5 MIPS when modelling up to 128 cores. This enables the fast comparison of different designs for future RISC-V based HPC architectures.
16:40 CEST 2.1.3 MONT-BLANC 2020: TOWARDS SCALABLE AND POWER EFFICIENT EUROPEAN HPC PROCESSORS
Speaker:
Said Derradji, Atos, FR
Authors:
Adrià Armejach1, Bine Brank2, Jordi Cortina3, François Dolique4, Timothy Hayes5, Nam Ho2, Pierre-Axel Lagadec6, Romain Lemaire4, Guillem Lopez-Paradis7, Laurent Marliac6, Miquel Moreto7, Pedro Marcuello3, Dirk Pleiter2, Xubin Tan3 and Said Derradji6
1BSC & UPC, ES; 2Forschungszentrum Juelich, Juelich Supercomputing Centre, DE; 3Semidynamics Technology Services, ES; 4CEA-Leti, FR; 5Arm, GB; 6Atos, FR; 7BSC, ES
Abstract
The Mont-Blanc 2020 (MB2020) project has triggered the development of the next generation industrial processor for Big Data and High Performance Computing (HPC). MB2020 is paving the way to the future low-power European processor for exascale, defining the System-on-Chip (SoC) architecture and implementing new critical building blocks to be integrated in such an SoC. In this paper, we first present an overview of the MB2020 project, then we describe our experimental infrastructure, the requirements of relevant applications, and the IP blocks developed in the project. Finally, we present our emulation-based final demonstrator and explain how it integrates within our first generation of HPC processors.
17:00 CEST 2.1.4 LIVE JOINT Q&A
Authors:
Nehir Sonmez1, Miquel Moreto1, Craig Prunty2, Borja Perez1 and Said Derradji3
1BSC, ES; 2SiPEARL, FR; 3Atos, FR
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

2.2 3D integration: Today's practice and road ahead

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 17:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/BukgutER7n43hxbg8

Session chair:
Mahdi Nikdast, Colorado State University Fort Collins, US

Organizer:
Partha Pratim Pande, Washington State University, US

Three-dimensional (3D) integration has frequently been described as a means to overcome scaling bottlenecks, and advance both “More Moore” and “More Than Moore” through the use of vertical interconnects and die/wafer stacking. Moore’s Law has entered a new phase characterized by vertical integration, also called “3D Power Scaling”. Recent industry trends show the viability of 3D integration in real products (e.g., the AMD Radeon R9 Fury X graphics card, Xilinx Virtex-7 2000T/H580T and UltraScale FPGAs). Flash memory producers have also demonstrated multiple layers of memory on top of each other; e.g., as many as 32 and 48 layers of Flash memory (Toshiba BiCS) have been reported. With the emergence of Monolithic 3D Integration (M3D), it is also important to understand the design and manufacturing challenges associated with this new 3D integration paradigm. Moreover, 3D integration enables modular and heterogeneous architectures that permit scalable design and verification. This special session will articulate the challenges associated with 3D integration (chiplets, heterogeneous integration, testing, yield, manycore chip design, etc.) as well as highlight opportunities for deriving the maximum possible benefit from vertical integration. The presenters will describe the most compelling research advances, architectural breakthroughs, and design and test solutions that will contribute to closing the gap between hype and reality.

Time Label Presentation Title
Authors
16:00 CEST 2.2.1 UNDERSTANDING CHIPLETS TODAY TO ANTICIPATE FUTURE INTEGRATION OPPORTUNITIES AND LIMITS
Speaker:
Gabriel Loh, Advanced Micro Devices, Inc., US
Authors:
Gabriel Loh, Samuel Naffziger and Kevin M. Lepak, Advanced Micro Devices, Inc., US
Abstract
Chiplet-based architectures have recently started attracting a lot of attention, and we are seeing real-world architectures utilizing chiplet technologies in high-volume commercial production in multiple mainstream markets. In this special session paper, we provide a technical overview of the current state of chiplet technology including its benefits and limitations. This provides background and grounding in the current state-of-the-art and also lays out a range of technical areas to consider for the remaining forward-looking papers in this special session. We discuss the benefits and costs of different approaches to splitting and modularizing a monolithic chip into chiplets. In particular, we cover supporting high bandwidth and low latency communication between the die, mixed integration of multiple process technology nodes, and silicon and IP reuse. We then explore future challenges for chiplet architectures looking into the next decade of innovation.
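One classic first-order argument for splitting a monolithic chip into chiplets is manufacturing yield, which can be sketched with the standard Poisson defect model; the die areas and defect density below are hypothetical illustration values, not figures from this paper:

```python
import math

# Poisson yield model: a die of area A (cm^2) on a process with defect
# density D0 (defects/cm^2) is defect-free with probability exp(-A * D0).
def poisson_yield(area_cm2, d0=0.1):
    return math.exp(-area_cm2 * d0)

mono = poisson_yield(8.0)       # one hypothetical 800 mm^2 monolithic die
chiplet = poisson_yield(2.0)    # one hypothetical 200 mm^2 chiplet
print(f"monolithic die yield: {mono:.2f}")   # ~0.45
print(f"single chiplet yield: {chiplet:.2f}")  # ~0.82
```

Smaller dies discard less silicon per defect, which is part of the economic case for chiplets; the paper weighs this against the added cost and latency of die-to-die communication.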
16:15 CEST 2.2.2 HETEROGENEOUS 3D ICS: CURRENT STATUS AND FUTURE DIRECTIONS FOR PHYSICAL DESIGN TECHNOLOGIES
Speaker:
Gauthaman Murali, Georgia Institute of Technology, US
Authors:
Gauthaman Murali and Sung Kyu Lim, Georgia Institute of Technology, US
Abstract
One of the advantages of 3D IC technology is its ability to integrate different devices such as CMOS, SRAM, and RRAM, or multiple technology nodes of single or different devices onto a single chip due to the presence of multiple tiers. This ability to create heterogeneous 3D ICs finds a wide range of applications, from improving processor performance by integrating better memory technologies to building compute-in-memory ICs to support advanced machine learning algorithms. This paper discusses the current trends and future directions for the physical design of heterogeneous 3D ICs. We summarize various physical design and optimization flows, integration techniques, and existing academic works on heterogeneous 3D ICs.
16:30 CEST 2.2.3 ADVANCES IN TESTING AND DESIGN-FOR-TEST SOLUTIONS FOR M3D INTEGRATED CIRCUITS
Speaker:
Krishnendu Chakrabarty, Duke University, US
Authors:
Sanmitra Banerjee1, Arjun Chaudhuri2, Krishnendu Chakrabarty1 and Shao-Chun Hung1
1Duke University, US; 2GLOBALFOUNDRIES US Inc., US
Abstract
Monolithic 3D (M3D) integration has the potential to achieve significantly higher device density compared to TSV-based 3D stacking. Sequential integration of transistor layers enables high-density vertical interconnects, known as inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make M3D integrated circuits especially prone to process variations and manufacturing defects. We explore the impact of these fabrication imperfections on chip performance and present the associated test challenges. We introduce two M3D-specific design-for-test solutions: a low-cost built-in self-test architecture for the defect-prone ILVs and a tier-level fault-localization method for yield learning. We describe the impact of defects on the efficiency of delay fault testing and highlight solutions for test generation under constraints imposed by the 3D power distribution network.
16:45 CEST 2.2.4 3D++: UNLOCKING THE NEXT GENERATION OF HIGH-PERFORMANCE AND ENERGY-EFFICIENT ARCHITECTURES USING M3D INTEGRATION
Speaker:
Partha Pande, Washington State University, US
Authors:
Biresh Kumar Joardar, Aqeeb Iqbal Arka, Jana Doppa and Partha Pratim Pande, Washington State University, US
Abstract
Three-dimensional (3D) integration has frequently been described as a means to overcome scaling bottlenecks, and advance both “More Moore” and “More Than Moore” through the use of vertical interconnects and die/wafer stacking. Recent industry trends show the viability of 3D integration in real products. Flash memory producers have also demonstrated multiple layers of memory on top of each other. However, conventional TSV-based 3D designs cannot achieve the full potential of vertical integration and perform sub-optimally. Monolithic 3D (M3D) is an emerging vertical integration technology that promises significant power-performance-area benefits compared to TSVs. Hence, it is important to understand the necessary design trade-offs and challenges associated with this new paradigm. In this paper, we present both the advantages and the various design challenges in M3D-enabled system design, considering Processing-in-Memory (PIM) and manycore systems as suitable case studies.

2.3 A Deep Dive into Future of Lightweight Cryptography: New Standards, Optimizations, and Attacks

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 17:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/EaAFPx7Zv8vGywHqe

Session chair:
Shivam Bhasin, Temasek labs, Nanyang Technological University, SG

Session co-chair:
Kris Gaj, George Mason University, US

Organizer:
Shivam Bhasin, Temasek labs, Nanyang Technological University, SG

The Internet of Things (IoT) has grown rapidly, to 50 billion devices in 2020. IoT devices are also deployed in sensitive applications like healthcare, supply-chain management, and smart factories, raising security concerns. Even when deployed in non-sensitive applications, the huge population of these devices enables attacks such as Mirai that seriously hamper internet traffic. While standard cryptography, like AES, can run on resource-constrained devices, these algorithms were designed with a focus on desktop/server environments. In recent years, the U.S. National Institute of Standards and Technology (NIST) started a public effort to standardize lightweight cryptography (LWC). In particular, the ongoing standardization process, similar to past cryptographic competitions such as AES and SHA-3, is evaluating candidates for Authenticated Encryption with Associated Data (AEAD). The candidates are evaluated in terms of security, performance, and flexibility. Of special importance to lightweight applications are energy efficiency and affinity to side-channel and fault-attack countermeasures. This session will present recent research results covering the security and performance of AEAD candidates submitted to the NIST LWC Standardization Process, with a focus on (a) efficient implementations and (b) security analysis.

Time Label Presentation Title
Authors
16:00 CEST 2.3.1 HARDWARE BENCHMARKING OF ROUND 2 CANDIDATES IN THE NIST LIGHTWEIGHT CRYPTOGRAPHY STANDARDIZATION PROCESS
Speaker:
Kris Gaj, George Mason University, US
Authors:
Kamyar Mohajerani, Richard Haeussler, Rishub Nagpal, Farnoud Farahmand, Abubakr Abdulgadir, Jens-Peter Kaps and Kris Gaj, George Mason University, US
Abstract
Twenty-five Round 2 candidates in the NIST Lightweight Cryptography (LWC) process have been implemented in hardware by groups from all over the world. All implementations compliant with the LWC Hardware API, proposed in 2019, have been submitted for hardware benchmarking to George Mason University’s LWC benchmarking team. The received submissions were first verified for correct functionality and compliance with the hardware API’s specification. Then, the execution times in clock cycles, as a function of input sizes, were determined using behavioral simulation. The compatibility of all implementations with FPGA toolsets from three major vendors (Xilinx, Intel, and Lattice Semiconductor) was verified. Optimized values of the maximum clock frequency and resource-utilization metrics, such as the number of look-up tables (LUTs) and flip-flops (FFs), were obtained by running optimization tools such as Minerva, ATHENa, and Xeda. The raw post-place-and-route results were then converted into the corresponding throughputs for long, medium-size, and short inputs. The results are presented in the form of easy-to-interpret graphs and tables, demonstrating the relative performance of all investigated algorithms. An effort was made to make the entire process as transparent as possible and the results easily reproducible by other groups.
16:15 CEST 2.3.2 A DEEPER LOOK AT ENERGY CONSUMPTION OF LIGHTWEIGHT BLOCK CIPHERS
Speaker:
Francesco Regazzoni, University of Amsterdam and ALaRI - USI, CH
Authors:
Andrea Caforio1, Fatih Balli1, Subhadeep Banik1 and Francesco Regazzoni2
1EPFL, CH; 2University of Amsterdam and ALaRI - USI, CH
Abstract
In the last few years, the field of lightweight cryptography has seen an influx in the number of block ciphers and hash functions being proposed. Numerous past papers have looked at circuit-level implementations of block ciphers with respect to lightweight metrics like area, power, and energy. In the paper by Banik et al. (SAC'15), for example, by studying the energy-consumption model of a CMOS gate, it was shown that the energy consumed per cycle during the encryption operation of an r-round unrolled architecture of any block cipher is a quadratic function in r. However, most of these explorative works were at the gate level, in which a circuit synthesizer constructs a circuit using gates from a standard-cell library, and the area, power, and energy are estimated from the switching statistics of the nodes in the circuit. Since only part of the EDA design flow was completed, this did not account for issues that arise when the circuit is finally mapped into silicon after place-and-route. Metrics like area, power, and energy need to be re-estimated due to the parasitics introduced into the circuit by the connecting wires, nodes, and interconnects. In this paper, we plug this very gap in the literature by re-examining the designs of lightweight block ciphers with respect to their performance after completing the placement and routing process. This is a timely exercise, since three of the block ciphers we analyze in the paper are used in around 13 of the 32 candidates in the second round of the NIST lightweight competition currently being conducted.
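The quadratic energy-per-cycle model referenced above can be sketched numerically: fit E(r) = c0 + c1*r + c2*r^2 to per-cycle energy samples for increasing unroll factors r. The coefficients and sample values below are invented for illustration, not measurements from this paper or from Banik et al.:

```python
import numpy as np

# Hypothetical per-cycle energy samples (pJ) for r-round unrolled datapaths,
# generated from a known quadratic so the fit can be checked.
r = np.array([1, 2, 3, 4, 5, 6], dtype=float)
e = 10.0 + 3.0 * r + 1.5 * r**2          # synthetic E(r) = c0 + c1*r + c2*r^2

c2, c1, c0 = np.polyfit(r, e, deg=2)     # least-squares quadratic fit
print(round(c0, 3), round(c1, 3), round(c2, 3))  # ≈ 10.0 3.0 1.5
```

Given such a fit, total energy per encryption can then be compared across unroll factors by multiplying the per-cycle cost by the cycle count of each architecture, which is how a sweet spot in r is found.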
16:30 CEST 2.3.3 MACHINE LEARNING ASSISTED DIFFERENTIAL DISTINGUISHERS FOR LIGHTWEIGHT CIPHERS
Speaker:
Anubhab Baksi, Nanyang Technological University, SG
Authors:
Anubhab Baksi1, Jakub Breier2, Yi Chen3 and Xiaoyang Dong3
1Nanyang Technological University, Singapore, SG; 2Silicon Austria Labs, AT; 3Tsinghua University, CN
Abstract
At CRYPTO 2019, Gohr first introduced deep-learning-based cryptanalysis of round-reduced SPECK. Using a deep residual network, Gohr trained several neural-network-based distinguishers on 8-round SPECK-32/64. The analysis follows an 'all-in-one' differential cryptanalysis approach, which considers the effect of all output differences under the same input difference. Usually, all-in-one differential cryptanalysis is more effective than using only a single differential trail. However, when the cipher is non-Markov or its block size is large, it is usually very hard to compute fully. Inspired by Gohr's work, we simulate the all-in-one differentials for non-Markov ciphers through machine learning. Our idea is to reduce a distinguishing problem to a classification problem, so that it can be efficiently handled by machine learning. As a proof of concept, we show several distinguishers for four high-profile ciphers, each of which works with trivial complexity. In particular, we show differential distinguishers for 8-round Gimli-Hash, Gimli-Cipher and Gimli-Permutation; 3-round Ascon-Permutation; 10-round Knot-256 permutation and 12-round Knot-512 permutation; and 4-round Chaskey-Permutation. Finally, we explore the choice of an efficient machine learning model and observe that a neural network with only three layers suffices. Our analysis shows that an attacker can reduce the complexity of finding distinguishers by using machine-learning techniques.
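The reduction from distinguishing to classification can be sketched at toy scale. The one-round "cipher" below is a deliberately weak construction invented for illustration (not SPECK, Gimli, or any cipher from the paper), and the classifier is a from-scratch logistic regression rather than Gohr's residual network; label 1 means "ciphertext difference from a fixed input difference", label 0 means "random":

```python
import numpy as np

rng = np.random.default_rng(0)
BITS, DELTA = 16, 0b1                      # toy block size and input difference

def rotl(x, s, w=BITS):
    return ((x << s) | (x >> (w - s))) & ((1 << w) - 1)

def weak_round(pt, key):                   # deliberately weak one-round "cipher"
    return rotl(pt, 1) ^ key

def sample(n, real):
    """Each feature vector is the bit pattern of a ciphertext difference."""
    X = np.empty((n, BITS))
    for i in range(n):
        if real:                           # encrypt a pair with input difference DELTA
            pt = int(rng.integers(1 << BITS)); key = int(rng.integers(1 << BITS))
            d = weak_round(pt, key) ^ weak_round(pt ^ DELTA, key)
        else:                              # random difference (ideal-cipher behaviour)
            d = int(rng.integers(1 << BITS))
        X[i] = [(d >> b) & 1 for b in range(BITS)]
    return X

X = np.vstack([sample(400, True), sample(400, False)])
y = np.r_[np.ones(400), np.zeros(400)]

w, b = np.zeros(BITS), 0.0                 # logistic regression, plain gradient descent
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((X @ w + b) > 0) == y)
print(f"distinguisher training accuracy: {acc:.2f}")
```

Because the toy cipher is linear over XOR, real pairs leak a strongly biased difference and even this linear model separates the classes; a real target needs the deeper networks the paper discusses.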
16:45 CEST 2.3.4 DNFA: DIFFERENTIAL NO-FAULT ANALYSIS OF BIT PERMUTATION BASED CIPHERS ASSISTED BY SIDE-CHANNEL
Speaker:
Shivam Bhasin, Temasek Laboratories @ NTU, SG
Authors:
Xiaolu Hou1, Jakub Breier2 and Shivam Bhasin3
1Nanyang Technological University, SG; 2Silicon Austria Labs, AT; 3Temasek Laboratories, Nanyang Technological University, SG
Abstract
Physical security of NIST lightweight cryptography competition candidates is gaining importance as the standardization process progresses. Side-channel attacks (SCA) are a well-researched topic within the physical security of cryptographic implementations. It has been shown that collisions in intermediate values can be captured by side-channel measurements to reduce the complexity of key retrieval to trivial numbers. In this paper, we target a specific bit-permutation vulnerability in the block cipher GIFT that allows the attacker to mount a key recovery attack. We present a novel SCA methodology called DCSCA - Differential Ciphertext SCA, which follows the principles of differential fault analysis but, instead of using faults, utilizes SCA and the statistical distribution of intermediate values. We simulate the attack on a publicly available bitslice implementation of GIFT, showing the practicality of the attack. We further show the application of the attack on GIFT-based AEAD schemes (GIFT-COFB, ESTATE, HYENA, and SUNDAE-GIFT) proposed for the NIST LWC competition. DCSCA can recover the master key with 2^{13.39} AEAD sessions, assuming 32 encryptions per session.

2.4 Quantum Computing

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/tP8zdkrzxBGsH7CE2

Session chair:
Matthew Amy, Dalhousie University, CA

Session co-chair:
Carmen Almudever, TU Delft, NL

The session looks at techniques for the design, simulation, and mapping of quantum circuits to hardware. The first paper investigates the use of approximation to speed up decision diagram based simulation of quantum circuits and reduce its memory usage. Continuing with the theme of decision diagrams, the next paper investigates complementary techniques for simulating quantum circuits on noisy hardware. Finally, the session concludes with a presentation of a novel hardware mapping algorithm for reversible circuits.

Time Label Presentation Title
Authors
16:00 CEST 2.4.1 (Best Paper Award Candidate)
AS ACCURATE AS NEEDED, AS EFFICIENT AS POSSIBLE: APPROXIMATIONS IN DD-BASED QUANTUM CIRCUIT SIMULATION
Speaker:
Stefan Hillmich, Johannes Kepler University Linz, AT
Authors:
Stefan Hillmich1, Richard Kueng1, Igor L. Markov2 and Robert Wille1
1Johannes Kepler University Linz, AT; 2University of Michigan, US
Abstract
Quantum computers promise to solve important problems faster than conventional computers. However, unleashing this power has been challenging. In particular, design automation runs into (1) the probabilistic nature of quantum computation and (2) exponential requirements for computational resources on non-quantum hardware. In quantum circuit simulation, Decision Diagrams (DDs) have previously been shown to reduce the required memory in many important cases by exploiting redundancies in the quantum state. In this paper, we show that this reduction can be amplified by exploiting the probabilistic nature of quantum computers to achieve even more compact representations. Specifically, we propose two new DD-based simulation strategies that approximate the quantum states to attain more compact representations, while, at the same time, allowing the user to control the resulting degradation in accuracy. We also analytically prove the effect of multiple approximations on the attained accuracy and empirically show that the resulting simulation scheme enables speed-ups of up to several orders of magnitude.
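The user-controlled accuracy trade-off described above can be illustrated with a toy sketch (my own illustration, not the paper's DD-based algorithm): pruning small amplitudes from a state vector yields a more compact representation, and the fidelity with the exact state quantifies the accuracy the user gives up.

```python
# Toy sketch of approximation with controllable accuracy: drop amplitudes
# below a threshold, renormalise, and measure the fidelity with the exact
# state. The paper performs an analogous trade-off on decision diagrams.
import math

def prune(state, eps):
    """Zero out amplitudes with |a| < eps, then renormalise."""
    kept = [a if abs(a) >= eps else 0.0 for a in state]
    norm = math.sqrt(sum(abs(a) ** 2 for a in kept))
    return [a / norm for a in kept]

def fidelity(exact, approx):
    """Squared overlap |<exact|approx>|^2 between two state vectors."""
    return abs(sum(a.conjugate() * b for a, b in zip(exact, approx))) ** 2
```

For a state dominated by a few amplitudes, aggressive pruning removes most entries while keeping the fidelity close to 1, which is the intuition behind "as accurate as needed, as efficient as possible".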
16:15 CEST 2.4.2 STOCHASTIC QUANTUM CIRCUIT SIMULATION USING DECISION DIAGRAMS
Speaker:
Thomas Grurl, University of Applied Sciences Upper Austria, AT
Authors:
Thomas Grurl1, Richard Kueng2, Jürgen Fuß1 and Robert Wille2
1University of Applied Sciences Upper Austria, AT; 2Johannes Kepler University Linz, AT
Abstract
Recent years have seen unprecedented advances in the design and control of quantum computers. Nonetheless, their applicability is still restricted and access remains expensive. Therefore, a substantial amount of quantum algorithms research still relies on simulating quantum circuits on classical hardware. However, due to the sheer complexity of simulating real quantum computers, many simulators unrealistically simplify the problem and instead simulate perfect quantum hardware, i.e., they do not consider errors caused by the fragile nature of quantum systems. Stochastic quantum simulation provides a conceptually suitable solution to this problem: physically motivated errors are applied in a probabilistic fashion throughout the simulation. In this work, we propose to use decision diagrams, as well as concurrent executions, to substantially reduce the still-daunting resource requirements of stochastic quantum circuit simulation. Backed up by rigorous theory, empirical studies show that this approach allows for a substantially faster and much more scalable simulation for certain quantum circuits.
16:30 CEST 2.4.3 COMBINING SWAPS AND REMOTE TOFFOLI GATES IN THE MAPPING TO IBM QX ARCHITECTURES
Speaker:
Philipp Niemann, University of Bremen / DFKI GmbH, DE
Authors:
Philipp Niemann1, Chandan Bandyopadhyay2 and Rolf Drechsler3
1Cyber-Physical Systems, DFKI GmbH, DE; 2University of Bremen, DE; 3University of Bremen/DFKI, DE
Abstract
Quantum computation has received steadily growing attention in recent years, especially supported by the emergence of publicly available quantum computers like the popular IBM QX series. In order to execute a reversible or quantum circuit on those devices, a mapping is required that replaces each reversible or quantum gate by an equivalent cascade of elementary, i.e. directly executable, gates, a task which tends to induce a significant mapping overhead. Several approaches have been proposed for this task, which rely either on the swapping of physically adjacent qubits or on the use of precomputed templates, so-called remote CNOT gates. In this paper, we show that combining both, swapping and remote gates, at the reversible circuit level has the prospect of significantly reducing the mapping overhead. We propose a methodology to compute the optimal combination of swaps and templates for Multiple-Controlled Toffoli gates. By using a formulation as a single-source shortest-path problem, a complete database of optimal combinations can be computed efficiently. Experimental results indicate that the mapping overhead can be significantly reduced.

2.5 Platform validation with simulation

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/2whM9Sdi8KmkbdrWt

Session chair:
Katell Morin-Allory, University Grenoble Alpes, FR

Session co-chair:
Sat Chaterjee, Google, US

The session deals with the use of simulation platforms for verification and validation across a variety of systems. The first paper describes a system for coverage directed generation based on numerical optimization. The second paper presents a method for post-silicon validation of memory management units. The third paper offers a fast and accurate method for simulating compute-in-memory extensions to processors. In addition, the session includes two interactive presentations, one on the efficient use of assertions at the edge and in the cloud, and the second on integrating concolic testing with virtual prototypes.

Time Label Presentation Title
Authors
16:00 CEST 2.5.1 AUTOMATIC SCALABLE SYSTEM FOR THE COVERAGE DIRECTED GENERATION (CDG) PROBLEM
Speaker:
Avi Ziv, IBM Research - Haifa, IL
Authors:
Raviv Gal1, Eldad Haber2, Wesam Ibraheem3, Brian Irwin2, Ziv Nevo3 and Avi Ziv3
1IBM Research, Haifa, IL; 2University of British Columbia, CA; 3IBM Research - Haifa, IL
Abstract
We present AS-CDG, a novel automatic scalable system for data-driven coverage directed generation. The goal of AS-CDG is to find the test-templates that maximize the probability of hitting uncovered events. The system contains two components, one for a coarse-grained search that finds relevant parameters and the other for a fine-grained search for the settings of these parameters. To overcome the lack of evidence in the search, we replace the real target with an approximated target induced by neighboring events, for which we have evidence. Usage results on real-life units of high-end processors illustrate the ability of the proposed system to automatically find the desired test-templates and hit the previously uncovered target events.
16:15 CEST 2.5.2 POST SILICON VALIDATION OF THE MMU
Speaker:
Hillel Mendelson, IBM Research, IL
Authors:
Tom Kolan1, Hillel Mendelson2, Vitali Sokhin1, Shai Doron1, Hernan Theiler1, Shay Aviv1, Hagai Hadad2, Natalia Freidman2, Elena Tsanko3, John Ludden3 and Bryant Cockcroft3
1IBM Research - Haifa, IL; 2IBM, IL; 3IBM, US
Abstract
Post-silicon validation is a unique challenge in the design verification process. On one hand, it utilizes real silicon and is therefore able to cover a larger state-space. On the other hand, it suffers from debugging challenges due to a lack of observability into the design. These challenges dictate distinctive design choices, such as the simplicity of validation tools and a built-for-debugging software design methodology. The Memory Management Unit (MMU) is central to any design that uses virtual memory, and creates complex verification challenges, especially in many-core designs. We propose a novel method for post-silicon validation of the MMU that brings together previously undescribed techniques, based on several papers and patents. This method was implemented in Threadmill, a bare-metal exerciser, and was used in the verification of high-end industry-level POWER and ARM SoCs. It succeeded in increasing RTL coverage, hitting several hidden bugs, and saving hundreds of work-hours in the validation process.
16:30 CEST IP2_1.1 AN EFFECTIVE METHODOLOGY FOR INTEGRATING CONCOLIC TESTING WITH SYSTEMC-BASED VIRTUAL PROTOTYPES
Speaker:
Sören Tempel, University of Bremen, DE
Authors:
Sören Tempel1, Vladimir Herdt2 and Rolf Drechsler3
1University of Bremen, DE; 2DFKI, DE; 3University of Bremen/DFKI, DE
Abstract
In this paper we propose an effective methodology for integrating Concolic Testing (CT) with SystemC-based Virtual Prototypes (VPs) for verification of embedded SW binaries. Our methodology involves three steps: 1) integrating CT support with the Instruction Set Simulator (ISS) of the VP, 2) utilizing the standard TLM-2.0 extension mechanism for transporting concolic values alongside generic TLM transactions, and 3) providing lightweight concolic overlays for SystemC-based peripherals that enable non-intrusive CT support for peripherals and thus significantly reduce the CT integration effort. Our RISC-V experiments using the RIOT operating system demonstrate the effectiveness of our approach.
16:31 CEST IP2_1.2 A CONTAINERIZED ROS-COMPLIANT VERIFICATION ENVIRONMENT FOR ROBOTIC SYSTEMS
Speaker:
Samuele Germiniani, University of Verona, IT
Authors:
Stefano Aldegheri, Nicola Bombieri, Samuele Germiniani, Federico Moschin and Graziano Pravadelli, University of Verona, IT
Abstract
This paper proposes an architecture and a related automatic flow to generate, orchestrate and deploy a ROS-compliant verification environment for robotic systems. The architecture enables assertion-based verification by exploiting monitors automatically synthesized from LTL assertions. The monitors are encapsulated in plug-and-play ROS nodes that do not require any modification to the system under verification (SUV). To guarantee both verification accuracy and real-time constraints of the system in a resource-constrained environment even after the monitor integration, we define a novel approach to move the monitor evaluation across the different layers of an edge-to-cloud computing platform. The verification environment is containerized for both cloud and edge computing using Docker to enable system portability and to handle, at run-time, the resources allocated for verification. The effectiveness and efficiency of the proposed architecture have been evaluated on a complex distributed system implementing a mobile robot path planner based on 3D simultaneous localization and mapping.
16:32 CEST 2.5.3 SIM²PIM: A FAST METHOD FOR SIMULATING HOST INDEPENDENT & PIM AGNOSTIC DESIGNS
Speaker:
Luigi Carro, UFRGS, BR
Authors:
Paulo Cesar Santos1, Bruno Endres Forlin2 and Luigi Carro2
1UFRGS - Universidade Federal do Rio Grande do Sul, BR; 2UFRGS, BR
Abstract
Processing-in-Memory (PIM), with the help of modern memory integration technologies, has emerged as a practical approach to mitigate the memory wall and improve performance and energy efficiency in contemporary applications. However, there is a need for tools capable of quickly simulating different PIM designs and their integration with different hosts. This work presents Sim2PIM, a Simple Simulator for PIM devices that seamlessly integrates any PIM architecture with the host processor and memory hierarchy. Sim2PIM's simulation environment allows the user to describe a PIM architecture at different user-defined abstraction levels. The application code runs natively on the host, with minimal overhead from the simulator integration, allowing Sim2PIM to collect precise metrics from the Hardware Performance Counters (HPCs). Our simulator is available to download at http://pim.computer/.

2.6 Hardware architectures for neural network applications with emerging technologies

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/H5ygzuagE255kK24q

Session chair:
Gage Hills, MIT, US

Session co-chair:
Bruno Schmitt, EPFL, CH

Neural networks have shown record-breaking performance on various artificial intelligence tasks and have attracted attention across multiple areas of the computing stack. This session highlights exciting advances in design methodologies leveraging emerging technologies, from optical neurons based on multi-operand ring resonators, to ReRAM-based crossbar computing-in-memory chips, to a new mapping framework that minimizes crossbar size by leveraging binary decision diagrams on nanoscale memristor crossbar arrays.

Time Label Presentation Title
Authors
16:00 CEST 2.6.1 (Best Paper Award Candidate)
COMPACT: FLOW-BASED COMPUTING ON NANOSCALE CROSSBARS WITH MINIMAL SEMIPERIMETER
Speaker:
Sven Thijssen, University of Central Florida, US
Authors:
Sven Thijssen1, Sumit Kumar Jha2 and Rickard Ewetz1
1University of Central Florida, US; 2University of Texas at San Antonio, US
Abstract
In-memory computing is a promising solution strategy for data-intensive applications to circumvent the von Neumann bottleneck. Flow-based computing is the concept of performing in-memory computing using sneak paths in nanoscale crossbar arrays. The limitation of previous work is that the resulting crossbar representations have large dimensions. In this paper, we present a framework called COMPACT for mapping Boolean functions to crossbar representations with minimal semiperimeter (the number of wordlines plus bitlines). The COMPACT framework is based on an analogy between binary decision diagrams (BDDs) and nanoscale memristor crossbar arrays. More specifically, nodes and edges in a BDD correspond to wordlines/bitlines and memristors in a crossbar array, respectively. The relation enables a function represented by a BDD with n nodes and an odd cycle transversal of size k to be mapped to a crossbar with a semiperimeter of n+k. The k extra wordlines/bitlines are introduced due to crossbar connection constraints, i.e. wordlines (bitlines) cannot directly be connected to wordlines (bitlines). For multi-input multi-output functions, COMPACT can also be applied to shared binary decision diagrams (SBDDs), which further reduces the size of the crossbar representations. Compared with the state-of-the-art mapping technique, the semiperimeter is reduced from 2.13n to 1.09n on average, which translates into crossbar representations with 78% smaller area. The power consumption and the computation delay are reduced on average by 7% and 52%, respectively.
16:15 CEST 2.6.2 SQUEEZELIGHT: TOWARDS SCALABLE OPTICAL NEURAL NETWORKS WITH MULTI-OPERAND RING RESONATORS
Speaker:
Jiaqi Gu, University of Texas at Austin, US
Authors:
Jiaqi Gu1, Chenghao Feng1, Zheng Zhao1, Zhoufeng Ying1, Mingjie Liu2, Ray T. Chen1 and David Z. Pan1
1University of Texas at Austin, US; 2University of Texas Austin, US
Abstract
Optical neural networks (ONNs) have demonstrated promising potential for next-generation artificial intelligence acceleration with ultra-low latency, high bandwidth, and low energy consumption. However, due to high area cost and lack of efficient sparsity exploitation, previous ONN designs fail to provide scalable and efficient neuromorphic computing, which hinders the practical implementation of photonic neural accelerators. In this work, we propose a novel design methodology to enable a more scalable ONN architecture. We propose a nonlinear optical neuron based on multi-operand ring resonators to achieve neuromorphic computing with a compact footprint, low wavelength usage, learnable neuron balancing, and built-in nonlinearity. Structured sparsity is exploited to support more efficient ONN engines via a fine-grained structured pruning technique. A robustness-aware learning method is adopted to guarantee the variation-tolerance of our ONN. Simulation and experimental results show that the proposed ONN achieves an order-of-magnitude improvement in compactness and efficiency over previous designs with high fidelity and robustness.
16:30 CEST IP2_2.1 RECEPTIVE-FIELD AND SWITCH-MATRICES BASED RERAM ACCELERATOR WITH LOW DIGITAL-ANALOG CONVERSION FOR CNNS
Speaker:
Xun Liu, North China University of Technology, CN
Authors:
Yingxun Fu1, Xun Liu2, Jiwu Shu3, Zhirong Shen4, Shiye Zhang1, Jun Wu1 and Li Ma1
1North China University of Technology, CN; 2North China University of Technology, CN; 3Tsinghua University, CN; 4Xiamen University, CN
Abstract
Processing-in-Memory (PIM) based accelerators have become one of the best solutions for executing convolutional neural networks (CNNs). Resistive random access memory (ReRAM) is a classic type of non-volatile random-access memory and is very well suited to implementing PIM architectures. However, existing ReRAM-based accelerators mainly focus on improving calculation efficiency, ignoring the fact that the digital-analog signal conversion process consumes considerable energy and execution time. In this paper, we propose a novel ReRAM-based accelerator named the Receptive-Field and Switch-Matrices based CNN Accelerator (RFSM). In RFSM, we first propose a receptive-field based convolution strategy to analyze the data relationships, and then give a dynamic and configurable crossbar combination method to reduce the digital-analog conversion operations. The evaluation results show that, compared to existing works, RFSM achieves up to 6.7x higher speedup and 7.1x lower energy consumption.
16:31 CEST IP2_2.2 (Best Paper Award Candidate)
AN ON-CHIP LAYER-WISE TRAINING METHOD FOR RRAM BASED COMPUTING-IN-MEMORY CHIPS
Speaker:
Yiwen Geng, Tsinghua University, CN
Authors:
Yiwen Geng, Bin Gao, Qingtian Zhang, Wenqiang Zhang, Peng Yao, Yue Xi, Yudeng Lin, Junren Chen, Jianshi Tang, Huaqiang Wu and He Qian, Institute of Microelectronics, Tsinghua University, CN
Abstract
RRAM based computing-in-memory (CIM) chips have shown great potential to accelerate deep neural networks on edge devices by reducing data transfer between the memory and the computing unit. However, due to the non-ideal characteristics of RRAM, the accuracy of a neural network on an RRAM chip is usually lower than in software. Here we propose an on-chip layer-wise training (LWT) method to alleviate the adverse effect of RRAM imperfections and improve the accuracy of the chip. Using a locally validated dataset, LWT can reduce the communication between the edge and the cloud, which benefits personalized data privacy. The simulation results on the CIFAR-10 dataset show that the LWT method can improve the accuracy of VGG-16 and ResNet-18 by more than 5% and 10%, respectively, with only 30% of the operations and 35% of the buffer compared with the back-propagation method. Moreover, the pipe-LWT method is presented to further improve the throughput by three times.
16:32 CEST 2.6.3 A 3-D LUT DESIGN FOR TRANSIENT ERROR DETECTION VIA INTER-TIER IN-SILICON RADIATION SENSOR
Speaker:
Sarah Azimi, Politecnico di Torino, IT
Authors:
Sarah Azimi, Corrado De Sio and Luca Sterpone, Politecnico di Torino, IT
Abstract
Three-dimensional Integrated Circuits (3-D ICs) have gained much attention as a promising approach to increasing IC performance due to their several advantages in terms of integration density, power dissipation, and achievable clock frequencies. However, making 3-D ICs resilient to soft errors resulting from radiation effects is a challenging problem. Traditional Radiation-Hardened-by-Design (RHBD) techniques are costly in terms of area, power, and performance overheads. In this work, we propose a new 3-D LUT design integrating error detection capabilities. The LUT has been designed on a two-tier IC model, improving radiation resiliency by selectively upsizing sensitive transistors. In addition, an in-silicon radiation sensor based on an inverter chain has been implemented within the free volume of the 3-D structure. The proposed design shows a 37% reduction in sensitivity to SETs and an effective error detection rate of 83% without introducing any area overhead.

2.7 Scheduling and Execution Time Variation

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/QpHBEpw4bYqYPixYQ

Session chair:
Marko Bertogna, University of Modena, IT

Session co-chair:
Claire Maiza, Grenoble INP, FR

Both scheduling and worst-case execution time estimation must cope with complex architectures and the need for new functionalities. In this session, the authors present new results on Lazy Round Robin scheduling, which also allows timing attacks to be mitigated. Moreover, for parallel tasks, Virtual Gang scheduling is proposed in the context of multicore systems. The scheduling results of this session are complemented by new ways of defining worst-case execution time bounds for mixed-criticality systems.

Time Label Presentation Title
Authors
16:00 CEST 2.7.1 RESPONSE TIME ANALYSIS OF LAZY ROUND ROBIN
Speaker:
Yue Tang, Northeastern University, CN
Authors:
Yue Tang1, Nan Guan2, Zhiwei Feng1, Xu Jiang1 and Wang Yi3
1Northeastern University, CN; 2The Hong Kong Polytechnic University, HK; 3Northeastern University and Uppsala University, CN
Abstract
The Round Robin scheduling policy is used in many real-time embedded systems because of its simplicity and low overhead. In this paper, we study a variation of Round Robin used in practical systems, named Lazy Round Robin, which is simpler to implement and has lower runtime overhead than ordinary Round Robin. The key difference between the two lies in when the scheduler reacts to newly released task instances: the Round Robin scheduler checks whether a newly released task instance is eligible for execution in the remaining part of the current round, whereas the Lazy Round Robin scheduler does not react to any task release until the end of the current round. This paper develops techniques to calculate upper bounds on the response times of tasks scheduled by Lazy Round Robin. Experiments are conducted to evaluate our analysis techniques and to compare the real-time performance of Round Robin and Lazy Round Robin.
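The behavioural difference described in the abstract can be illustrated with a small round-based sketch (my own toy model, not the paper's analysis): a job released mid-round is served within that round under ordinary Round Robin, but only from the next round boundary under Lazy Round Robin.

```python
# Toy round-based model: each round serves every eligible task one slice.
# Ordinary Round Robin admits a mid-round release into the current round;
# Lazy Round Robin defers it to the next round boundary.
def run_rounds(tasks, releases, n_rounds, lazy):
    """tasks: names ready before round 0; releases: {round_index: [name, ...]}.
    Returns the (round, task) slices in service order."""
    ready = list(tasks)
    schedule = []
    for r in range(n_rounds):
        for t in ready:                  # serve everyone eligible this round
            schedule.append((r, t))
        newly = releases.get(r, [])
        if not lazy:
            for t in newly:              # ordinary RR: join the current round
                schedule.append((r, t))
        ready = ready + newly            # eligible from the next round onward
    return schedule
```

With tasks A and B running and C released during round 0, C is first served in round 0 under ordinary Round Robin but only in round 1 under the lazy variant, which is exactly the extra delay the paper's response-time bounds must account for.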
16:15 CEST 2.7.2 IMPROVING THE TIMING BEHAVIOUR OF MIXED-CRITICALITY SYSTEMS USING CHEBYSHEV'S THEOREM
Speaker:
Behnaz Ranjbar, TU Dresden, DE
Authors:
Behnaz Ranjbar1, Ali Hoseinghorban2, Siva Satyendra Sahoo1, Alireza Ejlali2 and Akash Kumar1
1TU Dresden, DE; 2Sharif University of Technology, IR
Abstract
In Mixed-Criticality (MC) systems, there are often multiple Worst-Case Execution Times (WCETs) for the same task, corresponding to different system operation modes. Determining the appropriate WCETs for lower-criticality modes is non-trivial; while on the one hand, a low WCET for a mode can improve processor utilization in that mode, on the other hand, a larger WCET ensures that mode switches are minimized, thereby maximizing the quality-of-service for all tasks, albeit at the cost of processor utilization. Although there are many studies on determining the WCET in the highest criticality mode, no analytical solutions have been proposed for determining WCETs in the lower criticality modes. In this regard, we propose a scheme that determines WCETs using Chebyshev's theorem to trade off the number of tasks scheduled at design-time against the number of low-criticality tasks dropped at runtime as a result of frequent mode switches. Our experimental results show that our scheme improves the utilization of state-of-the-art MC systems by up to 85.29%, while maintaining a worst-case mode-switching probability of 9.11%.
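As a rough illustration of how Chebyshev's theorem can yield a lower-criticality WCET (a sketch under my own assumptions, not the paper's exact scheme): the one-sided Chebyshev (Cantelli) inequality P(X >= mu + k*sigma) <= 1/(1+k^2) holds for any distribution with mean mu and standard deviation sigma, so solving 1/(1+k^2) = p gives a bound exceeded with probability at most p.

```python
# Illustrative sketch: derive a distribution-free WCET threshold from
# measured execution times via the one-sided Chebyshev (Cantelli) bound.
import math
import statistics

def chebyshev_wcet(samples, exceed_prob):
    """WCET bound exceeded with probability at most `exceed_prob`:
    solve 1/(1 + k^2) = p for k, then return mu + k * sigma."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    k = math.sqrt(1.0 / exceed_prob - 1.0)
    return mu + k * sigma
```

Lowering `exceed_prob` enlarges the bound: a larger WCET means fewer mode switches but worse utilization, which is precisely the design-time trade-off the abstract describes.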
16:30 CEST 2.7.3 VIRTUAL GANG SCHEDULING OF PARALLEL REAL-TIME TASKS
Speaker:
Waqar Ali, University of Kansas, US
Authors:
Waqar Ali1, Rodolfo Pellizzoni2 and Heechul Yun3
1University of Kansas at Lawrence, US; 2University of Waterloo, CA; 3University of Kansas, US
Abstract
We consider the problem of executing parallel real-time tasks according to gang scheduling on a multicore system in the presence of shared resource interference. Specifically, we consider sets of gang-tasks with precedence constraints in the form of a DAG. We introduce the novel concept of a virtual gang: a group of parallel tasks that are scheduled together as a single entity. Employing virtual gangs allows us to tightly bound the effect of shared resource interference. It also transforms the original, complex scheduling problem into a form that can be easily implemented and is amenable to exact schedulability analysis, further reducing pessimism. We present and evaluate both optimal and heuristic methods for forming virtual gangs based on a known interference model and while respecting all precedence constraints among tasks. When precedence constraints are not considered, we also compare our approach against existing response-time analysis for globally scheduled gang-tasks, as well as general parallel tasks. The results show that our approach significantly outperforms state-of-the-art multicore schedulability analyses when shared-resource interference is considered. Even in the absence of interference, it performs better than the state-of-the-art for highly parallel tasksets.

2.8 Exhibition Keynote on Digital Twins and Invitation to Become a Book Author

Date: Tuesday, 02 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/MB6PyFhHcrzT9KqDK

Organizer:
Jürgen Haase, edacentrum GmbH, DE

This Exhibition Workshop features an Exhibition Keynote and a tutorial on the publication of research results in a book.

Time Label Presentation Title
Authors
16:00 CEST 2.8.1 EXHIBITION KEYNOTE - DIGITAL TWIN: THE FUTURE IS NOW
Speaker:
Thomas Heurung, Siemens EDA, DE
Abstract

Virtually every discussion on business trends talks about digitalization—whether it's the digital thread, digital twin, the digital enterprise, or the digitalization of everything. The goal is to harness the power of the exponential to integrate data in unprecedented ways to deliver new value and performance. As a result, the next generation of SoCs will be driven by business workloads and energy efficiencies, where software performance will define semiconductor success.

That is why a digital twin is becoming a necessity to virtually verify and validate the system performance of SoCs both pre-silicon and throughout the lifecycle of the SoC. Thomas Heurung, technical director Europe for Siemens EDA, will explain how the digital twin is helping to drive Tera-scale IC and application systems: first, by accelerating the design creation of custom accelerators; then, by supporting the shift-left in SoC verification, leading to true system validation from IP to software to systems. Ultimately, the digital twin enables a digitalization of the SoC environment both pre-silicon and throughout the lifecycle of the IC.

16:30 CEST 2.8.2 BOOK PUBLISHING 101: THE WHY, HOW AND WHAT
Speaker:
Charles Glaser, Springer Nature, US
Abstract

Are you interested in learning more about the why, how (and what) of publishing a book? Publishing a book is a powerful tool, allowing you to communicate your ideas to a global audience, building your reputation in the field and accelerating your career. Join this upcoming webinar to hear from Springer Nature Editorial Director Charles Glaser about the stages in the publishing process and how Springer has helped many authors just like you publish a book. A Q&A will follow to answer all your book publishing questions.


IP2_1 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/JGXWJ2m4MyTpq2spH

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP2_1.1 AN EFFECTIVE METHODOLOGY FOR INTEGRATING CONCOLIC TESTING WITH SYSTEMC-BASED VIRTUAL PROTOTYPES
Speaker:
Sören Tempel, University of Bremen, DE
Authors:
Sören Tempel1, Vladimir Herdt2 and Rolf Drechsler3
1University of Bremen, DE; 2DFKI, DE; 3University of Bremen/DFKI, DE
Abstract
In this paper we propose an effective methodology for integrating Concolic Testing (CT) with SystemC-based Virtual Prototypes (VPs) for verification of embedded SW binaries. Our methodology involves three steps: 1) integrating CT support with the Instruction Set Simulator (ISS) of the VP, 2) utilizing the standard TLM-2.0 extension mechanism for transporting concolic values alongside generic TLM transactions, and 3) providing lightweight concolic overlays for SystemC-based peripherals that enable non-intrusive CT support for peripherals and thus significantly reduce the CT integration effort. Our RISC-V experiments using the RIOT operating system demonstrate the effectiveness of our approach.
IP2_1.2 A CONTAINERIZED ROS-COMPLIANT VERIFICATION ENVIRONMENT FOR ROBOTIC SYSTEMS
Speaker:
Samuele Germiniani, University of Verona, IT
Authors:
Stefano Aldegheri, Nicola Bombieri, Samuele Germiniani, Federico Moschin and Graziano Pravadelli, University of Verona, IT
Abstract
This paper proposes an architecture and a related automatic flow to generate, orchestrate and deploy a ROS-compliant verification environment for robotic systems. The architecture enables assertion-based verification by exploiting monitors automatically synthesized from LTL assertions. The monitors are encapsulated in plug-and-play ROS nodes that do not require any modification to the system under verification (SUV). To guarantee both verification accuracy and real-time constraints of the system in a resource-constrained environment even after the monitor integration, we define a novel approach to move the monitor evaluation across the different layers of an edge-to-cloud computing platform. The verification environment is containerized for both cloud and edge computing using Docker to enable system portability and to handle, at run-time, the resources allocated for verification. The effectiveness and efficiency of the proposed architecture have been evaluated on a complex distributed system implementing a mobile robot path planner based on 3D simultaneous localization and mapping.

IP2_2 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/4taqRWQpMc3eRmd6X

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP2_2.1 RECEPTIVE-FIELD AND SWITCH-MATRICES BASED RERAM ACCELERATOR WITH LOW DIGITAL-ANALOG CONVERSION FOR CNNS
Speaker:
Xun Liu, North China University of Technology, CN
Authors:
Yingxun Fu1, Xun Liu2, Jiwu Shu3, Zhirong Shen4, Shiye Zhang1, Jun Wu1 and Li Ma1
1North China University of Technology, CN; 2North China University of Technology, CN; 3Tsinghua University, CN; 4Xiamen University, CN
Abstract
Processing-in-Memory (PIM) based accelerators have become one of the best solutions for executing convolutional neural networks (CNNs). Resistive random access memory (ReRAM) is a classic type of non-volatile random-access memory that is well suited to implementing PIM architectures. However, existing ReRAM-based accelerators mainly focus on improving calculation efficiency and ignore the fact that the digital-analog signal conversion process consumes considerable energy and execution time. In this paper, we propose a novel ReRAM-based accelerator named Receptive-Field and Switch-Matrices based CNN Accelerator (RFSM). In RFSM, we first propose a receptive-field based convolution strategy to analyze the data relationships, and then give a dynamic and configurable crossbar combination method to reduce the number of digital-analog conversion operations. The evaluation results show that, compared to existing works, RFSM achieves up to 6.7x higher speedup and 7.1x lower energy consumption.
IP2_2.2 AN ON-CHIP LAYER-WISE TRAINING METHOD FOR RRAM BASED COMPUTING-IN-MEMORY CHIPS
Speaker:
Yiwen Geng, Tsinghua University, CN
Authors:
Yiwen Geng, Bin Gao, Qingtian Zhang, Wenqiang Zhang, Peng Yao, Yue Xi, Yudeng Lin, Junren Chen, Jianshi Tang, Huaqiang Wu and He Qian, Institute of Microelectronics, Tsinghua University, CN
Abstract
RRAM-based computing-in-memory (CIM) chips have shown great potential to accelerate deep neural networks on edge devices by reducing data transfer between the memory and the computing unit. However, due to the non-ideal characteristics of RRAM, the accuracy of a neural network on an RRAM chip is usually lower than that of its software counterpart. Here we propose an on-chip layer-wise training (LWT) method to alleviate the adverse effect of RRAM imperfections and improve the accuracy of the chip. Using a locally validated dataset, LWT can reduce the communication between the edge and the cloud, which benefits personalized data privacy. Simulation results on the CIFAR-10 dataset show that the LWT method can improve the accuracy of VGG-16 and ResNet-18 by more than 5% and 10%, respectively, with only 30% of the operations and 35% of the buffer required by the back-propagation method. Moreover, the pipe-LWT method is presented to further improve throughput by three times.

UB.03 University Booth

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/6qWF5PX2LAwP8bpLH

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.03 3D-MEM-THERM: A FAST, ACCURATE 3D MEMORY THERMAL SIMULATOR
Speakers:
Lokesh Siddhu and Rajesh Kedia, IIT Delhi, IN
Authors:
Lokesh Siddhu and Rajesh Kedia, IIT Delhi, IN
Abstract
Thermal issues have limited the widespread adoption of 3D memories. Fast and accurate thermal simulation can help in designing appropriate thermal management policies. Temperature-dependent leakage power, which causes significant heating in 3D memories, is not modeled accurately in existing thermal simulators such as HotSpot. These simulators also do not account for the effect of process variations on leakage power. We augment HotSpot to address these challenges and propose a fast trace-based thermal simulator, namely 3D-Mem-Therm. 3D-Mem-Therm integrates several other novel features, such as support for evaluating thermal management policies and for choosing a memory layout from predefined 3D memory floorplans. 3D-Mem-Therm is significantly faster than industry-standard simulators and estimates temperature accurately to within one degree Celsius. We plan to demonstrate these features of 3D-Mem-Therm for the rapid design of thermal management policies for 3D memories.

UB.04 University Booth

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/RENWPfcgdEqvFtipR

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.04 MODULAR HARDWARE AND SOFTWARE PLATFORM FOR THE RAPID IMPLEMENTATION OF ASIC-BASED BIOANALYTICAL TEST SYSTEMS
Speaker:
Alexander Hofmann, IMMS Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH, DE
Authors:
Alexander Hofmann, Peggy Reich, Marco Götze, Alexander Rolapp, Sebastian Uziel, Thomas Elste, Bianca Leistritz, Wolfram Kattanek and Björn Bieske, IMMS Institut für Mikroelektronik- und Mechatronik-Systeme gemeinnützige GmbH (IMMS GmbH), DE
Abstract
To support the complex and lengthy development of sensor ASICs for point-of-care diagnostics, there is a need for modular, flexible and powerful test systems, mainly for the test and characterization of sensor ASICs and for the prototypical realization of bioanalytical measurements. At IMMS, a modular hardware and software platform has been developed that includes functional modules for power supply, electrical signal processing, digital signal processing, and modules for the control of fluidics, light sources, and heating elements. The platform software covers the control of the hardware modules as well as the acquisition and processing of measurement data. In addition, it enables the automated execution of measurement sequences. This contribution demonstrates the use of the platform with the example of an optoelectronic sensor ASIC for highly sensitive measurements of light absorption in liquids, where the sensor detects small signal changes over a wide dynamic range.

UB.05 University Booth

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/kQJosNue2ApJAxLt6

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.05 ELSA: FORMAL ABSTRACTION AND VERIFICATION OF ANALOG CIRCUITS
Speaker:
Ahmad Tarraf, University of Frankfurt, DE
Authors:
Ahmad Tarraf and Lars Hedrich, University of Frankfurt, DE
Abstract
The demonstration presents a recently published methodology that automatically generates accurate abstract models suited for verification and simulation routines with significant speed-up factors. The abstraction methodology is based on sampling a Spice netlist at transistor level with full Spice BSIM accuracy. The approach generates a hybrid automaton (HA) that exhibits linear behavior, described by a state-space representation, in each of its locations, thereby modeling the nonlinear behavior of the netlist via multiple locations. Hence, due to the linearity of the obtained model, the approach is easily scalable. As the eigenvalues of the linearized system play a significant role in the abstraction process, the tool was named ELSA: eigenvalue-based hybrid linear system abstraction. The HAs can be deployed in various output languages: MATLAB, Verilog-A, and SystemC-AMS.

UB.06 University Booth

Date: Tuesday, 02 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ecf9BKy54iZTgBL5s

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.06 LEARNV: A RISC-V BASED EMBEDDED SYSTEM DESIGN FRAMEWORK FOR EDUCATION AND RESEARCH DEVELOPMENT
Speaker:
Noureddine Ait Said, TIMA Laboratory, FR
Authors:
Noureddine Ait Said and Mounir Benabdenbi, TIMA Laboratory, FR
Abstract
Designing a modern System on a Chip is based on the joint design of hardware and software (co-design). However, understanding the tight relationship between hardware and software is not straightforward. Moreover, validating new concepts in SoC design, from the idea to the hardware implementation, is time-consuming and often slowed by legacy issues (intellectual property of hardware blocks and expensive commercial tools). To overcome these issues, we propose to use the open-source Rocket Chip environment for educational purposes, combined with the open-source LowRisc architecture to implement a custom SoC design on an FPGA board. The demonstration will present how students and engineers can benefit from the environment to deepen their knowledge of HW and SW co-design. Using the LowRisc architecture, an image classification application based on CNNs will serve as a demonstrator of the whole open-source hardware and software flow and will be mapped on a Nexys A7 FPGA board.

3.1 Sustainable solutions at large: bettering energy efficiency in HPC

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/5LFowceockZhBcWM8

Session chair:
David Bol, Université catholique de Louvain, BE

Session co-chair:
Gilles Sassatelli, LIRMM, FR

Organizers:
Gilles Sassatelli, LIRMM, FR
Miquel Moreto, BSC, ES

The power consumption of supercomputing and cloud infrastructures is rising at an unprecedented rate and poses a serious challenge, notably because of the growing carbon footprint. This session is built around four contributions that attack the problem at different levels, from increasing heterogeneity and datacenter management techniques for better energy efficiency to disruptive approaches to decarbonization using renewable energies and waste-heat reuse in massively distributed grid computing systems.

Time Label Presentation Title
Authors
17:30 CEST 3.1.1 FUTURE OF HPC: DIVERSIFYING HETEROGENEITY
Speaker:
Dejan Milojicic, Hewlett Packard Enterprise, US
Authors:
Dejan Milojicic1, Paolo Faraboschi1, Nicolas Dube2 and Duncan Roweth3
1Hewlett Packard Labs, US; 2Hewlett Packard Enterprise, CA; 3Hewlett Packard Enterprise, GB
Abstract
After the end of Dennard scaling and with the imminent end of Moore's Law, it has become challenging to continue scaling HPC systems within a given power envelope. This is exacerbated most in large systems, such as high-end supercomputers. To alleviate this problem, general-purpose hardware is no longer sufficient, and HPC systems and components are being augmented with special-purpose hardware. By definition, because of the narrow applicability of specialization, broad supercomputing adoption requires using different heterogeneous components, each optimized for a specific application domain. In this paper, we discuss the impact of the introduced heterogeneity of specialization across the HPC stack: interconnects including memory models, accelerators including power and cooling, use cases and applications including AI, and delivery models, such as traditional, as-a-Service, and federated. We believe that a stack that supports diversification across hardware and software is required to continue scaling performance and maintaining energy efficiency.
17:45 CEST 3.1.2 A DATA CENTER DEMAND RESPONSE POLICY FOR REAL-WORLD WORKLOAD SCENARIOS IN HPC
Speaker:
Daniel Wilson, Boston University, US
Authors:
Yijia Zhang, Daniel C. Wilson, Ioannis Ch. Paschalidis and Ayse K. Coskun, Boston University, US
Abstract
Demand response programs offer an opportunity for large power consumers to save on electricity costs by modulating their power consumption in response to demand changes in the electricity grid. Multiple types of such programs exist; for example, regulation service programs enable a consumer to bid for a sustainable amount of power draw over a time period, along with a reserve amount they are able to provide at the request of the electricity service provider. Data centers offer unique capabilities to participate in these programs since they have significant capacity to modify their power consumption through workload scheduling and CPU power limiting. This paper proposes a novel power management policy and a bidding policy that enable data centers to participate in regulation service programs under real-world constraints. The power management policy schedules computing jobs and applies server power-capping under both the constraints of power programs and the constraints of job Quality-of-Service (QoS). Simulations with workload traces from a real data center show that the proposed policies enable data centers to meet both the requirements of regulation service programs and the QoS requirements of jobs. We demonstrate that, by applying our policies, data centers can reduce their electricity costs by 10% while abiding by all the QoS constraints in a real-world scenario.
18:00 CEST 3.1.3 ACCELERATING DATA CENTER DECARBONIZATION AND MAXIMIZING RENEWABLE USAGE WITH GRID EDGE SOLUTIONS
Speaker:
John Glassmire, Hitachi ABB Power Grids, US
Authors:
John Glassmire1, Hamideh Bitaraf1, Stylianos Papadakis2 and Alexandre Oudalov2
1Hitachi ABB Power Grids, US; 2Hitachi ABB Power Grids, CH
Abstract
Data centers and other computing clusters have unique electrical power requirements. They demand high reliability with high power quality, while at the same time being driven by society and industry to use renewables as their only electricity source. To date, many large data center users have focused on offsite renewable portfolio contracts and power purchase agreements to offset data center energy demands. However, this strategy misses several greenhouse gas contributors: the diesel and gas generators that provide back-up, and the reliance on existing fossil fuel generation that often balances renewable output power in utility networks. With grid edge solutions including microgrids and battery energy storage systems, data centers have an opportunity to maximize their usage of renewable generation while minimizing the usage of fossil-driven energy generation. This paper will explore the key considerations for using grid edge technologies to decarbonize the back-up supplies for data centers, as well as explore how they can stabilize the utility networks that supply data centers, even as the penetration of renewable generation in the network reaches 100%. We will introduce strategies for implementation, including an overview of the design, management, control, and optimization of the renewable energy supply. We will explore the economic considerations for these investments, while providing useful benchmarks for achievable goals in each of these areas.
18:15 CEST 3.1.4 DISTRIBUTED GRID COMPUTING MANAGER COVERING WASTE HEAT REUSE CONSTRAINTS
Speaker:
Rémi Bouzel, Qarnot Computing, FR
Authors:
Rémi Bouzel, Yanik Ngoko, Paul Benoit and Nicolas Sainthérant, Qarnot Computing, FR
Abstract
In this paper, we discuss a green and distributed type of datacenter implemented by Qarnot Computing. This approach promotes a new computing paradigm in which computers are considered machines that produce both computation and heat, and are therefore able to reuse the waste heat generated. It is based on two main technologies: a new model of servers and a new distributed grid computing manager which encloses a heat-aware job scheduler. This paper focuses on the infrastructures and cloud computing services that were developed to answer the constraints of this new HPC paradigm. The description covers the job scheduler that ensures security and resilience of Qarnot's distributed computing resources in a non-regulated environment. We summarize the key computational challenges met and the strategies developed to solve them. A specific use case is detailed to show that, in spite of its thermal-aware specificity, spawning a job on the Qarnot platform remains as simple as on any other state-of-the-art job scheduler.
18:30 CEST 3.1.5 LIVE JOINT Q&A
Authors:
David Bol1, Gilles Sassatelli2, Dejan Milojicic3, Ayse Coskun4, John Glassmire5 and Rémi Bouzel6
1Université catholique de Louvain, BE; 2LIRMM CNRS / University of Montpellier 2, FR; 3Hewlett Packard Labs, US; 4Boston University, US; 5Hitachi ABB Power Grids, US; 6Qarnot Computing, FR
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

3.2 Journey with Emerging Technologies and Architectures from Devices to System-Level Management

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/jDauCZ2zCKrT3FpAc

Session chair:
Jian-Jia Chen, University of Dortmund, DE

Session co-chair:
Georgios Zervakis, Karlsruhe Institute of Technology, DE

Organizers:
Hussam Amrouch, University of Stuttgart, DE
Jörg Henkel, Karlsruhe Institute of Technology, DE

The goal of this special session is to introduce and discuss various emerging technologies for logic circuitry and memory, as well as novel architectures, with a key focus on how such innovations can reshape the future of computing, especially when it comes to memory-intensive applications like those in the Artificial Intelligence (AI) domain. The special session will cover various key research areas, starting from innovations in the underlying devices all the way up to innovations in computer architecture and system-level management. Three different promising emerging technologies will be presented: (i) the Negative Capacitance Field-Effect Transistor (NCFET) as a new CMOS technology with advantages mainly for low power, (ii) the Ferroelectric FET (FeFET) as a non-volatile, area-efficient and low-power memory, as well as (iii) phase-change memory (PCM) and resistive RAM (ReRAM), offering a large potential for tackling the memory-wall challenges of current technologies. In addition, the special session will focus on discussing recent breakthroughs in computer architectures and demonstrate how innovations from both sides of the spectrum (i.e., low-level devices and high-level architectures) can be brought together to significantly boost the efficiency of computing in the upcoming generations.

Time Label Presentation Title
Authors
17:30 CEST 3.2.1 FEFET AND NCFET FOR FUTURE NEURAL NETWORKS: VISIONS AND OPPORTUNITIES
Speaker:
Hussam Amrouch, University of Stuttgart, DE
Authors:
Mikail Yayla1, Kuan-Hsun Chen1, Georgios Zervakis2, Joerg Henkel3, Jian-Jia Chen1 and Hussam Amrouch4
1TU Dortmund, DE; 2Karlsruhe Institute of Technology, DE; 3KIT, DE; 4University of Stuttgart, DE
Abstract
The goal of this special session paper is to introduce and discuss different emerging technologies for logic circuitry and memory, as well as new lightweight architectures for neural networks. We demonstrate how the ever-increasing complexity of Artificial Intelligence (AI) applications, resulting in an immense increase in required computational power, inevitably necessitates employing innovations starting from the underlying devices all the way up to the architectures. Two different promising emerging technologies will be presented: (i) the Negative Capacitance Field-Effect Transistor (NCFET) as a new beyond-CMOS technology with advantages in offering lower power and/or higher accuracy for neural network inference, and (ii) the Ferroelectric FET (FeFET) as a novel non-volatile, area-efficient and ultra-low-power memory device. In addition, we demonstrate how Binary Neural Networks (BNNs) offer a promising alternative to traditional Deep Neural Networks (DNNs) due to their lightweight hardware implementation. Finally, we present the challenges of combining FeFET-based NVM with NNs and summarize our perspectives on future NNs and the vital role that emerging technologies may play.
17:45 CEST 3.2.2 EXPLOITING FEFETS VIA CROSS-LAYER DESIGN FROM IN-MEMORY COMPUTING CIRCUITS TO META LEARNING APPLICATIONS
Speaker:
Dayane Reis, University of Notre Dame, US
Authors:
Dayane Reis, Ann Franchesca Laguna, Michael Niemier and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
A ferroelectric FET (FeFET), made by integrating a ferroelectric material layer in the gate stack of a MOSFET, is a device that can behave as both a transistor and a non-volatile storage element. This unique property of FeFETs enables area-efficient and low-power merged logic and memory functionality, desirable for many data analytics and machine learning applications. To best exploit this unique feature of FeFETs, cross-layer design practices spanning from circuits and architectures to algorithms and applications are needed. The paper presents FeFET-based circuits and architectures that offer, either independently or in a configurable fashion, ternary content addressable memory (TCAM) and general-purpose compute-in-memory (GP-CiM) functionalities. These in-memory computing modules bring new opportunities for accelerating data-intensive applications. We discuss the use of these FeFET-based in-memory computing fabrics in meta-learning applications, specifically as attentional memory. System-level task mapping and end-to-end evaluation will be discussed.
18:00 CEST 3.2.3 FUTURE COMPUTING PLATFORM DESIGN: A CROSS-LAYER DESIGN APPROACH
Speaker:
Hsiang-Yun Cheng, Academia Sinica, TW
Authors:
Hsiang-Yun Cheng1, Chun-Feng Wu2, Christian Hakert3, Kuan-Hsun Chen3, Yuan-Hao Chang1, Jian-Jia Chen3, Chia-Lin Yang4 and Tei-Wei Kuo5
1Academia Sinica, TW; 2Academia Sinica and National Taiwan University, TW; 3TU Dortmund University, DE; 4National Taiwan University, TW; 5National Taiwan University and City University of Hong Kong, TW
Abstract
Future computing platforms are facing a paradigm shift with the emerging resistive memory technologies. First, they offer fast memory accesses and data persistence in a single large-capacity device deployed on the memory bus, blurring the boundary between memory and storage. Second, they enable computing-in-memory for neuromorphic computing to mitigate costly data movements. Due to the non-ideality of these resistive memory devices at the moment, we envision that cross-layer design is essential to bring such a system into practice. In this paper, we showcase a few examples to demonstrate how cross-layer design can be developed to fully exploit the potential of resistive memories and accelerate its adoption for future computing platforms.
18:15 CEST 3.2.4 INTELLIGENT ARCHITECTURES FOR INTELLIGENT COMPUTING SYSTEMS
Speaker and Author:
Onur Mutlu, ETH Zurich and Carnegie Mellon University, CH
Abstract
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance, efficiency and scalability are bottlenecked by data movement. In this invited special session talk, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing architectures based on three key principles: 1) data-centric, 2) data-driven, 3) data-aware. We give several examples for how to exploit each of these principles to design a much more efficient and high performance computing system. We especially discuss recent research that aims to fundamentally reduce memory latency and energy, and practically enable computation close to data, with at least two promising novel directions: 1) processing using memory, which exploits analog operational properties of memory chips to perform massively-parallel operations in memory, with low-cost changes, 2) processing near memory, which integrates sophisticated additional processing capability in memory controllers, the logic layer of 3D-stacked memory technologies, or memory chips to enable high memory bandwidth and low memory latency to near-memory logic. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs. This accompanying short paper provides a summary of the invited talk and points the reader to further work that may be beneficial to examine.

3.3 Vertical IP Protection of the Next-Generation Devices: Quo Vadis?

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/nRP9chcosiRhfYH4B

Session chair:
Sebastian Huhn, University of Bremen, DE

Session co-chair:
Shubham Rai, TU Dresden, DE

Organizer:
Shubham Rai, TU Dresden, DE

With the advent of 5G and IoT applications, there is a greater thrust in terms of hardware security due to the imminent risks caused by the high amount of intercommunication between various subsystems. Security gaps in integrated circuits thus represent high risks for both the manufacturers and the users of electronic systems. Particularly in the domain of Intellectual Property (IP) protection, there is an urgent need to devise security measures at all levels of abstraction so that we can be one step ahead of any kind of adversarial attack. In this special session, we will discuss IP protection measures from multiple perspectives, from system-level to device-level security measures, from discussing various attack methods such as reverse engineering and hardware Trojan insertion to proposing new-age protection measures such as DNA-based logic locking and secure information flow tracking. This special session will give a holistic overview of the current state-of-the-art measures and how well we are prepared for the next generation of circuits and systems. The main idea we want to put forward is that security should be one of the deciding factors during circuit and system design, and not an afterthought.

Time Label Presentation Title
Authors
17:30 CEST 3.3.1 IP PROTECTION, PRESENT AND FUTURE SCHEMES
Speaker and Author:
Ramesh Karri, NYU, US
Abstract
With the advent of 5G and IoT applications, there is a greater thrust in terms of hardware security due to the imminent risks caused by the high amount of intercommunication between various subsystems. Security gaps in integrated circuits thus represent high risks for both the manufacturers and the users of electronic systems. Particularly in the domain of Intellectual Property (IP) protection, there is an urgent need to devise security measures at all levels of abstraction so that we can be one step ahead of any kind of adversarial attack. This work presents IP protection measures from multiple perspectives, from system-level down to device-level security measures, from discussing various attack methods such as reverse engineering and hardware Trojan insertion to proposing new-age protection measures such as DNA-based multi-valued logic locking and secure information flow tracking. This special session will give a holistic overview of the current state-of-the-art measures and how well we are prepared for the next generation of circuits and systems.
17:45 CEST 3.3.2 SECURITY VALIDATION AT VP-LEVEL USING INFORMATION FLOW TRACKING
Speaker and Author:
Rolf Drechsler, University of Bremen/DFKI, DE
Abstract
Security is a crucial aspect in modern embedded systems that complements functional correctness to build safe and reliable systems. A very effective technique to validate security policies and thus protect a system against a broad range of security related exploits is Information Flow Tracking (IFT). In this talk we present efficient IFT-based techniques at the system-level using Virtual Prototypes (VPs). This enables validation of security policies very early in the design flow and hence enables to prevent costly iterations later on. In particular, we present static and dynamic IFT-based techniques for security validation of the VP as well as the embedded SW running on the VP. Our experiments demonstrate the effectiveness of our approach.
18:00 CEST 3.3.3 MVLOCK: A MULTI-VALUED LOGIC LOCKING SCHEME FOR FUTURE-GENERATION COMPUTING SYSTEMS
Speaker and Author:
Farhad Merchant, Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE
Abstract
Future-generation computing systems will require sophisticated security mechanisms to prevent a variety of attacks. Especially with the emergence of neuromorphic computing, the underlying computations are no longer purely digital. The complexity of future-generation computing systems also increases the attack surface for bad actors. The vulnerability of designs during production at a third-party foundry is going to be a major concern. Logic locking is an emerging technique able to provide various measures of protection against foundry attacks such as hardware Trojan insertion, IP piracy, and counterfeiting. In this paper, we discuss the impact of post-CMOS technologies on security and how various logic locking paradigms can help us overcome hardware security challenges. In particular, we propose integrating soft (biological) intelligent systems as high-density building blocks to store information and create a subset of multi-valued logic locking (MVLock), and discuss the increasing difficulty of breaking the logic-locked circuits. Classical Boolean satisfiability test-based attacks and novel machine learning based attacks are analysed for key retrieval and prediction.
18:15 CEST 3.3.4 HARNESSING SECURITY THROUGH RUNTIME RECONFIGURABLE TRANSISTORS
Speaker and Author:
Akash Kumar, TU Dresden, DE
Abstract
Polymorphism is an indispensable property for ensuring security. Emerging reconfigurable nanotechnologies exhibit functional polymorphism at the very transistor level. Transistors belonging to this class showcase ambipolar behavior, where a single transistor can be configured to have either p-type or n-type functionality. This ambipolar behavior offers application opportunities both at the logic gate and the circuit level. Transistors being the most fundamental piece in electronic circuits, polymorphism at this level can help in providing strong bottom-up security. In this talk, we discuss how runtime reconfigurability at the transistor level manifests itself in interesting circuit paradigms by offering more functionality per computation unit. We will discuss how transistor-level reconfigurability can be used for designing security primitives such as true random number generators. We will introduce a novel kill-switch application scenario based on these nanotechnologies, which is all the more disruptive as it is innocuous and completely hidden in normal circuit operation.

3.4 Advances with Emerging Technologies: Biochips, Memory-Centric Computing, and Ion Trap Quantum Architectures

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/qw4tW5LkJm3zrYzdP

Session chair:
Bing Li, TU Munich, DE

Session co-chair:
Said Hamdioui, TU Delft, NL

This session considers advances with various emerging technology platforms including microfluidic biochips, memory-centric computer architectures, and quantum computing. The first presentation considers new approaches to monitor the health and state of microelectrodes that can lead to erroneous bioassay outcomes. The next paper speaks to applications of memory-centric computing to address problems in graph processing. Finally, the last paper addresses design challenges with ion-trap-based quantum architectures.

Time Label Presentation Title
Authors
17:30 CEST 3.4.1 FORMAL SYNTHESIS OF ADAPTIVE DROPLET ROUTING FOR MEDA BIOCHIPS
Speaker:
Mahmoud Elfar, Duke University, US
Authors:
Mahmoud Elfar, Tung-Che Liang, Krishnendu Chakrabarty and Miroslav Pajic, Duke University, US
Abstract
A digital microfluidic biochip (DMFB) enables the miniaturization of immunoassays, point-of-care clinical diagnostics, and DNA sequencing. A recent generation of DMFBs uses a micro-electrode-dot-array (MEDA) architecture, which provides fine-grained control of droplets and real-time droplet sensing using CMOS technology. However, microelectrodes in a MEDA biochip can degrade due to charge trapping when they are repeatedly charged and discharged during bioassay execution; such degradation leads to the failure of microelectrodes and erroneous bioassay outcomes. To address this problem, we first introduce a new microelectrode-cell design such that we can obtain the health status of all the microelectrodes in a MEDA biochip by employing the inherent sensing mechanism. Next, we present a stochastic game-based model for droplet manipulation, and a formal synthesis method for droplet routing that can dynamically change droplet transportation routes. This adaptation is based on the real-time health information obtained from microelectrodes. Comprehensive simulation results for four real-life bioassays show that our method increases the likelihood of successful bioassay completion with negligible impact on time-to-results.
17:45 CEST 3.4.2 HYGRAPH: ACCELERATING GRAPH PROCESSING WITH HYBRID MEMORY-CENTRIC COMPUTING
Speaker:
Minxuan Zhou, University of California, San Diego, US
Authors:
Minxuan Zhou1, Muzhou Li1, Mohsen Imani2 and Tajana Rosing1
1UCSD, US; 2University of California Irvine, US
Abstract
Graph applications are challenging to run efficiently on conventional systems because of their large and irregular data. Several works have exploited near-data processing (NDP) based on emerging 3D-stacked memory to accelerate graph processing applications by offloading computations to massively parallel cores in the memory chip. Even though NDP can efficiently support parallel operations in a memory-scalable way, it still requires data movement between memory and near-memory cores. Such data movement introduces large overhead because of the random data patterns in graph workloads. Furthermore, the parallelism provided by NDP systems is still insufficient for graph applications because of the limited number of processing cores. In this work, we tackle these challenges by integrating processing in-memory (PIM) technology into the NDP-based accelerator. We propose HyGraph, a software-hardware co-design for graph acceleration that exploits hybrid memory-centric computing technologies, including NDP and PIM. The design of HyGraph includes an optimization algorithm for hybrid memory layout, a run-time system combining both NDP and PIM processing flows, and customized hardware for efficiently enabling PIM functionality in NDP systems. Our experimental results show that HyGraph is up to 1.9× faster and 2.4× more energy-efficient than state-of-the-art memory-centric graph accelerators on several widely used graph algorithms with various real-world graphs.
18:00 CEST IP3_2.1 GENERIC SAMPLE PREPARATION FOR DIFFERENT MICROFLUIDIC PLATFORMS
Speaker:
Sudip Poddar, Johannes Kepler University Linz, Austria, AT
Authors:
Sudip Poddar1, Gerold Fink2, Werner Haselmayr1 and Robert Wille2
1Johannes Kepler University, AT; 2Johannes Kepler University Linz, AT
Abstract
Sample preparation plays a crucial role in several medical applications. Microfluidic devices, or Labs-on-Chips (LoCs), have become established as a suitable solution for realizing this task in a miniaturized, integrated, and automatic fashion. Over the years, a variety of different microfluidic platforms have emerged, each with its respective pros and cons. Accordingly, numerous approaches for sample preparation have been proposed, each specialized for a single platform only. In this work, we propose an idea towards a generic sample-preparation approach that generalizes the constraints of the different microfluidic platforms and thereby provides a platform-independent sample-preparation method. This allows designers to quickly check which existing platform is most suitable for the considered task, and to easily support upcoming and future microfluidic platforms as well. We illustrate the applicability of the proposed method with examples for various platforms.
18:01 CEST IP3_2.2 RAISE: A RESISTIVE ACCELERATOR FOR SUBJECT-INDEPENDENT EEG SIGNAL CLASSIFICATION
Speaker:
Fan Chen, Duke University, US
Authors:
Fan Chen1, Linghao Song1, Hai (Helen) Li2 and Yiran Chen1
1Duke University, US; 2Duke University/TUM-IAS, US
Abstract
State-of-the-art deep neural networks (DNNs) for electroencephalography (EEG) signal classification focus on subject-related tasks, in which the test data and the training data need to be collected from the same subject. In addition, due to limited computing resources and strict power budgets at the edge, it is very challenging to deploy the inference of such DNN models on biological devices. In this work, we present an algorithm/hardware co-designed low-power accelerator for subject-independent EEG signal classification. We propose a compact neural network that is capable of identifying the common and stable structure among subjects. Based on it, we realize a robust subject-independent EEG signal classification model that can be extended to multiple brain-computer interface (BCI) tasks with minimal overhead. Based on this model, we present RAISE, a low-power processing-in-memory inference accelerator leveraging emerging resistive memory. We compare the proposed model and hardware accelerator to prior art across various BCI paradigms. We show that our model achieves the best subject-independent classification accuracy, while RAISE achieves a 2.8x power reduction and a 2.5x improvement in performance per watt compared to the state-of-the-art resistive inference accelerator.
18:02 CEST 3.4.3 EXACT PHYSICAL DESIGN OF QUANTUM CIRCUITS FOR ION-TRAP-BASED QUANTUM ARCHITECTURES
Speaker:
Robert Wille, Institute for Integrated Circuits, Johannes Kepler University Linz, Austria, AT
Authors:
Oliver Keszocze1, Naser Mohammadzadeh2 and Robert Wille3
1Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE; 2Department of Computer Engineering, Shahed University, IR; 3Johannes Kepler University Linz, AT
Abstract
Quantum computers exploit quantum effects in a controlled manner to efficiently solve problems that are very hard to address on classical computers. Ion-trap-based technologies are a particularly advanced concept for realizing quantum computers, with advantages with respect to physical realization and fault tolerance. Accordingly, several physical design methods aiming at mapping quantum circuits to corresponding architectures have been proposed. However, all these methods are heuristic and cannot guarantee minimality. In this work, we propose a solution that can generate exact physical designs, i.e., solutions that require a minimal number of time steps. To this end, satisfiability solvers are utilized. Experimental evaluations confirm that, despite the underlying computational complexity of the problem, this makes it possible to generate minimal physical designs for several quantum circuits for the first time.
18:17 CEST IP3_3.1 DOUBLE DQN FOR CHIP-LEVEL SYNTHESIS OF PAPER-BASED DIGITAL MICROFLUIDIC BIOCHIPS
Speaker:
Fang-Chi Wu, Department of Computer Science and Engineering, National Sun Yat-Sen University, TW
Authors:
Fang-Chi Wu1, Jian-De Li2, Katherine Shu-Min Li1, Sying-Jyan Wang2 and Tsung-Yi Ho3
1National Sun Yat-sen University, TW; 2National Chung Hsing University, TW; 3National Tsing Hua University, TW
Abstract
Paper-based digital microfluidic biochip (PB-DMFB) technology is one of the most promising solutions for biochemical applications due to its paper substrate, which makes PB-DMFBs more portable, cost-effective, and less dependent on manufacturing equipment. However, the single-layer paper substrate, which entangles electrodes, conductive wires, and droplet routing in the same layer, raises challenges for chip-level synthesis of PB-DMFBs. Furthermore, current design automation tools have to address various design issues, including manufacturing cost, reliability, and security. Therefore, a more flexible chip-level synthesis method is necessary. In this paper, we propose the first reinforcement-learning-based chip-level synthesis for PB-DMFBs. Double deep Q-learning networks are adapted for the agent to select and estimate actions, from which we obtain the optimized synthesis results. Experimental results show that the proposed method is not only effective and efficient for chip-level synthesis but also scalable to reliability- and security-oriented schemes.

3.5 Regularity and Optimization for Logic Synthesis

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ZjjT66x269LfLtPa9

Session chair:
Luca Amaru, Synopsys, US

Session co-chair:
Eleonora Testa, Synopsys, US

The first two papers in this session present algorithms for maximizing the autosymmetry degree of incompletely specified Boolean functions as well as rewriting and resubstitution for XOR-Majority graphs. The third paper presents an ILP-based optimization method for multiplier design.

Time Label Presentation Title
Authors
17:30 CEST 3.5.1 PRESERVING SELF-DUALITY DURING LOGIC SYNTHESIS FOR EMERGING RECONFIGURABLE NANOTECHNOLOGIES
Speaker:
Shubham Rai, TU Dresden, DE
Authors:
Shubham Rai1, Heinz Riener2, Giovanni De Micheli3 and Akash Kumar1
1TU Dresden, DE; 2EPFL, CH; 3École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
Emerging reconfigurable nanotechnologies allow the implementation of self-dual functions with fewer transistors than traditional CMOS technologies. To achieve better area results for Reconfigurable Field-Effect Transistor (RFET)-based circuits, a large portion of a logic representation must be mapped to self-dual logic gates. This, in turn, depends upon how self-duality is preserved in the logic representation during logic optimization and technology mapping. In the present work, we develop Boolean size-optimization methods, a rewriting algorithm and a resubstitution algorithm, using Xor-Majority Graphs (XMGs) as a logic representation, aiming at better preserving self-duality during logic optimization. XMGs are more compact for both unate and binate logic functions compared to conventional logic representations such as And-Inverter Graphs (AIGs) or Majority-Inverter Graphs (MIGs). We evaluate the proposed algorithms over crafted benchmarks (with various levels of self-duality) and cryptographic benchmarks. For cryptographic benchmarks with a high self-duality ratio, the XMG-based logic optimization flow can achieve an area reduction of up to 17% when compared to AIG-based optimization flows implemented in the academic logic synthesis tool ABC.
17:45 CEST 3.5.2 AUTOSYMMETRY OF INCOMPLETELY SPECIFIED FUNCTIONS
Speaker:
Valentina Ciriani, Università degli Studi di Milano, IT
Authors:
Anna Bernasconi1 and Valentina Ciriani2
1Universita' di Pisa, IT; 2Universita' degli Studi di Milano, IT
Abstract
Autosymmetric Boolean functions are “regular functions” that are rather frequent in the set of Boolean functions describing standard circuits. Autosymmetry is typically exploited for improving the synthesis time and the quality of the optimized circuits. This paper presents the first non-naive study of the autosymmetry of incompletely specified functions, i.e., Boolean functions with don’t-care conditions. The theory of autosymmetry for completely specified functions is extended to the incompletely specified case, and a new heuristic algorithm is provided for the detection of autosymmetry. The experimental results validate the theoretical study and show that 77% of the considered benchmarks have an improved autosymmetry degree.
18:00 CEST IP3_1.1 SYNTHESIS OF SI CIRCUITS FROM BURST-MODE SPECIFICATIONS
Speaker:
Alex Chan, Newcastle University, GB
Authors:
Alex Chan1, Danil Sokolov1, Victor Khomenko1, David Lloyd2 and Alex Yakovlev1
1Newcastle University, GB; 2Dialog Semiconductor, GB
Abstract
In this paper, we present a new workflow that is based on the conversion of Extended Burst-Mode (XBM) specifications to Signal Transition Graphs (STGs). While XBMs offer a simple design entry to specify asynchronous circuits, they cannot be synthesised into speed-independent (SI) circuits, due to the 'burst mode' timing assumption inherent in the model. Furthermore, XBM synthesis tools are no longer supported, and there are no dedicated tools for formal verification of XBMs. Our approach addresses these issues, by granting the XBMs access to sophisticated synthesis and verification tools available for STGs, as well as the possibility to synthesise SI circuits. Experimental results show that our translation only linearly increases the model size and that our workflow achieves a much improved synthesis success rate, with a 33% average reduction in the literal count.
18:01 CEST IP3_1.2 LOW-LATENCY ASYNCHRONOUS LOGIC DESIGN FOR INFERENCE AT THE EDGE
Speaker:
Adrian Wheeldon, Newcastle University, GB
Authors:
Adrian Wheeldon1, Alex Yakovlev1, Rishad Shafik1 and Jordan Morris2
1Newcastle University, GB; 2ARM Ltd, Newcastle University, GB
Abstract
Modern Internet of Things (IoT) devices perform machine learning inference on sensed data on-device rather than offloading it to the cloud. Commonly known as inference at the edge, this gives many benefits to users, including personalization and security. However, such applications demand high energy efficiency and robustness. In this paper we propose a method for reducing the area and power overhead of self-timed early-propagative asynchronous inference circuits, designed using the principles of learning automata. Due to their natural resilience in timing as well as their logic underpinning, the circuits are tolerant to variations in environment and supply voltage whilst enabling the lowest possible latency. Our method is exemplified through an inference datapath for a low-power machine learning application. The circuit builds on the Tsetlin machine algorithm, further enhancing its energy efficiency. The average latency of the proposed circuit is reduced by 10x compared with the synchronous implementation whilst maintaining similar area. The robustness of the proposed circuit is proven through post-synthesis simulation with supply voltages from 0.25 V to 1.2 V. Functional correctness is maintained, and latency scales with gate delay as voltage is decreased.
18:02 CEST 3.5.3 GOMIL: GLOBAL OPTIMIZATION OF MULTIPLIER BY INTEGER LINEAR PROGRAMMING
Speaker:
Weihua Xiao, Shanghai Jiao Tong University, CN
Authors:
Weihua Xiao1, Weikang Qian1 and Weiqiang Liu2
1Shanghai Jiao Tong University, CN; 2Nanjing University of Aeronautics and Astronautics, CN
Abstract
The multiplier is an important arithmetic circuit. State-of-the-art designs consist of a partial product generator (PPG), a compressor tree (CT), and a carry propagation adder (CPA), with the last two components dominating the area and delay. Existing representative works optimize the CT and the CPA separately, imposing a rigid boundary between these two components. In this paper, we break the boundary by proposing GOMIL, a global optimization of multipliers by integer linear programming (ILP). Two ILP sub-problems are first formulated to optimize the CT and the prefix structure in the CPA, respectively. Then, they are unified to provide a global optimization of the multiplier. The proposed method is applicable not only to multipliers with an AND-gate-based PPG, but also to those with a Booth-encoding-based PPG. The experimental results show that the multipliers optimized by GOMIL can reduce the power-delay product by up to 71% compared to state-of-the-art multipliers developed in industry. The code of GOMIL is made open source.

3.6 RF and High-Speed design challenges

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/YAc6b9SZ7aFfw6EPk

Session chair:
Manuel Barragan, TIMA, FR

Session co-chair:
Marc Margalef-Rovira, IEMN, FR

This session focuses on simulation, design, and test challenges for state-of-the-art RF and high-speed circuits and systems. The first paper deals with the efficient simulation of noise in RF systems. The second paper proposes an open source serial link. The third paper deals with the industrial testing of a complete transceiver using simple digital tester channels.

Time Label Presentation Title
Authors
17:30 CEST 3.6.1 AN EVENT-DRIVEN SYSTEM-LEVEL NOISE ANALYSIS METHODOLOGY FOR RF SYSTEMS
Speaker:
Christoph Beyerstedt, RWTH Aachen University, DE
Authors:
Christoph Beyerstedt, Jonas Meier, Fabian Speicher, Ralf Wunderlich and Stefan Heinen, RWTH Aachen University, DE
Abstract
This paper presents an approach for system-level noise simulation in the frequency domain for RF systems. The noise analysis is able to depict frequency conversion due to nonlinear or time-varying circuits and can also consider the signal processing in the digital part. Therefore, it is possible to analyze the noise from all parts of the system at a point of choice, e.g. directly at the demodulator input. The approach is based on analog models of the genuine circuit-level implementation used for system-level exploration or verification purposes. It can be integrated into a conventional system-level simulation with nearly no overhead. The noise analysis is implemented on top of an already existing event-driven RF simulation approach which uses a combination of SystemVerilog and C++ for modeling. MATLAB is used for post-processing and visualization of the results.
17:45 CEST 3.6.2 OPENSERDES: AN OPEN SOURCE PROCESS-PORTABLE ALL-DIGITAL SERIAL LINK
Speaker:
Gaurav Kumar K, Purdue University, US
Authors:
Gaurav Kumar K, Baibhab Chatterjee and Shreyas Sen, Purdue University, US
Abstract
Over the last decade, the growing influence of open source software has necessitated the need to reduce the abstraction levels in hardware design. Open source hardware significantly reduces development time, increases the probability of first-pass success, and enables developers to optimize software solutions based on hardware features, thereby reducing design costs. The recent introduction of the open source Process Development Kit (OpenPDK) by SkyWater Technology in June 2020 has eliminated the barriers to Application-Specific Integrated Circuit (ASIC) design, which is otherwise considered expensive and not easily accessible. The OpenPDK is the first concrete step towards achieving the goal of open source circuit blocks that can be imported for reuse and modification in ASIC design. With process technologies scaling down for better performance, the need for entirely digital designs, which can be synthesized in any standard Automatic Place-and-Route (APR) tool, has increased considerably for mapping physical designs to new process technologies. This work presents the first open source all-digital Serializer/Deserializer (SerDes) for multi-GHz serial links, designed using the SkyWater OpenPDK 130nm process node. To ensure that the design is fully synthesizable, the SerDes uses CMOS-inverter-based drivers at the transmitter, while the receiver front end comprises a resistive feedback inverter as a sensing element, followed by sampling elements. A fully digital oversampling CDR at the receiver end recovers the transmitter clock for proper decoding of data bits. The physical design flow utilizes OpenLANE, an open source end-to-end tool for generating GDS from RTL. Cadence Virtuoso has been used for extracting parasitics for post-layout simulations, which exhibit the SerDes functionality at 2 Gbps for 34 dB channel loss while consuming 438 mW power.
The generated GDS and netlist files of the SerDes, along with the required documentation, are uploaded in a GitHub repository for public access.
18:00 CEST IP3_3.2 CONSTRUCTIVE USE OF PROCESS VARIATIONS: RECONFIGURABLE AND HIGH-RESOLUTION DELAY-LINE
Speaker:
Xiaolin Xu, Northeastern University, US
Authors:
Wenhao Wang1, Yukui Luo2 and Xiaolin Xu2
1ECE Department of Northeastern University, US; 2Northeastern University, US
Abstract
The delay-line is a critical circuit component for high-speed electronic design and testing, such as in high-performance FPGAs and ASICs, providing timing signals of specific duration or duty cycle. However, the performance of existing CMOS-based delay-lines is limited by various practical issues. For example, the minimum propagation delay (resolution) of CMOS gates is limited by process variations from circuit fabrication. This paper presents a novel delay-line scheme which, instead of mitigating the process variations from circuit fabrication, constructively leverages them to generate time signals of specific duration. Moreover, the resolution of the proposed delay-line is reconfigurable, for which we propose a machine learning modeling method to assist such reconfiguration, i.e., to generate time durations of different scales. The performance of the proposed delay-line is validated with HSpice simulation and a prototype on a Xilinx Virtex-6 FPGA evaluation kit. The experimental results demonstrate the effectiveness of the proposed delay-line.
18:01 CEST 3.6.3 (Best Paper Award Candidate)
DIGITAL TEST OF ZIGBEE TRANSMITTERS: VALIDATION IN INDUSTRIAL TEST ENVIRONMENT
Speaker:
Thibault Vayssade, University Montpellier, CNRS, LIRMM, FR
Authors:
Thibault Vayssade1, Florence Azais1, Laurent Latorre1 and François Lefevre2
1University Montpellier, CNRS, LIRMM, FR; 2NXP Semiconductors, FR
Abstract
This paper presents the validation of a low-cost solution for production testing of ZigBee transmitters in an industrial environment. The solution relies on 1-bit acquisition of a 2.4 GHz signal with a standard digital ATE channel using harmonic sampling. Dedicated post-processing algorithms are then applied to the low-frequency binary vector captured by the ATE to retrieve the RF signal characteristics and implement the tests specified by IEEE Std 802.15.4. Experimental results collected on more than 1,500 units of a ZigBee transceiver from NXP Semiconductors are presented.

3.7 Lightweight Machine Learning at the Edge

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/xWiyMQQuo7Bq8WctR

Session chair:
Marina Zapater Sancho, HEIG-VD / HES-SO, CH

Session co-chair:
Diana Goehringer, TU Dresden, DE

This session is dedicated to innovative edge methods and applications. The first paper introduces a quantization framework, AIQ, to support adaptation at the edge with inference-level bit widths, leveraging gated weight buffering and dynamic error scaling. The second paper proposes a lightweight dedicated hyperdimensional-computing platform that targets low power, high energy efficiency, and low latency, while being configurable to support various applications. Both the third regular paper and one of the IP papers optimize communication costs in multi-camera systems, using Long Short-Term Memory to overcome the limitations of convolutional networks for fall-detection applications, and using a learning-based super-resolution enhancer to compensate for performance degradation. Finally, one IP paper proposes a neural architecture search framework featuring dynamic channel scaling, to maximize accuracy under a given latency, and progressive space shrinking to refine the search space.

Time Label Presentation Title
Authors
17:30 CEST 3.7.1 A QUANTIZATION FRAMEWORK FOR NEURAL NETWORK ADAPTION AT THE EDGE
Speaker:
Mengyuan Li, University of Notre Dame, US
Authors:
Mengyuan Li and Xiaobo Sharon Hu, University of Notre Dame, US
Abstract
Edge devices employing a neural network (NN) inference engine running a pre-trained model often perform poorly or simply fail in unseen situations. Meta learning, consisting of meta training, NN adaptation, and inference, has been shown to be quite effective in quickly learning and responding to a new environment. The adaptation phase, including both forward and backward computation, should be performed on edge devices to maximize the benefit in few-shot learning applications. However, deploying high-precision, full-blown training accelerators at the edge can be rather costly for most Internet of Things applications. This paper reveals some unique observations about the adaptation phase and introduces a quantization framework, AIQ, based on these observations to support adaptation at the edge with inference-level bit widths. AIQ includes two key ideas, i.e., gated weight buffering and dynamic error scaling, to reduce memory and computational needs with minimal sacrifice in accuracy. Major modules of AIQ are synthesized and evaluated. Experimental results show that AIQ saves 41% and 70% of weight memory for two widely used datasets while incurring minimal hardware overhead and negligible accuracy loss.
17:45 CEST 3.7.2 TINY-HD: ULTRA-EFFICIENT HYPERDIMENSIONAL COMPUTING ENGINE FOR IOT APPLICATIONS
Speaker:
Behnam Khaleghi, University of California San Diego, US
Authors:
Behnam Khaleghi1, Hanyang Xu1, Justin Morris1 and Tajana Rosing2
1University of California, San Diego, US; 2UCSD, US
Abstract
Hyperdimensional computing (HD) is a new brain-inspired algorithm that mimics the human brain for cognitive tasks. Despite its inherent potential, the practical efficiency of HD is tied to the underlying hardware, which throttles the efficiency of HD on conventional microprocessors. In this paper, we propose tiny-HD, a lightweight dedicated HD platform that targets low power, high energy efficiency, and low latency, while being configurable to support various applications. We leverage an enhanced HD encoding that alleviates the memory requirements and also simplifies the dataflow to make tiny-HD flexible with an efficient architecture. We further augment tiny-HD by pipelining the stages and sharing resources, as well as with a data layout that enables opportunistic power reduction. We compared tiny-HD in terms of area, performance, power, and energy consumption with the state-of-the-art HD platforms. tiny-HD occupies 0.5 mm^2, consumes 1.6 mW standby and 9.6 mW runtime power (at 400 MHz), with a 0.016 ms latency on a set of IoT benchmarks. tiny-HD consumes an average per-query energy of 160 nJ, which outperforms the state-of-the-art FPGA and ASIC implementations by 95.5x and 11.2x, respectively.
18:00 CEST IP3_4.1 RESOLUTION-AWARE DEEP MULTI-VIEW CAMERA SYSTEMS
Speaker:
Zeinab Hakimi, Pennsylvania State University, US
Authors:
Zeinab Hakimi1 and Vijaykrishnan Narayanan2
1Pennsylvania State University, US; 2Penn State University, US
Abstract
Recognizing 3D objects from multiple views is an important problem in computer vision. However, multi-view object recognition can be challenging for networked embedded intelligent systems (IoT devices), as they have data transmission limitations as well as computational resource constraints. In this work, we design an enhanced multi-view distributed recognition system which deploys a view-importance estimator to transmit data at different resolutions. Moreover, a multi-view learning-based super-resolution enhancer is used at the back end to compensate for the performance degradation caused by information loss from resolution reduction. Extensive experiments on the benchmark dataset demonstrate that the designed resolution-aware multi-view system can decrease the endpoint's communication energy by a factor of 5x while sustaining accuracy. Further experiments on the enhanced multi-view recognition system show that accuracy improvements can be achieved with minimal effect on the computational cost of the back-end system.
18:01 CEST IP3_4.2 HSCONAS: HARDWARE-SOFTWARE CO-DESIGN OF EFFICIENT DNNS VIA NEURAL ARCHITECTURE SEARCH
Speaker:
Xiangzhong Luo, Nanyang Technological University, SG
Authors:
Xiangzhong Luo, Di Liu, Shuo Huai and Weichen Liu, Nanyang Technological University, SG
Abstract
In this paper, we present a novel multi-objective hardware-aware neural architecture search (NAS) framework, namely HSCoNAS, to automate the design of deep neural networks (DNNs) with high accuracy but low latency upon target hardware. To accomplish this goal, we first propose an effective hardware performance modeling method to approximate the runtime latency of DNNs on target hardware, which will be integrated into HSCoNAS to avoid the tedious on-device measurements. Besides, we propose two novel techniques, i.e., dynamic channel scaling to maximize the accuracy under the specified latency and progressive space shrinking to refine the search space towards target hardware as well as alleviate the search overheads. These two techniques jointly work to allow HSCoNAS to perform fine-grained and efficient explorations. Finally, an evolutionary algorithm (EA) is incorporated to conduct the architecture search. Extensive experiments on ImageNet are conducted upon diverse target hardware, i.e., GPU, CPU, and edge device to demonstrate the superiority of HSCoNAS over recent state-of-the-art approaches.
18:02 CEST 3.7.3 A VIDEO-BASED FALL DETECTION NETWORK BY SPATIO-TEMPORAL JOINT-POINT MODEL ON EDGE DEVICES
Speaker:
Ziyi Guan, Southern University of Science and Technology, CN
Authors:
Ziyi Guan1, Shuwei Li1, Yuan Cheng2, Changhai Man1, Wei Mao1, Ngai Wong3 and Hao Yu1
1Southern University of Science and Technology, CN; 2Shanghai Jiao Tong University, CN; 3University of Hong Kong, CN
Abstract
Tripping or falling is among the top threats in elderly healthcare, and the development of automatic fall detection systems is of considerable importance. With the fast development of the Internet of Things (IoT), camera-vision-based solutions have drawn much attention in recent years. Traditional fall video analysis on the cloud incurs significant communication overhead. This work introduces a fast and lightweight video fall detection network based on a spatio-temporal joint-point model to overcome these hurdles. Instead of detecting falling motion with traditional Convolutional Neural Networks (CNNs), we propose a Long Short-Term Memory (LSTM) model based on time-series joint-point features, extracted by a pose extractor and then filtered by a geometric joint-point filter. Experiments are conducted to verify the proposed framework, which shows a high sensitivity of 98.46% on the Multiple Cameras Fall Dataset and 100% on the UR Fall Dataset. Furthermore, our model can perform pose estimation tasks simultaneously, attaining 73.3 mAP on the COCO keypoint challenge dataset, which outperforms OpenPose by 8%.

3.8 Industrial Design Methods and Tools: RISC-V

Date: Tuesday, 02 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/hdCNhvWzswFnA7Wwp

Organizer:
Jürgen Haase, edacentrum GmbH, DE

This Exhibition Workshop features industrial design methods and tools. It is open to conference delegates as well as to exhibition visitors.

Time Label Presentation Title
Authors
17:30 CEST 3.8.1 ANDES RISC-V PROCESSOR IP SOLUTIONS
Speaker:
Florian Wohlrab, Andes Technology, TW
Abstract

The SoC industry has seen fast-growing and diversified demands for a wide range of RISC-V based products: from tiny low-power MCUs for consumer devices to chips powering enterprise-grade products and datacenter servers; from one power-efficient core to a thousand GHz+ cores working cohesively. To serve the market, Andes has developed a rich portfolio of AndesCore processor IPs already used in the above scenarios. They range from compact single-issue cores to feature-rich Linux-capable superscalar cores, from cacheless single cores to cache-coherent multicores, and from cores capable of processing floating-point and DSP data to those crunching large volumes of vector data. Building on this solid foundation, Andes continues to enrich its product offerings for higher performance efficiency as well as more flexible configurations.

In this talk, we will first give an overview of Andes' existing V5 RISC-V processor lineup and present examples of how V5 processors are used in SoCs. Then, we will introduce V5 IPs newly added to the Andes processor portfolio, the associated software support and their performance data. We will provide an update on Andes Custom Extension™ (ACE) and show how it can further accelerate control and data paths in applications.


IP3_1 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/jpCmxWZZBBXmFoAEm

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP3_1.1 SYNTHESIS OF SI CIRCUITS FROM BURST-MODE SPECIFICATIONS
Speaker:
Alex Chan, Newcastle University, GB
Authors:
Alex Chan1, Danil Sokolov1, Victor Khomenko1, David Lloyd2 and Alex Yakovlev1
1Newcastle University, GB; 2Dialog Semiconductor, GB
Abstract
In this paper, we present a new workflow that is based on the conversion of Extended Burst-Mode (XBM) specifications to Signal Transition Graphs (STGs). While XBMs offer a simple design entry to specify asynchronous circuits, they cannot be synthesised into speed-independent (SI) circuits, due to the 'burst mode' timing assumption inherent in the model. Furthermore, XBM synthesis tools are no longer supported, and there are no dedicated tools for formal verification of XBMs. Our approach addresses these issues, by granting the XBMs access to sophisticated synthesis and verification tools available for STGs, as well as the possibility to synthesise SI circuits. Experimental results show that our translation only linearly increases the model size and that our workflow achieves a much improved synthesis success rate, with a 33% average reduction in the literal count.
IP3_1.2 LOW-LATENCY ASYNCHRONOUS LOGIC DESIGN FOR INFERENCE AT THE EDGE
Speaker:
Adrian Wheeldon, Newcastle University, GB
Authors:
Adrian Wheeldon1, Alex Yakovlev1, Rishad Shafik1 and Jordan Morris2
1Newcastle University, GB; 2ARM Ltd, Newcastle University, GB
Abstract
Modern internet of things (IoT) devices leverage machine learning inference on sensed data on-device rather than offloading it to the cloud. Commonly known as inference at the edge, this gives many benefits to users, including personalization and security. However, such applications demand high energy efficiency and robustness. In this paper we propose a method for reducing the area and power overhead of self-timed early-propagative asynchronous inference circuits, designed using the principles of learning automata. Owing to the natural resilience of both their timing and their underlying logic, the circuits are tolerant to variations in environment and supply voltage whilst enabling the lowest possible latency. Our method is exemplified through an inference datapath for a low-power machine learning application. The circuit builds on the Tsetlin machine algorithm, further enhancing its energy efficiency. Average latency of the proposed circuit is reduced by 10x compared with the synchronous implementation whilst maintaining similar area. Robustness of the proposed circuit is proven through post-synthesis simulation with 0.25 V to 1.2 V supply. Functional correctness is maintained and latency scales with gate delay as voltage is decreased.

IP3_2 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/G2ovS3MSZWxuNM6cP

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP3_2.1 GENERIC SAMPLE PREPARATION FOR DIFFERENT MICROFLUIDIC PLATFORMS
Speaker:
Sudip Poddar, Johannes Kepler University Linz, AT
Authors:
Sudip Poddar1, Gerold Fink2, Werner Haselmayr1 and Robert Wille2
1Johannes Kepler University, AT; 2Johannes Kepler University Linz, AT
Abstract
Sample preparation plays a crucial role in several medical applications. Microfluidic devices or Labs-on-Chips (LoCs) have become established as a suitable solution to realize this task in a miniaturized, integrated, and automatic fashion. Over the years, a variety of different microfluidic platforms emerged, all with their respective pros and cons. Accordingly, numerous approaches for sample preparation have been proposed—each specialized to a single platform only. In this work, we propose an idea towards a generic sample preparation approach which generalizes the constraints of the different microfluidic platforms and, by this, provides a platform-independent sample preparation method. This allows designers to quickly check which existing platform is most suitable for the considered task and to easily support upcoming and future microfluidic platforms as well. We illustrate the applicability of the proposed method with examples for various platforms.
IP3_2.2 RAISE: A RESISTIVE ACCELERATOR FOR SUBJECT-INDEPENDENT EEG SIGNAL CLASSIFICATION
Speaker:
Fan Chen, Duke University, US
Authors:
Fan Chen1, Linghao Song1, Hai (Helen) Li2 and Yiran Chen1
1Duke University, US; 2Duke University/TUM-IAS, US
Abstract
State-of-the-art deep neural networks (DNNs) for electroencephalography (EEG) signal classification focus on subject-related tasks, in which the test data and the training data need to be collected from the same subject. In addition, due to limited computing resources and strict power budgets at the edge, it is very challenging to deploy the inference of such DNN models on biological devices. In this work, we present an algorithm/hardware co-designed low-power accelerator for subject-independent EEG signal classification. We propose a compact neural network that is capable of identifying the common and stable structure among subjects. Based on it, we realize a robust subject-independent EEG signal classification model that can be extended to multiple BCI tasks with minimal overhead. On top of this model, we present RAISE, a low-power processing-in-memory inference accelerator leveraging emerging resistive memory. We compare the proposed model and hardware accelerator to prior arts across various BCI paradigms. We show that our model achieves the best subject-independent classification accuracy, while RAISE achieves a 2.8x power reduction and a 2.5x improvement in performance per watt compared to the state-of-the-art resistive inference accelerator.

IP3_3 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/fd6MteFcgFbqQecQy

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP3_3.1 DOUBLE DQN FOR CHIP-LEVEL SYNTHESIS OF PAPER-BASED DIGITAL MICROFLUIDIC BIOCHIPS
Speaker:
Fang-Chi Wu, Department of Computer Science and Engineering, National Sun Yat-Sen University, TW
Authors:
Fang-Chi Wu1, Jian-De Li2, Katherine Shu-Min Li1, Sying-Jyan Wang2 and Tsung-Yi Ho3
1National Sun Yat-sen University, TW; 2National Chung Hsing University, TW; 3National Tsing Hua University, TW
Abstract
Paper-based digital microfluidic biochip (PB-DMFB) technology is one of the most promising solutions in biochemical applications due to the paper substrate. The paper substrate makes PB-DMFBs more portable, cost-effective, and less dependent on manufacturing equipment. However, the single-layer paper substrate, which entangles electrodes, conductive wires, and droplet routing in the same layer, raises challenges to chip-level synthesis of PB-DMFBs. Furthermore, current design automation tools have to address various design issues including manufacturing cost, reliability, and security. Therefore, a more flexible chip-level synthesis method is necessary. In this paper, we propose the first reinforcement learning based chip-level synthesis for PB-DMFBs. Double deep Q-learning networks are adapted for the agent to select and estimate actions, and then we obtain the optimized synthesis results. Experimental results show that the proposed method is not only effective and efficient for chip-level synthesis but also scalable to reliability- and security-oriented schemes.
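The double deep Q-learning update the abstract mentions can be sketched generically: the online network selects the next action and the target network evaluates it, reducing the overestimation bias of vanilla Q-learning. The networks are stubbed as value lists here; the synthesis-specific state/action encoding is the paper's contribution and is not reproduced.

```python
# Generic double-DQN regression target for one transition (illustrative).

GAMMA = 0.9   # discount factor (assumed value)

def ddqn_target(reward, next_q_online, next_q_target, done):
    """Double DQN: online net picks the action, target net evaluates it."""
    if done:
        return reward
    best_a = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + GAMMA * next_q_target[best_a]

# Online net prefers action 1; the target net evaluates that action (2.0)
# rather than its own maximum (action 0, 5.0), which curbs overestimation.
y = ddqn_target(reward=1.0,
                next_q_online=[0.2, 0.9],
                next_q_target=[5.0, 2.0],
                done=False)
print(y)  # 1.0 + 0.9 * 2.0 = 2.8
```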
IP3_3.2 CONSTRUCTIVE USE OF PROCESS VARIATIONS: RECONFIGURABLE AND HIGH-RESOLUTION DELAY-LINE
Speaker:
Xiaolin Xu, Northeastern University, US
Authors:
Wenhao Wang1, Yukui Luo2 and Xiaolin Xu2
1ECE Department of Northeastern University, US; 2Northeastern University, US
Abstract
The delay-line is a critical circuit component for high-speed electronic design and testing, such as in high-performance FPGAs and ASICs, providing timing signals of specific duration or duty cycle. However, the performance of existing CMOS-based delay-lines is limited by various practical issues. For example, the minimum propagation delay (resolution) of CMOS gates is limited by the process variations from circuit fabrication. This paper presents a novel delay-line scheme, which, instead of mitigating the process variations from circuit fabrication, constructively leverages them to generate time signals of specific duration. Moreover, the resolution of the proposed delay-line method is reconfigurable, for which we propose a Machine Learning modeling method to assist such reconfiguration, i.e., to generate time duration of different scales. The performance of the proposed delay-line is validated with HSpice simulation and prototype on a Xilinx Virtex-6 FPGA evaluation kit. The experimental results demonstrate that the proposed delay
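The constructive use of variation can be sketched as follows: instead of treating per-stage process variation as noise, measure each stage's actual delay and select the subset of stages whose measured delays sum closest to a requested duration. The stage delays and the greedy selection below are illustrative; they are not the paper's ML-assisted reconfiguration method.

```python
# Illustrative tap selection over measured (process-varied) stage delays.

def select_taps(stage_delays_ps, target_ps):
    """Greedily accumulate stages (shortest first) without exceeding
    the target delay; returns chosen stage indices and achieved delay."""
    chosen, total = [], 0.0
    for i, d in sorted(enumerate(stage_delays_ps), key=lambda t: t[1]):
        if total + d <= target_ps:
            chosen.append(i)
            total += d
    return sorted(chosen), total

# Measured delays of five nominally 10 ps stages after fabrication:
delays = [11.2, 9.4, 10.7, 8.9, 10.1]
taps, achieved = select_taps(delays, target_ps=30.0)
print(taps, achieved)  # taps [1, 3, 4], ~28.4 ps of the requested 30 ps
```

Because the per-chip variation pattern differs, the selection (and hence the achievable resolutions) is chip-specific, which is precisely what the reconfiguration model has to learn.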

IP3_4 Interactive Presentations

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/c3kZHSMFp9WHTDNNG

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP3_4.1 RESOLUTION-AWARE DEEP MULTI-VIEW CAMERA SYSTEMS
Speaker:
Zeinab Hakimi, Pennsylvania State University, US
Authors:
Zeinab Hakimi1 and Vijaykrishnan Narayanan2
1Pennsylvania State University, US; 2Penn State University, US
Abstract
Recognizing 3D objects from multiple views is an important problem in computer vision. However, multi-view object recognition can be challenging for networked embedded intelligent systems (IoT devices), as they have data transmission limitations as well as computational resource constraints. In this work, we design an enhanced multi-view distributed recognition system which deploys a view-importance estimator to transmit data at different resolutions. Moreover, a multi-view learning-based super-resolution enhancer is used at the back-end to compensate for the performance degradation caused by information loss from resolution reduction. Extensive experiments on the benchmark dataset demonstrate that the designed resolution-aware multi-view system can decrease the endpoint's communication energy by a factor of 5X while sustaining accuracy. Further experiments on the enhanced multi-view recognition system show that accuracy improvements can be achieved with minimal effect on the computational cost of the back-end system.
IP3_4.2 HSCONAS: HARDWARE-SOFTWARE CO-DESIGN OF EFFICIENT DNNS VIA NEURAL ARCHITECTURE SEARCH
Speaker:
Xiangzhong Luo, Nanyang Technological University, SG
Authors:
Xiangzhong Luo, Di Liu, Shuo Huai and Weichen Liu, Nanyang Technological University, SG
Abstract
In this paper, we present a novel multi-objective hardware-aware neural architecture search (NAS) framework, namely HSCoNAS, to automate the design of deep neural networks (DNNs) with high accuracy but low latency upon target hardware. To accomplish this goal, we first propose an effective hardware performance modeling method to approximate the runtime latency of DNNs on target hardware, which will be integrated into HSCoNAS to avoid the tedious on-device measurements. Besides, we propose two novel techniques, i.e., dynamic channel scaling to maximize the accuracy under the specified latency and progressive space shrinking to refine the search space towards target hardware as well as alleviate the search overheads. These two techniques jointly work to allow HSCoNAS to perform fine-grained and efficient explorations. Finally, an evolutionary algorithm (EA) is incorporated to conduct the architecture search. Extensive experiments on ImageNet are conducted upon diverse target hardware, i.e., GPU, CPU, and edge device to demonstrate the superiority of HSCoNAS over recent state-of-the-art approaches.
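The hardware performance-modeling step described above can be sketched minimally: approximate a candidate network's on-device latency as the sum of per-layer latencies read from a table measured once per target platform, so the search never needs on-device measurement. The table values and layer keys below are made up; HSCoNAS's actual model is more elaborate.

```python
# Lookup-table latency predictor for hardware-aware NAS (illustrative).

# Latency (ms) measured once per (layer type, config) on the target device.
LATENCY_TABLE_MS = {
    ("conv3x3", 32): 1.8,
    ("conv3x3", 64): 3.5,
    ("conv5x5", 32): 2.9,
    ("pool", 0): 0.2,
    ("fc", 1000): 0.6,
}

def predict_latency(architecture):
    """architecture: list of (layer type, config) keys into the table."""
    return sum(LATENCY_TABLE_MS[layer] for layer in architecture)

candidate = [("conv3x3", 32), ("conv5x5", 32), ("pool", 0), ("fc", 1000)]
print(predict_latency(candidate))  # ≈ 5.5 ms
```

A search algorithm (here, the paper's evolutionary algorithm) can then reject candidates whose predicted latency exceeds the budget without touching the device.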

UB.08 University Booth

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/yyKegZc6NTxSCQx6t

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.08 DEFACTO: DESIGN AUTOMATION FOR SMART FACTORIES
Speaker:
Michele Lora, University of Verona, IT
Authors:
Michele Lora1, Pierluigi Nuzzo2 and Franco Fummi1
1University of Verona, IT; 2University of Southern California, US
Abstract
The DeFacto project develops modeling paradigms, algorithms, and tools for the design of advanced manufacturing systems. Central to the project is the CHASE framework, combining a pattern-based specification language with a rigorous synthesis and verification back-end based on assume-guarantee contracts. The front-end supports automatic translation of requirements to low-level mathematical languages. The synthesis and verification back-end uses the mathematical formalism of contracts to reason about the design from specification to implementation.

The demonstration shows the application of CHASE to the design of the control software governing a set of manufacturing tasks. Components and operations are specified in CHASE and formalized using contracts. CHASE coordinates its back-end tools to validate system requirements and to generate and validate implementations, highlighting the effectiveness of the decomposition mechanisms provided by contracts in the design of complex systems.


UB.09 University Booth

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/xGkFbnq8AAbkFkkCu

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.09 GREYHOUND: DEEP FUZZING IOT WIRELESS PROTOCOLS
Speaker:
Matheus E. Garbelini, Singapore University of Technology and Design, SG
Authors:
Sudipta Chattopadhyay and Matheus E. Garbelini, Singapore University of Technology and Design, SG
Abstract
In this booth, we present our recent works on automatically discovering (IoT) wireless protocol vulnerabilities.
We discuss how well-understood IoT wireless protocol features can go wrong at the design or implementation phase and contribute to the latest relevant Wi-Fi and Bluetooth vulnerabilities that challenge our current trust in IoT technologies. We also dive deep into state-of-the-art wireless testing, which currently lacks proper tools compared to common software testing, and present a unique insight into how to apply over-the-air testing to discover wireless vulnerabilities using off-the-shelf hardware. Lastly, we present the core fuzzing ideas (nicknamed "Greyhound") that made the discovery of SweynTooth possible (a set of Bluetooth Low Energy vulnerabilities affecting millions of IoT products) and discuss why related vulnerabilities can remain under the nose of wireless system-on-chip vendors for many years without notice.
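At the heart of any such fuzzer is a mutation primitive: take a valid protocol packet and derive test cases, e.g. by flipping bits before sending it over the air. Greyhound itself is model-based and far more targeted; the sketch below only shows the generic primitive, and the "packet" bytes are made up.

```python
# Generic bit-flip mutation primitive of a packet fuzzer (illustrative).
import random

def mutate(packet: bytes, n_flips: int = 1, seed: int = 0) -> bytes:
    """Return a copy of `packet` with `n_flips` random single-bit flips."""
    rng = random.Random(seed)             # seeded for reproducibility
    data = bytearray(packet)
    for _ in range(n_flips):
        i = rng.randrange(len(data))      # pick a byte ...
        data[i] ^= 1 << rng.randrange(8)  # ... and flip one of its bits
    return bytes(data)

base = b"\x02\x01\x06\x0a\x09"            # made-up packet prefix
fuzzed = mutate(base)
print(fuzzed != base, len(fuzzed) == len(base))  # True True
```

A protocol-aware fuzzer constrains such mutations with a state model of the protocol so that the target actually reaches the vulnerable handshake states.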


UB.10 University Booth

Date: Tuesday, 02 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/gvcJjbzpJ4ETDuGyh

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.10 SRAM-PUF: PLATFORM FOR ACQUISITION OF SRAM-BASED PUFS FROM MICRO-CONTROLLERS
Speaker:
Sergio Vinagrero, University of Grenoble Alpes, FR
Authors:
Sergio Vinagrero1, Honorio Martin2, Ioana Vatajelu3 and Giorgio Di Natale3
1University of Grenoble Alpes, FR; 2University Carlos III of Madrid, ES; 3TIMA-CNRS, FR
Abstract
This demonstration shows a versatile platform for the acquisition of the content of SRAM memories embedded in microcontrollers at power-up. The platform is able to power-off and -on hundreds of microcontrollers and to retrieve the content of their SRAMs thanks to a scan chain connecting all boards. The data collected is then stored in a database to enable reliability analysis.
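The analysis such a platform enables rests on a simple property: each device's power-up SRAM bits are mostly stable across power cycles but differ between devices, so fractional Hamming distance separates "same device" from "different device". The bit patterns below are fabricated for illustration.

```python
# Fractional Hamming distance between two SRAM power-up dumps (illustrative).

def hamming_frac(a: bytes, b: bytes) -> float:
    """Fraction of differing bits between two equal-length dumps."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (8 * len(a))

dev_a_run1 = bytes([0b10110010, 0b01100101])
dev_a_run2 = bytes([0b10110011, 0b01100101])   # 1 noisy bit out of 16
dev_b_run1 = bytes([0b01011100, 0b10011010])

print(hamming_frac(dev_a_run1, dev_a_run2))  # 0.0625 -> same device
print(hamming_frac(dev_a_run1, dev_b_run1))  # 0.875  -> different device
```

Repeating this over hundreds of boards and many power cycles, as the platform automates, yields the intra- and inter-device distance distributions needed for reliability analysis.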

4.1 Digital Twins

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 08:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/rTGM9SfGvaQh2pYXM

Session chair:
Roland Jancke, Fraunhofer IIS, Division Engineering of Adaptive Systems EAS, DE

Session co-chair:
Jean-Marie Brunet, Mentor, US

Organizers:
Enrico Macii, Politecnico di Torino, IT
Frank Schirrmeister, Cadence, US

Digital Twins (DTs) are becoming essential instruments in the process of industry digitalization. This session presents different applications of digital twins. First, DTs can be used for enabling and supporting car-as-a-service. Second, they can be exploited to monitor and optimize extra-functional aspects in production lines, such as energy consumption and communications. Next, cognitive DTs are introduced as the next stage of advancement of a digital twin that will help realize the vision of Industry 4.0. Finally, technologies for dynamically introducing fault structures into DTs without the need to change the virtual prototype model are presented.

Time Label Presentation Title
Authors
07:00 CEST 4.1.1 ENABLING AND SUPPORTING CAR-AS-A-SERVICE BY DIGITAL TWIN MODELING AND DEPLOYMENT
Speaker:
Charles Steinmetz, University of Applied Sciences Hamm-Lippstadt, DE
Authors:
Charles Steinmetz1, Greyce N. Schroeder2, Achim Rettberg3, Ricardo Nagel Rodrigues4 and Carlos Eduardo Pereira2
1Hochschule Hamm-Lippstadt - Campus Lippstadt, DE; 2Federal University of Rio Grande do Sul, BR; 3University of Applied Science Hamm-Lippstadt & University Oldenburg, DE; 4FURG, BR
Abstract
Smart City is one application area of the Internet of Things (IoT) and has been attracting attention from both academia and industry. Cities will be composed of autonomous parts that communicate and provide services to each other. For instance, cars (autonomous or not) may be seen as a service that transports people from one point to another. Interactions between users and these kinds of services will grow, making it necessary to digitize all these parts of the Smart City. The Digital Twin (DT) concept proposes that real-world assets have a virtual representation connecting the physical world with the cyber world. This makes it possible to track the whole life-cycle of an object as well as to perform simulations with current or previously stored data. In this context, this work proposes the use of Digital Twins for enabling and supporting car-as-a-service (CaaS). A case study has been developed to demonstrate the modeling and deployment of the Digital Twin, highlighting how this concept can be one of the key enablers for CaaS.
07:15 CEST 4.1.2 DIGITAL TWIN EXTENSION WITH EXTRA-FUNCTIONAL PROPERTIES
Speaker:
Sara Vinco, Politecnico di Torino, IT
Authors:
Khaled Alamin1, Nicola Dall'Ora2, Enrico Fraccaroli3, Sara Vinco1, Davide Quaglia2 and Massimo Poncino1
1Politecnico di Torino, IT; 2University of Verona, IT; 3Università degli Studi di Verona, IT
Abstract
Digital twins of production lines do not focus solely on the management of the production process; they can also monitor and optimize other extra-functional aspects such as energy consumption and communications. This paper proposes the extension of the digital twin concept in these directions. First, we extend the digital twin with models of energy consumption that allow the monitoring of production line components throughout the production lifetime. Then, we propose a flow to design the communication network starting from information obtained from the digital twin concerning the production, usage and flow of information through the plant. All these methodologies start from the production line specification, enrich it with data collected during operation, and finally use this information to perform design and optimization. Results are shown on a real Industry 4.0 research facility.
07:30 CEST 4.1.3 COGNITIVE DIGITAL TWIN FOR MANUFACTURING SYSTEMS
Speaker:
Mohammad Abdullah Al Faruque, University of California, Irvine, US
Authors:
Mohammad Al Faruque1, Deepan Muthirayan2, Shih-Yuan Yu2 and Pramod P. Khargonekar2
1University of California Irvine, US; 2University of California, Irvine, US
Abstract
A digital twin is the virtual replica of a physical system. Digital twins are useful because they provide models and data for design, production, operation, diagnostics, and autonomy of machines and products. Hence, the digital twin has been projected as the key enabler of the vision of Industry 4.0. The digital twin concept has become increasingly sophisticated and capable over time, enabled by many technologies. In this paper, we propose the cognitive digital twin as the next stage of advancement of a digital twin that will help realize the vision of Industry 4.0. Cognition, which is inspired by advancements in cognitive science, machine learning, and artificial intelligence, will enable a digital twin to achieve some critical elements of cognition, e.g., attention (selective focusing), perception (forming useful representations of data), memory (encoding and retrieval of information and knowledge), etc. Our main thesis is that cognitive digital twins will allow enterprises to creatively, effectively, and efficiently exploit implicit knowledge drawn from the experience of existing manufacturing systems, enable the transfer of higher-performance decisions and control, and improve performance across the enterprise (at scale). Finally, we present open questions and challenges to realize these capabilities in a digital twin.
07:45 CEST 4.1.4 DYNAMIC FAULT INJECTION INTO DIGITAL TWINS OF SAFETY-CRITICAL SYSTEMS
Speaker:
Thomas Markwirth, Fraunhofer EAS/ IIS, DE
Authors:
Thomas Markwirth1, Roland Jancke2 and Christoph Sohrmann2
1Fraunhofer EAS/IIS, DE; 2Fraunhofer IIS/EAS, DE
Abstract
In this work we present a technology for dynamically introducing fault structures into digital twins without the need to change the virtual prototype model. The injection is done at the beginning of a simulation by dynamically rewiring the involved netlists. During simulation on a real-time platform, faults can be activated or deactivated, triggered by sequences, statistical effects or events from the real world. In some cases the fault structures can even be auto-generated directly from a formal specification, which further automates the development process for safety-relevant systems. The approach is demonstrated on a SystemC/SystemC AMS virtual prototype of a safety-critical sub-system which runs on dSPACE real-time hardware.
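The rewiring idea can be sketched abstractly: represent the prototype's netlist as pin-to-net connections and inject a stuck-at fault by redirecting a pin to a constant net at simulation start, leaving the component models untouched. The net and pin names below are invented; the real flow operates on SystemC/SystemC AMS netlists.

```python
# Fault injection by netlist rewiring (illustrative, dictionary netlist).

# pin -> net it is connected to
netlist = {
    "sensor.out": "n1",
    "adc.in": "n1",
    "adc.out": "n2",
    "cpu.in": "n2",
}

def inject_stuck_at(netlist, pin, const_net):
    """Rewire `pin` to a constant net (e.g. stuck-at-0) without editing
    the component models; returns the original net so the fault can be
    deactivated again during the run."""
    original = netlist[pin]
    netlist[pin] = const_net
    return original

saved = inject_stuck_at(netlist, "adc.in", "const_0")
print(netlist["adc.in"])   # const_0 : fault active
netlist["adc.in"] = saved  # deactivate the fault
print(netlist["adc.in"])   # n1
```

Because only the wiring changes, the same mechanism supports activating and deactivating faults at run time, as the abstract describes.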

4.2 Computing for Autonomy

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 08:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/RbqNQAW2gYDXKksno

Session chair:
Saibal Mukhopadhyay, Georgia Tech, US

Session co-chair:
Dimitrios Serpanos, University of Patras, GR

Organizer:
Marilyn Wolf, University of Nebraska at Lincoln, US

Autonomous systems present computational loads that are profoundly different from both traditional algorithmic workloads and non-real-time machine learning. Autonomous systems must operate in real time with low latency for critical operations. They must operate on limited power and thermal budgets given their untethered operation. Machine learning systems apply a wide range of numerical precision to manage bandwidth and power. They must also meet strict safety and security requirements. As a result, computing for autonomy will require novel computing devices and systems at the node and network levels. This special session will explore computing for autonomy as a co-design problem---how do we understand the requirements that autonomy poses on computing systems and how do we build computing platforms that meet those requirements without overdesign?

Time Label Presentation Title
Authors
07:00 CEST 4.2.1 MISSION SPECIFICATION AND EXECUTION OF MULTIDRONE SYSTEMS
Speaker:
Markus Gutmann, University of Klagenfurt, AT
Authors:
Markus Gutmann1 and Bernhard Rinner2
1Alpen-Adria Universität Klagenfurt, AT; 2University of Klagenfurt, AT
Abstract
Small unmanned aerial vehicles, commonly called drones, enable novel applications in many domains. Multidrone systems are a current key trend where several drones operate collectively as an integrated networked autonomous system to complete various missions. The specification and execution of multidrone missions are particularly challenging, since substantial expertise of the mission domain, the drone’s capabilities, and the drones’ software environment is required to properly encode the mission. In this position paper, we introduce a specification language for multidrone missions and describe the transcoding of its components into the multidrone execution environment for both simulations and real drones. The key features of our approach include (i) domain-independence of the mission specification, (ii) readability and ease of use, and (iii) expandability. The specification language has a simple syntax and uses a parameterized description of execution blocks and mission capabilities, which are derived from native drone functions. Domain-independence and expandability are provided by a clear separation between the specification and the implementation of the mission tasks. We demonstrate the effectiveness of our approach with a selected multidrone mission example.
07:15 CEST 4.2.2 PERCEPTION COMPUTING-AWARE CONTROLLER SYNTHESIS FOR AUTONOMOUS SYSTEMS
Speaker:
Samarjit Chakraborty, UNC Chapel Hill, US
Authors:
Clara Hobbs1, Debayan Roy2, Sridhar Duggirala1, F. Donelson Smith1, Soheil Samii3, James Anderson1 and Samarjit Chakraborty1
1UNC Chapel Hill, US; 2TU Munich, DE; 3General Motors, US
Abstract
Feedback control loops are ubiquitous in any autonomous system. The design flow for any controller starts by determining a control strategy, while abstracting away all implementation details. However, when designing controllers for autonomous systems, there is significant computation associated with the perception modules. For example, this involves vision processing using deep neural networks on multicore CPU+accelerator platforms. Such computation can be organized in many different ways, with each choice resulting in very different sensor-to-actuator delays and tradeoffs between cost, delay, and accuracy. Further, each of these choices requires the control strategy to be designed accordingly. It is not possible for a control designer to enumerate and account for all of these choices manually, or abstract them away as "implementation details" as done in traditional controller design. In this paper we outline this problem and discuss how automated controller-synthesis techniques could help in addressing it.
07:30 CEST 4.2.3 CLOSED-LOOP APPROACH TO PERCEPTION IN AUTONOMOUS SYSTEM
Speaker:
Saibal Mukhopadhyay, Georgia Institute of Technology, US
Authors:
Saibal Mukhopadhyay1, Kruttidipta Samal1 and Marilyn Wolf2
1Georgia Institute of Technology, US; 2University of Nebraska, US
Abstract
Currently, functional tasks within autonomous systems are balkanized into several sub-systems such as object detection, tracking, motion planning and multi-sensor fusion, which are developed and tested in isolation. In recent times, deep learning has been used in perception systems for improved accuracy, but such algorithms are not adaptive to the transient real-world requirements of an autonomous system, such as latency and energy. These limitations are critical for resource-constrained systems such as autonomous drones. Therefore, a holistic closed-loop system design is required for building reliable and efficient perception systems for autonomous drones. The closed-loop perception system creates focus-of-attention-based feedback from an end-task such as motion planning to control computation within the deep neural networks (DNNs) used in early perception tasks such as object detection. We observe that this closed-loop perception system improves the resource utilization of resource-hungry DNNs within the perception system with minimal impact on motion planning.
07:45 CEST 4.2.4 COMPUTING FOR CONTROL AND CONTROL FOR COMPUTING
Speaker:
Justin Bradley, University of Nebraska-Lincoln, US
Authors:
Xinkai Zhang and Justin Bradley, University of Nebraska-Lincoln, US
Abstract
Computing can be thought of as a service provided to a system to yield actionable tasks enacted by physical hardware. But rarely is control thought to be in the service of enhancing computation. Consideration of that perspective is what motivates co-regulation, our framework for holistic cyber-physical control of autonomous vehicles. In this paper we elaborate on how co-regulation will enable the next generation of autonomous vehicles precisely because it considers computation as an enabler and consumer of autonomous behavior. We report on the latest advances in this space showing how co-regulation exceeds results in event-triggered, self-triggered, and fixed-rate control strategies yielding more robustness and adaptivity to changing and uncertain conditions -- a requirement for next-gen autonomous vehicles. We then describe a co-regulated decision making algorithm based on Markov Decision Processes showing how full consideration of computational resource allocation can increase decision-making capabilities in uncertain environments.

4.3 Approximate computing in processors and GPUs

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/nzwuAJHHfZnQqBMZP

Session chair:
Benjamin Carrion Schaefer, University of Texas at Dallas, US

Session co-chair:
Alberto Bosio, École Centrale de Lyon, FR

Approximate computing has been shown to deliver very good results for a variety of hardware platforms, and this session clearly highlights this area of research. The first contribution presents different approximation techniques for GPUs, the second targets general-purpose processors, and the last neural-network accelerators. Moreover, the two IPs further corroborate this, with the first introducing a precision-tunable floating-point multiplier for GPUs and the second an instruction set simulator framework to evaluate approximations in CPUs.

Time Label Presentation Title
Authors
07:00 CEST 4.3.1 QSLC: QUANTIZATION-BASED, LOW-ERROR SELECTIVE APPROXIMATION FOR GPUS
Speaker:
Sohan Lal, TU Berlin, DE
Authors:
Sohan Lal, Jan Lucas and Ben Juurlink, TU Berlin, DE
Abstract
GPUs use a large memory access granularity (MAG) that often results in a low effective compression ratio for memory compression techniques. The low effective compression ratio is caused by a significant fraction of compressed blocks that have a few bytes above a multiple of MAG. While MAG-aware selective approximation, based on a tree structure, has been used to increase the effective compression ratio and the performance gain, approximation results in a high error that is reduced by using complex optimizations. We propose a simple quantization-based approximation technique (QSLC) that can also selectively approximate a few bytes above MAG. While the quantization-based approximation technique has a similar performance to the state-of-the-art tree-based selective approximation, the average error for the quantization-based technique is 5× lower. We further trade-off the two techniques and show that the area and power overhead of the quantization-based technique is 12.1× and 7.6× lower than the state-of-the-art, respectively. Our sensitivity analysis to different block sizes further shows the opportunities and the significance of MAG-aware selective approximation.
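The MAG-aware selection the abstract builds on can be sketched with simple arithmetic: a compressed block only saves memory traffic if its size rounds down to a smaller multiple of the memory access granularity (MAG), so blocks just a few bytes over a multiple are candidates for approximating those excess bytes away, e.g. by quantization. The sizes and the byte threshold below are made up for the example.

```python
# MAG-aware selective approximation of compressed block sizes (illustrative).

MAG = 32          # bytes per memory access (assumed)
MAX_DROP = 4      # approximate at most this many excess bytes (assumed)

def sectors(size):
    """Memory accesses needed for `size` bytes."""
    return -(-size // MAG)   # ceiling division

def approximate_size(compressed_size):
    """Size after MAG-aware selective approximation: drop a small excess."""
    excess = compressed_size % MAG
    if 0 < excess <= MAX_DROP:
        return compressed_size - excess   # quantize the excess bytes away
    return compressed_size

print(sectors(66), sectors(approximate_size(66)))  # 3 2 (66 -> 64 bytes)
print(sectors(80), sectors(approximate_size(80)))  # 3 3 (too far over)
```

Whether the excess bytes are removed by a tree-based scheme (prior work) or by quantization (this paper) determines the introduced error; the traffic saving is the same.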
07:15 CEST 4.3.2 VALUE SIMILARITY EXTENSIONS FOR APPROXIMATE COMPUTING IN GENERAL-PURPOSE PROCESSORS
Speaker:
Younghoon Kim, Purdue University, US
Authors:
Younghoon Kim1, Swagath Venkataramani2, Sanchari Sen2 and Anand Raghunathan1
1Purdue University, US; 2IBM T. J. Watson Research Center, US
Abstract
Approximate Computing (AxC) is a popular design paradigm wherein selected computations are executed approximately to gain efficiency with minimal impact on application-level quality. Most AxC efforts target specialized accelerators and domain-specific processors, with relatively limited focus on General-Purpose Processors (GPPs). However, GPPs are still broadly used to execute applications that are amenable to AxC, making AxC for GPPs a critical challenge. A key bottleneck in applying AxC to GPPs is that their execution units account for only a small fraction of total energy, requiring a holistic approach that targets compute, memory and the control front-end. This paper proposes such an approach, leveraging the application property of value similarity, i.e., input operands to computations that occur close in time take similar values. Such similar computations are dynamically pre-detected, and the fetch-decode-execute of entire instruction sequences is skipped to benefit performance. To this end, we propose a set of lightweight micro-architectural and ISA extensions called VSX that enable: (i) similarity detection amongst values in a cache line, (ii) skipping of pre-defined instructions and/or loop iterations when similarity is detected, and (iii) substituting the outputs of skipped instructions with saved results from previously executed computations. We also develop compiler techniques, guided by user annotations, to benefit from VSX in the context of common Machine Learning (ML) kernels. Our RTL implementation of VSX for a low-power RISC-V processor incurs 2.13% area overhead and yields 1.19X-3.84X speedup with <0.5% accuracy loss on 6 ML benchmarks.
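The skip-and-substitute mechanism can be sketched in a few lines of Python; the mask-based similarity test and the single-entry reuse buffer below are illustrative stand-ins for VSX's microarchitectural structures, not the paper's design.

```python
def similar(a, b, tol_bits=4):
    """Two integer operands count as 'similar' if they agree once the
    low tol_bits are masked off (a stand-in for a hardware comparator)."""
    mask = ~((1 << tol_bits) - 1)
    return (a & mask) == (b & mask)

def compute_with_reuse(values, f, tol_bits=4):
    """Apply f to each value, but when an input is similar to the last
    computed one, skip f entirely and substitute the saved result."""
    results, skipped = [], 0
    last_in = last_out = None
    for v in values:
        if last_in is not None and similar(v, last_in, tol_bits):
            results.append(last_out)  # skip fetch-decode-execute, reuse
            skipped += 1
        else:
            last_in, last_out = v, f(v)
            results.append(last_out)
    return results, skipped
```

On a stream like `[100, 101, 102, 200]`, the middle two inputs are similar to the first, so the (possibly expensive) function `f` runs only twice.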
07:30 CEST IP5_5.2 TRULOOK: A FRAMEWORK FOR CONFIGURABLE GPU APPROXIMATION
Speaker:
Mohsen Imani, University of California Irvine, US
Authors:
Ricardo Garcia1, Fatemeh Asgarinejad1, Behnam Khaleghi1, Tajana Rosing1 and Mohsen Imani2
1University of California San Diego, US; 2University of California Irvine, US
Abstract
In this paper, we propose TruLook, a framework that employs approximate computing techniques for GPU acceleration through computation reuse as well as approximate arithmetic operations, eliminating redundant and unnecessary exact computations. To enable computational reuse, the GPU is enhanced with small lookup tables placed close to the stream cores that return already-computed values for exact and potentially inexact matches. Inexact matching is subject to a threshold controlled by the number of mantissa bits involved in the search. Approximate arithmetic is provided by a configurable approximate multiplier that dynamically detects and approximates operations that are not significantly affected by approximation. TruLook guarantees the accuracy bound required by an application by configuring the hardware at runtime. We have evaluated TruLook's efficiency on a wide range of multimedia and deep learning applications. Our evaluation shows that with 0% and less than 1% quality loss budgets, TruLook yields on average 2.1× and 5.6× energy-delay product improvement across four popular networks on the ImageNet dataset.
07:31 CEST IP4_1.2 (Best Paper Award Candidate)
AXPIKE: INSTRUCTION-LEVEL INJECTION AND EVALUATION OF APPROXIMATE COMPUTING
Speaker:
Isaias Felzmann, University of Campinas, BR
Authors:
Isaías Bittencourt Felzmann1, João Fabrício Filho2 and Lucas Wanner1
1University of Campinas, BR; 2University of Campinas and UTFPR, BR
Abstract
Representing the interaction between accurate and approximate hardware modules at the architecture level is essential to understand the impact of Approximate Computing in a general-purpose computing scenario. However, extensive effort is required to model approximations into a baseline instruction level simulator and collect its execution metrics. In this work, we present the AxPIKE ISA simulation environment, a tool that allows designers to inject models of hardware approximation at the instruction level and evaluate their impact on the quality of results. AxPIKE embeds a high-level representation of a RISC-V system and produces a dedicated control mechanism, that allows the simulated software to manage the approximate behavior of compatible execution scenarios. The environment also provides detailed execution statistics that are forwarded to dedicated tools for energy accounting. We apply the AxPIKE environment to inject integer multiplication and memory access approximations into different applications and demonstrate how the generated statistics are translated into energy-quality trade-offs.
07:32 CEST 4.3.3 A 1D-CRNN INSPIRED RECONFIGURABLE PROCESSOR FOR NOISE-ROBUST LOW-POWER KEYWORDS RECOGNITION
Speaker:
Bo Liu, Southeast University, CN
Authors:
Bo Liu, Zeyu Shen, Lepeng Huang, Yu Gong, Zilong Zhang and Hao Cai, Southeast University, CN
Abstract
A low-power, high-accuracy reconfigurable processor for noise-robust keyword recognition is proposed and evaluated in 22nm technology, based on an optimized one-dimensional convolutional recurrent neural network (1D-CRNN). In a traditional DNN-based keyword recognition system, speech feature extraction based on classical algorithms and DNN-based keyword classification are two independent modules. In contrast, both feature extraction and keyword classification are processed by the proposed 1D-CRNN, with weight/data bit widths quantized to 8/8 bits. A unified training and optimization framework can therefore be applied across application scenarios and input loads. The proposed 1D-CRNN-based keyword recognition system achieves higher recognition accuracy with fewer computation operations. Based on system-architecture co-design, an energy-efficient DNN accelerator is proposed that can be dynamically reconfigured to process the 1D-CRNN with different configurations. The processing circuits of the accelerator are further optimized for energy efficiency using a fine-grained precision-reconfigurable approximate multiplier. Compared to state-of-the-art architectures, this work supports real-time recognition of 1–5 keywords with lower power consumption, while maintaining higher system capability and adaptability.

4.4 Raising performance of Hybrid Memory Systems

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/69WD5QK6XmBfjrisY

Session chair:
Jeronimo Castrillon, TU Dresden, DE

Session co-chair:
Sharad Sinha, IIT Goa, IN

This session presents different solutions that take advantage of emerging memory systems or hardware accelerators. The first papers present microarchitectural techniques to improve the performance of hybrid memory systems, from a novel prefetcher to a hardware mechanism that ensures crash consistency for systems with NVM. The behavior of High-Bandwidth Memory (HBM) under reduced-voltage operating conditions is then thoroughly analyzed. A short paper also discusses how to speed up a fully quantized BERT, a Transformer-based model, on an FPGA-based accelerator.

Time Label Presentation Title
Authors
07:00 CEST 4.4.1 (Best Paper Award Candidate)
LSP: COLLECTIVE CROSS-PAGE PREFETCHING FOR NVM
Speaker:
Haiyang Pan, Institute of Computing Technology, Chinese Academy of Sciences, CN
Authors:
Haiyang Pan, Yuhang Liu, Tianyue Lu and Mingyu Chen, Institute of Computing Technology, Chinese Academy of Sciences, CN
Abstract
As an emerging technology, non-volatile memory (NVM) provides valuable opportunities for boosting the memory system, which is vital for computing system performance. However, one challenge preventing NVM from replacing DRAM as main memory is that NVM row activation latency is much longer (by approximately 10x) than that of DRAM. To address this issue, we present a collective cross-page prefetching scheme that can accurately open an NVM row in advance and then prefetch data blocks from the opened row with low overhead. We identify a memory access pattern (referred to as a ladder stream) that facilitates prefetching across page boundaries, and propose the Ladder Stream Prefetcher (LSP) for NVM. Two crucial components of LSP have been carefully designed. The Collective Prefetch Table reduces interference between prefetches and demand requests by speculatively scheduling prefetching according to the state of the memory queue; it is implemented with low overhead by using a single entry to track multiple prefetches. The Memory Mapping Table enables accurate prefetching of future pages by maintaining the mapping between physical and virtual addresses. Experimental evaluations show that LSP improves memory system performance by 66% over no prefetching, and by 26.6%, 21.7% and 27.4% over the state-of-the-art Access Map Pattern Matching Prefetcher (AMPM), Best-Offset Prefetcher (BOP) and Signature Path Prefetcher (SPP), respectively.
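The notion of a ladder stream can be illustrated with a short Python sketch: a constant-stride access sequence whose predicted next access lands on a different page, which is exactly when opening the next NVM row early pays off. The function below illustrates the concept only and is not the paper's detection logic.

```python
def detect_ladder(addrs, page_size=4096):
    """If the recent accesses advance by one constant stride, predict the
    next address and report whether it crosses a page (NVM row) boundary,
    i.e. whether a cross-page prefetch should open the next row early."""
    strides = {addrs[i + 1] - addrs[i] for i in range(len(addrs) - 1)}
    if len(strides) == 1:
        stride = strides.pop()
        next_addr = addrs[-1] + stride
        crosses = (addrs[-1] // page_size) != (next_addr // page_size)
        return stride, next_addr, crosses
    return None
```

A stream ending just below a 4 KiB page boundary triggers a cross-page prediction; an irregular stream yields no prediction.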
07:15 CEST 4.4.2 EFFICIENT HARDWARE-ASSISTED OUT-PLACE UPDATE FOR PERSISTENT MEMORY
Speaker:
Yifu Deng, Michigan Technological University, US
Authors:
Yifu Deng1, Jianhui Yue1, Zhiyuan Lu1 and Yifeng Zhu2
1Michigan Technological University, US; 2University of Maine, US
Abstract
Shadow paging can guarantee crash consistency for Persistent Memory (PM). However, shadow paging requires the use of an address mapping table to track shadow pages, and frequent accesses to this table introduce significant performance overhead. In addition, maintaining crash consistency at the granularity of a page causes a large amount of unnecessary write traffic. This paper proposes a novel hardware-assisted fine-grained out-place-update scheme at cacheline granularity to efficiently support crash consistency for PM. Our design fully leverages the Address Indirection Table (AIT) available in commodity PM to implement remapping. To ensure the atomicity and durability of AIT updates, we propose two policies: eager persisting and lazy persisting. We also employ an overflow log to handle the eviction of speculative AIT cache entries upon an overflow in the AIT cache. Evaluation results based on multicore workloads demonstrate that our proposed scheme can improve transaction throughput over the state-of-the-art design by 24.0% on average.
07:30 CEST IP4_1.1 (Best Paper Award Candidate)
HARDWARE ACCELERATION OF FULLY QUANTIZED BERT FOR EFFICIENT NATURAL LANGUAGE PROCESSING
Speaker:
Zejian Liu, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, CN
Authors:
Zejian Liu1, Gang Li2 and Jian Cheng1
1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, CN; 2Institute of Automation, Chinese Academy of Sciences, CN
Abstract
BERT is the most recent Transformer-based model achieving state-of-the-art performance on various NLP tasks. In this paper, we investigate the hardware acceleration of BERT on FPGA for edge computing. To tackle the huge computational complexity and memory footprint, we propose to fully quantize BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all intermediate results. Experiments demonstrate that FQ-BERT achieves 7.94× weight compression with negligible performance loss. We then propose an accelerator tailored for FQ-BERT and evaluate it on Xilinx ZCU102 and ZCU111 FPGAs. It achieves a performance-per-watt of 3.18 fps/W, which is 28.91× and 12.72× higher than an Intel(R) Core(TM) i7-8700 CPU and an NVIDIA K80 GPU, respectively.
07:31 CEST 4.4.3 UNDERSTANDING POWER CONSUMPTION AND RELIABILITY OF HIGH-BANDWIDTH MEMORY WITH VOLTAGE UNDERSCALING
Speaker:
Seyed Saber Nabavi Larimi, Barcelona Supercomputing Center, ES
Authors:
Seyed Saber Nabavi Larimi1, Behzad Salami1, Osman Sabri Unsal1, Adrian Cristal Kestelman1, Hamid Sarbazi-Azad2 and Onur Mutlu3
1Barcelona Supercomputing Center, ES; 2Sharif University of Technology and IPM, IR; 3ETH Zurich, CH
Abstract
Modern computing devices employ High-Bandwidth Memory (HBM) to meet their memory bandwidth requirements. An HBM-enabled device consists of multiple DRAM layers stacked on top of one another next to a compute chip (e.g. CPU, GPU, and FPGA) in the same package. Although such HBM structures provide high bandwidth at a small form factor, the stacked memory layers consume a substantial portion of the package’s power budget. Therefore, power-saving techniques that preserve the performance of HBM are desirable. Undervolting is one such technique: it reduces the supply voltage to decrease power consumption without reducing the device’s operating frequency to avoid performance loss. Undervolting takes advantage of voltage guardbands put in place by manufacturers to ensure correct operation under all environmental conditions. However, reducing voltage without changing frequency can lead to reliability issues manifested as unwanted bit flips. In this paper, we provide the first experimental study of real HBM chips under reduced-voltage conditions. We show that the guardband regions for our HBM chips constitute 19% of the nominal voltage. Pushing the supply voltage down within the guardband region reduces power consumption by a factor of 1.5X for all bandwidth utilization rates. Pushing the voltage down further by 11% leads to a total of 2.3X power savings at the cost of unwanted bit flips. We explore and characterize the rate and types of these reduced-voltage-induced bit flips and present a fault map that enables the possibility of a three-factor trade-off among power, memory capacity, and fault rate.
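The reported 1.5X saving is roughly what the classical CMOS dynamic-power relation P ≈ αCV²f predicts when frequency is held constant; the following back-of-envelope check illustrates the scaling argument and is not the paper's power model.

```python
def dynamic_power_ratio(v_scaled, v_nominal=1.0):
    """CMOS dynamic power scales roughly with V^2 at fixed frequency
    (P = a*C*V^2*f), so undervolting at constant f saves about V^2."""
    return (v_scaled / v_nominal) ** 2

# Shaving off the 19% guardband: power drops to ~0.81^2 = 0.656 of
# nominal, i.e. roughly a 1.5x saving, in line with the measurements.
saving = 1 / dynamic_power_ratio(0.81)
```

The 2.3X figure at deeper undervolting deviates from a pure V² estimate, which is plausible since real devices also have voltage-dependent static and peripheral power.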

4.5 Sensing, security and performance in smart automotive and energy systems

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/C4zGATEvQ25dfPD3L

Session chair:
Philipp Mundhenk, Robert Bosch GmbH, DE

Session co-chair:
Dip Goswami, Eindhoven University of Technology, NL

This session provides three papers dealing with various aspects of smart automotive and energy systems. The first two papers deal with topics from automated driving and connected vehicles, namely a GPU-based fast depth sensing solution and a secure publish/subscribe protocol for vehicle to cloud communication. The third paper addresses the smart energy domain, in particular, the optimization of the latency of intermittent, storage-less systems powered only through renewable sources.

Time Label Presentation Title
Authors
07:00 CEST 4.5.1 (Best Paper Award Candidate)
A GPU-ACCELERATED DEEP STEREO-LIDAR FUSION FOR REAL-TIME HIGH-PRECISION DENSE DEPTH SENSING
Speaker:
Haitao Meng, Sun Yat-sen University, CN
Authors:
Haitao Meng, Chonghao Zhong, Jianfeng Gu and Gang Chen, Sun Yat-sen University, CN
Abstract
Active LiDAR and stereo vision are the most commonly used depth sensing techniques in autonomous vehicles. Each alone has weaknesses in terms of density and reliability and thus cannot perform well in all practical scenarios. Recent works use deep neural networks (DNNs) to exploit their complementary properties, achieving superior depth sensing. However, these state-of-the-art solutions fall short of real-time responsiveness due to the high computational complexity of DNNs. In this paper, we present FastFusion, a fast deep stereo-LiDAR fusion framework for real-time high-precision depth estimation. FastFusion provides an efficient two-stage fusion strategy that leverages a binary neural network to integrate stereo-LiDAR information as input and uses cross-based LiDAR trust aggregation to further fuse the sparse LiDAR measurements in the back-end of stereo matching. More importantly, we present a GPU-based acceleration framework providing a low-latency implementation of FastFusion, gaining both accuracy improvement and real-time responsiveness. In our experiments, we demonstrate the effectiveness and practicality of FastFusion, which obtains a significant speedup over state-of-the-art baselines while achieving comparable depth-sensing accuracy.
07:15 CEST 4.5.2 SPPS: SECURE POLICY-BASED PUBLISH/SUBSCRIBE SYSTEM FOR V2C COMMUNICATION
Speaker:
Mohammad Hamad, Associate Professorship of Embedded Systems and Internet of Things, TUM Department of Electrical and Computer Engineering, TU Munich, DE
Authors:
Mohammad Hamad1, Emanuel Regnath1, Jan Lauinger1, Vassilis Prevelakis2 and Sebastian Steinhorst1
1TU Munich, DE; 2TU Braunschweig, DE
Abstract
The Publish/Subscribe (Pub/Sub) pattern is an attractive paradigm for supporting Vehicle-to-Cloud (V2C) communication. However, threats to the confidentiality, integrity, and access control of the published data challenge the adoption of the Pub/Sub model. To address this, our paper proposes a secure policy-based Pub/Sub model for V2C communication that encrypts messages published by vehicles and controls access to them. A vehicle encrypts messages with a symmetric key while saving the key in distributed shares on semi-honest services, called KeyStores, using the concept of secret sharing. A security policy, generated by the same vehicle, authorizes certain cloud services to obtain the shares from the KeyStores. Granting access rights thus takes place without violating the decoupling requirement of the Pub/Sub model. Experimental results show that, besides end-to-end security protection, our proposed system introduces significantly less overhead (almost 70% less) than the state-of-the-art SSL approach when reestablishing connections, a common scenario in the V2C context due to unreliable network connections.
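The key-distribution step can be illustrated with the simplest form of secret sharing, an n-of-n XOR scheme, where every KeyStore share is needed to rebuild the key; the paper's scheme may well be a threshold variant, so treat this as a sketch of the concept only.

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key, n):
    """Split `key` into n shares: n-1 random strings plus one share that
    XORs with them back to the key. Any n-1 shares reveal nothing."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    return shares + [reduce(xor_bytes, shares, key)]

def recover_key(shares):
    """An authorized service that has fetched all shares from the
    KeyStores XORs them together to recover the symmetric key."""
    return reduce(xor_bytes, shares)
```

Each share alone is uniformly random, so a single compromised (semi-honest) KeyStore learns nothing about the key.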
07:30 CEST IP4_2.1 THERMAL COMFORT AWARE ONLINE ENERGY MANAGEMENT FRAMEWORK FOR A SMART RESIDENTIAL BUILDING
Speaker:
Daichi Watari, Osaka University, JP
Authors:
Daichi Watari1, Ittetsu Taniguchi1, Francky Catthoor2, Charalampos Marantos3, Kostas Siozios4, Elham Shirazi5, Dimitrios Soudris3 and Takao Onoye1
1Osaka University, JP; 2imec, KU Leuven, BE; 3National TU Athens, GR; 4Aristotle University of Thessaloniki, GR; 5imec, KU Leuven, EnergyVille, BE
Abstract
Energy management in buildings equipped with renewable energy is vital for reducing electricity costs and maximizing occupant comfort. Despite several studies on the scheduling of appliances, batteries, and heating, ventilation, and air-conditioning (HVAC), there is a lack of a comprehensive and time-scalable approach that integrates predictive information such as renewable generation and thermal comfort. In this paper, we propose an online energy management framework that combines optimal energy scheduling with prediction models of PV generation and thermal comfort through a model predictive control (MPC) approach. The energy management problem is formulated as three coordinated optimization problems covering fast and slow time-scales. This reduces the time complexity without a significant negative impact on the global nature and quality of the result. Experimental results show that the proposed framework achieves optimal energy management that accounts for the trade-off between the electricity bill and thermal comfort.
07:31 CEST IP4_2.2 ONLINE LATENCY MONITORING OF TIME-SENSITIVE EVENT CHAINS IN SAFETY-CRITICAL APPLICATIONS
Speaker:
Jonas Peeck, TU Braunschweig, Institute of Computer and Network Engineering, DE
Authors:
Jonas Peeck, Johannes Schlatow and Rolf Ernst, TU Braunschweig, DE
Abstract
Highly-automated driving involves chains of perception, decision, and control functions. These functions rely on data-intensive algorithms that motivate the use of a data-centric middleware and a service-oriented architecture; as an example, we use the open-source project Autoware.Auto. The function chains define a safety-critical automated control task with weakly-hard real-time constraints. However, providing the required assurance by formal analysis is challenged by the complex hardware/software structure of these systems and their dynamics. We propose an approach that combines measurement, a suitable distribution of deadline segments, and application-level online monitoring to supervise the execution of service-oriented software systems with multiple function chains and weakly-hard real-time constraints. We use DDS as the middleware and apply the approach to an Autoware.Auto use case.
07:32 CEST 4.5.3 INTERMITTENT COMPUTING WITH EFFICIENT STATE BACKUP BY ASYNCHRONOUS DMA
Speaker:
Mingsong Lv, The Hong Kong Polytechnic University, HK
Authors:
Wei Zhang1, Songran Liu2, Mingsong Lv2, QiuLin Chen3 and Nan Guan1
1The Hong Kong Polytechnic University, HK; 2Northeastern University, CN; 3Huawei Technologies Co., Ltd., CN
Abstract
Energy harvesting promises to power billions of Internet-of-Things devices without being restricted by battery life. The energy output of harvesters is typically tiny and highly unstable, so computing systems must frequently back up program state into non-volatile memory to ensure that a program progresses in the presence of frequent power failures. However, state backup is a time-consuming process. In existing solutions, state backup is conducted sequentially with program execution, which considerably impacts system performance. This paper proposes techniques to parallelize state backup and program execution with asynchronous DMA. The challenge is that program state can be backed up incorrectly, which may in turn cause the program to produce incorrect results. Our main idea is to allow errors to occur during parallel state backup and program execution, and to detect them at the end of the backup. Moreover, we propose a technique that allows the system to tolerate backup errors during execution without harming logical correctness. We designed a run-time system implementing the proposed approach. Experimental results on an STM32F7-based platform show that execution performance can be considerably improved by parallelizing state backup and program execution.
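The detect-at-the-end idea can be sketched as follows: the backup loop (standing in for the DMA engine) copies state byte by byte while the "program" keeps mutating it, and a checksum taken at backup start flags a torn snapshot afterwards. The interface and the mutation model are illustrative assumptions, not the paper's run-time system.

```python
import zlib

def dma_backup(state, mutations):
    """Copy `state` one byte at a time while 'the program' applies
    `mutations` (at_copy_index, target_index, xor_mask) concurrently.
    A CRC taken at backup start detects a torn snapshot at the end."""
    expected = zlib.crc32(bytes(state))
    backup = bytearray()
    for i in range(len(state)):
        backup.append(state[i])          # DMA copies one byte
        for at, tgt, mask in mutations:  # program mutates state in parallel
            if at == i:
                state[tgt] ^= mask
    torn = zlib.crc32(bytes(backup)) != expected
    return bytes(backup), torn           # torn=True would trigger a redo
```

A write to a byte the DMA has already copied leaves the snapshot consistent; only writes that race ahead of the copy pointer make it torn, which is exactly what the end-of-backup check catches.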

4.6 Physical Attacks and Countermeasures

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ahzhriQxSZXcKQE5m

Session chair:
Ileana Buhan, Radboud University, NL

Session co-chair:
Francesco Regazzoni, University of Amsterdam and ALaRI - USI, NL

The papers in this session cover powerful key-recovery strategies against cryptographic implementations, based on both side channels and fault injection. New fault-injection countermeasures are proposed, along with protections for a wide range of targets such as caches, post-quantum cryptography, and FSMs.

Time Label Presentation Title
Authors
07:00 CEST 4.6.1 GRINCH: A CACHE ATTACK AGAINST GIFT LIGHTWEIGHT CIPHER
Speaker:
Cezar Reinbrecht, TU Delft, NL
Authors:
Cezar Rodolfo Wedig Reinbrecht1, Abdullah Aljuffri1, Said Hamdioui1, Mottaqiallah Taouil1 and Johanna Sepúlveda2
1Delft University of Technology, NL; 2Airbus Defence and Space, DE
Abstract
The National Institute of Standards and Technology (NIST) has recently started a competition with the objective of standardizing lightweight cryptography (LWC). The winning schemes will be deployed in Internet-of-Things (IoT) devices, a key step for the current and future information and communication technology market. GIFT is an efficient lightweight cipher used by one-fourth of the candidates in the NIST LWC competition, so its security evaluation is critical. One vital threat to this security is the class of so-called logical side-channel attacks based on cache observations. In this work, we propose a novel cache attack on GIFT, referred to as GRINCH. We analyzed the vulnerabilities of GIFT and exploited them in our attack. The results show that the attack is effective and that the full key can be recovered with fewer than 400 encryptions.
07:15 CEST 4.6.2 BLIND SIDE-CHANNEL SIFA
Speaker:
Kostas Papagiannopoulos, NXP Semiconductors, DE
Authors:
Melissa Azouaoui1, Kostas Papagiannopoulos2 and Dominik Zürner2
1Université Catholique de Louvain and NXP Semiconductors Hamburg, DE; 2NXP Semiconductors Hamburg, DE
Abstract
Statistical Ineffective Fault Attacks (SIFA) have been recently proposed as very powerful key recovery strategies on symmetric cryptographic primitives' implementations. Specifically, they have been shown to bypass many common countermeasures against faults such as redundancy or infection, and to remain applicable even when side-channel countermeasures are deployed. In this work, we investigate combined side-channel and fault attacks and show that a SIFA-like attack can be applied despite not having any direct ciphertext knowledge. The proposed attack exploits the ciphertext's side-channel and fault characteristics to mount successful key recoveries, even in the presence of masking and duplication countermeasures. We analyze the attack using simulations, discuss its requirements, strengths and limitations, and compare different approaches to distinguish the correct key. Finally, we demonstrate its applicability on an ARM Cortex-M4 device, utilizing a combination of laser-based fault injection and microprobe-based EM side-channel analysis.
07:30 CEST IP4_3.1 FEEDING THREE BIRDS WITH ONE SCONE: A GENERIC DUPLICATION BASED COUNTERMEASURE TO FAULT ATTACKS
Speaker:
Jakub Breier, Silicon Austria Labs, AT
Authors:
Anubhab Baksi1, Shivam Bhasin2, Jakub Breier3, Anupam Chattopadhyay1 and Vinay B. Y. Kumar1
1Nanyang Technological University, SG; 2Temasek Laboratories, Nanyang Technological University, SG; 3Silicon Austria Labs, AT
Abstract
In the current world of the Internet-of-Things and edge computing, computations are increasingly performed locally on small connected systems. Such devices are often exposed to adversarial physical access, enabling a plethora of physical attacks that pose a challenge even for devices built for security. As cryptography is one of the cornerstones of secure communication among devices, the pertinence of fault attacks is becoming increasingly apparent in settings where a device can easily be accessed physically. In particular, two recently proposed fault attacks, the Statistical Ineffective Fault Attack (SIFA) and the Fault Template Attack (FTA), have been shown to be formidable due to their capability to bypass common duplication-based countermeasures. Duplication-based countermeasures, deployed to counter the Differential Fault Attack (DFA), work by duplicating the execution of the cipher, comparing the results to sense the presence of any effective fault, and triggering an appropriate recovery procedure. While a handful of countermeasures have been proposed against SIFA, no countermeasure is known to thwart FTA to date. In this work, we propose a novel duplication-based countermeasure that protects against both SIFA and FTA. The proposal is also lightweight, with only a marginal additional cost over simple duplication-based countermeasures. Our countermeasure further protects against all known variants of DFA, including Selmke, Heyszl and Sigl's attack from FDTC 2016. It does not inherently leak side-channel information and is easily adaptable to any symmetric-key primitive. The countermeasure has been validated through gate-level fault simulation.
07:31 CEST IP4_3.2 SIDE-CHANNEL ATTACK ON RAINBOW POST-QUANTUM SIGNATURE
Speaker:
Petr Socha, Czech TU in Prague, CZ
Authors:
David Pokorný, Petr Socha and Martin Novotný, Czech TU in Prague, CZ
Abstract
Rainbow, a layered multivariate quadratic digital signature scheme, is a candidate for standardization in a competition-like process organized by NIST. In this paper, we present a CPA side-channel attack on the submitted 32-bit reference implementation. We evaluate the attack on an STM32F3 ARM microcontroller, successfully recovering the full private key. Furthermore, we propose a simple masking scheme with minimal overhead.
07:32 CEST 4.6.3 PATRON: A PRAGMATIC APPROACH FOR ENCODING LASER FAULT INJECTION RESISTANT FSMS
Speaker:
Muhtadi Choudhury, University of Florida, US
Authors:
Muhtadi Choudhury1, Shahin Tajik2 and Domenic Forte1
1University of Florida, US; 2Worcester Polytechnic Institute, US
Abstract
Since Finite State Machines (FSMs) regulate the overall operation of the majority of digital systems, the security of an entire system can be jeopardized if its FSM is vulnerable to physical attacks. By injecting faults into an FSM, an attacker can attain unauthorized access to sensitive states, resulting in information leakage and privilege escalation. One of the most powerful fault injection techniques is laser-based fault injection (LFI), which enables an adversary to alter the states of individual flip-flops. While standard error correction/detection techniques have been used to protect FSMs from such fault attacks, their significant overhead makes them unattractive to designers. To keep the overhead minimal, we propose PATRON, a novel FSM encoding scheme based on decision diagrams that utilizes the don't-care states of the FSM. We demonstrate that PATRON outperforms conventional encoding schemes in terms of both security and scalability on popular benchmarks. Finally, we introduce a vulnerability metric to aid the security analysis, which precisely captures the susceptibility of FSM designs.

4.7 Improving Efficiency in Training and Inference of Spiking and Non-Spiking Neural Networks

Date: Wednesday, 03 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/fW67kfettgW8yQjfP

Session chair:
Anup Das, Drexel University, US

Session co-chair:
Akash Kumar, TU Dresden, DE

This session presents the latest research on Spiking Neural Network (SNN) implementations and on efficient training and inference with innovative methods. The first paper proposes four enhancement schemes for direct-training algorithms in SNNs that improve accuracy and reduce the spike count per inference. The second proposes a hybrid analog-spiking Long Short-Term Memory that combines the energy efficiency of SNNs with the performance efficiency of non-spiking NNs. The third paper proposes a framework in which two networks, one for pre-processing and one for the core application, are trained together to optimize performance. Finally, an IP paper proposes using cellular automata and an ensemble Bloom filter to replace costly floating-point or integer calculations with binary operations.

Time Label Presentation Title
Authors
07:00 CEST 4.7.1 AN IMPROVED STBP FOR TRAINING HIGH-ACCURACY AND LOW-SPIKE-COUNT SPIKING NEURAL NETWORKS
Speaker:
Pai-Yu Tan, National Tsing Hua University, TW
Authors:
Pai-Yu Tan1, Cheng-Wen Wu2 and Juin-Ming Lu3
1National Tsing Hua University, TW; 2National Cheng Kung University, TW; 3Industrial Technology Research Institute, TW
Abstract
Spiking Neural Networks (SNNs), which facilitate energy-efficient neuromorphic hardware, are getting increasing attention. Directly training SNNs with backpropagation has already shown competitive accuracy compared with Deep Neural Networks. Besides accuracy, the number of spikes per inference has a direct impact on processing time and energy once deployed on neuromorphic processors. However, previous direct-training algorithms do not emphasize this metric. This paper therefore proposes four enhancement schemes for the existing direct-training algorithm, Spatio-Temporal Back-Propagation (STBP), to improve not only accuracy but also the spike count per inference. We first modify the reset mechanism of the spiking neuron model to address an information-loss issue, which enables the firing threshold to become a trainable variable. We then propose two novel output spike decoding schemes to effectively utilize spatio-temporal information. Finally, we reformulate the derivative approximation of the non-differentiable firing function to simplify the computation of STBP without accuracy loss. In this way, we achieve higher accuracy and a lower spike count per inference on image classification tasks. Moreover, the enhanced STBP is amenable to future on-line learning hardware implementations.
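For readers unfamiliar with STBP-style direct training, the central trick is to replace the derivative of the non-differentiable firing function with a surrogate during backpropagation. Below is a minimal sketch using a rectangular surrogate; the boxcar shape and the `width` parameter are common illustrative choices, not the paper's reformulated approximation.

```python
def heaviside(x):
    """Firing function: emit a spike when the membrane potential crosses
    the threshold (x = u - theta folds the threshold into the argument)."""
    return 1.0 if x >= 0.0 else 0.0

def surrogate_grad(x, width=0.5):
    """Surrogate derivative used on the backward pass: the true gradient
    of the spike function is a Dirac delta, so backprop substitutes a
    boxcar of height 1/(2*width) centered on the threshold crossing."""
    return 1.0 / (2.0 * width) if abs(x) <= width else 0.0
```

The forward pass uses the hard spike, while gradients flow only for membrane potentials near the threshold, which is what makes end-to-end training of the spiking network possible.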
07:15 CEST 4.7.2 HYBRID ANALOG-SPIKING LONG SHORT-TERM MEMORY FOR ENERGY EFFICIENT COMPUTING ON EDGE DEVICES
Speaker:
Wachirawit Ponghiran, Purdue University, US
Authors:
Wachirawit Ponghiran and Kaushik Roy, Purdue University, US
Abstract
Recurrent neural networks such as Long Short-Term Memory (LSTM) have been used in many sequential learning tasks such as speech recognition and language translation. Running large-scale LSTMs for real-world applications is known to be compute-intensive and often relies on cloud execution. To enable LSTM operations on edge devices that receive inputs in real time, there is a need to improve LSTM execution efficiency within the limited energy budget of mobile platforms. We propose a hybrid analog-spiking LSTM that combines the energy efficiency of spiking neural networks (SNNs) with the performance efficiency of analog (non-spiking) neural networks (ANNs). An SNN, which processes and represents information as a sequence of sparse binary spikes or events, uses integrate-and-fire activations, hence consuming low power and energy for real-time inference (batch size of 1). The proposed Analog-Spiking LSTM is derived from a trained LSTM using a novel conversion method that transforms the fully-connected layers and non-linearity functions to be compatible with SNNs. We show that the default LSTM non-linearities are sources of output mismatch between the ANN and the SNN, and we propose a set of replacement functions that have minimal impact on the output quality of sequential learning problems. Our analyses of sequential image classification on the MNIST dataset and sequence-to-sequence translation on the IWSLT14 dataset indicate a <1% drop in average accuracy for row-wise and pixel-wise sequential image recognition and a <1.5-point drop in average BLEU score for the translation task. Implementation of the recognition system with the hybrid analog-spiking LSTM on Intel's spiking processor, Loihi, shows a 55.9x improvement in active energy per inference over the baseline system on an Intel i7-6700. Based on our analysis, we estimate a 3.38x reduction in active energy per inference for the translation task.
07:30 CEST IP4_4.1 BLOOMCA: A MEMORY EFFICIENT RESERVOIR COMPUTING HARDWARE IMPLEMENTATION USING CELLULAR AUTOMATA AND ENSEMBLE BLOOM FILTER
Speaker:
Dehua Liang, Graduate School of Information Science and Technology, Osaka University, JP
Authors:
Dehua Liang, Masanori Hashimoto and Hiromitsu Awano, Osaka University, JP
Abstract
In this work, we propose BloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to build a reservoir computing (RC) system using only binary operations, making it suitable for hardware implementation. The rich pattern dynamics created by CA map the input into a high-dimensional space and provide more features for the classifier. Using the ensemble Bloom filter as the classifier, these features can be memorized effectively. Our experiments reveal that applying the ensemble mechanism to the Bloom filter yields a significant reduction in inference memory cost. Compared with the state-of-the-art reference, BloomCA achieves a 43x reduction in memory cost without hurting accuracy. Our hardware implementation also demonstrates that BloomCA achieves an over-21x reduction in area and a 43.64% reduction in power.
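The binary-operation classifier at the heart of this approach can be sketched with a plain Bloom filter (our own minimal illustration, not the paper's ensemble design; sizes and names are hypothetical). Membership is decided entirely by hashed bit lookups, which is what makes it cheap in hardware:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hashed bit positions per item.

    Queries may yield false positives (all k bits happen to be set by
    other items) but never false negatives.
    """
    def __init__(self, n_bits=256, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = [0] * n_bits

    def _positions(self, item):
        # derive k independent positions by salting the hash input
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        # any clear bit -> definitely absent; all set -> probably present
        return all(self.bits[p] for p in self._positions(item))
```

Storing a feature costs only k bit writes, and querying costs k bit reads, replacing the floating-point arithmetic of a conventional classifier with binary operations.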
07:31 CEST 4.7.3 PAIRED TRAINING FRAMEWORK FOR TIME-CONSTRAINED LEARNING
Speaker:
Jung-Eun Kim, Yale University, US
Authors:
Jung-Eun Kim1, Richard Bradford2, Max Del Giudice3 and Zhong Shao3
1Department of Computer Science, Yale University, US; 2Collins Aerospace, US; 3Yale University, US
Abstract
This paper presents a design framework for machine learning applications that operate in systems, such as cyber-physical systems, where time is a scarce resource. We manage the tradeoff between processing time and solution quality by performing as much preprocessing of data as time allows. This approach leads us to a design framework with two separate learning networks: one for preprocessing and one for the core application functionality. We show how these networks can be trained together and how they can operate in an anytime fashion to optimize performance.

5.1 AI, ML and Data Analytics

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ZB25LydZB6i4Lp59x

Session chair:
Massimo Poncino, Politecnico di Torino, IT

Session co-chair:
Elias Fallon, Cadence, US

Organizers:
Enrico Macii, Politecnico di Torino, IT
Frank Schirrmeister, Cadence, US

Artificial Intelligence, Machine Learning and Data Analytics are tools that are finding applications in many different contexts, including intelligent manufacturing. This session illustrates how such tools can help solve problems and issues specific to the I4.0 paradigm. First, a machine learning approach to the real-time bin picking of randomly piled 3D industrial parts is presented; the method is based on deep learning, with and without hybridizing conventional engineering approaches. Second, a system for in-situ defect monitoring and detection in metal additive manufacturing processes is introduced, which leverages an off-axis camera mounted on top of the machine. Next, machine learning is proposed as a key element for full life-cycle management of complex equipment based on digital twins. Finally, recent advances are presented on using artificial neural networks to address the challenges of automation and performance-efficient realizations of MS and NMR.

Time Label Presentation Title
Authors
08:00 CEST 5.1.1 MACHINE LEARNING BASED REAL-TIME INDUSTRIAL BIN-PICKING: HYBRID AND DEEP LEARNING APPROACHES
Speaker:
Sukhan Lee, Intelligent Systems Research Institute, SKKU, KR
Authors:
Sukhan Lee and Soojin Lee, Sungkyunkwan University, KR
Abstract
The real-time pick-and-place of 3D industrial parts randomly piled in a part bin plays an important role in manufacturing automation. Approaches based solely on conventional engineering disciplines have shown limitations in handling multiple parts of arbitrary 3D geometries in real time. In this paper, we present a machine learning approach to the real-time bin picking of randomly piled 3D industrial parts based on deep learning, with and without hybridizing conventional engineering approaches. The proposed hybrid approach first makes use of deep-learning-based object detectors configured in a cascaded form to detect parts in a bin and extract features of the detected parts. Then, the part features and their positions are fed to the engineering approach for the estimation of their 3D poses in the bin. The proposed pure deep learning approach, on the other hand, first extracts the partial 3D point cloud of the object from its 2D image with the background removed and then transforms the extracted partial 3D point cloud into its full 3D point cloud representation; alternatively, it may directly transform the object's 2D image, with its background removed, into the 3D point cloud representation. The experimental results demonstrate that the proposed approaches are able to perform real-time bin picking for multiple 3D parts of arbitrary geometries with high precision.
08:15 CEST 5.1.2 IMAGE ANALYTICS AND MACHINE LEARNING FOR IN-SITU DEFECTS DETECTION IN ADDITIVE MANUFACTURING
Speaker:
Davide Cannizzaro, Politecnico di Torino, IT
Authors:
Davide Cannizzaro1, Antonio Giuseppe Varrella1, Stefano Paradiso2, Roberta Sampieri2, Enrico Macii1, Edoardo Patti1 and Santa Di Cataldo1
1Politecnico di Torino, IT; 2FCA Product Development AM Centre, IT
Abstract
In the context of Industry 4.0, metal Additive Manufacturing (AM) is considered a promising technology for the medical, aerospace and automotive fields. However, the lack of assurance of the quality of the printed parts can be an obstacle to wider adoption in industry. To date, AM is most of the time a trial-and-error process, where faulty artefacts are detected only after the end of part production. This impacts the processing time and overall cost of the process. A possible solution to this problem is the in-situ monitoring and detection of defects, taking advantage of the layer-by-layer nature of the build. In this paper, we describe a system for in-situ defect monitoring and detection for metal Powder Bed Fusion (PBF) that leverages an off-axis camera mounted on top of the machine. A set of fully automated algorithms based on Computer Vision and Machine Learning allows the timely detection of a number of powder bed defects and the monitoring of the object's profile for the entire duration of the build.
08:30 CEST 5.1.3 STRENGTHENING DIGITAL TWIN APPLICATIONS BASED ON MACHINE LEARNING FOR COMPLEX EQUIPMENT
Speaker:
Zijie Ren, South China University of Technology, CN
Authors:
Zijie Ren and Jiafu Wan, South China University of Technology, CN
Abstract
Digital twin technology and machine learning are emerging technologies of recent years. Digital twin technology makes it possible to virtualize a product, process or service, enabling information interaction and co-evolution between the physical and information worlds. Machine Learning (ML) can improve the cognitive, reasoning and decision-making abilities of the digital twin through knowledge extraction. Full life-cycle management of complex equipment is considered key to the intelligent transformation and upgrading of the modern manufacturing industry. Applying these two technologies to the full life-cycle management of complex equipment makes each stage of the life cycle more responsive, predictable and adaptable. In this study, we propose a full life-cycle digital twin architecture for complex equipment. We describe four specific scenarios in which two typical machine learning algorithms based on deep reinforcement learning are applied to enhance the digital twin at various stages of complex equipment. Finally, we summarize the advantages of combining digital twins and machine learning and address future research directions in this domain.
08:45 CEST 5.1.4 ARTIFICIAL INTELLIGENCE FOR MASS SPECTROMETRY AND NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY
Speaker:
Michael Hübner, Brandenburg University of Technology Cottbus - Senftenberg, DE
Authors:
Florian Fricke1, Safdar Mahmood2, Javier Hoffmann2, Marcelo Brandalero3, Sascha Liehr4, Simon Kern4, Klas Meyer4, Stefan Kowarik5, Stephan Westerdick1, Michael Maiwald4 and Michael Huebner6
1Ruhr-Universität Bochum, DE; 2Brandenburgische Technische Universität, DE; 3Brandenburg University of Technology, DE; 4Bundesanstalt für Materialforschung und -prüfung (BAM), DE; 5University of Graz, AT; 6Brandenburg TU Cottbus, DE
Abstract
Mass Spectrometry (MS) and Nuclear Magnetic Resonance Spectroscopy (NMR) are critical components of every industrial chemical process, as they provide information on the concentrations of individual compounds and by-products. These analyses are carried out manually by a specialist, which takes a substantial amount of time and prevents their utilization for real-time closed-loop process control. This paper presents recent advances from two projects that use Artificial Neural Networks (ANNs) to address the challenges of automation and performance-efficient realizations of MS and NMR. In the first part, a complete toolchain has been developed to generate simulated spectra and train ANNs to identify compounds in MS. In the second part, a limited number of experimental NMR spectra have been augmented with simulated spectra to train an ANN with better prediction performance and speed than state-of-the-art analysis. These results suggest that, in the context of the digital transformation of the process industry, we are now on the threshold of a strongly simplified use of MS and NMR, with the accompanying data evaluation handled by machine-supported procedures, and can utilize both methods much more widely for reaction and process monitoring or quality control.
09:00 CEST 5.1.5 LIVE JOINT Q&A
Authors:
Massimo Poncino1, Elias Fallon2, Sukhan Lee3, Davide Cannizzaro1, Michael Huebner4 and Zijie Ren5
1Politecnico di Torino, IT; 2Cadence, US; 3Intelligent Systems Research Institute, Sungkyunkwan University, KR; 4Brandenburg TU Cottbus, DE; 5South China University of Technology, CN
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

5.2 Micro-Architectural Attacks and Defenses

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/3PmrqsT6Re3hcMqTE

Session chair:
Nele Mentens, Leiden University, BE

Session co-chair:
Fatemeh Ganji, WPI, US

This session covers micro-architectural attacks and defenses, which have been a hot topic in recent years due to the vulnerabilities discovered in well-known processors. The papers in the session present novel cache architectures for mitigating attacks against caches and coherence directories. In addition, the exploration of micro-architectural side channels and the prevention of such leakages are also discussed.

Time Label Presentation Title
Authors
08:00 CEST 5.2.1 SCRAMBLE CACHE: AN EFFICIENT CACHE ARCHITECTURE FOR RANDOMIZED SET PERMUTATION
Speaker:
Amine Jaamoum, University Grenoble Alpes, CEA, LETI MINATEC Campus, FR
Authors:
Amine Jaamoum1, Thomas Hiscock1 and Giorgio Di Natale2
1University Grenoble Alpes, CEA, LETI MINATEC Campus, FR; 2TIMA, FR
Abstract
Driven by the need for performance-efficient computation, a large number of systems resort to cache memories. In this context, cache side-channel attacks have been proven to be a serious threat for many applications. Many solutions and countermeasures exist in the literature; nevertheless, the majority of them do not cope with the constraints and limitations imposed by embedded systems. In this paper, we introduce a novel cache architecture that leverages randomized set placement to defeat cache side-channel analysis. A key property of this architecture is its low impact on performance and its small area overhead. We demonstrate that this countermeasure protects the system against known cache side-channel attacks while guaranteeing small overheads, making the solution suitable also for embedded systems.
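The randomized set-placement idea can be illustrated with a toy model (our own sketch, not the paper's hardware design; the seeded shuffle stands in for whatever lightweight keyed permutation the hardware would implement, and all names are hypothetical):

```python
import random

def make_permutation(key, n_sets=64):
    """Keyed pseudo-random permutation of cache set indices.

    In hardware this would be a cheap keyed function; a seeded shuffle
    is a stand-in for illustration only.
    """
    perm = list(range(n_sets))
    random.Random(key).shuffle(perm)
    return perm

def cache_set(addr, perm, n_sets=64, line_bytes=64):
    index = (addr // line_bytes) % n_sets  # conventional set index
    return perm[index]                     # randomized placement
```

Because an attacker who does not know the key cannot predict which addresses contend for the same set, building the eviction sets that cache side-channel attacks rely on becomes much harder; re-keying periodically resets any knowledge the attacker has accumulated.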
08:15 CEST 5.2.2 (Best Paper Award Candidate)
MICROARCHITECTURAL TIMING CHANNELS AND THEIR PREVENTION ON AN OPEN-SOURCE 64-BIT RISC-V CORE
Speaker:
Nils Wistoff, ETH Zurich, CH
Authors:
Nils Wistoff1, Moritz Schneider1, Frank Gürkaynak1, Luca Benini2 and Gernot Heiser3
1ETH Zurich, CH; 2Università di Bologna and ETH Zurich, IT; 3UNSW and Data61, CSIRO, AU
Abstract
Microarchitectural timing channels use variations in the timing of events, resulting from competition for limited hardware resources, to leak information in violation of the operating system’s security policy. Such channels also exist on a simple in-order RISC-V core, as we demonstrate on the open-source RV64GC Ariane core. Time protection, recently proposed and implemented in the seL4 microkernel, aims to prevent timing channels, but depends on a controlled reset of microarchitectural state. Using Ariane, we show that software techniques for performing such a reset are insufficient and highly inefficient. We demonstrate that adding a single flush instruction is sufficient to close all five evaluated channels at negligible hardware costs, while requiring only minor modifications to the software stack.
08:30 CEST IP5_1.1 EXPLORING MICRO-ARCHITECTURAL SIDE-CHANNEL LEAKAGES THROUGH STATISTICAL TESTING
Speaker:
Sarani Bhattacharya, KU Leuven, BE
Authors:
Sarani Bhattacharya1 and Ingrid Verbauwhede2
1KU Leuven, BE; 2KU Leuven - COSIC, BE
Abstract
Micro-architectural side-channel leakages have received a lot of attention due to their high impact on software security on complex out-of-order processors. These are extremely specialised threat models and can only be realised in practice with high-precision measurement code that triggers micro-architectural behavior that leaks information. In this paper, we present a tool to support inexperienced users in verifying their code for side-channel leakage. We combine two very useful tools, statistical testing and hardware performance monitors, to bridge the gap between the understanding of general-purpose users and the most precise speculative execution attacks. We first show that these event counters are more powerful than observing timing variabilities of an executable. We extend Dudect: the raw hardware events are collected over the target executable, and leakage detection tests are incorporated on the statistics of the observed events following the principles of non-specific t-tests. Finally, we show the applicability of our tool on the most popular speculative micro-architectural and data-sampling attack models.
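The non-specific t-test underlying this style of leakage detection is Welch's two-sample t statistic, computed over measurements collected under two input classes; a minimal sketch (our own, with the conventional |t| > 4.5 decision threshold used in TVLA-style testing):

```python
def welch_t(a, b):
    """Welch's two-sample t statistic.

    In non-specific leakage testing, a and b are measurements (timings or
    hardware-event counts) for two input classes; |t| above ~4.5 is the
    conventional threshold for flagging a likely leak.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b
    return (ma - mb) / (va / na + vb / nb) ** 0.5
```

The same statistic applies unchanged whether the samples are cycle counts or raw performance-counter events, which is what lets a tool swap timing measurements for hardware-event traces.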
08:31 CEST IP5_1.2 SECLUSIVE CACHE HIERARCHY FOR MITIGATING CROSS-CORE CACHE AND COHERENCE DIRECTORY ATTACKS
Speaker:
Vishal Gupta, Indian Institute of Technology, Kanpur, IN
Authors:
Vishal Gupta1, Vinod Ganesan2 and Biswabandan Panda3
1Indian Institute of Technology, Kanpur, IN; 2Indian Institute of Technology Madras, IN; 3IIT Kanpur, IN
Abstract
Cross-core cache attacks glean sensitive data by exploiting the fundamental interference at the shared resources like the last-level cache (LLC) and coherence directories. Complete non-interference will make cross-core cache attacks unsuccessful. To this end, we propose a seclusive cache hierarchy with zero storage overhead and a marginal increase in on-chip traffic, that provides non-interference by employing cache-privatization on demand. Upon a cross-core eviction by an attacker core at the LLC, the block is back-filled into the private cache of the victim core. Our back-fill strategy mitigates cross-core conflict based LLC and coherence directory-based attacks. We show the efficacy of the seclusive cache hierarchy by comparing it with existing cache hierarchies.
08:32 CEST 5.2.3 TINY-CFA: A MINIMALISTIC APPROACH FOR CONTROL FLOW ATTESTATION USING VERIFIED PROOFS OF EXECUTION
Speaker:
Sashidhar Jakkamsetti, UC Irvine, US
Authors:
Ivan De Oliveira Nunes1, Sashidhar Jakkamsetti2 and Gene Tsudik3
1University of California Irvine, US; 2UC Irvine, US; 3UCI, US
Abstract
The design of tiny trust anchors has attracted much attention over the past decade, as a means to secure low-end MCU-s that cannot afford more expensive security mechanisms. In particular, hardware/software (hybrid) co-designs offer low hardware cost while retaining similar security guarantees as (more expensive) hardware-based techniques. Hybrid trust anchors support security services (such as remote attestation, proofs of software update/erasure/reset, and proofs of remote software execution) in resource-constrained MCU-s, e.g., MSP430 and AVR ATmega32. Despite these advances, detection of control-flow attacks in low-end MCU-s remains a challenge, since the hardware requirements of the cheapest mitigation techniques are often more expensive than the MCU-s themselves. In this work, we tackle this challenge by designing Tiny-CFA – a Control-Flow Attestation (CFA) technique with a single hardware requirement: the ability to generate proofs of remote software execution (PoX). In turn, PoX can be implemented very efficiently and securely in low-end MCU-s. Consequently, our design achieves the lowest hardware overhead of any CFA technique, while relying on a formally verified PoX as its sole hardware requirement. With respect to runtime overhead, Tiny-CFA also achieves better performance than prior CFA techniques based on code instrumentation. We implement and evaluate Tiny-CFA, analyze its security, and demonstrate its practicality using real-world, publicly available applications.
08:47 CEST IP5_2.1 TOWARDS A FIRMWARE TPM ON RISC-V
Speaker:
Marouene Boubakri, University of Carthage, TN
Authors:
Marouene Boubakri1, Fausto Chiatante2 and Belhassen Zouari1
1Mediatron Lab, Higher School of Communications of Tunis, University of Carthage, Tunisia, TN; 2NXP, FR
Abstract
To develop the next generation of Internet of Things and Edge devices and systems, which leverage progress in enabling technologies such as 5G, distributed computing and artificial intelligence (AI), several requirements need to be developed and put in place to make devices smarter. A major requirement for all the above applications is a long-term security and trusted computing infrastructure. Trusted Computing requires the introduction into the platform of a Trusted Platform Module (TPM). Traditionally, a TPM was a discrete, dedicated module plugged into the platform to provide TPM capabilities. Recently, processor manufacturers started integrating trusted computing features into their processors. A significant drawback of this approach is the need for a permanent modification of the processor micro-architecture. In this context, we present an analysis and a design of a software-only TPM for RISC-V processors based on the seL4 microkernel and OP-TEE.

5.3 Systolic Array Architectures for Machine Learning Acceleration

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/x94HY6u5nNjBptmuj

Session chair:
Giovanni Ansaloni, EPFL, CH

Session co-chair:
Henk Corporaal, TU/e, NL

The systolic array is an efficient design pattern for accelerating matrix multiplication and convolution using a regular grid of multiply-accumulate (MAC) units with local communication. Nonetheless, implementing modern DNN models with depthwise separable convolutions and sparsity remains a challenge. The first paper in the session presents a new layer, FuSeConv, which makes depthwise separable convolution systolic-friendly. The second paper presents a Heterogeneous Systolic Array design that solves the problem by increasing the data-reuse ratio. In the third paper, the authors propose a hardware-software co-design approach to support the execution of pruned DNNs on a systolic array augmented with additional multiplexers for the operands. The last paper also addresses the efficient acceleration of sparse DNNs, this time by proposing a scheme that maintains a constant probability of index matching and therefore achieves high system performance.
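The systolic-array pattern referred to throughout this session can be sketched as a time-stepped simulation (our own toy model of an output-stationary array, not any of the papers' designs): each processing element holds one partial sum, and operands sweep across the grid as a diagonal wavefront.

```python
def systolic_matmul(A, B):
    """Time-stepped sketch of an output-stationary systolic array.

    PE(i, j) accumulates C[i][j] locally; operand pair number s of its dot
    product arrives along the diagonal wavefront at time t = i + j + s,
    modelling the skewed injection of A from the left and B from the top.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]
    for t in range(n + m + k - 2):        # one wavefront per cycle
        for i in range(n):
            for j in range(m):
                s = t - i - j             # operand index reaching PE(i, j) now
                if 0 <= s < k:
                    acc[i][j] += A[i][s] * B[s][j]
    return acc
```

Every PE touches only locally arriving operands and its own accumulator, which is why the pattern maps to a regular grid with nearest-neighbour wiring and no global memory traffic during the computation.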

Time Label Presentation Title
Authors
08:00 CEST 5.3.1 FUSECONV: FULLY SEPARABLE CONVOLUTIONS FOR FAST INFERENCE ON SYSTOLIC ARRAYS
Speaker:
Surya Selvam, IIT Madras / Purdue University, IN
Authors:
Surya Selvam1, Vinod Ganesan1 and Pratyush Kumar2
1Indian Institute of Technology Madras, IN; 2IIT Madras, IN
Abstract
Both efficient neural networks and hardware accelerators are being explored to speed up DNN inference on edge devices. For example, MobileNet uses depthwise separable convolution to achieve much lower latency, while systolic arrays provide much higher performance per watt. Interestingly, however, the combination of these two ideas is inefficient: the computational patterns of depthwise separable convolution are not systolic and lack the data reuse needed to saturate the systolic array's constrained dataflow. In this paper, we propose FuSeConv (Fully-Separable Convolution) as a drop-in replacement for depthwise separable convolution. FuSeConv fully generalizes the decomposition of convolutions into separable 1D convolutions along the spatial and depth dimensions. The resulting computation is systolic and efficiently utilizes the systolic array with a slightly modified dataflow. With FuSeConv, we achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a 64x64 systolic array, with comparable accuracy on the ImageNet dataset. The high speed-up motivates the exploration of hardware-aware Neural Operator Search (NOS) to complement ongoing efforts on Neural Architecture Search (NAS).
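The decomposition into separable 1D convolutions rests on the classic identity that a rank-1 kernel factors into a row pass followed by a column pass; a pure-Python toy sketch of that identity (our own illustration, not the FuSeConv implementation):

```python
def conv2d_valid(img, ker):
    """Direct 2-D 'valid' convolution (cross-correlation) with a small kernel."""
    H, W = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    return [[sum(img[i + a][j + b] * ker[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]

def separable_conv2d(img, col, row):
    """Same result for a rank-1 kernel ker[a][b] = col[a] * row[b],
    computed as a 1-D pass along rows then a 1-D pass down columns."""
    rowpass = [[sum(r[j + b] * row[b] for b in range(len(row)))
                for j in range(len(r) - len(row) + 1)]
               for r in img]
    return [[sum(rowpass[i + a][j] * col[a] for a in range(len(col)))
             for j in range(len(rowpass[0]))]
            for i in range(len(rowpass) - len(col) + 1)]
```

The two 1-D passes replace kh*kw multiplications per output with kh + kw, and each pass is a dense dot product that streams naturally through a row of MAC units, which is the property that makes separable 1-D convolutions systolic-friendly.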
08:15 CEST 5.3.2 HESA: HETEROGENEOUS SYSTOLIC ARRAY ARCHITECTURE FOR COMPACT CNNS HARDWARE ACCELERATORS
Speaker:
Rui Xu, National University of Defense Technology, CN
Authors:
Rui Xu, Sheng Ma, Yaohua Wang and Yang Guo, National University of Defense Technology, CN
Abstract
Compact convolutional neural networks have become a hot research topic. However, we find that hardware accelerators with systolic arrays are extremely performance-inefficient when processing compact models, especially the depthwise convolutional layers in these networks. To make systolic arrays efficient for compact convolutional neural networks, we propose the heterogeneous systolic array (HeSA) architecture. It introduces heterogeneous processing elements that support multiple dataflow modes, which further exploit the data-reuse opportunities of depthwise convolutional layers without changing the architecture of the naïve systolic array. By increasing the utilization rate of the processing elements in the array, HeSA improves performance, throughput, and energy efficiency compared to the standard baseline. Based on our evaluation with typical workloads, HeSA improves the utilization rate of the computing resources in depthwise convolutional layers by 4.5×-5.5× and acquires a 1.5-2.2× total performance speedup compared to the standard systolic array architecture. HeSA also improves on-chip data reuse and saves over 20% of energy consumption. Meanwhile, the area of HeSA is basically unchanged compared to the baseline due to its simple design.
08:30 CEST IP4_5.2 SPRITE: SPARSITY-AWARE NEURAL PROCESSING UNIT WITH CONSTANT PROBABILITY OF INDEX-MATCHING
Speaker:
Sungju Ryu, POSTECH, KR
Authors:
Sungju Ryu1, Youngtaek Oh2, Taesu Kim1, Daehyun Ahn1 and Jae-Joon Kim3
1Pohang University of Science and Technology, KR; 2Pohang University of Science and Technology, KR; 3POSTECH, KR
Abstract
Sparse neural networks are widely used for memory savings. However, the irregular indices of non-zero input activations and weights tend to degrade overall system performance. This paper presents a scheme that maintains a constant probability of index-matching for weights and inputs over a wide range of sparsity, overcoming a critical limitation of previous works. A sparsity-aware neural processing unit based on the proposed scheme improves system performance by up to 6.1X compared to previous sparse convolutional neural network hardware accelerators.
08:31 CEST 5.3.3 HARDWARE-SOFTWARE CODESIGN OF WEIGHT RESHAPING AND SYSTOLIC ARRAY MULTIPLEXING FOR EFFICIENT CNNS
Speaker:
Jingyao Zhang, Xidian University, CN
Authors:
Jingyao Zhang1, Huaxi Gu1, Grace Li Zhang2, Bing Li2 and Ulf Schlichtmann2
1Xidian University, CN; 2TU Munich, DE
Abstract
The last decade has witnessed the breakthrough of deep neural networks (DNNs) in various fields, e.g., image/speech recognition. With the increasing depth of DNNs, the number of multiply-accumulate (MAC) operations with weights explodes, preventing their application on resource-constrained platforms. Weight pruning is considered an effective method to compress neural networks for acceleration. However, the weights after pruning usually exhibit irregular patterns. Implementing MAC operations with such irregular weight patterns on hardware platforms with regular designs, e.g., GPUs and systolic arrays, may result in an underutilization of hardware resources. To utilize hardware resources efficiently, in this paper, we propose a hardware-software codesign framework for acceleration on systolic arrays. First, weights after unstructured pruning are reorganized into a dense cluster. Second, various blocks are selected to cover the cluster seamlessly. To support the concurrent computation of such blocks on systolic arrays, a multiplexing technique and the corresponding systolic architecture are developed for various CNNs. The experimental results demonstrate that the performance of CNN inference can be improved significantly without accuracy loss.

5.4 Approximation in neural networks

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/hrvyGCXkG2iJmSzvX

Session chair:
Nikos Bellas, University of Thessaly, GR

Session co-chair:
Lukas Sekanina, Brno University of Technology, CZ

Approximate computing is nowadays widely used for optimizing neural network computation in terms of performance, energy and area. This session presents three papers at the confluence of these two emerging paradigms. First, a methodology is presented that formally studies the effects of approximate (e.g., undervolted) memory, in terms of the bit flips it may incur, on the functionality of Binarized Neural Networks (BNNs). The second paper presents a modified CNN re-training methodology to better accommodate CNN-specific approximation errors. The third paper proposes a stochastic computing accelerator for an efficient implementation of neural networks. An IP paper focuses on the hardware implementation of a Bayesian Confidence Propagation Neural Network with reduced memory consumption.

Time Label Presentation Title
Authors
08:00 CEST 5.4.1 (Best Paper Award Candidate)
MARGIN-MAXIMIZATION IN BINARIZED NEURAL NETWORKS FOR OPTIMIZING BIT ERROR TOLERANCE
Speaker:
Mikail Yayla, TU Dortmund, DE
Authors:
Sebastian Buschjäger, Jian-Jia Chen, Kuan-Hsun Chen, Mario Günzel, Christian Hakert, Katharina Morik, Rodion Novkin, Lukas Pfahler and Mikail Yayla, TU Dortmund, DE
Abstract
To overcome the memory wall in neural network (NN) inference systems, recent studies have proposed to use approximate memory, in which the supply voltage and access latency parameters are tuned, for lower energy consumption and faster access at the cost of reliability. To tolerate the resulting bit errors, state-of-the-art approaches apply bit flip injections to the NNs during training, which incur high overheads and do not scale well for large NNs and high bit error rates. In this work, we focus on binarized NNs (BNNs), whose simpler structure allows a better exploration of bit error tolerance metrics based on margins. We provide formal proofs to quantify the maximum number of bit flips that can be tolerated. With the proposed margin-based metrics and the well-known hinge loss for maximum-margin classification in support vector machines (SVMs), we construct a modified hinge loss (MHL) to train BNNs for bit error tolerance without any bit flip injection. Our experimental results indicate that the MHL enables BNNs to tolerate higher bit error rates than bit flip training and therefore allows further lowering the requirements on the approximate memories used for BNNs.
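The margin intuition can be illustrated on a single binarized neuron (a toy sketch of our own, not the paper's formal proofs; names are hypothetical). With ±1 inputs, each flipped weight bit moves the pre-activation by exactly 2, so the classification margin directly bounds the number of tolerable flips:

```python
def bnn_score(w, x):
    """Pre-activation of a binarized neuron: w and x are +/-1 vectors,
    and the prediction is sign(score)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def flips_tolerated(margin):
    # flipping one weight bit changes the score by exactly 2 (|x_i| = 1),
    # so the sign survives any k flips with 2 * k < margin
    return (margin - 1) // 2
```

This is why training for larger margins (e.g. via a hinge-style loss) buys bit error tolerance for free: a neuron with margin 8 keeps its decision through any 3 weight flips, while a 4th can zero the score out.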
08:15 CEST 5.4.2 KNOWLEDGE DISTILLATION AND GRADIENT ESTIMATION FOR ACTIVE ERROR COMPENSATION IN APPROXIMATE NEURAL NETWORKS
Speaker:
Cecilia De la Parra, Robert Bosch GmbH, DE
Authors:
Cecilia De la Parra1, Xuyi Wu2, Akash Kumar3 and Andre Guntoro1
1Robert Bosch GmbH, DE; 2TU München, DE; 3TU Dresden, DE
Abstract
Approximate computing is a promising approach for optimizing computational resources of error-resilient applications such as Convolutional Neural Networks (CNNs). However, such approximations introduce an error that needs to be compensated by optimization methods, which typically include a retraining or fine-tuning stage. To efficiently recover from the introduced error, this fine-tuning process needs to be adapted to take CNN approximations into consideration. In this work, we present a novel methodology for fine-tuning approximate CNNs with ultra-low bit-width quantization and large approximation error, which combines knowledge distillation and gradient estimation to recover the lost accuracy due to approximations. With our proposed methodology, we demonstrate energy savings of up to 38% in complex approximate CNNs with weights quantized to 4 bits and 8-bit activations, with less than 3% accuracy loss w.r.t. the full precision model.
08:30 CEST IP4_4.2 APPROXIMATE COMPUTATION OF POST-SYNAPTIC SPIKES REDUCES BANDWIDTH TO SYNAPTIC STORAGE IN A MODEL OF CORTEX
Speaker:
Yu Yang, KTH Royal Institute of Technology, SE
Authors:
Dimitrios Stathis1, Yu Yang2, Ahmed Hemani3 and Anders Lansner4
1KTH Royal Institute of Technology, SE; 2Royal Institute of Technology - KTH, SE; 3KTH - Royal Institute of Technology, SE; 4Stockholm University and KTH Royal Institute of Technology, SE
Abstract
The Bayesian Confidence Propagation Neural Network (BCPNN) is a spiking model of the cortex. Its synaptic weights are organized as matrices, which require substantial synaptic storage and large bandwidth to access. The algorithm accesses these matrices in a dual pattern, both row-wise and column-wise. In this work, we exploit an algorithmic optimization that eliminates the column-wise accesses. The new computation model approximates the post-synaptic spike computation with a predictor. We adopt this approximate computational model to improve upon the previously reported ASIC implementation, called eBrainII, and present an error analysis showing that the approximation error is negligible. The reduction in storage and bandwidth to the synaptic storage results in a 48% reduction in energy compared to eBrainII. The reported approximation method also applies to other neural network models based on a Hebbian learning rule.
08:31 CEST 5.4.3 GEO: GENERATION AND EXECUTION OPTIMIZED STOCHASTIC COMPUTING ACCELERATOR FOR NEURAL NETWORKS
Speaker:
Tianmu Li, University of California, Los Angeles, US
Authors:
Tianmu Li1, Wojciech Romaszkan2, Sudhakar Pamarti3 and Puneet Gupta2
1University of California, Los Angeles, US; 2UCLA, US; 3University of California Los Angeles, US
Abstract
Stochastic computing (SC) has seen a renaissance in recent years as a means for machine learning acceleration due to its compact arithmetic and approximation properties. Still, SC accuracy remains an issue, with prior works either not fully utilizing the computational density or suffering from significant accuracy losses. In this work, we propose GEO – Generation and Execution Optimized Stochastic Computing Accelerator for Neural Networks, which optimizes the stream generation and execution components of SC and bridges the accuracy gap between stochastic computing and fixed-point neural networks. We improve accuracy by coupling controlled stream sharing with training and balancing OR and binary accumulations. GEO further optimizes the SC execution through progressive shadow buffering and architectural optimizations. GEO can improve accuracy compared to state-of-the-art SC by 2.2-4.0 percentage points while being up to 4.4X faster and 5.3X more energy efficient. GEO eliminates the accuracy gap between SC and fixed-point architectures while delivering up to 5.6X higher throughput and 2.6X lower energy.
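For readers unfamiliar with SC arithmetic, the core primitive is strikingly simple: a unipolar value is the probability of a 1 in a bitstream, and multiplication of two independent streams is a bitwise AND. The sketch below is a textbook illustration of that primitive, not GEO itself; the stream length is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_stream(p, n):
    """Encode probability p in [0, 1] as a length-n unipolar bitstream."""
    return (rng.random(n) < p).astype(np.uint8)

def sc_multiply(a, b, n=100_000):
    """Unipolar SC multiply: AND of two independent streams.
    The mean of the result stream estimates a * b."""
    s = to_stream(a, n) & to_stream(b, n)
    return s.mean()

est = sc_multiply(0.5, 0.4)
print(est)  # ≈ 0.2, with stochastic error shrinking as n grows
```

The approximation error of this estimate is what accelerator-level work like GEO trades off against the extremely compact (single-gate) arithmetic.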

5.5 Optimizing the Memory System for Latency and Throughput

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/Yi7Z64cbuY2EKexK8

Session chair:
Georgios Keramidas, Aristotle University of Thessaloniki, GR

Session co-chair:
Caroline Collange, INRIA, FR

This session presents an application-aware replacement policy and a new scalable memory design for multi-core and many-core architectures. An efficient solution enabling on-the-fly error detection and correction in latency-sensitive caches is also described. The session concludes with a two-level approach that fine-tunes the memory system using both simulators and real hardware.

Time Label Presentation Title
Authors
08:00 CEST 5.5.1 A FAIRNESS CONSCIOUS CACHE REPLACEMENT POLICY FOR LAST LEVEL CACHE
Speaker:
Shirshendu Das, Indian Institute of Technology Ropar, Punjab, India, IN
Authors:
Kousik Kumar Dutta, Prathamesh Nitin Tanksale and Shirshendu Das, Indian Institute of Technology Ropar, IN
Abstract
Multicore systems with a shared Last Level Cache (LLC) pose a significant challenge in allocating LLC space among the multiple applications running in the system. Since all applications use the shared LLC, interference among them may evict important blocks of other applications prematurely and may even lead to thrashing. Replacement policies applied locally to a set distribute the sets dynamically among the applications. However, previous work on replacement techniques focused on the re-reference aspect of a block or on application behavior to improve overall system performance. This paper proposes a novel cache replacement technique, Application Aware Re-reference Interval Prediction (AARIP), that considers application behavior, re-reference interval, and premature block eviction when replacing a cache block. Experimental evaluation on a four-core system shows that, compared to the traditional SRRIP replacement policy, AARIP improves overall performance by 7.26%, throughput by 4.9%, and overall system fairness by 7.85%.
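The baseline AARIP is compared against, SRRIP, is a well-documented policy: each line carries a 2-bit re-reference prediction value (RRPV), hits reset it to 0, and the victim is a line whose RRPV has aged to the maximum. A toy single-set model of that baseline (illustrative only, not the paper's AARIP):

```python
class SRRIPSet:
    """Toy 2-bit SRRIP for one cache set. Lines are [tag, rrpv] pairs."""
    MAX_RRPV = 3

    def __init__(self, ways):
        self.lines = [[None, self.MAX_RRPV] for _ in range(ways)]

    def access(self, tag):
        """Return True on hit, False on miss (with fill)."""
        for line in self.lines:
            if line[0] == tag:           # hit: predict near re-reference
                line[1] = 0
                return True
        while True:                      # miss: victimize an RRPV==MAX line
            for line in self.lines:
                if line[1] == self.MAX_RRPV:
                    # insert with a long predicted re-reference interval
                    line[0], line[1] = tag, self.MAX_RRPV - 1
                    return False
            for line in self.lines:      # no candidate: age all lines
                line[1] += 1

s = SRRIPSet(2)
s.access('A'); s.access('B'); s.access('A')   # third access hits: 'A' is protected
```

AARIP extends this kind of per-set aging with application-level behavior, which the abstract credits for its fairness and performance gains over plain SRRIP.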
08:15 CEST 5.5.2 MEMPOOL: A SHARED-L1 MEMORY MANY-CORE CLUSTER WITH A LOW-LATENCY INTERCONNECT
Speaker:
Matheus Cavalcante, ETH Zürich, CH
Authors:
Matheus Cavalcante1, Samuel Riedel1, Antonio Pullini2 and Luca Benini3
1ETH Zurich, CH; 2GreenWaves Technologies, FR; 3Università di Bologna and ETH Zurich, IT
Abstract
A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32-bit many-core system with 256 fast RV32IMA "Snitch" cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25 °C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most five cycles of zero-load latency. In MemPool's physically aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect. We compare three candidate topologies, analyzing them in terms of latency, throughput, and back-end feasibility. The chosen topology keeps the average latency below six cycles, even for a heavy injected load of 0.33 request/core/cycle. We also propose a lightweight addressing scheme that maps each core's private data to a memory bank accessible within one cycle, which leads to performance gains of up to 20% in real-world signal processing benchmarks. The addressing scheme is also highly efficient in terms of energy consumption, since requests to local banks consume only half of the energy required to access remote banks. Our design achieves competitive performance with respect to an ideal, non-implementable full-crossbar baseline.
08:30 CEST IP4_5.1 MEMORY HIERARCHY CALIBRATION BASED ON REAL HARDWARE IN-ORDER CORES FOR ACCURATE SIMULATION
Speaker:
Quentin Huppert, LIRMM, FR
Authors:
Quentin Huppert1, Timon Evenblij2, Manu Komalan2, Francky Catthoor3, Lionel Torres4 and David Novo5
1LIRMM, FR; 2imec, BE; 3imec, BE; 4University of Montpellier, FR; 5CNRS, LIRMM, University of Montpellier, FR
Abstract
Computer system simulators are major tools used by architecture researchers. Two key elements play a role in the credibility of simulator results: (1) the simulator’s accuracy, and (2) the quality of the baseline architecture. Some simulators, such as gem5, already provide highly accurate parameterized models. However, finding the right values for all these parameters to faithfully model a real architecture is still a problem. In this paper, we calibrate the memory hierarchy of an in-order core gem5 simulation to accurately model a real mobile Arm SoC. We execute small programs, which are designed to stress specific parts of the memory system, to deduce key parameter values for the model. We compare the execution of SPEC CPU2006 benchmarks on the real hardware with the gem5 simulation. Our results show that our calibration reduces the average and worst-case IPC error by 36% and 50%, respectively, when compared with a gem5 simulation configured with the default parameters.
08:31 CEST 5.5.3 SRAM ARRAYS WITH BUILT-IN PARITY COMPUTATION FOR REAL-TIME ERROR DETECTION IN CACHE TAG ARRAYS
Speaker:
Ramon Canal, Universitat Politècnica de Catalunya, ES
Authors:
Ramon Canal1, Yiannakis Sazeides2 and Arkady Bramnik3
1Universitat Politècnica de Catalunya, ES; 2University of Cyprus, CY; 3Intel, IL
Abstract
This work proposes an SRAM array with built-in real-time error detection (RTD) capabilities. Each cell in the new RTD-SRAM array computes its part of the real-time parity of an SRAM array column on-the-fly. RTD-based arrays detect a fault immediately after it occurs, rather than when the data is read. RTD therefore breaks the serialization between data access and error detection, and can thus speed up the access time of arrays that use on-the-fly error detection and correction. The paper presents an analysis and optimization of an RTD-SRAM and its application to a tag array. Compared to state-of-the-art tag array protection, the evaluated scheme has comparable error detection and correction strength and, depending on the array dimensions, reduces access time by 5% to 18%, energy by 20% to 40%, and area by up to 30%.
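The column-parity principle can be illustrated on a toy bit array: a reference parity is maintained per column, so comparing it against the live parity flags a flipped cell without ever reading the affected word. This is a sketch of the principle only, not the RTD-SRAM circuit.

```python
import numpy as np

def column_parity(array):
    """Parity of each column of a bit array (XOR down the rows)."""
    return np.bitwise_xor.reduce(array, axis=0)

# An SRAM array with a reference parity computed at write time.
data = np.array([[1, 0, 1],
                 [0, 1, 1],
                 [1, 1, 0]], dtype=np.uint8)
ref = column_parity(data)

# A fault flips one cell; comparing live vs. reference parity flags
# the affected column immediately, before the word is ever read.
data[1, 2] ^= 1
faulty_cols = np.nonzero(column_parity(data) != ref)[0]
print(faulty_cols)  # → [2]
```

Because detection runs continuously rather than on the read path, the read access no longer has to wait for an error check, which is the source of the access-time savings the abstract reports.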

5.6 From applications to circuit layout - Industrial perspectives

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ucSCpqWvJpbHKvgAw

Session chair:
Emil Matus, TU Dresden, DE

Session co-chair:
Nicolas Ventroux, Thales Research and Technology, FR

This session covers a wide range of topics, from an avionics use case, through Linux driver validation, to automatic layout generation for DRAM processes. The first paper assesses the feasibility of two graphics-based GPU methodologies, OpenGL SC 2.0 and Brook Auto/BRASIL, in an industrial safety-critical use case (an avionics application). The second shows a method for checking the conformance of a VirtIO driver implementation to its specification using the Clock Constraint Specification Language (CCSL) and a tool that checks for violations. The last paper in this session presents generator-script-based, process-independent layout generation for area-optimized logic cells in advanced DRAM processes.

Time Label Presentation Title
Authors
08:00 CEST 5.6.1 COMPARISON OF GPU COMPUTING METHODOLOGIES FOR SAFETY-CRITICAL SYSTEMS: AN AVIONICS CASE STUDY
Speaker:
Leonidas Kosmidis, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
Authors:
Marc Benito1, Matina Maria Trompouki2, Leonidas Kosmidis3, Juan David Garcia4, Sergio Carretero4 and Ken Wenger5
1Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 2Universitat Politècnica de Catalunya, ES; 3Barcelona Supercomputing Center (BSC), ES; 4Airbus Defence and Space, ES; 5CoreAVI, CA
Abstract
Introducing advanced functionalities in safety-critical systems requires more powerful architectures such as GPUs. However, software in safety-critical industries is subject to functional certification, which cannot be achieved using standard GPU programming languages such as CUDA and OpenCL. Fortunately, GPUs are already used in certified critical systems for display tasks, using safety-certified solutions such as OpenGL SC 2.0. In this paper, we compare two state-of-the-art graphics-based methodologies, OpenGL SC 2.0 and Brook Auto/BRASIL, for the implementation of a prototype avionics case study. We evaluate both methods on a realistic industrial setup, composed of an avionics-grade GPU and a safety-certified GPU driver, in terms of development metrics and performance, showing their feasibility.
08:15 CEST 5.6.2 VERIFYING THE CONFORMANCE OF A DRIVER IMPLEMENTATION TO THE VIRTIO SPECIFICATION
Speaker:
Matias Ezquiel Vara Larsen, Huawei Research Center, FR
Author:
Matias Vara Larsen, Huawei, FR
Abstract
VirtIO is a specification that gives developers a common interface for implementing devices and drivers in virtual environments. This paper proposes the verification and analysis of the VirtIO specification using the Clock Constraint Specification Language (CCSL). In our proof-of-concept approach, a verification engineer translates requirements from the specification into a CCSL specification. The tool TimeSquare is then used to detect inconsistencies with an implementation and to understand what the specification allows. This paper aims to present the approach and to foster face-to-face discussion and debate about its benefits, drawbacks and trade-offs.
08:30 CEST 5.6.3 PROCESS-PORTABLE AND PROGRAMMABLE LAYOUT GENERATION OF DIGITAL CIRCUITS IN ADVANCED DRAM TECHNOLOGIES
Speaker:
Youngbog Yoon, SK Hynix, KR
Authors:
Youngbog Yoon1, Daeyong Han1, Shinho Chu1, Sangho Lee1, Jaeduk Han2 and Junhyun Chun1
1SK Hynix, KR; 2Hanyang University, KR
Abstract
This paper introduces a physical layout design methodology that produces DRC-clean, area-efficient, and programmable layouts of digital circuits in advanced DRAM processes. The proposed methodology automates the layout generation process to enhance design productivity, while still providing rich customization for efficient area and routing resource utilization. Process-specific parameterized cells (PCells) are combined with process-independent place-and-route functions to automatically generate area-efficient and programmable layouts. Routing grids are optimized to enhance area and routing efficiency. The proposed method reduces the design time of digital layouts by 80% compared to manual design while maintaining high layout quality, significantly enhancing design productivity.

5.7 Machine learning dependability and test

Date: Wednesday, 03 February 2021
Time: 08:00 CEST - 08:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/TZGcZcMg3h6G5fbuj

Session chair:
Rishad Shafik, Newcastle University, GB

Session co-chair:
Georgios Karakonstantis, Queen's University Belfast, GB

This session focuses on reliability of machine learning accelerators and applications. Papers in this session present architectural solutions to improve reliability of different neural network accelerators including DNN and spiking neural networks. Presented solutions aim at improving dependability while minimizing its cost in terms of area and performance.

Time Label Presentation Title
Authors
08:00 CEST 5.7.1 HYDREA: TOWARDS MORE ROBUST AND EFFICIENT MACHINE LEARNING SYSTEMS WITH HYPERDIMENSIONAL COMPUTING
Speaker:
Justin Morris, University of California, San Diego, US
Authors:
Justin Morris1, Kazim Ergun2, Behnam Khaleghi1, Mohsen Imani3, Baris Aksanli4 and Tajana Rosing5
1University of California, San Diego, US; 2University Of California San Diego, US; 3University of California Irvine, US; 4San Diego State University, US; 5UCSD, US
Abstract
Today’s systems, especially in the age of federated learning, rely on sending all the data to the cloud and then using complex algorithms, such as Deep Neural Networks, which require billions of parameters and many hours to train a model. In contrast, the human brain can do much of this learning effortlessly. Hyperdimensional (HD) Computing aims to mimic the behavior of the human brain by utilizing high-dimensional representations. This leads to various desirable properties that other Machine Learning (ML) algorithms lack, such as robustness to noise in the system and simple, highly parallel operations. In this paper, we propose HyDREA, a HD computing system that is Robust, Efficient, and Accurate. To evaluate the feasibility of HyDREA in a federated learning environment with wireless communication noise, we utilize NS-3, a popular network simulator that models a real-world environment with wireless communication noise. We found that HyDREA is 48x more robust to noise than other comparable ML algorithms. We additionally propose a Processing-in-Memory (PIM) architecture that adaptively changes the bitwidth of the model based on the signal-to-noise ratio (SNR) of the incoming sample to maintain the robustness of the HD model while achieving high accuracy and energy efficiency. Our results indicate that our proposed system loses less than 1% classification accuracy, even in scenarios with an SNR of 6.64. Our PIM architecture is also able to achieve 255x better energy efficiency and speed up execution by 28x compared to the baseline PIM architecture.
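The HD computing primitives this work builds on are standard: random bipolar hypervectors, bundling by elementwise majority, and nearest-prototype classification by similarity. The sketch below shows the noise robustness the abstract refers to; the dimension and noise level are illustrative choices, and this is not HyDREA itself.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000  # very high dimension is what gives HD its noise robustness

def random_hv():
    """Random bipolar hypervector with entries in {-1, +1}."""
    return rng.choice([-1, 1], size=D)

def bundle(hvs):
    """Combine vectors by elementwise majority (sign of the sum)."""
    s = np.sum(hvs, axis=0)
    return np.sign(s) + (s == 0)  # break ties toward +1

def similarity(a, b):
    """Normalized dot product; 1.0 means identical vectors."""
    return np.dot(a, b) / D

# Nearest-prototype classification: even with 25% of its bits flipped
# by noise, a query still lands closest to the right class vector.
classes = [random_hv() for _ in range(3)]
query = classes[1].copy()
flip = rng.choice(D, size=D // 4, replace=False)
query[flip] *= -1
pred = int(np.argmax([similarity(query, c) for c in classes]))
print(pred)  # → 1
```

The operations are elementwise and independent per dimension, which is why they map well onto the highly parallel PIM architecture the paper proposes.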
08:15 CEST 5.7.2 (Best Paper Award Candidate)
DNN-LIFE: AN ENERGY-EFFICIENT AGING MITIGATION FRAMEWORK FOR IMPROVING THE LIFETIME OF ON-CHIP WEIGHT MEMORIES IN DEEP NEURAL NETWORK HARDWARE ARCHITECTURES
Speaker:
Muhammad Abdullah Hanif, Vienna University of Technology (TU Wien), AT
Authors:
Muhammad Abdullah Hanif1 and Muhammad Shafique2
1Institute of Computer Engineering, Vienna University of Technology, AT; 2New York University Abu Dhabi (NYUAD), AE
Abstract
Negative Bias Temperature Instability (NBTI)-induced aging is one of the critical reliability threats in nano-scale devices. This paper makes the first attempt to study NBTI aging in the on-chip weight memories (composed of 6T-SRAM cells) of deep neural network (DNN) hardware accelerators, subjected to complex DNN workloads. We propose DNN-Life, a specialized aging-mitigation framework for DNNs, which jointly exploits hardware- and software-level knowledge to improve the lifetime of the DNN weight memory with reduced energy overhead. At the software level, we analyze the effects of different DNN quantization methods on the distribution of the bits of the weight values. Based on the insights gained from this analysis, we propose a micro-architecture that employs low-cost memory write (and read) transducers to achieve an optimal duty cycle at run time in the memory cells, thereby balancing the aging of complementary parts in the 6T-SRAM cells of the weight memory. As a result, our DNN-Life framework enables efficient aging mitigation of the weight memory of the given DNN hardware at minimal energy overhead during the inference process.
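The duty-cycle balancing idea can be sketched as a write/read transducer pair that XORs stored bits with a periodically toggling flag, so each cell spends roughly half its lifetime holding each logic value. This is a hypothetical simplification for illustration; the paper's micro-architecture derives its encoding decisions from the quantization analysis rather than a single global flag.

```python
class BalancedWeightMemory:
    """Sketch of duty-cycle balancing for an SRAM weight memory.
    Data is stored XORed with a flip flag that toggles periodically,
    equalizing how long each cell holds a 0 vs. a 1 (which is what
    mitigates NBTI stress on the complementary halves of a 6T cell)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.flip = 0

    def toggle_epoch(self):
        # Called periodically: re-encode contents under the new flag
        # so logical values are preserved while physical values invert.
        self.flip ^= 1
        self.cells = [c ^ 1 for c in self.cells]

    def write(self, addr, bit):
        self.cells[addr] = bit ^ self.flip

    def read(self, addr):
        return self.cells[addr] ^ self.flip
```

From the accelerator's point of view nothing changes: reads always return the logical value, while the physical bit pattern alternates between epochs.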
08:30 CEST IP4_6.1 WISER: DEEP NEURAL NETWORK WEIGHT-BIT INVERSION FOR STATE ERROR REDUCTION IN MLC NAND FLASH
Speaker:
Jaehun Jang, SungKyunkwan University, KR
Authors:
Jaehun Jang1 and Jong Hwan Ko2
1Department of Semiconductor and Display Engineering, SungKyunkwan University, Memory Division, Samsung Electronics, KR; 2Sungkyunkwan University (SKKU), KR
Abstract
When Flash memory is used to store deep neural network (DNN) weights, inference accuracy can degrade due to Flash memory state errors. To protect the weights from these state errors, existing methods rely on ECC (Error Correction Code) or parity, which incur power/storage overhead. In this study, we propose a weight-bit inversion method that minimizes the accuracy loss due to Flash memory state errors without using ECC or parity. The method first applies WISE (Weight-bit Inversion for State Elimination), which removes the most error-prone state from MLC NAND, thereby improving both the error robustness and the MSB page read speed. If the initial accuracy loss due to the weight inversion of WISE is unacceptable, we apply WISER (Weight-bit Inversion for State Error Reduction), which reduces weight mapping to the error-prone state with minimal weight value changes. The simulation results show that after 16K program-erase cycles in NAND Flash, WISER reduces CIFAR-100 accuracy loss by 2.92X for VGG-16 compared to the existing methods.
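The inversion idea can be sketched on a toy MLC page: each cell stores two bits (one from the MSB page, one from the LSB page), and inverting the MSB page when it reduces occurrences of the error-prone state avoids that state at the cost of one flag bit per page. The choice of error-prone state and the page layout below are illustrative assumptions, not the paper's device model.

```python
# Hypothetical most error-prone MLC state, as (MSB, LSB) bits per cell.
ERROR_PRONE = (1, 1)

def count_state(pairs, state):
    return sum(1 for p in pairs if p == state)

def wiser_encode(pairs):
    """WISER-style sketch: invert the MSB page if doing so reduces the
    number of cells programmed to the error-prone state. A flag bit
    per page records the choice so decoding can undo it."""
    inverted = [(m ^ 1, l) for m, l in pairs]
    if count_state(inverted, ERROR_PRONE) < count_state(pairs, ERROR_PRONE):
        return inverted, 1
    return pairs, 0

def wiser_decode(pairs, flag):
    return [(m ^ flag, l) for m, l in pairs]

page = [(1, 1), (1, 1), (0, 1), (1, 0)]
enc, flag = wiser_encode(page)
assert wiser_decode(enc, flag) == page
print(count_state(enc, ERROR_PRONE))  # → 1, down from 2 in the raw page
```

Because decoding is a single XOR with the flag, the scheme needs no ECC machinery, which is the overhead saving the abstract emphasizes.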
08:31 CEST IP4_6.2 OR-ML: ENHANCING RELIABILITY FOR MACHINE LEARNING ACCELERATOR WITH OPPORTUNISTIC REDUNDANCY
Speaker:
Zheng Wang, Shenzhen Institutes of Advanced Technology, CN
Authors:
Bo Dong1, Zheng Wang2, Wenxuan Chen3, Chao Chen2, Yongkui Yang2 and Zhibin Yu2
1Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Microelectronics, Xidian University, CN; 2Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, CN; 3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Microelectronics, Xidian University, CN
Abstract
Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported as one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit the massive computing parallelism of neural networks to achieve high energy efficiency. The topology of ML engines' computing fabric, which constitutes large arrays of processing elements (PEs), has been growing dramatically to accommodate the huge size and heterogeneity of rapidly evolving ML algorithms. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach, named OR-ML, that enhances the reliability of generic ML accelerators by Opportunistically exploiting the Redundancy offered at runtime by neighbouring PEs. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources, therefore significantly reducing the implementation overhead while achieving a considerable level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for object classification and detection.
08:32 CEST 5.7.3 NEURON FAULT TOLERANCE IN SPIKING NEURAL NETWORKS
Speaker:
Theofilos Spyrou, Sorbonne Université, CNRS, LIP6, FR
Authors:
Theofilos Spyrou1, Sarah A. El-Sayed1, Engin Afacan1, Luis A. Camuñas-Mesa2, Bernabé Linares-Barranco2 and Haralampos-G. Stratigopoulos1
1Sorbonne Université, CNRS, LIP6, FR; 2Instituto de Microelectrónica de Sevilla (IMSE-CNM), CSIC y Universidad de Sevilla, ES
Abstract
The error-resiliency of Artificial Intelligence (AI) hardware accelerators is a major concern, especially when they are deployed in mission-critical and safety-critical applications. In this paper, we propose a neuron fault tolerance strategy for Spiking Neural Networks (SNNs). It is optimized for low area and power overhead by leveraging observations made from a large-scale fault injection experiment that pinpoints the critical fault types and locations. We describe the fault modeling approach, the fault injection framework, the results of the fault injection experiment, the fault-tolerance strategy, and the fault-tolerant SNN architecture. The idea is demonstrated on two SNNs that we designed for two SNN-oriented datasets, namely the N-MNIST and IBM's DVS128 gesture datasets.

6.8 How STM32 Enables Digital Transformation in Industries

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/D4CJPnhsL2NDCCGbP

Session Chairs:
Marcello Coppola, STMicroelectronics, FR
Antonio Lionetto, STMicroelectronics, IT

Organizer:
Jürgen Haase, edacentrum GmbH, DE

One of the most demanding challenges for European industry is the digital transformation of Startups, SMEs and Midcaps. FED4SAE, a program funded by the EU as part of its Horizon 2020 initiative, facilitates access to leading technologies, competencies, and industrial platforms. The session speakers will demonstrate in a pragmatic way how the STM32 microcontroller and its ecosystem have enabled Bettair, Energica Motor Company, Safecility and Zannini to address technical challenges in Smart City, Building Safety, the FIM MotoE™ World Cup and Industry 4.0. You will learn more about these success stories and what STM32 really means for:

  • Bettair - who will present full cycle development of a complete solution to monitor in real-time the air quality of urban areas
  • Energica Motor Company - who will share the Energica LPWAN Low Power EV Battery Monitor System used in the FIM MotoE™ World Cup
  • Safecility - who will present how to Enhance Building Safety across Europe through IoT Automation
  • Zannini - who will show an integrated system for monitoring and controlling industrial machines.
Time Label Presentation Title
Authors
09:00 CEST 6.8.1 INTRODUCTION: HOW STM32 ENABLES DIGITAL TRANSFORMATION IN INDUSTRIES
Speaker:
Marcello Coppola, STMicroelectronics, FR
09:08 CEST 6.8.2 ENERGICA LPWAN LOW POWER EV BATTERY MONITOR SYSTEM
Speaker:
Giovanni Gherardi, Energica Motor Company, IT
09:26 CEST 6.8.3 AIR-QUALITY MONITORING SYSTEM VIA LORAWAN NETWORK
Speakers:
Leonardo Santiago and Jaume Ribot, Bettair Cities, ES
09:44 CEST 6.8.4 ENHANCING BUILDING SAFETY ACROSS EUROPE THROUGH IOT AUTOMATION BUILT ON STM32: HOW SAFECILITY DELIVER IOT EMERGENCY LIGHTING
Speaker:
Cian O Flaherty, Safecility, IE
10:02 CEST 6.8.5 AN INTEGRATED SYSTEM FOR MONITORING AND CONTROLLING INDUSTRIAL MACHINES
Speaker:
Lorenzo Zampetti, Z4tec, IT

IP4_1 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/fWPCcb8gEwtPBrbnD

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_1.1 HARDWARE ACCELERATION OF FULLY QUANTIZED BERT FOR EFFICIENT NATURAL LANGUAGE PROCESSING
Speaker:
Zejian Liu, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, CN
Authors:
Zejian Liu1, Gang Li2 and Jian Cheng1
1National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, CN; 2Institute of Automation, Chinese Academy of Sciences, CN
Abstract
BERT is a recent Transformer-based model that achieves state-of-the-art performance in various NLP tasks. In this paper, we investigate hardware acceleration of BERT on FPGA for edge computing. To tackle its huge computational complexity and memory footprint, we propose to fully quantize BERT (FQ-BERT), including weights, activations, softmax, layer normalization, and all intermediate results. Experiments demonstrate that FQ-BERT achieves 7.94× weight compression with negligible performance loss. We then propose an accelerator tailored to FQ-BERT and evaluate it on Xilinx ZCU102 and ZCU111 FPGAs. It achieves a performance-per-watt of 3.18 fps/W, which is 28.91× and 12.72× better than an Intel(R) Core(TM) i7-8700 CPU and an NVIDIA K80 GPU, respectively.
IP4_1.2 AXPIKE: INSTRUCTION-LEVEL INJECTION AND EVALUATION OF APPROXIMATE COMPUTING
Speaker:
Isaias Felzmann, University of Campinas, BR
Authors:
Isaías Bittencourt Felzmann1, João Fabrício Filho2 and Lucas Wanner3
1University of Campinas, BR; 2Unicamp/UTFPR, BR; 3Unicamp, BR
Abstract
Representing the interaction between accurate and approximate hardware modules at the architecture level is essential to understand the impact of Approximate Computing in a general-purpose computing scenario. However, extensive effort is required to model approximations in a baseline instruction-level simulator and collect its execution metrics. In this work, we present the AxPIKE ISA simulation environment, a tool that allows designers to inject models of hardware approximation at the instruction level and evaluate their impact on the quality of results. AxPIKE embeds a high-level representation of a RISC-V system and produces a dedicated control mechanism that allows the simulated software to manage the approximate behavior of compatible execution scenarios. The environment also provides detailed execution statistics, which are forwarded to dedicated tools for energy accounting. We apply the AxPIKE environment to inject integer multiplication and memory access approximations into different applications and demonstrate how the generated statistics are translated into energy-quality trade-offs.

IP4_2 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/YyAdz2Y4a4M3EQPKY

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_2.1 THERMAL COMFORT AWARE ONLINE ENERGY MANAGEMENT FRAMEWORK FOR A SMART RESIDENTIAL BUILDING
Speaker:
Daichi Watari, Osaka University, JP
Authors:
Daichi Watari1, Ittetsu Taniguchi1, Francky Catthoor2, Charalampos Marantos3, Kostas Siozios4, Elham Shirazi5, Dimitrios Soudris3 and Takao Onoye1
1Osaka University, JP; 2imec, KU Leuven, BE; 3National TU Athens, GR; 4Aristotle University of Thessaloniki, GR; 5imec, KU Leuven, EnergyVille, BE
Abstract
Energy management in buildings equipped with renewable energy is vital for reducing electricity costs and maximizing occupant comfort. Despite several studies on the scheduling of appliances, batteries, and heating, ventilating, and air-conditioning (HVAC), there is a lack of a comprehensive and time-scalable approach that integrates predictive information such as renewable generation and thermal comfort. In this paper, we propose an online energy management framework that incorporates optimal energy scheduling and prediction models of PV generation and thermal comfort through a model predictive control (MPC) approach. The energy management problem is formulated as three coordinated optimization problems covering fast and slow time scales. This reduces the time complexity without a significant negative impact on the global nature and quality of the result. Experimental results show that the proposed framework achieves optimal energy management that accounts for the trade-off between the electricity bill and thermal comfort.
IP4_2.2 ONLINE LATENCY MONITORING OF TIME-SENSITIVE EVENT CHAINS IN SAFETY-CRITICAL APPLICATIONS
Speaker:
Jonas Peeck, TU Braunschweig, Institute of Computer and Network Engineering, DE
Authors:
Jonas Peeck, Johannes Schlatow and Rolf Ernst, TU Braunschweig, DE
Abstract
Highly automated driving involves chains of perception, decision, and control functions. These functions rely on data-intensive algorithms that motivate the use of a data-centric middleware and a service-oriented architecture; as an example, we use the open-source project Autoware.Auto. The function chains define a safety-critical automated control task with weakly-hard real-time constraints. However, providing the required assurance by formal analysis is challenged by the complex hardware/software structure of these systems and their dynamics. We propose an approach that combines measurement, suitable distribution of deadline segments, and application-level online monitoring to supervise the execution of service-oriented software systems with multiple function chains and weakly-hard real-time constraints. We use DDS as middleware and apply the approach to an Autoware.Auto use case.

IP4_3 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/2nAZpfhAejudsEauZ

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_3.1 FEEDING THREE BIRDS WITH ONE SCONE: A GENERIC DUPLICATION BASED COUNTERMEASURE TO FAULT ATTACKS
Speaker:
Jakub Breier, Silicon Austria Labs, AT
Authors:
Anubhab Baksi1, Shivam Bhasin2, Jakub Breier3, Anupam Chattopadhyay4 and Vinay B. Y. Kumar1
1Nanyang Technological University, Singapore, SG; 2Temasek Laboratories, Nanyang Technological University, SG; 3Silicon Austria Labs, AT; 4Nanyang Technological University, SG
Abstract
In the current world of the Internet of Things and edge computing, computations are increasingly performed locally on small connected systems. Such devices are often exposed to adversarial physical access, enabling a plethora of physical attacks that remain a challenge even for devices built with security in mind. As cryptography is one of the cornerstones of secure communication among devices, the pertinence of fault attacks becomes increasingly apparent in settings where a device can easily be accessed physically. In particular, two recently proposed fault attacks, the Statistical Ineffective Fault Attack (SIFA) and the Fault Template Attack (FTA), have been shown to be formidable due to their ability to bypass common duplication-based countermeasures. Duplication-based countermeasures, deployed to counter the Differential Fault Attack (DFA), work by duplicating the execution of the cipher and comparing the results to sense the presence of an effective fault, followed by an appropriate recovery procedure. While a handful of countermeasures have been proposed against SIFA, no countermeasure is known to thwart FTA to date. In this work, we propose a novel duplication-based countermeasure that protects against both SIFA and FTA. The proposal is also lightweight, with only a marginal additional cost over simple duplication-based countermeasures. Our countermeasure further protects against all known variants of DFA, including Selmke, Heyszl and Sigl's attack from FDTC 2016. It does not inherently leak side-channel information and is easily adaptable for any symmetric-key primitive. We validate our countermeasure through gate-level fault simulation.
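The detect-and-compare mechanism of the simple duplication-based countermeasures that SIFA and FTA are noted to bypass can be sketched in a few lines of Python; `toy_cipher` and all names here are illustrative stand-ins, not the paper's construction:

```python
def protected_encrypt(encrypt, key, plaintext):
    """Classic duplication countermeasure: run the cipher twice and compare
    the results to sense an effective fault before releasing the output."""
    c1 = encrypt(key, plaintext)
    c2 = encrypt(key, plaintext)  # redundant execution
    if c1 != c2:
        return None  # recovery procedure: suppress the faulty ciphertext
    return c1

def toy_cipher(key, pt):
    """Toy stand-in cipher for illustration only (not secure)."""
    return bytes(b ^ key for b in pt)
```

A fault that corrupts only one of the two executions makes the comparison fail, which is exactly the sensing step that SIFA and FTA are designed to slip past.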
IP4_3.2 SIDE-CHANNEL ATTACK ON RAINBOW POST-QUANTUM SIGNATURE
Speaker:
Petr Socha, Czech TU in Prague, CZ
Authors:
David Pokorný, Petr Socha and Martin Novotný, Czech TU in Prague, CZ
Abstract
Rainbow, a layered multivariate quadratic digital signature, is a candidate for standardization in a competition-like process organized by NIST. In this paper, we present a CPA side-channel attack on the submitted 32-bit reference implementation. We evaluate the attack on an STM32F3 ARM microcontroller, successfully revealing the full private key. Furthermore, we propose a simple masking scheme with minimum overhead.

IP4_4 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/YfkeNJuv3YFTbt9i8

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_4.1 BLOOMCA: A MEMORY EFFICIENT RESERVOIR COMPUTING HARDWARE IMPLEMENTATION USING CELLULAR AUTOMATA AND ENSEMBLE BLOOM FILTER
Speaker:
Dehua Liang, Graduate School of Information Science and Technology, Osaka University, JP
Authors:
Dehua Liang, Masanori Hashimoto and Hiromitsu Awano, Osaka University, JP
Abstract
In this work, we propose BloomCA, which uses cellular automata (CA) and an ensemble Bloom filter to organize a reservoir computing (RC) system with only binary operations, making it suitable for hardware implementation. The rich pattern dynamics created by CA map the input into a high-dimensional space and provide more features for the classifier. Using the ensemble Bloom filter as the classifier, these features can be memorized effectively. Our experiments reveal that applying the ensemble mechanism to the Bloom filter yields a significant reduction in inference memory cost. Compared with the state-of-the-art reference, BloomCA achieves a 43x reduction in memory cost without hurting accuracy. Our hardware implementation also demonstrates that BloomCA achieves reductions of over 21x in area and 43.64% in power.
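As a rough illustration of the ensemble-Bloom-filter classifier idea (one filter per class, queried with binary membership tests only), consider this Python sketch; the parameters and hashing choices are illustrative, not BloomCA's actual hardware design:

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter with k deterministic hash positions per item."""
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _idx(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for j in self._idx(item):
            self.bits |= 1 << j

    def score(self, item):
        # Fraction of this item's positions already set (1.0 => "probably seen").
        return sum((self.bits >> j) & 1 for j in self._idx(item)) / self.k

def predict(filters, item):
    """Ensemble classification: pick the class whose filter matches best."""
    return max(filters, key=lambda c: filters[c].score(item))
```

Training a class amounts to inserting its feature patterns; inference is a handful of bit tests, which is why the memory footprint and the logic stay small.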
IP4_4.2 APPROXIMATE COMPUTATION OF POST-SYNAPTIC SPIKES REDUCES BANDWIDTH TO SYNAPTIC STORAGE IN A MODEL OF CORTEX
Speaker:
Yu Yang, KTH Royal Institute of Technology, SE
Authors:
Dimitrios Stathis1, Yu Yang2, Ahmed Hemani3 and Anders Lansner4
1KTH Royal Institute of Technology, SE; 2Royal Institute of Technology - KTH, SE; 3KTH - Royal Institute of Technology, SE; 4Stockholm University and KTH Royal Institute of Technology, SE
Abstract
The Bayesian Confidence Propagation Neural Network (BCPNN) is a spiking model of the cortex. Its synaptic weights are organized as matrices, which require substantial synaptic storage and a large bandwidth to that storage. The algorithm accesses these matrices in a dual access pattern, both row-wise and column-wise. In this work, we exploit an algorithmic optimization that eliminates the column-wise accesses: the new computation model approximates the post-synaptic spike computation with the use of a predictor. We have adopted this approximate computation model to improve upon the previously reported ASIC implementation, called eBrainII. We also present an error analysis of the approximation to show that the error is negligible. The reduction in storage and bandwidth to the synaptic storage results in a 48% reduction in energy compared to eBrainII. The reported approximation method also applies to other neural network models based on a Hebbian learning rule.

IP4_5 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/aTD3XZgEET2TWjiKy

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_5.1 MEMORY HIERARCHY CALIBRATION BASED ON REAL HARDWARE IN-ORDER CORES FOR ACCURATE SIMULATION
Speaker:
Quentin Huppert, LIRMM, FR
Authors:
Quentin Huppert1, Timon Evenblij2, Manu Komalan2, Francky Catthoor3, Lionel Torres4 and David Novo5
1LIRMM, FR; 2imec, BE; 3imec, BE; 4University of Montpellier, FR; 5CNRS, LIRMM, University of Montpellier, FR
Abstract
Computer system simulators are major tools used by architecture researchers. Two key elements play a role in the credibility of simulator results: (1) the simulator’s accuracy, and (2) the quality of the baseline architecture. Some simulators, such as gem5, already provide highly accurate parameterized models. However, finding the right values for all these parameters to faithfully model a real architecture is still a problem. In this paper, we calibrate the memory hierarchy of an in-order core gem5 simulation to accurately model a real mobile Arm SoC. We execute small programs, which are designed to stress specific parts of the memory system, to deduce key parameter values for the model. We compare the execution of SPEC CPU2006 benchmarks on the real hardware with the gem5 simulation. Our results show that our calibration reduces the average and worst-case IPC error by 36% and 50%, respectively, when compared with a gem5 simulation configured with the default parameters.
IP4_5.2 SPRITE: SPARSITY-AWARE NEURAL PROCESSING UNIT WITH CONSTANT PROBABILITY OF INDEX-MATCHING
Speaker:
Sungju Ryu, POSTECH, KR
Authors:
Sungju Ryu1, Youngtaek Oh1, Taesu Kim1, Daehyun Ahn1 and Jae-Joon Kim1
1Pohang University of Science and Technology (POSTECH), KR
Abstract
Sparse neural networks are widely used for memory savings. However, the irregular indices of non-zero input activations and weights tend to degrade overall system performance. This paper presents a scheme that maintains a constant probability of index-matching between weights and inputs over a wide range of sparsity, overcoming a critical limitation of previous works. A sparsity-aware neural processing unit based on the proposed scheme improves system performance by up to 6.1x compared to previous sparse convolutional neural network hardware accelerators.

IP4_6 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/sQdNyqetXmw6iE6R9

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP4_6.1 WISER: DEEP NEURAL NETWORK WEIGHT-BIT INVERSION FOR STATE ERROR REDUCTION IN MLC NAND FLASH
Speaker:
Jaehun Jang, SungKyunkwan University, KR
Authors:
Jaehun Jang1 and Jong Hwan Ko2
1Department of Semiconductor and Display Engineering, SungKyunkwan University, Memory Division, Samsung Electronics, KR; 2Sungkyunkwan University (SKKU), KR
Abstract
When Flash memory is used to store deep neural network (DNN) weights, inference accuracy can degrade due to Flash memory state errors. To protect the weights from state errors, existing methods rely on ECC (Error Correction Code) or parity, which incur power/storage overhead. In this study, we propose a weight-bit inversion method that minimizes the accuracy loss due to Flash memory state errors without using ECC or parity. The method first applies WISE (Weight-bit Inversion for State Elimination), which removes the most error-prone state from MLC NAND, thereby improving both error robustness and MSB page read speed. If the initial accuracy loss due to the weight inversion of WISE is unacceptable, we apply WISER (Weight-bit Inversion for State Error Reduction), which reduces weight mapping to the error-prone state with minimal weight value changes. Simulation results show that after 16K program-erase cycles in NAND Flash, WISER reduces the CIFAR-100 accuracy loss of VGG-16 by 2.92x compared to existing methods.
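A minimal sketch of the weight-bit-inversion idea for 2-bit MLC cells follows; the choice of error-prone state and the single inversion flag are assumptions for illustration, not the paper's exact encoding:

```python
ERROR_PRONE = 0b01  # assumed most error-prone 2-bit MLC state (illustrative)

def count_state(word, state, bits=8):
    """Count how many 2-bit cell patterns of `word` land on `state`."""
    return sum(1 for i in range(0, bits, 2) if (word >> i) & 0b11 == state)

def wiser_encode(weight, bits=8):
    """Store the weight inverted iff inversion maps fewer 2-bit cells onto
    the error-prone state; return (stored_word, inverted_flag)."""
    inv = weight ^ ((1 << bits) - 1)
    if count_state(inv, ERROR_PRONE, bits) < count_state(weight, ERROR_PRONE, bits):
        return inv, True
    return weight, False
```

Decoding just XORs the stored word back when the flag is set, so the protection costs one flag bit per weight rather than ECC/parity storage.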
IP4_6.2 OR-ML: ENHANCING RELIABILITY FOR MACHINE LEARNING ACCELERATOR WITH OPPORTUNISTIC REDUNDANCY
Speaker:
Zheng Wang, Shenzhen Institutes of Advanced Technology, CN
Authors:
Bo Dong1, Zheng Wang2, Wenxuan Chen3, Chao Chen2, Yongkui Yang2 and Zhibin Yu2
1Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Microelectronics, Xidian University, CN; 2Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, CN; 3Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; School of Microelectronics, Xidian University, CN
Abstract
Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported as one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit the massive computing parallelism of neural networks to achieve high energy efficiency. The topology of an ML engine's computing fabric, which consists of large arrays of processing elements (PEs), has been growing dramatically to accommodate the huge size and heterogeneity of rapidly evolving ML algorithms. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach, named OR-ML, that enhances the reliability of generic ML accelerators by Opportunistically exploiting chances for runtime Redundancy provided by neighbouring PEs. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources, thereby significantly reducing the implementation overhead while still achieving an appreciable level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for object classification and detection.

UB.11 University Booth

Date: Wednesday, 03 February 2021
Time: 09:00 CEST - 09:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/SHkuEzh2qyHwnwadk

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.11 CATANIS: CAD TOOL FOR AUTOMATIC NETWORK SYNTHESIS
Speaker:
Davide Quaglia, University of Verona, IT
Authors:
Davide Quaglia and Enrico Fraccaroli, University of Verona, IT
Abstract
The proliferation of communication technologies for embedded systems has opened the way for new applications, e.g., Smart Cities and Industry 4.0. In such applications, hundreds or thousands of smart devices interact through different types of channels and protocols. This increasing communication complexity forces computer-aided design methodologies to scale up from embedded systems in isolation to the global interconnected system.
This booth will demonstrate the functionality of a graphical tool for automatic network synthesis, developed in Python and Qt to be lightweight and cross-platform. It allows users to graphically specify the communication requirements of the application as a set of interacting tasks and the constraints of the environment (e.g., its map can be considered), together with a library of node types and communication protocols to be used.

6.1 CPS-related sensors, platforms and software

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/m88hQeddDnmm25PXz

Session chair:
Michael Huebner, Brandenburg University of Technology, DE

Session co-chair:
Rainer Kress, Infineon, DE

Organizers:
Enrico Macii, Politecnico di Torino, IT
Frank Schirrmeister, Cadence, US

Sensors, platforms and software are key enabling components for the implementation of the I4.0 paradigm. This session starts by offering an embedded tutorial in which the use of micro sensors and sensor nodes is practically demonstrated in the context of the “Innovation Campus Electronics and Micro Sensor Technologies Cottbus” (iCampμs). Next, the focus is on the software systems in use in the industrial and automotive domains, discussing issues and solutions for the current move from monolithic and tightly integrated run-time environments to virtual platforms implemented on fewer domain computers with heterogeneous physical architectures. Finally, the last contribution addresses the need to master the variability of functional control software, such as that used in the context of Industry 4.0, and illustrates the challenges in documenting the dependencies of different software parts, including their variability, using family models.

Time Label Presentation Title
Authors
09:30 CEST 6.1.1 ICAMPμS: DEVELOPMENT AND TRANSFER PLATFORM FOR INTEGRATED MICROSENSOR TECHNOLOGIES IN A CONNECTED WORLD
Speaker and Author:
Harald Schenk, Fraunhofer IPMS, DE
Abstract
The “Innovation Campus Electronics and Micro Sensor Technologies Cottbus” (iCampμs) in Germany focuses on the development of micro sensors and sensor nodes to provide solutions for a wide range of specific applications. iCampμs places special emphasis on providing multi-purpose technology platforms to lower entry barriers, in particular for the needs of small and medium-sized companies. The tutorial provides insight into the strategy, set-up and objectives of iCampμs. Development activities, including their expected impact, will be presented by means of various example applications. These comprise, e.g., mobile ultra-low-power radar and micro-machined ultrasound technologies for medical and industrial applications, as well as the merging of sensors with AI for advanced predictive maintenance.
10:00 CEST 6.1.2 EFFICIENT RUN-TIME ENVIRONMENTS FOR SYSTEM-LEVEL LET PROGRAMMING
Speaker:
Rolf Ernst, TU Braunschweig, DE
Authors:
Kai-Björn Gemlau, Leonie Köhler and Rolf Ernst, TU Braunschweig, DE
Abstract
Growing requirements of large industrial and automotive software systems have initiated an ongoing move from monolithic and tightly integrated run-time environments (RTE) to virtual platforms implemented on fewer domain computers with heterogeneous physical architectures. This trend has given rise to new programming paradigms to enable specification, implementation and supervision of software systems that are predictable and robust under interference and change. One of those paradigms, the Logical Execution Time (LET), is now part of the automotive software standard, AUTOSAR. While originally applied to single shared-memory multicore processors, System-level LET (SL-LET) extends this approach to virtual and distributed platforms providing a powerful paradigm for CPS in future industrial systems. This contribution explains and demonstrates the resulting challenges to the RTE and the opportunities to improve its efficiency, in particular the communication stack.
10:15 CEST 6.1.3 MANAGING VARIABILITY AND REUSE OF EXTRA-FUNCTIONAL CONTROL SOFTWARE IN CPPS
Speaker:
Birgit Vogel-Heuser, TU Munich, DE
Authors:
Birgit Vogel-Heuser1, Juliane Fischer1, Dieter Hess2, Eva-Maria Neumann1 and Marcus Würr3
1TU Munich, DE; 2CODESYS GmbH, DE; 3Schneider Electric Automation GmbH, DE
Abstract
Cyber-Physical Production Systems (CPPS) are long-living and variant-rich systems. Challenges and trends in the context of Industry 4.0, such as a high degree of customization, evolution, and the different disciplines involved, e.g., mechanics, electrics/electronics and software, cause a high amount of variability. Mastering the variability of functional control software, e.g., different control variants of an actuator type, is itself a challenge in the development and reuse of CPPS software. This task becomes even more complex when considering the variability of human-machine interface (HMI) software and extra-functional software such as operating modes, diagnosis or fault handling. Moreover, the interdependencies between functional, extra-functional and HMI software pose an additional challenge for variability management and the planned reuse of these software parts. This paper illustrates the challenges in documenting the dependencies of these software parts, including their variability, using family models. Additionally, the current state of practice in industry, derived from questionnaire and interview studies, is shown and compared to the potential of increasing the software's reusability, and thus its flexibility in the context of Industry 4.0, through concepts using the object-oriented extension of IEC 61131-3.
10:30 CEST 6.1.4 LIVE JOINT Q&A
Authors:
Michael Huebner1, Harald Schenk2, Rolf Ernst3 and Birgit Vogel-Heuser4
1Brandenburg TU Cottbus, DE; 2Brandenburg University of Technology Cottbus-Senftenberg, DE; 3TU Braunschweig, DE; 4TU Munich, DE
Abstract
30 minutes of live joint question and answer time for interaction among speakers and audience.

6.3 Security in Machine Learning

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/JA4mvRfYewc6bgasL

Session chair:
Ioana Vatajelu, TIMA/CNRS, FR

Session co-chair:
Elif Bilge Kavun, University of Passau, DE

This session deals with security problems of neural networks in machine learning as well as using neural networks for security applications. Novel methods against adversarial attacks targeting neural networks are presented, ranging from detection of weight attacks and fault injection to new mitigation techniques using inherent structural parameters. Furthermore, using neural networks to ensure trust in manufacturing is also covered in the session.

Time Label Presentation Title
Authors
09:30 CEST 6.3.1 SECURING DEEP SPIKING NEURAL NETWORKS AGAINST ADVERSARIAL ATTACKS THROUGH INHERENT STRUCTURAL PARAMETERS
Speaker:
Ihsen Alouani, Université Polytechnique Hauts-De-France, FR
Authors:
Rida El-Allami1, Alberto Marchisio2, Muhammad Shafique3 and Ihsen Alouani4
1INSA Hauts-de-France, Université Polytechnique Hauts-de-France, FR; 2TU Wien (TU Wien), AT; 3New York University Abu Dhabi (NYUAD), AE; 4IEMN lab, INSA Hauts-de-France, Université Polytechnique Hauts-De-France, FR
Abstract
Deep Learning (DL) algorithms have gained popularity owing to their practical problem-solving capacity. However, they suffer from a serious integrity threat: their vulnerability to adversarial attacks. In the quest for DL trustworthiness, recent works claimed an inherent robustness of Spiking Neural Networks (SNNs) to these attacks, without considering the variability in their structural spiking parameters. This paper explores the security enhancement of SNNs through internal structural parameters. Specifically, we investigate the robustness of SNNs to adversarial attacks under different values of the neurons' firing voltage thresholds and time window boundaries. We thoroughly study SNN security under different adversarial attacks in the strong white-box setting, with different noise budgets and under variable spiking parameters. Our results show a significant impact of the structural parameters on SNN security, and promising sweet spots can be reached to design trustworthy SNNs with 85% higher robustness than a traditional non-spiking DL system. To the best of our knowledge, this is the first work that investigates the impact of structural parameters on SNN robustness to adversarial attacks. The proposed contributions and the experimental framework are available online to the community for reproducible research.
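The firing voltage threshold the abstract refers to is a parameter of the spiking neuron model itself; a minimal leaky integrate-and-fire sketch in Python shows the knob being swept (illustrative values, not the paper's network):

```python
def lif_spikes(inputs, v_th=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: membrane potential leaks, accumulates
    input, and emits a spike (then resets) when it crosses `v_th`. The
    threshold is one of the structural parameters tied to robustness."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x
        if v >= v_th:
            spikes.append(1)
            v = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes
```

Raising `v_th` makes the neuron harder to fire, so small adversarial perturbations of the input are less likely to flip its spike train, which is the kind of sweet spot the parameter sweep searches for.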
09:45 CEST 6.3.2 GNNUNLOCK: GRAPH NEURAL NETWORKS-BASED ORACLE-LESS UNLOCKING SCHEME FOR PROVABLY SECURE LOGIC LOCKING
Speaker:
Lilas Alrahis, Khalifa University, AE
Authors:
Lilas Alrahis1, Satwik Patnaik2, Faiq Khalid3, Muhammad Abdullah Hanif4, Hani Saleh1, Muhammad Shafique5 and Ozgur Sinanoglu6
1Khalifa University, AE; 2New York University, US; 3TU Wien, AT; 4Institute of Computer Engineering, Vienna University of Technology, AT; 5New York University Abu Dhabi (NYUAD), AE; 6New York University Abu Dhabi, AE
Abstract
Logic locking is a holistic design-for-trust technique that aims to protect the design intellectual property (IP) from untrustworthy entities throughout the supply chain. Functional and structural analysis-based attacks successfully circumvent state-of-the-art, provably secure logic locking (PSLL) techniques. However, such attacks are not holistic and target specific implementations of PSLL. Automating the detection and subsequent removal of protection logic added by PSLL while accounting for all possible variations is an open research problem. In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on PSLL that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, without requiring an oracle. This approach fits perfectly with the targeted problem since a circuit is a graph with an inherent structure and the protection logic is a sub-graph of nodes (gates) with specific and common characteristics. GNNs are powerful in capturing the nodes' neighborhood properties, facilitating the detection of the protection logic. To rectify any misclassifications induced by the GNN, we additionally propose a connectivity analysis-based post-processing algorithm to successfully remove the predicted protection logic, thereby retrieving the original design. Our extensive experimental evaluation demonstrates that GNNUnlock is 99.24%-100% successful in breaking various benchmarks locked using stripped-functionality logic locking [1], tenacious and traceless logic locking [2], and Anti-SAT [3]. Our proposed post-processing enhances the detection accuracy, reaching 100% for all of our tested locked benchmarks. Analysis of the results corroborates that GNNUnlock is powerful enough to break the considered schemes under different parameters, synthesis settings, and technology nodes. The evaluation further shows that GNNUnlock successfully breaks corner cases where even the most advanced state-of-the-art attacks [4], [5] fail. We also open source our attack framework [6].
10:00 CEST IP5_5.1 RUNTIME FAULT INJECTION DETECTION FOR FPGA-BASED DNN EXECUTION USING SIAMESE PATH VERIFICATION
Speaker:
Xianglong Feng, Rutgers University, US
Authors:
Xianglong Feng, Mengmei Ye, Ke Xia and Sheng Wei, Rutgers University, US
Abstract
Deep neural networks (DNNs) have been deployed on FPGAs to achieve improved performance, power efficiency, and design flexibility. However, the FPGA-based DNNs are vulnerable to fault injection attacks that aim to compromise the original functionality. The existing defense methods either duplicate the models and check the consistency of the results at runtime, or strengthen the robustness of the models by adding additional neurons. However, these existing methods could introduce huge overhead or require retraining the models. In this paper, we develop a runtime verification method, namely Siamese path verification (SPV), to detect fault injection attacks for FPGA-based DNN execution. By leveraging the computing features of the DNN and designing the weight parameters, SPV adds neurons to check the integrity of the model without impacting the original functionality and, therefore, model retraining is not required. We evaluate the proposed SPV approach on Xilinx Virtex-7 FPGA using the MNIST dataset. The evaluation results show that SPV achieves the security goal with low overhead.
10:01 CEST 6.3.3 RADAR: RUN-TIME ADVERSARIAL WEIGHT ATTACK DETECTION AND ACCURACY RECOVERY
Speaker:
Jingtao Li, Arizona State University, US
Authors:
Jingtao Li1, Adnan Siraj Rakin2, Zhezhi He3, Deliang Fan2 and Chaitali Chakrabarti2
1School of Electrical, Computer and Energy Engineering, Arizona State University, US; 2Arizona State University, US; 3Department of ECE, Arizona State University, US
Abstract
Adversarial attacks on Neural Network weights, such as the progressive bit-flip attack (PBFA), can cause a catastrophic degradation in accuracy by flipping a very small number of bits. Furthermore, PBFA can be conducted at run time on the weights stored in DRAM main memory. In this work, we propose RADAR, a Run-time adversarial weight Attack Detection and Accuracy Recovery scheme to protect DNN weights against PBFA. We organize weights that are interspersed in a layer into groups and employ a checksum-based algorithm on weights to derive a 2-bit signature for each group. At run time, the 2-bit signature is computed and compared with the securely stored golden signature to detect the bit-flip attacks in a group. After successful detection, we zero out all the weights in a group to mitigate the accuracy drop caused by malicious bit-flips. The proposed scheme is embedded in the inference computation stage. For the ResNet-18 ImageNet model, our method can detect 9.6 bit-flips out of 10 on average. For this model, the proposed accuracy recovery scheme can restore the accuracy from below 1% caused by 10 bit flips to above 69%. The proposed method has extremely low time and storage overhead. System-level simulation on gem5 shows that RADAR only adds <1% to the inference time, making this scheme highly suitable for run-time attack detection and mitigation.
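The detect-then-zero recovery flow described above can be sketched in Python; the sum-mod-4 signature below is only a stand-in for the paper's checksum-based 2-bit signature algorithm:

```python
def signature(group):
    """Illustrative 2-bit signature: checksum of the group's weights, mod 4."""
    return sum(group) % 4

def radar_check(groups, golden):
    """Compare each weight group's run-time signature with its securely
    stored golden signature; zero out any group that fails the check so a
    few malicious bit-flips cannot wreck the model's accuracy."""
    recovered = []
    for g, sig in zip(groups, golden):
        recovered.append([0] * len(g) if signature(g) != sig else list(g))
    return recovered
```

Because zeroing a small group of weights perturbs a DNN far less than a targeted bit-flip does, the check-and-zero pass recovers most of the lost accuracy at negligible cost.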

6.4 SSD Storage and Predictable Cache Coherence

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/A5W4iARtkkgzZxyep

Session chair:
David Novo, LIRMM University of Montpellier, FR

Session co-chair:
Jordi Cortadella, Universitat Politecnica de Catalunya, ES

This session explores solutions for efficient storage in Solid State Drives (SSD) and predictable coherence protocols in multicore cache memory systems. The first paper presents a data structure for key-value pairs that is optimized for SSDs and that introduces three different merge strategies for log-structured merge trees. The second paper proposes a method to eliminate duplicate writes induced by write-ahead logging on SSDs with minimal overhead. Finally, the third paper introduces an approach to automatically synthesize predictable high-performance cache coherence protocols from a high-level domain-specific language specification.

Time Label Presentation Title
Authors
09:30 CEST 6.4.1 PTIERDB: BUILDING BETTER READ-WRITE COST BALANCED KEY-VALUE STORES FOR SMALL DATA ON SSD
Speaker:
Li Liu, Huazhong University of Science and Technology, CN
Authors:
Li Liu and Ke Zhou, Huazhong University of Science and Technology, CN
Abstract
The popular Log-Structured Merge (LSM) tree based Key-Value (KV) stores make trade-offs between write cost and read cost via different merge policies, i.e., leveling and tiering. It has been widely documented that leveling severely hampers write throughput, while tiering hampers read throughput. The characteristics of modern workloads seriously challenge LSM-tree based KV stores to deliver high performance and high scalability on SSDs. In this work, we present PTierDB, an LSM-tree based KV store that strikes a better balance between read cost and write cost for small data on SSD via an adaptive tiering principle and three merge policies in the LSM-tree, leveraging both the sequential and random performance characteristics of SSDs. Adaptive tiering introduces two merge principles: prefix-based data split, which bounds the lookup cost, and coexisted merge and move, which reduces data merging. Based on adaptive tiering, the three merge policies decide whether to merge-sort or move data during the merging processes for different levels. We demonstrate the advantages of PTierDB with both microbenchmarks and YCSB workloads. Experimental results show that, compared with state-of-the-art KV stores and KV implementations using popular merge policies, PTierDB achieves a better balance between read cost and write cost, yielding up to a 2.5x performance improvement and a 50% reduction in write amplification.
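The leveling/tiering trade-off the abstract builds on can be shown with a toy write-amplification model (illustrative only; PTierDB's adaptive tiering is considerably more sophisticated):

```python
def merge_cost(policy, flushes, fanout=4):
    """Toy model of bytes rewritten at the level-0 -> level-1 boundary:
    leveling re-merges the whole growing sorted level on every flush, while
    tiering lets `fanout` runs accumulate and merges them once."""
    written, runs, level_size = 0, 0, 0
    for _ in range(flushes):
        if policy == "leveling":
            level_size += 1
            written += level_size  # rewrite the entire sorted level
        else:  # tiering
            runs += 1
            if runs == fanout:
                written += runs    # one merge of the accumulated runs
                runs = 0
    return written
```

Leveling's repeated rewrites are what hamper write throughput, while tiering's accumulated runs force a read to probe several of them, which is the read-cost penalty PTierDB's merge policies try to balance.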
09:45 CEST 6.4.2 SW-WAL: LEVERAGING ADDRESS REMAPPING OF SSDS TO ACHIEVE SINGLE-WRITE WRITE-AHEAD LOGGING
Speaker:
Qiulin Wu, Huazhong University of Science and Technology, CN
Authors:
Qiulin Wu, You Zhou, Fei Wu, Ke Wang, Hao Lv, Jiguang Wan and Changsheng Xie, Huazhong University of Science and Technology, CN
Abstract
Write-ahead logging (WAL) has been widely used to provide transactional atomicity in databases such as SQLite and MySQL/InnoDB. However, WAL introduces duplicate writes: changes are recorded in the WAL file and then written to the database file by so-called checkpointing writes. Meanwhile, NAND flash-based SSDs, which have an inherent indirection software layer called the flash translation layer (FTL), have become commonplace in modern storage systems. Innovative SSD designs have been proposed to eliminate the WAL overheads by exploiting the FTL, such as providing an atomic write interface or utilizing its address remapping. However, these designs introduce significant performance overheads for maintaining and persisting extra transactional information to guarantee transactional atomicity or mapping consistency. In this paper, we propose single-write WAL (SW-WAL), a novel cross-layer design that eliminates WAL-induced duplicate writes on SSDs with minimal overheads. The SSD exposes an address remapping interface to the host, through which the checkpointing writes can be completed without performing real data writes. To ensure transactional atomicity and mapping consistency, we make the SSD aware of the transactional writes to the WAL file. Specifically, when transactional data are written to the WAL file, both transactional and mapping semantics are delivered from the host to the SSD and persisted in the relevant flash pages as housekeeping metadata without any extra overheads. We implement a prototype of SW-WAL that runs the popular database SQLite on an emulated NVMe SSD. Experimental results show that SW-WAL improves database performance by up to 62% compared with the original SQLite, which bears the WAL overheads, and by up to 32% compared with the state-of-the-art design that eliminates them.
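The core remapping trick, checkpointing by retargeting the logical-to-physical map instead of copying data, can be sketched with a toy FTL (all names are illustrative, not SW-WAL's actual interface):

```python
class RemapSSD:
    """Toy FTL: a logical-to-physical address map over an append-only flash.
    `remap` retargets a logical page to an already-written physical page, so
    a WAL checkpoint completes without rewriting any data."""
    def __init__(self):
        self.l2p, self.flash, self.next_ppa = {}, {}, 0

    def write(self, lpa, data):
        self.flash[self.next_ppa] = data   # program a fresh physical page
        self.l2p[lpa] = self.next_ppa
        self.next_ppa += 1

    def read(self, lpa):
        return self.flash[self.l2p[lpa]]

    def remap(self, dst_lpa, src_lpa):
        # Checkpoint: the database page now points at the WAL page's data.
        self.l2p[dst_lpa] = self.l2p[src_lpa]
```

The single physical write (to the WAL page) serves both the log and the database, which is where the "single-write" in SW-WAL comes from; the real design additionally persists transactional and mapping metadata to survive crashes.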
10:00 CEST IP5_3.1 M2H: OPTIMIZING F2FS VIA MULTI-LOG DELAYED WRITING AND MODIFIED SEGMENT CLEANING BASED ON DYNAMICALLY IDENTIFIED HOTNESS
Speaker:
Lihua Yang, Huazhong University of Science and Technology, CN
Authors:
Lihua Yang, Zhipeng Tan, Fang Wang, Shiyun Tu and Jicheng Shao, Huazhong University of Science and Technology, CN
Abstract
With the widespread use of flash memory from mobile devices to large data centers, the flash-friendly file system (F2FS), designed for flash memory characteristics, has become popular. However, F2FS suffers from severe cleaning overhead due to its logging-scheme writes. Mixed storage of data with different hotness in the file system aggravates segment cleaning. We propose multi-log delayed writing and modified segment cleaning based on dynamically identified hotness (M2H). M2H defines hotness by the file block update distance and uses K-means clustering to identify hotness accurately under dynamic access patterns. Based on fine-grained hotness, we design multi-log delayed writing and modify the selection and release of the victim segment. A hotness metadata cache is used to reduce the overheads induced by hotness metadata management and clustering calculations. Compared with the existing strategy of F2FS, M2H reduces the number of blocks migrated during segment cleaning by 36.05% to 36.51% and increases file system bandwidth by 69.52% to 70.43% cumulatively.
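The hotness-identification step can be pictured with a minimal one-dimensional K-means over block update distances (an illustrative sketch only; M2H's actual feature definition, cluster count and metadata caching differ):

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D K-means used here to split blocks into hotness classes.
    Blocks with short update distances are updated often (hot)."""
    # Spread the initial centers across the sorted value range.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Invented update distances: three hot blocks, three cold blocks.
distances = [2, 3, 4, 100, 110, 95]
hot_center, cold_center = sorted(kmeans_1d(distances, k=2))
assert hot_center < 10 and cold_center > 90
```

Once blocks carry a hotness label like this, writes with similar hotness can be directed to the same log, which is the separation M2H exploits to reduce segment-cleaning migration.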
10:01 CEST IP5_3.2 CHARACTERIZING AND OPTIMIZING EDA FLOWS FOR THE CLOUD
Speaker:
Abdelrahman Hosny, Brown University, US
Authors:
Abdelrahman Hosny and Sherief Reda, Brown University, US
Abstract
Cloud computing accelerates design space exploration in logic synthesis, and parameter tuning in physical design. However, deploying EDA jobs on the cloud requires EDA teams to deeply understand the characteristics of their jobs in cloud environments. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we formulate the problem of migrating EDA jobs to the cloud. First, we characterize the performance of four main EDA applications, namely: synthesis, placement, routing and static timing analysis. We show that different EDA jobs require different machine configurations. Second, using observations from our characterization, we propose a novel model based on Graph Convolutional Networks to predict the total runtime of a given application on different machine configurations. Our model achieves a prediction accuracy of 87%. Third, we develop a new formulation for optimizing cloud deployments in order to reduce deployment costs while meeting deadline constraints. We present a pseudo-polynomial optimal solution using a multi-choice knapsack mapping that reduces costs by 35.29%.
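The multi-choice knapsack formulation mentioned at the end can be sketched as a pseudo-polynomial dynamic program (an illustrative reconstruction; the machine options, runtimes and costs below are invented):

```python
def cheapest_deployment(jobs, deadline):
    """Multi-choice knapsack: choose exactly one (runtime, cost) machine
    option per EDA job so that total runtime <= deadline and total cost
    is minimal. Pseudo-polynomial in the integer deadline; returns
    float('inf') if no feasible schedule exists."""
    INF = float("inf")
    dp = [0] * (deadline + 1)          # no jobs scheduled yet: zero cost
    for options in jobs:
        ndp = [INF] * (deadline + 1)
        for budget in range(deadline + 1):
            for runtime, cost in options:
                if runtime <= budget:
                    ndp[budget] = min(ndp[budget],
                                      dp[budget - runtime] + cost)
        dp = ndp
    return dp[deadline]

# Invented numbers: each job offers a slow/cheap and a fast/expensive VM.
jobs = [
    [(4, 10), (2, 25)],   # e.g. synthesis
    [(3, 8), (1, 20)],    # e.g. placement
]
assert cheapest_deployment(jobs, deadline=7) == 18   # both slow machines
assert cheapest_deployment(jobs, deadline=5) == 30   # one job must go fast
```

Tightening the deadline forces some jobs onto faster, costlier machines, which is the cost/deadline trade-off the paper's optimal mapping navigates.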
10:02 CEST 6.4.3 AUTOMATED SYNTHESIS OF PREDICTABLE AND HIGH-PERFORMANCE CACHE COHERENCE PROTOCOLS
Speaker:
Anirudh Kaushik, University of Waterloo, CA
Authors:
Anirudh Kaushik and Hiren Patel, University of Waterloo, CA
Abstract
We present SYNTHIA, an open and automated tool for synthesizing predictable and high-performance cache coherence protocols for multi-core processors in multi-processor system-on-chips (MPSoCs) deployed in real-time systems. SYNTHIA automates the complex analysis associated with designing predictable and high-performance cache coherence protocols, and constructs new states (transient states) and corresponding transitions that achieve predictability and performance. We use SYNTHIA to construct complete protocol implementations from simple specifications of common protocols (MSI, MESI, and MOESI protocols). We validated the correctness, predictability, and performance guarantees of the generated protocol implementations from SYNTHIA using manually implemented versions, and a micro-architectural simulator.

6.5 Applications on Reconfigurable Systems

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/4c8rrPJKfcwYLahCt

Session chair:
Marco Santambrogio, Politecnico Di Milano, IT

Session co-chair:
Thomas Preußer, Accemic Technologies, US

This session explores a variety of applications and their mapping onto reconfigurable systems, including protein back-translation and dense SLAM, along with a mapping approach for novel via-switch based FPGAs. Two IPs discuss key-value store and big-data serialization on FPGAs.

Time Label Presentation Title
Authors
09:30 CEST 6.5.1 FPGA ACCELERATION OF PROTEIN BACK-TRANSLATION AND ALIGNMENT
Speaker:
Sahand Salamat, University of California San Diego, US
Authors:
Sahand Salamat1, Jaeyoung Kang2, Yeseong Kim3, Mohsen Imani2, Niema Moshiri1 and Tajana Rosing4
1University of California, San Diego, US; 2University of California San Diego, US; 3DGIST, KR; 4UCSD, US
Abstract
Identifying genome functionality changes our understanding of humans and helps us in disease diagnosis, as well as in drug, bio-material, and genetic engineering of plants and animals. Comparing the structure of a protein sequence, when only sequence information is available, against a database with known functionality helps us to identify and recognize the functionality of the unknown sequence. The process of predicting the possible RNA sequence that a specific protein originated from is called back-translation. Aligning the back-translated RNA sequence against the database locates the most similar sequences, which are used to predict the functionality of the unknown protein sequence. Providing massive parallelism, FPGAs can accelerate bioinformatics applications substantially. In this paper, we propose FabP, an optimized FPGA-based accelerator for aligning a back-translated protein sequence against a database of DNA/RNA sequences. FabP is deeply optimized to fully utilize the FPGA resources and the DRAM memory bandwidth to maximize performance.
09:45 CEST 6.5.2 (Best Paper Award Candidate)
FPGA ARCHITECTURES FOR APPROXIMATE DENSE SLAM COMPUTING
Speaker:
Maria Rafaela Gkeka, University of Thessaly, GR
Authors:
Maria-Rafaela Gkeka, Alexandros Patras, Christos D. Antonopoulos, Spyros Lalis and Nikolaos Bellas, University of Thessaly, GR
Abstract
Simultaneous Localization and Mapping (SLAM) is the problem of constructing and continuously updating a map of an unknown environment while keeping track of an agent's trajectory within this environment. SLAM is widely used in robotics, navigation, and odometry for augmented and virtual reality. In particular, dense SLAM algorithms construct and update the map at pixel granularity, at a very high computational and energy cost, especially when operating under real-time constraints. Dense SLAM algorithms can be approximated; however, care must be taken to ensure that these approximations do not prevent the agent from navigating correctly in the environment. Our work introduces and evaluates a plethora of embedded MPSoC FPGA designs for KinectFusion (a well-known dense SLAM algorithm), featuring a variety of optimizations and approximations, to highlight the interplay between SLAM performance and accuracy. Based on an extensive exploration of the design space, we show that properly designed approximations, which exploit SLAM domain knowledge and efficient management of FPGA resources, enable high-performance dense SLAM in embedded systems, at almost 28 fps, with high energy efficiency and without compromising agent tracking and map construction.
10:00 CEST IP5_4.1 HETEROKV: A SCALABLE LINE-RATE KEY-VALUE STORE ON HETEROGENEOUS CPU-FPGA PLATFORMS
Speaker:
Haichang Yang, Institute of Microelectronics, Tsinghua University, CN
Authors:
Haichang Yang1, Zhaoshi Li2, Jiawei Wang2, Shouyi Yin2, Shaojun Wei2 and Leibo Liu2
1Tsinghua University, CN; 2Tsinghua University, CN
Abstract
In-memory key-value store (KVS) has become crucial for many large-scale Internet service providers to build high-performance data centers. While most state-of-the-art KVS systems are optimized for read-intensive applications, a wide range of applications have been proven to be insert-intensive or scan-intensive, which scale poorly with the current implementations. With the availability of FPGA-based smart NICs in data centers, hardware-aided and hardware-based KVS systems are gaining popularity. In this paper, we present HeteroKV, a scalable line-rate KVS on heterogeneous CPU-FPGA platforms, aiming to provide high throughput in read-, insert- and scan-intensive scenarios. To achieve this, HeteroKV leverages a heterogeneous data structure consisting of a B+ tree whose leaf nodes are cache-aware partitioned hash tables. Experiments demonstrate HeteroKV's high performance in all scenarios. Specifically, a single-node HeteroKV is able to achieve 430M, 315M and 15M key-value operations per second in read-, insert- and scan-intensive scenarios respectively, which are more than 1.5x, 1.4x and 5x higher than state-of-the-art implementations.
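The heterogeneous structure described above can be pictured with a toy single-level model (a sketch under stated assumptions, not HeteroKV's design: the real system is a multi-level B+ tree with cache-aware partitioned leaves running partly on an FPGA):

```python
from bisect import bisect_right

class HybridKV:
    """Sorted top level (the B+ tree role) whose leaves are hash tables:
    point reads and inserts hash inside one leaf, while scans walk the
    leaves in key order."""

    def __init__(self, boundaries):
        self.bounds = boundaries                 # sorted leaf split keys
        self.leaves = [dict() for _ in range(len(boundaries) + 1)]

    def _leaf(self, key):
        # Keys equal to a boundary fall into the right-hand leaf.
        return self.leaves[bisect_right(self.bounds, key)]

    def put(self, key, value):
        self._leaf(key)[key] = value             # O(1) within the leaf

    def get(self, key):
        return self._leaf(key).get(key)

    def scan(self, lo, hi):
        out = []
        for leaf in self.leaves[bisect_right(self.bounds, lo):]:
            out += sorted((k, v) for k, v in leaf.items() if lo <= k <= hi)
        return out

kv = HybridKV([100, 200])
for k in (5, 150, 250, 120):
    kv.put(k, str(k))
assert kv.get(150) == "150"
assert kv.scan(100, 260) == [(120, "120"), (150, "150"), (250, "250")]
```

The point of the hybrid: a plain hash table would make the range scan a full traversal, while a plain tree would make inserts O(log n); splitting the roles between levels serves both access patterns.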
10:02 CEST 6.5.3 MUX GRANULARITY-ORIENTED ITERATIVE TECHNOLOGY MAPPING FOR IMPLEMENTING COMPUTE-INTENSIVE APPLICATIONS ON VIA-SWITCH FPGA
Speaker:
Takashi Imagawa, Ritsumeikan University, JP
Authors:
Takashi Imagawa1, Jaehoon Yu2, Masanori Hashimoto3 and Hiroyuki Ochi1
1Ritsumeikan University, JP; 2Tokyo Institute of Technology, JP; 3Osaka University, JP
Abstract
This paper proposes a technology mapping algorithm for implementing application circuits on via-switch FPGA (VS-FPGA). The via-switch is a novel non-volatile and rewritable memory element. Its small footprint and low parasitic RC are expected to improve the area- and energy-efficiency of an FPGA system. Some unique features of the VS-FPGA require a dedicated technology mapping strategy for implementing application circuits with maximum energy-efficiency. One of the features is the small ratio of logic blocks to arithmetic blocks (ABs). Given an application circuit, the proposed algorithm first detects word-wise circuit elements, such as MUXs. These elements are evaluated with an index of how resource utilization and fan-out change when the corresponding element is implemented with AB. All these elements are sorted in descending order based on this index. According to this order, each element is mapped to AB one by one, and synthesis and evaluation are repeated iteratively until satisfying given design constraints. The experimental results show that resource utilization and maximum fan-out can be reduced by about 30% to 50% and 12% to 87%, respectively. The proposed algorithm is not limited to the VS-FPGA and is expected to improve computation density and energy-efficiency of various FPGAs dedicated to compute-intensive signal processing applications.

6.6 Energy-Efficient Platforms for Novel Computing Paradigms

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/3MCcHfP8FYep96XRF

Session chair:
Qinru Qiu, Syracuse University, US

Session co-chair:
Alok Prakash, NTU, SG

This session presents efficient hardware platforms for novel computing paradigms, achieved via hardware and algorithm innovations for aggressive energy efficiency. The first paper discusses a lightweight automata processor whose model can be compressed into SRAMs. The second paper implements a manifold trainable encoder for energy-efficient hyper-dimensional computing. The third paper is about energy-efficient compute-in-memory hardware with low-bit ADCs for mapping binary ResNets.

Time Label Presentation Title
Authors
09:30 CEST 6.6.1 LAP: A LIGHTWEIGHT AUTOMATA PROCESSOR FOR PATTERN MATCHING TASKS
Speaker:
Haojun Xia, University of Science and Technology of China, CN
Authors:
Haojun Xia, Lei Gong, Chao Wang, Xianglan Chen and Xuehai Zhou, University of Science and Technology of China, CN
Abstract
A growing number of applications employ finite automata as their basic computational model. These applications match tens to thousands of patterns against large amounts of data, which poses great challenges to conventional processors. Hardware-based solutions have achieved high throughput in automata processing. However, they are too heavy to be integrated into small chips. Besides, they have to rely on DRAMs or other high-capacity memories to store their underlying automata models. We focus on building a more lightweight automata processor, which can store the whole automata model in SRAMs of limited size and run independently. We propose LAP, a lightweight automata processor. LAP achieves extremely high storage efficiency by leveraging a novel automata model (ADFA) and efficient packing algorithms. Besides, we exploit software-hardware co-design to achieve faster processing. We observe that ADFA's traversal algorithm is parallelizable; thus, we propose novel hardware instructions to parallelize the additional memory accesses in the ADFA model and hide their access overhead. LAP is organized as a four-stage pipeline and prototyped on a Xilinx Artix-7 FPGA at 263 MHz. Evaluations show that LAP achieves extremely high storage efficiency, exceeding IBM's RegX and Micron's AP by 8 times. Besides, LAP achieves significant improvements in processing speed, ranging from 32% to 91%, compared with previous lightweight implementations. As a result, a low-power CPU equipped with five LAP cores can achieve 9.5 Gbps processing throughput while matching 400 patterns simultaneously.
09:45 CEST 6.6.2 MANIHD: EFFICIENT HYPER-DIMENSIONAL LEARNING USING MANIFOLD TRAINABLE ENCODER
Speaker:
Zhuowen Zou, University of California Irvine, US
Authors:
Zhuowen Zou1, Yeseong Kim2, M. Hassan Najafi3 and Mohsen Imani4
1University of California San Diego, US; 2Daegu Institute of Science and Technology, KR; 3University of Louisiana, US; 4University of California Irvine, US
Abstract
Hyper-Dimensional (HD) computing emulates the functionality of human short-term memory by computing with hypervectors as an alternative to computing with numbers. The main goal of HD computing is to map data points into a sparse high-dimensional space where the learning task can be performed in a linear and hardware-friendly way. Existing HD computing algorithms use a static, non-trainable encoder; thus, they require very high dimensionality to provide acceptable accuracy. However, this high dimensionality results in high computational cost, especially on realistic learning problems. In this paper, we propose ManiHD, which supports an adaptive and trainable encoder for efficient learning in high-dimensional space. ManiHD explicitly considers non-linear interactions between the features during encoding. This enables ManiHD to provide maximum learning accuracy using much lower dimensionality. ManiHD not only enhances learning accuracy but also significantly improves learning efficiency during both the training and inference phases. ManiHD also enables online learning by sampling data points and capturing the essential features in an unsupervised manner. We also propose a quantization method that trades accuracy against efficiency for an optimal configuration. Our evaluation on a wide range of classification tasks shows that ManiHD provides 4.8% higher accuracy than state-of-the-art HD algorithms. In addition, ManiHD provides, on average, 12.3× (3.2×) faster and 19.3× (6.3×) more energy-efficient training (inference) compared to state-of-the-art learning algorithms.
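The static-encoder baseline that ManiHD improves on can be sketched with random bipolar hypervectors (a minimal illustration with invented data; ManiHD's trainable, non-linear encoder is the paper's contribution and is not reproduced here):

```python
import random

random.seed(0)       # deterministic example
D = 2048             # hypervector dimensionality (real systems use ~10k)

def rand_hv():
    """Random bipolar hypervector, the usual HD base element."""
    return [random.choice((-1, 1)) for _ in range(D)]

basis = [rand_hv() for _ in range(4)]   # one base vector per input feature

def encode(features):
    # Static linear encoder: a weighted sum of per-feature base vectors.
    # ManiHD's point is to make this step trainable and non-linear; the
    # static version shown here is the baseline it improves on.
    return [sum(f * hv[d] for f, hv in zip(features, basis))
            for d in range(D)]

def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))   # plain dot product

class_a = encode([1.0, 0.0, 0.0, 0.0])   # stand-in class prototypes
class_b = encode([0.0, 0.0, 0.0, 1.0])
query = encode([0.9, 0.1, 0.0, 0.0])     # noisy sample near class A
assert similarity(query, class_a) > similarity(query, class_b)
```

Because random hypervectors are nearly orthogonal, classification reduces to a dot product against class prototypes; the cost of that near-orthogonality is the very high dimensionality that ManiHD's trainable encoder is designed to avoid.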
10:00 CEST 6.6.3 MAPPING BINARY RESNETS ON COMPUTING-IN-MEMORY HARDWARE WITH LOW-BIT ADCS
Speaker:
Yulhwa Kim, Pohang University of Science and Technology, KR
Authors:
Yulhwa Kim1, Hyungjun Kim2, Jihoon Park1, Hyunmyung Oh1 and Jae-Joon Kim3
1Pohang University of Science and Technology, KR; 2POSTECH, KR; 3POSTECH, KR
Abstract
Implementing binary neural networks (BNNs) on computing-in-memory (CIM) hardware has several attractive features, such as a small memory requirement and minimal overhead in peripheral circuits such as analog-to-digital converters (ADCs). On the other hand, one of the downsides of using BNNs is degraded classification accuracy. Recently, ResNet-style BNNs have been gaining popularity thanks to higher accuracy than conventional BNNs. The accuracy improvement comes from the high-resolution skip connection, which binary ResNets use to compensate for the information loss caused by binarization. However, the high-resolution skip connection forces the CIM hardware to use high-bit ADCs again, so the area and energy overhead becomes larger. In this paper, we demonstrate that binary ResNets can also be mapped on CIM hardware with low-bit ADCs via aggressive partial-sum quantization and input splitting combined with retraining. As a result, the key advantages of BNN CIM, such as small area and energy consumption, can be preserved at higher accuracy.
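The interplay between input splitting and a low-bit ADC can be shown with a toy numerical model (an illustrative sketch with invented sizes; the paper additionally quantizes partial sums aggressively and retrains the network to recover accuracy):

```python
def adc(value, bits):
    """Uniform signed low-bit ADC model: clip and round to 2**bits levels."""
    lo, hi = -(2 ** bits) // 2, (2 ** bits) // 2 - 1
    return max(lo, min(hi, round(value)))

def cim_dot(inputs, weights, rows_per_array, adc_bits):
    """Split the input vector across sub-arrays so every analog partial
    sum is digitized separately, then accumulate digitally."""
    total = 0
    for i in range(0, len(inputs), rows_per_array):
        partial = sum(x * w for x, w in zip(inputs[i:i + rows_per_array],
                                            weights[i:i + rows_per_array]))
        total += adc(partial, adc_bits)
    return total

x = [1] * 8
w = [1] * 8                  # exact binary dot product = 8
# One 8-row array: the analog sum (8) clips at the 3-bit ADC maximum (3).
assert cim_dot(x, w, rows_per_array=8, adc_bits=3) == 3
# Input splitting: four 2-row partial sums of 2 each stay within range.
assert cim_dot(x, w, rows_per_array=2, adc_bits=3) == 8
```

Splitting the rows keeps each analog sum inside the ADC's range at the cost of more conversions, which is the area/energy/accuracy trade-off the paper tunes.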

6.7 Challenges in implementing edge nodes of IoT systems

Date: Wednesday, 03 February 2021
Time: 09:30 CEST - 10:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/8Si8G6D2CfdW8tMHx

Session chair:
Rodolfo Pellizzoni, University of Waterloo, CA

Session co-chair:
Daniele Jahier Pagliari, Politecnico di Torino, IT

Distributed IoT and embedded systems are becoming more and more pervasive in everyday life. The papers in this session address several challenges related to predictability, energy management, and low power for distributed IoT systems, and explore openings provided by open-source hardware to question the hardware/software tradeoff.

Time Label Presentation Title
Authors
09:30 CEST 6.7.1 A MODEL-BASED DESIGN FLOW FOR ASYNCHRONOUS IMPLEMENTATIONS FROM SYNCHRONOUS SPECIFICATIONS
Speaker:
Yu Bai, Hebei University of Science and Technology, CN
Authors:
Yu Bai1, Omair Rafique2 and Klaus Schneider2
1Hebei University of Science and Technology, CN; 2University of Kaiserslautern, DE
Abstract
The synthesis of distributed embedded systems from dataflow models like Kahn process networks (KPNs) has to deal with particular problems such as the absence of deadlocks and buffer overflows. However, verifying the absence of these problems for a KPN model is in general undecidable. Starting from synchronous models, desynchronization avoids such design difficulties by generating dataflow networks that are sound by construction. In this paper, we present a design flow following such an approach. Our design flow differs from previous work in the following aspects: The synchronous models are specified in an imperative synchronous language and are therefore better suited for control-intensive applications. Verification of the desynchronization criteria is carried out efficiently with the help of model checking and SAT solving, ensuring compliance of the functional behavior. Qualified code is translated automatically into a KPN model. Finally, the KPN model is automatically synthesized into an Open Computing Language (OpenCL) based implementation, which is platform independent and can be executed on various commercial off-the-shelf target platforms.
09:45 CEST 6.7.2 SURVIVING TRANSIENT POWER FAILURES WITH SRAM DATA RETENTION
Speaker:
Mingsong Lv, The Hong Kong Polytechnic University, HK
Authors:
Songran Liu1, Wei Zhang2, Mingsong Lv1, Qiulin Chen3 and Nan Guan2
1Northeastern University, CN; 2The Hong Kong Polytechnic University, HK; 3Huawei Technologies Co., Ltd., CN
Abstract
Many computing systems, such as those powered by energy harvesting or deployed in harsh working environments, may experience unpredictable and frequent transient power failures during their lifetime. Such systems may fail to deliver correct computation results or may never make progress, as computation is frequently interrupted by power failures. A possible solution is to frequently save program states to non-volatile memory (NVM), e.g., using checkpoints, so that the system can progress incrementally. However, this approach is costly, since frequent NVM writes are time- and energy-consuming and may wear out the NVM device. In this work, we propose an approach that enables a system to use volatile SRAM to correctly make progress in the presence of transient power failures, since SRAM is capable of retaining its data for seconds or minutes with the charge remaining in the battery/capacitor after the CPU core stops at its brown-out voltage. The main problem is to validate whether the data in SRAM were actually retained during power failures. In our approach, we validate only a subset of the program states with a cyclic redundancy check for efficiency. The validation technique requires maintaining a backup version of the program states, which additionally provides the system with the ability to progress incrementally. We implement a run-time system with the proposed approach. Experimental results on an MSP430 platform show that the system can correctly progress on SRAM in the presence of transient power failures with low overhead.
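The CRC-based validation step can be sketched in a few lines (an illustrative model with invented state names; the paper's MSP430 run-time system selects the checked subset of state and keeps a backup copy for incremental progress):

```python
import zlib

def snapshot_crc(state):
    """Before brown-out: compute a CRC over the subset of program state
    that must survive the outage."""
    blob = repr(sorted(state.items())).encode()
    return zlib.crc32(blob)

def retained_ok(state, crc):
    """After power returns: validate the SRAM contents cheaply instead
    of unconditionally restoring everything from non-volatile memory."""
    return snapshot_crc(state) == crc

sram = {"pc": 0x4006, "counter": 41}     # hypothetical retained state
crc = snapshot_crc(sram)                 # saved just before power failure
assert retained_ok(sram, crc)            # data survived: resume from SRAM
sram["counter"] = 0                      # simulate decay corrupting a value
assert not retained_ok(sram, crc)        # validation fails: use the backup
```

A CRC over a small state subset is cheap relative to an NVM checkpoint, which is why validation-on-resume can replace checkpoint-on-every-failure in the common case where SRAM did retain its data.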
10:00 CEST IP5_2.2 RISC-V FOR REAL-TIME MCUS - SOFTWARE OPTIMIZATION AND MICROARCHITECTURAL GAP ANALYSIS
Speaker:
Robert Balas, ETH Zurich, CH
Authors:
Robert Balas1 and Luca Benini2
1ETH Zürich, CH; 2Università di Bologna and ETH Zurich, IT
Abstract
Processors using the RISC-V ISA are finding increasing real use in IoT and embedded systems in the MCU segment. However, many real-life use cases in this segment have real-time constraints. In this paper we analyze the current state of real-time support for RISC-V with respect to the ISA, available hardware and the software stack, focusing on the RV32IMC subset of the ISA. As a reference point, we use the CV32E40P, an open-source industrially supported RV32IMFC core, and FreeRTOS, a popular open-source real-time operating system, to do a baseline characterization. We perform a series of software optimizations on the vanilla RISC-V FreeRTOS port, where we also explore and make use of ISA and micro-architectural features, improving the context switch time by 25% and the interrupt latency by 33% in the average case and 20% in the worst case on a CV32E40P, when evaluated on power control unit firmware and synthetic benchmarks. This improved version then serves as the basis for a comparison against the ARM Cortex-M series, which in turn allows us to highlight gaps and challenges to be tackled in the RISC-V ISA as well as in the hardware/software ecosystem to achieve competitive maturity.
10:01 CEST 6.7.3 SOURCE CODE CLASSIFICATION FOR ENERGY EFFICIENCY IN PARALLEL ULTRA LOW-POWER MICROCONTROLLERS
Speaker:
Emanuele Parisi, Università di Bologna, IT
Authors:
Emanuele Parisi, Francesco Barchi, Andrea Bartolini, Giuseppe Tagliavini and Andrea Acquaviva, Università di Bologna, IT
Abstract
The analysis of source code through machine learning techniques is an increasingly explored research topic, aiming at increasing smartness in the software toolchain to exploit modern architectures in the best possible way. In the case of low-power, parallel embedded architectures, this means finding the configuration, for instance in terms of the number of cores, leading to minimum energy consumption. Depending on the kernel to be executed, finding the energy-optimal scaling configuration is not trivial. While recent work has focused on general-purpose systems to learn and predict the best execution target in terms of the execution time of a snippet of code or kernel (e.g., offloading an OpenCL kernel to a multicore CPU or GPU), in this work we focus on static compile-time features to assess whether they can be successfully used to predict the minimum-energy configuration on PULP, an ultra-low-power architecture featuring an on-chip cluster of RISC-V processors. Experiments show that using machine learning models on the source code to automatically select the best energy scaling configuration is viable and has the potential to be used in the context of automatic system configuration for energy minimisation.

6.2 EDA Meets Quantum Computing: Bringing Two Communities Together

Date: Wednesday, 03 February 2021
Time: 09:45 CEST - 10:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/523LnhqKZNnaXmmDk

Session chair:
Aida Todri-Sanial, CNRS, LIRMM/University of Montpellier, FR

Session co-chair:
Elena Gnani, Università di Bologna, IT

Organizer:
Aida Todri-Sanial, CNRS, LIRMM/University of Montpellier, FR

The session will highlight the crossover research needs, challenges and prospects of how EDA can help to advance the roadmap of quantum computing. Companies, governments and research institutions are driving innovation to bring their quantum computing technology online, even though the number of qubits is still quite low. But plans for quantum computers with hundreds and even thousands of qubits have recently been announced. Doubling performance every year is now the benchmark for quantum computers as designers look to the EDA community for new automation tools. The industry is asking practical questions; for example, how can the current EDA tools and IC industry support quantum research? Can EDA tools help to correct the errors of qubits or enable quantum error correction? IBM has made impressive accomplishments in building quantum chips based on superconducting circuits as qubits, accessible via the cloud. But such circuits constitute only one possible platform for quantum computing. Significant efforts are also under way to enable CMOS-based quantum computing. Consequently, many EDA methods can be of direct interest to drive innovation on CMOS silicon qubits. The session will showcase progress in both the quantum technologies and EDA communities, with speakers from both industry and academia.

Time Label Presentation Title
Authors
09:45 CEST 6.2.2 QUANTUM COMPUTING WITH CMOS TECHNOLOGY
Speaker:
Miguel Fernando Gonzalez Zalba, Quantum Motion Technologies, ES
Author:
Fernando Gonzalez-Zalba, Quantum Motion Technologies, GB
Abstract
Quantum computing is poised to be the innovation driver of the next decade. Its information processing capabilities will radically accelerate drug discovery, improve online security, or even boost artificial intelligence [1]. Building a quantum computer promises to have a major positive impact on society; however, building the hardware that will enable that paradigm change is one of the greatest technological challenges for humanity. The spins of isolated electrons in silicon are one of the most promising solid-state systems to achieve that goal. With the recent demonstrations of long coherence times [2], high-fidelity spin readout [3], and one- and two-qubit gates [4-7], the basic requirements to build a fault-tolerant quantum computer have now been fulfilled. These are promising initial results for this relatively recent approach to quantum computing, indicating that attempting to build a quantum computer based on silicon technology is a realistic proposition. However, many technological challenges lie ahead. So far, most of the aforementioned milestones were achieved with small-scale devices (one- or two-qubit systems) fabricated in academic cleanrooms offering a relatively modest level of process control and reproducibility. Now, a transition from lab-based demonstrations to spin qubits manufactured at scale is necessary. Recently, important developments in the field of nanodevice engineering have shown this may be possible by using modified field-effect transistors (FETs) [8,9], thus creating an opportunity to leverage the scaling capabilities of the complementary metal-oxide-semiconductor (CMOS) industry to address the challenge. From a technological perspective, CMOS-based quantum computing brings compatibility with the well-established, highly reproducible Very Large-Scale Integration (VLSI) techniques of the CMOS industry, which routinely manufactures billions of quasi-identical transistors on the size of a fingertip.
Furthermore, using CMOS technology for quantum computing could enable hybrid integration of quantum and classical technologies, facilitating data management and fast information feedback between processing blocks. In this paper, I will present a series of results on silicon FETs manufactured in an industrial environment that show this technology could provide a platform onto which to implement electron spin qubits at scale. I will present our efforts to develop a qubit-specific measurement technique that is accurate and scalable while being compatible with industrial fabrication processes [10-12]. Using this methodology, I will show the first report of electron spin readout in an industry-fabricated silicon device [13]. On the architecture side, I will present results that combine, on chip, digital and quantum devices to perform time-multiplexed readout [14]. Finally, I will show our strategy to use small CMOS quantum processing units in a multi-core approach to solve hybrid quantum-classical algorithms that benefit from massive parallelisation [15,16].
10:00 CEST 6.2.3 STRUCTURED OPTIMIZED ARCHITECTING OF FULL-STACK QUANTUM SYSTEMS IN THE NISQ ERA
Speaker:
Carmen G. Almudever, Delft University of Technology, NL
Authors:
Carmen G. Almudever1 and Eduard Alarcon2
1TU Delft, NL; 2TU Catalonia, UPC BarcelonaTech, ES
Abstract
In the midst of the NISQ era of quantum computing, the challenges are shifting to encompass both architecting and full-stack engineering aspects, which are inherently algorithm-driven, so that bottom-up and top-down design approaches begin to converge in what we coin the Quantum Architecting (QuArch) era. Faced with many diverse design proposals, this paper postulates and proposes applying Design Space Exploration (DSE) to the full vertical stack of quantum systems as an instrumental methodology for addressing this design-diversity challenge. This structured design method, based on composing a multidimensional input design space and compressing the set of output performance metrics into an optimization-oriented overall figure of merit, provides a framework for optimization and performance comparison. It also yields a way to discriminate among alternative techniques at and across all layers, eventually serving as a structured and comprehensive design-oriented formal framework to address the complexity of quantum system design and evaluation. The paper concludes by illustrating instances of this methodology in optimizing and comparing mapping techniques that address resource-constrained current NISQ quantum chips, and in carrying out a quantitative gap analysis of scalability trends toward many-core distributed quantum architectures.
10:15 CEST 6.2.4 VISUALIZING DECISION DIAGRAMS FOR QUANTUM COMPUTING
Speaker and Author:
Robert Wille, Johannes Kepler University Linz, AT
Abstract
With the emergence of more and more applications for quantum computing, the development of corresponding methods for design automation is also receiving increasing interest. In this respect, decision diagrams provide a promising basis for many design tasks such as simulation, synthesis, verification, and more. However, users of the corresponding tools often do not have a proper background or an intuition about how these methods based on decision diagrams work and what their strengths and limits are. In an effort to make decision diagrams for quantum computing more accessible, we present a visualization tool which visualizes quantum decision diagrams and allows users to explore their behavior when used in the design tasks mentioned above. The installation-free web tool allows users to interactively learn how decision diagrams can be used in quantum computing, e.g., to (1) compactly represent quantum states and the functionality of quantum circuits, (2) efficiently simulate quantum circuits, and (3) verify the equivalence of two circuits. The tool is available at https://iic.jku.at/eda/research/quantum_dd/tool.

IP5_1 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/7A587R9TDKzgQCXNF

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP5_1.1 EXPLORING MICRO-ARCHITECTURAL SIDE-CHANNEL LEAKAGES THROUGH STATISTICAL TESTING
Speaker:
Sarani Bhattacharya, KU Leuven, BE
Authors:
Sarani Bhattacharya1 and Ingrid Verbauwhede2
1KU Leuven, BE; 2KU Leuven - COSIC, BE
Abstract
Micro-architectural side-channel leakages have received a lot of attention due to their high impact on the security of software running on complex out-of-order processors. These are extremely specialised threat models that can only be realised in practice with high-precision measurement code triggering micro-architectural behavior that leaks information. In this paper, we present a tool that supports inexperienced users in verifying their code for side-channel leakage. We combine two very useful tools, statistical testing and hardware performance monitors, to bridge the gap between the understanding of general-purpose users and the most precise speculative-execution attacks. We first show that these event counters are more powerful than observing timing variability of an executable. We extend Dudect: the raw hardware events are collected over the target executable, and leakage detection tests are applied to the statistics of the observed events following the principles of non-specific t-tests. Finally, we show the applicability of our tool on the most popular speculative micro-architectural and data-sampling attack models.
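The non-specific t-test idea the abstract builds on can be sketched in a few lines. The data below is synthetic, and the 4.5 cutoff is the conventional TVLA-style threshold, not a detail taken from the paper; real tools apply this to hardware event counts rather than simulated ones:

```python
import math, random

def welch_t(a, b):
    """Welch's t-statistic between two sample groups."""
    ma = sum(a) / len(a); mb = sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def leaks(fixed, rand_, threshold=4.5):
    """Non-specific leakage test: |t| above the threshold flags a leak."""
    return abs(welch_t(fixed, rand_)) > threshold

random.seed(0)
# Synthetic "event counts": the leaky code's counts depend on the input class.
leaky_fixed  = [100 + random.gauss(0, 2) for _ in range(1000)]
leaky_random = [110 + random.gauss(0, 2) for _ in range(1000)]
# Constant-time code: both classes draw from the same distribution.
ct_fixed  = [100 + random.gauss(0, 2) for _ in range(1000)]
ct_random = [100 + random.gauss(0, 2) for _ in range(1000)]
print(leaks(leaky_fixed, leaky_random))  # leak detected
print(leaks(ct_fixed, ct_random))        # no leak
```

The same statistic works whether the samples are cycle counts or, as the paper argues is more powerful, raw micro-architectural event counts.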
IP5_1.2 SECLUSIVE CACHE HIERARCHY FOR MITIGATING CROSS-CORE CACHE AND COHERENCE DIRECTORY ATTACKS
Speaker:
Vishal Gupta, Indian Institute of Technology, Kanpur, IN
Authors:
Vishal Gupta1, Vinod Ganesan2 and Biswabandan Panda3
1Indian Institute of Technology, Kanpur, IN; 2Indian Institute of Technology Madras, IN; 3IIT Kanpur, IN
Abstract
Cross-core cache attacks glean sensitive data by exploiting fundamental interference at shared resources such as the last-level cache (LLC) and coherence directories. Complete non-interference would make cross-core cache attacks unsuccessful. To this end, we propose a seclusive cache hierarchy, with zero storage overhead and a marginal increase in on-chip traffic, that provides non-interference by employing cache privatization on demand. Upon a cross-core eviction by an attacker core at the LLC, the block is back-filled into the private cache of the victim core. Our back-fill strategy mitigates cross-core conflict-based LLC attacks and coherence-directory-based attacks. We show the efficacy of the seclusive cache hierarchy by comparing it with existing cache hierarchies.

IP5_2 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/p433buGvoP3pkCaog

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP5_2.1 TOWARDS A FIRMWARE TPM ON RISC-V
Speaker:
Marouene Boubakri, University of Carthage, TN
Authors:
Marouene Boubakri1, Fausto Chiatante2 and Belhassen Zouari1
1Mediatron Lab, Higher School of Communications of Tunis, University of Carthage, Tunisia, TN; 2NXP, FR
Abstract
To develop the next generation of Internet of Things and Edge devices and systems, which leverage progress in enabling technologies such as 5G, distributed computing and artificial intelligence (AI), several requirements need to be developed and put in place to make the devices smarter. A major requirement for all the above applications is a long-term security and trusted computing infrastructure. Trusted Computing requires the introduction of a Trusted Platform Module (TPM) into the platform. Traditionally, a TPM was a discrete, dedicated module plugged into the platform to provide TPM capabilities. Recently, processor manufacturers started integrating trusted computing features into their processors. A significant drawback of this approach is the need for a permanent modification of the processor microarchitecture. In this context, we suggest an analysis and a design of a software-only TPM for RISC-V processors based on the seL4 microkernel and OP-TEE.
IP5_2.2 RISC-V FOR REAL-TIME MCUS - SOFTWARE OPTIMIZATION AND MICROARCHITECTURAL GAP ANALYSIS
Speaker:
Robert Balas, ETH Zurich, CH
Authors:
Robert Balas1 and Luca Benini2
1ETH Zürich, CH; 2Università di Bologna and ETH Zurich, IT
Abstract
Processors using the RISC-V ISA are finding increasing real-world use in IoT and embedded systems in the MCU segment. However, many real-life use cases in this segment have real-time constraints. In this paper we analyze the current state of real-time support for RISC-V with respect to the ISA, available hardware and the software stack, focusing on the RV32IMC subset of the ISA. As a reference point, we use the CV32E40P, an open-source, industrially supported RV32IMFC core, and FreeRTOS, a popular open-source real-time operating system, for a baseline characterization. We perform a series of software optimizations on the vanilla RISC-V FreeRTOS port, exploring and exploiting ISA and micro-architectural features, and improve the context-switch time by 25% and the interrupt latency by 33% on average and by 20% in the worst case on a CV32E40P, evaluated on power-control-unit firmware and synthetic benchmarks. This improved version then serves in a comparison against the ARM Cortex-M series, which in turn allows us to highlight gaps and challenges to be tackled in the RISC-V ISA as well as in the hardware/software ecosystem to achieve competitive maturity.

IP5_3 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/k6wgwYQCKRZDDRDpx

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP5_3.1 M2H: OPTIMIZING F2FS VIA MULTI-LOG DELAYED WRITING AND MODIFIED SEGMENT CLEANING BASED ON DYNAMICALLY IDENTIFIED HOTNESS
Speaker:
Lihua Yang, Huazhong University of Science and Technology, CN
Authors:
Lihua Yang, Zhipeng Tan, Fang Wang, Shiyun Tu and Jicheng Shao, Huazhong University of Science and Technology, CN
Abstract
With the widespread use of flash memory from mobile devices to large data centers, the flash-friendly file system (F2FS), designed around flash memory characteristics, has become popular. However, F2FS suffers from severe cleaning overhead due to its logging-scheme writes. Mixed storage of data with different hotness in the file system aggravates segment cleaning. We propose multi-log delayed writing and modified segment cleaning based on dynamically identified hotness (M2H). M2H defines hotness by the file-block update distance and uses K-means clustering to identify hotness accurately under dynamic access patterns. Based on this fine-grained hotness, we design multi-log delayed writing and modify the selection and release of the victim segment. A hotness-metadata cache is used to reduce the overheads induced by hotness-metadata management and clustering calculations. Compared with the existing strategy of F2FS, the number of blocks migrated during segment cleaning in M2H is reduced by 36.05% to 36.51%, and file system bandwidth increases by 69.52% to 70.43% cumulatively.
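As a rough illustration of hotness identification via clustering over update distances, here is a tiny 1-D k-means sketch. The distances are invented, and the paper's actual feature handling and cluster count may differ:

```python
def kmeans_1d(values, k=3, iters=20):
    """Tiny 1-D k-means: cluster block update distances into k hotness
    levels (small distance = frequently rewritten = hot)."""
    # Spread the initial centers across the sorted value range.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

# Update distance = number of writes between successive updates of a block
# (hypothetical values): small = hot, large = cold.
distances = [1, 2, 2, 3, 40, 45, 50, 400, 420, 500]
hot, warm, cold = kmeans_1d(distances)
print(hot < warm < cold)
```

Once blocks are labeled this way, hot and cold data can be steered to separate logs so a victim segment for cleaning contains mostly same-temperature blocks.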
IP5_3.2 CHARACTERIZING AND OPTIMIZING EDA FLOWS FOR THE CLOUD
Speaker:
Abdelrahman Hosny, Brown University, US
Authors:
Abdelrahman Hosny and Sherief Reda, Brown University, US
Abstract
Cloud computing accelerates design space exploration in logic synthesis and parameter tuning in physical design. However, deploying EDA jobs on the cloud requires EDA teams to deeply understand the characteristics of their jobs in cloud environments. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we formulate the problem of migrating EDA jobs to the cloud. First, we characterize the performance of four main EDA applications, namely: synthesis, placement, routing and static timing analysis. We show that different EDA jobs require different machine configurations. Second, using observations from our characterization, we propose a novel model based on Graph Convolutional Networks to predict the total runtime of a given application on different machine configurations. Our model achieves a prediction accuracy of 87%. Third, we develop a new formulation for optimizing cloud deployments in order to reduce deployment costs while meeting deadline constraints. We present a pseudo-polynomial optimal solution using a multi-choice knapsack mapping that reduces costs by 35.29%.
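A toy instance of the multi-choice knapsack mapping mentioned above: for each job pick exactly one machine configuration, minimizing total cost while the summed runtime meets a deadline, via a pseudo-polynomial DP over time. The job names, runtimes and costs below are invented for illustration:

```python
def min_cost_schedule(jobs, deadline):
    """Multi-choice knapsack DP: dp[t] = cheapest cost to run the jobs
    processed so far in total time t. Returns inf if no choice fits."""
    INF = float("inf")
    dp = [INF] * (deadline + 1)
    dp[0] = 0
    for choices in jobs:               # exactly one config per job
        nxt = [INF] * (deadline + 1)
        for t in range(deadline + 1):
            if dp[t] == INF:
                continue
            for runtime, cost in choices:
                if t + runtime <= deadline:
                    nxt[t + runtime] = min(nxt[t + runtime], dp[t] + cost)
        dp = nxt
    return min(dp)

# Hypothetical configs per job: (runtime in minutes, cost in $).
jobs = [
    [(10, 5), (5, 9)],   # "synthesis": slow/cheap vs fast/expensive machine
    [(20, 8), (8, 20)],  # "placement"
    [(15, 6), (6, 18)],  # "routing"
]
print(min_cost_schedule(jobs, deadline=40))  # 23: only synthesis upgraded
```

With a loose deadline the cheapest (slowest) machine wins for every job; tightening the deadline forces upgrades where they buy the most time per dollar.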

IP5_4 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/GfdMuDtRsmQm9Jfss

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP5_4.1 HETEROKV: A SCALABLE LINE-RATE KEY-VALUE STORE ON HETEROGENEOUS CPU-FPGA PLATFORMS
Speaker:
Haichang Yang, Institute of Microelectronics, Tsinghua University, CN
Authors:
Haichang Yang1, Zhaoshi Li2, Jiawei Wang2, Shouyi Yin2, Shaojun Wei2 and Leibo Liu2
1Tsinghua University, CN; 2Tsinghua University, CN
Abstract
In-memory key-value stores (KVS) have become crucial for many large-scale Internet service providers building high-performance data centers. While most state-of-the-art KVS systems are optimized for read-intensive applications, a wide range of applications have proven to be insert- or scan-intensive, and these scale poorly with current implementations. With the availability of FPGA-based smart NICs in data centers, hardware-aided and hardware-based KVS systems are gaining popularity. In this paper, we present HeteroKV, a scalable line-rate KVS on heterogeneous CPU-FPGA platforms, aiming to provide high throughput in read-, insert- and scan-intensive scenarios. To achieve this, HeteroKV leverages a heterogeneous data structure consisting of a B+ tree whose leaf nodes are cache-aware partitioned hash tables. Experiments demonstrate HeteroKV's high performance in all scenarios. Specifically, a single-node HeteroKV achieves 430M, 315M and 15M key-value operations per second in read-, insert- and scan-intensive scenarios respectively, more than 1.5x, 1.4x and 5x higher than state-of-the-art implementations.
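The hybrid "tree of hash-table leaves" structure can be sketched in miniature: point operations hash within a leaf while scans walk leaves in key order. This is a toy software model of the idea, not HeteroKV's CPU-FPGA implementation:

```python
import bisect

class HybridKV:
    """Toy B+-tree-like index whose leaves are hash tables: O(1)-style
    get/put inside a leaf, ordered leaves for range scans."""
    def __init__(self, leaf_cap=4):
        self.leaf_cap = leaf_cap
        self.seps = []    # smallest key of each leaf, kept sorted
        self.leaves = []  # parallel list of dicts
    def _leaf(self, key):
        i = bisect.bisect_right(self.seps, key) - 1
        return max(i, 0)
    def put(self, key, val):
        if not self.seps:
            self.seps.append(key); self.leaves.append({})
        i = self._leaf(key)
        leaf = self.leaves[i]
        leaf[key] = val
        if len(leaf) > self.leaf_cap:          # split the overflowing leaf
            items = sorted(leaf.items())
            mid = len(items) // 2
            self.leaves[i] = dict(items[:mid])
            self.seps[i] = items[0][0]
            self.seps.insert(i + 1, items[mid][0])
            self.leaves.insert(i + 1, dict(items[mid:]))
    def get(self, key):
        if not self.seps:
            return None
        return self.leaves[self._leaf(key)].get(key)
    def scan(self, lo, hi):
        out = []
        for leaf in self.leaves:               # leaves are in key order
            out.extend((k, v) for k, v in leaf.items() if lo <= k <= hi)
        return sorted(out)

kv = HybridKV()
for k in [5, 1, 9, 3, 7, 2, 8]:
    kv.put(k, str(k))
print(kv.get(7), kv.scan(2, 5))
```

The design choice mirrors the abstract: hashing serves read/insert-heavy traffic, while the sorted leaf order keeps scans from degenerating into full-table probes.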

IP5_5 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/47vFGRJEaTagrvs2Y

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP5_5.1 RUNTIME FAULT INJECTION DETECTION FOR FPGA-BASED DNN EXECUTION USING SIAMESE PATH VERIFICATION
Speaker:
Xianglong Feng, Rutgers University, US
Authors:
Xianglong Feng, Mengmei Ye, Ke Xia and Sheng Wei, Rutgers University, US
Abstract
Deep neural networks (DNNs) have been deployed on FPGAs to achieve improved performance, power efficiency, and design flexibility. However, the FPGA-based DNNs are vulnerable to fault injection attacks that aim to compromise the original functionality. The existing defense methods either duplicate the models and check the consistency of the results at runtime, or strengthen the robustness of the models by adding additional neurons. However, these existing methods could introduce huge overhead or require retraining the models. In this paper, we develop a runtime verification method, namely Siamese path verification (SPV), to detect fault injection attacks for FPGA-based DNN execution. By leveraging the computing features of the DNN and designing the weight parameters, SPV adds neurons to check the integrity of the model without impacting the original functionality and, therefore, model retraining is not required. We evaluate the proposed SPV approach on Xilinx Virtex-7 FPGA using the MNIST dataset. The evaluation results show that SPV achieves the security goal with low overhead.
IP5_5.2 TRULOOK: A FRAMEWORK FOR CONFIGURABLE GPU APPROXIMATION
Speaker:
Mohsen Imani, University of California Irvine, US
Authors:
Ricardo Garcia1, Fatemeh Asgarinejad1, Behnam Khaleghi1, Tajana Rosing1 and Mohsen Imani2
1University of California San Diego, US; 2University of California Irvine, US
Abstract
In this paper, we propose TruLook, a framework that employs approximate computing techniques for GPU acceleration through computation reuse as well as approximate arithmetic operations to eliminate redundant and unnecessary exact computations. To enable computation reuse, the GPU is enhanced with small lookup tables, placed close to the stream cores, that return already-computed values for exact and potentially inexact matches. Inexact matching is subject to a threshold controlled by the number of mantissa bits involved in the search. Approximate arithmetic is provided by a configurable approximate multiplier that dynamically detects and approximates operations that are not significantly affected by approximation. TruLook guarantees the accuracy bound required by an application by configuring the hardware at runtime. We have evaluated TruLook's efficiency on a wide range of multimedia and deep learning applications. Our evaluation shows that with a 0% and a less-than-1% quality-loss budget, TruLook yields on average 2.1× and 5.6× energy-delay product improvement over four popular networks on the ImageNet dataset.
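The mantissa-thresholded reuse idea can be modeled in software: key the reuse table on the top mantissa bits of each float32 operand, so nearby operands hit the same cached result. This is an assumption-laden sketch of the concept, not the paper's hardware:

```python
import struct

def key(x, mantissa_bits):
    """Keep sign, exponent and only the top mantissa bits of a float32;
    fewer mantissa bits kept means looser (more approximate) matching."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> (23 - mantissa_bits)

class ReuseTable:
    """Tiny computation-reuse cache: return a previously computed product
    when both operands match on their truncated-mantissa keys."""
    def __init__(self, mantissa_bits=8):
        self.bits = mantissa_bits
        self.table = {}
    def mul(self, a, b):
        k = (key(a, self.bits), key(b, self.bits))
        if k not in self.table:       # miss: multiply exactly, then cache
            self.table[k] = a * b
        return self.table[k]

t = ReuseTable(mantissa_bits=8)
exact = t.mul(1.5, 2.0)          # computed exactly and cached
approx = t.mul(1.5002, 2.0)      # near-match reuses the cached 3.0
print(exact, approx)
```

Raising `mantissa_bits` toward 23 recovers exact matching only; lowering it trades accuracy for more reuse hits, which is the runtime accuracy/efficiency knob the abstract describes.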

UB.12 University Booth

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/QbaNDvemHm8TyhgvQ

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.12 HARDBLOCK: DEMONSTRATOR OF PHYSICALLY BINDING AN IOT DEVICE TO A NON-FUNGIBLE TOKEN IN ETHEREUM BLOCKCHAIN
Speakers:
Javier Arcenegui, Rosario Arjona and Iluminada Baturone, Universidad de Sevilla - CSIC, ES
Authors:
Javier Arcenegui, Rosario Arjona and Iluminada Baturone, Universidad de Sevilla - CSIC, ES
Abstract
Nowadays, blockchain is a growing technology in the Internet of Things (IoT) ecosystem. In this work, we show a demonstrator of an IoT device bound to a Non-Fungible Token (NFT) based on the ERC-721 standard of the Ethereum blockchain. The advantage of our solution is that IoT devices can be controlled securely through events from the blockchain and by authenticated users, besides being able to carry out blockchain transactions themselves. The IoT device generates its own Blockchain Account (BCA) using a secret seed first generated by a True Random Number Generator (TRNG) and then reconstructed by a Physical Unclonable Function (PUF). A Pycom WiPy 3.0 board with the ESP32 microcontroller is employed as the IoT device. The internal SRAM of the microcontroller acts as PUF and TRNG. The SRAM is controlled by firmware developed in ESP-IDF. A smart contract developed in Solidity using the Remix IDE creates the token. The Kovan testnet and a Graphical User Interface programmed in Python are employed to show the results.
/system/files/webform/tpc_registration/UB.12-HardBlock.pdf

UB.13 University Booth

Date: Wednesday, 03 February 2021
Time: 10:30 CEST - 11:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ukhjmT9xMo3Zd5L5e

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.13 EUCLID-NIR GPU: AN ON-BOARD PROCESSING GPU-ACCELERATED SPACE CASE STUDY DEMONSTRATOR
Speaker:
Ivan Rodriguez Ferrandez, BSC/UPC, ES
Authors:
Ivan Rodriguez and Leonidas Kosmidis, BSC / UPC, ES
Abstract
Embedded Graphics Processing Units (GPUs) are very attractive candidates for on-board payload processing of future space systems, thanks to their high performance and low power consumption. Although there is significant interest from both academia and industry, there is no open, publicly available case study showing their capabilities yet. In this master thesis project, performed within the GPU4S (GPU for Space) ESA-funded project, we parallelised and ported the Euclid NIR (Near Infrared) image-processing algorithm, used in the European Space Agency's (ESA) mission to be launched in 2022, to an automotive GPU platform, the NVIDIA Xavier. In the demo we will show in real time the significantly higher performance achieved compared to the original sequential implementation. In addition, visitors will have the opportunity to examine the images on which the algorithm operates, as well as to inspect the algorithm's parallelisation through profiling and code inspection.
/system/files/webform/tpc_registration/UB.13-Euclid-NIR-GPU.pdf

K.4 Keynote - Special day on CPS and I4.0

Date: Wednesday, 03 February 2021
Time: 15:00 CEST - 15:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/omCcNzNqB4zQCZxk7

Session chair:
Enrico Macii, Politecnico di Torino, IT

Session co-chair:
Frank Schirrmeister, Cadence, US

Cyber-Physical Systems are the underlying force enabling Industry 4.0. They pave the way to new and powerful capabilities, like the factory digital twin and a continuum of AI-based decision making at the Edge and in the Cloud. Enabling all of these opportunities requires a coherent portfolio of MCUs and MPUs covering the full spectrum from high-performance to ultra-low-power CPUs, connectivity solutions, end-to-end security, and cost-effective sensing, while embedding artificial intelligence and machine learning for end nodes. STMicroelectronics is in a privileged position at the front lines of both designing and manufacturing products for these cyber-physical systems, on which Philippe Magarshack has some insights to share.

Bio: Since 2016, Philippe Magarshack has been MDG Group Vice President at STMicroelectronics, in charge of Microcontrollers and Digital ICs Group (MDG) Strategy, Technology & System Architecture. Magarshack was President of the Minalogic Collaborative R&D Cluster in Grenoble, France, from 2014 to 2020. In 2012, he was VP for Design Enablement & Services, with a focus on the 28nm FD-SOI design ecosystem, and then, during 2015, CTO of Embedded Processing Solutions. In 2005, Magarshack was appointed Group VP and GM of ST's Central CAD and Design Solutions for technologies ranging from CMOS to BiCMOS and embedded NVM. In 1994, Magarshack joined the Central R&D Group of SGS-THOMSON Microelectronics (now STMicroelectronics), where he held several roles in CAD and Libraries management for advanced integrated-circuit manufacturing processes. From 1985 to 1989, Magarshack worked as a microprocessor designer at AT&T Bell Labs in the USA. Magarshack graduated with an engineering degree in Physics from Ecole Polytechnique, Paris, France, and with an Electronics Engineering degree from Ecole Nationale Supérieure des Télécommunications in Paris, France.

Time Label Presentation Title
Authors
15:00 CEST K.4.1 CYBER-PHYSICAL SYSTEMS FOR INDUSTRY 4.0: AN INDUSTRIAL PERSPECTIVE
Speaker and Author:
Philippe Magarshack, STMicroelectronics, FR
Abstract
Cyber-Physical Systems are the underlying force enabling Industry 4.0. They pave the way to new and powerful capabilities, like factory digital twin, and a continuum of AI-based decision making at the Edge and in the Cloud. Enabling all of these opportunities requires a coherent portfolio of MCUs and MPUs covering the full spectrum from high-performance to ultra-low power CPUs, connectivity solutions, end-to-end security, and cost-effective sensing, while embedding artificial intelligence and machine learning for end nodes. STMicroelectronics is in a privileged position at the front lines of both designing and manufacturing products for these cyber-physical systems on which Philippe Magarshack has some insights to share.

7.1 Panel - Special day on CPS and I4.0

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/TgZw6anGFuaxMRv9P

Session chair:
Enrico Macii, Politecnico di Torino, IT

Session co-chair:
Frank Schirrmeister, Cadence, US

Organizer:
Frank Schirrmeister, Cadence, US

Sensor-based, communication-enabled autonomous cyber-physical systems (CPS) require design and development techniques that holistically address hardware, software, electronic and mechanical components. This panel will discuss the resulting requirements that CPS place on development tools, as these have to span multiple disciplines, either holistically or via proper interfaces. The panelists will discuss gaps in current electronic design automation flows, potential enhancements, interfaces to simulation and analysis in the computational software domain, the role of standardization, and priorities for the industry to address. Panelists: Ed Sperling, SemiEngineering.com, USA (Moderator); Laurent Maillet-Contoz, STMicroelectronics, France; Jean-Marie Brunet, Siemens EDA, USA; Maurizio Griva, Reply, Italy; Frank Schirrmeister, Cadence, USA

Time Label Presentation Title
Authors
16:00 CEST 7.1.1 PANEL: IS EDA READY FOR CYBER-PHYSICAL SYSTEMS?
Panelists:
Ed Sperling1, Laurent Maillet-Contoz2, Jean-Marie Brunet3, Maurizio Griva4 and Frank Schirrmeister5
1SemiEngineering.com, US; 2STMicroelectronics, FR; 3Siemens EDA, US; 4Reply, IT; 5Cadence Design Systems, US
Abstract
Sensor-based, communication-enabled autonomous cyber-physical systems (CPS) require design and development techniques that holistically address hardware, software, electronic and mechanical components. This panel will discuss the resulting requirements that CPS place on development tools, as these have to span multiple disciplines, either holistically or via proper interfaces. The panelists will discuss gaps in current electronic design automation flows, potential enhancements, interfaces to simulation and analysis in the computational software domain, the role of standardization, and priorities for the industry to address. Panelists: Ed Sperling, SemiEngineering.com, USA (Moderator); Laurent Maillet-Contoz, STMicroelectronics, France; Jean-Marie Brunet, Siemens EDA, USA; Maurizio Griva, Reply, Italy; Frank Schirrmeister, Cadence, USA
17:00 CEST 7.1.2 PANEL Q&A
Panelists:
Ed Sperling1, Laurent Maillet-Contoz2, Jean-Marie Brunet3, Maurizio Griva4 and Frank Schirrmeister5
1SemiEngineering.com, US; 2STMicroelectronics, FR; 3Siemens EDA, US; 4Reply, IT; 5Cadence Design Systems, US
Abstract
30 minutes of live question and answer time for interaction among panelists and audience.

7.2 Algorithm hardware co-design approaches for low power real time and robust artificial intelligence

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 17:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/2ouvWhzReFazXZupg

Session chair:
Amit Ranjan Trivedi, Electrical and Computer Engineering, University of Illinois at Chicago, US

Session co-chair:
Priyadarshini Panda, Electrical Engineering, Yale University, US

Organizers:
Priyadarshini Panda, Electrical Engineering, Yale University, US
Amit Ranjan Trivedi, Electrical and Computer Engineering, University of Illinois at Chicago, US

Artificial Intelligence (AI) algorithms have shown that the growing volume and variety of data, faster computing power, and efficient storage can be leveraged for highly accurate predictions and decision-making in complex computing problems. Earlier AI platforms were generally confined to static models where prediction accuracy mattered most and the computational efficiency of algorithms was not critical. At present, however, AI is gaining momentum in real-time applications. Real-time applications require dynamic decision-making based on evolving inputs, and thus require AI models to predict with minimal latency. To mitigate concerns about unpredictable latency and privacy in cloud-based processing, it is preferable to implement real-time AI algorithms at the edge node itself. Meanwhile, since edge nodes are often battery-operated, complex AI algorithms must operate within a stringent energy budget. The robustness of AI algorithms also becomes more critical for real-time applications, such as self-driving cars and surgical robotics, where mispredictions can have fatal consequences. Interacting with increasingly sophisticated decision-making systems is becoming more and more a part of our daily life. This creates an immense responsibility for the designers of these systems to build them in a way that guarantees safe interaction with their users and good performance in the presence of noise, changes in the environment, and model misspecification and uncertainty. Any progress in this area will be a huge step forward in using decision-making algorithms in emerging high-stakes applications, such as autonomous driving, robotics, power systems, health care, recommendation systems, and finance. To address these emerging constraints on AI, this special session brings together experts focusing on various critical approaches for low-power and real-time AI, ranging from spiking neural networks to compute-in-memory to digital accelerators. A unique contribution of this special session is to raise awareness of the reliability and robustness constraints on low-power AI and to highlight the potential of co-design approaches to address these key challenges, through our recent works as well as the recent literature.

Time Label Presentation Title
Authors
16:00 CEST 7.2.1 EFFICIENCY-DRIVEN HARDWARE OPTIMIZATION FOR ADVERSARIALLY ROBUST NEURAL NETWORKS
Speaker:
Abhiroop Bhattacharjee, Yale University, US
Authors:
Priyadarshini Panda, Abhiroop Bhattacharjee and Abhishek Moitra, Yale University, US
Abstract
With a growing need to enable intelligence in embedded devices in the Internet of Things (IoT) era, secure hardware implementation of DNNs has become imperative. We will focus on how to address adversarial robustness for Deep Neural Networks (DNNs) through efficiency-driven hardware optimizations. Since memory (specifically, dot-product operations) is a key energy-spending component for DNNs, hardware approaches in the past have focused on optimizing the memory. One such approach is approximate digital CMOS memories with hybrid 6T-8T SRAM cells that enable supply voltage (Vdd) scaling yielding low-power operation, without significantly affecting the performance due to read/write failures incurred in the 6T cells. In this talk, we will show how the bit-errors in the 6T cells of hybrid 6T-8T memories minimize the adversarial perturbations in a DNN. Essentially, we find that for different configurations of 8T-6T ratios and scaled Vdd operation, noise incurred in the hybrid memory architectures is bound within specific limits. This hardware noise can potentially interfere in the creation of adversarial attacks in DNNs yielding robustness. Another memory optimization approach involves using analog memristive crossbars that perform Matrix-Vector-Multiplications (MVMs) efficiently with low energy and area requirements. However, crossbars generally suffer from intrinsic non-idealities that cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. We will show how the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization.
16:15 CEST 7.2.2 COMPUTE-IN-MEMORY UPSIDE DOWN: A LEARNING OPERATOR CO-DESIGN PERSPECTIVE FOR SCALABILITY
Speaker:
Amit Trivedi, University of Illinois at Chicago, US
Authors:
Amit Trivedi, Shamma Nasrin, Shruthi Jaisimha and Priyesh Shukla, University of Illinois at Chicago, US
Abstract
In this paper, we discuss the potential of model-hardware co-design to considerably simplify the implementation complexity of compute-in-SRAM deep learning. Although compute-in-SRAM has emerged as a promising approach to improve the energy efficiency of DNN processing, current implementations suffer due to complex and excessive mixed-signal peripherals, such as the need for parallel digital-to-analog converters (DACs) at each input port. Comparatively, our approach inherently obviates complex peripherals by co-designing learning operators to SRAM's operational constraints. For example, our co-designed implementation is DAC-free even for multi-bit precision DNN processing. Additionally, we also discuss the interaction of our compute-in-SRAM operator with Bayesian inference of DNNs. We show a synergistic interaction of Bayesian inference with our framework where Bayesian methods allow achieving similar accuracy with a much smaller network size. Although each iteration of sample-based Bayesian inference is computationally expensive, the cost is minimized by our compute-in-SRAM approach. Meanwhile, by reducing the network size, Bayesian methods reduce the footprint cost of compute-in-SRAM implementation which is a key concern for the method. We characterize this interaction for deep learning-based pose (position and orientation) estimation for a drone.
16:30 CEST 7.2.3 RELIABLE EDGE INTELLIGENCE IN UNRELIABLE ENVIRONMENT
Speaker:
Saibal Mukhopadhyay, Georgia Institute of Technology, US
Authors:
Minah Lee, Xueyuan She, Biswadeep Chakraborty, Saurabh Dash, Burhan Ahmad Mudassar and Saibal Mukhopadhyay, Georgia Institute of Technology, US
Abstract
A key challenge for deployment of artificial intelligence (AI) in real-time safety-critical systems at the edge is to ensure reliable performance even in unreliable environments. This paper will present a broad perspective on how to design AI platforms to achieve this unique goal. First, we will present examples of AI architecture and algorithm that can assist in improving robustness against input perturbations. Next, we will discuss examples of how to make AI platforms robust against hardware induced noise and variation. Finally, we will discuss the concept of using lightweight networks as reliability estimators to generate early warning of potential task failures.
16:45 CEST 7.2.4 EXPLORING SPIKE-BASED LEARNING FOR NEUROMORPHIC COMPUTING: PROSPECTS AND PERSPECTIVES
Speaker:
Kaushik Roy, Purdue University, US
Authors:
Nitin Rathi, Amogh Agrawal, Chankyu Lee, Adarsh Kosta and Kaushik Roy, Purdue University, US
Abstract
Spiking neural networks (SNNs) operating with sparse binary signals (spikes) implemented on event-driven hardware can potentially be more energy-efficient than traditional artificial neural networks (ANNs). However, SNNs perform computations over time, and the neuron activation function does not have a well-defined derivative leading to unique training challenges. In this paper, we discuss the various spike representations and training mechanisms for deep SNNs. Additionally, we review applications that go beyond classification, like gesture recognition, motion estimation, and sequential learning. The unique features of SNNs, such as high activation sparsity and spike-based computations, can be leveraged in hardware implementations for energy-efficient processing. To that effect, we discuss various SNN implementations, both using digital ASICs as well as analog in-memory computing primitives. Finally, we present an outlook on future applications and open research areas for both SNN algorithms and hardware implementations.

7.3 Stochastic and Approximate Computing

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/zkKjDQW6KMHipmgL4

Session chair:
Ilia Polian, Universität Stuttgart, DE

Session co-chair:
Frank Sill Torres, German Aerospace Center (DLR) / Institute for the Protection of Maritime Infrastructures, DE

This session focuses on the application of stochastic and approximate computing concepts for emerging technologies. The first work of the session discusses the application of an FSM-based circuit for generating Low-Discrepancy input bitstreams that can be employed for stochastic computing. The following presentation concentrates on the application of printed electronics for implementing neural networks based on stochastic computing. The final work of this session presents a workload-aware approach that aims at the identification of ideal configurations of approximate units in order to minimize energy consumption.

Time Label Presentation Title
Authors
16:00 CEST 7.3.1 A LOW-COST FSM-BASED BIT-STREAM GENERATOR FOR LOW-DISCREPANCY STOCHASTIC COMPUTING
Speaker:
Sina Asadi, University of Louisiana at Lafayette, US
Authors:
Sina Asadi1, M. Hassan Najafi2 and Mohsen Imani3
1University of Louisiana, US; 2University of Louisiana at Lafayette, US; 3University of California Irvine, US
Abstract
Low-discrepancy (LD) bit-streams have been proposed to improve the accuracy and computation speed of stochastic computing (SC) circuits. These bit-streams are conventionally generated by using a quasi-random number generator, such as a Sobol sequence generator, and a comparator. The high hardware cost of such number generators makes the current comparator-based generators expensive in terms of area and power cost. The hardware cost issue is further aggravated when increasing the number of inputs and the precision of data. A finite state machine (FSM)-based LD bit-stream generator was proposed recently to mitigate this hardware cost. The proposed generator, however, can only generate a specific LD pattern and hence cannot be used where multiple independent LD bit-streams are needed. In this work, we propose a low-cost FSM-based LD bit-stream generator that supports the generation of any number of independent LD bit-streams. The proposed generator reduces the hardware area and the area-delay product by up to 80% compared to those of the state-of-the-art comparator-based LD bit-stream generator while generating accurate bit-streams. We develop a parallel design of the proposed generator and show that the 8x parallel implementation reduces the hardware cost by more than 82% on average compared to the cost of the state-of-the-art parallel LD generator. Taking advantage of the resulting area savings, we improve the fault tolerance of the bit-stream generation unit, a vulnerable component in SC systems, by orders of magnitude. We show the effectiveness of using the proposed generator in the SC-based design of a convolution function as a case study.
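To make the comparator-based baseline concrete: a low-discrepancy stream can be produced by comparing a target probability against a base-2 van der Corput sequence (the first Sobol dimension), and stochastic multiplication then reduces to a bitwise AND of independent streams. This is a minimal illustration of the conventional generator the paper improves on, not the proposed FSM design.

```python
def van_der_corput(i, bits=8):
    """Base-2 van der Corput value of index i (the simplest Sobol dimension)."""
    v = 0.0
    denom = 2.0
    for _ in range(bits):
        v += (i & 1) / denom
        i >>= 1
        denom *= 2.0
    return v

def ld_bitstream(p, length=256):
    """Comparator-style LD stream: bit i is 1 iff van_der_corput(i) < p."""
    return [1 if van_der_corput(i) < p else 0 for i in range(length)]

def sc_multiply(stream_a, stream_b):
    """Stochastic multiplication: bitwise AND of two independent streams."""
    return [a & b for a, b in zip(stream_a, stream_b)]

def sc_value(stream):
    """Decode a stochastic stream back to the probability it encodes."""
    return sum(stream) / len(stream)
```

Because the first 256 van der Corput values enumerate every multiple of 1/256 exactly once, the encoded value is exact after 256 bits, which is the accuracy advantage LD streams hold over pseudo-random ones.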
16:15 CEST 7.3.2 PRINTED STOCHASTIC COMPUTING NEURAL NETWORKS
Speaker:
Dennis Weller, Karlsruhe Institute of Technology, DE
Authors:
Dennis Weller1, Nathaniel Bleier2, Michael Hefenbrock1, Jasmin Aghassi-Hagmann3, Michael Beigl1, Rakesh Kumar4 and Mehdi Tahoori1
1Karlsruhe Institute of Technology, DE; 2University of Illinois Urbana-Champaign, US; 3Offenburg University of Applied Sciences, DE; 4University of Illinois Urbana-Champaign, US
Abstract
Printed electronics (PE) offers flexible, extremely low-cost, and on-demand hardware due to its additive manufacturing process, enabling emerging ultra-low-cost applications, including machine learning applications. However, large feature sizes in PE limit the complexity of a machine learning classifier (e.g., a neural network) in PE. Stochastic computing neural networks (SC-NNs) can reduce area in silicon technologies, but still require complex designs due to unique implementation tradeoffs in PE. In this paper, we propose a printed mixed-signal system, which substitutes complex and power-hungry conventional stochastic computing (SC) components by printed analog designs. The printed mixed-signal SC design consumes only 35% of the power and requires only 25% of the area of a conventional 4-bit NN implementation. We also show that the proposed mixed-signal SC-NN provides good accuracy for popular neural network classification problems. We consider this work an important step towards the realization of printed SC-NN hardware for near-sensor processing.
16:30 CEST 7.3.3 WORKLOAD-AWARE APPROXIMATE COMPUTING CONFIGURATION
Speaker:
Xun Jiao, Villanova University, US
Authors:
Dongning Ma1, Rahul Thapa1, Xingjian Wang1, Cong Hao2 and Xun Jiao1
1Villanova University, US; 2UIUC, US
Abstract
Approximate computing recently arises due to its success in many error-tolerant applications such as multimedia applications. Various approximation methods have demonstrated the effectiveness of relaxing precision requirements in a specific arithmetic unit. This provides a basis for exploring simultaneous use of multiple approximate units to improve efficiency. In this paper, we aim to identify a proper approximation configuration of approximate units in a program to minimize energy consumption while meeting quality constraints. To do this, we formulate a constrained optimization problem and develop a tool called WOAxC that uses genetic algorithm to solve this problem. WOAxC considers the impact of different input workload on the application quality. We evaluate the efficacy of WOAxC in minimizing the energy consumption of several image processing applications with varying size (i.e., number of operations), workload (i.e., input datasets), and quality constraints. Our evaluation shows that the configuration provided by WOAxC for a system with multiple approximate units improves the energy efficiency by, on average, 79.6%, 77.4%, and 70.94% for quality loss of 5%, 2.5% and 0% (no loss), respectively. To the best of our knowledge, WOAxC is the first workload-aware approach to identify proper approximation configuration for energy minimization under quality guarantee.
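A toy version of the optimization WOAxC solves can be sketched with a plain genetic algorithm: each gene selects an approximation level for one unit, and fitness rewards low energy while penalizing configurations that violate the quality budget. The per-level energy/error tables below are made-up numbers for illustration only; the paper's actual models are workload-dependent.

```python
import random

# Hypothetical per-unit tables: energy and error contribution at each
# approximation level (level 0 = exact). Illustrative numbers only.
ENERGY = [10.0, 7.0, 5.0, 4.0]
ERROR = [0.0, 0.8, 2.0, 4.0]

def fitness(cfg, quality_budget):
    """Higher is better: feasible configs rank by (negated) energy,
    infeasible ones are pushed far below any feasible one."""
    err = sum(ERROR[g] for g in cfg)
    if err > quality_budget:
        return -1e9 - err
    return -sum(ENERGY[g] for g in cfg)

def ga_search(n_units=8, quality_budget=6.0, pop=40, gens=60, seed=0):
    rng = random.Random(seed)
    # Seed with the all-exact config so a feasible point always exists.
    popu = [[0] * n_units] + [[rng.randrange(4) for _ in range(n_units)]
                              for _ in range(pop - 1)]
    for _ in range(gens):
        popu.sort(key=lambda c: fitness(c, quality_budget), reverse=True)
        elite = popu[: pop // 4]
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_units)
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.2:               # point mutation
                child[rng.randrange(n_units)] = rng.randrange(4)
            children.append(child)
        popu = elite + children
    best = max(popu, key=lambda c: fitness(c, quality_budget))
    return best, sum(ERROR[g] for g in best), sum(ENERGY[g] for g in best)
```

The search space grows as levels^units, which is why a heuristic like a GA (rather than exhaustive enumeration) is the practical choice for multi-unit configuration.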

7.4 AI accelerator design with in-memory computing and emerging memories

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/vJBHmQpgkyWx7Ks5L

Session chair:
Jae-Sun Seo, Arizona State University, US

Session co-chair:
Huichu Liu, Facebook, US

Driven by the ever-growing needs for energy efficiency, emerging memories and compute-in-memory techniques have continuously attracted interest for AI accelerator designs. This session focuses on several innovative design frameworks and mapping methodologies for state-of-the-art AI accelerators, with three papers on in-memory computing: (1) a peripheral circuit-aware pruning framework for ReRAM-based in-memory computing, (2) a runtime reconfigurable design methodology for FeFET-based in-memory computing, and (3) a modeling framework for an SRAM-based in-memory computing accelerator. In addition, the session also discusses data and computation mapping optimization using different emerging memories in machine learning accelerators.

Time Label Presentation Title
Authors
16:00 CEST 7.4.1 (Best Paper Award Candidate)
TINYADC: PERIPHERAL CIRCUIT-AWARE WEIGHT PRUNING FRAMEWORK FOR MIXED-SIGNAL DNN ACCELERATORS
Speaker:
Geng Yuan, Northeastern University, US
Authors:
Geng Yuan1, Payman Behnam2, Yuxuan Cai1, Ali Shafiee3, Jingyan Fu4, Zhiheng Liao4, Zhengang Li1, Xiaolong Ma1, Jieren Deng5, Jinhui Wang6, Mahdi Bojnordi7, Yanzhi Wang1 and Caiwen Ding5
1Northeastern University, US; 2Georgia Institute of Technology, US; 3Samsung, US; 4North Dakota State University, US; 5University of Connecticut, US; 6University of South Alabama, US; 7University of Utah, US
Abstract
As the number of weight parameters in deep neural networks (DNNs) continues to grow, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures with emerging technologies. Resistive Random-Access Memory (ReRAM) crossbars have been utilized to perform in-situ matrix-vector multiplication of DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, focusing on reducing weight storage and accelerating computation. However, the existing works capture very few peripheral circuit features, such as analog-to-digital converters (ADCs), during the neural network design. Unfortunately, ADCs have become the main contributors to the power consumption and area cost of current mixed-signal accelerators, and the large overhead of these peripheral circuits has not been addressed efficiently. To address this problem, we propose a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators, named TinyADC, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator, without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TinyADC achieves 3.5X and 2.9X power and area reduction, respectively. The TinyADC framework improves the throughput of a state-of-the-art architecture design by 29% and 40% in terms of throughput per square millimeter and per watt, respectively.
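The core arithmetic behind the ADC saving is easy to state: an ADC on a crossbar bit-line must distinguish every possible column sum, so its required resolution grows with the number of non-pruned cells in that column. A simplified, idealized model (binary inputs, ideal cells; not the paper's exact cost model):

```python
import math

def adc_bits_for_column(nonzero_cells, levels_per_cell=2):
    """Bits an ADC needs to resolve every possible bit-line sum.

    With binary inputs and `nonzero_cells` active cells, the analog column
    sum ranges over 0..nonzero_cells*(levels_per_cell - 1), so the ADC must
    distinguish that many levels plus one (idealized model).
    """
    max_sum = nonzero_cells * (levels_per_cell - 1)
    return math.ceil(math.log2(max_sum + 1))
```

Pruning a 128-cell binary column down to 31 active cells, for instance, drops the required resolution from 8 to 5 bits, and ADC power and area scale steeply with resolution, which is where the framework's savings come from.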
16:15 CEST 7.4.2 A RUNTIME RECONFIGURABLE DESIGN OF COMPUTE-IN-MEMORY BASED HARDWARE ACCELERATOR
Speaker:
Anni Lu, School of Electrical and Computer Engineering, Georgia Institute of Technology, US
Authors:
Anni Lu, Xiaochen Peng, Yandong Luo, Shanshi Huang and Shimeng Yu, Georgia Institute of Technology, US
Abstract
Compute-in-memory (CIM) is an attractive solution to address the "memory wall" challenge posed by the extensive computation in machine learning hardware accelerators. Prior CIM-based architectures, though able to adapt to different neural network models at design time, are implemented as separate custom chips. Therefore, a specific chip instance is restricted to a specific network during runtime. However, the development cycle of the hardware normally lags far behind the emergence of new algorithms. In this paper, a runtime reconfigurable design methodology for CIM-based accelerators is proposed to support a class of convolutional neural networks running on one pre-fabricated chip instance. First, several design aspects are investigated: 1) a reconfigurable weight mapping method; 2) the input side of data transmission, mainly the weight reloading; 3) the output side of data processing, mainly the reconfigurable accumulation. Then, a system-level performance benchmark is performed for the inference of different models, such as VGG-8 on the CIFAR-10 dataset and AlexNet, GoogLeNet, ResNet-18 and DenseNet-121 on the ImageNet dataset, to measure the tradeoffs between runtime reconfigurability, chip area, memory utilization, throughput and energy efficiency.
16:30 CEST IP6_4.1 A CASE FOR EMERGING MEMORIES IN DNN ACCELERATORS
Speaker:
Avilash Mukherjee, University of British Columbia, CA
Authors:
Avilash Mukherjee1, Kumar Saurav2, Prashant Nair1, Sudip Shekhar1 and Mieszko Lis1
1University of British Columbia, CA; 2QUALCOMM INDIA, IN
Abstract
The popularity of Deep Neural Networks (DNNs) has led to many DNN accelerator architectures, which typically focus on the on-chip storage and computation costs. However, much of the energy is spent on accesses to off-chip DRAM memory. While emerging resistive memory technologies such as MRAM, PCM, and RRAM can potentially reduce this energy component, they suffer from drawbacks such as low endurance that prevent them from being a DRAM replacement in DNN applications. In this paper, we examine how DNN accelerators can be designed to overcome these limitations and how emerging memories can be used for off-chip storage. We demonstrate that through (a) careful mapping of DNN computation to the accelerator and (b) a hybrid setup (both DRAM and an emerging memory), we can reduce inference energy over a DRAM-only design by a factor ranging from 1.12x on EfficientNetB7 to 6.3x on ResNet-50, while also increasing the endurance from 2 weeks to over a decade. As the energy benefits vary dramatically across DNN models, we also develop a simple analytical heuristic solely based on DNN model parameters that predicts the suitability of a given DNN for emerging-memory-based accelerators.
16:31 CEST 7.4.3 MODELING AND OPTIMIZATION OF SRAM-BASED IN-MEMORY COMPUTING HARDWARE DESIGN
Speaker:
Jyotishman Saikia, Arizona State University, IN
Authors:
Jyotishman Saikia1, Shihui Yin1, Sai Kiran Cherupally1, Bo Zhang2, Jian Meng1, Mingoo Seok2 and Jae-sun Seo1
1Arizona State University, US; 2Columbia University, US
Abstract
In-memory computing (IMC) has been demonstrated as a promising technique to significantly improve energy efficiency for deep neural network (DNN) hardware accelerators. However, designing one involves setting many design variables, such as the number of parallel rows to assert, the analog-to-digital converter (ADC) at the periphery of the memory sub-array, and the activation/weight precisions of DNNs, which affect energy efficiency, DNN accuracy, and area. While individual IMC designs have been presented in the literature, they have not investigated this multi-dimensional design optimization. In this paper, to fill this knowledge gap, we present an SRAM-based IMC hardware modeling and optimization framework. A unified systematic study closely models the IMC hardware and investigates how a number of design variables and nonidealities (e.g., device mismatch and ADC quantization) affect the DNN accuracy of an IMC design. To maintain high DNN accuracy for the IMC SRAM hardware, it is shown that the number of activated rows, the ADC resolution, the ADC quantization range, and the different sources of variability/noise need to be carefully selected and co-optimized with the underlying DNN algorithm.
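How the design variables interact can be seen even in a one-column toy model: the analog MAC result is perturbed by device mismatch and then clipped and quantized by the ADC, so the quantization range and resolution must be chosen against the expected sum distribution. A deliberately simplified sketch, not the paper's framework:

```python
import random

def imc_column_mac(inputs, weights, adc_bits=4, adc_range=16.0,
                   sigma_mismatch=0.0, seed=0):
    """Idealized IMC bit-line MAC: analog sum with device mismatch, then ADC.

    Each cell contributes weight*input perturbed by Gaussian mismatch; the
    ADC clips the sum to [0, adc_range) and quantizes it to 2**adc_bits
    uniform levels. Returns the reconstructed (quantized) partial sum.
    """
    rng = random.Random(seed)
    analog = sum(w * x + rng.gauss(0.0, sigma_mismatch) * x
                 for w, x in zip(weights, inputs))
    levels = 2 ** adc_bits
    step = adc_range / levels
    code = min(levels - 1, max(0, int(analog / step)))
    return code * step
```

With 4 ADC bits over a range of 16, partial sums above 15 saturate, which illustrates why the number of simultaneously activated rows and the ADC range must be co-selected, as the paper argues.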

7.5 Design of emerging technologies from spin wave logic to quantum systems

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ARPmNZe7t2xniSfDc

Session chair:
Thomas Ernst, CEA-Leti, FR

Session co-chair:
Luca Sterpone, Politecnico di Torino, IT

As conventional methods to improve computing are offering diminishing returns, the future of computing relies more and more on emerging technologies to continue performance improvements. In this session, we explore exciting technology options including: novel spin wave devices with new techniques to realize multi-output logic gates; Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology, which can offer significantly enhanced power efficiency over state-of-the-art CMOS, and finally, MEMS resonator-based logic gates, which have recently shown interesting aspects of resonators such as reprogrammability, reduced circuit complexity, durability, and low power consumption.

Time Label Presentation Title
Authors
16:00 CEST 7.5.1 FAN-OUT OF 2 TRIANGLE SHAPE SPIN WAVE LOGIC GATES
Speaker:
Abdulqader Mahmoud, Delft University of Technology, NL
Authors:
Abdulqader Mahmoud1, Frederic Vanderveken2, Florin Ciubotaru2, Christoph Adelmann2, Sorin Cotofana1 and Said Hamdioui1
1TU Delft, NL; 2IMEC, BE
Abstract
Multi-output logic gates save considerable energy because the same structure can feed multiple inputs of next-stage gates simultaneously. This paper proposes novel triangle-shaped fan-out-of-2 spin wave Majority and XOR gates; the Majority gate is achieved by phase detection, whereas the XOR gate is achieved by threshold detection. The proposed logic gates are validated by means of micromagnetic simulations. Furthermore, the energy and delay are estimated for the proposed structures and compared with the state-of-the-art spin wave, and 16nm and 7nm CMOS logic gates. The results demonstrate that the proposed structures provide an energy reduction of 25%-50% in comparison to other 2-output spin-wave devices while having the same delay, and an energy reduction of 0.8x-43x when compared to the 16nm and 7nm CMOS counterparts, at a delay overhead of 11x-40x.
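The two detection schemes named above can be mimicked with complex phasors: phase encoding maps logic 0/1 to wave phase 0/π, interference of three equal-amplitude waves yields the majority phase, and cancellation of two antiphase waves gives XOR via an amplitude threshold. This is a toy interference model for intuition, not a micromagnetic simulation:

```python
import cmath

def majority_by_phase(a, b, c):
    """Phase-encoded spin-wave MAJ gate (toy interference model).

    Logic 0/1 map to phase 0/pi. The superposition of three waves has the
    phase of the majority input; phase detection recovers the logic value.
    """
    superposition = sum(cmath.exp(1j * cmath.pi * x) for x in (a, b, c))
    return 1 if abs(cmath.phase(superposition)) > cmath.pi / 2 else 0

def xor_by_threshold(a, b):
    """Two antiphase waves cancel: low detector amplitude means XOR = 1."""
    amp = abs(cmath.exp(1j * cmath.pi * a) + cmath.exp(1j * cmath.pi * b))
    return 1 if amp < 1.0 else 0
```

For three inputs the resultant amplitude is 1 or 3 and never zero, so the majority phase is always well defined, which is why MAJ (plus inversion) forms a complete logic basis for spin-wave computing.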
16:15 CEST 7.5.2 TOWARDS AQFP-CAPABLE PHYSICAL DESIGN AUTOMATION
Speaker:
Hongjia Li, Northeastern University, US
Authors:
Hongjia Li1, Mengshu Sun1, Tianyun Zhang2, Olivia Chen3, Nobuyuki Yoshikawa3, Bei Yu4, Yanzhi Wang1 and Yibo Lin5
1Northeastern University, US; 2Syracuse University, US; 3Yokohama National University, JP; 4The Chinese University of Hong Kong, HK; 5Peking University, CN
Abstract
Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology exhibits high energy efficiency among superconducting electronics; however, it lacks effective design automation tools. In this work, we develop the first efficient placement and routing framework for AQFP circuits that considers the technology's unique features and constraints, using the MIT-LL technology as an example. Our proposed placement framework iteratively executes a fixed-order, row-wise placement algorithm, where the row-wise algorithm derives an optimal solution with polynomial-time complexity. To address the maximum-wirelength constraint in AQFP circuits, a whole row of buffers (or even more rows) is inserted. An A* routing algorithm is adopted as the backbone algorithm, incorporating a dynamic step size and a net negotiation process to reduce the computational complexity while accounting for AQFP characteristics, improving overall routability. Extensive experimental results demonstrate the effectiveness of our proposed framework.
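The backbone router the paper extends is classical A* search; a minimal grid version with a Manhattan-distance heuristic looks as follows (the dynamic step size and net negotiation from the paper are omitted, and the grid model is a generic illustration):

```python
import heapq

def a_star_route(grid_w, grid_h, src, dst, blocked=frozenset()):
    """A* maze router on a unit grid with a Manhattan-distance heuristic.

    `blocked` holds cells occupied by placed gates or previously routed
    nets. Returns the shortest path as a list of cells, or None if the net
    is unroutable.
    """
    def h(cell):
        return abs(cell[0] - dst[0]) + abs(cell[1] - dst[1])

    frontier = [(h(src), 0, src, [src])]
    seen = set()
    while frontier:
        f, g, cell, path = heapq.heappop(frontier)
        if cell == dst:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h
                    and nxt not in blocked):
                heapq.heappush(frontier,
                               (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None
```

Because the Manhattan heuristic is admissible, the first time the sink is popped the path is shortest; the paper's contribution lies in making this search scale under AQFP's wirelength and clocking constraints.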
16:30 CEST IP6_1.1 IMPLEMENTATION OF A MEMS RESONATOR-BASED DIGITAL TO FREQUENCY CONVERTER USING ARTIFICIAL NEURAL NETWORKS
Speaker:
Xuecui Zou, King Abdullah University of Science and Technology (KAUST) Thuwal, Saudi Arabia, SA
Authors:
Xuecui Zou1, Sally Ahmed2 and Hossein Fariborzi2
1KAUST, SA; 2King Abdullah University of Science and Technology (KAUST), SA
Abstract
This paper proposes a novel approach for micro-electromechanical resonator-based digital to frequency converter (DFC) design using artificial neural networks (ANN). The DFC is a key building block for multiple digital and interface units. We present the design of a 4-bit DFC device which consists of an in-plane clamped-clamped micro-beam resonator and 6 partial electrodes. The digital inputs, which are DC signals applied to the corner partial electrodes, modulate the beam resonance frequency using the electrostatic softening effect. The main challenge in the design is to find the air gap size between each input electrode and the beam to achieve the desired relationship between the digital input combinations and the corresponding resonance frequencies for a given application. We use a shallow, fully-connected feedforward neural network model to estimate the air gaps corresponding to the desired resonance frequency distribution, with less than 1% error. Two special cases are discussed for two applications: equal air gaps for implementing a full adder (FA), and weight-adjusted air gaps for implementing a 4-bit digital to analog converter (DAC). The training, validation, and testing datasets are extracted from finite-element-method (FEM) simulations, by obtaining resonance frequencies for the 16 input combinations for different airgap sets. The proposed method based on ANN model paves the way for a new design paradigm for MEMS resonator-based logic and opens new routes for designing more complex digital and interface circuits.
16:31 CEST IP6_1.2 COMPILATION FLOW FOR CLASSICALLY DEFINED QUANTUM OPERATIONS
Speaker:
Bruno Schmitt, EPFL, CH
Authors:
Bruno Schmitt1, Ali Javadi-Abhari2 and Giovanni De Micheli3
1EPFL, CH; 2IBM Research, US; 3EPFL, CH
Abstract
We present a flow for synthesizing quantum operations that are defined by classical combinational functions. The discussion will focus on out-of-place computation, i.e., $U_f : |x\rangle|y\rangle|0\rangle^k \mapsto |x\rangle|y \oplus f(x)\rangle|0\rangle^k$. Our flow allows users to express this function at a high level of abstraction. At its core, there is an improved version of the current state-of-the-art algorithm for synthesizing oracles [oracle19]. As a result, our synthesized circuits use up to 25% fewer qubits and up to 43% fewer Clifford gates. Crucially, these improvements are possible without increasing the number of $T$ gates nor the execution time.
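The defining map $U_f$ can be checked classically: embedding any combinational function f as |x>|y> -> |x>|y XOR f(x)> always yields a bijection on basis states, hence a valid unitary, even when f itself is not invertible. A small sketch over computational basis states (a classical model of the oracle's action, not a circuit synthesizer):

```python
from itertools import product

def oracle_permutation(f, n_in, n_out):
    """Build U_f : |x>|y> -> |x>|y XOR f(x)> as a permutation on basis
    states. The XOR target register makes the map self-inverse, which is
    why out-of-place computation needs no inverse of f."""
    return {(x, y): (x, y ^ f(x))
            for x, y in product(range(2 ** n_in), range(2 ** n_out))}

# Example: a 2-bit AND oracle (f maps the two input bits to their AND).
u = oracle_permutation(lambda x: (x >> 1) & x & 1, n_in=2, n_out=1)
```

The synthesis problem the paper addresses is realizing exactly this permutation with few qubits and gates; the dictionary above is only its functional specification.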
16:32 CEST 7.5.3 CIRCUIT MODELS FOR THE CO-SIMULATION OF SUPERCONDUCTING QUANTUM COMPUTING SYSTEMS
Speaker:
Rohith Acharya, Department of Electrical Engineering (ESAT), KU Leuven, Belgium, BE
Authors:
Rohith Acharya1, Fahd A. Mohiyaddin2, Anton Potočnik2, Kristiaan De Greve2, Bogdan Govoreanu2, Iuliana P. Radu2, Georges Gielen3 and Francky Catthoor2
1Department of Electrical Engineering (ESAT), Katholieke University Leuven, BE; 2IMEC, BE; 3KU Leuven, BE
Abstract
Quantum computers based on superconducting qubits have emerged as a leading candidate for a scalable quantum processor architecture. The core of a quantum processor consists of quantum devices that are manipulated using classical electronic circuits, which need to be co-designed for optimal performance and operation. As the principles governing the behavior of the classical circuits and the quantum devices are different, this presents a unique challenge in terms of the simulation, design and optimization of the joint system. A methodology is presented to transform the behavior of small-scale quantum processors to equivalent circuit models that are usable with classical circuits in a generic electrical simulator, enabling the detailed analysis of the impact of many important non-idealities. The methodology has specifically been employed to derive a circuit model of a superconducting qubit interacting with the quantized electromagnetic field of a superconducting resonator. Based on this technique, a comprehensive analysis of the qubit operation is performed, including the coherent control and readout of the qubit using electrical signals. Furthermore, the effect of several non-idealities in the system such as qubit relaxation, decoherence and leakage out of the computational subspace are captured, in contrast to previous works. As the presented method enables the co-simulation of the control electronics with the quantum system, it facilitates the design and optimization of near-term superconducting quantum processors.

7.6 System Modeling and Specification

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/28tZHWz7JsiTcQX88

Session chair:
Pierluigi Nuzzo, University of Southern California, US

Session co-chair:
Frederic Mallet, Université Cote d'Azur, FR

The work of this session focuses on high-level modeling and specification of digital systems. The first paper leverages SAT solving to take full advantage of the European Train Control System (ETCS) level 3. The final goal is to get efficient timetables (or schedules) for trains, check the validity of existing schedules and have a better use of train stations and tracks. This is a requirement to deal with high-density traffic in busy stations. The second paper is related to high-level modeling of energy-driven systems using hybrid automata and statistical model-checking so as to pick the right battery characteristics for executing nodes and also support the energy transfer on-the-fly from one node to another when required. The third paper analyzes on-chip networks using a hybrid model by coupling dynamic simulations with analytical formulae to compute network latency.

Time Label Presentation Title
Authors
16:00 CEST 7.6.1 TOWARDS AUTOMATIC DESIGN AND VERIFICATION FOR LEVEL 3 OF THE EUROPEAN TRAIN CONTROL SYSTEM
Speaker:
Robert Wille, Johannes Kepler University Linz / SCCH Hagenberg, AT
Authors:
Robert Wille1, Tom Peham1, Judith Przigoda2 and Nils Przigoda2
1Johannes Kepler University Linz, AT; 2Siemens AG, DE
Abstract
For centuries, block signaling has been the fundamental principle of today’s railway systems to prevent trains from running into each other. But the corresponding infrastructure of physical blocks each requiring train detection methods is costly. Therefore, initiatives such as the European Train Control System (ETCS) and, here, particularly Level 3 of ETCS aim for the utilization of virtual sections which allow for a much higher degree of freedom and provide significant potential for increasing the efficiency in today’s train schedules. However, exploiting this potential is a highly non-trivial task which, thus far, mainly relied on manual labor. In this work, we provide an initial automatic methodology which aids designers of corresponding railway networks and train schedules. For the first time, the methodology utilizes design automation expertise (here, in terms of satisfiability solvers) to unveil the potential of ETCS Level 3. Case studies (including a real-life example inspired by the Norwegian Railways) confirm the applicability and suitability of the proposed methodology.
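The safety invariant behind virtual sections, and the kind of question handed to the satisfiability solver, can be illustrated with a toy model: a schedule is valid if no virtual section ever holds two trains, and design exploration searches departure offsets that restore validity. The brute-force search below stands in for the SAT solver; the route encoding is invented for illustration.

```python
from itertools import product

def schedule_valid(occupancy):
    """occupancy[train] lists the virtual section held at each time step
    (None = not on the line). Simplified ETCS Level 3 invariant: no
    virtual section may hold two trains at the same instant."""
    horizon = max(len(r) for r in occupancy.values())
    for t in range(horizon):
        held = [r[t] for r in occupancy.values()
                if t < len(r) and r[t] is not None]
        if len(held) != len(set(held)):
            return False
    return True

def find_offsets(routes, max_delay):
    """Brute-force stand-in for the SAT solver: search departure delays
    until the delayed schedule satisfies the invariant."""
    names = list(routes)
    for delays in product(range(max_delay + 1), repeat=len(names)):
        occ = {n: [None] * d + routes[n] for n, d in zip(names, delays)}
        if schedule_valid(occ):
            return dict(zip(names, delays))
    return None
```

Real instances add movement dynamics, section lengths, and timetable objectives, which is exactly why the paper encodes the problem for a satisfiability solver instead of enumerating.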
16:15 CEST 7.6.2 MODELING AND ANALYSIS FOR ENERGY-DRIVEN COMPUTING USING STATISTICAL MODEL-CHECKING
Speaker:
Abdoulaye Gamatie, LIRMM - University Montpellier, CNRS, FR
Authors:
Abdoulaye Gamatie1, Gilles Sassatelli2 and Marius Mikučionis3
1CNRS LIRMM / University of Montpellier, FR; 2LIRMM CNRS / University of Montpellier 2, FR; 3Department of Computer Science, Aalborg University, Aalborg, Denmark, DK
Abstract
Energy-driven computing is a recent paradigm that promotes energy harvesting as an alternative solution to conventional power supply systems. A crucial challenge in that context lies in the dimensioning of system resources w.r.t. energy harvesting conditions while meeting some given timing QoS requirements. Existing simulation and debugging tools do not make it possible to clearly address this issue. This paper defines a generic modeling and analysis framework to support the design exploration for energy-driven computing. It uses stochastic hybrid automata and statistical model-checking. It advocates a distributed system design, where heterogeneous nodes integrate computing and harvesting components and support inter-node energy transfer. Through a simple case-study, the paper shows how this framework addresses the aforementioned design challenge in a flexible manner and helps in reducing energy storage requirements.
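The dimensioning question (how small an energy store still meets the timing QoS under stochastic harvesting) is exactly what statistical model checking answers by sampling runs. Below is a plain Monte Carlo stand-in for that workflow; the harvest and task-cost numbers are invented for illustration.

```python
import random

def run_node(capacity, horizon, rng, harvest_mean=2.0, task_cost=1.8):
    """One stochastic run of a harvesting node executing one task per step.
    Returns True iff the battery never empties (timing QoS is met)."""
    soc = capacity / 2.0                  # start half-charged
    for _ in range(horizon):
        soc = min(capacity, soc + rng.uniform(0.0, 2.0 * harvest_mean))
        soc -= task_cost
        if soc < 0.0:
            return False
    return True

def p_no_depletion(capacity, horizon=100, runs=500, seed=1):
    """Monte Carlo estimate (the essence of statistical model checking) of
    the probability that the node never misses a task."""
    rng = random.Random(seed)
    return sum(run_node(capacity, horizon, rng) for _ in range(runs)) / runs

def smallest_capacity(target=0.95, candidates=(2, 4, 8, 16, 32, 64)):
    """Dimensioning loop: smallest storage meeting the QoS target."""
    for c in candidates:
        if p_no_depletion(c) >= target:
            return c
    return None
```

Tools like UPPAAL SMC perform the same sampling over stochastic hybrid automata, where the battery dynamics are continuous and inter-node energy transfer adds coupled state, rather than over this scalar toy.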
16:30 CEST IP6_2.1 BLENDER: A TRAFFIC-AWARE CONTAINER PLACEMENT FOR CONTAINERIZED DATA CENTERS
Speaker:
Zhaorui Wu, Jinan University, Guangzhou, China, CN
Authors:
Zhaorui Wu1, Yuhui Deng2, Hao Feng1, Yi Zhou3 and Geyong Min4
1Department of Computer Science, Jinan University, Guangzhou, CN; 2State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences & Department of Computer Science, Jinan University, CN; 3Department of Computer Science, Columbus State University, US; 4College of Engineering, Mathematics and Physical Sciences, University of Exeter, GB
Abstract
Instantiated containers of an application are distributed across multiple Physical Machines (PMs) to achieve high parallel performance. Container placement plays a vital role in the network traffic and performance of containerized data centers. Existing container placement techniques do not consider the container traffic pattern, which is inadequate. To address this limitation, we investigate network traffic between containers and observe that it exhibits a Zipf-like distribution. We propose a novel container placement approach, Blender, that leverages this Zipf-like distribution. Based on network traffic correlation, Blender employs RefineAlg and SplitAlg to divide the containers of applications into blocks, and places these blocks across virtual machines. Blender exhibits two salient features: (i) it minimizes inter-block traffic by arranging the containers that communicate frequently in the same block; (ii) it achieves good load balancing by combining blocks according to the resource types they require and distributing them across multiple PMs. We compare Blender against two state-of-the-art methods, SBP and CA-WFD. The experimental results show that Blender significantly reduces communication traffic. In particular, for the same number of PMs, Blender reduces the traffic of SBP and CA-WFD by 22% and 32%, respectively. Furthermore, with Blender in place, the physical resources of the hosting PMs are well balanced and utilized.
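The leverage gained from the Zipf observation can be shown with a greedy toy: since a few container pairs carry most of the traffic, co-locating the heaviest pairs first removes most inter-block volume. This is a simplification for illustration and is not RefineAlg/SplitAlg:

```python
def place_containers(traffic, n_containers, block_capacity):
    """Greedy traffic-aware grouping of containers into blocks.

    `traffic` maps (i, j) container pairs to communication volume. Pairs
    are processed heaviest first; under a Zipf-like law this captures most
    of the total volume within blocks. Returns (blocks, inter-block traffic).
    """
    block_of = {}
    blocks = []
    for (i, j), vol in sorted(traffic.items(), key=lambda kv: -kv[1]):
        bi, bj = block_of.get(i), block_of.get(j)
        if bi is None and bj is None:
            blocks.append({i, j})
            block_of[i] = block_of[j] = len(blocks) - 1
        elif bi is not None and bj is None and len(blocks[bi]) < block_capacity:
            blocks[bi].add(j)
            block_of[j] = bi
        elif bj is not None and bi is None and len(blocks[bj]) < block_capacity:
            blocks[bj].add(i)
            block_of[i] = bj
    for c in range(n_containers):          # leftovers get singleton blocks
        if c not in block_of:
            blocks.append({c})
            block_of[c] = len(blocks) - 1
    inter = sum(v for (i, j), v in traffic.items()
                if block_of[i] != block_of[j])
    return blocks, inter
```

The second half of the problem, spreading the resulting blocks over PMs so that resource usage stays balanced, is what distinguishes the full approach from this sketch.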
16:31 CEST IP6_2.2 SC4MEC: AUTOMATED IMPLEMENTATION OF A SECURE HIERARCHICAL CALCULUS FOR MOBILE EDGE COMPUTING
Speaker:
Jiaqi Yin, East China Normal University, CN
Authors:
Jiaqi Yin1, Huibiao Zhu1 and Yuan Fei2
1East China Normal University, CN; 2Shanghai Normal University, CN
Abstract
Mobile Edge Computing (MEC), as an emerging technology, is proposed to solve the time-delay problem of the 5G era, especially in the field of autonomous driving. The core idea of MEC is to offload a task to the nearest device/server for computation, i.e., sinking the computation, so as to reduce delay and congestion. There is ample research on MEC offloading strategies, but little on formally describing their offloading characteristics. Therefore, in this paper, we first propose a secure hierarchical calculus, SC4MEC, to describe the features of MEC. We give the syntax and operational semantics of this calculus at the process and network levels, and simulate the calculus in Maude. Meanwhile, local ecology is applied to the communication channel to further reduce the authentication delay of devices with the same identity and to ensure the security of the transmitted data. We also propose to extend the communication radius of the MEC server or cloud server via the rule Enlarge, in order to ensure the mobile devices' connectivity while minimizing resource consumption. Finally, we apply the SC4MEC calculus to a small example of device-to-device communication with an automated implementation.
16:32 CEST 7.6.3 NOC PERFORMANCE MODEL FOR EFFICIENT NETWORK LATENCY ESTIMATION
Speaker:
Oumaima Matoussi, Universite Paris-Saclay, CEA, List, FR
Author:
Oumaima Matoussi, CEA LIST, FR
Abstract
We propose a flexible, lightweight, and parametric NoC model designed for fast performance estimation at early design stages. Our NoC model combines the benefits of both analytical and simulation-based NoC models. It features an abstract router model whose buffers are updated at runtime with information about the actual traffic. This traffic information is fed to a closed-form expression that computes packet latency and accounts for network contention on a per-router basis. We evaluated our hybrid NoC model in terms of estimation accuracy and simulation speed, comparing the simulation results to those obtained with the cycle-accurate NoC simulator Garnet. Our NoC model achieves less than 17% error in average network latency estimation and attains up to 14× speedup for an 8×8 mesh.
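The hybrid idea (simulation feeds runtime buffer occupancies into an analytical latency formula) can be sketched with a queueing-style contention term per router. The M/M/1-like expression below is a generic choice for illustration, not necessarily the paper's closed form:

```python
def hop_latency(router_delay, link_delay, occupancy, service_rate):
    """Per-hop latency: router pipeline + link traversal + a contention
    term driven by the buffer occupancy observed during simulation.
    Utilization is capped below 1 to keep the queueing term finite."""
    rho = min(occupancy / service_rate, 0.99)
    contention = rho / (1.0 - rho) / service_rate
    return router_delay + link_delay + contention

def packet_latency(path_occupancies, router_delay=3.0, link_delay=1.0,
                   service_rate=1.0):
    """Network latency of one packet: closed-form sum over the routers on
    its path, each fed with its runtime-sampled buffer occupancy."""
    return sum(hop_latency(router_delay, link_delay, occ, service_rate)
               for occ in path_occupancies)
```

Because contention enters only through sampled occupancies, the model avoids simulating every flit while still tracking congestion hot spots, which is the accuracy/speed tradeoff the abstract describes.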

7.7 Secure Implementations

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/h9oECLYRsHhwyp83H

Session chair:
Johanna Sepulveda, Airbus, DE

Session co-chair:
Cedric Marchand, Ecole Centrale de Lyon, FR

This session is on the protection of security implementations from side-channel attacks. All papers evaluate the proposed protection mechanisms on practical implementations.

Time Label Presentation Title
Authors
16:00 CEST 7.7.1 (Best Paper Award Candidate)
MAKING OBFUSCATED PUFS SECURE AGAINST POWER SIDE-CHANNEL BASED MODELING ATTACKS
Speaker:
Trevor Kroeger, University of Maryland Baltimore County, US
Authors:
Trevor Kroeger1, Wei Cheng2, Sylvain Guilley2, Jean-Luc Danger2 and Naghmeh Karimi1
1University of Maryland Baltimore County, US; 2Institut Polytechnique de Paris, FR
Abstract
To enhance the security of digital circuits, there is often a desire to dynamically generate, rather than statically store, random values used for identification and authentication purposes. Physically Unclonable Functions (PUFs) provide the means to realize this feature in an efficient and reliable way by utilizing commonly overlooked process variations that unintentionally occur during the manufacturing of integrated circuits (ICs) due to the imperfection of the fabrication process. When given a challenge, PUFs produce a unique response. However, PUFs have been found to be vulnerable to modeling attacks, where, by using a set of collected challenge-response pairs (CRPs) and training a machine learning model, the response can be predicted for unseen challenges. To combat this vulnerability, researchers have proposed techniques such as Challenge Obfuscation. However, as shown in this paper, this technique can be compromised via modeling the PUF's power side-channel. We first show the vulnerability of a state-of-the-art Challenge Obfuscated PUF (CO-PUF) against power analysis attacks by presenting our attack results on the targeted CO-PUF. Then we propose two countermeasures, as well as their hybrid version, that, when applied to the CO-PUFs, make them resilient against power side-channel based modeling attacks. We also provide some insights on the design metrics that need to be considered when implementing these mitigations. Our simulation results show the high success of our attack in compromising the original Challenge Obfuscated PUFs (success rate > 98%), as well as the significant improvement in the resilience of the obfuscated PUFs against power side-channel based modeling when equipped with our countermeasures.
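The modeling-attack threat the paper defends against can be reproduced on the standard linear-additive arbiter-PUF model: an attacker who only collects CRPs trains a linear classifier over parity features and then predicts unseen challenges. A self-contained sketch, with a plain perceptron standing in for the logistic-regression or SVM attacks usually used:

```python
import random

def phi(challenge):
    """Parity feature vector of an arbiter-PUF challenge (the standard
    linear-additive delay model)."""
    feats = []
    for i in range(len(challenge)):
        prod = 1
        for c in challenge[i:]:
            prod *= 1 - 2 * c              # bit -> +/-1, accumulated parity
        feats.append(prod)
    return feats + [1]                      # bias term

def respond(delay_weights, challenge):
    """PUF response: sign of the delay difference in the linear model."""
    s = sum(w * f for w, f in zip(delay_weights, phi(challenge)))
    return 1 if s > 0 else 0

def model_attack(n_stages=16, n_train=2000, n_test=500, epochs=20, seed=7):
    """Perceptron modeling attack: the attacker sees only CRPs, never the
    internal delays, yet learns to predict unseen challenges."""
    rng = random.Random(seed)
    puf = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]   # secret delays
    chals = [[rng.randint(0, 1) for _ in range(n_stages)]
             for _ in range(n_train + n_test)]
    data = [(phi(c), respond(puf, c)) for c in chals]
    train, test = data[:n_train], data[n_train:]
    w = [0.0] * (n_stages + 1)
    for _ in range(epochs):
        for f, r in train:
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            if pred != r:
                step = 0.1 if r == 1 else -0.1
                w = [wi + step * fi for wi, fi in zip(w, f)]
    hits = sum((1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0) == r
               for f, r in test)
    return hits / n_test
```

Challenge obfuscation breaks the link between the observed challenge and phi(c); the paper's point is that the power side-channel can restore enough of that link to re-enable this style of attack.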
16:15 CEST 7.7.2 AUTOMATED MASKING OF SOFTWARE IMPLEMENTATIONS ON INDUSTRIAL MICROCONTROLLERS
Speaker:
Florian Bache, Ruhr-University Bochum, DE
Authors:
Arnold Abromeit1, Florian Bache2, Leon A. Becker3, Marc Gourjon4, Tim Güneysu5, Sabrina Jorn1, Amir Moradi6, Maximilian Orlt7 and Falk Schellenberg8
1TüV-IT, DE; 2Ruhr-University Bochum, DE; 3Ruhr-University Bochum, DE; 4Hamburg University of Technology & NXP Semiconductors Germany GmbH, DE; 5Ruhr-Universität Bochum & DFKI, DE; 6Ruhr University Bochum, DE; 7TU Darmstadt, DE; 8Max Planck Institute for Security and Privacy, DE
Abstract
Physical side-channel attacks threaten the security of exposed embedded devices, such as microcontrollers. Dedicated countermeasures, like masking, are necessary to prevent these powerful attacks. However, a gap between well-studied leakage models and the leakage observed on real devices makes the application of these countermeasures non-trivial. In this work, we provide a gadget-based concept for automated masking that covers practically relevant leakage models in order to achieve security on real-world devices. We realize this concept with a fully automated compiler that transforms unprotected microcontroller implementations of cryptographic primitives into masked executables, capable of being executed on the target device. In a case study, we apply our approach to a bitsliced LED implementation and perform a TVLA-based security evaluation of its core component: the PRESENT S-box.
16:30 CEST IP6_3.1 STEALTHY LOGIC MISUSE FOR POWER ANALYSIS ATTACKS IN MULTI-TENANT FPGAS
Speaker:
Dennis Gnad, Karlsruhe Institute of Technology (KIT), DE
Authors:
Dennis Gnad1, Vincent Meyers1, Nguyen Minh Dang1, Falk Schellenberg2, Amir Moradi3 and Mehdi Tahoori1
1Karlsruhe Institute of Technology, DE; 2Max Planck Institute for Security and Privacy, DE; 3Ruhr University Bochum, DE
Abstract
FPGAs have been used in the cloud for several years, for workloads such as machine learning, database processing and security tasks. As for other cloud services, a highly desired feature is virtualization, in which multiple tenants share a single FPGA to increase utilization and thereby efficiency. Using solely standard FPGA logic in the untrusted tenant, on-chip logic sensors have recently been proposed, allowing remote power analysis side-channel and covert-channel attacks on the victim tenant. However, such sensors are implemented by unusual circuit constructions, such as ring oscillators or delay lines, which might be easily detected by bitstream and/or netlist checking. In this paper we show that such structural checking methods are not universal solutions, as the attacks can make use of "benign-looking" circuits. We demonstrate this by showing a successful Correlation Power Analysis attack on the Advanced Encryption Standard.
16:31 CEST IP6_3.2 ENHANCED DETECTION RANGE FOR EM SIDE-CHANNEL ATTACK PROBES UTILIZING CO-PLANAR CAPACITIVE ASYMMETRY SENSING
Speaker:
Dong-Hyun Seo, Purdue University, KR
Authors:
Dong-Hyun Seo1, Mayukh Nath1, Debayan Das1, Santosh Ghosh2 and Shreyas Sen1
1Purdue University, US; 2Intel Corporation, US
Abstract
Electromagnetic (EM) side-channel analysis (SCA) attacks, which break cryptographic implementations, have become a major concern in the design of circuits and systems. This paper focuses on EM SCA and proposes detecting an approaching EM probe even before an attack is performed. The proposed method of co-planar capacitive asymmetry sensing consists of a grid of four metal plates of the same size and dimension. As an EM probe approaches the sensing metal plates, the symmetry of the sensing metal plate system breaks, and the capacitance between each pair diverges from its baseline value. Using Ansys Maxwell Finite Element Method (FEM) simulations, we demonstrate that co-planar capacitive asymmetry sensing has an enhanced detection range compared to other sensing methods. At a distance of 1 mm between the sensing metal plates and the approaching EM probe, it shows a >17% change in capacitance, leading to a >10x improvement in detection range over existing inductive sensing methods. At a distance of 0.1 mm, a >45% change in capacitance is observed, leading to >3x and >11x sensitivity improvements over capacitive parallel sensing and inductive sensing, respectively. Finally, we show that co-planar capacitive asymmetry sensing is sensitive to both E-field and H-field probes, unlike inductive sensing, which cannot detect an E-field probe.
16:32 CEST 7.7.3 A HARDWARE ACCELERATOR FOR POLYNOMIAL MULTIPLICATION OPERATION OF CRYSTALS-KYBER PQC SCHEME
Speaker:
Ferhat Yaman, Sabanci University, TR
Authors:
Ferhat Yaman, Ahmet Can Mert, Erdinc Ozturk and Erkay Savas, Sabanci University, TR
Abstract
Polynomial multiplication is one of the most time-consuming operations utilized in lattice-based post-quantum cryptography (PQC) schemes. CRYSTALS-KYBER is a lattice-based key encapsulation mechanism (KEM) and it was recently announced as one of the four finalists at round three in NIST's PQC Standardization. Therefore, efficient implementations of polynomial multiplication operation are crucial for high-performance CRYSTALS-KYBER applications. In this paper, we propose three different hardware architectures (lightweight, balanced, high-performance) that implement the NTT, Inverse NTT (INTT) and polynomial multiplication operations for the CRYSTALS-KYBER scheme. The proposed architectures include a unified butterfly structure for optimizing polynomial multiplication and can be utilized for accelerating the key generation, encryption and decryption operations of CRYSTALS-KYBER. Our high-performance hardware with 16 butterfly units shows up to 112x, 132x and 109x improved performance for NTT, INTT and polynomial multiplication, respectively, compared to the high-speed software implementations on Cortex-M4.
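The operation the paper accelerates in hardware can be sketched in software. A minimal, hedged illustration (textbook O(n²) transforms over a cyclic ring, not Kyber's negacyclic, butterfly-based NTT; all names are mine): transform both operands, multiply pointwise, and inverse-transform, exactly the NTT/INTT/pointwise pipeline the proposed architectures implement.

```python
# Hedged sketch: NTT-based multiplication in Z_q[x]/(x^n - 1) with direct
# O(n^2) transforms; Kyber itself uses a negacyclic, butterfly-based NTT.
q = 3329  # Kyber's prime modulus; q - 1 = 3328 = 2^8 * 13, so 8th roots exist

def find_root(n):
    """Search for a primitive n-th root of unity mod q (n must divide q - 1)."""
    for g in range(2, q):
        w = pow(g, (q - 1) // n, q)
        if pow(w, n // 2, q) != 1:   # order divides n but not n/2 => exactly n
            return w
    raise ValueError("no primitive root found")

def ntt(a, w):
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q
            for i in range(n)]

def ntt_mul(a, b):
    """Transform, multiply pointwise, inverse-transform: cyclic convolution."""
    n = len(a)
    w = find_root(n)
    c = [(x * y) % q for x, y in zip(ntt(a, w), ntt(b, w))]
    n_inv, w_inv = pow(n, q - 2, q), pow(w, q - 2, q)   # Fermat inverses
    return [(v * n_inv) % q for v in ntt(c, w_inv)]

def schoolbook(a, b):
    n, c = len(a), [0] * len(a)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

a, b = [1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 0, 0, 0, 0]
assert ntt_mul(a, b) == schoolbook(a, b)
```

In hardware, the inner transform loops become the butterfly units the paper unifies and replicates (1, 4 or 16 of them) to trade area for throughput.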

7.8 ICT Innovation Funding for Manufacturing SMEs

Date: Wednesday, 03 February 2021
Time: 16:00 CEST - 16:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/m9jFGZh36c2PvNPrD

Session Chairs:
Maria Roca, Fundingbox, PL
Irina Frigioiu, Fundingbox, PL

Organizer:
Jürgen Haase, edacentrum GmbH, DE

The I4MS initiative supports EU-funded projects developing solutions that may be of interest to any company for its digital transformation, in any region of Europe, especially those lagging behind.
I4MS offers platforms and services for the digital transformation of manufacturing SMEs. Different EU projects and Digital Innovation Hubs (DIHs) offer solutions for companies to improve their production processes, products or business models with digital technologies. Although these marketplaces have a high potential to offer useful services to certain companies, DIHs should strengthen their role in raising awareness of the available technologies and the benefits of digital transformation. DIHs can help SMEs select the appropriate tools or services for their digital transformation needs.
This session will present the technologies and funding opportunities offered to manufacturing SMEs. A total of €35 million will be distributed over the next 2.5 years. Open calls will start in January 2021, and manufacturing SMEs will have the opportunity to receive funding and technological support from the following projects:

Time Label Presentation Title
Authors
16:00 CEST 7.8.1 PRESENTATION OF I4MS PHASE 4 INITIATIVE
Speaker:
Maria Roca, Fundingbox, PL
16:03 CEST 7.8.2 BETTER FACTORY PROJECT PRESENTATION & OPEN CALL OPPORTUNITIES
Speaker:
Anca Marin, FundingBox, PL
16:10 CEST 7.8.3 CHANGE2TWIN PROJECT PRESENTATION & OPEN CALL OPPORTUNITIES
Speaker:
Tor Dokken, SINTEF, NO
16:17 CEST 7.8.4 DIGITBRAIN PROJECT PRESENTATION & OPEN CALL OPPORTUNITIES
Speakers:
Andreas Ocklenburg and Andrea Hanninger, cloudSME, DE
16:24 CEST 7.8.5 KITT4SME PROJECT PRESENTATION & OPEN CALL OPPORTUNITIES
Speaker:
Davorka Moslac, Innovation Centre Nikola Tesla, HR
16:31 CEST 7.8.6 VOJEXT PROJECT PRESENTATION & OPEN CALL OPPORTUNITIES
Speaker:
Maria Roca, Fundingbox, PL
16:38 CEST 7.8.7 I4MS PHASE 3 SUCCESS STORY
Speakers:
Andreas Ocklenburg and Andrea Hanninger, cloudSME, DE
16:45 CEST 7.8.8 CLOSING
Speaker:
Maria Roca, Fundingbox, PL

IP6_1 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/mKJkCkqi97oh45b6w

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP6_1.1 IMPLEMENTATION OF A MEMS RESONATOR-BASED DIGITAL TO FREQUENCY CONVERTER USING ARTIFICIAL NEURAL NETWORKS
Speaker:
Xuecui Zou, King Abdullah University of Science and Technology (KAUST), SA
Authors:
Xuecui Zou1, Sally Ahmed2 and Hossein Fariborzi2
1KAUST, SA; 2King Abdullah University of Science and Technology (KAUST), SA
Abstract
This paper proposes a novel approach to micro-electromechanical resonator-based digital to frequency converter (DFC) design using artificial neural networks (ANNs). The DFC is a key building block for multiple digital and interface units. We present the design of a 4-bit DFC device which consists of an in-plane clamped-clamped micro-beam resonator and 6 partial electrodes. The digital inputs, which are DC signals applied to the corner partial electrodes, modulate the beam resonance frequency through the electrostatic softening effect. The main design challenge is to find the air gap size between each input electrode and the beam that achieves the desired relationship between the digital input combinations and the corresponding resonance frequencies for a given application. We use a shallow, fully connected feedforward neural network model to estimate the air gaps corresponding to the desired resonance frequency distribution, with less than 1% error. Two special cases are discussed for two applications: equal air gaps for implementing a full adder (FA), and weight-adjusted air gaps for implementing a 4-bit digital to analog converter (DAC). The training, validation, and testing datasets are extracted from finite-element-method (FEM) simulations by obtaining resonance frequencies for the 16 input combinations for different air gap sets. The proposed ANN-based method paves the way for a new design paradigm for MEMS resonator-based logic and opens new routes for designing more complex digital and interface circuits.
IP6_1.2 COMPILATION FLOW FOR CLASSICALLY DEFINED QUANTUM OPERATIONS
Speaker:
Bruno Schmitt, EPFL, CH
Authors:
Bruno Schmitt1, Ali Javadi-Abhari2 and Giovanni De Micheli3
1EPFL, CH; 2IBM Research, US; 3EPFL, CH
Abstract
We present a flow for synthesizing quantum operations that are defined by classical combinational functions. The discussion will focus on out-of-place computation, i.e., $U_f : |x\rangle|y\rangle|0\rangle^k \mapsto |x\rangle|y \oplus f(x)\rangle|0\rangle^k$. Our flow allows users to express this function at a high level of abstraction. At its core, there is an improved version of the current state-of-the-art algorithm for synthesizing oracles [oracle19]. As a result, our synthesized circuits use up to 25% fewer qubits and up to 43% fewer Clifford gates. Crucially, these improvements are possible without increasing the number of $T$ gates or the execution time.
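On classical basis states, the out-of-place oracle simply XORs f(x) into the second register, which also makes it self-inverse. A tiny sketch (illustrative only, my own names):

```python
# Basis-state semantics of an out-of-place oracle U_f (illustrative only).
def U_f(f, x, y):
    """|x>|y> -> |x>|y XOR f(x)>: x is kept, f(x) is XORed into the output."""
    return x, y ^ f(x)

f = lambda x: x & 1          # any classical Boolean function of x
for x in range(4):
    for y in range(2):
        # Applying the oracle twice restores the state: U_f is self-inverse,
        # hence a permutation of basis states and therefore unitary.
        assert U_f(f, *U_f(f, x, y)) == (x, y)
```

The synthesis problem the paper addresses is realizing this permutation as a reversible circuit, including the temporary $|0\rangle^k$ ancilla register, while minimizing qubits and Clifford gates.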

IP6_2 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/nQBAcKKzky2E7YusC

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP6_2.1 BLENDER: A TRAFFIC-AWARE CONTAINER PLACEMENT FOR CONTAINERIZED DATA CENTERS
Speaker:
Zhaorui Wu, Jinan University, CN
Authors:
Zhaorui Wu1, Yuhui Deng2, Hao Feng1, Yi Zhou3 and Geyong Min4
1Department of Computer Science, Jinan University, Guangzhou, CN; 2State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing & Department of Computer Science, Jinan University, Guangzhou, CN; 3Department of Computer Science, Columbus State University, Columbus, US; 4College of Engineering, Mathematics and Physical Sciences, University of Exeter, GB
Abstract
Instantiated containers of an application are distributed across multiple Physical Machines (PMs) to achieve high parallel performance. Container placement plays a vital role in the network traffic and performance of containerized data centers. Existing container placement techniques do not consider the container traffic pattern, which is inadequate. To address this, we investigate network traffic between containers and observe that it exhibits a Zipf-like distribution. We propose a novel container placement approach, Blender, that leverages this Zipf-like distribution. Based on network traffic correlation, Blender employs RefineAlg and SplitAlg to divide the containers of applications into blocks, and places these blocks across virtual machines. Blender exhibits two salient features: (i) it minimizes inter-block traffic by arranging the containers that communicate frequently in the same block; (ii) it achieves good load balancing by combining blocks according to the resource types they require and distributing them across multiple PMs. We compare Blender against two state-of-the-art methods, SBP and CA-WFD. The experimental results show that Blender significantly reduces communication traffic. In particular, for the same number of PMs, Blender reduces the traffic of SBP and CA-WFD by 22% and 32%, respectively. Furthermore, with Blender in place, the physical resources of the hosting PMs are well balanced and utilized.
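The traffic-aware grouping idea can be illustrated with a toy greedy pass (my own sketch, not Blender's RefineAlg/SplitAlg): visit container pairs in decreasing traffic order, since under a Zipf-like distribution a few pairs dominate, and merge them into capacity-bounded blocks.

```python
# Toy greedy sketch (not Blender's actual algorithms): union containers into
# capacity-bounded blocks, visiting pairs in decreasing traffic order.
def greedy_blocks(n, traffic, cap):
    """traffic maps container pairs (i, j) to volume; cap bounds block size."""
    parent, size = list(range(n)), [1] * n

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Under a Zipf-like distribution a few pairs carry most of the traffic,
    # so co-locating just those pairs removes most inter-block traffic.
    for (i, j), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj and size[ri] + size[rj] <= cap:
            parent[rj] = ri
            size[ri] += size[rj]

    blocks = {}
    for c in range(n):
        blocks.setdefault(find(c), []).append(c)
    return list(blocks.values())

# Containers 0-1 and 2-3 talk heavily; with blocks of two, they are co-located.
blocks = greedy_blocks(4, {(0, 1): 100, (2, 3): 90, (1, 2): 5}, cap=2)
assert sorted(map(sorted, blocks)) == [[0, 1], [2, 3]]
```

Blender additionally balances the resulting blocks across PMs by resource type, which this sketch omits.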
IP6_2.2 SC4MEC: AUTOMATED IMPLEMENTATION OF A SECURE HIERARCHICAL CALCULUS FOR MOBILE EDGE COMPUTING
Speaker:
Jiaqi Yin, East China Normal University, CN
Authors:
Jiaqi Yin1, Huibiao Zhu1 and Yuan Fei2
1East China Normal University, CN; 2Shanghai Normal University, CN
Abstract
Mobile Edge Computing (MEC) is an emerging technology proposed to solve the time-delay problem of the 5G era, especially in the field of autonomous driving. The core idea of MEC is to offload a task to the nearest device or server for computation, i.e., sinking the computation, so as to reduce delay and congestion. There is much research on MEC offloading strategies, but little on the computation of their offloading characteristics. Therefore, in this paper, we first propose a secure hierarchical calculus, SC4MEC, to describe the features of MEC. We give the syntax and operational semantics of this calculus at the process and network levels, and simulate the calculus in Maude. Meanwhile, local ecology is applied to the communication channel to further reduce the authentication delay of devices with the same identity and to ensure the security of the transmitted data. We also propose to extend the communication radius of the MEC server or cloud server by the rule Enlarge, in order to ensure the mobile devices' connectivity while minimizing resource consumption. Finally, we apply the SC4MEC calculus to a small example of device-to-device communication with automated implementation.

IP6_3 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/EWp3hPpDCZ27XPEvv

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP6_3.1 STEALTHY LOGIC MISUSE FOR POWER ANALYSIS ATTACKS IN MULTI-TENANT FPGAS
Speaker:
Dennis Gnad, Karlsruhe Institute of Technology (KIT), DE
Authors:
Dennis Gnad1, Vincent Meyers1, Nguyen Minh Dang1, Falk Schellenberg2, Amir Moradi3 and Mehdi Tahoori1
1Karlsruhe Institute of Technology, DE; 2Max Planck Institute for Security and Privacy, DE; 3Ruhr University Bochum, DE
Abstract
FPGAs have been used in the cloud for several years, for workloads such as machine learning, database processing and security tasks. As for other cloud services, a highly desired feature is virtualization, in which multiple tenants share a single FPGA to increase utilization and thereby efficiency. Using solely standard FPGA logic in the untrusted tenant, on-chip logic sensors have recently been proposed, allowing remote power analysis side-channel and covert-channel attacks on the victim tenant. However, such sensors are implemented by unusual circuit constructions, such as ring oscillators or delay lines, which might be easily detected by bitstream and/or netlist checking. In this paper we show that such structural checking methods are not universal solutions, as the attacks can make use of "benign-looking" circuits. We demonstrate this by showing a successful Correlation Power Analysis attack on the Advanced Encryption Standard.
IP6_3.2 ENHANCED DETECTION RANGE FOR EM SIDE-CHANNEL ATTACK PROBES UTILIZING CO-PLANAR CAPACITIVE ASYMMETRY SENSING
Speaker:
Dong-Hyun Seo, Purdue University, KR
Authors:
Dong-Hyun Seo1, Mayukh Nath1, Debayan Das1, Santosh Ghosh2 and Shreyas Sen1
1Purdue University, US; 2Intel Corporation, US
Abstract
Electromagnetic (EM) side-channel analysis (SCA) attacks, which break cryptographic implementations, have become a major concern in the design of circuits and systems. This paper focuses on EM SCA and proposes detecting an approaching EM probe even before an attack is performed. The proposed method of co-planar capacitive asymmetry sensing consists of a grid of four metal plates of the same size and dimension. As an EM probe approaches the sensing metal plates, the symmetry of the sensing metal plate system breaks, and the capacitance between each pair diverges from its baseline value. Using Ansys Maxwell Finite Element Method (FEM) simulations, we demonstrate that co-planar capacitive asymmetry sensing has an enhanced detection range compared to other sensing methods. At a distance of 1 mm between the sensing metal plates and the approaching EM probe, it shows a >17% change in capacitance, leading to a >10x improvement in detection range over existing inductive sensing methods. At a distance of 0.1 mm, a >45% change in capacitance is observed, leading to >3x and >11x sensitivity improvements over capacitive parallel sensing and inductive sensing, respectively. Finally, we show that co-planar capacitive asymmetry sensing is sensitive to both E-field and H-field probes, unlike inductive sensing, which cannot detect an E-field probe.

IP6_4 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/GQ8HuYanxq2iZc7fm

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

Time Label Presentation Title
Authors
IP6_4.1 A CASE FOR EMERGING MEMORIES IN DNN ACCELERATORS
Speaker:
Avilash Mukherjee, University of British Columbia, CA
Authors:
Avilash Mukherjee1, Kumar Saurav2, Prashant Nair1, Sudip Shekhar1 and Mieszko Lis1
1University of British Columbia, CA; 2QUALCOMM INDIA, IN
Abstract
The popularity of Deep Neural Networks (DNNs) has led to many DNN accelerator architectures, which typically focus on the on-chip storage and computation costs. However, much of the energy is spent on accesses to off-chip DRAM memory. While emerging resistive memory technologies such as MRAM, PCM, and RRAM can potentially reduce this energy component, they suffer from drawbacks such as low endurance that prevent them from being a DRAM replacement in DNN applications. In this paper, we examine how DNN accelerators can be designed to overcome these limitations and how emerging memories can be used for off-chip storage. We demonstrate that through (a) careful mapping of DNN computation to the accelerator and (b) a hybrid setup (both DRAM and an emerging memory), we can reduce inference energy over a DRAM-only design by a factor ranging from 1.12x on EfficientNetB7 to 6.3x on ResNet-50, while also increasing the endurance from 2 weeks to over a decade. As the energy benefits vary dramatically across DNN models, we also develop a simple analytical heuristic solely based on DNN model parameters that predicts the suitability of a given DNN for emerging-memory-based accelerators.

UB.14 University Booth

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/C5nNmBtLZGBpBzR96

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.14 BROOK SC: HIGH-LEVEL CERTIFICATION-FRIENDLY PROGRAMMING FOR GPU-POWERED SAFETY CRITICAL SYSTEMS
Speakers:
Leonidas Kosmidis and Marc Benito, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
Authors:
Leonidas Kosmidis and Marc Benito, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
Abstract
GPUs can provide the increased performance required in future critical systems. However, their programming models, e.g. CUDA or OpenCL, cannot be used in such systems as they violate safety-critical programming guidelines. Brook SC (https://github.com/lkosmid/brook) was developed at UPC/BSC to allow safety-critical applications to be programmed in Brook, a CUDA-like GPU language, which enables certification while increasing productivity.

In our demo, an avionics application running on a realistic safety-critical GPU software stack and hardware is showcased. In this Bachelor's thesis project, which was awarded a 2019 HiPEAC Technology Transfer Award and a bronze medal at the ACM SRC at ICCAD 2020, an Airbus prototype application performing general-purpose computations with a safety-critical graphics API was ported to Brook SC in record time, achieving an order-of-magnitude reduction in the lines of code needed to implement the same functionality, without performance penalty.

/system/files/webform/tpc_registration/UB.14-Brook-SC.pdf

UB.15 University Booth

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/rGFBhpR7qbDHkSqB4

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.15 MAEVE: 3D HUMAN MOTION ANALYSIS EVALUATION FROM VIDEO AT THE EDGE
Speakers:
Michele Boldo and Enrico Martini, University of Verona, IT
Authors:
Michele Boldo and Enrico Martini, University of Verona, IT
Abstract
In recent years, Human Pose Estimation has become a trending topic for human motion analysis. Its fields of application are continuously growing, for example diagnostic use in functional rehabilitation. Until now, IMUs and marker-based systems were used to estimate the pose, with the drawback of being invasive and expensive. This project presents the implementation of a ROS2-based approach that allows a single RGB-D camera connected to a low-power device with a GPU to estimate the 3D human pose through a CNN-based inference application, without using external equipment in addition to the device. This allows for a flexible, modular architecture that is independent of the CNN model used to obtain the pose. Compared to state-of-the-art solutions that rely on 2D pose estimation and require high-performance computers, or that provide only 2D support at the edge, our approach allows for 3D pose estimation on a low-power, low-cost embedded device while meeting real-time constraints.
/system/files/webform/tpc_registration/UB.15-MAEVE.pdf

UB.16 University Booth

Date: Wednesday, 03 February 2021
Time: 17:00 CEST - 17:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/JzKWLuRQS4nmxxiqN

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.16 MICRORV32: A SPINALHDL BASED RISC-V IMPLEMENTATION FOR FPGAS
Speaker:
Sallar Ahmadi-Pour, University of Bremen, DE
Authors:
Sallar Ahmadi-Pour1, Vladimir Herdt2 and Rolf Drechsler2
1University of Bremen, DE; 2University of Bremen / DFKI, DE
Abstract
We propose a demonstration of a lightweight RISC-V implementation called MicroRV32 that is suitable for FPGAs. The entire design flow is based on open-source tools. The core itself is implemented in the modern Scala-based SpinalHDL hardware description language. For the FPGA flow, the IceStorm suite is utilized. On the iCE40 HX8K FPGA, the design requires about 50% of the resources and can be run at a maximum clock frequency of 34.02 MHz. Besides the core, the design also includes basic peripherals and software examples. MicroRV32 is particularly suitable as a lightweight implementation for research and education. The complete design flow can be executed on a Linux system by means of open-source tools, which makes the platform very accessible.
/system/files/webform/tpc_registration/UB.16-MicroRV32.pdf

8.2 Machine Learning Meets Logic Synthesis

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/gwzWF6c9ByQXKjD6H

Session Chair:
Akash Kumar, TU Dresden, DE

Session Co-Chair:
Walter Lau Neto, University of Utah, US

Organizer:
Shubham Rai, TU Dresden, DE

This Special Session explores the intersection of machine learning (ML) and logic synthesis (LS). Both ML and LS perform data processing. In particular, LS transforms functionality into structure, which, at its simplest, means turning truth tables into logic circuits. ML, on the other hand, looks for the information implicit in large data sets, for example, detecting familiar objects in images or videos. The two types of data processing can be leveraged for a mutual benefit. Thus, ML can be used in LS to analyze the performance of LS algorithms, resulting in efficient optimization scripts. It is also possible to use LS in ML. In this case, logic circuits generated by LS are used as ML models in some application domain. It is this second, lesser-known connection, solving ML problems using LS methods, that is explored in depth in this Special Session.

Time Label Presentation Title
Authors
17:30 CEST 8.2.1 LOGIC SYNTHESIS MEETS MACHINE LEARNING: TRADING EXACTNESS FOR GENERALIZATION
Speaker:
Satrajit Chatterjee, Google AI, US
Authors:
Shubham Rai1, Walter Lau Neto2, Yukio Miyasaka3, Xinpei Zhang4, Mingfei Yu5, Qinyang Yi6, Masahiro Fujita5, Guilherme Barbosa Manske7, Matheus Pontes8, Leomar Rosa Jr9, Marilton S. de Aguiar10, Paulo Butzen11, Po-Chun Chien12, Yu-Shan Huang12, Hao-Ren Wang12, Jie-Hong Roland Jiang12, Jiaqi Gu13, Zheng Zhao14, Zixuan Jiang14, David Z. Pan14, Brunno Abreu15, Isac Campos16, Augusto Berndt17, Cristina Meinhardt18, Jonata Tyska Carvalho19, Mateus Grellert20, Sergio Bampi21, Aditya Lohana1, Akash Kumar1, Wei Zeng22, Azadeh Davoodi22, Rasit Onur Topaloglu23, Yuan Zhou24, Jordan Dotzel24, Yichi Zhang24, Hanyu Wang24, Zhiru Zhang24, Valerio Tenace2, Pierre-Emmanuel Gaillardon2, Alan Mishchenko25 and Satrajit Chatterjee26
1TU Dresden, DE; 2University of Utah, US; 3University of California, Berkeley, US; 4the University of Tokyo, JP; 5University of Tokyo, JP; 6University of Tokyo, Japan, JP; 7Universidade Federal de Pelotas, BR; 8UFPEL, BR; 9UFPel, BR; 10Universidade Federal de Pelotas, Brazil, BR; 11Universidade Federal do Rio Grande Sul, BR; 12National Taiwan University, TW; 13University of Texas Austin, US; 14University of Texas at Austin, US; 15UFRGS, BR; 16Universidade Federal de Santa Catarina, BR; 17Universidade Federal do Rio Grande do Sul, BR; 18UFSC, BR; 19Universidade Federal de Santa Catarina, Brazil, BR; 20Federal University of Santa Catarina, BR; 21UFRGS - Federal University of Rio Grande do Sul, BR; 22University of Wisconsin - Madison, US; 23IBM, US; 24Cornell University, US; 25UC Berkeley, US; 26Google AI, US
Abstract
Logic synthesis is a fundamental step in hardware design whose goal is to find structural representations of Boolean functions while minimizing delay and area. If the function is completely specified, the implementation accurately represents the function. If the function is incompletely specified, the implementation has to be true only on the care set. While most algorithms in logic synthesis rely on SAT and Boolean methods to exactly implement the care set, we investigate learning in logic synthesis, attempting to trade exactness for generalization. This work is directly related to machine learning, where the care set is the training set and the implementation is expected to generalize on a validation set. We present learning of incompletely-specified functions based on the results of a competition conducted at IWLS 2020. The goal of the competition was to implement 100 functions given by a set of care minterms for training, while testing the implementation using a set of validation minterms sampled from the same function. We make this benchmark suite available and offer a detailed comparative analysis of the different approaches to learning.
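The "trade exactness for generalization" framing can be made concrete with a toy sketch (my own illustration, not a competition entry): treat the care minterms as a training set, "synthesize" a predictor by nearest-neighbour lookup over minterms, and score it on held-out validation minterms of the same hidden function.

```python
# Toy "learning as synthesis" sketch (illustrative only): care minterms are
# the training set; a 1-nearest-neighbour predictor plays the role of the
# generalizing implementation.
import random

def target(x):                         # hidden function: majority of 5 bits
    return int(sum(x) > len(x) // 2)

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def predict(train, x):
    # The closest care minterm decides the output: exact on the care set,
    # a generalization everywhere else.
    return min(train, key=lambda t: hamming(t[0], x))[1]

random.seed(0)
n = 5
space = [tuple((m >> i) & 1 for i in range(n)) for m in range(2 ** n)]
random.shuffle(space)
train = [(x, target(x)) for x in space[:12]]    # care minterms
valid = space[12:]                              # held-out validation minterms
acc = sum(predict(train, x) == target(x) for x in valid) / len(valid)
```

A SAT-based exact cover would score 100% on the care set but says nothing about the validation minterms; the competition entries replace this naive lookup with decision trees, neural networks and heuristic minimizers that generalize better.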
17:45 CEST 8.2.2 LOGIC SYNTHESIS FOR GENERALIZATION AND LEARNING ADDITION
Speaker:
Yukio Miyasaka, University of California, Berkeley, US
Authors:
Yukio Miyasaka1, Xinpei Zhang2, Mingfei Yu3, Qingyang Yi3 and Masahiro Fujita3
1University of California, Berkeley, US; 2the University of Tokyo, JP; 3University of Tokyo, JP
Abstract
Logic synthesis generates a logic circuit for a given Boolean function, where the size and depth of the circuit are optimized for small area and low delay. Machine learning, on the other hand, has been extensively studied and is used in many applications these days. Its general approach of training a model from a set of input-output samples is similar to logic synthesis with external don't-cares, except that in machine learning the goal is to derive a general understanding from the given samples. Seeing this resemblance from another perspective, we can think of logic synthesis as targeting a generalization of the care set. In this paper, we try such logic synthesis, generating a logic circuit in which the given incomplete relation between input and output is generalized. We compared popular logic synthesis methods and machine learning models and analyzed their characteristics. We found that there are some arithmetic functions that these conventional models cannot effectively learn. Among them, we further experimented on addition operations using tree models and found that a heuristic BDD minimization method achieves the highest accuracy.
18:00 CEST 8.2.3 ESPRESSO-GPU: BLAZINGLY FAST TWO-LEVEL LOGIC MINIMIZATION
Speaker:
Massoud Pedram, University of Southern California, US
Authors:
Hitarth Kanakia1, Mahdi Nazemi2, Arash Fayyazi3 and Massoud Pedram2
1University of Southern California, US; 2USC, US; 3University of Southern California, US
Abstract
Two-level logic minimization has found applications in new problems such as efficient realization of deep neural network inference. Important characteristics of these new applications are that they tend to produce very large Boolean functions (in terms of the supporting variables and/or initial sum of product representation) and have don't-care-sets that are much larger in size than the on-set and off-set sizes. Applying conventional single-threaded logic minimization heuristics to these problems becomes unwieldy. This work introduces ESPRESSO-GPU, a parallel version of ESPRESSO-II, which takes advantage of the computing capabilities of general-purpose graphics processors to achieve a huge speedup compared to existing serial implementations. Simulation results show that ESPRESSO-GPU achieves an average speedup of 97x compared to ESPRESSO-II.
18:15 CEST 8.2.4 LOGICNETS: CO-DESIGNED NEURAL NETWORKS AND CIRCUITS FOR EXTREME-THROUGHPUT APPLICATIONS
Speaker:
Nicholas J. Fraser, Xilinx Research, IE
Authors:
Nicholas J. Fraser1, Yaman Umuroglu1, Yash Akhauri2 and Michaela Blott1
1Xilinx Research, IE; 2Intel Labs, IN
Abstract
Machine learning algorithms have been gradually displacing traditional programming techniques across multiple domains, including domains that require extremely high-throughput data rates, such as telecommunications and network packet filtering. Although high accuracy has been demonstrated, very few works have shown how to run these algorithms under such high-throughput constraints. To address this, we propose LogicNets, a co-design method to construct a neural network and inference engine at the same time. We create a set of layer primitives called neuron equivalent circuits (NEQs) which map neural network layers directly to the hardware building blocks (HBBs) available on an FPGA. From this, we can design and execute networks with low activation bitwidth and high sparsity at extremely high data rates and low latency, while using only a small amount of FPGA resources.

8.3 The New Frontiers in Scalable Quantum Compilation

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:30 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/YugoDNL7rs9ekNnWd

Session chair:
Robert Wille, Johannes Kepler University Linz, AT

Session co-chair:
Kaitlin Smith, University of Chicago, US

Organizers:
Heinz Riener, EPFL, Lausanne, CH
Kaitlin Smith, University of Chicago, US

Quantum computing promises to solve some computationally intensive problems more efficiently than classical algorithms on conventional computers. Integer factorization, database search, and simulation of chemical processes on atomic level are only a few examples of proposed quantum applications. To accompany steady progress in quantum technology, scalable optimizing compilers will soon be necessary to map large-scale quantum programs developed in high-level languages onto quantum hardware.

Time Label Presentation Title
Authors
17:30 CEST 8.3.1 FROM BOOLEAN FUNCTIONS TO QUANTUM CIRCUITS: A SCALABLE QUANTUM COMPILATION FLOW IN C++
Speaker:
Bruno Schmitt, EPFL, CH
Authors:
Bruno Schmitt1, Fereshte Mozafari1, Giulia Meuli1, Heinz Riener1 and Giovanni De Micheli2
1EPFL, CH; 2École Polytechnique Fédérale de Lausanne (EPFL), CH
Abstract
We propose a flow for automated quantum compilation. Our flow takes a Boolean function implemented in Python as input and translates it into a format appropriate for reversible logic synthesis. We focus on two quantum compilation tasks: uniform state preparation and oracle synthesis. To illustrate the use of our flow, we solve IBM's virtual hackathon challenge of 2019, called the Zed city problem, an instance of vertex coloring, using quantum search algorithms. The expressiveness of Python in combination with automated compilation algorithms allows us to express quantum algorithms at a high level of abstraction, which reduces the effort to implement them and leads to better and more flexible implementations. We show that our proposed flow generates a lower-cost circuit implementation of the oracle needed to solve IBM's challenge when compared to the winning submission.
17:45 CEST 8.3.2 A RESOURCE ESTIMATION AND VERIFICATION WORKFLOW IN Q#
Speaker:
Mathias Soeken, Microsoft, CH
Authors:
Mathias Soeken, Mariia Mykhailova, Vadym Kliuchnikov, Christopher Granade and Alexander Vaschillo, Microsoft Quantum, US
Abstract
An important branch in quantum computing involves accurate resource estimation to assess the cost of running a quantum algorithm on future quantum hardware. A comprehensive and self-contained workflow with the quantum program in its center allows programmers to build comprehensible and reproducible resource estimation projects. We show how to systematically create such workflows using the quantum programming language Q#. Our approach uses simulators for verification, debugging, and resource estimation, as well as rewrite steps for optimization.
18:00 CEST 8.3.3 HIQ-PROJECTQ: TOWARDS USER-FRIENDLY AND HIGH-PERFORMANCE QUANTUM COMPUTING ON GPUS
Speaker:
Damien Nguyen, Huawei Research, CH
Authors:
Damien Nguyen1, Dmitry Mikushin1 and Yung Man-Hong2
1Zurich Research Center, Data Center Technology Laboratory, 2012 Laboratories, Huawei, CH; 2Central Research Institute, Data Center Technology Laboratory, 2012 Laboratories, Huawei, CN
Abstract
In this work, we present some of the latest efforts made at Huawei Research to improve the overall performance of ProjectQ, the quantum computing framework used as the foundation of our quantum research. For a few years now, performance assessment of the framework using profiling tools has shown that a significant portion of the compilation time is spent on memory management linked to the lifetime and access of Python objects. The main purpose of this work is therefore to address some of these limitations by introducing a new C++ processing backend for ProjectQ, as well as starting a complete rewrite of the simulator code already written in C++. The core of this work is centered on a new C++ API for ProjectQ that moves most of the compiler processing into natively compiled code. We achieve this by providing a new compiler engine to perform the conversion from Python to C++, which ensures that we retain maximum compatibility with existing user code while providing significant speed-ups. We also introduce some of our work aimed at porting the existing C++ code to offload the more demanding calculations onto GPUs for better performance. We then investigate the performance of this new API by comparing it with the original ProjectQ implementation, as well as some other existing quantum computing frameworks. The preliminary results show that the C++ API is able to considerably reduce the cost of compilation, to the point that the compilation process becomes mostly limited by the performance of the simulator. Unfortunately, due to some last-minute developments and the time frame required for the submission of the present manuscript, we are unable to provide conclusive benchmark results for the GPU-capable implementation of the simulator. These will most likely be presented during the keynote presentation at the DATE conference in 2021.
18:15 CEST 8.3.4 COMPILERS FOR THE NISQ ERA
Speaker and Author:
Ross Duncan, Cambridge Quantum Computing Ltd, US
Abstract
We survey the distinctive features of NISQ devices, their differences from more conventional computing systems, and the challenges they pose for compilers.

8.4 Emerging Technologies for Neuromorphic Computing

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/C99maWhgFT9MmQJ5S

Session chair:
Michael Niemier, University of Notre Dame, US

Session co-chair:
Xunzhao Yin, Zhejiang University, CN

This session investigates platforms for neural networks and neuromorphic computing applications based on emerging technologies. The first paper discusses optical neural networks, and aims to improve computational efficiency when both operands are dynamically-encoded light signals. The second paper presents new methods to efficiently program memristor-based crossbars. The last presentation considers how RRAM-based crossbars may be made more resilient to cycle-to-cycle variations.

Time Label Presentation Title
Authors
17:30 CEST 8.4.1 O2NN: OPTICAL NEURAL NETWORKS WITH DIFFERENTIAL DETECTION-ENABLED OPTICAL OPERANDS
Speaker:
Jiaqi Gu, University of Texas at Austin, US
Authors:
Jiaqi Gu, Zheng Zhao, Chenghao Feng, Zhoufeng Ying, Ray T. Chen and David Z. Pan, University of Texas at Austin, US
Abstract
Optical neuromorphic computing has demonstrated promising performance with ultra-high computation speed, high bandwidth, and low energy consumption. Traditional optical neural network (ONN) architectures realize neuromorphic computing via electrical weight encoding. However, previous ONN design methodologies can only handle static linear projection with stationary synaptic weights, and thus fail to support efficient and flexible computing when both operands are dynamically-encoded light signals. In this work, we propose O2NN, a novel ONN engine based on wavelength-division multiplexing and differential detection, to enable high-performance, robust, and versatile photonic neural computing with both operands carried by light. Balanced optical weights and augmented quantization are introduced to enhance the representability and efficiency of our architecture. Static and dynamic variations are discussed in detail, with a knowledge-distillation-based solution given for robustness improvement. Discussions of hardware cost and efficiency are provided for a comprehensive comparison with prior work. Simulation and experimental results show that the proposed ONN architecture provides flexible, efficient, and robust support for high-performance photonic neural computing with fully-optical operands under low-bit quantization and practical variations.
17:45 CEST 8.4.2 AN EFFICIENT PROGRAMMING FRAMEWORK FOR MEMRISTOR-BASED NEUROMORPHIC COMPUTING
Speaker:
Grace Li Zhang, TU Munich, DE
Authors:
Grace Li Zhang1, Bing Li1, Xing Huang1, Chen Shen1, Shuhang Zhang1, Florin Burcea1, Helmut Graeb1, Tsung-Yi Ho2, Hai (Helen) Li3 and Ulf Schlichtmann1
1TU Munich, DE; 2National Tsing Hua University, TW; 3Duke University/TUM-IAS, US
Abstract
Memristor-based crossbars are considered promising candidates to accelerate vector-matrix computation in deep neural networks. Before being applied for inference, memristors in the crossbars should be programmed to conductances corresponding to the weights obtained after software training. Existing programming methods, however, adjust the conductances of memristors individually, with many programming-reading cycles. In this paper, we propose an efficient programming framework for memristor crossbars, where the programming process is partitioned into a predictive phase and a fine-tuning phase. In the predictive phase, multiple memristors are programmed simultaneously using a memristor programming model and IR-drop estimation. To deal with the programming inaccuracy resulting from process variations, noise, and IR drop, memristors are fine-tuned afterwards to reach a specified programming accuracy. Simulation results demonstrate that the proposed method can reduce the number of programming-reading cycles by up to 94.77% and 90.61% compared to existing one-by-one and row-by-row programming methods, respectively.
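The two-phase idea can be illustrated with a toy simulation (all names and numbers here are hypothetical, not the paper's device model):

```python
import random

# Illustrative two-phase programming loop: a predictive phase sets all
# conductances in one shot from a simple model (with simulated inaccuracy
# standing in for variation, noise, and IR drop), then a fine-tuning phase
# applies read-adjust cycles only until each cell is within tolerance.

random.seed(0)

def program(targets, tol=0.01, step=0.5):
    # Predictive phase: one programming cycle per cell.
    state = [t + random.uniform(-0.05, 0.05) for t in targets]
    cycles = len(targets)
    # Fine-tuning phase: per-cell read-adjust cycles until within tol.
    for i, t in enumerate(targets):
        while abs(state[i] - t) > tol:
            state[i] += step * (t - state[i])  # partial correction pulse
            cycles += 1
    return state, cycles

state, cycles = program([0.2, 0.5, 0.8])
print(cycles, [round(abs(s - t), 3) for s, t in zip(state, [0.2, 0.5, 0.8])])
```

The saving over one-by-one programming comes from the predictive phase writing all cells at once, so the expensive read-adjust loop only has to close the residual error.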
18:00 CEST IP7_3.1 EFFICIENT IDENTIFICATION OF CRITICAL FAULTS IN MEMRISTOR CROSSBARS FOR DEEP NEURAL NETWORKS
Speaker:
Ching-Yuan Chen, Duke University, US
Authors:
Ching-Yuan Chen1 and Krishnendu Chakrabarty2
1Graduate Institute of Electronics Engineering, TW; 2Duke University, US
Abstract
Deep neural networks (DNNs) are becoming ubiquitous, but hardware-level reliability is a concern when DNN models are mapped to emerging neuromorphic technologies such as memristor-based crossbars. As DNN architectures are inherently fault-tolerant and many faults do not affect inferencing accuracy, careful analysis must be carried out to identify faults that are critical for a given application. We present a misclassification-driven training (MDT) algorithm to efficiently identify critical faults (CFs) in the crossbar. Our results for two DNNs on the CIFAR-10 data set show that MDT can rapidly and accurately identify a large number of CFs, up to 20x faster than a baseline method of forward inferencing with randomly injected faults. We use the set of CFs obtained using MDT and the set of benign faults obtained using forward inferencing to train a machine learning (ML) model to efficiently classify all the crossbar faults in terms of their criticality. We show that the ML model can classify millions of faults within minutes with a remarkably high classification accuracy of over 99%. We present a fault-tolerance solution that exploits this high degree of criticality-classification accuracy, leading to a 93% reduction in the redundancy needed for fault tolerance.
18:01 CEST 8.4.3 DIGITAL OFFSET FOR RRAM-BASED NEUROMORPHIC COMPUTING: A NOVEL SOLUTION TO CONQUER CYCLE-TO-CYCLE VARIATION
Speaker:
Ziqi Meng, Shanghai Jiao Tong University, CN
Authors:
Ziqi Meng1, Weikang Qian1, Yilong Zhao1, Yanan Sun2, Rui Yang1 and Li Jiang1
1Shanghai Jiao Tong University, CN; 2Department of Micro-Nano Electronics, Shanghai Jiao Tong University, CN
Abstract
Resistance variation in memristor devices hinders the practical use of resistive random access memory (RRAM) crossbars as neural network (NN) accelerators. Previous fault-tolerant methods cannot effectively handle cycle-to-cycle variation (CCV). Many of them also use a pair of positive-weight and negative-weight crossbars to store a weight matrix, which implicitly enhances fault tolerance but doubles the hardware cost. This paper proposes a novel solution that dramatically reduces the NN accuracy loss under CCV, while still using a single crossbar to store a weight matrix. The key idea is to introduce digital offsets into the crossbar, which further enables two techniques to conquer CCV. The first is a variation-aware weight optimization method that determines the optimal target weights to be written into the crossbar; the second is a post-writing tuning method that optimally sets the digital offsets to recover the accuracy loss due to variation. Simulation results show that the accuracy maintains the ideal value for LeNet with MNIST and drops by only 2.77% below the ideal value for ResNet-18 with CIFAR-10 under a large resistance variation. Moreover, compared to state-of-the-art fault-tolerant methods, our method achieves better NN accuracy with at least 50% fewer crossbars.
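A minimal sketch of how a digital offset can compensate written-weight error (a hypothetical simplification; the paper's variation-aware optimization and tuning are more involved):

```python
# Sketch of the digital-offset idea: after a noisy analog write, each
# crossbar column gets a small integer offset added digitally to its
# dot-product result, chosen post-writing to cancel the accumulated
# conductance error. The offset range and values below are illustrative.

def best_offset(target_sum, actual_sum, offsets=range(-4, 5)):
    """Pick the digital offset that best recovers the ideal column sum."""
    return min(offsets, key=lambda o: abs(target_sum - (actual_sum + o)))

# Ideal column weights sum to 12; cycle-to-cycle variation left us with 9.4.
off = best_offset(12, 9.4)
print(off, 9.4 + off)  # offset 3 brings the column sum to 12.4
```

Because the offset is applied in the digital domain after readout, it is immune to the analog variation it corrects, which is what lets a single crossbar replace the usual positive/negative pair.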

8.5 Employing non-volatile devices across the memory hierarchy: from cache to storage

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/KtDTurZT5mLpH5mJq

Session chair:
Mengying Zhao, Shandong University, CN

Session co-chair:
Stefan Slesazeck, Namlab, DE

As non-volatile memories become increasingly popular, it is vital to revisit memory system design to cope with the challenges brought by emerging technologies. This session addresses how to tune the various components of the memory hierarchy, including cache, content-addressable memory, and solid state drives (SSDs), based on the unique characteristics of non-volatile devices (such as multi-level cells, ternary features, high write cost, and read disturbance), so as to improve the performance, energy efficiency, and/or reliability of each component.

Time Label Presentation Title
Authors
17:30 CEST 8.5.1 (Best Paper Award Candidate)
IN-MEMORY NEAREST NEIGHBOR SEARCH WITH FEFET MULTI-BIT CONTENT-ADDRESSABLE MEMORIES
Speaker:
Arman Kazemi, University of Notre Dame, US
Authors:
Arman Kazemi1, Mohammad Mehdi Sharifi1, Ann Franchesca Laguna1, Franz Mueller2, Ramin Rajaei3, Ricardo Olivo2, Thomas Kaempfe2, Michael Niemier1 and X. Sharon Hu1
1University of Notre Dame, US; 2Fraunhofer IPMS-CNT, DE; 3Department of Computer Science and Engineering, University of Notre Dame, US
Abstract
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing L∞ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (FeFETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
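For intuition, the functional behavior of CAM-based NN search with a Hamming metric (the baseline prior TCAM work implements, which this paper improves upon) can be sketched in software as follows; the hardware evaluates all rows in a single in-memory step rather than looping:

```python
# Software sketch of the nearest-neighbor search a CAM accelerates:
# Hamming distance over bit-packed feature vectors. This loop is the
# functional equivalent of the array search, not the hardware mechanism.

def hamming(a, b):
    """Number of differing bits between two packed words."""
    return bin(a ^ b).count('1')

def nn_search(stored, query):
    """Return (index, distance) of the stored word nearest to query."""
    return min(enumerate(hamming(w, query) for w in stored),
               key=lambda t: t[1])

stored = [0b1010_1100, 0b1111_0000, 0b0000_1111]
idx, dist = nn_search(stored, 0b1110_0000)
print(idx, dist)  # row 1 matches with distance 1
```

The accuracy gap the paper closes comes from the distance function itself: a multi-bit CAM can natively evaluate richer distances than Hamming over binary codes, bringing results closer to floating-point software.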
17:45 CEST 8.5.2 ENERGY-AWARE DESIGNS OF FERROELECTRIC TERNARY CONTENT ADDRESSABLE MEMORY
Speaker:
Yu Qian, Zhejiang University, CN
Authors:
Yu Qian1, Zhenhao Fan2, Haoran Wang1, Chao Li1, Mohsen Imani3, Kai Ni4, Grace Li Zhang5, Bing Li5, Ulf Schlichtmann5, Cheng Zhuo1 and Xunzhao Yin1
1Zhejiang University, CN; 2Zhejiang University, CN; 3University of California Irvine, US; 4Rochester Institute of Technology, US; 5TU Munich, DE
Abstract
Ternary content addressable memories (TCAMs) are a special form of computing-in-memory (CiM) circuits that aim to address the so-called memory wall by merging a parallel search function with memory blocks. Due to their efficient content-addressing nature, TCAMs have been increasingly utilized for search-intensive tasks in low-power, data-analytic applications, such as IP routers, associative memories, and learning models. While most state-of-the-art TCAM designs mainly focus on improving TCAM density by harnessing compact nonvolatile memories (NVMs), little effort has been spent on reducing and optimizing the energy consumption of NVM-based TCAMs. In this paper, exploiting the Ferroelectric FET (FeFET) as a representative NVM, we propose two compact and energy-aware ferroelectric TCAM designs for low-power applications. We first introduce a novel 2FeFET-based XOR-like gate structure that can also be adapted to other NVMs, and then leverage the structure to propose two TCAM designs that achieve high energy efficiency by either reducing the associated precharge overhead (2FeFET-1T cell) or eliminating the precharge phase typically required by TCAMs (2FeFET-2T cell). We evaluate and compare the designs w.r.t. area, search energy, and delay at the array level against other existing designs, and benchmark the proposed TCAM designs in an associative-memory-based GPU architecture. The results suggest that the proposed 2FeFET-1T/2FeFET-2T TCAM designs consume 3.03X/8.08X less search energy than the conventional 16T CMOS TCAM, while their cell areas are only 32.1%/39.3% of the latter. Compared with the state-of-the-art 2FeFET-only TCAM array, our proposed designs still achieve 1.79X and 4.79X search energy reductions, respectively. Moreover, our proposed designs achieve, on average, 45.2%/51.5% energy savings compared with the conventional GPU-based architecture at the application level.
18:00 CEST IP7_1.1 BLOCK ATTRIBUTE-AWARE DATA REALLOCATION TO ALLEVIATE READ DISTURB IN SSDS
Speaker:
Mingwang Zhao, Southwest University of China, CN
Authors:
Jianwei Liao1, Mingwang Zhao1, Zhigang Cai1, Jun Li2 and Yuanquan Shi3
1Southwest University of China, CN; 2Southwest University, CN; 3Huaihua University, CN
Abstract
This paper proposes a data reallocation method for RR processes that takes into account block attributes including P/E cycles and read counts. Specifically, it distributes the data in the RR block onto a number of available SSD blocks, allowing different blocks to keep their own near-optimal (tolerable) read counts. It can thereby reduce the number of read refresh operations and the number of raw bit errors. Through a series of simulation experiments based on several realistic disk traces, we demonstrate that the proposed method can decrease read response time by between 15.11% and 24.96% and the raw bit error rate by 4.14% on average, in contrast to state-of-the-art approaches.
18:01 CEST IP7_1.2 DYNAMIC TERNARY CONTENT-ADDRESSABLE MEMORY IS INDEED PROMISING: DESIGN AND BENCHMARKING USING NANOELECTROMECHANICAL RELAYS
Speaker:
Hongtao Zhong, Tsinghua University, CN
Authors:
Hongtao Zhong, Shengjie Cao, Huazhong Yang and Xueqing Li, Tsinghua University, CN
Abstract
Ternary content addressable memory (TCAM) has been a critical component in caches, routers, etc., where density, speed, power efficiency, and reliability are the major design targets. There are conventional low-write-power but bulky SRAM-based TCAM designs, as well as denser but less reliable or higher-write-power TCAM designs using nonvolatile memory (NVM) devices. Meanwhile, some TCAM designs using dynamic memories have also been proposed. Although dynamic TCAM is denser than CMOS SRAM TCAM and more reliable than NVM TCAM, the conventional row-by-row refresh operations end up as a bottleneck, interfering with normal TCAM activities. Therefore, this paper proposes a custom low-power dynamic TCAM using nanoelectromechanical (NEM) relay devices that utilizes a one-shot refresh to solve the memory refresh problem. By harnessing the unique NEM relay characteristics with a proposed novel cell structure, the proposed TCAM occupies a small footprint of only 3 transistors (with two NEM relays integrated on top through the back-end-of-line process), which significantly outperforms the density of 16-transistor SRAM-based TCAM. In addition, evaluations show that the proposed TCAM improves write energy efficiency by 2.31x, 131x, and 13.5x over SRAM, RRAM, and FeFET TCAMs, respectively; the search energy-delay product is improved by 12.7x, 1.30x, and 2.83x over SRAM, RRAM, and FeFET TCAMs, respectively.
18:02 CEST 8.5.3 IMPROVING THE ENERGY EFFICIENCY OF STT-MRAM BASED APPROXIMATE CACHE
Speaker:
Wei Zhao, Huazhong University of Science and Technology, CN
Authors:
Wei Zhao1, Wei Tong1, Dan Feng1, Jingning Liu1, Zhangyu Chen1, Jie Xu2, Bing Wu1, Chengning Wang1 and Bo Liu3
1Huazhong University of Science and Technology, CN; 2Wuhan National Lab for Optoelectronics, School of Computer Science and Technology Huazhong University of Science and Technology, Wuhan, China, CN; 3Hikstor Technology Co., LTD, Hangzhou, China, CN
Abstract
Approximate computing applications place large energy-consumption and performance demands on the memory system. Traditional SRAM-based caches cannot satisfy these demands due to high leakage power and limited density. Spin Transfer Torque Magnetic RAM (STT-MRAM) is a promising cache candidate due to its low leakage power and high density; however, STT-MRAM suffers from high write energy. To leverage the ability of such applications to tolerate acceptable quality loss through approximation, we propose an STT-MRAM based APProximate cache architecture (APPcache) that writes/reads approximate data, largely reducing energy. We find that many similar elements (e.g., pixels in images) exist in cache lines while running approximate computing applications. APPcache therefore uses several lightweight similarity-based encoding schemes to eliminate the similar elements, reducing the data size and thus the write energy of the STT-MRAM based cache. In addition, we design a software interface to manually control the output quality. Experimental results show that our scheme can reduce write energy and improve the image raw-data compression ratio by 21.9% and 38.0%, respectively, compared with the state-of-the-art scheme at a 1% error rate. As for output quality, the losses of all benchmarks are within 5% at a 1% error rate.
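The similarity-based encoding idea can be sketched as follows (an illustrative scheme with hypothetical names and thresholds, not APPcache's actual encodings):

```python
# Sketch of similarity-based encoding for an approximate cache line:
# elements close to a base value are dropped and replaced by a 1-bit
# "similar" flag, shrinking the data written to STT-MRAM at the cost
# of bounded quality loss on decode.

def encode(line, threshold):
    base = line[0]
    flags, kept = [], []
    for x in line:
        if abs(x - base) <= threshold:
            flags.append(1)          # approximate this element by base
        else:
            flags.append(0)
            kept.append(x)           # store this element exactly
    return base, flags, kept

def decode(base, flags, kept):
    it = iter(kept)
    return [base if f else next(it) for f in flags]

line = [100, 101, 99, 180, 100]      # e.g. neighboring pixel values
enc = encode(line, threshold=2)
print(enc, decode(*enc))
```

With a threshold of 2, four of the five elements collapse into single flag bits; the decoded line differs from the original only within the tolerated error, which is the trade the architecture exposes through its quality-control interface.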

8.6 Advances in Formal Verification

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/8L3XhH2BX7njqm4xE

Session chair:
Cimatti Alessandro, FBK, IT

Session co-chair:
Yakir Vizel, Technion, IL

The session presents several new techniques in verification of hardware and software. The technical papers propose an effective method for the verification of dividers, an automated approach to improve dead code detection using IC3, and a new methodology to apply refinement techniques to the verification of general hardware modules. Two interactive presentations describe how to optimize BDDs for classification, and how to use information flow tracking to detect exploitable buffer overflows.

Time Label Presentation Title
Authors
17:30 CEST 8.6.1 VERIFYING DIVIDERS USING SYMBOLIC COMPUTER ALGEBRA AND DON'T CARE OPTIMIZATION
Speaker:
Christoph Scholl, University of Freiburg, DE
Authors:
Christoph Scholl1, Alexander Konrad1, Alireza Mahzoon2, Daniel Grosse3 and Rolf Drechsler4
1University of Freiburg, DE; 2University of Bremen, DE; 3Johannes Kepler University Linz, AT; 4University of Bremen/DFKI, DE
Abstract
In this paper we build on methods based on Symbolic Computer Algebra that have been successfully applied to multiplier verification and, more recently, to divider verification as well. We show that existing methods are not sufficient to verify optimized non-restoring dividers and enhance those methods with a novel optimization method for polynomials w.r.t. satisfiability don't-cares. The optimization is reduced to Integer Linear Programming (ILP). Our experimental results show that this method is the key to enabling the verification of large and optimized non-restoring dividers (with bit widths up to 512).
17:45 CEST 8.6.2 ICP AND IC3
Speaker:
Felix Winterer, University of Freiburg, DE
Authors:
Karsten Scheibler1, Felix Winterer2, Tobias Seufert2, Tino Teige1, Christoph Scholl2 and Bernd Becker2
1BTC Embedded Systems AG, DE; 2University of Freiburg, DE
Abstract
If embedded systems are used in safety-critical environments, they need to meet several standards. For example, in the automotive domain the ISO 26262 standard requires that the software running on such systems does not contain unreachable code. Software model checking is one effective approach to automatically detect such dead code. Being used in a commercial product, iSAT3 already performs very well in this context. In this paper we integrate IC3 into iSAT3 in order to improve its dead code detection capabilities even further.
18:00 CEST IP7_2.1 OPTIMIZING BINARY DECISION DIAGRAMS FOR INTERPRETABLE MACHINE LEARNING CLASSIFICATION
Speaker:
Gianpiero Cabodi, Politecnico di Torino, IT
Authors:
Gianpiero Cabodi1, Paolo Camurati1, Alexey Ignatiev2, Joao Marques-Silva3, Marco Palena1 and Paolo Pasini1
1Politecnico di Torino, IT; 2Monash University, AU; 3University of Toulouse, FR
Abstract
Motivated by the need to understand the behaviour of complex machine learning (ML) models, there has been recent interest in learning optimal (or sub-optimal) decision trees (DTs). This interest is explained by the fact that DTs are widely regarded as interpretable by human decision makers. An alternative to DTs are Binary Decision Diagrams (BDDs), which can also be deemed interpretable. Compared to DTs, and despite a fixed variable order, BDDs offer the advantage of more compact representations in practice, due to the sharing of nodes. Moreover, there is also extensive experience in the efficient manipulation of BDDs. Our work proposes preliminary inroads in two main directions: (a) proposing a SAT-based model for computing a decision tree as a smallest Reduced Ordered Binary Decision Diagram (ROBDD) consistent with given training data; and (b) exploring heuristic approaches for deriving sub-optimal (i.e., not minimal) ROBDDs, in order to improve the scalability of the proposed technique. The heuristic approach is related to recent work on using BDDs for classification. Whereas previous works addressed size reduction through general logic synthesis techniques, our work adds the contribution of generalized cofactors, which are a well-known compaction technique specific to BDDs once a care (or, equivalently, a don't-care) set is given. Preliminary experimental results are also provided, offering a direct comparison between optimal and sub-optimal solutions, as well as an evaluation of the impact of the proposed size reduction steps.
18:01 CEST IP7_2.2 BOFT: EXPLOITABLE BUFFER OVERFLOW DETECTION BY INFORMATION FLOW TRACKING
Speaker:
Muhammad Monir Hossain, University of Florida, US
Authors:
Muhammad Monir Hossain, Farimah Farahmandi, Mark Tehranipoor and Fahim Rahman, University of Florida, US
Abstract
Buffer overflow is one of the most critical software vulnerabilities, with numerous functional and security impacts on memory boundaries and program calls. An exploitable buffer overflow, which can be directly or indirectly triggered through external user-domain inputs, is of greater concern because it can be misused at run-time with adversarial intent. Although some existing tools offer buffer overflow detection to certain extents, there are major limitations, such as poor detection coverage and ad-hoc/manual verification efforts, due to inadequate predefined executions for static analysis and a substantially large input subspace for dynamic verification. In this paper, to provide program verification at static time with high detection coverage, we propose an automated framework for Exploitable Buffer Overflow Detection by Information Flow Tracking (BOFT). We achieve this goal in three steps: first, BOFT analyzes the usage of arrays, pointers, and vulnerable application programming interfaces (APIs) in the program code and automatically inserts the assertions required for buffer overflow detection. Second, BOFT instruments the program with taints for direct and indirect information flow tracking using an extensive set of formal expressions. Finally, it symbolically analyzes the instrumented code for maximum coverage and provides the list of exploitable buffer overflow vulnerabilities. BOFT is evaluated on standard benchmarks from the SAMATE Juliet Test Suite (NIST), successfully detecting ~94.87% (minimum) of exploitable buffer overflows with zero false positives.
18:02 CEST 8.6.3 (Best Paper Award Candidate)
LEVERAGING PROCESSOR MODELING AND VERIFICATION FOR GENERAL HARDWARE MODULES
Speaker:
Yue Xing, Princeton University, US
Authors:
Yue Xing, Huaixi Lu, Aarti Gupta and Sharad Malik, Princeton University, US
Abstract
For processors, an instruction-set architecture (ISA) provides a complete functional specification that can be used to formally verify an implementation. There has been recent work on specifying accelerators using formal instruction sets, referred to as Instruction-Level Abstractions (ILAs), and using them to formally verify their implementations by leveraging processor verification techniques. In this paper, we generalize ILAs for the specification of general hardware modules and the formal verification of their RTL implementations. This includes automated generation of a complete set of functional (not including timing) specification properties using the ILA instructions. We address the challenges posed by this generalization and provide several case studies to demonstrate the applicability of this technique, including all the modules in an open-source 8051 micro-controller. This verification identified three bugs and completed in reasonable time.

8.7 Multicore and Distributed Real-Time Systems

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/gs9gyqxqpyQn5xFFw

Session chair:
Liliana Cucu-Grosjean, Inria, FR

Session co-chair:
Marko Bertogna, Università di Modena e Reggio Emilia, IT

The arrival of multicore and distributed platforms is the new challenge faced by the real-time systems community. In this context, the first contribution of the session presents new advances in latency guarantees. The second contribution deals with a CAN-oriented solution, while transparent communication solutions are presented in light of cache coherence requirements, as well as cache partitioning. Last, but not least, the multicore aspects push for more utilisation of hardware accelerators, and new analyses of memory interference are presented in the last paper.

Time Label Presentation Title
Authors
17:30 CEST 8.7.1 DUETTO: LATENCY GUARANTEES AT MINIMAL PERFORMANCE COST
Speaker:
Reza Mirosanlou, University of Waterloo, CA
Authors:
Reza Mirosanlou1, Mohamed Hassan2 and Rodolfo Pellizzoni1
1University of Waterloo, CA; 2McMaster University, CA
Abstract
The management of shared hardware resources in multi-core platforms has been characterized by a fundamental trade-off: high-performance arbiters typically employed in COTS systems offer no worst-case guarantees, while dedicated real-time controllers provide timing guarantees at the cost of significantly degrading system performance. In this paper, we overcome this trade-off by introducing Duetto, a novel hardware resource management paradigm. Duetto pairs a real-time arbiter with a high-performance arbiter and a latency estimator module. Based on the observation that the resource is rarely overloaded, Duetto executes the high-performance arbiter most of the time, switching to the real-time arbiter only in the rare cases when the latency estimator deems that timing guarantees risk being violated. We demonstrate our approach on the case study of a multi-bank memory. Our evaluation based on cycle-accurate simulations shows that Duetto can provide the same latency guarantees as the real-time arbiter with limited loss of performance compared to the high-performance arbiter.
17:45 CEST 8.7.2 VPROFILE: VOLTAGE-BASED ANOMALY DETECTION IN CONTROLLER AREA NETWORKS
Speaker:
Nathan Liu, University of Waterloo, CA
Authors:
Nathan Liu, Carlos Moreno, Murray Dunne and Sebastian Fischmeister, University of Waterloo, CA
Abstract
Modern cars are becoming more accessible targets for cyberattacks due to the proliferation of wireless communication channels. The intra-vehicle Controller Area Network (CAN) bus lacks authentication, which exposes critical components to interference from less secure, wirelessly compromised modules. To address this issue, we propose vProfile, a sender authentication system based on voltage fingerprints of Electronic Control Units (ECUs). vProfile exploits the physical properties of ECU output voltages on the CAN bus to determine the authenticity of bus messages, which enables the detection of both hijacked ECUs and external devices connected to the bus. We show the potential of vProfile using experiments on two production vehicles with precision and recall scores of over 99.99%. The improved identification rates and more straightforward design of vProfile make it an attractive improvement over existing methods.
18:00 CEST IP7_3.2 MODELING, IMPLEMENTATION, AND ANALYSIS OF XRCE-DDS APPLICATIONS IN DISTRIBUTED MULTI-PROCESSOR REAL-TIME EMBEDDED SYSTEMS
Speaker:
Saeid Dehnavi, Eindhoven University of Technology (TU/e), The Netherlands, NL
Authors:
Saeid Dehnavi1, Dip Goswami2, Martijn Koedam2, Andrew Nelson3 and Kees Goossens4
1Eindhoven University of Technology (TU/e), NL; 2Eindhoven University of Technology, NL; 3TU Eindhoven, NL; 4Eindhoven university of technology, NL
Abstract
The Publish-Subscribe paradigm is a design pattern for transparent communication in many recent distributed applications. Data Distribution Service (DDS) is a machine-to-machine communication standard that aims to provide reliable, high-performance, inter-operable, and real-time data exchange based on publish–subscribe paradigm. However, the high resource requirement of DDS limits its usage in low-cost embedded systems. XRCE-DDS is a Client-Agent based standard to enable resource-constrained small embedded systems to connect to the DDS global data space. Current XRCE-DDS implementations suffer from dependencies with host operating systems, target only single processing units, and lack performance analysis methods. In this paper, we present a bare-metal implementation of XRCE-DDS standard on the CompSOC platform as an instance of Multi-Processor System on Chip (MPSoC). The proposed framework includes a hard real-time side hosting the XRCE-DDS Client, and a soft real-time side hosting the XRCE-DDS Agent. A Scenario Aware Data Flow (SADF) model is proposed to capture the dynamism of the system behavior in terms of different execution scenarios. We analyze the long-term expected value for throughput by capturing the probabilistic scenario switching using a proposed Markov model which is experimentally validated.
18:01 CEST IP7_4.1 ANALYZING MEMORY INTERFERENCE OF FPGA ACCELERATORS ON MULTICORE HOSTS IN HETEROGENEOUS RECONFIGURABLE SOCS
Speaker:
Maxim Mattheeuws, ETH Zürich, CH
Authors:
Maxim Mattheeuws1, Björn Forsberg2, Andreas Kurth3 and Luca Benini4
1ETH, CH; 2ETH Zürich, CH; 3ETH Zurich, CH; 4Università di Bologna and ETH Zurich, IT
Abstract
Reconfigurable heterogeneous systems-on-chips (SoCs) integrating multiple accelerators are cost-effective and feature the processing power required for complex embedded applications. However, to enable their usage in real-time settings, it is crucial to control interference on the shared main memory for reliable performance. Interference causes performance degradation due to simultaneous memory requests by components such as CPUs, caches, accelerators, and DMAs. We propose a methodology to characterize the interference to multicore host processors caused by accelerators implemented in the FPGA fabric of reconfigurable heterogeneous SoCs. Based on it, we extend the roofline model to account for performance degradation of the computing platform. The extended model allows to determine in an efficient way at which point memory interference becomes critical for a given platform and workload. We apply our methodology to a modern Xilinx UltraScale+ SoC integrating a multicore ARM Cortex-A CPU and a Kintex-grade FPGA. To the best of our knowledge, our results experimentally show for the first time that programs with intensities below 5flop/byte -- workloads with low cache locality -- can suffer from slowdowns of up to an order of magnitude.
18:02 CEST 8.7.3 FLEXIBLE CACHE PARTITIONING FOR MULTI-MODE REAL-TIME SYSTEMS
Speaker:
Ohchul Kwon, TU Munich, KR
Authors:
Ohchul Kwon1, Gero Schwäricke2, Tomasz Kloda2, Denis Hoornaert1, Giovani Gracioli3 and Marco Caccamo1
1TU Munich, DE; 2TUM, DE; 3UFSC, Brazil, BR
Abstract
Cache partitioning is a well-studied technique that mitigates the inter-processor cache interference in multiprocessor systems. The resulting optimization problem involves allocating portions of the cache to individual processors. In multi-mode applications (e.g., flight control system that runs in take-off, cruise, or landing mode), the cache memory requirement can change over time, making runtime cache repartitioning necessary. This paper presents a cache partition allocation framework enabling flexible cache partitioning for multi-mode real-time systems. The main objective is to guarantee timing predictability in the steady states and during mode changes. We evaluate the effectiveness of our approach for multiple embedded benchmarks with different ranges of cache size sensitivity. The results show increased schedulability compared to static partitioning approaches.
18:17 CEST IP7_4.2 EMPIRICAL EVIDENCE FOR MPSOCS IN CRITICAL SYSTEMS: THE CASE OF NXP’S T2080 CACHE COHERENCE
Speaker:
Roger Pujol, Barcelona Supercomputing Center, ES
Authors:
Roger Pujol1, Hamid Tabani1, Jaume Abella2, Mohamed Hassan3 and Francisco J Cazorla1
1Barcelona Supercomputing Center, ES; 2Barcelona Supercomputing Center (BSC-CNS), ES; 3McMaster University, CA
Abstract
The adoption of complex MPSoCs in critical real-time embedded systems mandates a detailed analysis their architecture to facilitate certification. This analysis is hindered by the lack of a thorough understanding of the MPSoC system due to the unobvious and/or insufficiently documented behavior of some key hardware features. Confidence on those features can only be regained by building specific tests to both, assess whether their behavior matches specifications and unveil their behavior when it is not fully known a priori. In this work, we introduce a systematic approach that constructs this thorough understanding of the MPSoC architecture-- and assess against its specification in processor documentation -- with a focus on the cache coherence protocol in the avionics-relevant NXP T2080 architecture as our use-case. Our approach covers all transitions in the MESI cache coherence protocol, with emphasis on the coherence between DMA and processing cores. We build evidence of their behavior based on available debug support and performance monitors. Our analysis discloses unexpected behavior for coherence-related notifications as well as some hardware monitors.

8.8 Industrial Design Methods and Tools: Multidimensional Design Reuse and Extended Role of Test for Automotive

Date: Wednesday, 03 February 2021
Time: 17:30 CEST - 18:20 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/zfe4jcWSSMiyZgPoJ

Organizer:
Jürgen Haase, edacentrum GmbH, DE

This Exhibition Workshop features two talks on industrial design methods and tools. It is open to conference delegates as well as to exhibition visitors.

Time Label Presentation Title
Authors
17:30 CEST 8.8.1 EARLIER SOC INTEGRATION WITH A MULTIDIMENSIONAL DESIGN REUSE
Speaker:
Chouki Aktouf, Defacto Technologies, FR
Abstract

SoC design starts by design assembly connecting IP blocks which is just the beginning of the integration process. The difficult part is reaching the best possible PPA (Power, Performance, Area) combination within tight deadlines, while keeping engineering costs under control. In the conventional EDA (Electronic Design Automation) design flow, each task (power consumption, architecture, testing, etc.) is performed separately by an ultra-specialized team of engineers and significant design time is lost in iteration loops. The number of iterations has a great impact on the cost and time frame of the whole project.

This presentation will illustrate how to start SoC Build process much earlier compared to traditional design flows. Using a joint API handling a variety of design domains and design formats including RTL, constraints, power, physical, test, etc. Such API allows non design experts to take important design.

Also, a new dimension of design extraction is presented with a focus on “Power SoC Integration”. It is shown how the design reuse ratio is augmented by keeping engineering cost reasonably low.

17:55 CEST 8.8.2 EXTENDING THE ROLE OF TEST TO MEET AUTOMOTIVE SAFETY AND SECURITY REQUIREMENTS
Speaker:
Lee Harrison, Siemens EDA, GB
Abstract

The role of test is expanding from its traditional role into one that includes managing the entire silicon lifecycle. To ensure that ICs work safely and as expected throughout their operational life, the industry needs to expand from production test to a model that includes ongoing monitoring for defects, degradations, bugs, attacks, and use case surprises.


IP7_1 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/JQTHeB6SDwGigrYpW

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

Time Label Presentation Title
Authors
IP7_1.1 BLOCK ATTRIBUTE-AWARE DATA REALLOCATION TO ALLEVIATE READ DISTURB IN SSDS
Speaker:
Mingwang Zhao, Southwest University of China, CN
Authors:
Jianwei Liao1, Mingwang Zhao1, Zhigang Cai1, Jun Li2 and Yuanquan Shi3
1Southwest University of China, CN; 2Southwest University, CN; 3Huaihua University, CN
Abstract
This paper proposes a data reallocation method in RR processes, by taking account of block attributes including P/E cycles and read counts. Specifically, it can distribute the data in the RR block onto a number of available SSD blocks, to allow different blocks keeping their own near optimal (tolerable) read count. Then, it can reduce the number of read fresh operations and the number of raw bit errors. Through a series of simulation experiments based on several realistic disk traces, we demonstrate that the proposed method can decrease the read response time by between 15.11% and 24.96% and the raw bit error rate by 4.14% on average, in contrast to state-of-the-art approaches.
IP7_1.2 DYNAMIC TERNARY CONTENT-ADDRESSABLE MEMORY IS INDEED PROMISING: DESIGN AND BENCHMARKING USING NANOELECTROMECHANICAL RELAYS
Speaker:
Hongtao Zhong, Tsinghua University, CN
Authors:
Hongtao Zhong, Shengjie Cao, Huazhong Yang and Xueqing Li, Tsinghua University, CN
Abstract
Ternary content addressable memory (TCAM) has been a critical component in caches, routers, etc., in which density, speed, power efficiency, and reliability are the major design targets. There have been the conventional low-write-power but bulky SRAM-based TCAM design, and also denser but less reliable or higher-write-power TCAM designs using nonvolatile memory (NVM) devices. Meanwhile, some TCAM designs using dynamic memories have been also proposed. Although dynamic design TCAM is denser than CMOS SRAM TCAM and more reliable than NVM TCAM, the conventional row-by-row refresh operations land up with a bottleneck of interference with normal TCAM activities. Therefore, this paper proposes a custom low-power dynamic TCAM using nanoelectromechanical (NEM) relay devices utilizing one-shot refresh to solve the memory refresh problem. By harnessing the unique NEM relay characteristics with a proposed novel cell structure, the proposed TCAM occupies a small footprint of only 3 transistors (with two NEM relays integrated on the top through the back-end-of-line process), which significantly outperforms the density of 16-transistor SRAM-based TCAM. In addition, evaluations show that the proposed TCAM improves the write energy efficiency by 2.31x, 131x, and 13.5x over SRAM, RRAM, and FeFET TCAMs, respectively; The search energy-delay-product is improved by 12.7x, 1.30x, and 2.83x over SRAM, RRAM, and FeFET TCAMs, respectively.

IP7_2 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/vBDNdKJbARXSDrrfs

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

Time Label Presentation Title
Authors
IP7_2.1 OPTIMIZING BINARY DECISION DIAGRAMS FOR INTERPRETABLE MACHINE LEARNING CLASSIFICATION
Speaker:
Gianpiero Cabodi, Politecnico di Torino, IT
Authors:
Gianpiero Cabodi1, Paolo Camurati1, Alexey Ignatiev2, Joao Marques-Silva3, Marco Palena1 and Paolo Pasini4
1Politecnico di Torino, IT; 2Monash University, AU; 3University of Toulouse, FR; 4Polytechnic University of Turin, IT
Abstract
Motivated by the need to understand the behaviour of complex machine learning (ML) models, there has been recent interest on learning optimal (or sub-optimal) decision trees (DTs). This interest is explained by the fact that DTs are widely regarded as interpretable by human decision makers. An alternative to DTs are Binary Decision Diagrams (BDDs), which can be deemed interpretable. Compared to DTs, and despite a fixed variable order, BDDs offer the advantage of more compact representations in practice, due to the sharing of nodes. Moreover, there is also extensive experience in the efficient manipulation of BDDs. Our work proposes preliminary inroads in two main directions: (a) proposing a SAT-based model for computing a decision tree as a smallest Reduced Ordered Binary Decision Diagram, consistent with given training data; and (b) exploring heuristic approaches for deriving sub-optimal (i.e., not minimal) ROBDDs, in order to improve the scalability of the proposed technique. The heuristic approach is related to recent work on using BDDs for classification. Whereas previous works addressed size reduction by general logic synthesis techniques, our work adds the contribution of generalized cofactors, that are a well-known compaction technique specific to BDDs, once a care (or equivalently a don't care) set is given. Preliminary experimental results are also provided, proposing a direct comparison between optimal and sub-optimal solutions, as well as an evaluation of the impact of the proposed size reduction steps.
IP7_2.2 BOFT: EXPLOITABLE BUFFER OVERFLOW DETECTION BY INFORMATION FLOW TRACKING
Speaker:
Muhammad Monir Hossain, University of Florida, US
Authors:
Muhammad Monir Hossain, Farimah Farahmandi, Mark Tehranipoor and Fahim Rahman, University of Florida, US
Abstract
Buffer overflow is one of the most critical software vulnerabilities with numerous functional and security impacts on memory boundaries and program calls. An exploitable buffer overflow, which can be directly or indirectly triggered through external user domain inputs, is of a greater concern because it can be misused during run-time for adversarial intention. Although some existing tools offer buffer overflow detection to certain extents, there are major limitations, such as, poor detection coverage and ad-hoc/manual verification efforts due to inadequate predefined executions for static analysis and substantially large input subspace for dynamic verification. In this paper, to provide program verification in static time with high detection coverage, we propose an automated framework for Exploitable Buffer Overflow Detection by Information Flow Tracking (BOFT). We achieve this goal following three steps -- first, BOFT analyzes the usage of arrays, pointers, and vulnerable application programming interface (APIs) in the program code and automatically inserts assertions required for buffer overflow detection. Second, BOFT instruments the program with taints for direct and indirect information flow tracking using an extensive set of formal expressions. Finally, it symbolically analyzes the instrumented code for maximum coverage and provides the list of exploitable buffer overflow vulnerabilities. BOFT is evaluated on standard benchmarks from SAMATE Juliet Test Suite (NIST) with a successful detection of ~94.87% (minimum) of exploitable buffer overflows with zero false positives.

IP7_3 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/dm8AQeeAWJrnuJTc8

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

Time Label Presentation Title
Authors
IP7_3.1 EFFICIENT IDENTIFICATION OF CRITICAL FAULTS IN MEMRISTOR CROSSBARS FOR DEEP NEURAL NETWORKS
Speaker:
CHING-YUAN CHEN, Duke University, US
Authors:
Ching-Yuan Chen1 and Krishnendu Chakrabarty2
1Graduate Institute of Electronics Engineering, TW; 2Duke University, US
Abstract
Deep neural networks (DNNs) are becoming ubiquitous, but hardware-level reliability is a concern when DNN models are mapped to emerging neuromorphic technologies such as memristor-based crossbars. As DNN architectures are inherently fault-tolerant and many faults do not affect inferencing accuracy, careful analysis must be carried out to identify faults that are critical for a given application. We present a misclassification-driven training (MDT) algorithm to efficiently identify critical faults (CFs) in the crossbar. Our results for two DNNs on the CIFAR-10 data set show that MDT can rapidly and accurately identify a large number of CFs—up to 20$ imes$ faster than a baseline method of forward inferencing with randomly injected faults. We use the set of CFs obtained using MDT and the set of benign faults obtained using forward inferencing to train a machine learning (ML) model to efficiently classify all the crossbar faults in terms of their criticality. We show that the ML model can classify millions of faults within minutes with a remarkably high classification accuracy of over 99\%. We present a fault-tolerance solution that exploits this high degree of criticality-classification accuracy, leading to a 93\% reduction in the redundancy needed for fault tolerance.
IP7_3.2 MODELING, IMPLEMENTATION, AND ANALYSIS OF XRCE-DDS APPLICATIONS IN DISTRIBUTED MULTI-PROCESSOR REAL-TIME EMBEDDED SYSTEMS
Speaker:
Saeid Dehnavi, Eindhoven University of Technology (TU/e), The Netherlands, NL
Authors:
Saeid Dehnavi1, Dip Goswami2, Martijn Koedam2, Andrew Nelson3 and Kees Goossens4
1Eindhoven University of Technology (TU/e), NL; 2Eindhoven University of Technology, NL; 3TU Eindhoven, NL; 4Eindhoven university of technology, NL
Abstract
The Publish-Subscribe paradigm is a design pattern for transparent communication in many recent distributed applications. Data Distribution Service (DDS) is a machine-to-machine communication standard that aims to provide reliable, high-performance, inter-operable, and real-time data exchange based on publish–subscribe paradigm. However, the high resource requirement of DDS limits its usage in low-cost embedded systems. XRCE-DDS is a Client-Agent based standard to enable resource-constrained small embedded systems to connect to the DDS global data space. Current XRCE-DDS implementations suffer from dependencies with host operating systems, target only single processing units, and lack performance analysis methods. In this paper, we present a bare-metal implementation of XRCE-DDS standard on the CompSOC platform as an instance of Multi-Processor System on Chip (MPSoC). The proposed framework includes a hard real-time side hosting the XRCE-DDS Client, and a soft real-time side hosting the XRCE-DDS Agent. A Scenario Aware Data Flow (SADF) model is proposed to capture the dynamism of the system behavior in terms of different execution scenarios. We analyze the long-term expected value for throughput by capturing the probabilistic scenario switching using a proposed Markov model which is experimentally validated.

IP7_4 Interactive Presentations

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/GxFxAcKvzuffC2Kzz

Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

Time Label Presentation Title
Authors
IP7_4.1 ANALYZING MEMORY INTERFERENCE OF FPGA ACCELERATORS ON MULTICORE HOSTS IN HETEROGENEOUS RECONFIGURABLE SOCS
Speaker:
Maxim Mattheeuws, ETH Zürich, CH
Authors:
Maxim Mattheeuws1, Björn Forsberg2, Andreas Kurth3 and Luca Benini4
1ETH, CH; 2ETH Zürich, CH; 3ETH Zurich, CH; 4Università di Bologna and ETH Zurich, IT
Abstract
Reconfigurable heterogeneous systems-on-chips (SoCs) integrating multiple accelerators are cost-effective and feature the processing power required for complex embedded applications. However, to enable their usage in real-time settings, it is crucial to control interference on the shared main memory for reliable performance. Interference causes performance degradation due to simultaneous memory requests by components such as CPUs, caches, accelerators, and DMAs. We propose a methodology to characterize the interference to multicore host processors caused by accelerators implemented in the FPGA fabric of reconfigurable heterogeneous SoCs. Based on it, we extend the roofline model to account for performance degradation of the computing platform. The extended model allows to determine in an efficient way at which point memory interference becomes critical for a given platform and workload. We apply our methodology to a modern Xilinx UltraScale+ SoC integrating a multicore ARM Cortex-A CPU and a Kintex-grade FPGA. To the best of our knowledge, our results experimentally show for the first time that programs with intensities below 5flop/byte -- workloads with low cache locality -- can suffer from slowdowns of up to an order of magnitude.
IP7_4.2 EMPIRICAL EVIDENCE FOR MPSOCS IN CRITICAL SYSTEMS: THE CASE OF NXP’S T2080 CACHE COHERENCE
Speaker:
Roger Pujol, Barcelona Supercomputing Center, ES
Authors:
Roger Pujol1, Hamid Tabani1, Jaume Abella2, Mohamed Hassan3 and Francisco J Cazorla1
1Barcelona Supercomputing Center, ES; 2Barcelona Supercomputing Center (BSC-CNS), ES; 3McMaster University, CA
Abstract
The adoption of complex MPSoCs in critical real-time embedded systems mandates a detailed analysis their architecture to facilitate certification. This analysis is hindered by the lack of a thorough understanding of the MPSoC system due to the unobvious and/or insufficiently documented behavior of some key hardware features. Confidence on those features can only be regained by building specific tests to both, assess whether their behavior matches specifications and unveil their behavior when it is not fully known a priori. In this work, we introduce a systematic approach that constructs this thorough understanding of the MPSoC architecture-- and assess against its specification in processor documentation -- with a focus on the cache coherence protocol in the avionics-relevant NXP T2080 architecture as our use-case. Our approach covers all transitions in the MESI cache coherence protocol, with emphasis on the coherence between DMA and processing cores. We build evidence of their behavior based on available debug support and performance monitors. Our analysis discloses unexpected behavior for coherence-related notifications as well as some hardware monitors.

UB.17 University Booth

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/26knQX25zLbDKnyAH

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.17 MELODI: A MASS E-LEARNING SYSTEM FOR DESIGN, TEST, AND PROTOTYPING OF DIGITAL HARDWARE
Speakers:
Philipp-Sebastian Vogt and Felix Georg Braun, TU Wien, AT
Authors:
Philipp-Sebastian Vogt and Felix Georg Braun, TU Wien, AT
Abstract
Teaching and learning design, test, and prototyping of digital hardware requires substantial resources from both students (tool-chains, licenses, and FPGA development kits) and universities (laboratories, licenses, and substantial human resources). Mass E-Learning Of design, test, and prototyping DIgital hardware (MELODI) provides an efficient and economical full-stack solution to reduce these requirements and enables an effective online learning platform.
MELODI is based on our previous experience with E-Learning (open-source system VELS) and communicates with students via email. It automatically generates randomized tasks (designed by teachers) for the students, evaluates their submissions and provides feedback to students and teachers. Lastly, it uses partial reconfiguration for efficient resource allocation on a FPGA with which students can interact remotely using a web interface and video stream.
Our demonstrator shows MELODI from a task request to the interactive web page.
/system/files/webform/tpc_registration/UB.17-MELODI.pdf

UB.18 University Booth

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/sQeFvMyaDTm9JbbTD

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.18 NEUROMUSCULAR SYNERGIES BASED CYBER-PHYSICAL PLATFORM FOR THE FAST LACK OF BALANCE RECOGNITION
Speaker:
Giovanni Mezzina, Politecnico di Bari, IT
Authors:
Giovanni Mezzina, Sardar Mehboob Hussain and Daniela De Venuto, Politecnico di Bari, IT
Abstract
This demonstration proposes the preliminary version of a novel pre-impact fall detection (PIFD) strategy. The multi-sensing architecture jointly analyzes the muscular and cortical activity, from 10 EMG electrodes on the lower limbs and 13 EEG sites all along the scalp. Recorded data are numerically treated by an algorithm composed of two main units: the EMG computation branch and the EEG one. The first one has two main roles: (i) it treats the EMGs, translating them into binary signals (ii) it uses these signals to enable the EEG branch. The EEG computation branch evaluates the rate of variation of the EEG power spectrum density, named m, to describe the cortical responsiveness in five bands of interest. The proposed architecture has been validated on five tasks: walking steps, curves, Timed Up&Go (TUG) test, obstacle avoidance and slip.
Experimental validation on 9 subjects showed that the system can identify a potential fall in 370.62 ms, with a sensitivity of the 93.33%.
/system/files/webform/tpc_registration/UB.18-.pdf

UB.19 University Booth

Date: Wednesday, 03 February 2021
Time: 18:30 CEST - 19:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/QqQrnZXRYodBzZdbH

Session Chair:
Frédéric Pétrot, IMAG, FR

Session Co-Chair:
Nicola Bombieri, Università di Verona, IT

Label Presentation Title
Authors
UB.19 NYUZI: AN OPEN SOURCE GPGPU FOR GRAPHICS, ENHANCED WITH OPENCL COMPILER FOR CALCULATIONS
Speaker:
Edwin Willegger, GMX, AT
Authors:
Nima TaheriNejad, Edwin Willegger, Mariusz Wojcik, Markus Kessler, Johannes Blatnik, Ioannis Daktylidis, Jonas Ferdig and Daniel Haslauer, TU Wien, AT
Abstract
Nyuzi is an open source processor designed for highly parallel, computationally intensive tasks and GPGPU applications. It was inspired by Intel's Larrabee, although the instruction set and the micro architecture are different. Among fully open source GPUs (with soft IPs), Nyuzi provides the most complete tool set. It is the only open source GPGPU with proven support for graphic applications. Moreover, we have recently added OpenCL compilation capabilities to it, enabling it to perform scientific calculations too. Hence, Nyuzi can be used to experiment with microarchitectural and instruction set design trade-offs for both graphic and scientific applications.
The project includes a synthesizable hardware design written in System Verilog, an instruction set emulator, an LLVM based C/C++/OpenCL compiler, software libraries, and tests. In this demo, you will see Nyuzi in action: rendering graphics on FPGA and running OpenCL codes.
/system/files/webform/tpc_registration/UB.19-Nyuzi-GPGPU.pdf

9.1 Autonomous Systems Design: Opening Panel

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 08:00 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ZjsTrqcyv7ocgh8y9

Session chair:
Rolf Ernst, TU Braunschweig, DE

Session co-chair:
Selma Saidi, TU Dortmund, DE

Organizers:
Rolf Ernst, TU Braunschweig, DE
Selma Saidi, TU Dortmund, DE

Fueled by the progress of artificial intelligence, autonomous systems become more and more integral parts of many Internet-of-Things (IoT) and Cyber-Physical Systems (CPS) applications, such as automated driving, robotics, avionics and industrial automation. Autonomous systems are self-governed and self-adaptive systems that are designed to operate in an open and evolving environment that has not been completely defined at design time. This poses a unique challenge to the design and verification of dependable autonomous systems. In this opening session of the DATE Special Initiative on Autonomous Systems Design, industry leaders will talk about their visions of autonomous systems, the challenges they see in the development of autonomous systems as well as how autonomous systems will impact the business in their industries. These inputs will be discussed in an open floor panel. Panelists: - Thomas Kropf (Robert Bosch GmbH) - Pascal Traverse (Airbus) - Juergen Bortolazzi (Porsche AG) - Peter Liggesmeyer (Fraunhofer IESE) - Joseph Sifakis (University of Grenoble/VERIMAG) - Sandeep Neema (DARPA)

Time Label Presentation Title
Authors
07:00 CEST 9.1.1 AUTONOMOUS SYSTEMS DESIGN: OPENING PANEL
Panelists:
Rolf Ernst1, Selma Saidi2, Thomas Kropf3, Pascal Traverse4, Juergen Bortolazzi5, Peter Liggesmeyer6, Joseph Sifakis7 and Sandeep Neema8
1TU Braunschweig, DE; 2TU Dortmund, DE; 3Robert Bosch GmbH, DE; 4Airbus, FR; 5Porsche AG, DE; 6Fraunhofer IESE, DE; 7University of Grenoble/VERIMAG, FR; 8DARPA, US
Abstract
Fueled by the progress of artificial intelligence, autonomous systems are becoming integral parts of many Internet-of-Things (IoT) and Cyber-Physical Systems (CPS) applications, such as automated driving, robotics, avionics and industrial automation. Autonomous systems are self-governed, self-adaptive systems designed to operate in an open, evolving environment that has not been completely defined at design time. This poses unique challenges to the design and verification of dependable autonomous systems. In this opening session of the DATE Special Initiative on Autonomous Systems Design, industry leaders will present their visions of autonomous systems, the challenges they see in developing them, and how autonomous systems will impact business in their industries. These inputs will then be discussed in an open-floor panel.

9.2 IP Protection

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/2FaTQWjerFFsifGrE

Session chair:
Johann Knechtel, NYU Abu Dhabi, AE

Session co-chair:
Elif Bilge Kavun, University of Passau, DE

This session deals with methods for the protection of IPs, which is especially important in industrial applications. The papers cover novel techniques related to model-checking and SAT-based attacks on logic encryption, as well as cost-effective design methods for IC locking. Furthermore, a new method for hiding critical IC parts via soft eFPGA insertion is presented.

Time Label Presentation Title
Authors
07:00 CEST 9.2.1 FA-SAT: FAULT-AIDED SAT-BASED ATTACK ON COMPOUND LOGIC LOCKING TECHNIQUES
Speaker:
Nimisha Limaye, New York University, US
Authors:
Nimisha Limaye1, Satwik Patnaik2 and Ozgur Sinanoglu3
1New York University, US; 2New York University, US; 3New York University Abu Dhabi, AE
Abstract
Logic locking has received significant traction as a one-stop solution to thwart attacks at an untrusted foundry, test facility, and end-user. Compound locking schemes were proposed that integrate a low-corruption and a high-corruption locking technique to circumvent both tailored SAT-based and structural-analysis-based attacks. In this paper, we propose Fa-SAT, a generic attack framework that builds on the existing, open-source SAT tool to attack compound locking techniques. We consider the recently proposed bilateral logic encryption (BLE) and Anti-SAT coupled with random logic locking as case studies to showcase the efficacy of our proposed approach. Since the SAT-based attack alone cannot break these defenses, we integrate a fault-injection-based process into the SAT attack framework to successfully expose the logic added for locking and obfuscation. Our attack can circumvent these schemes' security guarantees with a 100% success rate across multiple trials of designs from diverse benchmark suites (ISCAS-85, MCNC, and ITC-99) synthesized with industry-standard tools for different key sizes. Finally, we make our attack framework (as a web interface) and the associated benchmarks available to the research community.
07:15 CEST 9.2.2 A COGNITIVE SAT TO SAT-HARD CLAUSE TRANSLATION-BASED LOGIC OBFUSCATION
Speaker:
Rakibul Hassan, George Mason University, US
Authors:
Rakibul Hassan1, Gaurav Kolhe1, Setareh Rafatirad1, Houman Homayoun2 and Sai Manoj Pudukotai Dinakarrao1
1George Mason University, US; 2University of California Davis, US
Abstract
Logic obfuscation is introduced as a pivotal defense mechanism against emerging hardware threats on Integrated Circuits (ICs) such as reverse engineering (RE) and intellectual property (IP) theft. The effectiveness of logic obfuscation is challenged by the recently introduced Boolean satisfiability (SAT) attack and its variants. A plethora of countermeasures have been proposed to thwart these attacks. Irrespective of the implemented defenses, large power, performance, and area (PPA) overheads are seen to be indispensable. In contrast, we propose a neural network-based cognitive SAT to SAT-hard clause translator under the constraint of minimal PPA overheads while preserving the original functionality with impenetrable security. Our proposed method is incubated with a SAT-hard clause generator that translates the existing conjunctive normal form (CNF) through minimal perturbations, such as the inclusion of a pair of inverters or buffers, or the addition of a new lightweight SAT-hard block, depending on the provided CNF. For efficient SAT-hard clause generation, the proposed method is equipped with a multi-layer neural network that first learns the dependencies of features (literals and clauses), followed by a long short-term memory (LSTM) network to validate and backpropagate the SAT-hardness for better learning and translation. For a fair comparison with the state-of-the-art, we evaluate our proposed technique on ISCAS'85 and ISCAS'89 benchmarks. It successfully defends against multiple state-of-the-art SAT attacks devised for hardware RE with minimal overheads.
07:30 CEST IP8_3.1 SEQUENTIAL LOGIC ENCRYPTION AGAINST MODEL CHECKING ATTACK
Speaker:
Amin Rezaei, California State University, Long Beach, US
Authors:
Amin Rezaei1 and Hai Zhou2
1California State University, Long Beach, US; 2Northwestern University, US
Abstract
Due to high IC design costs and the emergence of countless untrusted foundries, logic encryption is receiving more consideration than ever. In state-of-the-art logic encryption works, considerable performance is sacrificed to guarantee security against both SAT-based and removal attacks. However, the SAT-based attack cannot decrypt sequential circuits if the scan chain is protected or if unreachable-states encryption is adopted. Instead, these security schemes can be defeated by the model checking attack, which searches iteratively for input sequences that put the activated IC into the desired reachable state. In this paper, we propose a practical logic encryption approach to defend against the model checking attack on sequential circuits. The robustness of the proposed approach is demonstrated by experiments on around fifty benchmarks.
07:31 CEST IP8_3.2 RISK-AWARE COST-EFFECTIVE DESIGN METHODOLOGY FOR INTEGRATED CIRCUIT LOCKING
Speaker:
Yinghua Hu, University of Southern California, US
Authors:
Yinghua Hu, Kaixin Yang, Subhajit Dutta Chowdhury and Pierluigi Nuzzo, University of Southern California, US
Abstract
We introduce a systematic framework for logic locking of integrated circuits based on the analysis of the sources of information leakage from both the circuit and the locking scheme and their formalization into a notion of risk that can guide the design against existing and possible future attacks. We further propose a two-level optimization-based methodology to generate locking strategies minimizing a cost function and balancing security, risk, and implementation overhead, out of a collection of locking primitives. Optimization results on a set of case studies show the potential of layering multiple locking primitives to provide high security at significantly lower risk.
07:32 CEST 9.2.3 HARDWARE REDACTION VIA DESIGNER-DIRECTED FINE-GRAINED SOFT EFPGA INSERTION
Speaker:
Prashanth Mohan, Carnegie Mellon University, US
Authors:
Prashanth Mohan, Oguz Atli, Joseph Sweeney, Onur Kibar, Larry Pileggi and Ken Mai, Carnegie Mellon University, US
Abstract
In recent years, IC reverse engineering and IC fabrication supply chain security have grown to become significant economic and security threats for designers, system integrators, and end customers. Many of the existing logic locking and obfuscation techniques have been shown to be vulnerable to attack once the attacker has access to the design netlist, either through reverse engineering or through an untrusted fabrication facility. We introduce soft embedded FPGA redaction, a hardware obfuscation approach that allows the designer to substitute security-critical IP blocks within a design with a synthesizable eFPGA fabric. This method fully conceals the logic and the routing of the critical IP and is compatible with standard ASIC flows for easy integration and process portability. To demonstrate eFPGA redaction, we obfuscate a RISC-V control path and a GPS P-code generator. We also show that the modified netlists are resilient to SAT attacks with moderate VLSI overheads. The secure RISC-V design has 1.89x area and 2.36x delay overhead, while the GPS design has 1.39x area and negligible delay overhead when implemented in an industrial 22nm FinFET CMOS process.

9.3 Heterogeneous Architectures and Design Space Exploration

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/uQmiy3GQzGHhTZjvC

Session chair:
Lars Bauer, Karlsruhe Institute of Technology, DE

Session co-chair:
Lana Josipović, EPFL, CH

The session is devoted to design space exploration for heterogeneous coarse-grained architectures, optimizing the mapping of large applications to platforms ranging from coarse-grained reconfigurable and accelerator-rich architectures to CPU/GPU platforms. The first paper presents a hierarchical algorithm to map multi-dimensional kernels to CGRAs, where individual iterations are first mapped to a virtual systolic array that serves as an intermediate abstraction layer to exploit regularity. The second paper explores the design space for applications mapped onto smaller accelerator blocks, maximizing accelerator reuse by considering the similarity between the mapped functionality. The third paper extends a Unified Virtual Memory simulation tool with a dynamically adaptive engine that identifies CPU-GPU interconnect traffic patterns in order to choose the best existing memory management policy and trigger policy switching accordingly.

Time Label Presentation Title
Authors
07:00 CEST 9.3.1 HIMAP: FAST AND SCALABLE HIGH-QUALITY MAPPING ON CGRA VIA HIERARCHICAL ABSTRACTION
Speaker:
Dhananjaya Wijerathne, National University of Singapore, SG
Authors:
Dhananjaya Wijerathne1, Zhaoying Li1, Anuj Pathania2, Tulika Mitra1 and Lothar Thiele3
1National University of Singapore, SG; 2University of Amsterdam, NL; 3ETH Zurich, CH
Abstract
Coarse-Grained Reconfigurable Array (CGRA) has emerged as a promising hardware accelerator due to the excellent balance among reconfigurability, performance, and energy efficiency. CGRA performance strongly depends on a high-quality compiler to map application kernels onto the architecture. Unfortunately, state-of-the-art compilers fall short of generating high-quality mappings within an acceptable compilation time, especially with increasing CGRA size. We propose HiMap, a fast and scalable CGRA mapping approach that is also adept at producing close-to-optimal solutions for the multi-dimensional kernels prevalent in existing and emerging application domains. The key strategy behind HiMap's efficiency and scalability is to exploit the regularity in the loop iteration dependencies by employing a virtual systolic array as an intermediate abstraction layer in a hierarchical mapping. Experimental results confirm that HiMap can generate application mappings that hit the performance envelope of the CGRA. HiMap offers 17.3x and 5x improvements in the performance and energy efficiency of the mappings compared to the state-of-the-art. HiMap's compilation time for near-optimal mappings is less than 15 minutes for a 64x64 CGRA, while existing approaches take days to generate inferior mappings.
07:15 CEST 9.3.2 MG-DMDSE: MULTI-GRANULARITY DOMAIN DESIGN SPACE EXPLORATION CONSIDERING FUNCTION SIMILARITY
Speaker:
Jinghan Zhang, Northeastern University, US
Authors:
Jinghan Zhang1, Aly Sultan1, Hamed Tabkhi2 and Gunar Schirner1
1Northeastern University, US; 2UNC Charlotte, US
Abstract
Heterogeneous accelerator-rich (ACC-rich) platforms combining general-purpose cores and specialized HW accelerators (ACCs) promise high-performance, low-power streaming application deployments in a variety of domains such as video analytics and software-defined radio. To benefit a domain of applications, a domain platform exploration tool must take advantage of structural and functional similarities across applications by allocating a common set of ACCs. A previous approach, the GenetIc Domain Exploration tool (GIDE), applied a restrictive binding algorithm that mapped application functions to monolithic accelerators. This approach suffered from lower average application throughput and reduced platform generality. This paper introduces a Multi-Granularity Domain Design Space Exploration tool (MG-DmDSE) that makes the generated platform efficient for many more applications, improving both average application throughput and platform generality. To achieve that goal, the key contributions of MG-DmDSE are: (1) applying a multi-granular decomposition of coarse-grained application functions into more granular compute kernels; (2) examining compute similarity between functions in order to produce more generic functions; (3) configuring monolithic ACCs by selectively bypassing compute elements within them during DSE to expose more functionality. To assess MG-DmDSE, both GIDE and MG-DmDSE were applied to applications in the OpenVX library. MG-DmDSE achieves on average 2.84x greater application throughput than GIDE. Additionally, 87.5% of applications benefited from running on the platform produced by MG-DmDSE vs. 50% for GIDE, indicating increased platform generality.
07:30 CEST IP8_2.1 FORMULATION OF DESIGN SPACE EXPLORATION PROBLEMS BY COMPOSABLE DESIGN SPACE IDENTIFICATION
Speaker:
Rodolfo Jordão, KTH Royal Institute of Technology, SE
Authors:
Rodolfo Jordao, Ingo Sander and Matthias Becker, KTH Royal Institute of Technology, SE
Abstract
Design space exploration (DSE) is a key activity in embedded system design methodologies and can be supported by well-defined models of computation (MoCs) and predictable platform architectures. The original design model, covering the application models, platform models and design constraints, needs to be converted into a form analyzable by computer-aided decision procedures such as mathematical programming or genetic algorithms. This conversion is the process of design space identification (DSI), which becomes very challenging if the design domain comprises several MoCs and platforms. For a systematic solution to this problem, separation of concerns between the design domain and the decision domain is of key importance. In this paper, we propose a systematic DSI scheme that is (a) composable, as it enables the stepwise and simultaneous extension of both the design and decision domains, and (b) tuneable, because it also enables different DSE solving techniques given the same design model. We exemplify this DSI scheme with an illustrative example that demonstrates the mechanisms for composition and tuning. Additionally, we show how different compositions can lead to the same decision model, an important property of this DSI scheme.
07:31 CEST IP8_2.2 RTL DESIGN FRAMEWORK FOR EMBEDDED PROCESSOR BY USING C++ DESCRIPTION
Speaker:
Eiji Yoshiya, Information and Communications Engineering, Tokyo Institute of Technology, JP
Authors:
Eiji Yoshiya, Tomoya Nakanishi and Tsuyoshi Isshiki, Tokyo Institute of Technology, JP
Abstract
In this paper, we propose a method to directly describe the RTL structure of a pipelined RISC-V processor with cache, memory management unit (MMU) and AXI bus interface in C++. This processor C++ model serves as a near cycle-accurate simulation model of the RISC-V core, while our C2RTL framework translates the processor C++ model into a cycle-accurate RTL description in Verilog-HDL and an RTL-equivalent C model. Our design methodology is unique compared to other existing methodologies since both the simulation model and the RTL model are derived from the same C++ source, which greatly simplifies the design verification and optimization processes. The effectiveness of our design methodology is demonstrated on a RISC-V processor running Linux on an FPGA board, as well as by the significantly shorter simulation times of the original C++ processor model and the RTL-equivalent C model compared to a commercial RTL simulator.
07:32 CEST 9.3.3 AN ADAPTIVE FRAMEWORK FOR OVERSUBSCRIPTION MANAGEMENT IN CPU-GPU UNIFIED MEMORY
Speaker:
Debashis Ganguly, Department of Computer Science, School of Computing and Information, University of Pittsburgh, US
Authors:
Debashis Ganguly1, Rami Melhem1 and Jun Yang2
1Department of Computer Science, School of Computing and Information, University of Pittsburgh, US; 2Electrical and Computer Engineering Department, University of Pittsburgh, US
Abstract
Hardware support for fault-driven page migration and on-demand memory allocation, along with advancements in the unified memory runtime of modern graphics processing units (GPUs), simplifies memory management in discrete CPU-GPU heterogeneous memory systems and ensures higher programmability. GPUs are adopted to accelerate general-purpose applications and are now an integral part of heterogeneous computing platforms ranging from supercomputers to commodity cloud platforms. However, data-intensive applications face the challenge of device-memory oversubscription, as the limited capacity of bandwidth-optimized GPU memory fails to accommodate their increasing working sets. The performance overhead under memory oversubscription comes from the thrashing of memory pages over the slow CPU-GPU interconnect. Depending on its computing and memory access pattern, each application demands special attention from memory management. As a result, the responsibility of effectively utilizing the plethora of memory management techniques supported by GPU programming libraries and runtimes falls squarely on the application programmer. This paper presents a smart runtime that leverages fault and page-migration information to detect underlying patterns in CPU-GPU interconnect traffic. Based on this online workload characterization, the extended unified memory runtime dynamically chooses and employs a suitable policy from a wide array of memory management strategies to address the issues with memory oversubscription. Experimental evaluation shows that this smart adaptive runtime provides 18% and 30% (geometric mean) performance improvements across all benchmarks compared to the default unified memory runtime under 125% and 150% device memory oversubscription, respectively.

9.4 Analog layout conquers new frontiers

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/XJahjaABdT3x4yQqd

Session chair:
Helmut Graeb, TUM, DE

Session co-chair:
Salvador Mir, TIMA, FR

This session presents various exciting developments in analog layout. The first paper presents a new approach to analog layout, which combines placement and routing. The second paper challenges the current consideration of matching in common-centroid and interdigitated layouts. The third paper deals with design of layout primitives in FinFET technologies.

Time Label Presentation Title
Authors
07:00 CEST 9.4.1 PERFORMANCE-DRIVEN ROUTING METHODOLOGY WITH INCREMENTAL PLACEMENT REFINEMENT FOR ANALOG LAYOUT DESIGN
Speaker:
Hao-Yu Chi, National Chiao Tung University, TW
Authors:
Hao-Yu Chi1, Han-Chung Chang1, Chih-Hsin Yang2, Chien-Nan Liu1 and Jing-Yang Jou2
1National Chiao Tung University, TW; 2National Central University, TW
Abstract
Analog layout is often considered a difficult task because many layout-dependent effects impact final circuit performance. In the literature, many automation techniques have been proposed for analog placement and routing separately. However, very few works consider the two steps simultaneously to obtain the best performance and cost after layout. Most routing-aware placement techniques optimize the layout based on an assumed routing result, which may be quite different from the final layout. In this work, we propose an automatic two-step layout methodology for analog circuits to alleviate the performance loss during the layout process. Instead of using a rough routing prediction during the placement stage, a crossing-aware global routing technique is first performed to provide an accurate routing resource estimate for the given compact placement. Then, an improved CDL-based layout migration technique is adopted to quickly adjust the placement and routing, reducing the difference between the estimate and the final layout while keeping the optimality of the given placement. As shown in the experimental results, the proposed methodology improves the accuracy of routing resource estimation, thus improving the final layout quality and circuit performance.
07:15 CEST 9.4.2 COMMON-CENTROID LAYOUTS FOR ANALOG CIRCUITS: ADVANTAGES AND LIMITATIONS
Speaker:
Arvind Kumar Sharma, University of Minnesota, US
Authors:
Arvind Kumar Sharma1, Meghna Madhusudan1, Steven M. Burns2, Parijat Mukherjee2, Soner Yaldiz2, Ramesh Harjani1 and Sachin S. Sapatnekar1
1University of Minnesota, US; 2Intel Corporation, US
Abstract
Common-centroid (CC) layouts are widely used in analog design to make circuits resilient to variations by matching device characteristics. However, CC layout may involve increased routing complexity and higher parasitics than other alternative layout schemes. This paper critically analyzes the fundamental assumptions behind the use of common-centroid layouts, incorporating considerations related to systematic and random variations as well as the performance impact of common-centroid layout. Based on this study, conclusions are drawn on when CC layout styles can reduce variation, improve performance (even if they do not reduce variation), and when non-CC layouts are preferable.
07:30 CEST IP8_4.2 OPTIMIZED MULTI-MEMRISTOR MODEL BASED LOW ENERGY AND RESILIENT CURRENT-MODE MULTIPLIER DESIGN
Speaker:
Shengqi Yu, Newcastle University, GB
Authors:
Shengqi Yu1, Rishad Shafik2, Thanasin Bunnam2, Kaiyun Chen2 and Alex Yakovlev2
1Newcastle University, GB; 2Newcastle University, GB
Abstract
Multipliers are central to modern arithmetic-heavy applications, such as signal processing and artificial intelligence (AI). However, the complex logic chain in conventional multipliers, particularly due to cascaded carry propagation circuits, contributes to high energy and performance costs. This paper proposes a novel current-mode multiplier design that reduces the carry propagation chain and improves current amplification. Fundamental to this design is a one-transistor, multiple-memristor (1TxM) cell architecture. In each cell, the transistor can be switched ON/OFF to select the cell, while the memristor states determine the corresponding cell output current when selected. The high/low resistive states as well as the biasing configurations of each memristor are suitably optimized through a new memristor model. Depending on the significance of the cell current path, the number of memristors in each cell is chosen to achieve the required amplification. Consequently, the design reduces the need for current mirror circuits in each current path, while also ensuring high resilience to transitional bias voltages. The parallel cell currents are then directed to a common current accumulation path to generate the multiplier output without requiring any carry propagation chain. We carried out a wide range of experiments to extensively validate our multiplier design in the Cadence Virtuoso analogue design environment for functional and parametric properties. The results show that the proposed multiplier reduces latency by up to 84.9% and energy cost by up to 98.5% when compared with recently proposed approaches.
07:31 CEST 9.4.3 ANALOG LAYOUT GENERATION USING OPTIMIZED PRIMITIVES
Speaker:
Meghna Madhusudan, University of Minnesota, Twin Cities, US
Authors:
Meghna Madhusudan1, Arvind Kumar Sharma1, Yaguang Li2, Jiang Hu2, Sachin S. Sapatnekar1 and Ramesh Harjani1
1University of Minnesota, US; 2Texas A&M University, US
Abstract
Hierarchical analog layout generators proceed from leaf cells (“primitives”) to progressively larger blocks that are placed and routed. The quality of primitive cell layout is critical for design performance. This paper proposes a methodology that defines and optimizes the performance metrics of primitives during leaf cell layout. It incorporates layout parasitics and layout-dependent effects, providing a set of optimized layout choices for use by the place-and-route engine, as well as wire sizing guidelines for connections outside the cell. For FinFET-based designs of a high-frequency amplifier, a StrongARM comparator, and a fully differential VCO, our approach outperforms existing methods and is competitive with time-intensive manual layout.

9.5 Energy-Efficient and Thermal Aware Systems for Machine Learning

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/iYaQRwLcwoBLG9Y8X

Session chair:
Pascal Vivet, CEA-Leti, FR

Session co-chair:
Shao-Yun Fang, National Taiwan University of Science and Technology, TW

This session highlights energy-efficient and thermal-aware system solutions using advanced technologies and circuit-level techniques. The first paper introduces an energy-efficient accelerator for deep learning using Non-Volatile Memory and Monolithic 3D technologies, the second paper reports thermal-aware placement for a 2.5D chiplet-based architecture, while the last paper presents circuit-level techniques combining critical path manipulation and approximate computing for low-voltage operation.

Time Label Presentation Title
Authors
07:00 CEST 9.5.1 MARVEL: A VERTICAL RESISTIVE ACCELERATOR FOR LOW-POWER DEEP LEARNING INFERENCE IN MONOLITHIC 3D
Speaker:
Fan Chen, Duke University, US
Authors:
Fan Chen1, Linghao Song1, Hai (Helen) Li2 and Yiran Chen1
1Duke University, US; 2Duke University/TUM-IAS, US
Abstract
Resistive memory (ReRAM) based Deep Neural Network (DNN) accelerators have achieved state-of-the-art DNN inference throughput. However, the power efficiency of such resistive accelerators is greatly limited by their peripheral circuitry, including analog-to-digital converters (ADCs), digital-to-analog converters (DACs), SRAM registers, and eDRAM buffers. These power-hungry components consume 87% of the total system power, despite the high power efficiency of the ReRAM computing cores. In this paper, we propose MARVEL, a monolithic 3D stacked resistive DNN accelerator, which consists of carbon nanotube field-effect transistor (CNFET) based low-power ADCs/DACs, CNFET logic, CNFET SRAM, and high-density global buffers implemented by cross-point Spin Transfer Torque Magnetic RAM (STT-MRAM). To compensate for the loss of inference throughput incurred by the slow CNFET ADCs, we propose to integrate more ADC layers into MARVEL. Unlike CMOS-based ADCs, which can only be implemented in the bottom layer of the 3D structure, multiple CNFET layers can be implemented using a monolithic 3D stacking technique. Compared to prior ReRAM-based DNN accelerators, on average, MARVEL achieves the same inference throughput with a 4.5x improvement in performance per Watt. We also demonstrate that increasing the number of integration layers enables MARVEL to further achieve 2x inference throughput with 7.6x improved power efficiency.
07:15 CEST 9.5.2 TAP-2.5D: A THERMALLY-AWARE CHIPLET PLACEMENT METHODOLOGY FOR 2.5D SYSTEMS
Speaker:
Ajay Joshi, Boston University, US
Authors:
Yenai Ma1, Leila Delshadtehrani1, Cansu Demirkiran1, Jose L. Abellan2 and Ajay Joshi1
1Boston University, US; 2Universidad Católica San Antonio de Murcia, ES
Abstract
Heterogeneous systems are commonly used today to sustain the historic benefits we have achieved through technology scaling. 2.5D integration technology provides a cost-effective solution for designing heterogeneous systems. The traditional physical design of a 2.5D heterogeneous system closely packs the chiplets to minimize wirelength, but this leads to a thermally-inefficient design. We propose TAP-2.5D: the first open-source network routing and thermally-aware chiplet placement methodology for heterogeneous 2.5D systems. TAP-2.5D strategically inserts spacing between chiplets to jointly minimize the temperature and total wirelength, and in turn, increases the thermal design power envelope of the overall system. We present three case studies demonstrating the usage and efficacy of TAP-2.5D.
07:30 CEST IP8_4.1 THERMAL-AWARE DESIGN AND MANAGEMENT OF EMBEDDED REAL-TIME SYSTEMS
Speaker and Author:
Youngmoon Lee, Hanyang University, KR
Abstract
Modern embedded systems face challenges in managing on-chip temperature as they are increasingly realized on powerful systems-on-chip. This paper presents thermal-aware design and management of embedded systems by tightly coupling two mechanisms: a thermal-aware utilization bound and real-time dynamic thermal management. The former provides the processor utilization upper bound for meeting the chip temperature constraint, which depends not only on the system configuration and workloads but also on the chip cooling capacity and environment. The latter adaptively optimizes the rates of individual task executions subject to the thermal-aware utilization bound. Our experiments on an automotive controller demonstrate the thermal-aware utilization bound and an 18.2% improvement in system utilization compared with existing approaches.
07:31 CEST IP8_1.1 (Best Paper Award Candidate)
PREDICTION OF THERMAL HAZARDS IN A REAL DATACENTER ROOM USING TEMPORAL CONVOLUTIONAL NETWORKS
Speaker:
Mohsen Seyedkazemi Ardebili, University of Bologna, IT
Authors:
Mohsen Seyedkazemi Ardebili1, Marcello Zanghieri1, Alessio Burrello2, Francesco Beneventi3, Andrea Acquaviva4, Luca Benini5 and Andrea Bartolini1
1University of Bologna, IT; 2Department of Electric and Eletronic Engineering, University of Bologna, IT; 3DEI - University of Bologna, IT; 4Politecnico di Torino, IT; 5Università di Bologna and ETH Zurich, IT
Abstract
Datacenters play a vital role in today's society. At large, a datacenter room is a complex controlled environment composed of thousands of computing nodes, which consume kilowatts of power. To dissipate this power, forced air/liquid flow is employed, at a cost of millions of euros per year. Reducing this cost involves free-cooling and average-case design, which can create a cooling shortage and thermal hazards. When a thermal hazard happens, the system administrators and the facility manager must stop production to avoid IT equipment damage and wear-out. In this paper, we study the signatures of thermal hazards in the monitored data of a Tier-0 datacenter room during a full year of production. We define a set of rules for detecting thermal hazards based on the inlet and outlet temperatures of all nodes in the room. We then propose a custom Temporal Convolutional Network (TCN) to predict the hazards in advance. The results show that our TCN can predict thermal hazards with an F1-score of 0.98 on a randomly sampled test set. When causality is enforced between the training and validation sets, the F1-score drops to 0.74, calling for in-place online re-training of the network, which motivates further research in this context.
07:32 CEST 9.5.3 CRITICAL PATH ISOLATION AND BIT-WIDTH SCALING ARE HIGHLY COMPATIBLE FOR VOLTAGE OVER-SCALABLE DESIGN
Speaker:
Yutaka Masuda, Nagoya University, JP
Authors:
Yutaka Masuda1, Jun Nagayama2, TaiYu Cheng3, Tohru Ishihara1, Yoichi Momiyama2 and Masanori Hashimoto3
1Nagoya University, JP; 2Socionext Inc., JP; 3Osaka University, JP
Abstract
This work proposes a design methodology that saves power under voltage over-scaling (VOS) operation. The key idea is to combine critical path isolation (CPI) and bit-width scaling (BWS) under a constraint on computational quality, e.g., Peak Signal-to-Noise Ratio (PSNR). Conventional CPI inherently cannot reduce the delay of intrinsic critical paths (CPs), which may significantly restrict the power-saving effect. The proposed methodology, in contrast, tries to reduce both intrinsic and non-intrinsic CPs. Therefore, our design dramatically reduces the supply voltage and power dissipation while satisfying the quality constraint. Moreover, to reduce the co-design exploration space, the proposed methodology exploits the exclusiveness of the paths targeted by CPI and BWS: CPI aims at reducing the minimum supply voltage of non-intrinsic CPs, while BWS focuses on intrinsic CPs in arithmetic units. Based on this exclusiveness, the proposed design splits the simultaneous optimization problem into three sub-problems: (1) determining the bit-width reduction, (2) timing optimization for non-intrinsic CPs, and (3) finding the minimum supply voltage of the BWS- and CPI-applied circuit under the quality constraint, to reduce power dissipation. Thanks to this problem splitting, the proposed methodology can efficiently find a quality-constrained minimum-power design. Evaluation results show that CPI and BWS are highly compatible and significantly enhance the efficacy of VOS. In a case study of a GPGPU processor, the proposed design saves power dissipation by 42.7% for an image processing workload and by 51.2% for a neural network inference workload.

9.6 System-level Security

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 07:50 CEST
Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/HodD6RRekGnEAnGAK

Session chair:
Aydin Aysu, NC State University, US

Session co-chair:
Lionel Torres, University of Montpellier, FR

The session focuses on security from a high-level perspective: a practical approach to watermarking behavioral HLS descriptions targeting ASIC design, a software implementation of NTRUEncrypt, one of the finalists in the NIST PQC competition, a Private Membership Test (PMT) with low computational complexity using Bloom filters and Homomorphic Encryption, and an intra-process memory protection mechanism for RISC-V.

Time Label Presentation Title
Authors
07:00 CEST 9.6.1 WATERMARKING OF BEHAVIORAL IPS: A PRACTICAL APPROACH
Speaker:
Jianqi Chen, University of Texas at Dallas, US
Authors:
Jianqi Chen and Benjamin Carrion Schaefer, University of Texas at Dallas, US
Abstract
This paper proposes a practical method to watermark behavioral IPs (BIPs) for High-Level Synthesis (HLS), such that the watermark can be unequivocally retrieved from the generated RTL code while being unremovable. The main approaches to watermarking BIPs so far have focused on modifying the HLS process by, e.g., introducing watermarking-aware scheduling or register-binding algorithms. The main problem with these approaches is that they require full control over the HLS tool's internal behavior, which is not practically possible: state-of-the-art HLS tools do not allow this type of controllability. Hence, these approaches cannot currently be implemented. On the other hand, commercial HLS tools make extensive use of synthesis directives in the form of pragmas. In this work we use these synthesis directives to assign operations in the source code to specific functional units given in the technology library in order to create the watermark. Experimental results show that our proposed method is effective in creating strong watermarks while remaining practical.
07:15 CEST 9.6.2 AVRNTRU: LIGHTWEIGHT NTRU-BASED POST-QUANTUM CRYPTOGRAPHY FOR 8-BIT AVR MICROCONTROLLERS
Speaker:
Hao Cheng, University of Luxembourg, LU
Authors:
Hao Cheng, Johann Großschädl, Peter B. Rønne and Peter Y. A. Ryan, University of Luxembourg, LU
Abstract
Introduced in 1996, NTRUEncrypt is not only one of the earliest but also one of the most scrutinized lattice-based cryptosystems, and it is expected to remain secure in the upcoming era of quantum computing. Furthermore, NTRUEncrypt offers some efficiency benefits over "pre-quantum" cryptosystems like RSA or ECC since the low-level arithmetic operations are less computation-intensive and, thus, more suitable for constrained devices. In this paper we present AVRNTRU, a highly optimized implementation of NTRUEncrypt for 8-bit AVR microcontrollers that we developed from scratch to reach high performance and resistance to timing attacks. AVRNTRU complies with the EESS #1 v3.1 specification and supports product-form parameter sets such as ees443ep1, ees587ep1, and ees743ep1. An entire encryption (including mask generation and blinding-polynomial generation) using the ees443ep1 parameters requires 847973 clock cycles on an ATmega1281 microcontroller; the decryption is more costly and has an execution time of 1051871 cycles. We achieved these results with the help of a novel hybrid technique for multiplication in a truncated polynomial ring, whereby one of the operands is a sparse ternary polynomial in product form and the other an arbitrary element of the ring. A constant-time multiplication in the ring given by the ees443ep1 parameters takes only 192577 cycles, which sets a new speed record for the arithmetic part of a lattice-based cryptosystem on AVR.
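To illustrate the kind of arithmetic the hybrid technique optimizes, here is a minimal (and, unlike the paper's implementation, not constant-time) Python sketch of multiplying an arbitrary element of Z_q[x]/(x^N - 1) by a sparse ternary polynomial represented by the index lists of its +1 and -1 coefficients. The function name and the toy parameters are hypothetical; AVRNTRU itself is hand-optimized AVR code with constant-time guarantees.

```python
def sparse_ring_mul(a, plus_idx, minus_idx, N, q):
    """Multiply ring element a (list of N coefficients mod q) by the sparse
    ternary polynomial sum(x^i for i in plus_idx) - sum(x^i for i in minus_idx)
    in Z_q[x]/(x^N - 1): each nonzero coefficient contributes a cyclic shift
    of a that is added or subtracted."""
    r = [0] * N
    for i in plus_idx:                       # +1 coefficients: add shifted a
        for j, aj in enumerate(a):
            r[(i + j) % N] = (r[(i + j) % N] + aj) % q
    for i in minus_idx:                      # -1 coefficients: subtract shifted a
        for j, aj in enumerate(a):
            r[(i + j) % N] = (r[(i + j) % N] - aj) % q
    return r
```

The sparsity is why this is cheap: the cost is proportional to the number of nonzero ternary coefficients times N, rather than N^2 for a general product.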
07:30 CEST IP9_1.1 SEALPK: SEALABLE PROTECTION KEYS FOR RISC-V
Speaker:
Leila Delshadtehrani, Boston University, US
Authors:
Leila Delshadtehrani, Sadullah Canakci, Manuel Egele and Ajay Joshi, Boston University, US
Abstract
With the continuous increase in the number of software-based attacks, there has been a growing effort towards isolating sensitive data and trusted software components from untrusted third-party components. Recently, Intel introduced a new hardware feature for intra-process memory isolation, called Memory Protection Keys (MPK). The limited number of unique domains (16) provided by Intel MPK prohibits its use in cases where a large number of domains are required. Moreover, Intel MPK suffers from the protection key use-after-free vulnerability. To address these shortcomings, in this paper, we propose an efficient intra-process isolation technique for the RISC-V open ISA, called SealPK, which supports up to 1024 unique domains. Additionally, we devise three novel sealing features to protect the allocated domains, their associated pages, and their permissions from modifications or tampering by an attacker. We demonstrate the efficiency of SealPK by leveraging it to implement an isolated secure shadow stack on an FPGA prototype.
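As a purely behavioral model of what sealing a protection domain means (all names here are hypothetical; this does not reproduce the RISC-V hardware mechanism), the sketch below keeps a per-page permission table and refuses any modification to a domain once it has been sealed.

```python
class DomainTable:
    """Toy model of intra-process protection domains with sealing.

    Behavioral illustration only: a sealed domain's page assignments and
    permissions are frozen, which is the property SealPK enforces in hardware.
    """

    def __init__(self, num_domains=1024):
        self.num_domains = num_domains
        self.perms = {}        # (domain, page) -> permission string, e.g. "rw"
        self.sealed = set()    # domains whose mappings can no longer change

    def set_perm(self, domain, page, perm):
        if domain in self.sealed:
            raise PermissionError("domain is sealed; permissions are frozen")
        self.perms[(domain, page)] = perm

    def seal(self, domain):
        self.sealed.add(domain)

    def check(self, domain, page, access):
        # An access ("r" or "w") is allowed only if it appears in the
        # permission string recorded for this (domain, page) pair.
        return access in self.perms.get((domain, page), "")
```

The point of the sealing step is that even an attacker who later gains control of the process cannot re-open a sealed domain, e.g. one protecting a shadow stack.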
07:31 CEST 9.6.3 REAL-TIME PRIVATE MEMBERSHIP TEST USING HOMOMORPHIC ENCRYPTION
Speaker:
Eduardo Chielle, New York University Abu Dhabi, AE
Authors:
Eduardo Chielle, Homer Gamil and Michail Maniatakos, New York University Abu Dhabi, AE
Abstract
With the ever-increasing volume of private data residing on the cloud, privacy is becoming a major concern. Oftentimes, sensitive information is leaked during a querying process between a client and an online server hosting a database: the query may leak information about the element the client is looking up, while sensitive details about the contents of the database can leak on the server side. The ability to check whether an element is included in a database while maintaining both the client's and the server's privacy is known as the Private Membership Test. In this context, we propose a method to privately query a database with computational complexity O(1) using Bloom filters and Homomorphic Encryption. The proposed methodology also enables post-encryption insertions and deletions without requiring a new setup. Experimental results show that our proposed solution has practical setup, insertion, and deletion times for databases of up to a few million entries, with a constant query time of less than 0.3 s, considering a false positive rate lower than 0.001. We instantiate our methodology for a URL denylisting service and demonstrate that it can provide solid security guarantees without affecting the user experience.
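For readers unfamiliar with the underlying data structure, the sketch below shows a plain (unencrypted) Bloom filter and its constant-time membership query in Python; the paper's scheme evaluates this same kind of query under homomorphic encryption, which is not reproduced here, and the class name and parameters are illustrative.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: k bit positions per item, O(1) insert and query.
    False positives are possible; false negatives are not."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = [0] * m_bits

    def _positions(self, item):
        # Derive k indices from one SHA-256 digest of the item
        # (4 bytes per index; fine for k_hashes <= 8).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item):
        # Fixed number (k) of bit lookups regardless of how many items
        # were inserted -> constant query time.
        return all(self.bits[pos] for pos in self._positions(item))
```

Tuning m_bits and k_hashes against the expected number of entries is what yields a target false positive rate such as the 0.001 used in the evaluation.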

ASD Autonomous Systems Design (ASD): A Two-Day Special Initiative

Date: Thursday, 04 February 2021
Time: 07:00 CEST - 18:10 CEST

Organizers:
Rolf Ernst, Technical University Braunschweig, DE
Selma Saidi, Technische Universität Dortmund, DE
Dirk Ziegenbein, Robert Bosch GmbH, DE
Sebastian Steinhorst, Technical University of Munich, DE
Jyotirmoy Deshmukh, University of Southern California, US
Christian Laugier, INRIA Grenoble, FR

 

Two-Day Special Initiative

Fueled by the progress of artificial intelligence, autonomous systems become more and more integral parts of many Internet of Things (IoT) and Cyber-Physical Systems (CPS) applications, such as automated driving, robotics, avionics and industrial automation. Autonomous systems are self-governed and self-adaptive systems that are designed to operate in an open and evolving environment that has not been completely defined at design time. This poses a unique challenge to the design and verification of dependable autonomous systems. The DATE Special Initiative on Autonomous Systems Design (ASD) on Thursday and Friday will include high-profile keynotes and panel discussions, as well as peer-reviewed papers, invited contributions and interactive sessions addressing these challenges.

The Thursday of the DATE Special Initiative on Autonomous Systems Design (ASD) will start with an opening session where industry leaders from Airbus, Porsche and Robert Bosch will talk about their visions of autonomous systems, the challenges they see in developing them, and how autonomous systems will impact business in their industries. This input will then be discussed in an open-floor panel with eminent speakers from academia. After the opening session, three sessions will present peer-reviewed papers on "Reliable Autonomous Systems: Dealing with Failure & Anomalies", "Safety Assurance of Autonomous Vehicles" and "Designing Autonomous Systems: Experiences, Technology and Processes". Furthermore, a special session will discuss the latest research on "Predictable Perception for Autonomous Systems".

The Friday Interactive Day of the DATE Special Initiative on Autonomous Systems Design (ASD) features keynotes from industry leaders as well as interactive discussions initiated by short presentations on several hot topics. Presentations from General Motors and BMW on predictable perception, as well as a session on dynamic risk assessment will fuel the discussion on how to maximize safety in a technically feasible manner. Speakers from TTTech and APEX.AI will present insights into Motionwise and ROS2 as platforms for automated vehicles. Further sessions will highlight topics such as explainable machine learning, self-adaptation for robustness and self-awareness for autonomy, as well as cybersecurity for connected vehicles.

     

    Autonomous Systems Design (ASD) Thursday Sessions

    07:00 - 08:00 9.1 Autonomous Systems Design: Opening Panel

    08:00 - 09:00 10.1 Reliable Autonomous Systems: Dealing with Failure & Anomalies

    09:00 - 09:30 IP.ASD_1 Interactive Presentations

    09:30 - 10:30 11.1 Safety Assurance of Autonomous Vehicles

    15:00 - 15:50 K.5 Keynote AUTONOMY: ONE STEP BEYOND ON COMMERCIAL AVIATION by Pascal Traverse, Airbus, FR

    16:00 - 17:00 12.1 Designing Autonomous Systems: Experiences, Technology and Processes

    17:00 - 17:30 IP.ASD_2 Interactive Presentations

    17:30 - 18:30 13.1 Predictable Perception for Autonomous Systems

    18:30 - 19:00 ASD Reception

     

    ASD Friday Interactive Day

    Detailed Program: W05 Special Initiative on Autonomous Systems Design (ASD)

    Sessions

    08:30 - 09:00 Opening & Introduction

    09:00 - 10:00 Dynamic Risk Assessment in Autonomous Systems

    10:00 - 11:00 Cybersecurity for Connected Autonomous Vehicles

    11:00 - 12:00 Self-adaptive safety- and mission-critical CPS: wishful thinking or absolute necessity? 

    14:00 - 15:00 Predictable Perception

    15:00 - 16:00 Perspicuous Computing

    16:00 - 17:00 Production Architectures & Platforms for Automated Vehicles

    17:00 - 18:00 Self-Awareness for Autonomy

     

    Poll

    Please contribute to our poll on your perspective on Autonomous Systems Design: http://www.polljunkie.com/poll/ygmrfc/date-2021-special-initiative-on-autonomous-systems-design-poll

     

    Registration

    For attending the ASD Thursday Sessions, please obtain a DATE conference registration.

    For attending the ASD Friday Interactive Day (W05), a separate free registration sponsored by Argo AI is required.

    Both registrations can be made here: https://www.date-conference.com/registration 

     

    Technical Program Committee

    • Houssam Abbas, Oregon State University, USA
    • Rasmus Adler, Fraunhofer IESE, Germany
    • Eric Armengaud, AVL, Germany
    • Bart Besselink, University of Groningen, Netherlands
    • Philippe Bonnifait, UTC Compiegne, France
    • Paolo Burgio, Università degli Studi di Modena e Reggio Emilia, Italy
    • Arvind Easwaran, Nanyang Technological University, Singapore
    • Sebastian Fischmeister, University of Waterloo, Canada
    • Roman Gansch, Robert Bosch GmbH, Germany
    • Sabine Glesner, TU Berlin, Germany
    • Dominique Gruyer, Université Gustave Eiffel, France
    • Mohammad Hamad, Technical University Munich, Germany
    • Xiaoqing Jin, Apple, USA
    • Martina Maggio, Saarland University, Germany
    • Philipp Mundhenk, AUDI, Germany
    • Alessandra Nardi, Cadence, USA
    • Gabor Orosz, University of Michigan, USA
    • Claire Pagetti, Onera, France
    • Daniele Palossi, ETH Zurich, Switzerland
    • Alessandro Papadopoulos, Mälardalen University, Sweden
    • Alessandro Renzaglia, INRIA, France
    • Shreejith Shanker, Trinity College Dublin, Ireland
    • Dongwha Shin, Soongsil University, Korea
    • Aviral Shrivastava, Arizona State University, USA
    • Andrei Terechko, NXP Semiconductors, Netherlands
    • Lin Xue, Northeastern University, USA

     


    K.6 Embedded Keynote: "Privacy this unknown - The new design dimension of computing architecture"

    Date: Thursday, 04 February 2021
    Time: 07:00 CEST - 07:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/LL4TqfhGHPh5ddLuz

    Session chair:
    Ian O'Connor, Ecole Centrale de Lyon, FR

    Session co-chair:
    Francesco Regazzoni, University of Amsterdam and ALaRI - USI, NL

    Security is often presented as being based on the CIA triad, where the "C" actually stands for Confidentiality. Indeed, in many human activities we like to keep some things confidential, or "private"; this is particularly true when these activities are done in the cyber world, where a lot of our private data are transmitted, processed, and stored. In this talk, we will first introduce the concept of privacy and then see how it is interlaced with two important research threads. First, we will discuss how computer architectures, and particularly "trusted components" in processors, could help protect privacy, allowing us to trust remote systems. Finally, we will discuss the issue of side-channels (in a broad sense, not only in processors) that could lead to the leakage of private information.
    Bio: Mauro Conti is Full Professor at the University of Padua, Italy. He is also affiliated with TU Delft and the University of Washington, Seattle. He obtained his Ph.D. from Sapienza University of Rome, Italy, in 2009. After his Ph.D., he was a Post-Doc Researcher at Vrije Universiteit Amsterdam, The Netherlands. In 2011 he joined the University of Padua as Assistant Professor, where he became Associate Professor in 2015 and Full Professor in 2018. He has been Visiting Researcher at GMU, UCLA, UCI, TU Darmstadt, UF, and FIU. He was awarded a Marie Curie Fellowship (2012) by the European Commission and a Fellowship by the German DAAD (2013). His research is also funded by companies, including Cisco, Intel, and Huawei. His main research interest is in the area of security and privacy, in which he has published more than 350 papers in topmost international peer-reviewed journals and conferences.
He is Area Editor-in-Chief for IEEE Communications Surveys & Tutorials, and Associate Editor for several journals, including IEEE Communications Surveys & Tutorials, IEEE Transactions on Information Forensics and Security, IEEE Transactions on Dependable and Secure Computing, and IEEE Transactions on Network and Service Management. He was Program Chair for TRUST 2015, ICISS 2016, WiSec 2017, ACNS 2020, and General Chair for SecureComm 2012 and ACM SACMAT 2013. He is Senior Member of the IEEE and ACM. He is a member of the Blockchain Expert Panel of the Italian Government.

    Time Label Presentation Title
    Authors
    07:00 CEST K.6.1 PRIVACY THIS UNKNOWN - THE NEW DESIGN DIMENSION OF COMPUTING ARCHITECTURES
    Speaker and Author:
    Mauro Conti, Università di Padova, IT
    Abstract
    Security is often presented as being based on the CIA triad, where the "C" actually stands for Confidentiality. Indeed, in many human activities we like to keep something confidential, or "private"; this is particularly true when these activities are done in the cyber world, where a lot of our private data are transmitted, processed, and stored. In this talk, we will introduce the concept of privacy and then see how it is interlaced with two important research threads. First, we will discuss how computer architectures, and particularly "trusted components" in processors, could help protect privacy, allowing us to trust remote systems. Then, we will discuss the issue of side-channels (in a broad sense) that could lead to the leakage of private information.

    10.1 Reliable Autonomous Systems: Dealing with Failure & Anomalies

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 09:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/Gz6kTsdb9rNGixqMQ

    Session chair:
    Rolf Ernst, TU Braunschweig, DE

    Session co-chair:
    Rasmus Adler, Fraunhofer IESE, DE

    Organizers:
    Rolf Ernst, TU Braunschweig, DE
    Selma Saidi, TU Dortmund, DE

    Autonomous Systems need novel approaches to detect and handle failures and anomalies. The first paper introduces an approach that adapts the placement of applications on a vehicle platform via adjustment of optimization criteria under safety goal restrictions. The second paper presents a formal worst-case failover timing analysis for online verification to assure safe vehicle operation under failover safety constraints. The third paper proposes an explanation component that observes and analyses an autonomous system and tries to derive explanations for anomalous behavior.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.1.1 C-PO: A CONTEXT-BASED APPLICATION-PLACEMENT OPTIMIZATION FOR AUTONOMOUS VEHICLES
    Speaker:
    Tobias Kain, Volkswagen AG, Wolfsburg, Germany, DE
    Authors:
    Tobias Kain1, Hans Tompits2, Timo Frederik Horeis3, Johannes Heinrich3, Julian-Steffen Müller4, Fabian Plinke3, Hendrik Decke5 and Marcel Aguirre Mehlhorn4
    1Volkswagen AG, AT; 2Technische Universität Wien, AT; 3Institut für Qualitäts- und Zuverlässigkeitsmanagement GmbH, DE; 4Volkswagen AG, DE; 5Volkswagen AG, Wolfsburg, DE
    Abstract
    Autonomous vehicles are complex distributed systems consisting of multiple software applications and computing nodes. Determining the assignment between these software applications and computing nodes is known as the application-placement problem. The input of this problem is a set of applications, their requirements, a set of computing nodes, and their provided resources. Due to the potentially large solution space of the problem, an optimization goal defines which solution is desired the most. However, the optimization goal used for the application-placement problem is not static but has to be adapted according to the current context the vehicle is experiencing. Therefore, an approach for a context-based determination of the optimization goal for a given instance of an application-placement problem is required. In this paper, we introduce C-PO, an approach to address this issue. C-PO ensures that if the safety level of a system drops due to an occurring failure, the optimization goal for the successively executed application-placement determination aims to restore the safety level. Once the highest safety level is reached, C-PO optimizes the application placement according to the current driving situation. Furthermore, we introduce two methods for dynamically determining the required level of safety.
    08:15 CEST 10.1.2 WORST-CASE FAILOVER TIMING ANALYSIS OF DISTRIBUTED FAIL-OPERATIONAL AUTOMOTIVE APPLICATIONS
    Speaker:
    Philipp Weiss, TU Munich, DE
    Authors:
    Philipp Weiss1, Sherif Elsabbahy1, Andreas Weichslgartner2 and Sebastian Steinhorst1
    1TU Munich, DE; 2AUDI AG, DE
    Abstract
    Enabling fail-operational behavior of safety-critical software is essential to achieve autonomous driving. At the same time, automotive vendors have to regularly deliver over-the-air software updates. Here, the challenge is to enable flexible and dynamic system behavior while offering, at the same time, predictable and deterministic behavior of time-critical software. Thus, it is necessary to verify that timing constraints can be met even during failover scenarios. For this purpose, we present a formal analysis to derive the worst-case application failover time. Without such an automated worst-case failover timing analysis, it would not be possible to enable dynamic behavior of safety-critical software within safe bounds. We support our formal analysis by conducting experiments on a hardware platform using a distributed fail-operational neural network. Our randomly generated worst-case results come as close as 6.0% below our analytically derived exact bound. Overall, our worst-case failover timing analysis makes it possible to verify at run-time that the system operates within the bounds of the failover timing constraint, so that dynamic and safe behavior of autonomous systems can be ensured.
    08:30 CEST IP.ASD_1.1 DECENTRALIZED AUTONOMOUS ARCHITECTURE FOR RESILIENT CYBER-PHYSICAL PRODUCTION SYSTEMS
    Speaker:
    Laurin Prenzel, TU Munich, DE
    Authors:
    Laurin Prenzel and Sebastian Steinhorst, TU Munich, DE
    Abstract
    Real-time decision-making is a key element in the transition from Reconfigurable Manufacturing Systems to Autonomous Manufacturing Systems. In Cyber-Physical Production Systems (CPPS) and Cloud Manufacturing, most decision-making algorithms are either centralized, creating vulnerabilities to failures, or decentralized, struggling to reach the performance of the centralized counterparts. In this paper, we combine the performance of centralized optimization algorithms with the resilience of a decentralized consensus. We propose a novel autonomous system architecture for CPPS featuring an automatic production plan generation, a functional validation, and a two-stage consensus algorithm, combining a majority vote on safety and optimality, and a unanimous vote on feasibility and authenticity. The architecture is implemented in a simulation framework. In a case study, we exhibit the timing behavior of the configuration procedure and subsequent reconfiguration following a device failure, showing the feasibility of a consensus-based decision-making process.
    08:31 CEST 10.1.3 ANOMALY DETECTION AND CLASSIFICATION TO ENABLE SELF-EXPLAINABILITY OF AUTONOMOUS SYSTEMS
    Speaker:
    Verena Klös, TU Berlin, DE
    Authors:
    Florian Ziesche, Verena Klös and Sabine Glesner, TU Berlin, DE
    Abstract
    While the importance of autonomous systems in our daily lives and in industry increases, we have to ensure that this development is accepted by their users. A crucial factor for a successful cooperation between humans and autonomous systems is a basic understanding that allows users to anticipate the behavior of the systems. Due to their complexity, complete understanding is neither achievable nor desirable. Instead, we propose self-explainability as a solution. A self-explainable system autonomously explains behavior that differs from anticipated behavior. As a first step towards this vision, we present an approach for detecting anomalous behavior that requires an explanation and for reducing the huge search space of possible reasons for this behavior by classifying it into classes with similar reasons. We envision our approach to be part of an explanation component that can be added to any autonomous system.
    08:46 CEST IP.ASD_1.2 PROVABLY ROBUST MONITORING OF NEURON ACTIVATION PATTERNS
    Speaker and Author:
    Chih-Hong Cheng, DENSO AUTOMOTIVE Deutschland GmbH, DE
    Abstract
    For deep neural networks (DNNs) to be used in safety-critical autonomous driving tasks, it is desirable to monitor at operation time whether the input to the DNN is similar to the data used in DNN training. While recent results in monitoring DNN activation patterns provide a sound guarantee by building an abstraction of the training data set, reducing false positives due to slight input perturbations has been an obstacle to successfully adopting these techniques. We address this challenge by integrating formal symbolic reasoning into the monitor construction process. The algorithm performs a sound worst-case estimate of neuron values with inputs (or features) subject to perturbation, before the abstraction function is applied to build the monitor. The provable robustness is further generalized to cases where monitoring a single neuron can use more than one bit, implying that one can record activation patterns with a fine-grained decision on the neuron value interval.

    10.2 Multi-Partner Innovative Research Projects

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/kQbMhMNhK7n9Zg87S

    Session chair:
    Francisco J. Cazorla, BSC, ES

    Session co-chair:
    Carles Hernandez, Universitat Politècnica de València, ES

    In this session, papers present multi-partner innovative and/or highly technological research projects covering a wide range of topics, from cyber-physical systems to nanoelectronics. Some papers introduce projects that are in the initial stages of execution (i.e., recently accepted), while others summarize the technical work done and the lessons learnt in projects that are in the final stages of their execution.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.2.1 GPU4S: MAJOR PROJECT OUTCOMES, LESSONS LEARNT AND WAY FORWARD
    Speaker:
    Leonidas Kosmidis, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
    Authors:
    Leonidas Kosmidis1, Ivan Rodriguez Ferrandez2, Alvaro Jover-Alvarez3, Sergi Alcaide4, Jérôme Lachaize5, Olivier Notebaert5, Antoine Certain5 and David Steenari6
    1Barcelona Supercomputing Center (BSC), ES; 2Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 3Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC), ES; 4Universitat Politècnica de Catalunya - Barcelona Supercomputing Center (BSC), ES; 5Airbus Defence and Space, Toulouse, FR; 6European Space Agency, NL
    Abstract
    Embedded GPUs have been identified by both private and government space agencies as promising hardware technologies to satisfy the increased needs of payload processing. The GPU4S (GPU for Space) project, funded by the European Space Agency (ESA), has explored in detail the feasibility and benefit of using them for space workloads. With the project currently in its closing phase, in this paper we describe the main project outcomes and explain the lessons we learnt. In addition, we provide some guidelines for the next steps towards the adoption of embedded GPUs in space.
    08:15 CEST 10.2.2 EVEREST: A DESIGN ENVIRONMENT FOR EXTREME-SCALE BIG DATA ANALYTICS ON HETEROGENEOUS PLATFORMS
    Speaker:
    Christian Pilato, Politecnico di Milano, IT
    Authors:
    Christian Pilato1, Stanislav Bohm2, Fabien Brocheton3, Jeronimo Castrillon4, Riccardo Cevasco5, Vojtech Cima2, Radim Cmar6, Dionysios Diamantopoulos7, Fabrizio Ferrandi1, Jan Martinovic2, Gianluca Palermo1, Michele Paolino8, Antonio Parodi9, Lorenzo Pittaluga5, Daniel Raho8, Francesco Regazzoni10, Katerina Slaninova2 and Christoph Hagleitner11
    1Politecnico di Milano, IT; 2IT4Innovations, VSB – TU Ostrava, CZ; 3NUMTECH, FR; 4TU Dresden, DE; 5Duferco Energia, IT; 6Sygic, SK; 7IBM Research, CH; 8Virtual Open System, FR; 9Centro Internazionale di Monitoraggio Ambientale, IT; 10University of Amsterdam and ALaRI - USI, CH; 11IBM, CH
    Abstract
    High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements with the characteristics of the underlying hardware is a key element to improve the predictions thanks to high performance and better use of resources. We present EVEREST, a novel H2020 project started on October 1, 2020, that aims at developing a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support. In the different stages, EVEREST combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions. We describe the EVEREST approach and the use cases that drive our research.
    08:30 CEST IP.MPIRP_1.1 PROJECT OVERVIEW FOR STEP-UP!CPS – PROCESS, METHODS AND TECHNOLOGIES FOR UPDATING SAFETY-CRITICAL CYBER-PHYSICAL SYSTEMS
    Speaker:
    Carl Philipp Hohl, FZI Forschungszentrum Informatik, DE
    Authors:
    Thomas Strathmann1, Georg Hake2, Houssem Guissouma3, Carl Philipp Hohl4, Yosab Bebawy1, Sebastian Vander Maelen1 and Andrew Koerner5
    1OFFIS e.V., DE; 2University of Oldenburg, DE; 3Karlsruhe Institute of Technology (KIT), DE; 4FZI Forschungszentrum Informatik, DE; 5DLR, DE
    Abstract
    We describe the challenges addressed by the three-year German national collaborative research project Step-Up!CPS, which is currently in its third year. The goal of the project is to develop software methods and technologies for modular updates of safety-critical cyber-physical systems. To make this possible, contracts are utilized, which formally describe the behaviour of an update and make it verifiable at different points in the update life cycle. We have defined a development process that allows for continuous improvement of such systems by monitoring their operation, identifying the need for updates, and developing and deploying these updates in a safe and secure manner. We highlight the points along the update process that are critical for a secure update and show how we address them in a contractually secured update process.
    08:31 CEST IP.MPIRP_1.2 VERIDEVOPS: AUTOMATED PROTECTION AND PREVENTION TO MEET SECURITY REQUIREMENTS IN DEVOPS
    Speaker:
    Eduard Enoiu, MDH, SE
    Authors:
    Andrey Sadovykh1, Gunnar Widforss2, Dragos Truscan3, Eduard Paul Enoiu2, Wissam Mallouli4, Rosa Iglesias5, Alessandra Bagnto6 and Olga Hendel2
    1Innopolis University, RU; 2Mälardalen University, SE; 3Åbo Akademi University, FI; 4Montimage EURL, FR; 5Ikerlan Technology Research Centre, ES; 6Softeam, FR
    Abstract
    VeriDevOps is a Horizon 2020 funded research project in its initial stage. The project started on 1 October 2020 and will run for three years. VeriDevOps is about fast, flexible software engineering that efficiently integrates development, delivery, and operations, aiming at quality deliveries with short cycle times to address ever-evolving challenges. Current software development practices increasingly rely on both COTS and legacy components, which makes such systems prone to security vulnerabilities. DevOps, the modern practice for addressing ever-changing conditions, promotes frequent software deliveries; however, verification methods and artifacts must be updated in a timely fashion to cope with the pace of the process. VeriDevOps aims at providing a faster feedback loop for verifying the security requirements and other quality attributes of large-scale cyber-physical systems. VeriDevOps focuses on optimizing the security verification activities by automatically creating verifiable models directly from security requirements, and using these models to check security properties on design models and to generate artefacts such as automatically generated tests or monitors that can be used later in the DevOps process. The main drivers for these advances are: natural language processing, a combined formal verification and model-based testing approach, and machine-learning-based security monitors. In this paper, we present the planned contributions of the project, its consortium, and the planned way of working used to accomplish the expected results.
    08:32 CEST 10.2.3 NANO SECURITY: FROM NANO-ELECTRONICS TO SECURE SYSTEMS
    Speaker:
    Ilia Polian, University of Stuttgart, DE
    Authors:
    Ilia Polian1, Frank Altmann2, Tolga Arul3, Christian Boit4, Lucas Davi5, Rolf Drechsler6, Nan Du7, Thomas Eisenbarth8, Tim Güneysu9, Sascha Herrmann10, Matthias Hiller11, Rainer Leupers12, Farhad Merchant13, Thomas Mussenbrock14, Stefan Katzenbeisser3, Akash Kumar15, Wolfgang Kunz16, Thomas Mikolajick17, Vivek Pachauri18, Jean-Pierre Seifert4, Frank Sill Torres19 and Jens Trommer20
    1University of Stuttgart, DE; 2Fraunhofer Institute for Microstructure of Materials and Systems IMWS, DE; 3Chair of Computer Engineering, University of Passau, DE; 4TU Berlin, DE; 5Secure Software Systems Group, University Duisburg-Essen, DE; 6University of Bremen/DFKI, DE; 7Material Systems and Nanoelectronics Group, TU Chemnitz, DE; 8University of Lübeck & WPI, DE; 9Ruhr-Universität Bochum & DFKI, DE; 10Center for Microtechnologies, TU Chemnitz, DE; 11Fraunhofer AISEC, DE; 12RWTH Aachen University, DE; 13Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, DE; 14Chair of Plasma Technology, Ruhr University Bochum, DE; 15Chair for Processor Design, TU Dresden, DE; 16TU Kaiserslautern, DE; 17NaMLab Gmbh / TU Dresden, DE; 18Institute of Materials in Electrical Engineering 1, RWTH Aachen University, DE; 19German Aerospace Center, DE; 20Namlab gGmbH, DE
    Abstract
    The field of computer hardware stands at the verge of a revolution driven by recent breakthroughs in emerging nano-devices. "Nano Security" is a new Priority Program recently approved by DFG, the German Research Council. This initial-stage project initiative at the crossroads of nano-electronics and hardware-oriented security includes 11 projects with a total of 23 Principal Investigators from 18 German institutions. It considers the interplay between security and nano-electronics, focusing on the dichotomous effect that emerging nano-devices (and their architectural implications) have on system security. The projects within the Priority Program consider both potential security threats and vulnerabilities stemming from novel nano-electronics, and innovative approaches to establishing and improving system security based on nano-electronics. This paper provides an overview of the Priority Program's overall philosophy and discusses the scientific objectives of its individual projects.
    08:47 CEST IP.MPIRP_2.1 THE UP2DATE BASELINE RESEARCH PLATFORMS
    Speaker:
    Alvaro Jover, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
    Authors:
    Alvaro Jover-Alvarez1, Alejandro J. Calderón2, Ivan Rodriguez Ferrandez3, Leonidas Kosmidis4, Kazi Asifuzzaman4, Patrick Uven5, Kim Grüttner6, Tomaso Poggi7 and Irune Agirre8
    1Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC), ES; 2Ikerlan Technology Research Centre, ES; 3Barcelona Supercomputing Center and Universitat Politècnica de Catalunya, ES; 4Barcelona Supercomputing Center (BSC), ES; 5OFFIS, DE; 6OFFIS - Institute for Information Technology, DE; 7IKERLAN, ES; 8Ikerlan, ES
    Abstract
    The UP2DATE H2020 project focuses on high-performance heterogeneous embedded platforms for critical systems. We will develop observability and controllability solutions to support online updates while ensuring safety and security for mixed-criticality tasks. In this paper, we describe the rationale behind the selection of the baseline research platforms which will be used to develop and demonstrate the project concepts, including a performance comparison to identify the most efficient one.

    10.3 Hardware-aware Training

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/id7KyqmwgKYxFg3Z7

    Session chair:
    Cong (Callie) Hao, Georgia Institute of Technology, US

    Session co-chair:
    Bert Moons, Qualcomm, US

    Even with recent innovations in hardware and software platforms, both training and inference of deep learning models remain costly. These costs can be alleviated by optimizing neural networks towards their hardware targets and by using hardware-efficient number representations during training and deployment. This session presents novel methodologies for hardware-aware training. In the first paper, the authors present a form of multi-objective differentiable NAS and focus on better controlling the correlation between projected latency during training (soft architecture) and real latency during deployment (hard architecture). The second paper proposes to use a recent innovation in numerical representation known as posits as an efficient number format for generative adversarial network (GAN) training and inference. The third paper proposes a new DNN training technique that quantizes each network layer to a different bit-width based on the sparsity of output activations for that layer, resulting in performance and energy-efficiency improvements. The last paper proposes a novel sparsity pattern with mixed granularity, named joint sparsity, that combines the advantages of both unstructured and structured sparsity.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.3.1 MDARTS: MULTI-OBJECTIVE DIFFERENTIABLE NEURAL ARCHITECTURE SEARCH
    Speaker:
    Sunghoon Kim, Samsung Electronics, KR
    Authors:
    Sunghoon Kim1, Hyunjeong Kwon1, Eunji Kwon2, Youngchang Choi1, Taehyun Oh1 and Seokhyeong Kang3
    1POSTECH, KR; 2Postech, KR; 3Pohang University of Science and Technology, KR
    Abstract
    In this work, we present a differentiable neural architecture search (NAS) method that takes into account two competing objectives, quality of result (QoR) and quality of service (QoS), under hardware design constraints. NAS research has recently received a lot of attention due to its ability to automatically find architecture candidates that can outperform handcrafted ones. However, NAS approaches that comply with actual HW design constraints have been under-explored. A naive NAS approach would be to optimize a combination of the two criteria of QoR and QoS, but this simple extension of the prior art often yields degenerated architectures and suffers from sensitive hyperparameter tuning. In this work, we propose a multi-objective differentiable neural architecture search, called MDARTS. MDARTS has an affordable search time and can find the Pareto frontier of QoR versus QoS. We also identify the problematic gap between all existing differentiable NAS results and the final post-processed architectures, in which soft connections are binarized. This gap leads to performance degradation when the model is deployed. To mitigate this gap, we propose a separation loss that discourages indefinite connections of components by implicitly minimizing entropy.
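    The separation loss sketched in the abstract can be illustrated as an entropy penalty on the softmax-normalized architecture weights of one edge: driving the entropy down pushes the soft connection toward a one-hot, definite choice, shrinking the gap between the searched soft architecture and the binarized hard architecture. This is a minimal sketch of the entropy-minimization idea, not the authors' exact formulation.

```python
import numpy as np

def separation_loss(alpha):
    """Entropy of the softmax over one edge's architecture weights.

    Minimizing this term discourages indefinite (near-uniform) connection
    weights; a near-one-hot edge has entropy close to zero.
    """
    p = np.exp(alpha - alpha.max())   # numerically stable softmax
    p /= p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))
```

In a DARTS-style search this term would be added, with some weight, to the usual task and latency losses.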
    08:15 CEST 10.3.2 POSIT ARITHMETIC FOR THE TRAINING AND DEPLOYMENT OF GENERATIVE ADVERSARIAL NETWORKS
    Speaker:
    Nhut Minh Ho, National University of Singapore, SG
    Authors:
    Nhut-Minh Ho1, Duy-Thanh Nguyen2, Himeshi De Silva1, John L. Gustafson1, Weng-Fai Wong1 and Ik Joon Chang3
    1National University of Singapore, SG; 2Kyung Hee University, KR; 3Kyunghee University, KR
    Abstract
    This paper proposes a set of methods which enable low-precision posit™ arithmetic to be successfully used for the training of Generative Adversarial Networks (GANs) with minimal quality loss. We show that ultra-low-precision posits, as small as 6 bits, can achieve high-quality output for the generation phase after training. We also evaluate the usage of other floating-point formats and compare them to 8-bit posits in the context of GAN training. Our scaling and adaptive calibration techniques are capable of producing superior training quality for 8-bit posits that surpasses 8-bit floating point and matches the results of half precision. Hardware simulation results indicate that our methods have higher energy efficiency compared to both 16- and 8-bit float training systems.
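    For readers unfamiliar with the posit format, a small decoder shows how its fields (sign, variable-length regime, exponent, fraction) map to a real value. This is an illustrative decoder of the general posit encoding; the default width n=8 with es=1 exponent bit is an assumption for the example, not the paper's configuration.

```python
def decode_posit(bits, n=8, es=1):
    """Decode an n-bit posit with es exponent bits into a float.

    value = sign * useed**k * 2**e * (1 + f), where useed = 2**(2**es),
    k comes from the regime run length, e from the exponent bits, and
    f from the fraction bits.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")            # NaR (Not a Real)
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & mask          # two's complement for negatives
    # Remaining n-1 bits, most significant first.
    rbits = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    first, run = rbits[0], 1
    while run < len(rbits) and rbits[run] == first:
        run += 1                       # regime: run of identical bits
    k = run - 1 if first == 1 else -run
    idx = run + 1 if run < len(rbits) else run   # skip terminating bit
    e = 0
    for _ in range(es):                # up to es exponent bits
        e <<= 1
        if idx < len(rbits):
            e |= rbits[idx]
            idx += 1
    f = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(rbits[idx:]))
    useed = 2.0 ** (2 ** es)
    return sign * useed ** k * 2.0 ** e * (1.0 + f)
```

The tapered accuracy visible here (long regimes give huge dynamic range, short regimes give more fraction bits near 1.0) is what makes small posits attractive for GAN inference.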
    08:30 CEST IP9_1.2 JOINT SPARSITY WITH MIXED GRANULARITY FOR EFFICIENT GPU IMPLEMENTATION
    Speaker:
    Chuliang Guo, Zhejiang University, CN
    Authors:
    Chuliang Guo1, Xingang Yan1, Yufei Chen1, He Li2, Xunzhao Yin1 and Cheng Zhuo1
    1Zhejiang University, CN; 2University of Cambridge, GB
    Abstract
    Given the over-parameterization property in recent deep neural networks, sparsification is widely used to compress networks and save memory footprint. Unstructured sparsity, i.e., fine-grained pruning, can help preserve model accuracy, while structured sparsity, i.e., coarse-grained pruning, is preferred for general-purpose hardwares, e.g., GPUs. This paper proposes a novel joint sparsity pattern using mixed granularity to take advantage of both unstructured and structured sparsity. We utilize a heuristic strategy to infer the joint sparsity pattern by mixing vector-wise fine-grained and block-wise coarse-grained pruning masks. Experimental results show that the joint sparsity can achieve higher model accuracy and sparsity ratio while consistently maintaining moderate inference speed for VGG-16 on CIFAR-100 in comparison to the commonly used block sparsity and balanced sparsity strategies.
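    The mixed-granularity idea above can be sketched as a two-stage mask: a coarse stage keeps the highest-magnitude blocks, and a fine stage prunes per row vector inside each kept block. The block size, keep ratios, and two-stage ordering here are illustrative assumptions, not the paper's exact joint-sparsity pattern.

```python
import numpy as np

def joint_sparsity_mask(w, block=4, block_keep=0.5, vec_keep=0.5):
    """Toy joint-sparsity mask mixing block-wise and vector-wise pruning."""
    rows, cols = w.shape
    mask = np.zeros_like(w)
    # Coarse stage: score each block by total magnitude, keep the top ones.
    scores = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            scores.append((np.abs(w[r:r+block, c:c+block]).sum(), r, c))
    scores.sort(reverse=True)
    for _, r, c in scores[: int(len(scores) * block_keep)]:
        blk = w[r:r+block, c:c+block]
        bm = np.zeros_like(blk)
        # Fine stage: keep the top-k weights in each row vector of the block.
        k = max(1, int(block * vec_keep))
        for i in range(blk.shape[0]):
            top = np.argsort(-np.abs(blk[i]))[:k]
            bm[i, top] = 1.0
        mask[r:r+block, c:c+block] = bm
    return mask
```

Because the surviving weights are confined to a few blocks with a balanced per-vector count, the pattern stays GPU-friendly while retaining some fine-grained selectivity.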
    08:31 CEST 10.3.3 ACTIVATION DENSITY BASED MIXED-PRECISION QUANTIZATION FOR ENERGY EFFICIENT NEURAL NETWORKS
    Speaker:
    Yeshwanth Venkatesha, Yale University, US
    Authors:
    Karina Vasquez1, Yeshwanth Venkatesha2, Abhishek Moitra2, Abhiroop Bhattacharjee2 and Priya Panda2
    1University of Engineering and Technology (UTEC), Peru, PE; 2Yale University, US
    Abstract
    As neural networks gain widespread adoption in embedded devices, there is a growing need for model compression techniques to facilitate seamless deployment in resource-constrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most quantization approaches take a fully trained model, then apply different heuristics to determine the optimal bit-precision for different layers of the network, and finally retrain the network to regain any drop in accuracy. Based on Activation Density (the proportion of non-zero activations in a layer), we propose a novel in-training quantization method. Our method calculates the optimal bit-width/precision for each layer during training, yielding an energy-efficient mixed-precision model with competitive accuracy. Since we train lower-precision models progressively during training, our approach yields the final quantized model at lower training complexity and also eliminates the need for re-training. We run experiments on benchmark datasets like CIFAR-10, CIFAR-100, and TinyImagenet on VGG19/ResNet18 architectures and report the accuracy and energy estimates. We achieve up to 4.5x benefit in terms of estimated multiply-and-accumulate (MAC) reduction while reducing the training complexity by 50% in our experiments. To further evaluate the energy benefits of our proposed method, we develop a mixed-precision scalable Process In Memory (PIM) hardware accelerator platform. The hardware platform incorporates shift-add functionality for handling multi-bit precision neural network models. Evaluating the quantized models obtained with our proposed method on the PIM platform yields about 5x energy reduction compared to baseline 16-bit models. Additionally, we find that integrating activation-density-based quantization with activation-density-based pruning (both conducted during training) yields up to ~198x and ~44x energy reductions for the VGG19 and ResNet18 architectures respectively on the PIM platform, compared to baseline 16-bit precision, unpruned models.
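    The core quantity the method tracks, activation density, is simple to compute; the mapping below from density to a per-layer bit-width is a deliberately simplified linear stand-in (the bit range 4..16 and the linear rule are assumptions, not the paper's schedule).

```python
import numpy as np

def activation_density(act):
    """Proportion of non-zero activations in a layer's output tensor."""
    return float(np.count_nonzero(act)) / act.size

def density_to_bits(d, max_bits=16, min_bits=4):
    """Assign a bit-width to a layer: denser (less sparse) layers keep
    more bits; sparse layers tolerate aggressive quantization.
    Linear mapping and bit range are illustrative assumptions."""
    bits = int(round(min_bits + d * (max_bits - min_bits)))
    return max(min_bits, min(max_bits, bits))
```

During training, recomputing the density after each epoch would let the per-layer precision adapt as sparsity evolves.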

    10.4 EDA tools: the next generation

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/owWfFvDXga3iFLXve

    Session chair:
    Marie-Minerve Louerat, LIP6, FR

    Session co-chair:
    Florence Azais, University of Montpellier, CNRS, FR

    This session presents new proposals for EDA tools dedicated to analog circuits, targeting various aspects such as the recognition of building blocks, analog sizing taking into account layout parasitics, and 2D electric field analysis.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.4.1 LIBRARY-FREE STRUCTURE RECOGNITION FOR ANALOG CIRCUITS
    Speaker:
    Maximilian Neuner, TU Munich, DE
    Authors:
    Maximilian Neuner, Inga Abel and Helmut Graeb, TU Munich, DE
    Abstract
    Extracting structural information of a design is one crucial aspect of many circuit verification and synthesis methods. State-of-the-art structure recognition methods use a predefined building block library to identify the basic building blocks in a circuit. However, the capability of these algorithms is limited by the scope, correctness and completeness of the provided library. This paper presents a new method to automatically generate the recognition rules required to identify a given circuit topology in a large design. Device pairs are grouped into building blocks by analyzing their characteristics, e.g., their connectivity, to enable a structure recognition as unambiguous as possible. The resulting blocks are consecutively assembled to larger blocks until the full building block description of the given topology has been established. Building block libraries dedicated to one specific topology type, e.g., operational amplifiers, can be obtained by applying the method to its basic version, subsequently extending the generated library by the additional elements required to identify its topology variants using the presented method. Experimental results for six folded cascode amplifier and five level shifter topologies are given.
    08:15 CEST 10.4.2 PARASITIC-AWARE ANALOG CIRCUIT SIZING WITH GRAPH NEURAL NETWORKS AND BAYESIAN OPTIMIZATION
    Speaker:
    Mingjie Liu, University of Texas at Austin, US
    Authors:
    Mingjie Liu1, Walker Turner2, George Kokai2, Brucek Khailany2, David Z. Pan3 and Haoxing Ren2
    1University of Texas Austin, US; 2NVIDIA Corporation, US; 3University of Texas at Austin, US
    Abstract
    Layout parasitics significantly impact the performance of analog integrated circuits, leading to discrepancies between schematic and post-layout performance and requiring several iterations to achieve design convergence. Prior work has accounted for parasitic effects during the initial design phase but relies on automated layout generation for estimating parasitics. In this work, we leverage recent developments in parasitic prediction using graph neural networks to eliminate the need for in-the-loop layout generation. We propose an improved surrogate performance model using parasitic graph embeddings from the pre-trained parasitic prediction network. We further leverage dropout as an efficient prediction of uncertainty for Bayesian optimization to automate transistor sizing. Experimental results demonstrate that the proposed surrogate model has a 20% better R² prediction score and improves optimization convergence by 3.7x and 2.1x compared to conventional Gaussian process regression and neural-network-based Bayesian linear regression, respectively. Furthermore, the inclusion of parasitic prediction in the optimization loop can guarantee satisfaction of all design constraints, while schematic-only optimization fails numerous constraints when verified with parasitic estimations.
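    The dropout-as-uncertainty trick mentioned above can be illustrated on a toy predictor: sampling dropout masks at inference time yields a predictive mean and a cheap spread estimate that a Bayesian optimizer can use as an acquisition signal. The linear stand-in model, dropout rate, and sample count below are all assumptions; the paper applies dropout inside a neural surrogate model, not a linear one.

```python
import numpy as np

def mc_dropout_predict(x, w, p=0.2, samples=2000, seed=0):
    """Monte-Carlo dropout for a toy linear predictor w·x.

    Each sample drops weights with probability p (inverted-dropout
    scaling keeps the mean unbiased); the std across samples serves
    as an inexpensive uncertainty estimate.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random((samples, w.size)) >= p      # Bernoulli keep masks
    preds = (keep * w / (1.0 - p)) @ x             # one prediction per mask
    return float(preds.mean()), float(preds.std())
```

A Bayesian-optimization loop could then rank candidate sizings by, e.g., mean minus a multiple of the std.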
    08:30 CEST IP8_1.2 (Best Paper Award Candidate)
    SYSTEM LEVEL VERIFICATION OF PHASE-LOCKED LOOP USING METAMORPHIC RELATIONS
    Speaker:
    Muhammad Hassan, DFKI GmbH, DE
    Authors:
    Muhammad Hassan1, Daniel Grosse2 and Rolf Drechsler3
    1Cyber Physical Systems, DFKI, DE; 2Johannes Kepler University Linz, AT; 3University of Bremen/DFKI, DE
    Abstract
    In this paper we build on Metamorphic Testing (MT), a verification technique which has been employed very successfully in the software domain. The core idea is to uncover bugs by relating consecutive executions of the program under test. Recently, MT has also been applied successfully to the verification of Radio Frequency (RF) amplifiers at the system level. However, this is clearly not sufficient, as the true complexity stems from Analog/Mixed-Signal (AMS) systems. In this paper, we go beyond pure analog systems, i.e. we expand MT to verify AMS systems. As a challenging AMS system, we consider an industrial PLL. We devise a set of eight generic Metamorphic Relations (MRs). These MRs allow verifying the PLL behaviour at the component level and at the system level. Therefore, we have created MRs considering analog-to-digital as well as digital-to-digital behaviour. We found a critical bug in the industrial PLL, which clearly demonstrates the quality and potential of MT for AMS verification.
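    The core MT idea, relating two executions of the device under test instead of checking against an absolute golden value, can be shown with a generic scaling relation. The ideal-amplifier stand-in and the specific relation below are hypothetical illustrations, not one of the paper's eight PLL relations.

```python
import math

def amplifier(vin, gain=10.0):
    """Stand-in device under test: a hypothetical ideal linear amplifier."""
    return gain * vin

def mr_scaling_holds(dut, vin=0.1, factor=2.0, tol=1e-9):
    """Metamorphic relation: scaling the input should scale the output
    by the same factor. Note no expected absolute output is needed;
    the check relates two executions of the DUT."""
    return math.isclose(dut(factor * vin), factor * dut(vin), rel_tol=tol)
```

A DUT with an offset bug, e.g. `lambda v: 10.0 * v + 0.5`, violates the relation even though each individual output might look plausible in isolation.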
    08:31 CEST 10.4.3 DATA-DRIVEN ELECTROSTATICS ANALYSIS BASED ON PHYSICS-CONSTRAINED DEEP LEARNING
    Speaker:
    Wentian Jin, University of California, Riverside, US
    Authors:
    Wentian Jin1, Shaoyi Peng1 and Sheldon Tan2
    1University of California, Riverside, US; 2University of California at Riverside, US
    Abstract
    Computing the electric potential and electric field is important for the modeling and analysis of VLSI chips and high-speed circuits. For instance, it is an important step in DC analysis for high-speed circuits as well as in dielectric reliability and capacitance extraction for VLSI interconnects. In this paper, we propose a new data-driven meshless 2D analysis method for electric potential and electric fields, called PCEsolve, based on a physics-constrained deep learning scheme. We show how to formulate the differential loss functions to consider the Laplace differential equations with voltage boundary conditions for typical electrostatic analysis problems so that the supervised learning process can be carried out. We apply the resulting PCEsolve solver to calculate the electric potential and electric field for VLSI interconnects with complicated boundaries. We show the potential and limitations of physics-constrained deep learning for practical electrostatics analysis. Our study of purely label-free training (in which no information from an FEM solver is provided) shows that PCEsolve can get accurate results around the boundaries, but the accuracy degenerates in regions far away from the boundaries. To mitigate this problem, we explore adding some simulation data or labels at collocation points derived from FEM analysis, and the resulting PCEsolve is much more accurate across the whole solution domain. Numerical results demonstrate that PCEsolve achieves an average error rate of 3.6% on 64 cases with random boundary conditions and is 27.5x faster than COMSOL on test cases. The speedup can be further boosted to ~38000x in single-point estimations. We also study the impacts of weights on different components of the loss functions to improve the model accuracy for both voltage and electric field.
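    The physics-constrained loss idea can be sketched with a discrete Laplacian residual: a network predicting the potential u would be penalized by the mean squared residual of ∇²u = 0, plus boundary-condition terms (omitted here). The uniform-grid 5-point stencil below is an illustrative stand-in for the paper's meshless differential loss.

```python
import numpy as np

def laplace_residual(u, h=1.0):
    """Mean squared 5-point-stencil residual of the Laplace equation
    ∇²u = 0, evaluated on the interior points of a uniform grid."""
    lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
           - 4.0 * u[1:-1, 1:-1]) / h ** 2
    return float(np.mean(lap ** 2))
```

A harmonic field such as u = x² - y² drives this residual to zero, while a non-harmonic field like u = x² + y² leaves a constant residual of (∇²u)² = 16.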

    10.5 Coarse Grained Reconfigurable Architectures

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/sH3MveXQ9wGZGfzSh

    Session chair:
    Daniel Ziener, TU Ilmenau, DE

    Session co-chair:
    Michaela Blott, Xilinx, US

    This session explores approaches for improving coarse-grained reconfigurable architectures, including an automated refinement approach for accelerators, an improved scheduling approach, and an architecture with extensions to support deep neural network workloads. Two IPs explore an optimised multiply-accumulate unit for deep learning and a custom accelerator for video encoding.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.5.1 AURORA: AUTOMATED REFINEMENT OF COARSE-GRAINED RECONFIGURABLE ACCELERATORS
    Speaker:
    Cheng Tan, Pacific Northwest National Lab, US
    Authors:
    Cheng Tan1, Chenhao Xie1, Ang Li1, Kevin Barker1 and Antonino Tumeo2
    1Pacific Northwest National Lab, US; 2Pacific Northwest National Laboratory, US
    Abstract
    Coarse-grained reconfigurable arrays (CGRAs), loosely defined as arrays of functional units interconnected through a network-on-chip (NoC), provide higher flexibility than domain-specific ASIC accelerators while offering increased hardware efficiency with respect to fine-grained reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs). Unfortunately, designing a CGRA for a specific application domain involves enormous software/hardware engineering effort (e.g., designing the CGRA, mapping operations onto the CGRA, etc.) and requires the exploration of a large design space (e.g., applying appropriate loop transformations to each application, specializing the reconfigurable processing elements of the CGRA, refining the network topology, deciding the size of the data memory, etc.). In this paper, we propose AURORA, a software/hardware co-design framework to automatically synthesize an optimal CGRA given a set of applications of interest.
    08:15 CEST 10.5.2 SUBGRAPH DECOUPLING AND RESCHEDULING FOR INCREASED UTILIZATION IN CGRA ARCHITECTURE
    Speaker:
    Chen Yin, Shanghai Jiao Tong University, CN
    Authors:
    Chen Yin, Qin Wang, Jianfei Jiang, Weiguang Sheng, Guanghui He, Zhigang Mao and Naifeng Jing, Shanghai Jiao Tong University, CN
    Abstract
    As coarse-grained reconfigurable array (CGRA) architectures shift towards general-purpose use, complex control flows such as nested loops, conditional branches and data dependences can hamper them and reduce processing element (PE) array utilization by breaking the intact dataflow graph (DFG) into multiple regions with inconsistent control. This paper proposes subgraph decoupling and rescheduling, which decouples the inconsistent regions into control-independent subgraphs. Each subgraph can be rescheduled with zero-cost domino context switching and parallelized to fully utilize the PE resources. We then propose lightweight hardware changes to a general CGRA architecture to enable our design. The experimental results show that our proposal improves performance and energy efficiency by 1.35x and 1.18x over a static-mapped CGRA (Plasticine), and by 1.27x and 1.45x over an instruction-driven CGRA (TIA).
    08:30 CEST IP8_5.1 A 93 TOPS/WATT NEAR-MEMORY RECONFIGURABLE SAD ACCELERATOR FOR HEVC/AV1/JEM ENCODING
    Speaker:
    Jainaveen Sundaram Priya, Intel Corporation, US
    Authors:
    Jainaveen Sundaram Priya1, Srivatsa Rangachar Srinivasa2, Dileep Kurian3, Indranil Chakraborty4, Sirisha Kale1, Nilesh Jain1, Tanay Karnik5, Ravi Iyer5 and Anuradha Srinivasan6
    1Intel Corporation, US; 2Intel Labs, US; 3Intel technologies, IN; 4Purdue University, US; 5Intel, US; 6Intel, IN
    Abstract
    Motion Estimation (ME) is a major bottleneck of a video encoding pipeline. This paper presents a low-power near-memory Sum of Absolute Difference (SAD) accelerator for ME. The accelerator is composed of 64 modular SAD Processing Elements (PEs) on a reconfigurable fabric, offering maximal parallelism to support traditional and futuristic Rate-Distortion Optimization (RDO) schemes consistent with HEVC/AV1/JEM. The accelerator offers up to 55% speedup over state-of-the-art accelerators and a 7x speedup compared to a 12-core Intel Xeon E5 processor. Our solution achieves 93 TOPS/Watt running at 500 MHz, capable of processing real-time 4K 30fps video. Synthesized in a 22nm process, the accelerator occupies 0.08 mm² and consumes 5.46 mW of dynamic power.
    08:31 CEST IP8_5.2 TRIPLE FIXED-POINT MAC UNIT FOR DEEP LEARNING
    Speaker:
    Madis Kerner, Taltech, EE
    Authors:
    Madis Kerner1, Kalle Tammemäe1, Jaan Raik2 and Thomas Hollstein2
    1Taltech, EE; 2Tallinn University of Technology, EE
    Abstract
    Deep Learning (DL) algorithms have proved successful in various domains. Typically, the models use Floating-Point (FP) numeric formats and are executed on Graphical Processing Units (GPUs). However, Field Programmable Gate Arrays (FPGAs) are more energy-efficient and, therefore, a better platform for resource-constrained devices. As an FP design consumes many FPGA resources, it is replaced with quantized fixed-point implementations in the state of the art. The loss of precision is mitigated by dynamically adjusting the radix point on network layers, reconfiguration, and re-training. In this paper, we present the first Triple Fixed-Point (TFxP) architecture, which provides the computational precision of FP while using significantly fewer hardware resources and does not need network re-training. The novel TFxP format is introduced based on a comparison of FP and existing Fixed-Point (FxP) implementations, in combination with a detailed precision analysis of YOLOv2 weights and activation values.
    08:32 CEST 10.5.3 NP-CGRA: EXTENDING CGRAS FOR EFFICIENT PROCESSING OF LIGHT-WEIGHT DEEP NEURAL NETWORKS
    Speaker:
    Jongeun Lee, Dept. of Electrical Engineering, Ulsan National Institute of Science and Technology (UNIST), KR
    Authors:
    Jungi Lee1 and Jongeun Lee2
    1UNIST, KR; 2Ulsan National Institute of Science and Technology (UNIST), KR
    Abstract
    Coarse-grained reconfigurable architectures (CGRAs) can provide both high energy efficiency and flexibility, making them well-suited for machine learning applications. However, previous work on CGRAs offers very limited support for deep neural networks (DNNs), especially for recent light-weight models such as depthwise separable convolution (DSC), which are an important workload for mobile environments. In this paper, we propose a set of architecture extensions and a mapping scheme to greatly enhance CGRA performance for DSC kernels. Our experimental results using MobileNets demonstrate that our proposed CGRA enhancement can deliver an 8∼18× improvement in area-delay product depending on layer type, over a baseline CGRA with a state-of-the-art CGRA compiler. Moreover, our proposed CGRA architecture can also speed up 3D convolution with similar efficiency to previous work, demonstrating the effectiveness of our architectural features beyond DSC layers.
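    The efficiency gap that motivates DSC support is visible from a quick MAC count: a depthwise separable convolution replaces one k×k convolution with a per-channel depthwise pass plus a 1×1 pointwise pass. The formulas below are the standard stride-1, same-padding counts; the example layer shape is an illustrative assumption, not taken from the paper.

```python
def conv_macs(h, w, cin, cout, k):
    """MACs for a standard k x k convolution (stride 1, 'same' padding)."""
    return h * w * cin * cout * k * k

def dsc_macs(h, w, cin, cout, k):
    """MACs for depthwise separable convolution:
    depthwise (k x k per input channel) + pointwise (1 x 1 across channels)."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise
```

For a hypothetical 56x56 feature map with 128 input/output channels and k=3, the standard convolution needs roughly 8x more MACs than its DSC counterpart, which is why a CGRA tuned for dense 3D convolution can be badly underutilized on DSC layers.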

    10.6 Smart Cities, Internet of Everything, Smart Consumer Electronics

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/FsH8KA9oW9cyhdN5r

    Session chair:
    Fabrizio Lamberti, Politecnico di Torino, IT

    Session co-chair:
    Himanshu Thapliyal, University of Kentucky, US

    This session covers a range of consumer electronics technologies relevant to the development of smart cities, supported by the ongoing Internet of Things revolution. In particular, the presentations tackle human-activity recognition, sensor fusion, device localization and tracking, as well as security aspects in vehicular networks, mostly backed by machine and deep learning solutions.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.6.1 (Best Paper Award Candidate)
    ORIGIN: ENABLING ON-DEVICE INTELLIGENCE FOR HUMAN ACTIVITY RECOGNITION USING ENERGY HARVESTING WIRELESS SENSOR NETWORKS
    Speaker:
    Cyan Subhra Mishra, Pennsylvania State University, US
    Authors:
    Cyan Subhra Mishra1, John (Jack) Sampson2, Mahmut Kandemir3 and Vijaykrishnan Narayanan4
    1The Pennsylvania State University, US; 2Penn State, US; 3PSU, US; 4Penn State University, US
    Abstract
    There is an increasing demand for performing machine learning tasks, such as human activity recognition (HAR), on emerging ultra-low-power internet of things (IoT) platforms. Recent works show substantial efficiency boosts from performing inference tasks directly on the IoT nodes rather than merely transmitting raw sensor data. However, the computation and power demands of deep neural network (DNN) based inference pose significant challenges when executed on the nodes of an energy-harvesting wireless sensor network (EH-WSN). Moreover, managing inferences requiring responses from multiple energy-harvesting nodes imposes challenges at the system level in addition to the constraints at each node. This paper presents a novel scheduling policy along with an adaptive ensemble learner to efficiently perform HAR on a distributed energy-harvesting body area network. Our proposed policy, Origin, strategically ensures efficient and accurate individual inference execution at each sensor node by using a novel activity-aware scheduling approach. It also leverages the continuous nature of human activity when coordinating and aggregating results from all the sensor nodes to improve final classification accuracy. Further, Origin proposes an adaptive ensemble learner to personalize the optimizations based on each individual user. Experimental results using two different HAR datasets show Origin, while running on harvested energy, to be at least 2.5% more accurate than a classical battery-powered energy-aware HAR classifier continuously operating at the same average power.
    08:15 CEST 10.6.2 A DEEP LEARNING APPROACH OF SENSOR FUSION INFERENCE AT THE EDGE
    Speaker:
    Thomas Becnel, First Author, US
    Authors:
    Thomas Becnel and Pierre-Emmanuel Gaillardon, University of Utah, US
    Abstract
    The advent of large scale urban sensor networks has enabled a paradigm shift of how we collect and interpret data. By equipping these sensor nodes with emerging low-power hardware accelerators, they become powerful edge devices, capable of locally inferring latent features and trends from their fused multivariate data. Unfortunately, traditional inference techniques are not well suited for operation in edge devices, or simply fail to capture many statistical aspects of these low-cost sensors. As a result, these methods struggle to accurately model nonlinear events. In this work, we propose a deep learning methodology that is able to infer unseen data by learning complex trends and the distribution of the fused time-series inputs. This novel hybrid architecture combines a multivariate Long Short-Term Memory (LSTM) branch and two convolutional branches to extract time-series trends as well as short-term features. By normalizing each input vector, we are able to magnify features and better distinguish trends between series. As a demonstration of the broad applicability of this technique, we use data from a currently deployed pollution monitoring network of low-cost sensors to infer hourly ozone concentrations at the device level. Results indicate that our technique greatly outperforms traditional linear regression techniques by 6× as well as state-of-the-art multivariate time-series techniques by 1.4× in mean squared error. Remarkably, we also show that inferred quantities can achieve lower variability than the primary sensors which produce the input data.
    08:30 CEST IP9_2.1 NEIGHBOR OBLIVIOUS LEARNING (NOBLE) FOR DEVICE LOCALIZATION AND TRACKING
    Speaker:
    Zichang Liu, Rice University, US
    Authors:
    Zichang Liu, Li Chou and Anshumali Shrivastava, Rice University, US
    Abstract
    On-device localization and tracking are increasingly crucial for various applications. Machine learning (ML) techniques are widely adopted along with the rapidly growing amount of data. However, during training, almost none of the ML techniques incorporate known structural information such as the floor plan, which can be especially useful in indoor or other structured environments. The problem is incredibly hard because the structural properties are not explicitly available, making most structural learning approaches inapplicable. Our method draws on intuitions from manifold learning. Whereas existing manifold methods utilize neighborhood information such as Euclidean distances, we quantize the output space to measure closeness on the structure. We propose Neighbor Oblivious Learning (NObLe) and demonstrate our approach's effectiveness on two applications: WiFi-based fingerprint localization and inertial measurement unit (IMU) based device tracking. We show that NObLe gives significant improvement over state-of-the-art prediction accuracy.
    08:31 CEST IP9_2.2 A LOW-COST BLE-BASED DISTANCE ESTIMATION, OCCUPANCY DETECTION, AND COUNTING SYSTEM
    Speaker:
    Florenc Demrozi, Computer Science Department, University of Verona, IT
    Authors:
    Florenc Demrozi1, Fabio Chiarani1 and Graziano Pravadelli2
    1Computer Science Department, University of Verona, IT; 2University of Verona, IT
    Abstract
    This article presents a low-cost system for distance estimation, occupancy counting, and presence detection based on Bluetooth Low Energy radio signal variation patterns that mitigates the limitations of existing approaches in terms of economic cost, privacy concerns, computational requirements, and lack of ubiquity. To explore the approach's effectiveness, exhaustive tests have been carried out on four different datasets by exploiting several pattern recognition models.
    08:32 CEST 10.6.3 REAL-TIME DETECTION AND LOCALIZATION OF DENIAL-OF-SERVICE ATTACKS IN HETEROGENEOUS VEHICULAR NETWORKS
    Speaker:
    Meenu Rani Dey, Indian Institute of Technology, Guwahati, IN
    Authors:
    Meenu Rani Dey1, Moumita Patra1 and Prabhat Mishra2
    1Indian Institute of Technology Guwahati, IN; 2University of Florida, US
    Abstract
    Vehicular communication has emerged as a powerful tool for providing a safe and comfortable driving experience for users. Long Term Evolution (LTE) supports and enhances the quality of vehicular communication due to its properties such as high data rate, spatial reuse, and low delay. However, the high mobility of vehicles introduces a wide variety of security threats, including Denial-of-Service (DoS) attacks. In this paper, we propose an effective solution for real-time detection and localization of DoS attacks in an LTE-based vehicular network with mobile network components (e.g., vehicles, femto access points, etc.). We consider malicious data transmission by vehicles in two ways -- using real identification (unintentional) and using fake identification. Our attack detection technique is based on a data packet counter and the average packet delivery ratio, which detects attacks faster and more efficiently than traditional approaches. We use a triangulation method for localizing the attacker, and analyze the average packet delay incurred by vehicles by modelling the system as an M/M/m queue. Simulation results demonstrate that our proposed technique significantly outperforms state-of-the-art techniques.

    10.7 Software and architectural techniques for dependability

    Date: Thursday, 04 February 2021
    Time: 08:00 CEST - 08:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/3AkYmayZ8dLRX3nnZ

    Session chair:
    Paolo Rech, UFRGS, BR

    Session co-chair:
    Savino Alessandro, Politecnico di Torino, IT

    This session presents dependability solutions at different abstraction levels, from the architecture up to the software layer. The solutions range from traditional replication to approximation, and are implemented at different levels (compiler, operating system, and architecture) with low overheads.

    Time Label Presentation Title
    Authors
    08:00 CEST 10.7.1 CHITIN: A COMPREHENSIVE IN-THREAD INSTRUCTION REPLICATION TECHNIQUE AGAINST TRANSIENT FAULTS
    Speaker:
    Hwisoo So, Yonsei University, KR
    Authors:
    HwiSoo So1, Moslem Didehban2, Jinhyo Jung1, Aviral Shrivastava3 and Kyoungwoo Lee1
    1Yonsei University, KR; 2Cadence Design Systems, US; 3Arizona State University, US
    Abstract
    Soft errors have become one of the most important design concerns due to drastic technology scaling. Software-based error detection techniques are attractive due to their flexibility and hardware independence. However, our in-depth analysis reveals that the state-of-the-art techniques in the area cannot provide comprehensive fault coverage: i) their control-flow protection schemes provide incomplete redundancy of original instructions, ii) they do not protect function calls and returns, and iii) their instruction scheduling leaves many vulnerabilities open. In this paper, we propose CHITIN, a set of code transformations for soft-error resilience that adopts the load-back checking scheme of nZDC, an improved version of the SWIFT-like control-flow protection scheme, and contiguous scheduling of the original and redundant instructions to dramatically reduce the vulnerability to soft errors that disrupt the control flow. Our fault injection experiments demonstrate that CHITIN can eliminate more than 89% of the silent data corruptions observed with state-of-the-art solutions.
    08:15 CEST 10.7.2 ESTIMATION OF LINUX KERNEL EXECUTION PATH UNCERTAINTY FOR SAFETY SOFTWARE TEST COVERAGE
    Speaker:
    Imanol Allende, Ikerlan, ES
    Authors:
    Imanol Allende1, Nicholas Mc Guire2, Jon Perez3, Lisandro Gabriel Monsalve1, Javier Fernández4 and Roman Obermaisser5
    1Ikerlan Technology Research Centre, ES; 2Opentech EDV Research GmbH, AT; 3Ikerlan, ES; 4Ikerlan Technological Research Center & Universitat Politécnica de Catalunya, ES; 5University of Siegen, DE
    Abstract
    With the advent of next-generation safety-related systems, different industries face multiple challenges in ensuring the safe operation of these systems according to traditional safety and assurance techniques. The increasing complexity that characterizes these systems hampers the maximum achievable test coverage during system verification and, consequently, it often results in untested behaviors that hinder safety assurance and represent potential risk sources during system operation. In the context of paving the way towards quantifying the risks caused by software malfunction and, hence, towards the safety-compliance of next-generation safety-related systems, this paper studies and provides a method to estimate the probability of Linux kernel execution paths that remain unobserved during the test campaign.
    08:30 CEST IP8_6.1 AUTOMATED SOFTWARE COMPILER TECHNIQUES TO PROVIDE FAULT TOLERANCE FOR REAL-TIME OPERATING SYSTEMS
    Speaker:
    Benjamin James, Brigham Young University, US
    Authors:
    Benjamin James and Jeffrey Goeders, Brigham Young University, US
    Abstract
    In this work we explore applying automated software fault-tolerance techniques to protect a Real-Time Operating System (RTOS), and present experimental results showing that the protected programs can achieve anywhere from a 1.3x to 257x improvement in mean work to failure (MWTF).
    08:31 CEST IP8_6.2 EXPLORING DEEP LEARNING FOR IN-FIELD FAULT DETECTION IN MICROPROCESSORS
    Speaker:
    Stefano Di Carlo, Politecnico di Torino, IT
    Authors:
    Simone Dutto, Alessandro Savino and Stefano Di Carlo, Politecnico di Torino, IT
    Abstract
    Nowadays, due to technology enhancement, faults increasingly compromise all kinds of computing machines, from servers to embedded systems. Recent advances in machine learning are opening new opportunities to achieve fault detection by inspecting hardware metrics, thus avoiding the use of heavy software techniques or product-specific error-reporting mechanisms. This paper investigates the capability of different deep learning models, trained on data collected through simulation-based fault injection, to generalize over different software applications.
    08:32 CEST 10.7.3 RELIABILITY-AWARE QUANTIZATION FOR ANTI-AGING NPUS
    Speaker:
    Sami Salamin, Karlsruhe Institute of Technology, DE
    Authors:
    Sami Salamin1, Georgios Zervakis1, Ourania Spantidi2, Iraklis Anagnostopoulos2, Joerg Henkel3 and Hussam Amrouch4
    1Karlsruhe Institute of Technology, DE; 2Southern Illinois University Carbondale, US; 3KIT, DE; 4University of Stuttgart, DE
    Abstract
    Transistor aging is one of the major concerns that challenge designers in advanced technologies. It profoundly degrades the reliability of circuits during their lifetime, as it slows down transistors, resulting in errors due to timing violations unless large guardbands are included, which leads to considerable performance losses. When it comes to Neural Processing Units (NPUs), where increasing the inference speed is the primary goal, such performance losses cannot be tolerated. In this work, we are the first to propose a reliability-aware quantization that eliminates aging effects in NPUs while completely removing guardbands. Our technique delivers a graceful inference accuracy degradation over time while compensating for the aging-induced delay increase of the NPU. Our evaluation, over ten state-of-the-art neural network architectures trained on the ImageNet dataset, demonstrates that over an entire lifetime of 10 years, the average accuracy loss is merely 3%. In the meantime, our technique achieves 23% higher performance due to the elimination of the aging guardband.

    11.2 Executive track: Silicon Photonics & Optical Computing

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 10:40 CEST

    Session chair:
    Twan Korthorst, Synopsys, US

    Session co-chair:
    Giovanni De Micheli, EPFL, CH

    The gap between memory and xPU bandwidth has remained constant at 2-3X over the last two decades and, by now, there is a 1,000X gap between what is possible and what is required by applications such as AI. N5/N3 and Wafer-Scale Integration (WSI) represent the extreme attempt to keep classical Moore’s Law afloat but, like fossil energy, this attempt is not sustainable; there is not much room for improvement beyond 12” wafers and 3-nanometer CMOS. We are approaching an inflection point: an innovation tsunami is looming. Silicon photonics may leverage the existing semiconductor technology and its supply chain to provide the foundation for Optical Computing, which is faster than Electronic Computing and consumes less power, and whose roadmap neither relies on nanometer manufacturing nor suffers from its complexity. Most optical computing research aims at replacing electronic components with optical equivalents, building an optical digital computer that processes binary data. While this approach appears to offer the best short-term commercial prospects for optical computing, since optical components could be integrated into traditional computers to produce an opto-electronic hybrid, these devices use 30% of their energy converting electronic energy into photons and back. More unconventional research aims at building all-optical computers that eliminate the need for optical-electrical-optical (OEO) conversions.

    Time Label Presentation Title
    Authors
    09:00 CEST 11.2.1 SILICON PHOTONICS & OPTICAL COMPUTING: CHAIR’S INTRODUCTION
    Speaker and Author:
    Twan Korthorst, Synopsys, US
    Abstract
    A 10-minute introduction about silicon photonics and optical computing.
    09:10 CEST 11.2.2 PROGRAMMABLE PHOTONIC CIRCUITS FOR LINEAR PROCESSING
    Speaker and Author:
    Wim Bogaerts, Ghent University & IMEC, BE
    Abstract
    Photonic circuits are an ideal platform for implementing complex interferometers, which perform linear transformations on coherent optical signals. Such linear transformations, which are computationally equivalent to matrix-vector multiplications or multiply-accumulate operations, are at the core of many signal processing algorithms, neuromorphic computing paradigms, and quantum information processing. These photonic circuits can be made programmable by electronically reconfiguring the weights and phases in the interferometer network, and this provides opportunities for massive acceleration of certain computational functions in the optical domain. We will discuss the current developments in such programmable photonics, and the technology stack needed to realize fully functional accelerators or even general-purpose photonic processors. Speaker’s Bio: Wim Bogaerts is a professor in Silicon Photonics at Ghent University and IMEC, Belgium. He pioneered wafer-scale fabrication of silicon photonics, and his current research focuses on the challenges of large-scale photonic circuits, including design challenges such as variability. He co-founded Luceda Photonics in 2014 to market the design tools developed at Ghent University and IMEC. Today, he is again a full-time academic researcher, with a consolidator grant from the European Research Council (ERC). His main research focus today is on large-scale programmable photonic circuits. He is a Fellow of the IEEE and a senior member of OSA and SPIE.
    09:30 CEST 11.2.3 UNLOCKING TRANSFORMATIVE AI WITH PHOTONIC COMPUTING
    Speaker and Author:
    Laurent Daudet, LightOn, FR
    Abstract
    Recent large-scale AI models, such as OpenAI’s GPT-3 model for NLP, may have an even deeper economic and societal impact than Deep Learning had in the last decade. However, training such models requires massive amounts of computing resources, already challenging the capacity of some of the largest supercomputing architectures. In this talk I will present LightOn’s view on how future AI hardware should be designed to address some of the hardest computing challenges, such as language models, recommender systems, or big science. In particular, I will demonstrate how LightOn Optical Processing Units (OPUs) can be seamlessly integrated into a variety of hybrid photonics/silicon pipelines implementing state-of-the-art Machine Learning algorithms. Speaker’s Bio: Laurent Daudet is CTO at LightOn, a startup he co-founded in 2016, where he manages cross-disciplinary R&D projects involving machine learning, optics, signal processing, electronics, and software engineering. Laurent is a recognized expert in signal processing and wave physics, and is currently on leave from his position as Professor of Physics at Paris Diderot University, Paris. Prior to that, or in parallel, he has held various academic positions: fellow of the Institut Universitaire de France, associate professor at Universite Pierre et Marie Curie, Visiting Senior Lecturer at Queen Mary University of London, UK, and Visiting Professor at the National Institute for Informatics in Tokyo, Japan. Laurent has authored or co-authored more than 200 scientific publications, has been a consultant to various small and large companies, and is a co-inventor on several patents. He is a graduate in physics from Ecole Normale Superieure in Paris and holds a PhD in Applied Mathematics from Marseille University.
    09:50 CEST 11.2.4 NO BOTTLENECKS ALLOWED
    Speaker and Author:
    Eyal Cohen, CogniFiber, IL
    Abstract
    Like many photonic computing start-ups, CogniFiber strives to accelerate computing by using photons instead of electrons. But in the general HW market, and AI HW in particular, the elephant in the room is still largely ignored: embedding a light-speed chip in a system 1000-fold slower than it might accelerate specific phases and yield a 10x system-level acceleration, but such solutions do not fully utilize the advantages of photonic computing. At CogniFiber we took it upon ourselves to invent, design and build a “No-Bottleneck” system where all computations from start to end are done by light. This quasi-all-optical AI system will reach 1000x system-level acceleration in our coming products. In the talk we will present the conceptual difference between CogniFiber and other photonics endeavors and update on our recent achievements. Speaker’s Bio: Dr. Eyal Cohen is the Co-Founder and CEO of CogniFiber. Inspired by the progress in photonics and recent developments in deep learning networks, Eyal and his partner Prof. Zeev Zalevsky (Bar Ilan University) set out to harness the benefits of light-carried information in the new era of photonic computing. Following a unique set of inventions, they raised a $2.5M seed investment in early 2019 and rapidly built an outstanding R&D team that completed a successful PoC system with profound implications. CogniFiber is raising an $18M funding round and will solidify its position as a leading company in photonic computing, with first products expected in early 2023. Eyal holds a B.Sc. in Electrical Engineering and a B.A. in Biology (Technion, both Cum Laude), and an M.Sc. and Ph.D. in Neuroscience (Weizmann Institute), with experience as a HW engineer (Oren Semiconductor, Mellanox, Saifun Semiconductor) and in ML algorithms (M.S. Tech). Eyal has published several papers both as a neuroscientist and as a photonics researcher.
    10:10 CEST 11.2.5 LIVE JOINT Q&A
    Authors:
    Twan Korthorst1, Wim Bogaerts2, Laurent Daudet3 and Eyal Cohen4
    1Synopsys, US; 2Ghent University & IMEC, BE; 3LightOn, FR; 4CogniFiber, IL
    Abstract
    20 minutes of live joint question and answer time for interaction among speakers and audience.

    IP.ASD_1 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/DoZTZhX3vc3YiysKh

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP.ASD_1.1 DECENTRALIZED AUTONOMOUS ARCHITECTURE FOR RESILIENT CYBER-PHYSICAL PRODUCTION SYSTEMS
    Speaker:
    Laurin Prenzel, TU Munich, DE
    Authors:
    Laurin Prenzel and Sebastian Steinhorst, TU Munich, DE
    Abstract
    Real-time decision-making is a key element in the transition from Reconfigurable Manufacturing Systems to Autonomous Manufacturing Systems. In Cyber-Physical Production Systems (CPPS) and Cloud Manufacturing, most decision-making algorithms are either centralized, creating vulnerabilities to failures, or decentralized, struggling to reach the performance of the centralized counterparts. In this paper, we combine the performance of centralized optimization algorithms with the resilience of a decentralized consensus. We propose a novel autonomous system architecture for CPPS featuring an automatic production plan generation, a functional validation, and a two-stage consensus algorithm, combining a majority vote on safety and optimality, and a unanimous vote on feasibility and authenticity. The architecture is implemented in a simulation framework. In a case study, we exhibit the timing behavior of the configuration procedure and subsequent reconfiguration following a device failure, showing the feasibility of a consensus-based decision-making process.
    IP.ASD_1.2 PROVABLY ROBUST MONITORING OF NEURON ACTIVATION PATTERNS
    Speaker and Author:
    Chih-Hong Cheng, DENSO AUTOMOTIVE Deutschland GmbH, DE
    Abstract
    For deep neural networks (DNNs) to be used in safety-critical autonomous driving tasks, it is desirable to monitor at operation time whether the input to the DNN is similar to the data used in DNN training. While recent results in monitoring DNN activation patterns provide a sound guarantee, because an abstraction is built from the training data set, reducing false positives due to slight input perturbations has been an issue in successfully adopting these techniques. We address this challenge by integrating formal symbolic reasoning inside the monitor construction process. The algorithm performs a sound worst-case estimate of neuron values with inputs (or features) subject to perturbation, before the abstraction function is applied to build the monitor. The provable robustness is further generalized to cases where monitoring a single neuron can use more than one bit, implying that one can record activation patterns with a fine-grained decision on the neuron value interval.

    IP8_1 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/TfXgowugJaNsDihjz

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_1.1 PREDICTION OF THERMAL HAZARDS IN A REAL DATACENTER ROOM USING TEMPORAL CONVOLUTIONAL NETWORKS
    Speaker:
    Mohsen Seyedkazemi Ardebili, University of Bologna, IT
    Authors:
    Mohsen Seyedkazemi Ardebili1, Marcello Zanghieri1, Alessio Burrello2, Francesco Beneventi3, Andrea Acquaviva4, Luca Benini5 and Andrea Bartolini1
    1University of Bologna, IT; 2Department of Electric and Eletronic Engineering, University of Bologna, IT; 3DEI - University of Bologna, IT; 4Politecnico di Torino, IT; 5Università di Bologna and ETH Zurich, IT
    Abstract
    Datacenters play a vital role in today's society. At large, a datacenter room is a complex controlled environment composed of thousands of computing nodes, which consume kW of power. To dissipate the power, forced air/liquid flow is employed, at a cost of millions of euros per year. Reducing this cost involves free-cooling and average-case design, which can create a cooling shortage and thermal hazards. When a thermal hazard happens, the system administrators and the facility manager must stop production to avoid IT equipment damage and wear-out. In this paper, we study the signatures of thermal hazards in a Tier-0 datacenter room's monitored data during a full year of production. We define a set of rules for detecting thermal hazards based on the inlet and outlet temperatures of all nodes of a room. We then propose a custom Temporal Convolutional Network (TCN) to predict the hazards in advance. The results show that our TCN can predict thermal hazards with an F1-score of 0.98 for a randomly sampled test set. When causality is enforced between the training and validation sets, the F1-score drops to 0.74, calling for in-place online re-training of the network and motivating further research in this context.
    IP8_1.2 SYSTEM LEVEL VERIFICATION OF PHASE-LOCKED LOOP USING METAMORPHIC RELATIONS
    Speaker:
    Muhammad Hassan, DFKI GmbH, DE
    Authors:
    Muhammad Hassan1, Daniel Grosse2 and Rolf Drechsler3
    1Cyber Physical Systems, DFKI, DE; 2Johannes Kepler University Linz, AT; 3University of Bremen/DFKI, DE
    Abstract
    In this paper we build on Metamorphic Testing (MT), a verification technique which has been employed very successfully in the software domain. The core idea is to uncover bugs by relating consecutive executions of the program under test. Recently, MT has also been applied successfully to the verification of Radio Frequency (RF) amplifiers at the system level. However, this is clearly not sufficient, as the true complexity stems from Analog/Mixed-Signal (AMS) systems. In this paper, we go beyond pure analog systems, i.e., we expand MT to verify AMS systems. As a challenging AMS system, we consider an industrial PLL. We devise a set of eight generic Metamorphic Relations (MRs). These MRs allow us to verify the PLL behavior at the component level and at the system level. Therefore, we have created MRs covering analog-to-digital as well as digital-to-digital behavior. We found a critical bug in the industrial PLL, which clearly demonstrates the quality and potential of MT for AMS verification.

    IP8_2 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/5BFi7JiSbihLBStp3

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_2.1 FORMULATION OF DESIGN SPACE EXPLORATION PROBLEMS BY COMPOSABLE DESIGN SPACE IDENTIFICATION
    Speaker:
    Rodolfo Jordão, KTH Royal Institute of Technology, SE
    Authors:
    Rodolfo Jordão, Ingo Sander and Matthias Becker, KTH Royal Institute of Technology, SE
    Abstract
    Design space exploration (DSE) is a key activity in embedded system design methodologies and can be supported by well-defined models of computation (MoCs) and predictable platform architectures. The original design model, covering the application models, platform models and design constraints, needs to be converted into a form analyzable by computer-aided decision procedures such as mathematical programming or genetic algorithms. This conversion is the process of design space identification (DSI), which becomes very challenging if the design domain comprises several MoCs and platforms. For a systematic solution to this problem, separation of concerns between the design domain and the decision domain is of key importance. We propose in this paper a systematic DSI scheme that is (a) composable, as it enables the stepwise and simultaneous extension of both the design and decision domains, and (b) tuneable, because it also enables different DSE solving techniques given the same design model. We exemplify this DSI scheme with an illustrative example that demonstrates the mechanisms for composition and tuning. Additionally, we show how different compositions can lead to the same decision model as an important property of this DSI scheme.
    IP8_2.2 RTL DESIGN FRAMEWORK FOR EMBEDDED PROCESSOR BY USING C++ DESCRIPTION
    Speaker:
    Eiji Yoshiya, Information and Communications Engineering, Tokyo Institute of Technology., JP
    Authors:
    Eiji Yoshiya, Tomoya Nakanishi and Tsuyoshi Isshiki, Tokyo Institute of Technology, JP
    Abstract
    In this paper, we propose a method to directly describe the RTL structure of a pipelined RISC-V processor with cache, memory management unit (MMU) and AXI bus interface using the C++ language. This processor C++ model serves as a near cycle-accurate simulation model of the RISC-V core, while our C2RTL framework translates the processor C++ model into a cycle-accurate RTL description in Verilog-HDL and an RTL-equivalent C model. Our design methodology is unique compared to other existing methodologies since both the simulation model and the RTL model are derived from the same C++ source, which greatly simplifies the design verification and optimization processes. The effectiveness of our design methodology is demonstrated on a RISC-V processor which runs Linux on an FPGA board, as well as by the significantly shorter simulation times of the original C++ processor model and the RTL-equivalent C model compared to a commercial RTL simulator.

    IP8_3 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/7nyhLxRAuHBfPuecR

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_3.1 SEQUENTIAL LOGIC ENCRYPTION AGAINST MODEL CHECKING ATTACK
    Speaker:
    Amin Rezaei, California State University, Long Beach, US
    Authors:
    Amin Rezaei1 and Hai Zhou2
    1California State University, Long Beach, US; 2Northwestern University, US
    Abstract
    Due to high IC design costs and the emergence of countless untrusted foundries, logic encryption has been taken into consideration more than ever. In state-of-the-art logic encryption works, a lot of performance is sacrificed to guarantee security against both the SAT-based and the removal attacks. However, the SAT-based attack cannot decrypt sequential circuits if the scan chain is protected or if unreachable-states encryption is adopted. Instead, these security schemes can be defeated by the model checking attack, which iteratively searches for input sequences that put the activated IC into the desired reachable state. In this paper, we propose a practical logic encryption approach to defend against the model checking attack on sequential circuits. The robustness of the proposed approach is demonstrated by experiments on around fifty benchmarks.
    IP8_3.2 RISK-AWARE COST-EFFECTIVE DESIGN METHODOLOGY FOR INTEGRATED CIRCUIT LOCKING
    Speaker:
    Yinghua Hu, University of Southern California, US
    Authors:
    Yinghua Hu, Kaixin Yang, Subhajit Dutta Chowdhury and Pierluigi Nuzzo, University of Southern California, US
    Abstract
    We introduce a systematic framework for logic locking of integrated circuits based on the analysis of the sources of information leakage from both the circuit and the locking scheme and their formalization into a notion of risk that can guide the design against existing and possible future attacks. We further propose a two-level optimization-based methodology to generate locking strategies minimizing a cost function and balancing security, risk, and implementation overhead, out of a collection of locking primitives. Optimization results on a set of case studies show the potential of layering multiple locking primitives to provide high security at significantly lower risk.

    IP8_4 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/AjWT3m2j7jagtK8w5

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_4.1 THERMAL-AWARE DESIGN AND MANAGEMENT OF EMBEDDED REAL-TIME SYSTEMS
    Speaker and Author:
    Youngmoon Lee, Hanyang University, KR
    Abstract
    Modern embedded systems face challenges in managing on-chip temperature as they are increasingly realized on powerful systems-on-chip. This paper presents thermal-aware design and management of embedded systems by tightly coupling two mechanisms: a thermal-aware utilization bound and real-time dynamic thermal management. The former provides the processor utilization upper bound to meet the chip temperature constraint, which depends not only on the system configuration and workloads but also on the chip cooling capacity and environment. The latter adaptively optimizes the rates of individual task executions subject to the thermal-aware utilization bound. Our experiments on an automotive controller demonstrate the thermal-aware utilization bound and an 18.2% improvement in system utilization compared with existing approaches.
    IP8_4.2 OPTIMIZED MULTI-MEMRISTOR MODEL BASED LOW ENERGY AND RESILIENT CURRENT-MODE MULTIPLIER DESIGN
    Speaker:
    Shengqi Yu, Newcastle University, GB
    Authors:
    Shengqi Yu1, Rishad Shafik2, Thanasin Bunnam2, Kaiyun Chen2 and Alex Yakovlev2
    1Newcastle University, GB; 2Newcastle University, GB
    Abstract
    Multipliers are central to modern arithmetic-heavy applications, such as signal processing and artificial intelligence (AI). However, the complex logic chain in conventional multipliers, particularly due to cascaded carry propagation circuits, contributes to high energy and performance costs. This paper proposes a novel current-mode multiplier design that reduces the carry propagation chain and improves the current amplification. Fundamental to this design is a one-transistor multiple-memristors (1TxM) cell architecture. In each cell, a transistor can be switched ON/OFF to determine the cell selection, while memristor states determine the corresponding cell output current when selected. The high/low resistive states as well as the biasing configurations in each memristor are suitably optimized through a new memristor model. Depending on the significance of the cell current path, the number of memristors in each cell is determined to achieve the required amplification. Consequently, the design reduces the need for current mirror circuits in each current path, while also ensuring high resilience in transitional bias voltages. Parallel cell currents are then directed to a common current accumulation path to generate the multiplier output without requiring any carry propagation chain. We carried out a wide range of experiments to extensively validate our multiplier design in the Cadence Virtuoso analogue design environment for functional and parametric properties. The results show that the proposed multiplier reduces latency by up to 84.9% and energy cost by up to 98.5% when compared with recently proposed approaches.

    IP8_5 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/jf88jCfEaHLEWb8zX

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_5.1 A 93 TOPS/WATT NEAR-MEMORY RECONFIGURABLE SAD ACCELERATOR FOR HEVC/AV1/JEM ENCODING
    Speaker:
    Jainaveen Sundaram Priya, Intel Corporation, US
    Authors:
    Jainaveen Sundaram Priya1, Srivatsa Rangachar Srinivasa2, Dileep Kurian3, Indranil Chakraborty4, Sirisha Kale1, Nilesh Jain1, Tanay Karnik5, Ravi Iyer5 and Anuradha Srinivasan6
    1Intel Corporation, US; 2Intel Labs, US; 3Intel technologies, IN; 4Purdue University, US; 5Intel, US; 6Intel, IN
    Abstract
    Motion Estimation (ME) is a major bottleneck of the video encoding pipeline. This paper presents a low-power near-memory Sum of Absolute Differences (SAD) accelerator for ME. The accelerator is composed of 64 modular SAD Processing Elements (PEs) on a reconfigurable fabric, offering maximal parallelism to support traditional and future Rate-Distortion-Optimization (RDO) schemes consistent with HEVC/AV1/JEM. The accelerator offers up to 55% speedup over state-of-the-art accelerators and a 7x speedup when compared to a 12-core Intel Xeon E5 processor. Our solution achieves 93 TOPS/Watt running at 500 MHz, capable of processing real-time 4K 30fps video. Synthesized in a 22nm process, the accelerator occupies 0.08 mm2 and consumes 5.46 mW of dynamic power.
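As a minimal illustration of the kernel this accelerator parallelizes (not the paper's hardware design), the SAD score and its use in block matching can be sketched as follows; the block values and motion vectors are made-up examples:

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equal-sized pixel blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

# Motion estimation evaluates SAD for every candidate block in the search
# window and keeps the motion vector with the smallest score.
current = [10, 12, 11, 9]
candidates = {(0, 0): [10, 12, 11, 9],   # perfect match
              (1, 0): [20, 5, 3, 7]}     # poor match
best = min(candidates, key=lambda mv: sad(current, candidates[mv]))
```

Each PE in the accelerator computes one such absolute-difference accumulation; 64 of them run in parallel over the candidate blocks.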
    IP8_5.2 TRIPLE FIXED-POINT MAC UNIT FOR DEEP LEARNING
    Speaker:
    Madis Kerner, Taltech, EE
    Authors:
    Madis Kerner1, Kalle Tammemäe1, Jaan Raik2 and Thomas Hollstein2
    1Taltech, EE; 2Tallinn University of Technology, EE
    Abstract
    Deep Learning (DL) algorithms have proved to be successful in various domains. Typically, the models use Floating-Point (FP) numeric formats and are executed on Graphical Processing Units (GPUs). However, Field Programmable Gate Arrays (FPGAs) are more energy-efficient and, therefore, a better platform for resource-constrained devices. As an FP design infers many FPGA resources, it is replaced with quantized fixed-point implementations in the state of the art. The loss of precision is mitigated by dynamically adjusting the radix point on network layers, reconfiguration, and re-training. In this paper, we present the first Triple Fixed-Point (TFxP) architecture, which provides the computational precision of FP while using significantly fewer hardware resources and does not need network re-training. The novel TFxP format is introduced based on a comparison of FP and existing Fixed-Point (FxP) implementations, combined with a detailed precision analysis of YOLOv2 weights and activation values.
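To see the trade-off the paper addresses, a plain single-range fixed-point quantizer (not the paper's triple format; bit widths here are arbitrary choices for illustration) can be sketched as:

```python
def to_fixed(x, frac_bits, total_bits=16):
    """Quantize a real value to signed fixed point with frac_bits fractional
    bits, saturating at the representable range."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def from_fixed(q, frac_bits):
    """Recover the real value represented by a fixed-point integer."""
    return q / (1 << frac_bits)

# More fractional bits give finer resolution but a smaller representable
# range -- the tension that radix-point adjustment (and TFxP) targets.
x = 0.3141
err = abs(from_fixed(to_fixed(x, 12), 12) - x)
```

A single radix point must serve both small activations and large weights; formats with multiple ranges relax that constraint without re-training.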

    IP8_6 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 09:00 CEST - 09:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/kJomE5nL9WxaaioJf

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP8_6.1 AUTOMATED SOFTWARE COMPILER TECHNIQUES TO PROVIDE FAULT TOLERANCE FOR REAL-TIME OPERATING SYSTEMS
    Speaker:
    Benjamin James, Brigham Young University, US
    Authors:
    Benjamin James and Jeffrey Goeders, Brigham Young University, US
    Abstract
    In this work we explore applying automated software fault-tolerance techniques to protect a Real-Time Operating System (RTOS) and present experimental results showing that these programs can achieve anywhere from 1.3x–257x improvement in MWTF.
    IP8_6.2 EXPLORING DEEP LEARNING FOR IN-FIELD FAULT DETECTION IN MICROPROCESSORS
    Speaker:
    Stefano Di Carlo, Politecnico di Torino, IT
    Authors:
    Simone Dutto, Alessandro Savino and Stefano Di Carlo, Politecnico di Torino, IT
    Abstract
    Nowadays, due to technology enhancement, faults are increasingly compromising all kinds of computing machines, from servers to embedded systems. Recent advances in machine learning are opening new opportunities to achieve fault detection by inspecting hardware metrics, thus avoiding the use of heavyweight software techniques or product-specific error-reporting mechanisms. This paper investigates the capability of different deep learning models trained on data collected through simulation-based fault injection to generalize over different software applications.

    11.1 Safety Assurance of Autonomous Vehicles

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/WymRmxKbnn387f8Di

    Session chair:
    Sebastian Steinhorst, TU Munich, DE

    Session co-chair:
    Simon Schliecker, Volkswagen AG, DE

    Organizers:
    Rolf Ernst, TU Braunschweig, DE
    Selma Saidi, TU Dortmund, DE

    Safety of autonomous vehicles is a core requirement for their social acceptance. Hence, this session introduces three technical perspectives on this important field. The first paper presents a statistics-based view on risk taking when passing by pedestrians such that automated decisions can be taken with probabilistic reasoning. The second paper proposes a hardening of image classification for highly automated driving scenarios by identifying the similarity between target classes. The third paper improves the computational efficiency of safety verification of deep neural networks by reusing existing proof artifacts.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.1.1 AUTOMATED DRIVING SAFETY - THE ART OF CONSCIOUS RISK TAKING - MINIMUM LATERAL DISTANCES TO PEDESTRIANS
    Speaker:
    Bert Böddeker, private, DE
    Authors:
    Bert Böddeker1, Wilhard von Wendorff2, Nam Nguyen3, Peter Diehl4, Roland Meertens5 and Rolf Johannson6
    1private, DE; 2SGS-TÜV Saar GmbH, DE; 3Hochschule München für angewandte Wissenschaften, DE; 4Private, DE; 5private, NL; 6Private, SE
    Abstract
    The announced release dates for Automated Driving Systems (ADS) with conditional (SAE-L3) and high (SAE-L4) levels of automation according to SAE J3016 are getting closer. Still, there is no established state of the art for proving the safety of these systems. ISO 26262 for automotive functional safety remains valid for these systems but only covers risks from malfunctions of electric and electronic (E/E) systems. A framework for considering issues caused by weaknesses of the intended functionality itself is standardized in the upcoming release of ISO 21448 - Safety of the Intended Functionality (SOTIF). Rich experience regarding the limitations of the safety performance of complex sensors can be found in this standard. In this paper, we highlight another aspect of SOTIF that becomes important for higher levels of automation, especially in urban areas: ‘conscious risk taking’. In traditional automotive systems, the resolution of conflicting goals is generally left to the car driver. With SAE-L3, or at the latest SAE-L4 ADS, the driver is no longer available for such decisions. Even ‘safe drivers’ do not use the safest possible driving behavior. In the example of occlusions next to the street, a driver balances the risk of occluded pedestrians against the speed of the traffic flow. Our aim is to make such decisions explicit and sufficiently safe. Using the example of crossing pedestrians, we show how to use statistics to derive a conscious, quantitative, risk-based decision from a previously defined acceptance criterion. The acceptance criterion is derived from accident statistics involving pedestrians.
    09:45 CEST 11.1.2 ON SAFETY ASSURANCE CASE FOR DEEP LEARNING BASED IMAGE CLASSIFICATION IN HIGHLY AUTOMATED DRIVING
    Speaker:
    Himanshu Agarwal, HELLA GmbH & Co. KGaA, Lippstadt, Germany and Carl von Ossietzky University Oldenburg, Germany, DE
    Authors:
    Himanshu Agarwal1, Rafal Dorociak2 and Achim Rettberg3
    1HELLA GmbH & Co. KGaA and Carl von Ossietzky University Oldenburg, DE; 2HELLA GmbH & Co. KGaA, DE; 3University of Applied Science Hamm-Lippstadt & University Oldenburg, DE
    Abstract
    Assessing the overall accuracy of a deep learning classifier is not a sufficient criterion to argue for the safety of classification-based functions in highly automated driving. The causes of deviation from the intended functionality must also be rigorously assessed. In the context of functions related to image classification, one such cause is the failure, during implementation, to take into account the classifier's vulnerability to misclassification due to high similarity between the target classes. In this paper, we emphasize that while developing the safety assurance case for such functions, the argumentation over the appropriate implementation of the functionality must also address the vulnerability to misclassification due to class similarities. Using the traffic sign classification function as our case study, we propose to aid the development of its argumentation by: (a) conducting a systematic investigation of the similarity between the target classes, (b) assigning a corresponding classifier vulnerability rating to every possible misclassification, and (c) ensuring that the claims against the misclassifications that induce higher risk (scored on the basis of vulnerability and severity) are supported with more compelling sub-goals and evidence than the claims against misclassifications that induce lower risk.
    10:00 CEST 11.1.3 CONTINUOUS SAFETY VERIFICATION OF NEURAL NETWORKS
    Speaker:
    Rongjie Yan, Institute of Software, Chinese Academy of Sciences, CN
    Authors:
    Chih-Hong Cheng1 and Rongjie Yan2
    1DENSO AUTOMOTIVE Deutschland GmbH, Eching, Germany, DE; 2Institute of Software, Chinese Academy of Sciences, CN
    Abstract
    Deploying deep neural networks (DNNs) as core functions in autonomous driving creates unique verification and validation challenges. In particular, the continuous engineering paradigm of gradually perfecting a DNN-based perception can invalidate previously established safety verification results. This can occur either due to newly encountered examples (i.e., input domain enlargement) inside the Operational Design Domain or due to subsequent parameter fine-tuning of a DNN. This paper considers approaches to transfer results established in a previous DNN safety verification problem to the modified problem setting. By considering the reuse of state abstractions, network abstractions, and Lipschitz constants, we develop several sufficient conditions that require formally analyzing only a small part of the DNN in the new problem. The overall concept is evaluated on a 1/10-scale vehicle equipped with a DNN controller that determines the visual waypoint from the perceived image.
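One of the reused artifacts, a Lipschitz constant, supports verification transfer in a simple way. The following sketch shows the generic robustness bound (a simplification; the paper's actual sufficient conditions are more refined), with made-up margin and constant values:

```python
def certified_radius(margin, lipschitz):
    """Radius within which a classification decision cannot flip.

    If each class score is L-Lipschitz in the input, the gap between the
    top two scores changes by at most 2*L*||dx|| under a perturbation dx,
    so an output margin m is safe for all ||dx|| < m / (2*L).
    """
    return margin / (2.0 * lipschitz)

# Reuse across fine-tuning: if the updated network's constant L' is no
# larger than the old L, every radius certified with L remains valid and
# need not be re-verified from scratch.
r = certified_radius(margin=1.0, lipschitz=4.0)
```

A larger Lipschitz constant after fine-tuning shrinks the certified radius, which is when re-analysis of the affected part of the DNN becomes necessary.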

    11.3 Trust in Manufacturing

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/y7nFJxtuyX6ht7yvE

    Session chair:
    David Hély, LCIS/G-INP, FR

    Session co-chair:
    Heyszl Johann, Fraunhofer AISEC, DE

    This session covers topics on ensuring trust in manufacturing processes. The papers in the session present novel techniques based on using sensors and switching activity for identification of hardware devices and recycled ICs. In addition, detection of Hardware Trojans in SoCs using state-of-the-art methods in machine learning are also discussed in this session.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.3.1 HTNET: TRANSFER LEARNING FOR GOLDEN CHIP-FREE HARDWARE TROJAN DETECTION
    Speaker:
    Sina Faezi, University of California, Irvine, US
    Authors:
    Sina Faezi1, Rozhin Yasaei2 and Mohammad Al Faruque2
    1University of California, Irvine, US; 2University of California Irvine, US
    Abstract
    Design and fabrication outsourcing has made integrated circuits (ICs) vulnerable to malicious modifications by third parties, known as hardware Trojans (HTs). Over the last decade, the use of side-channel measurements for detecting malicious manipulation of ICs has been extensively studied. However, the suggested approaches often suffer from three major limitations: reliance on a trusted identical chip (i.e., a golden chip), untraceable footprints of subtle hardware Trojans that remain inactive during the testing phase, and the need to identify the best discriminative features for separating side-channel signals coming from HT-free and HT-infected circuits. To overcome these shortcomings, we propose a novel neural network design (HTNet) and a feature-extractor training methodology that can be used for HT detection at run time. We create a library of known hardware Trojans, collect electromagnetic and power side-channel signals for each case, and train HTNet to learn the best discriminative features based on this library. Then, at test time, we fine-tune HTNet to learn the behavior of the particular chip under test. We use HTNet followed by an anomaly detection mechanism at run time to monitor the chip behavior and report malicious activities in the side-channel signals. We evaluate our methodology using Trust-Hub benchmarks and show that HTNet can extract a robust set of features for HT detection.
    09:45 CEST 11.3.2 MALICIOUS ROUTING: CIRCUMVENTING BITSTREAM-LEVEL VERIFICATION FOR FPGAS
    Speaker:
    Qazi Arbab Ahmed, Paderborn University, DE
    Authors:
    Qazi Arbab Ahmed1, Tobias Wiersema1 and Marco Platzner2
    1Paderborn University, DE; 2University of Paderborn, DE
    Abstract
    The battle of developing hardware Trojans and corresponding countermeasures has taken adversaries towards ingenious ways of compromising hardware designs by circumventing even advanced testing and verification methods. Besides conventional methods of inserting Trojans into a design by a malicious entity, the design flow for field-programmable gate arrays (FPGAs) can also be surreptitiously compromised to assist the attacker in performing a successful malfunctioning or information-leakage attack. The advanced stealthy malicious look-up-table (LUT) attack activates a Trojan only when generating the FPGA bitstream and can thus not be detected by register-transfer- and gate-level testing and verification. However, this attack, too, was recently revealed by a bitstream-level proof-carrying hardware (PCH) approach. In this paper, we present a novel attack that leverages malicious routing of the inserted Trojan circuit to remain dormant even in the generated and transmitted bitstream. The Trojan's payload is connected to primary inputs/outputs of the FPGA via a programmable interconnect point (PIP). The Trojan is detached from inputs/outputs during place-and-route and re-connected only when the FPGA is being programmed, thus activating the Trojan circuit without any need for trigger logic. Since the Trojan is injected in a post-synthesis step and remains unconnected in the bitstream, the presented attack can currently neither be prevented by conventional testing and verification methods nor by recent bitstream-level verification techniques.
    10:00 CEST IP9_6.2 IDENTIFICATION OF HARDWARE DEVICES BASED ON SENSORS AND SWITCHING ACTIVITY: A PRELIMINARY STUDY
    Speaker:
    Honorio Martin, University Carlos III of Madrid, ES
    Authors:
    Honorio Martin1, Elena Ioana Vatajelu2 and Giorgio Di Natale2
    1University Carlos III of Madrid, ES; 2TIMA, FR
    Abstract
    Hardware device identification has become an important feature for enhancing the security and trust of interconnected objects. In this paper, we present a device identification method based on measuring physical and electrical properties of the device while controlling its switching activity. The method is general and applicable to a large range of devices, from FPGAs to processors, as long as they embed sensors (such as temperature and voltage sensors) whose measurements are available. The method is enabled by the fact that both the sensors and the effects of the switching activity on the circuit are uniquely affected by manufacturing-induced process variability. Device identification based on this method is made possible by the use of machine learning. The efficiency of the method has been evaluated in a preliminary study conducted on eleven FPGAs.
    10:01 CEST IP9_6.1 (Best Paper Award Candidate)
    A DIFFERENTIAL AGING SENSOR TO DETECT RECYCLED ICS USING SUB-THRESHOLD LEAKAGE CURRENT
    Speaker:
    Turki Alnuayri, Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK and the Dept. of Computer Engineering, Taibah University, Medina, Saudi Arabia, SA
    Authors:
    Turki Alnuayri1, Saqib Khursheed2, Antonio Leonel Hernandez Martinez2 and Daniele Rossi3
    1Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK. and Dept. of Computer Engineering, Taibah University, Medina, Saudi Arabia, GB; 2Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK, GB; 3Dept. of Information Engineering, University of Pisa, Pisa, Italy, IT
    Abstract
    Integrated circuits (ICs) may be exposed to counterfeiting due to the involvement of untrusted parties in the semiconductor supply chain; this threatens the security and reliability of electronic systems. This paper focuses on the most common types of counterfeiting, namely recycled and remarked ICs. The goal is to develop a technique to differentiate between new ICs and recycled ICs that have been used for a short period of time. Detecting recycled ICs using aging sensors has been researched using sub-threshold leakage current and frequency degradation utilizing ring oscillators (ROs). The resolution of these sensors requires further development to accurately detect short usage times. This paper proposes a differential aging sensor that detects recycled ICs using ring oscillators with sub-threshold leakage current, capturing aging effects due to bias temperature instability (BTI) and hot carrier injection (HCI) on a 22-nm CMOS technology provided by GlobalFoundries. Simulation results confirm that we are able to detect recycled ICs with high confidence using the proposed technique. It is shown that the discharge time increases by 14.72% after only 15 days and by 60.49% after 3 years’ usage, outperforming techniques that use frequency degradation only, whilst considering process and temperature variation.
    10:02 CEST 11.3.3 GNN4TJ: GRAPH NEURAL NETWORKS FOR HARDWARE TROJAN DETECTION AT REGISTER TRANSFER LEVEL
    Speaker:
    Rozhin Yasaei, University of California Irvine, US
    Authors:
    Rozhin Yasaei, Shih-Yuan Yu and Mohammad Al Faruque, University of California Irvine, US
    Abstract
    Time-to-market pressure and resource constraints have pushed System-on-Chip (SoC) designers toward outsourcing the design and using third-party Intellectual Property (IP). This has created an opportunity for rogue entities in the Integrated Circuit (IC) supply chain to insert malicious circuits into the hardware design, known as Hardware Trojans (HTs). HT detection is a major hardware security challenge, and early discovery is crucial because postponing the removal of an HT until late in the design or after fabrication would be very expensive. Current works suffer from several shortcomings, such as reliance on a golden HT-free reference, inability to identify all types of HTs or unknown ones, the burden of manual code review placed on the designer, or scalability issues. To overcome these limitations, we propose GNN4TJ, a novel golden-reference-free HT detection method at the register transfer level (RTL) based on Graph Neural Networks (GNNs). GNN4TJ represents the hardware design as its intrinsic data structure, a graph, and generates data flow graphs (DFGs) for RTL code. We utilize a GNN to extract features from the DFG, learn the circuit's behavior, and identify the presence of an HT, in a fully automated pipeline. We evaluate our model on a dataset that we create by expanding the Trust-Hub HT benchmarks. The results demonstrate that GNN4TJ detects unknown HTs with 97% recall (true positive rate) in only 21.1 ms.

    11.4 Techniques for Low-Latency Neural Network Inference

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/mWHPsEtPDSJAQzD8i

    Session chair:
    Ben Keller, NVIDIA, US

    Session co-chair:
    Ganapati Bhat, Washington State University, US

    Efficient deep neural network (DNN) inference is the key enabler of applications from IoT to datacenter, and the best performance can be achieved through co-optimization of algorithms, compilers, and hardware. This session includes five papers with diverse approaches to reducing DNN inference latency. The first paper applies new techniques to the challenging problem of deploying DNN models to hardware, achieving significant speedups on GPU inference workloads. The second work proposes a new approach to hardware acceleration of sparse neural networks. The third paper demonstrates how DNN models can be evaluated in the frequency domain, which can greatly reduce the amount of required computation compared to classical techniques. The final presentations explore biomedical image segmentation and stochastic computing.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.4.1 DEEP NEURAL NETWORK HARDWARE DEPLOYMENT OPTIMIZATION VIA ADVANCED ACTIVE LEARNING
    Speaker:
    Qi SUN, The Chinese University of Hong Kong, HK
    Authors:
    QI SUN1, Chen BAI2, Hao Geng2 and Bei Yu1
    1The Chinese University of Hong Kong, HK; 2The Chinese University of Hong Kong, CN
    Abstract
    Recent years have witnessed the great successes of deep neural network (DNN) models while deploying DNN models on hardware platforms is still challenging and widely discussed. Some works proposed dedicatedly designed accelerators for some specific DNN models, while some others proposed general-purpose deployment frameworks that can optimize the hardware configurations on various hardware platforms automatically. However, the extremely large design space and the very time-consuming on-chip tests bring great challenges to the hardware configuration optimization process. In this paper, to optimize the hardware deployment, we propose an advanced active learning framework which is composed of batch transductive experiment design (BTED) and Bootstrap-guided adaptive optimization (BAO). The BTED method generates a diverse initial configuration set filled with representative configurations. Based on the Bootstrap method and adaptive sampling, the BAO method guides the selection of hardware configurations during the searching process. To the best of our knowledge, these two methods are both introduced into general DNN deployment frameworks for the first time. We embed our advanced framework into AutoTVM, and the experimental results show that our methods reduce the model inference latency by up to 28.08% and decrease the variance of inference latency by up to 92.74%.
    09:45 CEST 11.4.2 APPROACH TO IMPROVE THE PERFORMANCE USING BIT-LEVEL SPARSITY IN NEURAL NETWORKS
    Speaker:
    Yesung Kang, EE Department, POSTECH, KR
    Authors:
    Yesung Kang1, Eunji Kwon2, Seunggyu Lee2, Younghoon Byun3, Youngjoo Lee4 and Seokhyeong Kang5
    1postech, KR; 2Postech, KR; 3POSTECH, KR; 4Pohang University of Science and Technology (POSTECH), KR; 5Pohang University of Science and Technology, KR
    Abstract
    This paper presents a convolutional neural network (CNN) accelerator that can skip zero weights and handle outliers, which are few but have a significant impact on the accuracy of CNNs, to achieve speedup and increase energy efficiency. We propose an offline weight-scheduling algorithm that can skip zero weights and combine two non-outlier weights simultaneously using the bit-level sparsity of CNNs. We use a reconfigurable multiplier-and-accumulator (MAC) unit for two purposes: usually to compute the combined two non-outliers, and occasionally to compute outliers. We further improve the speedup of our accelerator by clipping some of the outliers with negligible accuracy loss. Compared to the DaDianNao [7] and Bit-Tactical [16] architectures, our CNN accelerator improves speed by 3.34x and 2.31x and reduces energy consumption by 29.3% and 30.2%, respectively.
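The zero-skipping idea at the heart of this accelerator can be sketched in software (a behavioral illustration only; the paper's offline scheduling and outlier combining are not modeled, and the weight/activation values are made up):

```python
def sparse_mac(weights, activations):
    """Accumulate products while skipping zero weights entirely.

    Returns the dot product and the number of multiplies actually issued;
    in hardware, each skipped weight is a cycle the PE does not spend.
    """
    acc = 0
    ops = 0
    for w, a in zip(weights, activations):
        if w == 0:          # zero-skipping: no multiply for this weight
            continue
        acc += w * a
        ops += 1
    return acc, ops

# Half the weights are zero, so only 3 of 6 multiplies are issued.
acc, ops = sparse_mac([0, 3, 0, -2, 0, 1], [5, 2, 7, 4, 1, 6])
```

Bit-level sparsity goes one step further: even a nonzero weight with few set bits needs fewer partial-product additions, which is what the combined non-outlier scheduling exploits.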
    10:00 CEST IP9_7.1 MORPHABLE CONVOLUTIONAL NEURAL NETWORK FOR BIOMEDICAL IMAGE SEGMENTATION
    Speaker:
    Huaipan Jiang, Pennsylvania State University, CN
    Authors:
    Huaipan Jiang1, Anup Sarma1, Mengran Fan2, Jihyun Ryoo1, Meenakshi Arunachalam3, Sharada Naveen4 and Mahmut Kandemir5
    1Pennsylvania State University, US; 2psu, US; 3Intel corp, US; 4Intel, US; 5PSU, US
    Abstract
    We propose a morphable convolution framework that can be applied to irregularly shaped regions of the input feature map. This framework reduces the computational footprint of a regular CNN operation in the context of biomedical semantic image segmentation. The traditional CNN-based approach has high accuracy but suffers from high training and inference computation costs compared to a conventional edge-detection-based approach. In this work, we combine the concept of morphable convolution with edge detection algorithms in a hierarchical framework, which first detects the edges and then generates a layer-wise annotation map. The annotation map guides the convolution operation to run only on a small, useful fraction of pixels in the feature map. We evaluate our framework on three cell tracking datasets, and the experimental results indicate that our framework saves ~30% and ~10% execution time on CPU and GPU, respectively, without loss of accuracy, compared to baseline conventional CNN approaches.
    10:01 CEST IP9_7.2 (Best Paper Award Candidate)
    SPEEDING UP MUX-FSM BASED STOCHASTIC COMPUTING FOR ON-DEVICE NEURAL NETWORKS
    Speaker:
    Jongsung Kang, Seoul National University, KR
    Authors:
    Jongsung Kang and Taewhan Kim, Seoul National University, KR
    Abstract
    We propose an acceleration technique for processing multiplication operations using stochastic computing (SC) in on-device neural networks. Recently, MUX-FSM based SCs, which employ a MUX controlled by an FSM to generate a bit stream for a multiplication operation, have considerably reduced the processing time of MAC operations over the traditional stochastic-number-generator based SC. Nevertheless, the existing MUX-FSM based SCs still do not meet the multiplication processing time required for wide adoption of on-device neural networks in practice, even though they offer a very economical hardware implementation. In this respect, this work proposes a solution to the problem of speeding up the conventional MUX-FSM based SCs. Precisely, we analyze the bit counting pattern produced by the MUX-FSM and replace the counting redundancy with a shift operation, significantly shortening the length of the required bit sequence, and we analytically formulate the number of computation cycles. Experiments show that our enhanced SC technique reduces the processing time by 44.1% on average over the conventional MUX-FSM based SCs.
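For readers new to SC, the following sketches the basic unipolar multiplication that MUX-FSM designs accelerate (the classic AND-of-bitstreams scheme, not the paper's architecture); deterministic streams with coprime periods stand in for random number generators so the product is exact:

```python
from itertools import islice, cycle

def sc_multiply(pattern_a, pattern_b, n_bits):
    """Unipolar stochastic multiply: AND two bitstreams, count the 1s.

    Each value is encoded as the fraction of 1s in its stream. With
    periodic streams of coprime periods, the AND stream's 1-density
    equals the exact product of the two encoded values.
    """
    a = islice(cycle(pattern_a), n_bits)
    b = islice(cycle(pattern_b), n_bits)
    return sum(x & y for x, y in zip(a, b)) / n_bits

# 2/3 encoded with period 3, 1/2 with period 2; 6 cycles give 1/3 exactly.
prod = sc_multiply([1, 1, 0], [1, 0], 6)
```

The cost of SC is stream length: n-bit precision naively needs 2^n cycles, which is exactly the latency that counting-pattern analysis and shift operations attack.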
    10:02 CEST 11.4.3 ACCELERATING FULLY SPECTRAL CNNS WITH ADAPTIVE ACTIVATION FUNCTIONS ON FPGA
    Speaker:
    Shuanglong Liu, Hunan Normal University, CN
    Authors:
    Shuanglong Liu1, Hongxiang Fan2 and Wayne Luk3
    1Hunan Normal University, CN; 2Imperial College London, GB; 3Imperial College, GB
    Abstract
    Computing convolutional layers in the frequency domain can largely reduce the computation overhead of training and inference for convolutional neural networks (CNNs). However, existing designs based on this idea require repeated transforms between the spatial and frequency domains due to the absence of nonlinear functions in the frequency domain, which makes the benefit less attractive for low-latency inference. This paper presents a fully spectral CNN approach by proposing a novel adaptive Rectified Linear Unit (ReLU) activation in the spectral domain. The proposed design maintains the non-linearity in the network while taking hardware efficiency into account at the algorithm level. The spectral model size is further optimized by merging and fusing layers. A customized hardware architecture is then proposed to implement the designed spectral network on an FPGA device, with DSP optimizations for 8-bit fixed-point multipliers. Our hardware accelerator is implemented on Intel's Arria 10 device and applied to the MNIST, SVHN, AT&T and CIFAR-10 datasets. Experimental results show a speed improvement of 6x to 10x and 4x to 5.7x compared to state-of-the-art spatial and FFT-based designs, respectively, while achieving similar accuracy across the benchmark datasets.
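The savings come from the convolution theorem: convolution in the spatial domain becomes a cheap pointwise product in the frequency domain. A dependency-free 1-D sketch (naive O(n^2) DFT for illustration, nothing like the paper's FPGA pipeline; the signal values are arbitrary):

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform and its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def circ_conv_spectral(x, h):
    """Circular convolution computed as a pointwise product of spectra."""
    X, H = dft(x), dft(h)
    return [y.real for y in dft([a * b for a, b in zip(X, H)], inverse=True)]

y = circ_conv_spectral([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 0.0, 1.0])
```

Nonlinearities like ReLU have no such pointwise frequency-domain form, which is why prior designs transform back and forth per layer; the paper's adaptive spectral ReLU removes those round trips.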

    11.5 Approximate arithmetic and synthesis

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/JB6m6r8ff9nQGqErq

    Session chair:
    Daniel Menard, INSA Rennes, FR

    Session co-chair:
    Vojtech Mrazek, Brno University of Technology, CZ

    Reducing energy with approximate computing requires efficient techniques that reduce cost without sacrificing quality, and tools to explore the approximate design space. Among approximate computing techniques, refining data precision is an efficient approach to exploring the trade-off between accuracy and energy consumption. For neural networks, a hybrid representation combining 8-bit fixed-point data with a dynamic shared exponent has been proposed to accelerate the training process. To explore the precision design space efficiently, a hybrid algorithm combining Bayesian optimization and fast local search has been proposed to solve the word-length optimization problem. In the context of approximate logic design-space exploration for large Boolean networks, a novel method is proposed to select the portions of the circuit leading to the best quality-cost trade-off.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.5.1 TRAINING DEEP NEURAL NETWORKS IN 8-BIT FIXED POINT WITH DYNAMIC SHARED EXPONENT MANAGEMENT
    Speaker:
    Hisakatsu Yamaguchi, Fujitsu Laboratories LTD., JP
    Authors:
    Hisakatsu Yamaguchi, Makiko Ito, Katsuhiro Yoda and Atsushi Ike, Fujitsu Laboratories Ltd., JP
    Abstract
    The increase in complexity and depth of deep neural networks (DNNs) has created a strong need to improve computing performance. Quantization methods for training DNNs can effectively improve the computation throughput and energy efficiency of hardware platforms. We have developed an 8-bit quantization training method representing the weight, activation, and gradient tensors in an 8-bit fixed-point data format. The shared exponent for each tensor is managed dynamically on the basis of the distribution of the tensor elements calculated in the previous training phase, not the current one, which improves computation throughput. This method provides up to 3.7 times the computation throughput of FP32 computation without accuracy degradation.
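    The shared-exponent (block floating point) idea can be sketched as below. Unlike the paper, which derives the exponent from the previous training phase's statistics, this toy version computes it from the current tensor:

```python
import math

def quantize_shared_exponent(tensor, bits=8):
    """Block floating point: one shared exponent per tensor, signed
    integer mantissas of `bits` width for every element."""
    max_abs = max(abs(v) for v in tensor)
    # exponent chosen so the largest magnitude fits the signed range
    exp = math.ceil(math.log2(max_abs)) - (bits - 1) if max_abs > 0 else 0
    scale = 2.0 ** exp
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = [max(qmin, min(qmax, round(v / scale))) for v in tensor]
    return q, exp

def dequantize(q, exp):
    """Recover approximate real values from mantissas and shared exponent."""
    return [v * (2.0 ** exp) for v in q]
```

    Deriving `exp` from the previous iteration (as the paper does) removes the max-reduction from the critical path, at the cost of occasional clamping when the distribution shifts.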
    09:45 CEST 11.5.2 LEVERAGING BAYESIAN OPTIMIZATION TO SPEED UP AUTOMATIC PRECISION TUNING
    Speaker:
    Van-Phu Ha, Inria, FR
    Authors:
    Van-Phu Ha and Olivier Sentieys, INRIA, FR
    Abstract
    Using just the right amount of numerical precision is important for meeting performance and energy-efficiency requirements. Word-Length Optimization (WLO) is the automatic process of tuning the precision, i.e., bit-width, of variables and operations represented using fixed-point arithmetic. However, state-of-the-art precision tuning approaches do not scale well to large applications in which many variables are involved. In this paper, we propose a hybrid algorithm combining Bayesian optimization (BO) and a fast local search to speed up the WLO procedure. Through experiments, we first show evidence of how this combination can improve exploration time. Then, we propose an algorithm to automatically determine a reasonable transition point between the two algorithms. By statistically analyzing the convergence of the probabilistic models constructed during BO, we derive a stopping condition that determines when to switch to the local search phase. Experimental results indicate that our algorithm can reduce exploration time by 50% to 80% for large benchmarks.
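    The fast-local-search half of such a hybrid can be sketched as a greedy word-length shrinking loop. The cost and error models below are illustrative placeholders, not the paper's:

```python
def local_search_wlo(cost, error, budget, widths, min_w=2):
    """Greedy word-length optimization: repeatedly trim the one variable
    whose 1-bit width reduction saves the most cost while keeping the
    error metric under the accuracy budget."""
    widths = list(widths)
    improved = True
    while improved:
        improved = False
        best = None
        for i in range(len(widths)):
            if widths[i] <= min_w:
                continue
            trial = list(widths)
            trial[i] -= 1
            if error(trial) <= budget:
                saving = cost(widths) - cost(trial)
                if best is None or saving > best[0]:
                    best = (saving, trial)
        if best:
            widths = best[1]
            improved = True
    return widths

# toy models: cost is total bit count, error is additive quantization noise
cost = lambda w: sum(w)
error = lambda w: sum(2.0 ** -wi for wi in w)
```

    In the paper's hybrid, BO first narrows the search to a promising region; a loop like this then refines bit-widths cheaply, since each step evaluates only single-bit perturbations.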
    10:00 CEST IP9_3.1 FTAPPROX: A FAULT-TOLERANT APPROXIMATE ARITHMETIC COMPUTING DATA FORMAT
    Speaker:
    YE WANG, Harbin Institute of Technology, CN
    Authors:
    Ye Wang1, Jian Dong1, Qian Xu2 and Gang Qu3
    1Harbin Institute of Technology, CN; 2University of Maryland, US; 3University of Maryland, College Park, US
    Abstract
    Approximate computing (AC) is an effective energy-efficient method for error-resilient applications. The essence of AC is to reduce energy consumption by purposefully sacrificing a small amount of computation accuracy while providing quality-acceptable results. On the other hand, soft errors (SEs) are a common problem during program execution and may cause unacceptable outputs or catastrophic system failure. As AC introduces errors while soft errors are mitigated by fault-tolerant mechanisms, the two have conflicting goals and contradictory approaches. To the best of our knowledge, there are no previous efforts that consider the two at the same time. In this paper, we study the problem of AC under soft errors in order to guarantee the safe execution of the program while reducing energy (by AC). More specifically, we propose FTApprox, a fault-tolerant approximate arithmetic computing data format, to enable the detection and correction of SEs. As an approximate data format, FTApprox uses 16 bits to approximate any 32-bit integer or fixed-point number, selecting only the most significant part of the operands for AC at runtime. Energy saving is obtained by converting 32-bit arithmetic operations into 8-bit operations. Meanwhile, for soft errors such as random bit flips, FTApprox not only detects all single-bit flips and most 2-bit flips, but also corrects most of these errors. The experimental results show that FTApprox has significant resistance against soft errors while providing 66.4%-79.6% energy saving.
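    The core approximation idea, keeping only the most significant part of an operand plus its position, can be sketched as below; the fault-tolerance redundancy of the actual FTApprox format is omitted here:

```python
def approx_encode(value, seg_bits=8):
    """Keep only the most significant `seg_bits`-wide segment of a
    32-bit magnitude, plus its shift position (a sketch of the
    approximation idea only; the real format adds redundancy for
    soft-error detection and correction)."""
    sign = -1 if value < 0 else 1
    mag = abs(value)
    shift = max(0, mag.bit_length() - seg_bits)
    return sign, mag >> shift, shift

def approx_decode(sign, segment, shift):
    """Reconstruct the approximate value from segment and position."""
    return sign * (segment << shift)
```

    Because only the leading segment participates, wide multiplications collapse to narrow ones (the 32-bit-to-8-bit conversion the abstract mentions), and the relative error is bounded by the discarded low-order bits.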
    10:01 CEST 11.5.3 APPROXIMATE LOGIC SYNTHESIS OF VERY LARGE BOOLEAN NETWORKS
    Speaker:
    Jorge Echavarria, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE
    Authors:
    Jorge Echavarria, Stefan Wildermann and Jürgen Teich, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DE
    Abstract
    For very large Boolean circuits, most approximate logic synthesis techniques successively apply local approximation transformations affecting only a portion of the whole design, allowing such transformations to be implemented in polynomial time and giving better control of the introduced error. To improve existing approximate logic synthesis flows, a key issue is to derive more efficient techniques for selecting, from all the possible portions of the design, those more likely to yield better trade-offs between hardware resources and quality of the result. Because the likelihood of “error masking” grows with increasing circuit complexity, we expect the likelihood of a local transformation “reaching”---or being observable at---the primary outputs to decrease at a similar rate. Conversely, the closer a portion undergoing a local transformation is to the primary outputs, the more likely the introduced error can be observed at the primary outputs. Based on this observation, this paper proposes a novel methodology for selecting portions---or “sub”-functions---of Boolean circuits---represented by Boolean networks---for approximation according to their “degree of connectivity” with other portions of the design. Our selection criterion is that a Boolean “sub”-function is a better candidate for approximation when it drives many other “sub”-functions, especially those that are themselves driven by many other “sub”-functions. We introduce, integrate, and compare our connectivity-based selection methodology with a state-of-the-art approximate logic synthesis framework. Experimental results show that our selection technique yields better trade-offs between hardware resources and accuracy of the resulting approximated circuits. Moreover, our technique is efficient and can speed up the design space exploration of the aforementioned framework.
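    A minimal sketch of a connectivity-based selection heuristic in this spirit (not the paper's exact metric): score each node of the Boolean network by the sub-functions it drives, weighting successors that themselves have high fan-in:

```python
def connectivity_scores(fanout):
    """Heuristic 'degree of connectivity' score: a node earns 1 point per
    sub-function it drives, plus that successor's fan-in count, so driving
    heavily-driven nodes is rewarded. `fanout` maps node -> successor list."""
    fanin = {n: 0 for n in fanout}
    for n, succs in fanout.items():
        for s in succs:
            fanin[s] = fanin.get(s, 0) + 1
    return {n: sum(1 + fanin.get(s, 0) for s in fanout[n]) for n in fanout}

# toy Boolean network: 'a' drives 'c' and 'd'; 'b' and 'c' also drive 'd'
net = {'a': ['c', 'd'], 'b': ['d'], 'c': ['d'], 'd': []}
```

    Here node `a` scores highest, so it would be the first candidate handed to the approximation transformations; the errors it introduces are the most likely to be masked before reaching the primary outputs.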

    11.6 Timing and Reconfigurable Logic

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/kLGiG6g2jD5XwpQR3

    Session chair:
    Ibrahim Abe Elfadel, Khalifa University, AE

    Session co-chair:
    Jürgen Teich, Universität Erlangen-Nürnberg, DE

    The first paper in the session proposes a technique for generating dynamic input and output assertions to be used in the early stages of hierarchical timing analysis and optimization. The second paper addresses the placement problem for FPGAs with heterogeneous architectures and clock constraints. The third paper presents techniques for the full implementation of clock-less wave-propagated pipelining-based datapaths that provide a sign-off-quality design.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.6.1 (Best Paper Award Candidate)
    TECHNOLOGY LOOKUP TABLE BASED DEFAULT TIMING ASSERTIONS FOR HIERARCHICAL TIMING CLOSURE
    Speaker:
    Ravi Ledalla, IBM Corporation, US
    Authors:
    Ravi Ledalla1, Chaobo Li1, Debjit Sinha2, Adil Bhanji1, Gregory Schaeffer3, Hemlata Gupta1 and Jennifer Basile1
    1IBM Corporation, US; 2IBM, US; 3IBM Corporation, US
    Abstract
    This paper presents an approach to dynamically generating representative external driving cell and external wire parasitic assertions for the ports of sub-blocks of a hierarchical design. The assertions are based on a technology lookup table and use attributes of the port and the hierarchical wire connected to the port as keys. A concept of reverse timing calculation at the input of the driving cell is described that facilitates the approach to drive efficient timing optimization of boundary paths of design sub-blocks. Experimental results in an industrial timing environment demonstrate significantly improved timing optimization accuracy when compared to prior work.
    09:45 CEST 11.6.2 TIMING-DRIVEN PLACEMENT FOR FPGAS WITH HETEROGENEOUS ARCHITECTURES AND CLOCK CONSTRAINTS
    Speaker:
    Zhifeng Lin, Fuzhou University, CN
    Authors:
    Zhifeng Lin1, Yanyue Xie2, Gang Qian3, Jianli Chen1, Sifei Wang3, Jun Yu3 and Yao-Wen Chang4
    1Fuzhou University, CN; 2Northeastern University, US; 3Fudan University, CN; 4National Taiwan University, TW
    Abstract
    Modern FPGAs often contain heterogeneous architectures and clocking resources, both of which must be considered to achieve desired solutions. As design complexity keeps growing, placement has become critical for FPGA timing closure. In this paper, we present an analytical placement algorithm for heterogeneous FPGAs that simultaneously optimizes the worst slack and satisfies clock constraints. First, a heterogeneity-aware and memory-friendly delay model is developed to assess each connection delay accurately and rapidly. Then, a two-stage clock region refinement method is presented to effectively resolve clock and resource violations. Finally, we develop a novel timing-based co-optimization method to generate optimized placements without any clocking violations. Compared with the state-of-the-art placer based on the advanced commercial tool Xilinx Vivado 2019.1 with the Xilinx 7 Series FPGA architecture, our algorithm achieves the best worst slack and routed wirelength while satisfying all clock constraints.
    10:00 CEST IP9_4.1 ALIFROUTER: A PRACTICAL ARCHITECTURE-LEVEL INTER-FPGA ROUTER FOR LOGIC VERIFICATION
    Speaker:
    Zhen Zhuang, Fuzhou University, CN
    Authors:
    Zhen Zhuang1, Xing Huang2, Genggeng Liu1, Wenzhong Guo1, Weikang Qian3 and Wen-Hao Liu4
    1Fuzhou University, CN; 2TU Munich, DE; 3Shanghai Jiao Tong University, CN; 4Block Implementation, ICD, Cadence Design Systems, Austin, TX, US
    Abstract
    As the scale of VLSI circuits increases rapidly, multi-FPGA prototyping systems have been widely used for logic verification. Due to the limited number of connections between FPGAs, however, the routability of prototyping systems is a bottleneck. As a consequence, the time-division multiplexing (TDM) technique has been proposed to improve the usability of prototyping systems, but it causes a dramatic increase in system delay. In this paper, we propose ALIFRouter, a practical architecture-level inter-FPGA router, to improve chip performance by reducing the corresponding system delay. ALIFRouter consists of three major stages: i) routing topology generation, ii) TDM ratio assignment, and iii) system delay optimization. Additionally, a multi-thread parallelization method is integrated into the three stages to improve the efficiency of ALIFRouter. With the proposed algorithm, major performance indicators of multi-FPGA systems, such as the signal multiplexing ratio, can be improved significantly.
    10:01 CEST 11.6.3 WAVEPRO 2.0: SIGNOFF-QUALITY IMPLEMENTATION AND VALIDATION OF ENERGY-EFFICIENT CLOCK-LESS WAVE PROPAGATED PIPELINING
    Speaker:
    Yehuda Kra, Bar Ilan University, IL
    Authors:
    Yehuda Kra, Tzachi Noy and Adam Teman, Bar-Ilan University, IL
    Abstract
    The design of computational datapaths with the clockless wave-propagated pipelining (CWPP) approach is an area- and energy-efficient alternative to traditional pipelined logic. Removal of the internal registers saves both area and the toggling power of these complex gates, while also simplifying the clock tree. However, this approach is very rarely used in modern scaled technologies, due to the complexity of implementation and the lack of a robust, scalable, and automated design methodology that meets rigid industry standards. In this paper, we present WavePro 2.0, an extension of the original WavePro algorithm and automation utility, which demonstrated how to apply CWPP to any generic combinatorial circuit using a CMOS standard cell library. WavePro 2.0 advances this concept to provide full-flow implementation capabilities, providing a post-layout CWPP-ready design that meets signoff-quality industry timing requirements. The WavePro 2.0 utility interfaces with commercial design automation software to balance a post-synthesis netlist to achieve a high CWPP launch rate (frequency). We demonstrate the calculation of a fused dot-product unit, implemented with a 65nm standard cell library, enabling a launch rate of 1 GHz under worst-case conditions with post-silicon field configuration capabilities for dealing with variation. The worst-case launch rate is comparable to a sequential design implemented with between 3 and 4 pipeline stages. This is achieved with a 30% power reduction and 15% less area than a 3-stage sequential design.

    11.7 Artificial Intelligence and Fault Injection in Test

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/vN6zmNScE8rxfKnx4

    Session chair:
    Paolo Bernardi, Politecnico di Torino, IT

    Session co-chair:
    Melanie Schillinsky, NXP Semiconductors GmbH, DE

    The session starts with an application of ML in test; then, the design of a robust AI system is considered. Finally, the focus is on fault injection in FPGAs.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.7.1 A LEARNING-BASED METHODOLOGY FOR ACCELERATING CELL-AWARE MODEL GENERATION
    Speaker:
    Pierre d'Hondt, STMicroelectronics, FR
    Authors:
    Pierre d'Hondt1, Aymen Ladhar1, Patrick Girard2 and Arnaud Virazel3
    1STMicroelectronics, FR; 2LIRMM / CNRS, FR; 3LIRMM, FR
    Abstract
    Cell-aware model generation refers to the process of characterizing cell-internal defects, a key step to ensure high test and diagnosis quality. The main limitation of this process is the generation effort, which is costly in terms of run time, SPICE simulator license usage, and flow complexity. In this work, a methodology that does not use any electrical defect simulation is developed to predict the response of a cell-internal defect once it is injected into a standard cell. More broadly, the aim is to use existing cell-aware models from various standard cell libraries and technologies to predict cell-aware models for new standard cells, independently of the technology. A Random Forest classification algorithm is used for prediction. Experiments on several cell libraries using different technologies demonstrate the accuracy and performance of the method. The paper concludes with the presentation of a new hybrid CA model generation flow.
    09:45 CEST 11.7.2 RELIABILITY-DRIVEN NEUROMORPHIC COMPUTING SYSTEMS DESIGN
    Speaker:
    Qi Xu, University of Science and Technology of China, CN
    Authors:
    Qi Xu1, Junpeng Wang2, Hao Geng3, Song Chen2 and Xiaoqing Wen4
    1Hefei University of Technology, CN; 2University of Science and Technology of China, CN; 3The Chinese University of Hong Kong, CN; 4Kyushu Institute of Technology, JP
    Abstract
    In recent years, memristive crossbar-based neuromorphic computing systems (NCS) have provided a promising solution for the acceleration of neural networks. However, stuck-at faults (SAFs) in the memristor devices significantly degrade the computing accuracy of NCS. In addition, memristors suffer from process variations, causing the actual programmed resistance to deviate from its target value. In this paper, we propose a novel reliability-driven design framework for memristive crossbar-based NCS that combines general and chip-specific design optimizations. First, we design a general reliability-aware training scheme to enhance the robustness of NCS to SAFs and device variations: a dropout-inspired approach is developed to alleviate the impact of SAFs, and a new weighted error function, comprising the cross-entropy error (CEE), the l2-norm of the weights, and the sum of squares of the first-order derivatives of the CEE with respect to the weights, is proposed to obtain a smooth error curve on which the effects of variations are suppressed. Second, given the neural network model generated by the reliability-aware training scheme, we exploit chip-specific mapping and re-training to further reduce the computation accuracy loss incurred by SAFs. Experimental results clearly demonstrate that the proposed method can boost the computation accuracy of NCS and improve NCS robustness.
    10:00 CEST IP9_4.2 TESTING RESISTIVE MEMORY BASED NEUROMORPHIC ARCHITECTURES USING REFERENCE TRIMMING
    Speaker:
    Christopher Münch, Karlsruhe Institute of Technology, DE
    Authors:
    Christopher Münch and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
    Abstract
    Neuromorphic architectures based on emerging resistive memories are in the spotlight of today's research, as they are able to solve complex problems with an unmatched efficiency. In particular, resistive approaches offer multiple advantages over CMOS-based designs: most prominently, they are non-volatile and offer small device footprints in addition to very low-power operation. However, regular memory testing used for conventional resistive Random Access Memory (RAM) architectures cannot detect all possible faults in the synaptic operations performed in a resistive neuromorphic architecture. At the same time, testing all neuromorphic operations from the logic testing perspective is infeasible. In this paper, we propose to use reference resistance trimming for the test phase and derive a generic test sequence that detects all faults impacting the neuromorphic operations, based on an extensive defect injection analysis. By exploiting the resistive nature of the underlying architecture, we reduce the testing time from the exponential complexity of a conventional logic testing approach to a linear complexity, and reduce it by another 50% with the help of resistance trimming.
    10:01 CEST IP9_5.1 FAULT-CRITICALITY ASSESSMENT FOR AI ACCELERATORS USING GRAPH CONVOLUTIONAL NETWORKS
    Speaker:
    Arjun Chaudhuri, Duke University, US
    Authors:
    Arjun Chaudhuri1, Jonti Talukdar1, Jinwook Jung2, Gi-Joon Nam2 and Krishnendu Chakrabarty1
    1Duke University, US; 2IBM Research, US
    Abstract
    Owing to the inherent fault tolerance of deep neural networks (DNNs), many structural faults in DNN accelerators tend to be functionally benign. In order to identify functionally critical faults, we analyze the functional impact of stuck-at faults in the processing elements of a 128x128 systolic-array accelerator that performs inferencing on the MNIST dataset. We present a 2-tier machine-learning framework that leverages graph convolutional networks (GCNs) for quick assessment of the functional criticality of structural faults. We describe a computationally efficient methodology for data sampling and feature engineering to train the GCN-based framework. The proposed framework achieves up to 90% classification accuracy with negligible misclassification of critical faults.
    10:02 CEST 11.7.3 (Best Paper Award Candidate)
    DEVICE- AND TEMPERATURE DEPENDENCY OF SYSTEMATIC FAULT INJECTION RESULTS IN ARTIX-7 AND ICE40 FPGAS
    Speaker:
    Christian Fibich, Dept. of Electronic Engineering, University of Applied Sciences Technikum Wien, Vienna, Austria, AT
    Authors:
    Christian Fibich1, Martin Horauer1 and Roman Obermaisser2
    1University of Applied Sciences Technikum Wien, AT; 2University of Siegen, DE
    Abstract
    Systematic fault injection into the configuration memory of SRAM-based FPGAs promises insight into the criticality of individual configuration bits. Current approaches implicitly assume that results obtained on one FPGA device can be generalized to all devices of that type, which allows fault injection to be parallelized. This work is, to the best of our knowledge, the first to challenge this assumption. To that end, a synthetic test design was subjected to systematic fault injection on 16 Xilinx Artix-7 as well as 10 Lattice iCE40 FPGAs, for which bitstream documentation is publicly available. The results of these experiments indicate that the derived sets of critical configuration bits vary from device to device of the same type, especially if the interconnect is targeted. Furthermore, temperature is observed to influence the fault injection results on Artix-7. Suggestions for dealing with the implications in future fault injection experiments are provided.
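    The systematic single-bit-flip campaign at the heart of such experiments can be sketched as below, with a toy one-bit "design" standing in for a real bitstream and FPGA:

```python
def critical_bits(config, evaluate, nbits):
    """Systematic single-bit-flip injection: a configuration bit is
    classified as critical if flipping it changes the design's
    observable output relative to the golden (fault-free) run."""
    golden = evaluate(config)
    crit = []
    for b in range(nbits):
        if evaluate(config ^ (1 << b)) != golden:
            crit.append(b)
    return crit

# toy "design": the observable output is just configuration bit 2,
# as if a single LUT bit steered the primary output
lut = lambda cfg: (cfg >> 2) & 1
```

    The paper's finding is precisely that the set returned by such a campaign is not identical across physical devices of the same type, so parallelizing the loop over several boards and merging the results is not as safe as commonly assumed.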
    10:17 CEST IP9_5.2 ANALYZING ARM CORESIGHT ETMV4.X DATA TRACE STREAM WITH A REAL-TIME HARDWARE ACCELERATOR
    Speaker:
    Ali Zeinolabedin, Chair of Highly-Parallel VLSI Systems and Neuro-Microelectronics, TU Dresden, Dresden, Germany, DE
    Authors:
    Seyed Mohammad Ali Zeinolabedin, Johannes Partzsch and Christian Mayr, Dresden University of Technology, DE
    Abstract
    Debugging and verification of modern SoCs is a vital step in realizing complex systems consisting of various components. Monitoring memory operations such as data transfer address and value is an essential debugging and verification feature. ARM CoreSight technology generates a specific debug trace stream standard to monitor the memory without affecting the normal execution of the system. This paper proposes a hardware architecture to analyze the debug trace stream in real-time. It is implemented on the Xilinx Virtex xc6vcx75t-2ff784 FPGA device and can operate at 125 MHz and occupies less than 8% of the FPGA resources.

    11.8 Industrial Design Methods and Tools: Neural Network Design

    Date: Thursday, 04 February 2021
    Time: 09:30 CEST - 10:20 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/R3BgeviJ3jGivTJXN

    Organizer:
    Jürgen Haase, edacentrum GmbH, DE

    This Exhibition Workshop features industrial design methods and tools. It is open to conference delegates as well as to exhibition visitors.

    Time Label Presentation Title
    Authors
    09:30 CEST 11.8.1 AUTOMATING TINY NEURAL NETWORK DESIGN WITH MCU DEPLOY-ABILITY IN THE LOOP
    Speaker:
    Danilo Pau, STMicroelectronics, IT
    Abstract

    Tiny Machine Learning (TinyML) is a growing, widely popular community focusing on the deployment of Deep Learning (DL) models on microcontrollers (MCUs). To run a trained DL model on an MCU, developers must have the skills to handcraft network topologies and associated hyperparameters to fit a wide range of hardware constraints, including operating frequency, embedded SRAM, and embedded Flash capacity, along with the corresponding power consumption requirements.

    Unfortunately, a hand-crafted design methodology poses multiple challenges: 1) AI and embedded developers have different, largely orthogonal skill sets, and their work does not meet during the development of AI applications until validation in an operational environment; 2) tools for automated network design often assume virtually unlimited resources (typically, deep networks are trained on cloud- or GPU-based systems); 3) the time-to-market from conception to realization of an AI system is usually quite long. Consequently, mass-market adoption of AI technologies at the deep edge is jeopardized.

    Our solution is based on Sequential Model Based Optimization (SMBO) – aka Bayesian Optimization (BO) – that is the standard methodology for Automated Machine Learning (AutoML) and Neural Architecture Search (NAS). Although AutoML and NAS are successfully applied on large GPU/Cloud platforms (i.e., some AutoML/NAS tools are commercialized by Google, Amazon and Microsoft), their application is still an issue in the case of tiny devices, such as MCUs. Our approach, instead, includes “deployability” constraints – related to the hardware resources of the MCUs – into the hyperparameter optimization process, leading to this new “AutoTinyML” perspective.

    This talk will present our approach, along with its pros and cons with respect to multi-objective optimization (usually adopted to reduce resource usage on cloud). A set of relevant results will be presented and discussed, providing an overview of the next open challenges and perspectives in the AutoTinyML field.
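    The "deployability in the loop" constraint can be sketched as a feasibility filter applied before scoring candidate networks. The footprint numbers and the exhaustive candidate list below are illustrative stand-ins for the SMBO-driven search described in the talk:

```python
def auto_tiny_search(candidates, flash_limit, ram_limit, score):
    """Deployability-in-the-loop selection: discard any network
    configuration whose estimated Flash/RAM footprint (in KiB) exceeds
    the target MCU's limits, then pick the best-scoring survivor."""
    feasible = [c for c in candidates
                if c['flash'] <= flash_limit and c['ram'] <= ram_limit]
    return max(feasible, key=score) if feasible else None

# hypothetical candidates: a larger model is more accurate but
# cannot fit the MCU, so the smaller one wins
candidates = [
    {'name': 'cnn_small', 'flash': 60,  'ram': 20,  'acc': 0.91},
    {'name': 'cnn_large', 'flash': 900, 'ram': 300, 'acc': 0.95},
]
```

    In the actual AutoTinyML setting, the constraint check would prune points proposed by the Bayesian optimizer rather than an enumerated list, but the principle is the same: infeasible configurations never reach (costly) training and evaluation.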


    IP9_1 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/AzbDJJpYc8YHyEd3D

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_1.1 SEALPK: SEALABLE PROTECTION KEYS FOR RISC-V
    Speaker:
    Leila Delshadtehrani, Boston University, US
    Authors:
    Leila Delshadtehrani, Sadullah Canakci, Manuel Egele and Ajay Joshi, Boston University, US
    Abstract
    With the continuous increase in the number of software-based attacks, there has been a growing effort towards isolating sensitive data and trusted software components from untrusted third-party components. Recently, Intel introduced a new hardware feature for intra-process memory isolation, called Memory Protection Keys (MPK). The limited number of unique domains (16) provided by Intel MPK prohibits its use in cases where a large number of domains are required. Moreover, Intel MPK suffers from the protection key use-after-free vulnerability. To address these shortcomings, in this paper, we propose an efficient intra-process isolation technique for the RISC-V open ISA, called SealPK, which supports up to 1024 unique domains. Additionally, we devise three novel sealing features to protect the allocated domains, their associated pages, and their permissions from modifications or tampering by an attacker. We demonstrate the efficiency of SealPK by leveraging it to implement an isolated secure shadow stack on an FPGA prototype.
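    The one-way sealing idea can be sketched with a toy permission table; this models the concept only, not the RISC-V ISA encoding or the 1024-domain hardware implemented on the FPGA prototype:

```python
class ProtectionKeys:
    """Toy model of per-domain permissions with a one-way seal bit,
    mimicking the idea (not the hardware) of sealable protection keys:
    once a domain is sealed, its permissions can no longer be changed,
    which closes the key use-after-free style attack surface."""

    def __init__(self, domains=1024):
        self.perm = {d: 'rw' for d in range(domains)}
        self.sealed = set()

    def set_perm(self, domain, perm):
        if domain in self.sealed:
            raise PermissionError(f"domain {domain} is sealed")
        self.perm[domain] = perm

    def seal(self, domain):
        self.sealed.add(domain)  # irreversible by construction
```

    A shadow stack, for example, would be placed in a domain that is made read-only and then sealed, so even a compromised thread cannot re-enable writes.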
    IP9_1.2 JOINT SPARSITY WITH MIXED GRANULARITY FOR EFFICIENT GPU IMPLEMENTATION
    Speaker:
    Chuliang Guo, Zhejiang University, CN
    Authors:
    Chuliang Guo1, Xingang Yan1, Yufei Chen1, He Li2, Xunzhao Yin1 and Cheng Zhuo1
    1Zhejiang University, CN; 2University of Cambridge, GB
    Abstract
    Given the over-parameterization property of recent deep neural networks, sparsification is widely used to compress networks and save memory footprint. Unstructured sparsity, i.e., fine-grained pruning, helps preserve model accuracy, while structured sparsity, i.e., coarse-grained pruning, is preferred for general-purpose hardware, e.g., GPUs. This paper proposes a novel joint sparsity pattern using mixed granularity to take advantage of both unstructured and structured sparsity. We utilize a heuristic strategy to infer the joint sparsity pattern by mixing vector-wise fine-grained and block-wise coarse-grained pruning masks. Experimental results show that joint sparsity achieves a higher model accuracy and sparsity ratio while consistently maintaining moderate inference speed for VGG-16 on CIFAR-100, in comparison with the commonly used block sparsity and balanced sparsity strategies.
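    One plausible way to mix the two granularities on a single weight vector (a simplification of the paper's vector-wise/block-wise mask mixing, not its exact heuristic) is to prune a whole block coarsely first, then prune fine-grained within the survivors:

```python
def joint_sparsity_mask(row, block=2, keep=2):
    """Mixed-granularity pruning mask for one weight vector: drop the
    lowest-magnitude block entirely (coarse, structured), then keep only
    the top-`keep` magnitudes among the surviving entries (fine,
    unstructured). Returns a 0/1 mask over the row."""
    n = len(row)
    blocks = [list(range(i, min(i + block, n))) for i in range(0, n, block)]
    weakest = min(blocks, key=lambda idx: sum(abs(row[i]) for i in idx))
    survivors = [i for b in blocks if b is not weakest for i in b]
    top = sorted(survivors, key=lambda i: abs(row[i]), reverse=True)[:keep]
    return [1 if i in top else 0 for i in range(n)]
```

    The coarse step gives the regular structure a GPU kernel can exploit, while the fine step inside surviving blocks recovers accuracy lost to purely block-wise pruning.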

    IP9_2 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/iLJvRaH73wHNj9Z24

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_2.1 NEIGHBOR OBLIVIOUS LEARNING (NOBLE) FOR DEVICE LOCALIZATION AND TRACKING
    Speaker:
    Zichang Liu, Rice University, US
    Authors:
    Zichang Liu, Li Chou and Anshumali Shrivastava, Rice University, US
    Abstract
    On-device localization and tracking are increasingly crucial for various applications. Machine learning (ML) techniques are widely adopted along with the rapidly growing amount of data. However, during training, almost none of the ML techniques incorporate known structural information such as a floor plan, which can be especially useful in indoor or other structured environments. The problem is incredibly hard because the structural properties are not explicitly available, making most structural learning approaches inapplicable. We develop our method using intuitions from manifold learning. Whereas existing manifold methods utilize neighborhood information such as Euclidean distances, we quantize the output space to measure closeness on the structure. We propose Neighbor Oblivious Learning (NObLe) and demonstrate our approach's effectiveness on two applications, WiFi-based fingerprint localization and inertial measurement unit (IMU)-based device tracking. We show that NObLe gives significant improvements over state-of-the-art prediction accuracy.
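    The output-space quantization step can be sketched as mapping positions to grid-cell class labels, so that structural closeness becomes closeness between discrete output classes; the grid size and layout here are assumptions, not the paper's:

```python
def grid_cell(x, y, cell=1.0, cols=100):
    """Quantize a 2-D position into a grid-cell class label (row-major
    numbering over a `cols`-wide grid of `cell`-sized squares), so that
    nearby positions collapse onto the same or adjacent labels."""
    return int(y // cell) * cols + int(x // cell)
```

    Training a classifier over such labels, instead of regressing raw coordinates, lets the model absorb the environment's structure without ever seeing the floor plan explicitly.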
    IP9_2.2 A LOW-COST BLE-BASED DISTANCE ESTIMATION, OCCUPANCY DETECTION, AND COUNTING SYSTEM
    Speaker:
    Florenc Demrozi, Computer Science Department, University of Verona, Italy, IT
    Authors:
    Florenc Demrozi1, Fabio Chiarani1 and Graziano Pravadelli2
    1Computer Science Department, University of Verona, IT; 2University of Verona, IT
    Abstract
    This article presents a low-cost system for distance estimation, occupancy counting, and presence detection based on Bluetooth Low Energy radio signal variation patterns, which mitigates the limitations of existing approaches related to economic cost, privacy concerns, computational requirements, and lack of ubiquity. To assess the approach's effectiveness, exhaustive tests have been carried out on four different datasets by exploiting several pattern recognition models.
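    A common log-distance path-loss model for converting BLE received signal strength into an estimated distance (the constants below are illustrative, not from the paper) is:

```python
def rssi_to_distance(rssi, tx_power=-59.0, n=2.0):
    """Log-distance path-loss model: d = 10 ** ((txPower - RSSI) / (10 n)),
    where tx_power is the calibrated RSSI at 1 m and n is the path-loss
    exponent (about 2 in free space, higher indoors). Both constants are
    assumed values for illustration and need per-device calibration."""
    return 10 ** ((tx_power - rssi) / (10 * n))
```

    In practice, as the abstract implies, raw RSSI is noisy, so such point estimates are usually fed into pattern-recognition models rather than used directly.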

    IP9_3 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/cGN8F96Tx9N87rFMp

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_3.1 FTAPPROX: A FAULT-TOLERANT APPROXIMATE ARITHMETIC COMPUTING DATA FORMAT
    Speaker:
    YE WANG, Harbin Institute of Technology, CN
    Authors:
    Ye Wang1, Jian Dong1, Qian Xu2 and Gang Qu3
    1Harbin Institute of Technology, CN; 2University of Maryland, US; 3University of Maryland, College Park, US
    Abstract
    Approximate computing (AC) is an effective energy-efficient method for error-resilient applications. The essence of AC is to reduce energy consumption by purposefully sacrificing a small amount of computation accuracy while providing quality-acceptable results. On the other hand, soft errors (SEs) are a common problem during program execution and may cause unacceptable outputs or catastrophic system failure. As AC introduces errors while soft errors are mitigated by fault-tolerant mechanisms, the two have conflicting goals and contradictory approaches. To the best of our knowledge, there are no previous efforts that consider the two at the same time. In this paper, we study the problem of AC under soft errors in order to guarantee the safe execution of the program while reducing energy (by AC). More specifically, we propose FTApprox, a fault-tolerant approximate arithmetic computing data format, to enable the detection and correction of SEs. As an approximate data format, FTApprox uses 16 bits to approximate any 32-bit integer or fixed-point number, selecting only the most significant part of the operands for AC at runtime. Energy saving is obtained by converting 32-bit arithmetic operations into 8-bit operations. Meanwhile, for soft errors such as random bit flips, FTApprox not only detects all single-bit flips and most 2-bit flips, but also corrects most of these errors. The experimental results show that FTApprox has significant resistance against soft errors while providing 66.4%-79.6% energy saving.

    IP9_4 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/ipxBMtaczNPreSCuF

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_4.1 ALIFROUTER: A PRACTICAL ARCHITECTURE-LEVEL INTER-FPGA ROUTER FOR LOGIC VERIFICATION
    Speaker:
    Zhen Zhuang, Fuzhou University, CN
    Authors:
    Zhen Zhuang1, Xing Huang2, Genggeng Liu1, Wenzhong Guo1, Weikang Qian3 and Wen-Hao Liu4
    1Fuzhou University, CN; 2TU Munich, DE; 3Shanghai Jiao Tong University, CN; 4Block Implementation, ICD, Cadence Design Systems, Austin, TX, US
    Abstract
    As the scale of VLSI circuits increases rapidly, multi-FPGA prototyping systems have been widely used for logic verification. Due to the limited number of connections between FPGAs, however, the routability of prototyping systems is a bottleneck. As a consequence, the time-division multiplexing (TDM) technique has been proposed to improve the usability of prototyping systems, but it causes a dramatic increase in system delay. In this paper, we propose ALIFRouter, a practical architecture-level inter-FPGA router, to improve chip performance by reducing the corresponding system delay. ALIFRouter consists of three major stages: i) routing topology generation, ii) TDM ratio assignment, and iii) system delay optimization. Additionally, a multi-thread parallelization method is integrated into the three stages to improve the efficiency of ALIFRouter. With the proposed algorithm, major performance indicators of multi-FPGA systems, such as the signal multiplexing ratio, can be improved significantly.
    IP9_4.2 TESTING RESISTIVE MEMORY BASED NEUROMORPHIC ARCHITECTURES USING REFERENCE TRIMMING
    Speaker:
    Christopher Münch, Karlsruhe Institute of Technology, DE
    Authors:
    Christopher Münch and Mehdi Tahoori, Karlsruhe Institute of Technology, DE
    Abstract
    Neuromorphic architectures based on emerging resistive memories are in the spotlight of today's research, as they are able to solve complex problems with an unmatched efficiency. In particular, resistive approaches offer multiple advantages over CMOS-based designs: most prominently, they are non-volatile and offer small device footprints in addition to very-low-power operation. However, the regular memory testing used for conventional resistive Random Access Memory (RAM) architectures cannot detect all possible faults in the synaptic operations performed in a resistive neuromorphic architecture. At the same time, testing all neuromorphic operations from the logic-testing perspective is infeasible. In this paper, we propose to use reference resistance trimming for the test phase and derive a generic test sequence that detects all faults impacting the neuromorphic operations, based on an extensive defect injection analysis. By exploiting the resistive nature of the underlying architecture, we are able to reduce the testing time from the exponential complexity of a conventional logic-testing approach to linear complexity, and to reduce it by another 50% with the help of resistance trimming.

    IP9_5 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/qcf5595uySh4MXpAm

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_5.1 FAULT-CRITICALITY ASSESSMENT FOR AI ACCELERATORS USING GRAPH CONVOLUTIONAL NETWORKS
    Speaker:
    Arjun Chaudhuri, Duke University, US
    Authors:
    Arjun Chaudhuri1, Jonti Talukdar1, Jinwook Jung2, Gi-Joon Nam2 and Krishnendu Chakrabarty1
    1Duke University, US; 2IBM Research, US
    Abstract
    Owing to the inherent fault tolerance of deep neural networks (DNNs), many structural faults in DNN accelerators tend to be functionally benign. In order to identify functionally critical faults, we analyze the functional impact of stuck-at faults in the processing elements of a 128x128 systolic-array accelerator that performs inferencing on the MNIST dataset. We present a 2-tier machine-learning framework that leverages graph convolutional networks (GCNs) for quick assessment of the functional criticality of structural faults. We describe a computationally efficient methodology for data sampling and feature engineering to train the GCN-based framework. The proposed framework achieves up to 90% classification accuracy with negligible misclassification of critical faults.
    IP9_5.2 ANALYZING ARM CORESIGHT ETMV4.X DATA TRACE STREAM WITH A REAL-TIME HARDWARE ACCELERATOR
    Speaker:
    Ali Zeinolabedin, Chair of Highly-Parallel VLSI Systems and Neuro-Microelectronics, TU Dresden, Dresden, Germany, DE
    Authors:
    Seyed Mohammad Ali Zeinolabedin, Johannes Partzsch and Christian Mayr, Dresden University of Technology, DE
    Abstract
    Debugging and verification of modern SoCs is a vital step in realizing complex systems consisting of various components. Monitoring memory operations, such as data transfer addresses and values, is an essential debugging and verification feature. ARM CoreSight technology generates a specific debug trace stream standard to monitor the memory without affecting the normal execution of the system. This paper proposes a hardware architecture to analyze the debug trace stream in real time. It is implemented on the Xilinx Virtex xc6vcx75t-2ff784 FPGA device, operates at 125 MHz, and occupies less than 8% of the FPGA resources.

    IP9_6 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/k5E7iuofp3ormz7n6

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_6.1 A DIFFERENTIAL AGING SENSOR TO DETECT RECYCLED ICS USING SUB-THRESHOLD LEAKAGE CURRENT
    Speaker:
    Turki Alnuayri, Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK and the Dept. of Computer Engineering, Taibah University, Medina, Saudi Arabia, SA
    Authors:
    Turki Alnuayri1, Saqib Khursheed2, Antonio Leonel Hernandez Martinez2 and Daniele Rossi3
    1Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK, and Dept. of Computer Engineering, Taibah University, Medina, Saudi Arabia, GB; 2Dept. of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK, GB; 3Dept. of Information Engineering, University of Pisa, Pisa, Italy, IT
    Abstract
    Integrated circuits (ICs) may be exposed to counterfeiting due to the involvement of untrusted parties in the semiconductor supply chain; this threatens the security and reliability of electronic systems. This paper focuses on the most common types of counterfeiting, namely recycled and remarked ICs. The goal is to develop a technique to differentiate between new ICs and recycled ICs that have been used for a short period of time. Detecting recycled ICs with aging sensors has been researched using sub-threshold leakage current and frequency degradation utilizing ring oscillators (ROs). The resolution of these sensors requires further development to accurately detect short usage times. This paper proposes a differential aging sensor that detects recycled ICs using ring oscillators with sub-threshold leakage current, capturing the aging effects of bias temperature instability (BTI) and hot carrier injection (HCI) on a 22-nm CMOS technology provided by GlobalFoundries. Simulation results confirm that we are able to detect recycled ICs with high confidence using the proposed technique. It is shown that the discharge time increases by 14.72% after only 15 days and by 60.49% after 3 years' usage; the sensor outperforms techniques that use frequency degradation only, whilst considering process and temperature variation.
    IP9_6.2 IDENTIFICATION OF HARDWARE DEVICES BASED ON SENSORS AND SWITCHING ACTIVITY: A PRELIMINARY STUDY
    Speaker:
    Honorio Martin, University Carlos III of Madrid, ES
    Authors:
    Honorio Martin1, Elena Ioana Vatajelu2 and Giorgio Di Natale2
    1University Carlos III of Madrid, ES; 2TIMA, FR
    Abstract
    Hardware device identification has become an important feature for enhancing the security and the trust of interconnected objects. In this paper, we present a device identification method based on measuring physical and electrical properties of the device while controlling its switching activity. The method is general and applicable to a large range of devices, from FPGAs to processors, as long as they embed sensors (such as temperature and voltage sensors) and their measurements are available. The method is enabled by the fact that both the sensors and the effects of the switching activity on the circuit are uniquely affected by manufacturing-induced process variability. Device identification based on this method is made possible by the use of machine learning. The efficiency of the method has been evaluated in a preliminary study conducted on eleven FPGAs.
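    As a rough illustration of the idea: process variation shifts each device's sensor readings under identical switching activity, so a stored per-device fingerprint can later identify the device. The sketch below uses synthetic data and a simple nearest-centroid classifier; the paper applies machine learning generally, and every name and number here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical enrollment data: (mean temperature [deg C], mean core
# voltage [V]) for 4 devices over 20 runs under the same controlled
# switching activity. Process variation gives each device its own
# operating point; measurement noise is small by comparison.
true_centers = np.array([[42.0, 0.93], [45.0, 0.95],
                         [48.0, 0.97], [51.0, 0.99]])
train = {d: true_centers[d] + rng.normal(0.0, [0.2, 0.002], size=(20, 2))
         for d in range(4)}

# Store one centroid ("fingerprint") per enrolled device.
fingerprints = {d: runs.mean(axis=0) for d, runs in train.items()}

def identify(measurement):
    """Nearest-centroid identification, with per-feature scaling so
    that degrees and volts contribute comparably to the distance."""
    scale = np.array([1.0, 100.0])
    dists = {d: np.linalg.norm((measurement - fp) * scale)
             for d, fp in fingerprints.items()}
    return min(dists, key=dists.get)
```

A fresh, noisy measurement is then matched against the stored fingerprints; the real study replaces this toy classifier with a trained ML model over eleven physical FPGAs.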

    IP9_7 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 10:30 CEST - 11:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/HSgTHvyx6ASbctJiH

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session.

    Time Label Presentation Title
    Authors
    IP9_7.1 MORPHABLE CONVOLUTIONAL NEURAL NETWORK FOR BIOMEDICAL IMAGE SEGMENTATION
    Speaker:
    Huaipan Jiang, Pennsylvania State University, US
    Authors:
    Huaipan Jiang1, Anup Sarma1, Mengran Fan2, Jihyun Ryoo1, Meenakshi Arunachalam3, Sharada Naveen4 and Mahmut Kandemir5
    1Pennsylvania State University, US; 2Pennsylvania State University, US; 3Intel Corp., US; 4Intel, US; 5Pennsylvania State University, US
    Abstract
    We propose a morphable convolution framework that can be applied to irregularly shaped regions of an input feature map. This framework reduces the computational footprint of a regular CNN operation in the context of biomedical semantic image segmentation. The traditional CNN-based approach has high accuracy but suffers from high training and inference computation costs compared to a conventional edge-detection-based approach. In this work, we combine the concept of morphable convolution with edge detection algorithms, resulting in a hierarchical framework that first detects the edges and then generates a layer-wise annotation map. The annotation map guides the convolution operation to run only on a small, useful fraction of pixels in the feature map. We evaluate our framework on three cell-tracking datasets, and the experimental results indicate that our framework saves ~30% and ~10% execution time on CPU and GPU, respectively, without loss of accuracy compared to the baseline conventional CNN approaches.
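    The key saving of filtering only where the annotation map flags useful pixels can be illustrated with a toy NumPy sketch. The real framework operates on CNN feature maps layer by layer; `masked_conv3x3` is a hypothetical simplification on a single-channel image.

```python
import numpy as np

def masked_conv3x3(img, kernel, mask):
    """Apply a 3x3 filter only at pixels where mask is True - a toy
    version of restricting CNN work to an edge-derived annotation map.
    Unmasked pixels are simply left at zero in the output."""
    out = np.zeros_like(img, dtype=float)
    pad = np.pad(img, 1, mode='edge')     # replicate borders
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):              # work scales with masked pixels
        out[y, x] = np.sum(pad[y:y + 3, x:x + 3] * kernel)
    return out
```

The loop body runs once per masked pixel, so the cost is proportional to the annotation map's density rather than to the full feature-map area, which is the source of the reported ~30%/~10% CPU/GPU savings.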
    IP9_7.2 SPEEDING UP MUX-FSM BASED STOCHASTIC COMPUTING FOR ON-DEVICE NEURAL NETWORKS
    Speaker:
    Jongsung Kang, Seoul National University, KR
    Authors:
    Jongsung Kang and Taewhan Kim, Seoul National University, KR
    Abstract
    We propose an acceleration technique for processing multiplication operations using stochastic computing (SC) in on-device neural networks. Recently, MUX-FSM based SCs, which employ a MUX controlled by an FSM to generate a bit stream for a multiplication operation, have considerably reduced the processing time of MAC operations over the traditional stochastic-number-generator based SC. Nevertheless, the existing MUX-FSM based SCs still do not meet the multiplication processing time required for wide adoption of on-device neural networks in practice, even though they offer a very economical hardware implementation. In this respect, this work proposes a solution to the problem of speeding up the conventional MUX-FSM based SCs. Precisely, we analyze the bit counting pattern produced by the MUX-FSM and replace the counting redundancy with a shift operation, shortening the length of the required bit sequence significantly, and we analytically formulate the number of computation cycles. Experiments show that our enhanced SC technique reduces the processing time by 44.1% on average over the conventional MUX-FSM based SCs.
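    For background, the baseline that MUX-FSM approaches improve on is classic stochastic-computing multiplication: values in [0, 1] are encoded as random bit streams whose density of 1s equals the value, and multiplication costs a single AND gate per bit. A minimal Python sketch of that baseline follows (it models the traditional SNG-based SC, not the paper's MUX-FSM or shift-based scheme):

```python
import random

def to_stream(p, n, rng):
    """Unipolar stochastic number: a bit stream whose fraction of 1s
    encodes a value p in [0, 1] (classic SNG-style encoding)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a, b, n=4096, seed=1):
    """Multiply two values in [0, 1] with one AND gate per bit:
    for independent streams, P(x AND y) = P(x) * P(y)."""
    rng = random.Random(seed)
    sa = to_stream(a, n, rng)
    sb = to_stream(b, n, rng)
    return sum(x & y for x, y in zip(sa, sb)) / n
```

The accuracy improves only as the stream length n grows, which is exactly the latency problem: MUX-FSM designs replace the random generator with deterministic MUX/FSM bit selection, and the paper shortens the remaining bit sequence further via shift operations.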

    W03-P1 Cadence - BTU - Europractice Workshop - Generation and Implementation of an industry-grade ASSP core (Part 1)

    Date: Thursday, 04 February 2021
    Time: 11:00 CEST - 15:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/RrBfEKipDjWKo53PA

    Organizers:
    Anton Klotz, Cadence Design Systems, DE
    Michael Hübner, Brandenburg University of Technology Cottbus, DE
    Florian Fricke, Brandenburg University of Technology Cottbus, DE
    Marcus Binning, Cadence Design Systems, GB
    Chris Skinner, Cadence Design Systems, GB
    Charis Kalantzi, Cadence Design Systems, DE
    Clive Holmes, Europractice, GB
    Loganathan Sabesan, Cadence Design Systems, GB
    Simone Fini, Cadence Design Systems, GB
    Aspasia Karanasiou, Cadence Design Systems, GB
    Arturs Kozlovskis, Cadence Design Systems, GB

    The Tensilica ASSP is offered to universities by Cadence Design Systems within the framework of the Tensilica University Program. It is possible to create a model of a processor core and extend it with special instructions that accelerate certain operations. After comparing performance before and after optimization, the core is exported to RTL and then processed through a physical design flow using the Cadence tools Genus and Innovus, down to GDS.

    During the workshop, on the first day the attendees will explore the Tensilica Fusion F1 core, extend it with a simple extension written in the TIE language, and evaluate the performance increase. The optimized core will be streamed out to Verilog RTL. On the second day, the attendees will perform synthesis, placement, clock-tree synthesis, routing, timing optimization and streamout to GDS.

    The workshop will include various hands-on exercises, which can be performed by attendees using cloud-based tools from Cadence Design Systems. Every attendee will receive a personal account for performing the exercises. The cloud platform will be provided by Europractice.

    The attendees will be able to earn a digital badge for attending the workshop and completing the hands-on exercises.


    K.5 Keynote - Special day on ASD

    Date: Thursday, 04 February 2021
    Time: 15:00 CEST - 15:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/Rv6DcWWGXHvRDApRS

    Session chair:
    Selma Saidi, TU Dortmund, DE

    Session co-chair:
    Rolf Ernst, TU Braunschweig, DE

    Autonomy is in the air: on the one hand, automation is clearly a lever to improve safety margins; on the other hand, technologies are maturing, pulled by the automotive market. In this context, Airbus is building a concept airplane from a blank sheet with the objective of improving human-machine teaming for better overall performance. The foundation of this new concept is that, when made aware of the "big picture" with enough time to analyze it, humans are still the best at making strategic decisions. Autonomy technologies are the main enabler of this concept. Benefits are expected both in a two-crew cockpit and eventually in Single Pilot Operations.

    Bio: Pascal Traverse is General Manager for the Autonomy "fast track" at Airbus. Autonomy is a top technical focus area for Airbus; the General Manager creates a vision and coordinates R&T activities with the objective of accelerating the growth of knowledge in Airbus. Before his nomination last year, Pascal coordinated Airbus Commercial R&T activities related to the cockpit and flight operations. Earlier in his career, Pascal participated in the A320/A330/A340/A380 fly-by-wire developments, certification harmonization with the FAA and EASA, the management of Airbus safety activities, and even quality activities on the A380 Final Assembly Line. Pascal holds Master's and Doctorate degrees in embedded systems from N7, conducted research at LAAS and UCLA, and is a 3AF Fellow.

    Time Label Presentation Title
    Authors
    15:00 CEST K.5.1 AUTONOMY: ONE STEP BEYOND ON COMMERCIAL AVIATION
    Speaker and Author:
    Pascal Traverse, Airbus, FR
    Abstract
    Autonomy is in the air: on the one hand, automation is clearly a lever to improve safety margins; on the other hand, technologies are maturing, pulled by the automotive market. In this context, Airbus is building a concept airplane from a blank sheet with the objective of improving human-machine teaming for better overall performance. The foundation of this new concept is that, when made aware of the "big picture" with enough time to analyze it, humans are still the best at making strategic decisions. Autonomy technologies are the main enabler of this concept. Benefits are expected both in a two-crew cockpit and eventually in Single Pilot Operations.

    12.1 Designing Autonomous Systems: Experiences, Technology and Processes

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 17:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/J65mDxoTBDp4Cn857

    Session chair:
    Selma Saidi, TU Dortmund, DE

    Session co-chair:
    Philipp Mundhenk, Robert Bosch GmbH, DE

    Organizers:
    Rolf Ernst, TU Braunschweig, DE
    Selma Saidi, TU Dortmund, DE

    This session discusses technology innovation, experiences and processes in building autonomous systems. The first paper presents Fünfliber, a nano-sized Unmanned Aerial Vehicle (UAV) built on a modular open-hardware robotic platform controlled by a parallel ultra-low-power system-on-chip (PULP) capable of running sophisticated autonomous DNN-based navigation workloads. The second paper presents an abstracted runtime for managing adaptation and integrating FPGA accelerators into autonomous software frameworks; a case-study integration into ROS is demonstrated. The third paper discusses current processes in engineering dependable collaborative autonomous systems and new business models based on agile approaches to innovation management.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.1.1 FüNFLIBER-DRONE: A MODULAR OPEN-PLATFORM 18-GRAMS AUTONOMOUS NANO-DRONE
    Speaker:
    Hanna Müller, Integrated Systems Laboratory, CH
    Authors:
    Hanna Mueller1, Daniele Palossi2, Stefan Mach1, Francesco Conti3 and Luca Benini4
    1Integrated Systems Laboratory - ETH Zurich, CH; 2Integrated Systems Laboratory - ETH Zurich, Switzerland, Dalle Molle Institute for Artificial Intelligence - University of Lugano and SUPSI, CH; 3Department of Electrical, Electronic and Information Engineering - University of Bologna, Italy, IT; 4Integrated Systems Laboratory - ETH Zurich, Department of Electrical, Electronic and Information Engineering - University of Bologna, CH
    Abstract
    Miniaturizing an autonomous robot is a challenging task: not only the mechanical but also the electrical components have to operate within limited space, payload, and power. Furthermore, the algorithms for autonomous navigation, such as state-of-the-art (SoA) visual navigation deep neural networks (DNNs), are becoming increasingly complex, striving for more flexibility and agility. In this work, we present a sensor-rich, modular, nano-sized Unmanned Aerial Vehicle (UAV), almost as small as a five Swiss Franc coin, called Fünfliber, with a total weight of 18 g and a diameter of 7.2 cm. We conceived our UAV as an open-source hardware robotic platform, controlled by a parallel ultra-low-power (PULP) system-on-chip (SoC) with a wide set of onboard sensors, including three cameras (infrared, optical flow, and standard QVGA), multiple Time-of-Flight (ToF) sensors, a barometer, and an inertial measurement unit. Our system runs the tasks necessary for a flight controller (sensor acquisition, state estimation, and low-level control) while requiring only 10% of the computational resources available aboard and consuming only 9 mW, 13x less than an equivalent Cortex-M4-based system. Pushing our system to its limit, we can use the remaining onboard computational power for sophisticated autonomous navigation workloads, as we showcase with an SoA DNN running at up to 18 Hz with a total electronics power consumption of 271 mW.
    16:15 CEST 12.1.2 RUNTIME ABSTRACTION FOR AUTONOMOUS ADAPTIVE SYSTEMS ON RECONFIGURABLE HARDWARE
    Speaker:
    Alex R Bucknall, University of Warwick, GB
    Authors:
    Alex R. Bucknall1 and Suhaib A. Fahmy2
    1University of Warwick, GB; 2KAUST, SA
    Abstract
    Autonomous systems increasingly rely on on-board computation to avoid the latency overheads of offloading to more powerful remote computing. This requires the integration of hardware accelerators to handle the complex computations demanded by data-intensive sensors. FPGAs offer hardware acceleration with ample flexibility and interfacing capabilities when paired with general purpose processors, with the ability to reconfigure at runtime using partial reconfiguration (PR). Managing dynamic hardware is complex and has been left to designers to address in an ad-hoc manner, without first-class integration in autonomous software frameworks. This paper presents an abstracted runtime for managing adaptation of FPGA accelerators, including PR and parametric changes, that presents as a typical interface used in autonomous software systems. We present a demonstration using the Robot Operating System (ROS), showing negligible latency overhead as a result of the abstraction.
    16:30 CEST IP.ASD_2.1 SYSTEMS ENGINEERING ROADMAP FOR DEPENDABLE AUTONOMOUS CYBER-PHYSICAL SYSTEMS
    Speaker and Author:
    Rasmus Adler, Fraunhofer IESE, DE
    Abstract
    Autonomous cyber-physical systems have enormous potential to make our lives more sustainable, more comfortable, and more economical. Artificial Intelligence and connectivity enable autonomous behavior, but often stand in the way of market launch. Traditional engineering techniques are no longer sufficient to achieve the desired dependability; current legal and normative regulations are inappropriate or insufficient. This paper discusses these issues, proposes advanced systems engineering to overcome these issues, and provides a roadmap by structuring fields of action.
    16:31 CEST 12.1.3 DDI: A NOVEL TECHNOLOGY AND INNOVATION MODEL FOR DEPENDABLE, COLLABORATIVE AND AUTONOMOUS SYSTEMS
    Speaker:
    Eric Armengaud, Armengaud Innovate GmbH, AT
    Authors:
    Eric Armengaud1, Daniel Schneider2, Jan Reich2, Ioannis Sorokos2, Yiannis Papadopoulos3, Marc Zeller4, Gilbert Regan5, Georg Macher6, Omar Veledar7, Stefan Thalmann8 and Sohag Kabir9
    1Armengaud Innovate GmbH, AT; 2Fraunhofer IESE, DE; 3University of Hull, GB; 4Siemens AG, DE; 5Lero @DKIT, IE; 6Graz University of Technology, AT; 7AVL List GmbH, AT; 8University of Graz, AT; 9University of Bradford, GB
    Abstract
    Digital transformation fundamentally changes established practices in the public and private sectors. Hence, it represents an opportunity to improve value creation processes (e.g., "Industry 4.0") and to rethink how to address customers' needs, such as data-driven business models and Mobility-as-a-Service. Dependable, collaborative and autonomous systems play a central role in this transformation process. Furthermore, the emergence of data-driven approaches combined with autonomous systems will lead to new business models and market dynamics. Innovative approaches to reorganise the value creation ecosystem, to enable the distributed engineering of dependable systems, and to answer urgent questions such as liability will be required. Consequently, digital transformation requires a comprehensive multi-stakeholder approach that properly balances technology, ecosystem and business innovation. The targets of this paper are (a) to introduce digital transformation and the role of and opportunities provided by autonomous systems, (b) to introduce Digital Dependability Identities (DDIs), a technology for the dependability engineering of collaborative, autonomous CPS, and (c) to propose an appropriate agile approach for innovation management based on business model innovation and co-entrepreneurship.

    12.2 When FPGA Turns Against You: Side-Channel and Fault Attacks in Shared FPGAs

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 17:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/fkPYdfa8kc9LaQ3xe

    Session chair:
    Francesco Regazzoni, University of Amsterdam and ALaRI - USI, NL

    Session co-chair:
    Daniel Holcomb, University of Massachusetts Amherst, US

    Organizers:
    Mirjana Stojilovic, EPFL, CH
    Francesco Regazzoni, University of Amsterdam and ALaRI - USI, NL

    FPGAs are now part of the cloud acceleration-as-a-service portfolio offered by major cloud providers. The cloud is naturally a multi-tenant platform. However, FPGA multi-tenancy raises security concerns, fueled by recent research showing how a malicious cloud user can deploy remotely controlled attacks to extract secret information from FPGA co-tenants or to inject faults. This hot-topic session aims to spread awareness of the threats and attack techniques and to discuss the limitations of existing countermeasures, hopefully leading to a deeper understanding of the problem and to the development of appropriate mitigation techniques.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.2.1 REMOTE AND STEALTHY FAULT ATTACKS ON VIRTUALIZED FPGAS
    Speaker:
    Jonas Krautter, Karlsruhe Institute of Technology (KIT), DE
    Authors:
    Jonas Krautter1, Dennis Gnad2 and Mehdi Tahoori2
    1Karlsruhe Institute of Technology (KIT), DE; 2Karlsruhe Institute of Technology, DE
    Abstract
    The increasing amount of resources per FPGA chip makes virtualization and multi-tenancy a promising direction for improving the utilization and efficiency of these flexible accelerators in the cloud. However, the freedom given to untrusted parties on a multi-tenant FPGA can result in severe security issues. Side-channel, fault, and denial-of-service attacks are possible through malicious use of FPGA logic resources. In this work, we perform a detailed analysis of fault attacks between logically isolated designs on a single FPGA. Previous attacks were often based on mapping a massive number of ring oscillators into FPGA logic, which naturally induce a high current and a subsequent voltage drop. However, these are easy to detect as combinational loops and can be prevented by a hypervisor. Here, we demonstrate how even elaborate fault attacks to recover the secret key of an AES encryption module can be deployed using seemingly benign benchmark circuits, or even AES modules themselves, to generate critical voltage fluctuations.
    16:15 CEST 12.2.2 EXTENDED ABSTRACT: COVERT CHANNELS AND DATA EXFILTRATION FROM FPGAS
    Speaker:
    Kasper Rasmussen, University of Oxford, GB
    Authors:
    Ilias Giechaskiel1, Ken Eguro2 and Kasper Rasmussen3
    1Independent Researcher, GB; 2Microsoft Research, US; 3University of Oxford, GB
    Abstract
    In complex FPGA designs, implementations of algorithms and protocols from third-party sources are common. However, the monolithic nature of FPGAs means that all subcircuits share common on-chip infrastructure, such as routing resources. This presents an attack vector for all FPGAs that contain designs from multiple vendors, especially for FPGAs used in multi-tenant cloud environments or integrated into multi-core processors: hardware imperfections can be used to infer high-level state and break security guarantees. In this paper, we demonstrate how "long" routing wires can be used for covert communication between disconnected cores, or by a malicious core to exfiltrate secrets. The information leakage is measurable for both static and dynamic signals, and it can be detected using small on-board circuits. Our prototype achieves a covert-channel bandwidth of 6 kbps with 99.9% accuracy, and a side channel that can recover signals kept constant for only 128 cycles with an accuracy of more than 98.4%.
    16:30 CEST 12.2.3 REMOTE POWER SIDE-CHANNEL ATTACKS ON BNN ACCELERATORS IN FPGAS
    Speaker:
    Russell Tessier, University of Massachusetts Amherst, US
    Authors:
    Shayan Moini1, Shanquan Tian2, Daniel Holcomb3, Jakub Szefer2 and Russell Tessier4
    1School of Electrical and Computer Engineering, University of Massachusetts Amherst, US; 2Yale University, US; 3UMass Amherst, US; 4University of Massachusetts, US
    Abstract
    Multi-tenant FPGAs have recently been proposed, where multiple independent users simultaneously share a remote FPGA. Despite its benefits for cost and utilization, multi-tenancy opens up the possibility of malicious users extracting sensitive information from co-located victim users. To demonstrate the dangers, this paper presents a remote, power-based side-channel attack on a binarized neural network (BNN) accelerator. This work shows how to remotely obtain voltage estimates as the BNN circuit executes, and how the information can be used to recover the inputs to the BNN. The attack is demonstrated with a BNN used to recognize handwriting images from the MNIST dataset. With the use of precise time-to-digital converters (TDCs) for remote voltage estimation, the MNIST inputs can be successfully recovered with a maximum normalized cross-correlation of 75% between the input image and the recovered image.
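    The 75% figure is a normalized cross-correlation score between the original MNIST input and the image recovered from the voltage traces. A minimal sketch of that similarity metric, computed at a single alignment (the exact metric and alignment search used in the paper are assumed, and `norm_xcorr` is an illustrative name):

```python
import numpy as np

def norm_xcorr(a, b):
    """Zero-mean normalized cross-correlation between two equally
    sized images: 1.0 for identical images (up to brightness and
    contrast), 0.0 for uncorrelated ones, -1.0 for inverted ones."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

A recovered image scoring 0.75 against its input under such a metric indicates substantial, though imperfect, leakage of the victim's data.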
    16:45 CEST 12.2.4 SHARED FPGAS AND THE HOLY GRAIL: PROTECTIONS AGAINST SIDE-CHANNEL AND FAULT ATTACKS
    Speaker:
    Mirjana Stojilovic, EPFL, CH
    Authors:
    Ognjen Glamocanin1, Dina Mahmoud1, Francesco Regazzoni2 and Mirjana Stojilovic1
    1EPFL, CH; 2University of Amsterdam and ALaRI - USI, CH
    Abstract
    In this paper, we survey recently proposed methods for protecting against side-channel and fault attacks in shared FPGAs. These methods are quite versatile, targeting FPGA compilation flow, real-time timing-fault detection, on-chip active fences, automated bitstream verification, etc. Despite their versatility, they are mostly designed to counteract a specific class of attacks. To understand how to address the problem of security in shared FPGAs in a comprehensive way, we discuss their individual strengths and weaknesses, in an attempt to identify research directions necessitating further investigation.

    12.3 Emerging in memory computing paradigms

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 17:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/24ansQcEPjaMi4Krn

    Session chair:
    Farhad Merchant, ICE, RWTH, DE

    Session co-chair:
    Mark Wijtvliet, TU Dresden, DE

    Organizer:
    Shubham Rai, TU Dresden, DE

    With the rise of neuromorphic computing, the traditional von Neumann architecture is finding it difficult to cope with the rising demands of machine learning workloads. This requirement has fueled the search for technologies that can mimic the human brain by efficiently combining memory and computation in a single device. In this special session, we present state-of-the-art research in the domain of in-memory computing. In particular, we take a look at memristors and their widespread application in neuromorphic computation. We introduce ReRAMs in terms of their novel computing paradigms and present ReRAM-specific design flows. We address the various circuit opportunities and the challenges related to reliability and fault tolerance associated with them. Finally, we look at an emerging nanotechnology involving the co-integration of CMOS and FeFETs, which has the potential to provide both memory and computation from a single device.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.3.1 RE-ENGINEERING COMPUTING WITH MEMRISTOR DEVICES
    Speaker and Author:
    Said Hamdioui, Delft University of Technology, NL
    Abstract
    This talk addresses the potential and the design of Computation-in-Memory (CIM) architectures based on non-volatile (NV) devices such as ReRAM, PCM and STT-MRAM. It classifies the state-of-the-art computer architectures and highlights how the trend is moving toward CIM architectures in order to eliminate or significantly reduce the limitations of today's technologies. The concept of CIM based on NV devices is discussed, and logic and arithmetic circuit designs using such devices, and how they enable such architectures, are covered; measurement data are shown to demonstrate the CIM concept in silicon. The strong dependency of the application domain on the selection of an appropriate CIM architecture and its building blocks, as well as the huge potential of CIM (realizing order-of-magnitude gains toward fJ/operation efficiency), is illustrated through case studies.
    16:15 CEST 12.3.2 TESTING OF RERAM CROSSBARS FOR IN-MEMORY COMPUTING
    Speaker and Author:
    Krishnendu Chakrabarty, Duke University, US
    Abstract
    Deep Learning (DL) applications are becoming increasingly ubiquitous. However, recent research has highlighted a number of reliability concerns associated with the deep neural networks (DNNs) used for DL. In particular, hardware-level reliability of DNNs is a major concern when DL models are mapped to specialized neuromorphic hardware such as ReRAM-based crossbars. However, DNN architectures are inherently fault-tolerant, and many faults do not have any significant impact on inference accuracy. Therefore, careful analysis must be carried out to identify the faults that are critical for a given application. We will describe a computationally efficient technique to identify critical faults (CFs) in massive crossbars for very large DNNs.
    16:30 CEST 12.3.3 DESIGN AUTOMATION FOR IN-MEMORY COMPUTING USING RERAM CROSSBAR
    Speaker and Author:
    Anupam Chattopadhyay, Nanyang Technological University, SG
    Abstract
    ReRAM devices enable high-endurance, non-volatile storage and low leakage power while also allowing functionally complete Boolean logic operations. Various logic families, such as Majority-Inverter, Material Implication and multi-input NOR gates, have been practically demonstrated. Such capabilities can be leveraged to develop non-Von Neumann computing platforms and thereby address the memory bottleneck. In this talk, we will discuss the quantifiable benefits of in-memory computing and representative platforms. Subsequently, we will introduce the novel design automation problems brought forth by such platforms. We will present our solutions in the domain of logic synthesis and technology mapping for ReRAM crossbar arrays, taking into account area, delay and crossbar dimension constraints.
    16:45 CEST 12.3.4 COMBINING MEMORY AND LOGIC USING FERRO-ELECTRIC TRANSISTORS
    Speaker and Author:
    Akash Kumar, TU Dresden, DE
    Abstract
    Ferro-electric field-effect transistors (FeFETs) based on hafnium oxide offer great opportunities for Logic-in-Memory applications, due to their natural ability to combine logic (transistor) and memory (ferroelectric material), their low-power operation, and their CMOS-compatible integration. This provides a strategy for manufacturing Schottky-type nanoscale transistors with an add-on nonvolatile option. In particular, the device concept is of great interest for achieving nonvolatile polarity modification in reconfigurable field-effect transistors. While reconfigurability at the transistor level allows more functionality per unit, the non-volatility can provide quick data access. In this talk, we will look at possible circuit design paradigms incorporating both memory and logic in a single transistor, which can greatly contribute to the ever-increasing problem of resource optimization in neuromorphic computing.

    12.4 NoC looking for higher performance

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 16:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/QEhZXKon2mRbE64sL

    Session chair:
    Romain Lemaire, CEA, FR

    Session co-chair:
    Pierre-Axel Lagadec, ATOS, FR

    Various NoC approaches have been proposed over the last two decades to provide efficient communication infrastructures in complex systems-on-chip. Starting from 2D wired topologies, the networks are expanding to a large spectrum of architectures integrating emerging technologies. This session presents advanced NoC-based designs and mechanisms leveraging different innovative solutions: optical, wireless or 3D vertical communication links. These solutions target applications where the complexity of the communication patterns is increasing and can become more critical to overall system performance than computation.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.4.1 FAST: A FAST AUTOMATIC SWEEPING TOPOLOGY CUSTOMIZATION METHOD FOR APPLICATION-SPECIFIC WAVELENGTH-ROUTED OPTICAL NOCS
    Speaker:
    Moyuan Xiao, TU Munich, DE
    Authors:
    Moyuan Xiao1, Tsun-Ming Tseng2 and Ulf Schlichtmann2
    1TU München, DE; 2TU Munich, DE
    Abstract
    Optical network-on-chip (ONoC) is an emerging upgrade to electronic network-on-chip (ENoC). As a kind of ONoC, the wavelength-routed optical network-on-chip (WRONoC) offers ultra-high bandwidth and ultra-low latency in data communication. Manually designed WRONoC topologies typically reserve resources for all-to-all communication. Topologies customized for application-specific networks can save resources, but their efficient design requires automation. The state-of-the-art design automation method proposes an integer-linear-programming (ILP) model, but the runtime for solving the ILP model increases exponentially with communication density. Besides, the locations of the physical ports are not taken into consideration in the model, which causes unavoidable detours and crossings in the physical layout. In this work, we present FAST: an automatic topology customization and optimization method combining ILP with a sweeping technique. FAST overcomes the runtime problem and provides multiple topology variations with different port orders for the physical layout. Experimental results show that FAST is thousands of times faster when tackling dense communications and tens to thousands of times faster when tackling sparse communications, while providing multiple better or equivalent topologies regarding resource usage and worst-case insertion loss.
    16:15 CEST 12.4.2 FUZZY-TOKEN: AN ADAPTIVE MAC PROTOCOL FOR WIRELESS-ENABLED MANYCORES
    Speaker:
    Antonio Franques, University of Illinois at Urbana-Champaign, US
    Authors:
    Antonio Franques1, Sergi Abadal2, Haitham Hassanieh1 and Josep Torrellas3
    1University of Illinois at Urbana-Champaign, US; 2N3Cat at Universitat Politècnica de Catalunya (UPC), ES; 3University of Illinois Urbana Champaign, US
    Abstract
    Recent computer architecture trends herald the arrival of manycores with over one hundred cores on a single chip. In this context, traditional on-chip networks do not scale well in latency or energy consumption, leading to bottlenecks in execution. The Wireless Network-on-Chip (WNoC) paradigm holds considerable promise for implementing on-chip networks that will enable such highly parallel manycores. However, one of the main challenges in WNoCs is the design of mechanisms that provide fast and efficient access to the wireless channel while adapting to the changing traffic patterns within and across applications. Existing approaches are either slow or complicated, and do not provide the required adaptivity. In this paper, we propose Fuzzy-Token, a simple WNoC protocol that leverages the unique properties of the on-chip scenario to deliver efficient and low-latency access to the wireless channel irrespective of the application characteristics. We substantiate our claim via simulations with a synthetic traffic suite and with real application traces. Fuzzy-Token consistently provides one of the lowest packet latencies among the evaluated WNoC MAC protocols. On average, the packet latency in Fuzzy-Token is 4.4x and 2.6x lower than in a state-of-the-art contention-based WNoC MAC protocol and in a token-passing protocol, respectively.
    16:30 CEST IP10_1.1 A HYBRID ADAPTIVE STRATEGY FOR TASK ALLOCATION AND SCHEDULING FOR MULTI-APPLICATIONS ON NOC-BASED MULTICORE SYSTEMS WITH RESOURCE SHARING
    Speaker:
    Navonil Chatterjee, Lab-STICC, CNRS, UBS Research Center, FR
    Authors:
    Suraj Paul1, Navonil Chatterjee2, Prasun Ghosal1 and Jean-Philippe Diguet3
    1Indian Institute of Engineering Science and Technology, IN; 2Université Bretagne Sud, FR; 3CNRS, Lab-STICC, UBS research center, FR
    Abstract
    Allocation and scheduling of applications affect the timing response and system performance, particularly for Network-on-Chip (NoC) based multicore systems executing real-time applications. These systems with multitasking processors provide improved opportunity for parallel application execution. In dynamic scenarios, runtime task allocation improves the system resource utilization and adapts to varying application workload. In this work, we present an efficient hybrid strategy for unified allocation and scheduling of tasks at runtime. By considering multitasking capability of processors, communication cost and task timing characteristics, potential allocation solutions are obtained at design-time. These are adapted for dynamic mapping and scheduling of computation and communication workloads of real-time applications. Simulation results show that the proposed approach achieves 34.2% and 26% average reduction in network latency and communication cost of the allocated applications. Also, the deadline satisfaction of the tasks improves on average by 42.1% while reducing the allocation-time overhead by 32% when compared with existing techniques.
    16:31 CEST 12.4.3 (Best Paper Award Candidate)
    REGRAPHX: NOC-ENABLED 3D HETEROGENEOUS RERAM ARCHITECTURE FOR TRAINING GRAPH NEURAL NETWORKS
    Speaker:
    Aqeeb iqbal Arka, Washington State University, US
    Authors:
    Aqeeb Iqbal Arka1, Biresh Kumar Joardar1, Jana Doppa1, Partha Pratim Pande1 and Krishnendu Chakrabarty2
    1Washington State University, US; 2Duke University, US
    Abstract
    Graph Neural Networks (GNNs) are a variant of Deep Neural Networks (DNNs) that operate on graphs. However, GNNs are more complex than traditional DNNs as they simultaneously exhibit features of both DNN and graph applications. As a result, architectures specifically optimized for either DNNs or graph applications are not suited for GNN training. In this work, we propose a 3D heterogeneous manycore architecture for on-chip GNN training to address this problem. The proposed architecture, ReGraphX, involves heterogeneous ReRAM crossbars to fulfill the disparate requirements of both DNN and graph computations simultaneously. The ReRAM-based architecture is complemented with a multicast-enabled 3D NoC to improve the overall achievable performance. We demonstrate that ReGraphX outperforms conventional GPUs by up to 3.5X (3X on average) in terms of execution time, while reducing energy consumption by as much as 11X.

    12.5 Intelligent Resource Optimization and Prediction for Power Management

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 16:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/aWixnqshwbfetWfwY

    Session chair:
    Andrea Calimera, Politecnico di Torino, IT

    Session co-chair:
    Laleh Behjat, University of Calgary, CA

    The session highlights innovative techniques for embedded power management optimization and prediction, ranging from formal verification and machine learning to implementation in real systems. The first paper uses novel formal modeling techniques to bridge run-time decisions and design-time exploration, the second introduces battery lifetime management that uses a prediction model to optimize operating points in real systems, and the last describes a power prediction model based on a Long Short-Term Memory neural network.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.5.1 (Best Paper Award Candidate)
    EFFICIENT RESOURCE MANAGEMENT OF CLUSTERED MULTI-PROCESSOR SYSTEMS THROUGH FORMAL PROPERTY EXPLORATION
    Speaker:
    Ourania Spantidi, Southern Illinois University Carbondale, US
    Authors:
    Ourania Spantidi1, Iraklis Anagnostopoulos1 and Georgios Fainekos2
    1Southern Illinois University Carbondale, US; 2Arizona State University, US
    Abstract
    Modern embedded systems have adopted the clustered Chip Multi-Processor (CMP) paradigm in conjunction with dynamic frequency scaling techniques to improve application performance and power consumption. Nonetheless, modern applications are becoming more aggressive in terms of computational demands. At the same time, the integration of multiple cores in the same cluster has resulted in a significant increase in power consumption, creating thermal hotspots. Conventional design approaches consider fixed power and temperature constraints, which are mostly extracted experimentally, often leading to pessimistic run-time decisions and performance losses. In this paper, we present a unified framework for efficient resource management of clustered CMPs by enabling formal property exploration and integrating robustness analysis. Specifically, we bridge the gap between run-time decisions and design-time exploration by using Parametric Signal Temporal Logic (PSTL) to mine the values of system constraints. Then, we utilize the extracted values to enhance the decisions of the run-time resource manager. Results on the Odroid-XU3 show that the proposed methodology offers both coarse- and fine-grain optimizations.
    16:15 CEST 12.5.2 WORKLOAD- AND USER-AWARE BATTERY LIFETIME MANAGEMENT FOR MOBILE SOCS
    Speaker:
    Sofiane Chetoui, Brown University, US
    Authors:
    Sofiane Chetoui and Sherief Reda, Brown University, US
    Abstract
    Mobile devices have become an essential part of daily life thanks to their increased computing capabilities and features. For battery-powered devices, the user experience depends on both quality of service (QoS) and battery lifetime. Previous works have sought to balance the QoS and battery lifetime of mobile devices; however, they often consider only the CPU. Additionally, they fail to consider the user's desired battery lifetime while exhibiting high QoS variation, which undermines user satisfaction. In this work, we propose a CPU-GPU workload- and user-aware battery lifetime management technique for mobile devices using machine learning. First, we design a workload-aware governor through an offline and an online analysis. A set of CPU and GPU performance counters is used during the offline analysis to identify a set of canonical phases (CPs). At runtime, k-means is used to classify each sample of the performance counters into one of the predefined CPs. Afterwards, we build a model that predicts energy consumption given the user's usage history. Finally, the energy model is used to find the optimal frequency settings for the CPU and GPU that provide the best QoS while meeting the target battery lifetime. Evaluation of the proposed work against state-of-the-art techniques on a commercial smartphone shows 15.8% and 9.4% performance improvements on the CPU and GPU, respectively. The proposed technique also shows a 10× improvement in QoS variation, while meeting the desired battery lifetime.
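    The offline/online phase-classification and frequency-selection steps described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the centroid values, feature layout, and power/QoS tables below are made-up assumptions.

```python
import numpy as np

# Hypothetical canonical-phase (CP) centroids learned offline from CPU/GPU
# performance-counter traces (the three features and their values are assumptions).
CENTROIDS = np.array([
    [0.90, 0.10, 0.20],   # CP0: CPU compute-bound
    [0.20, 0.80, 0.10],   # CP1: GPU-heavy
    [0.05, 0.05, 0.02],   # CP2: near-idle
])

# Illustrative per-phase lookup tables: predicted power (W) and a QoS score
# for each candidate frequency setting (all values are invented).
FREQS = ["low", "mid", "high"]
POWER = {0: {"low": 1.0, "mid": 2.0, "high": 4.0},
         1: {"low": 1.5, "mid": 2.5, "high": 5.0},
         2: {"low": 0.3, "mid": 0.6, "high": 1.2}}
QOS = {p: {"low": 1, "mid": 2, "high": 3} for p in range(3)}

def classify_phase(sample):
    """Online step: assign a counter sample to the nearest canonical phase
    (the k-means assignment step against offline-learned centroids)."""
    return int(np.argmin(np.linalg.norm(CENTROIDS - sample, axis=1)))

def pick_frequency(sample, power_budget_w):
    """Pick the highest-QoS frequency whose predicted power fits the budget
    derived from the user's target battery lifetime."""
    phase = classify_phase(sample)
    feasible = [f for f in FREQS if POWER[phase][f] <= power_budget_w]
    return max(feasible, key=lambda f: QOS[phase][f]) if feasible else "low"
```

    For a sample close to the CPU-bound centroid under a 2.5 W budget, this sketch would choose the "mid" setting: the highest-QoS point that still fits the budget.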
    16:30 CEST 12.5.3 LONG SHORT-TERM MEMORY NEURAL NETWORK-BASED POWER FORECASTING OF MULTI-CORE PROCESSORS
    Speaker:
    Mark Sagi, TU Munich, DE
    Authors:
    Mark Sagi1, Martin Rapp2, Heba Khdr3, Yizhe Zhang1, Nael Fasfous1, Nguyen Anh Vu Doan1, Thomas Wild1, Joerg Henkel4 and Andreas Herkersdorf5
    1TU Munich, DE; 2Karlsruhe Institute of Technology, DE; 3Karlsruhe Institute of Technology (KIT), DE; 4KIT, DE; 5TU München, DE
    Abstract
    We propose a novel technique to forecast the power consumption of processor cores at run time. Power consumption varies strongly across running applications and within their execution phases. Accurately forecasting future power changes is highly relevant for proactive power/thermal management. While forecasting power is straightforward for known or periodic workloads, the challenge for general unknown workloads at different voltage/frequency (v/f) levels remains unsolved. Our technique is based on a long short-term memory (LSTM) recurrent neural network (RNN) that forecasts the average power consumption for both the next 1 ms and 10 ms periods. The run-time inputs to the LSTM RNN are current and past power information as well as performance counter readings. An LSTM RNN enables this forecasting through its ability to preserve the history of power and performance counters. Our LSTM RNN needs to be trained only once at design time, while adapting at run time to different system behavior through its internal memory. We demonstrate that our approach accurately forecasts power for unseen applications at different v/f levels. The experimental results show that the forecasts of our LSTM RNN provide 43% lower worst-case error for the 1 ms forecasts and 38% lower for the 10 ms forecasts, compared to the state of the art.

    12.6 Attacks and defense mechanisms

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 16:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/QdoahRyEmTARoxXvj

    Session chair:
    David Hely, University Grenoble Alpes, Grenoble INP, LCIS, FR

    Session co-chair:
    Nektarios Tsoutsos, University of Delaware, US

    This session addresses attacks and mitigation techniques at the application level. In the first paper, a novel approach is proposed for creating a timing-based covert channel on a dynamically partitioned shared Last Level Cache. Then several protection techniques are presented: a hardware mechanism to detect and mitigate cross-core cache attacks, a Row-Hammer attack mitigation technique based on time-varying activation probabilities, a non-intrusive malware detection method for PLCs, and a method for detecting memory corruption vulnerability exploits that relies on fuzzing and dynamic data flow analysis.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.6.1 (Best Paper Award Candidate)
    EXPLOITING SECRETS BY LEVERAGING DYNAMIC CACHE PARTITIONING OF LAST LEVEL CACHE
    Speaker:
    Shirshendu Das, Indian Institute of Technology Ropar, IN
    Authors:
    Anurag Agarwal, Jaspinder Kaur and Shirshendu Das, Indian Institute of Technology Ropar, IN
    Abstract
    Dynamic cache partitioning for shared Last Level Caches (LLC) is deployed in most modern multicore systems to achieve process isolation and fairness among applications and to avoid security threats. Since the LLC has visibility of all cache blocks requested by the applications running on a multicore system, a malicious application can threaten the system by leveraging the dynamic partitioning scheme applied to the LLC to create a timing-based covert channel attack. We call this the Cache Partitioned Covert Channel (CPCC) attack. The malicious applications may contain a trojan and a spy and use the underlying shared memory to create the attack. Through this attack, secret information such as encryption keys can be transmitted between the intended parties. We have observed that CPCC can target single or multiple cache sets to achieve a higher transmission rate with a maximum error rate of only 5%. The paper also discusses a few defense strategies that can prevent such cache-partitioning-based covert channel attacks.
    16:15 CEST 12.6.2 PIPOMONITOR: MITIGATING CROSS-CORE CACHE ATTACKS USING THE AUTO-CUCKOO FILTER
    Speaker:
    Fengkai Yuan, State Key Laboratory of Information Security, Institute of Information Engineering, CN
    Authors:
    Fengkai Yuan1, Kai Wang2, Rui Hou1, Xiaoxin Li1, Peinan Li1, Lutan Zhao1, Jiameng Ying1, Amro Awad3 and Dan Meng1
    1State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, CN; 2Department of Computer Science and Technology, Harbin Institute of Technology, CN; 3Department of Electrical and Computer Engineering, North Carolina State University, US
    Abstract
    Cache side channel attacks obtain a victim's cache line access footprint to infer security-critical information. Among them, cross-core attacks exploiting the shared last level cache are more threatening due to their ease of setup and high capacity. Stateful approaches to detection-based mitigation observe precise cache behaviors and protect specific cache lines that are suspected of being attacked. However, their recording structures incur large storage overhead and are vulnerable to reverse engineering attacks. Exploiting the intrinsic non-determinate layout of a traditional Cuckoo filter, this paper proposes a space-efficient Auto-Cuckoo filter to record access footprints, which decreases storage overhead and resists reverse engineering attacks at the same time. With the Auto-Cuckoo filter, we propose PiPoMonitor to detect Ping-Pong patterns and prefetch specific cache lines to interfere with adversaries' cache probes. Security analysis shows that PiPoMonitor can effectively mitigate cross-core attacks and that the Auto-Cuckoo filter is immune to reverse engineering attacks. Evaluation results indicate PiPoMonitor has negligible impact on performance and a storage overhead of only 0.37%, an order of magnitude lower than previous stateful approaches.
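    For readers unfamiliar with the base data structure, a textbook Cuckoo filter can be sketched as follows. This shows only the standard insert/lookup mechanics; the paper's Auto-Cuckoo variant and its reverse-engineering defenses are not reproduced here, and the table sizes are arbitrary.

```python
import random

class CuckooFilter:
    """Minimal textbook Cuckoo filter: each item stores a short fingerprint
    in one of two candidate buckets; a full bucket triggers eviction and
    relocation of a resident fingerprint ("cuckoo" kicks)."""

    def __init__(self, n_buckets=64, bucket_size=4, max_kicks=32):
        assert n_buckets & (n_buckets - 1) == 0, "power of two for XOR trick"
        self.buckets = [[] for _ in range(n_buckets)]
        self.mask = n_buckets - 1
        self.bucket_size, self.max_kicks = bucket_size, max_kicks

    def _fingerprint(self, item):
        return (hash(("fp", item)) & 0xFF) or 1     # non-zero 8-bit tag

    def _index1(self, item):
        return hash(item) & self.mask

    def _index2(self, index, fp):
        # XOR with a hash of the fingerprint; applying it twice returns to
        # the original bucket, so each fingerprint has exactly two homes.
        return (index ^ hash(("alt", fp))) & self.mask

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index1(item)
        for i in (i1, self._index2(i1, fp)):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = random.choice((i1, self._index2(i1, fp)))
        for _ in range(self.max_kicks):             # evict and relocate
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._index2(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False                                # table considered full

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index1(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._index2(i1, fp)]
```

    Storing short fingerprints instead of full cache-line tags is what makes such filters space-efficient, at the cost of a small false-positive rate; the eviction-driven placement is the "non-determinate layout" the abstract refers to.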
    16:30 CEST IP10_2.1 TOWARDS NON-INTRUSIVE MALWARE DETECTION FOR INDUSTRIAL CONTROL SYSTEMS
    Speaker:
    Prashant Hari Narayan Rajput, NYU Tandon School of Engineering, US
    Authors:
    Prashant Hari Narayan Rajput1 and Michail Maniatakos2
    1NYU Tandon School of Engineering, US; 2New York University Abu Dhabi, AE
    Abstract
    The convergence of the Operational Technology (OT) sector with the Internet of Things (IoT) devices has increased cyberattacks on prominent OT devices such as Programmable Logic Controllers (PLCs). These devices have limited computational capabilities, no antivirus support, strict real-time requirements, and often older, unpatched operating systems. The use of traditional malware detection approaches can impact the real-time performance of such devices. Due to these constraints, we propose Amaya, an external malware detection mechanism based on a combination of signature detection and machine learning. This technique employs remote analysis of malware binaries collected from the main memory of the PLC by a non-intrusive method using the Joint Test Action Group (JTAG) port. We evaluate Amaya against in-the-wild malware for ARM and x86 architecture, achieving an accuracy of 98% and 94.7%, respectively. Furthermore, we analyze concept drift, spatial experimental bias, and the effects of downsampling the feature vector to understand the behavior of the model in a real-world setting.
    16:31 CEST IP10_2.2 TOWARDS AUTOMATED DETECTION OF HIGHER-ORDER MEMORY CORRUPTION VULNERABILITIES IN EMBEDDED DEVICES
    Speaker:
    Lei Yu, UCAS, CN
    Authors:
    Lei Yu, Linyu Li, Haoyu Wang, Xiaoyu Wang and Xiaorui Gong, UCAS, CN
    Abstract
    The rapid growth and limited security protection of networked embedded devices put the threat of remote-code-execution-related memory corruption attacks front and center among security concerns. Current detection approaches can detect single-step and single-process memory corruption vulnerabilities well via fuzzing, and often assume that data stored in the current embedded device, or even in the embedded devices connected to it, is safe. However, an adversary might corrupt memory via multi-step exploits by first abusing the embedded application to store an attack payload and later using this payload in a security-critical operation on memory. In practice, these exploits usually lead to persistent code execution attacks and complete control of the device, but they are rarely covered by state-of-the-art dynamic testing techniques. To address these stealthy yet harmful threats, we identify a large class of such multi-step memory corruption attacks and define them as higher-order memory corruption vulnerabilities (HOMCVs). We abstract detailed multi-step exploit models for these vulnerabilities and expose various attacker-controllable data stores (ACDS) that contribute to memory corruption. Aided by the abstract models, a dynamic data flow tracking (DDFA) based solution is developed to detect data stores that would be transferred to memory and thereby identify HOMCVs. Our proposed method is validated on an experimental embedded system injected with different variants of higher-order memory corruption vulnerabilities and on two real-world embedded devices. We demonstrate that successful detection can be accomplished with an automated system named the Higher-Order Fuzzing Framework (HOFF), which realizes the DDFA-based solution.
    16:32 CEST 12.6.3 TIVAPROMI: TIME-VARYING PROBABILISTIC ROW-HAMMER MITIGATION
    Speaker:
    Hassan Nassar, Karlsruhe Institute of Technology, DE
    Authors:
    Hassan Nassar, Lars Bauer and Joerg Henkel, Karlsruher Institut für Technologie, DE
    Abstract
    Row-Hammering is a challenge for computing systems that use DRAM: repeatedly accessing a DRAM row can cause bit flips in its neighboring rows. Several mitigation techniques at the memory controller level have already been suggested, falling into two categories. The first category uses static probabilities, which leads to a performance penalty due to a high number of extra row activations. The second category is based on so-called Tabled Counters, which have large hardware requirements and are mostly infeasible to implement. We introduce a novel Row-Hammer mitigation technique that uses time-varying probabilities combined with a relatively small history table. Our technique reduces the number of extra row activations compared to static probabilistic techniques and demands less storage than Tabled Counters techniques. Compared to the state of the art, our technique offers a good compromise: 9× to 27× lower storage requirements than Tabled Counters and 6× to 12× fewer activations than probabilistic techniques.
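    A time-varying probabilistic mitigation in the spirit of this abstract can be sketched as follows. The probability bounds, hotness threshold, and table-replacement policy below are illustrative assumptions for a behavioral model, not the paper's parameters or hardware design.

```python
import random

class TimeVaryingRowHammerMitigation:
    """Behavioral sketch: a small history table tracks recently activated
    rows, and the probability of issuing extra "victim" activations rises
    while a row looks hammered and decays otherwise (all values assumed)."""

    def __init__(self, p_min=0.0005, p_max=0.01, table_size=16):
        self.p = p_min
        self.p_min, self.p_max = p_min, p_max
        self.history = {}              # row -> recent activation count
        self.table_size = table_size

    def on_activate(self, row):
        """Called on each row activation; returns neighbor rows to refresh."""
        # Keep only the hottest rows in a bounded history table.
        self.history[row] = self.history.get(row, 0) + 1
        if len(self.history) > self.table_size:
            coldest = min(self.history, key=self.history.get)
            del self.history[coldest]
        # Time-varying probability: double while hot, halve while cold.
        hot = self.history[row] > 8    # assumed hotness threshold
        self.p = min(self.p * 2, self.p_max) if hot else max(self.p / 2, self.p_min)
        # With probability p, issue extra activations to the victim neighbors.
        if random.random() < self.p:
            return [row - 1, row + 1]
        return []
```

    Compared to a static-probability scheme, the probability stays near `p_min` for benign traffic (few extra activations) and only climbs toward `p_max` while a row in the small table is being hammered.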

    12.7 Advances in modelling of defects and yield in memories

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 16:50 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/TgvxGr2c7S8qQ7wLc

    Session chair:
    Naghmeh Karimi, UMBC, US

    Session co-chair:
    Hank Walker, Texas A&M University, US

    Characterization and fault modeling of intermediate-state defects in STT-MRAMs are addressed in this session. The session continues with a scheme for yield estimation of SRAM arrays. Then, a method to mitigate the effect of stuck-at faults in ReRAM crossbar architectures is highlighted.

    Time Label Presentation Title
    Authors
    16:00 CEST 12.7.1 (Best Paper Award Candidate)
    CHARACTERIZATION AND FAULT MODELING OF INTERMEDIATE STATE DEFECTS IN STT-MRAM
    Speaker:
    Lizhou Wu, TUDelft, NL
    Authors:
    Lizhou Wu1, Siddharth Rao2, Mottaqiallah Taouil1, Erik Jan Marinissen3, Gouri Sankar Kar3 and Said Hamdioui1
    1Delft University of Technology, NL; 2imec, BE; 3IMEC, BE
    Abstract
    Understanding the defects in magnetic tunnel junctions (MTJs) and their faulty behaviors is paramount for developing high-quality tests for STT-MRAM. This paper characterizes and models intermediate (IM) state defects in MTJs; the IM state manifests itself as an abnormal third resistive state, apart from the two bi-stable states of the MTJ. We performed silicon measurements on MTJ devices with diameters ranging from 60nm to 120nm; the results reveal that the occurrence probability of the IM state strongly depends on the switching direction, device size, and applied bias voltage. To test such defects, appropriate fault models are needed. Therefore, we use an advanced device-aware modeling approach: we first physically model the defect, incorporate it into a Verilog-A MTJ compact model, and calibrate it with silicon data. Thereafter, we use a systematic fault analysis to validate a theoretically predefined fault space and derive realistic fault models. Our simulation results show that the IM state defect causes intermittent write transition faults. This paper also demonstrates that the conventional resistor-based fault modeling and test approach fails to appropriately model IM defects, and is hence incapable of detecting them.
    16:15 CEST 12.7.2 AN EFFICIENT YIELD ESTIMATION METHOD FOR LAYOUTS OF HIGH DIMENSIONAL AND HIGH SIGMA SRAM ARRAYS
    Speaker:
    Yue Shen, Fudan University, CN
    Authors:
    Yue Shen1, Changhao Yan2, Sheng-Guo Wang3, Dian Zhou4 and Xuan Zeng1
    1Fudan University, CN; 2Fudan University, CN; 3University of North Carolina at Charlotte, US; 4University of Texas, US
    Abstract
    This paper focuses on the yield estimation problem for post-layout simulation of high-dimensional SRAM arrays. Post-layout simulation is much more credible than pre-layout simulation; however, it introduces strong correlations among SRAM columns. A Multi-Fidelity Gaussian Process model between small and large SRAM arrays near the Optimal Shift Vector (OSV) is built. An iterative strategy is proposed, and a Multi-Modal method is applied to obtain more prior knowledge of the small SRAM arrays and further accelerate convergence. Experimental results show that the proposed method achieves a 5-7x speedup with smaller relative errors than the state-of-the-art method for 384-dimensional cases.
    16:30 CEST IP10_1.2 MODELING OF THRESHOLD VOLTAGE DISTRIBUTION IN 3D NAND FLASH MEMORY
    Speaker:
    Weihua Liu, Huazhong University of Science and Technology, CN
    Authors:
    Weihua Liu1, Fei Wu1, Jian Zhou2, Meng Zhang1, Chengmo Yang3, Zhonghai Lu4, Yu Wang1 and Changsheng Xie1
    1Huazhong University of Science and Technology, CN; 2University of Central Florida, US; 3University of Delaware, US; 4KTH Royal Institute of Technology in Stockholm, SE
    Abstract
    3D NAND flash memory faces far more complicated interference than planar NAND flash memory, raising greater concern about reliability and performance. Stronger error correction codes (ECC) and adaptive reading strategies have been proposed to improve reliability and performance, taking a threshold voltage (Vth) distribution model as their backbone. However, existing modeling methods struggle to produce such a Vth distribution model for 3D NAND flash memory. To address this, we propose a machine learning-based modeling method. It employs a neural network that takes advantage of existing modeling methods and fully considers the multiple interferences and variations in 3D NAND flash memory. Evaluations demonstrate that it predicts the Vth distribution more accurately and efficiently than state-of-the-art models.
    16:31 CEST 12.7.3 COST- AND DATASET-FREE STUCK-AT FAULT MITIGATION FOR RERAM-BASED DEEP LEARNING ACCELERATORS
    Speaker:
    Mohammed Fouda, University of California Irvine, US
    Authors:
    Giju Jung1, Mohammed Fouda2, Sugil Lee3, Jongeun Lee3, Ahmed Eltawil4 and Fadi Kurdahi5
    1Ulsan National Institute of Science and Technology, KR; 2University of California Irvine, US; 3Ulsan National Institute of Science and Technology (UNIST), KR; 4Professor, US; 5University of California, Irvine, US
    Abstract
    Resistive RAMs can implement extremely efficient matrix-vector multiplication, drawing much attention to deep learning accelerator research. However, a high fault rate is one of the fundamental challenges of ReRAM crossbar array-based deep learning accelerators. In this paper we propose a dataset-free, cost-free method to mitigate the impact of stuck-at faults in ReRAM crossbar arrays for deep learning applications. Our technique exploits the statistical properties of deep learning applications and is hence complementary to previous hardware and algorithmic methods. Experimental results on the MNIST and CIFAR-10 datasets with binary networks demonstrate that our technique is very effective, both alone and combined with previous methods, up to a 20% fault rate, which is higher than previous remapping methods can handle. We also evaluate our method in the presence of other non-idealities such as variability and IR drop.

    12.8 Career forum: YPP panel

    Date: Thursday, 04 February 2021
    Time: 16:00 CEST - 17:00 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/n4S3Jug5K4658nWMX

    Session chair:
    Sara Vinco, Politecnico di Torino, IT

    Session co-chair:
    Anton Klotz, Cadence, DE

    Organizers:
    Xavier Salazar Forn, BSC, ES
    Antonio Louro, BSC, ES

    Panel session on the different career opportunities open to those with an education in microelectronics. Panelists: Dr. Ioana Vatajelu (University Grenoble-Alpes), Dr. Paul McLellan (Cadence Design Systems Inc.), Dr. David Moloney (Ubotica Technologies), Dr. Andrea Kells (ARM), Dr. John Davis (Barcelona Supercomputing Center).

    Time Label Presentation Title
    Authors
    16:00 CEST 12.8.1 YPP PANEL ON DIFFERENT CAREER OPPORTUNITIES HAVING EDUCATION IN MICROELECTRONICS.
    Panelists:
    Sara Vinco1, Anton Klotz2, Elena Ioana Vatajelu3, Paul McLellan2, David Moloney4, Andrea Kells5 and John Davis6
    1Politecnico di Torino, IT; 2Cadence, US; 3TIMA, FR; 4Ubotica Technologies, IE; 5ARM, GB; 6BSC, ES
    Abstract
    The session will feature a round-table discussion of the different views on, and opportunities in, high-end computer science research: speakers from academia, industry, and even self-made entrepreneurs have been invited to share their insights and valuable knowledge on these different paths. Panelists: Dr. Ioana Vatajelu (University Grenoble-Alpes), Dr. Paul McLellan (Cadence Design Systems Inc.), Dr. David Moloney (Ubotica Technologies), Dr. Andrea Kells (ARM), Dr. John Davis (Barcelona Supercomputing Center).

    EXTH_K.YPP YPP Keynote

    Date: Thursday, 04 February 2021
    Time: 17:00 CEST - 17:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/jMCsSq4MB4yAqnbyr

    Session chair:
    Xavier Salazar Forn, BSC, ES

    Session co-chair:
    Antonio Louro, BSC, ES

    Organizers:
    Sara Vinco, Politecnico di Torino, IT
    Anton Klotz, Cadence, DE

    With the slowing pace of Moore's law, concerns have been expressed that IC design is becoming completely commodified. These concerns are misplaced: on the contrary, innovation through design is regaining importance relative to innovation through technology evolution, as we need to become more creative to work around the hard brick walls of device scaling. I will draw a couple of examples from my recent experience in designing machine learning accelerators and open-source processors to concretely illustrate not only that it is possible to find fun jobs in IC design, but also that there are interesting new options and business models for innovation in this area.

    Time Label Presentation Title
    Authors
    17:00 CEST EXTH_K.YPP.1 MOORE'S LAW IS IN TROUBLE... MORE JOBS IN IC DESIGN!
    Speaker and Author:
    Luca Benini, Università di Bologna and ETH Zurich, IT
    Abstract
    With the slowing pace of Moore's law, concerns have been expressed that IC design is becoming completely commodified. These concerns are misplaced: on the contrary, innovation through design is regaining importance relative to innovation through technology evolution, as we need to become more creative to work around the hard brick walls of device scaling. I will draw a couple of examples from my recent experience in designing machine learning accelerators and open-source processors to concretely illustrate not only that it is possible to find fun jobs in IC design, but also that there are interesting new options and business models for innovation in this area.

    IP.ASD_2 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 17:00 CEST - 17:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/nM6kYLXg8nwB5C4un

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

    Time Label Presentation Title
    Authors
    17:00 CEST IP.ASD_2.1 SYSTEMS ENGINEERING ROADMAP FOR DEPENDABLE AUTONOMOUS CYBER-PHYSICAL SYSTEMS
    Speaker and Author:
    Rasmus Adler, Fraunhofer IESE, DE
    Abstract
    Autonomous cyber-physical systems have enormous potential to make our lives more sustainable, more comfortable, and more economical. Artificial Intelligence and connectivity enable autonomous behavior, but often stand in the way of market launch. Traditional engineering techniques are no longer sufficient to achieve the desired dependability; current legal and normative regulations are inappropriate or insufficient. This paper discusses these issues, proposes advanced systems engineering to overcome these issues, and provides a roadmap by structuring fields of action.

    IP.MPIRP_1 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 17:00 CEST - 17:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/9HJsmCiF4z9qJ4NYs

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

    Time Label Presentation Title
    Authors
    IP.MPIRP_1.1 PROJECT OVERVIEW FOR STEP-UP!CPS – PROCESS, METHODS AND TECHNOLOGIES FOR UPDATING SAFETY-CRITICAL CYBER-PHYSICAL SYSTEMS
    Speaker:
    Carl Philipp Hohl, FZI Forschungszentrum Informatik, DE
    Authors:
    Thomas Strathmann1, Georg Hake2, Houssem Guissouma3, Carl Philipp Hohl4, Yosab Bebawy1, Sebastian Vander Maelen1 and Andrew Koerner5
    1OFFIS e.V., DE; 2University of Oldenburg, DE; 3Karlsruhe Institute of Technology (KIT), DE; 4FZI Forschungszentrum Informatik, DE; 5DLR, DE
    Abstract
    We describe the challenges addressed by the three-year German national collaborative research project Step-Up!CPS, which is currently in its third year. The goal of the project is to develop software methods and technologies for modular updates of safety-critical cyber-physical systems. To make this possible, contracts are used to formally describe the behaviour of an update and make it verifiable at different points in the update life cycle. We have defined a development process that allows for continuous improvement of such systems by monitoring their operation, identifying the need for updates, and developing and deploying these updates in a safe and secure manner. We highlight the points along the update process that are critical for a secure update and show how we address them in a contractually secured update process.
    IP.MPIRP_1.2 VERIDEVOPS: AUTOMATED PROTECTION AND PREVENTION TO MEET SECURITY REQUIREMENTS IN DEVOPS
    Speaker:
    Eduard Enoiu, MDH, SE
    Authors:
    Andrey Sadovykh1, Gunnar Widforss2, Dragos Truscan3, Eduard Paul Enoiu2, Wissam Mallouli4, Rosa Iglesias5, Alessandra Bagnato6 and Olga Hendel2
    1Innopolis University, RU; 2Mälardalen University, SE; 3Åbo Akademi University, FI; 4Montimage EURL, FR; 5Ikerlan Technology Research Centre, ES; 6Softeam, FR
    Abstract
    VeriDevOps is a Horizon 2020-funded research project in its initial stage. The project started on 1 October 2020 and will run for three years. VeriDevOps is about fast, flexible software engineering that efficiently integrates development, delivery, and operations, aiming at quality deliveries with short cycle times to address ever-evolving challenges. Current software development practices increasingly rely on both COTS and legacy components, which makes such systems prone to security vulnerabilities. DevOps, the modern practice for addressing ever-changing conditions, promotes frequent software deliveries; however, verification artifacts must be updated in a timely fashion to keep pace with the process. VeriDevOps aims to provide a faster feedback loop for verifying the security requirements and other quality attributes of large-scale cyber-physical systems. VeriDevOps focuses on optimizing security verification activities by automatically creating verifiable models directly from security requirements, and using these models to check security properties on design models and to generate artifacts such as automatically generated tests or monitors that can be used later in the DevOps process. The main drivers for these advances are Natural Language Processing, a combined formal verification and model-based testing approach, and machine-learning-based security monitors. In this paper, we present the planned contributions of the project, its consortium, and its planned way of working to accomplish the expected results.

    IP.MPIRP_2 Interactive Presentations

    Date: Thursday, 04 February 2021
    Time: 17:00 CEST - 17:30 CEST
    Virtual Conference Room: https://virtual21.date-conference.com/meetings/virtual/9PXs3DNoMeqKW3ENC

    Interactive Presentations run simultaneously during a 30-minute slot. Additionally, each IP paper is briefly introduced in a one-minute presentation in a corresponding regular session

    Time Label Presentation Title
    Authors
    IP.MPIRP_2.1 THE UP2DATE BASELINE RESEARCH PLATFORMS
    Speaker:
    Alvaro Jover, Barcelona Supercomputing Center (BSC) and Universitat Politècnica de Catalunya (UPC), ES
    Authors:
    Alvaro Jover-Alvarez1, Alejandro J. Calderón2, Ivan Rodriguez Ferrandez3, Leonidas Kosmidis4, Kazi Asifuzzaman4, Patrick Uven5, Kim Grüttner6, Tomaso Poggi7 and Irune Agirre8
    1Universitat Politècnica de Catalunya (UPC) and Barcelona Supercomputing Center (BSC), ES; 2Ikerlan Technology Research Centre, ES; 3Barcelona Supercomputing Center and Universitat Politècnica