Journal articles
http://hdl.handle.net/2117/3912
http://hdl.handle.net/2117/405512
An M-ary concentration shift keying with common detection thresholds for multitransmitter molecular communication
Shitiri, Ethungshan; Cho, Ho-Shin
Concentration shift keying (CSK) is a widely studied modulation technique for molecular communication-based nanonetworks, which is a key enabler for the Internet of Bio-NanoThings (IoBNT). Existing CSK methods, while offering optimal error performance, suffer from increased operational complexity that scales poorly as the number of transmitters, K, grows. In this study, a novel M-ary CSK method is proposed: CSK with common detection thresholds (CSK-CT). CSK-CT uses common thresholds, set sufficiently low to guarantee the reliable detection of symbols from all transmitters, regardless of distance. Closed-form expressions are derived to obtain the common thresholds and release concentrations. To further enhance error performance, the release concentration is optimized using a scaling exponent that also optimizes the common thresholds. The performance of CSK-CT is evaluated against the benchmark CSK across various K and M values. CSK-CT has an error probability between 10-7 and 10-4, which is a substantial improvement from that of the benchmark CSK (from 10-4 to 10-3). In terms of complexity, CSK-CT is O(n) and does not scale with K but M « (MK), whereas the benchmark is O(n2). Furthermore, CSK-CT can mitigate inter-symbol interference (ISI), although this facet merits further investigation. Owing to its low error rates, improved scalability, reduced complexity, and potential ISI mitigation features, CSK-CT is particularly advantageous for IoBNT applications focused on data gathering. Its effectiveness is especially notable in scenarios where a computationally limited receiver is tasked with collecting vital health data from multiple transmitters.
http://hdl.handle.net/2117/405437
Innovative predictive approach towards a personalized oxygen dosing system
Pascual Saldaña, Heribert; Masip Bruin, Xavier; Asensio Garcia, Adrian; Alonso Beltran, Albert; Blanco Vich, Isabel
Despite the large impact that chronic obstructive pulmonary disease (COPD) has on the population, the implementation of new technologies for diagnosis and treatment remains limited. Current practices in ambulatory oxygen therapy used in COPD rely on fixed doses, overlooking the diverse activities in which patients engage. To address this challenge, we propose a software architecture aimed at delivering patient-personalized edge-based artificial intelligence (AI)-assisted models that are built upon data collected from patients’ previous experiences along with an evaluation function. The main objectives reside in proactively administering precise oxygen dosages in real time to the patient (the edge), leveraging individual patient data, previous experiences, and actual activity levels, thereby representing a substantial advancement over conventional oxygen dosing. Through a pilot test using vital sign data from a cohort of five patients, the limitations of a one-size-fits-all approach are demonstrated, thus highlighting the need for personalized treatment strategies. This study underscores the importance of adopting advanced technological approaches for ambulatory oxygen therapy.
http://hdl.handle.net/2117/404539
Assessing Saiph, a task-based DSL for high-performance computational fluid dynamics
Macià Sorrosal, Sandra; Martínez Ferrer, Pedro José; Ayguadé Parra, Eduard; Beltran Querol, Vicenç
Scientific applications face the challenge of efficiently exploiting increasingly complex parallel and distributed systems. Developing hand-tuned codes is a time-consuming, tedious and hardly reusable task. Reaching high performance appears detrimental to productivity and portability and unreasonable to expect from scientists. Domain-Specific Languages (DSLs) are collaborative environments aiming to overcome such difficulties by decoupling the problem description from the algorithmic implementation. However, developing a competitive tool in High-Performance Computing (HPC) is challenging: DSLs for HPC environments have two additional critical requirements, performance and scalability. Moreover, documented and successful cases are few, making it difficult to popularise DSLs as problem-solving environments for scientific HPC code development. In this context, Saiph is a task-based DSL easing the simulation of physical phenomena from Computational Fluid Dynamics (CFD), developed to meet HPC productivity and performance requirements. This work reports the tuning and evaluation of Saiph using the Taylor–Green Vortex (TGV) problem as a case study. We assess Saiph’s productivity, numerical methods, and high-performance strategies to illustrate its use and demonstrate its competitiveness, viability and benefits for CFD software developments in HPC environments. Hence, we contribute to the popularisation of HPC DSLs as suitable problem-solving environments able to unify modern computational and scientific knowledge.
http://hdl.handle.net/2117/404193
Energy hardware and workload aware job scheduling towards interconnected HPC environments
D'Amico, Marco; Corbalán González, Julita
New HPC machines are approaching the exascale. Power consumption for those machines has been increasing, and researchers are studying ways to reduce it. A second trend is HPC machines' growing complexity, with increasingly heterogeneous hardware components and different cluster architectures cooperating in the same machine. We refer to these environments with the term heterogeneous multi-cluster environments. With the aim of optimizing performance and energy consumption in these environments, this paper proposes an Energy-Aware-Multi-Cluster (EAMC) job scheduling policy. EAMC-policy is able to optimize the scheduling and placement of jobs by predicting the performance and energy consumption of arriving jobs for different hardware architectures and processor frequencies, reducing the workload's energy consumption, makespan, and response time. The policy assigns a different priority to each job-resource combination so that the most efficient ones are favored, while less efficient ones are still considered to a variable degree, reducing response time and increasing cluster utilization. We implemented EAMC-policy in Slurm, and we evaluated a scenario in which two CPU clusters collaborate in the same machine. Simulations of workloads running applications modeled from real-world ones show a reduction of response time and makespan by up to 25% and 6% while saving up to 20% of total energy consumed when compared to policies minimizing runtime, and by 49%, 26%, and 6% compared to policies minimizing energy.
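The per-job-resource prioritization described above can be sketched as a ranking over predicted (runtime, energy) pairs. This is a hypothetical scoring function for illustration, not the actual EAMC-policy formula implemented in Slurm; the weighting parameter `alpha` and the prediction values are assumptions.

```python
def rank_combinations(predictions, alpha=0.5):
    """Rank job-resource combinations best-first.

    predictions: dict mapping a (cluster, frequency) combination to a
    (predicted_runtime_s, predicted_energy_j) pair for the arriving job.
    Each combination gets a cost mixing normalized runtime and energy;
    alpha trades performance (1.0) against energy savings (0.0).
    Lower cost = higher scheduling priority."""
    max_rt = max(rt for rt, _ in predictions.values())
    max_en = max(en for _, en in predictions.values())

    def cost(combo):
        rt, en = predictions[combo]
        return alpha * rt / max_rt + (1 - alpha) * en / max_en

    return sorted(predictions, key=cost)

# Hypothetical predictions for one job on two clusters/frequencies
predictions = {
    ("cluster-A", 2.4): (100.0, 500.0),  # faster but energy-hungry
    ("cluster-B", 1.8): (150.0, 300.0),  # slower but frugal
}
print(rank_combinations(predictions))  # most efficient combination first
```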
http://hdl.handle.net/2117/403991
Charging of the ABR service in ATM networks: a numerical example
Cerdà Alabern, Llorenç; Casals Torres, Olga M.
The Available Bit Rate service (ABR) is a "best effort" service intended for traffic which imposes no bound on delay or delay variation. The network guarantees a Minimum Cell Rate (MCR) for an ABR source, which is negotiated at connection setup, and commits to fairly divide the unused bandwidth among all ABR sources.
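The MCR-plus-fair-share allocation described above can be sketched as follows, assuming one simple fairness criterion (equal division of the unused bandwidth); the actual ATM fairness criterion used in the paper may differ.

```python
def abr_rates(link_capacity, mcrs):
    """Allocate rates to ABR sources: each source is guaranteed its
    negotiated Minimum Cell Rate (MCR), and the bandwidth left unused
    after all MCRs are honoured is divided equally among the sources.

    link_capacity and mcrs are in the same units (e.g. cells/s)."""
    unused = link_capacity - sum(mcrs)
    share = unused / len(mcrs)
    return [mcr + share for mcr in mcrs]

# Hypothetical link of 100 units shared by three sources
print(abr_rates(100.0, [10.0, 20.0, 10.0]))  # [30.0, 40.0, 30.0]
```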
http://hdl.handle.net/2117/403907
O(n) key–value sort with active compute memory
Esmaili Dokht, Pouya; Guiot Cusido, Miquel; Radojkovic, Petar; Martorell Bofill, Xavier; Ayguadé Parra, Eduard; Labarta Mancho, Jesús José; Adlard, Jason; Amato, Paolo; Sforzin, Marco
We propose the Active Compute Memory (ACM), a near-memory-processing architecture capable of performing key–value sort directly in the DRAM. In the ACM architecture, sort is merely the writing of data into memory with one addressing protocol (perspective) and reading it back with a different perspective. The first perspective is conventional, based on the data address; the second perspective is the sorted order. The ACM requires additional tables to store the metadata and moderate control logic enhancements that can be implemented directly in the DRAM silicon. Through these modest enhancements to DRAM, ACM exploits the parallelism inherently available in the row buffer to enable sort with O(n) complexity. This leads to an order of magnitude improvement in ACM performance and energy compared to conventional O(n log n) CPU-centric sort algorithms. The ACM also shows superior performance compared to other near-memory sort accelerators. This is because the ACM processing is done near the row buffer and it exploits much lower memory access latency, higher bandwidth and wider parallel processing. The sort operation covered in this paper is just an example of an address management operation that can be efficiently implemented directly in the DRAM silicon. We release the simulation infrastructure for ACM performance and energy modeling as open source. We encourage the community to use it, adapt it to other PIM proposals, and share their own evaluations.
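The two-perspective idea can be illustrated in software with a bounded key space: "writing" places each pair at a location determined by its key, and "reading" scans those locations in order, giving linear-time sorted output. This is a software analogy of the concept only, not the in-DRAM implementation; the bounded key space is an assumption of the sketch.

```python
def acm_style_sort(pairs, key_space):
    """Software analogy of ACM's O(n) key-value sort for integer keys
    in [0, key_space).

    Write phase (first perspective): each (key, value) pair is stored
    in a slot addressed by its key, analogous to writing rows in
    parallel via the row buffer.
    Read phase (second perspective): scanning the slots in address
    order yields the pairs in sorted (and stable) order."""
    slots = [[] for _ in range(key_space)]
    for key, value in pairs:          # write with the key-addressed perspective
        slots[key].append((key, value))
    out = []
    for bucket in slots:              # read back with the sorted perspective
        out.extend(bucket)
    return out

print(acm_style_sort([(3, "a"), (1, "b"), (3, "c"), (0, "d")], key_space=4))
```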
http://hdl.handle.net/2117/403584
Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials
Vetter, Jeffrey; Date, Prasanna; Fahim, Farah; Kulkarni, Shruti R.; Maksymovych, Petro; Talin, Alec; González Tallada, Marc; Vanna Iampikul, Pruek; Young, Aaron Reed; Brooks, David
The Abisko project aims to develop an energy-efficient spiking neural network (SNN) computing architecture and software system capable of autonomous learning and operation. The SNN architecture explores novel neuromorphic devices that are based on resistive-switching materials, such as memristors and electrochemical RAM. Equally important, Abisko uses a deep codesign approach to pursue this goal by engaging experts from across the entire range of disciplines: materials, devices and circuits, architectures and integration, software, and algorithms. The key objectives of our Abisko project are threefold. First, we are designing an energy-optimized high-performance neuromorphic accelerator based on SNNs. This architecture is being designed as a chiplet that can be deployed in contemporary computer architectures and we are investigating novel neuromorphic materials to improve its design. Second, we are concurrently developing a productive software stack for the neuromorphic accelerator that will also be portable to other architectures, such as field-programmable gate arrays and GPUs. Third, we are creating a new deep codesign methodology and framework for developing clear interfaces, requirements, and metrics between each level of abstraction to enable the system design to be explored and implemented interchangeably with execution, measurement, a model, or simulation. As a motivating application for this codesign effort, we target the use of SNNs for an analog event detector for a high-energy physics sensor.
http://hdl.handle.net/2117/403445
Finding, analysing and solving MPI communication bottlenecks in Earth System models
Tintó Prims, Oriol; Castrillo Melguizo, Miguel; Acosta Cobos, Mario César; Mula Valls, Josep Oriol; Sánchez Lorente, Alícia; Serradell Maronda, Kim; Cortés Fité, Ana; Doblas Reyes, Francisco
It is a matter of consensus that the ability to efficiently use current and future high-performance computing systems is crucial for science; however, the performance currently achieved by most parallel scientific applications is far from the desired level. Although inter-process communication has already been studied in many different works, their recommendations are not taken into account in most computational model development processes, at least in the case of Earth Science. This work presents a methodology that aims to help scientists working with computational models that use inter-process communication to deal with the difficulties they face when trying to understand their applications' behaviour. Following a series of steps presented here, both users and developers will learn how to identify performance issues by characterizing an application's scalability, identifying which parts perform poorly, and understanding the role that inter-process communication plays. In this work, the Nucleus for European Modelling of the Ocean (NEMO), the state-of-the-art European global ocean circulation model, is used as an example of success. It is a community code widely used in Europe, to the extent that more than a hundred million core hours are used every year in experiments involving NEMO. The analysis exercise shows how to answer the questions of where, why, and what is degrading the model's scalability, and how this information can help developers find solutions that mitigate these issues. This document also demonstrates how performance analysis carried out with small experiments, using limited resources, can lead to optimizations that impact bigger experiments running on thousands of cores, making it easier to deal with the exascale challenge.
http://hdl.handle.net/2117/402673
Perpetual reconfigurable intelligent surfaces through in-band energy harvesting: architectures, protocols, and challenges
Ntontin, Konstantinos; Boulogeorgos, Alexandros Apostolos A.; Abadal Cavallé, Sergi; Mesodiakaki, Agapi; Chatzinotas, Symeon; Ottersten, Björn
Reconfigurable intelligent surfaces (RISs) are considered a key enabler of highly energy-efficient 6G and beyond networks. This property arises from the absence of power amplifiers in the structure, in contrast to active nodes, such as small cells and relays. However, a certain amount of power is still required for RIS operation. To improve their energy efficiency further, we propose the notion of perpetual RISs, which secure the power needed to supply their functionalities through wireless energy harvesting (EH) of impinging transmitted electromagnetic (EM) signals. Toward this, we initially explain the rationale behind such RIS capability and proceed with a presentation of the main RIS controller architecture that can realize this vision under an in-band EH consideration. Furthermore, we present a typical EH architecture, followed by two harvesting protocols. Subsequently, we study the performance of the two protocols under a typical communications scenario. Finally, we elaborate on the main research challenges governing the realization of large-scale networks with perpetual RISs.
http://hdl.handle.net/2117/402637
The effects of weight quantization on online federated learning for the IoT: a case study
Llisterri Giménez, Nil; Lee, Junkyu; Freitag, Fèlix; Vandierendonck, Hans
Many weight quantization approaches have been explored to save the communication bandwidth between the clients and the server in federated learning using high-end computing machines. However, there is a lack of weight quantization research for online federated learning using TinyML devices, which are restricted in the mini-batch size, the neural network size, and the communication method due to their severe hardware resource constraints and power budgets. We use the term Tiny Online Federated Learning (TinyOFL) for online federated learning using TinyML devices in the Internet of Things (IoT). This paper performs a comprehensive analysis of the effects of weight quantization in TinyOFL in terms of accuracy, stability, overfitting, communication efficiency, energy consumption, and delivery time, and extracts practical guidelines on how to apply weight quantization to TinyOFL. Our analysis is supported by a TinyOFL case study with three Arduino Portenta H7 boards running federated learning clients for a keyword spotting task. Our findings include that in TinyOFL, more aggressive weight quantization can be allowed than in online learning without FL, without affecting the accuracy, thanks to TinyOFL’s quasi-batch training property. For example, using 7-bit weights achieved accuracy equivalent to 32-bit floating point weights, while saving communication bandwidth by 4.6×. Overfitting by increasing network width rarely occurs in TinyOFL, but may occur if strong weight quantization is applied. The experiments also showed that there is a design space for TinyOFL applications by compensating for the accuracy loss due to weight quantization with an increase of the neural network size.
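The bandwidth saving from sending 7-bit instead of 32-bit weights can be sketched with a uniform symmetric quantizer. This is a generic illustration under assumed design choices (per-tensor scaling, signed integers); the paper's exact quantization scheme may differ.

```python
def quantize(weights, bits=7):
    """Uniformly quantize float weights to signed `bits`-bit integers
    for transmission. A single per-tensor scale maps the largest
    magnitude to the integer range [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid scale 0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights on the receiving side."""
    return [q * scale for q in quantized]

# Round trip for a tiny weight vector with 7-bit quantization
q, s = quantize([0.0, 1.0, -1.0, 0.25], bits=7)
print(q)                  # integers in [-63, 63]
print(dequantize(q, s))   # approximately the original weights
```

Each quantized weight fits in 7 bits rather than 32, so (ignoring the scale factor and packing overhead) the payload shrinks by roughly 32/7 ≈ 4.6×, matching the saving reported above.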