Optimizing Trade-off Between Pollutant Exposure and Energy Consumption in Buildings: Uncertainty Informed Reinforcement Learning and Robustness Analysis of Control Policies

Patel, Sameer

doi:10.1016/j.buildenv.2026.114224

Source

Building and Environment

ISSN

0360-1323

Date Issued

2026-03-01

Author(s)

Mishra, Nishchaya Kumar

Patel, Sameer

DOI

10.1016/j.buildenv.2026.114224

Volume

291

Abstract

Intelligent control systems are crucial for ensuring occupants' comfort and minimizing exposure to indoor pollutants. At the same time, these systems must optimize the building's energy use. Recent research in building science and control has demonstrated the advantages of reinforcement learning (RL) agents over their physics-based and rule-based counterparts in optimizing indoor environment dynamics. While previous studies have shown that RL agents can be transferred across buildings, ensuring their robustness and reliability is essential for such transfer to be effective in real-world applications. Consequently, this study analyzes the robustness of the decision-making ability of a deep Q network (DQN)-based RL agent and estimates the uncertainties in the predicted actions using the Monte Carlo (MC) dropout approach. The performance of twelve action selection policies (MC-DQN1 to MC-DQN12) has been assessed in terms of associated particulate matter (PM) exposure and energy consumption, with the traditional DQN (TradDQN) serving as the benchmark. The analysis reveals substantial variability in average exposure among different MC-DQNs for various emission activities, with variations ranging from -23% to +34% during high-emission activities. Finally, fractional exposures corresponding to different indoor PM levels (≤ 10, 11–20, 21–30, 31–40, and > 40 µg m⁻³) have been estimated to identify the specific periods during which these MC-DQNs underperform. Although a few MC-DQNs effectively reduced exposure at high PM levels (> 40 µg m⁻³), the other agents struggled to bring PM levels down to the desired level.
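The MC dropout idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny two-layer network, the dropout rate, the number of passes, and the two selection rules shown (argmax of the mean Q-value, and majority vote over passes) are all assumptions chosen for illustration, and the abstract does not say how the twelve MC-DQN policies are defined. The sketch only shows the general mechanism of keeping dropout active at inference, repeating the forward pass, and summarizing the spread of the resulting Q-values.

```python
import random
import statistics


def forward_with_dropout(state, w1, w2, drop_prob, rng):
    """One stochastic forward pass; dropout stays active at inference."""
    keep = 1.0 - drop_prob
    hidden = []
    for col in range(len(w1[0])):
        pre = sum(state[i] * w1[i][col] for i in range(len(state)))
        act = max(0.0, pre)  # ReLU activation
        # Inverted dropout: zero the unit with probability drop_prob,
        # otherwise rescale it by 1 / keep.
        hidden.append(act / keep if rng.random() < keep else 0.0)
    return [
        sum(hidden[j] * w2[j][a] for j in range(len(hidden)))
        for a in range(len(w2[0]))
    ]


def mc_dropout_summary(state, w1, w2, n_passes=50, drop_prob=0.2, seed=0):
    """Repeat the stochastic pass and summarize the Q-value samples.

    n_passes, drop_prob, and the two selection rules below are
    hypothetical choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(state, w1, w2, drop_prob, rng)
        for _ in range(n_passes)
    ]
    n_actions = len(samples[0])
    mean_q = [statistics.fmean(s[a] for s in samples) for a in range(n_actions)]
    std_q = [statistics.pstdev([s[a] for s in samples]) for a in range(n_actions)]
    votes = [0] * n_actions
    for s in samples:
        votes[s.index(max(s))] += 1  # greedy action of this pass
    return {
        "mean_q": mean_q,
        "std_q": std_q,  # per-action spread, a simple uncertainty proxy
        "vote_fraction": [v / n_passes for v in votes],
        "mean_action": mean_q.index(max(mean_q)),  # rule 1: argmax of mean Q
        "vote_action": votes.index(max(votes)),    # rule 2: majority vote
    }


# Toy usage with random weights (4 state features, 8 hidden units, 3 actions).
_init = random.Random(1)
W1 = [[_init.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
W2 = [[_init.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
STATE = [0.5, -0.2, 1.0, 0.3]
summary = mc_dropout_summary(STATE, W1, W2)
```

A large `std_q` or a `vote_fraction` spread across several actions indicates that the passes disagree. An uncertainty-informed policy could respond to that differently from a greedy DQN, for example by falling back to a conservative action.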

URI

https://repository.iitgn.ac.in/handle/IITG2025/33945

Keywords

Deep reinforcement learning | Energy-exposure tradeoff | Indoor environment control | Optimization | Uncertainty quantification