This section describes ANN in more detail and the
Hecht-Nielsen proposed the formal definition of an
Artificial Neural Network in 70:
“Neural Network is a parallel, distributed information processing structure
consisting of processing units (which can possess a local memory and can carry
out localized information processing operations) interconnected via
unidirectional signal channels called connections. Each processing unit has a
single output connection that branches (“fans out”) into as many
collateral connections as desired; each carries the same signal – the
processing unit output signal. The processing unit output signal can be of any
mathematical type desired. The information processing that goes on within each
processing unit can be defined arbitrarily with the restriction that it must be
completely local; that is, it must depend only on the current values of the
input signals arriving at the processing element via impinging connections and
on values stored in the processing unit’s local memory.”
The Department of Aeronautics (ITDA, Sao Paulo) notes in 71 that many ANN
models exist, but that each model can be precisely specified by the following eight
major aspects, as also stated in 72:
A set of processing units
A state of activation for each unit
An output function for each unit
A pattern of connectivity among units, or topology of the network
A propagation rule, or combining function, to propagate the activities of the units through the network
An activation rule to update the activities of each unit by using the current activation value and the inputs received from
An external environment that provides information to the network and/or interacts with it
A learning rule to modify the pattern of connectivity by using information provided by the external environment
Louis Francis in 73 states that Neural Networks
originated in the artificial intelligence discipline, where they’re often
portrayed as a brain in a computer. They are designed to incorporate key
features of neurons in the brain and to process data in a manner analogous to
the human brain. Much of the terminology used to describe and explain neural
networks is borrowed from biology. Data mining tools can be trained to identify
complex relationships in data. Typically, the data sets are large, with the
number of records at least in the tens of thousands and the number of
independent variables often in the hundreds. Their advantage over classical
statistical models used to analyse data, such as regression and ANOVA, is that
they can fit data where the relationship between independent and dependent variables
is nonlinear and where the specific form of the nonlinear relationship is unknown.
Artificial neural networks share the same advantages as many
other data mining tools, but also offer advantages of their own 73. For instance, decision trees,
a method of splitting data into homogeneous clusters with similar expected
values for the dependent variable, are often less effective when the predictor
variables are continuous than when they are categorical 73. Neural networks work well
with both categorical and continuous variables.
Several data mining techniques, such as regression
splines, were developed by statisticians 73. Louis further states in 73 that the data mining
techniques are computationally intensive generalizations of classical linear
models. Classical linear models assume that the functional relationship between
the independent variables and the dependent variable is linear. Classical modelling
also allows linear relationships that result from a transformation of dependent
or independent variables, so some nonlinear relationships can be approximated.
Neural networks and other data mining techniques don’t require that the relationships
between predictor and dependent variables be linear (whether the variables are
transformed or not).
The various data mining tools differ in their approaches to
approximating nonlinear functions and complex data structures 73. Neural networks use a series
of neurons in what is known as the hidden layer that apply nonlinear activation
functions to approximate complex functions in the data.
Despite their advantages, Louis 73 states that many
statisticians and actuaries are reluctant to embrace neural networks. One
reason is that they are considered a “black box”: Data goes in and a prediction
comes out, but the nature of the relationship between independent and dependent
variables is usually not revealed. Because of the complexity of the functions
used in the neural network approximations, neural network software typically
does not supply the user with information about the nature of the relationship
between predictor and target variables. The output of a neural network is a
predicted value and some goodness-of-fit statistics. However, the functional
form of the relationship between independent and dependent variables is not
In addition, the strength of the relationship between
dependent and independent variables, i.e., the importance of each variable, is
also often not revealed 73. Classical models as well as
other popular data mining techniques, such as decision trees, supply the user
with a functional description or map of the relationships.
There are two main types of training process: supervised
and unsupervised training 74. In supervised training (e.g. the
multi-layer feed-forward (MLF) neural network), the neural network knows the
desired output, and the weight coefficients are adjusted so
that the calculated and desired outputs are as close as possible to each other 74. In unsupervised training (e.g. the
Kohonen network 4), the desired output is not known: the system is
provided with a group of facts (patterns) and then left to itself to train and
settle down (or not) to a stable state in some number of iterations 74.
The Neural Network type most commonly used is the
Feedforward Network or the Multilayer Perceptron. This is also called the
Backpropagation Neural Network as it uses the Backpropagation Algorithm 73,74.
A neural network model contains three types of layers – an
input layer, hidden layer(s), and an output layer. A feedforward neural network
is a network where the signal is passed from an input layer of neurons through
a hidden layer to an output layer of neurons 73.
The input layer is the first layer of a Neural Network Model
and contains a list of influencers, or input parameters. These input
parameters, occupying the input nodes, represent the actual data used to fit a
model to the dependent variable, and each node is a separate independent
variable. These are connected to another layer of neurons called the hidden
layer or hidden nodes, which modifies the data while attempting to solve the
fitting equation. The connection between the ith and jth
neuron (Figure) is characterised
by the weight coefficient (W) and a threshold coefficient (T). The weight
coefficient reflects the degree of importance of the given connection in the
neural network 73.
The nodes in the hidden layer connect to the output layer.
The output layer represents the target or dependent variable(s). It is common
for networks to have only one target variable, or output node, but there can be more.
Generally, each node in the input layer connects to each
node in the hidden layer and each node in the hidden layer connects to each
node in the output layer. The artificial intelligence literature views this
structure as analogous to biological neurons. The arrows leading to a node are
like the dendrites leading to a neuron. Like the dendrites, they carry a signal to the
neuron or node. The arrows leading away from a node are like the axon of a
neuron, and they carry a signal away from a neuron or node. The neurons of a
brain have far more complex interactions than those displayed in the diagram,
but the developers of neural networks view them as abstracting the most
relevant features of neurons in the human brain 73.
Neural networks “learn” by adjusting the strength of the
signal coming from nodes in the previous layer connecting to it. As the neural
network better learns how to predict the target value from the input pattern,
each of the connections between the input neurons and the hidden or
intermediate neurons and between the intermediate neurons and the output
neurons increases or decreases in strength 73.
A function called a threshold or activation function
modifies the signal coming into the hidden layer nodes. In the early days of
neural networks, this function produced a value of 1 or 0 (Equation 2.19),
depending on whether the signal from the prior layer exceeded a threshold
value. Thus, the node or neuron would only fire if the signal exceeded the
threshold, a process thought to be similar to that of a neuron (Equation 2.18) 73.
Hence, the activation functions currently used are typically
sigmoid in shape and can take on any value between 0 and 1, as stated above, or
between –1 and 1, depending on the particular function chosen. Sigmoid
functions are often used in artificial neural networks to introduce
nonlinearity in the model. A neural network element computes a linear
combination of its input signals, and applies a sigmoid function to the result.
One reason for its popularity in neural networks is that the derivative of the
sigmoid function can be expressed in terms of the function itself, which makes it
computationally cheap to evaluate. Derivatives of the sigmoid function are usually
employed in learning algorithms. Equation 2.20 determines the mathematical
expression of the sigmoid function.
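The derivative property mentioned above can be illustrated with a short sketch. Equation 2.20 is not reproduced here, so the code assumes the common logistic form of the sigmoid, 1 / (1 + exp(-x)); the function names are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid (assumed form): maps any real input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative written in terms of the function itself: s'(x) = s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(x: float, h: float = 1e-6) -> float:
    """Central-difference approximation, used only to check the closed form."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
```

Because the derivative reuses the value already computed in the forward direction, a learning algorithm needs only one extra multiplication per node to obtain it.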
The modified signal is then output to the output layer
nodes, which also apply activation functions. Thus, the information about the
pattern being learned is encoded in the signals carried to and from the nodes.
These signals map a relationship between the input nodes (the data) and the
output nodes (the dependent variable(s)).
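The flow of signals from input nodes through the hidden layer to the output nodes can be sketched as follows. This is a minimal illustration, assuming a logistic sigmoid at every node and a threshold that is subtracted from the weighted sum; the layer sizes and weight values are hypothetical and do not come from the cited works.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (assumed activation function)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, thresholds):
    """Compute the outputs of one layer.

    weights[j][i] is the weight of the connection from input i to node j;
    thresholds[j] is the threshold coefficient of node j (sign convention assumed).
    """
    outputs = []
    for row, threshold in zip(weights, thresholds):
        net = sum(w * x for w, x in zip(row, inputs)) - threshold
        outputs.append(sigmoid(net))
    return outputs


def mlf_forward(inputs, hidden_w, hidden_t, output_w, output_t):
    """Pass a signal from the input layer through one hidden layer to the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_t)
    return layer_forward(hidden, output_w, output_t)


# Hypothetical network: 2 input nodes, 3 hidden nodes, 1 output node.
example_output = mlf_forward(
    [0.2, 0.7],
    hidden_w=[[0.5, -0.3], [0.1, 0.8], [-0.6, 0.4]],
    hidden_t=[0.0, 0.1, -0.2],
    output_w=[[0.7, -0.5, 0.2]],
    output_t=[0.05],
)
```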
The Multi-layered Feedforward (MLF) neural network operates
in two modes: training and prediction mode. For the training of the MLF neural
network and for the prediction using the MLF neural network we need two data
sets, the training set and the set that we want to predict (test set).
The training mode begins with arbitrary values of the
weights – they might be random numbers – and proceeds iteratively. Each
pass through the complete training set is called an epoch. In each epoch the
network adjusts the weights in the direction that reduces the error (see
back-propagation algorithm). As the iterative process of incremental adjustment
continues, the weights gradually converge to a locally optimal set of values.
Many epochs are usually required before training is completed.
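The epoch-by-epoch adjustment described above can be sketched for the simplest possible case, a single sigmoid node trained by gradient descent on the squared error. This is an illustration under stated assumptions, not the full back-propagation algorithm for a multi-layer network: the logical-OR training set, the learning rate, the number of epochs and the initial weight range are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(patterns, epochs=2000, lr=0.5, seed=0):
    """Train one sigmoid node; returns the weights, threshold and per-epoch error."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    # Training starts from arbitrary (here: small random) weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    threshold = rng.uniform(-0.5, 0.5)
    errors = []
    for _ in range(epochs):
        sse = 0.0  # squared error accumulated over this pass through the training set
        for x, target in patterns:
            y = sigmoid(sum(w * xi for w, xi in zip(weights, x)) - threshold)
            err = target - y
            sse += err * err
            # Gradient step: the sigmoid derivative y * (1 - y) scales the error.
            delta = err * y * (1.0 - y)
            weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
            threshold -= lr * delta
        errors.append(sse)
    return weights, threshold, errors


# Hypothetical training set: the logical OR of two inputs.
OR_PATTERNS = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]
weights, threshold, errors = train(OR_PATTERNS)
```

Comparing `errors[0]` with `errors[-1]` shows how the error shrinks as the epochs accumulate.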
For a given training set, back-propagation learning may
proceed in one of two basic ways: pattern mode and batch mode. In the pattern
mode of back-propagation learning, the weights are updated after the
presentation of each training pattern. In the batch mode of back-propagation
learning, the weights are updated after the presentation of all the training
examples (i.e. after the whole epoch). From an ‘on-line’ point of view, the
pattern mode is preferred over the batch mode, because it requires less local
storage for each synaptic connection. Moreover, given that the patterns are
presented to the network in a random order, pattern-by-pattern updating
of the weights makes the search in weight space stochastic, which makes it less
likely that the back-propagation algorithm will be trapped in a local minimum. On
the other hand, batch mode provides a more accurate
estimate of the gradient vector. Pattern mode is necessary in, for example,
on-line process control, because not all of the training patterns are
available at a given time. In the final analysis, the relative effectiveness
of the two training modes depends on the problem being solved 75,76.
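The difference between the two modes lies only in when the weights are changed, which the following sketch makes explicit. To keep it short it uses a single linear unit rather than a multi-layer network; the function names, the `mode` argument and the averaging of the batch gradient are assumptions made for illustration.

```python
import random


def predict(weights, x):
    """Output of a linear unit: the weighted sum of its inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def train_epoch(weights, patterns, lr, mode, rng=None):
    """Run one epoch and return (new_weights, number_of_weight_updates)."""
    order = list(patterns)
    if rng is not None:
        rng.shuffle(order)  # present the patterns in a random order
    if mode == "pattern":
        w = list(weights)
        for x, target in order:
            err = target - predict(w, x)
            # The weights change immediately after each pattern.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        return w, len(order)
    if mode == "batch":
        grad = [0.0] * len(weights)
        for x, target in order:
            err = target - predict(weights, x)
            # Contributions are accumulated; the weights stay fixed during the epoch.
            grad = [g + err * xi for g, xi in zip(grad, x)]
        w = [wi + lr * g / len(order) for wi, g in zip(weights, grad)]
        return w, 1
    raise ValueError("mode must be 'pattern' or 'batch'")
```

Pattern mode performs one update per training pattern and keeps no accumulated gradient, whereas batch mode performs a single update per epoch from the gradient averaged over the whole training set.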
In prediction mode, information flows forward through the
network, from inputs to outputs. The network processes one example at a time,
producing an estimate of the output value(s) based on the input values. The
resulting error is used as an estimate of the quality of prediction of the
In back-propagation learning, one usually starts with a
training set and uses the back-propagation algorithm to compute the
synaptic weights of the network, with the aim that the network so designed will
generalise. A network is said to generalise well when the input-output
relationship computed by network is correct (or nearly correct) for
input/output patterns never used in training the network. Generalisation is not
a mystical property of neural networks, but it can be compared to the effect of
a good non-linear interpolation of the input data S. When the learning
process is repeated for too many iterations (i.e. the neural network is over-trained
or over-fitted; there is no difference between overtraining and overfitting), the
network may memorise the training data and therefore be less able to generalise
between similar input-output patterns. The network then gives nearly perfect results
for examples from the training set, but fails for examples from the test set.
Overfitting can be compared to an improper choice of the degree of the polynomial in
polynomial regression. Severe overfitting can occur with noisy data, even when
there are many more training cases than weights.
The basic condition for good generalisation is a sufficiently
large set of training cases. This training set must at the same time be a representative
subset of the set of all cases that you want to generalise to. The importance
of this condition is related to the fact that there are two different types of
generalisation: interpolation and extrapolation. Interpolation applies to cases
that are more or less surrounded by nearby training cases; everything else is
extrapolation. In particular, cases that are outside the range of the training
data require extrapolation. Interpolation can often be done reliably, but
extrapolation is notoriously unreliable. Hence it is important to have
sufficient training data to avoid the need for extrapolation. Methods for
selecting good training sets arise from experimental design 9.
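One part of this distinction is easy to check automatically: whether a new case lies outside the range of the training data. The sketch below covers only that condition. A case inside the range can still require extrapolation if it is not surrounded by nearby training cases, which this simple test does not detect. The function name is hypothetical.

```python
def outside_training_range(case, training_cases):
    """Return True if any input variable of `case` lies outside the range
    spanned by that variable in the training cases."""
    for i, value in enumerate(case):
        column = [row[i] for row in training_cases]
        if value < min(column) or value > max(column):
            return True
    return False


# Hypothetical training inputs with two variables each.
training_inputs = [(18.0, 40.0), (21.0, 55.0), (24.0, 60.0)]
```

A prediction for a case flagged by this check should be treated with caution, since it necessarily relies on extrapolation.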
Data splitting for ANN development is essentially a sampling
problem where, given a database D comprising N data, the goal is to sample the
data into disjoint subsets $S_T$, $S_{test}$ and $S_{val}$ of size $N_T$, $N_{test}$ and $N_{val}$,
for training, testing and validating, respectively. Within the ANN literature, this
task has been performed using many different approaches, each with its own
advantages and disadvantages. Simple Random Sampling (SRS) is the most common
method for data splitting in ANN development, where data are selected with uniform
probability, which is determined as
$p(x \in S_T) = N_T / N$
and similarly for $x \in S_{test}$ and $x \in S_{val}$. Simple random sampling is easy to perform, and can be
efficiently implemented in just a single pass over the data using algorithms
such as Knuth’s algorithm (Knuth,
1997). However, the problem with this approach is that there is a chance
that the splitting of data suffers from variance, or bias, especially when the
data are non-uniformly distributed (Tourassi, Frederick, Markey, & Floyd Jr., 2001).
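A single-pass random split of this kind can be sketched as follows. The code is written in the spirit of selection sampling rather than as a reproduction of the cited algorithm: each record is assigned to a subset with probability proportional to the number of places still free in that subset, so the subsets end up with exactly the requested sizes. The function and parameter names are hypothetical.

```python
import random


def simple_random_split(data, n_train, n_test, n_val, seed=None):
    """Split `data` into three disjoint random subsets in one pass."""
    if n_train + n_test + n_val != len(data):
        raise ValueError("subset sizes must add up to the number of records")
    rng = random.Random(seed)
    names = ("train", "test", "val")
    remaining = {"train": n_train, "test": n_test, "val": n_val}
    subsets = {name: [] for name in names}
    left = len(data)  # records not yet assigned
    for item in data:
        # Draw one of the `left` free places uniformly at random and find
        # which subset that place belongs to.
        r = rng.randrange(left)
        for name in names:
            if r < remaining[name]:
                subsets[name].append(item)
                remaining[name] -= 1
                break
            r -= remaining[name]
        left -= 1
    return subsets["train"], subsets["test"], subsets["val"]


# Hypothetical database of 100 records split 60/20/20.
train_set, test_set, val_set = simple_random_split(list(range(100)), 60, 20, 20, seed=1)
```

Because the assignment is purely random, nothing in this procedure guards against the variance or bias problem noted above when the data are non-uniformly distributed.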
Artificial Intelligence Applications in Buildings: Why Needed?
Driven by the pressure to cut building energy
consumption, the management of these special-purpose buildings seeks
potential measures to achieve energy savings across all aspects of building
design 8. This is by no means an easy task. Firstly, designing and implementing an
energy-saving intervention measure is complex in nature 10. Secondly, most
special-purpose historical buildings impose restrictions that forbid any retrofit
solution that may alter the original appearance and character of the
building 11. Thirdly, the strategy for using the building Air Handling Unit (AHU)
needs careful planning, balancing proper microclimatic control against
energy savings 12. Hence, it is imperative for the building
management to monitor, predict and analyse the indoor environment and energy
use in order to target adequate future energy saving and optimisation programs.
It is known that the first step in optimising energy use in
buildings is to have a means of adequately predicting energy usage 1, not only for
building owners but also for urban planners and energy suppliers. With the
potential of buildings to contribute towards the reduction in CO2
emissions well recognised 2, urban planners seek predictions of building energy
systems to assess the impact of energy conservation measures 13. It is also known
that the building energy and indoor environmental
prediction model forms the core of a building’s energy control and operation
strategy design to induce energy savings, including peak demand shaving 14,15. However, due to cost
constraints, building energy systems are typically not well measured or
monitored. Sensors are only installed when they are necessary for certain
control actions. Sub-metering for a building’s energy sub-systems is also not
commonly available 15. As a result, much of the information needed to better
understand the existing building system is not available. A number of data
model analyses, developed in recent years, cater to the need for
building energy prediction and optimisation strategies while tackling the
associated problems of system uncertainty and data availability. Building
thermal and energy performance modelling is very complicated. It requires
substantial, high-quality input data. Gaps between design-predicted and actual
performance are common and are mainly due to discrepancies between the two sets of
data 16,17. For old buildings, a
great amount of data is not available, and the study information is usually
based on the best assumption, which further enlarges the gap 16,17.
Some of the modelling approaches follow White-Box
Modelling, which uses detailed physics-based dynamic equations to model the
building components 18–20. A number of mature white-box software tools, such as
EnergyPlus, ESP-r, IES, TAS, and TRNSYS, also exist, and they simplify the manual
modelling process using this technique 21. However, even though the tools are
effective and accurate, these approaches
have the drawback of requiring detailed information and parameters for the
buildings, energy systems, and outside weather conditions, which are difficult
to obtain or even unavailable 15. Creating these models also demands a considerable
investment of calculation time and expertise 10. Some
other approaches follow the Grey-Box Modelling strategy, such as the Resistance and
Capacitance model, or lumped capacitance model, which represents the building
elements as an analogue circuit 22,23. These approaches reduce the amount of training
data and calculation time required. Model coefficients are identified from
operational data using statistics and parameter identification 24–26. However, the
parameter computation process is often computationally demanding and time
consuming and developing the structure of the grey model requires expert
This is where Black-Box Models, or purely data-driven models,
are beneficial, as they are easy to build and computationally efficient 15,27–29,
especially when a large amount of historical data is available to train the models.
Multiple linear regression and self-regression methods were combined to predict
monthly building energy consumption 30.
Fuzzy inference systems are also extensively used 31,32. An autoregressive with
exogenous inputs (ARX) model was developed to predict building load in 33. An
optimal trade-off between comfort and
energy using a meta-model based on regression techniques was developed in 34. Another simple and
easy-to-implement building energy tool is the Degree Day model 35. However, linear models
are obtained around a specific working condition and hence cannot guarantee
satisfactory approximation performance under varying working environments 36. Artificial
Neural Networks (ANN) have also been extensively used in the past ten years for
their outstanding ability to approximate non-linear mappings, along with online
learning. The application of ANN models in the building modelling sector has mostly
been towards prediction and optimisation of building energy consumption 25,37,38, cooling loads 35,39–41, temperature 10,36,42 and system
System identification (SID), the process of developing
or improving a mathematical representation of a physical system using collected
data, is widely used in engineering problems, but has seen limited use in
building system modelling 46,47.
Because buildings like art galleries and museums differ inherently in type and
function from the buildings already studied, it would be
interesting to use this successful SID approach to obtain not only energy
predictions but also predictions of future indoor conditions based on the study
of historical patterns.
2.8. Conclusion: What the existing literature tells us?