This section describes ANNs and their background in more detail.

Hecht-Nielsen proposed the formal definition of an

Artificial Neural Network in 70:

“An Artificial

Neural Network is a parallel, distributed information processing structure

consisting of processing units (which can possess a local memory and can carry

out localized information processing operations) interconnected via

unidirectional signal channels called connections. Each processing unit has a

single output connection that branches (“fans out”) into as many

collateral connections as desired; each carries the same signal – the

processing unit output signal. The processing unit output signal can be of any

mathematical type desired. The information processing that goes on within each

processing unit can be defined arbitrarily with the restriction that it must be

completely local; that is, it must depend only on the current values of the

input signals arriving at the processing element via impinging connections and

on values stored in the processing unit’s local memory.”

The Department of Aeronautics (ITDA, Sao Paulo) notes in 71 that, although there are various ANN models, each model can be precisely specified by the following eight major aspects, as also stated in 72:

o A set of processing units
o A state of activation for each unit
o An output function for each unit
o A pattern of connectivity among units, or topology of the network
o A propagation rule, or combining function, to propagate the activities of the units through the network
o An activation rule to update the activities of each unit by using the current activation value and the inputs received from other units
o An external environment that provides information to the network and/or interacts with it
o A learning rule to modify the pattern of connectivity by using information provided by the external environment

Louis Francis in 73 states that Neural Networks

originated in the artificial intelligence discipline, where they’re often

portrayed as a brain in a computer. They are designed to incorporate key

features of neurons in the brain and to process data in a manner analogous to

the human brain. Much of the terminology used to describe and explain neural

networks is borrowed from biology. Data mining tools can be trained to identify

complex relationships in data. Typically, the data sets are large, with the

number of records at least in the tens of thousands and the number of

independent variables often in the hundreds. Their advantage over classical

statistical models used to analyse data, such as regression and ANOVA, is that

they can fit data where the relationship between independent and dependent variables

is nonlinear and where the specific form of the nonlinear relationship is

unknown 73.

Artificial neural networks share the same advantages as many

other data mining tools, but also offer advantages of their own 73. For instance, decision trees, a method of splitting data into homogeneous clusters with similar expected values for the dependent variable, are often less effective when the predictor variables are continuous than when they are categorical 73. Neural networks work well

with both categorical and continuous variables.

Several data mining techniques, such as regression splines, were developed by statisticians 73. Francis further states in 73 that the data mining

techniques are computationally intensive generalizations of classical linear

models. Classical linear models assume that the functional relationship between

the independent variables and the dependent variable is linear. Classical modelling

also allows linear relationships that result from a transformation of dependent

or independent variables, so some nonlinear relationships can be approximated.

Neural networks and other data mining techniques don’t require that the relationships

between predictor and dependent variables be linear (whether the variables are

transformed or not).

The various data mining tools differ in their approaches to

approximating nonlinear functions and complex data structures 73. Neural networks use a series

of neurons in what is known as the hidden layer that apply nonlinear activation

functions to approximate complex functions in the data.

Despite their advantages, Francis 73 states that many

statisticians and actuaries are reluctant to embrace neural networks. One

reason is that they are considered a “black box”: Data goes in and a prediction

comes out, but the nature of the relationship between independent and dependent

variables is usually not revealed. Because of the complexity of the functions

used in the neural network approximations, neural network software typically

does not supply the user with information about the nature of the relationship

between predictor and target variables. The output of a neural network is a

predicted value and some goodness-of-fit statistics. However, the functional

form of the relationship between independent and dependent variables is not

made explicit.

In addition, the strength of the relationship between

dependent and independent variables, i.e., the importance of each variable, is

also often not revealed 73. Classical models as well as

other popular data mining techniques, such as decision trees, supply the user

with a functional description or map of the relationships.

There exist two main types of training process: supervised

and unsupervised training 74. In supervised training (e.g. the multi-layer feed-forward (MLF) neural network), the neural network knows the desired output, and the weight coefficients are adjusted so that the calculated and desired outputs are as close as possible to each other 74. In unsupervised training (e.g. the Kohonen network 4), the desired output is not known; the system is provided with a group of facts (patterns) and then left to itself to train and settle down (or not) to a stable state in some number of iterations 74.

The Neural Network type most commonly used is the

Feedforward Network or the Multilayer Perceptron. This is also called the

Backpropagation Neural Network as it uses the Backpropagation Algorithm 73,74.

A neural network model contains three types of layers – an

input layer, hidden layer(s), and an output layer. A feedforward neural network

is a network where the signal is passed from an input layer of neurons through

a hidden layer to an output layer of neurons 73.

The input layer is the first layer of a Neural Network Model

and contains a list of influencers, or input parameters. These input

parameters, occupying the input nodes, represent the actual data used to fit a

model to the dependent variable, and each node is a separate independent

variable. These are connected to another layer of neurons called the hidden

layer or hidden nodes, which modifies the data while attempting to solve the

fitting equation. The connection between the ith and jth neuron (Figure) is characterised by the weight coefficient (W) and a threshold coefficient (T). The weight

coefficient reflects the degree of importance of the given connection in the

neural network 73.

The nodes in the hidden layer connect to the output layer.

The output layer represents the target or dependent variable(s). It is common

for networks to have only one target variable, or output node, but there can be

more 73.

Generally, each node in the input layer connects to each

node in the hidden layer and each node in the hidden layer connects to each

node in the output layer. The artificial intelligence literature views this

structure as analogous to biological neurons. The arrows leading to a node are

like the dendrites leading to a neuron. Like the dendrites, they carry a signal to the neuron or node. The arrows leading away from a node are like the axons of a

neuron, and they carry a signal away from a neuron or node. The neurons of a

brain have far more complex interactions than those displayed in the diagram,

but the developers of neural networks view them as abstracting the most

relevant features of neurons in the human brain 73.

Neural networks “learn” by adjusting the strength of the signal that each node receives from the nodes in the previous layer connected to it. As the neural

network better learns how to predict the target value from the input pattern,

each of the connections between the input neurons and the hidden or

intermediate neurons and between the intermediate neurons and the output

neurons increases or decreases in strength 73.

A function called a threshold or activation function

modifies the signal coming into the hidden layer nodes. In the early days of

neural networks, this function produced a value of 1 or 0 (Equation 2.19),

depending on whether the signal from the prior layer exceeded a threshold

value. Thus, the node or neuron would only fire if the signal exceeded the

threshold, a process thought to be similar to that of a neuron (Equation 2.18) 73.

(2.18)

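\[ f(x) = \begin{cases} 1, & x > T \\ 0, & x \le T \end{cases} \tag{2.19} \]

where x is the signal from the prior layer and T is the threshold value.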
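The early threshold behaviour described above can be illustrated with a minimal sketch. The function name `step_neuron` and the weights and threshold used here are hypothetical values chosen for illustration only; they are not taken from 73.

```python
def step_neuron(inputs, weights, threshold):
    """Return 1 if the weighted input signal exceeds the threshold, else 0."""
    signal = sum(w * x for w, x in zip(weights, inputs))
    return 1 if signal > threshold else 0


# Hypothetical weights and threshold that make the unit behave like logical AND.
weights = [1.0, 1.0]
threshold = 1.5
outputs = [step_neuron([a, b], weights, threshold) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

The node "fires" only for the input (1, 1), because that is the only case in which the weighted signal (2.0) exceeds the threshold (1.5).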

The activation functions currently used are typically sigmoid in shape and can take on any value between 0 and 1, or between –1 and 1, depending on the particular function chosen. Sigmoid functions are often used in artificial neural networks to introduce nonlinearity in the model. A neural network element computes a linear combination of its input signals and applies a sigmoid function to the result. One reason for the sigmoid's popularity in neural networks is that its derivative can be expressed in terms of the function itself, which makes it computationally cheap to evaluate. Derivatives of the sigmoid function are usually employed in learning algorithms. Equation 2.20 gives the mathematical expression of the sigmoid function.

\[ f(x) = \frac{1}{1 + e^{-x}} \tag{2.20} \]
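The relationship between the sigmoid and its derivative can be checked numerically. The sketch below assumes the logistic form of the sigmoid (outputs between 0 and 1); the helper names are illustrative. For a range of –1 to 1, a function such as `math.tanh` could be used instead.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative expressed through the function value: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used only as an independent check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 3.0):
    print(x, sigmoid(x), sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

Because the derivative reuses the value already computed in the forward direction, a learning algorithm needs only one extra multiplication per node to obtain it.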

The modified signal is then output to the output layer

nodes, which also apply activation functions. Thus, the information about the

pattern being learned is encoded in the signals carried to and from the nodes.

These signals map a relationship between the input nodes (the data) and the

output nodes (the dependent variable(s)).
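The flow of the signal from the input nodes through the hidden layer to the output nodes can be sketched as follows. This is an illustration under stated assumptions: one hidden layer, a logistic sigmoid at every node, the threshold coefficient added to the weighted sum as a bias term (one common convention), and arbitrary hypothetical parameter values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, thresholds):
    """Apply one fully connected layer: weighted sum plus threshold, then sigmoid.

    weights[j][i] is the weight on the connection from input i to node j.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + t)
        for node_weights, t in zip(weights, thresholds)
    ]


def network_forward(inputs, hidden_w, hidden_t, output_w, output_t):
    """Pass the inputs through the hidden layer, then through the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_t)
    return layer_forward(hidden, output_w, output_t)


# Hypothetical parameters: 2 input nodes, 3 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
hidden_t = [0.0, -0.1, 0.2]
output_w = [[1.0, -1.0, 0.5]]
output_t = [0.1]

prediction = network_forward([1.0, 2.0], hidden_w, hidden_t, output_w, output_t)
print(prediction)
```

In a trained network the values in `hidden_w`, `hidden_t`, `output_w` and `output_t` would be the result of the learning process rather than fixed by hand.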

The Multi-layered Feedforward (MLF) neural network operates in two modes: training and prediction. These require two data sets: the training set, and the set that we want to predict (the test set).

The training mode begins with arbitrary values of the

weights – they might be random numbers – and proceeds iteratively. Each pass through the complete training set is called an epoch. In each epoch the

network adjusts the weights in the direction that reduces the error (see

back-propagation algorithm). As the iterative process of incremental adjustment

continues, the weights gradually converge to a locally optimal set of values.

Many epochs are usually required before training is completed.
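The iterative character of the training mode can be illustrated with a deliberately small sketch: a single sigmoid unit trained by gradient descent on the squared error. The training set (logical OR), the learning rate, the number of epochs and the function name `train` are all assumptions made for illustration; a full MLF network would propagate the error back through the hidden layers as well.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(patterns, epochs=2000, learning_rate=0.5, seed=0):
    """Train one sigmoid unit by gradient descent on the squared error."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # arbitrary start
    bias = rng.uniform(-0.5, 0.5)
    history = []  # summed squared error recorded for each epoch
    for _ in range(epochs):
        epoch_error = 0.0
        for inputs, target in patterns:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target
            epoch_error += error ** 2
            # Gradient of 0.5 * error**2 with respect to the unit's net input.
            delta = error * output * (1.0 - output)
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
        history.append(epoch_error)
    return weights, bias, history


# Logical OR as a tiny, hypothetical training set.
patterns = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, history = train(patterns)
print(history[0], history[-1])
```

Comparing the first and last entries of `history` shows the error falling as the epochs accumulate.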

For a given training set, back-propagation learning may

proceed in one of two basic ways: pattern mode and batch mode. In the pattern

mode of backpropagation learning, weight updating is performed after the

presentation of each training pattern. In the batch mode of back-propagation

learning, weight updating is performed after the presentation of all the training

examples (i.e. after the whole epoch). From an ‘on-line’ point of view, the

pattern mode is preferred over the batch mode, because it requires less local

storage for each synaptic connection. Moreover, given that the patterns are

presented to the network in a random manner, the use of pattern-by-pattern updating

of weights makes the search in weight space stochastic, which makes it less

likely for the back-propagation algorithm to be trapped in a local minimum. On

the other hand, the use of batch mode of training provides a more accurate

estimate of the gradient vector. Pattern mode is necessary, for example, in on-line process control, because not all of the training patterns are available at a given time. In the final analysis, the relative effectiveness of the two training modes depends on the problem being solved 75,76.
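The difference between the two modes lies only in when the weights are updated. The sketch below contrasts them on a one-weight linear model y = w·x, chosen purely to keep the arithmetic visible; the data, learning rate and function names are hypothetical.

```python
import random


def epoch_pattern_mode(w, data, learning_rate, rng=None):
    """One epoch with a weight update after every training pattern."""
    order = list(data)
    if rng is not None:
        rng.shuffle(order)  # random presentation makes the search stochastic
    for x, y in order:
        gradient = (w * x - y) * x  # d/dw of 0.5 * (w*x - y)**2
        w -= learning_rate * gradient
    return w


def epoch_batch_mode(w, data, learning_rate):
    """One epoch with a single update from the gradient averaged over all patterns."""
    gradient = sum((w * x - y) * x for x, y in data) / len(data)
    return w - learning_rate * gradient


# Hypothetical data generated by y = 2x, so the ideal weight is 2.
data = [(1.0, 2.0), (2.0, 4.0)]
rng = random.Random(0)
w_pattern = w_batch = 0.0
for _ in range(50):
    w_pattern = epoch_pattern_mode(w_pattern, data, 0.1, rng)
    w_batch = epoch_batch_mode(w_batch, data, 0.1)
print(w_pattern, w_batch)
```

Pattern mode needs to hold only the current pattern, whereas batch mode must see the whole training set before it can make a single update.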

In prediction mode, information flows forward through the

network, from inputs to outputs. The network processes one example at a time,

producing an estimate of the output value(s) based on the input values. The

resulting error is used as an estimate of the quality of prediction of the

trained network.

In back-propagation learning, we usually start with a training set and use the back-propagation algorithm to compute the synaptic weights of the network, in the hope that the neural network so designed will generalise. A network is said to generalise well when the input-output relationship computed by the network is correct (or nearly correct) for

input/output patterns never used in training the network. Generalisation is not

a mystical property of neural networks, but it can be compared to the effect of

a good non-linear interpolation of the input data S. When the learning

process is repeated for too many iterations (i.e. the neural network is over-trained or over-fitted; there is no difference between overtraining and overfitting), the

network may memorise the training data and therefore be less able to generalise

between similar input-output patterns. The network gives nearly perfect results

for examples from the training set, but fails for examples from the test set.

Overfitting can be compared to an improper choice of the degree of the polynomial in

polynomial regression. Severe overfitting can occur with noisy data, even when

there are many more training cases than weights.
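The polynomial analogy can be made concrete with a small sketch. It fits the same noisy data twice: with a straight line, and with the polynomial of the highest degree the data allow, which passes through every training point. The data values and function names are hypothetical and chosen so that the effect is easy to see.

```python
def fit_line(xs, ys):
    """Least-squares straight line (a degree-1 polynomial)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Degree n-1 polynomial through all n points (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the true relationship is y = x, observed with alternating noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [x + noise for x, noise in zip(train_x, [0.5, -0.5, 0.5, -0.5, 0.5])]
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = list(test_x)  # noise-free targets from y = x

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)
for name, model in [("line", line), ("degree 4", poly)]:
    print(name,
          mean_squared_error(model, train_x, train_y),
          mean_squared_error(model, test_x, test_y))
```

The degree-4 polynomial reproduces the training data exactly, noise included, yet predicts the held-out points worse than the straight line, which mirrors the behaviour of an over-trained network.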

The basic condition for good generalisation is a sufficiently large set of training cases. This training set must at the same time be a representative subset of the set of all cases that you want to generalise to. The importance

of this condition is related to the fact that there are two different types of

generalisation: interpolation and extrapolation. Interpolation applies to cases

that are more or less surrounded by nearby training cases; everything else is

extrapolation. In particular, cases that are outside the range of the training

data require extrapolation. Interpolation can often be done reliably, but

extrapolation is notoriously unreliable. Hence it is important to have

sufficient training data to avoid the need for extrapolation. Methods for

selecting good training sets arise from experimental design 9.
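A simple first check for extrapolation is to test whether a new case lies outside the range of the training data in any input variable. The sketch below is a minimal illustration with hypothetical feature values. Note that lying inside the range is a necessary condition for interpolation but not a sufficient one, since a case can be inside the range and still be far from any nearby training case.

```python
def needs_extrapolation(case, training_cases):
    """Return True if any feature of `case` falls outside the training range."""
    for feature_index, value in enumerate(case):
        column = [row[feature_index] for row in training_cases]
        if value < min(column) or value > max(column):
            return True
    return False


# Hypothetical training cases with two features (e.g. temperature, humidity).
training_cases = [(18.0, 40.0), (22.0, 55.0), (25.0, 60.0)]
print(needs_extrapolation((20.0, 50.0), training_cases))  # False
print(needs_extrapolation((30.0, 50.0), training_cases))  # True
```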

Data splitting for ANN development is essentially a sampling

problem where, given a database D comprising N data, the goal is to sample the data into disjoint subsets S_T, S_test and S_val of size N_T, N_test and N_val, for training, testing and validating, respectively. Within the ANN literature, this task has been performed using many different approaches, each with its advantages and disadvantages. Simple Random Sampling (SRS) is the most common method for data splitting in ANN development, where data are selected with uniform probability, which is determined as

\[ p(x \in S_T) = \frac{N_T}{N} \tag{2.21} \]

and similarly for x ∈ S_test and x ∈ S_val. Simple random sampling is easy to perform, and can be

efficiently implemented in just a single pass over the data using algorithms

such as Knuth’s algorithm (Knuth,

1997). However, the problem with this approach is that there is a chance

that the splitting of data suffers from variance, or bias, especially when the

data are non-uniformly distributed (Tourassi, Frederick, Markey, & Floyd Jr., 2001).
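A single-pass random split into three disjoint subsets of fixed size can be sketched as follows. This is written in the spirit of sequential selection sampling and is not claimed to be the exact algorithm cited above; the function name, subset sizes and seed are hypothetical. Each record is assigned to a subset with probability proportional to that subset's remaining quota, so every record has the same chance N_T/N of being placed in the training set.

```python
import random


def split_single_pass(records, n_train, n_test, n_val, seed=None):
    """Randomly partition records into three disjoint subsets of fixed size."""
    if n_train + n_test + n_val != len(records):
        raise ValueError("subset sizes must sum to the number of records")
    rng = random.Random(seed)
    remaining = [n_train, n_test, n_val]  # quota still to fill in each subset
    subsets = ([], [], [])
    for record in records:
        # Choose a subset with probability proportional to its remaining quota.
        pick = rng.randrange(sum(remaining))
        for k, quota in enumerate(remaining):
            if pick < quota:
                subsets[k].append(record)
                remaining[k] -= 1
                break
            pick -= quota
    return subsets


train, test, val = split_single_pass(list(range(100)), 70, 15, 15, seed=42)
print(len(train), len(test), len(val))  # 70 15 15
```

Because the split is purely random, two runs with different seeds may give subsets with noticeably different statistics when the data are unevenly distributed, which is the variance problem noted above.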

2.7.1. Artificial Intelligence Applications in Buildings: Why Needed?

Driven by the pressure to cut building energy consumption, the management of these special-purpose buildings seeks potential measures to induce energy savings across all aspects of building design 8. This is by

no means an easy task. Firstly, designing and implementing an energy saving

intervention measure is complex in nature 10. Secondly, most of the special-purpose historical

buildings impose restrictions forbidding any retrofit solutions to be

implemented that may alter the original appearance and character of the

building 11. Thirdly,

the strategy of using the building Air Handling Unit (AHU) needs careful

planning, satisfying a balanced optimisation for both ensuring proper

microclimatic controls as well as energy savings 12. Hence, it is imperative for the building

management to monitor, predict and analyse the indoor environment and energy

use to target adequate future energy saving and optimisation programs.

It is known that the first step for optimising energy use in buildings is to have a means of adequate energy usage prediction 1, not only for the building owners but also for urban planners and energy suppliers. With the potential of buildings to contribute towards the reduction in CO2 emissions well recognised 2, urban planners seek predictions of building energy systems to assess the impact of energy conservation measures 13. It is also known that the building energy and indoor environmental prediction model forms the core of a building’s energy control and operation

strategy design to induce energy savings including peak demand shaving 14,15. However, due to cost

constraints, building energy systems are typically not well measured or

monitored. Sensors are only installed when they are necessary for certain

control actions. Sub-metering for a building’s energy sub-systems is also not commonly available 15. As a result, much of the information vital to understanding the existing building system is unavailable. A number of data

model analyses, developed in recent years, cater to the need of obtaining

building energy prediction and optimisation strategies while tackling the

associated problems of system uncertainty and data availability. Building thermal and energy performance modelling is very complicated and requires substantial, high-quality input data. Gaps between design-predicted and actual performance are common and mainly due to discrepancies between the two sets of data 16,17. For old buildings, a great amount of data is not available, and the study information is usually based on best assumptions, which further widens the gap 16,17.

Some modelling approaches follow White-Box Modelling, which involves detailed physics-based dynamic equations to model the building components 18–20. A number of mature white-box software tools, such as EnergyPlus, ESP-r, IES, TAS, and TRNSYS, also exist and simplify the manual modelling process using this technique 21.

However, even though the tools are effective and accurate, these approaches

bear the drawback of requiring detailed information and parameters of the

buildings, energy systems, and outside weather conditions which are difficult

to obtain or even unavailable 15.

Also, creating these models demands considerable calculation time and expertise 10. Other approaches follow the Grey-Box Modelling strategy, such as the Resistance and Capacitance model, or lumped capacitance model, which represents the building elements as an analogue circuit 22,23. These approaches reduce the amount of training data and calculation time required. Model coefficients are identified based on

operational data using statistics and parameter identification 24–26. However, the

parameter computation process is often computationally demanding and time-consuming, and developing the structure of the grey-box model requires expert

knowledge 10,15.
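To illustrate what a lumped capacitance model looks like in its simplest form, the sketch below integrates a single-resistance, single-capacitance (1R1C) zone, C·dT/dt = (T_out − T)/R + Q, with an explicit Euler step. It is an assumption-laden illustration only: the structure, the function name and every parameter value are hypothetical and are not taken from 22,23.

```python
def simulate_1r1c(t_initial, t_outside, heat_input, resistance, capacitance, dt, steps):
    """Explicit Euler integration of C * dT/dt = (T_out - T) / R + Q."""
    temperature = t_initial
    trajectory = [temperature]
    for _ in range(steps):
        d_temp = ((t_outside - temperature) / resistance + heat_input) / capacitance
        temperature += dt * d_temp
        trajectory.append(temperature)
    return trajectory


# Hypothetical parameters: R in K/W, C in J/K, Q in W, time step in seconds.
trajectory = simulate_1r1c(
    t_initial=20.0, t_outside=5.0, heat_input=2000.0,
    resistance=0.005, capacitance=1.0e6, dt=60.0, steps=5000,
)
print(trajectory[-1])  # settles towards t_outside + R * Q in steady state
```

In a grey-box workflow the values of R and C would not be fixed by hand as they are here, but identified from operational data.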

This is where Black-Box Models, or purely data-driven models, are beneficial, as they are easy to build and computationally efficient 15,27–29, especially when a large amount of historical data is available to train the models. Multiple linear regression and self-regression methods were combined to predict monthly building energy consumption 30. Fuzzy inference systems are also extensively used 31,32. An autoregressive with exogenous inputs (ARX) model was developed to predict building load in 33. An optimal trade-off between comfort and energy using a meta-model based on regression techniques was developed in 34. Another simple and easy-to-implement building energy tool is the Degree Day model 35. However, linear models are obtained around a specific working condition and hence cannot guarantee satisfactory approximation performance under varying working environments 36. Artificial Neural Networks (ANN) have also been extensively used in the past ten years for their outstanding ability to approximate non-linear mappings, along with online learning. The application of ANN models in the building modelling sector has mostly

been towards prediction and optimisation of building energy consumption 25,37,38, cooling loads 35,39–41, temperature 10,36,42 and system

identification 43–45.

System identification (SID), which is the process of developing or improving a mathematical representation of a physical system using collected data, is widely used in engineering problems, but has seen limited use in building system modelling 46,47. Because buildings such as art galleries and museums differ inherently in type and function from those already studied, it would be interesting to use this successful SID approach to obtain not only energy predictions but also predictions of future indoor conditions based on the study of historical patterns.

2.8. Conclusion: What does the existing literature tell us?

References