2018 Forecasting exam

Author: 钟俊涛 | Source: published 2019-05-16 23:00

    A1. Explain how the moving average method uses n observations to smooth time series data. What would be the difference in using n = 3 compared to n = 20?

    The moving average method takes the mean of the last n observations, either to predict the next value (observation t+1) or to smooth the series.

    Moving average calculation

    There are two differences between using n = 3 and n = 20. 1. The larger n is, the smoother the resulting series: with n = 20 we see very little fluctuation, but we lose more features of the data (and more data points at the ends of the series) than with n = 3. 2. When n is an odd number (such as 3), the moving average can be centred directly on an observation. When n is an even number (such as 20), a centred moving average needs a 2×n-MA, which is a weighted moving average; a non-centred (trailing) moving average can be calculated with the formula above for any n.
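    The effect of n can be illustrated with a small sketch in base R. The series y below is made-up data (a trend plus noise), and trailing_ma is a hypothetical helper name, not part of the exam material.

```r
# Hypothetical noisy series: a linear trend plus random noise
set.seed(1)
y <- ts(1:60 + rnorm(60, sd = 5))

# Trailing (non-centred) moving average of order n:
# the mean of the current and the previous n - 1 observations
trailing_ma <- function(x, n) stats::filter(x, rep(1 / n, n), sides = 1)

ma3  <- trailing_ma(y, 3)
ma20 <- trailing_ma(y, 20)

# A larger n leaves more undefined values at the start of the series
sum(is.na(ma3))    # 2
sum(is.na(ma20))   # 19

# ... and produces a smoother line (smaller step-to-step changes)
sd(diff(ma3), na.rm = TRUE)
sd(diff(ma20), na.rm = TRUE)
```

    Calling plot(y) followed by lines(ma3) and lines(ma20) shows the two smoothed lines against the raw data.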

    A2. Describe how simulated annealing works. Explain how the temperature variable and greed works in your answer.

    The algorithm mimics the cooling of metallic solids from the liquid phase, which increases the size of the crystals to make the metal “harder” and reduces the number of defects. The initial heat applied to the material forces its atoms to move freely in random directions (stochastic nature). As the cooling process occurs, the atoms' energy slowly decreases, resulting in a new formation.

    Process of simulated annealing Temperature variable and greed in SA

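    The temperature is initially set high, so the search can move around randomly (stochastic nature). At the start of the algorithm, worse solutions are accepted with a high probability. As the temperature gradually falls, the probability of accepting a worse solution also falls, so the algorithm becomes greedier, and eventually it no longer jumps out of the current local optimum. This reduces the chance of getting stuck in a local optimum while the temperature is high and increases the chance of finding the global optimum. Throughout the process the objective function is what is being maximised; the temperature is only a parameter used to calculate the probability of accepting a neighbouring solution.

    When T is low, the probability of accepting a worse neighbouring solution is low.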
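    The link between temperature and greed can be sketched with the Metropolis-style acceptance rule that simulated annealing commonly uses. The function name accept_prob and the numbers are illustrative assumptions; the problem is treated as a maximisation, as in the answer above.

```r
# Acceptance probability for a maximisation problem (a common SA rule).
# delta = value(candidate) - value(current); temp = current temperature.
accept_prob <- function(delta, temp) {
  if (delta >= 0) {
    1                     # better or equal solutions are always accepted
  } else {
    exp(delta / temp)     # worse solutions are accepted with probability < 1
  }
}

accept_prob(delta = -2, temp = 10)    # about 0.82: worse moves are often accepted
accept_prob(delta = -2, temp = 0.1)   # about 2e-09: the search is almost purely greedy
```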

    A3. Explain what deep learning is and give examples of how it is being used.

    Deep learning

    Packages: TensorFlow and Keras.

    Convolutional neural networks (CNNs) are one class of deep learning model. CNNs were first used to solve problems in computer vision and pattern recognition. They have subsequently been shown to be effective for NLP (natural language processing) and have achieved excellent results in many NLP tasks.

    A4. List the conditional probabilities for the following Bayesian network:

    Probabilities for Bayesian network

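    P(E | B): the probability of E given that B has occurred

    P(C | A, B): the probability of C given that both A and B have occurred

    P(D | C): the probability of D given that C has occurred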

    B1. Your manager has created a function to model a business process and wants to use a genetic algorithm to identify the minimum solution.

    a. Describe how genetic algorithms work. Explain the crossover and mutation stages in your answer giving examples with binary encoding.

    The general procedure of a Genetic Algorithm is as follows:

    1. Define an end condition (time or number of iterations).

    2. Generate a random population of chromosomes.

    3. Evaluate fitness of each chromosome in the population.

    4. Create a new population by repeating the following steps until a new population is complete:

    • Select two parent chromosomes from the population according to their fitness.

    • Crossover, also called recombination, is a genetic operator that combines the genetic information of two parents to generate new offspring stochastically. There are a number of crossover techniques; with binary encoding, a common one is single-point crossover: select a random cut-off point and form a new offspring by joining the part of parent A on one side of the cut point to the part of parent B on the other side. E.g. A: 10001|011 and B: 01101|110 produce the offspring 10001110.

    • Randomly mutate the offspring. The mutation stage consists of a small alteration to the new offspring, for example 10001110 mutates to 10101110. The probability of this occurring to each individual bit of the chromosome is set by the decision-maker or analyst; generally it is fixed at less than 0.1 (<10%). The level of the mutation probability controls how stochastic the algorithm is.

    • Place the offspring into the population.

    5. Evaluate fitness of each chromosome in the population.

    6. If the end condition is met, return the best solution(s) in the current population; otherwise return to step 4.
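    The crossover and mutation stages in the procedure above can be sketched in base R with binary chromosomes. The helper names (to_bits, single_point_crossover, mutate) are hypothetical and are not functions from the GA package.

```r
# Convert a string such as "10001011" into a vector of 0/1 integers
to_bits <- function(s) as.integer(strsplit(s, "")[[1]])

# Single-point crossover: first `cut` bits from parent a, the rest from parent b
# (cut must be between 1 and length(a) - 1)
single_point_crossover <- function(a, b, cut) {
  c(a[seq_len(cut)], b[(cut + 1):length(b)])
}

# Bit-flip mutation: each bit is flipped independently with probability p
mutate <- function(chrom, p) {
  flip <- runif(length(chrom)) < p
  chrom[flip] <- 1 - chrom[flip]
  chrom
}

a <- to_bits("10001011")
b <- to_bits("01101110")

child <- single_point_crossover(a, b, cut = 5)
paste(child, collapse = "")   # "10001110"

set.seed(42)
mutated <- mutate(child, p = 0.1)   # result depends on the random draw
```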

    b. Solve the function below with a genetic algorithm using the ga function in the GA package in R. Use a real-valued optimisation type and set the minimum input parameters as c(-10, -10) and the maximum input parameters as c(10, 10).

    Include the code used, a plot of the fitness value throughout the GA generations, the summary output of the function and an explanation of the summary output.

    CODE HERE

    # Define the business task function.
    # ga() maximises the fitness, so the value is negated to search for the minimum.
    B1 <- function(x)
    {
      total <- sum(x^4 - 16 * x^2 + 5 * x)   # avoid shadowing the built-in sum()
      return(-total / 2)
    }

    # load the GA package
    library("GA")

    # run the GA on B1 with real-valued encoding and the required bounds
    GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10))

    # look at the results
    summary(GA)
    plot(GA)

    GA results HERE: 

    Iterations             = 100 

    Fitness function value = 78.2173 

    Solution = 

                x1        x2

    [1,] -2.821962 -2.916912

    The summary shows that after 100 iterations the GA found a maximum fitness value of 78.2173 at x1 = -2.82 and x2 = -2.92. Because ga() maximises the fitness function, B1 returns the negated function value; reversing the sign gives the minimum of the original function, approximately -78.22.
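    As a rough cross-check on the GA output, the original (un-negated) function can be minimised with base R's optim. This is only a sanity check under assumptions: the function f is the body of B1 without the sign flip, and the starting point c(-3, -3) is chosen near the GA solution, so optim only finds the minimum in that region.

```r
# Original function before the sign flip used for the GA fitness
f <- function(x) sum(x^4 - 16 * x^2 + 5 * x) / 2

# Local gradient-based search started close to the GA solution
res <- optim(par = c(-3, -3), fn = f, method = "BFGS")

res$par     # both coordinates close to -2.90
res$value   # close to -78.33
```

    The value found this way is slightly lower than the -78.22 implied by the 100-iteration GA run, which suggests that run had not fully converged.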

    c. Repeat the ga optimisation with the following custom parameters:

    popSize = 100

    pcrossover = 0.9

    pmutation = 0.2

    maxiter = 500

    Include the code used and the summary output of the function. Describe what these custom parameters are used for and how they have affected the result in comparison to your previous answer.

    CODE:

    # run the GA on B1 again with the custom parameters
    GA <- ga(type = "real-valued", fitness = B1, lower = c(-10, -10), upper = c(10, 10),
             popSize = 100, pcrossover = 0.9, pmutation = 0.2, maxiter = 500)

    # look at the results
    summary(GA)
    plot(GA)

    GA results: 

    Iterations             = 500 

    Fitness function value = 78.33233 

    Solution = 

                x1        x2

    [1,] -2.903774 -2.903451

    EXPLANATION :

    popSize = the population size, i.e. the number of candidate solutions in each generation. The default is 50; here it is 100.

    pcrossover = the probability of crossover between pairs of chromosomes. Typically this is a large value. The default is 0.8; here it is 0.9.

    pmutation = the probability of mutation in a parent chromosome. Mutation usually occurs with a small probability, and the default is 0.1. Here it is 0.2, which allows more mutation in the population and so a wider search.

    maxiter = the maximum number of iterations to run before the GA search is stopped. The default is 100; here it is 500, so the search runs for more generations.

    Compared with the previous result, this run found a slightly better solution: the fitness value rose by about 0.12 (from 78.2173 to 78.33233), so the minimum of the original function is about -78.33 at x1 ≈ x2 ≈ -2.90. The GA is stochastic, so results vary between runs, but a larger population, a higher mutation rate and more iterations all increase the probability of finding a better solution.

    B2. Iveco has approached your consultancy company asking you to help them forecast the number of 35S12 vans sold in the UK in the next year. They have provided you with the quarterly time series sales data from Q3 2008 to Q3 2017 (B2.csv) for the Iveco Daily 35S12 van.

    a. Using the read.csv, ts and plot functions in R, import the data, create a time series object then plot the time series object.

    From looking at this plot, what can you say about the trend and seasonality of the data? Include the plot in your answer.

    Code :

    # load the data set
    data <- read.csv("2018B2.csv")
    View(data)

    # create a quarterly time series object from Q3 2008 to Q3 2017
    iveco <- ts(data$Sales, start = c(2008, 3), end = c(2017, 3), frequency = 4)

    # plot the time series
    plot(iveco)

    From this plot, we can see that the trend is upward before 2011 and that sales then drop dramatically. We can hardly see any seasonality in the data, so any seasonal effect is probably quite small.

    b. Using the plot and stl functions in R, decompose the data with loess (additive) decomposition and explain what is shown in the plot.

    Explain what the bars to the right of the plot represent.

    # load the forecast package (needed later for ets; stl itself is in base R's stats package)
    library("forecast")

    # decompose the data with loess and set the seasonal window to periodic
    lo <- stl(iveco, s.window = "periodic")

    # take a look at the results
    lo$time.series
    plot(lo)

    iveco decomposition

    The bars indicate relative scale: each bar represents the same magnitude, drawn on the scale of its own panel. A large bar in the seasonal panel therefore shows that the seasonal variation is small compared with the data and the trend.

    In this plot, the first panel shows the time series of the Iveco sales data. The following panels show the seasonal, trend and remainder components. Because the decomposition is additive, the sum of the last three panels is equal to the first. The trend component accounts for the largest part of the data and the seasonal component is very small.
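    The additive property can be checked directly in R. Since B2.csv is not reproduced here, the sketch below uses the built-in nottem dataset (monthly temperatures) as a stand-in; the same check would apply to the iveco series.

```r
# Loess decomposition of a built-in seasonal series (stand-in for the sales data)
fit <- stl(nottem, s.window = "periodic")
components <- fit$time.series   # columns: seasonal, trend, remainder

# In an additive decomposition the three components sum back to the data
recomposed <- rowSums(components)
all.equal(as.numeric(recomposed), as.numeric(nottem))   # TRUE

# Range of each component: a small range corresponds to a long bar in plot(fit)
apply(components, 2, function(v) diff(range(v)))
```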

    c. Using the ets() function in the forecast package in R, predict future sales using exponential smoothing for the next year (4 observations only). Set alpha so that you give more weight to more recent observations. Include an image of the forecast in your answer.

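    # use ets to predict next year's sales; a large alpha puts more weight on recent values
    # (pass the ts object itself so that the quarterly frequency is kept)
    fit <- ets(iveco, model = "ZZZ", alpha = 0.9)

    # forecast the next 4 quarters and plot the forecast
    pre <- forecast(fit, h = 4)
    plot(pre)

    # overlay the observed series (lines(), not line(), adds to an existing plot)
    lines(iveco)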
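    Why a large alpha favours recent observations can be seen from the weights of simple exponential smoothing, where the observation k periods back receives weight alpha * (1 - alpha)^k. The helper ses_weights is a hypothetical name used only for this sketch, and the model that ets actually selects may include further components.

```r
# Weight given to the observation k periods back in simple exponential smoothing
ses_weights <- function(alpha, k = 0:4) alpha * (1 - alpha)^k

round(ses_weights(0.9), 4)   # almost all weight on the most recent observation
round(ses_weights(0.2), 4)   # weight spread over many past observations
```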


    B3. HR has approached you to help them study your company’s employees. They have provided you with a dataset (B3.csv) with the following 6 columns about 14,999 employees:

    satisfaction_level: Satisfaction level

    last_evaluation: Last evaluation

    number_project: Number of projects

    average_montly_hours: Average monthly hours

    time_spend_company: Time spent at the company

    Work_accidents: Number of accidents the employee has had at work

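    a. Describe the differences between Principal Components Analysis (PCA) and Exploratory Factor Analysis (EFA).

    PCA is a method for removing the linear correlation between variables: it transforms a set of correlated variables into a smaller set of uncorrelated components. EFA is a method for finding the underlying (latent) factors that give rise to the observed variables.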

    b. Using read.csv and the corrgram function from the corrgram package, import the data and create a correlogram plot of the 6 measurements of the employees. Discuss the suitability of the data for PCA and include an image of the plot in your answer.

    CODE:

    pdata <- read.csv("2018B3.csv")

    View(pdata)

    # load corrgram package

    library("corrgram")

    corrgram(pdata)

    The PCA method is suitable for reducing a large number of correlated variables. In the corrgram plot, blue means that two variables are positively correlated and red means that they are negatively correlated. A darker colour means that the correlation is stronger. last_evaluation, number_project and average_montly_hours are highly positively correlated, while satisfaction_level and number_project are highly negatively correlated. We could use PCA to reduce these correlated variables.

    c. Using the plot and prcomp functions in R, plot a scree plot and describe how you can use this plot to identify the number of components to use in principal components analysis. Include the scree plot in your answer.

    code:

    # run PCA on the standardised data
    hrpca <- prcomp(pdata, scale. = TRUE)

    # scree plot: variance of each principal component
    plot(hrpca, type = "lines", main = "scree plot")

    A scree plot displays how much variation each principal component captures from the data. We can use the following rules to choose the number of components:

    • Kaiser's rule: keep the components whose eigenvalues (variances) are greater than 1.

    • The "elbow rule": keep the components before the point where the curve flattens out.

    • Proportion of variance: the selected PCs should be able to describe at least 80% of the variance.
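    Kaiser's rule and the 80% rule can be applied numerically as well as read off the plot. The sketch below uses the built-in USArrests dataset as a stand-in for the HR data; the variable names are illustrative.

```r
# PCA on a built-in dataset with standardised variables
pca <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- pca$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
cum_prop <- cumsum(prop_var)

# Kaiser's rule: number of components with eigenvalue greater than 1
n_kaiser <- sum(eigenvalues > 1)

# 80% rule: smallest number of components whose cumulative proportion reaches 0.8
n_80 <- which(cum_prop >= 0.8)[1]

n_kaiser
n_80
```

    The two rules do not have to agree, which is why the scree plot is usually read alongside them.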

    d. Using the prcomp function in R, use principal components analysis on the data. Include and describe the results of the analysis. Discuss the loadings and how appropriate it would be to use two components.

    code:

    hrpca

    summary(hrpca)

    Result:

    Importance of components:
                             PC1    PC2    PC3    PC4    PC5
    Standard deviation     1.353 1.0534 1.0000 0.9362 0.7968
    Proportion of Variance 0.305 0.1849 0.1667 0.1461 0.1058
    Cumulative Proportion  0.305 0.4899 0.6566 0.8027 0.9085

    Rotation (n x k) = (6 x 6):
                                 PC1         PC2         PC3
    satisfaction_level    0.08693115 -0.82848859  0.08271569
    last_evaluation      -0.50728391 -0.36995575  0.01296449
    number_project       -0.57900111  0.11114716 -0.03199330
    average_montly_hours -0.54922118 -0.12501818 -0.00810438
    time_spend_company   -0.31310859  0.38036651  0.03235213
    Work_accidents       -0.01352139  0.06385507  0.99541656
                                 PC4          PC5          PC6
    satisfaction_level   -0.37912166  0.272273055 -0.285204994
    last_evaluation      -0.04769970 -0.714195147  0.305414036
    number_project        0.20810048  0.005747078 -0.779770228
    average_montly_hours  0.25387813  0.635654862  0.462763070
    time_spend_company   -0.86109751  0.107569645  0.056369470
    Work_accidents        0.06886704 -0.011459253 -0.003404899

    The result shows that:

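    PC1 and PC2 together explain only about 49% of the total variance (cumulative proportion 0.4899). The selected PCs should be able to describe at least 80% of the variance, so two components are not enough; according to the summary, four components are needed to pass 80% (0.8027).

    last_evaluation, number_project and average_montly_hours have the largest loadings (in absolute value) on PC1, while satisfaction_level has the largest loading on PC2. Work_accidents loads almost entirely on PC3 (0.995).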
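    The proportions in the summary can be reproduced from the reported standard deviations. The sketch assumes the total variance is 6, because the six variables were standardised; only PC1 to PC5 appear in the summary above, so only those are used.

```r
# Standard deviations of PC1..PC5 as reported in the summary output
sdev <- c(1.353, 1.0534, 1.0000, 0.9362, 0.7968)

# Each standardised variable contributes variance 1, so the total is 6
prop <- sdev^2 / 6
cum_prop <- cumsum(prop)
round(cum_prop, 3)

# Number of components needed to reach 80% of the variance
n_needed <- which(cum_prop >= 0.8)[1]
n_needed   # 4
```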
