Temperature 22 degrees, pressure 1000 mb. Rain.
Temperature 15 degrees, pressure 990 mb. No rain.
Then the question is:
Temperature 17 degrees, pressure 1010 mb. Will there be rain?
We predict by Hamming distance to the nearest exemplar, (a) with pressure measured in millibars, (b) with pressure measured in bars. What is predicted in each case?
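The prediction step can be sketched as follows. This is illustrative only: `predict` and `pressure_scale` are hypothetical names, and Euclidean distance is used as a concrete stand-in, since Hamming distance is ambiguous on continuous features; the point is how the unit choice changes which exemplar is nearest.

```python
import math

# Nearest-exemplar prediction sketch (hypothetical helper, not from the exam).
def predict(query, exemplars, pressure_scale):
    # pressure_scale converts pressure from millibars into the chosen unit
    def dist(a, b):
        return math.dist((a[0], a[1] * pressure_scale),
                         (b[0], b[1] * pressure_scale))
    nearest = min(exemplars, key=lambda e: dist(query, e[0]))
    return nearest[1]

exemplars = [((22, 1000), "rain"), ((15, 990), "no rain")]
query = (17, 1010)
print(predict(query, exemplars, pressure_scale=1))      # pressure in millibars
print(predict(query, exemplars, pressure_scale=0.001))  # pressure in bars
```

With millibars the pressure difference dominates the distance; with bars it is negligible and temperature dominates, so the two unit choices can pick different nearest exemplars.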
An example showing that this is a bad measure of the error:
Consider the line y = a x + b.
Define a perceptron that receives x and y as input and fires if and only if the point (x, y) is on this side of the line:
y ≤ a x + b
| Question 1 | [Total marks: 20] |
| 1(a) | A typical machine learning algorithm converges to a solution in a fairly short time, usually within minutes or hours.
Discuss what convergence means,
and whether human learning converges in a similar way.
Maximum length: 2 sentences |
[5 marks] |
| 1(b) | I wrote a chatbot that (arguably) passed the Turing Test on the Internet in 1989.
Have I proved the Turing Test is nonsense? Maximum length: 2 sentences |
[5 marks] |
| 1(c) | "When trying to make intelligent machines, AI should carefully try to copy how nature made intelligent creatures."
Suggest why this might be a good idea. Maximum length: 2 sentences |
[5 marks] |
| 1(d) | "When trying to make intelligent machines, AI should carefully try to copy how nature made intelligent creatures."
Suggest why this might be a bad idea. Maximum length: 2 sentences |
[5 marks] |
| Question 2 | [Total marks: 20] |
| 2(a) |
Consider the function y = f(x) where f(a) = 1 for some value a, and f(x) = 0 for all other x. Is this function easy or hard to maximise from exemplars? Maximum length: 3 sentences |
[5 marks] |
| 2(b) |
Consider trying to maximise a function
of x, where x is an integer from 1 to 10^20.
There are seven (and only seven) values of x where f(x) = 10.
For all other x, f(x) = 1.
The machine is not told this but has to find out by trial and error.
What would maximising this function by trial and error look like? Maximum length: 3 sentences |
[5 marks] |
| 2(c) |
Consider a heuristic search as follows:
Initialise with 100 random solutions.
Repeat for many iterations:
{
Test the 100 solutions.
Pick the best n.
Make small changes in them
to generate 100 new solutions.
}
If n=99 what will this look like?
Maximum length: 2 sentences |
[5 marks] |
| 2(d) |
Consider a heuristic search as follows:
Initialise with 100 random solutions.
Repeat for many iterations:
{
Test the 100 solutions.
Pick the best n.
Make small changes in them
to generate 100 new solutions.
}
If n=1 what will this look like?
Maximum length: 2 sentences |
[5 marks] |
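The search loop in 2(c) and 2(d) can be sketched as below. The objective `fitness` and the mutation step are placeholder assumptions, not part of the question; only the select-mutate-replace structure follows the pseudocode above.

```python
import random

# Toy objective (an assumption): a single peak at x = 42.
def fitness(x):
    return -abs(x - 42)

def search(n, iterations=300, pop_size=100):
    # Initialise with 100 random solutions.
    pop = [random.randint(0, 100) for _ in range(pop_size)]
    for _ in range(iterations):
        pop.sort(key=fitness, reverse=True)  # test the 100 solutions
        parents = pop[:n]                    # pick the best n
        # Make small changes in them to generate 100 new solutions.
        pop = [random.choice(parents) + random.randint(-2, 2)
               for _ in range(pop_size)]
    return max(pop, key=fitness)
```

With n = 99 almost the whole population survives selection, so the search barely exploits fitness and drifts; with n = 1 every new solution is a small perturbation of the single best one, so it behaves like a greedy hill-climb.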
| Question 3 | [Total marks: 20] |
| 3(a) | Imagine a search tree where every node has an infinite number of children. Can search still proceed? Maximum length: 3 sentences |
[5 marks] |
| 3(b) |
Imagine the following situation in Best-first search.
The heuristic evaluation function tells us a state is 4 steps away from the goal.
We examine the children of this state and the heuristic evaluation function
tells us all of them are 5 steps away from the goal.
Can this happen?
Maximum length: 2 sentences |
[5 marks] |
| 3(c) |
In the Travelling Salesman Problem with n cities,
we say there is an upper bound of (n-1)! on the number of paths to be searched.
What assumption did we just make?
Maximum length: 2 sentences |
[5 marks] |
| 3(d) | The idea of "informed search" is that we choose the next state that is most likely
to lead to the goal. But if we can do that, then the problem is solved!
Just keep choosing the best next state after that and you get to the goal ASAP. No search needed! No AI needed. Explain.
Maximum length: 2 sentences |
[5 marks] |
| Question 4 | [Total marks: 20] |
| 4(a) |
Let XOR in n dimensions be defined as follows.
If all inputs are 0, output is 0.
If precisely one input is 1, and all others are 0, then output is 1.
If more than one input is 1, output is 0.
Show that a perceptron cannot implement this definition of XOR in n dimensions. Maximum length: 5 sentences |
[5 marks] |
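For the n = 2 case of this XOR, a brute-force grid search (an illustration, not a proof) shows no weight/threshold setting reproduces the required outputs:

```python
import itertools

# The four cases of the XOR above for n = 2: output 1 iff exactly one input is 1.
cases = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Search an assumed grid of weights and threshold in [-2, 2], step 0.25.
grid = [k / 4 for k in range(-8, 9)]
found = any(
    all((1 if w1 * a + w2 * b >= t else 0) == out
        for (a, b), out in cases.items())
    for w1, w2, t in itertools.product(grid, repeat=3)
)
print(found)  # no setting on the grid works
```

The underlying contradiction is what a full answer would derive: the cases force t > 0, w1 ≥ t, w2 ≥ t, yet w1 + w2 < t, which is impossible.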
| 4(b) |
The following is the perceptron learning rule.
We have multi-dimensional input,
where each component of the input Ii is binary 0 or 1.
It produces output y = 0 or 1.
We show it the "correct" output O = 0 or 1.
We modify the weights and the threshold by the
rules:
wi := wi + C (O-y) Ii
t := t - C (O-y)
where C is some positive constant.
|
[5 marks] |
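One application of the rule in 4(b) can be sketched as follows (the function name and a step-threshold firing rule are assumptions):

```python
# One step of the perceptron learning rule above.
def perceptron_update(w, t, I, O, C=1.0):
    # The unit fires iff the weighted sum reaches the threshold.
    y = 1 if sum(wi * Ii for wi, Ii in zip(w, I)) >= t else 0
    w = [wi + C * (O - y) * Ii for wi, Ii in zip(w, I)]  # wi := wi + C (O-y) Ii
    t = t - C * (O - y)                                  # t  := t - C (O-y)
    return w, t, y
```

Note that when the output is already correct (O = y), the update term C (O-y) is zero and nothing changes.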
| 4(c) |
A multi-layer neural network has
1 input unit, n hidden units, and 1 output unit.
The weights on the input layer are all the same:
wij = W1.
The thresholds of the hidden units are all the same:
tj = T.
The weights on the output layer are all the same:
wjk = W2.
Show that this network is equivalent to a network with 1 input unit, 1 hidden unit, and 1 output unit. What are the weights and thresholds of the latter network? Maximum length: 5 sentences |
[5 marks] |
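The equivalence in 4(c) can be checked numerically, assuming step (threshold) activations, which the question's thresholds suggest. The n identical hidden units each contribute W2 · h, summing to n · W2 · h, so one hidden unit with output weight n · W2 behaves identically:

```python
def step(s, t):
    return 1 if s >= t else 0

def wide_net(x, n, W1, T, W2, t_out):
    h = [step(W1 * x, T) for _ in range(n)]       # n identical hidden units
    return step(sum(W2 * hj for hj in h), t_out)

def collapsed_net(x, n, W1, T, W2, t_out):
    return step(n * W2 * step(W1 * x, T), t_out)  # single hidden unit

# Spot check on a range of inputs (example parameter values assumed).
assert all(wide_net(x, 5, 1.0, 0.5, 0.3, 1.0) ==
           collapsed_net(x, 5, 1.0, 0.5, 0.3, 1.0) for x in range(-5, 6))
```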
| 4(d) |
Derive the set of inequalities that define a
Perceptron for OR (2 inputs).
Maximum length: 4 sentences |
[5 marks] |
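The inequalities asked for in 4(d) can be verified mechanically; the weight values below are one assumed solution, not the only one:

```python
# The four input cases of OR give four inequalities on (w1, w2, t):
#   (0,0) -> 0 :  0        <  t
#   (0,1) -> 1 :  w2       >= t
#   (1,0) -> 1 :  w1       >= t
#   (1,1) -> 1 :  w1 + w2  >= t
# One assumed solution: w1 = w2 = 1, t = 0.5.
def fires(x1, x2, w1=1.0, w2=1.0, t=0.5):
    return 1 if w1 * x1 + w2 * x2 >= t else 0

assert [fires(0, 0), fires(0, 1), fires(1, 0), fires(1, 1)] == [0, 1, 1, 1]
```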
| Question 5 | [Total marks: 20] |
| 5(a) | In Reinforcement Learning, is a punishment of -1000 a big punishment?
Maximum length: 2 sentences |
[5 marks] |
| 5(b) | In a Markov Decision Process (MDP),
what is wrong with the following:
If we take action a in state x,
then Pxa(y)=0.6
for some state y,
and Pxa(w)=0
for all other states w.
Maximum length: 1 sentence |
[5 marks] |
| 5(c) | In a Markov Decision Process (MDP), explain why E(r) exists
and E(y) does not exist. Maximum length: 4 sentences |
[5 marks] |
| 5(d) | In state x, if we take action a, we sometimes get reward 0,
and sometimes get reward 5.
If we take action b, we always get reward 3.
Under what circumstances is
the expected reward for taking action a the same as
the expected reward for taking action b? Maximum length: 3 sentences |
[5 marks] |
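The expectation in 5(d) is a one-line calculation; here p is an assumed probability of the reward-5 outcome for action a (with rewards 0 and 5 the only possibilities):

```python
# Expected reward for action a when reward 5 occurs with probability p.
def expected_reward_a(p):
    return 0 * (1 - p) + 5 * p

# E[r | a] equals E[r | b] = 3 exactly when 5p = 3, i.e. p = 3/5.
assert abs(expected_reward_a(0.6) - 3) < 1e-9
```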
Seat No:_____________________
|