CS188: MDP-solver to work in Pacman environment
This coursework exercise asks you to write code to create an MDP-solver to work in the Pacman environment that we used for the practical exercises.
Read all these instructions before starting.
This exercise will be assessed.
2 Getting started
You should download the file pacman-cw.zip from KEATS. This contains a familiar set of files that implement Pacman, and version 6 of api.py, which defines both the observability of the environment that you will have to deal with and the same non-deterministic motion model that the practicals used.
Version 6 of api.py further extends what Pacman can know about the world. In addition to knowing the location of all the objects in the world (walls, food, capsules, ghosts), Pacman can now see what state the ghosts are in, and so can decide whether they have to be avoided or not (we will ignore the score when marking the coursework so there is no need to chase ghosts and try to eat them, but you can if you really want to).
3 What you need to do
3.1 Write code
This coursework requires you to write code to control Pacman and win games using an MDP-solver. There is a (rather familiar) skeleton piece of code to take as your starting point in the file mdpAgents.py. This code defines the class MDPAgent.
There are two main aims for your code:
- (a) Win hard in smallGrid
- (b) Win hard in mediumClassic
To win games, Pacman has to be able to eat all the food. For this coursework, “winning” just means getting the environment to report a win. Score is irrelevant.
3.2 Things to bear in mind
Some things that you may find helpful:
- (a) We will evaluate whether your code can win games in smallGrid by running:
python pacman.py -q -n 25 -p MDPAgent -l smallGrid
-l is shorthand for --layout. -p is shorthand for --pacman. -q runs the game without the interface (making it faster).
- (b) We will evaluate whether your code can win games in mediumClassic by running:
python pacman.py -q -n 25 -p MDPAgent -l mediumClassic
The -n 25 runs 25 games in a row.
- (c) When using the -n option to run multiple games, the same agent (the same instance of the MDPAgent class) is run in all the games.
That means you might need to change the values of some of the state variables that control Pacman’s behaviour in between games. You can do that using the final() function.
- (d) There is no requirement to use any of the methods described in the practicals, though you can use these if you wish.
- (e) If you wish to use the map code I provided in MapAgent, you may do this, but you need to include comments that explain what you used and where it came from (just as you would for any code that you make use of but don’t write yourself).
- (f) You can only use libraries that are part of the standard Python 2.7 distribution. This ensures that (a) everyone has access to the same libraries (since only the standard distribution is available on the lab machines) and (b) we don’t have trouble running your code due to some library incompatibilities.
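To illustrate point (c), here is a minimal sketch of clearing per-game state in final(). The class name and the attributes utilities and visited are hypothetical, chosen only for illustration. The class is a plain stand-in so that it runs on its own; your MDPAgent in mdpAgents.py will have its own state variables.

```python
class ResettableAgentSketch(object):
    """Stand-in for an agent that is reused across several games."""

    def __init__(self):
        self._reset()

    def _reset(self):
        # Hypothetical per-game state; a real agent's variables will differ.
        self.utilities = {}
        self.visited = set()

    def getAction(self, state):
        # Record something about this game, then return a placeholder move.
        self.visited.add(state)
        return "Stop"

    def final(self, state):
        # Runs when a game ends: clear anything that must not leak
        # into the next game played by the same instance.
        self._reset()


agent = ResettableAgentSketch()
agent.getAction("some-state")
visited_during_game = set(agent.visited)
agent.final(None)
visited_after_game = set(agent.visited)
```

Keeping all the resets in one helper means the constructor and final() cannot drift apart as you add more state.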
3.3 Write a report
Write up a description of your program along with your evaluation in a separate report that you will submit along with your code.
As you work through your implementation of the MDP-solver, you will find that you are making lots of decisions about how precisely to translate your ideas into working code. The report should explain these at length. The perfect report will give enough detail that we don’t feel we have to read your code in order to understand what your code does (we will read it anyway). But don’t overwhelm us with detail, either.
Given the requirement for your code to be based around an MDP-solver, you would be wise to include a description of how your code solves the MDP, and which bits of the code do this solving.
Remember, when writing your report, that there is credit for well-written and beautiful solutions. Highlight things that make your work unique.
Having said that, reports that are needlessly long will not get any more marks. We value concise reports (we have to read a lot of them).
Your report should also analyse the performance of your code. Because there is a certain amount of randomness in the behaviour of the ghosts, a good analysis will run multiple games to assess Pacman’s performance. For example, you might like to try running:
python pacman.py -n 50 -q -p MDPAgent -l mediumClassic
to get a statistically significant number of runs so that you can, for example, establish the average score with some accuracy. (Of course, to decide whether this was a statistically significant number of runs, you would have to do some statistical analysis; it might well need more runs.) All the conclusions that you present in your analysis should be justified by the data that you have collected. Finding out what a statistical analysis is and how it works is part of your project.
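As one possible starting point for such an analysis, the sketch below computes a win rate with a normal-approximation 95% confidence interval, and a mean score with its standard error. The function names are hypothetical and the numbers are made up purely for illustration. Whether the normal approximation is appropriate for your data is something you should check for yourself.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Win rate with a normal-approximation confidence interval (z=1.96 ~ 95%)."""
    rate = float(wins) / games
    half_width = z * math.sqrt(rate * (1 - rate) / games)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


def mean_and_stderr(values):
    """Sample mean and standard error of the mean (needs at least two values)."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Made-up numbers purely for illustration, not real results.
rate, low, high = win_rate_interval(30, 50)
mean, stderr = mean_and_stderr([10, 20, 30])
print("win rate %.2f, 95%% CI [%.2f, %.2f]" % (rate, low, high))
print("mean score %.1f +/- %.1f" % (mean, stderr))
```

A wide interval is a sign that more runs are needed before drawing a conclusion.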
There are some limitations on what you can submit.
- (a) Your code must be in Python 2.7.
Code written in a language other than Python will not be marked.
Code written in Python 3.X is unlikely to run with the clean copy of pacman-cw that we will test it against. If it doesn’t run, you will lose marks.
Code using libraries that are not in the standard Python 2.7 distribution will not run (in particular, NumPy is not allowed). If you choose to use such libraries and your code does not run as a result, you will lose marks.
- (b) Your code must only interact with the Pacman environment by making calls through functions in Version 6 of api.py. Code that finds other ways to access information about the environment will lose marks.
The idea here is to have everyone solve the same task, and have that task explore issues with non-deterministic actions.
- (c) You are not allowed to modify any of the files in pacman-cw.zip except mdpAgents.py.
Similar to the previous point, the idea is that everyone solves the same problem — you can’t change the problem by modifying the base code that runs the Pacman environment.
- (d) You are not allowed to copy, without credit, code that you might get from other students or find lying around on the Internet. We will be checking.
This is the usual plagiarism statement. When you submit work to be marked, you should only seek to get credit for work you have done yourself. When the work you are submitting is code, you can use code that other people wrote, but you have to say clearly that the other person wrote it — you do that by putting in a comment that says who wrote it. That way we can adjust your mark to take account of the work that you didn’t do.
- (e) Your code must be based on solving the Pacman environment as an MDP. If you don’t submit a program that contains a recognisable MDP solver, you will lose marks.
- (f) The only MDP solvers we will allow are the ones presented in the lecture, i.e., Value iteration, Policy iteration and Modified policy iteration. In particular, Q-Learning is unacceptable.
- (g) Your code must only use the results of the MDP solver to decide what to do. If you submit code which makes decisions about what to do that uses other information in addition to what the MDP-solver generates (like ad-hoc ghost avoiding code, for example), you will lose marks.
This is to ensure that your MDP-solver is the thing that can win enough games to pass the functionality test.
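To make points (e) to (g) concrete, here is a minimal sketch of value iteration on a tiny corridor, followed by an action choice that uses only the computed utilities. It is not the coursework solution. Everything numeric is an assumption made for illustration: the 0.8/0.1/0.1 motion model, the discount, the step reward, and treating the food cell as a terminal state with utility 1.0. All function and variable names are hypothetical, and nothing here calls api.py.

```python
GAMMA = 0.9          # assumed discount factor
STEP_REWARD = -0.04  # assumed reward for a non-terminal cell
MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}
SIDEWAYS = {"North": ("East", "West"), "South": ("East", "West"),
            "East": ("North", "South"), "West": ("North", "South")}


def step(cell, action, free):
    """Cell reached by action; Pacman stays put if the target is not free."""
    dx, dy = MOVES[action]
    target = (cell[0] + dx, cell[1] + dy)
    return target if target in free else cell


def expected_utility(cell, action, utilities, free):
    """Expected utility under the assumed motion model:
    0.8 intended direction, 0.1 for each perpendicular direction."""
    side_a, side_b = SIDEWAYS[action]
    return (0.8 * utilities[step(cell, action, free)]
            + 0.1 * utilities[step(cell, side_a, free)]
            + 0.1 * utilities[step(cell, side_b, free)])


def value_iteration(free, terminals, iterations=100):
    """Apply the Bellman update to every non-terminal cell, repeatedly."""
    utilities = dict((c, terminals.get(c, 0.0)) for c in free)
    for _ in range(iterations):
        updated = dict(utilities)
        for c in free:
            if c not in terminals:
                best = max(expected_utility(c, a, utilities, free)
                           for a in MOVES)
                updated[c] = STEP_REWARD + GAMMA * best
        utilities = updated
    return utilities


def best_action(cell, utilities, free):
    """Pick the action with maximum expected utility, and nothing else."""
    return max(sorted(MOVES),
               key=lambda a: expected_utility(cell, a, utilities, free))


# A 1x3 corridor with food (utility 1.0) at the east end.
free_cells = set([(0, 0), (1, 0), (2, 0)])
terminal_utilities = {(2, 0): 1.0}
solved = value_iteration(free_cells, terminal_utilities)
choice_at_start = best_action((0, 0), solved, free_cells)
```

Note that best_action consults only the utilities produced by the solver, which is the property point (g) asks for. A real agent would rebuild the rewards from what api.py reports on each move and would need to account for ghosts in those rewards rather than in separate avoidance code.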
4 What you have to hand in
Your submission should consist of a single ZIP file. (KEATS will be configured to only accept a single file.) This ZIP file must include a single PDF document (the report), and a single Python file (your code).
5 How your work will be marked