A TensorFlow dataset for identifying classic arcade games from sequences of screendumps. All screendumps for a game are typically from the game's attract mode. The sequence can also be from the game being played, if the game only has a single stage/level. That is, what is trying to recognize, are
The games are identified using the name of the game's MAME ROM set.
The dataset is available as archive files or as a TensorFlow dataset built with the TFDS CLI.
Note: the last part of each sequence is currently used for testing, and the first part for training. This is not ideal; training and test samples should be mixed.
The dataset contains data for the following games (as named in MAME):
- amidar
- depthcho
- digdug
- dkong
- frogger
- galagao
- invadrmr
- missile1
- pacman
- qix
- rallyx
An example of loading the 16x16 version of the TensorFlow dataset. The first 50% will be used for training, and the second 50% will be used for testing:
import tensorflow_datasets as tfds
# Importing the builder module registers the dataset with TFDS; it is not used directly.
from dataset import classic_arcade_games
(train_images, train_labels), (test_images, test_labels) = tfds.load(
"classic_arcade_games/16x16",
split=['train[:50%]', 'train[50%:]'],
as_supervised=True,
batch_size=-1
)
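The note earlier in this README says that training and test samples should be mixed rather than taken from the first and last part of each sequence. A minimal sketch of one way to do that with NumPy is shown here. The helper `shuffled_split` and the stand-in arrays are hypothetical and not part of this repository; the sketch assumes the images and labels are already loaded as arrays of equal length.

```python
import numpy as np


def shuffled_split(images, labels, train_fraction=0.5, seed=0):
    """Shuffle images and labels with the same permutation, then split them.

    Hypothetical helper, not part of this repository.
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")

    # A fixed seed keeps the split reproducible between runs.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))

    n_train = int(len(images) * train_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])


# Stand-in data shaped like ten 16x16 grayscale images (assumed shape).
images = np.zeros((10, 16, 16, 1), dtype=np.uint8)
labels = np.arange(10)

(train_images, train_labels), (test_images, test_labels) = shuffled_split(images, labels)
```

With the real dataset, the arrays could come from loading the whole `train` split with `batch_size=-1` and `as_supervised=True`, then converting the result with `tfds.as_numpy`.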
For a full example, take a look at dataset_demo.py in this repository, as well as this TensorFlow tutorial.
The data in this dataset has been collected by running MAME and manually creating screendumps from the chosen sequence in the game.
Unmodified screendumps are stored in /data/mame/original/<mame_id>/*.png
Squared, grayscale versions in different resolutions are available as .zip archives. These images were created using the script scale_screndumps.py and are stored in these archives:
- data/mame/8x8.zip
- data/mame/16x16.zip
- data/mame/32x32.zip
- data/mame/64x64.zip
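To illustrate what "squared, grayscale" means for the archives listed above, here is a rough sketch using Pillow. It is not the repository's scaling script. The function name `to_square_grayscale` is hypothetical, and the sketch assumes that a screendump is stretched to a square (not cropped or padded) and resampled with bilinear filtering; the actual script may do either differently.

```python
from PIL import Image


def to_square_grayscale(image, size):
    """Convert an image to 8-bit grayscale and resize it to size x size pixels.

    Hypothetical helper. The aspect ratio is not preserved (assumption).
    """
    gray = image.convert("L")  # "L" is Pillow's 8-bit grayscale mode
    return gray.resize((size, size), Image.BILINEAR)


# Stand-in for a screendump; the 224x288 size is only an example.
screendump = Image.new("RGB", (224, 288), color=(200, 30, 30))
small = to_square_grayscale(screendump, 16)
```

For a real screendump, `Image.open(path)` would replace the stand-in image, and `small.save(target_path)` would write the result.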
Please contribute to this dataset by adding screendumps of game sequences made with MAME. Make a PR adding around 100 screendumps to data/original/<name-of-game-rom>/<number>.png. Do not apply any effects to the images.
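Before opening a PR, it may help to check that a game directory follows the naming convention described above. The following is a small sketch using only the standard library. The helper `check_contribution` is hypothetical and not part of this repository, and it only checks file names, not image contents.

```python
from pathlib import Path


def check_contribution(game_dir):
    """Count the .png files in game_dir and list files that break the naming rule.

    Hypothetical helper: expects files named <number>.png and nothing else.
    """
    game_dir = Path(game_dir)
    files = sorted(p for p in game_dir.iterdir() if p.is_file())

    png_count = 0
    problems = []
    for path in files:
        if path.suffix != ".png":
            problems.append(f"not a .png file: {path.name}")
            continue
        png_count += 1
        if not path.stem.isdigit():
            problems.append(f"name is not a number: {path.name}")
    return png_count, problems
```

For example, `check_contribution("data/original/<name-of-game-rom>")` returns the number of .png files found and a list of messages for files that should be renamed or removed.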
Examples of what the squared screendumps look like.
Results for the variations of the dataset with 50% of the data used for training and the other 50% used for testing.