MAI is a dataset for multi-scene recognition in single aerial images. It consists of 3,923 labelled large-scale images collected from Google Earth imagery covering the United States, Germany, and France. Each image is 512 × 512 pixels, and spatial resolutions vary from 0.3 m/pixel to 0.6 m/pixel. After the aerial images were captured, multiple scene-level labels were manually assigned to each image from a total of 24 scene categories: apron, baseball, beach, commercial, farmland, woodland, parking lot, port, residential, river, storage tanks, sea, bridge, lake, park, roundabout, soccer field, stadium, train station, works, golf course, runway, sparse shrub, and tennis court.
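
Because each image carries several scene labels at once, a common way to represent an annotation is a 24-dimensional multi-hot vector. The following is a minimal sketch of that encoding, not the dataset's official format: the index order simply follows the category list above, and the names `CATEGORIES`, `encode_labels`, and `decode_labels` are hypothetical helpers introduced for illustration.

```python
# The 24 scene categories named in the dataset description.
# Assumption: the index order below follows the order of that list;
# the released annotation files may use a different order.
CATEGORIES = [
    "apron", "baseball", "beach", "commercial", "farmland", "woodland",
    "parking lot", "port", "residential", "river", "storage tanks", "sea",
    "bridge", "lake", "park", "roundabout", "soccer field", "stadium",
    "train station", "works", "golf course", "runway", "sparse shrub",
    "tennis court",
]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def encode_labels(labels):
    """Turn a list of scene names into a multi-hot list of 0/1 flags."""
    vec = [0] * len(CATEGORIES)
    for name in labels:
        vec[CATEGORY_INDEX[name]] = 1  # raises KeyError on unknown names
    return vec


def decode_labels(vec):
    """Recover the scene names (in category order) from a multi-hot list."""
    return [name for name, flag in zip(CATEGORIES, vec) if flag]


# Hypothetical annotation for one image containing three scenes.
example = encode_labels(["residential", "river", "bridge"])
print(example)
print(decode_labels(example))
```

A vector of this form can be paired with a per-class sigmoid output and a binary cross-entropy loss, which is one usual setup for multi-label classification.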