New Study Reveals Fewer Fossils Needed for AI Species Identification
A recent study co-authored by Bruce MacFadden, UF Distinguished Professor Emeritus, finds that approximately 250 fossils are enough to train an image-based AI algorithm to identify species accurately. That figure is lower than previous estimates and could streamline the work of paleontologists.
Background on Paleontology Challenges
Paleontologists frequently struggle to identify fossil fragments. A single fossil provides limited information, and answering most scientific questions with certainty requires multiple specimens of the same species and type.
Vertebrate fossils, in particular, are often incomplete due to their intricate skeletal structure and complex preservation processes. For instance, vertebrate skeletons can comprise over 200 bones, yet complete specimens are rarely, if ever, found.
Identifying fragmented fossils is notoriously time-consuming. The Florida Museum's vertebrate fossil collection, which contains over one million specimens, includes numerous bags of sediment still awaiting sorting and identification.
The Role of AI in Identification
AI could significantly accelerate fossil identification. Palynology, the study of fossilized spores and pollen, has used AI since the 1980s, largely because specimens are so abundant.
Vertebrate fossils are generally less plentiful, but AI can still be applied effectively to them. The open question before this study was how many specimens are needed, at minimum, to train an AI algorithm that gives accurate and reliable results.
Study Methodology
To find this threshold, MacFadden and his colleagues chose sharks. Cartilaginous shark skeletons rarely fossilize, but shark teeth are durable, commonly preserved, and abundant in the fossil record.
The research team focused on six shark species from the Neogene period (23 to 2.6 million years ago). The selection included extinct species, such as the giant megalodon, and extant species, such as the modern great white shark.
Thousands of shark teeth from the Florida Museum collection were photographed for the study. Additional specimens, particularly of tiger sharks and the extinct precursor of the great white shark, were provided by avocational paleontologists Lee Cone, Barbara Fite, and C. O'Connor.
The research relied on computer vision, a branch of artificial intelligence. Cristobal Barberis of Adaptive Computing was primarily responsible for fine-tuning the AI models, with assistance from Arthur Porto, the Florida Museum's first curator of artificial intelligence.
The models were trained on labeled images in increments of 50, up to a training set of 500 images per species. After training, the models were tested on 25 unlabeled images of each species to evaluate their identification accuracy.
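The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the random seed, and the choice to make each larger training set contain the smaller ones are assumptions, and the image IDs are hypothetical placeholders.

```python
import random


def training_schedule(step=50, max_size=500):
    """Training-set sizes per species: 50, 100, ..., 500."""
    return list(range(step, max_size + 1, step))


def split_species_images(image_ids, test_size=25, max_train=500, seed=0):
    """Shuffle one species' images and hold out a test set.

    Returns (train_pool, test_set). The two never overlap, so the
    model is evaluated only on images it has not seen in training.
    """
    if len(image_ids) < test_size + max_train:
        raise ValueError("not enough images for this species")
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    test_set = shuffled[:test_size]
    train_pool = shuffled[test_size:test_size + max_train]
    return train_pool, test_set


def nested_training_sets(train_pool, sizes):
    """Map each size to the first `size` images of the pool.

    Nesting (each set containing the smaller ones) is an assumption
    made here; the article does not say how the subsets were drawn.
    """
    return {n: train_pool[:n] for n in sizes}


# Hypothetical image IDs for a single species.
ids = [f"tooth_{i:04d}" for i in range(600)]
train_pool, test_set = split_species_images(ids)
train_sets = nested_training_sets(train_pool, training_schedule())
```

A model would then be trained once per entry in `train_sets` and scored on `test_set` each time, giving one accuracy value per training-set size.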
Results and Implications
The study reported accuracy rates exceeding 90%, with performance plateauing at approximately 250 specimens. Training on more than 250 specimens may therefore yield only a modest gain in accuracy, which suggests that about 250 specimens are sufficient.
Even with only 50 specimens used for training, the AI models achieved accuracy rates of at least 93%.
Arthur Porto noted that the study showed "reasonable performance even with relatively low sample sizes."
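One simple way to read a plateau off a learning curve is to look for the smallest training-set size after which no larger size improves accuracy by more than a small margin. The sketch below assumes that definition. The function name and the default margin of 0.01 (one percentage point) are hypothetical, and the article does not say how the authors identified the plateau.

```python
def plateau_size(accuracy_by_size, min_gain=0.01):
    """Smallest training size that no larger size beats by more than min_gain.

    accuracy_by_size maps a training-set size to the accuracy
    (between 0 and 1) measured at that size.
    """
    if not accuracy_by_size:
        raise ValueError("no accuracy measurements given")
    sizes = sorted(accuracy_by_size)
    for n in sizes:
        here = accuracy_by_size[n]
        # Best accuracy at any larger size; the largest size compares to itself.
        best_later = max(
            (accuracy_by_size[m] for m in sizes if m > n), default=here
        )
        if best_later - here <= min_gain:
            return n
```

Called with measured accuracies keyed by training-set size, the function returns the size at which adding more specimens stops paying off under the chosen margin. A stricter or looser `min_gain` moves that point, so the threshold is a judgment call rather than a fixed property of the data.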
These findings have implications beyond paleontology. MacFadden and the "SharkAI" team are also involved in educational initiatives. They envision curricula in which K-12 students use AI to classify shark tooth images from biorepositories based on tooth shape and the estimated prey types of the sharks the teeth came from.
The study was published in the journal Paleobiology. Additional authors include Maria Vallejo-Pareja, Stephanie Killingsworth, Samantha Zbinden, Victor Perez, Kenneth Marks, and Dévi Hall.