The FDA is joining hands with fellow regulators in Canada and the U.K. to present 10 principles outlining best practices in the development of artificial intelligence-powered healthcare programs and medical devices.
The missive aims to underline what the agencies describe as “Good Machine Learning Practice,” in the same vein as the industry quality standards centered around product manufacturing or clinical testing.
“With artificial intelligence and machine learning progressing so rapidly, our three regulatory agencies, together, see a global opportunity” to foster strong procedures, said Bakul Patel, director of the FDA’s Digital Health Center of Excellence, which was established last year.
“We recognize that machine learning technologies present unique considerations due to their complexity and the iterative and data-driven nature of their development. While this is true, we are excited for continued progress in this area,” Patel said in a statement.
Described as the start of a collaboration with Health Canada and the U.K. Medicines and Healthcare products Regulatory Agency, the work looks to set the stage for future efforts by the International Medical Device Regulators Forum—a group that includes governing agencies from Australia, Brazil, China, Europe, Japan, Russia, Singapore and South Korea.
The 10 principles aim to identify areas where international harmonization and consensus standards could be developed—including by the IMDRF and other global organizations—as well as ways for regulatory policies to be aligned when it comes to AI-powered research and tools.
Specifically, the FDA hopes the principles could serve as a starting point for AI guidelines tailored specifically to healthcare, or drive the adoption of best practices already proven in other sectors.
At the top of the list is the need for multidisciplinary expertise at every stage of the total product life cycle, to give a fuller understanding of both the potential risks to patients and how the AI model will eventually be integrated into clinical workflow.
The principles also call for training datasets to be kept separate from test datasets, and for the data to accurately represent the intended patient population.
In addition, testing of the device should reflect its use in clinically relevant conditions, and the criteria for success should focus on the performance of the doctors, nurses and technicians using the AI rather than on the performance of the AI itself. Users should be given clear, essential information and be told of modifications or updates made when models are retrained on new, real-world data.
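For developers, the principle of keeping training and test data separate is often put into practice by splitting at the patient level, so that no patient's records land on both sides. The Python sketch below is one illustrative way to do that and is not taken from the regulators' document; the `patient_id` field, the `split_by_patient` helper and the sample records are all hypothetical.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records into train/test sets so that no patient appears in both.

    Assumes each record is a dict with a "patient_id" key (hypothetical schema).
    """
    # Work on unique patients, not individual records, to avoid leakage.
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Hold out a fraction of patients (at least one) for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    held_out = set(patient_ids[:n_test])

    train = [r for r in records if r["patient_id"] not in held_out]
    test = [r for r in records if r["patient_id"] in held_out]
    return train, test


# Hypothetical data: 10 patients with 3 scans each.
records = [{"patient_id": f"P{i // 3}", "scan": i} for i in range(30)]
train, test = split_by_patient(records)

train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
print("patients shared between sets:", train_ids & test_ids)
```

A split like this addresses only independence. Whether the data represents the intended patient population is a separate question that would need its own checks, for example comparing age, sex or site distributions against the target group.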