Developing a novel process for producing optimised real-world and synthetic training datasets designed for application-driven machine learning