How Data Scientists Can Improve Data Curation Practices
Working on the model has always been viewed as the “important” work of AI. Data, by contrast, is the most undervalued and deglamorized aspect of AI, according to a Google white paper. In a sampling of recently published papers on AI, only 1% covered data-centric themes, while 99% discussed model-centric topics.
But the reality is that 80% of a data scientist’s work today revolves around the collection, cleaning, and organization of data. We need to stop treating data as a “dirty little secret” and devote more time and resources to data curation.
Many good models are available off the shelf, giving data scientists a head start and leaving room for a data-centric approach. However, moving from 80% to 95% accuracy is far more difficult than reaching 80% in the first place. It’s no longer enough to adjust and tweak the model; good training data and a good way to select that data are critical.
That’s where modern data curation tools come into play. Good data selection and curation tools can cut down the manual labor of data management, saving overworked, time-strapped data scientists valuable hours. Consider this example: By deploying visual data search, clustering, and model audit tools, data science teams can dramatically speed up the model training process and improve model accuracy.
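To make the idea of automated data selection concrete, here is a minimal sketch, not any particular tool's method. It assumes each image has already been turned into a numeric embedding (how is left open), and it uses greedy farthest-point selection as a simple stand-in for clustering-based curation: it picks a small, diverse subset so that near-duplicate images are not all sent for labeling. The function name `farthest_point_selection` and the toy embeddings are hypothetical, chosen only for illustration.

```python
import math


def farthest_point_selection(embeddings, k):
    """Greedily pick up to k indices whose embeddings are far apart.

    Assumption: `embeddings` is a list of equal-length numeric tuples,
    one per image, produced by some upstream feature extractor.
    """
    if not embeddings or k <= 0:
        return []
    k = min(k, len(embeddings))

    selected = [0]  # arbitrary starting point: the first item
    # Distance from every item to its nearest already-selected item.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]

    while len(selected) < k:
        # The next pick is the item farthest from everything selected so far.
        next_idx = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[next_idx] == 0:
            break  # only exact duplicates of selected items remain
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))

    return selected


# Toy example: two near-duplicate pairs and one outlier.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = farthest_point_selection(toy_embeddings, 3)
print(picked)
```

On the toy data, the selection takes one item from each of the three distinct regions and skips the near-duplicates. In practice a team would likely swap in a proper clustering or nearest-neighbor library; the point is only that selection can be driven by the data's structure rather than by manual review.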
The Future of Visual Data and Its Uses
The collection of visual data is ever-expanding, and what we demand from that data will continue to grow as AI weaves into more of our world, from security cameras to medical imaging to self-driving cars.
The potential uses of visual data to improve real-world AI applications are huge, but only if we can find the algorithmic means to assess, store, curate, and select visual data.
For the true power of AI and visual data to be unleashed, we must invest in the tools and processes that save data scientists time and make visual data management a more streamlined, efficient undertaking.
We’re at a point where data quality matters more than further tweaks to the model. It’s time to shift from a model-centric mindset to a data-centric approach.