Organizations embarking on AI projects cannot wait for long-running data centralization efforts. They must be able to integrate high-quality data, wherever it resides, as quickly as possible so their AI tools can deliver contextually accurate outputs that draw on the most recent data.
The outdated, cumbersome practice of extracting and copying data from multiple sources into a central location for AI ingestion is unnecessary, and it consumes significant technical and financial resources.
The Rise of Data Repatriation: Cloud to On-Prem Shift
For many organizations, the movement of data from cloud-based storage back to on-prem systems, known as data repatriation, is in full swing. As the volume of data organizations create and transact grows, so do the costs of managing it. Even at a few cents per gigabyte, the charges add up quickly.
Bill Burnham, CTO for U.S. Public Sector at Hewlett Packard Enterprise, notes that costs can grow “astronomically” once organizations are processing petabytes of data. Bringing data back to on-prem storage makes economic sense, particularly for AI applications where new data is continually used to refine and update outputs.
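To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The $0.023-per-gigabyte-per-month rate is purely illustrative (real pricing varies by provider, tier, and region, and excludes egress and retrieval charges):

```python
# Back-of-the-envelope cloud storage cost at an assumed rate of $0.023 per GB-month.
RATE_PER_GB_MONTH = 0.023  # illustrative figure, not a quoted vendor price

def monthly_storage_cost(terabytes: float) -> float:
    """Estimated monthly storage cost in USD for a given data volume."""
    gigabytes = terabytes * 1_000
    return gigabytes * RATE_PER_GB_MONTH

for tb in (10, 100, 1_000, 10_000):  # 1,000 TB = 1 PB, 10,000 TB = 10 PB
    print(f"{tb:>6,} TB  ->  ${monthly_storage_cost(tb):>12,.0f} per month")
```

At this assumed rate, a single petabyte runs to roughly $23,000 per month in storage alone, before any data is moved or queried, which is where “astronomically” starts to ring true.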
From an operational perspective, it’s ideal to place data as close as possible to where it will be used. Cloud-based systems offer many benefits, but they are not the solution to every problem. When training AI models, it is critical that the models have access to the most recent, accurate data.
Safeguarding AI Data and Outcomes
Research from Gartner suggests that cloud service misconfigurations are a significant issue that can result in sensitive data being ingested by unauthorized AI models. Just as end users are cautioned that queries submitted to public generative AI services may be used to train those models, the same exposure applies to any corporate data left accessible by a misconfiguration.
On-prem systems are not immune to data breaches, but they make it easier to mitigate the risk of an unauthorized AI model accessing corporate data and leaking intellectual property.
Inaccurate results from AI models remain a significant issue. Recent examples, such as Google’s AI suggesting that cooks use glue to keep cheese on pizza or that people eat a rock each day for vitamins and minerals, show how important it is for LLMs to be grounded in appropriate, contextual data. By using your own data and making it available quickly and cost-effectively, you reduce the risk of misleading or erroneous results.
The Role of Contextual Data in AI Accuracy
The importance of contextual information cannot be overstated. The best data your organization can use for AI tools is data that pertains specifically to your own operations.
For apparel retailers, data about the demographics they serve is critical. A clothing outlet that serves 16-to-25-year-old women needs different inputs than a store that sells suits to men in the 35-to-50-year-old cohort.
AI models that ingest general data and do not understand the specific needs of the business can deliver outputs that lead to poor decisions. While examples such as gluing cheese to a pizza are humorous, a buyer for a retail chain ordering thousands of units of clothing that won’t sell is costly, or even catastrophic.
Bringing Your AI On-Prem
Putting data as close as possible to where it will be used for AI reduces complexity and costs. AI projects are highly dependent on the data used to train the model. Having high-quality, timely data is more valuable than hiring more data scientists. Organizations must prioritize the data they use for their models and make it accessible.
The typical approach to managing data for AI applications relies on copying data from the source so it can be used to train the models. But when the best data is distributed across multiple platforms, such as a cloud-based CRM, an on-prem finance platform, and online productivity tools, making it all accessible is challenging. Often, the result is that only the easiest data to centralize gets used, with the rest left for when budgets and time allow.
The question AI teams need to ask is how they can access all the data they need without waiting for costly, time-consuming data repatriation projects. They need an easy way to reach disparate data across multiple locations and to redirect queries as that data moves.
Data preparation tools can condition data for AI use while minimizing disruption during repatriation. With these approaches, AI projects can move ahead without waiting for data migration or significant re-engineering of systems, and training data can be delivered to AI models and LLMs in near real time as it is created.
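As a purely illustrative sketch of the redirection idea, the snippet below shows a minimal dataset registry in Python: pipeline code asks for a dataset by name, and only the registry entry changes when that dataset is repatriated. The names here (DataSource, DATA_SOURCES, resolve) are hypothetical, not the API of any specific product.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    location: str        # e.g. "cloud" or "on_prem"
    connection_uri: str  # where the data currently lives

# Central registry: when a dataset is repatriated, update its entry here
# and every consumer automatically picks up the new location.
DATA_SOURCES = {
    "crm_contacts":      DataSource("crm_contacts", "cloud", "https://crm.example.com/api"),
    "finance_ledger":    DataSource("finance_ledger", "on_prem", "postgresql://finance-db.internal/ledger"),
    "productivity_docs": DataSource("productivity_docs", "cloud", "https://docs.example.com/api"),
}

def resolve(dataset: str) -> DataSource:
    """Look up the current location of a dataset by name."""
    return DATA_SOURCES[dataset]

# Pipeline code refers to datasets by name, never by physical location,
# so moving "crm_contacts" on-prem becomes a one-line registry change.
source = resolve("crm_contacts")
print(f"Fetching training data from {source.location}: {source.connection_uri}")
```

In practice, commercial data integration and virtualization tools provide this kind of indirection along with security, caching, and change data capture, but the principle is the same: keep query logic decoupled from the data’s physical home.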
Accelerating AI Projects With On-Prem Data Hubs
Burgeoning costs, concerns about intellectual property leakage, and the need for greater agility in developing AI tools are driving the shift from cloud-based platforms to on-prem. AI models and LLMs demand access to high-quality, contextual, and timely data to deliver the best results for users.
An AI data hub, acting as the single, central workbench and governance zone for all AI and data integration projects, allows AI initiatives to accelerate in parallel with cloud repatriation so organizations can put their data to work quickly.
At the same time, they can continue to give business users better customer insights and advanced analytics to boost revenue and stay ahead in a fiercely competitive market.