Supporting AI involves many technical decisions, from which LLMs to use and where to deploy them to what infrastructure is required and how to train employees. We recently surveyed enterprise IT decision-makers in the U.S., and nearly half (44%) said that creating AI-ready data infrastructure is their top priority today. IT organizations are also custom-training existing models (37%), training employees (33%), using cloud services for AI (32%), building their own machine learning models (32%), and allowing employees to use commercially available AI models (29%).
In this article, I’ll review each of these areas and offer a few considerations for enterprises building AI-ready infrastructure. As always, budget weighs heavily in AI technology decisions, but so do security, governance and compliance, and the availability of IT staff with the right AI and ML skill sets.
Creating AI-Ready Data Infrastructure
Launching an AI initiative in your enterprise may require model development and training if you need to build your own generative AI model. This typically begins with acquiring adequate high-performance computational resources: the pricey CPUs, GPUs, and TPUs required to train and host machine learning models and process data at warp speed. Pre-built infrastructure, public models, and cloud services offer cost and ease-of-use benefits, but IT organizations must weigh those benefits against keeping AI in-house for tighter control, or against a hybrid model that provides the right levels of data governance, transparency, and security.
The average cost of an AI server is $32,000, and Gartner distinguished analyst John-David Lovelock points out that a rack of AI servers will cost over $1 million. Flash-based storage designed for AI adds further cost. Then there’s the support and maintenance of all this gear, which requires full-time IT staff and a state-of-the-art data center.
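To see how these line items add up over several years, here is a rough total-cost-of-ownership sketch. Only the $32,000 average server price comes from the figures above; the class name, the 15% maintenance rate, and the staff, power, and storage inputs are hypothetical placeholders you would replace with your own quotes.

```python
from dataclasses import dataclass


@dataclass
class OnPremAIEstimate:
    """Rough multi-year cost model for on-premises AI servers.

    Every default except server_price is a hypothetical placeholder.
    """
    servers: int
    server_price: float = 32_000.0       # average AI server price cited in the text
    storage_capex: float = 0.0           # e.g. flash storage bought for AI workloads
    maintenance_rate: float = 0.15       # hypothetical: yearly support as a share of hardware cost
    annual_staff_cost: float = 0.0       # hypothetical: fully loaded cost of dedicated IT staff
    annual_power_and_space: float = 0.0  # hypothetical: energy and data center footprint

    def capex(self) -> float:
        """Up-front hardware spend: servers plus storage."""
        return self.servers * self.server_price + self.storage_capex

    def annual_opex(self) -> float:
        """Recurring yearly spend: support contracts, people, power, and space."""
        return (self.capex() * self.maintenance_rate
                + self.annual_staff_cost
                + self.annual_power_and_space)

    def total(self, years: int) -> float:
        """Undiscounted total cost of ownership over the planning horizon."""
        return self.capex() + years * self.annual_opex()
```

With made-up inputs, `OnPremAIEstimate(servers=10, storage_capex=100_000, annual_staff_cost=150_000, annual_power_and_space=40_000).total(3)` comes to 1,179,000. The model deliberately ignores discounting and hardware refresh cycles, so treat it as a first-order comparison point, not a budget.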
Using Corporate Data With AI
Regardless of whether you are building your own model from scratch or, more likely, fine-tuning and using pre-built models, you need data management to bring the right unstructured data to AI. Unstructured data management automates AI data workflows and enforces corporate data governance policies, especially for sensitive data.
Unstructured data, which according to IDC accounts for 90% of all data, is typically scattered across many silos. Part of the role of data management is to make that data searchable, tag it, and rapidly feed the right subset to AI models.
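As an illustration of the search-and-tag step, the sketch below walks a file share, records basic metadata for each file, and attaches simple tags based on file type and age. The function names, the extension-to-tag mapping, and the three-year "cold" threshold are hypothetical choices, not the behavior of any particular data management product.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical mapping from file extension to a content-type tag.
TYPE_TAGS = {
    ".pdf": "document", ".docx": "document", ".txt": "document",
    ".csv": "tabular", ".xlsx": "tabular",
    ".png": "image", ".jpg": "image",
}
COLD_AFTER_SECONDS = 3 * 365 * 24 * 3600  # hypothetical: not modified in about 3 years


@dataclass
class FileRecord:
    path: Path
    size_bytes: int
    modified: float  # POSIX timestamp of last modification
    tags: set[str] = field(default_factory=set)


def index_tree(root: str | Path, now: float | None = None) -> list[FileRecord]:
    """Walk a directory tree and return one tagged record per regular file."""
    now = time.time() if now is None else now
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        rec = FileRecord(path, st.st_size, st.st_mtime)
        rec.tags.add(TYPE_TAGS.get(path.suffix.lower(), "other"))
        if now - st.st_mtime > COLD_AFTER_SECONDS:
            rec.tags.add("cold")
        records.append(rec)
    return records


def select_for_ai(records: list[FileRecord], wanted: set[str]) -> list[Path]:
    """Return paths of files carrying every requested tag, e.g. {"document"}."""
    return [r.path for r in records if wanted <= r.tags]
```

In practice, tags would more likely come from content classification and business metadata than from file extensions alone, and the index would live in a database rather than in memory; the point is that selection for AI runs against the index, not against raw storage.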
Cloud Services for AI
The major cloud providers have built soup-to-nuts AI services for organizations that can’t or don’t want to manage the technology in-house. The components range from fast storage and compute resources to machine learning, GenAI, and development tools. Cloud-based AI has distinct cost advantages: you don’t need to buy servers or storage or pay for the extra energy and data center footprint they would add. But it is easy to overprovision and overspend in the cloud, and many organizations also face cloud skills gaps.
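One way to catch overprovisioning is to compare what you pay for each instance with how much of it you actually use. The sketch below is a minimal rightsizing check, assuming you can export hourly cost and average utilization per instance from your billing and monitoring data; the 30% threshold, the instance names, and the idle-spend heuristic (cost times hours times idle fraction) are hypothetical simplifications, and the major cloud providers also offer their own rightsizing tools.

```python
from __future__ import annotations

from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


@dataclass(frozen=True)
class Instance:
    instance_id: str
    hourly_cost: float      # price per hour, taken from your billing data
    avg_utilization: float  # 0.0 to 1.0, e.g. mean GPU utilization over the review period


def underused(instances: list[Instance], threshold: float = 0.30) -> list[tuple[str, float]]:
    """Return (instance_id, estimated idle spend per month), largest first.

    threshold (default 0.30) is a hypothetical cutoff; idle spend is
    approximated as hourly_cost * HOURS_PER_MONTH * (1 - utilization).
    """
    report = []
    for inst in instances:
        if inst.avg_utilization < threshold:
            idle = inst.hourly_cost * HOURS_PER_MONTH * (1 - inst.avg_utilization)
            report.append((inst.instance_id, round(idle, 2)))
    return sorted(report, key=lambda item: item[1], reverse=True)
```

Sorting by estimated idle spend puts the most expensive underused resources at the top of the review list, which is usually where rightsizing pays off first.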
A cloud AI strategy can be both successful and cost-efficient if you manage data appropriately. For example, copying petabytes of unstructured data into the cloud and then trying to figure out which of it is useful for AI would quickly run up a huge bill. You’d also want to avoid feeding an AI application before cleaning up the data mess: most organizations have large quantities of duplicate, obsolete, or zombie data that should be purged. Make sure your data is in good shape (classified and organized) before moving it, and move only the data you know fits the scope of your project.
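A simple way to find duplicate and obsolete files before a migration is to flag anything not modified within a cutoff and group the remaining files by a content hash. The sketch below uses SHA-256 from Python’s standard library; the function names, the one-year cutoff, and the "move/duplicate/obsolete" buckets are hypothetical and would come from your own retention policy.

```python
from __future__ import annotations

import hashlib
import time
from collections import defaultdict
from pathlib import Path

OBSOLETE_AFTER_SECONDS = 365 * 24 * 3600  # hypothetical retention cutoff: one year


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file contents in chunks so large files are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_migration(root: str | Path, now: float | None = None) -> dict[str, list[Path]]:
    """Split files under root into 'move', 'duplicate', and 'obsolete' buckets."""
    now = time.time() if now is None else now
    by_hash: dict[str, list[Path]] = defaultdict(list)
    plan: dict[str, list[Path]] = {"move": [], "duplicate": [], "obsolete": []}
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        if now - path.stat().st_mtime > OBSOLETE_AFTER_SECONDS:
            plan["obsolete"].append(path)  # stale: review or purge, do not migrate
            continue
        by_hash[sha256_of(path)].append(path)
    for paths in by_hash.values():
        plan["move"].append(paths[0])        # keep one copy of each unique file
        plan["duplicate"].extend(paths[1:])  # identical content: candidates to purge
    return plan
```

Only the "move" bucket would be copied to the cloud; the other two become a cleanup worklist for data owners to confirm before anything is deleted.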
Pick use cases with a predictable ROI, and make sure you can measure the results later. Security and compliance requirements may rule out hosting AI in the cloud altogether. At a minimum, understanding the risks to your data in any AI service and knowing how to audit projects for data risk are critical steps before beginning any project.
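Measuring results later is easier if you fix the formula and the baseline before the project starts. The sketch below uses the conventional ROI definition (net benefit divided by cost) plus a simple payback period; the function names are illustrative, and what counts as "benefit" (hours saved, tickets deflected, revenue influenced) is yours to define.

```python
def roi(benefit: float, cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost. 0.25 means a 25% return."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def payback_months(upfront_cost: float, monthly_benefit: float, monthly_cost: float = 0.0) -> float:
    """Months until cumulative net monthly benefit covers the up-front cost."""
    net = monthly_benefit - monthly_cost
    if net <= 0:
        return float("inf")  # the project never pays back at these rates
    return upfront_cost / net
```

For instance, a hypothetical pilot that costs $200,000 and delivers $260,000 in measured benefit has an ROI of 0.30, or 30%.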
Machine Learning Model Decisions
Popular models such as GPT, Claude, and Gemini rely on massive public data sets for training, and frameworks such as TensorFlow and PyTorch are widely used to build and train models like these. Yet to make AI useful and credible for enterprise projects aimed at improving operations, R&D, or customer relationships, you’ll want to train or fine-tune a model with your own proprietary data and keep it private.
Training or developing a model requires specialized data scientists with skills in programming languages such as Python and R, big data modeling and analysis, and machine learning, as well as security and cloud computing.
An ambitious, well-funded analytics and data science team may even choose to develop a model from scratch, whether to gain full control over architecture and security or to support a highly sensitive, competitive project. Communities such as Hugging Face, along with vendors such as OpenAI, provide models, tools, and documentation that help teams choose components and collaborate with others, but this is still a tremendous lift. It entails cleaning and preparing data, selecting and training algorithms, and fine-tuning the model for accuracy and reliability. You’ll need to procure not only the infrastructure but also a team of engineers to do the work.
Given most organizations’ resource constraints, using pre-trained proprietary or open-source ML models with corporate data is likely the most common pathway to AI, and AI inferencing is a much larger, broader market than AI training. As a result, IT organizations are increasingly investing in the data infrastructure needed to find, curate, audit, and feed corporate data to AI while maintaining data governance.
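One way to keep governance in the loop when feeding corporate data to a pre-trained model is to gate every document on its classification label before it reaches the model. In this sketch, the four sensitivity levels, the destination names, the policy table, and the `curate_for` function are all hypothetical; note that an unknown destination is rejected rather than allowed by default.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3  # e.g. PII or trade secrets


# Hypothetical policy: the most sensitive class each model deployment may receive.
MAX_ALLOWED = {
    "external_saas_model": Sensitivity.INTERNAL,
    "private_hosted_model": Sensitivity.CONFIDENTIAL,
}


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    sensitivity: Sensitivity


def curate_for(destination: str, docs: list[Document]) -> tuple[list[Document], list[str]]:
    """Split docs into those allowed for the destination and the IDs that were blocked."""
    limit = MAX_ALLOWED.get(destination)
    if limit is None:
        raise KeyError(f"no policy defined for {destination!r}")  # default deny
    allowed = [d for d in docs if d.sensitivity <= limit]
    blocked = [d.doc_id for d in docs if d.sensitivity > limit]
    return allowed, blocked
```

Returning the blocked IDs, rather than silently dropping them, gives auditors a record of what the policy kept away from each model.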
The Rise of Off-the-Shelf AI Tools
The Komprise survey found that only 30% of organizations have designated a budget for AI, suggesting that many of the other 70% are still experimenting with and researching the technology. Today, that probably means using low-cost applications such as OpenAI ChatGPT, Anthropic Claude, Microsoft Copilot, or Google Gemini. Employees across departments use these tools to answer questions, write text, create graphics and images, or write software code, at laser speed and with good-enough results.
What’s missing are standards and mainstream best practices. What projects are safe and appropriate for GenAI? What data should be used, and which should be protected from ingestion? How should GenAI-derived works be evaluated for accuracy and legitimacy? What happens if IP or customer data is leaked into a general-purpose LLM? How can a company protect itself from copyright or libel lawsuits based on GenAI-produced work?
Start by understanding your data estate: its characteristics and how much sensitive data, such as personally identifiable information (PII) and IP, it contains. That analysis will guide the organization in developing GenAI policies that govern which data and use cases are allowed. You’ll also need a tool to monitor compliance and investigate any issues that arise from GenAI use.
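A first pass at sizing sensitive data can be as simple as pattern-matching text for common PII formats and counting hits per file. The two regular expressions below (email addresses and the U.S. Social Security number format) and the function names are deliberately simplistic, hypothetical examples; production classifiers add validation, context, and many more patterns, and detecting IP usually requires far more than regexes.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, simplified PII patterns; real scanners validate matches and use context.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> Counter:
    """Count matches of each PII pattern in a block of text."""
    counts: Counter = Counter()
    for label, pattern in PII_PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            counts[label] = n
    return counts


def summarize(files: dict[str, str]) -> dict[str, Counter]:
    """Map file name -> PII counts, keeping only files with at least one hit."""
    return {name: hits for name, text in files.items() if (hits := scan_text(text))}
```

Taking already-extracted text as input keeps the scanner independent of file formats; totals across the summary give a rough measure of how much sensitive data sits in each share.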
Can you track which data has been sent to AI tools, and by which users or departments? Can you find sensitive data and move it out of directories where it could be discovered and pulled into an AI tool? Some unstructured data management solutions provide this functionality. AI data governance is a growing area of demand as organizations try to prevent AI blowback that can damage customer trust, loyalty, and marketplace credibility.
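Answering "who sent what, to which tool" requires recording each submission as it happens. The sketch below is a minimal in-memory audit log; the event schema, the class and tool names, and the idea that events arrive from a proxy or the tool’s own logs are hypothetical, and a real system would write to durable, tamper-evident storage.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIUploadEvent:
    """One record of data submitted to an AI tool (hypothetical schema)."""
    user: str
    department: str
    tool: str
    data_path: str
    sensitive: bool
    at: datetime


class AIAuditLog:
    def __init__(self) -> None:
        self._events: list[AIUploadEvent] = []

    def record(self, user: str, department: str, tool: str, data_path: str, sensitive: bool) -> None:
        # Timestamp in UTC so events from different regions sort consistently.
        now = datetime.now(timezone.utc)
        self._events.append(AIUploadEvent(user, department, tool, data_path, sensitive, now))

    def by_department(self) -> dict[str, list[str]]:
        """Department -> data paths it has sent to any AI tool, in submission order."""
        out: dict[str, list[str]] = defaultdict(list)
        for e in self._events:
            out[e.department].append(e.data_path)
        return dict(out)

    def sensitive_events(self) -> list[AIUploadEvent]:
        """Events involving data flagged as sensitive, as a starting point for investigation."""
        return [e for e in self._events if e.sensitive]
```

The sensitive flag would typically come from a classification step like a PII scan, which is what lets the log answer not just what was sent but whether it should have been.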
The Need for GenAI Governance
Given general marketplace concerns about AI, its known tendency to produce false outputs and damaging hallucinations, the risk of corporate data leaking into general-purpose LLMs, and the expense of developing and implementing AI technologies, IT leaders will want a watertight plan and process for evaluating and deploying the AI stack.
Want more AI insights? Subscribe to The CTO Club's newsletter for tips and tools in your inbox!