
The integration of vector database capabilities into widely used open source databases like PostgreSQL and Apache Cassandra represents a significant leap forward for AI adoption in enterprise environments. 

Pete Lilley, Vice President and General Manager at NetApp Instaclustr, brings over 25 years of experience in IT services and solution implementation to the discussion. With his deep expertise in scalable data infrastructure, Pete shares insights on how these open source advancements make vector search and Retrieval Augmented Generation (RAG) a practical and powerful reality for AI-driven enterprises.

These technologies empower CTOs to accelerate AI initiatives, support enterprise-grade performance, and tackle the opportunities and challenges of embedding vector capabilities into existing data infrastructures.

  1. How do you see the integration of vector database capabilities into popular open source databases like PostgreSQL and Apache Cassandra impacting the adoption of AI technologies in enterprise environments?

The ability to harness vector search by tapping into familiar open source databases like PostgreSQL (with the pgvector extension), the new Apache Cassandra 5.0, and OpenSearch means an easier path to standing up and scaling enterprise AI initiatives. Each of these fully open source technologies, which most enterprises already have as part of their stack, has evolved to provide not just the enterprise-grade vector search capabilities critical to AI accuracy but also the underlying data infrastructure to ensure AI projects can thrive for the long run.
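To make this concrete, here is a minimal sketch of what vector search looks like in PostgreSQL with the pgvector extension. The table name, its contents, and the 3-dimensional embeddings are hypothetical placeholders (production embeddings typically have hundreds or thousands of dimensions):

```sql
-- Assumes a PostgreSQL instance with the pgvector extension available.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table of document chunks with toy 3-dimensional embeddings.
CREATE TABLE docs (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);

INSERT INTO docs (content, embedding) VALUES
    ('billing policy', '[0.9, 0.1, 0.2]'),
    ('refund rules',   '[0.8, 0.2, 0.1]');

-- Nearest-neighbor search: <-> is pgvector's Euclidean-distance operator.
SELECT content
FROM docs
ORDER BY embedding <-> '[0.85, 0.15, 0.15]'
LIMIT 1;
```

In practice, the query vector would come from the same embedding model used to embed the stored documents, and an index (such as pgvector's HNSW or IVFFlat options) would keep searches fast at scale.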

Technology leaders understand they need vector databases, but many are wary of adopting and building up talent around proprietary vector databases that are expensive and promote lock-in. Fully open source alternatives can be much more inviting, given the breadth of experts and managed services available, as well as the supportive open source communities that surround each of those aforementioned projects. Whereas proprietary vector databases mean upfront costs and the loss of flexibility, open source vector databases allow enterprises to hit the ground running and pursue AI projects with more confidence.

  2. What are some specific advantages of implementing Retrieval Augmented Generation (RAG) with open source vector databases for enterprise-specific AI use cases?

In the absence of a RAG architecture and vector search, enterprise LLMs have to rely on traditional search engine technology to understand the relationships between keywords as they interpret queries. The result is often inefficiency and weak contextual understanding, or even a complete misreading of the query's context, which can lead to AI hallucination. Without a strong way to understand the contextual intent of a user's query, enterprise AI projects are prone to poor LLM performance and low-quality results, if not disastrously misguided ones.

Vector search offers a better path to contextual understanding, one that is especially effective when backed by a RAG architecture built on vector datastores. Vector databases store embedding vectors, which represent keywords and phrases as sets of numerical coordinates in a high-dimensional space; the closer two vectors sit, the more similar the terms they represent. Vector search uses these embeddings to concentrate queries on the limited sets of data most relevant to the query's context. That narrower scope means harnessing vast datasets more efficiently, which reduces hallucination risk while improving performance.
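The "closer the vectors, the more similar the terms" idea can be sketched in a few lines. The terms and the tiny 3-dimensional embeddings below are made-up placeholders standing in for the output of a real embedding model:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; real models emit hundreds of dimensions.
embeddings = {
    "invoice":  [0.9, 0.1, 0.0],
    "billing":  [0.8, 0.2, 0.1],
    "vacation": [0.0, 0.1, 0.9],
}

def top_k(query_vec, store, k=2):
    """Return the k stored terms whose embeddings lie nearest the query vector."""
    ranked = sorted(store, key=lambda t: cosine_similarity(query_vec, store[t]),
                    reverse=True)
    return ranked[:k]

query = [0.85, 0.15, 0.05]        # stand-in embedding of a billing-related query
print(top_k(query, embeddings))   # billing-related terms rank ahead of "vacation"
```

A RAG pipeline does exactly this retrieval step at scale, then feeds the top-ranked documents to the LLM as grounding context for its answer.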

  3. What are some of the key challenges CTOs might face when introducing vector database capabilities to their existing data infrastructure, and how can they best prepare their teams for this transition?

CTOs should anticipate a learning curve that their teams will need to overcome before a vector database delivers the cost-efficient operations and performance they’d like to see. Long-view planning is essential to ensure teams receive the resources and time required to correctly implement and continuously optimize the database.

Following specific data best practices will also heavily influence AI project outcomes. These include using high-quality data, correctly chunking and embedding that data, and applying metadata and hybrid search (combining traditional keyword and vector search methods). Taking AI projects powered by LLMs and vector search from the demo stage into enterprise-grade production takes dedication and sustained effort. Making sure experienced talent is in place in vector database operations and data science roles, or that external managed-services expertise is on hand for support, will flatten the learning curve and accelerate projects toward delivering meaningful results.
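As one illustration of the chunking practice mentioned above, here is a minimal sketch of overlap-based text chunking; the chunk size and overlap values are illustrative defaults, not recommendations:

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping character chunks.

    The overlap keeps context that would otherwise be severed at a chunk
    boundary, which helps each chunk embed with enough surrounding meaning.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

document = "a" * 100                       # placeholder for a real document
pieces = chunk_text(document)              # 40-char chunks, 10-char overlap
print(len(pieces))                         # → 3
```

Production pipelines typically chunk on sentence or token boundaries rather than raw characters, and attach metadata (source, section, date) to each chunk to enable the metadata filtering and hybrid search mentioned above.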

  4. How do you envision the role of managed services in helping IT leadership implement and optimize vector database capabilities, particularly for those with limited in-house expertise?

Managed services can provide enterprises with a fast pass to getting their intelligent data infrastructure up and running and doing everything right the first time, even without in-house experts on staff. Enterprises utilizing popular open source technologies like PostgreSQL, Cassandra 5.0, or OpenSearch will have no trouble finding managed services ready to help implement and optimize their AI projects while reducing some of the hiccups teams inevitably experience when finding their bearings.

  5. Looking ahead, how do you think the landscape of AI-driven data technologies will evolve, and what steps should CTOs take now to ensure their organizations are well-positioned for future developments?

Increased demand for more performant, more flexible, and more capable AI data technologies is pretty much a given going forward. CTOs should certainly look at open source software that has already demonstrated its enterprise-grade reliability, scalability, security, efficiency, and staying power in the industry and consider how those options can go to work within the intelligent data infrastructure supporting their AI projects. At the end of the day, choosing the right data-layer tooling can make all the difference when it comes to matching enterprise AI vision to enterprise AI reality.

What’s Next

As AI-driven technologies reshape data management and analysis, the expansion of vector search capabilities in open source databases like PostgreSQL and Apache Cassandra provides enterprises with an accessible, powerful foundation for AI innovation. 

Leveraging these technologies with strategic foresight can enable organizations to scale AI effectively, mitigate common implementation challenges, and ensure alignment with long-term AI goals. 

By prioritizing open source options and managed services that support vector database operations, CTOs can future-proof their infrastructure and position their organizations to capitalize on the next wave of AI advancements in data technology.


Katie Sanders

As a data-driven content strategist, editor, writer, and community steward, Katie helps technical leaders win at work. Her 14 years of experience in the tech space equip her to provide technical audiences with expert insights and practical advice through Q&As, thought leadership pieces, ebooks, and more.