In response to increasing demand for streamlined AI solutions that prioritize cost savings, Cloudflare has introduced a new collection of products and applications designed to facilitate the development, deployment, and execution of AI models at the network edge.
One of the flagship offerings, Workers AI, lets customers run AI models on nearby GPUs hosted by Cloudflare’s partners on a pay-as-you-go basis. Another key addition, Vectorize, is a vector database for storing vector embeddings, the mathematical representations of data generated by models from Workers AI. Completing the trio, AI Gateway provides metrics to help customers manage the costs of their AI applications.
Cloudflare’s CEO, Matthew Prince, explained that the motivation behind these AI-focused products stemmed from the company’s commitment to simplifying AI management while putting a strong emphasis on cost efficiency. He said, “The offerings already on the market are still very complicated — they require stitching together lots of new vendors, and it gets expensive fast. There’s also very little insight currently available on how you’re spending money on AI; observability is a big challenge as AI spend skyrockets. We can help simplify all of these aspects for developers.”
Workers AI aims to run AI inference on GPUs that are physically close to users, reducing latency and improving the end-user experience. It relies on ONNX, a Microsoft-backed intermediary machine learning toolkit used to convert models between AI frameworks, which allows models to run wherever makes the most sense given bandwidth, latency, connectivity, processing, and localization constraints.
Users of Workers AI can select models from a catalog that covers a wide range of applications, including large language models (LLMs), automatic speech recognition models, image classifiers, and sentiment analysis models. Moreover, data used for inference remains within the server region, which helps address data privacy and residency concerns.
Prince also noted the challenges associated with executing large AI models like LLMs on user devices due to computational and battery limitations. He pointed out, “Meanwhile, traditional centralized clouds are often geographically too far from the end user. These centralized clouds are also mostly based in the U.S., making it complicated for businesses around the world that prefer not to (or legally cannot) send data out of its home country. Cloudflare provides the best place to solve both these problems.”
Cloudflare has already forged a significant partnership with AI startup Hugging Face, under which Hugging Face will optimize generative AI models to run on Workers AI. In return, Cloudflare will become the first serverless GPU partner for deploying Hugging Face models.
Databricks is another key partner that plans to bring AI inference to Workers AI through MLflow, an open-source platform for managing machine learning workflows. Cloudflare is set to join the MLflow project as an active contributor, further solidifying the collaboration between the two companies.
Vectorize, another component of Cloudflare’s AI suite, targets customers who need to store vector embeddings used by AI models. Vector embeddings, fundamental to the machine learning algorithms powering applications like search engines and AI assistants, are compact representations of training data that preserve what is meaningful about it. While vector databases are not new, Cloudflare’s global network enables database queries to occur closer to users, reducing latency and inference time.
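To make the idea of querying embeddings concrete, here is a minimal sketch of the kind of nearest-neighbor lookup a vector database performs. It is written in plain Python and does not use the Vectorize API. The three-dimensional vectors, the document keys, and the function names are all invented for illustration, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(index, query, top_k=2):
    """Return the keys of the top_k stored vectors most similar to query."""
    scored = [(cosine_similarity(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:top_k]]


# Hypothetical toy index: made-up embeddings for three documents.
TOY_INDEX = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.8, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    print(nearest(TOY_INDEX, [1.0, 0.0, 0.0]))
```

A production vector database would replace the linear scan with an approximate index, but the ranking principle is the same: items whose embeddings point in a similar direction to the query are returned first.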
Lastly, AI Gateway offers observability features to assist in tracking AI traffic, including monitoring the number of model inference requests, their duration, user engagement, and overall AI app costs. The tool also provides capabilities such as caching and rate limiting to optimize costs and manage traffic effectively.
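The combination of caching, rate limiting, and request counting can be illustrated with a small sketch. This is not AI Gateway's implementation or API. The class name, its parameters, and the fixed-window limiting policy are assumptions chosen to keep the example short, and `model_fn` stands in for any call to a paid model endpoint.

```python
import time


class GatewaySketch:
    """Hypothetical in-process stand-in for a caching, rate-limiting gateway."""

    def __init__(self, model_fn, max_requests, window_seconds, clock=time.monotonic):
        self.model_fn = model_fn            # the expensive model call being fronted
        self.max_requests = max_requests    # requests allowed per window
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so tests can control time
        self.cache = {}
        self.window_start = clock()
        self.requests_in_window = 0
        self.stats = {"requests": 0, "cache_hits": 0, "model_calls": 0, "rate_limited": 0}

    def _allow(self):
        """Fixed-window rate limit: reset the counter once the window has elapsed."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.requests_in_window = 0
        if self.requests_in_window >= self.max_requests:
            return False
        self.requests_in_window += 1
        return True

    def handle(self, prompt):
        """Return the model output for prompt, or None if the request is rate limited."""
        self.stats["requests"] += 1
        if not self._allow():  # cache hits also count against the limit in this sketch
            self.stats["rate_limited"] += 1
            return None
        if prompt in self.cache:  # identical prompt seen before: skip the paid call
            self.stats["cache_hits"] += 1
            return self.cache[prompt]
        self.stats["model_calls"] += 1
        result = self.model_fn(prompt)
        self.cache[prompt] = result
        return result
```

In this sketch, only entries counted under `model_calls` would incur model cost, which is why a high cache hit rate lowers spend, while the `stats` dictionary plays the role of the traffic metrics described above.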
Prince claimed that, with AI Gateway, Cloudflare stands out by letting developers and companies pay only for the compute they use. Similar functionality is available from third-party tools like GPTCache and from other providers such as Vercel, but Prince argued that Cloudflare’s approach is more streamlined and more cost-effective for customers.
With the launch of these AI-centric products and applications, Cloudflare aims to give developers and organizations the tools they need to use AI effectively while keeping complexity and costs down.