In June 2022, Vipul Ved Prakash, Ce Zhang, Percy Liang, and Chris Ré founded Together AI to address a significant gap in the AI landscape: making generative AI models more accessible and customizable for enterprises, with a focus on open-source solutions that foster innovation while reducing dependency on closed systems. Prakash frames the mission around openness: “We believe that generative models are a consequential technology for society, and open and decentralized alternatives to closed systems are crucial for business and society.”
Together AI’s core offering is its AI Cloud platform, described by Prakash as “a full stack platform. We build everything from the data centers to the developer APIs, and it’s really focused on optimizing open-source custom AI models and applications.” This comprehensive approach simplifies AI adoption by handling everything from infrastructure to model deployment, giving enterprises an easy-to-use and scalable solution.
Addressing Key AI Challenges
One of the main obstacles in adopting generative AI today is the expertise required to train and fine-tune models, as well as the complexity of managing large-scale infrastructure. Together AI tackles these challenges head-on. As Prakash notes, “Training, fine-tuning, or productizing open-source generative models is extremely challenging. Current solutions require that you have significant expertise in AI and are simultaneously able to manage the large-scale infrastructure needed. The Together platform takes care of both challenges out-of-the-box, with an easy-to-use and accessible solution.” By addressing both the technical and operational hurdles, Together AI enables businesses to deploy AI without needing a deep understanding of the underlying infrastructure.
Reducing Costs Through Optimization
Another critical advantage of Together AI is its cost-efficiency. Prakash outlines the company’s approach to cost reduction, stating, “We optimize down the stack, with thousands of GPUs located in multiple secure facilities, software for virtualization, scheduling, and model optimizations that significantly bring down operating costs.” This focus on optimization helps companies reduce their AI operating expenses while maintaining industry-leading performance and reliability.
The Value Proposition for Enterprises
Together AI’s commitment to open-source, transparent, and decentralized AI models offers enterprises the flexibility to control and customize their AI investments. Prakash highlights the platform’s value proposition: “Our customers choose to bring their generative AI workloads to Together owing to our industry-leading performance and reliability, while still having comfort that they own the result of their investment in AI and are always free to run their model on any platform.” This flexibility and focus on ownership ensure that businesses can scale their AI efforts without being locked into proprietary systems.
Core Offerings
Together AI’s platform provides a comprehensive suite of tools and services for AI development:
- Together Inference: Facilitates running over 100 open-source models on serverless or dedicated instances.
- Together Fine-Tuning: Enables users to fine-tune generative AI models using proprietary data.
- Together GPU Clusters: Offers frontier clusters with configurations ranging from 16 to over 1000 GPUs.
- Together Custom Models: Supports training of frontier models from scratch with various architectures.
- Together Enterprise Platform: Manages the entire Generative AI lifecycle, boasting 2-3x faster inference and up to 50% lower operational costs.
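Together Inference exposes an OpenAI-compatible chat completions API for its serverless models. The sketch below shows how a request payload might be assembled; the endpoint URL and model name are illustrative assumptions, and actually sending the request requires a Together API key:

```python
import json

# Together Inference follows the OpenAI-compatible chat completions format.
# The endpoint and model name below are illustrative assumptions.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Assemble the JSON payload for a serverless chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "meta-llama/Llama-3-8b-chat-hf",
    "Summarize open-source AI in one line.",
)
# Send `body` to API_URL with any HTTP client, adding the header
# Authorization: Bearer <your API key>.
body = json.dumps(payload)
```

Because the API is OpenAI-compatible, existing client libraries that speak that format can typically be pointed at the Together endpoint with only a base-URL change.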
Key Projects and Collaborations
Together AI is involved in significant AI projects, including:
- RedPajama: An open-source effort to reproduce the LLaMA training dataset and build fully open models on top of it, positioned as an open alternative to closed systems like OpenAI’s ChatGPT.
- OpenChatKit: An open-source toolkit for building specialized chatbots, released with instruction-tuned base models as an open alternative to ChatGPT.
- GPT-JT: A fine-tuned variant of the open-source GPT-J-6B model, specialized for text classification and analysis tasks.
- FlashAttention-3: An optimization achieving up to 75% GPU utilization on H100s, doubling AI model performance.
- Cocktail SGD: An optimization reducing network overhead in distributed AI training by up to 117x.
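Cocktail SGD reportedly mixes several compression techniques (such as sparsification, quantization, and delayed updates) to shrink the gradient traffic exchanged between machines during distributed training. As a rough illustration of how such compression cuts communication volume, here is a toy top-k-plus-quantization sketch; it is not the actual Cocktail SGD algorithm:

```python
import numpy as np

def compress(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude entries, then quantize them to 8 bits.
    (Illustrative only; Cocktail SGD combines several techniques.)"""
    idx = np.argsort(np.abs(grad))[-k:]              # indices of top-k entries
    vals = grad[idx]
    scale = max(float(np.abs(vals).max()) / 127.0, 1e-12)
    q = np.round(vals / scale).astype(np.int8)       # 8-bit quantized values
    return idx.astype(np.int32), q, scale

def decompress(idx, q, scale, size):
    """Rebuild a dense gradient from the sparse, quantized message."""
    out = np.zeros(size, dtype=np.float32)
    out[idx] = q.astype(np.float32) * scale
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000).astype(np.float32)
idx, q, scale = compress(g, k=100)

# Bytes on the wire: 100 int32 indices + 100 int8 values + one float scale,
# versus 10,000 float32 entries uncompressed.
sent = idx.nbytes + q.nbytes + 4
ratio = g.nbytes / sent
```

In practice the dropped coordinates are typically carried forward in an error-feedback buffer so the compression stays unbiased over time; that bookkeeping is omitted here for brevity.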
Together AI also supports numerous companies in their AI initiatives:
- Pika Labs: Built its text-to-video model on Together GPU Clusters, scaling to generate millions of videos monthly.
- Cartesia: Achieved industry-leading text-to-voice performance with less than 200 ms latency using Together AI’s services.
- Vals.ai: Efficiently conducts evaluations for new models, enhancing their leaderboard.
- Upstage: Deployed their Solar model on Together Inference, processing over 2.8 million peak tokens per hour.
- Wordware: Leveraged Together AI’s API to reduce operational costs for AI-powered NPC interactions.
Use Cases
Together AI’s platform enables a wide array of use cases across industries:
- Text-to-Video Generation: Assisting companies like Pika Labs to create millions of AI-generated videos.
- Text-to-Voice Synthesis: Enabling real-time voice generation for companies such as Cartesia.
- AI Model Evaluation: Facilitating rapid testing and evaluation for companies like Vals.ai.
- NPC Interactions in Gaming: Providing cost-effective AI-driven character interactions.
- Custom AI Model Development: Supporting businesses in creating specialized models for unique needs.
- Large-Scale AI Deployment: Helping organizations like Zoom and The Washington Post implement GenAI applications in production.
Technological Innovations: FlashAttention-3
Together AI has introduced FlashAttention-3, a major advancement in optimizing attention mechanisms for AI applications. Key features include:
- GPU Utilization: Achieves up to 75% utilization of the NVIDIA H100 GPU’s theoretical capabilities.
- Performance: Operates at nearly 1.2 PFLOPS in FP8 precision while maintaining competitive accuracy, offering up to 2x faster processing than its predecessor, FlashAttention-2.
- Memory Optimization: Reduces memory footprint, aiding in cost reductions for large-scale deployments.
- Enhanced Contextual Processing: Capable of handling longer contextual inputs for extensive text analysis.
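The features above rest on FlashAttention’s central idea: computing attention over tiles of keys and values with a running (“online”) softmax, so the full N×N score matrix is never materialized in slow memory. A minimal NumPy sketch of that math follows; it illustrates the algorithm only, not the fused GPU kernels or Together AI’s actual implementation:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Reference implementation: materializes the full N x N score matrix."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=16):
    """FlashAttention-style pass: visit K/V one tile at a time, keeping a
    running row maximum and softmax denominator so only one block of
    scores exists at any moment."""
    N, d = Q.shape
    O = np.zeros((N, V.shape[-1]))
    m = np.full(N, -np.inf)          # running row max
    l = np.zeros(N)                  # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)                  # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)                  # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((64, 32)) for _ in range(3))
```

The tiled pass produces the same output as the naive version while keeping memory proportional to one tile rather than the full sequence length, which is what enables the longer-context processing noted above.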
FlashAttention-3 represents a collaborative effort between Together AI, Colfax Research, Meta, NVIDIA, and Princeton University.
Looking to the Future
Prakash is optimistic about the future of AI, forecasting significant growth in the coming decades. He believes we are only scratching the surface of what AI can achieve, stating, “If you look at the next 10 years or the next 20 years, we are doing maybe 0.1 percent of [the] AI that we’ll be doing 10 years from now.”
The launch of Together AI’s Together Kernel Collection (TKC) will further enhance the efficiency of common AI operations, promising significant improvements in speed and cost-effectiveness. The TKC offers:
- Training speedups of up to 24% for frequently used operators.
- Inference speedups of up to 75% for essential FP8 operations.
- Compatibility with PyTorch and rigorous testing for reliability.
Additionally, the upcoming availability of NVIDIA H200 Tensor Core GPUs within Together AI’s GPU Clusters promises enhanced performance metrics, including nearly double the memory capacity and significantly faster inference times compared to the H100.