We recently hosted a call with Karthik Bharathy, Director of AI/ML services at AWS, who spoke on recent strategic announcements and AWS capabilities – specifically on how AWS’ AI/ML offerings look, how you can leverage them, and how to best work with the AWS team as a startup.
- There are four key considerations for startups when spinning up ML capabilities:
- What is the easiest way for your company to build?
- How can you differentiate with your data?
- How can you get the highest performance infrastructure at the lowest cost?
- How can applications and services vastly improve your productivity?
- Amazon Bedrock is a fully managed service that offers high-performing foundation models from companies like AI21 Labs, Cohere, Anthropic, and Stability AI, as well as Amazon's own Titan models.
- Foundation models are just one piece of the puzzle; from a process perspective, there's a lot more to orchestration. Users typically want a task accomplished (vs. just interacting with data), and completing a task involves more than calling the model: you must also interact with your data, a set of back-end APIs, and so on.
- Differentiating with your data is key – while foundation models can do a lot out of the box, their impact is vastly amplified once they are fine-tuned on your data sources.
- Getting Educated – Amazon has heavily leveraged Coursera to provide a comprehensive suite of offerings on the topic. You can choose the service you want to learn about + your role in your organization, and it provides several resources to learn more. And should you have specific asks on a Bedrock workshop, or you want to learn about SageMaker JumpStart, AWS has hands-on workshops where their specialists will engage with you and help set things up.
The Tipping Point for Generative AI
At AWS, the primary goal today is figuring out how businesses can take advantage of generative AI beyond simple text and chat use cases. What led to this moment was the massive proliferation of data and compute, which became available at low cost and very large scale. As a result, machine learning has seen a ton of innovation over the past 2-3 years, which has accelerated prior efforts tremendously.
Generative AI is a fundamental paradigm shift from the AI of the past – you're trying to generate new content, powered by foundation models, which are pre-trained ML models leveraging vast amounts of data. What's important is that these foundation models can be customized for specific use cases, giving them more power and relevance.
From an AWS standpoint, ML innovation is in their DNA – going all the way back to e-commerce recommendations, routing and storing packages, Alexa, Prime Air, or even Amazon Go, where you have a physical retail experience. These products already incorporate machine learning and are backed by foundation models.
Today, over 100K customers across different geographies and verticals use AWS for machine learning, and all of these customers are already in production.
There are four key considerations for startups when spinning up ML capabilities:
- How should you be building? What is the easiest way for your company to build? This likely comes down to where you are on your ML journey as a startup
- How can you differentiate with your data? What are the specific capabilities or functionalities you're looking at? You want to customize what's already out there so you get the best of what's already being provided
- How can you get the highest performance infrastructure at the lowest cost? This applies to startups building their own foundation models, as well as those looking to fine-tune existing models and leverage the infrastructure for their application(s)
- How can applications and services vastly improve your productivity?
Easiest Way to Build / Getting Started with Amazon Bedrock
Amazon Bedrock is a fully managed service that offers high-performing foundation models from companies like AI21 Labs, Cohere, Anthropic, and Stability AI, as well as Amazon's own Titan models.
This is a serverless offering, meaning you don’t have to manage any infrastructure when you access these models – all you have to do is use APIs to interact with them. You can also customize these models with your own data (e.g. fine-tuning or RAG)
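As a sketch of what that API interaction looks like, the snippet below builds a request body in the shape used by the Titan text models. The body format is model-specific and the model ID is illustrative; check the Bedrock documentation for the model you choose.

```python
import json

def build_titan_request(prompt: str, max_tokens: int = 256, temperature: float = 0.5) -> str:
    """Build a JSON request body in the shape used by Amazon Titan text models.

    Other providers (Anthropic, Cohere, etc.) expect different fields,
    so this shape is an assumption that only applies to Titan.
    """
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
        },
    })

# The actual call is a single API request -- no infrastructure to manage.
# (Sketch only: requires AWS credentials and Bedrock model access.)
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(
#     modelId="amazon.titan-text-express-v1",  # illustrative model ID
#     body=build_titan_request("Summarize our Q3 sales notes."),
# )
```

The serverless point is visible in the commented call: there is no instance or endpoint to provision first, just a request against a managed API.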
You can take advantage of an on-demand mode where you look at input and output tokens and index your pricing on them, so you can project costs based on your current application needs (vs. future application needs). Whether the hardware is coming from NVIDIA, AWS Inferentia, etc. is all under the hood – you're not exposed to instances. There are even capabilities like the instance recommender that can suggest the best instance for a given model; it really comes down to the use case.
There are many different models today, and this list will continue to expand. You can try them all in a sandbox environment via the AWS console.
The Bedrock service is HIPAA compliant, so you can use the models in GA along with Bedrock for your production use cases
Getting started with Bedrock:
- Head to the AWS console and choose which foundation model you want to start with (there are many predefined templates that let you get started on the set of prompts that you can issue)
- Once you narrow down your choice to a given foundation model from a provider, you can use prompt engineering to get the best possible output, or you can fine-tune the model and create something that's very specific to your use case
- Once you have the right response, you can have that model deployed in your environment, and there are options on how to deploy it
- Finally, you can integrate it with the rest of your gen AI application
In terms of pricing there are three different dimensions:
- On-Demand – Pay as you go, no usage commitments. You’re charged based on how many input tokens are processed and how many output tokens are generated. A token is a sequence of characters, and you pay based on the number of tokens that have been processed
- Provisioned Throughput – In some cases, referring back to an earlier question, you need a certain level of performance (consistent inference workloads). Here you get provisioned throughput guarantees on when a model is available for inference, and the pricing dimension is based on the number of model units
- Fine-Tuning – You are charged for fine-tuning the model based on training epochs (an epoch is one full pass over the fine-tuning dataset). There are also storage charges for the new model generated by fine-tuning. Additionally, running inference on fine-tuned models requires Provisioned Throughput, which is more of a dollars-per-hour cost
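To make the on-demand dimension concrete, here is a minimal cost projection. The per-1K-token prices below are purely hypothetical placeholders, not actual Bedrock rates, which vary by model and region.

```python
def estimate_on_demand_cost(input_tokens: int, output_tokens: int,
                            price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Pay-as-you-go: charged per input token processed and output token generated."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Hypothetical prices -- not actual Bedrock rates.
cost = estimate_on_demand_cost(10_000, 2_000,
                               price_in_per_1k=0.0008,
                               price_out_per_1k=0.0016)
```

Input and output are priced separately because generated tokens are typically more expensive than processed ones, which is why projecting costs requires estimating both sides of the exchange.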
If you're an ML practitioner who wants to try out an open-source model and has access to the instances, you can use SageMaker JumpStart. This is a model hub where you can access Hugging Face models (or other models) directly within SageMaker and deploy them to an inference endpoint. Fine-tuning here differs from the Bedrock experience: you work in a notebook where you make changes to the model in a more hands-on way. So if you're an ML practitioner who is very familiar with the different fine-tuning techniques, it gives you a lot of knobs and flexibility in how you build, train, and deploy models on SageMaker
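A sketch of that notebook workflow, assuming the SageMaker Python SDK's JumpStart interface: the model ID and instance type are illustrative choices, and the deploy call itself needs an AWS account, so it is shown commented out.

```python
def jumpstart_deploy_config(model_id: str, instance_type: str = "ml.g5.2xlarge") -> dict:
    """Capture the choices made in the notebook: which hub model, which instance."""
    return {"model_id": model_id, "instance_type": instance_type}

# Illustrative model ID from the JumpStart hub.
config = jumpstart_deploy_config("huggingface-llm-falcon-7b-bf16")

# The hands-on part runs against AWS (sketch only; requires credentials):
# from sagemaker.jumpstart.model import JumpStartModel
# model = JumpStartModel(model_id=config["model_id"])
# predictor = model.deploy(instance_type=config["instance_type"])
# print(predictor.predict({"inputs": "Hello"}))
```

Unlike Bedrock's serverless mode, here you pick the instance yourself, which is the "knobs and flexibility" trade-off the section describes.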
Foundation models, however, are just one piece of the puzzle; from a process perspective, there's a lot more to orchestration. Users typically want a task accomplished (vs. just interacting with data). If you have an application and you just want to book your vacation, that's going to involve a series of steps (e.g. understanding the different prices, selecting the different options, etc.). So it's a process in and of itself that involves more than interaction with the model: you must also interact with your data, a set of back-end APIs, and so on. At the same time, you want to ensure security is tight, because while there's orchestration, you also want to meet your enterprise cloud policies. Doing this on your own can take a number of weeks, and Amazon Bedrock has just announced "Agents" to make it a lot simpler
How do Amazon Bedrock Agents work to enable generative AI applications to complete tasks in just a few clicks?
- Breaks down and orchestrates tasks – You can break down a set of activities into different tasks, and you can then orchestrate them within the Bedrock environment
- Securely accesses and retrieves company data – You can connect your company’s data sources, convert them into a format that can be understood by the model, and have relevant outputs coming off the application
- Takes action by executing API calls on your behalf – You have the capability to execute APIs on behalf of your application, and orchestrate the entire end to end flow
- Provides fully managed infrastructure support – You can complete entire tasks in a matter of clicks rather than composing an entire end-to-end application
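The capabilities above can be pictured with a toy orchestrator. To be clear, this is not the Bedrock Agents API; it is a plain-Python illustration of breaking a task into steps and executing API calls on the user's behalf, with stand-in functions in place of real back-end APIs.

```python
# Toy illustration of agent-style orchestration (not the Bedrock Agents API).

def find_flights(destination: str) -> str:
    return f"flight to {destination} found"   # stand-in for a real flight-search API call

def book_hotel(destination: str) -> str:
    return f"hotel in {destination} booked"   # stand-in for a real booking API call

def run_agent(task: str, destination: str) -> list[str]:
    """Break the task into steps, then execute each step's API call in order."""
    plan = [find_flights, book_hotel]          # a real agent would plan dynamically
    return [step(destination) for step in plan]

results = run_agent("book my vacation", "Lisbon")
```

The value proposition of Agents is that the planning, the secure data access, and the API execution in this loop are handled by the managed service instead of code you write and host yourself.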
Differentiate with your Data
Differentiating with your data is key – it's pretty evident, but while foundation models can do a lot out of the box, their impact is vastly amplified once they are fine-tuned on your data sources. Net, your data is your differentiator: you'll get higher accuracy on the specific tasks you're after. All you need to do is point the foundation model at the repository of your custom data; the training run then produces a fine-tuned version of the model that you can use in your application
The customer data you provide to fine-tune a model is only used for your own, newly created, fine-tuned model. It won't ever be used to improve the underlying model that Amazon provides. AWS can't access your data and doesn't intend to use it to improve its own services. Everything is generated in your VPC, so there are guardrails in place on who can access the data or the model. In fact, whenever a model is used, e.g. a proprietary model, the model weights are protected so that consumers of the model don't get access to them. At the same time, the data used to fine-tune the model is not available to AWS or the model provider
It’s important to have a comprehensive data strategy that augments your gen AI applications. You have a variety of different data sources and in the case of structured or vector data, in some cases you may want to label the data. There are services in AWS which can be used for labeling the data which will give you more accurate results when you fine-tune your model. Then of course, you also may need to integrate multiple datasets. There are capabilities for ETL, so you can connect all your different data sources. Data and ML governance are also available as you build out your application
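As a sketch of the vector-data piece mentioned above: retrieval over your own documents reduces to comparing embeddings by similarity. The tiny 2-D "embeddings" below are made up for illustration; a real application would use an embedding model and a vector store.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up 2-D "embeddings" of two company documents (illustration only).
docs = {
    "refund policy": [0.9, 0.1],
    "shipping times": [0.2, 0.8],
}

def retrieve(query_vec: list[float]) -> str:
    """Return the document whose embedding is most similar to the query vector."""
    return max(docs, key=lambda name: cosine(query_vec, docs[name]))
```

This is the mechanism behind pointing a foundation model at your data sources with RAG: the most relevant documents are retrieved by similarity and supplied to the model as context.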
Most Performant, Low Cost
When you're trying to run foundation models, it's important that you have the most performant infrastructure at the lowest cost. AWS silicon supports many of the foundation models – you have the choice of using GPUs like the A100s and H100s for hosting and training foundation models, and there is also custom silicon, AWS Inferentia2 and Trainium, available
- Performance – Optimized infrastructure for training large models using model parallelism and data parallelism. SageMaker distributed training can speed up training by up to 40%
- Resilience – AWS also offers resiliency against hardware failure, which becomes a big concern when you want to minimize downtime and focus purely on model-building. Failed nodes are monitored and replaced, which can save up to 21% in model training time
- Cost Efficiency – Dynamically run multiple models on GPU-based instances using multi-model endpoints to reduce cost. Cost efficiency matters because GPUs are at a premium, and you want maximum utilization of the instances you're using
- Responsible AI – When it comes to detecting bias, explaining predictions, providing alerts when there is model drift, etc., there are capabilities available today that you can take advantage of. AWS also offers ML governance, which lets you enforce a set of standard policies that apply to all of the data scientists and devs in your organization (who can build a model, who can deploy a model, etc.). There's a model dashboard that provides you with all of these key metrics
Another fantastic way to leverage Amazon is via targeted applications that enhance productivity. These are a few popular options:
- Amazon QuickSight – New FM-powered capabilities for business users to extract insights, collaborate, and visualize data. Author, fine-tune, and add visuals to dashboards (e.g. you write in natural language to generate a sales forecast that goes directly to your dashboard)
- AWS HealthScribe – Specific to the healthcare vertical, this is a HIPAA-eligible service targeted towards healthcare vendors building clinical applications. The service can automatically generate clinical notes by analyzing patient-clinician conversations. You can validate the generated notes and have it produce a summary for you. It also supports responsible AI by including references to the original patient transcripts for anything generated by AI
- Amazon CodeWhisperer – Targeted towards app developers writing undifferentiated code. It can automatically generate the code, allowing developers to focus on the creative aspects of coding. CodeWhisperer integrates with all the popular IDEs, including SageMaker Studio
Learn the Fundamentals of Generative AI for Real-World Applications
Amazon has heavily leveraged Coursera to provide a comprehensive suite of offerings on the topic.
You can choose the service you want to learn about + your role in your organization, and it gives you a range of resources to learn more. And should you have specific asks on a Bedrock workshop or want to learn about SageMaker JumpStart, AWS has hands-on workshops where their specialists will engage with you and help set things up. AWS is also investing a large amount of money into their generative AI innovation center, which will connect you with ML and AI experts, so that if you have an idea, they can help you transform it into a generative AI solution
Karthik is a product and engineering leader with experience in product management, product strategy, execution, and launch. He manages a team of product managers, TPMs, UX designers, business development managers, solution specialists, and engineers on Amazon SageMaker. He has incubated and grown several businesses, including Amazon Neptune at AWS, and PowerApps, Windows Azure BizTalk Services, and SQL Server at Microsoft, and has shipped across all layers of the technology stack – graphs, relational databases, middleware, and low-code/no-code app development.
Karthik holds a master's degree in business and an undergraduate degree in computer engineering.