Lightweight Session State: Using Vertex AI's Session Management Without a Full Agent Deployment
The Agent Development Kit (ADK) from Google is one of the popular frameworks for developing AI applications. It provides a rich set of tools that save development time and encourage the use of industry best practices. One such tool is session management, which maintains the state of a user's session during interaction with agents. ADK provides several implementations of the session service: an in-memory one for development, one backed by relational databases, and one that maintains state in Vertex AI, Google Cloud's platform for AI applications and ML models. There is plenty of material about session management with ADK: you can read the documentation or learn about managing state and memory. And there is more.
When building with the ADK, using the VertexAISessionService for session management is highly attractive.
It offloads the hassle of database maintenance for session state and gives you powerful, out-of-the-box memory management capabilities.
But what if you want these benefits while hosting your agent on your own terms—for instance, in a lightweight Cloud Run service or within an existing GKE cluster?
Most documentation and tutorials assume your destination is a full Vertex AI Agent Engine deployment, requiring you to use its instance ID for the session service configuration.
This presents a frustrating dilemma: to get the managed session service you want, it seems you must adopt a hosting platform you don’t need, potentially adding unwanted cost and complexity.
How can you decouple these two powerful features?
The solution is simple and straightforward: create an Agent Engine instance and use its ID, without deploying anything to the engine itself. Following the ADK tutorial for deploying to Agent Engine creates many artifacts that may incur additional cost. Instead, consider the following steps, which call the Agent Engine API directly.
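Conceptually, creating the instance boils down to a single REST call. The following stdlib-only sketch builds the request URL and payload without sending them; the endpoint and the displayName field are assumptions based on the Vertex AI reasoningEngines REST API, and the project, region, and agent name are placeholders:

```python
import json

# Placeholder values -- substitute your own.
PROJECT_ID = "your-project-id"
REGION = "your-region"
AGENT_NAME = "my-agent-engine"

# Assumed v1 REST endpoint for creating a reasoning engine (Agent Engine) instance.
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/locations/{REGION}/reasoningEngines"
)
body = json.dumps({"displayName": AGENT_NAME})

print(url)
print(body)
```

The actual call also needs an OAuth access token; the script downloaded below takes care of authentication and of waiting for the long-running create operation to finish, which is why using it is simpler than hand-rolling the request.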
STEP 1: Install the google-genai Python package (argparse, which the script also uses, ships with the Python standard library)
pip3 install google-genai
STEP 2: Download a Python script that creates a new Agent Engine instance
TEMP_DIR=$(mktemp -d)
curl -s -L -o "$TEMP_DIR/create_agent_engine.py" \
"https://raw.githubusercontent.com/GoogleCloudPlatform/devrel-demos/3132ff456ab692b220a5a9be25c858e25c78b477/ai-ml/adk_cloud_run_with_ae_memory/deployment/get_agent_engine.py"
STEP 3: Create a new instance of Agent Engine in a project PROJECT_ID, located in the region REGION with a unique name AGENT_NAME
python3 "$TEMP_DIR/create_agent_engine.py" \
--agent-name "$AGENT_NAME" \
--project-id "$PROJECT_ID" \
--location "$REGION"
The output of the last command is the instance ID.
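Depending on how you capture the output, you may get the full resource name rather than the bare ID; Vertex AI resource names follow the pattern projects/PROJECT/locations/LOCATION/reasoningEngines/ID. A small stdlib-only helper (the function name and sample values are my own, for illustration) for extracting the trailing ID:

```python
def extract_engine_id(resource_name: str) -> str:
    """Return the trailing instance ID from a full reasoning engine resource name."""
    # Works for both a full resource name and a bare ID.
    return resource_name.rstrip("/").rsplit("/", 1)[-1]

full_name = "projects/my-project/locations/us-central1/reasoningEngines/1234567890"
print(extract_engine_id(full_name))  # -> 1234567890
```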
Now you can use it, together with the project ID and location, in any of the tutorials about using VertexAISessionService.
For example, the description of how to deploy an agent application to Cloud Run using FastAPI can be upgraded to use VertexAISessionService for session management by updating the call to get_fast_api_app:
import os
import logging

from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app

logger = logging.getLogger(__name__)

PID = 'your-project-id'
REGION = 'your-region'
EID = 'your-agent-engine-instance-id'

AGENT_DIR = os.path.dirname(os.path.abspath(__file__))
AGENT_ENGINE = f'agentengine://projects/{PID}/locations/{REGION}/reasoningEngines/{EID}'

# Create the FastAPI app with VertexAISessionService
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    session_service_uri=AGENT_ENGINE,
    # web=True,
    # trace_to_cloud=True,
)
. . .
You can uncomment the last two arguments if you want the ADK Web UI and tracing to Cloud Trace.
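Rather than hardcoding the project, region, and instance ID, on Cloud Run it is common to pass them as environment variables. A minimal, stdlib-only sketch of that pattern; the variable names here are my own choice, not anything ADK requires:

```python
import os

# Hypothetical environment variable names -- pick whatever suits your deployment.
PID = os.environ.get("AGENT_PROJECT_ID", "your-project-id")
REGION = os.environ.get("AGENT_LOCATION", "your-region")
EID = os.environ.get("AGENT_ENGINE_ID", "your-agent-engine-instance-id")

AGENT_ENGINE = f"agentengine://projects/{PID}/locations/{REGION}/reasoningEngines/{EID}"
print(AGENT_ENGINE)
```

Setting the variables on the Cloud Run service keeps the container image identical across environments that use different Agent Engine instances.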
NOTE of caution: if you use FastAPI to expose your agent application, you should not use the adk CLI to run it locally. If you do, it will fall back to in-memory session management.
Now your sessions are persisted in Vertex AI, and you can browse them in the Cloud console. Once you have set up session management using Vertex AI, you can configure memory management with the same Agent Engine instance ID. See the devrel-demos GitHub repository for a code example.
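As a sketch of what that can look like, the same agentengine:// URI can be reused for memory. This assumes get_fast_api_app also accepts a memory_service_uri argument, as the repository example suggests; check the linked code for the exact, current signature:

```python
# Sketch only: memory_service_uri is an assumed parameter name.
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    session_service_uri=AGENT_ENGINE,
    # Point memory management at the same Agent Engine instance.
    memory_service_uri=AGENT_ENGINE,
)
```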
To minimize latency in communication between your agent application, the LLM, and other components such as the session management service, it is recommended to deploy all components in the same region.
It is important to mention that Agent Engine is under active development, and the method of using session management in Vertex AI may change. At the time of writing, Vertex AI has a free tier available in express mode. Please consult the Vertex AI pricing for Agent Engine regarding additional charges.
This post is mirrored to Medium.