Vertexai - minherz: another techno-blog

Lightweight Session State: Using Vertex AI's Session Management Without a Full Agent Deployment

Agent Development Kit (ADK) from Google is one of the popular frameworks for developing AI applications. It gives developers a rich set of tools that save development time and encourage the use of the industry’s best practices. One of these tools is session management, which maintains the state of the user’s session during interaction with agents. ADK provides several implementations of session management: one for development, one for use with relational databases, and one that maintains state using Vertex AI, a Google Cloud platform for AI applications and ML models. There is a lot of information available about session management with ADK. You can read the documentation or learn about managing state and memory. And there is more.
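To make the idea of session state concrete, here is a minimal in-memory sketch in Python. It is not the ADK API: the names `Session`, `InMemorySessionStore`, `create`, `get` and `append_event` are hypothetical and chosen only for illustration. The sketch assumes that a session is a conversation history plus a key-value state that later turns can read.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Session:
    """State kept for one user's conversation with an agent (hypothetical type)."""
    session_id: str
    user_id: str
    state: Dict[str, Any] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)


class InMemorySessionStore:
    """Illustrative store that keeps sessions in a dict; not the ADK API."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def create(self, user_id: str) -> Session:
        # Each session gets a random identifier the client sends back later.
        session = Session(session_id=uuid.uuid4().hex, user_id=user_id)
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        return self._sessions.get(session_id)

    def append_event(self, session_id: str, text: str, **state_updates: Any) -> Session:
        # Record the turn and merge any state changes it produced.
        session = self._sessions[session_id]
        session.events.append(text)
        session.state.update(state_updates)
        return session


store = InMemorySessionStore()
session = store.create(user_id="alice")
store.append_event(session.session_id, "user: I prefer metric units", units="metric")
store.append_event(session.session_id, "user: how far is the moon?")
print(store.get(session.session_id).state)  # {'units': 'metric'}
```

A store like this loses everything when the process exits, which is why it only suits development. A database-backed or Vertex AI-backed implementation would keep the same shape but persist the sessions elsewhere.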

If you did not notice, the Generative AI part of the Vertex AI SDK is now deprecated. This means that new versions of the SDK will no longer update the generative AI functions, and these functions will be removed from the SDK entirely in 2026. You can find more information in the deprecation notice.

In 2024, the Generative AI module was introduced to the Vertex AI SDK. The way it was published for different programming languages caused quite a bit of confusion. For example, in Python a developer had to install the google-cloud-aiplatform package and then import vertexai, while in Go the installed module was named cloud.google.com/go/vertexai and the import statement had to be import "cloud.google.com/go/vertexai/genai". In 2025, Google released a new GenAI SDK intended to replace the collection of Vertex AI SDKs for different languages. The new SDK has a more intuitive interface that is similar across programming languages.
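The difference is easier to see side by side. The sketch below shows the same request written against the deprecated module and against the new GenAI SDK in Python. It makes a few assumptions: the new SDK is installed as the google-genai package, the model name "gemini-2.0-flash" is only a placeholder, and the function names `legacy_generate` and `genai_generate` are made up for this comparison. The imports sit inside the functions so that both styles can live in one file even when only one of the packages is installed.

```python
def legacy_generate(project: str, location: str, prompt: str) -> str:
    """Deprecated style: pip install google-cloud-aiplatform, then import vertexai."""
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project, location=location)
    model = GenerativeModel("gemini-2.0-flash")  # placeholder model name
    return model.generate_content(prompt).text


def genai_generate(project: str, location: str, prompt: str) -> str:
    """New style (assumed package name): pip install google-genai."""
    from google import genai

    # vertexai=True routes the calls through Vertex AI in the given project.
    client = genai.Client(vertexai=True, project=project, location=location)
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=prompt,
    )
    return response.text
```

Calling either function needs a Google Cloud project with Vertex AI enabled and application default credentials, so treat this as a reading aid rather than a ready-to-run script.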

Control your Generative AI costs with the Vertex API’s context caching

Note: This blog has two authors.

What is context caching?

Vertex AI is a Google Cloud machine learning (ML) platform that, among other things, provides access to a collection of generative AI models. This includes the models known under the common name “Gemini models”. When you interact with these models, you provide them with all the information about your inquiry. The Gemini models accept information in multiple formats, including text, video and audio. The provided information is often referred to as “context”. The Gemini models are known to accept very long contexts.