Fri May 09 2025

Google introduces 'implicit caching' to reduce costs in accessing its new artificial intelligence models.

Google has introduced a new feature in its Gemini API called implicit caching which, the company says, will let third-party developers use its artificial intelligence models at a reduced cost.

Google has begun rolling out a new feature in its Gemini API that, according to the company, will make its artificial intelligence models cheaper for third-party developers to use. The feature, called "implicit caching," promises a 75% savings on "repetitive context" sent to models through the Gemini API. It is available for the Gemini 2.5 Pro and 2.5 Flash models, which is welcome news for developers given the rising cost of using cutting-edge models.

Caching, a common practice in the artificial intelligence industry, reuses pre-computed or frequently accessed data, cutting computational needs and costs. For example, responses to recurring questions can be stored, sparing the model from regenerating answers to identical requests. Google previously offered explicit prompt caching, which required developers to define their most frequently used prompts themselves; the process was tedious and error-prone, and some developers complained of unexpectedly high bills.

In response to those complaints, the Gemini team apologized and committed to changing its caching implementation. Unlike explicit caching, which requires manual setup, the new implicit caching is enabled automatically for the 2.5 models. Google explains that when a request sent to one of these models shares a common prefix with an earlier request, it becomes eligible for a cache hit. In addition, the minimum token counts needed to trigger implicit caching have been lowered to 1,024 for 2.5 Flash and 2,048 for 2.5 Pro, making the automatic savings easier to reach.
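The eligibility rules described above can be illustrated with a small conceptual sketch. This is not Google's actual implementation and it makes simplifying assumptions (token counts are approximated by word count, and real tokenizers behave differently); it only mirrors the two documented conditions, a shared request prefix and a per-model token minimum.

```python
# Conceptual sketch (NOT Google's implementation) of when a Gemini API
# request might qualify for implicit caching: it must share a common
# prefix with a previous request, and that prefix must meet the model's
# minimum token count. Tokens are approximated here by whitespace words.

MIN_TOKENS = {
    "gemini-2.5-flash": 1024,  # documented minimum for 2.5 Flash
    "gemini-2.5-pro": 2048,    # documented minimum for 2.5 Pro
}

def common_prefix_tokens(a: str, b: str) -> int:
    """Length (in approximate tokens) of the shared leading run of words."""
    count = 0
    for word_a, word_b in zip(a.split(), b.split()):
        if word_a != word_b:
            break
        count += 1
    return count

def may_hit_cache(model: str, prev_request: str, new_request: str) -> bool:
    """True if the shared prefix reaches the model's minimum token count."""
    shared = common_prefix_tokens(prev_request, new_request)
    return shared >= MIN_TOKENS[model]
```

Under this sketch, two requests sharing roughly 1,100 leading tokens would qualify on 2.5 Flash but not on 2.5 Pro, whose threshold is higher.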

Even so, the new system warrants some caution. Google advises developers to keep repetitive context at the beginning of requests to increase the chance of a cache hit, and to append the varying context at the end. Despite the promise of automatic savings, the company has offered no independent verification that implicit caching delivers the expected economies, so we will have to wait for evaluations from early adopters.
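Google's layout advice can be sketched as follows. The context text and helper below are purely illustrative (ExampleCorp and the manual text are invented), but they show the idea: keep the stable, repeated material first so consecutive requests share the longest possible prefix, and put only the varying part at the end.

```python
# Sketch of the recommended prompt layout for implicit caching: the
# stable, repeated context goes first, the varying question goes last.
# All names and text below are hypothetical placeholders.

STABLE_CONTEXT = (
    "You are a support assistant for ExampleCorp.\n"
    "Product manual:\n"
    "...large, unchanging reference text...\n"
)

def build_prompt(user_question: str) -> str:
    """Stable prefix first, varying question last: a cache-friendly prompt."""
    return STABLE_CONTEXT + "User question: " + user_question

p1 = build_prompt("How do I reset my password?")
p2 = build_prompt("How do I export my data?")
# Both prompts share the entire stable prefix, which is what
# prefix-based implicit caching keys on.
```

Had the question been placed before the manual instead, the two prompts would diverge almost immediately and no usable prefix would be shared.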