OpenAI unveiled a new service tier for developers using its application programming interface (API) on Thursday. Known as Flex processing, it cuts developers’ AI usage costs in half compared to standard pricing. The reduced rates come with trade-offs, however: slower response times and occasional resource unavailability. The new API feature is already available in beta for a few reasoning-focused large language models (LLMs). According to the San Francisco-based AI company, this service tier is suited to non-production and low-priority jobs.
OpenAI Adds New Service Tier in API
The AI company described this service tier in detail on its support page. Flex processing is presently in beta for the Chat Completions and Responses APIs, and is compatible with the o3 and o4-mini AI models. To activate the new mode, developers set the service tier parameter in their API request to flex.
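As a minimal sketch, opting into the tier looks like this with the official openai Python SDK (assumes the package is installed and an OPENAI_API_KEY is set in the environment; the helper name flex_kwargs is illustrative):

```python
# Sketch: building a Chat Completions request routed to the Flex tier.
# Only o3 and o4-mini currently support service_tier="flex" (beta).

def flex_kwargs(prompt: str, model: str = "o3") -> dict:
    # Keyword arguments for client.chat.completions.create();
    # "flex" selects the discounted, slower service tier.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": "flex",
    }

# Usage (requires OPENAI_API_KEY in the environment):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**flex_kwargs("Summarize this log file."))
```

The same parameter works on the Responses API endpoint as well, per the support page.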
One downside of the cheaper API pricing is significantly higher processing time. OpenAI says developers opting for Flex processing should expect slower response times and occasional resource unavailability. Users may also face API request timeouts if the prompt is lengthy or the request is complex. As per the AI firm, this mode can be helpful for non-production or low-priority tasks such as model evaluations, data enrichment, or asynchronous workloads.
Notably, OpenAI says developers can avoid timeout issues by raising the default timeout. These APIs are configured with a 10-minute timeout by default, but long and intricate prompts may take longer under Flex processing. According to the firm, extending the timeout reduces the likelihood of receiving an error.
Furthermore, Flex processing may occasionally lack the resources to process developer requests, in which case the API will return a “429 Resource Unavailable” error code. Developers can handle these situations by retrying requests with exponential backoff or, if prompt completion is required, switching to the default service tier. According to OpenAI, developers will not be billed when they encounter this error.
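The retry-then-fall-back strategy described above can be sketched as follows; the ResourceUnavailable class and call_with_backoff helper are illustrative stand-ins, not SDK names:

```python
import time

class ResourceUnavailable(Exception):
    """Stand-in for an SDK exception carrying the 429 status."""

def call_with_backoff(call, fallback, retries=4, base_delay=1.0):
    # Retry the Flex-tier request with exponential backoff
    # (delays of base_delay * 1, 2, 4, 8, ... seconds).
    for attempt in range(retries):
        try:
            return call()
        except ResourceUnavailable:
            time.sleep(base_delay * 2 ** attempt)
    # Flex capacity never freed up: fall back to the default tier,
    # e.g. the same request without service_tier="flex".
    return fallback()
```

Here `call` would issue the Flex-tier request and `fallback` the standard-tier one; since failed Flex requests are not billed, the only cost of retrying is time.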
In the normal mode, the o3 AI model costs $10 (roughly Rs. 854) per million input tokens and $40 (roughly Rs. 3,418) per million output tokens. With Flex processing, input drops to $5 (roughly Rs. 427) and output to $20 (roughly Rs. 1,709). Similarly, the o4-mini AI model will cost $0.55 (roughly Rs. 47) per million input tokens and $2.20 (roughly Rs. 188) per million output tokens under the new service tier, as opposed to $1.10 (roughly Rs. 94) for input and $4.40 (roughly Rs. 376) for output under the existing mode.
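A quick arithmetic check of those figures, with Flex computed as exactly half the standard per-million-token rates in USD (the cost_usd helper is illustrative):

```python
# Standard per-million-token rates in USD: (input, output).
STANDARD = {"o3": (10.00, 40.00), "o4-mini": (1.10, 4.40)}
# Flex processing halves both rates.
FLEX = {m: (i / 2, o / 2) for m, (i, o) in STANDARD.items()}

def cost_usd(model, input_tokens, output_tokens, tier=FLEX):
    # Total cost for a given token count at the chosen tier.
    in_rate, out_rate = tier[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, one million input plus one million output tokens on o4-mini comes to $2.75 under Flex versus $5.50 at standard rates.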