OpenAI Reduces Developers’ AI Usage Costs with Flex Processing in API

Raj Kapoor

OpenAI on Thursday unveiled a new service tier, called Flex processing, for its application programming interface (API). It cuts developers’ AI usage costs in half compared with standard pricing; the trade-offs are slower response times and occasional resource unavailability. The new API feature is available in beta for a few reasoning-focused large language models (LLMs). According to the San Francisco-based AI company, the service tier is aimed at non-production and low-priority jobs.

OpenAI Adds New Service Tier in API

The AI company described the service tier in detail on its support page. Flex processing is currently in beta for the Chat Completions and Responses APIs and is compatible with the o3 and o4-mini AI models. To activate the new mode, developers set the service tier option in their API request to Flex.
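As a minimal sketch of what such a request body looks like, the snippet below builds a Chat Completions payload with the service tier option set. It uses only the Python standard library; the model name and prompt are illustrative, and developers should consult OpenAI's API reference for the full set of supported fields.

```python
import json

def build_flex_request(model: str, prompt: str) -> str:
    """Build a Chat Completions request body that opts in to Flex processing."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "service_tier": "flex",  # omit or use "default" for standard processing
    }
    return json.dumps(body)

payload = build_flex_request("o4-mini", "Classify these support tickets by topic.")
print(payload)
```

The same option can be passed as a keyword argument when using OpenAI's official SDKs instead of raw HTTP requests.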

The main downside of the cheaper API pricing is significantly longer processing time. OpenAI says developers opting for Flex processing should expect slower responses and occasional resource unavailability. Requests may also time out if the prompt is lengthy or the request is complex. As per the AI firm, the mode is best suited to non-production or low-priority tasks such as model evaluations, data enrichment, or asynchronous workloads.

Notably, OpenAI highlights that developers can avoid timeout errors by increasing the default timeout. These APIs time out after 10 minutes by default, but with Flex processing, lengthy and complex prompts can take longer than that. The company suggests that raising the timeout reduces the chances of getting an error.
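A sketch of that advice: pick a per-request timeout based on the service tier in use. The 10-minute figure comes from the article; the 15-minute Flex ceiling is an illustrative choice, not an OpenAI-documented value.

```python
DEFAULT_TIMEOUT_S = 10 * 60  # the 10-minute default the article mentions
FLEX_TIMEOUT_S = 15 * 60     # illustrative: extra headroom for slower Flex requests

def request_timeout(service_tier: str = "default") -> int:
    """Return the per-request timeout in seconds for a given service tier."""
    return FLEX_TIMEOUT_S if service_tier == "flex" else DEFAULT_TIMEOUT_S

# e.g. urllib.request.urlopen(req, timeout=request_timeout("flex"))
timeout = request_timeout("flex")
```

With OpenAI's official Python SDK, the same idea is expressed by passing a timeout when constructing the client or per request via its options mechanism; check the SDK documentation for the exact signature.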

Additionally, Flex processing may sometimes lack the resources to handle a developer’s request, in which case the API returns a “429 Resource Unavailable” error code. To handle these scenarios, developers can retry requests with exponential backoff, or switch to the default service tier if timely completion is necessary. OpenAI said developers are not charged for requests that fail with this error.
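The retry pattern OpenAI recommends can be sketched as follows. The `ResourceUnavailable` exception and the flaky call below are stand-ins for an HTTP 429 response from the real API; the backoff parameters are illustrative.

```python
import random
import time

class ResourceUnavailable(Exception):
    """Stand-in for an HTTP 429 'Resource Unavailable' response."""

def retry_with_backoff(call, max_retries=5, base_delay=1.0):
    """Invoke `call`, retrying on ResourceUnavailable with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except ResourceUnavailable:
            if attempt == max_retries - 1:
                raise  # give up (or fall back to the default service tier)
            # Exponential backoff with a little jitter: 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Demo: a request that fails twice with 429, then succeeds.
attempts = {"n": 0}
def flaky_flex_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ResourceUnavailable()
    return "ok"

result = retry_with_backoff(flaky_flex_call, base_delay=0.01)
```

In production code the fallback after the final failed attempt would typically re-issue the request with the default service tier rather than re-raise.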

In standard mode, the o3 AI model currently costs $10 (roughly Rs. 854) per million input tokens and $40 (roughly Rs. 3,418) per million output tokens; Flex processing brings those down to $5 (roughly Rs. 427) and $20 (roughly Rs. 1,709), respectively. Similarly, for the o4-mini AI model, the new service tier charges $0.55 (roughly Rs. 47) per million input tokens and $2.20 (roughly Rs. 188) per million output tokens, instead of $1.10 (roughly Rs. 94) for input and $4.40 (roughly Rs. 376) for output in standard mode.
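To see the 50-percent saving concretely, the calculator below applies the article's USD per-million-token prices to a hypothetical o3 job with 2 million input tokens and 500,000 output tokens (the token counts are made up for illustration).

```python
# Per-million-token USD prices from the article: (input, output).
PRICES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}

def cost_usd(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a request at the given service tier."""
    in_price, out_price = PRICES[model][tier]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

standard = cost_usd("o3", "standard", 2_000_000, 500_000)  # $40.00
flex = cost_usd("o3", "flex", 2_000_000, 500_000)          # $20.00
```

Because both the input and output rates are exactly halved, the Flex cost is half the standard cost regardless of the input/output token mix.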
