We're implementing a custom function to replace the standard route for AI conversations. This function will run repeatedly in a loop because chat interactions are inherently conversational—you send a message, the AI responds, you reply, and the cycle continues seamlessly.

Understanding context preservation is crucial here. AI models are stateless by design, meaning they don't inherently remember previous exchanges. Every time you send a new prompt, you must include the entire conversation history. Without this context, the AI can't maintain coherent dialogue or reference earlier points in your discussion.

Here's how the context flow works: You ask question one, receive answer one. When you ask question two, you must send it alongside question one AND answer one—including the AI's own previous response. This creates a comprehensive conversation thread that grows with each exchange, similar to how humans naturally reference earlier parts of a conversation.

This approach mirrors human conversation patterns. Just as you'd expect a colleague to remember what you discussed minutes earlier, the AI needs that same conversational context to provide relevant, coherent responses. Each new interaction builds upon the established dialogue foundation.

Our initial implementation will use terminal-based interaction—no HTML interface yet. Users will type questions directly in the terminal and receive responses there. This streamlined approach lets us focus on the core functionality before adding UI complexity.

Once we've established reliable terminal-based chat, we'll transition to browser implementation. This will involve creating input fields, handling form submissions, and using Jinja templates to display the conversation flow on a webpage. The complexity increases significantly at this stage, but the underlying chat logic remains the same.

Let's build this step by step. We'll create server4.py by copying from server03.py—this maintains our development progression while building new functionality.

Rather than declaring a function directly within the route, we're creating a standalone function that the route can call repeatedly. Route functions typically execute once per request, but we need repeated execution capability within a single session. This architectural change gives us the flexibility to manage ongoing conversations effectively.

The new approach replaces the entire route structure with our custom chat function. We'll comment out the existing route code rather than deleting it—keeping it as a reference point for understanding the transition from simple request-response to complex conversational patterns.


Our custom function, `chat_with_ai_model`, takes a conversation list as its parameter. This list contains the complete dialogue history, formatted as the AI expects it. Each function call processes this full context, appends the new response, and returns the updated conversation state.

The function structure maintains the familiar try-catch pattern but introduces key enhancements. The messages parameter now accepts our conversation list instead of a static prompt. We're also adding two critical parameters: temperature and max_tokens, which give us fine-grained control over AI behavior.

Token management has become increasingly important as AI applications scale. The max_tokens parameter caps response length, helping control costs and ensuring responses stay focused. Since GPT-4's 2024 pricing improvements—making tokens 33-50% cheaper—developers have more flexibility, but cost optimization remains essential for production applications.

Temperature control significantly impacts output quality and style. This parameter ranges from 0 to 1, controlling response randomness and creativity. Low temperatures (0.01-0.3) produce factual, predictable responses ideal for technical documentation or customer support. High temperatures (0.7-1.0) encourage creative, varied outputs perfect for brainstorming or creative writing tasks.

For general-purpose chatbots, a middle-ground temperature around 0.5 balances reliability with natural variation. This prevents overly robotic responses while maintaining factual accuracy and coherence. Understanding this balance is crucial for creating engaging yet trustworthy AI interactions.

The conversation structure follows OpenAI's expected format: each message contains a "role" (system, user, or assistant) and "content" (the actual message text). The system role establishes the AI's behavioral parameters, user messages contain human input, and assistant messages store AI responses. This structured approach ensures consistent, contextual dialogue.

Now we'll implement the chat loop mechanism. This runs after the standard `if __name__ == "__main__":` block, creating a terminal-based chat interface. We're deliberately avoiding web interfaces initially—this focuses our attention on core chat logic without frontend complexity.

The implementation starts by initializing the conversation with a system message that defines the AI's role and expertise scope. This foundational message shapes how the AI interprets and responds to subsequent user inputs throughout the entire conversation.


Our while loop continues indefinitely until the user explicitly exits by typing "quit" or "exit." This pattern is common in command-line applications where users need clear, simple exit commands. The boolean flag controlling the loop provides clean state management and prevents infinite execution.

User input handling includes graceful exit functionality. When users type exit commands, we don't just break the loop—we send a polite closing message to the AI, allowing it to respond appropriately. This maintains conversational courtesy and provides natural dialogue closure.

For ongoing conversations, each user message gets appended to the chat list using the proper role-content structure. The conversation list grows continuously, ensuring the AI maintains full context throughout extended discussions. This approach supports complex, multi-turn conversations that can span various topics while maintaining coherence.

The function call mechanism sends our complete chat history to the AI and captures the response. By storing the return value as `AI_response_text`, we can display it to the user and add it back to the conversation list, maintaining the bilateral dialogue structure.

Response handling requires careful attention to format consistency. The AI's response must be appended to the chat list with the "assistant" role, ensuring the next interaction includes this exchange in the conversation context. This bidirectional approach creates natural, flowing dialogue.

Terminal output displays both user inputs and AI responses in a clear, readable format. The input function automatically shows user entries, while we explicitly print AI responses with clear labeling. This creates an intuitive chat interface entirely within the command line environment.

This foundation establishes robust chat functionality that can later be adapted for web interfaces, mobile applications, or integrated into larger systems. The core conversation management logic remains consistent regardless of the user interface implementation, making this a valuable architectural pattern for AI-powered applications.

Testing this implementation reveals the power of context-aware AI conversation. Users can engage in extended dialogues, reference earlier topics, and experience natural conversational flow—all through simple terminal interaction. This proves the concept before investing in more complex interface development.