Understanding Attachments in Messages and Threads with the OpenAI Assistant
Sep 25, 2024

Utilizing OpenAI Assistant with Attachments in Perspect

Perspect relies heavily on the OpenAI Assistant when you create products, pages, and posts and choose to have our chatbot on your site. We store representations of this content as attachments on an Assistant, so visitors can interact with a Perspect-powered chatbot that answers from this content, which lives in an Assistant-level vector store.
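For context, here's a minimal sketch of how content can be indexed at the Assistant level. The file name, model, and instructions are placeholders, and the calls reflect the beta Assistants surface of the OpenAI Python SDK:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def main():
    # Create an Assistant-level vector store and index a content file in it
    vector_store = await client.beta.vector_stores.create(name="site-content")
    file = await client.files.create(
        file=open("product-page.md", "rb"), purpose="assistants"
    )
    await client.beta.vector_stores.files.create(
        vector_store_id=vector_store.id, file_id=file.id
    )

    # Point the Assistant's file_search tool at that store so every
    # thread can query it
    assistant = await client.beta.assistants.create(
        model="gpt-4o",
        instructions="Answer questions using this site's content.",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    )

asyncio.run(main())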

We're also beginning to use message- and thread-level attachments to tailor chatbot conversations to individual user interactions. Content attached to messages and threads is stored in a thread-level vector store.
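As a sketch, attaching an already-uploaded file when creating a thread looks like this (assuming file.id comes from a prior client.files.create upload); OpenAI builds the thread-scoped vector store behind the scenes:

thread = await client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "Here is a BOM to compare against the catalog.",
            "attachments": [
                {"file_id": file.id, "tools": [{"type": "file_search"}]}
            ],
        }
    ]
)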

However, there's limited documentation on the nuances of developing an integration that uses attachments with threads and messages. Initially, we were confused because the Assistant would only sometimes respond as if it were using the attachment. In other cases, it would answer vaguely or claim it didn't have access to uploaded documents. No amount of system prompting along the lines of "you have access to the documents, please use those ..." made a difference.

We've read nearly every post on message and thread-level attachments, including those on the OpenAI community site, but none were as clear as what we'll describe here.

While calling the APIs correctly and using a clear system prompt are necessary, they're not sufficient.

The Key Point

The user's message must contain unique references to content in the attachments themselves.

Let's explain with an example. If we have an Assistant for an eCommerce store hosted on Perspect that stores schematics for automobile parts, each with its own Bill of Materials (BOM), a user could upload a BOM and ask:

"Does this attached document contain all the necessary parts for an engine for a 1992 Chevy Camaro?"

A reference to an "attached document" or an "uploaded file" is too vague for the Assistant to know which attachment is meant. The user's message needs to reference content specific to the files whose file_ids were added to the attachments key when calling messages.create().

A better question from the user would be:

"Does BOM 8675309 contain all the necessary parts for an engine for a 1992 Chevy Camaro?"

That message (or the thread it belongs to) would, of course, need to carry the file_ids of already-uploaded files, like this:

message_data = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": templated_content
        }
    ],
}

# Add attachments only if file_ids is not None and not empty
if file_ids:
    message_data["attachments"] = [
        {"file_id": file_id, "tools": [{"type": "file_search"}]} for file_id in file_ids
    ]
message = await client.beta.threads.messages.create(thread_id, **message_data)
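For completeness, the file_ids above come from the Files API: each user upload must be created with the "assistants" purpose before it can be attached (the filename here is illustrative):

uploaded = await client.files.create(
    file=open("bom-8675309.pdf", "rb"), purpose="assistants"
)
file_ids = [uploaded.id]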

When you add an attachment to a message or a thread, OpenAI creates a temporary vector store tied to that thread. When you then create a message in that thread, the Assistant appears to query the Assistant-level and thread-level vector stores simultaneously, effectively taking a union of the two. As a result, the Assistant can't tell which store "an attachment" or an "uploaded document" refers to.
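You can see this thread-scoped store yourself by retrieving the thread once a file_search attachment has been added (tool_resources.file_search may be empty on a fresh thread):

thread = await client.beta.threads.retrieve(thread_id)
print(thread.tool_resources.file_search.vector_store_ids)
# e.g. ['vs_abc123'] -- a different ID than the Assistant-level store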

We also tried modifying the system prompt for that message by appending the file_id and the filename, individually and together. Mentioning the file_id had no bearing on the outcome. Mentioning the filename was initially promising, but it became clear that this could lead the Assistant astray because a vector store allows multiple files with the same filename.

However, passing the filename in the run's additional_instructions field together with a very specific question such as "Does BOM 8675309 ..." has led to very good results.

If the user doesn't mention strings unique to the message- or thread-level attachment, we have a fallback to help the LLM, though it doesn't always work well: we include the following string in the additional_instructions:

"... if the user refers to uploaded files or attachments or documents or drafts, etc. they are referring to {file_names}."

We do all of that like this:

if filenames_and_ids:
    file_names = [item[0] for item in filenames_and_ids]
    additional_instructions += (
        f"The user added files named {file_names} to this thread as attachments "
        f"or uploaded files --- if the user refers to uploaded files or attachments "
        f"or documents or drafts, etc., they are referring to {file_names}. "
        "Demonstrate to the user that you have read the file by using key details "
        "from the file in your response to add context and depth. The user may be "
        "asking you to consider the contents of the uploaded file relative to other "
        "documents you already have. You definitely have access to these files, so "
        "retry if necessary."
    )

try:
    async for chunk in create_and_stream_run(
        request,
        site_name,
        assistant_id,
        thread_id,
        additional_instructions,
    ):
        yield chunk
    # If successful, return early
    return
except Exception as e:
    capture_exception(e)
    logger.exception("Exception during create_and_stream_run")
    # Don't return; allow retry
    pass  # Let the outer loop handle retries
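create_and_stream_run is our helper around the SDK's streaming manager. A simplified sketch, with the request and site_name plumbing omitted, might look like this:

async def create_and_stream_run(assistant_id, thread_id, additional_instructions):
    # Stream the run and yield text deltas as they arrive
    async with client.beta.threads.runs.stream(
        thread_id=thread_id,
        assistant_id=assistant_id,
        additional_instructions=additional_instructions,
    ) as stream:
        async for text in stream.text_deltas:
            yield text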