<aside> 🔨 This documentation is under construction, so there are many missing components. If you need something that is missing here, please contact us.

</aside>

Chat Completions API

RunLLM’s streaming chat completions API is compatible with the OpenAI standard. You can use any server-sent events (SSE) framework to process the responses and render the stream.

Sending Requests

To use the streaming API, you will need your pipeline ID and an API key.

You can now send requests to the streaming endpoint. The following example uses Microsoft's fetch-event-source library, but this is not a requirement:

fetchEventSource(
    `https://api.runllm.com/api/pipeline/${pipeline_id}/chat`,
    {
        method: "POST",
        headers: {
            Accept: "text/event-stream",
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        body: JSON.stringify({
            // The message from the user.
            message: string,

            // Optional.
            // Only set this if the message is using previous questions and answers as
            // historical context for this exchange. Session IDs are allocated by the API
            // and returned on the streaming response chunks.
            // E.g.
            // Request: {message: "Tell me about <x>"} -> Response chunk: {content: "...Would you like more details on <y>?", session_id: 123}
            // Follow-up request: {message: "yes!", session_id: 123}
            session_id: "int (optional, default: undefined)",
        }),
        ...
    }
)
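
To continue the same conversation, pass the session_id returned on the previous response's chunks back with your follow-up message. A minimal sketch of that follow-up turn (assuming the same pipeline_id and api_key as above, and a session_id of 123 captured from the earlier response chunks):

fetchEventSource(
    `https://api.runllm.com/api/pipeline/${pipeline_id}/chat`,
    {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "x-api-key": api_key,
        },
        body: JSON.stringify({
            // Follow-up to the previous answer in the same session.
            message: "yes!",
            // Returned on the previous response's chunks.
            session_id: 123,
        }),
        // onmessage, onerror, etc. go here, as in the example above.
    }
)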

Processing Responses

Responses are streamed back as SSE messages. The content is serialized as an object with the following structure:

{
    // chunk_type is an enum that allows you to track the progress
    // of the response. This helps you render a progress bar or status
    // messages while the user is waiting for the first token.
    chunk_type: ChunkType;
    // The ID associated with the current individual chat message.
    chat_id: number;
    // The ID associated with the current chat session.
    session_id: number;
    // The content of the message. Only set for the GenerationInProgress
    // chunk type. This behavior allows you to concatenate all chunk
    // contents to obtain the full message, without needing to look
    // at the chunk types.
    content: string;
}

enum ChunkType {
  Retrieval = "retrieval",
  Classification = "classification",
  GenerationStarts = "generation_starts",
  GenerationInProgress = "generation_in_progress",
  // Sent after generation completes, when the response contains citations
  // (see "Handling Sources" below).
  Sources = "sources",
  Explanation = "explanation",
}

The content of the generated response follows standard Markdown format.

Putting it together, a typical chat event flow looks like this:

data: '{"chunk_type": "retrieval", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "classification", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "generation_starts", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "# A markdown response"}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "## First section"}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "Start responding with `code`"}'
data: '{"chunk_type": "sources", "chat_id": 1337, "session_id": 42, "content" : "..."}'
data: '{"chunk_type": "explanation", "chat_id": 1337, "session_id": 42, "content": "..."}'
...
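
As a rough sketch of how these chunks could be consumed with fetch-event-source (the showProgress and renderMarkdown helpers below are hypothetical rendering hooks, not part of the API):

import { fetchEventSource } from "@microsoft/fetch-event-source";

let answer = "";

fetchEventSource(`https://api.runllm.com/api/pipeline/${pipeline_id}/chat`, {
    // ...method, headers, and body as shown in the request example above...
    onmessage(event) {
        const chunk = JSON.parse(event.data);
        switch (chunk.chunk_type) {
            case "retrieval":
            case "classification":
            case "generation_starts":
                // Pre-generation stages carry no content; update a
                // progress indicator while the user waits.
                showProgress(chunk.chunk_type);
                break;
            case "generation_in_progress":
                // Only these chunks carry content; concatenating them
                // yields the full Markdown answer.
                answer += chunk.content;
                renderMarkdown(answer);
                break;
            default:
                // "sources" and "explanation" arrive after generation;
                // see "Handling Sources" below.
                break;
        }
    },
});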

Handling Sources

After the response is fully generated, the server may send two additional chunks, sources and explanation, if any citations are included in the response. They provide additional information about those citations. We are in the process of formalizing the formats of these two chunks so that they are easier for external clients to use; for now, using them in your code takes a bit of effort.

Example chunk contents (the JSON wrapper containing chunk_type, chat_id, and session_id is omitted):

// sources chunk: multi-line Markdown text
[first item](docs.mysite/first)
[second item](docs.myothersite/second)
Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links. // This line is only included when no good sources are found.

// explanation chunk: a JSON object
{
    "docs.mysite/first": "this is relevant because...",
    "docs.myothersite/second": "this is related to the question because..."
}
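
Until those formats are formalized, one workable approach is to extract the Markdown links from the sources chunk and look each URL up in the explanation object. The parseCitations helper below is only a sketch based on the example formats shown above:

// sourcesContent: the content string of the "sources" chunk.
// explanationContent: the content string of the "explanation" chunk.
function parseCitations(sourcesContent, explanationContent) {
    const explanations = JSON.parse(explanationContent);
    const citations = [];
    // Match "[title](url)" pairs in the multi-line Markdown text.
    for (const match of sourcesContent.matchAll(/\[([^\]]+)\]\(([^)]+)\)/g)) {
        const [, title, url] = match;
        citations.push({ title, url, explanation: explanations[url] });
    }
    return citations;
}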