API

Our API documentation will change soon

You can find example code for our most common APIs on this page, including the chat API, the feedback API, and the chat history API. We will be releasing a more comprehensive API guide in the coming weeks.

Chat API

RunLLM's streaming chat completions API is compatible with the OpenAI standard. You can use any server-sent events (SSE) framework to process the responses and render the stream.

Sending Requests

To use the streaming API, you will need:

  • your assistant ID, which is the number in the URL for your assistant (https://app.runllm.com/assistant/<ID>)
  • your API key, which can be found under your account in the bottom left corner of the admin console

You can now send requests to the streaming endpoint. The following example uses Microsoft's fetch-event-source framework, but this is not a requirement:

fetchEventSource(
  `https://api.runllm.com/api/pipeline/${pipeline_id}/chat`,
  {
    method: "POST",
    headers: {
      cors: "no-cors",
      "Access-Control-Allow-Origin": "*",
      Accept: "*",
      "Content-Type": "application/json",
      "x-api-key": api_key,
    },
    body: JSON.stringify({
      // The message from the user.
      message: string,

      // Optional.
      // Only set this if the message is using previous questions and answers as
      // historical context for this exchange. Session IDs are allocated by the API
      // and returned on the streaming response chunks.
      // E.g.
      // Request: {message: "Tell me about <x>"} -> Response chunk: {content: "...Would you like more details on <y>?", session_id: 123}
      // Request: {message: "yes!", session_id: 123}
      session_id: "int (optional, default: undefined)",
    }),
    ...
  }
)
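
For multi-turn conversations, here is a minimal sketch of how you might carry the session ID across requests. The currentSessionId variable and sendMessage wrapper are illustrative, not part of the API:

let currentSessionId: number | undefined = undefined;

const sendMessage = (message: string) => {
  fetchEventSource(`https://api.runllm.com/api/pipeline/${pipeline_id}/chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": api_key,
    },
    // Omit session_id on the first message; include it on follow-ups.
    body: JSON.stringify({ message, session_id: currentSessionId }),
    onmessage: (msg) => {
      const chunk = JSON.parse(msg.data);
      // Every response chunk carries the session ID allocated by the API.
      currentSessionId = chunk.session_id;
    },
  });
};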

Processing Responses

Responses are streamed back as SSE messages. Each message's data is serialized as a JSON object with the following structure:

interface Chunk {
  // chunk_type is an enum that allows you to track the progress
  // of the response. This helps you render a progress bar or status
  // messages while the user is waiting for the first token.
  chunk_type: ChunkType;
  // The ID associated with the current individual chat message.
  chat_id: number;
  // The ID associated with the current chat session.
  session_id: number;
  // The content of the message. Only set for the GenerationInProgress
  // chunk type. This behavior allows you to concatenate all chunk
  // contents to obtain the full message, without the need to look
  // into the chunk types.
  content: string;
}

enum ChunkType {
  Retrieval = "retrieval",
  Classification = "classification",
  GenerationStarts = "generation_starts",
  GenerationInProgress = "generation_in_progress",
  Sources = "sources",
  Explanation = "explanation",
}

Generated response content follows standard markdown format.

Putting it together, a typical chat event flow looks like this:

data: '{"chunk_type": "retrieval", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "classification", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "generation_starts", "chat_id": 1337, "session_id": 42, "content" : ""}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "# A markdown response"}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "## First section"}'
data: '{"chunk_type": "generation_in_progress", "chat_id": 1337, "session_id": 42, "content" : "Start responding with `code`"}'
data: '{"chunk_type": "sources", "chat_id": 1337, "session_id": 42, "content" : "..."}'
data: '{"chunk_type": "explanation", "chat_id": 1337, "session_id": 42, "content": "..."}'
...
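
As a rough sketch, a chunk handler might switch on chunk_type to drive progress indicators and accumulate the final answer. The renderProgress and renderMarkdown helpers below are hypothetical placeholders for your own UI code:

let answer = "";

const handleChunk = (chunk: Chunk) => {
  switch (chunk.chunk_type) {
    case ChunkType.Retrieval:
      renderProgress("Searching data sources...");
      break;
    case ChunkType.Classification:
      renderProgress("Analyzing the question...");
      break;
    case ChunkType.GenerationStarts:
      renderProgress("Generating a response...");
      break;
    case ChunkType.GenerationInProgress:
      // content is only set on this chunk type, so concatenating
      // every chunk's content yields the full message.
      answer += chunk.content;
      renderMarkdown(answer);
      break;
  }
};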

Handling Sources

After the response is fully generated, the server may send two additional chunks, sources and explanation, if the response includes any citations. These chunks carry supplementary data about the citations. We are in the process of formalizing their formats to make them easier for external clients to consume; for now, handling them takes some extra effort in your code.

Example chunk contents (the JSON wrapper with chunk_type, chat_id, and session_id is omitted):

// sources chunk: multi-line markdown text
[first item](docs.mysite/first)
[second item](docs.myothersite/second)
Note: I wasn't able to find highly relevant data sources, but above are a few potentially relevant links. // this line is included only if no good sources were found

// explanation chunk: a JSON object mapping source URL to explanation
{
  "docs.mysite/first": "this is relevant because...",
  "docs.myothersite/second": "this is related to the question because..."
}

Here is example code that handles these chunks to render customized source information:

const sources = source_chunk.content; // content of the sources chunk
const explanations = JSON.parse(explanation_chunk.content); // content of the explanation chunk

// Parse a sources chunk line using the markdown link format [name](url).
const parseSource = (source: string) => {
  const match = source.match(/\[(.*)\]\((.*)\)/);
  return match ? { title: match[1], url: match[2] } : null;
};

// The sources chunk may end with a message noting that the provided
// sources are not highly relevant.
const UNCERTAIN_SOURCES =
  "Note: I wasn't able to find highly relevant data sources, but above are a few potentially relevant links.";

{sources.split("\n").map((source, index) => {
  if (source === UNCERTAIN_SOURCES) {
    return <Text>{source}</Text>;
  }

  // Parse sources one-by-one.
  const parsedSource = parseSource(source);
  // Find the URL in explanations.
  const sourceKey = Object.keys(explanations).find(
    (key) => parsedSource && parsedSource.url.includes(key),
  );

  return (
    parsedSource && (
      <div key={index}>
        <Text>{parsedSource.title}</Text>
        {showDetails && sourceKey && ( // show explanations
          <>
            <span style={{ fontWeight: 700 }}>Why this?</span>
            &nbsp;{explanations[sourceKey]}
          </>
        )}
      </div>
    )
  );
})}

Full Example

This is a full code example that handles events from the streaming API using the fetch-event-source framework:

// Bind this to the button that sends the message.
onClickSendMessage = () => {
  handleChat(pipeline_id, api_key, message, session_id, handleChunk, handleError);
};

export const handleChat = async (
  pipeline_id: number,
  api_key: string,
  message: string,
  // Session ID if the message is a follow-up in an existing session.
  // For the first message, leave the session_id as `undefined`.
  // The API will provide a session ID in the response to the first
  // message.
  session_id: number | undefined,
  // Show a progress bar or render markdown content.
  handleChunk: (chunk: Chunk) => void,
  // Show error messages in your client.
  handleError: (msg: string) => void,
) => {
  fetchEventSource(
    `https://api.runllm.com/api/pipeline/${pipeline_id}/chat`,
    {
      method: "POST",
      headers: {
        cors: "no-cors",
        "Access-Control-Allow-Origin": "*",
        Accept: "*",
        "Content-Type": "application/json",
        "x-api-key": api_key,
      },
      body: JSON.stringify({
        // The message from the user.
        message: message,
        // This tracks the source of the message on the admin console.
        // "web" indicates it is coming from your documentation site.
        source: "web",
        session_id: session_id,
      }),

      // Ensures the connection stays alive when the browser tab is inactive.
      openWhenHidden: true,
      onopen: async (resp) => {
        if (resp.ok) {
          return;
        }

        if (resp.status === 403) {
          throw new Error(
            "Authentication failed. Please ensure your credentials are valid.",
          );
        }

        throw new Error((await resp.json()).message);
      },
      onmessage: (msg) => {
        const chunk = JSON.parse(msg.data) as Chunk;
        handleChunk(chunk);
      },
      onerror: (err) => {
        let msg = err.message;
        if (msg.includes("Failed to fetch")) {
          msg =
            "Failed to connect to server. Please ensure your server URL is correct.";
        }
        handleError(msg);
        throw err;
      },
    },
  );
};

Submitting Feedback

If you want to submit feedback (e.g., thumbs up/down) for a chat response, you can use the api/chat/<chat_id>/feedback route. The chat_id can be obtained from the chat response chunks.

fetch(
  `https://api.runllm.com/api/chat/${chat_id}/feedback`,
  {
    method: "POST",
    headers: {
      cors: "no-cors",
      "Access-Control-Allow-Origin": "*",
      Accept: "*",
      "Content-Type": "application/json",
      "x-api-key": api_key,
    },
    body: JSON.stringify({
      // action can be 'upvote', 'downvote', or any emoji.
      // We aggregate the 'upvote' and 'downvote' counts in the admin UI.
      action: "upvote",
      // Specifies where the feedback comes from.
      // "web" indicates it is coming from your documentation site.
      feedback_key: "web",
    }),
  },
)

Retrieving Chat History

You can use https://api.runllm.com/api/session/${session_id} to obtain the entire chat history for a session with your assistant.
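
A minimal sketch of calling this endpoint, assuming it accepts a GET request authenticated with the same x-api-key header used by the other APIs:

const resp = await fetch(
  `https://api.runllm.com/api/session/${session_id}`,
  {
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": api_key,
    },
  },
);
// The exact shape of the history payload is not yet documented.
const history = await resp.json();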