Customizing OpenAI with your Data (1 / 2)

When we ask OpenAI a question that requires specific context, for example:

Who is my father?

OpenAI will either hallucinate an answer or reply that it doesn't know.

To help OpenAI answer specific questions, you can provide extra contextual information in the prompt itself.

My father lives in Barcelona and is 50 years old.

If we then ask OpenAI the same question, it will answer using the contextual information provided in the prompt.

Embeddings

OpenAI provides a capability known as text embeddings to measure the relatedness of text strings.

For every block of text, chapter, or subject, we can send that information to OpenAI's Embedding service and receive back its embedding data (i.e. a vector of floating-point numbers). Example request:

TsgcHTTP_API_OpenAI._CreateEmbeddings('text-embedding-ada-002', 'My father lives in Barcelona and is 50 years old.');

The response from OpenAI will look something like this:

{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ]
}
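
To work with the vector in Delphi, the list of numbers has to be extracted from the JSON response. The following is a minimal sketch, assuming a recent Delphi version with the standard System.JSON unit and assuming the response arrives as a plain JSON string. ExtractEmbedding is a hypothetical helper, not part of sgcWebSockets, and the short response used here is a shortened stand-in for a real one.

```pascal
program ParseEmbedding;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: returns the first embedding vector found in the
// "data" array of an embeddings response.
function ExtractEmbedding(const aResponse: string): TArray<Double>;
var
  oRoot: TJSONValue;
  oData, oEmbedding: TJSONArray;
  i: Integer;
begin
  Result := nil;
  oRoot := TJSONObject.ParseJSONValue(aResponse);
  if oRoot = nil then
    raise Exception.Create('Response is not valid JSON');
  try
    oData := (oRoot as TJSONObject).GetValue('data') as TJSONArray;
    if (oData = nil) or (oData.Count = 0) then
      raise Exception.Create('Response contains no embedding data');
    oEmbedding := (oData.Items[0] as TJSONObject).GetValue('embedding') as TJSONArray;
    if oEmbedding = nil then
      raise Exception.Create('Response contains no embedding array');
    SetLength(Result, oEmbedding.Count);
    for i := 0 to oEmbedding.Count - 1 do
      Result[i] := (oEmbedding.Items[i] as TJSONNumber).AsDouble;
  finally
    oRoot.Free;
  end;
end;

const
  // Shortened stand-in for a real response (assumption: 3 values only).
  cResponse = '{"data":[{"embedding":[-0.0069,-0.0053,-0.0240],"index":0,"object":"embedding"}]}';
var
  vVector: TArray<Double>;
begin
  vVector := ExtractEmbedding(cResponse);
  Writeln('Dimensions: ', Length(vVector));
end.
```

The resulting array is what gets stored and later compared against the vector of the question.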

Once we have collected the embeddings for the different pieces of information we want our chatbot to understand, we need to store them somewhere persistent (for example, a vector database). This step is done only once per piece of information: an embedding is recalculated only when the underlying text changes.

Finally, when we want to ask the chatbot a question, we first convert the query to a vector. We then search the previously created database for the vector most similar to the query. Once found, we add the text associated with that vector to the question as context.

Simple Example

Let's create a simple example using embeddings and the sgcWebSockets library. First we describe our family and calculate the vector for each description.

oOpenAI := TsgcHTTP_API_OpenAI.Create(nil);
oOpenAI.OpenAIOptions.ApiKey := '<your api key>';
oOpenAI._CreateEmbeddings('text-embedding-ada-002', 'My father lives in Barcelona and is 50 years old.');
oOpenAI._CreateEmbeddings('text-embedding-ada-002', 'My mother lives in Berlin and is 47 years old.');
oOpenAI._CreateEmbeddings('text-embedding-ada-002', 'My sister lives in Seoul and is 28 years old.');
 

The previous results can be stored in a table where every row holds a prompt and its vector data.

Prompt                                               Vector
My father lives in Barcelona and is 50 years old.    [0.000742552,-0.0049907574...]
My mother lives in Berlin and is 47 years old.       [-0.027452856,-0.0023051118...]
My sister lives in Seoul and is 28 years old.        [-0.007873567,-0.014787777...]

Now that we've stored our vectors, we convert the question we will send to ChatGPT into a vector:

oOpenAI := TsgcHTTP_API_OpenAI.Create(nil);
oOpenAI.OpenAIOptions.ApiKey := '<your api key>';
vVectorQuery := oOpenAI._CreateEmbeddings('text-embedding-ada-002', 'Who is my father?');

Then we search the database for the stored vector that is most similar to the question's vector. Find below a pseudo-code example:

// search for the most similar vector using cosine similarity
vBestSimilarity := -2; // lower than any possible cosine similarity (range is -1 to 1)
Database.First;
while not Database.EOF do
begin
  vCosineSimilarity := VectorCosineSimilarity(vVectorQuery, Database.FieldByName('Vector'));
  if vCosineSimilarity > vBestSimilarity then
  begin
    // remember the best score so far and the prompt it belongs to
    vBestSimilarity := vCosineSimilarity;
    vContext := Database.FieldByName('Prompt').AsString;
  end;
  Database.Next;
end;
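
The pseudo-code above relies on a VectorCosineSimilarity function that is not shown. Below is one possible sketch of it, together with a search over an in-memory array instead of a database. The record type TEmbeddingRow, the function FindBestPrompt and the toy 3-dimensional vectors are assumptions made for illustration only (real text-embedding-ada-002 vectors have 1536 dimensions), and both vectors are assumed to be already parsed into arrays of Double.

```pascal
program CosineSearch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical in-memory stand-in for one row of the embeddings table.
  TEmbeddingRow = record
    Prompt: string;
    Vector: TArray<Double>;
  end;

// Cosine similarity: dot(a, b) / (|a| * |b|), a value between -1 and 1.
function VectorCosineSimilarity(const a, b: TArray<Double>): Double;
var
  i: Integer;
  vDot, vNormA, vNormB: Double;
begin
  if (Length(a) = 0) or (Length(a) <> Length(b)) then
    raise EArgumentException.Create('Vectors must be non-empty and of equal length');
  vDot := 0;
  vNormA := 0;
  vNormB := 0;
  for i := 0 to High(a) do
  begin
    vDot := vDot + a[i] * b[i];
    vNormA := vNormA + a[i] * a[i];
    vNormB := vNormB + b[i] * b[i];
  end;
  if (vNormA = 0) or (vNormB = 0) then
    Exit(0); // a zero vector has no direction; treat it as unrelated
  Result := vDot / (Sqrt(vNormA) * Sqrt(vNormB));
end;

// Returns the prompt whose vector is most similar to the query vector,
// or an empty string when there are no rows.
function FindBestPrompt(const aQuery: TArray<Double>;
  const aRows: TArray<TEmbeddingRow>): string;
var
  i: Integer;
  vSimilarity, vBest: Double;
begin
  Result := '';
  vBest := -2; // below the minimum possible cosine similarity
  for i := 0 to High(aRows) do
  begin
    vSimilarity := VectorCosineSimilarity(aQuery, aRows[i].Vector);
    if vSimilarity > vBest then
    begin
      vBest := vSimilarity;
      Result := aRows[i].Prompt;
    end;
  end;
end;

var
  vRows: TArray<TEmbeddingRow>;
begin
  // Toy vectors chosen by hand, not real embeddings.
  SetLength(vRows, 2);
  vRows[0].Prompt := 'My father lives in Barcelona and is 50 years old.';
  vRows[0].Vector := TArray<Double>.Create(1.0, 0.0, 0.0);
  vRows[1].Prompt := 'My sister lives in Seoul and is 28 years old.';
  vRows[1].Vector := TArray<Double>.Create(0.0, 1.0, 0.0);
  // The query points mostly along the first axis, so the first row wins.
  Writeln(FindBestPrompt(TArray<Double>.Create(0.9, 0.1, 0.0), vRows));
end.
```

A linear scan like this is fine for a handful of rows. For large collections a vector database or an index would normally take over the search.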

 

Finally, we ask ChatGPT the question, adding the prompt we found as contextual information:

vQuestion := 'Who is my father?';
ChatBot := TsgcAIOpenAIChatBot.Create(nil);
ChatBot.OpenAIOptions.ApiKey := '<your api key>';
ShowMessage(ChatBot.ChatAsUser('Answer the question based on the context below.\n\nContext:\n' + vContext + '\nQuestion:' + vQuestion + '\nAnswer:')); 
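
Note that Delphi string literals do not treat '\n' as an escape sequence, so the text above contains a literal backslash followed by the letter n. If real line breaks are preferred, the prompt could be assembled with the sLineBreak constant instead. The sketch below uses a hypothetical BuildPrompt helper; how the library encodes line breaks when it builds the request is not covered here and is left as an assumption to verify.

```pascal
program BuildPromptDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: joins the instructions, the context and the question
// using real line breaks (sLineBreak is declared in the System unit).
function BuildPrompt(const aContext, aQuestion: string): string;
begin
  Result := 'Answer the question based on the context below.' + sLineBreak +
    sLineBreak +
    'Context:' + sLineBreak +
    aContext + sLineBreak +
    'Question: ' + aQuestion + sLineBreak +
    'Answer:';
end;

begin
  Writeln(BuildPrompt('My father lives in Barcelona and is 50 years old.',
    'Who is my father?'));
end.
```

The result of BuildPrompt could then be passed to ChatBot.ChatAsUser in place of the concatenated string.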