AI product managers utilize natural language processing systems, such as ChatGPT, in various ways to deliver value to their customers and enhance their efficiency. In my Generative AI Product Management workshop and training, I assign my students the task of evaluating the performance of Generative AI SaaS solutions using zero-shot, one-shot, and few-shot techniques, and challenge them to identify a solution for their product during the workshop. However, pinpointing the right prompt for the solution is akin to searching for a needle in a haystack. In the real world, people often become impatient and jump straight to fine-tuning, which is an even more complex process. Therefore, before delving into fine-tuning, I urge my students to familiarize themselves with prompting best practices. More often than not, these best practices prove to be the game-changer, eliminating the need to engage in fine-tuning.
The main issues that arise when users interact with ChatGPT using writing prompts are listed below.
1) Ambiguity in prompts leads to subpar responses. ChatGPT is not capable of discerning the underlying intent if the prompt is vague or lacks specific details. Consequently, the response might not align with user expectations. For instance, if you ask ChatGPT to generate a summary, without specifying the desired length, it might create one that's either too short or too extensive.
2) There is a prevalent issue of ChatGPT inventing inaccurate or entirely fabricated information, especially concerning esoteric topics. Users seeking precise data might receive responses that seem legitimate but are in fact fictitious. This misinformation can be misleading and have detrimental effects if acted upon.
3) When confronted with complex tasks, ChatGPT often fails to accurately comprehend and execute them. It tends to have higher error rates with convoluted requests compared to simple ones. This can be particularly problematic in professional settings where complex queries are frequent, and high accuracy is crucial.
4) ChatGPT may rush to conclusions without sufficient reasoning, similar to a person who answers too quickly without thinking through the problem. This haste can result in errors or oversimplified answers that lack depth and critical analysis, rendering the output less valuable for professional or academic purposes.
5) ChatGPT is heavily dependent on the way the prompts are structured. Even slight changes in phrasing can lead to drastically different responses. This makes it difficult for users without extensive knowledge of ChatGPT’s intricacies to effectively communicate their queries.
I provided below the six strategies and 17 tactics that the OpenAI team put together to overcome these issues. You can access the complete documentation with examples at https://platform.openai.com/docs/guides/gpt-best-practices/six-strategies-for-getting-better-results . You can also download a PDF copy for these strategies and tactics from my LinkedIn post at https://www.linkedin.com/posts/adnanboz_gpt-best-practices-from-openai-activity-7071932106330832896-Eotr
Six strategies for getting better results
Write clear instructions
GPTs can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less GPTs have to guess at what you want, the more likely you’ll get it.
Include details in your query to get more relevant answers
Ask the model to adopt a persona
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Specify the desired length of the output
Provide reference text
GPTs can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.
Instruct the model to answer using a reference text
Instruct the model to answer with citations from a reference text
Split complex tasks into simpler subtasks
Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
Use intent classification to identify the most relevant instructions for a user query
For dialogue applications that require very long conversations, summarize or filter previous dialogue
Summarize long documents piecewise and construct a full summary recursively
Give GPTs time to "think"
If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a chain of reasoning before an answer can help GPTs reason their way toward correct answers more reliably.
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model's reasoning process
Ask the model if it missed anything on previous passes
Use external tools
Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents. A code execution engine can help GPTs do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.
Use embeddings-based search to implement efficient knowledge retrieval
Use code execution to perform more accurate calculations or call external APIs
Test changes systematically
Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an "eval").
Evaluate model outputs with reference to gold-standard answers
If you are looking to practice product management for a product that utilizes generative AI, then check out what the students are saying about the Generative AI for Product and Business Innovation workshop and training program.