Today, product management is witnessing a paradigm shift. The deterministic approach to software development, which dominated the last decade, is giving way to probabilistic methods, reshaping how product managers define product specifications and user experiences. The emergence and adoption of Probabilistic User Interfaces (PUIs) have sparked this trend, transforming the static, predefined user interfaces we are accustomed to into dynamic, machine-learning-powered conversational agents.
During my time at Yahoo, we incorporated probabilistic features into a deterministic infrastructure. Our task was optimizing the news, sports, and finance recommendation engines serving over 1.4 billion users, all powered by machine learning models embedded within the deterministic apparatus of the website, its server software, and related components.
Today, software applications remain predominantly deterministic, often accentuated with probabilistic modules. However, the advent and mass adoption of ChatGPT, a probabilistic language model, has turned the tables.
This technology has conditioned users to expect a different kind of interface: an intelligent interactive agent built on probabilistic algorithms. This contrasts with the deterministic model of static buttons and predictable software responses, introducing the randomness and variability inherent in machine learning algorithms. As ChatGPT spreads into more markets and use cases, a shift is apparent: users are adopting natural language user interfaces at an accelerating pace.
However, as one would suspect, probabilistic user interfaces carry a degree of unpredictability. Their behavior can vary each time you use them. Even with hyperparameters such as 'temperature' and 'top_p' carefully tuned, a machine learning model still 'rolls the dice' to select the next token.
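To make this concrete, here is a minimal sketch of how temperature and top_p shape, but do not eliminate, the dice roll. The function and its default values are illustrative, not any particular vendor's implementation: temperature rescales the logits, top_p (nucleus sampling) trims the candidate pool, and the final choice is still a random draw over what remains.

```python
import math
import random

def sample_token(logits, temperature=0.8, top_p=0.9, rng=None):
    """Sample one token index from raw logits.

    Illustrates why output stays stochastic: temperature and top_p
    reshape the distribution, but sampling still 'rolls the dice'.
    """
    rng = rng or random.Random()
    # Temperature rescales logits: lower -> sharper distribution,
    # but any temperature > 0 leaves the choice random.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then sample within it.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With strongly peaked logits the same token wins almost every time; with flat logits, repeated calls return different tokens, which is exactly the variability product managers now have to design around.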
This fundamental departure from deterministic models significantly alters the role of product management. Previously, product managers would meticulously hard-code product specifications. Now, they must define the levers and boundaries of probabilistic behavior instead. For example, specifying a greeting message used to be a clear-cut feature in product design. But when you ask a model like ChatGPT to devise a greeting message, it may formulate something distinctly different each time. Thus, a product manager's role is evolving: stipulate that a greeting message be present, then delineate evaluation metrics, for instance, the degree of friendliness or sincerity.
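One way to picture this spec-as-evaluation shift is a small checker that scores whatever greeting the model produced, rather than asserting an exact string. The keyword lists and thresholds below are illustrative stand-ins for a real evaluation rubric, not a production metric.

```python
# Hypothetical rubric: a crude lexical proxy for "friendliness".
FRIENDLY_WORDS = {"welcome", "glad", "happy", "great", "hello", "hi"}

def evaluate_greeting(message: str) -> dict:
    """Score a generated greeting against spec-level metrics
    instead of comparing it to one hard-coded string."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    friendliness = sum(w in FRIENDLY_WORDS for w in words) / max(len(words), 1)
    return {
        # The spec requires that *some* greeting exists...
        "has_greeting": any(w in {"hello", "hi", "welcome"} for w in words),
        # ...and that it meets metric thresholds, not exact wording.
        "friendliness": round(friendliness, 2),
        "concise": len(words) <= 25,
    }
```

The point of the sketch is the shape of the spec: the model is free to vary its wording each time, and the product requirement lives in the metrics the output must satisfy.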
While AI product managers embrace this transformation, a looming challenge persists: the turbulent phenomenon known as "LLM drift." Researchers from Stanford and UC Berkeley have studied shifts in the performance of the GPT models behind ChatGPT over time and outlined their findings in the paper "How Is ChatGPT's Behavior Changing over Time?" (https://arxiv.org/pdf/2307.09009.pdf).
The researchers' analysis revealed an unexpected and substantial shift in the performance of these models within mere months, a phenomenon they termed "LLM drift." Both GPT-3.5 and GPT-4, two extensively used LLM services, showed volatile performance on various tasks over time. For instance, GPT-4 identified prime numbers with 97.6% accuracy in March 2023, but this plummeted to a mere 2.4% by June 2023. These findings underscore the importance of consistent, ongoing monitoring and evaluation of LLM service behavior.
This indicates a substantial, often disconcerting, alteration in a model's performance over a relatively short span. Consider a scenario where a product manager evaluates a SaaS LLM solution and proceeds to define product specifications. By the time the engineering team operationalizes the product, LLM drift could render the feature ineffective or, worse still, hazardously unpredictable.
Therefore, deploying AI-powered, probabilistic user interfaces necessitates additional risk analysis. A robust monitoring mechanism for LLM drift is also critical to mitigate the risks associated with shifts in the behavior of LLM services. This involves regular, systematic evaluation of LLM performance on diverse tasks over time, analysis of the deviations, and adjustment of the available levers in response to observed drift. Continuous assessment of the model's accuracy or task-specific performance, alongside keen observation of any changes in user interaction patterns, plays a crucial role in this monitoring process.
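Such a monitoring mechanism can be as simple as re-running a fixed evaluation set on a schedule and flagging drops below a recorded baseline. The sketch below assumes a `call_model` callable as a placeholder for the actual LLM API call; the tolerance value is an illustrative choice.

```python
def accuracy(call_model, eval_set):
    """Fraction of (prompt, expected) pairs the model answers correctly."""
    correct = sum(call_model(prompt) == expected for prompt, expected in eval_set)
    return correct / len(eval_set)

def check_drift(call_model, eval_set, baseline, tolerance=0.05):
    """Flag drift when accuracy falls more than `tolerance` below baseline.

    `call_model` is a placeholder for the real LLM API call;
    `baseline` is the score recorded when the feature was specified.
    """
    score = accuracy(call_model, eval_set)
    return {
        "score": score,
        "baseline": baseline,
        "drifted": score < baseline - tolerance,
    }
```

Running this on a fixed cadence, say nightly, and alerting on `drifted` gives the product team an early warning before a silent model update reaches users.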
This represents a fundamental transformation in product management: a shift from a deterministic ethos to embracing the probabilistic world of machine learning. It uniquely positions product managers as shepherds of an evolving technological landscape, fostering probabilistic user experiences.
If you want to learn more about AI and Generative AI for product management, please visit our website to join one of our classes at https://aiproductinstitute.com.