An introduction to PromptIDE

Mondo Entertainment Updated on 2024-01-28

The following is a translation from xAI's website about PromptIDE.

xAI PromptIDE is an integrated development environment for prompt engineering and interpretability research. It accelerates prompt engineering with an SDK that allows sophisticated prompting techniques to be implemented, and with rich analytics that visualize the network's outputs. We use it heavily in the ongoing development of Grok.

We developed PromptIDE to give engineers and researchers in the community transparent access to Grok-1, the model that powers Grok. The IDE is designed to empower users and help them explore the capabilities of our large language models (LLMs) at a faster pace. At the heart of the IDE is a Python editor that, combined with a new SDK, allows complex prompting techniques to be implemented. When executing prompts in the IDE, users see useful analytics such as the tokenization, sampling probabilities, alternative tokens, and aggregated attention masks.

The IDE also provides a number of quality-of-life features. It automatically saves all prompts and has built-in version control. Analytics generated by running prompts can be stored permanently, allowing users to compare the outputs of different prompting techniques. Finally, users can upload small files, such as CSV files, and read them with a single Python function in the SDK. Combined with the SDK's concurrency capabilities, even fairly large files can be processed quickly.

We also want to build a PromptIDE community. Any prompt can be shared publicly at the click of a button. Users can decide whether to share only a single version of a prompt or an entire tree. Stored analytics can also be included when a prompt is shared.

PromptIDE is currently available only to members of our Early Access Program. Below, you will find a demo of the main features of the IDE.

Thanks, the xAI team.

Editor and SDK

At its core, PromptIDE is an editor and a Python SDK. The SDK provides a new programming paradigm that allows complex prompting techniques to be implemented elegantly. All Python functions are executed in an implicit context, which is a sequence of tokens. You can add tokens to the context manually with the prompt() function, or you can have the model generate tokens based on the current context with the sample() function. When sampling from the model, you can configure the generation by passing options as parameters to the function:

```python
async def sample(
    self,
    max_len: int = 256,
    temperature: float = 1.0,
    nucleus_p: float = 0.7,
    stop_tokens: Optional[list[str]] = None,
    stop_strings: Optional[list[str]] = None,
    rng_seed: Optional[int] = None,
    add_to_context: bool = True,
    return_attention: bool = False,
    allowed_tokens: Optional[Sequence[Union[int, str]]] = None,
    disallowed_tokens: Optional[Sequence[Union[int, str]]] = None,
    augment_tokens: bool = True,
) -> SampleResult:
    """Generates a model response based on the current prompt.

    The current prompt consists of all text that has been added to the prompt
    either since the beginning of the program or since the last call to
    `clear_prompt`.

    Args:
        max_len: Maximum number of tokens to generate.
        temperature: Temperature of the final softmax operation. The lower the
            temperature, the lower the variance of the token distribution. In
            the limit, the distribution collapses onto the single token with
            the highest probability.
        nucleus_p: Threshold of the top-p sampling technique: we rank all
            tokens by their probability and then only actually sample from the
            set of tokens that ranks in the top-p percentile of the
            distribution.
        stop_tokens: A list of strings, each of which will be mapped
            independently to a single token. If a string does not map cleanly
            to one token, it will be silently ignored. If the network samples
            one of these tokens, sampling is stopped and the stop token *is
            not* included in the response.
        stop_strings: A list of strings. If any of these strings occurs in the
            network output, sampling is stopped but the string that triggered
            the stop *will be* included in the response. Note that the
            response may be longer than the stop string. For example, if the
            stop string is "Hel" and the network predicts the single-token
            response "Hello", sampling will be stopped but the response will
            still read "Hello".
        rng_seed: Seed of the random number generator used to sample from the
            model outputs.
        add_to_context: If true, the generated tokens will be added to the
            context.
        return_attention: If true, returns the attention mask. Note that this
            can significantly increase the response size for long sequences.
        allowed_tokens: If set, only these tokens can be sampled. Invalid
            input tokens are ignored. Only one of `allowed_tokens` and
            `disallowed_tokens` must be set.
        disallowed_tokens: If set, these tokens cannot be sampled. Invalid
            input tokens are ignored. Only one of `allowed_tokens` and
            `disallowed_tokens` must be set.
        augment_tokens: If true, strings passed to `stop_tokens`,
            `allowed_tokens` and `disallowed_tokens` will be augmented to
            include both the passed token and the version with leading
            whitespace. This is useful because most words have two
            corresponding vocabulary entries: one with leading whitespace and
            one without.

    Returns:
        The generated text.
    """
```
The code above is executed by a Python interpreter that runs in its own web worker. Because multiple web workers can run at the same time, many prompts can be executed in parallel.
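For illustration, here is a minimal sketch of how prompt() and sample() might be combined in practice. The prompt text and parameter values are invented for this example, and the snippet assumes it runs inside the IDE's async execution environment described above:

```python
# Illustrative sketch only: extend the implicit context, then let the model continue it.
await prompt("Q: What is the capital of France?\nA:")  # append tokens to the implicit context
completion = await sample(
    max_len=32,           # generate at most 32 tokens
    temperature=0.7,      # moderate sampling variance
    stop_strings=["\n"],  # stop at the end of the answer line
)
print(completion)         # sample() returns the generated text
```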

Concurrency

The SDK uses Python coroutines, so multiple @prompt_fn-annotated Python functions can be processed concurrently. This can significantly speed up completion times, especially when working with CSV files.
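As a rough sketch of this pattern, assuming prompt_fn, prompt, and sample are provided by the SDK and that each @prompt_fn call runs in its own fresh context (the question strings are invented for illustration):

```python
import asyncio

@prompt_fn
async def answer_question(question: str) -> str:
    # Each @prompt_fn call runs in its own context, so concurrent calls do not interfere.
    await prompt(f"Question: {question}\nAnswer:")
    return await sample(max_len=64, temperature=0.2, stop_strings=["\n"])

async def main():
    questions = ["What is 2 + 2?", "Name a prime number greater than 10."]
    # Coroutines allow all prompt functions to run concurrently rather than one by one.
    answers = await asyncio.gather(*(answer_question(q) for q in questions))
    for question, answer in zip(questions, answers):
        print(question, "->", answer)
```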

User input

With the user_input() function, a prompt can be made interactive via a text box in the UI. The user_input() function blocks execution until the user enters a string in the text box, and then returns the entered string, which can be added to the context via the prompt() function, for example. Using these APIs, a chatbot can be implemented in only four lines:

```python
await prompt(PREAMBLE)  # PREAMBLE is a preamble string defined earlier in the prompt
while text := await user_input("Write a message"):
    await prompt(f"<|separator|>\n\nHuman: {text}<|separator|>\n\nAssistant:")
    await sample(max_len=1024, stop_tokens=["<|separator|>"], return_attention=True)
```
Files

Developers can upload small files to PromptIDE (up to 5 MiB per file and up to 50 MiB in total) and use them in their prompts. The read_file() function returns any uploaded file as a byte array. When combined with the concurrency features mentioned above, this can be used to implement batch-processing prompts, for example to evaluate a prompting technique on a variety of problems. The screenshot below shows a prompt that calculates the MMLU evaluation score.
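A rough sketch of such a batch-processing loop is shown below. The file name "questions.csv" and its "question" column are hypothetical, and whether read_file() must be awaited may differ in the actual SDK; the rest uses the prompt_fn, prompt, and sample primitives described above:

```python
import asyncio
import csv
import io

@prompt_fn
async def answer_question(question: str) -> str:
    await prompt(f"Question: {question}\nAnswer:")
    return await sample(max_len=64, temperature=0.2, stop_strings=["\n"])

async def run_batch():
    # read_file() returns the uploaded file as a byte array.
    raw = read_file("questions.csv")
    rows = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    questions = [row["question"] for row in rows]
    # Process all rows concurrently using the coroutine support described above.
    return await asyncio.gather(*(answer_question(q) for q in questions))
```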

Analytics

While a prompt is executing, users see detailed per-token analytics to help them better understand the model's output. The completion window shows the precise tokenization of the context, together with the numeric identifier of each token. When clicking on a token, users can also see the top-K tokens after the top-p threshold has been applied at that token.

When the user_input() function is used, a text box appears in the window while the prompt is running, and the user can enter their response there. The following screenshot shows the result of executing the chatbot code snippet above.

Finally, when the token visualization is not needed, the context can also be rendered as markdown to improve readability.
