Steve Jobs once called the computer "the bicycle of the mind." Less well known is the context of his metaphor: he was talking about a study of how efficiently different species on Earth move.
Image generated by DALL·E 3, prompt "Think of the computer as the bicycle of the mind".
The condor won and came in at the top of the list, surpassing all other species. Humans came in about a third of the way down the list. But once humans get on a bicycle, they far surpass the condor and go right to the top of the list. It made a big impression on me that humans are toolmakers, and that we can make tools that amplify our inherent abilities to an astonishing degree. For me, the computer has always been the bicycle of the mind, something that takes us far beyond our inherent capabilities. I think we're just in the early stages of this tool, very early stages. We've only walked a short distance and it's still in its formative stages, but we're already seeing huge changes. In my opinion, this is nothing compared to what will happen in the next 100 years.
Steve Jobs (1990).
# Cautious optimism
The role of LLMs in accelerating software development has been widely debated. Some argue that the quality of auto-generated code is so low that using these tools is a net negative. Others claim that the era of programming is over. Numerous studies have attempted to objectively assess the performance of LLMs on code-quality benchmark datasets such as HumanEval or MBPP. These assessments are essential for the development of the field, but they are not the focus of this article.
The purpose of this article is to provide practical advice for developers, especially those who have reservations about using these models. I believe that, used properly, this technology can bring a qualitative leap in a person's productivity.
To be clear, using LLMs does not necessarily mean that you will produce high-performance code. Even the most advanced models make serious mistakes. But it's important to remember that the best human engineers make mistakes, too. That's why we don't ship without guardrails, such as automated testing and peer review. LLMs don't change the nature of best practices, but they may change how easy those practices are to implement.
The process of delivering code today (image generated by DALL·E 3).
Does the speed of writing code really matter that much?
I'd be surprised if most professional software engineers spent most of their time typing on a keyboard. If that were the case, we might take "lines of code" more seriously as a measure of performance. I think engineers actually spend more of their time deciding what code to write in the first place. At the moment, LLMs don't have the capacity for this kind of high-level abstract thinking. OpenAI and many other organizations are working to make these steps possible, but this will most likely require more than increasing the number of model parameters or expanding the training data; it will take a whole new approach. Even so, LLMs can already speed up a large part of our work considerably.
# Write efficient functions
Let me give an example of the steps it takes to refine LLM output into something useful once a small snippet of code has been generated. Measuring the distance between a point on the globe and a set of other coordinates is a common task in geospatial analysis. Imagine a scenario like this: finding the offshore oil platform closest to a given vessel.
The first output you see is this:

    import math

    def haversine(lat1, lon1, lat2, lon2):
        # Radius of the Earth in kilometers
        R = 6371.0

        # Convert degrees to radians
        lat1_rad = math.radians(lat1)
        lon1_rad = math.radians(lon1)
        lat2_rad = math.radians(lat2)
        lon2_rad = math.radians(lon2)

        # Differences in coordinates
        dlat = lat2_rad - lat1_rad
        dlon = lon2_rad - lon1_rad

        # Haversine formula
        a = math.sin(dlat / 2)**2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

        # Distance in kilometers
        distance = R * c
        return distance

    def find_closest_oil_platform(vessel_lat, vessel_lon, platforms):
        closest_platform = None
        min_distance = float('inf')

        for platform in platforms:
            distance = haversine(vessel_lat, vessel_lon, platform['latitude'], platform['longitude'])
            if distance < min_distance:
                min_distance = distance
                closest_platform = platform

        return closest_platform, min_distance

So, what about the quality of this code?
The quality of this code is not ideal. It has quite a few problems, most notably that it is hard to read because of cryptic and inconsistent variable naming, including single-character names in both upper and lower case. In addition, it lacks documentation and type hints, uses an unnecessary and inefficient for loop, and doesn't handle edge cases such as equidistant platforms.
After revision, it looks like this:

    from typing import Tuple, List
    import numpy as np

    def haversine_vectorized(vessel_lat: float, vessel_lon: float, platform_lats: np.ndarray, platform_lons: np.ndarray) -> np.ndarray:
        """
        Calculate the haversine distance between a vessel and multiple platforms in a vectorized manner.

        Parameters:
        - vessel_lat: Latitude of the vessel in decimal degrees.
        - vessel_lon: Longitude of the vessel in decimal degrees.
        - platform_lats: Numpy array of latitudes of oil platforms in decimal degrees.
        - platform_lons: Numpy array of longitudes of oil platforms in decimal degrees.

        Returns:
        - distances: Numpy array of distances from the vessel to each platform in kilometers.
        """
        # Convert decimal degrees to radians
        lat1, lon1, lat2, lon2 = map(np.radians, [vessel_lat, vessel_lon, platform_lats, platform_lons])

        # Haversine formula
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
        c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
        r = 6371  # Radius of Earth in kilometers
        return c * r

    def find_closest_platform(vessel_lat: float, vessel_lon: float, platform_lats: np.ndarray, platform_lons: np.ndarray) -> Tuple[List[int], float]:
        """
        Finds the closest oil platform(s) to a vessel given arrays of platform latitudes and longitudes, handling equidistant platforms.

        Parameters:
        - vessel_lat: Latitude of the vessel in decimal degrees.
        - vessel_lon: Longitude of the vessel in decimal degrees.
        - platform_lats: Numpy array of latitudes for oil platforms.
        - platform_lons: Numpy array of longitudes for oil platforms.

        Returns:
        - A tuple containing a list of indices of the closest platforms and the distance to them in kilometers.
        """
        # Calculate distances to all platforms
        distances = haversine_vectorized(vessel_lat, vessel_lon, platform_lats, platform_lons)

        # Find the minimum distance
        min_distance = np.min(distances)

        # Find all indices with the minimum distance
        closest_indices = np.where(distances == min_distance)[0].tolist()

        # Return the indices of all closest platforms and the minimum distance
        return closest_indices, min_distance

The revised code is significantly better. It's easier to read, adds documentation and type hints, and replaces the for loop with a more efficient vectorized computation.
However, how "good" the code is and, more importantly, whether it meets the requirements depends on the specific context in which it will run. You can't effectively assess code quality from a few lines alone, and that's as true for humans as it is for LLMs.
For example, does the accuracy of this code meet the user's expectations? Will it run frequently? Once a year, or every microsecond? What hardware will it run on? Are the expected usage and scale worth the small optimizations? Is it cost-effective to make them once you take your salary into account?
Let's evaluate this code against the factors above.
In terms of accuracy, while the haversine formula performs well, it is not the best choice because it treats the Earth as a perfect sphere, when in fact the Earth is closer to an oblate spheroid. This difference becomes important when millimeter-accurate measurements are required over enormous distances. If that level of precision is really needed, there are more precise formulas available, such as Vincenty's formulae, but they come with a performance trade-off. Because millimeter-level accuracy is not necessary for the users of this code (in fact, it is irrelevant given the error in the ship coordinates derived from satellite imagery), the haversine formula is a reasonable choice in terms of accuracy.
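To get a rough feel for the scale of the spherical assumption, the sketch below computes the same haversine distance with three different Earth radii: the mean radius used above (6371 km) and the WGS84 equatorial and polar radii (6378.137 km and 6356.752 km). The coordinates and the `haversine_km` helper are made up for illustration, and the spread is only an order-of-magnitude indication, not a rigorous error bound.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere of the given radius, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical vessel and platform positions, for illustration only.
vessel = (28.0, -90.0)
platform = (29.0, -88.5)

radii_km = {"mean": 6371.0, "equatorial (WGS84)": 6378.137, "polar (WGS84)": 6356.752}
distances = {
    name: haversine_km(*vessel, *platform, radius_km=radius)
    for name, radius in radii_km.items()
}

# Spread between the largest and smallest result, relative to the mean-radius result.
spread = max(distances.values()) - min(distances.values())
relative_spread = spread / distances["mean"]

for name, distance in distances.items():
    print(f"{name:>20}: {distance:8.3f} km")
print(f"relative spread: {relative_spread:.2%}")
```

Because the distance scales linearly with the radius, the relative spread is about 0.3% whatever coordinates are chosen.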
Is it fast enough? This calculation is very efficient, especially with vectorized computation, considering that only a few thousand offshore oil platforms need to be checked. But if the application became calculating the distance to any point on the shore (there are hundreds of millions of points on the shoreline), then a "divide and conquer" strategy might be more appropriate. In practice, this function is designed to run about 100 million times per day on a low-spec virtual machine, as cost-efficiently as possible, given the need to save on compute costs.
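The throughput figure above translates into a per-call time budget with a little arithmetic. The sketch assumes the calls are spread evenly over the day and processed one at a time on a single core, which is a simplification for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

calls_per_day = 100_000_000  # "about 100 million times per day" from the text

# Assumption: calls arrive evenly and are handled sequentially on one core.
calls_per_second = calls_per_day / SECONDS_PER_DAY
budget_per_call_us = SECONDS_PER_DAY / calls_per_day * 1_000_000

print(f"{calls_per_second:,.0f} calls per second")
print(f"{budget_per_call_us:.0f} microseconds per call")
```

That works out to roughly 1,157 calls per second, or about 864 microseconds per call, which gives a concrete target to benchmark the function against.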
Based on this detailed background information, we can consider the implementation above reasonable. This also means that it should be tested (I don't usually recommend relying solely on LLMs for this) and peer reviewed by a human before it is finally merged.
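A minimal sketch of the kind of automated test meant here, using cases whose answers are known in closed form: a quarter of the equator spans a central angle of π/2, so the distance should be R·π/2. The scalar `haversine_km` helper is a stand-in written for this example, not the function generated above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Scalar haversine distance in kilometers (stand-in for the code under test)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp 1 - a at zero to guard against tiny negative values from rounding.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1.0 - a)))
    return EARTH_RADIUS_KM * c


def test_zero_distance():
    assert haversine_km(10.0, 20.0, 10.0, 20.0) == 0.0


def test_quarter_of_equator():
    expected = EARTH_RADIUS_KM * math.pi / 2
    assert math.isclose(haversine_km(0.0, 0.0, 0.0, 90.0), expected, rel_tol=1e-9)


def test_symmetry():
    there = haversine_km(28.0, -90.0, 29.0, -88.5)
    back = haversine_km(29.0, -88.5, 28.0, -90.0)
    assert math.isclose(there, back, rel_tol=1e-12)


if __name__ == "__main__":
    test_zero_distance()
    test_quarter_of_equator()
    test_symmetry()
    print("all checks passed")
```

The same functions can be collected by a test runner such as pytest without modification, since they follow the `test_` naming convention.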
# Accelerate forward
Using LLMs to generate useful functions like the one above saves time, but the value they bring grows dramatically when you start using them to generate entire libraries, handle dependencies between modules, write documentation, create visualizations (through multimodal capabilities), write README files, develop command-line interfaces, and more.
Let's try to create, train, evaluate, and run inference with a new computer vision model from scratch with the help of LLMs. Our motivation and inspiration is a recently published paper, "Keypoints Method for Recognition of Ship Wake Components in Sentinel-2 Images by Deep Learning" (Del Prete et al., IEEE GRSL, 2023).
A ship and its wake shown in Sentinel-2 satellite imagery.
Why do we need to care about the direction of the ship in satellite imagery, and what are the difficulties of this task?
Knowing the direction a vessel is heading from a static image is extremely valuable for organizations that need to monitor human activity at sea. For example, if a ship is heading towards a marine protected area, this may mean that vigilance or interception measures are required. Often, globally available satellite imagery does not have sufficient resolution to accurately determine the orientation of a ship, especially small vessels that occupy only a few pixels in the image (e.g., Sentinel-2 has a resolution of 10 meters per pixel). However, even small boats can leave a fairly noticeable wake in the water, giving us a clue as to the direction the vessel is heading, even if the stern of the boat is not directly identifiable.
The study is compelling because it uses a model based on EfficientNetB0, which is small enough to be applied at scale without spending too much on computing resources. While I didn't find a concrete implementation, the authors published the dataset, including annotations, which is a commendable step.
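As a rough illustration of how two keypoints could be turned into a heading, the sketch below assumes one keypoint marks the vessel and another marks the far end of its wake, and that the image is north-up with x increasing to the right and y increasing downward. Those conventions, and the function name, are assumptions for this example rather than details taken from the paper.

```python
import math


def heading_degrees(vessel_xy, wake_end_xy):
    """Compass-style heading in degrees (0 = up/north, 90 = right/east).

    The vessel is assumed to travel from the far end of the wake towards
    its own position.
    """
    dx = vessel_xy[0] - wake_end_xy[0]
    dy = vessel_xy[1] - wake_end_xy[1]
    # Image y grows downward, so "north" corresponds to negative dy.
    return math.degrees(math.atan2(dx, -dy)) % 360.0


# Hypothetical pixel coordinates: the wake trails below the vessel in the image.
print(heading_degrees(vessel_xy=(50, 20), wake_end_xy=(50, 60)))
```

In this made-up case the wake lies directly below the vessel in the image, so the vessel is heading straight up, towards the top of the frame.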
Let's start our exploration!
As with any new machine learning project, visualizing your data first is an enlightening step.

    import os
    import json
    from PIL import Image, ImageDraw
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Define the path to your data directory
    data_dir = "/path/to/your/data"  # Adjust this to the path of your data directory
    annotations_dir = os.path.join(data_dir, "annotations")
    images_dir = os.path.join(data_dir, "imgs")

    # Initialize Seaborn for better visual aesthetics
    sns.set(style="whitegrid", palette="muted")

    # Create lists to hold file paths for images and their corresponding annotations
    image_files = []
    annotation_files = []

    # Loop through the annotations directory to get the list of annotation files
    for annotation_file in os.listdir(annotations_dir):
        if annotation_file.endswith(".json"):
            annotation_path = os.path.join(annotations_dir, annotation_file)
            image_file = annotation_file.replace(".json", ".png")  # Assuming image file names match annotation file names
            image_path = os.path.join(images_dir, image_file)

            # Check if the corresponding image file exists
            if os.path.exists(image_path):
                annotation_files.append(annotation_path)
                image_files.append(image_path)

    # Plotting
    num_examples = min(len(image_files), 10)  # Limiting to 10 examples for visualization
    fig, axes = plt.subplots(2, 5, figsize=(20, 8))

    for idx, (image_path, annotation_path) in enumerate(zip(image_files[:num_examples], annotation_files[:num_examples])):
        # Load the image
        img = Image.open(image_path).convert("RGB")  # Ensure the image is treated as RGB
        draw = ImageDraw.Draw(img)

        # Load the corresponding annotations and draw keypoints
        with open(annotation_path, 'r') as f:
            annotations = json.load(f)
            for point in annotations["tooltips"]:
                x, y = point["x"], point["y"]
                # Draw keypoints in cyan for visibility
                draw.ellipse([(x - 10, y - 10), (x + 10, y + 10)], fill='cyan', outline='black')

        # Plot the image with keypoints
        ax = axes[idx // 5, idx % 5]
        ax.imshow(img)
        ax.axis('off')
        ax.set_title(f"Image {idx + 1}")

    plt.tight_layout()
    plt.show()

Here are the results:
Sample of vessel and wake keypoint annotation (cyan) from Sentinel-2 satellite imagery.
The results show several key findings: first, the number of keypoints (cyan dots) varies from image to image; second, the images are all grayscale and come in a variety of shapes; third, although the sample is small, it shows considerable internal variation. With these foundations in mind, we'll write the code that loads the dataset, defines the neural network, trains it, and validates it. We'll provide some background information to the LLM before requesting the code.
This results in the following output:

    import os
    import json
    import torch
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms
    from torchvision.io import read_image
    from typing import Tuple, List

    class WakeDataset(Dataset):
        """
        Custom Dataset for loading wake detection imagery and annotations, with preprocessing for EfficientNet.
        Images are grayscale and will be converted to three-channel images for model compatibility.
        """

        def __init__(self, annotations_dir: str, images_dir: str, transform=None):
            """
            Initializes the dataset with paths to annotations and images directories, and a transform.

            Parameters:
            - annotations_dir: Path to the directory containing annotation JSON files.
            - images_dir: Path to the directory containing grayscale image files.
            - transform: A torchvision.transforms transform to be applied on a sample.
            """
            self.annotations_dir = annotations_dir
            self.images_dir = images_dir
            self.transform = transform if transform else self.default_transform()
            self.annotations = self._load_annotations()

        def default_transform(self):
            """
            Defines the default transform to resize images, convert them to three-channel format, and to tensors.

            Returns:
            - A torchvision.transforms composition for image preprocessing.
            """
            return transforms.Compose([
                transforms.Resize((224, 224)),  # Resize to the input size for EfficientNet
                transforms.Grayscale(num_output_channels=3),  # Convert grayscale to three-channel
                transforms.ConvertImageDtype(torch.float),  # Ensures image is a float tensor
            ])

        def _load_annotations(self) -> List[Tuple[str, List[int]]]:
            """
            Loads and filters annotations with at least 2 keypoints.

            Returns:
            - A list of tuples, where each tuple contains the image filename and its keypoints.
            """
            annotations = []
            for filename in os.listdir(self.annotations_dir):
                if not filename.endswith('.json'):
                    continue
                filepath = os.path.join(self.annotations_dir, filename)
                with open(filepath, 'r') as f:
                    data = json.load(f)
                    if len(data['tooltips']) >= 2:  # Filter annotations with at least 2 keypoints
                        keypoints = [item for sublist in [[kp['x'], kp['y']] for kp in data['tooltips']] for item in sublist]
                        annotations.append((filename.replace('.json', '.png'), keypoints))
            return annotations

        def __len__(self) -> int:
            """
            Returns the number of items in the dataset.
            """
            return len(self.annotations)

        def __getitem__(self, idx) -> Tuple[torch.Tensor, torch.Tensor]:
            """
            Retrieves an item from the dataset at the specified index, with preprocessing for EfficientNet.

            Parameters:
            - idx: Index of the item to retrieve.

            Returns:
            - A tuple containing the preprocessed image tensor and its corresponding keypoints tensor.
            """
            img_name, keypoints = self.annotations[idx]
            img_path = os.path.join(self.images_dir, img_name)
            image = read_image(img_path)  # Load image
            keypoints = torch.tensor(keypoints + [0] * (6 - len(keypoints)), dtype=torch.float)  # Pad keypoints
            if self.transform:
                image = self.transform(image)
            return image, keypoints

The resulting code looks reasonable, and seems to have picked up on some of the details of the problem. It's not perfect (e.g., choosing to convert the image from grayscale to RGB in default_transform was a strange decision, and I personally prefer pathlib over os.listdir), but we can leave those details aside for now.
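For comparison, here is a small sketch of the pathlib-based alternative mentioned above: pairing each annotation file with its image. It assumes the same layout as the earlier scripts (JSON annotations whose names match PNG images); the function name and the temporary demo files are made up for this example.

```python
import tempfile
from pathlib import Path


def paired_files(annotations_dir, images_dir):
    """Return (image_path, annotation_path) pairs for annotations that have an image."""
    annotations_dir, images_dir = Path(annotations_dir), Path(images_dir)
    pairs = []
    for annotation_path in sorted(annotations_dir.glob("*.json")):
        # Same stem, different suffix and directory: a.json -> imgs/a.png
        image_path = images_dir / annotation_path.with_suffix(".png").name
        if image_path.exists():
            pairs.append((image_path, annotation_path))
    return pairs


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    ann_dir, img_dir = Path(tmp) / "annotations", Path(tmp) / "imgs"
    ann_dir.mkdir()
    img_dir.mkdir()
    (ann_dir / "a.json").write_text("{}")
    (ann_dir / "b.json").write_text("{}")  # no matching image, so it is skipped
    (img_dir / "a.png").write_bytes(b"")
    demo_pairs = [(img.name, ann.name) for img, ann in paired_files(ann_dir, img_dir)]

print(demo_pairs)
```

Sorting the glob results keeps the pairing order deterministic, which `os.listdir` does not guarantee.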

    import torch
    import torch.nn as nn
    from torchvision import models
    from typing import Tuple

    class EfficientNetB0KeypointDetector(nn.Module):
        """
        A custom EfficientNet-B0 model for keypoint detection in grayscale images.
        The network modifies the input layer to accept single-channel images and
        adds a custom output layer for predicting a fixed number of keypoints.
        """

        def __init__(self, num_keypoints: int = 3):
            """
            Initializes the model.

            Parameters:
            - num_keypoints: The number of keypoints to predict. Default is 3,
              corresponding to the maximum number of keypoints in the dataset.
            """
            super(EfficientNetB0KeypointDetector, self).__init__()
            self.num_keypoints = num_keypoints

            # Load a pre-trained EfficientNet-B0
            self.efficientnet_b0 = models.efficientnet_b0(pretrained=True)

            # Modify the first convolutional layer to accept single-channel (grayscale) images
            original_first_conv = self.efficientnet_b0.features[0][0]
            self.efficientnet_b0.features[0][0] = nn.Conv2d(
                in_channels=1,
                out_channels=original_first_conv.out_channels,
                kernel_size=original_first_conv.kernel_size,
                stride=original_first_conv.stride,
                padding=original_first_conv.padding,
                bias=False)

            # Modify the classifier to predict keypoints (num_keypoints * 2 because x, y for each keypoint)
            self.efficientnet_b0.classifier = nn.Linear(
                in_features=self.efficientnet_b0.classifier[1].in_features,
                out_features=num_keypoints * 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """
            Forward pass of the network.

            Parameters:
            - x: Input tensor of shape (batch_size, 1, 224, 224)

            Returns:
            - A tensor of shape (batch_size, num_keypoints * 2) representing the predicted keypoints.
            """
            return self.efficientnet_b0(x)

    # Example usage
    if __name__ == "__main__":
        model = EfficientNetB0KeypointDetector()
        print(model)

        # Example input tensor (batch_size, channels, height, width)
        example_input = torch.randn(1, 1, 224, 224)
        output = model(example_input)
        print(output.shape)  # Expected shape: (1, 6) for 3 keypoints (each with x, y coordinates)


    import os
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader
    from dataset import WakeDataset, get_dataloader  # Assuming dataset.py is in the same directory
    from net import EfficientNetB0KeypointDetector
    from torchvision import transforms
    import wandb

    # Initialize Weights & Biases
    wandb.init(project="wake_model_llm_assist", entity="your_wandb_entity", mode="offline")

    def custom_collate_fn(batch):
        """
        Custom collate function to pad keypoints to a fixed size.

        Parameters:
        - batch: A list of tuples (image, keypoints).

        Returns:
        - Batched images and keypoints with padding as necessary.
        """
        images, keypoints = zip(*batch)
        images = torch.stack(images, 0)
        max_keypoints = 3 * 2  # Maximum number of keypoints (3) times 2 for (x, y)
        padded_keypoints = [torch.cat((kp, torch.zeros(max_keypoints - len(kp)))) for kp in keypoints]
        keypoints = torch.stack(padded_keypoints, 0)
        return images, keypoints

    def train_one_epoch(model: nn.Module, dataloader: DataLoader, optimizer: optim.Optimizer, device: torch.device):
        """
        Trains the model for one epoch.

        Parameters:
        - model: The neural network model.
        - dataloader: DataLoader providing the training data.
        - optimizer: Optimizer used for model training.
        - device: The device to train on.
        """
        model.train()
        total_loss = 0.0
        for images, keypoints in dataloader:
            images, keypoints = images.to(device), keypoints.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = nn.MSELoss()(outputs, keypoints)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        average_loss = total_loss / len(dataloader)
        wandb.log({"train_loss": average_loss})  # Metric key name is an assumption

    def evaluate(model: nn.Module, dataloader: DataLoader, device: torch.device):
        """
        Evaluates the model on the validation set.

        Parameters:
        - model: The neural network model.
        - dataloader: DataLoader providing the validation data.
        - device: The device to evaluate on.
        """
        model.eval()
        total_loss = 0.0
        with torch.no_grad():
            for images, keypoints in dataloader:
                images, keypoints = images.to(device), keypoints.to(device)
                outputs = model(images)
                loss = nn.MSELoss()(outputs, keypoints)
                total_loss += loss.item()

        average_loss = total_loss / len(dataloader)
        wandb.log({"val_loss": average_loss})  # Metric key name is an assumption

    def main():
        # Setup
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = EfficientNetB0KeypointDetector().to(device)
        optimizer = optim.Adam(model.parameters(), lr=1e-3)
        num_epochs = 1000  # Matches the number of epochs used later in this article

        # DataLoaders
        train_dataloader = get_dataloader('path/to/train/annotations', 'path/to/train/images', batch_size=4, transform=transforms.ToTensor(), collate_fn=custom_collate_fn)
        val_dataloader = get_dataloader('path/to/val/annotations', 'path/to/val/images', batch_size=4, transform=transforms.ToTensor(), collate_fn=custom_collate_fn)

        # Training loop
        for epoch in range(num_epochs):
            train_one_epoch(model, train_dataloader, optimizer, device)
            evaluate(model, val_dataloader, device)
            # Log model checkpoints, images, or other metrics as needed
            wandb.log({})

    if __name__ == "__main__":
        main()

When I first tried to run the code, I ran into a bunch of issues related to tensor shape mismatches (recall the RGB vs. grayscale images and the custom collate function from earlier). I tried debugging for a few minutes, but eventually chose to copy all the modules into the prompt and let the LLM help me figure out what the problem was.
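One way to sidestep part of this class of mismatch is to pad keypoints in exactly one place and check the resulting length before batching. The sketch below is plain Python with no PyTorch dependency; the helper name is illustrative, and the `max_keypoints=3` default mirrors the scripts above.

```python
def flatten_and_pad(tooltips, max_keypoints=3, pad_value=0.0):
    """Flatten [{'x': ..., 'y': ...}, ...] into [x1, y1, x2, y2, ...] of fixed length."""
    if len(tooltips) > max_keypoints:
        raise ValueError(f"expected at most {max_keypoints} keypoints, got {len(tooltips)}")
    flat = [float(value) for kp in tooltips for value in (kp["x"], kp["y"])]
    # Pad with pad_value so every sample has 2 * max_keypoints entries.
    flat += [pad_value] * (2 * max_keypoints - len(flat))
    return flat


# Hypothetical annotations with two and three keypoints.
samples = [
    [{"x": 12, "y": 40}, {"x": 30, "y": 44}],
    [{"x": 5, "y": 9}, {"x": 7, "y": 11}, {"x": 9, "y": 15}],
]
batch = [flatten_and_pad(sample) for sample in samples]

# Every row must have the same length before it is stacked into a tensor.
assert all(len(row) == 6 for row in batch)
print(batch)
```

With padding done once at load time, a collate function only needs to stack rows, so there is one fewer place for shapes to diverge.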
Pasting everything into the prompt resolved the outstanding issues, and I started training the model. But training was surprisingly slow. That's when I realized I wasn't taking advantage of the Metal Performance Shaders (MPS) backend on Apple silicon. Although I don't usually train models on my personal machine, and MPS is still relatively new to me, I decided to add a conditional to use it.
Given the moderate amount of training data (581 images in total) and the fact that EfficientNet was already pre-trained on ImageNet, I decided to train for 1000 epochs.
After 500 epochs, the training loss is still decreasing, but the validation loss seems to have converged (at least enough for a quick evaluation). Charts copied from Weights & Biases.
It's worth noting that while there are some foundation models pre-trained on satellite imagery (rather than ImageNet) that are likely to be more effective for this task, these networks are much larger than EfficientNet and therefore slower to train (and too new to be included in LLM training data).
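A conditional of this kind might look like the sketch below. To keep the example runnable without PyTorch installed, the availability flags are passed in as plain booleans; the function name and the CUDA-first preference order are assumptions, and the comment shows how the flags would be obtained from PyTorch.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device name, preferring CUDA, then Apple's MPS backend, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would come from the library, for example:
#   import torch
#   device = torch.device(pick_device_name(torch.cuda.is_available(),
#                                          torch.backends.mps.is_available()))

# On an Apple silicon laptop there is no CUDA, so MPS is selected.
print(pick_device_name(cuda_available=False, mps_available=True))
```

Keeping the choice in one small function means the training and inference scripts cannot drift apart in how they select a device.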
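A simple way to turn "the validation loss seems to have converged" into a check is a patience rule: stop once the best validation loss has not improved by some margin for a number of epochs. The sketch below is one such rule with made-up loss values; `patience` and `min_delta` are illustrative parameters, not settings from the training run.

```python
def epochs_until_plateau(val_losses, patience=3, min_delta=1e-3):
    """Return the 1-based epoch at which a patience rule would stop, or None."""
    best = float("inf")
    epochs_since_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: quick progress, then a plateau.
val_losses = [0.90, 0.60, 0.45, 0.4498, 0.4496, 0.4497, 0.30]
print(epochs_until_plateau(val_losses))
```

Here the rule stops at epoch 6, before the late improvement at epoch 7, which illustrates the usual trade-off: a short patience saves compute but can stop too early.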
Now, let's write an inference script and a command-line interface (CLI) to test the model we just trained.

    import torch
    from torchvision import transforms
    from PIL import Image
    import matplotlib.pyplot as plt
    import click

    # Assuming net.py contains the EfficientNetB0KeypointDetector class
    # and dataset.py contains the transforms used during training
    from net import EfficientNetB0KeypointDetector
    from dataset import WakeDataset  # Modify as necessary to import transforms

    def load_model(model_path):
        """Load the trained model from a file."""
        model = EfficientNetB0KeypointDetector()
        model.load_state_dict(torch.load(model_path))
        model.eval()  # Set the model to inference mode
        return model

    def preprocess_image(image_path, transform):
        """Load and preprocess an image."""
        image = Image.open(image_path).convert("L")  # Assuming grayscale conversion as in your dataset
        image = transform(image)
        # Add batch dimension (BxCxHxW)
        image = image.unsqueeze(0)
        return image

    def plot_keypoints(image, keypoints):
        """Plot keypoints on the image."""
        plt.imshow(image, cmap='gray')  # The PIL image can be displayed directly
        plt.scatter(keypoints[:, 0], keypoints[:, 1], s=50, marker='.', c='red')
        plt.show()

    @click.command()
    @click.argument('model_path', type=click.Path(exists=True))
    @click.argument('image_path', type=click.Path(exists=True))
    def run_inference(model_path, image_path):
        """Run inference on an image using a trained model."""
        # Use the same transforms as during training
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Grayscale(num_output_channels=3),
        ])

        model = load_model(model_path)
        image = preprocess_image(image_path, transform)

        # Perform inference
        with torch.no_grad():
            keypoints = model(image)
            keypoints = keypoints.view(-1, 2).cpu().numpy()  # Reshape and convert to numpy for plotting

        # Load original image for plotting
        original_image = Image.open(image_path).convert("L")
        plot_keypoints(original_image, keypoints)

    if __name__ == '__main__':
        run_inference()

Let's get started!
It's not perfect, but it's reasonable for the first pass.
You can find the full code on GitHub, with all modules, the model weights (at epoch 500), and a README. It took me less than an hour to generate the entire library, much less time than it took to write this article. All of this work was done in my personal dev environment: MacBook Air M2 + VS Code + Copilot + auto-format on save (using Black, isort, etc.) + a Python 3.9.6 virtual environment (.venv).
# Lessons learned
* Provide the model with as much relevant context as possible to help it solve the task. Keep in mind that the model lacks many of the assumptions that you might take for granted.
* LLM-generated code is often far from perfect, and the ways it fails are hard to predict. That is why it is very helpful to have a second assistant in the IDE, such as Copilot.
* When you rely heavily on LLMs, remember that the speed at which the code is generated is often the limiting factor. Avoid requesting duplicate code that doesn't need any changes; it not only wastes energy, but also slows you down.
* LLMs have a hard time "remembering" every line they output, and often need to be reminded of the current state (especially when there are dependencies that span multiple modules).
* Be skeptical of LLM-generated code. Validate as much as you can, using testing, visualization, etc.
* Invest time where it matters. I spent more time on the haversine function than on the neural network part (because the expected scale demands more performance), and for the neural network, I was more concerned with finding failures quickly.
# LLMs and the future of engineering
Only change is eternal.
Heraclitus.
Amid the LLM boom and the huge sums of money flowing into it, it's easy to expect perfection from the outset. However, making effective use of these tools requires us to experiment, learn, and adapt.
Will LLMs change the fundamental structure of software engineering teams? Perhaps; we are only at the trailhead of a new world. But LLMs have already democratized access to programming. Even people with no programming experience can build functional prototypes quickly and easily. If you have strict requirements, it may be wiser to apply LLMs in areas that you are already familiar with. In my personal experience, LLMs can reduce the time it takes to write efficient code by about 90%. If you find that they consistently output low-quality code, then maybe it's time to revisit your input.