ChatGPT became an accomplice, generating a fake dataset to support an unverified scientific hypothesis

Mondo Technology Updated on 2024-01-19

Since its release, ChatGPT has become a trusted helper for everyone; students and office workers alike rely on it every day.

This time, however, the "good helper" helped a little too much, and inexplicably became the accomplice in a story that reads: **a researcher used ChatGPT to create a fake dataset to support an unverified scientific hypothesis**.

Let's see what this is all about.

In a paper published November 9 in JAMA Ophthalmology, the authors used GPT-4 paired with Advanced Data Analysis (ADA), which incorporates Python to perform statistical analysis and create data visualizations. The AI-generated data compared the outcomes of two surgical procedures and incorrectly indicated that one produces better results than the other.

The study's co-authors say: "You can create a dataset in minutes that is not supported by real raw data and that runs counter to the available evidence."

AI's ability to produce convincing data has heightened concerns about research integrity among researchers and journal editors. Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California, said:

"Generative AI could previously be used to generate text that couldn't be detected by plagiarism software, but being able to create fake and realistic datasets is a higher-level concern.

This will make it easy for any researcher or research team to create fake measurements of non-existent patients, fake answers to questionnaires, or generate large sets of animal experiments. ”

The authors describe the result as a "seemingly authentic database." Under expert inspection, however, the data failed authenticity checks and showed clear signs of fabrication.

The authors asked GPT-4 ADA to create a dataset on people with keratoconus, a condition in which the cornea thins, which can impair focus and vision. For 15-20% of patients, treatment involves a corneal transplant performed using one of two procedures.

The first is penetrating keratoplasty (PK), in which all of the damaged corneal layers are surgically removed and replaced with healthy donor tissue. The second is deep anterior lamellar keratoplasty (DALK), which replaces only the front layers of the cornea, leaving the innermost layer intact.

The authors instructed the large language model to fabricate data supporting the conclusion that DALK produces better results than PK. To do so, they asked the model to show a statistical difference in an imaging test that assesses corneal shape and detects irregularities, and a difference in how much trial participants' visual acuity improved after surgery compared with before.
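For context, the statistical demonstration the model was asked to fake boils down to a standard two-group comparison. Below is a minimal sketch in Python (the language ADA itself runs); the group sizes mirror the article, but all numbers are invented placeholders, not data from the study.

```python
# A sketch of the kind of two-group comparison the model was asked to
# fabricate a "significant" result for: improvement in visual acuity for
# DALK vs. PK patients. All numbers below are hypothetical placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
dalk_improvement = rng.normal(loc=0.35, scale=0.10, size=160)  # hypothetical
pk_improvement = rng.normal(loc=0.30, scale=0.10, size=140)    # hypothetical

# Welch's two-sample t-test: does mean improvement differ between procedures?
t_stat, p_value = stats.ttest_ind(dalk_improvement, pk_improvement,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```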

The AI-generated data covered 160 male and 140 female participants and showed that those who underwent DALK scored better on both the vision and the imaging tests, a finding that contradicts real clinical trials: in a 2010 trial report involving 77 participants, DALK outcomes were similar to those of PK for up to two years after surgery.

Jack Wilkinson, a biostatistician at the University of Manchester in the UK, said: "It looks fairly easy to create datasets that are at least superficially plausible. To an untrained eye, this certainly looks like a real dataset."

Wilkinson, who is interested in methods for detecting falsified data, had examined several datasets generated by earlier versions of the large language model. He said they lacked convincing elements under scrutiny because they struggled to capture realistic relationships between variables.

At the request of the Nature team, Wilkinson and his colleagues evaluated the falsified dataset using a screening protocol designed to check authenticity.

The examination revealed that many of the "participants" had an assigned sex that did not match the sex one would typically expect from their name. In addition, no correlation was found between preoperative and postoperative measures of visual capacity or the ocular imaging test.

Wilkinson also examined the distribution of numbers in some of the dataset's columns, looking for non-random patterns. The ocular imaging values passed this test, but some participants' age values clustered in a way that would be highly unusual in a genuine dataset: a disproportionate number of participants had ages ending in 7 or 8.
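Checks of this kind are straightforward to automate. The sketch below shows two of them in Python, applied to a hypothetical table with columns named age, preop_acuity, and postop_acuity; the column names, data, and functions are illustrative assumptions, not the actual screening protocol Wilkinson's team used.

```python
# Two fabrication-screening checks of the kind described above, applied to
# a hypothetical, made-up table. Not the protocol actually used in the study.
import numpy as np
import pandas as pd
from scipy import stats

def terminal_digit_pvalue(values: pd.Series) -> float:
    """Chi-square test that last digits are uniform over 0-9, as they
    typically are in genuine measurements. A tiny p-value flags clusters
    such as many ages ending in 7 or 8."""
    digits = values.astype(int).abs() % 10
    observed = digits.value_counts().reindex(range(10), fill_value=0).to_numpy()
    expected = np.full(10, len(digits) / 10)
    return stats.chisquare(observed, expected).pvalue

def pre_post_correlation(pre: pd.Series, post: pd.Series) -> float:
    """Pearson correlation between pre- and post-operative measurements.
    Real clinical data usually shows a positive correlation; a value near
    zero is a red flag."""
    r, _ = stats.pearsonr(pre, post)
    return r

# Usage on a made-up table:
df = pd.DataFrame({
    "age": [37, 47, 58, 27, 67, 48, 57, 38, 47, 68],
    "preop_acuity": [0.3, 0.2, 0.4, 0.5, 0.1, 0.3, 0.2, 0.4, 0.3, 0.2],
    "postop_acuity": [0.9, 0.4, 0.6, 0.8, 0.7, 0.5, 0.9, 0.4, 0.6, 0.8],
})
print("terminal-digit p-value:", terminal_digit_pvalue(df["age"]))
print("pre/post correlation:",
      pre_post_correlation(df["preop_acuity"], df["postop_acuity"]))
```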

The study authors acknowledged that their dataset has flaws that could be discovered under close scrutiny. But if you look at the dataset only quickly, they noted, it can be difficult to discern the non-human characteristics of the data.

The editor-in-chief of EMBO Reports agrees that this is cause for concern:

"In reality, peer reviewers often don't do a full re-analysis of the data, and it's unlikely that AI will uncover well-crafted complete violations. Journals need to update quality checks to identify synthetic data generated by AI. ”

Finally, just as AI can be the source of the problem, there may also be AI-based solutions: we may need to fight AI with AI.
