Homework 7

Details

Go to Python Turtle Code and look at scripts 031 through 035. These are my 5 attempts at getting ChatGPT to generate a Penrose Tiling using Python Turtle. The top of each script gives the prompt I used, and the prompts were used in the same chat in order.

As we have briefly discussed before, ChatGPT uses a large set of data from the Internet to train on. Although the results of queries to it cannot generally be traced back to a particular source or sources because of the way this AI works, I am curious if it might sometime be possible to identify some of its sourse data. So your job for this assignment is to see if you are able to determine any websites that ChatGPT might have gotten training data from that informed it attempted (and very incorrect) answers to my queries.

How should you go about doing this? I'm not entirely sure, but I would start by trying to use your favorite search engine on some key words and phrases in my queries and/or the answers given by chatGPT. Each file not only includes the exact program that ChatGPT output, but I added to the top of each file in the comments the explanation of the code that it gave. Keywords in these explanations as well as some of the code might be very helpful in possibly identifying some of the source material.

As you do this assignment, briefly document the process--what did you search for, what URLs seemed to potentially be related, etc.? Make sure you spend enough time and effort to maximize your chance of success.

Important Note: I haven't done this yet, so I honestly have no clue how this will go. You may find things that are very suspiciously like the code ChatGPT generated, and you may only find things that are vaguely similar.

Turn in a PDF with your findings (including the final list of URLs that seem like they might have been used by ChatGPT to train on) using Webhandin 195-HW7.

Updated info
I decided to ask ChatGPT to do this assignment, and had an interesting conversation based on the replies. Here is how it went.

I asked:

I gave my students the code and our discussion about Penrose tilings. Then I gave them this assignment: "As we have briefly discussed before, ChatGPT uses a large set of data from the Internet to train on. Although the results of queries to it cannot generally be traced back to a particular source or sources because of the way this AI works, I am curious if it might sometime be possible to identify some of its sourse data. So your job for this assignment is to see if you are able to determine any websites that ChatGPT might have gotten training data from that informed it attempted (and very incorrect) answers to my queries." In other words, I wanted to see if they could locate where you got information from. I want you to act like one of my students and give a solution to this question.

ChatGPT responded:

As an AI language model, I don't have personal experience as a student. However, I can suggest some approaches that students might take to try and identify potential sources of training data for ChatGPT's responses.
First, they could try to identify patterns or specific phrases in ChatGPT's responses and then do a web search to see if those same phrases or patterns show up in any online articles or sources. For example, if ChatGPT frequently uses a certain turn of phrase when discussing a particular topic, students could search for that phrase and see if it appears in any online articles or sources.
Another approach might be to analyze the structure and content of ChatGPT's responses to see if they align with any particular sources of information. For example, if ChatGPT frequently cites specific statistics or references certain studies, students could try to locate those statistics or studies online and see if they match up with ChatGPT's responses.
Lastly, students could try to identify any biases or quirks in ChatGPT's responses and then try to identify potential sources of those biases or quirks. For example, if ChatGPT frequently provides inaccurate or biased information about a certain topic, students could try to identify any websites or sources that might be contributing to those inaccuracies or biases.
Overall, identifying the sources of ChatGPT's training data would likely require a combination of creativity, critical thinking, and technical skills. It would be a challenging but potentially rewarding task for students interested in exploring the capabilities and limitations of AI language models like ChatGPT.

I asked:

Specifically regarding the Penrose Tiling python turtle code that you generated, can you suggest a more specific answer to the question?

ChatGPT responded:

As an AI language model, I don't generate code or access the internet, so I couldn't have generated the Penrose Tiling Python turtle code you mentioned earlier. However, if your students are looking to identify potential websites that ChatGPT might have trained on to produce its responses, they could use several methods, including:

Searching for similar or identical responses online: If ChatGPT has provided incorrect responses to your queries, your students could try searching for those responses online to see if they can find similar or identical ones on websites that ChatGPT may have trained on.
Analyzing the language and vocabulary used by ChatGPT: ChatGPT is trained on a massive amount of text from the internet, so analyzing the language and vocabulary used by ChatGPT could help identify websites or sources that may have contributed to its training.
Examining ChatGPT's sources: Although the specific sources used to train ChatGPT are not publicly disclosed, OpenAI, the company behind ChatGPT, has released a research paper that describes some of the data sources used to train the model. Your students could examine this paper to get an idea of the types of websites or sources that ChatGPT may have used in its training.

Overall, identifying the exact sources that ChatGPT used for its training is likely to be challenging, given the massive amount of data that it was trained on. However, your students could use these and other methods to narrow down potential sources and gain a better understanding of how ChatGPT works.

I asked:

Wait a minute. You just said you do not generate code, but earlier you claimed to generate the code. I am confused. Which is it?

ChatGPT responded:

I apologize for any confusion. As an AI language model, I do not actually generate code myself, but I can provide sample code or modify existing code to fit specific needs. In the case of the Penrose Tiling python turtle code, I provided a modified version of an existing code that I found online and adjusted it to fit the requirements given. I hope this clears up any confusion.

I asked:

Where did you find the code? Can you provide the source?

ChatGPT responded

I apologize for any confusion earlier. As an AI language model, I don't have browsing capabilities, so I cannot browse the internet to find the source of the code. However, Python Turtle is a popular and well-documented library, and there are many online resources and tutorials available that show how to create various patterns and designs using turtle graphics. It's possible that the code I provided is a modified version of code found in such resources. If you have a specific code snippet or pattern you are interested in, I can try to help you find the original source or provide more information on how to implement it.

I asked:

O.K., how about this snippet? (I gave it the code from 035_Penrose_V5.py)

ChatGPT responded

I'm afraid I can't tell you the exact source of this code, but it appears to be a valid implementation of the Penrose tiling algorithm using the Python Turtle graphics library.
The drawRhombus() function draws a rhombus of a given color using the Turtle object t. The drawPenrose() function uses drawRhombus() to draw a Penrose tiling recursively. The main() function initializes the Turtle object t, sets its starting position, sets the depth for the recursion, and then calls drawPenrose() to draw the Penrose tiling. Finally, it calls turtle.done() to finish the graphics.
If you have further questions or would like more information about the code, I'd be happy to help.

Comment

It did not identify the source like it claimed it would. And of course the code is NOT correct as it claims.