New Show Hacker News story: Show HN: Solving NYT Connections with ChatGPT
Show HN: Solving NYT Connections with ChatGPT
2 by ekms | 0 comments on Hacker News.
Just for fun, I decided to see whether I could use ChatGPT to solve NYT Connections word puzzles. It uses a fairly straightforward breadth-first search (BFS): the LLM is first prompted to generate several possible groupings of four related words, and then a separate prompt evaluates the soundness of each of those groupings. This approach seems to produce the correct solution somewhat less than half the time.

Some observations:

* For whatever reason, GPT-4 seems to be slightly worse than GPT-3.5 at generating Connections groupings. I haven’t tested this systematically, so it may just be small-sample noise, but at the very least it isn’t obviously better.
* It really struggles with “fill in the blank” groups. It often comes up with the right category (e.g. “words that can precede `cheese`”) but identifies only 2 of the 4 words in that group.
* It frequently generates very vague categories (“words that can be nouns”) even though nothing like that appears in the proposal prompt. It also still sometimes scores them highly, despite several explicit examples in the value prompt that disallow these kinds of categories.

If you have any ideas for how to improve this, please let me know (or send a PR)!
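The post doesn't show the code, so here is a minimal Python sketch of the propose-then-evaluate search described above, not the author's implementation. It assumes each prompt is wrapped in a plain function: `propose` returns candidate groups of four for the words still on the board, and `evaluate` returns a numeric soundness score. `bfs_solve`, `State`, `beam_width`, and the toy stubs are all hypothetical names, the stubs stand in for real ChatGPT calls, and keeping only the top few states per level is an assumed pruning rule.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Group = FrozenSet[str]


@dataclass(frozen=True)
class State:
    remaining: FrozenSet[str]       # words not yet placed in a group
    groups: Tuple[Group, ...] = ()  # groups chosen so far, in order
    score: float = 0.0              # sum of evaluator scores along this path


def bfs_solve(
    words: List[str],
    propose: Callable[[FrozenSet[str]], List[Group]],
    evaluate: Callable[[Group, FrozenSet[str]], float],
    beam_width: int = 3,  # hypothetical: how many partial solutions survive each level
) -> Tuple[Group, ...]:
    """Level-by-level search: each level commits to one more group of four."""
    frontier = [State(remaining=frozenset(words))]
    # Every state in a level has the same number of remaining words,
    # so checking the first one tells us whether the board is cleared.
    while frontier and frontier[0].remaining:
        children = []
        for state in frontier:
            if len(state.remaining) == 4:
                candidates = [state.remaining]  # last group is forced
            else:
                candidates = propose(state.remaining)  # stands in for the proposal prompt
            for group in candidates:
                group = frozenset(group)
                # Drop malformed proposals: exactly 4 words, all still on the board.
                if len(group) != 4 or not group <= state.remaining:
                    continue
                children.append(State(
                    remaining=state.remaining - group,
                    groups=state.groups + (group,),
                    # stands in for the value prompt
                    score=state.score + evaluate(group, state.remaining),
                ))
        # Keep only the best-scoring partial solutions for the next level.
        children.sort(key=lambda s: s.score, reverse=True)
        frontier = children[:beam_width]
    return frontier[0].groups if frontier else ()


# Hypothetical toy stand-ins for the two LLM calls, on a made-up 16-word board.
ANSWER = [
    frozenset({"BRIE", "FETA", "GOUDA", "EDAM"}),
    frozenset({"RED", "BLUE", "GREEN", "PINK"}),
    frozenset({"ONE", "TWO", "SIX", "TEN"}),
    frozenset({"CAT", "DOG", "COW", "PIG"}),
]
WORDS = sorted(set().union(*ANSWER))


def toy_propose(remaining: FrozenSet[str]) -> List[Group]:
    # Every correct group still available, plus one arbitrary decoy.
    decoy = frozenset(sorted(remaining)[:4])
    return [g for g in ANSWER if g <= remaining] + [decoy]


def toy_evaluate(group: Group, remaining: FrozenSet[str]) -> float:
    # A perfect judge: 1.0 for a correct group, 0.0 otherwise.
    return 1.0 if group in ANSWER else 0.0


if __name__ == "__main__":
    for group in bfs_solve(WORDS, toy_propose, toy_evaluate):
        print(sorted(group))
```

With real prompts, `propose` would parse the model's suggested groupings and `evaluate` would turn the value prompt's verdict into a number. A wider `beam_width` costs more API calls but gives the search more chances to recover from a bad early pick, which matters when the evaluator sometimes scores vague categories highly.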