I'm wondering about a hybrid approach. What happens if you take the data you downloaded, paste it into the chat interface, and instruct o3 (or Gemini) to run through it row by row?
I bet that would do better!
Hey, how is that different from what your function does when searching Wikipedia? And how is this different from giving an LLM the tool to do it without using your package?
Super nice job btw :)
Thanks! Well, it is different in the sense that one gets the right answer and the other doesn't :). But to your point, I would imagine that for lots of questions we give to o3, we might want it to generate code to answer the question and then run that code, partly to save computation, but more so that we can understand how it is getting an answer and trace mistakes. For things like fetching a table (which the code does) and setting up a one-per-row evaluation, it's much easier to describe and execute this in "normal" code than to have o3 do some kind of inscrutable neural-network-based process that approximates it (which in this case was, after all, wrong).
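To make the "normal code" point concrete, here is a minimal sketch of a one-per-row evaluation loop. Everything in it is an assumption for illustration: the table is made-up inline CSV standing in for a downloaded one, and `stub_judge` is a hypothetical placeholder where a per-row LLM call would go. It is not the package's actual API.

```python
import csv
import io

# Made-up data standing in for a table fetched from the web (assumption).
TABLE_CSV = """name,value
alpha,3
beta,-1
gamma,7
"""


def evaluate_rows(csv_text, judge):
    """Parse the table and apply `judge` to each row independently."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        # One evaluation per row, so every verdict can be traced to its input.
        results.append({**row, "verdict": judge(row)})
    return results


def stub_judge(row):
    """Hypothetical placeholder for a per-row model call; here a plain rule."""
    return int(row["value"]) > 0


results = evaluate_rows(TABLE_CSV, stub_judge)
for r in results:
    print(r)
```

Because each row is handled separately, a wrong verdict can be tied to a specific row rather than being buried in one long model response.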
Thank you for the detailed answer John! That makes sense :)