I'm wondering about a hybrid approach. What happens if you take the data you downloaded, paste it into the chat interface, and instruct o3 (or Gemini) to run through it row by row?
I bet that would do better!
Thanks! Well, it is different in the sense that one gets the right answer and the other doesn't :). But to your point, I would imagine that for lots of questions we give to o3, we might want it to generate code to answer the question and then run that code, partially to save computation, but more so that we can understand how it is getting an answer and trace out mistakes. For things like fetching a table (which the code does) and setting up a one-per-row evaluation, it's much easier to describe and execute in "normal" code than to have o3 do some kind of inscrutable neural-network-based process that approximates this (which in this case was, after all, wrong).
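To make the "normal code" point concrete, here is a rough sketch of what a one-per-row evaluation could look like. It is not the code from the post: the sample table, the column names, and `stub_judge` are all made up for illustration, and `stub_judge` is only a stand-in for whatever single-row question you would actually send to a model.

```python
import csv
import io
from typing import Callable


def evaluate_rows(table_csv: str, judge: Callable[[dict], bool]) -> list[dict]:
    """Apply `judge` to each row independently and keep every verdict."""
    results = []
    for row in csv.DictReader(io.StringIO(table_csv)):
        # One call per row, so each verdict can be inspected on its own.
        results.append({**row, "verdict": judge(row)})
    return results


def count_hits(results: list[dict]) -> int:
    """Count the rows the judge marked True."""
    return sum(1 for r in results if r["verdict"])


# Hypothetical table; in practice this would be the downloaded data.
SAMPLE = "name,value\nalpha,3\nbeta,12\ngamma,7\n"


def stub_judge(row: dict) -> bool:
    # Assumption: placeholder for a single model call about one row.
    return int(row["value"]) > 5


results = evaluate_rows(SAMPLE, stub_judge)
for r in results:
    print(r["name"], r["verdict"])
print("total:", count_hits(results))
```

Because the loop and the tally live in ordinary code, a wrong final count can be traced back to the specific rows whose verdicts were off, rather than to one opaque answer.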