It's unclear whether any human could solve 80% of these coding problems on the first try. Humans typically take time to think about their answer before writing it, and during the writing process they usually edit things a bit to correct mistakes.

For the comparison to be fair, the model should at least be allowed to use chain-of-thought techniques, and to review and edit its answer before submitting. Otherwise we are asking it to do something that no human could do.
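To make that concrete, here is a minimal sketch of the protocol I have in mind (all function names are hypothetical placeholders, not any benchmark's actual harness):

```python
# Hypothetical eval-protocol sketch: give the model a reasoning pass and a
# revision pass before grading, mirroring how a human actually works.
# `query_model` is a placeholder for a single LLM call, not a real API.

def query_model(prompt: str) -> str:
    """Placeholder: returns the model's text response."""
    return ""

def solve_like_a_human(problem: str) -> str:
    # Step 1: think before writing (chain of thought).
    plan = query_model(f"Think step by step about how to solve:\n{problem}")
    # Step 2: write a first draft guided by that plan.
    draft = query_model(f"Using this plan:\n{plan}\nwrite code for:\n{problem}")
    # Step 3: review and edit before submitting -- the part humans always
    # get, and which a single-shot eval denies the model.
    return query_model(f"Review this code and fix any bugs:\n{draft}")
```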

My baby daughter was born two weeks ago, and in honor of her existence I'm building a list of about 100 technology-related forecasting questions, which will resolve in 5, 10, and 20 years. Questions like "By the time my daughter is 5/10/20 years old, the average US citizen will be able to hail a driverless taxi in most major US cities." (The idea is, tying it to my daughter's age will make it more fun and also increase the likelihood that I actually go back and look at it 10 years later.) I'd love it if the questions were online somewhere so other peopl...
I think people aren't reading the resolution criteria carefully: "In particular, the device must be a humanoid robot, and must be able to perform some physical tasks upon being given directions to do so - a remote-controlled device manually operated by a human will not count." Boston Dynamics Spot robot already meets this requirement, except that it's not humanoid. Boston Dynamics Atlas robot already meets this requirement, except that it's not for sale. Is it that hard to believe that Tesla could catch up to Boston Dynamics in a few years? Now, if t...

@BrendanFinan Please don't. There are much better ways to draw attention to this topic without delegitimizing Metaculus as a signal of what people actually think. I say this as someone who thinks there's a 10% chance we have less than one year left.

@tbyoln Thanks for this response. I hope you have a great day too. <3

"build me a general-purpose programming system that can write from scratch a deep-learning system capable of transcribing human speech." So to make sure I understand: We feed this prompt to a model, such as AlphaCode. The model then produces some code which, when run, *writes additional code,* which, when run, trains a deep-learning system capable of transcribing human speech. That seems to have one unnecessary step in the middle. Are you sure it isn't: We feed this prompt to a model, such as AlphaCode. The model then produces some code which, when run...
@(kokotajlod) A secondary argument: AIs will not progress smoothly from 10% human to 50% to 90% and so on. They already are superhuman in some domains and subhuman in others. And moreover, they have certain quirks such as hallucinations and jailbreaks that would serve as 'tells' in a real Turing test. So by the time all of these issues are ironed out, and we have a system that is not subject to any jailbreaks or adversarial examples, and that is at least as good as the typical human at EVERYTHING, it will probably be vastly superhuman at most text-based ...

@nextbigfuture Given the massive damage to the pad, they may have to redesign Stage 0 significantly -- and that could by itself set them back a whole year. They may not be able to launch again until e.g. they've dug a huge flame diverter trench, or rebuilt the OLM (orbital launch mount) to be higher (which would also involve rebuilding the tower to be higher).

Any ideas why progress was so much faster than people expected?

I'm a bit confused that the community prediction puts the risk highest in the 2030s, and substantially higher than the risk from AGI arriving in the 2020s! I wonder why.

How much will funding for biorisk prevention increase after coronavirus?

We could look at specific organizations, like the CDC, and see if their 2021 budgets are substantially bigger than their 2020 or 2019 budgets.
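A minimal sketch of the kind of check I have in mind (the figures and the "substantially bigger" threshold are placeholders, not actual CDC numbers):

```python
# Hypothetical sketch of the budget comparison described above.
# Figures are placeholders, NOT actual CDC budget numbers.
budgets_billion_usd = {2019: 11.0, 2020: 11.5, 2021: 15.0}

def substantially_bigger(new: float, old: float, threshold: float = 0.25) -> bool:
    """Arbitrarily operationalize 'substantially bigger' as a >=25% increase."""
    return (new - old) / old >= threshold

print(substantially_bigger(budgets_billion_usd[2021], budgets_billion_usd[2019]))
```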

It feels a bit weird to me to read this question, with its excellent summary of some reasons to think the answer will be Yes, but not a word of argument that the answer will be No. Surely there is some other Medium post out there with arguments for No, right? Anyhow, I don't mean this as a major critique; I just wanted to flag that it would be nice to have both sides represented at the top.

@notany Wait, starting to plateau? What do you mean? Last I checked there was no sign of the trend even slowing down, much less plateauing! Moreover, the theoretical results suggest that in the near future the trend will slow down but not plateau, so even if there is a slight dip, it's probably the predicted slowdown rather than a plateau.
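To illustrate the slowdown-vs-plateau distinction numerically (constants here are made up for illustration, not taken from any paper):

```python
# A pure power law, loss = a * compute**(-b), keeps improving at every
# scale even though each 10x of compute buys a smaller absolute gain:
# it "slows down" but never flattens at a fixed floor ("plateaus").
a, b = 10.0, 0.05  # hypothetical constants, not fit to real data

def loss(compute: float) -> float:
    return a * compute ** (-b)

for c in [1e18, 1e19, 1e20, 1e21, 1e22]:
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
# The successive improvements shrink (a slowdown), but the loss keeps
# falling -- there is no plateau under a pure power law.
```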

Suppose in 2025 things like Replika are not popular, but something like AI Dungeon is super popular: hundreds of millions of people have ongoing fantasies collaboratively written with the AI... and a significant portion of these fantasies are sexual and/or romantic. I think this should count, but I'm not sure, so I'm asking.

@kokotajlod Come to think of it, maybe that is what they are doing lol. They are explicitly sticking their necks out and making all this noise about the imminent invasion, so that Putin will be tempted to make them look like fools by calling it off. Looking like fools is the price they are willing to pay to prevent the invasion. :)

(I don't actually believe this theory but it seems more plausible than yours!)

@(Jgalt) Arguments I know of against the theory that vast numbers of people have already been exposed unknowingly:
--This would predict that the vast majority of confirmed cases would have no apparent connection to a previous confirmed case. But IIRC that isn't true; a substantial fraction of confirmed cases did have contact with someone else confirmed at some point in the past two weeks. (South Korea should have data on this, right? They've got that app.)
--This would predict that the infection fatality rate is tiny. But there are several communities ...
@(Tamay) https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html "With 10k+ Google-internal developers using the completion setup in their IDE, we measured a user acceptance rate of 25-34%. We determined that the transformer-based hybrid semantic ML code completion completes >3% of code, while reducing the coding iteration time for Googlers by 6% (at a 90% confidence level). The size of the shift corresponds to typical effects observed for transformational features (e.g., key framework) that typically affect only a subpopulation, wher...
@(isinlor) Thanks! If there's a way for me to edit my post, I don't see it. Here are my first-pass answers to your question: At a high level, this question aims to capture (or at least be a proxy for) the important thing people often debate about AI timelines: "Whether the scaling will continue, or plateau." There are various different scaling trends and various theories about why they are the way they are; I just picked this one because it's perhaps a particularly solid and prominent one. From the abstract: "We identify empirical scaling laws for the...
@(andreferretti) I agree. Hopefully one day we will meet in person, I think that would be nice & I agree we'd probably share more similarities than differences. I agree re point 1, though I think it's not a huge part of the explanation for the discrepancy. Re point 3, that applies also to the survey participants (they can base their predictions on what they hear other people say, what sounds normal to believe, etc.) so I don't think it's a strong distinguishing factor. I think point 2 is the main source of the divergence. tbyoln and I agree on that. :...