@alexlyzhov

Note: this question mirrors an existing question, extending the scale to the 1-8B range. Make sure your prediction is consistent across both questions.

@ThirdEyeOpen

There's no NYT policy that prohibits using pseudonyms.

Can you provide a source for that claim?

What if a team developing an AI announces that its agent defeated a high-ranked StarCraft II player months ago? I mean a situation similar to [AlphaGo vs Fan Hui](https://en.wikipedia.org/wiki/AlphaGo_versus_Fan_Hui):

>AlphaGo versus Fan Hui was a five-game Go match between European champion Fan Hui, a 2-dan (out of 9 dan possible) professional, and AlphaGo, a computer Go program developed by DeepMind, held at DeepMind's headquarters in London in October 2015. AlphaGo won all the five games. This was the first time a computer Go program had beaten a professional h...

Tesla delivers first $35,000 Model 3s to a happy few customers

Now several buyers have reached out to Electrek to confirm that they have taken delivery of the ‘Model 3 Standard Range’.

So it seems the question resolves positive.

@Jgalt I agree. When I posted my comment, I was half-seriously referring to the ETA below the question.

@Spirit59 Can @moderators reach out to @Matthew_Barnett (creator of the question) to clarify the issue? They didn't respond to me here or on Twitter.

Vanity Fair: [“Infamy Is Kind Of Fun”: Grimes on Music, Mars, and Her Secret New Baby With Elon Musk](https://www.vanityfair.com/style/2022/03/grimes-cover-story-on-music-and-mars) > When “Player of Games” first dropped, Grimes’s fans assumed it was about her rumored split from Musk, when in fact they were welcoming their second child and spending the holidays together as a family. The idea for the song came to her during a conversation with friends two years ago while she was three or four months pregnant with X, when Musk casually mentioned that he pl...

https://blogs.nasa.gov/commercialcrew/2018/01/11/nasas-commercial-crew-program-target-test-flight-dates-2/

Targeted Test Flight Dates:

- Boeing Orbital Flight Test (uncrewed): August 2018
- Boeing Crew Flight Test (crewed): November 2018
- SpaceX Demonstration Mission 1 (uncrewed): August 2018
- SpaceX Demonstration Mission 2 (crewed): December 2018

Michael Vassar offers to bet at least $500 against an AI winning a gold medal

1)

Is any Metaculus user willing to make a large, real, legal bet on this claim?

2)

Was offering 50%. Can do $500 if admin is simple. I have one taker at $1K already, so there are apparently at least some people who do somewhat believe these estimates. Will see how many.

I think it would be helpful to explicitly state that (judging by these comments [[1]](https://www.metaculus.com/questions/1673/will-spacex-start-testing-a-starship-bfs-before-2020/#comment-11446) [[2]](https://www.metaculus.com/questions/1673/will-spacex-start-testing-a-starship-bfs-before-2020/#comment-11459)) both "Starhopper" (currently being built in Boca Chica and recently damaged by wind) and the orbital Starship prototype (aimed to debut in June 2019) count for the resolution criteria, so that users don't have to dig through the comment section to find th...

New Google technique reaches 75.4% on MMLU

Further, Flan-U-PaLM achieves a new state-of-the-art on the MMLU benchmark with a score of 75.4% when combined with chain of thought and self-consistency.

@Matthew_Barnett @RyanBeck Thanks for the clarification. (1) seems fine to me. IMO it would be good to state explicitly in the question that sparse models also count.

[**Program Synthesis with Large Language Models.**](https://arxiv.org/abs/2108.07732) From the paper's abstract: >This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language d...