I'm really surprised that the prediction did not plummet after my previous comment. It seems the competition is still running, but based on the [BBC article](https://www.bbc.com/news/technology-49578503) the format of the competition changed: > There will no longer be a panel of judges. Instead, the chatbots will be judged by the public and there will be no human competitors. Not sure if winning the Silver medal should count for positive resolution anymore. Actually, I don't know whether the concept of a Silver medal even exists anymore. BTW - [Mitsuku won this year as w...
New SOTA on few-shot SuperGLUE of 75.4 with only 223M parameters. [It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners](https://arxiv.org/abs/2009.07118) > When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is...

@mumpskin The problem with judging a troop buildup is that a bluff must look convincing in order to work.

The best way to produce a convincing bluff is to order actual preparations for an invasion, pending the order to attack.

@(kwathomas0) Indeed, the USA does not seem to be doing great. I hope once they start testing, they will realize that mobilization is necessary. I'm actually reasonably happy about Europe's response; at least we don't seem to be worse than China. - Italy = 4.2% CFR - France = 1.4% CFR - Spain = 1.3% CFR - UK = 1.2% CFR - Netherlands = 0.8% CFR - Switzerland = 0.5% CFR - No deaths in other major EU countries. Outside Italy, in Europe, we have 0.7% CFR. With Italy that jumps to 1.6% - 2.9% with 2000 - 7400 cases respectively. China had 2.4% - 2.9% CFR while...
@(RedBox) While I personally don't like political discussions on Metaculus, I think it may be worthwhile to provide a Polish perspective on this. > That is a highly ideologic idea built on the liberal idea of any state's right to self-determination wghcih as always been a pipe dream. In a perfect world I would agree that it is preferable that each state can decide it's own fate from a moral standpoint, but nevertheless that idea has no basis in reality and the world has never worked in such a way (...) It's not a liberal idea, not in this part of the wor...

What about resolving it "no" for now and changing to "yes" if the situation changes?

@Aotho

My data points:

  1. People at the NYT are aware of the issue - there were leaks of discussions from internal channels
  2. The reporter still seems to be doing interviews
  3. Considerable effort has already gone into the article
  4. The distribution of community predictions

I also keep a rule that predictions regarding decisions of a small group of people I know almost nothing about are naturally very uncertain, so I would not feel comfortable going higher than 75%.

— edited by isinlor

I've made a histogram of crew sizes based on historical mission plan proposals.

The list is from Wikipedia: List of crewed Mars mission plans.

Also, if a whole fleet is sent right away in the first launch window (e.g. Von Braun Mars 1952), then the crew size of the first spaceship to perform the burn that puts it on an orbit intersecting Mars's orbit should count.

I'm curious what the average age of Metaculus users is and how well it coincides with the range 1984 - 1992 - 1998 of the current peak :) . I give people in this range 5% to live up to 1000 years. The oldest known specimen of the Greenland shark is 392 ± 120 years old. Among mammals, bowhead whales can reach up to 200-300 years of age. So, 1000 years for complex organisms seems to be doable. IMO children born with specifically designed DNA will have the highest chance of success. I would also expect AI to contribute to this type of devel...
I personally see no way for Ukraine to join the EU before 2024 unless Poland and Hungary do a 180-degree turn on democratic values. It's possible in the case of my country, Poland. There is no hope for Hungary. A 180-degree turn in Poland may result in the suspension of Hungary's voting rights under Article 7 of the Treaty on European Union, but that will happen no sooner than 2024. Also worth remembering is the Dutch advisory referendum on the approval of the Ukraine–European Union Association Agreement. It was held in the Netherlands on 6 April 2016. The referendum question...
I feel like we may be almost there. The [OpenAI GPT-2](https://blog.openai.com/better-language-models/) model with some fine-tuning and a good sampling strategy fitted to conversation should be able to fool a human. I would try biasing it based on movie subtitles full of dialogue or Twitter conversations. Long-term self-consistency is already there, and I think it was the biggest issue with using ML solutions for chatting. For example, keeping track of named entities is certainly far from perfect, but it is happening. And it appears that expanding model alo...
@(Anthony) Looking at the comments, I would actually strongly prefer if everyone got awarded for a good community prediction instead of the other way around. It seems some people decided to withdraw information that will make this question most likely to resolve ambiguous or negative. Regardless of whether it happened, the fact that I have this suspicion is not good for community spirit. Will we see at some point someone actively trying to mislead the community to optimize points? Anyway, from: https://medium.com/pandorabots-blog/mitsuku-wins-loebner-prize...

@alexanderpolta I would recommend that you find some way to operationalize your prediction and ask Metaculus whether it will happen or not. I think Tesla is a legitimate business, but I still think your question is very much worthwhile.

I'm personally curious what the long-term outcome of the Musk vs. SEC confrontation will be. I don't think the SEC wants to damage Tesla investors, but if Musk continues to be in contempt of the law, it will be the law that wins and Tesla/Musk that lose.

@(kokotajlod) I have proposed a question series to estimate how the outbreak will progress. How many new cases of COVID-19 in: - [the 2nd quarter of 2020?](https://www.metaculus.com/questions/3765/how-many-new-confirmed-cases-of-covid-19-in-the-2nd-quarter-of-2020/) - [the 3rd quarter of 2020?](https://www.metaculus.com/questions/3766/how-many-new-confirmed-cases-of-covid-19-in-the-3rd-quarter-of-2020/) - [the 4th quarter of 2020?](https://www.metaculus.com/questions/3767/how-many-new-confirmed-cases-of-covid-19-in-the-4th-quarter-of-2020/) - [the 1st ...

@Jgalt Thanks for asking. Yes, I feel perfectly fine now. Also, my estimate that something actually was happening to my lungs went up. I was making noodle soup the day after I posted my comment here, and breathing in the steam was actually painful. I couldn't eat it hot as I normally do because of the steam. But then all symptoms went away very quickly. Whatever was happening, my body handled it very well. Yesterday, I was making another soup and I had no issues at all with the steam.

[Tesla to wide-release Full Self-Driving ‘by the end of this year’](https://www.teslarati.com/tesla-fsd-suite-wide-release/) > “We’re starting very slow and very cautiously because the world is a very complex and messy place,” Musk said when talking about the Beta rollout of the FSD suite to a minimal group of people, which began late Tuesday night. “We put it out there last night, and then we’ll see how it goes, and then probably release it to more people this weekend or early next week. Then gradually step it up until we hopefully have a wide-release ...

@Jim1776 WTF o.O

Edit: I agree with Ilay - It seems like a correct claim of the paper would be:

We can make Codex produce correct answers if we manually try hard enough, but we won't report exactly how hard.

— edited by isinlor

I think it's also interesting to incorporate evidence other than statistical. [The analysis done by Charisma on Command](https://www.youtube.com/watch?v=LibRNYJmZ-I) of 2016 campaigns in May is really interesting. To me the most interesting part is how Trump was using crowds to select his catchphrases. I'm not American and I wasn't really interested in USA politics in 2016, I'm happy to tune it out as much as possible, but Trump's branding of "crooked Hillary" was so good that it was almost all I knew about Clinton in 2016. Well, I knew a lot more but this catc...
I think there are currently two interesting avenues for further development: The first is augmenting language models with information retrieval. An example of that idea: > Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. > To capture knowledge in a more modular and interpretable way, we augment language model pre-training wi...