Initial post: November 15, 2019
First off: of course I don't mean it in a pathological DSM sense. So seriously, save your judgmental "oh it must be terrible to be your student" derision and go back under the bridge whence you came. Of course when I say it to my students it is couched within a broader context, and they understand that context. You don't, because you read my slogan in a tweet. I mean what I say in the "only the paranoid survive" Andy Grove sense and the "customer obsession" Jeff Bezos sense.
With that aside, back to the slogan. First, let me illustrate, (pseudo-)mathematically, why this statement is likely to be true when uttered to a broad audience. Hinton estimates that there are around 10k graduate students in China "doing neural nets". Assume that's a reasonable estimate. That's just in China, so the entire worldwide population of "people doing neural nets" is likely several multiples of that. Let's then consider the pigeonhole principle: How many distinct novel "ideas" do you think there are on the frontier of deep learning? Are there 10k? If there are fewer (which I believe), then a collision is inevitable. Even if there are more than 10k (even way more), there is a birthday paradox effect when referring to an audience. Of course ideas aren't discrete and countable, and of course the distributions of ideas and effort aren't uniform. This isn't meant to be a formal proof, just a prima facie argument about the veracity of the claim.
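To make the birthday-paradox intuition concrete, here's a minimal back-of-the-envelope sketch. The numbers and the uniform, independent-sampling assumption are purely illustrative (they're mine, not anything rigorous), but they show how quickly collisions become near-certain:

```python
# A rough sketch of the birthday-paradox argument above.
# The counts and the uniform-sampling assumption are illustrative only.
import math

def collision_probability(n_researchers: int, n_ideas: int) -> float:
    """Probability that at least two researchers land on the same idea,
    assuming each picks uniformly and independently (a simplification)."""
    # Standard birthday-paradox approximation: 1 - exp(-n(n-1) / (2k))
    return 1.0 - math.exp(-n_researchers * (n_researchers - 1) / (2.0 * n_ideas))

if __name__ == "__main__":
    n = 10_000  # roughly Hinton's estimate, for China alone
    for k in (10_000, 1_000_000, 10_000_000):
        print(f"{n} researchers, {k} ideas: P(collision) ~ {collision_probability(n, k):.4f}")
```

Even granting ten million distinct ideas and counting only the 10k researchers in China, the chance of at least one collision under these toy assumptions comes out above 99%.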
Now let's consider the objections.
Okay, your solution is simple. Don't work on deep learning. Knuth says this. Yes, if you're Knuth you can work on anything you want. If you don't want to work on deep learning, that's of course completely fine. However, consider this: the reason a particular approach is popular is usually that it works. Today, if we encounter someone who refuses to adopt data-driven techniques and stubbornly insists on only writing rules, we'd think that's kinda silly. It is likely that in a few years, a categorical resistance to deep learning will seem just as silly.
I'll provide a bit of historical context: the previous major paradigm shift in NLP was from rule-based to "statistical" approaches in the 1990s [1]. Yes, there were researchers back then who simply refused to count tokens and compute likelihoods. I know because I lived through those years (as a student). Today, most people would find that laughable.
Oh, there's also a practical matter: You'll struggle getting your *ACL papers accepted.
Next, the old, trite, uninspired retort: "well, if you're in danger of being scooped, you should work on a more interesting problem." I argue the exact opposite: many people are working on a problem precisely because it's interesting. Or useful, impactful, etc. It is for this reason that DL/NLP/etc. has advanced so quickly: people quickly pick the low-hanging fruit (tasty!), rush to ask all the obvious questions, determine the answers, and then tell everyone else about it (what you derisively call flag planting, I call rapid dissemination of knowledge). In this way, the research frontier expands rapidly, to everyone's benefit. It feels absolutely exhilarating to be on this frontier, to be working on something that boatloads of people care about. It feels absolutely intoxicating to sit atop a leaderboard, knowing that you and your colleagues, at that moment, are the best in the world at this very thing.
But the downside is that you might get scooped. Hence the slogan.
You're not in danger of getting scooped? That doesn't mean you're working on an interesting problem. It may just mean that you're working on a problem no one cares about.
Many of the remaining retorts I've heard could be characterized as hubris (defined by Merriam-Webster as "exaggerated pride or self-confidence").
"I work on something that I'm uniquely suited for...", "... that only I can do", "... that plays to my strengths", or something along those lines. Oh really? You must have a high opinion of yourself. You may truly be unique, or it's just hubris. As Bill Gates famously said, in China when you are one in a million, there are thirteen hundred people just like you.
"I like to take my time to think", "... look further ahead", "... work on really difficult problems", or something along those lines. Oh really? The rest of us are just running around like headless chickens, and our work is no more principled than Brownian motion? You can think two steps ahead of everyone else without rolling up your sleeves and actually working on step one? Wow, that's amazing. Perhaps you're just so insightful that you see something all of us have missed. It may be true, or it's just hubris.
The final cluster of objections is what I would characterize as "intrinsic motivation": instances include, for example, "for the simple joy of discovery", or "I'm not paranoid because I don't care about precedence, publications, h-indexes, citations, etc.", or "I'm not doing this for external validation". In some cases, these sentiments are expressed by people who are in the enviable position of not needing to care, for example, post-tenure academics. This strikes me as disingenuous if offered as serious advice: you're in a position where you don't need to run the publication rat race, yet you advise students that publication count doesn't matter at all? When these sentiments are expressed, for example, by pre-tenure academics, I really hope they are cognizant of the consequences of their approach (on promotion and tenure decisions, etc.). I've heard many times: "if this tenure thing doesn't work out, I'll just do X, but until then I'll do what I do." And they truly mean it. I genuinely admire such individuals and wish them the best of luck.
Thomas Kuhn talks about paradigm shifts followed by "normal science". Clayton Christensen talks about sustaining innovation vs. disruptive innovation. If you think you can bring about a paradigm shift and disrupt the status quo, then, sure, you're less likely to be scooped. I truly look forward to being enlightened by your brilliance when you change the world. Maybe it'll happen, or maybe it's just hubris.
While you are thinking "your deep thoughts", I'll be here, grinding away, poking tiny new frontiers in knowledge, bit by bit. Perhaps it is through this process that I will stumble upon some brilliant, revolutionary idea.
Nah, that's just hubris.
I end this essay noting, with great relish, that even Albert Einstein was nearly scooped.
Update (2020/05/18): I submit as supporting evidence the following Exhibit A.
"Be Paranoid! Someone is working on your idea right now!" was controversial #nlproc https://t.co/xyECGcMCch I submit as further evidence: two papers, both accepted to #acl2020nlp on same basic idea (BERT early exits). Ours: https://t.co/junzZZybZw Theirs: https://t.co/W8jVfpKp4b
— Jimmy Lin (@lintool) May 18, 2020
Update (2024/08/04): Documenting another "real-life scoopage" related to work from my group: "LLMs Can Patch Up Missing Relevance Judgments in Evaluation", by Upadhyay et al., posted on arXiv May 8. Reaction:
It's great to see more work in this area, but don't we already know that (less powerful) LLMs can produce (soft) graded relevance labels on even less data @ehsk0 @lintool?https://t.co/6lthbvBX1B https://t.co/r3m7im3MBr pic.twitter.com/NNQCk1aDLo
— Sean MacAvaney (@macavaney) May 9, 2024
To which I responded:
Well, the short honest answer is... we got scooped by you et al. This thread of work had been languishing for a long time here - finally was able to push it out.
— Jimmy Lin (@lintool) May 9, 2024
And, just a day later, a similar paper was posted on arXiv:
Same happened to us. Our arXiv will be out soon 😂
— Mohammad Aliannejadi (@maliannejadi) May 9, 2024
The paper "Can We Use Large Language Models to Fill Relevance Judgment Holes?" by Abbasiantaeb et al. was posted on arXiv May 9. Yup, that's a difference of one day.
[1] I put "statistical" in quotes because that was the term of art back then, but "statistical" is better characterized as "data-driven" in today's parlance.