The debate about Experimental Philosophy is usefully compared with debates elsewhere in the human sciences about intuition gathering. As it happens, in syntax, the time-honored tradition of "armchair linguistics" is facing a similar challenge, although at a considerably lower decibel level. On this score, I've found this paper by Colin Phillips quite useful. Some choice quotes:
Although the typical 'Armchair linguist' does not systematically test his generalizations using large sets of example sentences and many naive informants, empirical claims nevertheless undergo extensive vetting before they attain the status of 'widely accepted generalizations'. If a key judgment is questionable, this is likely to be pointed out by a colleague, or by audience members in a talk, or reviewers of an abstract or journal article. If the questionable generalization somehow makes it past that point, then it will still be subjected to widespread scrutiny before it becomes a part of linguistic lore.
In our lab we frequently conduct controlled acceptability judgment studies...We have to run the judgment studies in order to convince skeptical reviewers that we are investigating real phenomena, but the results are rarely surprising...in our experience, carefully constructed tests of well-known grammatical generalizations overwhelmingly corroborate the results of 'armchair linguistics'.
And bearing more directly on what has emerged in the comments thread about Machery, Mallon, Nichols, and Stich's work on Kripke:
Acceptability contrasts that are clear when using the much maligned 'ask a couple of friendly linguists' method generally remain clear when testing a large number of non-expert informants. If the larger sample makes the contrast seem less clear, this is just as likely to reflect experimenter error (misleading instructions, poorly matched examples, etc.) as distortion of facts by linguists.