"There's this idea I've seen presented by mcclure111 of 'grey data', in the model of 'grey water' - that is, water that we can be 100% certain is not healthy for human consumption but is still nonetheless water and can be put to some uses like sewage and irrigation.
"There is stuff that has absolutely useful application to understanding systems and how they train things, or as useful low-end stand-in tools, like AI generative art, but which absolutely are not ethically acceptable for the majority of human-interfacing uses"
-- Talen Lee (Talen_Lee), 2022-08-14
Searching for the phrase on Twitter, I found:
"what animals (including humans) have access to that machines lack is access to many sources of truth - humans can do arithmetic on our fingers, by counting out beans, purely in our heads, etc. AI esp ML etc don't 'know' when one of its axioms is faulty, its truth is too fragile.
"and i think despite the common defense that even when an AI is wrong about something, it's kinda right about some larger idea is largely irrelevant given how often it's being pressed into service where concrete truth is critical, eg if a car is about to run over an old lady."
-- JP (vectorpoem), 2022-04-08
"I've been thinking lately about trying to formally develop the idea of 'gray data', named in the sense of graywater. Data that's ok for limited/frivolous uses, like aesthetics, but 'not fit for human consumption'. Anything that comes out of machine learning ('AI') is gray-data.
"The thing about machine learning is ML only works when there are no consequences for the ML model being wrong. Decisions should never be made based on ML output, and ML output should be quarantined, like graywater, to make sure it isn't fed into a decisionmaking process.
"Actually I'm not sure even THIS approach works since even tech applied for pure aesthetics can do harm (imagine an app that makes cute cartoon selfies, but because of a limited training set breaks on people of color). But if you're gonna use ML at all u have to outline its limits"
-- mcc (mcclure111), 2022-04-09
Searching on Google, I found other references to the phrase going back to at least 2010, used in at least two different ways, distinct from this one but kinda overlapping if you look at them right:
- vast piles of data archived by an organization but not structured in any especially usable way -- piles of email, automatically generated logs, etc.
- un-peer-reviewed / unverified data that may include spam, disinformation, and garbage -- in contrast both to verified and trustworthy "white literature" and to mostly-hidden internal analytics that are seldom examined directly but are used indirectly ("black data").
Of all these, I like the analogy-to-grey-water version best.
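If I wanted to make mcc's quarantine idea concrete, one way to sketch it (purely hypothetical; GreyData, acknowledge_grey, and the model names here are things I'm making up for illustration) is a wrapper type that keeps ML output out of decision-making code unless the caller explicitly acknowledges, at the call site, that it is consuming grey data and says what for:

    from dataclasses import dataclass
    from typing import Generic, TypeVar

    T = TypeVar("T")

    @dataclass(frozen=True)
    class GreyData(Generic[T]):
        """ML-derived output, quarantined until a caller explicitly accepts it."""
        _value: T
        source: str  # which model/pipeline produced this

        def acknowledge_grey(self, purpose: str) -> T:
            # A real system might log or audit (source, purpose) here; the point
            # is that consuming grey data becomes a visible, deliberate act.
            return self._value

    def classify_selfie_style(image_bytes: bytes) -> GreyData[str]:
        # stand-in for an actual ML model call
        return GreyData("cartoon-cat", source="selfie-style-model-v1")

    prediction = classify_selfie_style(b"...")
    # Anything decision-shaped has to go through acknowledge_grey() and say why:
    caption = prediction.acknowledge_grey(purpose="decorative caption only")
    print(caption)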