Computer literacy should absolutely be universal in the adult population, and hopefully under development throughout the juvenile population. I’m not sure, though, that programming literacy is the most important component of that literacy. While there’s more than a little truth to the old saw “program or be programmed,” I’m currently more concerned about the level of data literacy in the public. Public policy discourse around counterterrorism, law enforcement and so much more is full of talk about “inter-agency information sharing,” while the consumer issues getting a lot of airplay include things like “privacy policies,” “data brokers” and datasets said to have been stripped of “personally identifiable data.”

As an inoculation or public health measure, I would recommend a high school course, strongly encouraged if not required, that introduces database concepts, perhaps using SQL (which seems to me to have a gentle and intuitive learning curve, but my mileage may be atypical). I would want this introduction to take the students at least as far as the concept of a table join, because that is (in my opinion) where the “magic” of SQL happens. A join lets records in one dataset be matched against records in another, which is why having access to two datasets confers more than twice as much informational power (and information *is* power) as having access to one.

Exercises for the kind of data literacy course I’m proposing would ask questions such as,
- How would you go about trying to infer individual identities from records in this dataset which has been stripped of identifying information?
- How would you go about devising a system for calculating a numeric “score” for each of the [people, products, locations, etc.] in this dataset, where your goal is that higher scores might be predictive of higher probabilities of [a crime taking place, a loan going into arrears, a consumer making a purchase, etc.]?
- How would you go about building a recommendation engine? Again, I’d like to see the emphasis less on the coding and more on the choice of what data, and data relationships, to work into the recommendations and in what way.
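To make the first question concrete, here is a minimal sketch of how a join can put names back onto a “stripped” dataset. Everything in it is an assumption made up for illustration: the table names `visits` and `roster`, the columns, and the handful of rows are hypothetical, and Python’s built-in `sqlite3` module is used only because it needs no setup. The idea is that a de-identified table still carries quasi-identifiers (here ZIP code, birth year and sex), and a second, public table with the same fields can be joined against it.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two tiny, entirely made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- "De-identified" records: no names, but quasi-identifiers remain.
        CREATE TABLE visits (zip TEXT, birth_year INTEGER, sex TEXT, diagnosis TEXT);
        -- A separate public list (hypothetical) that does carry names.
        CREATE TABLE roster (name TEXT, zip TEXT, birth_year INTEGER, sex TEXT);

        INSERT INTO visits VALUES
            ('02139', 1970, 'F', 'asthma'),
            ('02139', 1985, 'M', 'diabetes'),
            ('02140', 1990, 'F', 'migraine');
        INSERT INTO roster VALUES
            ('Alice Example', '02139', 1970, 'F'),
            ('Bob Example',   '02139', 1985, 'M'),
            ('Carl Example',  '02139', 1985, 'M'),
            ('Dana Example',  '02140', 1990, 'F');
    """)
    return conn


def reidentify(conn):
    """Join the two tables on the shared fields, keeping unambiguous matches only."""
    query = """
        SELECT r.name, v.diagnosis
        FROM visits AS v
        JOIN roster AS r
          ON r.zip = v.zip
         AND r.birth_year = v.birth_year
         AND r.sex = v.sex
        -- Keep a visit only if exactly one roster entry shares its fields.
        WHERE (SELECT COUNT(*)
               FROM roster AS r2
               WHERE r2.zip = v.zip
                 AND r2.birth_year = v.birth_year
                 AND r2.sex = v.sex) = 1
        ORDER BY r.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, diagnosis in reidentify(build_demo()):
        print(name, "->", diagnosis)
```

In this toy data, two of the three “anonymous” records come back with a name attached; the third stays ambiguous only because two roster entries happen to share the same ZIP code, birth year and sex. A class discussion could start from exactly that point: which combinations of fields are rare enough to single a person out, and what else could be joined in to break the remaining ties?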
One more thing: Far too many of the programming courses I have taken (in conventional colleges and universities) lean heavily on quasi-business problems that are grossly oversimplified and unrealistic. I seem to remember a “write a simple reservation system for a simple hypothetical airline” or something. No wonder there’s no such thing as an entry-level job. I would hope that for the data literacy course the datasets would be empirical, which is to say, real-world data. I would also hope that at least some of the datasets would be large-ish. Keeping in mind that (unfortunately for my purposes) information does NOT want to be free, some class assignments may have to be data collection assignments, perhaps sending the students out to conduct some surveys, or keep a food diary, or do some GPS-surveying or what have you.