AI has become part of nearly every conversation in education. We use it to save time, generate ideas, and even design lessons. But as impressive as these systems are, they carry hidden layers that deserve attention. The most important of these is bias.
Every AI system learns from data, and that data comes from people: our words, our images, our history. It reflects how the world has been shaped, with all its inequalities and assumptions. So when we say AI isn’t neutral, we’re really saying that it reflects the values, gaps, and biases of the society that built it.
Kate Crawford’s Atlas of AI explores this idea powerfully. She compares data to oil: a resource extracted, processed, and used to fuel entire industries, often without consent or care for those affected. The same logic drives today’s AI systems. Let’s unpack how that plays out in practice.
1. Data Carries History with It
AI doesn’t collect stories; it collects traces. Imagine a facial recognition system trained on police databases. It can detect faces but knows nothing about the circumstances behind them. It doesn’t ask why someone was in that photo or what that moment meant.
When those details disappear, the data loses context, and context is what gives meaning. AI models trained on such data absorb bias because they inherit patterns from real-world systems, including policing and surveillance, that have long reflected inequality. The machine sees a face, not a life.
2. The Myth of “Ground Truth”
Engineers like to talk about “ground truth,” as if datasets represent the real world objectively. In reality, they capture fragments of it. These datasets are pulled from wherever information is easiest to find (Wikipedia, Reddit, social media) and stitched together without much thought for balance or accuracy.
That patchwork becomes the foundation of AI’s “truth.” So if most of the data reflects Western perspectives, the model will learn a narrow version of reality. It won’t recognize cultural diversity or local nuance; it will simply echo the world as seen through the dominant lens of the internet.
3. Patterns Can Mislead
AI learns by looking for patterns and making inferences. If every picture of an apple in a training set is red, the system assumes all apples are red. That’s a harmless example until you replace apples with people.
A system trained mostly on lighter-skinned faces might struggle with darker ones. A voice assistant tuned to a certain accent might misinterpret others. These aren’t small glitches; they’re predictable outcomes of biased data. The machine isn’t being unfair on purpose; it’s simply limited by what it has seen.
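To see how that limitation shows up, here is a small sketch in Python using scikit-learn. Everything in it is invented: the two groups, their feature distributions, and the imbalance in the training set. The point is only that when one group dominates the data, the errors tend to land on the group the model has barely seen.

```python
# A toy, invented illustration (not a real face-recognition system): one group
# dominates the training data, so the model's errors concentrate on the
# underrepresented group. Group sizes, features, and the "shift" between
# groups are all made up for demonstration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n_per_class, shift):
    # Two classes per group; "shift" stands in for any systematic difference
    # between groups that the model has rarely seen during training.
    X = np.vstack([rng.normal(shift, 1.0, (n_per_class, 2)),
                   rng.normal(shift + 1.5, 1.0, (n_per_class, 2))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

# Heavily imbalanced training set: 1000 samples from group A, 30 from group B.
X_a, y_a = make_group(500, shift=0.0)
X_b, y_b = make_group(15, shift=3.0)
model = LogisticRegression().fit(np.vstack([X_a, X_b]),
                                 np.concatenate([y_a, y_b]))

# Balanced held-out samples reveal the gap the training data baked in.
X_a_test, y_a_test = make_group(200, shift=0.0)
X_b_test, y_b_test = make_group(200, shift=3.0)
print("Group A accuracy:", accuracy_score(y_a_test, model.predict(X_a_test)))
print("Group B accuracy:", accuracy_score(y_b_test, model.predict(X_b_test)))
```

Nothing in that code is malicious; the gap falls out of the arithmetic of what the model was shown.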
4. The Problem with Benchmark Datasets
Researchers often rely on benchmark datasets to test and compare models. The idea is to create a shared baseline. The problem is that these benchmarks become the standard across labs, shaping what AI learns and how it performs.
When everyone uses the same narrow datasets, they end up reinforcing the same blind spots. Progress looks measurable, but the field keeps circling around the same limitations. It’s like teaching from the same outdated textbook for years and calling it innovation.
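A tiny, purely hypothetical example shows how a headline benchmark number can look like progress while hiding exactly that kind of blind spot. All the numbers below are made up.

```python
# A hypothetical benchmark result: 100 test items, 10 of which come from an
# underrepresented slice. The headline number looks like progress; the slice
# tells a different story.
predictions = [1] * 90 + [0] * 10
labels      = [1] * 90 + [1] * 10
slices      = ["majority"] * 90 + ["minority"] * 10

overall = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
minority = sum(p == t for p, t, s in zip(predictions, labels, slices)
               if s == "minority") / slices.count("minority")

print(f"Benchmark accuracy: {overall:.0%}")        # 90%
print(f"Minority-slice accuracy: {minority:.0%}")  # 0%
```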
5. Language Reveals the Politics of AI
Words shape the way AI understands the world. Large language models, like ChatGPT, are trained on text scraped from online platforms filled with opinions, jokes, and arguments. Every word carries social weight—cultural assumptions, power dynamics, and political undertones.
When a model learns from that language, it absorbs those biases too. A chatbot trained on Reddit or Twitter might adopt aggressive tones or biased phrasing. The result isn’t an evil machine; it’s a reflection of the digital spaces we’ve built.
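Here is a toy sketch of the mechanism. The “posts” are invented, and counting co-occurring words is a crude stand-in for how a real model is trained, but the lesson carries over: the associations in the text become the associations in the model.

```python
# A toy sketch of how associations travel from scraped text into a model.
# The four "posts" below are invented stand-ins for the billions of fragments
# a real language model is trained on.
from collections import Counter

corpus = [
    "nurse she caring hospital",
    "nurse she tired long shift",
    "engineer he code startup",
    "engineer he salary promotion",
]

cooccurrence = {"nurse": Counter(), "engineer": Counter()}
for post in corpus:
    words = post.split()
    for target, counts in cooccurrence.items():
        if target in words:
            counts.update(w for w in words if w != target)

# The model never decides that nursing is "female-coded"; it simply inherits
# whatever skew was already present in the text it was fed.
print("nurse    ->", cooccurrence["nurse"].most_common(3))
print("engineer ->", cooccurrence["engineer"].most_common(3))
```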
6. Data Extraction and Digital Colonialism
Crawford’s metaphor comparing data mining to colonial extraction hits close to home. Just as colonial powers once took land and resources without consent, modern tech companies extract personal data from billions of users. Photos, posts, clicks: all become raw material for training AI systems.
That process turns lived experiences into a commodity. The stories, emotions, and identities behind the data vanish. What remains is a sanitized product, something profitable but disconnected from its human source.
This extractive cycle has consequences. It reinforces inequality and shifts control to those who already hold technological power. Meanwhile, the people whose data fuels these systems rarely share in the benefits.
7. When Bias Becomes Inequality
We’ve seen how this plays out in the real world: credit algorithms that give women lower limits than men with identical financial profiles, criminal justice tools that mislabel Black individuals as high-risk, voice recognition systems that struggle with regional accents or women’s voices.
These aren’t isolated mistakes. They’re reminders that every dataset carries a worldview. When AI trains on biased inputs, it reproduces biased outputs—affecting lives, not just numbers.
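One way auditors make that concrete is the disparate impact ratio, often called the four-fifths rule: compare approval rates across groups and treat a ratio below 0.80 as a red flag. The sketch below uses invented approval counts purely to show the arithmetic, not data from any real credit system.

```python
# A hedged sketch of one common fairness check: the disparate impact ratio
# ("four-fifths rule"). Approval counts are invented for illustration.
approvals = {
    "group_a": {"approved": 720, "applied": 1000},
    "group_b": {"approved": 480, "applied": 1000},
}

rate_a = approvals["group_a"]["approved"] / approvals["group_a"]["applied"]
rate_b = approvals["group_b"]["approved"] / approvals["group_b"]["applied"]
ratio = rate_b / rate_a

print(f"Approval rate A: {rate_a:.0%}, B: {rate_b:.0%}")
print(f"Disparate impact ratio: {ratio:.2f}")  # below 0.80 is a common red flag
```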
8. The Environmental and Human Cost
AI doesn’t just consume data; it consumes energy, minerals, and labor. The systems we rely on every day run on servers built from cobalt, lithium, and rare earth metals, often mined in unsafe or exploitative conditions.
9. Moving Toward Ethical AI Literacy
The goal isn’t to reject AI; it’s to understand it with clear eyes. As educators, we can help students see that every system reflects choices made by people. AI isn’t an oracle; it’s a mirror. It shows us both our creativity and our collective blind spots.
Teaching AI literacy means going beyond how it works to why it works the way it does. Who builds it? Whose voices are missing? What assumptions shape its decisions? These are the questions that prepare students to engage critically with technology.