Hello, I'm LI, a second-year master's student.
In this post, I will introduce my master's thesis.
Since the thesis is written in English, this post is in English as well.
Background and Purpose
With the rapid evolution of generative AI, text-to-image (T2I) models are transforming creativity and communication. Advanced systems such as DALL·E 3 can now generate images from prompts written in many languages, which raises the question of how language and culture interact in their output. This study investigates how prompt language influences cultural representation in AI-generated images, identifies potential biases, and explores their implications for fairness and inclusivity.
Research Questions
- How does prompt language affect cultural representation in DALL·E 3's generated images?
- What cultural representational disparities exist in DALL·E 3's cross-language image generation?
- Does DALL·E 3's generation process reflect or amplify existing cultural biases across languages?
Methodology
This study selected five languages (English, Chinese, Japanese, Spanish, and Arabic) and created a prompt dataset covering three categories: people-centered, material culture, and non-material culture. Prompts were entered through the ChatGPT interface, which generated the images with DALL·E 3. Cultural representation was evaluated using CLIP-based analysis and Visual Question Answering (VQA) analysis, complemented by manual verification.
Figure: Research workflow
Figure: Prompt dataset
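As a rough illustration of what a CLIP-based analysis of cultural origin typically involves, the sketch below follows the standard zero-shot recipe: compare one image embedding with text embeddings for several cultural-origin labels, scale the cosine similarities, and turn them into probabilities with a softmax. This post does not describe the exact prompts or scoring used in the thesis, so the label set, the helper names `cosine` and `clip_style_scores`, and the toy vectors are all hypothetical; in practice the embeddings would come from a CLIP model's image and text encoders. The scale factor of 100 follows the zero-shot example in OpenAI's CLIP repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_scores(image_emb, label_embs, logit_scale=100.0):
    """Zero-shot, CLIP-style scoring of one image against candidate labels.

    image_emb  -- embedding of one generated image (assumed precomputed)
    label_embs -- {label: embedding of a text prompt for that label};
                  the labels and prompt wording are hypothetical
    Returns a probability per label (softmax over scaled cosine similarities).
    """
    logits = {label: logit_scale * cosine(image_emb, emb)
              for label, emb in label_embs.items()}
    m = max(logits.values())  # subtract the max so exp() cannot overflow
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy 3-dimensional vectors standing in for real CLIP embeddings
# (real ones have hundreds of dimensions); the values are invented.
label_embs = {
    "American": [1.0, 0.0, 0.0],
    "Chinese": [0.0, 1.0, 0.0],
    "Japanese": [0.0, 0.0, 1.0],
}
image_emb = [0.9, 0.3, 0.1]
scores = clip_style_scores(image_emb, label_embs)
print(max(scores, key=scores.get))  # American
```

Because the softmax is taken over a fixed label set, the scores only say which candidate culture an image is closest to, not whether it resembles any of them well; that is one reason a second method such as VQA and manual checks are useful alongside it.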
Key Findings
Cultural Representation and Language Influence
- Generated images exhibit global cultural representation while displaying distinct variations across languages
- Language influences cultural features, but representation doesn't always align perfectly with the input language's cultural identity
- VQA analysis attributes images prompted in all languages predominantly to American cultural origins, while CLIP analysis reveals more complex multicultural blending
Figure: CLIP Scores for Cultural Origin in People Category
Figure: CLIP Scores for Cultural Origin in Material and Non-Material Culture Categories
Figure: VQA Analysis for Cultural Origin
Cultural Accuracy and Sequential Influence
- Images generated from different language prompts show high similarity scores, suggesting broadly overlapping visual outputs across languages
- When images are generated sequentially, earlier outputs reflect the cultural background of the prompt language more strongly than later ones
- Language influences cultural accuracy, with certain cultural elements (e.g., Peking Opera) represented most accurately in images generated from their associated language prompts
Figure: Image Generation Order and Cultural Accuracy
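The post reports high cross-language similarity scores without specifying the measure. One common way to compute such a score, assumed here rather than taken from the thesis, is the mean cosine similarity between the image embeddings (for example CLIP image features) of every pair of prompt languages. The minimal sketch below does exactly that; the function name `cross_language_similarity`, the placeholder language names, and the 2-D vectors are hypothetical and are not thesis data.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_language_similarity(embs_by_lang):
    """Mean cosine similarity for every pair of prompt languages.

    embs_by_lang -- {language: [image embedding, ...]}
    Every image of one language is compared with every image of the other.
    """
    result = {}
    for lang_a, lang_b in combinations(sorted(embs_by_lang), 2):
        sims = [cosine(a, b)
                for a in embs_by_lang[lang_a]
                for b in embs_by_lang[lang_b]]
        result[(lang_a, lang_b)] = sum(sims) / len(sims)
    return result


# Invented 2-D embeddings for three placeholder languages (not thesis data).
embs_by_lang = {
    "lang_A": [[1.0, 0.0], [0.8, 0.6]],
    "lang_B": [[1.0, 0.0], [0.6, 0.8]],
    "lang_C": [[0.0, 1.0]],
}
sims = cross_language_similarity(embs_by_lang)
for pair, value in sims.items():
    print(pair, round(value, 2))
```

Averaging over all image pairs keeps the score comparable when languages contribute different numbers of images; a high average can still hide individual prompts whose images diverge, which is where per-prompt checks of cultural accuracy come in.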
Gender and Age Biases
- Gender bias is most prominent in English prompts (highest male representation), followed by Japanese, with Spanish showing a smaller bias
- Age distributions are imbalanced, and the diversity of represented ages varies across languages
- Chinese and Arabic prompts produce more balanced gender and age distributions
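Comparing gender balance across languages ultimately comes down to counting labels per language. The sketch below assumes each person image has already been labeled (for example through a VQA question or manual verification) as "male", "female", or something else, and measures bias as the distance of the male share from parity (0.5). That metric, the function name `gender_bias`, the placeholder language names, and the toy labels are illustrative assumptions, not necessarily what the thesis used.

```python
from collections import Counter


def gender_bias(labels_by_lang):
    """Per language: share of 'male' among gendered labels and its
    absolute distance from parity (0.5). Other labels are ignored."""
    report = {}
    for lang, labels in labels_by_lang.items():
        counts = Counter(labels)
        gendered = counts["male"] + counts["female"]
        if gendered == 0:
            continue  # nothing to measure for this language
        male_share = counts["male"] / gendered
        report[lang] = (male_share, abs(male_share - 0.5))
    return report


# Invented labels for placeholder languages (not the thesis results).
labels_by_lang = {
    "lang_A": ["male"] * 8 + ["female"] * 2,
    "lang_B": ["male"] * 6 + ["female"] * 4,
    "lang_C": ["male"] * 5 + ["female"] * 5 + ["unclear"],
}
report = gender_bias(labels_by_lang)
ranking = sorted(report, key=lambda lang: report[lang][1], reverse=True)
print(ranking)  # ['lang_A', 'lang_B', 'lang_C']
```

Age groups can be tallied the same way by replacing the gender labels with age-bracket labels and comparing each language's distribution.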
Emotional Expression Patterns
- Emotional distributions vary across languages: happiness appears frequently in Japanese, Chinese, and Arabic prompts, while neutral expressions are more common in English and Spanish prompts
- Specific prompts (e.g., "poor person") trigger notably different emotional expressions across languages
Figure: VQA-based emotional expression analysis results
Research Significance and Future Directions
This study shows that DALL·E 3 handles culture in complex ways when generating images from multilingual prompts, demonstrating both broad cultural understanding and notable imbalances. These findings have significant implications for developing culturally inclusive AI systems, emphasizing the need for balanced training data, a better understanding of how AI systems process culture internally, and attention to how these systems influence cultural expression in the digital age.
Future research directions include: developing better metrics for cultural representation, improving prompt engineering for cultural specificity, and finding ways to balance technological advancement with cultural preservation.
PS.
All AI-generated images used in this study are available via the following Google Drive link:
https://drive.google.com/drive/folders/16pPwNXBZPY2ViPCgWCF8jFnkpmg79B2B?usp=sharing
To ensure clarity and accessibility, the images are organized by category and prompt language.
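Acknowledgments
Many people supported me while I carried out this research, and I am sincerely grateful to all of them.
While I was settling on a research topic, I often struggled to pull my thoughts together, but my supervisor, Professor Watanabe, always watched over me warmly and respected my trial and error. Professor Watanabe also listened to my somewhat unusual ideas and offered encouragement and precise advice, which helped me discover my own interests and give this research its shape.
I also received many valuable comments and much advice from the members of the lab. It is thanks to your support that I was able to come this far. Thank you very much.
As I graduate, I feel there is still a great deal left to explore in this field. I intend to keep following the possibilities that lie at the intersection of culture and AI.
Finally, I would like to express my heartfelt gratitude to everyone I met and who supported me through this research. The time I spent with everyone in the Watanabe Lab was truly stimulating and enjoyable. Each member worked on a different theme, and our discussions gave me new perspectives and opportunities to deepen my own thinking. Learning alongside all of you broadened my horizons and taught me a great deal. The connections I made here are a treasure to me.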