Semiotics-based prompt engineering for architectural text-to-image generation processes
DOI: https://doi.org/10.18537/est.v014.n028.a09
Keywords: architectural design, generative AI, text-to-image generative models, prompt engineering, semiotics
Abstract
Text-to-image generative AI tools have gained significant attention in the architectural community; however, they are currently used through trial and error with simple textual inputs, largely because there are no established frameworks for crafting prompts that yield semantically rich architectural outputs. This paper proposes semiotics as an analytical method to facilitate text-to-image generation processes. Two experiments investigated how semiotic analysis and the addition of context modifiers to prompts affect the relevance of the outputs of three mainstream text-to-image generation tools (DALL-E, Midjourney, and Stable Diffusion). The results indicate the effectiveness of the proposed method and reveal opportunities and limitations of current text-to-image generative models in architecture. It is concluded that a human-centered approach to human-AI interaction is needed to overcome issues of control, transparency, and data quality.
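To make the described approach concrete, below is a minimal Python sketch of how a prompt could be assembled from semiotic layers (a denoted subject, connoted meanings) plus context modifiers before being passed to a text-to-image tool. The class, field names, and example values are hypothetical illustrations under the paper's general idea, not the authors' implementation.

    from dataclasses import dataclass, field

    @dataclass
    class SemioticPrompt:
        """Hypothetical container for the semiotic layers of a prompt."""
        subject: str                                          # denotation: what is literally shown
        connotations: list[str] = field(default_factory=list)       # cultural/affective meanings
        context_modifiers: list[str] = field(default_factory=list)  # site, era, medium, style

        def compose(self) -> str:
            """Join all layers into the comma-separated prompt format that
            DALL-E, Midjourney, and Stable Diffusion all accept as input."""
            parts = [self.subject, *self.connotations, *self.context_modifiers]
            return ", ".join(p.strip() for p in parts if p.strip())

    # A bare prompt versus a semiotically enriched one (example values are invented).
    bare = SemioticPrompt(subject="a house on a cliff")
    enriched = SemioticPrompt(
        subject="a house on a cliff",
        connotations=["defiance of nature", "harmony between structure and rock"],
        context_modifiers=[
            "organic modernist architecture",
            "overcast coastal light",
            "photorealistic architectural rendering",
        ],
    )

    print(bare.compose())
    print(enriched.compose())

Printed side by side, the two prompts show how the semiotic layers supply the contextual meaning that a bare description omits, which is the gap the paper's method addresses.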
License
Copyright (c) 2025 Estoa. Journal of the Faculty of Architecture and Urbanism

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The Journal declines any responsibility for possible disputes arising from the authorship of the works published in it.
The University of Cuenca in Ecuador retains the economic rights (copyright) of the published works and encourages their reuse; they may be copied, used, disseminated, transmitted, and publicly displayed.
Unless otherwise indicated, all contents of the electronic edition are distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.