Semiotics-based prompt engineering for architectural text-to-image generation processes
DOI: https://doi.org/10.18537/est.v014.n028.a09
Keywords: architectural design, generative AI, text-to-image generative models, prompt engineering, semiotics
Abstract
Text-to-image generative AI tools have gained significant attention in the architectural community; however, they are currently used through trial and error with simple textual inputs, largely because there are no established frameworks for crafting prompts that yield semantically rich architectural outputs. This paper proposes semiotics as an analytical method to facilitate text-to-image generation processes. Two experiments investigated how semiotic analysis and the addition of context modifiers to prompts affect the relevance of the outputs of three mainstream text-to-image generation tools (DALL-E, Midjourney, and Stable Diffusion). The results indicate the effectiveness of the proposed method and reveal opportunities and limitations of current text-to-image generative models in architecture. It is concluded that a human-centered approach to human-AI interaction is needed to overcome issues of control, transparency, and data quality.
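To make the described approach concrete, below is a minimal Python sketch of how a prompt could be assembled from semiotic layers (a denoted subject, connoted meanings) plus context modifiers before being passed to a text-to-image tool. The class, field names, and example values are hypothetical illustrations under the paper's general idea, not the authors' implementation.

    from dataclasses import dataclass, field

    @dataclass
    class SemioticPrompt:
        """Hypothetical container for the semiotic layers of a prompt."""
        subject: str                                          # denotation: what is literally shown
        connotations: list[str] = field(default_factory=list)       # cultural/affective meanings
        context_modifiers: list[str] = field(default_factory=list)  # site, era, medium, style

        def compose(self) -> str:
            """Join all layers into the comma-separated prompt format that
            DALL-E, Midjourney, and Stable Diffusion all accept as input."""
            parts = [self.subject, *self.connotations, *self.context_modifiers]
            return ", ".join(p.strip() for p in parts if p.strip())

    # A bare prompt versus a semiotically enriched one (example values are invented).
    bare = SemioticPrompt(subject="a house on a cliff")
    enriched = SemioticPrompt(
        subject="a house on a cliff",
        connotations=["defiance of nature", "harmony between structure and rock"],
        context_modifiers=[
            "organic modernist architecture",
            "overcast coastal light",
            "photorealistic architectural rendering",
        ],
    )

    print(bare.compose())
    print(enriched.compose())

Printed side by side, the two prompts show how the semiotic layers supply the contextual meaning that a bare description omits, which is the gap the paper's method addresses.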
License
Copyright (c) 2025 Estoa. Journal of the Faculty of Architecture and Urbanism

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The Journal declines any responsibility for possible disputes arising from the authorship of the works published in it.
The University of Cuenca in Ecuador retains the economic rights (copyright) of the published works and encourages their reuse; they may be copied, used, disseminated, transmitted, and publicly displayed.
Unless otherwise indicated, all contents of the electronic edition are distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.