Text-to-image is the simplest AI generation mode: prompt in, picture out. "Cozy reading nook with warm wood and brass lighting" → an image. There's no input photo to anchor the result, so the model invents the entire scene.
For interior design, text-to-image is most useful at the start of a project: moodboard generation, idea exploration, "what does this style look like applied to this room type?" It's less useful once you have a real space to redesign, because a text prompt alone can't pin the output to your room's actual proportions and structure. For that, you switch to img2img (image-to-image), which starts from a photo of the space.
The quality of text-to-image output is heavily prompt-dependent. Vague prompts produce generic stock-style renders; specific prompts (with material names, lighting setup, camera angle) produce more distinctive, intentional results. This is why prompt-engineering guides exist: small wording differences materially change what the model produces.
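To make the vague-versus-specific difference concrete, here is a minimal Python sketch that assembles a prompt from the three ingredients mentioned above: materials, lighting, and camera angle. The `InteriorPrompt` class, its field names, and the comma-separated phrasing are all illustrative assumptions, not part of any particular image generator's API. The resulting string would be pasted or passed into whichever text-to-image tool you use.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InteriorPrompt:
    """Hypothetical helper that builds a text-to-image prompt string."""

    room: str                       # e.g. "reading nook"
    style: str                      # e.g. "cozy"
    materials: Sequence[str] = ()   # named materials and fixtures
    lighting: str = ""              # lighting setup, if any
    camera: str = ""                # camera angle or framing, if any

    def render(self) -> str:
        # Start with the bare subject, then append only the details provided.
        parts = [f"{self.style} {self.room}"]
        if self.materials:
            parts.append("with " + ", ".join(self.materials))
        if self.lighting:
            parts.append(self.lighting)
        if self.camera:
            parts.append(self.camera)
        return ", ".join(parts)


# A vague prompt: only style and room type.
vague = InteriorPrompt(room="reading nook", style="cozy")

# A specific prompt: same subject, plus materials, lighting, and camera.
specific = InteriorPrompt(
    room="reading nook",
    style="cozy",
    materials=["warm oak shelving", "brushed brass sconces"],
    lighting="soft late-afternoon window light",
    camera="eye-level wide-angle shot",
)

print(vague.render())
print(specific.render())
```

Keeping the details in separate fields makes it easy to change one variable at a time (swap only the lighting, say) and compare how the output shifts.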