Speed vs Quality Trade-off. Use fewer steps (e.g., 10 steps, which takes ~4 s/image on a single A6000 GPU) for faster generation, at the cost of potentially lower quality.
Inpaint Position Freedom. Inpainting positions are flexible - they don't necessarily need to match the original text locations in the input image.
Iterative Editing. Drag outputs from the gallery to the Image Editing Panel (clear the Editing Panel first) for quick refinements.
Mask Optimization. Adjust the mask size and aspect ratio to match your desired content. The model tends to fill the whole mask, and it harmonizes the generated content with the background in terms of color and lighting.
Reference Image Tip. White-background references improve style consistency, because the encoder also considers the background context of the given reference image.
Resolution Balance. Generating at very high resolutions sometimes causes spelling errors. 512 px or 768 px is recommended, since the model was trained at a resolution of 512.
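
The mask and resolution tips above can be sketched as two small helpers. This is only an illustration, not part of the project's code: the function names `fit_long_side` and `mask_box`, the rounding to a multiple of 16, and the idea of sizing the mask from a rough width-to-height ratio of the target text are all assumptions.

```python
def fit_long_side(size, target=768, multiple=16):
    """Scale (width, height) so the longer side is about `target` pixels.

    Both sides are rounded to a multiple of `multiple` (assumed here;
    check what the pipeline actually requires).
    """
    w, h = size
    scale = target / max(w, h)
    new_w = max(multiple, round(w * scale / multiple) * multiple)
    new_h = max(multiple, round(h * scale / multiple) * multiple)
    return new_w, new_h


def mask_box(size, center, aspect, height):
    """Return (left, top, right, bottom) for a mask of the given shape.

    `aspect` is the desired width/height ratio of the content, so a long
    line of text gets a wide, short box. The box is clamped to the image.
    """
    w, h = size
    cx, cy = center
    box_w = height * aspect
    left = max(0, int(cx - box_w / 2))
    top = max(0, int(cy - height / 2))
    right = min(w, int(cx + box_w / 2))
    bottom = min(h, int(cy + height / 2))
    return left, top, right, bottom


# Downscale a 1920x1080 input to the recommended 768 px range,
# then place a 4:1 mask, 64 px tall, at the image center.
new_size = fit_long_side((1920, 1080), target=768)
box = mask_box(new_size, center=(384, 216), aspect=4.0, height=64)
print(new_size, box)  # (768, 432) (256, 184, 512, 248)
```

The returned box can then be drawn as a filled rectangle with any image library before the mask goes to the Editing Panel. Since the model tends to fill the mask, matching the box shape to the content usually matters more than its exact position.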