MediaPipe On-Device Text-to-Image Generation Solution Now Available for Android Developers | Google for Developers



Posted by Paul Ruiz – Senior Developer Relations Engineer, and Kris Tonthat – Technical Writer

Earlier this year, we previewed on-device text-to-image generation with diffusion models for Android via MediaPipe Solutions. Today we’re happy to announce that this is available as an early, experimental solution, Image Generator, for developers to try out on Android devices, allowing you to easily generate images entirely on-device in as little as ~15 seconds on higher-end devices. We can’t wait to see what you create!

There are three primary ways you can use the new MediaPipe Image Generator task:

  1. Text-to-image generation based on text prompts using standard diffusion models.
  2. Controllable text-to-image generation based on text prompts and conditioning images using diffusion plugins.
  3. Customized text-to-image generation based on text prompts using Low-Rank Adaptation (LoRA) weights, which let you create images of specific concepts that you pre-define for your unique use cases.
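To make the difference between the three modes concrete, here is a small Kotlin sketch that models which inputs each one needs in addition to the text prompt. The types `GenerationRequest` and `Plugin` and the function `extraInputs` are hypothetical and exist only for illustration; they are not part of the MediaPipe API.

```kotlin
// Hypothetical model of the three Image Generator usage modes.
// Illustrative only: these types are NOT part of the MediaPipe API.
sealed interface GenerationRequest {
    val prompt: String

    // 1. Plain text-to-image with a standard diffusion model.
    data class TextOnly(override val prompt: String) : GenerationRequest

    // 2. Text plus a conditioning image, interpreted by a diffusion plugin.
    data class Conditioned(
        override val prompt: String,
        val conditionImagePath: String,
        val plugin: Plugin,
    ) : GenerationRequest

    // 3. Text plus LoRA weights that encode a pre-defined concept.
    data class WithLora(
        override val prompt: String,
        val loraWeightsPath: String,
    ) : GenerationRequest
}

// The three plugin types described in this post: face, edge, and depth.
enum class Plugin { FACE, EDGE, DEPTH }

// Lists the extra inputs each mode needs beyond the text prompt.
fun extraInputs(request: GenerationRequest): List<String> = when (request) {
    is GenerationRequest.TextOnly -> emptyList()
    is GenerationRequest.Conditioned -> listOf("condition image", "plugin: ${request.plugin}")
    is GenerationRequest.WithLora -> listOf("LoRA weights")
}
```

Using a sealed interface lets the `when` expression stay exhaustive, so adding a fourth mode would surface every place that needs updating at compile time.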


Before we get into all of the fun and exciting parts of this new MediaPipe task, it’s important to know that our Image Generation API supports any models that exactly match the Stable Diffusion v1.5 architecture. You can use a pretrained model or your own fine-tuned model by converting it to a model format supported by MediaPipe Image Generator using our conversion script.

You can also customize a foundation model via MediaPipe Diffusion LoRA fine-tuning on Vertex AI, injecting new concepts into a foundation model without having to fine-tune the whole model. You can find more information about this process in our official documentation.

If you want to try this task out today without any customization, we also provide links to a few verified working models in that same documentation.

Image Generation through Diffusion Models

The most straightforward way to try the Image Generator task is to give it a text prompt and then receive a result image using a diffusion model.

As with MediaPipe’s other tasks, you’ll start by creating an options object. In this case you only need to define the path to your foundation model files on the device. Once you have that options object, you can create the ImageGenerator.

val options = ImageGeneratorOptions.builder().setImageGeneratorModelDirectory(MODEL_PATH).build()
imageGenerator = ImageGenerator.createFromOptions(context, options)

After creating your new ImageGenerator, you can create a new image by passing in the prompt, the number of iterations the generator should run, and a seed value. This is a blocking operation, so you’ll want to run it on a background thread before returning your new Bitmap result object.

val result = imageGenerator.generate(promptString, iterations, seed)
val bitmap = BitmapExtractor.extract(result.generatedImage())
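One way to keep the blocking generate() call off the UI thread is to submit it to a dedicated executor. The sketch below assumes nothing about MediaPipe beyond the (prompt, iterations, seed) shape of the call: `BlockingGenerator` and `BackgroundGenerator` are hypothetical names, and the result type is left generic so the example runs on a plain JVM. In an app, the generator would wrap ImageGenerator and return a Bitmap.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Hypothetical stand-in for the blocking generate(prompt, iterations, seed) call.
// In an app this would wrap ImageGenerator and return a Bitmap; the result type
// is generic here so the sketch runs on a plain JVM.
fun interface BlockingGenerator<T> {
    fun generate(prompt: String, iterations: Int, seed: Int): T
}

// Hypothetical helper that moves the blocking call onto a worker thread.
class BackgroundGenerator<T>(private val generator: BlockingGenerator<T>) {
    // A single worker thread runs generation requests one at a time, in order.
    private val executor: ExecutorService = Executors.newSingleThreadExecutor()

    // Returns immediately; the image is produced on the worker thread.
    // Note: Future.get() blocks, so don't call it from the UI thread.
    fun generateAsync(prompt: String, iterations: Int, seed: Int): Future<T> =
        executor.submit(Callable { generator.generate(prompt, iterations, seed) })

    // Stops accepting new work; already submitted requests still run.
    fun shutdown() {
        executor.shutdown()
    }
}
```

A single-thread executor runs requests one at a time in submission order. On Android you would still need to hand the finished image back to the main thread (for example via a Handler or a coroutine) before touching any views.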

In addition to this simple input-in/result-out format, we also support a way for you to step through each iteration manually with the execute() function, receiving intermediate result images back at different stages to show the generative progress. While getting intermediate results back isn’t recommended for most apps due to performance and complexity, it’s a nice way to demonstrate what’s happening under the hood. This is a more involved process, but you can find this demo, as well as the other examples shown in this post, in our official example app on GitHub.

Moving image of an image generating in MediaPipe from the following prompt: a colorful cartoon racoon wearing a floppy wide brimmed hat holding a stick walking through the forest, animated, three-quarter view, painting
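The manual stepping loop can be sketched as follows. `SteppingGenerator`, its `step(decodeResult)` method, and `runWithProgress` are hypothetical stand-ins in the spirit of the execute() function described above, not the actual MediaPipe signatures; the official example app shows the real calls. The sketch only requests an intermediate image every `displayEvery` iterations (plus the final one), since producing intermediate images is what adds cost.

```kotlin
// Hypothetical stand-in for a generator that advances one diffusion iteration
// per call, loosely modelled on the execute() function described above.
interface SteppingGenerator<T> {
    // Runs one iteration. Returns an image only when decodeResult is true,
    // because producing intermediate images is the costly part.
    fun step(decodeResult: Boolean): T?
}

// Runs all iterations and reports an intermediate image every `displayEvery`
// iterations, plus the final one, through onProgress(iteration, image).
// Returns the last image produced, or null if none was returned.
fun <T> runWithProgress(
    generator: SteppingGenerator<T>,
    iterations: Int,
    displayEvery: Int,
    onProgress: (iteration: Int, image: T) -> Unit,
): T? {
    require(iterations > 0 && displayEvery > 0) { "iterations and displayEvery must be positive" }
    var last: T? = null
    for (i in 1..iterations) {
        // Ask for an image on every displayEvery-th step and on the last step.
        val decode = i == iterations || i % displayEvery == 0
        val image = generator.step(decode)
        if (decode && image != null) {
            onProgress(i, image)
            last = image
        }
    }
    return last
}
```

With 10 iterations and `displayEvery = 4`, the callback fires after iterations 4, 8, and 10, which is enough to animate progress without paying for an image at every step.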

Image Generation with Plugins

While being able to create new images from only a prompt on a device is already a huge step, we’ve taken it a little further by implementing a new plugin system that lets the diffusion model accept a condition image along with a text prompt as its inputs.

We currently support three different ways to provide a foundation for your generations: facial structures, edge detection, and depth awareness. The plugins let you provide an image, extract specific structures from it, and then create new images using those structures.

Moving image of an image generating in MediaPipe from a provided image of a beige toy car, plus the following prompt: cool green race car
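To get a feel for what an edge-based condition image contains, the toy Kotlin function below turns a tiny grayscale grid into a binary edge map using a central-difference gradient and a threshold. This is an illustration under stated assumptions, not the edge detector the MediaPipe plugin uses; `edgeMap` and its `threshold` parameter are hypothetical.

```kotlin
import kotlin.math.abs

// Toy edge extractor: marks pixels whose L1 gradient magnitude exceeds a threshold.
// Gradients use central differences, with neighbours clamped at the border.
// Assumes all rows have the same width.
// Illustration only; this is NOT the detector used by the MediaPipe edge plugin.
fun edgeMap(gray: Array<IntArray>, threshold: Int): Array<BooleanArray> {
    val h = gray.size
    val w = if (h == 0) 0 else gray[0].size
    return Array(h) { y ->
        BooleanArray(w) { x ->
            // Horizontal and vertical intensity change around (x, y).
            val gx = gray[y][minOf(x + 1, w - 1)] - gray[y][maxOf(x - 1, 0)]
            val gy = gray[minOf(y + 1, h - 1)][x] - gray[maxOf(y - 1, 0)][x]
            abs(gx) + abs(gy) > threshold
        }
    }
}
```

On an image whose rows are `0, 0, 255, 255`, only the two columns on either side of the jump are marked as edges, roughly the kind of structural outline that a plugin can pass to the diffusion model alongside the prompt.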

LoRA Weights

The third major feature we’re rolling out today is the ability to customize the Image Generator task with LoRA to teach a foundation model about a new concept, such as specific objects, people, or styles presented during training. With the new LoRA weights, the Image Generator becomes a specialized generator that is able to inject specific concepts into generated images.

LoRA weights are useful for cases where you may want every image to be in the style of an oil painting, or a particular teapot to appear in any created setting. You can find more information about LoRA weights on Vertex AI in the MediaPipe Stable Diffusion LoRA model card, and create them using this notebook. Once generated, the LoRA weights can be deployed on-device using the MediaPipe Tasks Image Generator API, or for optimized server inference through Vertex AI’s one-click deployment.
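The core idea behind LoRA fits in a few lines: rather than updating a full d × k weight matrix W, fine-tuning learns two small matrices B (d × r) and A (r × k) with a small rank r, and the adapted weight is W' = W + scale · BA. The Kotlin sketch below applies such an update to plain nested lists and compares trainable parameter counts. The helper names are hypothetical, the `scale` factor (often set to alpha / r in LoRA implementations) defaults to 1.0, and nothing here reflects how MediaPipe stores or loads LoRA weights.

```kotlin
// Plain nested lists stand in for weight tensors in this sketch.
typealias DoubleMatrix = List<List<Double>>

// Naive product of an (m x n) and an (n x p) matrix.
fun matmul(a: DoubleMatrix, b: DoubleMatrix): DoubleMatrix {
    val inner = b.size
    require(inner > 0 && a.all { it.size == inner }) { "shape mismatch" }
    val cols = b[0].size
    return a.map { row -> List(cols) { j -> (0 until inner).sumOf { k -> row[k] * b[k][j] } } }
}

// W' = W + scale * (B A), where B is d x r and A is r x k.
fun applyLora(w: DoubleMatrix, b: DoubleMatrix, a: DoubleMatrix, scale: Double = 1.0): DoubleMatrix {
    val delta = matmul(b, a)
    require(delta.size == w.size && delta.zip(w).all { (x, y) -> x.size == y.size }) {
        "W and BA must have the same shape"
    }
    return w.mapIndexed { i, row -> row.mapIndexed { j, v -> v + scale * delta[i][j] } }
}

// Trainable parameters for full fine-tuning of a d x k matrix...
fun fullParams(d: Int, k: Int): Long = d.toLong() * k

// ...versus a rank-r LoRA update (B has d * r entries, A has r * k).
fun loraParams(d: Int, k: Int, r: Int): Long = r.toLong() * (d + k)
```

For a 768 × 768 matrix, for example, a rank-4 update trains 6,144 parameters instead of 589,824, which is why LoRA weights are small enough to ship alongside an on-device foundation model.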

In the example below, we created LoRA weights using several images of a teapot from the DreamBooth teapot training image set. Then we used the weights to generate new images of the teapot in different settings.

A grid of four photos of teapots generated with the training prompt 'a photo of a monadikos teapot' on the left, and a moving image showing an image being generated in MediaPipe from the prompt 'a bright purple monadikos teapot sitting in top of a green table with orange teacups'

Image generation with the LoRA weights

Next Steps

This is just the beginning of what we plan to support with on-device image generation. We’re looking forward to seeing all of the great things the developer community builds, so be sure to post them on X (formerly Twitter) with the hashtag #MediaPipeImageGen and tag @GoogleDevs. You can check out the official sample on GitHub demonstrating everything you’ve just learned about, read through our official documentation for even more details, and keep an eye on the Google for Developers YouTube channel for updates and tutorials as they’re released by the MediaPipe team.


We’d like to thank all team members who contributed to this work: Lu Wang, Yi-Chun Kuo, Sebastian Schmidt, Kris Tonthat, Jiuqiang Tang, Khanh LeViet, Paul Ruiz, Qifei Wang, Yang Zhao, Yuqi Li, Lawrence Chan, Tingbo Hou, Joe Zou, Raman Sarokin, Juhyun Lee, Geng Yan, Ekaterina Ignasheva, Shanthal Vasanth, Glenn Cameron, Mark Sherwood, Andrei Kulik, Chuo-Ling Chang, and Matthias Grundmann from the Core ML team, as well as Changyu Zhu, Genquan Duan, Bo Wu, Ting Yu, and Shengyang Dai from Google Cloud.


