What started off as a parody project turned into a serious learning experience and a gift for my family at Christmas.
Backstory
This month I spent a bit of time creating a book for my son using AI: Midjourney (MJ), Lightroom upscaling, and a little help from Grammarly and ChatGPT. It started as a children's book parody and even included the lyric 'AI-AI ...Oh?'. After seeing another AI-generated book go viral, my interest shifted to answering a few questions:
- How consistent can I get a collection of images without any base image prompts?
- What kind of quality and content is possible in ~20 hours?
- I read about 5 books a night, so why not make a book for my son?
So, after a little tinkering, reading the MJ manual and reading older blogs about image prompt optimisation, I set up a Discord server and invited the Midjourney bot to the party. From then on I was in full commitment mode. I've broken the process down in a bit more detail below, but here is the finished book (~60MB).
The Discord Server
After some playing about on the Midjourney Discord, I decided that although it was quite interesting and inspirational seeing thousands of people submit their prompts (and countless newbies uploading selfies), I would be better organised using the bot in my own server. A few clicks later and I was ready to go. My approach was to create a separate channel for each subject matter so that I could always refer back to something quickly. This was quite fiddly at first but worked very well as the number of images increased.
Character development
From messing around on the MJ Discord I obtained a starter image that I could develop further on my own. My hero was a 7-year-old boy with blonde, curly hair and blue eyes. I also decided to add an appropriate colour palette to work with: "Vaporwave" has been popular this year, and since the book is a mixture of retro and tech I went with that.
Voilà, I had the first hero image.
I upscaled the 4th option, and re-used this in further MJ prompts.
Next, I thought about the typical traits of my hero. What adjectives can be used to describe the behaviour of a child of this age? Here are a few I reused.
curious, innocent, happy, cute, tears, emotional, contempt, belly laughing, funny, full of awe, inspired, tantrum, screaming
Now that I had a few more adjectives and a source image, I combined them to produce a series of images with my hero in various poses. First up, side profile...
That's a bit too much vaporwave; perhaps I shouldn't re-use that word again...
Much better. Now how about other poses?
Surprised
Sitting
You get the idea. Many more images later I had a great set of poses and positions to work with.
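To give a feel for how the pieces combine, here is a minimal Python sketch that builds prompt strings from a hero description, a trait and a pose. The exact wording of the prompts used for the book isn't shown in this post, so `HERO`, `SEED`, `build_prompt` and the chosen traits and poses are illustrative assumptions; `--seed` and `--v` are Midjourney parameters.

```python
from itertools import product

# Illustrative values only: the description paraphrases the hero above,
# and the seed is a placeholder, not the one used for the book.
HERO = "7-year-old boy with blonde curly hair and blue eyes"
SEED = 1234

def build_prompt(hero: str, trait: str, pose: str, seed: int) -> str:
    """Join the text parts, then append the Midjourney parameters."""
    return f"{hero}, {trait}, {pose} --seed {seed} --v 4"

traits = ["curious", "full of awe", "belly laughing"]
poses = ["side profile", "surprised", "sitting"]

# One prompt per (trait, pose) pair, all sharing the same seed.
prompts = [build_prompt(HERO, t, p, SEED) for t, p in product(traits, poses)]

for prompt in prompts:
    print(prompt)
```

Each printed line could be pasted after `/imagine` in its own channel, which keeps the hero description and seed identical while only the trait and pose change.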
Scenes
Next up were the scenes that I wanted to include in the book. I have to admit that after an hour or so of looking at this I just wanted to pop my hero into the scene rather than combine the two. However, I did get some unique results from combining both the scene and the character. The takeaway here is that in MJ v4 it is far better to describe a scene in text than to use multiple source images.
Bedroom
For the one scene I described in detail, I went for a Wes Anderson style.
Pages
The remaining time was spent describing the scenes of the book and generating hundreds of variations. Below are a few favourites that didn't make it to the final copy.
Putting it together
Putting the book together was pretty quick; Canva is by far the most intuitive web-based design tool for print that I have used. I decided that the book dimensions would be 8.5" x 8.5", a common children's book format (at least in our household).
There was quite a lot of back and forth between Canva and MJ as the pages came together. I found myself going back to replace images as the consistency, quality and my prompting improved. There were many images that I scrapped just because they no longer fit.
Finally, at about 15 hours in I finalised the story and pages. Knowing exactly what I wanted I was able to go through each page once more to see how I could improve the quality or overall consistency throughout.
Manual edits and AI up-scaling
I want to be able to print this book eventually, but MJ v4 only supported 1MP images, which would not look great printed at 300dpi. So I loaded each image into Adobe Lightroom and ran it through a further AI upscale, achieving 2MP images. This is still not the best resolution for print, but not bad for 30 seconds of work. I was tempted to run multiple passes but didn't want to risk a real-world Pinocchio.
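The resolution gap is easier to see with a little arithmetic. The Python sketch below assumes square images, treats 1MP as exactly 1,000,000 pixels and ignores any bleed margin; the function names are my own and not from any tool mentioned here.

```python
import math

DPI = 300          # target print resolution
PAGE_INCHES = 8.5  # side length of the square page

def pixels_needed(inches: float, dpi: int) -> int:
    """Pixels along one side needed to print `inches` at `dpi`."""
    return round(inches * dpi)

def effective_dpi(megapixels: float, inches: float) -> float:
    """DPI you get when a square image of `megapixels` spans `inches`."""
    side_px = math.sqrt(megapixels * 1_000_000)
    return side_px / inches

side = pixels_needed(PAGE_INCHES, DPI)
print(f"Full page at {DPI}dpi: {side} x {side} px (~{side * side / 1e6:.1f}MP)")

for mp in (1, 2):
    print(f"{mp}MP image across the page: ~{effective_dpi(mp, PAGE_INCHES):.0f}dpi")
```

Under these assumptions a full 8.5" page at 300dpi needs 2550 x 2550 pixels, about 6.5MP, so a 2MP image stretched across the page works out to roughly 166dpi.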
I also used Lightroom to manually edit some images, removing skin markings, adjusting lighting, colours and focal points, and blurring weird stuff that would scare the living heck out of a child.
Midjourney AI Key Takeaways
Thanks for staying until the end. Here are a few key takeaways from this project:
- It's amazing at generating consistent faces given a prompt and a seed parameter. None of the images generated used actual photos as source material; it's all text, although I did reinsert generated images to generate variations.
- The additional parameters are your best friend. For all the images I used the exact same seed number. The seed number determines the random noise that MJ uses as a starting point for all of its outputs, so it's vital for maintaining consistency. RTM.
- For character development, start with a simple description and generate various poses of the character for the scenes you want. I created my own Discord server for this project, so I had a different channel for each pose/subject.
- MJ is pretty bad at interpreting the directions of multiple objects in a scene, so it's generally easier to describe everything looking in the same direction or toward/away from each other. The same goes for the 'front' or 'back' of things, and which way objects are facing.
- Hands and fingers are terrible, with generally random results, although I had good results from including "symmetrical hands" in my prompts.
- Like hands, eyes are hit or miss unless the main focus of the image is a face. Adding "symmetrical eyes" did improve things.
- Similar to direction, orientation is hard to get right. I really struggled to get anything good for swimming, falling back, sleeping or other horizontal prompts.
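The point about seeds can be illustrated with a toy Python sketch. This is not how Midjourney works internally; it only shows the general idea that the same seed reproduces the same pseudo-random starting noise, while a different seed gives a different starting point. `starting_noise` and the seed values are made up for the illustration.

```python
import random
from typing import List

def starting_noise(seed: int, size: int = 8) -> List[float]:
    """Return `size` Gaussian noise values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

first_run = starting_noise(1234)
second_run = starting_noise(1234)   # same seed as first_run
other_seed = starting_noise(4321)   # different seed

print("same seed, same noise:", first_run == second_run)
print("different seed, same noise:", first_run == other_seed)
```

Reusing one seed across every prompt fixes the starting noise, which is one plausible reason the outputs stay visually related when only the text changes.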
Final Thoughts
As the famous quote goes, "With great power comes great responsibility." I believe that AI has the potential to revolutionise the creative process, and it is our responsibility as creators to use this technology ethically and responsibly.
There's been a lot of heated discussion and news recently about the general availability and ethical use of AI tools (ChatGPT, Midjourney, DALL·E 2, Runway etc.). AI is here to stay, and tools like ChatGPT are already changing how I work and think about the future of working and collaborating with creators. Clearly, though, more discussion and legislation are needed around attribution and licensing for proper commercial use.
All in all, this was a fun personal project, and I hope it sparks your imagination and interest in what can be accomplished.