Using AI To Create 'Best Girl' While I Cover How It Works and How It Will Change Anime



One of these images was drawn by a real artist. The rest were generated using AI models we'll talk about later. Can you tell which one is real? … It's this one. In this video, I'm gonna give you a general idea of how these things work, where and when they're useful, and the implications of this tech, not just for artists, but for the anime and manga industry. To make this more interesting, we're gonna try and work towards a goal: create a set of AI-generated images that represent best girl. I'll explain later.

Big thing to note: this will not be a highly technical video. It's meant to be accessible to people who have no AI background. I'll be simplifying the concepts to the point where you should have the functional understanding needed to use these AI models and not much else. If it catches your interest after that, there are tons of free resources available on YouTube alone to learn more. If you're already a machine learning enthusiast or don't care about how it works, feel free to skip ahead to the sections showing results and impact. We start super simple.

Artificial intelligence is a very general term. Any algorithm capable of mimicking human thought processing in order to achieve a task counts as AI: any kind of spam filter, the algorithms that control video game enemies, and of course the obvious stuff like sentient robots. Science has now validated your birth mother's decision to abandon you on a doorstep.

But the type of AI we'll be working with to achieve our goal behaves like this. Pretend this black box is our AI model. We don't know anything about how it works, but if we give it some kind of input, say a blank piece of paper, it will process it and spit out a result. That output would be a picture of an anime character. However, the box is initially dumb. Whatever it draws is terrible, and it won't even look like an anime character. We need to teach it how to do its task. This is done by giving it thousands of images of anime art. It will analyze each one and start to recognize the patterns that identify an image as an anime character. The drawings it generates will slowly get better. If this process goes on long enough, it will finally make something that resembles the real images we've been feeding it. This whole process is called training: we made the dumb AI smart by giving it real data it could use to correct itself. The field of AI that deals with this general approach is what we call machine learning.

Once the model is trained, we can just give it a blank piece of paper and it'll create a drawing based on what it remembers. Though in reality, the piece of paper isn't blank. It's actually a bunch of random noise. The reason we have to use this instead of a blank page is a bit nuanced, but to keep things simple, it's because we want variety. The model isn't very useful if it's generating the same thing every time. When you're using AI models, you can grab already-trained ones other people have uploaded to the internet. But if you're dealing with a very niche task, chances are you'll have to train your own.

Now let's worry about what's in the box. It's a neural network. These are a subset of machine learning algorithms modeled after the human brain. Neural networks consist of a bunch of nodes connected to other nodes, similar to how neurons in your brain connect to other neurons. When the model receives some kind of input, the first set of neurons will take it and do some processing. Some of them will light up depending on certain conditions and send a signal forward to other neurons. This continues until we get to the end, where the input has been transformed into something different: whatever output we wanted the model to give us.
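
If you're curious what that looks like in code, here's a minimal PyTorch sketch of the idea. None of this is the actual code behind the models in this video; the layer sizes and the 64x64 output are placeholders.

```python
import torch
import torch.nn as nn

# A toy version of "the box": layers of nodes that pass signals forward.
# Sizes here are arbitrary, just for illustration.
net = nn.Sequential(
    nn.Linear(100, 256),          # first set of "neurons"
    nn.ReLU(),                    # a node only "lights up" if its value is positive
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 3 * 64 * 64),  # enough values to form a 64x64 RGB image
    nn.Tanh(),                    # squash pixel values into a fixed range
)

z = torch.randn(1, 100)                # the "blank page" is really random noise
image = net(z).view(1, 3, 64, 64)      # reshape the output into an image
print(image.shape)                     # torch.Size([1, 3, 64, 64])
# Untrained, this is garbage. Training is what nudges the connections
# until the output starts to resemble the real images it was shown.
```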

Now we can start working towards our goal. Every year, the Reddit community r/anime holds a popularity tournament for best girl: users vote for the best anime girl in an elimination bracket. I'm going to feed an AI the top four finalists from these contests from 2014 to 2022, in hopes that it can learn from this data and give me a character that truly represents best girl.

We're first going to look at a model called a GAN. GANs actually consist of a pair of neural networks, two brains that compete with each other. One of them is called the generator; its job is to make the fake images. The other one is the discriminator; its job is to try and catch the generator. Once the generator creates its fakes, they're mixed with some real images. The discriminator then tries to figure out which ones are real and which ones are fake. So the inputs and outputs of the two look like this. We want the generator to win, but at the beginning, he's pretty dumb. All of his fakes are very different from the real images. Through the training process, the generator keeps improving until eventually he's repeatedly fooling the discriminator. But it's not like the discriminator is just going to stand by; he's not the smartest at the beginning either. He's also learning, figuring out what separates a real image from a fake one and getting better at making that distinction. By competing with each other, both parties learn, and each is kind of responsible for improving the other.
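
Here's roughly what one step of that competition looks like as a toy PyTorch sketch. G and D are tiny stand-ins, not the real architectures from this video, and the batch of "real images" is faked with random values.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator (the two "brains"). Sizes are placeholders.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 3 * 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Linear(3 * 64 * 64, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

real_images = torch.rand(16, 3 * 64 * 64) * 2 - 1  # stand-in for a batch of real anime faces

# --- Discriminator step: learn to tell real from fake ---
z = torch.randn(16, 100)
fake_images = G(z).detach()                    # don't update G on this step
d_loss = bce(D(real_images), torch.ones(16, 1)) + bce(D(fake_images), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# --- Generator step: learn to fool the discriminator ---
z = torch.randn(16, 100)
g_loss = bce(D(G(z)), torch.ones(16, 1))       # generator wants D to say "real"
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Repeating these two steps over the whole dataset, for many epochs, is the
# competition described above: each network's improvement pressures the other.
```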

There are many types of GANs. I started with a very simple one called a DCGAN. If you're a machine learning enthusiast, here's the architecture; don't worry about what this visual means otherwise. The annoying thing about these models is that they need a lot of data to train. I couldn't find a dataset consisting of thousands of images of just Mikasa's face anywhere. Same goes for the other best girl candidates, but I know it's out there somewhere. So I decided to split the task into two phases. First, I'm going to try and get a model that's good at generating anime waifus in general, using a dataset of over 21,000 anime faces found on Kaggle. Once that's accomplished, I'll focus on turning it into one that generates best girl candidates only. I don't expect the first run to go well, so I'm going to use small images to speed up the training.
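
For the machine learning enthusiasts, the defining trait of a DCGAN generator is a stack of transposed convolutions that upsample a noise vector into an image. The sketch below is a generic example of that pattern, not the exact layer configuration I used.

```python
import torch
import torch.nn as nn

# Rough DCGAN-style generator: transposed convolutions repeatedly upsample
# a noise vector into a 3x64x64 image. Channel counts are illustrative.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(1, 100, 1, 1)   # noise treated as a 1x1 "image" with 100 channels
print(generator(z).shape)       # torch.Size([1, 3, 64, 64])
```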

Excuse the low resolution. Here are the results of phase one. Yeah, we need to do better. Some of these are getting somewhere; I can tell these are anime girls. But then what is this? This one's hitting horror-manga-level stuff. Here's a look at the training process over time. You can see how the model gets better at both variety and quality. We could do a lot to improve it.

This baseline only took an afternoon to put together, but we need to think bigger: something built on years of research and lots of smart people's contributions. Enter StyleGAN3. This is newer tech developed by NVIDIA. It's still a GAN, but it's much larger and more complex, so I expect it to be way better at learning features. A high-quality GAN takes tons of images to train properly, so I swapped out my 21K-image dataset for a 90K one.

There are a few problems here. Training StyleGAN3 from scratch on 1024x1024 images would take around six days on eight NVIDIA A100 GPUs. I'm sitting here with one RTX 2070, so I took some shortcuts. I'm forced to use 64x64 resolution this time, but I think that's fine for a proof of concept, especially since I know where this video is going. I also started with a version of the model NVIDIA had already trained on the MetFaces dataset, an image set of human faces from works of art. Why is this okay? I'm using a technique called transfer learning. The idea is pretty intuitive: if you want to learn a new skill that's similar to another skill you already know, there's a good chance that past experience will help you pick it up faster, and maybe even do it better. The same logic applies to the model. These faces are kinda close to these faces, right? Hopefully.
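
The StyleGAN3 code handles the resuming from a pretrained network for you, but the general transfer-learning pattern is easy to show with a self-contained example. The sketch below uses a pretrained torchvision classifier instead of a GAN, purely because it's the most compact way to demonstrate "load someone else's weights, then keep training on your own data"; the class count, learning rate, and fake batch are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pretrained on one dataset (ImageNet here), then adapt it
# to a new, much smaller task. Same idea as MetFaces -> anime faces.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # download pretrained weights
model.fc = nn.Linear(model.fc.in_features, 24)   # new final layer; 24 is just an example

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # small LR: nudge, don't overwrite
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch for "your own dataset".
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 24, (8,))

model.train()
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```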

So I trained this one for about a day, and this is what we got. Better. It could use more training, but I'll leave it there. We still get some wonky stuff, but it's usually not as bad as the first run, or as prevalent.

Now we can move on to phase two. I put together a small dataset of images of the top four from every best girl contest. Keep in mind, the characters are the following; this is important to remember. There were duplicates across the years, so we have 24 people total. Then I continued training the model, swapping in this new data. This is also transfer learning, but a more precise term would be fine-tuning: I'm training the model on a dataset so similar that you could consider it a subset of the first one. Our final results: not too terrible. I needed more data. You can see features of the best girl candidates represented in our images, but some of these are too close to a single character. A state-of-the-art GAN is an acceptable approach to synthetic image generation, so long as you have the compute power to train it, and I'm hoping this little experiment gets the point across. But if not, this is what StyleGAN3 can do with humans and animals. None of these are real people or animals.

We're not stopping here. There is a whole other class of models we need to consider. Rather than generating images from noise alone, we can add words as an additional input. Text-to-image models involve typing in a prompt that describes the kind of image you want. That brings us to Stable Diffusion. To be blunt, this method beats GANs.

This model has a lot of components; there are multiple neural networks working together this time. I'm just gonna focus on the core idea, and I'll provide a link to a good beginner's guide if you want a more complete overview. As implied by the name, the model uses a concept called diffusion. Remember this from science class? It's that, applied to images. When training this model, there's both a forward and a reverse diffusion phase. We loop our images through this process multiple times, adding a little bit of noise each time, until, over thousands of steps, the image is nothing but noise. Then the model learns to recover the pixel values it lost through the reverse process. By the end of training, the model should be capable of taking a noise distribution and denoising it, and the result should be something similar to our target, based on the patterns it's learned.
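
Here's a small sketch of the forward (noising) half of that process, using the standard DDPM-style closed form. The schedule values and step count are illustrative, not Stable Diffusion's exact settings.

```python
import torch

# Forward diffusion: repeatedly mix a little noise into the image until nothing
# but noise remains. Numbers below are illustrative.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # how much noise to add at each step
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative "how much signal is left"

def add_noise(x0, t):
    """Jump straight to step t of the noising process for image batch x0."""
    noise = torch.randn_like(x0)
    a = alphas_bar[t].sqrt()
    b = (1.0 - alphas_bar[t]).sqrt()
    return a * x0 + b * noise, noise            # noisy image, and the noise we added

x0 = torch.rand(1, 3, 64, 64) * 2 - 1           # stand-in for a training image in [-1, 1]
x_small_t, _ = add_noise(x0, t=50)              # still mostly recognizable
x_big_t, _ = add_noise(x0, t=999)               # essentially pure noise

# During training, the network is shown (noisy image, step t) and learns to
# predict the noise, which is what lets it run the process in reverse later.
```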

So where does the text come in? Basically, the text guides the denoising process. The data used to train these models needs to be a combination of images plus their text prompts; without text, the model would just be creating random images based on the data.

This type of model is way too computationally expensive for me to even want to attempt to train, so I'll spare my graphics card the pain. Luckily, generating waifus is like the number one use case for Stable Diffusion, so there are a lot of existing models already trained on anime art. The ones I'll be using are Anything V5, Waifu Diffusion 1.4, Dark Sushi 2.5, and NyanMix. You probably noticed that they all tend to have a certain style to them, but that's because I'm using specific tags like "high quality" and "masterpiece". Because I include these in the text prompt, the models return whatever their understanding of high quality and masterpiece is. If I remove these tags, this is what Waifu Diffusion is capable of generating. Very different from the images I showed you before. Of course, this isn't always the case; some of the models have been designed around a specific style. You can also use negative prompts: things you don't want to see in the final result.
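
To make this concrete, generating an image from one of these pretrained checkpoints looks roughly like this with Hugging Face's diffusers library. The repo ID, prompts, and settings are examples I'm using for illustration, not necessarily what I ran for the video.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained anime-style checkpoint (repo ID is an example; any
# Stable Diffusion checkpoint in diffusers format works the same way).
pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "1girl, solo, masterpiece, high quality, looking at viewer"
negative_prompt = "lowres, bad anatomy, bad hands, extra fingers, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,  # things we do NOT want to see
    num_inference_steps=25,
    guidance_scale=7.0,               # how strongly the text steers the denoising
).images[0]

image.save("waifu.png")
```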

Here's a summary of the top 1000 most-used prompt terms from CivitAI, a website that hosts a bunch of fine-tuned Stable Diffusion models. Pretty much what I expected; I won't judge the community. Here's a summary of the top 1000 negative terms as well. Most of them exist to avoid the weird stuff the models tend to generate sometimes. As the user, you have a lot more parameters you can control too: the sampling method, swapping variational autoencoders (VAEs), changing the denoising strength, and more. I'd never used Stable Diffusion before this video, and there's no way I can optimize the results all that well without a bunch of practice. So I decided to keep everything at the recommended settings when they were mentioned in the respective repos; if not, I used the defaults. No multiple passes either.
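
In diffusers terms, a couple of those knobs look like this. The scheduler and VAE below are common choices I'm showing as examples, not necessarily what my setup used, and the denoising strength shows up in the image-to-image sketch further down.

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler, AutoencoderKL

# Swap in a different variational autoencoder (VAE); it decodes the final
# latents into pixels and noticeably affects colours and fine detail.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)

pipe = StableDiffusionPipeline.from_pretrained(
    "hakurei/waifu-diffusion",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

# Sampling method: swap the scheduler (roughly "Euler a" in WebUI terms).
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe("1girl, solo, masterpiece", num_inference_steps=25).images[0]
```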

Stable Diffusion actually lets you provide an image plus text as input, if you want. The result is another image based on the one you provided. For example, I might think that Ventus from Kingdom Hearts isn't anime-boy enough for me. I can change that by feeding this image plus this prompt into Anything V5 and, boom, problem solved. You can tell how important the prompt is from this: I didn't specify the hair color I wanted, so it took liberties. It's also not good at Keyblades. A pretty popular technique is to put your already-generated image back into the model to modify it further using this image-to-image feature. We're not gonna do that.
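
That image-plus-text mode is its own img2img pipeline in diffusers. Another rough sketch, with the checkpoint, input file, prompt, and settings as placeholders:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "hakurei/waifu-diffusion", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("ventus.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="1boy, solo, anime style, masterpiece",
    image=init_image,
    strength=0.6,          # denoising strength: how far the result may drift from the input
    guidance_scale=7.0,
).images[0]

result.save("ventus_but_more_anime.png")
# Feeding `result` back in as the next init image is the iterative
# image-to-image trick mentioned above.
```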

This is the specific negative prompt I'm using; that's it. For the normal prompt, what I did was include the names of all our characters in addition to some extra tags, shown here. I randomly shuffled the order of the terms and randomly added or removed them to further increase variety. When it comes to using pretrained Stable Diffusion models, this is honestly not a great approach, but I think it is the most entertaining one. I don't know exactly what data these models have been trained on, and there's a good chance some of these words don't carry any meaning to the model, but we do know they've certainly been trained on fan art of anime characters. I'm also giving up a lot of control: you're supposed to be pretty precise with your text prompts, and I told it to make one girl and then dumped 24 names on it. Who knows how it decides to interpret that?
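
For reference, that prompt-building step can be sketched like this. The character list is truncated to names already mentioned in this video, and the tags and probabilities are just examples.

```python
import random

# Build a deliberately chaotic prompt: shuffle the character names and
# randomly keep or drop terms so every generation gets a different mix.
characters = ["mikasa", "winry", "yui", "holo"]  # ...plus the rest of the 24 (truncated here)
extra_tags = ["1girl", "solo", "masterpiece", "high quality", "looking at viewer"]

def build_prompt(keep_prob: float = 0.7) -> str:
    terms = [t for t in characters + extra_tags if random.random() < keep_prob]
    random.shuffle(terms)
    return ", ".join(terms)

print(build_prompt())  # e.g. "solo, winry, masterpiece, mikasa, 1girl"
```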

Here's how it went. Anything V5 has a habit of including multiple characters despite me specifying "one girl" and "solo" as tags, and the additional people look pretty weird too. All of the models had trouble with these girls. It seems likely that their names don't carry any meaning to the model, or that there were very few samples of them in the training data. For Winry, if I eliminated all of the other characters and added Fullmetal Alchemist as a term, she would register. But for Yui, I still got nothing. For Holo, I was not able to get wolf ears in anywhere, unfortunately. Though if I added Spice and Wolf as a term and removed all other characters, I did get ears, just not Holo's ears. It's probably interpreting the wolf part of the term, not the spice.

NyanMix might have done the worst as far as the goal went, but its style is very distinct. The images are quite colorful with high contrast, in a semi-realistic style, often using dark backgrounds with some detail or simple white ones. Dark Sushi is a very popular model. The 2.5 version I'm using is a bit different from the regular one, merging in another model called Chillout Mix, as well as a bit of Niji Express V2, as the repo describes. It's got slightly darker and heavier tones. I think Waifu Diffusion performed the best with respect to the goal. I'm seeing aspects of characters here I didn't see in the other models, and I think a large part of that is because Waifu Diffusion doesn't have as much of a distinct style of its own. Look at that.

And to conclude, I'll point out some limitations. Hands are tricky: sometimes you only see two fingers, sometimes it's a fist. There are too many configurations, and they're not consistent across samples. Generating anything genuinely complex is an issue for these models. Combining 2B and Mikasa? Pretty doable; even the outfits look fused. Making 2B fight Mikasa? That's trickier. I often just got two Mikasas or two 2Bs, and never got them to be actively fighting each other. As of now, the model largely looks at each word individually and struggles to figure out the context between them. Maybe you get lucky and the right connections between the words are made; otherwise, you'll have to rely on post-processing, with potentially limited success. Though there are some good ways to work around that now.

Before I move on, a bonus segment. There are Stable Diffusion models trained with a focus on male characters. Here are the best guy candidates from the r/anime contest, and here are some results from a model called Mature Male Mix. To the model, I guess All Might meant shredded and buff, Saitama meant partial capes and partial baldness, and Koro-sensei seems to be the yellow balls. That's the end of the crash course.

Now, is AI art as a whole (visual art, music, writing, and so on) a threat? It depends what you mean by threat, but it's probably one of these two things.

For the first point, my opinion as a software engineer who sometimes optimizes and improves AI models for deployment is: I don't know. Full disclosure, I don't work with generative AI models and I'm not an AI expert; my domain is more like object detection and super-resolution. Your job is definitely gonna be fine with respect to current-day tech. I showed you some limitations, but you've probably seen how badly these models can perform when pushed to do anything complex, even before this video. The strategy here is to use the weaknesses of the models as your strengths and focus on creating what the AI can't deliver. The future, however, is difficult to predict. We don't know when the next big breakthrough, like the jump from GANs to diffusion models, will happen, or the breakthrough after that one. The pace is unpredictable. It has the potential to change everything, and I think I might be a little terrified.

For point two: yes, your art will be stolen, and I'm not sure any countermeasures you take will do anything. The whole scenario seems pretty familiar.

We've had to rethink copyright laws multiple times as technology has evolved. In the early 20th century, it was crazy that record labels and performers could take and sell music compositions without the composer's permission. Fast forward to when radio stations started broadcasting records: record labels turned around and hated that. Cable TV was then an upset for radio broadcasters. VCRs were an upset for cable operators; recording shows, that's crazy. The internet was an upset for major movie studios. And looping back around to music, remember Napster? The biggest change to the music industry to date. While AI models often use copyrighted material for training, the outputs of the models themselves are original. Artists are currently fighting back, but so did all the other industries I mentioned before. Maybe this will be considered fair use of copyrighted material in 10 years, who knows? Right now at least, most major corporations are playing by the rules and using licensed datasets, and many datasets are also open in the first place, like the ones on Kaggle. This video you're watching might one day be used to train some AI model, and I might never know about it. The ethical and philosophical debate is something I don't want to get into here; I just want to bring attention to it.

I would boil the popularity of AI models down to five things. These three are often combined under the idea of improving your own workflow; these two are often combined for entertainment. I guess that's what this video is.

Utility brings us to the manga and anime industry. As we've all heard, being an animator is difficult: lots of overwork, low pay. Can AI help relieve this, or will it just take the jobs away? There is a short film called The Dog and the Boy that was released by Netflix, a joint production with rinna Inc. and WIT Studio, which used AI to generate the background scenes. Netflix justified its existence by saying it was an effort to help with the anime industry's labour shortage, which invited a lot of criticism in response. AI is probably going to be something animation studios continue to experiment with. If there's something on the table that might save time and money, and you're a business, you wouldn't just ignore it, right? Though I have to stress this point: the primary goal of incorporating AI right now is not to replace; it's meant to assist.

Then we turn to manga. It's a similar story: mangaka work very long hours and are often underpaid. Assuming they draw digitally, AI could be a big time saver if they were to use it in their workflow, maybe for generating their backgrounds as well. What if a mangaka trained a model on their own drawings? Generating some base image of a character doing something, then going in and modifying it, is a thing people already do. But this has the effect of decreasing assistant involvement, and many manga artists start off as assistants. What kind of implications would this have for their skill development?

There's a philosophical argument to be made too. The work made by an artist reflects personal experiences, emotions, and intentions, and it's those aspects that create a deep connection between the work and its audience. AI art is only replicating emotion. Does this devalue the work's meaning? As our boy Miyazaki once said after watching an AI-generated animation, he strongly feels that it is an insult to life itself. Love this guy's quotes. Our world is changing, and there's no certainty about how. Check out these videos. Hopefully this content won't be easy for a machine to replicate anytime soon.
