Voice synthesis

Discussion in 'General Modification' started by Sitra Achara, Jan 25, 2021.

  1. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
    This is pretty neat.



    Might be useful for generating voiced dialogue. I imagine it's not that easy to get quality output, depending on how complex / emotive you want the dialogue to be.

    I wonder if it can be adapted to emulate ToEE's NPCs so we can fill in the additional unvoiced dialogue. Sounds like something @Gaear would want to play with.

    I saw a similar AI thing for portraits, but couldn't get it to generate ToEE-like portraits unfortunately.
     
    anatoliy likes this.
  2. Gaear

    Gaear Bastard Maestro Administrator

    Joined:
    Apr 27, 2004
    Messages:
    11,027
    Likes Received:
    37
    I haven't seen this ^ but Artbreeder can now generate infinite unique portraits, for everything, based on existing portraits. Amazing how far technology has come.
     
  3. Endarire

    Endarire Ronald Rynnwrathi

    Joined:
    Jan 7, 2004
    Messages:
    874
    Likes Received:
    90
    Thankee! Alleluia!
     
  4. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
    Been following this. It's an impressive project, still being actively developed - and you can even grab it on steam now:

    https://store.steampowered.com/app/1765720/xVASynth_v2/

    About generating voices based on ToEE sound files (AKA training voice models): this capability isn't publicly available right now, since the toolset to do that is too complicated for the normal user. Dev's working on a streamlined tool called xVATrainer, however. It's pretty far along now, seems like it should be available in the next few months. Will keep an eye on it.

    On a personal note, back when I tried my hand at modding ToEE quests (the Traders mod etc), one of the things that held me back was that I didn't want to add too many lines to voiced NPCs, since it would contrast heavily with the original content. Not that I plan to revisit that, but it'd be so cool if that problem was finally solved once and for all :)
     
    anatoliy likes this.
  5. Shiningted

    Shiningted I want my goat back Administrator

    Joined:
    Oct 23, 2004
    Messages:
    12,581
    Likes Received:
    313
    That would indeed be awesome. I would love to add some dlg to Calmert for one.
     
  6. Pygmy

    Pygmy Established Member Supporter

    Joined:
    Oct 8, 2010
    Messages:
    631
    Likes Received:
    53
    Don't you mean Jaroo rather than Calmert? Jaroo's spider quest dialog is presently silent.
     
  7. Shiningted

    Shiningted I want my goat back Administrator

    Joined:
    Oct 23, 2004
    Messages:
    12,581
    Likes Received:
    313
    No, I meant Calmert - there's all sorts of things I'd like him to say.
     
  8. Endarire

    Endarire Ronald Rynnwrathi

    Joined:
    Jan 7, 2004
    Messages:
    874
    Likes Received:
    90
    CALMERT: "Ach! I want yer money!"
     
  9. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
    xVATrainer is progressing more rapidly than I thought! Steam release is tentatively slated for April 8.
     
  10. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
  11. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
    It's out!

    Sadly, it looks like my GPU is not up to the task... I guess it's pretty old by now.

    I asked around if there's a cloud solution and apparently someone already set up a Colab notebook:

    https://colab.research.google.com/drive/1YqNFvFZRAUbZ1xAFJNXVTGxn1C6tOfVQ?usp=sharing

    I'm trying that out now, training a voice model based on Burne. Also got the Colab Pro subscription, it's just $10 a month (and the shekel is strong now, hehe). We'll see how it turns out.


    (Note that the database preparation still has to be done in the xVATrainer framework, but it's fairly straightforward)
     
    anatoliy and Buffed Rabbit like this.
  12. Sitra Achara

    Sitra Achara Senior Member

    Joined:
    Sep 1, 2003
    Messages:
    3,592
    Likes Received:
    515
    A few notes in case anyone else is interested:

    1. Keep the sound clips shorter than 10s.
    You can use the diarization tool for that (e.g. put all the >500kB files in the input folder and run it on them). It will split 'em up for you.
    I restarted the training from scratch because some of the lines were 17s long; after shortening the clips, training time dropped by around 50%, and that's despite adding even more lines to the dataset.
    2. You don't have to parse the dlg files - the auto-transcribe tool is pretty good, you can make do with it + manual correction here and there. It doesn't have to be perfect anyway.
    3. Training takes a really long time. Depends on hardware in general, but you're looking at 2-3 days minimum, and that's with a high end GPU (this is well known and not particular to my setup/database or anything like that).
    Training in Colab Pro, you get a P100 with 16GB of VRAM. FastPitch model stages 1-3 took me 13 hours, and the notorious stage 4 is still running (currently already about 2 days in, and it looks like it could take another day or two to finish). This is why I wouldn't recommend doing it on your PC, unless you have a really beastly GPU, like RTX 3090 or whatever with over 20GB VRAM.
    4. If there's enough demand for ToEE voice models, I could arrange a one-month subscription for Colab Pro+ which has even higher end hardware (40GB GPU), so I could probably train a whole bunch of voices in a short span of time. But I guess we should wait and see how Burne turns out first :) I saw others arranging for a similar setup on the xVA discord, perhaps we could hitch a ride on that as well.
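
    For note 1, here is a rough sketch of one way to spot the clips that are over the 10s mark before feeding them to the diarization tool. It is not part of xVATrainer; it assumes the clips have already been converted to uncompressed PCM WAV, and the function names and folder layout are made up for illustration.

```python
import wave
from pathlib import Path

MAX_SECONDS = 10.0  # clip-length limit from note 1 above


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_long_clips(folder, max_seconds=MAX_SECONDS):
    """Return (filename, duration) pairs for WAV clips longer than max_seconds.

    Only looks at *.wav files directly inside `folder` (hypothetical layout),
    sorted by filename.
    """
    long_clips = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = clip_seconds(path)
        if duration > max_seconds:
            long_clips.append((path.name, duration))
    return long_clips
```

    Calling find_long_clips on the folder holding the converted clips gives the list of files worth moving into the diarization input folder. Checking duration rather than file size (the >500kB rule of thumb) avoids depending on sample rate or bit depth.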
     
    anatoliy likes this.