There was once a time when I actually did interesting things with neural networks. Arguably my one claim to having a footnote in AI and machine learning history was something called Music-RNN.
Back around 2015, Andrej Karpathy released one of the first open source libraries for building (then small) language models. It was called Char-RNN, and it was unreasonably effective.
I had, back in 2014, just completed a master’s thesis and published a couple of papers in lower-tier conferences on topics like neural networks for occluded object recognition and finding the optimal size of feature maps in a convolutional neural network. I’d been interested in neural nets since undergrad, and when Char-RNN came out, I had an idea.
As someone who likes to compose and play music as a hobby, I decided to try modifying the library to process raw audio data, train it on some songs by the Japanese pop-rock band Supercell, and see what would happen. The result, as you can tell, was a weird, vaguely music-like gibberish of distilled Supercell. You can see a whole playlist of subsequent experimental clips on YouTube where I tried various datasets (including my own piano compositions and a friend’s voice) and techniques.
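For the curious, the core trick, as I recall it, was to quantize the raw waveform into a small fixed “alphabet” of integer tokens so a character-level model could treat audio samples the same way it treats letters in text. Here’s a minimal numpy sketch of that general idea; it is not my original Lua/Torch code, and the function names and the 256-level quantization are just illustrative assumptions:

```python
# Illustrative sketch only: quantize raw audio into a fixed "alphabet" of
# integer tokens so a character-level RNN can model it like text.
import numpy as np

def audio_to_tokens(samples, levels=256):
    """Map float samples in [-1, 1] to integer tokens in [0, levels)."""
    clipped = np.clip(samples, -1.0, 1.0)
    return np.minimum(((clipped + 1.0) / 2.0 * levels).astype(np.int64), levels - 1)

def tokens_to_audio(tokens, levels=256):
    """Invert the quantization to get (coarse) playable samples back."""
    return (tokens.astype(np.float32) + 0.5) / levels * 2.0 - 1.0

# Hypothetical input: one second of 16 kHz mono audio, loaded elsewhere.
audio = np.random.uniform(-1.0, 1.0, 16000).astype(np.float32)

tokens = audio_to_tokens(audio)            # these play the role of "characters"
inputs, targets = tokens[:-1], tokens[1:]  # train the RNN to predict the next sample
```

Everything past that point is essentially the usual next-token training loop, just over audio samples instead of characters.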
Note that this was over a year before Google released WaveNet, the first genuinely useful raw-audio neural network model for things like speech generation.
I posted my experiments on the Machine Learning Reddit and got into some conversations there with someone who was then part of MILA. About a year later, they would release the much more effective and useful Sample-RNN model. Did my work inspire them? I don’t know, but I’d like to think it at least made them aware that such a thing was possible.
Music-RNN was originally built with the Lua-based version of Torch. Later, I switched to Keras with Theano and then TensorFlow, but I found I couldn’t quite reproduce results as good as the ones I’d gotten with Torch, possibly because the LSTM implementations in those libraries were different and not automatically stateful.
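To illustrate what I mean by stateful: in the Lua Torch setup, the LSTM’s hidden state carried over from one training chunk to the next, whereas in Keras you had to ask for that explicitly. Here’s a rough sketch of the Keras 2-era way to do it; this is illustrative only, not my actual model, and the layer sizes are made up:

```python
# Illustrative Keras 2-era sketch: stateful=True plus a fixed batch_input_shape
# makes the LSTM carry its hidden state across consecutive batches instead of
# resetting after every one.
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

batch_size, seq_len, vocab = 1, 256, 256  # made-up sizes for illustration

model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, seq_len, vocab)))
model.add(LSTM(512, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(vocab, activation="softmax")))
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Feed consecutive chunks of one long audio sequence in order, and only call
# model.reset_states() when starting a new song.
```

With the default stateless behaviour, the model forgets everything between chunks, which matters a lot when your “sentences” are minutes of audio.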
I also moved on from plain audio modelling to attempting audio style transfer. My goal was to get, for instance, a clip of Frank Sinatra’s voice singing Taylor Swift’s Love Story, or Taylor Swift singing Fly Me To The Moon. I never quite got it to work, and eventually others developed better things.
These days there are online services that can generate decent-quality music from nothing but text prompts, so I consider Music-RNN obsolete as a project. I also recognize the ethical concerns with training on other people’s music and potentially competing with them. My original project was ostensibly for research and exploring what was possible.
Still, back in the day, it helped me land my first job in the AI industry with Maluuba, serving as a nice portfolio project alongside my earthquake predictor neural network project. My posts on the Machine Learning Reddit also attracted the attention of a recruiter at Huawei and set me on the path towards that job.
Somewhat regrettably, I didn’t open source Music-RNN when it still would have mattered. My dad convinced me at the time to keep it a trade secret in case it proved to be a useful starting point for some kind of business, and I was also a bit concerned that it could be used for voice cloning, which had ethical implications. The codebase was also kind of a mess that I didn’t want to show anyone.
Anyways, that’s my story of a thing I did as a machine learning enthusiast and tinkerer back before the AI hype train was in full swing. It’s a minor footnote, but I guess I’m somewhat proud of it. Perhaps I did something cool before people realized it was possible.