

The Story of Music-RNN

There was once a time when I actually did interesting things with neural networks. Arguably my one claim to having a footnote in AI and machine learning history was something called Music-RNN.

Back around 2015, Andrej Karpathy released one of the first open source libraries for building (then small) language models. It was called Char-RNN, and it was unreasonably effective.

I had, back in 2014, just completed a master’s thesis and published a couple of papers in lower-tier conferences on things like neural networks for occluded object recognition, and figuring out the optimal size of feature maps in a convolutional neural network. I’d been interested in neural nets since undergrad, and when Char-RNN came out, I had an idea.

As someone who likes to compose and play music as a hobby, I decided to try modifying the library to process raw audio data and train it on some songs by the Japanese pop-rock band Supercell and see what would happen. The result, as you can tell, was a weird, vaguely music-like gibberish of distilled Supercell. You can see a whole playlist of subsequent experimental clips on YouTube where I tried various datasets (including my own piano compositions and a friend’s voice) and techniques.

Note that this was over a year before Google released WaveNet, which was the first of the genuinely useful raw-audio neural net models for things like speech generation.

I posted my experiments on the Machine Learning Reddit and got into some conversations there with someone who was then part of MILA. They would, about a year later, release the much more effective and useful Sample-RNN model. Did my work inspire them? I don’t know, but I like to hope it at least made them aware that such a thing was possible.

Music-RNN was originally made with the Lua-based version of Torch. Later, I would switch to using Keras with Theano and then Tensorflow, but I found I couldn’t quite reproduce the results I’d gotten with Torch, possibly because the LSTM implementations in those libraries were different, and not stateful by default.
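For context, “stateful” here means the LSTM carries its hidden and cell state across batches instead of resetting after every batch, which matters a lot when you’re feeding raw audio in consecutive chunks. A minimal Keras sketch (the sizes are just illustrative, not my actual configuration):

```python
# Minimal sketch of a stateful LSTM in Keras (sizes are illustrative only).
# A stateful layer keeps its hidden/cell state between batches, so long
# sequences like raw audio can be fed as consecutive chunks; a stateless
# layer resets its state after every batch.
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 64, 1   # e.g. 64 raw audio samples per chunk
batch_size = 1                # stateful layers need a fixed batch size

model = Sequential([
    LSTM(256, batch_input_shape=(batch_size, timesteps, features),
         stateful=True, return_sequences=False),
    Dense(features),
])
model.compile(optimizer='adam', loss='mse')

# Chunks must be fed in order, and the state reset manually between songs:
# model.reset_states()
```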

I also moved on from just audio modelling, to attempting audio style transfer. My goal was to try to get, for instance, a clip of Frank Sinatra’s voice singing Taylor Swift’s Love Story, or Taylor Swift singing Fly Me To The Moon. I never quite got it to work, and eventually, others developed better things.

These days there are online services that can generate decent-quality music using only text prompts, so I consider Music-RNN to be obsolete as a project. I also recognize the ethical concerns with training on other people’s music, and potentially competing with them. My original project was ostensibly for research and exploring what was possible.

Still, back in the day it helped me land my first job in the AI industry at Maluuba, as a nice portfolio project alongside the earthquake predictor neural network. My posts on the Machine Learning Reddit also attracted the attention of a recruiter at Huawei, and set me on the path to that job.

Somewhat regrettably, I didn’t open source Music-RNN when it would have still mattered. My dad convinced me at the time to keep it a trade secret in case it proved to be a useful starting point for some kind of business, and I was also a bit concerned that it could potentially be used for voice cloning, which had ethical implications. My codebase was also kind of a mess that I didn’t want to show anyone.

Anyways, that’s my story of a thing I did as a machine learning enthusiast and tinkerer back before the AI hype train was in full swing. It’s a minor footnote, but I guess I’m somewhat proud of it. I perhaps did something cool before people realized it was possible.

Letting Go

I’ve officially discontinued the Earthquake Predictor project for now. I still need to back up the data, but I’m no longer going to have the daily update script running. It had already been down for a while due to a bug anyway, and after a cursory effort I wasn’t able to debug it back into working order. Honestly, it also didn’t do anything relevant, as the model only predicted high-frequency, low-magnitude quakes and not the low-frequency, high-magnitude ones that matter. The architecture I was using was my old LSTM-RNN model, which has largely been superseded in the literature by the transformer architecture, so it badly needed to be taken offline and retrained anyway. I’m not sure if I’ll ever get around to retraining it with the new architecture.

It kind of seems like a very silly longshot of a project that kept an entire GPU busy, because I could never figure out how to properly unload models from memory without stopping the process. Even though it only updated once a day, it was a hassle: the thing would randomly error out, and I’d have to start it up again after missing several days of updates, backdating the sequence of updates with the appropriate 24 hours of data for each. All in all, not a great system.
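For what it’s worth, the usual suggestion for freeing a model’s GPU memory without restarting the process looks something like the sketch below (assuming a Keras setup with a TensorFlow backend); I can’t vouch that it would have fixed my particular pipeline:

```python
# Sketch of the standard way to release a Keras model's GPU memory without
# restarting the process (assumes a TensorFlow backend); how completely the
# memory is actually freed depends on the backend version.
import gc
from keras import backend as K

def unload_model(model):
    del model             # drop the Python reference to the model
    K.clear_session()     # tear down the backend graph/session
    gc.collect()          # prompt Python to release the freed objects
```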

The Great Debates

I’m currently still trying to decide what I should even post here. I tend to post more personal stuff on Facebook and to a lesser extent on Twitter, but my fiancee thinks it might be unwise to publish personal details on a public facing blog like this one.

Possibly I could focus more on professionally relevant ideas, but I’m not sure what I can offer in that regard. Anything really worth publishing should probably go into a proper paper rather than some random blog on the Internets. I suppose I could write opinions about philosophical things, but that overlaps with the Pax Scientia wiki that I was working on building earlier.

I probably have too many of these projects that don’t get enough attention anyway. I’ve been trying to consolidate them recently, but I worry that the resulting web presence is still far too sprawling, and even harder to navigate without clear delineations between things.

Another debate I’ve been having recently is whether to put more effort into my creative writing. I want to eventually write a novel; it’s a vague goal I’ve had since I was a kid. I have lots of ideas for stories, but I’ve always had trouble actually sitting down and turning those ideas into narratives. Sometimes I wonder whether I have the writing ability to justify the effort, and whether it makes sense to add yet another book to the ever-expanding pile of literature in the world.

I spent a long time working out in my head the worlds that I want to write about. In some sense, if I don’t write, it’ll have been a waste. But I’m not sure my imagination is extraordinary enough to justify the effort in the first place.

I also claim to be a music composer and a game designer, the other two arts I have some capacity in. To what extent would those be more appropriate uses of my time? To what extent is writing more worthwhile than composing songs, for instance? I can hash out a song somewhat faster than a novel, but as yet I don’t consider my songs to be particularly notable either.

Writing was originally my first choice for artistic expression, ostensibly because writing allows me to communicate ideas rather than just emotions, as with music. And writing can be done on my own, rather than needing an artist and a team as in game development. Admittedly, the creator of Stardew Valley did it on his own, but I don’t have the visual art skills for that, and I don’t see myself having the patience to become good at drawing at this point.

In another debate, I’ve also been considering a change of career path. Working in machine learning has been exciting and lucrative, but the market now seems increasingly saturated, as the most competent folks in the world recognized the hype and adjusted their trajectories to compete. Whereas a few years ago I was one of maybe a couple hundred, now there seem to be thousands of people with PhDs who outclass me.

At the same time, I’ve wondered whether the A.I. Alignment problem, whose existential risk has been the focus of several books by prominent philosophers and computer scientists, isn’t a more important problem that needs more people working on it. So I’ve wondered if I should try switching into that field.

Admittedly, this field seems to still be in its infancy. There are a bunch of papers looking at defining terms and building theoretical frameworks, and little in the way of even basic toy problems that can be coded and tested. I’m personally more of an experimentalist than a theoretician when it comes to AI and ML, mostly because my mathematical acumen is somewhat lackluster, so I’m not sure how much I could help push the field forward anyway.

On a more philosophical note, it seems the social media filter bubble has been pushing me more to the left politically. At least, I find myself debating online with Marxists about things and becoming more sympathetic to socialism, even though a couple years ago I was a moderate liberal. I’m not sure how much to blame the polarization of social media, and how much it’s the reality of disillusionment with the existing world.

I also have mixed feelings in part because the last company I worked for was, according to media outlets, controversial, but to me it was the company that gave me a chance to work on some really cool things and paid me handsomely for my time and energy. Admittedly, as a lowly scientist working in an R&D lab, I wouldn’t have been privy to anything untoward that could have been happening, but it was always jarring to see the news articles that attacked the company.

I left more for personal reasons, partly some issues of office politics that I wasn’t particularly good at dealing with. My own criticisms of the company culture would be much more nuanced, aware that any major corporation has its internal issues, and that many of them are general concerns of large tech companies.

The debates in my head are somewhat bothersome to be honest. But at the same time, it means I’m thinking about things, and open to updating my understanding of the truth according to new evidence, factored with my prior knowledge.

The March of Progress

Where to begin. I guess I should start with an update on some of the projects I’ve been working on recently… First, the Earthquake Predictor results at cognoscitive.com/earthquakepredictor are just over a year out of date. I still need to update the dataset to include the past year’s earthquakes from the USGS, but I’ve been busy first using the existing data as a benchmark to test some changes to the loss function and architecture. I’m still debating whether to continue using an asymmetric loss like Exp Or Log or Smooth L1 Or L2, or to switch to the symmetric Smooth L1, which would reduce false positives substantially. My original reason for an asymmetric loss was to encourage the model to make higher-magnitude predictions, but I worry that it makes the model too eager to guess everywhere that earthquakes are frequent, rather than being more discriminating.
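To illustrate the general idea (this is not the actual Exp Or Log formula, just a generic sketch), an asymmetric loss simply weights errors on one side more heavily, which is what nudges the model toward higher-magnitude predictions:

```python
# Illustrative asymmetric loss, not the actual Exp-Or-Log formula:
# under-predictions are penalized more heavily than over-predictions,
# which pushes the model toward higher-magnitude outputs.
from keras import backend as K

def asymmetric_mse(under_weight=4.0, over_weight=1.0):
    def loss(y_true, y_pred):
        err = y_true - y_pred
        # err > 0 means the model predicted too low (an under-prediction)
        under = K.cast(K.greater(err, 0.0), K.floatx())
        weight = under * under_weight + (1.0 - under) * over_weight
        return K.mean(weight * K.square(err), axis=-1)
    return loss

# Usage: model.compile(optimizer='adam', loss=asymmetric_mse(4.0, 1.0))
```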

Music-RNN has run into a weird problem: I’m having difficulty reproducing, with the Keras port, the results I got with the old Torch library a few years ago. It’s probably because the Keras version isn’t stateful, but it could also be that some of the changes I made to improve the model have backfired for this task, so I need to do some ablation studies to check. My modification for Vocal Style Transfer is on hold until then.
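The ablation itself is conceptually simple: re-run training with each modification toggled off and compare against the baseline. A rough sketch, with hypothetical flag names standing in for my actual changes:

```python
# Sketch of a small ablation grid: re-run training with each modification
# toggled on/off to see which change is responsible for the regression.
# The flag names and train_and_evaluate() are hypothetical placeholders.
from itertools import product

ablations = {
    'stateful': [True, False],
    'custom_activation': [True, False],
    'scaled_grad_norm': [True, False],
}

for values in product(*ablations.values()):
    config = dict(zip(ablations.keys(), values))
    # score = train_and_evaluate(config)   # hypothetical training entry point
    print(config)
```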

In other news, a couple of neat projects I’ve been trying include: Lotto-RNN, and Stock-RNN.

Lotto-RNN is a silly attempt to predict Lotto Max numbers, on the theory that some of them, like the Maxmillions draws, are pseudorandom because they’re generated by computer rather than by ball machine, and thus might be predictable… Alas, so far no luck. Or rather, the results so far are close to chance. I’m probably not going to spend more time on this long shot…

Stock-RNN is a slightly more serious attempt to predict the S&P500’s future daily price deltas from its previous daily price deltas. It uses the same online, stateful architecture that seemed to work best for the Earthquake Predictor before. The average result across ten different initializations is about +9% annual yield, which falls below the +10.6% you’d get from just buying and holding the index over the same period. Technically, the best individual model achieved +14.9%, but I don’t know whether that’s a fluke that will regress to the mean.
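For concreteness, the kind of comparison above comes from a simple long-or-flat backtest, roughly along these lines (an illustrative sketch, not my exact evaluation code; it ignores fees and slippage):

```python
# Rough sketch of turning daily price-delta predictions into an annual yield
# for comparison with buy-and-hold (illustrative; ignores fees and slippage).
import numpy as np

def annual_yield(predicted_deltas, actual_deltas, trading_days=252):
    # Go long on days the model predicts a positive delta, stay out otherwise.
    positions = (np.asarray(predicted_deltas) > 0).astype(float)
    daily_returns = positions * np.asarray(actual_deltas)  # fractional returns
    total_return = np.prod(1.0 + daily_returns)
    years = len(daily_returns) / trading_days
    return total_return ** (1.0 / years) - 1.0             # annualized yield
```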

I also tried a stateless model for Stock-RNN, but it performed much worse. There are some things I could do to adjust this project. For instance, I could modify the task to predict the annual price delta instead, train it on many stocks rather than just the S&P500 index, and use it to pick stocks for a year rather than guessing where to buy or sell daily. Alternatively, I could try to find a news API for headlines and use word vectors to convert them into features for the model.

On the research front, I was also able to confirm that the output activation function I originally named Topcat does seem to work, and doesn’t require the loss function modifications I’d previously thought were necessary; it works if you use it with binary crossentropy in place of softmax and categorical crossentropy. I still need to confirm the results on more tasks before I can seriously consider publishing the result somewhere. There are actually a few variants, mainly two different formulas plus various modifications that seem to be functional.
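The swap itself is easy to show in Keras, even though I’m not showing the Topcat formula here; the placeholder below just stands in for the real activation:

```python
# Sketch of the swap described above: replace softmax + categorical
# crossentropy with a custom output activation + binary crossentropy.
# `topcat` is a placeholder (plain sigmoid), NOT the actual Topcat formula;
# layer sizes are illustrative only.
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as K

def topcat(x):
    return K.sigmoid(x)   # placeholder standing in for the real activation

model = Sequential([
    Dense(128, activation='relu', input_shape=(64,)),
    Dense(10),                 # one logit per class
    Activation(topcat),        # custom output activation instead of softmax
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```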

It also looks like a hidden activation function I was working on, which I named Iris, works better than tanh. (Edit: More testing is required before I can be confident enough to say that.) As with Topcat, I have several variants that I need to decide between.

Another thing that seems to help is scaling the norm of the gradients of an RNN, rather than just clipping the norm as is standard. Previously, I’d thought that setting the scaling coefficient to the Golden Ratio worked best, but my more recent tests suggest that 1.0 works better. Again, it’s something I need to double-check on more tasks.
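The difference is small but it changes the behaviour: clipping only rescales the gradients when their norm exceeds a threshold, while scaling always rescales them to a target norm. A rough sketch in plain numpy, just to show the distinction:

```python
# Sketch of clipping versus scaling the gradient norm. Clipping rescales only
# when the norm exceeds the threshold; scaling always rescales to the target.
import numpy as np

def clip_grad_norm(grads, max_norm):
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads

def scale_grad_norm(grads, target_norm=1.0, eps=1e-8):
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    return [g * (target_norm / (norm + eps)) for g in grads]
```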

Some things that turned out not to work reliably better than the control include: LSTM-LITE, my tied-weights variant of the LSTM; my naively and incorrectly implemented version of Temporal Attention for sequence-to-sequence models; and a lot of the places where I used the Golden Ratio to scale things. The formula for Iris does have a relation to the Metallic Ratios, but it’s not as simple as scaling tanh by the Golden Ratio, which weirdly works on some small nets but doesn’t scale well. Interestingly, the Golden Ratio is very close to the value suggested for scaling tanh in a thread on Reddit about SELU, so it’s possible that that’s the theoretical justification for it. Otherwise, I was at a loss as to why it seemed to work sometimes.

I’m also preparing to finally upgrade my training pipeline. In the past I’ve used Keras 2.0.8 with the Theano 1.0.4 backend in Python 2.7. This was originally what I learned to use at Maluuba, and it conveniently remained useful at Huawei, for reasons related to the Tensorflow environment of the NPU. But it’s way out of date now, so I’m looking at Tensorflow 2.1 and PyTorch 1.4. An important requirement is that the environment be deterministic; Tensorflow 2.1 introduced better determinism, while PyTorch has had it for several versions now.
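The determinism setup itself amounts to something like the following sketch (based on the TF 2.1 / PyTorch 1.4 era; the exact flags vary by version and by which ops and hardware you use):

```python
# Sketch of a deterministic setup for the TF 2.1 / PyTorch 1.4 era.
import os, random
import numpy as np

SEED = 42

def make_deterministic_tf():
    os.environ['TF_DETERMINISTIC_OPS'] = '1'   # deterministic GPU kernels (TF 2.1+)
    import tensorflow as tf
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

def make_deterministic_torch():
    import torch
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning
```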

I’ve used both Tensorflow and PyTorch at work in the past, though most of my custom layers, activations, and optimizers are written in Keras. Tensorflow 2.0+ incorporates Keras, so in theory I should be able to switch to it without rewriting all the customizations, just adjusting the import statements.

I’ve also switched to Python 3, as Python 2 has reached end-of-life. Mostly this requires some small changes to my code, like replacing xrange with range and paying attention to / versus // for integer division.
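The division change is the subtle one:

```python
# In Python 2, dividing two ints with / truncated (7 / 2 == 3); in Python 3,
# / is true division and // is floor division, so indexing code needs //.
midpoint = 7 / 2    # 3.5 in Python 3 (was 3 in Python 2)
index = 7 // 2      # 3 in both; use this wherever an int is required
```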

One thing I’ve realized is that my research methodology in the past was probably not rigorous enough. It’s regrettable, but the reality is that I wasted a lot of experiments and explorations by not setting the random seeds and ensuring determinism before.

Regardless, I’m glad that at least some of my earlier results have been confirmed, although there are still some mysterious issues. For instance, the faulty Temporal Attention layer shouldn’t work, but in some cases it still improves performance over the baseline, so I need to figure out what it’s actually doing.

In any case, that’s mostly what I’ve been up to lately on the research projects front…
