
On Dreams

When I was a child, I wanted, at various times, to be a jet fighter pilot, the next Sherlock Holmes (unaware he was fictional), or a great scientist like Albert Einstein. As I grew older, I found myself drawn to creative hobbies, like writing stories (or at least coming up with ideas for them) and making computer games in my spare time. In grade 8 I won an English award, mostly because I’d shown such fervour in reading my teacher’s copy of The Lord Of The Rings, and written some interesting things while inspired to be like J.R.R. Tolkien, or Isaac Asimov.

In high school, my highest grades were initially in computer science, where I managed to turn my hobby of making silly computer games into a top final project a couple of years in a row. Even though I won another award at the end of high school, this time the Social Science Book award, after doing quite well in a modern history class, I decided to go into computer science in undergrad.

For various reasons, I got depressed at the end of high school, and the depression dragged through the beginning of undergrad where I was no longer a top student. I struggled with the freedom I had, and I wasn’t particularly focused or diligent. Programming became work to me, and my game design hobby fell by the wayside. Writing essays for school made me lose interest in my novel ideas as well.

At some point, one of the few morning lectures I was able to drag myself to was presented by a professor who mentioned he wanted a research assistant for a project. Later that summer, I somehow convinced him to take me on and spent time in a lab trying to get projectors to work with mirrors and Fresnel lenses to make a kind of flight simulator for birds. It didn’t go far, but it gave me a taste for this research thing.

I spent the rest of my undergrad trying to shore up my GPA so I could get into a master’s program and attempt to learn to be a scientist. In a way, I’d gone full circle to an early dream I had as a child. I’d also become increasingly interested in neural networks as a path towards AI, having switched from software design to cognitive science as my computing specialization early on.

The master’s was also a struggle. Around this time, emotional and mental health issues made me ineffective at times, and although I did find an understanding professor to be my thesis supervisor, I was often distracted from my work.

Eventually though, I finished my thesis. I technically also published two papers with it, although I don’t consider these my best work. While in the big city, I was also able to attend a machine learning course at a more prestigious university, and got swept up in the deep learning wave that was happening around then.

Around then I devoted myself to some ambitious projects, like the original iteration of the Earthquake Predictor and Music-RNN. Riding the wave, I joined a startup as a data scientist, briefly, and then a big tech company as a research scientist. I poured my heart and soul into some ideas that I thought had potential, unaware that most of them were either flukes of experimental randomness, or doomed to be swept away by the continuing tide of new innovations that would quickly replace them.

Still, I found myself struggling to keep working on the ideas I thought were meaningful, and became disillusioned when it became apparent that they wouldn’t see support and I was sidelined into a lesser role than before, with little freedom to pursue my research.

In some sense, I left because I wanted to prove my ideas on my own. And then I tried to do so, and realized that I didn’t have the resources or the competency. Further experiments were inconclusive. The thing I considered my most important work, an activation function that I thought could replace the default, turned out to be less clearly optimal than I’d theorized. My most recent experiments suggest it still produces better-calibrated, less overconfident models, but I don’t think I have the capability to turn this into a paper and publish it anywhere worthwhile. And I’m not sure if I’m just still holding onto a silly hope that all the experiments and effort that went into this project weren’t a grand waste of time.

I’d hoped that I could find my way into a position somewhere that would appreciate the ideas that I’d developed, perhaps help me to finally publish them. I interviewed with places of some repute. But eventually I started to wonder if what I was doing even made sense.

This dream of AI research. It depended on the assumption that this technology would benefit humanity in the grandest way possible. It depended on the belief that by being somewhere in the machinery of research and engineering, I’d be able to help steer things in the right direction.

But then I read in a book about how AI capability was dramatically outpacing AI safety. I was vaguely aware of this before. The reality is that these corporations and governments want AI for the power that it offers them, and questions about friendliness and superintelligence seem silly and absurd when looking at the average model, which simply perceives and reports the probability that such and such a thing is so.

And I watched as the surveillance economy grew on the backs of these models. I realized that the people in charge weren’t necessarily considering the moral implications of things. I realized that by pursuing my dream, I was allowing myself to be a part of a machine that was starting to more closely resemble a kind of dystopian nightmare.

So I made a decision. That this dream didn’t serve the greatest good. That my dream was selfish. I took the opportunity that presented itself to go back to another dream from another life, the old one about designing games and telling stories. Because at least, I could see no way for those dreams to turn out wrong.

In theory, I could work on AI safety directly. In theory I could try a different version of the dream. But in practice, I don’t know where to begin that line of research. And I don’t want to be responsible for a mistake that ends the world.

So, for now at least, I’m choosing a dream I can live with. Something less grand, but also less dangerous. I don’t know if this is the right way to go. But it’s the path that seems open to me now. What else happens, I cannot predict. But I can try to take the path that seems best. Because my true dream is to do something that brings about the best world, with the most authentic happiness. How I go about it, that can change with the winds.

The Great Debates

I’m currently still trying to decide what I should even post here. I tend to post more personal stuff on Facebook and, to a lesser extent, on Twitter, but my fiancée thinks it might be unwise to publish personal details on a public-facing blog like this one.

Possibly I could focus more on professionally relevant ideas, but I’m not sure what I can offer in that regard. Anything really worth publishing should probably go into a proper paper rather than some random blog on the Internets. I suppose I could write opinions about philosophical things, but that overlaps with the Pax Scientia wiki that I was working on building earlier.

I probably have too many of these projects that don’t get enough attention anyway. I’ve been trying to consolidate them recently, but I worry that the resulting web presence is still far too sprawling, and even less clear to navigate without the delineations between them.

Another debate I’ve been having recently is whether to put more effort into my creative writing. I want to eventually write a novel. It’s a vague goal I’ve had since I was a kid. I have lots of ideas for stories, but I’ve always had trouble actually sitting down and turning those ideas into narratives. Sometimes I wonder if I actually have the writing ability to justify the effort, and whether it makes sense to add yet another piece of literature to the ever-expanding pile of books in the world.

I spent a long time working out in my head the worlds that I want to write about. In some sense, if I don’t write, it’ll have been a waste. But I’m not sure my imagination is extraordinary enough to justify the effort in the first place.

I also claim to be a music composer and a game designer, the other two arts that I have some capacity in. To what extent would those be more appropriate uses of my time? To what extent is writing more worthwhile than composing songs, for instance? I can hash out a song somewhat faster than a novel, but I don’t yet consider my songs to be particularly notable either.

Writing was originally my first choice of artistic expression, ostensibly because it allows me to communicate ideas rather than just emotions, as with music. And writing can be done on my own, rather than needing an artist and a team as with game development. Admittedly, the creator of Stardew Valley did it on his own, but I don’t have the visual art skills for that, and I don’t see myself having the patience to become good at drawing at this point.

In another debate, I’ve also been considering a change of career path. Working in machine learning has been exciting and lucrative, but the market now seems increasingly saturated, as the most competent folks in the world recognized the hype and adjusted their trajectories to compete with mine. Whereas a few years ago I was one of maybe a couple hundred, now there seem to be thousands of people with PhDs who outclass me.

At the same time, I’ve wondered whether the A.I. Alignment problem, whose existential risk has been the focus of several books by prominent philosophers and computer scientists, isn’t a more important problem that needs more people working on it. So I’ve wondered if I should try switching into this field.

Admittedly, this field seems to be still in its infancy. There’s a bunch of papers looking at defining terms and building theoretical frameworks, and little in the way of even basic toy problems that can be coded and tested. I’m personally more of an experimentalist than a theoretician when it comes to AI and ML, mostly because my mathematical acumen is somewhat lackluster, so I’m not sure how much I can help push forward the field anyway.

On a more philosophical note, it seems the social media filter bubble has been pushing me more to the left politically. At least, I find myself debating online with Marxists about things and becoming more sympathetic to socialism, even though a couple years ago I was a moderate liberal. I’m not sure how much to blame the polarization of social media, and how much it’s the reality of disillusionment with the existing world.

I also have mixed feelings in part because the last company I worked for was, according to media outlets, controversial, but to me it was the company that gave me a chance to work on some really cool things and paid me handsomely for my time and energy. Admittedly, as a lowly scientist working in an R&D lab, I wouldn’t have been privy to anything untoward that could have been happening, but it was always jarring to see the news articles that attacked the company.

I left more for personal reasons, partly some issues of office politics that I wasn’t particularly good at dealing with. My own criticisms of the company culture would be much more nuanced, aware that any major corporation has its internal issues, and that many of them are general concerns of large tech companies.

The debates in my head are somewhat bothersome to be honest. But at the same time, it means I’m thinking about things, and open to updating my understanding of the truth according to new evidence, factored with my prior knowledge.

The March of Progress

Where to begin. I guess I should start with an update about some of the projects I’ve been working on recently… First, the Earthquake Predictor results that can be found at cognoscitive.com/earthquakepredictor are just over a year out of date. I still need to update the dataset to include the past year’s earthquakes from the USGS, but I’ve been busy first using the existing data as a benchmark to test some changes to the loss function and architecture that I want to utilize. I’m still debating whether to continue using an asymmetrical loss like Exp Or Log, or Smooth L1 Or L2, or to switch to the symmetric Smooth L1, which would reduce false positives substantially. My original reason for an asymmetric loss was to encourage the model to make higher magnitude predictions, but I worry that it makes the model too eager to guess everywhere that earthquakes are frequent, rather than being more discriminating.
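For concreteness, here’s a rough sketch of that asymmetric-versus-symmetric trade-off, written against tf.keras. The actual Exp Or Log and Smooth L1 Or L2 formulas aren’t spelled out here, so the asymmetric variant below is only a hypothetical stand-in that weights under-predictions of magnitude more heavily, shown next to a plain symmetric Smooth L1 (Huber) loss:

```python
import tensorflow as tf

def smooth_l1(y_true, y_pred, delta=1.0):
    """Symmetric Smooth L1 (Huber): same penalty for over- and under-prediction."""
    abs_err = tf.abs(y_true - y_pred)
    quadratic = tf.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return tf.reduce_mean(0.5 * quadratic ** 2 + delta * linear)

def asymmetric_smooth_l1(y_true, y_pred, delta=1.0, under_weight=4.0):
    """Hypothetical asymmetric variant: under-predicting the magnitude
    (y_pred < y_true) is penalized `under_weight` times more heavily,
    which nudges the model toward higher-magnitude predictions."""
    err = y_true - y_pred
    abs_err = tf.abs(err)
    quadratic = tf.minimum(abs_err, delta)
    linear = abs_err - quadratic
    huber = 0.5 * quadratic ** 2 + delta * linear
    weights = tf.where(err > 0, under_weight * tf.ones_like(err), tf.ones_like(err))
    return tf.reduce_mean(weights * huber)

# model.compile(optimizer="adam", loss=asymmetric_smooth_l1)
```

The `under_weight` knob is the whole asymmetry: setting it to 1.0 recovers the symmetric loss, which is essentially the switch I’m debating.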

Music-RNN has run into a weird problem where I’m having difficulty reproducing the results that I got with the old Torch library a few years ago with the Keras port. It’s probably because the Keras version isn’t stateful, but it could also be that some of the changes I made to improve the model have backfired for this task, so I need to do some ablation studies to check. My modification for Vocal Style Transfer is on hold until then.
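For reference, this is roughly what the stateful setup looks like in Keras; the layer sizes here are hypothetical placeholders rather than the actual Music-RNN configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Hypothetical sizes; the real Music-RNN dimensions may differ.
batch_size, timesteps, features = 1, 64, 128

model = Sequential([
    # stateful=True carries the hidden/cell state across batches, which is
    # what the old Torch training loop effectively did between chunks.
    LSTM(512, stateful=True, return_sequences=True,
         batch_input_shape=(batch_size, timesteps, features)),
    Dense(features, activation="softmax"),
])

# With a stateful model, state must be reset manually at sequence boundaries:
# model.reset_states()
```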

In other news, a couple of neat projects I’ve been trying include: Lotto-RNN, and Stock-RNN.

Lotto-RNN is a silly attempt to predict Lotto Max numbers on the theory that some of them, like the Maxmillions draws, are pseudorandom because they are done by computer rather than by a ball machine, and thus might be predictable… Alas, so far no luck. Or rather, the results so far are close to chance. I’m probably not going to spend more time on this long shot…

Stock-RNN is a slightly more serious attempt to predict future daily price deltas of the S&P 500 given previous daily price deltas. It uses the same online stateful architecture that seemed to work best for the Earthquake Predictor before. The average result across ten different initializations is about +9% annual yield, which falls below the +10.6% you’d get from just buying and holding the index over the same period. Technically, the best individual model achieved +14.9%, but I don’t know whether that’s a fluke that will regress to the mean.

I also tried a stateless model for Stock-RNN, but it performed much worse. There are some things I can do to adjust this project. For instance, I could modify the task to try to predict the annual price delta instead, and train it on many stocks instead of just the S&P500 index, and use it to pick stocks for a year rather than just guess where to buy or sell daily. Alternatively, I could try to find a news API for headlines and use word vectors to convert them into features for the model.
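As a rough illustration of the daily-delta framing described above (with placeholder data and hypothetical layer sizes, not the actual Stock-RNN configuration):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Placeholder input: a 1-D array of daily closing prices for the index.
closes = np.random.rand(2000).astype("float32") + 100.0
deltas = np.diff(closes) / closes[:-1]   # daily price deltas (returns)

# Frame as one long sequence: predict tomorrow's delta from today's.
x = deltas[:-1].reshape(1, -1, 1)        # (batch=1, timesteps, features=1)
y = deltas[1:].reshape(1, -1, 1)

model = Sequential([
    LSTM(64, stateful=True, return_sequences=True,
         batch_input_shape=(1, x.shape[1], 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x, y, epochs=..., shuffle=False)  # shuffle=False to preserve order
```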

On the research front, I was also able to confirm that the output activation function I originally named Topcat does seem to work, and doesn’t require the loss function modifications I’d previously thought were necessary; it works if you use it with binary crossentropy in place of softmax and categorical crossentropy. I still need to confirm the results on more tasks before I can seriously consider publishing the result somewhere. There are actually a few variants, mainly two different formulas and various modifications that seem to be functional.
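The Topcat formula itself isn’t given here, so the snippet below only shows the wiring just described: swapping softmax plus categorical crossentropy for an elementwise output activation paired with binary crossentropy on one-hot targets. Sigmoid is merely a placeholder where Topcat would go, and the input shape is a hypothetical MNIST-like example:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Standard baseline: softmax output + categorical crossentropy.
baseline = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
baseline.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])

# The alternative wiring: an elementwise output activation paired with binary
# crossentropy over the one-hot targets. Sigmoid is only a stand-in here; the
# actual Topcat formula is not published.
variant = Sequential([Dense(10, activation="sigmoid", input_shape=(784,))])
variant.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy"])
```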

It also looks like a hidden activation function I was working on, which I named Iris, works better than tanh. (Edit: More testing is required before I can be confident enough to say that.) Like with Topcat, I have several variants of this as well that I need to decide between.
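Since the Iris formula isn’t published either, here’s only the mechanical part: how a custom hidden activation can be dropped in for tanh in a recurrent layer, with a scaled tanh as a stand-in for the real function:

```python
import math
import tensorflow as tf
from tensorflow.keras.layers import LSTM

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # Golden Ratio, mentioned elsewhere in this post

def iris_stand_in(x):
    # Placeholder only: the real Iris formula is not given here.
    # A scaled tanh just shows where such a function plugs in.
    return PHI * tf.tanh(x)

# Swap the recurrent layer's hidden activation from the default tanh:
layer = LSTM(128, activation=iris_stand_in)
```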

Another thing that seems to help is scaling the norm of the gradients of an RNN, rather than just clipping the norm as is standard. Previously, I’d thought that setting the scaling coefficient to the Golden Ratio worked best, but my more recent tests suggest that 1.0 works better. Again, it’s something I need to double check on more tasks.
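In TF 2 terms, the difference between clipping and scaling the global norm looks roughly like this; it’s a sketch assuming a generic model, optimizer, and loss, with the coefficient set to 1.0:

```python
import tensorflow as tf

def train_step(model, optimizer, loss_fn, x, y, target_norm=1.0):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)

    # Standard norm clipping: only shrinks gradients when the norm exceeds target_norm.
    # grads, _ = tf.clip_by_global_norm(grads, target_norm)

    # Norm scaling as described above: always rescale the gradients so their
    # global norm equals target_norm, whether the raw norm was larger or smaller.
    # (Assumes every trainable variable receives a gradient.)
    global_norm = tf.linalg.global_norm(grads)
    scale = target_norm / (global_norm + 1e-8)
    grads = [g * scale for g in grads]

    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```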

Some things that turned out not to work reliably better than the control include LSTM-LITE (my tied-weights variant of the LSTM), my naively and incorrectly implemented version of Temporal Attention for sequence-to-sequence models, and a lot of the places where I used the Golden Ratio to scale things. The formula for Iris does have a relation to Metallic Ratios, but it’s not as simple as scaling tanh by the Golden Ratio, which weirdly works on some small nets but doesn’t scale well. Interestingly, the Golden Ratio is very close to the value suggested for scaling tanh in this thread on Reddit about SELU, so it’s possible that that provides some theoretical justification for it. Otherwise, I was at a loss as to why it seemed to work sometimes.

I’m also preparing to finally upgrade my training pipeline. In the past I’ve used Keras 2.0.8 with the Theano 1.0.4 backend in Python 2.7. This was originally what I learned to use when I was at Maluuba, and conveniently was still useful at Huawei, for reasons related to the Tensorflow environment of the NPU. But, it’s way out of date now, so I’m looking at Tensorflow 2.1 and PyTorch 1.4. An important requirement is that the environment needs to be deterministic, and Tensorflow 2.1 introduced better determinism, while it’s been in PyTorch for several versions now.
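The determinism setup I’m considering looks roughly like the following. The exact flags differ between versions (the environment variable route is what TF 2.1 exposed for GPU op determinism, while PyTorch goes through seeding and the cuDNN backend flags), so treat this as a sketch rather than a guaranteed recipe; in practice only one framework would be in play at a time:

```python
import os
import random
import numpy as np

SEED = 42

# Shared seeding for Python and NumPy.
random.seed(SEED)
np.random.seed(SEED)

# Tensorflow 2.1: seed the framework and request deterministic GPU kernels.
os.environ["TF_DETERMINISTIC_OPS"] = "1"  # set before any ops run
import tensorflow as tf
tf.random.set_seed(SEED)

# PyTorch 1.4: seed and pin cuDNN to deterministic algorithms.
import torch
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```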

I’ve used both Tensorflow and PyTorch in the past at work, though most of my custom layers, activations, and optimizers are written in Keras. Tensorflow 2.0+ incorporates Keras, so in theory, it should be easier to switch to that without having to rewrite all the customizations; I should just have to adjust the import statements.
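If that holds, the change is mostly mechanical, for example:

```python
# Standalone Keras (old pipeline):
# from keras.models import Model
# from keras.layers import Dense, LSTM
# from keras import backend as K

# tf.keras equivalents under Tensorflow 2.x:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras import backend as K
```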

I’ve also switched to Python 3, as Python 2 has reached end-of-life. Mostly, this requires some small changes to my code, like replacing xrange with range, and paying attention to / versus // for integer division.
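For example, the two changes just mentioned look like this:

```python
n, total = 5, 7

# Python 2 style (no longer valid, or changed meaning, in Python 3):
# for i in xrange(n):      # xrange was removed in Python 3
#     half = total / 2     # int / int was floor division in Python 2

# Python 3 equivalents:
for i in range(n):         # range is lazy, like the old xrange
    half = total // 2      # // is explicit floor division (half == 3)
    ratio = total / 2      # / is now true division (ratio == 3.5)
```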

One thing I’ve realized is that my research methodology in the past was probably not rigorous enough. It’s regrettable, but the reality is that I wasted a lot of experiments and explorations by not setting the random seeds and ensuring determinism before.

Regardless, I’m glad that at least some of my earlier results have been confirmed, although there are some mysterious issues still. For instance, the faulty Temporal Attention layer shouldn’t work, but in some cases it still improves performance over the baseline, so I need to figure out what it’s actually doing.

In any case, that’s mostly what I’ve been up to lately on the research projects front…
