Predicting stock directionality with NuPIC and Aqua

Thanks for all your replies! That’s exactly what I needed to know. Let me give you a bit more background on what I’m using Nupic for.

Presently, I have nearly completed a system that accepts a list of stock ticker symbols and makes predictions for each selected stock, or for any other financial instrument such as a Forex currency pair or a bond. The end goal is to develop a virtual analyst and trader combined into one, similar to other AI investor projects I’ve read about, like the one covered in this article in Wired magazine.

I’ve developed a library which parses the column names and data types out of an input .csv file and reformats the file into NuPIC’s layout, meaning the data type names in row 2, a blank row 3, and the data beginning on row 4. The parser’s main job is to return a swarm description object.
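For the curious, that reformatting step can be sketched like this (a minimal illustration; the function name and arguments are mine, not Aqua’s actual code):

```python
import csv

def write_nupic_csv(path, field_names, field_types, rows):
    """Write data in NuPIC's three-row header layout: field names in
    row 1, data type names in row 2, a blank flag row 3, and the
    data itself beginning on row 4."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(field_names)
        writer.writerow(field_types)
        writer.writerow([""] * len(field_names))  # flag row, left blank
        writer.writerows(rows)
```

The swarm description object would then be built from the same field-name/field-type pairs the parser extracted.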

As you might imagine, the next step queues up the list of stocks and input files and swarms over them one at a time, saving each model to a database for later use. After the models are built, it needs to spawn multiple processes to pump data through each model as fast as it can, hence my question. Basically, it’s exactly what the HTM Engine does (which is why I posted the thread there originally), but I decided to build it from scratch using NuPIC’s API. There are many other APIs embedded in Aqua, but NuPIC is the backbone of everything.
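A minimal sketch of that fan-out step, with a stub standing in for the real work (none of these names come from NuPIC or Aqua; the stub would really load the saved model from the database and stream rows through it):

```python
from multiprocessing import Pool

def pump(symbol):
    """Stub: in the real system this would load the saved model for
    `symbol` and feed its input rows through it."""
    rows_fed = 1000  # placeholder for the actual feed loop
    return symbol, rows_fed

def pump_all(symbols, workers=4):
    """Spawn worker processes so one slow model doesn't block the rest."""
    with Pool(processes=workers) as pool:
        return dict(pool.map(pump, symbols))
```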

I built a proof of concept, which was a great success! As a way for the public to begin to garner confidence in the predictions made by the system (code-named Aqua), I published the results of Aqua’s stock predictions over a one-month period in February 2017, with directional accuracy of 75%. Over a three-month period from Dec. 9, 2016 to Feb. 28, 2017, with predictions not published, accuracy reached as high as 94%.

Before I go further, I have to give big-time props to the entire Numenta team! Your technology is easier to use, more stable (meaning less subject to the challenges presented by random initialization, at least compared to ANNs and deep networks), and, most importantly, far more accurate than any other machine learning framework I’ve used (and I’ve used more than a few!). :slight_smile:

There are two end goals with Aqua.

Goal # 1

Launch mobile and web applications allowing a regular layperson to have a competitive advantage over large organizations like JPMorgan Chase, Goldman Sachs, Merrill Lynch, and the like. I don’t have anything against those companies; they’re fine companies. But there seems to be a big gap in access to opportunity between the wealthy and the less-than-wealthy, which is a lot of people! There’s a real opportunity here to earn wealth and bring much-needed value to many regular, hardworking people and families.

In the next 10-20 years, I see AI becoming a powerful force in the workplace, eventually replacing most jobs. And I think regular people will need a way to generate a household income as more and more jobs are replaced by AI and, eventually, robotics. It seems like various governments are making some moves to bring financial security to regular “Janes” and “Joes”, but I don’t think it’ll be good enough for the millions of people in America or the billions living abroad. This is my way of giving back to a world that’s given me so much, while at the same time accomplishing goal # 2.

Goal # 2

The secondary goal, though my passion for technology means it could easily rank equally high for me personally, is to garner capital from financial systems in order to fund research in the technologies I believe will make the greatest difference to humankind. Namely, and in no particular order: research in artificial general intelligence (I have dreams of one day making a financial contribution to Numenta!), biomedical technologies to cure disease and extend human life, and technologies targeted at improving and enriching the quality of human life (read: nano-tech, artificial limbs and organs, digitizing the human mind… I could go on, but hopefully you get the idea).

Except for the predictions made in February, this is the first openly public news we’ve released about this project. I hope this is the right forum for such news. My two partners and I will be launching a Kickstarter in the upcoming few weeks to raise funding and secure servers and infrastructure, so that we can share Aqua with the world. The version 1.0 release will be limited to US citizens only, mainly because of international business requirements. But Aqua will be released worldwide as soon as possible, possibly by the middle of next year.

I have a few reasons for posting this information here. First and foremost, to say thank you to Numenta for all your amazing research and work! I’ve truly been amazed and inspired by it! And to say thank you for making the NuPIC framework available as open source. Vive l’open source!! :slight_smile: And I’m curious whether anyone on the NuPIC forum might be interested in partnering with us as we venture forward?

I’ve launched three other companies in the past (the first when I was 17 and in high school), each with a greater level of success than the prior one. However, I’ve never attempted anything at the level I am now. Version 1.0 is 3 weeks from completion. Depending on how much funding we secure, we need to hire at least one, but preferably two, senior-level developers. We need jacks-of-all-trades who are masters of NuPIC’s framework!

Launching something of this magnitude is… well… I’m not sure how to describe it. Perhaps lots and lots of work, very difficult, scary at times, but mostly exhilarating and amazing! But, I imagine I’m preaching to the choir if you’re on this forum. I’m willing to bet many of you have participated in launching successful companies. After all, great minds think alike, no?

Presently, our most immediate need is to secure angel investor capital to acquire infrastructure. I must admit, I know next to nothing about raising capital. I have an aunt in Texas who successfully raised capital to stand up several hyperbaric chamber clinics on hospital grounds; her advice has been valuable and well received. I also have many contacts in the venture/angel investment world, and I’m finalizing a presentation today and tomorrow. But I’m more a thought leader (read: tech nerd/addict), and I’m definitely open to any suggestions from those with greater experience, especially in how to approach investors, how to structure the right message, how to provide adequate evidence that our technology can do what we say it can, and how to align the company’s interests with those of potential investors.

At any rate, sorry for my long-windedness. But if any of you are interested in partnering, or if you have any ideas or suggestions, I’d be grateful for any advice or a point in the right direction.


P.S. I should have also mentioned I understand to use Nupic in a commercial system it needs to be licensed. I’ll be in touch with the right people at Numenta before releasing anything in a commercial product.

That is false, actually. You may use NuPIC in a commercial system without talking to Numenta about a dual commercial license as long as you respect the terms of the AGPL.


What exactly are you saying here? You are predicting market close prices will be either “up” or “down” with 75% accuracy? That is quite a feat if so!


but because of my passion for technology could easily be listed as equally high a priority to me personally, is to garner capital from financial systems in order to fund research in technologies

Why not trade it yourself, then, and donate the proceeds as a grant to a research team… like Numenta, for instance :smile_cat:

Incidentally, in today’s news: AI wins $290,000 in Chinese poker competition - BBC News


Sounds like an awesome project. Your ambition is inspiring, and I hope you succeed, whether in this project or the next.

I do think there is some due diligence left on prediction accuracy, though. The last few months have been an extremely bullish run for stock markets, which makes an ‘up or down’ prediction quite different from a 50/50 chance event, so the significance of the accuracy should be tested against the actual base rate.
Furthermore, the directionality of stock prices is rarely comparable to a coin flip anyway. It is usually true (depending on the frequency of the data: daily, minute, second, …, and the observed period) that there are more down ticks than up ticks, but that the up moves are stronger than the down moves. That is why comparing directional predictions with a 50/50 chance would be misguided.
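To make that concrete, here is a small Python sketch of the base-rate check (the numbers are illustrative, borrowed from the 13-day result mentioned in this thread): roughly 10 correct calls out of 13 clears a fair-coin baseline at the usual 5% level, but not a baseline where, say, 70% of days closed up.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of getting at
    least k correct directional calls out of n if each call succeeds
    independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_fair = binom_sf(10, 13, 0.5)  # vs. a fair coin: ~0.046, looks significant
p_bull = binom_sf(10, 13, 0.7)  # vs. a 70%-up month: ~0.42, not significant
```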

Personally, I am researching the earnings-generating processes of businesses and have little evidence that the temporal pooler is contributing to the prediction accuracy on my data, despite the results being relatively good. Stock prices at a daily frequency are even more aggregated forms of data than mine, and I would be astonished if it could be shown that NuPIC can predict any aspect of them, except perhaps considering high-frequency movements.

I suggest you try disabling the temporal pooler and see what happens to the accuracy of the predictions. If the algorithm exclusively uses the spatial pooler, then, for scalar variables, the prediction should be a kind of weighted average of what usually follows a given observation, rather than a given sequence. In other words: a blunt first-order prediction. If disabling the TP worsens your prediction accuracy, you have an indication that sequences are actually being recognized; if accuracy is unchanged, the TP is probably not contributing. (There are likely more ways to investigate this.)
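As a rough sketch of that experiment, assuming an OPF-style model-params dictionary (the exact flag name varies across NuPIC versions; older ones use tpEnable, so check your own params before relying on this):

```python
import copy

def without_temporal_algorithm(model_params, flag="tpEnable"):
    """Return a copy of an OPF-style params dict with the temporal
    algorithm switched off, leaving only the spatial pooler plus
    classifier, i.e. a first-order predictor."""
    params = copy.deepcopy(model_params)
    params["modelParams"][flag] = False
    return params
```

Swarm once, then run the same data through both variants and compare the two accuracy curves.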

To start testing accuracy more thoroughly, why not set the algorithm to work on historical data to evaluate potency? Plenty of databases are available. I can send you a database with more than a year’s worth of minute data of S&P100 firms if you are interested.

For your business, prediction accuracy might not be crucial, but if it is, it should not be assumed.

Regardless of these reservations, I admire your courage, and hope you succeed. Good luck! :slight_smile:


Very well put. Made me think again.

Quick correction: when you said “temporal pooler” what you meant was temporal memory. And as an aside…

I don’t think temporal memory alone is enough to get the prediction performance we want, because the “next active cells” don’t contain enough information on their own. Adding temporal pooling in the way Jeff has talked about (as a sensorimotor inference layer) is a step in the right direction. We are trying to figure it out from a completely different angle now, but the result will be something we can re-use for temporal pooling over the TM. The fact that the distal input to a layer can potentially represent so many things is a mind-blower for me.

NuPIC has never been that great at prediction unless there were specific recognizable temporal patterns involved. We never expected it to do well with stock prediction. We found the value of NuPIC in anomaly detection. Our most successful apps have all been anomaly detection apps. That is where we focused most of our engineering efforts for several years, before going back to research on sensorimotor stuff.

To get really good predictions, you’re going to need to know what sequences are occurring at any point, and that requires temporal pooling.


@mellertson I’m a 13yr trading veteran and have been dabbling in doing the very thing you’re describing in your post with Nupic. I also happen to work in the startup ecosystem now specifically on the fundraising side of things. I’d be interested in having a more detailed conversation and seeing how we might collaborate.


This is likely worth its own category, because I’m sure a lot of people are dabbling with AI alchemy (myself included).

I’ve been playing with a couple of different systems, and I think HTM has a lot of promise for this application, but I believe it needs to be applied in a more aggregate way than tick prediction.

When making a risk-managed investment, you really need to understand ranges so you don’t “stop out” before the price moves your way. What I’m exploring now is an extension of technical analysis. I’ve had some success in Forex/stock experiments using candlestick patterns, but it is tedious and time-consuming to do manually. The image-processing success of VitaminD/Sighthound made me think I could accomplish something useful by recognizing candle patterns and predicting the next ones.

I haven’t gone to a vision-based system yet. Basically, I’m feeding the system normalized candle data and seeking to predict the [open, high, low, close] values for the next day. I’ve normalized the candles to provide a consistent frame and to allow the stock’s historical data to be treated on the same range. I just got it running this week using the htm.java package written by @cogmission. I still need to work on the parameters to get something meaningful, though.
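One simple way to put candles from different price levels onto a common frame (an illustrative choice, not necessarily the normalization used above) is to express each OHLC value as a fractional change from the previous close:

```python
def normalize_candle(o, h, l, c, prev_close):
    """Rescale an OHLC candle as fractional moves relative to the
    previous close, so a $10 stock and a $1,000 stock share a range."""
    return tuple((x - prev_close) / prev_close for x in (o, h, l, c))

# e.g. normalize_candle(100, 110, 95, 105, prev_close=100)
#      -> (0.0, 0.1, -0.05, 0.05)
```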

I’m happy to share my code, and welcome your help on the parameters.


Thanks! And yes sir, that’s exactly what I’m saying. To clarify, let me be more specific. The system, Aqua, predicts whether the stock’s close price will close at a higher price (“up”) or if it will close at a lower price (“down”) on the following trading day. I’m calling the “up” or “down” movement the “directional movement” of the stock.

During the month of February I decided to publish results on Facebook, as a way to “prove to the world” (so to speak) Aqua’s capabilities. Talk about a nerve-wracking month! It’s one thing to say your system can do a thing; it’s another to prove it in a public forum… phew! But, getting back on topic: during February 2017 I published the predictions before the trading day’s opening bell (meaning before 6:30 am PST), and I kept a running scorecard of the results, which I also published on Facebook.

Here’s the scorecard I kept for both time frames.

As you can see, sometimes the system doesn’t spit out a prediction. On those days I indicate “No Prediction” in the column titled “Predicted Direction” and I omit that day from the score keeping.

At the end of February, excluding Feb 6 till Feb 13th (the server was down due to maintenance), Aqua had predicted the directional movement of UWT with 75% accuracy for 13 trading days.

While results were only published for February 2017, from Dec. 21, 2016 until Feb. 8, 2017 the accuracy of Aqua’s predictions reached 78.95%.

@Andres_Diana Thanks for the interest! Can you shoot me your office’s contact info? I’m curious to learn more about your company, and about any successful fundraising projects you’ve worked on.

I completely agree, and have seen the same. To your point, in March 2017 oil stocks in general took a nosedive due to many factors, including moves by OPEC affecting the supply of oil on the market and instability and war in the Middle East. Case in point: the predictions were correct a very low percentage of the time from March 1 until the first week of April.

That being said, I’m not just feeding raw price data into NuPIC. There are a lot of really smart people in the world, and I’m sure they would have figured that out long before I got here. :slight_smile: I’ve been trying to solve this problem for 3+ years. Long story short, NuPIC’s API is a big part of the “secret sauce”, but there are many other discoveries that have been made over those 3+ years.

@Timo_1028 Thanks so much for the offer of your minute data! I’ll gladly take you up on it. How much storage space does your data take up? If it’s under 2.5 GB, I can give you my public Dropbox link.

Good to hear you’re closely following realized prediction accuracies.

I am glad I can help with some data. The file is about 930MB (182MB zipped), so I’ll drop it off as I get the link :slight_smile:


Thanks @Timo_1028, I’ll look forward to seeing the link.

In the meantime, I came across a new company, Alpha Vantage, which is publishing historical and real-time equity data for free via a RESTful JSON API. I believe they’re from New Zealand; at least, that’s where their domain name is registered (I did a bit of sleuthing).

I contacted them a couple of weeks ago; they seem like good guys and/or gals. I’m not sure how far back their historical data goes, but I seem to recall it only went back a few years. I’m not 100% sure, so don’t quote me on that! :slight_smile:

And I haven’t verified if their data is accurate. I’m not saying it’s inaccurate, I just haven’t verified the accuracy of their data.

@jon I just re-read your reply. I’ve also experienced the same time and tedium requirements when applying machine learning to the prediction of financial markets. I was literally obsessed for the first year, staying up all night several nights a week. I’m sure I grew a few grey hairs in the process.

I finally decided to automate many of the tedious and time-consuming tasks via a genetic algorithm. And it’s paid off, with more experiments running on their own without the need to manually tune parameters.

One additional success it led to was based, at least in part, on a theory that many stocks/ETFs/currencies might move through periods where they become more or less predictable. For example, it makes sense that stocks/funds based on corn would tend to fall around harvest time due to increased supply (an oversimplification, I know, but just for example’s sake). One could analyze weather patterns to predict a higher probability of severe, crop-damaging storms in the Midwest, yielding higher predictability of corn prices over long time periods.

But there’s always the random chance that a freak storm destroys crops at a whim, introducing seemingly random patterns into a well-understood system.

The presumption in this example is that corn prices might be moderately to highly predictable during normal weather patterns, but random storms or pest/disease outbreaks might introduce unforeseen variables, resulting in seemingly “random” price swings, i.e. unpredictability.

With that theory in mind, it seemed I needed a way to detect whether a stock was in a predictable period or not. That proved quite difficult, so I finally decided to predict every stock on the market and created a metric showing how predictable a particular stock was over a specified period of time. That way, I could just take the top 10 or 20 stocks for the latest time period and, voilà, they’d be the most predictable by that metric.
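The ranking step can be sketched like this (the helper names are mine, not Aqua’s actual code): score every symbol by its realized directional accuracy over the latest window, then keep the top of the list:

```python
def directional_accuracy(predicted, actual):
    """Fraction of days where the predicted direction matched."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)

def most_predictable(history, top_n=10):
    """history maps symbol -> (predicted, actual) direction lists over
    the latest window; return the top_n highest-accuracy symbols."""
    scores = {s: directional_accuracy(p, a) for s, (p, a) in history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```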

The next challenge was how to predict the entire stock market, in real-time, and constantly, and autonomously adapt to changing market conditions. That problem as it turns out is solved by two things:

  1. lots and lots of servers
  2. tons of real-time and historical market data

Or, in other words, a big budget to buy all the servers and data. Couple a big budget with the right technology and you have Aqua. Actually, Aqua was just a code name used on this forum. The actual name of the product we’ll be releasing is DaviidAI (re-branded from DaviidtheQuant). The idea behind the name is DaviidAI vs. the Goliaths of Wall Street. Round one, ready, fight! :smile:

In the next week we’ll be publishing a Kickstarter campaign to raise funding for the servers we need and for additional feeds of real-time stock data. I’ll post a link to the Kickstarter here. We’re pretty excited about it! And we’re all too happy to plug Numenta, if @rhyolight approves!

Mike, did you try real-time inference? Could you please share some metrics? What’s NuPIC’s response time when it is fully trained and in inference mode? How long does it take to give you a prediction after a single input? Thanks

@maxima120, yes, the system is using real-time inference. Presuming you are asking for performance metrics, I don’t have exact numbers, but it takes something like 15 seconds to predict ~1,000 input records. Please take this number with a grain of salt; it’s a rough human observation, not a measurement.
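For anyone who wants a harder number than my eyeball estimate, a tiny harness like the following would measure it; `model_run` here is a hypothetical stand-in for whatever call feeds one record to the trained model:

```python
import time

def records_per_second(model_run, records):
    """Time a prediction loop and report throughput in records/second."""
    start = time.perf_counter()
    for record in records:
        model_run(record)
    elapsed = time.perf_counter() - start
    return len(records) / elapsed
```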

I’m literally right at the beginning of our Kickstarter campaign, so my time is quite limited until mid to late July. I’d be happy to get you some specific metrics around the end of July. If you’d like me to do that, please shoot me a message around the end of July as a reminder. :slight_smile:

Thank you, I will do that.


@mellertson

Have you tried your algorithm on Forex exchange data? Also, when you say 75% to 95%, how many testing samples did you use to come up with those numbers?
