Approved Ideas
These ideas have mentors assigned to them. Student proposals that implement an approved idea have a better chance of being accepted.
1. Benchmark NuPIC CLA with other Machine Learning techniques on standard datasets
The HTM/CLA theory is a novel approach to machine learning (ML), and the NuPIC implementation is still developing rapidly. To gain more interest from academia and from potential users, it is important to evaluate the real performance of NuPIC CLA against other (the best, most commonly used) ML algorithms on standard datasets.
Example ML algorithms to compare could be: SVMs, (recurrent) NNs, echo-state networks, LSTM (long short-term memory) networks, HMMs, rule-based methods, forests of classification trees, anomaly detection, …
Interesting datasets could be: the “Iris dataset” (classification, 150 samples with 4 features), sequence mining datasets, “shopping cart” analysis data, signal prediction (medical, financial, …); possibly, with more work, even extending NuPIC to domains it does not cover yet (vision data, e.g. digit recognition), …
Expected Outcome
A repository with code that allows running NuPIC and the other ML algorithms (existing libraries can be used) on the chosen datasets.
Time needs to be taken to select appropriate ML algorithms for the given datasets, tune both the CLA and the other ML algorithms for the “best” results in a given domain, and evaluate the results.
A summary of the experiments in the form of a wiki page or a paper (PDF).
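As a rough illustration of what one benchmark run in such a repository could look like, here is a minimal sketch that trains a scikit-learn SVM baseline on the Iris dataset; the CLA side is only referred to in a comment, since wiring it up (e.g. via the OPF) is part of the project itself.

    import numpy as np
    from sklearn import datasets, svm
    from sklearn.metrics import accuracy_score

    iris = datasets.load_iris()          # 150 samples, 4 features, 3 classes
    rng = np.random.RandomState(42)
    idx = rng.permutation(len(iris.target))
    train, test = idx[:100], idx[100:]   # simple fixed train/test split

    clf = svm.SVC(kernel="rbf", C=1.0)
    clf.fit(iris.data[train], iris.target[train])
    acc = accuracy_score(iris.target[test], clf.predict(iris.data[test]))
    print("SVM baseline accuracy: %.3f" % acc)

    # The same split would then be fed to a CLA model and its accuracy
    # recorded next to the baseline for comparison.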
Knowledge Prerequisite
- good knowledge of Artificial Intelligence/Machine Learning, to be able to choose appropriate datasets and the best algorithms, tune them, and interpret the results
- basic programming, to implement the experiments
- (optional) intermediate C++/Python programming, for extending NuPIC to new domains (vision, …)
Difficulty
Medium (up to hard) – depends on how much work you want to devote to fine-tuning the algorithms and how many (new, difficult) domains you want to conquer. Personally, I think it would be possible (even suitable) to have more than one student working on this idea – either on different domains/algorithms, or even on the same ones (independently), comparing results.
Mentor: from dept. Cybernetics, CTU, Chetan Surpur (csurpur)
Proposed by: Marek Otahal (~breznak)
Interested parties: Marek Otahal (~breznak)
2. Implement, document and evaluate hierarchical CLAs in NuPIC
Hierarchies of CLAs play an integral role in the HTM/CLA theory. However, the state of the implementation in NuPIC is currently incomplete. The task would be to explore the existing C++ (Link) code we have, finalize it, provide documentation and examples of creating and running hierarchies of CLAs, and provide some performance benchmarks.
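For orientation, a hierarchy might be assembled with the existing Network API in nupic.engine (Network, addRegion, link), roughly as sketched below; the region parameters shown are abbreviated guesses, and whether this stacking works end-to-end is exactly what the project should establish.

    import json
    from nupic.engine import Network

    net = Network()
    # Level 1: spatial pooler + temporal pooler
    net.addRegion("L1_sp", "py.SPRegion",
                  json.dumps({"inputWidth": 1024, "columnCount": 2048}))
    net.addRegion("L1_tp", "py.TPRegion",
                  json.dumps({"columnCount": 2048, "cellsPerColumn": 32}))
    # Level 2: a second CLA stacked on top of level 1's output
    net.addRegion("L2_sp", "py.SPRegion",
                  json.dumps({"inputWidth": 2048 * 32, "columnCount": 2048}))
    net.link("L1_sp", "L1_tp", "UniformLink", "")
    net.link("L1_tp", "L2_sp", "UniformLink", "")
    net.initialize()
    # A sensor region feeding L1_sp, learning switches and net.run() calls
    # are omitted here.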
Expected Outcome
Submit PRs to NuPIC (and have them reviewed and accepted) that implement support for forming hierarchies of CLAs (some older code is already present in NuPIC), and provide documentation and examples of hierarchical CLAs.
Provide performance figures for running the hierarchical CLA.
Knowledge Prerequisites
- good C++ programming (to review and optionally finalize code for hierarchies)
- medium Python (to write tests and examples)
- knowledge of the HTM/CLA theory CLA White Paper
Difficulty
- medium (this largely depends on the state of hierarchies in NuPIC, and finding that out is part of the task)
Mentors: Marek Otahal (~breznak), Scott Purdy (~scottpurdy)
Proposed by: Marek Otahal (~breznak)
Interested parties:
3. Research connection capabilities between CLAs and biological (wet) neural networks (and spiking ANNs)
This is more of a research idea. As HTM/CLA is a model very strongly inspired by nature (the human neocortex), it would be interesting to evaluate (theoretically and on practical, real data) its capability to communicate with real (human, wet, biological) neural networks. This could be very interesting for medical/“cyborg” research.
Expected Outcome
- provide the capability to interoperate between two different neural network models (CLA and spiking NNs) - an encoder (?); see the sketch after this list
- model the communication between the CLA and “biologically complex, accurate” artificial NN models (SNNs, other advanced architectures, project BlueBrain)
- model the ability of the CLA to process biologically obtained data (from a neural field brain probe)
- write a paper about the research
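To make the “encoder (?)” bullet above more concrete, here is a toy illustration (not an established method) of one possible bridge from a spiking recording to a CLA input: bin spike counts per channel and keep the most active channels as the ON bits of a sparse binary vector.

    import numpy as np

    def spikes_to_sdr(spike_counts, sparsity=0.02):
        """spike_counts: spikes per recording channel in the current time bin."""
        n = len(spike_counts)
        k = max(1, int(sparsity * n))
        sdr = np.zeros(n, dtype=np.uint8)
        sdr[np.argsort(spike_counts)[-k:]] = 1   # ON bits = most active channels
        return sdr

    # Example: 1024 channels, one 10 ms bin of random activity -> ~20 ON bits
    print(spikes_to_sdr(np.random.poisson(0.5, size=1024)).sum())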
Knowledge Prerequisites
- strong background in AI and biology
- good knowledge of HTM/CLA theory
- (medium) programming needed to conduct the experiments
- proactive attitude (to obtain the data, etc)
Difficulty
Mentor: from dept. Cybernetics, CTU
Proposed by: Marek Otahal (~breznak)
Interested parties: Marek Otahal (~breznak)
4. Refactor nupic.core for a flow-based programming model
So that it behaves more like flow-based programming components, or Lego pieces that can be snapped together.
- General background: NuPIC consists of three main parts: encoders, regions and classifiers (henceforth named components). Encoders are like our sensory organs, regions are the layers in the brain, and the classifier essentially gives meaning to the streams of data.
- Background specific to the task: Nupic.core has a graph component manager written in C++. It manages execution of the components. One needs to explicitly specify the types of links between components (2D or 3D). Each of these links is a bounded buffer and is set up at compile time.
- Goal: These bounded-buffer links should be implicit in the linking of components. At no point does one mention a link (edge); all one does is say that the output of this component goes to the input of that component. Components should know what type of link they need (2D or 3D) at runtime. Secondly, these components should be able to talk over TCP/IP networks via sockets (preferably using ROS for that); a minimal illustration of the intended semantics follows below.
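The following is a hypothetical, minimal sketch (not existing nupic.core code, and in Python rather than C++ purely for brevity) of the linking semantics described in the goal: components only declare “my output goes to your input”, and the buffer is an ordinary runtime structure rather than a compile-time link.

    import collections

    class Component(object):
        def __init__(self, name):
            self.name = name
            self.downstream = []                 # implicit links, no link objects
            self.inbox = collections.deque()     # runtime buffer

        def connect_to(self, other):
            """Wire output -> input; no explicit 2D/3D link type is mentioned."""
            self.downstream.append(other)

        def emit(self, data):
            for target in self.downstream:
                target.inbox.append(data)        # shape/type inspected by receiver

    encoder, sp = Component("encoder"), Component("spatial_pooler")
    encoder.connect_to(sp)                       # the only wiring statement needed
    encoder.emit([0, 1, 1, 0])
    # In the real refactor the inbox would be a socket or ROS topic, so that
    # components can also live in separate processes or on separate machines.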
Expected Outcome
The graph component manager class is no longer part of nupic.core and is no longer used. Components may talk to each other without explicitly mentioning the types of links, and will communicate via sockets/ROS. Link information is determined at runtime. The components can be used separately, with a mix of “other components”, and with all sorts of connections.
Knowledge Prerequisite
- C++
- (ROS)
- Read CLA White Paper
Difficulty
Mentor: Marek Otahal (~breznak)
Proposed by: Stewart Mackenzie
Interested parties:
5. Demonstrate core properties of the SpatialPooler
The goal of this project is to explore core properties of the spatial pooler, and demonstrate these to the community. The student would write code for each experiment plus detailed documentation describing it.
Here are some core properties and questions to explore:
- Impact of similar inputs: how does the output change as inputs have varying levels of similarity?
- Impact of noise: how does the output change as you add noise? (A minimal experiment sketch follows this list.)
- Capacity of SpatialPooler: how many distinct patterns can the spatial pooler represent accurately?
- Trained vs. an untrained SpatialPooler: how do the SDRs differ?
- Impact of changing various parameters such as number of columns, sparsity and increment/decrement ratios.
- Training two fields (e.g. using two encoders) where the fields have varying levels of correlation
- Demonstrate boosting
- Spatial pooling with 2D inputs
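As an example of how small each experiment could be, here is a sketch of the noise experiment mentioned above, assuming the Python SpatialPooler from nupic.research.spatial_pooler and its compute(input, learn, activeArray) interface; the parameter values are only guesses.

    import numpy as np
    from nupic.research.spatial_pooler import SpatialPooler

    sp = SpatialPooler(inputDimensions=(1024,), columnDimensions=(2048,),
                       globalInhibition=True, numActiveColumnsPerInhArea=40)

    def sp_output(input_vec):
        active = np.zeros(2048, dtype=np.uint32)
        sp.compute(input_vec, False, active)     # learn=False: just inspect output
        return set(np.nonzero(active)[0])

    base = np.zeros(1024, dtype=np.uint32)
    base[np.random.choice(1024, 20, replace=False)] = 1
    clean = sp_output(base)

    noisy = base.copy()
    flip = np.random.choice(1024, 50, replace=False)
    noisy[flip] = 1 - noisy[flip]                # flip 50 random input bits
    overlap = len(clean & sp_output(noisy)) / float(len(clean))
    print("Output overlap after input noise: %.2f" % overlap)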
Expected Outcome
- Working code + documentation that potentially becomes part of NuPIC
- Graphs and figures
- Maybe a video presentation
- If the person is ambitious and does a nice job of the writeup, it could potentially be submitted for publication somewhere.
Knowledge Prerequisite
- Knowledge of Python+Numpy or C++
- Good programming skills
- Basic background in probability or statistics will be very helpful
Difficulty
Intermediate level. The person will need to develop a very deep understanding of the SpatialPooler as described in the white paper, plus the impact of various parameters.
Mentor: Subutai Ahmad (subutaiahmad, subutai@numenta.org)
Proposed by: Subutai Ahmad
Interested Parties:
- add your name here
- if you’re interested in
- working on this idea
6. (Re)introduce reconstruction to NuPIC
The aim of this project is to allow top-down flow of information in all separate parts of NuPIC (encoders, spatial pooler, temporal pooler). To illustrate, imagine the input of CLA ‘A’ gets encoded as 0011111000. With reconstruction we should be able to insert a pattern P=0011111000 (or a slightly modified 0011.11000) at the top of the region, follow the existing weights (top-down flow) and produce the input that most likely produced pattern P.
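A rough sketch of one possible top-down pass over the spatial pooler alone (ignoring the temporal pooler) is shown below; it assumes the SpatialPooler exposes getConnectedSynapses(), and it illustrates the idea rather than prescribing the final design.

    import numpy as np

    def reconstruct_input(sp, active_columns, input_size, threshold=1):
        """Vote for input bits through the connected synapses of active columns."""
        votes = np.zeros(input_size)
        connected = np.zeros(input_size, dtype=np.uint32)
        for col in active_columns:
            sp.getConnectedSynapses(col, connected)   # input bits feeding this column
            votes += connected
        return (votes >= threshold).astype(np.uint8)  # most likely originating input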
Challenges and current state:
- Very useful for a number of experiments with separate parts of the CLA
- Some interested parties (rightfully) say it’s not 100% how it works biologically, however it also “kind of” is how it works (top-down propagation of information)
- Allows us to replace the functionality of the Classifier
- This functionality existed at one point, so maybe it can be resurrected from Git history, or taken as inspiration
- Could be done in either (ideally both) the C++ and Python parts of the code
- Q: Noisy patterns can produce invalid inputs, e.g. A->0011, B->1100; what does 0110 reconstruct to?
- Q: The temporal pooler could be tricky with the predictive cells (horizontal activations)
Expected Outcome
- Working code + documentation that becomes part of NuPIC
- Potentially a comparison with existing Classifier
- Maybe a video presentation
- FAME
Knowledge Prerequisite
- Knowledge of Python and/or C++
- Good programming skills
- Good skills with Git could be useful
- Communicative and active, willing to discuss edge-cases with people on ML
- Knowledge of principles of the CLA (Whitepaper) and functionality of internal parts
Difficulty
Intermediate level. The person will need a basic understanding of how the key parts of the CLA function (encoder, SP, TP), according to the Whitepaper, and very good knowledge of how they work internally (both in principle and code-wise). There are several people interested in achieving this goal, so you can be sure to receive good attention and help when working on this.
Mentor: Marek Otahal (breznak, markotahal@gmail.com)
Proposed by: Marek Otahal
Interested Parties:
Ross Story (Random-Word, ross.story@dal.ca)
PS: Of all the ideas I could be mentoring, this one would be my favorite.
7. Anomaly Detection Example Application
Numenta’s commercial product, Grok, focuses on the anomaly detection capabilities of NuPIC. But there are no decent examples of setting up and using NuPIC specifically for anomaly detection. The addition of a sample application that showcases NuPIC’s powerful anomaly detection would be useful for potential users interested in using that feature.
Expected Outcome
A new sample application within NuPIC that sets up model parameters and input specifically for anomaly detection against an input data set. The data set should be easily accessible and/or checked into the codebase. The app should be well-documented with a README.md explaining the goal of the application and the concepts involved. The application should run from the command line and somehow identify anomalies in the input data as it receives them.
Bonus points for using a freely available public data stream for input.
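For reference, the core loop of such an application could be as small as the sketch below; it assumes the OPF ModelFactory with a TemporalAnomaly model, and MODEL_PARAMS stands in for the usual (large) OPF parameter dictionary, which is omitted here.

    import csv
    import datetime
    from nupic.frameworks.opf.modelfactory import ModelFactory

    # MODEL_PARAMS: the usual OPF parameter dictionary (omitted here); its
    # inferenceType must be "TemporalAnomaly" for the anomalyScore inference.
    model = ModelFactory.create(MODEL_PARAMS)
    model.enableInference({"predictedField": "value"})

    with open("data.csv") as f:                  # hypothetical input file
        for row in csv.DictReader(f):
            record = {
                "timestamp": datetime.datetime.strptime(row["timestamp"],
                                                        "%Y-%m-%d %H:%M:%S"),
                "value": float(row["value"]),
            }
            result = model.run(record)
            score = result.inferences["anomalyScore"]
            if score > 0.9:                      # naive fixed threshold
                print("%s anomaly score %.2f" % (row["timestamp"], score))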
Knowledge Prerequisite
List of things potential participants will need some experience with for consideration.
- Python
- Documentation skillz
- An eye for usability and explanation
Difficulty
Beginner
Mentor: Matthew Taylor (rhyolight)
Proposed by: Matthew Taylor (rhyolight)
Interested Parties:
8. Create a compelling NuPIC demo
NuPIC is fairly new to open source, and one of its aims for this year is to attract new users. However, it currently lacks a really good demonstration application to show off its abilities.
A good demo could focus on NuPIC’s differences from other machine learning technologies such as adapting to streaming data. Or it could just be cool. Look at the past hackathon projects for ideas.
This is a fairly open ended project which can be adapted to the expertise of the student.
Expected Outcome
One or more working demo applications to impress new users and help them understand what NuPIC can do.
Knowledge Prerequisite
- An ability to code
- Knowledge of the strengths/features of NuPIC/CLA
- Familiarity with the NuPIC code base, or willingness to become familiar
Difficulty
Beginner - Intermediate
Mentor: Chetan Surpur (csurpur)
Proposed by: Matthew Taylor (rhyolight)
Interested Parties: Julien Hoachuck (mechaman)
9. Investigate Theano for use in the CLA algorithms
Theano is a Python library for use in deep learning software, that offers GPU-accelerated sparse matrix operations. Investigate whether it would improve the CLA algorithms, including the spatial pooler and temporal pooler, and if so, implement optimized versions of the algorithms using Theano.
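As a starting point, the kind of operation that could be moved to Theano/GPU is sketched below: computing spatial pooler overlap scores as a matrix-vector product between a connected-synapse matrix and the input vector. The names and shapes are illustrative, not taken from the NuPIC code base.

    import numpy as np
    import theano
    import theano.tensor as T

    connected = T.matrix("connected")     # (numColumns, inputSize) binary matrix
    input_vec = T.vector("input")         # (inputSize,) binary input vector
    overlaps = T.dot(connected, input_vec)
    compute_overlaps = theano.function([connected, input_vec], overlaps)

    C = (np.random.rand(2048, 1024) < 0.02).astype("float32")
    x = (np.random.rand(1024) < 0.05).astype("float32")
    print(compute_overlaps(C, x).shape)   # one overlap score per column: (2048,)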
Expected Outcome
An analysis of Theano for use with implementations of the CLA, and maybe source code for optimized CLA algorithms using Theano.
Knowledge Prerequisite
List of things potential participants will need some experience with for consideration.
- Python
- Understanding the CLA algorithms
- A willingness to experiment with Theano
Difficulty
Intermediate
Mentor: Chetan Surpur (csurpur)
Proposed by: Chetan Surpur (csurpur)
Interested Parties:
Unapproved Ideas
These ideas do not have mentors assigned to them. Students may still create proposals based on these ideas, but there is a smaller chance they will be accepted because no mentors have assigned themselves to the ideas.
1. Hierarchical Implementation of the CLA
If the CLA could be used in layers, with higher-order representations being more stable, it may be cool. One thought: suppose we have 3 layers of CLA, the lowest [1], middle [2] and topmost [3].
If there is a sentence “How are you”, CLA[1] would get the sequence H-O-W A-R-E Y-O-U.
CLA[1] will have three patterns symbolizing the three letters “H” “O” “W” for the first word “HOW”. Every cell in CLA[1] would have connections with ‘n’ random cells in CLA[2], which is higher than CLA[1]. Neurons have a lot of connections anyway. Every ON cell in CLA[1] would excite all of its connected cells in CLA[2], putting them in a predictive state. So the pattern “H” “O” “W” will “together” create many predictive cells in CLA[2]. With local inhibition we choose ‘g’ cells in CLA[2] to represent the word “HOW” as a whole. The connections between these ‘g’ cells and those ON in CLA[1] are strengthened, so next time they predict better. The same goes for the other two sequences: “A”-“R”-“E” and “Y”-“O”-“U”.
Now CLA[2] would have 3 patterns, one for each word “HOW” “ARE” “YOU”. Each of these 3 patterns would have ‘g’ ON cells. Again, every cell in every CLA would have connections to ‘n’ random cells in the higher CLA. ON cells in CLA[2] would excite cells in CLA[3], and local inhibition can choose the highest-firing ‘g’ cells to represent the sequence at the higher level.
So now we would have the whole sentence “HOW ARE YOU” represented in CLA[3] by ‘g’ cells. Even when the letter inputs to CLA[1] change quickly [H O W A R E Y O U], CLA[3] will be stable. There can also be connections from cells in CLA[1] to CLA[3] for direct prediction; I have read of such things somewhere in neuroscience.
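A toy, purely illustrative sketch of the layering described above (no NuPIC code, all names hypothetical): each level maps the union of the ON cells it saw at the level below onto a small, stable set of ‘g’ cells via fixed random connections and local inhibition.

    import numpy as np

    rng = np.random.RandomState(0)

    class Level(object):
        def __init__(self, n_lower, n_cells, g):
            self.proj = rng.rand(n_cells, n_lower) < 0.05   # 'n' random connections
            self.g = g

        def represent(self, lower_on_cells):
            excitation = self.proj[:, lower_on_cells].sum(axis=1)
            return np.argsort(excitation)[-self.g:]         # local inhibition: top g

    # lower_on_cells would be the union of CLA[1] cells active for "H", "O", "W";
    # applying the same step once more yields a stable CLA[3] code for the sentence.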
ASSOCIATIVE MEMORY: cortex layer 1 has long horizontal connections!
Three similar CLA blocks are lined up parallel to the ones described above. These are for answering a question that was asked. Initially, while learning, we answer the question ourselves. Suppose the answer to “HOW ARE YOU” is, as you predicted, “I am fine”. Let these parallel CLAs be named answer_CLA.
Here we have 3 layers too: answer_CLA[1], answer_CLA[2] and answer_CLA[3], with answer_CLA[3] being the topmost. While learning, when we provide the input “I AM FINE” to this parallel line of answer_CLAs, it goes through answer_CLA[1] in the same way as “HOW ARE YOU” goes through CLA[1].
At the top, in answer_CLA[3], we have a group of cells representing “I AM FINE”.
These ON cells in answer_CLA[3] are connected with the ON cells in CLA[3]. So next time there is a question “HOW ARE YOU”, we would have a prediction “I am fine”.
Expected Outcome
We could have a program that talks back to us the way humans do, probably. It may pass the Turing test. We cannot say it understands the language, because for that it needs to be able to express itself in that language and ask questions, but I imagine this would be theoretically more satisfying. This algorithm is not application specific. answer_CLA could also be trained to make voluntary eye movements, if the visual input is decoded in a suitable way. It could be used for more motor movements driven by input, as mentioned by Jeff Hawkins. If the eyes know where they are going to move, then the first line of CLA would predict what it is going to see, or how the input will change! Likewise, if it knows the answer “I AM FINE”, it can predict the question “HOW ARE YOU”. Maybe language is an expression of feelings, and if in the future we can prototype an amygdala, or just classify good, bad, happy, etc., it might do better.
Knowledge Prerequisite
Already knowing or a willingness to learn the following:
- Nupic
- Python
- Neuroscience is a plus
Difficulty
For a rookie it can be moderate.
It can be difficult for people not interested in building intelligent algorithms.
I mostly classify things as interesting and boring. This to me is extremely interesting.
Mentor:
Proposed by: Aseem Hegshetye axh118830@utdallas.edu
Interested Parties:
- Aseem Hegshetye
2. Simple AI for games using nupic
1) The AI for the classic game of Pong can be trained using NuPIC. Every point on the game screen can be represented in terms of x and y coordinates. Using appropriate encoders, this information can be converted into an input vector. An input vector is sent to the OPF every time the ball strikes an edge. Multistep prediction can be used to determine the location where the ball will hit the left/right edge, and the paddle can be moved to the appropriate position (a minimal sketch follows below).
2) If the temporal pooler can be used effectively, it would be possible to feed a sequence of arrow-key presses to NuPIC, indicating the keys pressed by a human player while playing a racing simulation. The exact sequence of arrow-key presses would be different every time the player races a particular track. With some learning, NuPIC should be able to reproduce the set of sequences to successfully navigate the vehicle through that track.
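For the Pong variant, the prediction loop could look roughly like the sketch below; it assumes an OPF model with multistep (TemporalMultiStep) inference, with MODEL_PARAMS standing in for the omitted parameter dictionary (the chosen step count must be listed in its classifier settings).

    from nupic.frameworks.opf.modelfactory import ModelFactory

    model = ModelFactory.create(MODEL_PARAMS)       # placeholder parameter dict
    model.enableInference({"predictedField": "y"})

    def on_ball_bounce(x, y):
        """Feed the bounce position, read back the predicted y a few steps ahead."""
        result = model.run({"x": x, "y": y})
        predicted_y = result.inferences["multiStepBestPredictions"][5]  # 5 steps
        return predicted_y                          # move the paddle toward this y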
Expected Outcome
A pong game where the computer player can defend properly and/or a program that can learn how to navigate a vehicle through a particular track for a racing simulation.
Knowledge prerequisite
- Python
- Pygame/any 2D graphics API in python (if implementing AI for pong)
- OPF (nupic Online Prediction Framework)
Difficulty
Intermediate
Mentor :
Proposed by : Kevin Martin Jose (youcancallmekevin@gmail.com)
Interested parties : Kevin Martin Jose (youcancallmekevin@gmail.com), Chirag Mirani (chiragmirani@gmail.com)
Please use the template below when adding new ideas to the list
1. Idea Template
General description of the idea in 1-3 short paragraphs.
Expected Outcome
What is expected for this idea to be considered complete?
Knowledge Prerequisite
List of things potential participants will need some experience with for consideration.
- some programming language
- some necessary API
- some particular technology that must be used
Difficulty
Beginner, intermediate, expert?
Mentor: Name and Github username of mentor (could also include email address or other contact info).
Proposed by: Name and contact info of person proposing idea.
Interested Parties:
- add your name here
- if you’re interested in
- working on this idea