Sparse dataset (input sparse matrix in neural model)



Hi dear colleagues!

I am currently involved in a project where we are working with a neural model.
The point is that we are starting to train our model on a sparse input (a sparse matrix), and I cannot find the best way to preprocess the dataset. I am not even sure whether it is better to use the sparse input directly or a pre-processed version of it.

On the one hand, one dataset's sparsity is about 80% to 90%; on the other hand, another dataset's sparsity is around 20%. The datasets have the same origin but were collected under different environmental conditions.
So far, I have trained my model using only the columns with nonzero values, but I realize that doing so wastes information.

Here is where my question comes in: how could I train my neural network with a sparse input, and what methods could I implement to improve the performance of my NN with this kind of input?

Many thanks,


Are you using NuPIC as your “neural model”? What does your data represent? When you say that your input is already sparse, do you mean it is already in binary format? 80% is NOT sparse.


Thanks for the reply. I have built my NN model in TensorFlow core. When I say the sparsity ranges from 80% to 90%, I mean that in a vector of 10 elements, 8 or 9 are zero values.

[0,1,0,0,0,0,1,0,0,0] => 80% of the values are zero. I hope that's clear enough.
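In code, that measure of sparsity is just the fraction of zero entries. A minimal sketch in Python/NumPy, using the example vector from above (the variable names are my own, not from the thread):

```python
import numpy as np

# The example vector from the post: 8 of its 10 entries are zero.
x = np.array([0, 1, 0, 0, 0, 0, 1, 0, 0, 0])

# Sparsity = fraction of entries that are exactly zero.
sparsity = np.count_nonzero(x == 0) / x.size
print(sparsity)  # 0.8, i.e. 80% zeros
```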


So you are not using NuPIC, and not using HTM. I think you are in the wrong forum?


I am pretty sure this topic cuts across any language or library. So far, you have only asked me how I built my model and what 80% sparsity means. :confused:


You are free to discuss it here. But I moved it from the #nupic forum into #other-topics:community-lounge.


From your description, it sounds like when you say sparse input, you are referring to missing datapoints (you mentioned “columns with nonzero values”, which sounds like input rows that contained multiple fields). Is that correct?

We would probably need some further details about the nature of the data. How many total fields are being measured? Is there a regular interval between each measurement (taken every second, for example)? Are there gaps between inputs?

Also, what kind of problem are you trying to solve? Classification? Prediction? Anomaly detection?


I suppose that if you are using ML NNs rather than memory-prediction, you will have compatibility and accuracy issues with such varying input sparsities (20% to 80%); 20% sparsity, as you used the term, is not suitable for HTMs. You might have to do some input conditioning before the data passes on to the network, to make it more reasonable for the network to work on.
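One simple form of conditioning, without throwing away the zero columns, is to keep the whole dataset sparse in memory and densify only one minibatch at a time before handing it to the network. A minimal sketch in Python, assuming the data lives in a SciPy CSR matrix; the random stand-in matrix and the `dense_batches` helper are purely illustrative, not anything from the thread:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the dataset in the thread: ~80% zeros.
rng = np.random.default_rng(0)
dense = rng.random((1000, 64)) * (rng.random((1000, 64)) > 0.8)
X = sparse.csr_matrix(dense)

def dense_batches(X, batch_size):
    """Yield dense minibatches from a SciPy sparse matrix.

    Storing the full matrix sparse and densifying one batch at a
    time keeps memory use low while still giving a dense network
    the full input, zeros included.
    """
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size].toarray()

batches = list(dense_batches(X, 128))
```

Each yielded batch is a plain dense NumPy array (here 128×64, with a smaller final batch), so it can be fed to an ordinary TensorFlow training loop unchanged.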

The preprocessing you are after might be the Spatial Pooler.