### A problem worth solving?

Posted: **Thu Oct 20, 2016 5:36 am UTC**

I am currently doing a PhD in Computer Science at the University of Minnesota Twin Cities. I am applying machine learning (ML) algorithms to CRM solutions, with the aim of helping in ways that people would not anticipate on their own.

My thesis involves feeding non-linear inputs to Support Vector Machines (SVMs) using kernel methods and radial basis functions. I am only allowed to use 10,000 data points, and a number of restrictions apply. People familiar with ML know that 10,000 inputs are not enough for the best results, given that you have to evaluate overfitting (which ranges between 1% and 10% in my case) and use regularization and validation. Further, I am facing an unusual requirement of non-linear inputs (which also calls for more data). To make the situation worse, a lot of information will be lost through the kernel methods and radial basis functions. So 10,000 inputs is effectively a hard limit for me.
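For readers less familiar with the term, the radial basis function (Gaussian) kernel mentioned above is usually written K(x, y) = exp(-gamma * ||x - y||^2). The following is a minimal sketch in plain Python, not the thesis implementation; the function names and the default `gamma=0.5` are assumptions chosen only for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance).

    gamma is a hypothetical default; in practice it is a tuned hyperparameter.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=0.5):
    """Pairwise kernel values for a list of points (what a kernel SVM consumes)."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]
```

Note that the Gram matrix for n points has n × n entries, so with 10,000 points it would hold 100 million values; that is worth keeping in mind when working near the data limit.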

My questions here are:

- What would be a solution here?
- How can I get more inputs without touching the current inputs?
- What would be the best way to implement regularization, generalization and validation?
- How can I optimize the above results?
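On the validation question, one common option under a fixed data budget is k-fold cross-validation, where every point is used for both training and validation without adding any data. The sketch below only produces the index splits; the function name `k_fold_indices`, the choice of `k=5`, and the fixed seed are assumptions, and whether this scheme suits the restrictions described above is not something the post settles.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then cut into k
    nearly equal folds; each fold serves as the validation set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds get one extra element.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

With 10,000 points and `k=5`, each round trains on 8,000 points and validates on the remaining 2,000. Regularization strength and kernel parameters could then be compared by their average validation score across the folds.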

Right now, I am assigned to a military welfare organization that provides valuable help and information to veterans and active personnel.

They have assigned me a website that contains military information. At the end of each day, I will be given information about the visitors to individual pages (suppose, in real time, that those are military personnel looking for information): their location, the time they spent on the site, the number of pages they visited, and the comments they left. After that, I have to make the predictions. But remember, I can't exceed 10,000 inputs overall.

Example: Suppose a person A visits a military base page at time T hours, M minutes, S seconds and leaves at T+X hours, M+Y minutes, S+Z seconds. He visited K pages and left L comments, each of variable length Bn, and the algorithm will output a probability P that describes visitor A's interest in terms of P1, P2, ..., Pn (types of interest).
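To make the example concrete, here is a minimal sketch of how one visit could be turned into a numeric feature vector, and how a set of per-interest scores could be normalized into probabilities P1, P2, ..., Pn. Everything named here (`Visit`, `visit_features`, `softmax`, the choice of features) is hypothetical; the scores would have to come from a trained model, which this sketch does not include, and it assumes a visit does not cross midnight.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    arrival_s: int              # arrival time, seconds since midnight
    departure_s: int            # departure time, seconds since midnight
    pages_visited: int          # K in the example
    comment_lengths: List[int]  # lengths B1..BL of the L comments


def visit_features(v: Visit) -> List[float]:
    """Map a visit to [dwell seconds, pages, comment count, mean comment length]."""
    dwell = v.departure_s - v.arrival_s
    n_comments = len(v.comment_lengths)
    mean_len = sum(v.comment_lengths) / n_comments if n_comments else 0.0
    return [float(dwell), float(v.pages_visited), float(n_comments), mean_len]


def softmax(scores: List[float]) -> List[float]:
    """Turn arbitrary per-interest scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, a visitor who arrives at 01:00:00, leaves at 01:05:00, views 4 pages, and leaves comments of length 10 and 30 would map to the vector `[300.0, 4.0, 2.0, 20.0]`.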

I have failed badly before, and I don't want to fail again.

Looking forward to serious help.