## A problem worth solving?

EmelinaVollmering
Posts: 11
Joined: Tue Oct 18, 2016 10:24 am UTC

Right now, I am doing a PhD in Computer Science at the University of Minnesota Twin Cities. I am applying machine learning (ML) algorithms to CRM solutions, which should help in ways that people would not anticipate.

My thesis involves feeding non-linear inputs to Support Vector Machines (SVMs) using kernel methods and radial basis functions. I am only allowed to use 10,000 data points, with a number of restrictions. People who are familiar with ML know that 10,000 inputs are not sufficient for the best results, given that you have to evaluate overfitting (which ranges between 1% and 10% in my case) and use regularization and validation. Further, I am facing an odd requirement of non-linear inputs (which also calls for more data). To make the situation worse, a lot of information will be lost by the kernel methods and radial basis functions. So 10,000 inputs is effectively a hard ceiling for me.
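To make two of the pieces above concrete, here is a minimal sketch in plain Python: the Gaussian radial basis function kernel, and a k-fold split that lets all 10,000 points serve for both training and validation without adding data. The function names, the `gamma` default, and the choice of 5 folds are assumptions for illustration only, not the actual thesis setup.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# With the 10,000-point budget and 5 folds, each round trains on 8,000
# points and validates on the remaining 2,000.
splits = list(k_fold_indices(10_000, k=5))
```

Comparing training accuracy with validation accuracy on each fold gives a direct estimate of the overfitting gap, and the same folds can be reused to pick the regularization strength and `gamma`.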

My questions are:
• What would the solution be here?
• How can I get more inputs without touching the current inputs?
• What would be the best way to implement regularization, generalization and validation?
• How can I optimize the above results?
These are nagging questions. Right now, I am only able to achieve 48% accuracy and am failing badly. I have been asked to reach 75% (which is an extremely high bar for me, given the circumstances).

Right now, I am assigned to a military welfare organization that provides valuable help and information to veterans and active personnel.
They have assigned me a website that contains military information. At the end of the day, I will be provided with data on visitors to individual pages (in real time; assume these are military personnel looking for information): their location, the time they spent on the site, the number of pages they visited, and the comments they left. From that I have to make the predictions. But remember, I can't exceed 10,000 inputs overall.

Example: Suppose a person A visits a military base page at time T hours, M minutes, S seconds and leaves at T+X hours, M+Y minutes, S+Z seconds. He visited K pages and left L comments, each of variable length Bn, and the algorithm will output probabilities P1, P2, ..., Pn that express visitor A's interest in each type of interest.
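As a rough sketch of how one such visit could be turned into a numeric input for the classifier, the snippet below builds a small feature vector from the quantities in the example. The `Visit` class, its field names, and the choice of summarizing comments by count and mean length are all assumptions for illustration; location is left out because its encoding is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Visit:
    arrived: datetime           # arrival time (T:M:S in the example)
    left: datetime              # departure time (T+X : M+Y : S+Z)
    pages_visited: int          # K
    comment_lengths: List[int]  # lengths Bn of the L comments


def to_features(v: Visit) -> List[float]:
    """Return [seconds on site, pages visited, comment count, mean comment length]."""
    duration = (v.left - v.arrived).total_seconds()
    n_comments = len(v.comment_lengths)
    mean_len = sum(v.comment_lengths) / n_comments if n_comments else 0.0
    return [duration, float(v.pages_visited), float(n_comments), float(mean_len)]


# Hypothetical visit: 5 minutes 30 seconds on site, 3 pages, 2 comments.
example = Visit(
    arrived=datetime(2016, 10, 18, 10, 0, 0),
    left=datetime(2016, 10, 18, 10, 5, 30),
    pages_visited=3,
    comment_lengths=[20, 40],
)
features = to_features(example)
```

Each visit then contributes one fixed-length row, so the 10,000-input limit translates directly into at most 10,000 such rows.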

I have failed badly, but I don't want to fail again.
Looking forward to serious help.