So my data set looks like this, and the letters are factor variables. The Pred columns hold the predicted observations for each ID and the Real columns hold the real observations. I want to calculate the overall accuracy of the predicted values for each ID.
ID  Pred1  Pred2  Pred3  Real1  Real2  Real3
1   A      C      E      A      D      B
2   A      B      D      E      C      C
3   E      C      A      A      B      D
4   D      A      B      B      B      D
5   B      A      C      C      A      B
I want to mutate a column called 'score' that gives the proportion of Pred1, Pred2 and Pred3 values that appear anywhere in the Real1, Real2 and Real3 columns of the same row. The order does NOT matter: Pred1 can be found in Real1, Real2 or Real3, and it still counts as a match. If only Pred1 is found in one of the Real columns, the score is 1/3. If Pred1 AND Pred2 (but not Pred3) are both found somewhere in the Real columns, the score is 2/3, and if all three are found the score is 1. I hope that makes sense. So I want something like below.
ID  Pred1  Pred2  Pred3  Real1  Real2  Real3  Score
1   A      C      E      A      D      B      1/3
2   A      B      D      E      C      C      0
3   E      C      A      A      B      D      1/3
4   D      A      B      B      E      D      2/3
5   B      A      C      C      A      B      1
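To make the rule concrete, here is a minimal base R sketch of the scoring described above. It is only an illustration under stated assumptions, not a tested solution for the real data: the data frame name df and the helper objects pred and real are made up for this example, and the values are copied from the table just above. It also assumes each Pred value is checked independently, so a repeated Pred value would be counted every time it matches.

```r
# Hypothetical data frame `df`, rebuilt from the table just above.
# stringsAsFactors = TRUE makes the letter columns factors, as in the question.
df <- data.frame(
  ID    = 1:5,
  Pred1 = c("A", "A", "E", "D", "B"),
  Pred2 = c("C", "B", "C", "A", "A"),
  Pred3 = c("E", "D", "A", "B", "C"),
  Real1 = c("A", "E", "A", "B", "C"),
  Real2 = c("D", "C", "B", "E", "A"),
  Real3 = c("B", "C", "D", "D", "B"),
  stringsAsFactors = TRUE
)

# as.matrix() on factor columns yields a character matrix, which avoids
# comparing factors that have different level sets.
pred <- as.matrix(df[c("Pred1", "Pred2", "Pred3")])
real <- as.matrix(df[c("Real1", "Real2", "Real3")])

# For each row, take the share of Pred values found anywhere in that row's Real values.
df$score <- vapply(
  seq_len(nrow(df)),
  function(i) mean(pred[i, ] %in% real[i, ]),
  numeric(1)
)

print(df)  # score column: 1/3, 0, 1/3, 2/3, 1 (shown as decimals)
```

The key point of the sketch is that the comparison is done row by row on the column values; quoted names such as "Pred1" are just literal strings, so "Pred1" %in% c("Real1","Real2","Real3") never looks at the data at all.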
I am trying to write a function and tried something like ifelse("Pred1" %in% c("Real1","Real2","Real3"), 1/3, 0), but it didn't work well (I got error messages about coercing to logical etc., which I didn't know how to solve). I have been trying different things too but keep getting stuck with errors. Can anyone help, please? Thank you in advance!