KDD 2019 | Policy Learning for Malaria Elimination


Thanks for your experiments! I feel exactly the same way. I am very interested in the organizer’s environments.
We are proud of our own methods! So I want them to check the code again or release the environments.



@zoulixin Thank you for sharing! Your method is cool! (specifically, the MCTS part)
It is interesting that our team’s situation is similar to your team’s (below). Our team used Bayesian Optimization, Gaussian Processes, Thompson Sampling, and some data science techniques.

Our method can find the policy [[0.2,0.9],[xx,xx],[xx,xx],[xx,xx],[xx,xx]]; its reward is around 300-400. Sometimes we can find a better policy, [[0.2,0.9],[xx,xx],[xx,xx],[xx,xx],[0.0,0.5]], whose reward is around 550. However, getting a bigger reward in state 5 is hard to generalize to other states.
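For context, here is a minimal sketch of the Thompson Sampling ingredient mentioned above (all priors and numbers are illustrative assumptions, not our actual submission): treat each candidate policy as an arm with a Gaussian posterior over its mean reward, sample from each posterior, and play the argmax.

```python
import random

class GaussianArm:
    """Running Gaussian posterior over one candidate policy's mean reward."""
    def __init__(self, prior_mean=0.0, prior_var=100.0, noise_var=25.0):
        self.mean, self.var, self.noise_var = prior_mean, prior_var, noise_var

    def sample(self):
        # Draw one plausible mean reward from the current posterior.
        return random.gauss(self.mean, self.var ** 0.5)

    def update(self, reward):
        # Conjugate Gaussian update with known observation noise variance.
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + reward / self.noise_var) / precision
        self.var = 1.0 / precision

def thompson_pick(arms):
    """Choose the candidate policy whose posterior sample is largest."""
    samples = [arm.sample() for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)
```

The key property is that uncertain arms still get sampled occasionally, while arms with confidently high posteriors dominate over time.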


ID Name Score
471 Hzk 27.51338676
470 RL-Hacker 26.57160963
232 Alpha 25.58103156
367 vlad 24.79111349
218 et373 24.55199456
226 Plato 21.68589721
317 LOLs 18.36882472
216 DoPlus 9.755636278
214 El 8.986782034
438 Anand 8.800777597
383 Pidgey 8.370028016
274 Winter_Is_Coming 7.455849098
376 ZCYM 6.417596042
506 JINGJINGXIAO 5.914394111
421 MathCo 5.870184194
396 NCKU_shallow_learning 5.624212873
348 Bonum 5.592861217
375 NULL 4.499057396
496 it_bites 3.520599066
319 mr_sandman 3.438478984
284 xinxinxinxinxin 1.950938848
288 ka2yama 1.751666869
417 Ironball 1.708535288
363 gnemoto 1.541115892
360 All_you_need_is_deep_learning 1.333807658
491 rohansaphal 0.619751826
386 A2 0.185512041
442 fightraccoon -0.323796657
213 NTT_DOCOMO -1.215853829
394 Ashish -2.54799673
402 saite_fan -


These were the scores for participants who submitted to the verification phase (- : invalid code)



Actually, I did not use [0,1],[1,0] at the beginning. I just tested state 1 with five actions, [0,1],[0,0],[1,0],[1,1],[0.5,0.5], at the beginning to initialize the Gaussian Process. The policy [0,1],[1,0] was chosen by tree search.
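A minimal sketch of that initialization step, assuming scikit-learn’s `GaussianProcessRegressor` and made-up rewards for the five probe actions (the real environment’s rewards, kernel choice, and acquisition function are not reproduced here):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# The five probe actions for state 1, as described above; the rewards are
# made-up placeholders, not the real environment's values.
X = np.array([[0, 1], [0, 0], [1, 0], [1, 1], [0.5, 0.5]])
y = np.array([120.0, 10.0, 90.0, 60.0, 70.0])

# Fit a GP surrogate of reward as a function of the 2-D action.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
gp.fit(X, y)

# Score a grid of candidate actions with a UCB-style acquisition.
grid = np.array([[a, b] for a in np.linspace(0, 1, 11)
                        for b in np.linspace(0, 1, 11)])
mu, sigma = gp.predict(grid, return_std=True)
best = grid[np.argmax(mu + 2.0 * sigma)]
```

From here, one would evaluate `best` in the environment, append the result to `(X, y)`, and refit, which is the usual Bayesian-optimization loop.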



So why are both JINGJINGXIAO and Hzk on this list, with two different scores, when Hzk claimed that they merged?
He said “I’m Hzk , after merged with my teammate I find my team name changed to JINGJINGXIAO. Actually I’m in 73th in public leaderboard. And I just submit a random solution in public leaderboard.”
It must be one of two cases: (1) same code, very different scores; (2) different code, same team.
Either case is a fatal issue.



Thanks for posting the scores! Also thanks to the teams for posting their solutions.

I concur with the transparency issue raised here. Hzk is not on the public leaderboard… I thought only the teams on the leaderboard could enter the check phase, no?



I’d like to add some clarity here: all solutions had been registered on the platform, so they were received and submitted to us for evaluation. It is our position that the solution is valid and stands: it was received and evaluated within the window, and it was within the 100 submissions that could reasonably be evaluated for the final scoring. Our aim is to encourage participation, the sharing of solutions, and highly performing submissions; in any case, there is no violation here, as already explored on the forum.



Sharing my solution using Expected Value SARSA. I will update the readme and describe it in detail later.
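For readers unfamiliar with Expected Value SARSA, here is a minimal tabular sketch of the idea (this is not the shared solution itself; the `env_step(s, a) -> (next_state, reward, done)` interface is an assumption for illustration). The TD target uses the expectation of Q(s', ·) under the epsilon-greedy policy instead of a single sampled next action:

```python
import random
from collections import defaultdict

def expected_sarsa(env_step, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Expected SARSA with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)

    def expected_q(s):
        # Expectation of Q(s, .) under the epsilon-greedy policy.
        qs = [Q[(s, a)] for a in range(n_actions)]
        greedy = max(range(n_actions), key=qs.__getitem__)
        return sum((eps / n_actions + (1 - eps) * (a == greedy)) * qs[a]
                   for a in range(n_actions))

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[(s, a)])
            s2, r, done = env_step(s, a)
            target = r + (0.0 if done else gamma * expected_q(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Averaging over the next-action distribution removes the sampling variance that plain SARSA has in its target, which can matter in a low-episode-budget setting like this challenge.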



Hzk’s solution is interesting and maybe more robust than mine on average. I ran the code for 1000 epochs on ChallengeProveEnvironment and ChallengeSeqDecEnvironment. On ChallengeProveEnvironment, the rewards were 160.55209270530327, 310.68763820289087, and 448.4390538855134. On ChallengeSeqDecEnvironment, the rewards were 257.45016648706724, 303.5336914830634, and 325.28351139980515. I am really curious about the final evaluation environment, to know the gap between our solutions.

When will the final test environment be available? @oetbent
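A minimal averaging harness in the spirit of the experiment above; `DummyEnv` and its `evaluate_policy` method are hypothetical stand-ins, since the real `ChallengeProveEnvironment`/`ChallengeSeqDecEnvironment` API is not reproduced here:

```python
import statistics

class DummyEnv:
    """Hypothetical stand-in for the challenge environments (not the real
    API); it returns a fixed base reward plus deterministic 'noise'."""
    def __init__(self, base_reward):
        self.base_reward = base_reward
        self.calls = 0

    def evaluate_policy(self, policy):
        self.calls += 1
        return self.base_reward + (self.calls % 7) - 3

def mean_reward(env, policy, epochs=1000):
    """Average the episodic reward over many evaluations, as in the post."""
    return statistics.mean(env.evaluate_policy(policy) for _ in range(epochs))
```

With noisy simulators like these, reporting the mean over many epochs (rather than a single run) is what makes reward numbers comparable across solutions.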



I read this excellent paper from bent: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/16148
There is something in it about the environment and this task.



My solution (25th/32 :tired_face: )

Our team used Bayesian Optimization, Gaussian Processes, Thompson Sampling, some data science techniques, and rule-based methods.
Our model had success in the SeqDecEnvironment (~500 on avg) and the ProveEnvironment (~300 on avg), but not in the TestEnvironment.
I think our algorithm failed due to something like overfitting, because the scale of the rewards was very small in the test environment. (But I don’t know the truth…)
This competition was very difficult for me, and I learned a lot…



The competition is interesting and challenging. We have spent a lot of time on this task. Out of curiosity, we really want to see the final environment, to verify our code and to motivate us to build a more general solution. @oetbent



@zoulixin @ikki407 @thyrixyang, we’re working to open up the API for the final verification environment. Rest assured it will be posted here for you shortly; we just have to move it from our internal systems :slight_smile:



Thank you all for your participation.



Kudos to @oetbent @sekou and the hexagon-ml team for running this awesome KDD Cup 2019 competition.
You guys truly did an awesome job.

Thank you to all the contestants for creating this vibrant community!! Keep learning, sharing, and participating!



Hi @oetbent. First of all, I think it’s a wonderful and challenging task for the RL community. During my participation, I have learned a lot about exploration techniques. This is not about changing our team’s placement: simply running a solution on one task can hardly persuade us that it is general, since accidentally overfitting to a single task might bias our judgment of the algorithm. So I suggest that you officially run all submitted solutions on all three environments and publish an official, fair leaderboard across the three tasks, which would help us comprehensively understand the generalizability of the submitted solutions. @taposhdr @oetbent @vlad @hzk123 @thyrixyang



What does evaluating on an environment that you have had unlimited access to have to do with generalizability? :slight_smile:
The whole point of the competition (not only to me; this directly corresponds to how our submissions were evaluated) was to develop an algorithm that performs well on the unseen environment, not to optimize the score on the training and validation environments. It makes no sense to change the competition’s metric after it is already over.



I do not intend to change the competition’s metric. I just suggest officially evaluating all of our solutions on the three tasks and publishing all of the results. A well-designed solution should have at least the same level of performance on the train, validation, and test tasks. Otherwise, if a solution performs about the same as a random policy on the training or validation task but performs very well on the test task, it is hard to believe that it is a good solution. Additionally, maybe all of our submitted solutions are bad on the test environment, in which case how is this different from a lottery? @oetbent @taposhdr



Our solution, with optimized Q-learning: