Understanding and Questions about Malaria RL Environment


Although the problem is wrapped as an RL environment, it differs from typical RL environments.
The agent's interaction with the environment can be traced through the evaluateAction() method of the ChallengeSeqDecEnvironment class in challenge.py:
Step 1. experimentsRemaining is decremented by 1 on each interaction with the environment, and this variable limits us to 20 interactions.
Step 2. The actions are posted as a JSON object to https://seqenvironment.eu-gb.mybluemix.net, and the response comes back as JSON like this:
{'data': '-26.975036936486404', 'statusCode': 202}
The reward is the negation of the data value; in this example it would be 26.975036936486404.
Step 3. Then state += 1 until it reaches 5, which means the state has nothing to do with the interaction or the server. It is just a counter that increases every time a POST is sent to the server.
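To make the three steps concrete, here is a rough offline sketch of that flow. The class and attribute names mirror what I see in challenge.py, but this is not the real implementation: the HTTP POST is stubbed out, and the reward value is just the example from the response above.

```python
class SketchSeqDecEnvironment:
    """Minimal sketch of the interaction loop described above.
    The real ChallengeSeqDecEnvironment posts to the IBM endpoint;
    here the server call is stubbed so the flow can be run offline."""

    def __init__(self, experiments=20):
        self.experimentsRemaining = experiments  # Step 1: budget of 20 interactions
        self.state = 1                           # Step 3: a plain counter, capped at 5

    def _post_action(self, action):
        # Stub for the real POST to https://seqenvironment.eu-gb.mybluemix.net;
        # the server returns JSON like {'data': '-26.97...', 'statusCode': 202}
        return {'data': '-26.975036936486404', 'statusCode': 202}

    def evaluateAction(self, action):
        self.experimentsRemaining -= 1           # Step 1: spend one interaction
        response = self._post_action(action)     # Step 2: "post" and get JSON back
        reward = -float(response['data'])        # reward = negation of 'data'
        if self.state < 5:                       # Step 3: counter, not a real state
            self.state += 1
        return self.state, reward

env = SketchSeqDecEnvironment()
state, reward = env.evaluateAction([0.5, 0.5])
print(state, reward)  # → 2 26.975036936486404
```

Note how the returned "state" carries no information about the action or the server response, which is what my first question below is about.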

Here are some questions:

  1. RL typically learns from the states of the environment, and the next state depends on the agent's actions. An RL algorithm may struggle to learn from states in this case, because the state is just a count of interactions.
  2. Allowing only 20 epochs for learning does not seem reasonable. A regular RL project has two phases: train the model over hundreds of epochs, then evaluate with the trained model. It would be nice if the organizers offered such an example.
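To illustrate point 2, here is a toy sketch of the usual train-then-evaluate pattern. Everything here is a hypothetical stand-in (the reward function, action grid, and episode count are made up for illustration and have nothing to do with the actual challenge server); the point is only that the training phase normally consumes hundreds of interactions before evaluation.

```python
import random

def train(env_step, episodes, actions):
    # Phase 1: many training episodes to estimate action values
    # with a simple epsilon-greedy bandit (incremental-mean updates).
    values = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(episodes):
        if random.random() < 0.1:
            a = random.choice(actions)       # explore
        else:
            a = max(values, key=values.get)  # exploit current estimates
        r = env_step(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
    return values

def evaluate(env_step, values):
    # Phase 2: run only the greedy (trained) policy.
    best = max(values, key=values.get)
    return env_step(best)

def toy_step(a):
    # Hypothetical reward: best action is 0.7, plus small noise.
    return -abs(a - 0.7) + random.gauss(0, 0.01)

random.seed(0)
values = train(toy_step, episodes=500, actions=[i / 10 for i in range(11)])
print(evaluate(toy_step, values))
```

Even this trivial bandit needs hundreds of samples to estimate the values reliably, whereas the challenge budget would cap the whole loop at 20 calls.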


I would also like to know whether these 20 epochs are a hard constraint, or whether we can reset the environment and train more.



@karimbelaid I think we are not allowed to keep training on top of the already trained model. This has been discussed at this location.



@apoorvagni it sounds strange to train an agent for just 20 epochs.



Not sure if I am the right person to tell if this is strange or not. :sweat_smile: @ainilaha