In today’s world, water shortages are a major problem in many developed and developing countries. They pose a serious threat and often lead to food crises. Because water is scarce, managing the available supply well is critical.
To improve water management practices and maximize water productivity, models that predict future water requirements using data mining techniques can be useful.
In this section, we will walk through an experiment in Azure Machine Learning Studio that predicts future water requirements using data mining techniques.
To achieve this goal, we will use the Two-Class Decision Forest module that comes with Azure Machine Learning Studio. Note that the data for this experiment was collected through the KAISPE Agriculture Remote Monitoring solution using Azure IoT.
In the first step, we upload the dataset to Azure Machine Learning Studio as a CSV file. It contains historical data on weather and soil parameters such as Temperature, Humidity, Soil Moisture, phLevel, Rainfall and Wind Speed, combined with Crop Type and Water Usage.
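If you want to inspect the same file outside Studio before uploading it, a few lines of standard-library Python are enough. This is a minimal sketch: the column headers (`Temperature`, `Humidity`, `SoilMoisture`, `phLevel`, `Rainfall`, `WindSpeed`, `CropType`, `waterUsage`) and the two sample rows are hypothetical stand-ins, since the exact layout of KSP-Irrigation-Data.csv is not shown here.

```python
import csv
import io

# Hypothetical sample in an assumed layout for KSP-Irrigation-Data.csv.
# To read the real file, pass open("KSP-Irrigation-Data.csv", newline="") instead.
SAMPLE = """Temperature,Humidity,SoilMoisture,phLevel,Rainfall,WindSpeed,CropType,waterUsage
31.5,40,22,6.8,0,12,Wheat,1
24.0,75,48,7.1,5,8,Rice,0
"""


def load_rows(stream):
    """Read the CSV into a list of dicts keyed by column header (values stay strings)."""
    return list(csv.DictReader(stream))


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["CropType"])  # 2 Wheat
```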
Open the Azure Machine Learning Studio home page and click +NEW at the bottom of the window.
- Select DATASET.
- Select FROM LOCAL FILE.
In the Upload a new dataset dialog, click Browse, and find the KSP-Irrigation-Data.csv file you created.
Next, we will create an experiment in Machine Learning Studio that uses the uploaded dataset. Click +NEW at the bottom of the window, select EXPERIMENT, and then select “Blank Experiment”.
Select the default experiment name at the top and rename it to Predictive Water Irrigation Experiment. Then, in the module palette to the left of the experiment page, expand Saved Datasets, find the dataset you created under My Datasets, and drag it onto the canvas.
Now let’s prepare the data using Select Columns in Dataset, which is useful for reducing the size of the dataset by removing unwanted columns.
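Conceptually, Select Columns in Dataset just projects each row onto a chosen subset of columns. The sketch below illustrates that idea on dict rows; the helper name `select_columns`, the `Notes` column and the chosen columns are hypothetical and are not the column list used in this experiment.

```python
def select_columns(rows, keep):
    """Return new rows containing only the columns named in `keep`, in that order."""
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical rows; "Notes" plays the role of an unwanted column.
rows = [
    {"Temperature": "31.5", "Humidity": "40", "Notes": "sensor A", "waterUsage": "1"},
    {"Temperature": "24.0", "Humidity": "75", "Notes": "sensor B", "waterUsage": "0"},
]
trimmed = select_columns(rows, ["Temperature", "Humidity", "waterUsage"])
print(trimmed[0])  # {'Temperature': '31.5', 'Humidity': '40', 'waterUsage': '1'}
```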
- Search for Select Columns in Dataset and drag it onto the canvas. In the Properties pane to the right of the canvas, click Launch column selector and select the following columns:
In addition, we will use Filter Based Feature Selection, which helps us identify the columns in the input dataset with the strongest predictive power.
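Filter Based Feature Selection scores each feature against the target column and keeps the top-ranked ones. The sketch below uses Pearson correlation, one of the scoring methods the module offers; whether it is the method chosen in this experiment is not stated here. The helper names, the value of `k` and the numeric columns are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)


def top_features(columns, target, k):
    """Rank feature columns by |Pearson r| with the target and keep the best k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical numeric columns and a hypothetical 0/1 target.
columns = {
    "SoilMoisture": [10, 20, 30, 40, 50],
    "WindSpeed": [5, 9, 4, 8, 6],
    "Temperature": [35, 33, 28, 25, 22],
}
target = [1, 1, 0, 0, 0]
best, scores = top_features(columns, target, k=2)
print(best)  # ['Temperature', 'SoilMoisture']
```

Taking the absolute value means a strongly negative correlation (drier soil, more irrigation) ranks as highly as a strongly positive one.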
- Search for Filter Based Feature Selection and drag it onto the canvas. In the Properties pane to the right of the canvas, click Launch column selector and select the following target column and feature scoring method:
Now we’ll use our data both to train the model and to test it, by splitting it into separate training and testing datasets.
- Search for Split Data, drag it onto the canvas, and connect it to the output of the last Select Columns in Dataset module.
- Click Split Data and, in the Properties pane to the right of the canvas, set Fraction of rows in the first output dataset to 0.75. This way, we use 75% of the data to train the model and 25% for testing.
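A 0.75 split amounts to shuffling the rows and cutting the list at 75%. This is a minimal sketch of a randomized row split; the function name and the fixed `seed` (used so the split is reproducible) are illustrative assumptions, not Studio's internal implementation.

```python
import random


def split_rows(rows, fraction=0.75, seed=42):
    """Shuffle a copy of the rows and cut it so `fraction` of them go to training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(100), fraction=0.75)
print(len(train), len(test))  # 75 25
```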
Next, we will apply the Two-Class Decision Forest algorithm, which we selected as the most suitable for our Predictive Water Irrigation Experiment.
- Expand the Machine Learning category in the module palette on the left side of the canvas.
- Expand Initialize Model. This shows several types of modules that can be used to initialize machine learning algorithms.
- For this experiment, select Two-Class Decision Forest under Classification and drag it onto the experiment canvas.
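To give a feel for what a decision forest does, here is a heavily simplified stand-in: an ensemble of depth-one trees (stumps), each trained on a bootstrap sample (bagging) and combined by majority vote. This is an assumption-laden illustration, not the algorithm inside the Two-Class Decision Forest module, which grows full decision trees and exposes its own settings. The data and the parameters `n_trees` and `seed` are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Depth-1 tree: choose the (feature, threshold) split with the fewest errors."""
    overall = majority(y)
    # Fallback: a constant stump that always predicts the overall majority.
    best = (sum(label != overall for label in y), 0, float("inf"), overall, overall)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(a != left_label for a in left)
                      + sum(a != right_label for a in right))
            if errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature index, threshold, left label, right label)


def fit_forest(X, y, n_trees=15, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict(forest, row):
    """Majority vote over the stumps' predictions for one row."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical training data: [SoilMoisture, Temperature] -> waterUsage (1 = irrigate).
X = [[10, 35], [15, 33], [20, 30], [40, 25], [45, 22], [50, 20]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
print(predict(forest, [12, 34]), predict(forest, [48, 21]))  # 1 0
```

Bootstrap sampling is what makes the trees differ from one another; averaging their votes is what gives a forest its robustness compared with a single tree.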
Next, find the Train Model module and drag it onto the experiment canvas.
- Connect the output of the Two-Class Decision Forest module to the left input of the Train Model module.
- Connect the training data output (left port) of the Split Data module to the right input of the Train Model module.
Click the Train Model module, click Launch column selector in the Properties pane, and then select the waterUsage column. waterUsage is the value that our model is going to predict.
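Choosing the label column in Train Model tells the learner which column is the answer and which columns are inputs. The sketch below shows that separation for dict rows; it assumes waterUsage is encoded as 0/1 (a two-class learner needs a binary label), and the helper name and feature list are hypothetical.

```python
def features_and_label(rows, feature_names, label="waterUsage"):
    """Split dict rows into a numeric feature matrix X and a 0/1 label vector y."""
    X = [[float(row[name]) for name in feature_names] for row in rows]
    y = [int(row[label]) for row in rows]
    return X, y


# Hypothetical rows with string values, as a CSV reader would produce them.
rows = [
    {"SoilMoisture": "22", "Temperature": "31.5", "waterUsage": "1"},
    {"SoilMoisture": "48", "Temperature": "24.0", "waterUsage": "0"},
]
X, y = features_and_label(rows, ["SoilMoisture", "Temperature"])
print(X, y)  # [[22.0, 31.5], [48.0, 24.0]] [1, 0]
```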
Now run the experiment to obtain a trained model that can be used to score new irrigation data and make water usage predictions.
Now that we have trained the model on 75% of the data, we can use it to score the other 25% and see how well our model performs.
- Search and drag the Score Model module to the experiment canvas.
- Connect the output of the Train Model module to the left input port of Score Model.
- Connect the test data output (right port) of the Split Data module to the right input port of Score Model.
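For a two-class model, Score Model appends a predicted label (Scored Labels) and a probability (Scored Probabilities) to each test row. As a rough analogue, the sketch below treats the fraction of trees voting for class 1 as the probability and applies an assumed 0.5 threshold. The forest here is a hypothetical hand-written list of stumps in the form `(feature index, threshold, left label, right label)`, not output from Studio.

```python
def score(forest, rows, threshold=0.5):
    """Return (scored_label, scored_probability) for each row.

    The probability is the fraction of stumps voting for class 1, a simplified,
    assumed analogue of the Scored Probabilities column."""
    results = []
    for row in rows:
        votes = [left if row[f] <= t else right for f, t, left, right in forest]
        p = sum(votes) / len(votes)
        results.append((1 if p >= threshold else 0, p))
    return results


# Hypothetical forest and test rows ([SoilMoisture, Temperature]).
forest = [(0, 30.0, 1, 0), (0, 25.0, 1, 0), (1, 27.0, 0, 1)]
test_rows = [[12, 34], [48, 21], [28, 26]]
print(score(forest, test_rows))  # [(1, 1.0), (0, 0.0), (0, 0.3333333333333333)]
```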
Run the experiment and view the output of the Score Model module by clicking its output port and selecting Visualize. The output shows the predicted water usage alongside the known value in the test data.
We are now at the final stage: testing the quality of the results. Select the Evaluate Model module, drag it onto the experiment canvas, and connect the output of the Score Model module to the left input of Evaluate Model. The final experiment should look like this:
Next, run the experiment and visualize the output of the Evaluate Model module.
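For a two-class model, Evaluate Model reports metrics such as Accuracy, Precision, Recall and F1 Score (plus AUC). The sketch below computes the first four from a confusion matrix so the numbers in the report are easier to interpret; the label lists are hypothetical, and AUC is omitted for brevity.

```python
def evaluate(actual, predicted, positive=1):
    """Confusion-matrix metrics for a two-class model."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical known labels vs. scored labels from the test split.
actual = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 1, 0, 1]
print(evaluate(actual, predicted))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```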
In addition, we will use Cross-Validate Model, a technique commonly used in machine learning to assess the variability of a dataset and the reliability of any model trained on that data.
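The idea behind cross-validation is to split the data into k folds, train on k-1 of them, score the held-out fold, and repeat so every row is tested exactly once; the spread of the per-fold scores shows how stable the model is. This is a minimal sketch of k-fold cross-validation with round-robin fold assignment; the placeholder majority-class learner, `k=5`, and the data are hypothetical and stand in for the decision forest.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; every row is held out in exactly one fold."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin fold assignment
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


def cross_validate(X, y, k, train_fn, predict_fn):
    """Train on k-1 folds, score the held-out fold, and return per-fold accuracy."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict_fn(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Placeholder learner for illustration: always predicts the training majority class.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, row):
    return model


X = [[v] for v in range(10)]  # hypothetical single-feature rows
y = [0] * 7 + [1] * 3         # hypothetical 0/1 labels
scores = cross_validate(X, y, k=5, train_fn=train_majority, predict_fn=predict_majority)
print(scores)  # [1.0, 1.0, 0.5, 0.5, 0.5]
```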
- Search for Cross-Validate Model and drag it onto the canvas. In the Properties pane to the right of the canvas, click Launch column selector and select the following label column:
Now run the experiment and visualize the output of Cross-Validate Model to see the results.
I hope you found this blog post helpful. If you have any questions, please feel free to contact me at firstname.lastname@example.org.