NoSQL databases like MongoDB scale more easily, handle less structured data well, and can simplify development in many respects. Even so, there are a number of reasons why you may need a real-time analytics system to complement your NoSQL database:
- Better performance on analytical queries
- More complex analytics, including SQL aggregations and joins
- Access to the ecosystem of SQL users and SQL tools
- Protection of your primary database's performance, by moving analytics to a separate system
In this tutorial, we will create a sample app to analyze weather and pollution data, following these steps:
- Connect Rockset to MongoDB Atlas
- Continuously write weather and air quality data into MongoDB
- Formulate a query in Rockset on the real-time data coming from MongoDB
- Create an API based on the query we wrote
- Execute the API from our app
Some basics:
- Make sure your OS is on the latest software
- Have Python 3.7 or later installed
- Install Rockset's Python client
- Create MongoDB, Rockset, and ClimaCell developer accounts (if you haven't already)
Setting Up MongoDB
- After you've created an account on MongoDB Atlas, navigate to Clusters, click on Collections, then click on Create Database. We are going to create a `weather_pollution_db` database with a `weather_data` collection.
- Under `weather_pollution_db`, there will be a plus sign. Click on the plus sign and add another collection, `air_pollution_data`.
- Navigate to Database Access → Custom Roles and click on Add New Custom Role. Name it `rockset-role`, add the following actions or roles to the newly created `rockset-role`: `find`, `changeStream`, and `collStats`, and enter the names of the database and collections we have just created.
- Navigate to the Database Users tab and click on Add New Database User. Create a new user named `rockset-user` and attach `rockset-role` to it. Remember the password here, because we will need it again when we create an integration on Rockset.
- Create another database user that has Atlas admin privileges. We will be using this database user in our Python app. You can name this user `yourName-admin`. Make sure you remember this password too, because we will use it in our Python app.
- Navigate to Network Access, click on Add IP Address, and whitelist these IPs:
  - 50.112.53.111/32
  - 35.160.119.116/32
  - 54.214.107.166/32
  - Your current IP address that you will be working from (make sure you add it). This will be used for the Python app that we will build shortly.
- Go back to Clusters, click on Connect, and then click on Connect your application. Copy the connection string, because we will use it in Rockset. In our Rockset account, the username is `rockset-user` and the password is the one you used when you created `rockset-user`. In our Python app, the username will be `yourName-admin` and the password the one associated with that database user.
Building Our Python App
- Create a project folder, and in that project, create a file called `.env`.
- In the `.env` file, add this:
  - `Mongo_URI="<uri string>"`
  - The `<uri string>` is your connection string from MongoDB. Make sure you replace the username and password in the connection string with `yourName-admin` and the password you used when you created that database user.
  - It should look something like this: `mongodb://nadine-role-admin:password….`
- If you use virtual environments, go ahead and activate one for this project. I personally use Pyenv, but feel free to use whatever you want!
- Install `python-dotenv`:
  - `$ pip install python-dotenv`
- Install `pymongo` and `dnspython`:
  - `$ pip install pymongo`
  - `$ pip install dnspython==1.16.0`
- Inside our project folder, go ahead and create `settings.py`. This file should look like this: https://gist.github.com/nfarah86/f87a9d37f1f72bb2d4a73d9b73dc87b4#file-settings-py
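If it helps to see the shape of the file, here is a minimal sketch of a `settings.py` that loads the `.env` values with `python-dotenv`. The gist above is the reference; the variable name is an assumption and must match whatever you put in `.env`:

```python
# settings.py -- a minimal sketch; see the gist above for the actual file
import os

from dotenv import load_dotenv

# Read the key/value pairs from .env into the process environment
load_dotenv()

# The key must match the name used in .env (Mongo_URI above)
MONGO_URI = os.environ.get("Mongo_URI")
```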
- Create another file in the project folder called `mongo_config.py`. It should look like this: https://gist.github.com/nfarah86/1fc7bc9987d27edbec0fa9b32be95163#file-mongo_config-py
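Again as a rough sketch, assuming this file simply wires `pymongo` up to the URI exposed by the `settings.py` sketch above (consult the gist for the real thing):

```python
# mongo_config.py -- a minimal sketch; see the gist above for the actual file
from pymongo import MongoClient

from settings import MONGO_URI

# A single client for the whole app; pymongo pools connections internally
client = MongoClient(MONGO_URI)

# Handles to the database and collections we created in Atlas
db = client["weather_pollution_db"]
weather_data = db["weather_data"]
air_pollution_data = db["air_pollution_data"]
```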
- In the project folder, go ahead and create and run a file called `script.py`. All we're going to do is make sure our Python app is connecting to MongoDB: https://gist.github.com/nfarah86/4d8e87ff6e70e1da1c017e80b8daeef2#file-script-py
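The connection check can be as small as inserting one throwaway document. This is a sketch under the assumption that `insert_to_mongo()` is the function name used in the gist and that `mongo_config.py` exposes the collection handles as sketched above:

```python
# script.py -- first pass: just confirm we can write to MongoDB
from mongo_config import weather_data


def insert_to_mongo():
    # Insert a throwaway document; if this succeeds, the connection works
    result = weather_data.insert_one({"test": "hello from the weather app"})
    print(f"inserted document with _id: {result.inserted_id}")


if __name__ == "__main__":
    insert_to_mongo()
```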
- Under Clusters, click on the Collections button. Go to `weather_pollution_db` and click on `weather_data`. You should see the entry you inserted.
- Now that we know we can insert data into MongoDB, let's go ahead and create a ClimaCell developer account and get an API key.
- Go ahead and put your API_KEY in the `.env` file: https://gist.github.com/nfarah86/118155308a4adae8a5ca037ffd99e7c4#file-env
- In `settings.py`, go ahead and add this: `CLIMACELL_API_KEY = os.environ.get('CLIMACELL_API_KEY')`
- I chose ClimaCell because they offer real-time data for weather and air pollution, and this is the API we're going to work with. They have different parameters that can be added to the request; you can explore those in their documentation.
- In our project folder, go ahead and pip install a few libraries:
  - `$ pip install requests`
  - `$ pip install timeloop`
- In `script.py`, go ahead and modify the packages we're going to import: https://gist.github.com/nfarah86/a49cbaa033239c636ef4f3bbe1dca2d0#file-script-py
  - Timeloop is a library that can run jobs at designated intervals.
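The import block will likely end up looking something like this. It's a sketch that assumes the module and variable names from the earlier sketches; the gist above is the reference:

```python
# script.py -- imports after adding the new libraries
from datetime import datetime, timedelta

import requests
from timeloop import Timeloop

from mongo_config import air_pollution_data, weather_data
from settings import CLIMACELL_API_KEY

# Timeloop instance that will schedule the recurring job
tl = Timeloop()
```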
- Keep `insert_to_mongo()` and add this function in `script.py` to get the weather data (both ClimaCell calls are sketched after the next step): https://gist.github.com/nfarah86/d2e3cc9236547e2fa630fd368dfee994#file-script-py
  - The `lat` and `lon` correspond to Beijing.
- Now, we’re going to add this function to get the air quality: https://gist.github.com/nfarah86/c598dbea0274d43215f15c9f01eca672#file-script-py
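Here is a rough sketch of what the two fetch functions above might look like. The endpoint URL, field names, function names, and the shared `fetch_climacell()` helper are my own placeholders; check ClimaCell's API documentation and the gists for the real values. Only the general `requests` pattern is the point:

```python
# Continues script.py; requests and CLIMACELL_API_KEY come from the import sketch above.
# The endpoint and field names below are assumptions, not ClimaCell's confirmed API.
BEIJING_LAT, BEIJING_LON = 39.9042, 116.4074  # approximate coordinates for Beijing

CLIMACELL_REALTIME_URL = "https://api.climacell.co/v3/weather/realtime"  # placeholder


def fetch_climacell(fields):
    # Shared helper: the two calls only differ in the fields they request
    response = requests.get(
        CLIMACELL_REALTIME_URL,
        params={
            "lat": BEIJING_LAT,
            "lon": BEIJING_LON,
            "fields": fields,
            "apikey": CLIMACELL_API_KEY,
        },
    )
    response.raise_for_status()
    return response.json()


def get_weather_data():
    return fetch_climacell("temp")


def get_air_quality_data():
    return fetch_climacell("pm10")
```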
- We'll modify `insert_to_mongo()` to look like this: https://gist.github.com/nfarah86/e43f4ad2d8f7e3ca4b8d761408bc853c#file-script-py
- To make sure we're running continuously, write this: https://gist.github.com/nfarah86/959d875ad5ffcc08e16e3bf25358385a#file-script-py
- After that, write `main()` like this: https://gist.github.com/nfarah86/831e295b663aceb93603d9986c815b43#file-script-py
- Here's a gist of what your `script.py` should look like:
https://gist.github.com/nfarah86/85caee5b14639e238e34715094cc5436#file-script-py
- Now, run `$ python script.py` to populate MongoDB.
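As a rough guide to how the pieces fit together, here is a structural sketch. The document shape (a data payload plus a `date` field) and the use of the helpers from the earlier sketches are assumptions; the gist above is the reference:

```python
# A rough outline of script.py; it assumes the imports, collection handles,
# and fetch functions from the sketches above.
def insert_to_mongo():
    # Fetch the latest readings and store them, tagged with the current timestamp
    now = datetime.utcnow()
    weather_data.insert_one({"data": get_weather_data(), "date": now})
    air_pollution_data.insert_one({"data": get_air_quality_data(), "date": now})


@tl.job(interval=timedelta(seconds=120))
def sample_job_every_120s():
    # Timeloop calls this every 120 seconds while the script runs
    insert_to_mongo()


def main():
    # block=True keeps the scheduler running in the foreground
    tl.start(block=True)


if __name__ == "__main__":
    main()
```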
- While the script is running, let’s get started on Rockset.
Getting Started on Rockset
- Log in to Rockset and navigate to the Integrations tab on the left. Click on Add Integration, click on MongoDB, and click on Start.
- Check the first box, MongoDB Atlas. We're going to name this integration `Rockset-Mongo-Integration`. For the username and password, go ahead and put `rockset-user` and the password you used when you created this database user. Paste the connection string in the next box and click on Save Integration.
  - Each integration can be used to access multiple databases and collections in the same MongoDB cluster.
- Under Collections, click on Create Collection. Select MongoDB as the source.
- Click on the `rockset-mongo-integration`.
- We're going to name our new collection on Rockset `weather_data_collection`. This name is not tied to MongoDB. Go ahead and fill out the rest of the page with the database and collection we created on MongoDB. We're going to add 2 collections, but let's start with the `weather_data` collection from MongoDB.
  - As you can see, Rockset is able to connect to MongoDB. You can verify what data will be ingested into the Rockset collection on the right-hand side. Once you've created the collection and are running a data-driven app, Rockset will continuously sync with MongoDB so your data stays up to date in real time.
- Let’s click Create at the bottom.
- Follow the same steps 3-5 to create `air_pollution_data_collection`.
Construct a Query on Rockset
- On the left bar, let’s navigate to the Query Editor.
- In the query tab, if we write `SELECT * FROM commons.air_pollution_data_collection;` we should see some output.
- We're going to write this sample query: https://gist.github.com/nfarah86/2d9c5bc316d55cfd0fcf17b4ded9141f#file-average_pm_10_data-sql
  - We're averaging the PM10 data and the weather temperature data, joining both collections on the date.
- Run the query.
- After we run this query, we want to embed it in our app.
Build an API from Our Query
- In the top corner, click on Create Query Lambda. A Query Lambda is a way to create an API endpoint for the SQL query you write.
- Give your Query Lambda a name and description. Afterwards, you should see some code snippets on the next screen.
- Let's navigate back to the Query Editor and, in a new tab, write another query to get the current weather. Sometimes we may get a null field, so let's go ahead and write this in the Query Editor: https://gist.github.com/nfarah86/4581c6bc09d30045ae75a5f330a8d72f#file-current_weather-sql
- Create another new Query Lambda.
- If we want to grab the code snippet or URL of the Query Lambdas we just created, navigate to Query Lambdas in the left side menu and click on the lambda you created.
Execute the APIs in Our App
- Once you create a Query Lambda, you'll see its endpoint along with code snippets for executing it.
- There are two ways I'll show to execute a Query Lambda:
  - Make an HTTP request
  - Use Rockset's Python client
- Make an HTTP request: let's go ahead and make an HTTP request to get the current_weather data. Here are the steps to do this:
  - Go ahead and set your `ROCKSET_API_KEY` in your `.env`, and import it in `settings.py` like we did before.
  - On Rockset, navigate to the Query Lambda that has the average PM10 query and copy the Query Lambda endpoint.
  - We're going to write a function that makes an HTTP request to that endpoint: https://gist.github.com/nfarah86/3a0ef9b1524532247e3ea7c504489d23#file-script-py
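As a sketch of that helper: the endpoint URL is the one copied from the Query Lambda page, the API key is assumed to go into an `Authorization: ApiKey ...` header, and the response is assumed to carry its rows under a `results` key. Use the exact snippet Rockset shows for your lambda as the source of truth:

```python
# Hypothetical helper for executing a Query Lambda over HTTP.
import requests

from settings import ROCKSET_API_KEY

# Copied from the Query Lambda page in the Rockset console
QUERY_LAMBDA_URL = "<query lambda endpoint copied from the console>"


def query_rockset(url=QUERY_LAMBDA_URL):
    response = requests.post(
        url,
        headers={
            "Authorization": f"ApiKey {ROCKSET_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"parameters": []},  # this lambda takes no parameters
    )
    response.raise_for_status()
    # The result rows are assumed to live under a top-level "results" key
    return response.json()["results"]
```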
- Use the Rockset Python client to send a request:
  - `$ pip install rockset`
  - On Rockset, navigate to the Query Lambda that has the average PM10 query.
  - Copy the Python code snippet at the bottom.
  - In our app, go ahead and add this to the top of `script.py` and write this function: https://gist.github.com/nfarah86/cef26bc4d631e95ffbfe4ddbbfbc7668#file-script-py
  - Then, we're going to display the result: https://gist.github.com/nfarah86/a0d1e15319bc117ef55ce35187fb6480#file-script-py
- We're going to change `sample_job_every_120s()` to add `make_requests()` so we can execute the Query Lambdas and display the data: https://gist.github.com/nfarah86/0a54e082c9026aa5c9940b24836d9c65#file-script-py
- Write `make_requests()` so it looks like this: https://gist.github.com/nfarah86/dea06329b25887bb58a0ef74c4a12fb0#file-script-py
- After you run the script, you should see this: https://gist.github.com/nfarah86/32b35bd3269fbd1701dc57252fa783e4#file-results-txt
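For orientation, here is a minimal sketch of what `make_requests()` and the updated job might look like, reusing the hypothetical `query_rockset()` helper from the HTTP sketch above. The endpoint placeholders and the printed format are assumptions; the gists linked above are the reference:

```python
# Continues from the sketches above (query_rockset, insert_to_mongo, tl, timedelta).
# Copy the real URLs from the two Query Lambdas in the Rockset console.
AVG_PM10_LAMBDA_URL = "<average pm10 query lambda endpoint>"
CURRENT_WEATHER_LAMBDA_URL = "<current weather query lambda endpoint>"


def make_requests():
    # Execute both Query Lambdas and print whatever rows they return
    for label, url in [
        ("average pm10", AVG_PM10_LAMBDA_URL),
        ("current weather", CURRENT_WEATHER_LAMBDA_URL),
    ]:
        for row in query_rockset(url):
            print(f"{label}: {row}")


@tl.job(interval=timedelta(seconds=120))
def sample_job_every_120s():
    insert_to_mongo()
    make_requests()
```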
- That's it! This wraps it up for the MongoDB-Rockset Python app. You can find the full project, including the SQL statements, here.