Let’s say you have a table of events. Each event has a beginning and an ending date. How would you get a list of the events that happened within each quarter?
Example: A event that begins on 2-1-2017 and ends on 8-1-2017 would have an entries for Q1 2017, Q2 2017, and Q3 2017.
If you’re dealing with a single value, you can just use DATEPART(quarter, [DateValue]), but if you’re looking to figure out if an event occurs in a range, it’s a little more complicated. Here’s how to do it.
The first step to getting this data is to get a list of quarters with their respective date ranges. One way to do this is to use a recursive common table expression. CTE’s are snippets of SQL that you can put above your main query to create tables to join against. A recursive CTE creates values by calling itself. Here’s a CTE that’ll give us a list of quarters from the start date to the current date. When using recursive CTEs, make sure you have a termination condition or you’ll have a nice infinite loop on your hands.
DECLARE @start Date = '01-01-2010'; DECLARE @End Date = GETDATE(); WITH Quarters_CTE AS ( SELECT @start as [Start], DateAdd(quarter, 1, @start) AS [End], DATEPART(quarter, @start) AS [Quarter], DATEPART(year, @start) AS [Year] UNION ALL SELECT DateAdd(quarter, 1, [Start]), DateAdd(quarter, 1, [End]), DATEPART(quarter, DateAdd(quarter, 1, [Start])), DATEPART(year, DateAdd(quarter, 1, [Start])) FROM Quarters_CTE WHERE [End] < @end ) SELECT * FROM Quarters_CTE Q
The next step is compare our events with the quarters. We want a different entry for each quarter the event occurs in. To do this you can join up with the table. You can use non-equality based comparison operators in join queries, so we filter by date range. Here’s the complete query:
DECLARE @start Date = '01-01-2010'; DECLARE @End Date = GETDATE(); WITH Quarters_CTE AS ( SELECT @start as [Start], DateAdd(quarter, 1, @start) AS [End], DATEPART(quarter, @start) AS [Quarter], DATEPART(year, @start) AS [Year] UNION ALL SELECT DateAdd(quarter, 1, [Start]), DateAdd(quarter, 1, [End]), DATEPART(quarter, DateAdd(quarter, 1, [Start])), DATEPART(year, DateAdd(quarter, 1, [Start])) FROM Quarters_CTE WHERE [End] < @end ) SELECT * FROM Quarters_CTE Q JOIN [Events] E ON e.StartDate < q.[End] AND (e.EndDate is null OR e.EndDate > q.[Start])
From there you can aggregate your data as needed.
In addition to the productivity benefits of using the right tool for the right job, it’s good for your personal development to learn new programming languages. The Pragmatic Programmer recommends learning a new one each year. Learning new languages improves your thinking and makes you better at your primary development stack.
“There’s gold in them servers.”
Data is money. Large companies are using the data you generate as a goldmine. Uses range from using data to optimize advertising to using it to make even more addictive products. In addition to user generated data, we also have the mountains of data generated by IOT devices. Sometimes we can use it for small gains, like using a Nest Thermostat to optimize your heating and cooling, but sensor networks can have a much greater impact. We have access to more data than in all of human history. If you can figure out how to mine insights from that data, you will be rewarded handsomely.
“There are three kinds of lies: lies, damned lies, and statistics.”
With plenty of data comes plenty of people using that data to manipulate you. Every political cause has a stable of statistics behind it. Even if they fall apart under scrutiny, people believe them because numbers sound fancy. People trying to sell you something use numbers to appear more credible. If you want to thrive in our data soaked economy, it’s essential to become data literate, so you can spot these manipulations.
R has several features that make it a great tool for learning about data analysis. First, it’s really easy to learn. R is a simple language that you can pick up in a few hours. Additionally, R has an easy to use built in help system. If you need info on any command or method, it’s a few keystrokes away. R also has a lot of built in data sets to play with statistical techniques. This includes lots of popular demo statistical data sets that are well known in the statistics community.
As data analysis becomes more prevalent in the enterprise, you’re probably going to wind up working with data analysts and data scientists. Learning about some of the tools and techniques they use gives you common ground. It’s the same reason software developers should develop business and industry knowledge. Being able to connect with your team members on their terms makes you more than a run of the mill software developer.
If you’re an enterprise developer, R is worth a look. You can use R to learn valuable new skills using familiar tools. With a little effort, you’ll be able to slice and dice data for fun and profit.
To learn more, check out my post on R Resources
In an effort to improve my data analysis skills, I’ve been learning and speaking about the R programming language. Even if you don’t want to be a data scientist, (whatever the hell that means this week) learning some analysis skills can pay dividends. Data literacy is an essential skill in our data soaked economy and R is a good learning tool for analysis skills.
One of the harder things to do when starting in a new area is finding useful resources. It’s tough to find the digital needle in the web powered haystack. To make your life a little easier, here’s a list of the R resources I found to be useful.
There are three paths to getting R setup on your machine. If you’re a Visual Studio 2017 user, the easiest way to get R is to install the Data Science workload in Visual Studio. This will get you the Microsoft flavor of R and R Tools for Visual Studio.
If you’re not into Visual Studio, you can also install an R interpreter and R Studio. R Studio is a free R IDE. For interpreters, you can go with either the Microsoft flavor or the standard CRAN flavor of R.
If neither of those options work for you, you can also run R in a Jupyter Notebook. Jupyter is a web-based environment that makes it really easy to mix text and code. It’s used in many contexts including scientific research and virtual textbooks. To setup a local copy, start off by installing Anaconda. Anaconda is a data science environment that includes a plethora of handy analysis tools. After you install Anaconda, you’ll need to install R using the conda package manager. Then you can run Jupyter using the “jupyter notebook” command.
conda install -c r r-essentials jupyter notebook
R Studio Cheat Sheets
A collection of useful R related guides in PDF format.
R Tutor Tutorials
This site came in handy a few times while trying to find specific R issues.
Flowing data has a variety of useful articles on R and other data topics.
Don’t forget about the built-in R help system. Prefix any command with a question mark and it’ll search the R documentation for you. (Example: “?kmeans”)
I skimmed through a bunch of books on R, but the one I really liked was R: Recipes for Analysis, Visualization and Machine Learning. The writing was clear and the content was pragmatic. The task based format was easy to follow and implement. Another book that I used was R for Data Science.
R: Recipes for Analysis, Visualization and Machine Learning
R for Data Science
This list of resources is enough to help you get started in learning R. Go forth and learn how to slice and dice your data.
I’m really enjoying using R Tools for Visual Studio. It’s nice to learn something new (R) with something familiar (Visual Studio). I did, however, hit a snag when trying out the new IDE.
Upon installing the data science workload in Visual Studio (which is how you install R Tools), I couldn’t open up or create a project. File -> New Project just hung indefinitely. Usually, you expect these things to actually work, so I dropped a bug onto the R Tools for Visual Studio Github page. To my surprise, within about thirty minutes (on a Sunday night), someone asked me about my issue. While they didn’t give me an exact answer, they gave me the hint I needed to fix my issue. I was impressed by the speed and helpfulness of their response.
As some of you already know, our good friends at Microsoft maintain their own version of R. This version is faster, but it’s a point release behind the latest one. It’s a basic trade off between shiny and speedy. Turns out R Tools for Visual Studio doesn’t yet support the latest version. I had previously installed the non-Microsoft R, which was at v3.4 and Visual Studio defaulted to that version.
There are two ways to solve the issue. The easiest way is to just uninstall R 3.4 and use the Microsoft versions of R. If you are using R Studio as well, Microsoft’s R works fine. The second way is to go to R-Tools -> Windows -> Workspaces. From there, you can pick the version of R that’s being used by Visual Studio.
Regardless of which solution you go with, this issue, while vexing, is easy enough to fix.
Happy data hunting.