Neal Waterstreet

  • Home
  • About
  • Book Reviews
    • 2018 Reading List
    • 2017 Reading List
    • 2016 Reading List
    • 2015 Reading List
    • 2014 Reading List
    • 2013 Reading List
  • Projects
    • OBi200
    • DS716+ memory upgrade
    • Cord cutting
  • Presentations
  • Contact

Archives for February 2015

Problem with SCD

February 15, 2015 by neal Leave a Comment

Problem

A common (and very well documented) best practice for performance is to specify the columns in a SQL statement instead of using SELECT *. This applies to SSIS in the Data Flow Task, but there are other reasons to avoid this too.

At a recent client the current member row was being recreated daily for a SCD 2 dimension. The data in the source rarely changed, so it was obvious that something was wrong.

My first step was to look at the code, including the Conditional Split task. All of the logic there looked correct – all of the comparisons appeared to be functioning as intended and NULL values were being accounted for in the code.

I then checked some of the columns to determine if there was a problem with spaces by using the RTRIM() and LTRIM() functions. That reminded me that there is still not a TRIM() function. All of this still looked to be functioning correctly.

Finally I checked the Source and Destination Connection Managers and found they were configured to use tables as a source. The end result of this was that the current member rows were not filtered in the Destination and all rows were incuded.

Solution

I changed the Destination SQL statement to select only the current member rows selected in the WHERE clause. This resolved the issue and the SCD 2 no longer created duplicate rows.

Filed Under: Data Warehouse, ETL, SSIS

Hadoop training Day 3

February 12, 2015 by neal Leave a Comment

Training this week is starting to go by quickly. Today we’ll be covering these topics:

  • Hive programming
  • ngrams
  • HCatalog

Even in a limited testing environment it is easy to see the benefits of Hive. We also looked at a comparison of Hive versus SQL. For a couple of the labs we used ngrams to search email data. Towards the end of the session we discussed HCatalog, which is the central schema repository.

As I mentioned earlier, this week is going by quickly and I know we are just scratching the surface of some of these topics. This class is the same as almost all IT training, there will be a lot of work to do on my own to get the most out of this.

Filed Under: Career, Certification, Hadoop, Training

2015 Goals

February 9, 2015 by neal Leave a Comment

Back at it with another beginning of the year goal post. The best part of this kind of post is being able to look back and say “Man, what was I thinking? or “Wow, I really changed direction from where I thought I was going!”. My plan for this year is to make all of my goals for this year quantifiable. We’ll see how it goes…

Blogging: Its number one on my list for a reason. I really want to improve here and I will set the goal pretty high, at least for me.
Goal: 100 posts for 2015

Presentations: I am speaking at a meeting in March (need to finish that up) and planning on submitting to a few SQLSaturdays to see if I can present. I’d also like to present online, maybe to the Healthcare VC.
Goal: 3 presentations total

Books: I’m going to change this up this year and not just set a number, but also add a requirement of writing up a review.
Goal: 5 technical/career related books with reviews

Conference attendance: This one may be tough since attending may be out of my control. I had big plans this year of going to 3, but that won’t be happening.
Goal: 1

SQLSaturday: I went to the very first SQLSaturday and in many ways that was the tipping point for me choosing a career with data. I have not attended nearly as many of these as I intended mainly due to commitments with family. I’m going to make a better effort to attend more this year.
Goal: 3

Certifications: This is one goal where it should be pretty reasonable to make show progress. I am one exam away from MCSE and I need to get that wrapped up. The Tableau Desktop and Hortonworks Certified Apache Hadoop 2.0 Developer certification exams look possible here too. Maybe not both, but I should be able to make one of them happen in 2015.
Goal: 2

Twitter: Hard to set a goal for this one. Do I measure in terms of interaction? Followers? Tweets? Tweeting more will make the other two follow, right? As of now, I have 257 total. Doubling that should not be too difficult.
Goal: 250 tweets for the year

PASS Healthcare Virtual Chapter: This one is especially difficult due to the limited presenters available. We are trying to get some momentum going with the group. If that happens a meeting every other month should be attainable.
Goal: 6 presentations

OK, there there are. Goals always change due to unknown factors and I don’t expect these to be any different. I want to keep up this year and adjust as needed, possibly even adding more depending on what happens.

Filed Under: Career, Goals, Work

Accessing CSV files on a UNC file share

February 8, 2015 by neal Leave a Comment

Recently I was working on a project to extract data from CSV files for a client. The package was pretty standard and it ran fine under my account in BIDS. The issue occurred when I tried to deploy and executed under the service account. The error message “Description: The file name property is not valid. The file name is a device or contains invalid characters” was not too helpful, but it did lead me in the right direction.

After asking around I found that another developer had a similar issue and pointed me to a couple of blog posts:

http://www.sqlservercentral.com/Forums/Topic1128829-148-2.aspx

http://sqlblogcasts.com/blogs/simons/archive/2010/03/03/SSIS—Access-Denied-with-UNC-paths—The-file-name-is-a-device-or-contains-invalid-characters.aspx

It turned out that the service account had access to the directory that contained the CSV files, but not the entire path.

\\Server\Directory 1\Directory 2\CSV Files\My File.csv

Once the account was granted access to ‘Directory 1’ and ‘Directory 2’ the issue was resolved. Hopefully this post will save someone some time if they run into this.

Filed Under: SSIS, Work

  • 1
  • 2
  • 3
  • Next Page »
February 2015
M T W T F S S
 1
2345678
9101112131415
16171819202122
232425262728  
« Jan   Mar »

Blogroll

  • Atlanta MDF
  • Atlanta Microsoft Business Intelligence Group
  • Atlanta Modern Excel User Group
  • Chris Webb's BI Blog
  • codegumbo
  • Lance England
  • PowerPivotPro
  • Prologika
  • SQLBI

Tags

#SQLSatATLBI Aereo at Atlanta Microsoft BI Azure Azure DataFest Backups Batch Certification changeset Chicago Conference CrashPlan Data Modeling Data Quality DAX Deployment DVR Entity-based staging Error Messages Excel Filter Goals Headphones Healthcare Leaf Tables MDM MDS MDS Add-in for Excel Oops! Something went wrong PASS Healthcare VC Power BI Power Pivot review Roku SQLSaturday SQL Saturday 800 SQLSaturday Atlanta BI SSIS Streaming Subscription View Time Training Validation Version Flag

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Copyright © 2025 · Minimum Pro Theme on Genesis Framework · WordPress · Log in