Transcript
00:01My name is Craig Gillgrass, and to my far left is Thomas Breed, and Erik Hoel will be joining us in a minute.
00:07We're going to be talking about effective geodatabase programming.
00:10We're all people who work on the geodatabase team. I'm one of the product engineers on the geodatabase team...
00:15...Erik is the lead developer, and Tom Breed is one of our geodatabase developers.
00:22Hoping we can, over the next 75 minutes, talk to you about some aspects of geodatabase programming...
00:27...that we'd like you to be aware of, things we'd like to highlight...
00:31...mistakes that we've seen developers make over the years both internally...I'll be honest...and externally as well...
00:38...in code that we've seen come in through support and helping users with things.
00:42So these are really the things that you need to be aware of to really master programming against the geodatabase.
00:52We're going to assume that you've got a good working knowledge of the geodatabase; you know what it is...
00:55...you know what it can contain, you know the goodness that it brings, all of the datasets that you can create within it.
01:01We're not going to go over that stuff in this session.
01:03We're going to assume that you know all of that and just kind of jump into best practices.
01:07We're going to show a lot of different code samples today.
01:11Some of them are going to be in Java, C#, C++, a little bit of mix of everything.
01:16We do have a lot of technical content.
01:17I think we have 300 slides to go through in 75 minutes in the last revision I saw of the deck.
01:24We do have a lot of content, though, so we'll be moving pretty fast, but we'll still take time for questions.
01:30We won't save them until the end, so, you know, if you've got a question, raise your hand...
01:33...and we'll see if we can deal with it.
01:36During the demo at the end, that Tom's going to show, we're just going to ask that you hold your questions till the end of it.
01:41Just kind of let him get through that.
01:43Really, the slides are also meant for your reference.
01:45So there's going to be a lot of information in the slides; I'm not going to talk to all of it.
01:49But the slides are available for you after the conference...
01:52...so you can go and take a look at some of the information that's in there that we've included.
01:58Alright, so here's just a quick summary of what we're going to be talking about over the next 75 minutes.
02:03Like I said, lots of stuff you need to know about the geodatabase to really understand how it works.
02:08Those will be the first one, two, three, four things we're going to talk about...
02:12...then Erik and Tom are going to talk a little bit about something called plug-in data sources.
02:18Anybody in here using a plug-in data source right now? Anybody implemented one? No?
02:23Well, hopefully we can change that after Erik and Tom show you what we've done.
02:27We've got something really cool that we've put together back in Redlands with plug-in data sources...
02:34...consuming MongoDB, so we'll have a discussion about what that is and sort of why we did it.
02:40Then we're going to finish it up with top developer mistakes, so we just kind of bookend the conversation...
02:45...that we're having today: things you need to know about the geodatabase at the start, and then at the end...
02:48...just highlight the top developer mistakes that we've seen you make over the years, seen developers make...
02:55...and things we just want to bring your attention to.
02:58So, the very first thing we'll get into is the unique instancing of objects.
03:03So in the geodatabase, objects are going to have, at most, just one instance that's instantiated.
03:12Examples of this are the datasets in the geodatabase, tables, feature classes, and so forth, also workspaces and versions.
03:21Now, with the geodatabase, one of the great things about the API for it is there's lots of different ways to do things.
03:27One of the bad things about the API is there's lots of different ways to do things.
03:32So you have to kind of know which one you want to use; they're all meant for different scenarios and so forth.
03:39But it can get a little confusing, and for that reason...
03:41...it doesn't really matter which method call that you're making to open up these...
03:45...these types of datasets or these objects that are in the geodatabase.
03:49You're always going to be returned the same instance of that object.
03:54Oh, that last bullet I kind of skipped over really quickly.
03:57So any holder, anyone who's holding on to a reference to that object, is going to be impacted by any change made to it.
04:03So that's something you have to keep in mind when you're working with the different datasets.
04:07And these are just a couple code examples that demonstrate this concept.
04:11The top one is about what happens with versions, and the bottom one is a little more detailed with feature classes.
04:17So the thing that I want to highlight on the top code example is you can see what I'm doing is I'm getting a versioned workspace...
04:22...and then I'm asking it for its default version, and it's returning that to me as a variable I've defined as default1.
04:30I'm then asking the versioned workspace to find a version with a findVersion method call with the name of DEFAULT.
04:38Same version being returned as a different variable, default2, but when I check to see if they're equal, we'll see it's true.
04:45So I'm using two different method calls to get at the exact same version...
04:49...and because of the way unique instancing works in the geodatabase, it's actually pointing me to that exact same instance.
04:54Okay? So this is a really good example, really straightforward example about how we have these different methods...
04:59...to get at these different instances.
05:01Now, the example down at the bottom highlights this with feature classes.
05:04This is an example where there's, again, several different methods you can use to get a feature class...
05:10...but then several different parameters you can specify for those methods as well.
05:15So you can see at the top I'm just simply opening a feature class and just passing in the name to that feature class.
05:21Then the second call I'm making is again opening the feature class...
05:24...but I'm passing in the fully qualified name for it in the enterprise geodatabase.
05:29Then lastly what I'm doing is I'm actually walking through the geodatabase structure from the feature dataset...
05:35...and then opening up that exact same feature class.
05:36And you can see down below I'm printing out the debug statements...
05:39...and it's all showing me that no matter which way I get at that feature class, I'm pointing at the exact same object.
05:45So I'm not getting different objects referenced in memory; they're pointing me at the exact same one.
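The idea that every route to a dataset resolves to one shared instance can be sketched with a toy registry. This is a minimal Python model, not the geodatabase API: `Workspace`, `FeatureClass`, `open_feature_class`, and `open_from_feature_dataset` are hypothetical names, and the name normalization is an assumption made only for illustration.

```python
class FeatureClass:
    """Toy stand-in for a dataset object (hypothetical, not the real API)."""

    def __init__(self, canonical_name):
        self.canonical_name = canonical_name


class Workspace:
    """Toy workspace that hands out at most one instance per dataset."""

    def __init__(self, owner):
        self._owner = owner
        self._open = {}  # canonical name -> the single live instance

    def _canonical(self, name):
        # Assumption: "parcels" and "<owner>.parcels" name the same dataset.
        short = name.split(".")[-1]
        return f"{self._owner}.{short}".lower()

    def open_feature_class(self, name):
        key = self._canonical(name)
        if key not in self._open:
            self._open[key] = FeatureClass(key)
        return self._open[key]

    def open_from_feature_dataset(self, dataset_name, name):
        # Walking in through a container ends up at the same registry.
        return self.open_feature_class(name)


ws = Workspace(owner="gdb")
fc1 = ws.open_feature_class("parcels")
fc2 = ws.open_feature_class("gdb.parcels")
fc3 = ws.open_from_feature_dataset("landbase", "parcels")
print(fc1 is fc2, fc2 is fc3)  # prints: True True
```

Because all three variables refer to one object, a change made through any of them is visible to every holder of a reference.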
05:53So that's how it works with datasets.
05:54The unique instancing of objects with rows and features is a little different.
05:59They're uniquely instanced only within an edit session.
06:04That holds while you're holding on to that instance of the feature or row in memory...
06:09...and before you've saved your edits down to the database and committed them.
06:15In the example, you can see that what I do at the very beginning is not start an edit session, right?
06:21I'm just simply opening up a feature class, and then I'm calling getFeature twice on the same feature.
06:27I'm getting back F1 and F2, and then because I'm not in an edit session...
06:32...they're pointing to different instances of that object in memory even though it's the same feature.
06:37But down below, once I start an edit session, you'll see I'm again getting...
06:45...the same feature twice, into two different variables...
06:48...and when I ask if they're the same, it returns true. So a little different than how datasets work.
06:53With features and rows, you need to be aware of whether you're within an edit session or outside of an edit session.
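The edit-session rule for rows and features can be modeled the same way. The sketch below is a toy in Python under stated assumptions: `FeatureClass`, `get_feature`, `start_editing`, and `stop_editing` are hypothetical names, and the per-session dictionary is only one way such behavior could be implemented.

```python
class Feature:
    def __init__(self, oid):
        self.oid = oid


class FeatureClass:
    """Toy model: features are uniquely instanced only while editing."""

    def __init__(self):
        self._editing = False
        self._session_rows = {}  # oid -> instance, valid during the session

    def start_editing(self):
        self._editing = True

    def stop_editing(self):
        self._editing = False
        self._session_rows.clear()

    def get_feature(self, oid):
        if not self._editing:
            return Feature(oid)  # a fresh object on every call
        if oid not in self._session_rows:
            self._session_rows[oid] = Feature(oid)
        return self._session_rows[oid]


fc = FeatureClass()
same_outside = fc.get_feature(1) is fc.get_feature(1)

fc.start_editing()
same_inside = fc.get_feature(1) is fc.get_feature(1)
fc.stop_editing()

print(same_outside, same_inside)  # prints: False True
```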
07:00So it's pretty fundamental to geodatabases, and something that, you know, we probably...
07:06...could've done a much better job of describing over the years: how unique instancing works in the geodatabase.
07:11We think we've done a pretty good job with that now in our SDK.
07:14But just something to highlight to get us started on talking about...
07:17...things that you need to know as a developer with respect to geodatabases.
07:21Now, the next topic is cursors, and, I mean, I don't think it's an overstatement to say...
07:27...this is probably where most of the confusion with geodatabases occurs. How cursors work...
07:33Thank you for just nodding in agreement with me.
07:35Oh, yeah, absolutely. Yes, quite.
07:38It's probably where we get the most questions: what are the different types of cursors?
07:42Why do you have so many different types? When do I use them? What's recycling? What's buffering?
07:48Hopefully, we'll shed light on a couple of those things here.
07:51So we've got two main types of cursors. One is a class cursor, and there's three different types of those...
07:57...and the second type is a QueryDef cursor.
08:01Now the differences between those.
08:04A class cursor, the rows that you get back from a class cursor are bound to the class that created that cursor.
08:11So kind of makes sense, right? We called it class cursor for a reason.
08:15The cursor is generated and bound to that individual class, so when you get rows or features from it...
08:19...they can actually point back, or are bound to, that class.
08:23QueryDefs are not. Okay? QueryDefs, while you construct them based on one or more classes...
08:29...they're not bound to the class that they're constructed from.
08:33This example demonstrates that.
08:35So up at the top, you can see what I'm doing here is issuing a call for a search cursor on that second line...
08:44...and I'm just simply returning everything in that table.
08:48I'm then iterating to that very first row returned from the cursor, and then I'm getting the table related to that individual row...
08:56...and I'm able to print out that information on the bottom line where it says "PRINTS true."
09:02Down below with QueryDefs, they work a little bit differently.
09:05Again, I'm opening the same table. I'm setting the table for this QueryDef to be parcels...
09:10...which is the exact same one that I'm doing above.
09:12I'm evaluating that QueryDef and then iterating over to the first row...
09:16...but when I try to get at the table from the QueryDef, I'm getting an error.
09:20This is something that trips people up when they're using QueryDefs, especially just on an individual class.
09:25They're confused as to why they can't get back to that class.
09:28And, again, it goes back to that point that class cursors are bound to the class you create them from; QueryDefs are not.
09:35Because while in this example they're just running on one class, you can run QueryDefs across multiple classes...
09:41...so we can't get back to that class the row came from.
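The difference between bound and unbound rows can be illustrated with a small Python model. Everything here is hypothetical (`Table.search`, `evaluate_query_def`, and the `LookupError` stand in for the real calls and the real error); the only point is that a row from a class cursor remembers its table, while a row from a multi-table query has no single table to point back to.

```python
class Row:
    def __init__(self, values, table=None):
        self.values = values
        self._table = table

    @property
    def table(self):
        if self._table is None:
            raise LookupError("row is not bound to a table")
        return self._table


class Table:
    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self):
        # Class cursor: each row remembers the table that produced it.
        return (Row(record, table=self) for record in self.records)


def evaluate_query_def(tables, where=lambda record: True):
    # QueryDef-style cursor: may span several tables, so rows stay unbound.
    for table in tables:
        for record in table.records:
            if where(record):
                yield Row(record)


parcels = Table("parcels", [{"id": 1}, {"id": 2}])

class_row = next(parcels.search())
print(class_row.table is parcels)  # prints: True

query_row = next(evaluate_query_def([parcels]))
try:
    query_row.table
    bound = True
except LookupError:
    bound = False
print(bound)  # prints: False
```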
09:46So, a little bit of talk about the different types of class cursors that we have.
09:49The first one is the search cursor, and this is really sort of the all-purpose cursor.
09:54It's not just for searching; you can also modify the results of a search, as I'm pointing to down at the bottom.
10:01It's probably the most used cursor that we have in the geodatabase, both internally and by developers as well.
10:07One important point to keep in mind is that when you're in an edit session with a search cursor...
10:13...the result of that query that you're issuing to create the cursor may be satisfied by a cache...
10:18...either a spatial cache that you've enabled or your users have enabled...
10:22...or caches that we're maintaining within the geodatabase.
10:27Now when you use a search cursor within an edit session, it will only flush the cached rows of the class...
10:37...that the search cursor's been created on.
10:39Okay, we'll talk about flushing and what the impacts of that are in a little bit.
10:43But the issuance of that search cursor could result in a flush...
10:47...and pushing any outstanding edits for that class down into the database. Okay? So be aware of that.
10:55The second type of cursor we have is an update cursor.
10:58We call these positional update cursors because they don't work in a batch manner.
11:02You request an update cursor, and then you get that cursor back and it has a collection of rows or features...
11:07...that satisfy the query you were looking for.
11:10You then iterate over each feature in that cursor, updating the values from it.
11:14That's why we call it a positional update cursor.
11:17With update cursors, the query is never satisfied by the cache. Okay? We're always going to go back to the database...
11:22...to make sure we're getting the most up-to-date version of the features that we're looking for in it.
11:28One important point about update cursors is that very bottom record...
11:33...that you should never combine the use of update cursors with IFeature.Store or Delete.
11:40There's analogous methods on update cursors to handle that, okay?
11:44Don't combine the two; you're going to run into issues if you do that.
11:50One important point about update cursors as well is that if the class supports Store events...
11:58...what we're going to do is we're going to create an internal search cursor on Update and DeleteRow...
12:05...and these essentially become equivalent to delegating down to IFeature or IRow Store and Delete.
12:14A lot of times with update cursors and also insert cursors as well, you get a lot of good performance out of them...
12:19...and you can improve performance.
12:21But if events being triggered on Store are mandatory or you're running on a class that's a complex feature class...
12:29...you're not going to see that big performance benefit...
12:31...because we still have to maintain all the geodatabase behavior and fire those different events. Okay?
12:36How do you tell what kind of behavior you're going to get?
12:39Well, the easiest way is actually checking the feature type of the class that you're running against...
12:44...to see whether it's simple or not, and there's a couple of other examples down there as well...
12:49...such as whether the class participates in geometric networks or relationship classes.
12:54With update cursors, this is a pretty important thing.
12:57How many people are not on 10.0, so you're still on 9.x something?
13:01Raise them high; it's okay. Don't be ashamed; it's all right.
13:04Okay, so this is something you're especially going to need to be aware of.
13:08For those of you that have already moved to 10...
13:10...if your code was using this practice, you might have already seen this as well.
13:14At 10, we changed the behavior of the geodatabase to require that cursors be scoped within edit operations when you're editing.
13:23We'd always documented this as a best practice, but at 10, we actually changed it to raise an error when they aren't.
13:28If a cursor isn't scoped that way, you're risking corruption being introduced because of that cursor not being scoped properly within the edit operation.
13:37We realize not everyone reads the doc; that's why we added this change at 10.0.
13:42So if you're not on 10.0 yet, go back, take a look at your code, find places like this, make those changes.
13:48Or if you're moving to 10 or already there, this could explain what you're going to see.
13:52So the best practice really is to always scope your cursor within an edit operation.
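A rough picture of what "scoping a cursor within an edit operation" means, as a toy Python model. The class, the context managers, and `EditScopeError` are all hypothetical; they only mimic the rule described above (while editing, an update cursor opened outside an edit operation is rejected) and say nothing about how the geodatabase enforces it.

```python
from contextlib import contextmanager


class EditScopeError(RuntimeError):
    """Raised when a cursor is opened outside an edit operation while editing."""


class Workspace:
    def __init__(self):
        self.editing = False
        self.in_operation = False

    @contextmanager
    def edit_session(self):
        self.editing = True
        try:
            yield self
        finally:
            self.editing = False

    @contextmanager
    def edit_operation(self):
        self.in_operation = True
        try:
            yield self
        finally:
            self.in_operation = False

    def update_cursor(self, rows):
        if self.editing and not self.in_operation:
            raise EditScopeError("open update cursors inside an edit operation")
        return iter(rows)


ws = Workspace()
with ws.edit_session():
    try:
        ws.update_cursor([1, 2, 3])  # not scoped: rejected
        unscoped_ok = True
    except EditScopeError:
        unscoped_ok = False

    with ws.edit_operation():  # scoped: cursor lives inside the operation
        updated = [value * 2 for value in ws.update_cursor([1, 2, 3])]

print(unscoped_ok, updated)  # prints: False [2, 4, 6]
```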
14:00Insert cursor is the last type of cursor that we have.
14:03It's primarily used for bulk inserts.
14:04There's really, essentially, two ways to create features in the geodatabase.
14:07The first one is creating the feature using the CreateFeature call on the feature class and then issuing Store.
14:17That's fine if you're creating individual features, but once you start creating more features...
14:23...you know, let's say, a couple hundred, getting into the thousands definitely...
14:27...that's where you want to use something like an insert cursor to load lots of features or rows into your tables and feature classes.
14:35You get the best performance with your insert cursors when you combine their use with buffering and flushing.
14:40I'll talk about those in a little bit.
14:43So really, if you look at your code right now and the way that you're creating rows and features...
14:50...is through the individual method, by first calling CreateFeature or CreateRow and then Store...
14:55...you want to go and look at that and look at using insert cursors to get better performance.
15:01With insert cursors, a really key part to that is buffering, like I mentioned.
15:04It's really easy to enable buffering on an insert cursor when you create it; it's just a parameter that you pass in.
15:10For useBuffering, you set it to true.
15:12Essentially, what buffering does is it allows you to stage the edits.
15:17It allows you to insert the same attribute values for similar features, so it can save a lot of the processing that you're doing.
15:25You want to periodically call Flush when you're using buffering as well.
15:29What we generally recommend is to call Flush around every thousand features. That's a good number to start at. Okay?
15:35We generally don't like giving you magic numbers.
15:38This is one where we've realized we need to give you a magic number, and we picked a thousand.
15:42We picked a thousand from our own usage internally, and that's just a good number to be set at.
15:47Could you set it to be more? Yes, but you then need to be aware that you're risking losing more if you run into issues on flush.
15:58I'll talk about that in a little bit in the slides.
16:01This third bullet's one that's pretty interesting actually.
16:04It's one that I actually always forget about until I go over the slides the week before the DevSummit.
16:10When you're using insert cursors, if you're using spatial caches--again, which is something that I'll talk about...
16:16...there's extra processing that's required.
16:19What a spatial cache does is it keeps a cache of the edits that you're performing local on the client...
16:25...and then when you save, it pushes those edits down into the database.
16:29Caching's really useful when you've got a really thin network pipe or a lot of network traffic...
16:34...and it's expensive to go across the network; you want to perform a lot of your edits locally and then push them in.
16:40But if you're doing insert cursors, you're essentially paying the cost twice, right?
16:44Once to go into the cache and once into the database.
16:47So if you're using insert cursors, consider changing your code to first check to see if there's a cache or not...
16:52...and then either discarding that cache automatically or prompting the user if they'd like to discard that cache.
17:01Okay, so this slide is, you know, an attempt to explain how buffering works...
17:07...and why buffering works with enterprise geodatabases and how our cursor logic works.
17:12So, you know, there's a lot going on on it, but I'm going to walk through it in a little bit.
17:17The dashed line in the center of the screen is the logical difference between the client and the server.
17:24So in this instance, we're talking about the client being something like my laptop...
17:27...with the application we're developing in ArcGIS running on it, and the server is where my data is stored...
17:33...in my enterprise geodatabase, usually across a network--Oracle, SQL Server, or something like that.
17:39If you're using direct connect, you know, the terms can get a little interchangeable...
17:44...but logically, we're talking about the client being this machine and the server being where things are going.
17:49So when you have an insert cursor, the first thing you do is insert the row.
17:54When buffering is used, we're first going to stage that in the transmission buffer here.
17:59Now what we're going to do, when that feature is inserted and staged in the buffer...
18:04...is we're going to ask, Is the transmission buffer filled?
18:07Well, if it's not filled, easy. We're going to return a result to you, and then you can continue filling the buffer.
18:15However, if it is full, we're going to flush it onto the server side for you, okay?
18:23You haven't called Flush; we're pushing it over onto the server side, onto something called the server buffer.
18:30This same question is asked on the server side with the server buffer. Is it filled?
18:35No, it's not filled. Great. Return a result to you again so you get that result.
18:39If it is filled, then we're going to push those edits down into the database and, again...
18:43...down in the bottom there, return a result to you.
18:45Now why is this critical?
18:47Well, you can see that there's essentially two ways that your edits can be flushed.
18:53You can call Flush, which I'm highlighting now in red, which is where you should be checking for errors.
19:00Maybe you're running out of table space, or maybe you've got field values that are invalid.
19:04Essentially you're trying to catch something that's now invalid once the flush is occurring to the database.
19:09So you want to do that error checking on flush, but you also have to do the error checking on InsertRow.
19:14Because, like I demonstrated here, you could get an implicit flush based on these different transmission buffers being filled.
19:23So really key thing to keep in mind. If you're using insert cursors, go back, take a look...
19:27...make sure you've got that proper logic guarding inserting the features as well as on flush.
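The reason both calls need guarding can be seen in a simplified Python model. It collapses the transmission buffer and server buffer from the slide into a single buffer, and every name (`Database`, `BufferedInsertCursor`, `FlushError`, `load`) is hypothetical. What it keeps is the behavior described above: a full buffer triggers an implicit flush, so a database error can surface from the insert call as well as from an explicit flush.

```python
class FlushError(Exception):
    """Toy stand-in for a database-side failure (for example, no table space)."""


class Database:
    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = []

    def write(self, batch):
        if len(self.rows) + len(batch) > self.capacity:
            raise FlushError("out of table space")
        self.rows.extend(batch)


class BufferedInsertCursor:
    def __init__(self, db, buffer_size):
        self._db = db
        self._buffer_size = buffer_size
        self._buffer = []

    def insert_row(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self._buffer_size:
            self.flush()  # implicit flush: errors surface from insert_row

    def flush(self):
        batch, self._buffer = self._buffer, []
        if batch:
            self._db.write(batch)


def load(cursor, rows, flush_every=1000):
    """Return None on success, else the 1-based index of the failing row."""
    count = 0
    try:
        for count, row in enumerate(rows, start=1):
            cursor.insert_row(row)       # guarded: may flush implicitly
            if count % flush_every == 0:
                cursor.flush()           # guarded: periodic explicit flush
        cursor.flush()                   # guarded: final explicit flush
    except FlushError:
        return count
    return None


small_db = Database(capacity=5)
failed_at = load(BufferedInsertCursor(small_db, buffer_size=3), range(10))
print(failed_at, len(small_db.rows))  # prints: 6 3
```

In this run the explicit flush is never reached: the error comes out of the sixth insert, when the second full buffer is pushed to a database that only has room for five rows.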
19:33So the second type of cursor that we have is a QueryDef cursor, and like I mentioned before...
19:37...these aren't bound to the class. This is a case where, you know, the user creates...
19:42...the user could be you internally, or you could be exposing QueryDefs to your end users in your application...
19:47...asking them to create a query.
19:49And it is essentially a user-defined query. It's a SQL-like statement; not SQLite, SQL-like statement.
19:59QueryDef cursors are always going to bypass any cache held on a class or workspace and go right to the database.
20:06Also, if you're calling IQueryDef.Evaluate within an edit session, it's going to cause all the cache rows to be flushed.
20:13This is another case where you might be seeing periodic errors and you're not really sure what's going on.
20:18Take a look at your use of QueryDef.Evaluate within edit sessions; see if you're properly trapping for errors.
20:26QueryDefs are read-only, okay? So they don't support store, they don't support delete...
20:30...they don't support the updating of the features returned from them.
20:34The other aspect of cursors that we get a lot of questions about is recycling cursors.
20:38Well, what a recycling cursor is, is it's a cursor that you create, that's meant to be generally for read-only use...
20:46...and it doesn't create a new client-side row object for each iteration you perform on it, each request for a new row.
20:55The data structures that we use within ArcGIS in the geodatabase--memory, objects, instances...
21:00...we're going to reuse those with a recycling cursor.
21:08This is an example, some code here of using a recycling cursor.
21:13You can see what I'm doing up at the top is I'm opening a table called road_node...
21:18...and then I'm issuing a search call to my ITable interface, and you can see that the second parameter I'm passing in is true.
21:25That's whether I want it to be recycling or not.
21:28I'm then calling two NextRow statements to get two different row variables returned...
21:33...and then, lastly, I'm checking to see whether they're equal or not. And they're equal.
21:37Because on the second call to NextRow for that second row, we're going to reuse the internal structures.
21:45So we're not holding on to a reference to that first feature.
21:50Sorry. That's a massive faux pas, by the way. Hello? Mm-hmm.
22:00This happens all the time.
22:02So the other part about recycling cursors that you need to be aware of...
22:11The lead developer, ladies and gentlemen.
22:19So in the second example you can see here, again, I'm opening up a table, I'm calling search on it...
22:24...getting a recycling cursor back, and then you can see what I'm doing is I'm again getting those first two rows back...
22:29...but the difference here is I'm querying over to the geometry.
22:32So the key thing that you have to keep in mind with this code example that I'm showing...
22:36...is that it's not just about the attributes that are the same or even the row object itself we're holding in memory.
22:42The geometries are going to be the same as well, okay?
22:45We highlight this slide because we have seen cases in the past where you've been using recycling cursors...
22:52...and incorrectly holding on to references to that first feature and passing it around...
22:57...not knowing the geometry isn't going to be the same, and bad things ensue with it.
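The hazard of holding on to a recycled row can be reproduced with a short Python generator. This is a toy, not the cursor API: `search`, `Row`, and the `recycling` flag are hypothetical stand-ins, and the geometry is just a string attribute. With recycling on, the "first" row silently takes on the values of whatever row was fetched last.

```python
class Row:
    def __init__(self):
        self.oid = None
        self.shape = None


def search(records, recycling):
    """Yield rows; with recycling=True one Row object is reused throughout."""
    shared = Row()
    for oid, shape in records:
        row = shared if recycling else Row()
        row.oid, row.shape = oid, shape
        yield row


records = [(1, "POINT (0 0)"), (2, "POINT (5 5)")]

cursor = search(records, recycling=True)
r1 = next(cursor)
r2 = next(cursor)
print(r1 is r2, r1.shape)  # prints: True POINT (5 5)

cursor = search(records, recycling=False)
n1 = next(cursor)
n2 = next(cursor)
print(n1 is n2, n1.shape)  # prints: False POINT (0 0)
```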
23:07So when do you use recycling cursors? When you don't need to persist a reference to the row, okay?
23:12Don't pass them around at all. They're meant to be accessing that row object really quickly...
23:17...getting the information, moving on to the next one.
23:19You don't need to keep references to that object once you move on to your next row.
23:25Never directly edit a recycled row. Okay?
23:28Again, because you really don't know, you can't tell the provenance of that row object that came in.
23:35But if you're careful about it, the proper use of recycling cursors within an edit session...
23:40...can really drop your resource consumption.
23:43But you need to be really aware of what their exact usage is.
23:47If you're just getting at the data, reading it, and moving on, you can really drop a lot of the memory usage.
23:53And this slide demonstrates that. Again, we're just simply iterating through a cursor...
23:57...but there's a difference between whether we're doing it as recycling or nonrecycling.
24:01Hopefully, you can see that down at the bottom, especially at the back of the room.
24:05I've got about 50,000 rows with 75 fields, and when recycling is set to true, I'm using about 60 megabytes of memory.
24:12When I set it to false, it's a hundred and eighty-five.
24:14So a big drop in the use of memory depending on whether the cursor is recycling or not.
24:20She had a girl.
24:23The baby was a girl? Excellent.
24:25I'll see it tonight. My wife had a daughter.
24:33No, that was the car mechanic telling me I don't have a leak in my water reservoir. No, sorry. That's terrible.
24:40We've worked him hard in Redlands.
24:42You guys are going, Who's that guy not home with... No, it's my auto repairman. Sorry about that.
24:50I was happy too. It saved me 250 bucks.
24:53Nonrecycling cursors. Continue. What are nonrecycling cursors?
24:58Well, essentially, it's where we're going to create a new client-side row object for each row that you're retrieving from the cursor.
25:04So we're going to create new internal data structures. They're the exact opposite of recycling cursors.
25:10When do you use nonrecycling cursors? When you want to pass around references to that feature or that row, okay?
25:17They're commonly used, and always used within edit sessions, when you want to edit the results of the cursors...
25:23...when you want to cache a set of the rows and maintain long-lived references to them...
25:28...pass them around to other aspects of your application.
25:31And again, the really key thing, you always edit nonrecycled rows.
25:35If you're ever wondering, you know, what should I do; I'm really not sure, use nonrecycling. Okay?
25:41Yes, you're going to use a little bit more memory, but from an editing perspective...
25:44...you're always going to get the exact result that you're looking for.
25:47So if you're ever not sure, recycling equals false.
25:52Okay. Caching of geodatabase data. So there's two key things about caching geodatabase data that I'd like to talk about.
25:57The first one is the spatial cache. Anybody use the Map Cache or the Feature Cache toolbar in ArcMap?
26:07Like two people; three. Alright, okay. That uses a spatial cache internally within the geodatabase.
26:13This is the caching mechanism that I've mentioned so far.
26:16Essentially, what it's meant for is the speeding up of queries because it's going to pull everything...
26:20...that's within the envelope that you define as a developer, or intersecting it, over to the client.
26:27The really great use for caching of data, again, you've got a thin network pipe, you've got a lot of data...
26:32...and you want to speed up the editing, display, labeling, query, any of those types of performance.
26:41Really useful with enterprise and ArcSDE geodatabases as well.
26:46So here's a good example of using a spatial cache and when you should use one.
26:50I've got a number of different spatial queries that I'd like to perform, and I'm performing it within a local area.
26:56It's outlined by the dashed red line.
27:00It's really simple to create a spatial cache. Essentially, you just create something called the spatial cache manager...
27:07...and then issue the call to fill the cache, passing in the envelope.
27:10You know, it's hard, actually, to give you guidance on the size of the spatial cache you could create.
27:16You really need to do some tests with it and see what's applicable.
27:22So for that reason, it's difficult to give you a lot of guidance with it.
27:25One thing we can say is generally you don't want to create it, say, over a statewide worth of data...
27:30...'cause, again, you're caching it all in memory, right?
27:32You're probably going to blow out the memory on the machine because of that.
27:36So it's really meant for a lot of local usage. You can see the screen shot that I had here was a local usage.
27:41So if the workflow that your editors are using is zooming in to a particular area and then doing a lot of work in that area...
27:47...well, that's probably a good place to use the spatial cache and enable it.
27:53The other type of cache we have is something called the schema cache. Anybody ever used the schema cache in here?
27:58No one's...yeah. No one ever uses it.
28:00It's something that can be really useful, but it's got a very specific application.
28:05It's a cached snapshot of the geodatabase schema.
28:10So not the data in the tables in the geodatabase; the schema of the geodatabase.
28:15We use it within ArcGIS when we're opening up map documents and accessing the geodatabase as well.
28:20It does require a static data model, but it comes in really useful when you've got a static and large data model...
28:28...and you're accessing a lot of that data model at any one point in time.
28:31It can really improve performance. It's going to drop a lot of the round trips to the database to get schema information.
28:39Yeah, again, it's useful with, you know, really large static data models.
28:43If you're opening and using lots of classes, right, going back to the database lots...
28:47...look at using the schema cache and turning that on.
28:49It could give you a big jump in performance.
28:53Essentially, you enable the schema cache before you start opening up the tables. You start...
28:58You create an object called the schema cache manager...
29:00...and then you tell it what workspace you'd like to enable schema caching for.
29:04The cache needs to be fresh, though.
29:07So if you're doing lots of schema changes, the schema cache is not going to work for you.
29:11It could actually slow things down a little bit, okay? So again, it's for static, large data models.
29:16And another theme that's kind of with the geodatabase, things that you create, like the schema cache...
29:24...schema locks, which I'll talk about in a little bit, it's your responsibility to disable and discard it.
29:29We're not going to handle that for you.
29:32And here's just a code example that just walks through it really quickly.
29:36I have a workspace factory that I'm going to be using of a particular type.
29:39I then QI or cast over to IWorkspaceFactorySchemaCache and then open up the factory.
29:48I then pass in the workspace that I'd like to enable schema caching on and then start opening up my classes...
29:53...and at the very end, I want to disable that schema cache and shut it off.
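The enable, open, disable sequence described here can be sketched roughly as follows. This is a self-contained Java model rather than ArcObjects code: `SchemaCacheDemo`, `openTable`, and the round-trip counter are hypothetical stand-ins, used only to show why caching a static schema saves trips to the database and why the disable call belongs in a `finally` block.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a workspace with a schema cache; not an ArcObjects type.
class SchemaCacheDemo {
    private final Map<String, String> cache = new HashMap<>();
    private boolean cacheEnabled = false;
    int roundTrips = 0; // counts simulated trips to the database for schema

    void enableSchemaCache() { cacheEnabled = true; }

    void disableSchemaCache() {
        cacheEnabled = false;
        cache.clear(); // discard the snapshot; the caller is responsible for asking
    }

    // Simulates opening a class: schema comes from the cache when possible.
    String openTable(String name) {
        if (cacheEnabled && cache.containsKey(name)) {
            return cache.get(name);
        }
        roundTrips++; // simulated schema query against the database
        String schema = "schema-of-" + name;
        if (cacheEnabled) {
            cache.put(name, schema);
        }
        return schema;
    }

    // Opens every table 'repeats' times with the cache on, and always turns it off again.
    static int openAllCached(SchemaCacheDemo ws, String[] tables, int repeats) {
        ws.enableSchemaCache();
        try {
            for (int i = 0; i < repeats; i++) {
                for (String t : tables) {
                    ws.openTable(t);
                }
            }
        } finally {
            ws.disableSchemaCache();
        }
        return ws.roundTrips;
    }
}
```

In this model, opening two tables three times each costs two round trips instead of six, which is the kind of saving the schema cache is aimed at when the data model is large and static.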
30:00Okay. So the last thing that I'd like to talk about is schema locks.
30:03And, you know, next to cursors, this is actually probably the second most confusing part of the Geodatabase API...
30:09...I'd say, that people run into. It's certainly the one that we get the most questions on.
30:13What are schema locks? Well, what they do is they're going to prevent clashes...
30:16...between other users working with your data when you're changing the geodatabase structure.
30:20There's two types of schema locks, exclusive and shared.
30:25You're always going to get a shared schema lock.
30:28You don't have to request one, okay? The geodatabase hands one out to you when you start opening.
30:33What you do with that shared lock is you can then promote it to an exclusive lock...
30:38...when you need to make changes to that dataset or that table.
30:43Why do we do that? Why don't we just let you go and make changes?
30:46Well, I mean, you could actually go and just start making changes.
30:49The reason that we give you the best practice of using exclusive schema locks is because...
30:58...the model we have in ArcGIS with the geodatabase is that when a dataset is opened...
31:02...you can be assured that its structure will not change out from underneath you.
31:06If you decide to label on a field, you can rest assured that field will not suddenly disappear on you...
31:12...because someone else decided to get rid of it.
31:15That's really useful because you know once you've got something opened, it's not going to change.
31:18You don't have to keep going back to the geodatabase and saying...
31:21...Has this structure changed yet? Has it changed yet? Okay?
31:24So that's where they come in very handy.
31:28Exclusive locks, like I mentioned, are not applied or removed automatically.
31:35So when do you use schema locks? Whenever you're doing any of these schema modifications.
31:39I'm not going to drone on and go over them in detail.
31:42They're all well documented in the help on schema locking about, you know, when to use it.
31:49A couple key things about schema locks, though.
31:51You need to demote the exclusive lock to a shared lock when you're done.
31:54This also includes in your error checking. That's one we see quite a bit.
31:58We've had it in our own code, actually, from time to time...
32:01...where we weren't properly releasing the lock on something when an error was being raised.
32:06So go back and check your code with the use of exclusive schema locks.
32:10Make sure you're properly demoting it to a shared lock on an error.
32:14Keep your use of exclusive schema locks very tight.
32:18Gather the information you need, go get the lock, make the changes, release the lock.
32:24You probably don't want to get an exclusive schema lock when you first pop open the dialog...
32:28...asking the user if they want to do something, 'cause that user might take off for the weekend...
32:33...and then just leave the dialog sitting there with the lock on the data.
32:38Okay? That might annoy other people in the organization.
32:41That's one of the reasons we added Disconnect User at 10.1, so they could just drop them off. Okay?
32:48But guard your code from that so that you keep your use of exclusive schema locks tight.
32:53Generally, that's a good bit of advice for a lot of things with the geodatabase--edit operations, edit sessions.
33:00Anything where you're changing the state of something like with locks or with an edit operation or edit session.
33:07Keep your use of them tight. Don't start them up and then start collecting information.
33:11Get all that information ahead of time.
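The advice about keeping exclusive locks tight, and demoting them even when an error is raised, might look like this in outline. The class below is a hypothetical Java model, not the ArcObjects schema lock interface; `SchemaLockDemo`, `LockType`, and `addField` are names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a dataset's schema lock; not an ArcObjects type.
class SchemaLockDemo {
    enum LockType { SHARED, EXCLUSIVE }

    private LockType lock = LockType.SHARED; // a shared lock is handed out on open
    private final List<String> fields = new ArrayList<>();

    LockType currentLock() { return lock; }
    List<String> fields() { return fields; }

    void changeSchemaLock(LockType type) { lock = type; }

    // Gather and validate inputs first, then hold the exclusive lock only around the change.
    void addField(String fieldName) {
        if (fieldName == null || fieldName.isEmpty()) {
            throw new IllegalArgumentException("field name required"); // checked before locking
        }
        changeSchemaLock(LockType.EXCLUSIVE);
        try {
            if (fields.contains(fieldName)) {
                throw new IllegalStateException("field already exists: " + fieldName);
            }
            fields.add(fieldName);
        } finally {
            // Demote on success and on error alike.
            changeSchemaLock(LockType.SHARED);
        }
    }
}
```

The `finally` block is the part that addresses the error-handling mistake mentioned above: the lock goes back to shared whether or not the schema change succeeded.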
33:15Okay, so enough of me. I'm going to pass it over...
33:17Hopefully Eric can keep his mind on the slides and not his newborn child.
33:22...for Eric and Tom to talk about plug-in data sources and some of the work they've done with MongoDB.
33:30Okay, it's working. Good. Now, this is an interesting one.
33:36Craig asked the question earlier how many of you were using plug-in data sources...
33:40...and I don't think anyone in here said they were.
33:44This is something, actually, this is a piece of architectural infrastructure that we developed...
33:49...way back in 8.0, 8.1 time frame, and what we were doing was...
33:55...some of the data sources that we were supporting were read-only, such as read-only access to CAD files...
34:02...or read-only access to SDC files.
34:05These are unusual data files that are not in a standard tabular representation, and we only wanted to provide read-only access to them.
34:14What we ended up doing was saying, hey, we thought it was kind of useful, so we ended up...
34:19...exposing the same mechanism or the same infrastructure to developers such as yourselves...
34:24...so that you could implement your own plug-in workspaces or data sources.
34:30These plug-ins are really used for providing read-only access to unusual or new data types...
34:40...that are not supported by the ArcGIS infrastructure or the geodatabase.
34:45We'll talk a little bit later about the code sample we have in ArcGIS Online, where it was an unusual table in an ASCII file...
34:54...where the first six characters are an x-value, the next characters are a y-value...
35:01But it's just one row in an ASCII file; there's no tab delimiting, no comma-separated values.
35:08It's just data, ASCII data.
35:10Well, you can turn that, you can create a plug-in data source for something like that.
35:15Now, when you create a plug-in, you will then...it'll tie in to the rest of the Desktop infrastructure.
35:20You can then do standard things through the Catalog window, such as browsing, managing, previewing.
35:26You can also pull it into ArcMap, the desktop, and be able to select things, render things, query things, read things.
35:33And as Craig was pointing out yesterday, yeah, you can also incorporate those into geoprocessing workflows, okay?
35:40That's very useful.
35:41Now, plug-ins, they expose a collection, a small collection, a handful of interfaces that you need to implement...
35:48...that are then tied in to the rest of the system.
35:51And like I was mentioning before, they've been around since the 8.x time frame; they've been here for quite a long time.
35:58Now, recently back in Redlands, Tom and David and some other people on our team started playing with NoSQL databases.
36:10How many of you know what a NoSQL database is? Raise your hands. Does anyone...has anyone...
36:16No, I guess...well, maybe. Kind of.
36:18Anyway, these are new types of data sources. We'll talk a little bit more about them...
36:21...but this is a new class of databases or data sources for which we do not have native geodatabase support...
36:28...that people are liking to use these days for big data problems.
36:32We'll show some prototyping work in the demo that Tom has put together talking about that.
36:37Now, plug-ins. They only support a very simple feature model--tables, feature classes, and feature datasets.
36:44Other dataset types or controllers are not supported.
36:48You have only read-only access to the data. You can't update the data currently. That is a limitation with plug-ins.
36:55And when you implement these, you are going to have to write some code.
36:59There's no wizard to build the thing for you; you're going to be writing code.
37:04You can use any reasonable language to implement these things--C++, C#, or VB .NET--in order to implement a plug-in.
37:15Now, when you implement a plug-in, there are basically four different components that you will need to support.
37:21First is a workspace factory helper.
37:24You'll have to provide support for a workspace helper, a dataset helper, and a cursor helper.
37:30Okay? These are the four basic things that you're going to need to be able to support in order to implement a plug-in.
37:37Now, at resources.arcgis.com, we actually have some pretty darn good documentation talking about these things.
37:46We were playing with this, and we challenged Tom with implementing one of these things...
37:50...on top of some MongoDB infrastructure that we had put together.
37:55You know, where do you go? He went to our online help to find out what was going on with these things.
38:00So there's actually some surprisingly good doc here, okay, talking about these things...
38:06...how you use them, what you can do with them.
38:08Now, one of the most interesting things here is that we actually have sample code that shows you how to implement a plug-in.
38:19And you can go and grab these....it's implemented in C#, the example, and you can just download the code.
38:25And this is a sample that actually implements what I was talking about earlier...
38:30...where you have a data file, an ASCII text file, six characters for x, six characters for y...
38:38...and then the trailing stuff is just some attribute value.
38:42And so what this sample does is show you how to implement one; it's a great starting point for implementing a plug-in.
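To make the file layout concrete, here is a minimal Java sketch of parsing one such record. It is not taken from the downloadable C# sample; the class name `FixedWidthRecord` is hypothetical, and the sketch assumes the x and y columns hold plain decimal text padded with spaces.

```java
// Hypothetical parser for the fixed-width layout described in the talk:
// six characters of x, six characters of y, then a trailing attribute.
class FixedWidthRecord {
    final double x;
    final double y;
    final String attribute;

    FixedWidthRecord(double x, double y, String attribute) {
        this.x = x;
        this.y = y;
        this.attribute = attribute;
    }

    static FixedWidthRecord parse(String line) {
        if (line.length() < 12) {
            throw new IllegalArgumentException("record needs at least 12 characters");
        }
        // Columns 0-5 hold x, columns 6-11 hold y; no delimiters are involved.
        double x = Double.parseDouble(line.substring(0, 6).trim());
        double y = Double.parseDouble(line.substring(6, 12).trim());
        String attribute = line.substring(12).trim();
        return new FixedWidthRecord(x, y, attribute);
    }
}
```

A plug-in's cursor would do something along these lines for each row it hands back, since there is no tab or comma to split on.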
38:52Let's see. Let's talk a little bit about NoSQL.
38:55NoSQL, this is a new sort of data store, a new way of looking at data storage that's come out over the past...
39:03...I don't know, 10, 12 years; it's all sort of the Web 2.0 type of stuff for, say, your Twitter or your Facebook or whatever.
39:11You have massive piles of data that you need to be able to work with.
39:15You have so much data that a standard relational database, even big exotic clustered relational databases, cannot handle it.
39:24Just way too much darn data.
39:27So people started saying, well, let's think about data management a little bit differently.
39:31What if we relax certain constraints? What can we do? How can we get really big scalability?
39:37Well, these NoSQL databases, they're intended to run on commodity hardware...
39:44...commodity load-balanced clusters of hardware, okay?
39:48We're not talking about going out and buying some, you know, hundred thousand dollar special, you know, rack thing.
39:55No, we're talking about regular commodity hardware here.
39:59Has to have massive scalability to support what are termed big data problems.
40:05When we all talk about big data, you know, it's a very nice buzzword these days, and it refers to the three Vs.
40:11First, volume; huge amounts of volume. We're talking petabytes or more.
40:17Velocity; huge amounts of data flowing in and out of the system. Think about Twitter or whatnot, every single Tweet.
40:24That's a lot of velocity when you think about all the 300 million or billion Tweets that go through every day.
40:31As well as variability. Variability means maybe all this data coming in doesn't have the same format.
40:39Maybe it's just kind of all over the place; we're getting JSON documents, we're getting Word documents.
40:45We're just getting these small little packets of information. It could be all over the place.
40:49Now, the interesting thing about NoSQL databases is that these things are really intended to allow you to grab data...
40:57...load data rapidly, and operate with it immediately.
41:01Well, with our standard big relational systems on big iron, what you do is you set up your schema...
41:06...you go and plan this stuff, then you go through a very careful data load process...
41:10...and then once the data's loaded into this fixed schema, then you start manipulating it.
41:15With NoSQL databases, it's the opposite. You just kind of point it at data feeds.
41:18You start pouring this stuff in, and then you start using it.
41:22Each of the records or documents in these big databases, they can vary in terms of what fields they contain...
41:27...what kind of information's there. It can vary record by record if necessary.
41:32These NoSQL databases, they have their own query systems; they don't necessarily support SQL.
41:37Some of them do have extensions to allow you to do some level of SQL or procedural or declarative query languages.
41:47Sometimes one of the things that they relax is like ACID semantics.
41:52Maybe we say that things only need to be eventually consistent, they don't need to be consistent with every single transaction.
41:59Eventual consistency gives you a lot more latitude in...
42:02...what kind of interesting distributed processing architectures you can create, okay?
42:07So that's what this whole NoSQL thing is all about.
42:10When people start looking at these taxonomies of different types of NoSQL data stores, you start at key values...
42:16...you go all the way up to graph data stores. And these are just some of them.
42:20You can see in here we have document data stores, of which MongoDB is one.
42:24And this is where Tom will be talking to us about this thing.
42:33Alright. Let's talk a little bit about MongoDB and the plug-in that we wrote to expose that to ArcGIS.
42:41And first let's start that off with a demo, just take a look at what it looks like.
42:43You on C? I'm on C. Thank you, Eric.
42:47So here we have a bunch of points representing a bunch of lightning strikes in South Africa.
42:52You can pan around. As you can see, it is symbolizing based on one of the attributes, a value FREEDOM.
43:02We have a fully functioning attribute table with just about four million records in it.
43:08Now, in terms of big data, that's really not big data, but it's big enough to illustrate how this works.
43:17Now, we can...let me just jump to a bookmark real quick.
43:29We can symbolize, as I've shown you; we can label based on this.
43:35We can interact with it in pretty much any read-only manner that we would expect.
43:40We can export this to another geodatabase or to another data format, just as if this was an ordinary geodatabase.
43:46Now, this is all stored behind the scenes in MongoDB, which I have the service running here in a console window.
43:57Pretty much the only thing that we didn't do with this particular plug-in was implement a SQL to MongoDB query translator...
44:05...and that would be completely doable; just wasn't something that I got done for this particular prototype.
44:13But it's otherwise completely selectable and works just like a read-only data source in ArcGIS.
44:20Let's head over back to the slides so I can talk through some of the details of how this happened.
44:28So MongoDB a moment ago was mentioned as a NoSQL data store.
44:34What it stores internally are collections of documents.
44:38Those documents are in a format known as BSON, which is binary JSON.
44:43If you're familiar with JSON, it's commonly used for a variety of web technologies.
44:50It's basically documents that consist of keys and values.
44:55Each MongoDB record is a document, and those documents are grouped into collections...
45:00...and they can have any number of keys and values within each collection.
45:04So within a single collection, you might have completely variant values on each document.
45:10The documents can be queried with ad hoc queries using Mongo's querying language or analyzed using MapReduce.
45:23Basically, the question is going to be, taking one of these, which is an example of a collection...
45:30...how does this correspond in our mind to a feature, and how are we going to get this to play nice with the feature plug-in...
45:37...the plug-in data source model?
45:39Now, what we see here is JSON documents written out in human-readable terms.
45:46Now, what's being stored in the MongoDB database are not actual strings; they're in binary format...
45:54...but this is what it would look like if we translate it.
45:56And basically what you see on the left is a key, which is a string, mapped to some value, in the case of Bobby Fischer, age 29.
46:06[Inaudible] That's a good point. A good point that's worth mentioning is that there are actually different attributes.
46:15So if you look at the second record for Boris Spassky, you'll see that birthplace is listed...
46:21...while for Bobby Fischer it isn't.
46:23This is, as Eric pointed out earlier, a common problem that these NoSQL databases tackle...
46:29...that the attribution for any particular document, in this case, or record or what have you, can have different attributes.
46:38When we want to connect to MongoDB, it's actually fairly simple using their .NET driver, which is what we used.
46:45We implemented this plug-in data source using MongoDB's .NET driver as well as our standard...
46:53...basically, as Eric pointed out, the sample that I started from on our help site.
47:02Basically, you can connect to a Mongo database by providing an IP address and a database name.
47:11You provide that to the Mongo server, and from the server, you can select which database you want to connect to.
47:17Within that server, there are a set of document collections, as we mentioned earlier...
47:20...and you can reference those using standard .NET string-indexed operators.
47:28It's basically just...it's literally a collection of collections.
47:30You can iterate through them just using enumerators if you wish.
47:35When you want to perform an ad hoc query on one of these, you're going to build up an IMongoQuery object...
47:42...and get a Mongo cursor from that.
47:45Again, this is fairly...it's fairly straightforward. It's fairly approachable programming.
47:50In this case, I'm looking for a document where there is a key by the name of Name, and it has a value of Morehouse.
47:57I create a BSON value of Morehouse, and I use the MongoQuery static class to generate an equals check...
48:07...of Name equal to Morehouse and provide that as a query to the cursor.
48:11The cursor will then return a collection of documents that you can iterate through.
48:15You can use foreach as we do in this example, or you can get it as an IEnumerator and do it yourself.
48:21Each of those documents are fairly simple to access...
48:25...again, through standard, you know, standard .NET key value dictionary semantics.
48:31You provide a key; you get a value. You can assign them in; you can pull them out.
48:38Now, the question becomes, how do we map, sort of, the geodatabase objects and the MongoDB objects together...
48:44...'cause there are some differences?
48:46In this case, we went with sort of the simple approach, which is a workspace is equivalent to a database in MongoDB terms.
48:54We're going to treat a table or feature class as a collection of documents in MongoDB terms...
48:58...and each row or feature as a single document in one of those collections.
49:04This has some implications, mainly that we're kind of constraining our model a little bit.
49:09In the geodatabase world, when we're dealing with cursors, for instance, we're going to have a fixed field set.
49:14MongoDB doesn't have that restriction; that doesn't mean that we can just feed anything into our cursor model.
49:21We have to have some rhyme or reason in here; we're going to provide a fixed set of values...
49:26...or a fixed set of fields that we provide values for from each document.
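One way to picture the fixed field set is as a projection of each schemaless document onto a predetermined list of keys. The Java sketch below is an illustration only, with documents modeled as plain maps; `FixedFieldProjection` and `toRow` are hypothetical names, and filling absent keys with null is an assumption of the sketch rather than something stated in the talk.

```java
import java.util.List;
import java.util.Map;

class FixedFieldProjection {
    // Projects a schemaless document onto a fixed field list; missing keys become null.
    static Object[] toRow(Map<String, Object> document, List<String> fields) {
        Object[] row = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            row[i] = document.get(fields.get(i)); // null when the key is absent
        }
        return row;
    }
}
```

With a field list of name, age, and birthplace, a document that lacks a birthplace key still produces a three-element row, which is what a cursor with a fixed field set needs.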
49:32So here's the first interface that you would be implementing. This is the IPlugInWorkspaceFactoryHelper.
49:37This is consumed by the ArcGIS ArcObjects plug-in factory.
49:44You provide an implementation of this, and it makes it look...
49:48...to your client code as if you're dealing with an ordinary workspace.
49:55Probably the most interesting method out of this set of methods is OpenWorkspace...
50:00...which is how we would connect to Mongo.
50:03I showed you an example of the code a moment ago and how that might work.
50:06We get a string; we use that to...we use that string to connect to the database of our choice, and then go from there.
50:18In the case of an IPlugInWorkspaceHelper, this represents a single MongoDB database in our plug-in.
50:28So this is what is returned from the factory helper.
50:31Again, this isn't something that is directly consumed by ArcGIS's system.
50:39Instead it is made to look like a standard IWorkspace, which is why this all fits in so nicely with the rest of the system.
50:45It's consumed by the plug-in architecture.
50:48Probably the most interesting method here is opening a dataset. Here again, you provide a name.
50:52We're going to use that to find that collection.
50:54There's not a lot of code behind this; I'm not going to show it, but it...
50:57...basically you're taking your database that you opened earlier, the MongoDB, and as in our earlier code sample...
51:02...you supply the name of the collection, and you get back the collection that represents that [set] of documents.
51:11Once you have that collection of documents, that is going to represent a dataset.
51:15Now, in this case, we're going to implement an IPlugInDatasetHelper.
51:20Now, in other plug-in dataset examples, you might, for instance, implement some sort of representation of a feature dataset...
51:29...or something more complicated.
51:31In this particular plug-in dataset code sample, we simply went with a straight feature class model...
51:39...so that there is a database that holds feature classes. Each of these is only a feature class.
51:44The most relevant methods here are the methods to get the cursors.
51:48This allows you to access the features stored within the dataset.
51:52And we're going to go through FetchByEnvelope and see how that occurs. Oh. We fixed it. Okay, this got fixed.
52:03We will in a moment. First, let's take a look at IPlugInCursorHelper.
52:06So what we're seeing here is the helper class that represents a cursor, you know, for the plug-in architecture.
52:15Basically, this is an extremely simple interface.
52:18The methods--IsFinished() tells the client code that they've iterated to the end of your collection of documents.
52:26NextRecord() moves the current selected document to the next one.
52:30QueryShape() gets...you pass...basically a geometry is passed into it, and you populate the values on that geometry.
52:37And finally, QueryValues() takes in a row buffer and populates it based on the values that are stored in the document...
52:43...so it's translating the document's information into the row.
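The four cursor methods can be modeled in a few lines. What follows is a hypothetical Java sketch over in-memory documents, not the ArcObjects IPlugInCursorHelper interface itself, whose signatures differ; `DocumentCursor` is an invented name, and the sketch assumes each document stores its shape under the key "shape" as an {x, y} pair.

```java
import java.util.List;
import java.util.Map;

// Hypothetical Java model of the four cursor-helper methods named in the talk.
class DocumentCursor {
    private final List<Map<String, Object>> documents;
    private final List<String> fields;
    private int position = 0; // the cursor starts primed on the first document

    DocumentCursor(List<Map<String, Object>> documents, List<String> fields) {
        this.documents = documents;
        this.fields = fields;
    }

    boolean isFinished() {
        return position >= documents.size();
    }

    void nextRecord() {
        if (isFinished()) {
            throw new IllegalStateException("cursor is past the last document");
        }
        position++;
    }

    // Fills the caller's two-element array with x and y from the current document.
    void queryShape(double[] pointOut) {
        double[] shape = (double[]) documents.get(position).get("shape");
        pointOut[0] = shape[0];
        pointOut[1] = shape[1];
    }

    // Copies the current document's values into the caller's row buffer, in field order.
    void queryValues(Object[] rowBuffer) {
        Map<String, Object> doc = documents.get(position);
        for (int i = 0; i < fields.size(); i++) {
            rowBuffer[i] = doc.get(fields.get(i));
        }
    }
}
```

A client would loop while `isFinished()` is false, calling `queryShape` and `queryValues` on the current document and then `nextRecord()` to advance.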
52:48Well, let's take a look at PlugBy...or FetchByEnvelope.
52:53At the top level, this is implemented as part of the dataset helper, so we're implementing a method that...
53:00...given an envelope, will return a set of features that fall within it.
53:04That's important because that's how panning and basically all the geospatial functionality you're seeing...
53:09...when we're moving around inside of ArcMap, that's how it's working. It's all based on FetchByEnvelope queries.
53:16The parameters are classIndex, which we're going to ignore in this case...
53:19...because that only applies if we're trying to implement feature datasets. So we can ignore that.
53:25Envelope is an important parameter because that's the actual bounding rectangle...
53:29...that we're trying to find the features that are located within.
53:33WhereClause isn't all that important in this case because we're not supporting SQL...
53:38...and then, finally, there's a fieldMap, which is important.
53:40FieldMap represents which fields are going to be returned by our cursor that we're going to return.
53:47That has a big impact on performance.
53:50If you've ever seen the difference of, you know, performing a million cursor, or a million row query using, you know...
53:57...star allFields versus only OID, you'll know what I'm talking about.
54:02Basically we're going to pass those all down into the MongoDB cursor constructor...
54:07...and we're going to go take a look at that class in just a moment.
54:09That's a class that I wrote as part of this plug-in that provides the implementation of IPlugInCursorHelper.
54:17So this is the meat of the MongoDB cursor's constructor; this is how it works.
54:23Basically we're going to get in our collection of documents that we're going to query, our envelope...
54:29...optionally, an IMongoQuery, which is a, you know, can be some sort of equivalent of a WHERE clause.
54:33In this case, we're passing null into that.
54:37The name of the shape field. The field mappings, as I mentioned a moment ago...
54:41...tell us which fields are or are not included in the returned values from our cursor.
54:47And, finally, the fields that we're looking for.
54:50So the first thing we do is build a list of all the various fields we do want returned.
54:54We need those as basically an array of strings.
54:57The way we do that is iterating through the FieldMapping.
54:59FieldMapping is an array of numbers with exactly the same length as your fields.
55:06For each value in that array, if it is negative 1, then that field with the corresponding index isn't included.
55:14I know that's a little cryptic, but that's how it works.
55:17Basically, every value in the FieldMapping array, which is an array of numbers...
55:22...corresponds to a field in the field set that you passed in.
55:27And if there is a negative 1 at the matching index, you don't include that field.
55:32So basically what I do is I rip through the array of field mappings, looking for fields that are not negative 1...
55:39...and if it's not a negative 1, then I append the name of that field on the end, because that's one of the fields I want back.
55:44And at the very least, if nothing else is included, I always include OID, because that's necessary for the system to work.
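The field-mapping walk just described can be sketched as below. This is a minimal Java illustration, not the plug-in's actual code; `FieldMapDemo`, `requestedFields`, and the parameter names are hypothetical, and the OID field name is passed in rather than assumed.

```java
import java.util.ArrayList;
import java.util.List;

class FieldMapDemo {
    // Returns the names of fields whose field-map entry is not -1,
    // always including the OID field.
    static List<String> requestedFields(String[] fieldNames, int[] fieldMap, String oidField) {
        if (fieldNames.length != fieldMap.length) {
            throw new IllegalArgumentException("field map must match the field count");
        }
        List<String> wanted = new ArrayList<>();
        for (int i = 0; i < fieldMap.length; i++) {
            if (fieldMap[i] != -1) { // -1 means "do not return this field"
                wanted.add(fieldNames[i]);
            }
        }
        if (!wanted.contains(oidField)) {
            wanted.add(oidField); // the OID is always needed
        }
        return wanted;
    }
}
```

The resulting list of names is what gets handed to the underlying query as the set of keys to return, which is where the performance difference between all fields and OID only comes from.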
55:50Now I use the MongoDBQuery static class to get a query back.
55:57In this case, we're not creating an equals query like when we were looking...
56:01...for people with the name Morehouse earlier in our set of documents.
56:04Instead, we're looking for feature...or features that lie within a...records that lie within a rectangle.
56:11Now, MongoDB does have some geospatial indexing capabilities; that's one of the neat things about it, for points at least.
56:18And so in this particular case, we can provide the name of the shape...
56:23...the shape key, which is the key in that document that holds the spatial point...
56:31...as well as our bounding box, which is our x's and y's, mins and maxes.
56:34And we get back from that a query which we use to generate a cursor.
56:41We then set which fields we want back on that cursor, and we iterate it...
56:44...prime it to the very first document that that cursor's going to return...
56:48...because that is the expected behavior of an IPlugInCursorHelper, and we store the rest of our passed-in parameters.
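The predicate behind an envelope query is simple to state. The Java sketch below applies it to an in-memory list purely for illustration; in the plug-in, MongoDB evaluates the rectangle query on the server using its geospatial index, and the names here (`EnvelopeFilter`, `within`) and the choice to include points on the edges are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class EnvelopeFilter {
    // Returns the points (as {x, y} pairs) that fall inside the rectangle, edges included.
    static List<double[]> within(List<double[]> points,
                                 double xMin, double yMin, double xMax, double yMax) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points) {
            boolean insideX = p[0] >= xMin && p[0] <= xMax;
            boolean insideY = p[1] >= yMin && p[1] <= yMax;
            if (insideX && insideY) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

Every pan or zoom in the map amounts to running this test with a new rectangle, which is why a spatial index on the shape key matters once the collection gets large.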
57:01Now, what we're going to take a look at here is how we are actually returning the points from the documents.
57:06So as we're iterating, you know, as the system is reading through our documents...
57:10...how is it getting the geometries out of the documents and turning that into something you can see on your screen?
57:17Well, basically what we're doing at this point is we find the shapeElement...
57:22Remember, we stored that parameter we had earlier of what the shape field's name is.
57:27...we store that, and we use that to find the matching element in that document.
57:31So remember the document is this JSON document with matching keys and values.
57:36We find the pair that has the name, you know, shape or what have you, whatever you want to name it.
57:43And we use a utility we built to turn that from BSON to geometry.
57:48And that utility is extremely simple; this is stored as basically an array.
57:53So the array is two doubles with the name shape, and that's pretty much it.
58:00The first value has to be y in this particular case, and the second x in order for the indexing to work correctly.
58:06That's probably the trickiest part, but other than that, we just get those two values...
58:10...and we put those into the coordinates for the geometry and return that.
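The conversion utility could be as small as the following. This is a hypothetical Java sketch, not the prototype's code; `ShapeArrayConverter` is an invented name, and the y-then-x storage order is taken from the description above rather than from MongoDB documentation.

```java
class ShapeArrayConverter {
    // Converts a stored two-element array to {x, y};
    // the stored array is assumed to hold y first, then x, as described in the talk.
    static double[] toPoint(double[] stored) {
        if (stored == null || stored.length != 2) {
            throw new IllegalArgumentException("shape must be an array of two doubles");
        }
        double y = stored[0];
        double x = stored[1];
        return new double[] { x, y };
    }
}
```

Keeping the ordering in one small function like this means the rest of the cursor code never has to remember which element is which.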
58:18So what can we say about this? My experience prototyping this, when I started this, I had never worked with MongoDB.
58:23I had never written a plug-in data source; in fact, I'd never really worked with a plug-in data source architecture.
58:30They're very easy to use. It took me approximately, I'd say, about 40 hours to put this whole thing together.
58:35Honestly, I had it basically working in about 8.
58:38It's less than 2,000 lines of code at the end.
58:41The most complicated part is handling the fact that Mongo documents can have any old sets of keys and values...
58:48...but our cursor model expects a set of fields and that we need a spatial index...or a spatial extent up front...
58:55...which we can sort of derive from that and store somewhere so that we don't have to keep recalculating it.
59:03This will be available as a code sample on the resource center. And hopefully, it'll help somebody out.
59:11Okay, that was actually very interesting because I literally, about 10 days ago or so...
59:17...I walked into Tom's office and said, Hey, put together a Mongo demo thing for us.
59:25And he literally had never used Mongo before and he had never done a plug-in before...
59:29...and he was able to bang it out in a week, getting both of them. I thought that was fantastic.
59:34So it is doable, it is a tractable problem and all that...
59:38...and it does allow you to start playing around with some of these new data sources that you do hear...
59:43...or read about in the literature or whatnot, both academic as well as commercial literature.
59:48There's a lot of interesting things you can do, and you can integrate these things in a read-only manner into the ArcGIS Desktop.
59:55So that's pretty cool stuff.
59:57Let's see. Okay, let's wind down now with the biggest mistakes people make.
1:00:05I'll do it. Would you like to do it?
1:00:07I guess I get nominated 'cause I committed like half of these, came from my code. Flip over to me.
1:00:17Okay. Okay, so go over the top developer mistakes to just kind of roll it all together here.
1:00:29So these were things we compiled a couple years ago actually.
1:00:33I think it was about, you know, two or three DevSummits ago, I said to one of the PEs on the team, you know...
1:00:39...why don't you go to tech support; get a report from tech support of all of the Geodatabase API tech support calls for the last...
1:00:46...I don't know, five years or however far back you can go...
1:00:49...and troll through it and look and see if there's any trends to kind of questions we're getting.
1:00:54So that's where these came from.
1:00:57We kind of built it up by going through there and seeing the top questions that you were actually asking...
1:01:01...calling in to tech support about.
1:01:02So the first one is the misuse of recycling cursors. I talked about this before, you know...
1:01:07...Tom and Eric were talking about MongoDB and plug-ins.
1:01:10Easily one of the biggest things we get questions on, you know, misusing them; when should I use recycling; when shouldn't I.
1:01:16Hopefully, we clarified that a little bit this afternoon by giving you that guidance.
1:01:20You know, again, like I was saying before, if you're not sure, set recycling to be false.
1:01:25Right? Then you're pretty much guaranteed it's all going to work.
1:01:28Might be a little bit more memory, but you know, it'll work, and it's sort of better safe than sorry.
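The reason the safe default is non-recycling can be shown with a small model. This Java sketch is hypothetical and does not use ArcObjects; it assumes only that a recycling cursor hands back the same row object on every fetch, while a non-recycling cursor allocates a new one each time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of recycling versus non-recycling cursors; not ArcObjects code.
class RecyclingDemo {
    // Reads all values through a cursor and keeps a reference to every row it was handed.
    static List<int[]> collectRows(int[] values, boolean recycling) {
        List<int[]> kept = new ArrayList<>();
        int[] shared = new int[1]; // the single row object a recycling cursor reuses
        for (int v : values) {
            int[] row = recycling ? shared : new int[1];
            row[0] = v; // the "fetch" overwrites the row's contents
            kept.add(row);
        }
        return kept;
    }
}
```

With recycling on, every kept reference points at the same object, so all of them end up showing the last value fetched. That is harmless if each row is only read inside the loop, and a bug if rows are held onto afterward.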
1:01:34Another quick example, just for you, you know, after the conference, you can kind of go in and see a good code example of that.
1:01:40This is one that...this came out from a couple incidents.
1:01:44You know, people were complaining about performance and how slow things were. It was kind of interesting.
1:01:50And then we actually went into our code, and I think we found a handful of places where we were doing something like this.
1:01:55We just weren't being smart about the use of FindField.
1:01:58So what you can see here is, you know, we've got a loop going on...
1:02:03...and within that loop, we're making a call to FindField. That's the very bottom line there.
1:02:10That's not an expensive call unless you're looping over like 10,000 or 12,000 times.
1:02:17Then you're doing that lookup every single time.
1:02:21Just make the call to FindField once, before the loop, and get the index of the field that's needed, right?
1:02:28And then use that index within the loop. Okay? So cache that information.
1:02:31Again, there's kind of a theme, right? I talked about this with schema locks; it also applies to edit operations.
1:02:38Get your info up front when you can, and then use it. This is another example of that.
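The slide itself is not reproduced in the transcript, so here is a minimal sketch of the pattern being described. MockFields and FindFieldDemo are hypothetical; the findField method only imitates a by-name lookup and counts how often it is called.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a fields collection; it counts lookups so the cost is visible.
final class MockFields {
    private final List<String> names;
    int lookups = 0;

    MockFields(String... names) {
        this.names = Arrays.asList(names);
    }

    // Mimics a by-name lookup that returns the field's position, or -1 if absent.
    int findField(String name) {
        lookups++;
        return names.indexOf(name);
    }
}

final class FindFieldDemo {
    // Anti-pattern: the name lookup is repeated for every row.
    static int totalNameLengthSlow(MockFields fields, String[][] rows) {
        int total = 0;
        for (String[] row : rows) {
            total += row[fields.findField("NAME")].length();
        }
        return total;
    }

    // Preferred: resolve the index once, then reuse it inside the loop.
    static int totalNameLengthCached(MockFields fields, String[][] rows) {
        int nameIndex = fields.findField("NAME");
        int total = 0;
        for (String[] row : rows) {
            total += row[nameIndex].length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[][] rows = new String[10000][];
        Arrays.fill(rows, new String[] {"1", "Main St"});
        MockFields slow = new MockFields("OBJECTID", "NAME");
        MockFields cached = new MockFields("OBJECTID", "NAME");
        totalNameLengthSlow(slow, rows);
        totalNameLengthCached(cached, rows);
        // prints "10000 lookups vs 1"
        System.out.println(slow.lookups + " lookups vs " + cached.lookups);
    }
}
```

Both methods compute the same result; the only difference is how many times the lookup runs.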
1:02:46This is one I talked about. This is the best slide ever. Simple and to the point. Don't do this; we've blocked it.
1:02:55Next. Yeah. It's pretty simple; we blocked it at 10.0, and if your code's doing it now in pre-10.0, don't do that anymore.
1:03:06Yeah, this was one that also caught us kind of off guard.
1:03:12You know, there's a definite difference in the geodatabase between DDL and DML type operations...
1:03:18...DDL being schema changes you're making...
1:03:21...DML, data manipulation or edit session type changes, manipulating the data itself.
1:03:29If you're making DDL calls or calls to change schema within an edit session...
1:03:36...sometimes the changes you're making are being committed and you don't even know.
1:03:40Outstanding edits can be flushed and committed to the database, and you may not know that.
1:03:46Don't do DDL or schema operations inside of an edit session; you should separate the two.
1:03:51Okay? Schema manipulation and schema changes outside of an edit session.
1:03:55Edit sessions are meant for editing the content of datasets, not their schema.
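To make the risk concrete, here is a toy model of the behavior described above. MockWorkspace and DdlInEditSessionDemo are hypothetical, and the rule that a schema change flushes and commits outstanding edits is an assumption drawn from the talk, not a documented contract of any real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workspace model. The "DDL commits pending edits" rule is an
// assumption made to mirror the talk.
final class MockWorkspace {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean editing = false;

    void startEditing() {
        editing = true;
    }

    void insertRow(String row) {
        if (!editing) {
            throw new IllegalStateException("insertRow requires an edit session");
        }
        pending.add(row);
    }

    // Schema change (DDL). In this model it silently commits outstanding edits.
    void addField(String fieldName) {
        committed.addAll(pending);
        pending.clear();
    }

    // Ends the session; pending edits are kept only when save is true.
    void stopEditing(boolean save) {
        if (save) {
            committed.addAll(pending);
        }
        pending.clear();
        editing = false;
    }

    List<String> committedRows() {
        return new ArrayList<>(committed);
    }
}

final class DdlInEditSessionDemo {
    // Anti-pattern: schema change in the middle of a session that is later discarded.
    static List<String> ddlInsideSession() {
        MockWorkspace ws = new MockWorkspace();
        ws.startEditing();
        ws.insertRow("row-1");
        ws.addField("STATUS");
        ws.stopEditing(false);
        return ws.committedRows();
    }

    // Preferred: schema change first, edit session afterwards.
    static List<String> ddlBeforeSession() {
        MockWorkspace ws = new MockWorkspace();
        ws.addField("STATUS");
        ws.startEditing();
        ws.insertRow("row-1");
        ws.stopEditing(false);
        return ws.committedRows();
    }

    public static void main(String[] args) {
        System.out.println(ddlInsideSession()); // prints [row-1]
        System.out.println(ddlBeforeSession()); // prints []
    }
}
```

In the first case the row survives even though the session was discarded, which is the surprise the speaker is warning about.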
1:04:04If any of you are using events that are triggered by Store, take a look at your code...
1:04:12...see if you're then calling Store in the triggering event; you can end up in an endless loop that just keeps spinning, okay?
1:04:22So if you're listening to these events that get triggered by Store to do something else...
1:04:27...maybe check the data that's being inputted, maybe change the geometry in some way.
1:04:32Don't call Store within it, okay? If you've got to make changes, that's fine; just make the changes.
1:04:36And then the call to Store that eventually is going to be made will commit those changes as well.
1:04:41Don't make that call to Store within the event.
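A small sketch of why this loops: MockFeature, its onChanged hook, and StoreEventDemo are all hypothetical, and the re-entry limit of 100 exists only so the demonstration terminates instead of overflowing the stack.

```java
import java.util.function.Consumer;

// Hypothetical feature whose store() fires a change event before persisting.
final class MockFeature {
    String name;
    int storeCalls = 0;
    private final Consumer<MockFeature> onChanged;

    MockFeature(String name, Consumer<MockFeature> onChanged) {
        this.name = name;
        this.onChanged = onChanged;
    }

    void store() {
        storeCalls++;
        // Guard added only for this demo; without it the bad handler never stops.
        if (storeCalls > 100) {
            throw new IllegalStateException("store() re-entered from its own event");
        }
        onChanged.accept(this);
        // Changes made by the handler would be persisted here with the rest of the edit.
    }
}

final class StoreEventDemo {
    // Anti-pattern: the handler calls store(), which fires the handler again.
    static void badHandler(MockFeature feature) {
        feature.name = feature.name.trim();
        feature.store();
    }

    // Preferred: just make the change; the store() already in progress picks it up.
    static void goodHandler(MockFeature feature) {
        feature.name = feature.name.trim();
    }

    static int storeCallsWith(Consumer<MockFeature> handler) {
        MockFeature feature = new MockFeature("  Main St ", handler);
        try {
            feature.store();
        } catch (IllegalStateException e) {
            // Runaway loop stopped by the demo guard.
        }
        return feature.storeCalls;
    }

    public static void main(String[] args) {
        System.out.println(storeCallsWith(StoreEventDemo::goodHandler)); // prints 1
        System.out.println(storeCallsWith(StoreEventDemo::badHandler));  // prints 101
    }
}
```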
1:04:45Use our set-based APIs whenever possible. This is a great example of it.
1:04:48Use GetFeature when you've just got to get one feature; that's it.
1:04:52You know, it's probably safer to actually use something like GetFeatures or a cursor if you're getting more than one feature.
1:05:00Then you're pretty much guaranteed that when you're getting larger collections of features...
1:05:05...you're going to get that better performance, okay?
1:05:07So generally use the set-based APIs, things like GetFeatures, as much as possible.
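The difference can be sketched with a simple cost model. MockFeatureClass and SetBasedDemo are hypothetical, and counting one round trip per call is a deliberate simplification rather than a measurement of any real data source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical feature class; each call is counted as one round trip to the data source.
final class MockFeatureClass {
    int roundTrips = 0;

    // Single-feature fetch: one round trip per object ID.
    String getFeature(int oid) {
        roundTrips++;
        return "feature-" + oid;
    }

    // Set-based fetch: one round trip for the whole batch.
    List<String> getFeatures(int[] oids) {
        roundTrips++;
        List<String> result = new ArrayList<>();
        for (int oid : oids) {
            result.add("feature-" + oid);
        }
        return result;
    }
}

final class SetBasedDemo {
    static int roundTripsOneByOne(int[] oids) {
        MockFeatureClass fc = new MockFeatureClass();
        for (int oid : oids) {
            fc.getFeature(oid);
        }
        return fc.roundTrips;
    }

    static int roundTripsSetBased(int[] oids) {
        MockFeatureClass fc = new MockFeatureClass();
        fc.getFeatures(oids);
        return fc.roundTrips;
    }

    public static void main(String[] args) {
        int[] oids = {1, 2, 3, 4, 5};
        System.out.println(roundTripsOneByOne(oids)); // prints 5
        System.out.println(roundTripsSetBased(oids)); // prints 1
    }
}
```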
1:05:13This is a good example. This was one that, you know, nobody kind of called in to tech support and said...
1:05:17...hey, um, I think I'm doing careless reuse of variables.
1:05:22This was a trend we just saw come out from looking at a lot of the incidents that we were looking at.
1:05:27This is a really good example of, you know, just careless reuse of it.
1:05:31You can see I'm reusing the fieldEdit variable to first add my ObjectID and then my name.
1:05:37Probably not something that I really want to do because I haven't actually committed all those changes yet...
1:05:43...so I'm going to be losing some of that information. Okay?
1:05:46So be careful with the use of your variables.
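The slide code is not in the transcript, so the following is only one plausible way this kind of reuse loses information. MockField and FieldReuseDemo are hypothetical; the fieldEdit variable name is taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field definition; not the ArcObjects IFieldEdit interface.
final class MockField {
    String name;
}

final class FieldReuseDemo {
    // Anti-pattern: one object is mutated and added twice.
    static List<String> buildReusingOneObject() {
        List<MockField> fields = new ArrayList<>();
        MockField fieldEdit = new MockField();
        fieldEdit.name = "OBJECTID";
        fields.add(fieldEdit);
        fieldEdit.name = "NAME"; // overwrites the object already in the list
        fields.add(fieldEdit);
        return names(fields);
    }

    // Preferred: a fresh object per field.
    static List<String> buildWithFreshObjects() {
        List<MockField> fields = new ArrayList<>();
        MockField objectIdField = new MockField();
        objectIdField.name = "OBJECTID";
        fields.add(objectIdField);
        MockField nameField = new MockField();
        nameField.name = "NAME";
        fields.add(nameField);
        return names(fields);
    }

    private static List<String> names(List<MockField> fields) {
        List<String> result = new ArrayList<>();
        for (MockField field : fields) {
            result.add(field.name);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(buildReusingOneObject()); // prints [NAME, NAME]
        System.out.println(buildWithFreshObjects()); // prints [OBJECTID, NAME]
    }
}
```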
1:05:52Oh, yeah, okay. I mentioned before, at the beginning, the great thing about the geodatabase, right?
1:05:59There's lots of ways to do things, and then the bad thing about the geodatabase.
1:06:02This is one of those. That's really what this slide is saying.
1:06:06The classic example of this one is the difference between IClass.AddField and IFieldsEdit.AddField.
1:06:14If you have a class that's already existing in the database and you're adding a field to it...
1:06:19...you want to use the IClass.AddField method.
1:06:22But if you're constructing a new class, that's what IFieldsEdit is for.
1:06:27If you get the fields collection from a feature class and then start adding fields to that fields collection...
1:06:33...it's not going to get applied to the feature class. Okay? So you want to use IClass.AddField when you're doing that.
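A toy model of that distinction follows. MockObjectClass and AddFieldDemo are hypothetical, and returning a detached copy from getFields is an assumption used to imitate the described behavior, not a statement about how the real fields collection is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical existing class with a schema of field names.
final class MockObjectClass {
    private final List<String> fields = new ArrayList<>(Arrays.asList("OBJECTID"));

    // Returns a detached copy: edits to it never reach the class (modeling assumption).
    List<String> getFields() {
        return new ArrayList<>(fields);
    }

    // Class-level schema change, playing the role of IClass.AddField in this sketch.
    void addField(String name) {
        fields.add(name);
    }
}

final class AddFieldDemo {
    // Anti-pattern: adding to the fields collection obtained from an existing class.
    static List<String> viaFieldsCollection() {
        MockObjectClass existing = new MockObjectClass();
        existing.getFields().add("NAME");
        return existing.getFields();
    }

    // Preferred for an existing class: add the field through the class itself.
    static List<String> viaClass() {
        MockObjectClass existing = new MockObjectClass();
        existing.addField("NAME");
        return existing.getFields();
    }

    public static void main(String[] args) {
        System.out.println(viaFieldsCollection()); // prints [OBJECTID]
        System.out.println(viaClass());            // prints [OBJECTID, NAME]
    }
}
```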
1:06:42I don't know how many...
1:06:44I wish I had some denomination of money for the number of times we've met with people and they've said...
1:06:55..."When we first built the geodatabase or we first built this application, we thought we were going to use this thing"...
1:07:03...insert "thing" of whatever it is, "but we never did." I'd have lots of money; I don't know how much, but probably lots.
1:07:12Canadian dollars would be better.
1:07:14This is a classic example. And relationship class notification is one of those; there's a bunch of other examples as well.
1:07:20Just adding feature classes to your geodatabase thinking you're going to use them, another good example of that.
1:07:25But this is one where we saw, right? People were saying, well, I've got relationship classes set up...
1:07:28...and I set my notification to go both ways, 'cause I thought I'd use it eventually...
1:07:32...but I'm seeing really bad performance when I'm inserting because of all the events and notification that's flying around.
1:07:39You know, if you don't need it, don't set it up thinking you're going to need it...
1:07:42...'cause, you know, chances are you're never going to.
1:07:45Okay? One thing with relationship class messaging.
1:07:47The other one is, you know, nothing comes free.
1:07:49So if you are setting up messaging, be very mindful that you're going to get a lot of those events that are getting fired, okay?
1:07:58So... Oh, Craig, I took the liberty of adding a couple more slides for you to go through. Here.
1:08:08Okay. Don't do silly things. Not sure if I want to go through the rest of these slides.
1:08:14No, please do.
1:08:15Okay. Bring back the dunce cap; I'm not a fan of bringing back the dunce cap; obviously you are.
1:08:19Is that a picture of you as a child?
1:08:22Yes, but so what? Yeah.
1:08:25So write good code. Don't do silly things. Yeah, okay. Okay.
1:08:32So the top developer mistakes that we talked about...yeah, you definitely want to avoid number 3...
1:08:37...opening a tab at Blue Coyote during a DevSummit.
1:08:39I've seen some people be burned very badly.
1:08:42Yeah, that comes back to bite you in the end.
1:08:45And number 5.
1:08:49Yeah, you don't, definitely...I've never done that personally, tried to double my income by just betting on red.
1:08:53You always bet on black, dude, always. Everyone knows that.
1:08:57Yeah, I know, I know.
1:08:58So those are the things we wanted to highlight today, talking about, you know, best practices with the geodatabase.
1:09:05Hopefully, you got some use out of the top developer mistakes. Those two are probably your top developer mistakes.
1:09:12And hopefully, this stuff that Tom and, you know, the other team and Erik were talking about...
1:09:17...for MongoDB and plug-in data sources got you a little bit excited.
1:09:20Hopefully, you know, maybe you'll think of a use for plug-in data sources you hadn't thought of before...
1:09:24...and you'll be able to apply it in your own organization.
1:09:27So, thanks for suffering through the session with us this afternoon.
1:09:32I know it's a lot nicer outside where everybody would like to be than in here.
1:09:35So, if you have questions, come on up. We'll be up here for the rest of the time we've got. So, thanks for attending.
Effective Geodatabase Programming
Craig Gillgrass, Erik Hoel, and Thomas Breed share key programming techniques for developing high-performance geodatabase applications.
- Recorded: Mar 29th, 2012
- Runtime: 1:09:43
- Views: 1745
- Published: Apr 26th, 2012